The first MapReduce program most people write after installing Hadoop is invariably the word count program. That’s what this post shows: how to write a word count MapReduce program in Java, using Eclipse as the IDE.
Creating and copying the input file
If you already have a file in HDFS that you want to use as input, you can skip this step.
Otherwise, the first step is to create a file which will be used as input and copy it to HDFS.
Let’s say you have a file wordcount.txt with the following content.
Hello wordcount MapReduce Hadoop program.
This is my first MapReduce program.
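If you need to create this file locally first, one quick way (assuming the local path /netjs/MapReduce used in the copy command below) is a shell heredoc:

cat > /netjs/MapReduce/wordcount.txt << 'EOF'
Hello wordcount MapReduce Hadoop program.
This is my first MapReduce program.
EOF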
You want to copy this file to the /user/process directory within HDFS. If that path doesn’t exist, you need to create the directory first.
hdfs dfs -mkdir -p /user/process
- Refer HDFS Commands Reference List for HDFS commands.
Then copy the file wordcount.txt to this directory.
hdfs dfs -put /netjs/MapReduce/wordcount.txt /user/process
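You can verify the copy by listing the target directory:

hdfs dfs -ls /user/process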
Word count example MapReduce code
Now you can write your word count MapReduce code. The WordCount example reads text files and counts the frequency of the words. Each mapper takes a line of the input file as input and breaks it into words, then emits a key/value pair of the form (word, 1) for each word. Each reducer sums the counts for each word and emits a single key/value pair with the word and its total.
- Refer How MapReduce Works in Hadoop to see in detail how data is processed as (key, value) pairs in map and reduce tasks.
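For the two-line sample file above, the (key, value) flow looks like this (the framework sorts and groups the map output by key before it reaches the reducer):

Map output:     (Hello, 1), (wordcount, 1), (MapReduce, 1), (Hadoop, 1), (program., 1),
                (This, 1), (is, 1), (my, 1), (first, 1), (MapReduce, 1), (program., 1)
Reduce input:   (Hadoop, [1]), (Hello, [1]), (MapReduce, [1, 1]), (This, [1]), (first, [1]),
                (is, [1]), (my, [1]), (program., [1, 1]), (wordcount, [1])
Reduce output:  (Hadoop, 1), (Hello, 1), (MapReduce, 2), (This, 1), (first, 1),
                (is, 1), (my, 1), (program., 2), (wordcount, 1)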
In the code there is a Mapper class (MyMapper) with a map() method and a Reducer class (MyReducer) with a reduce() method. Note that the class lives in the org.netjs package, which matters when you run the jar later.
package org.netjs;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map function - called once per line of input; the key is the byte
    // offset of the line in the file, the value is the line itself
    public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Splitting the line on whitespace
            String[] stringArr = value.toString().split("\\s+");
            for (String str : stringArr) {
                word.set(str);
                context.write(word, one);
            }
        }
    }

    // Reduce function - called once per word, with all the 1s emitted for it
    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "WC");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // First argument is the input path, second is the output path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
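Since the reduce logic here (summing counts) is associative and commutative, the same reducer class can also be set as a combiner to cut down the data shuffled from the map tasks. This is not in the listing above, but a single extra line in the driver enables it:

// Optional: pre-aggregate counts on the map side using the same reducer
job.setCombinerClass(MyReducer.class);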
You will also need to add at least the following Hadoop jars so that your code compiles. You will find these jars inside the /share/hadoop directory of your Hadoop installation. Within the /share/hadoop path, look in the hdfs, mapreduce and common directories for the required jars. (A Maven alternative is shown after the list.)
hadoop-common-2.9.0.jar
hadoop-hdfs-2.9.0.jar
hadoop-hdfs-client-2.9.0.jar
hadoop-mapreduce-client-core-2.9.0.jar
hadoop-mapreduce-client-common-2.9.0.jar
hadoop-mapreduce-client-jobclient-2.9.0.jar
hadoop-mapreduce-client-hs-2.9.0.jar
hadoop-mapreduce-client-app-2.9.0.jar
commons-io-2.4.jar
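If you build with Maven rather than adding jars by hand, a single hadoop-client dependency for the same release should pull in an equivalent set of jars transitively:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.9.0</version>
</dependency>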
Creating a jar of your MapReduce code
Once you are able to compile your code you need to create a jar file. In the Eclipse IDE, right click on your Java project and select Export – Java – JAR file.
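If you prefer the command line to Eclipse, a rough equivalent (assuming the hadoop command is on your PATH, and a hypothetical source layout with WordCount.java under src/org/netjs) is:

# Compile against the jars reported by 'hadoop classpath'
mkdir -p classes
javac -classpath "$(hadoop classpath)" -d classes src/org/netjs/WordCount.java
# Package the compiled classes into a jar
jar cf wordcount.jar -C classes .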
Running the code
You can use the following command to run the program, assuming you are in your Hadoop installation directory.
bin/hadoop jar /netjs/MapReduce/wordcount.jar org.netjs.WordCount /user/process /user/out
/netjs/MapReduce/wordcount.jar – path to your jar file.
org.netjs.WordCount – fully qualified name of your Java class.
/user/process – path to the input directory.
/user/out – path to the output directory.
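Note that the output directory must not already exist when the job starts; if it does, Hadoop fails the job with a FileAlreadyExistsException. To rerun the job, remove the old output first:

hdfs dfs -rm -r /user/out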
Once your word count MapReduce program is successfully executed you can verify the output files.
hdfs dfs -ls /user/out
Found 2 items
-rw-r--r-- 1 netjs supergroup 0 2018-02-27 13:37 /user/out/_SUCCESS
-rw-r--r-- 1 netjs supergroup 77 2018-02-27 13:37 /user/out/part-r-00000
As you can see, the Hadoop framework creates output files using the part-r-xxxxx naming format. Since only one reducer is used here, there is only one output file, part-r-00000. You can see the content of the file using the following command.
hdfs dfs -cat /user/out/part-r-00000
Hadoop 1
Hello 1
MapReduce 2
This 1
first 1
is 1
my 1
program. 2
wordcount 1
That's all for this topic Word Count MapReduce Program in Hadoop. If you have any doubts or suggestions, please drop a comment. Thanks!