Friday 8 March 2013

Explain the WordCount implementation using the Hadoop framework?


We will count the words across all the input files. The flow is as below.
Input
Assume there are two files, each containing one sentence:
Hello World Hello World (In file 1)
Hello World Hello World (In file 2)

Mapper: One mapper runs for each file (strictly, one per input split; each of these small files fits in a single split).
For the given sample input, the first map output is:
< Hello, 1>
< World, 1>
< Hello, 1>
< World, 1>

The second map output:
< Hello, 1>
< World, 1>
< Hello, 1>
< World, 1>
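
The map step above can be sketched in plain Java. This is only a simulation to show the idea, not the Hadoop Mapper API: the class name MapStepSketch and the method map are hypothetical names chosen for illustration, and splitting on whitespace is an assumption about how words are separated.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Plain-Java simulation of the map step.
// MapStepSketch is a hypothetical name, not a Hadoop class.
final class MapStepSketch {

    // Emits one <word, 1> pair for every whitespace-separated token in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            String word = tokens.nextToken();
            Map.Entry<String, Integer> pair = new SimpleEntry<>(word, 1);
            pairs.add(pair);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Contents of file 1 from the sample input.
        for (Map.Entry<String, Integer> pair : map("Hello World Hello World")) {
            System.out.println("<" + pair.getKey() + ", " + pair.getValue() + ">");
        }
    }
}
```

Note that the mapper does no counting of its own. It emits a 1 for every occurrence, and the later steps do the adding.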

Combiner/Sorting (this is done for each individual map)
The combiner sums the counts for each word locally and sorts by key, so the output looks like this.
The output of the first map:
< Hello, 2>
< World, 2>
The output of the second map:
< Hello, 2>
< World, 2>
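
A rough plain-Java sketch of this local combine step follows. It is a simulation under assumptions, not Hadoop's combiner interface: CombineStepSketch, pair and combine are hypothetical names, and a TreeMap stands in for the sort by key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the combiner run on one map's output.
// CombineStepSketch is a hypothetical name, not a Hadoop class.
final class CombineStepSketch {

    // Small helper to build a <word, count> pair.
    static Map.Entry<String, Integer> pair(String word, int count) {
        return new SimpleEntry<>(word, count);
    }

    // Sums the counts per word; the TreeMap keeps the keys sorted.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> combined = new TreeMap<>();
        for (Map.Entry<String, Integer> p : mapOutput) {
            combined.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Output of the first map for "Hello World Hello World".
        List<Map.Entry<String, Integer>> firstMap = new ArrayList<>();
        firstMap.add(pair("Hello", 1));
        firstMap.add(pair("World", 1));
        firstMap.add(pair("Hello", 1));
        firstMap.add(pair("World", 1));

        for (Map.Entry<String, Integer> p : combine(firstMap).entrySet()) {
            System.out.println("<" + p.getKey() + ", " + p.getValue() + ">");
        }
    }
}
```

Combining on the map side means each map sends two pairs to the reducer instead of four, which reduces the data moved between the map and reduce phases.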
Reducer:
It sums the per-map counts above for each word and generates the output below:
< Hello, 4>
< World, 4>
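
The reduce step can be sketched the same way in plain Java. Again this is a simulation, not the Hadoop Reducer API: ReduceStepSketch and reduce are hypothetical names, and the grouping of values by key (which the framework does between the map and reduce phases) is written out by hand here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the reduce step.
// ReduceStepSketch is a hypothetical name, not a Hadoop class.
final class ReduceStepSketch {

    // Receives one word and all partial counts for it, and returns their sum.
    // The key is not needed for the sum; it is kept to mirror the <key, values> shape.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Partial counts from the two maps, grouped by word.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        grouped.put("Hello", Arrays.asList(2, 2));
        grouped.put("World", Arrays.asList(2, 2));

        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            System.out.println("<" + e.getKey() + ", " + reduce(e.getKey(), e.getValue()) + ">");
        }
    }
}
```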
Output
The final output would look like this:
Hello 4 times
World 4 times


