Friday, 8 March 2013

How many maps are there in a particular Job?


The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files.
Generally, the right level of parallelism is around 10-100 maps per node. Task setup takes a while, so it is best if each map takes at least a minute to execute.
For example, if you expect 10TB of input data and have a block size of 128MB, you'll end up with about 82,000 maps (10TB / 128MB = 81,920).
To influence the number of maps you can set the mapreduce.job.maps parameter, but it only provides a hint to the framework.
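The arithmetic behind the 82,000 figure can be checked with a small Java sketch. It assumes one map per HDFS block of each input file (the class and method names are hypothetical) and ignores the split rules of any particular InputFormat, so it is an estimate rather than what the framework will actually schedule.

```java
// Rough estimate of map tasks, assuming one map per HDFS block of each input file.
// Hypothetical helper for illustration; it is not part of Hadoop.
final class MapCountEstimate {

    // Number of blocks a single file occupies: ceiling(fileSize / blockSize).
    static long blocksForFile(long fileSizeBytes, long blockSizeBytes) {
        if (blockSizeBytes <= 0) {
            throw new IllegalArgumentException("block size must be positive");
        }
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    // Files are split independently, so a partially filled last block
    // still counts as a whole map for that file.
    static long estimateMaps(long[] fileSizesBytes, long blockSizeBytes) {
        long maps = 0;
        for (long size : fileSizesBytes) {
            maps += blocksForFile(size, blockSizeBytes);
        }
        return maps;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long tb = 1024L * 1024 * mb;
        // 10 TB of input with 128 MB blocks.
        System.out.println(estimateMaps(new long[] {10 * tb}, 128 * mb)); // 81920
        // Three 130 MB files: each spills into a second block.
        System.out.println(estimateMaps(new long[] {130 * mb, 130 * mb, 130 * mb}, 128 * mb)); // 6
    }
}
```

Counting per file matters: the three 130 MB files above need 6 blocks, even though their 390 MB total would fit in 4.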
Ultimately, the number of map tasks is controlled by the number of splits returned by the InputFormat.getSplits() method (which you can override).
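To see why the split count, rather than mapreduce.job.maps, decides the outcome, here is a simplified Java model of how a FileInputFormat-style getSplits() divides a single file. It is a sketch modeled on the new-API FileInputFormat, not Hadoop's own code: the class name SplitModel is hypothetical, and data locality, compression and non-splittable files are ignored. The split size is max(minSize, min(maxSize, blockSize)), and the last split may run up to 10% over that size (a "slop" factor of 1.1).

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of FileInputFormat-style splitting for one file.
// Hypothetical illustration; real getSplits() also handles locality, compression, etc.
final class SplitModel {

    // The last split may be up to 10% larger than the split size.
    static final double SPLIT_SLOP = 1.1;

    // Split size is the block size, clamped to [minSize, maxSize].
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns the length of each split for a file of the given length.
    static List<Long> splitLengths(long fileLength, long blockSize, long minSize, long maxSize) {
        List<Long> splits = new ArrayList<>();
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long remaining = fileLength;
        // Cut full-size splits while what is left is more than 1.1 splits' worth.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(splitSize);
            remaining -= splitSize;
        }
        // Whatever is left (up to 1.1 x splitSize) becomes the final split.
        if (remaining != 0) {
            splits.add(remaining);
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long block = 128 * mb;
        // Default bounds (min 1 byte, no max): one split per block.
        System.out.println(splitLengths(1024 * mb, block, 1, Long.MAX_VALUE).size());        // 8
        // A 256 MB minimum split size halves the number of maps.
        System.out.println(splitLengths(1024 * mb, block, 256 * mb, Long.MAX_VALUE).size()); // 4
        // A 130 MB file stays a single split thanks to the slop factor.
        System.out.println(splitLengths(130 * mb, block, 1, Long.MAX_VALUE).size());          // 1
    }
}
```

In this model, raising the minimum split size (assumed here to correspond to mapreduce.input.fileinputformat.split.minsize in Hadoop 2) gives fewer, larger maps, while lowering the maximum split size below the block size gives more. A hint such as mapreduce.job.maps does not appear in the calculation at all.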
