MapReduce

MapReduce is a distributed programming framework based on a paper that Google published in 2004.
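Before the hands-on work, it helps to see the basic data flow: a map step emits intermediate key/value pairs, the framework groups (shuffles and sorts) those pairs by key, and a reduce step combines the values for each key. The sketch below simulates a word count locally in plain Python. It is not Hadoop code. The function names `map_phase`, `shuffle`, `reduce_phase` and `run_job` are hypothetical. On a real cluster the framework performs the shuffle itself and runs many map and reduce tasks in parallel across nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group all intermediate values by key, with keys in sorted order."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    """Run map -> shuffle -> reduce over the input lines, all in one process."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["the quick fox", "the lazy dog"]))
```

Running it prints `{'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}`. Each `reduce_phase` call sees only one key and its values, which is why a real cluster can run reducers independently of one another.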

As with any programming framework, expertise in MapReduce comes only with practice, so this section of the training focuses on hands-on experience with the basic MapReduce constructs. It covers:
  • Introduction to Distributed Programming Frameworks
  • MapReduce Programming Framework
  • Job Tracker
  • Task Tracker
  • Map Tasks
  • Reduce Tasks
  • Input and Output Format Classes
  • Multiple Output Format Classes
  • Counters
  • Handling Failed Tasks
  • Handling Data Issues in Input Files
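Counters and bad-input handling usually go together: a mapper skips records it cannot parse and increments a counter, so the job finishes and the job's counter report shows how much data was dropped. The sketch below simulates this locally in plain Python, with `collections.Counter` standing in for the job counters. The input format (`station,temperature` per line), the counter names and the function names are hypothetical. In Hadoop's Java API a mapper would instead call `context.getCounter(...)` with an enum value and then `increment(1)` on the returned counter.

```python
from collections import Counter, defaultdict
from typing import Iterable, Iterator


def map_records(lines: Iterable[str], counters: Counter) -> Iterator[tuple[str, float]]:
    """Map: parse hypothetical 'station,temperature' lines, skipping and counting bad ones."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2 or not parts[0]:
            counters["MALFORMED_LINE"] += 1  # wrong number of fields or empty key
            continue
        station, raw_temp = parts
        try:
            temp = float(raw_temp)
        except ValueError:
            counters["BAD_TEMPERATURE"] += 1  # temperature field is not numeric
            continue
        counters["VALID_RECORD"] += 1
        yield station, temp


def max_temperature_job(lines: Iterable[str]) -> tuple[dict[str, float], Counter]:
    """Return the maximum temperature per station plus the counters collected while mapping."""
    counters: Counter = Counter()
    grouped: dict[str, list[float]] = defaultdict(list)
    for station, temp in map_records(lines, counters):
        grouped[station].append(temp)  # shuffle: group values by key
    result = {station: max(temps) for station, temps in grouped.items()}  # reduce
    return result, counters


if __name__ == "__main__":
    data = ["s1,12.5", "s2,7", "s1,15.0", "garbage", "s2,n/a", ",3"]
    result, counters = max_temperature_job(data)
    print(result)
    print(dict(counters))
```

For the sample input, the three valid records produce a maximum of 15.0 for `s1` and 7.0 for `s2`. The counters report two malformed lines and one non-numeric temperature. Counting and skipping, rather than throwing an exception, keeps a few bad lines from failing the whole task.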
Lab work:
  • Write MapReduce programs using the constructs covered in this session.
  • Run jobs on the cluster.
  • Monitor job progress using Hadoop commands.
  • Manage input and output using Hadoop commands.