Monday 2 June 2014

MapReduce - the concept behind Big Data. Links to resources and a summary of what it actually is.

Here is the link to the Google Research Publication on MapReduce.
Link to the Wikipedia page on MapReduce.
There is also an extensive set of documentation about MapReduce here.

Essentially, a MAP function (filter and sort) is applied to the data, and then a REDUCE (summary) function is applied to the result of the MAP function. Both steps run in parallel across many machines (a bit like a multi-level tree structure), coordinated by a framework that distributes the work and provides redundancy and fault tolerance, which is what makes it quick on large data sets. This is similar to the way Teradata runs queries: the work is spread across AMPs, each processing its own slice of the data in parallel, while the parsing engine (rather than a single AMP) coordinates the query as a whole.
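To make the MAP then REDUCE idea concrete, here is a minimal word-count sketch in Python. It is only an illustration on a single machine, not how Hadoop or Google's framework is actually implemented: the function names (map_phase, shuffle, reduce_phase, word_count) are my own, and a thread pool stands in for the cluster of machines a real framework would manage.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(document):
    """MAP: turn one document into a list of (word, 1) pairs."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(mapped_outputs):
    """Group every value emitted by the mappers under its key."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """REDUCE: summarise all the values collected for one key."""
    return key, sum(values)


def word_count(documents, workers=2):
    # Each document is mapped independently, so the MAP step can run in
    # parallel. The thread pool is an assumption standing in for a cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped_outputs = list(pool.map(map_phase, documents))
    groups = shuffle(mapped_outputs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is everywhere"]))
```

The point to notice is that map_phase never needs to see more than one document and reduce_phase never needs to see more than one key, which is why a framework can spread both steps over many machines.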

To me, the possibilities of this approach are exciting: it lets us process large amounts of data in a short amount of time and still produce an end result that is worthwhile.


