- ●Functional operations do not modify data structures. They always create new data.
- ●Original data still exists in unmodified form.
- ●Order of operations does not matter
ÄMAP is the name of a higher-order function that applies a given function element-wise to a list of elements and returns a list of results
ÄOriginated in functional programming but common in many languages
square x = x * x
map square [1,2,3,4,5] will return [1,4,9,16.25]
ÄNotice that I can process “map square” in parallel:
map square [1,2,3] -> [1,4,9]
map square [4,5] -> [16,25]
●REDUCE (also known fold, accumulate, compress or inject) is a family of higher-order functions that iterate an arbitrary function over a data structure in some order and build up a return value. Typically, a REDUCE deals with two things: a combining function and a list of elements of some data structure. The fold then proceeds to combine elements of the data structure using the function in some systematic way.
●MAX (1, 2, 3, 10,15, 20) -> 20
●SUM (1, 7, 10) -> 18
●In Hadoop REDUCE function always takes MAP function as an input
ÄSome Examples of map/reduce:
●Count of URL Access Frequency: The map function processes logs of web page requests and outputs <URL, 1>. The reduce function adds together all values for the same URL and emits a <URL, total count> pair
●Reverse Web-Link Graph: The map function outputs <target, source> pairs for each link to a target URL found in a page named "source". The reduce function concatenates the list of all source URLs associated with a given target URL and emits the pair: <target, list(source)>
●Count Words in the Document: The map function outputs each word plus an associated count of occurrences ("1" in this case). The reduce function sums together all the counts returned for a particular word.
Difference between traditional and Mapreduce Programming: