Example: Pig Aggregation


Step 1: Start the Grunt shell in local mode
pig -x local

Step 2: Load data
log = LOAD '/var/lib/hadoop-0.20/inputs/pigfile1' AS (user, id, welcome);

LOAD only defines the relation. Running DUMP log; executes the load and prints the contents of log, one tuple per input line.
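
Declaring the delimiter and field types explicitly makes the load easier to check. The sketch below assumes the input file is tab-delimited (the PigStorage default) and that id is an integer. Both are assumptions, since the layout of pigfile1 is not shown here.

```pig
-- Assumption: tab-delimited input; the field types are illustrative guesses.
log = LOAD '/var/lib/hadoop-0.20/inputs/pigfile1'
      USING PigStorage('\t')
      AS (user:chararray, id:int, welcome:chararray);

DESCRIBE log;  -- prints the schema of the relation
DUMP log;      -- runs the load and prints every tuple
```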

Step 3: Group the log by user
grpd = GROUP log BY user;
Running DUMP grpd; shows one tuple per distinct user: the group key, followed by a bag holding all log tuples for that user.
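
The shape of the grouped relation can be checked with DESCRIBE. In the sketch below, the sample rows in the comments are invented for illustration and are not taken from pigfile1.

```pig
log  = LOAD '/var/lib/hadoop-0.20/inputs/pigfile1' AS (user, id, welcome);
grpd = GROUP log BY user;

DESCRIBE grpd;
-- Each tuple of grpd has two fields:
--   group : the grouping key (here, the value of user)
--   log   : a bag of all tuples from log that share that key
--
-- With hypothetical input rows (alice,1,hi), (bob,2,hello), (alice,3,hey),
-- DUMP grpd would contain, in no guaranteed order:
--   (alice,{(alice,1,hi),(alice,3,hey)})
--   (bob,{(bob,2,hello)})
```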

Step 4: Count the records for each user
cntd = FOREACH grpd GENERATE group, COUNT(log);
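
A variant of this step, sketched below, names the generated fields so that later statements can refer to them. The field names n_records and n_all are hypothetical. Note that COUNT skips tuples whose first field is null, while COUNT_STAR counts every tuple in the bag.

```pig
log  = LOAD '/var/lib/hadoop-0.20/inputs/pigfile1' AS (user, id, welcome);
grpd = GROUP log BY user;

cntd = FOREACH grpd GENERATE
           group           AS user,
           COUNT(log)      AS n_records,  -- skips tuples whose first field is null
           COUNT_STAR(log) AS n_all;      -- counts every tuple in the bag

DESCRIBE cntd;
```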

Step 5:  Store the output to a file
STORE cntd INTO '/var/lib/hadoop-0.20/inputs/pigfile1output2';

The stored result is the final output: one line per user, containing the user and the number of log records for that user.
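
STORE writes a directory of part files rather than a single file, and it fails if the output path already exists. The sketch below stores the counts as comma-separated text and prints them from the Grunt shell. The comma delimiter is an assumption chosen for readability.

```pig
log  = LOAD '/var/lib/hadoop-0.20/inputs/pigfile1' AS (user, id, welcome);
grpd = GROUP log BY user;
cntd = FOREACH grpd GENERATE group, COUNT(log);

-- The output directory must not exist yet; remove an earlier run first
-- (for example with the Grunt command rmf).
STORE cntd INTO '/var/lib/hadoop-0.20/inputs/pigfile1output2' USING PigStorage(',');

-- Grunt's cat concatenates the part files inside the output directory.
cat /var/lib/hadoop-0.20/inputs/pigfile1output2;
```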

