cogroup is a generalization of group. Instead of collecting records of one input based on
a key, it collects records of n inputs based on a key. The result is a record with a key
and one bag for each input. Each bag contains all records from that input that have the
given value for the key:
A = load 'input1' as (id:int, val:float);
B = load 'input2' as (id:int, val2:int);
C = cogroup A by id, B by id;
describe C;
C: {group: int,A: {id: int,val: float},B: {id: int,val2: int}}

Applied to a single data set, cogroup behaves like group. Applied to more than one data set, cogroup groups every data set by the common field and brings the matching groups together under that key. Hence cogroup can be described as a group of more than one data set combined with a join of those data sets on the common field. For example, consider two comma-separated input files:
File a:        File b:
0,1,2          0,5,2
1,3,4          1,7,8
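
As a rough sketch of how the two files above could be cogrouped on their first field: the file paths 'a' and 'b' and the field names f1, f2, f3 are assumptions made for illustration, not names taken from the original.

```pig
-- Assumption: the rows shown above are stored in files named 'a' and 'b'.
-- The field names f1, f2, f3 are hypothetical; the files carry no header.
A = load 'a' using PigStorage(',') as (f1:int, f2:int, f3:int);
B = load 'b' using PigStorage(',') as (f1:int, f2:int, f3:int);

-- Group both inputs on the first field; each output record holds the key
-- followed by one bag per input.
C = cogroup A by f1, B by f1;
dump C;
```

With the two files above, the dump should contain one record per key: (0,{(0,1,2)},{(0,5,2)}) and (1,{(1,3,4)},{(1,7,8)}). A key that appears in only one input would still produce a record, with an empty bag for the other input.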


