Joins are a critical tool for data processing. Pig supports:
- Inner joins
- Outer joins
- Full joins
How to join in Pig:
1. Load records into a bag from input #1
2. Load records into a bag from input #2
3. Join the datasets (bags) by the provided join key
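The three steps above can be sketched in Pig Latin roughly as follows. The file names, the comma delimiter, and the field names (user, post, numLikes, date) are assumptions made for illustration; only the LOAD and JOIN statements themselves are standard Pig.

```pig
-- Step 1: load input #1 into a bag (hypothetical file and schema)
posts = LOAD 'user-posts.txt' USING PigStorage(',')
        AS (user:chararray, post:chararray, date:long);

-- Step 2: load input #2 into a bag (hypothetical file and schema)
likes = LOAD 'user-likes.txt' USING PigStorage(',')
        AS (user:chararray, numLikes:int, date:long);

-- Step 3: join the two bags on the join key 'user'
userInfo = JOIN posts BY user, likes BY user;

-- Print the joined tuples
DUMP userInfo;
```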
Default join is inner join:
Rows are joined where the keys match
Rows that do not have matches are not included in the result
Example record: user5,Yet another blog,1343182044444
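A small sketch of the inner join behavior described above. The input data shown in the comments is made up for illustration, as are the file and field names.

```pig
-- Hypothetical contents of posts.txt:
--   user1,Funny story,100
--   user2,Cool deal,200
--   user5,Yet another blog,300
-- Hypothetical contents of likes.txt:
--   user1,12
--   user2,7
--   user3,0
posts = LOAD 'posts.txt' USING PigStorage(',')
        AS (user:chararray, post:chararray, date:long);
likes = LOAD 'likes.txt' USING PigStorage(',')
        AS (user:chararray, numLikes:int);

-- No join type is given, so this is an inner join
joined = JOIN posts BY user, likes BY user;

-- Only user1 and user2 appear in both inputs, so 'joined' holds two tuples.
-- user5 (posts only) and user3 (likes only) have no match and are dropped.
DUMP joined;
```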
Inner join schema:
Join reuses the names of the input fields and prepends the name of the input bag, separated by :: (for example, aa::user).
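One way to inspect the joined schema is DESCRIBE. The sketch below assumes two hypothetical input files and field names; the aliases aa and bb are arbitrary.

```pig
-- Hypothetical inputs and schemas
aa = LOAD 'input_a.txt' USING PigStorage(',')
     AS (user:chararray, post:chararray, date:long);
bb = LOAD 'input_b.txt' USING PigStorage(',')
     AS (user:chararray, numLikes:int, date:long);

joined = JOIN aa BY user, bb BY user;

-- Should print a schema in which every field carries its bag prefix,
-- along the lines of: aa::user, aa::post, aa::date, bb::user, bb::numLikes, bb::date
DESCRIBE joined;

-- The prefix disambiguates fields that exist in both inputs
result = FOREACH joined GENERATE aa::user, aa::date, bb::numLikes;
DUMP result;
```

Because user and date exist in both inputs, referring to them without the prefix after the join would be ambiguous, so the prefixed form is the safe choice.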
Inner join with multiple keys:
Userinfo = JOIN aa BY (user, date), bb BY (user, date);
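For context, a fuller sketch of the multi-key join might look like this. The LOAD statements, file names, and the post and numLikes fields are assumptions; only the JOIN line comes from the text above.

```pig
-- Hypothetical inputs; both must expose the join fields user and date
aa = LOAD 'input_a.txt' USING PigStorage(',')
     AS (user:chararray, post:chararray, date:long);
bb = LOAD 'input_b.txt' USING PigStorage(',')
     AS (user:chararray, numLikes:int, date:long);

-- Tuples are paired only when both user and date are equal in aa and bb
Userinfo = JOIN aa BY (user, date), bb BY (user, date);

DUMP Userinfo;
```

Both BY clauses must list the same number of keys, in the same order.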