
Inner Join





Joins Overview:
Joins are a critical tool for data processing.
Pig supports:
- Inner joins
- Outer joins (left and right)
- Full joins (full outer)
How to join in Pig:
Join steps:
1. Load records into a bag from input #1
2. Load records into a bag from input #2
3. Join the datasets (bags) by the provided join key
Default join is inner join:
Rows are joined where the keys match
Rows that do not have matches are not included in the result







Inputs:
pigjoin1.txt:
user1,Funny story,1343182026191
user2,Cool deal,1343182022222
user5,Yet another blog,1343182044444
user4,Interesting post,1343182011111

pigjoin2.txt:
user1,12,1343182026191
user2,7,1343182021111
user3,0,1343182023333
user4,50,1343182027777


Code:
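The join steps described earlier, applied to the two inputs, can be sketched in Pig Latin roughly as follows. This is a minimal sketch, assuming both files sit in the working directory, are comma-delimited, and are joined on their first column. The aliases (posts, likes, userInfo) and field names (user, post, like_count, date) are illustrative assumptions, not names given by the inputs.

```pig
-- Step 1: load input #1 into a bag (field names are assumed for illustration).
posts = LOAD 'pigjoin1.txt' USING PigStorage(',')
        AS (user:chararray, post:chararray, date:long);

-- Step 2: load input #2 into a bag (field names are assumed for illustration).
likes = LOAD 'pigjoin2.txt' USING PigStorage(',')
        AS (user:chararray, like_count:int, date:long);

-- Step 3: join the two bags on the user key. With no join type given,
-- Pig performs an inner join.
userInfo = JOIN posts BY user, likes BY user;

-- Print the joined rows.
DUMP userInfo;
```

With the sample inputs, only user1, user2, and user4 appear in both files, so user5 (present only in pigjoin1.txt) and user3 (present only in pigjoin2.txt) are dropped from the result.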

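Output (inner join on the user field, the first column of each input; row order may vary):
(user1,Funny story,1343182026191,user1,12,1343182026191)
(user2,Cool deal,1343182022222,user2,7,1343182021111)
(user4,Interesting post,1343182011111,user4,50,1343182027777)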
Inner join schema:
A join reuses the field names of its inputs and prepends the name of the input bag to each, separated by ::, so a field in the joined result is addressed as bagname::fieldname.
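One way to inspect the joined schema and use the prefixed names is sketched below. The aliases (posts, likes, userInfo, summary) and field names are assumptions made for illustration, as is the location of the input files.

```pig
-- Hypothetical aliases and field names, assumed for illustration.
posts = LOAD 'pigjoin1.txt' USING PigStorage(',')
        AS (user:chararray, post:chararray, date:long);
likes = LOAD 'pigjoin2.txt' USING PigStorage(',')
        AS (user:chararray, like_count:int, date:long);
userInfo = JOIN posts BY user, likes BY user;

-- Print the schema of the joined relation; each field carries its bag prefix.
DESCRIBE userInfo;

-- Both inputs have user and date fields, so the prefix is needed
-- to say which one is meant.
summary = FOREACH userInfo GENERATE posts::user, posts::post, likes::like_count;
DUMP summary;
```

DESCRIBE should list names such as posts::user and likes::date. Referring to a bare user or date after the join would be ambiguous, because both inputs contribute a field with that name.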

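Inner join with multiple keys:
To join on more than one field, list the keys in parentheses for each input. Rows are joined only where all of the keys match.
Userinfo = JOIN aa BY (user, date), bb BY (user, date);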
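A self-contained version of the multi-key join, run against the sample inputs, might look like the sketch below. The field names and file locations are assumptions; the aliases aa, bb, and Userinfo follow the statement above.

```pig
-- Field names are assumed for illustration; both files are comma-delimited.
aa = LOAD 'pigjoin1.txt' USING PigStorage(',')
     AS (user:chararray, post:chararray, date:long);
bb = LOAD 'pigjoin2.txt' USING PigStorage(',')
     AS (user:chararray, like_count:int, date:long);

-- Rows match only when both user and date are equal in the two inputs.
Userinfo = JOIN aa BY (user, date), bb BY (user, date);
DUMP Userinfo;
```

In the sample data only user1 has the same timestamp (1343182026191) in both files, so this join should return a single row, whereas joining on user alone matches user1, user2, and user4.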