Hive Architecture

Hive Architecture: Command-line interface: The default and most common way of accessing Hive. HiveServer: Runs ...

Hive Introduction

Hive: Hive is a data warehousing package built on top of Hadoop. It facilitates querying and managing large datase...

UDF

UDF: There are times when Pig's built-in operators and functions will not suffice. Pig provides the ability to implement your own ...

Parallelism

Parallelism is achieved by running multiple reducers. The number of reducers can be set explicitly using the PARALLEL keyword. ...

Inner Join

Joins Overview: A critical tool for data processing. Pig supports inner joins, outer joins, full joins ...

Writing Pig Scripts

Pig Scripts:
Step 1: Input file a
0,1,2
1,3,4
Step 2: Create a file named pigex1.pig and add the code below to it /* ...

COGROUP

COGROUP: COGROUP is a generalization of GROUP. Instead of collecting records of one input based on a key, it collects records of n ...
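To make the idea concrete, here is a minimal Python sketch of what a cogroup does conceptually: records from several inputs are collected under a shared key, with one bag per input. This is only a model of the semantics, not Pig's implementation; the function name `cogroup` and the sample relations `owners` and `friends` are hypothetical.

```python
from collections import defaultdict


def cogroup(key_index, *relations):
    """Collect records from every input relation under a shared key.

    Returns {key: (bag_for_input_0, bag_for_input_1, ...)}. An input that has
    no records for a given key contributes an empty bag.
    """
    grouped = defaultdict(lambda: tuple([] for _ in relations))
    for position, relation in enumerate(relations):
        for record in relation:
            grouped[record[key_index]][position].append(record)
    return dict(grouped)


# Hypothetical sample data, keyed on the first field.
owners = [("alice", "cat"), ("bob", "dog"), ("alice", "fish")]
friends = [("alice", "carol"), ("dave", "erin")]

result = cogroup(0, owners, friends)
for key in sorted(result):
    print(key, result[key])
```

With a single input the function behaves like a plain group; with two or more inputs each key maps to one bag per input, which is the generalization described above.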

Twitter Example

/* This program finds the number of users that a particular user ID is following */ Inputs: [User_ID] [Follower_ID] 12 ...

Pig WordCount

Example 5: Word count example. Input: Code: Output: Orange 10, Banana 10, Mange 10. Notes: For tuples, flatten ...
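The post's own input and code are not shown in this excerpt, so the following is only a rough Python sketch of the usual word-count data flow: split each line into words, flatten the per-line word lists into one stream, group by word, and count each group. The function name `word_count` and the input lines are hypothetical and are not the post's data.

```python
from collections import defaultdict


def word_count(lines):
    """Count how often each word occurs across all lines."""
    # Tokenize each line, giving one list of words per line.
    tokenized = [line.split() for line in lines]
    # Flatten the nested lists into a single stream of words.
    words = [word for line_words in tokenized for word in line_words]
    # Group identical words together, then count the size of each group.
    groups = defaultdict(list)
    for word in words:
        groups[word].append(word)
    return {word: len(members) for word, members in groups.items()}


# Hypothetical input, not taken from the original post.
lines = ["Orange Banana", "Banana Orange Orange"]
counts = word_count(lines)
for word in sorted(counts):
    print(word, counts[word])
```

The flatten step matters because tokenizing yields one collection of words per line; grouping only works once those collections are unnested into individual records.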

Union and Split

Union: Pig Latin provides UNION to put two data sets together by concatenating them instead of joining them. Unlike UNION in SQL, Pig doe...
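As a rough illustration of concatenating rather than joining, the Python sketch below models a union as simple concatenation of record lists. It assumes the behaviour described in Pig's documentation, where duplicates are kept and the inputs need not share a schema; the function name `union` and the relations `a` and `b` are hypothetical.

```python
def union(*relations):
    """Concatenate the records of all input relations.

    Duplicates are kept and records are not matched on any key. Pig does not
    guarantee output order; this sketch simply keeps input order.
    """
    return [record for relation in relations for record in relation]


# Hypothetical sample data; note that b has a record with an extra field.
a = [(1, "x"), (2, "y")]
b = [(2, "y"), (3, "z", "extra")]

combined = union(a, b)
print(combined)
```

The record `(2, "y")` appears twice in the result, which is the main difference from a SQL `UNION` that removes duplicates.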

Pig Commands

Dump command: The DUMP command is used for development only. If you DUMP an alias, its content should be small enough to show on the screen...

Ex: Pig Filtering

Input data: Commands: Output: a 4, b 4

Ex: Pig Aggregation

Input:
Step 1: Enter the Grunt shell
pig -x local
Step 2: Load data
log = LOAD '/var/lib/hadoop-0.20/inputs/p...

Pig Compilation

A Pig Latin script goes through several steps when it is converted into MapReduce jobs. After performing the basic parsing and semantic check...

Pig Data Types

Data Models: Supports 4 basic types
– Atom: a simple atomic value (int, long, double, string), e.g. 'Edureka'
– Tuple: ...

 