Breaking News

Anatomy of File Read in hadoop

Anatomy of file read in Hadoop:
Consider a Hadoop cluster with one name node and two racks named R1 and R2 in a data center D1. Each rack has 4 nodes and they are uniquely identified as R1N1, R1N2 and so on. The replication factor is set to 3 and the HDFS block size is set to 64 MB(128MB in Hadoop V2) by default.

  1. Name node stores the HDFS block information like file location, permission, etc. in files called FSImage and edit logs. Files are stored in HDFS as blocks. These block information are not saved in any file. Instead it is gathered every time the cluster is started and this information is stored in namenode’s memory.
  2. Replication: Assuming the replication factor is 3; When a file is written from a data node (say R1N1), Hadoop attempts to save the first replica in same data node (R1N1). Second replica is written into another node (R2N2) in a different rack (R2). Third replica is written into another node (R2N1) in the same rack (R2) where the second replica was saved.
  3. Hadoop takes a simple approach in which the network is represented as a tree and the distance between two nodes is the sum of their distances to their closest common ancestor. The levels can be like; “Data Center” > “Rack” > “Node”. Example; ‘/d1/r1/n1’ is a representation for a node named n1 on rack r1 in data center d1. Distance calculation has 4 possible scenarios as;
    1. distance(/d1/r1/n1, /d1/r1/n1) = 0 [Processes on same node]
    2. distance(/d1/r1/n1, /d1/r1/n2) = 2 [different node is same rack]
    3. distance(/d1/r1/n1, /d1/r2/n3) = 4 [node in different rack in same data center]
    4. distance(/d1/r1/n1, /d2/r3/n4) = 6 [node in different data center]
Consider a sample.csv file of size 192 MB to be saved in to the cluster. The file is divided into 3 blocks of 64 MB each (B1, B2, B3) and it is stored in different data nodes as shown above. Along with the data a checksum is stored in each block to ensure that data read is done without any errors.
When the cluster is started, the metadata in the name node will look as shown in the fig below.

Let’s assume the file sample.csv is read from R1N2 then the HDFS client program will run on R1N2’s JVM.The Client Method calls the open() method on DistributedFileSystem(Subclass of FileSystem) Java class.DFS makes a RPC call and returns the blocks of the file. NN returns the address of the DN ORDERED with respect to the node from where the read is performed. The block information is saved in DFSInputStream which is wrapped in FSDataInputStream. In response to ‘ ()’, HDFS Client receives this FSDataInputStream.

 After obtaining the FSDataInputstream handle,HDFS Client invokes read() on the stream which reads the blocks in order. DFSIS connects to the closest node (R1N1) to read block B1. DFSIS connects to data node and streams data to client, which calls read() repeatedly on the stream. Also DFSIS verifies checksums for the data transferred to client. When the block is read completely, DFSIS closes the connection. For reading second block, the previous connection is closed and a fresh connection is made to the closest node (R1N1) of block B2.Similarly it reads the blocks corresponding to the complete file. Once the complete file is read, the HDFS client closes the connection.
Below figure pictorially represents the File read anatomy in Hadoop:

 DataNode Connection Error:

Let’s say there is some error while connecting to R1N1. DFSIS remembers this info, so it won’t try to read from R1N1 for future blocks. Then it tries to connect to next closest node (R2N3).
DataNode CheckSum Error:

If some checksum error exist which means that the block which is trying to be read is corrupt then the Information about this corrupt block is sent to the name node and  DFSIS tries to connect to next closest node (R2N3).
- See more at:
Toggle Footer