Hadoop World: Low Latency, Random Reads From HDFS

RadFS

Random Access DFS

DFS in a slide

• Locates DataNode, opens socket, says hi
• DataNode allocates a thread to stream block contents from the opened position
• Client reads as it processes
• Great for streaming entire blocks (see the example below)
• Positioned reads cut across the grain
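
For reference, the streaming pattern this slide describes looks like the following with the stock HDFS client API (the path and buffer size are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StreamingRead {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Opening the stream locates a DataNode for the first block; the
        // DataNode dedicates a thread that streams bytes from that position.
        try (FSDataInputStream in = fs.open(new Path("/data/example.seq"))) {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) > 0) {
                // The client consumes bytes as they arrive: ideal for whole blocks.
                process(buf, n);
            }
        }
    }

    private static void process(byte[] buf, int n) {
        // application logic goes here
    }
}
```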

A Different Approach

• Everything is a positioned read
• All interactions with DN are stateless
• Wrapped ByteService interfaces (sketched below)
  − Caching, Network, Checksum
• Each configurable, can be optimized for the task
• Connections pooled and re-used often
• Fewer threads on server
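
The exact ByteService interface lives in the patch; the following is a minimal reconstruction of the wrapping idea, with the method signature assumed for illustration:

```java
import java.io.IOException;

// Stateless, positioned read interface; the ByteService name is from the
// deck, but this exact signature is an assumption for illustration.
interface ByteService {
    int read(long position, byte[] buf, int off, int len) throws IOException;
}

// Each concern wraps the next, so caching, checksumming, and the network
// layer can be enabled and tuned independently per task.
class CachingByteService implements ByteService {
    private final ByteService inner;  // e.g. a checksum or network layer

    CachingByteService(ByteService inner) { this.inner = inner; }

    @Override
    public int read(long position, byte[] buf, int off, int len) throws IOException {
        // Cache lookup elided; on a miss, delegate to the wrapped layer.
        return inner.read(position, buf, off, len);
    }
}
```

Stacking caching over checksum over network then yields one read path whose layers can each be swapped, tuned, or disabled.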

DFS Anatomy – seek + read()

DFS seek+read, cont'd

• Locates preferred block, caches DN locations
• Opens a new Socket and BlockReader for each random read (see the example below)
• Reads from the Socket
• Object creation means GC debt
• No optimizations for repeated reads of the same data
• Threads will consume resources on the server after client hangup – files aren't automatically closed
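
The positioned-read API that triggers this per-call setup looks like this in the stock client (offsets and path are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RandomRead {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        byte[] buf = new byte[2048];
        try (FSDataInputStream in = fs.open(new Path("/data/example.seq"))) {
            // Per the slides, each positioned read below opens a fresh Socket
            // and BlockReader under the hood: new objects (hence GC debt) and
            // no reuse across repeated reads of the same region.
            in.readFully(1L << 30, buf, 0, buf.length);  // 2k read at the 1GB mark
            in.readFully(5L << 30, buf, 0, buf.length);  // 2k read at the 5GB mark
        }
    }
}
```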

RadFS Overview

RadFS seek+read, cont'd

• Transparently caches frequently read data
• Automatically pools/manages file handles
• Reduces network congestion (in theory)
• Lower DataNode workload
  − 3 threads total instead of 1 per Xceiver
• Configurable on the client side for the task at hand (see the sketch below)
• Network latency penalty on long-running reads
• Checksum implementation means 2 reads per random read if caching is disabled
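
The deck does not show configuration keys, so the property names below are hypothetical stand-ins for whatever the patch actually exposes; the point is only that cache size, pooling, and checksumming could be tuned per job on the client:

```java
import org.apache.hadoop.conf.Configuration;

public class RadFsConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Hypothetical keys, for illustration only; the real patch may use
        // different names. A latency-sensitive lookup job might enable a
        // large cache, while a streaming job would disable it entirely.
        conf.setBoolean("radfs.cache.enabled", true);
        conf.setLong("radfs.cache.size.bytes", 128L << 20);  // 128 MB
        conf.setInt("radfs.connection.pool.size", 4);
        System.out.println(conf.get("radfs.cache.size.bytes"));
    }
}
```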

Implementation Notes

• Checksum is currently generated by wrapping ChecksumFileSystem around RadFileSystem
  − Inefficient: reads 2 files over dfs
  − Improper – what if the checksum block is corrupt?
• CachingByteService implements lookahead (good) by copying bytes twice (bad); see the sketch below
• Permissions happen “by accident” at the namenode
  − Attackable by searching the blockid space on DNs
  − Could exchange a UserAccessToken on each request
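
A minimal sketch of that double-copy problem, with all names and details assumed for illustration: the lookahead read lands in a cache buffer first and is then copied again into the caller's buffer:

```java
import java.io.IOException;

// Illustrative only: shows why naive lookahead costs two byte copies.
class LookaheadSketch {
    private byte[] cache = new byte[0];  // lookahead buffer
    private long cacheStart = -1;

    interface RawReader {
        int read(long pos, byte[] buf, int off, int len) throws IOException;
    }

    int read(long pos, byte[] dst, int off, int len,
             RawReader raw, int lookahead) throws IOException {
        if (cacheStart < 0 || pos < cacheStart
                || pos + len > cacheStart + cache.length) {
            // Copy #1: the raw service fills the lookahead cache
            // (full reads assumed for simplicity).
            cache = new byte[len + lookahead];
            raw.read(pos, cache, 0, cache.length);
            cacheStart = pos;
        }
        // Copy #2: bytes go from the cache into the caller's buffer.
        System.arraycopy(cache, (int) (pos - cacheStart), dst, off, len);
        return len;
    }
}
```

Reading directly into the caller's buffer and caching only the lookahead overflow would eliminate the second copy.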

Benchmark Environment

• EC2 “Medium” - 2x2GHz, 1.7GB RAM, shared I/O
• Operations against a 20GB sequence file
• All tests run single-threaded from the lightly loaded namenode
• Fast internal network; adequate memory, but not enough to page the entire file
• All benchmarks in a given set were run on the same instance; middle value from 3 runs

Random Reads - 2k

• 10,000 random reads of 2k each over the length of a 20GB file (loop sketched below)
• DFS averaged 7.8ms while Rad with no cache averaged 4.4ms
• Caching added a full 2ms – hardcoded lookahead was no help and caused lots of unnecessary byte copying
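
A sketch of the benchmark loop as described (file path and seed are illustrative):

```java
import java.util.Random;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RandomReadBench {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/bench/20gb.seq");
        long fileLen = fs.getFileStatus(file).getLen();
        byte[] buf = new byte[2048];
        Random rnd = new Random(42);
        try (FSDataInputStream in = fs.open(file)) {
            long start = System.nanoTime();
            for (int i = 0; i < 10_000; i++) {
                // Pick a random offset, then do one positioned 2k read.
                long pos = (long) (rnd.nextDouble() * (fileLen - buf.length));
                in.readFully(pos, buf, 0, buf.length);
            }
            double avgMs = (System.nanoTime() - start) / 10_000.0 / 1_000_000.0;
            System.out.println("avg ms per read: " + avgMs);
        }
    }
}
```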

[Chart: Random Reads – 2kb, average latency in ms, comparing DFS, RadFS, and Rad with no cache]

SequenceFile Search

• Binary search over a 10GB sequence file (sketched below)
• DFS vs. RadFS with various cache settings
• Indicative of potential filesystem uses
  − Lucene
  − Large numbers of secondary indices
  − Ease of development
  − Read-only RDBMS-like systems built from ETLs or other long-running processes
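
To make the access pattern concrete, here is a generic binary search over a sorted file of fixed-length records using positioned reads. The actual benchmark searched a Hadoop SequenceFile, whose record framing is more involved, so treat this as a simplified stand-in:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.fs.FSDataInputStream;

// Simplified stand-in for the benchmark: binary search over a sorted file
// of fixed-length records, each an 8-byte key followed by an 8-byte value.
public class RecordSearch {
    static final int RECORD = 16;

    static long lookup(FSDataInputStream in, long numRecords, long key)
            throws IOException {
        byte[] rec = new byte[RECORD];
        long lo = 0, hi = numRecords - 1;
        while (lo <= hi) {
            long mid = (lo + hi) >>> 1;
            // One small positioned read per probe.
            in.readFully(mid * RECORD, rec, 0, RECORD);
            long k = ByteBuffer.wrap(rec).getLong(0);
            if (k == key) return ByteBuffer.wrap(rec).getLong(8);
            if (k < key) lo = mid + 1; else hi = mid - 1;
        }
        return -1;  // not found
    }
}
```

Each probe is a small random read, and the first few probes repeat across searches, which is exactly where RadFS's cache pays off.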

[Chart: Sequence File Binary Search, 5000 searches, average ms per search, comparing DFS, RAD0, RAD128MB, and RAD16MB]

Streaming

• DFS is inherently faster for streaming due to the dedicated server thread
• Checksumming is expensive! Early radfs builds beat dfs at 1-byte read()s because they didn't have checksumming
• Requires a PipeliningByteService for use in streaming jobs, which would make requests to the Datanode, then stream in and checksum in a separate client-side thread (sketched below)
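
PipeliningByteService is proposed rather than implemented, so the sketch below is one guess at the described design; everything beyond the idea of a prefetching client-side thread is assumed:

```java
import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the proposed prefetch pipeline: a background thread keeps
// requesting the next pages from the DataNode (and checksums them) while
// the caller consumes pages that were already fetched and verified.
class PipeliningSketch {
    interface PageFetcher {
        byte[] fetchPage(long pageIndex) throws IOException;  // null at EOF
    }

    private final BlockingQueue<byte[]> ready = new ArrayBlockingQueue<>(4);

    void start(PageFetcher fetcher, long firstPage) {
        Thread t = new Thread(() -> {
            try {
                for (long p = firstPage; ; p++) {
                    byte[] page = fetcher.fetchPage(p);  // fetch + checksum off the read path
                    if (page == null) break;             // end of file
                    ready.put(page);                     // blocks once 4 pages are buffered
                }
            } catch (IOException | InterruptedException ignored) { }
        });
        t.setDaemon(true);
        t.start();
    }

    byte[] nextPage() throws InterruptedException {
        return ready.take();  // the consumer only sees already-verified pages
    }
}
```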

[Chart: Streaming – 1GB in 1-byte reads, time in seconds, comparing DFS and RAD with no checksum]

[Chart: Streaming – 1GB in 2k reads, time in seconds, comparing DFS and RAD]

Going forward – modular reader

Going forward - Applications

• Could improve HBase: solves the file handle problem and improves latency
• Could be used to create low-latency lookup formats accessible from scripting languages
  − Cache is automatic, simplifying development
• “Table” directory with a main store file and several secondary index files generated by ETL
• Lucene indices? Can be built with MapReduce

Going forward - Development

• Copy the existing HDFS method of interleaving checksums directly from the datanode – one read
  − Audit checksumming code for CPU efficiency – reading can be CPU bound
• Implement checksumming as a ByteService instead of a clumsy wrapper around FileSystem; make it configurable
• Implement PipeliningByteService to improve streaming by pre-fetching pages
• Exchange a UserAccessToken at each read; could possibly be used for encryption of the blockid

Contribute!

• Patch is at Apache JIRA issue HDFS-516
• Will be on GitHub momentarily
• Goals:
  − Equivalent streaming performance to DFS
  − Faster random reads, with a caching option
  − Lower resource consumption on the server
• 3 doable tasks above
• Large configuration space to explore
• Email me: [email protected]
