HBase Goes Realtime
Wednesday, June 10, 2009 Santa Clara Marriott
Quick Overview » » » » » » »
Who are we? What is HBase? HBase 0.20 Primary Goal HBase 0.20 Architecture and Specifics HBase 0.20 By The Numbers Zookeeper Integration What’s Next?
Who are we? » Jonathan Gray › › › › ›
CoFounder and CTO, Streamy.com Background in CE, databases, distributed systems User of technology as a competitive advantage Contributing to HBase for ~1 year HBase in production for ~9 months
» JeanDaniel Cryans › › › ›
HBase Committer Graduate student at the Université du Québec HBase consultant currently working for OpenPlaces.com Contributing to HBase for ~18 months
What is HBase? » HBase is a… › › › › › › ›
Sorted, Distributed, ColumnOriented, MultiDimensional, HighlyAvailable, HighPerformance, Persisted Storage System
HBase adds random access reads and writes atop HDFS Billions of Rows * Millions of Columns * Thousands of Versions
HBase 0.20 Primary Goal » First ever Performance Release 1. Random Access Time 2. Scan Time 3. Insert Time
» As a randomaccess store, we are well suited for the storing and serving of Web applications › But high latency and variability (100s of ms to seconds) has reduced the usefulness of HBase and required the use of external caching in the past
HBase 0.20 Architecture » The Guiding Philosophy – Unjavafy Everything! › › › › › ›
Zerocopy reads Blockbased storage, reading, and indexing Drastically reduce Object instantiation Eliminate widespread usage of Trees Sorted merges using Heap structures Fast and intelligent caching with memoryawareness
» Effort Lead By… › Jonathan Gray and Erik Holstad, Streamy.com › Michael Stack, Powerset/Microsoft › Ryan Rawson, StumbleUpon
HBase 0.20 Architecture – Storage » New Key Format – KeyValue › Contains only (byte [] buf, int offset, int length) › Compact binary format with binary comparators › Our “pointer” to keys inside blocks
» New File Format – HFile › › › ›
Originally based on TFile (HADOOP3315) and BigTable Block based binary format with a block index Contains any number of Meta blocks Persisted storage of List
HBase 0.20 Architecture – API » New Query API › › › ›
Put, Get, Scan, Delete operations Extended support for versioning Drastically reduces API size and complexity An API that more closely mirrors implementation
» New Result API and optimized Serialization › › › ›
Result is just a wrapper for KeyValue[] Userfriendly Trees are built ondemand, clientside Deserialization allocates a single byte[] for all KVs Zerocopy building, single allocation receiving
HBase 0.20 Architecture – Algorithms » New Scanners – KeyValueScanner / KeyValueHeap › Replace linear sort logic with an encapsulated Heap › Abstract the handling of versions, deletes, query params › Now capable of processing individual rows with millions of columns and versions › Linear (or worse) to Logarithmic, Logarithmic to Constant
» New Block Cache Concurrent LRU › › › ›
Backed by ConcurrentHashMap LRU eviction with scanresistance and block priorities Memorybound using HeapSize interface Nonblocking and unsynchronized LRU map
HBase 0.20 By The Numbers (Uncached) » Tall Table: 1 Million Rows with a single Column › Sequential insert – 24 seconds (.024 ms/row) › Random reads – 1.42 ms/row (average) › Full scan – 11 seconds (117 ms/10,000 rows, .011ms/row)
» Wide Table: 1000 Rows with 20,000 Columns each › Sequential insert – 312 seconds (312 ms/row) › Random reads – 121 ms/row (average) › Full scan – 146 seconds (14.6 seconds/100 rows, 146ms/row)
» Fat Table: 1000 Rows with 10 Columns,1MB values › Sequential insert – 68 seconds (68 ms/row) › Random reads – 56.92 ms/row (average) › Full scan – 35 seconds (3.53 seconds/100 rows, 35ms/row) Each test yielded >1 region, additional rows have no impact on performance
HBase 0.20 Performance Conclusion » We surprised even ourselves › Random read times similar to that of an RDBMS • 20100 times faster with far less variability
› Scan times reduced • 30 times faster than previous versions
› Insert times reduced • 210 times faster with less than half the memory usage
» We improved our performance by more than an order of magnitude in most cases › While drastically improving our memory usage and code readability
Zookeeper Integration
Why? » » » »
Takes 2 mins to figure a RegionServer’s death Clients have to ask Master for ROOT address Managing shared state in HBase is a zoo ;) And...
»Master is a SPOF!
Zookeeper? • Project under Hadoop started by Y! • Centralized service for maintaining configuration information, naming, providing distributed synchronization, and group services. • Highly available when used on an ensemble of machines, typically 5 or more. • ZK’s data model is a simple namespace with permanent and ephemeral nodes.
Tough Decisions » Should we impose the usage of a ZK quorum on every setup? » How much should we rely on ZK for HA, are there better alternate solutions for some of our problems? » Should we have an HBase implementation of ZK ? » Should we package our own version of ZK?
Major Integration Points » » » » » »
Master address is stored in ZK Master election is a race for that lock ROOT address is also stored in ZK Region Servers are all registered in ZK The RSs watch the Master’s node Backup Masters are watching both Master’s node and a “cluster state” node
What it Changes for You » Standalone and pseudodistributed setups: › a ZK server that listens on localhost is started for you. It starts/stops with the rest of the cluster.
» Fullydistributed setup: › poss. to keep the managed ZK server but have to make it point on a nonlocal IP/hostname. › better is to get a quorum, can also use it for other purposes, for higher availability.
Fullydistributed setup » What you have to do with ZK: › hbasesite.xml: set hbase.cluster.distributed to true, also notice that hbase.master is deprecated. › hbaseenv.sh: export HBASE_MANAGES_ZK=false › zoo.cfg: server.0=… server.1=… etc. You also have to configure those servers per ZK doc.
» You want backup masters? › ${HBASE_HOME}/bin/hbasedaemon.sh start master › It’s also a good idea to set hbase.master.dns.nameserver and hbase.master.dns.interface to have them binding at the right place.
New Features from ZK integration in 0.20 » No more SPOF › Automatic Master failover
» Rolling upgrades of point releases » Modify some cluster configuration without full cluster restart
What’s next? » More performance and reliability › 0.20 was mostly a RegionServer rewrite › 0.21 will rewrite Master with better ZK integration
» HBase 0.21 Roadmap › Decentralized Master responsibilities + More ZK • • • •
› › › ›
Further capability to modify configurations at run time State sharing via ZK nodes Ephemeral nodes for region ownership Distributed queue for region assignment
Languageagnostic, binary RPC Native C/C++ client library MultiDC Replication Further optimizations on algorithms and data structures
More Information about HBase » HBase Website and Wiki › http://www.hbase.org › http://wiki.apache.org/hadoop/Hbase
» Mailing List › http://hadoop.apache.org/hbase/mailing_lists.html
» IRC Channel › #hbase on Freenode › All committers and core contributors are here
» Follow us on Twitter! › @hbase