Entries tagged [apache]

Thursday December 17, 2015

Offheaping the Read Path in Apache HBase: Part 1 of 2

Detail on the work involved making it so the Apache HBase read path could work against off heap memory (without copy).[Read More]

Offheaping the Read Path in Apache HBase: Part 2 of 2

by HBase Committers Anoop Sam John, Ramkrishna S Vasudevan, and Michael Stack

This is part two of a two part blog. Herein we compare before and after off heaping. See part one for preamble and detail on work done.

Performance Results

There were two parts to our performance measurement.  

  1. Using HBase’s built-in Performance Evaluation (PE) tool.  

  2. Using YCSB to measure the throughput.

The PE test was conducted on a single node machine.  Table is created and loaded with 100 GB of data.  Table has one CF and one column per row. Each cell value size is 1K. Configuration of the node :

System configuration

CPU : Intel(R) Xeon(R) CPU with 8 cores.
RAM : 150 GB
JDK : 1.8

HBase configuration

  <value> 104448 </value>

20% of the heap is allocated to the L1 cache (LRU cache). When L2 is enabled, L1 holds no data, just index and bloom filter blocks. 102 GB off heap space is allocated to the L2 cache (BucketCache)

Before performing the read tests, we have made sure that all data is loaded into BucketCache so there is no i/o.  The read workloads of PE tool run with multiple threads.  We considerthe average completion time for each of the threads to do the required reads.

1. Each thread does 100 row multi get operations for 1000000 times. We can see that there is a 55 – 82 % reduction in average run time. See the graph below for the test for 5 to 75 threads reading. Y axis shows average completion time, in seconds, for one thread.

Here each thread is doing 100000000 row get and converting this to throughput numbers we can see

5 Threads

10 Threads

20 Threads

25 Threads

50 Threads

75 Threads

Throughput Without Patch







Throughput With HBASE-11425







Throughput gain







So without the patch case, at the 20 threads level, the system goes to peak load situation and throughput starts to fall off. But with HBASE-11425 this is not the case and even with 75 threads. It is mostly linear scaling with more loading.  The major factor which helps us here is reduced GC activity.

2. Each thread is doing a range scan of 10000 rows with filtering of all the data on the server side. The filtering is done to see the server side gain alone and avoid any impact of network and/or client app side bottleneck. Each thread is doing the scan operation 10000 times. We can see that there is 55 – 74 % reduction in average run time of each thread. See below the graph for the test for 5 to 75 threads reading. Y axis shows average completion time, in seconds, for one thread.

3.  Another range scan test is performed with part of data returned back to client.  The test will return 10% of total rows back to the client and the remaining rows are filtered out at the server. The below graph is for a test with 10, 20, 25 and 50 threads. Again Y axis gives the average thread completion time, in seconds. The gain is  28 – 39% latency.

The YCSB test is done in same cluster with 90 GB of data. We had similar system configuration and HBase configuration as for the PE tests.

The YCSB setup involves creating a table with a single column family with around 90 GB of data. There are 10 columns each with 100 bytes of data (each row having 1k of data). All the readings are taken after ensuring that the entire data set is in the BucketCache. The multi get test includes each thread doing 100 row gets and 5000000 such operations. Range scan does random range scans with 1000000 operations.


With HBASE-11425(Ops/sec)















With HBASE-11425(Ops/Sec)














For multi get there is 20 – 160 % throughput gain whereas for range scan it is 20 - 80%.

With HBASE-11425 we can see there is linear throughput increase with more threads whereas old code starts performing badly when more threads (See when 50 and 75 threads)

GC graphs

With HBASE-11425, we serve data directly from the off heap cache rather than copy each of the blocks on heap for each of the reads.  So we should be doing much better with respect to GC on the  RegionServer side.  Below are the GC graph samples taken on one RegionServer during the PE test run.  We can clearly notice that with the changes in HBASE-11425, we are doing much less (better) GC.

MultiGets – without HBASE-11425 (25 threads)


Multigets – with HBASE-11425(25 threads)


ScanRange10000 – without HBASE-11425 (20 threads)


ScanRange10000 – with HBASE-11425 (20 threads)


Future Work

One implication of the findings above is that we should run with off heap Cache on always. We were reluctant to do this in the past when reads from the off heap cache took longer. This is no longer the case. We will look into making this option on by default. Also, this posting has described our conversion of the read pipeline to make it run with offheap buffers. Next up, naturally, would be making the write pipeline offheap.


Some parts of this work made it into branch-1 but to run with a fully off heap read path, you will have to wait on the HBase 2.0 release which should be available early next year (2016). Enabling L2 (BucketCache) in off heap mode will automatically turn on the off heap mechanisms described above. We hope you enjoy these improvements made to the Apache HBase read path.

Wednesday May 01, 2013

Migration to the New Metrics Hotness – Metrics2

by Elliott Clark

HBase Committer and Cloudera Engineer 

NOTE: This blog post describes the server code of HBase. It assumes a general knowledge of the system. You can read more about HBase in the blog posts here.


HBase is a distributed big data store modeled after Google’s Bigtable paper. As with all distributed systems, knowing what’s happening at a given time can help  spot problems before they arise, debug on-going issues, evaluate new usage patterns, and provide insight into capacity planning.

Since October 2008, version 0.19.0 , (HBASE-625) HBase has been using Hadoop’s metrics system to export metrics to JMX, Ganglia, and other metrics sinks. As the code base grew, more and more metrics were added by different developers. New features got metrics. When users needed more data on issues, they added more metrics. These new metrics were not always consistently named, and some were not well documented.

As HBase’s metrics system grew organically, Hadoop developers were making a new version of the Metrics system called Metrics2. In HADOOP-6728 and subsequent JIRAs, a new version of the metrics system was created. This new subsystem has a new name space, different sinks, different sources, more features, and is more complete than the old metrics. When the Metrics2 system was completed, the old system (aka Metrics1) was deprecated. With all of these things in mind, it was time to update HBase’s metrics system so HBASE-4050 was started.  I also wanted to clean up the implementation cruft that had accumulated.


The implementation details are pretty dense on terminology so lets make sure everything is defined:

  • Metric: A measurement of a property in the system.

  • Snapshot: A set of metrics at a given point in time.

  • Metrics1: The old Apache Hadoop metrics system.

  • Metrics2: The new overhauled Apache Hadoop Metrics system.

  • Source: A class that exposes metrics to the Hadoop metrics system.

  • Sink: A class that receives metrics snapshots from the Hadoop metrics system.

  • JMX: Java Management Extension. A system built into java that facilitates the management of java processes over a network; it includes the ability to expose metrics.

  • Dynamic Metrics: Metrics that come and go. These metrics are not all known at compile time; instead they are discovered at runtime.


The Hadoop Metrics2 system implementations in branch-1 and branch-2 have diverged pretty drastically. This means that a single implementation of the code to move metrics from HBase to metrics2 sinks would not be performant or easy. As a result I created different hadoop compatibility shims and a system to load a version at runtime. This led to using ServiceLoader to create an instance of any class that touched parts of Hadoop that had changed between branch-1 and branch-2.

Here is an example of how a region server could request a Hadoop 2 version of the shim for exposing metrics about the HRegionServer. (Hadoop 1’s compatibility jar is shown in dotted lines to indicate that it could be swapped in if Hadoop 1 was being used)

This system allows HBase to support both Hadoop 1.x and Hadoop 2.x implementations without using reflection or other tricks to get around differences in API, usage, and naming.

Now that HBase can use either the Hadoop 1 or Hadoop 2 versions of the metrics 2 systems, I set about cleaning up what metrics HBase exposes, how those metrics are exposed, naming, and performance of gathering the data.

Metrics2 uses either annotations or sources to expose metrics. Since HBase can’t require any part of the metrics2 system in the core classes I exposed all metrics from HBase by creating sources. For metrics that are known ahead of time I created wrappers around classes in the core of HBase that the metrics2 shims could interrogate for values. Here is an example on how HRegionServer’s metrics(the non-dynamic metrics) are exposed :

The above pattern can be repeated to expose a great deal of the metrics that HBase has. However metrics about specific regions are still very interesting but can’t be exposed following the above pattern. So a new solution that would allow metrics about regions to be exposed by whichever HRegionServer is hosting that region was needed. To complicate things further Hadoop’s metrics2 system needs one MetricsSource to be responsible for all metrics that are going to be exposed through a JMX mbean. In order for metrics about regions to be well laid out, HBase needs a way to aggregate metrics from multiple regions into one source. This source will then be responsible for knowing what regions are assigned to the regionserver. These requirements led me to have one aggregation source that contains pseudo-sources for each region. These pseudo-sources each contain a wrapper around the region. This leads to something that looks like this:


That’s a lot of work to re-do a previously working metrics system, so what was gained by all this work? The entire system is much easier to test in unit and systems tests. The whole system has been made more regular; that is everything follows the same patterns and naming conventions. Finally everything has been rewritten to be faster.

Since the previous metrics have all been added on as needed they were not all named well. Some metrics were named following the pattern: “metricNameCount” others were named following “numMetricName” while still others were named like “metricName_Count”. This made parsing hard and gave a generally chaotic feel. After the overhaul metrics that are a counter start with the camel cased metric name followed by the suffix “Count.” The mbeans were poorly laid out. Some metrics we spread out between two mbeans. Metrics about a region were under an mbean named Dynamic, not the most descriptive name. Now mbeans are much better organized and have better descriptions.

Tests have found that single threaded scans run as much as 9% faster after HBase’s old metrics system has been replaced. The previous system used lots of ConcurrentHashMap’s to store dynamic metrics. All accesses to mutate these metrics required a lookup into these large hash maps. The new system minimizes the use of maps. Instead every region or server exports metrics to one pseudo source. The only changes to hashmaps in the metrics system occurs on region close or open.


Overall the whole system is just better. The process was long and laborious, but worth it to make sure that HBase’s metrics system is in a good state. HBase 0.95, and later 0.96, will have the new metrics system.  There’s still more work to be completed but great strides have been made.



Hot Blogs (today's hits)

Tag Cloud