HBase - Who needs a Master?
By Matteo Bertozzi (mbertozzi at apache dot org), HBase Committer and Engineer on the Cloudera HBase Team.
At first glance, the Apache HBase architecture appears to follow a master/slave model where the master receives all the requests but the real work is done by the slaves. This is not actually the case, and in this article I will describe what tasks are in fact handled by the master and the slaves.
HBase provides low-latency random reads and writes on top of HDFS, and it is able to handle petabytes of data. One of the interesting capabilities in HBase is Auto-Sharding, which simply means that tables are dynamically split and distributed by the system when they become too large.
Regions and Region Servers
The basic unit of horizontal scalability in HBase is called a Region. A Region is a subset of the table’s data: essentially a contiguous, sorted range of rows that are stored together.
Initially, there is only one region for a table. When a region becomes too large as rows are added, it is split into two at the middle key, creating two roughly equal halves.
Looking back at the HBase architecture, the slaves are called Region Servers. Each Region Server is responsible for serving a set of regions, and one Region (i.e. a range of rows) can be served by only one Region Server.
The HBase architecture has two main services: HMaster, which is responsible for coordinating Regions in the cluster and executing administrative operations, and HRegionServer, which is responsible for handling a subset of the table’s data.
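The split at the middle key can be illustrated with a small, self-contained sketch. This is a toy model and not HBase code: the class and method names are hypothetical, a sorted map stands in for a region, and the row count stands in for the region's size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of a region: a contiguous, sorted range of rows.
// Hypothetical names; this is not how HBase implements splits internally.
class RegionSplitSketch {

    // Splits a non-empty sorted region into two halves at its middle row key.
    static List<TreeMap<String, String>> split(TreeMap<String, String> region) {
        List<String> keys = new ArrayList<>(region.keySet());
        String middleKey = keys.get(keys.size() / 2);
        // Lower half: rows strictly before the middle key.
        TreeMap<String, String> lower = new TreeMap<>(region.headMap(middleKey, false));
        // Upper half: rows from the middle key onward; middleKey is its start key.
        TreeMap<String, String> upper = new TreeMap<>(region.tailMap(middleKey, true));
        List<TreeMap<String, String>> halves = new ArrayList<>();
        halves.add(lower);
        halves.add(upper);
        return halves;
    }

    public static void main(String[] args) {
        TreeMap<String, String> region = new TreeMap<>();
        for (String key : new String[] {"a", "b", "c", "d", "e", "f"}) {
            region.put(key, "value-" + key);
        }
        List<TreeMap<String, String>> halves = split(region);
        System.out.println("lower: " + halves.get(0).keySet()); // lower: [a, b, c]
        System.out.println("upper: " + halves.get(1).keySet()); // upper: [d, e, f]
    }
}
```

Because the rows are sorted, each half is again a contiguous range, so the two halves can be served as independent regions.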
HMaster, Region Assignment and Balancing
The HBase Master coordinates the HBase Cluster and is responsible for administrative operations.
A Region Server can serve one or more Regions. Each Region is assigned to a Region Server on startup, and the master can decide to move a Region from one Region Server to another as the result of a load balance operation. The Master also handles Region Server failures by reassigning the failed server’s regions to other Region Servers.
The mapping of Regions to Region Servers is kept in a system table called META. By reading META, you can identify which region is responsible for your key. This means that for read and write operations, the master is not involved at all, and clients can go directly to the Region Server responsible for serving the requested data.
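As a rough illustration of reassignment after a Region Server failure, the sketch below moves each orphaned region to the least-loaded surviving server. The policy, the class, and every method name here are assumptions made for the example; the actual HBase assignment and balancing logic is considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the Master's bookkeeping. Hypothetical names, not the HBase API.
class AssignmentSketch {
    // Server name -> regions it currently serves (TreeMap for deterministic ties).
    private final Map<String, List<String>> assignments = new TreeMap<>();

    void addServer(String server) {
        assignments.putIfAbsent(server, new ArrayList<>());
    }

    // The server must have been registered with addServer first.
    void assign(String region, String server) {
        assignments.get(server).add(region);
    }

    // Moves every region of the failed server to the least-loaded survivor.
    void handleServerFailure(String failedServer) {
        List<String> orphaned = assignments.remove(failedServer);
        if (orphaned == null) {
            return; // unknown server, nothing to do
        }
        for (String region : orphaned) {
            assignments.get(leastLoadedServer()).add(region);
        }
    }

    // Returns the server currently holding the region, or null if unassigned.
    String serverOf(String region) {
        for (Map.Entry<String, List<String>> e : assignments.entrySet()) {
            if (e.getValue().contains(region)) {
                return e.getKey();
            }
        }
        return null;
    }

    private String leastLoadedServer() {
        String best = null;
        for (Map.Entry<String, List<String>> e : assignments.entrySet()) {
            if (best == null || e.getValue().size() < assignments.get(best).size()) {
                best = e.getKey();
            }
        }
        if (best == null) {
            throw new IllegalStateException("no live Region Servers");
        }
        return best;
    }

    public static void main(String[] args) {
        AssignmentSketch master = new AssignmentSketch();
        master.addServer("rs1");
        master.addServer("rs2");
        master.addServer("rs3");
        master.assign("r1", "rs1");
        master.assign("r2", "rs1");
        master.assign("r3", "rs2");
        master.handleServerFailure("rs1");
        System.out.println("r1 -> " + master.serverOf("r1")); // r1 -> rs3
        System.out.println("r2 -> " + master.serverOf("r2")); // r2 -> rs2
    }
}
```

Each region still ends up on exactly one server, which matches the rule that a Region is served by only one Region Server at a time.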
Locating a Row-Key: Which Region Server is responsible?
To put or get a row, clients don’t have to contact the master; they can directly contact the Region Server that handles the specified row. In the case of a scan, clients can directly contact the set of Region Servers responsible for handling the requested range of keys.
To identify the Region Server, the client queries the META table. META is a system table used to keep track of regions. It contains the server name and a region identifier composed of a table name and the start row-key. By looking at a region’s start-key and the next region’s start-key, clients are able to identify the range of rows contained in a particular region.
The client keeps a cache of the region locations. This avoids the need for clients to hit the META table every time an operation on the same region is issued. In case of a region split or a move to another Region Server (due to balancing or assignment policies), the client will receive an exception in response, and the cache will be refreshed by fetching the updated information from the META table.
Since META is a table like the others, the client has to identify on which server META is located. The META location is stored in a ZooKeeper node on assignment by the Master, and the client reads the node directly to get the address of the Region Server that contains META.
The original design was based on BigTable, with another table called -ROOT- containing the META locations and ZooKeeper pointing to it. HBase 0.96 removed -ROOT- in favor of ZooKeeper only, since META cannot be split and therefore consists of a single region.
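The start-key lookup described above can be sketched with a sorted map: the region holding a row is the one with the greatest start key that is less than or equal to the row key. The class below is a hypothetical, in-memory stand-in for META, not the HBase client code, and it compares keys as Java strings whereas HBase compares raw bytes.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for META for a single table. Hypothetical names.
class MetaLookupSketch {
    // Region start row-key -> server name; "" is the start key of the first region.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String serverName) {
        regionsByStartKey.put(startKey, serverName);
    }

    // The region holding rowKey has the greatest start key <= rowKey.
    String locate(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers row " + rowKey);
        }
        return entry.getValue();
    }

    // A region ends where the next one starts (exclusive); null means last region.
    String endKeyOf(String startKey) {
        return regionsByStartKey.higherKey(startKey);
    }

    public static void main(String[] args) {
        MetaLookupSketch meta = new MetaLookupSketch();
        meta.addRegion("", "rs1");
        meta.addRegion("g", "rs2");
        meta.addRegion("p", "rs3");
        System.out.println(meta.locate("apple")); // rs1
        System.out.println(meta.locate("hbase")); // rs2
        System.out.println(meta.locate("p"));     // rs3
        System.out.println(meta.endKeyOf("g"));   // p
    }
}
```

Storing only the start key is enough because regions are contiguous: the end of one region is always the start of the next.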
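A minimal sketch of this caching behaviour, under stated assumptions: the `metaLookup` function stands in for a query against META, the string region identifier is a simplification, and `invalidate` models what the client does after a Region Server answers with an exception. None of these names come from the HBase client.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy region-location cache. Hypothetical names, not the HBase client.
class LocationCacheSketch {
    private final Map<String, String> cache = new HashMap<>(); // region id -> server
    private final Function<String, String> metaLookup;         // stands in for a META query
    int metaReads = 0;                                         // how often META was consulted

    LocationCacheSketch(Function<String, String> metaLookup) {
        this.metaLookup = metaLookup;
    }

    // Returns the cached location, reading META only on a cache miss.
    String locate(String regionId) {
        String server = cache.get(regionId);
        if (server == null) {
            server = metaLookup.apply(regionId);
            metaReads++;
            cache.put(regionId, server);
        }
        return server;
    }

    // Called after a server reports that it no longer hosts the region.
    void invalidate(String regionId) {
        cache.remove(regionId);
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put("region-1", "rs1");
        LocationCacheSketch client = new LocationCacheSketch(meta::get);

        client.locate("region-1");
        client.locate("region-1");            // served from the cache
        System.out.println(client.metaReads); // 1

        meta.put("region-1", "rs2");          // the region moves to another server
        client.invalidate("region-1");        // triggered by the exception from rs1
        System.out.println(client.locate("region-1")); // rs2
        System.out.println(client.metaReads);          // 2
    }
}
```

The cache is allowed to go stale: the client only learns about a move when a request to the old server fails, which keeps META out of the common path.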
Client API - Master and Regions responsibilities
The HBase Java client API is composed of two main interfaces.
HBaseAdmin: allows interaction with the “table schema” by creating/deleting/modifying tables, and it allows interaction with the cluster by assigning/unassigning regions, merging regions together, calling for a flush, and so on. This interface communicates with the Master.
HTable: allows the client to manipulate the data of a specified table, by using get, put, delete and all the other data operations. This interface communicates directly with the Region Servers responsible for handling the requested set of keys.
Those two interfaces have separate responsibilities: HBaseAdmin is only used to execute admin operations and communicates with the Master, while HTable is used to manipulate data and communicates with the Region Servers.
As we’ve seen in this article, having a Master/Slave architecture does not mean that each operation goes through the master. To read and write data, the HBase client in fact goes directly to the specific Region Server responsible for handling the row keys, for all the data operations (HTable). The Master is used by the client only for table creation, modification and deletion operations (HBaseAdmin).
Although there exists a concept of a Master, the HBase client does not depend on it for data operations and the cluster can keep serving data even if the master goes down.