Apache HBase

Monday March 24, 2014

HBaseCon2014, the HBase Community Event, May 5th in San Francisco


By the HBaseCon Program Committee and Justin Kestelyn

HBaseCon 2014 (May 5, 2014 in San Francisco) is coming up fast.  It is shaping up to be another grand day out for the HBase community with a dense agenda spread over four tracks of case studies, dev, ops, and ecosystem.  Check out our posted agenda grid. As in previous years, the program committee favored presentations on open source, shipping technologies and descriptions of in-production systems. The schedule is chock-a-block with a wide diversity of deploy types (industries, applications, scALE!) and interesting dev and operations stories.  Come to the conference and learn from your peer HBase users, operators and developers.

The general session will lead off with a welcome by Mike Olson of Cloudera (Cloudera is the host for this HBase community conference).

Next up will be a talk by the leads of the BigTable team at Google, Avtandil Garakanidze and Carter Page.  They will present on BigTable, the technology that HBase is based on.  Bigtable is “...the world’s largest multi-purpose database, supporting 90 percent of Google’s applications around the world.”  They will give a brief overview of “...Bigtable evolution since it was originally described in an OSDI ’06 paper, its current use cases at Google, and future directions.”

Then comes our own Liyin Tang of Facebook and Lars Hofhansl of Salesforce.com.  Liyin will speak on, “HydraBase: Facebook’s Highly Available and Strongly Consistent Storage Service Based on Replicated HBase Instances”.  HBase powers multiple mission-critical online applications at Facebook. This talk will cover the design of HydraBase (including the replication protocol), an analysis of a failure scenario, and a plan for contributions to the HBase trunk.”  Lars will explain how Salesforce.com’s scalability requirements led it to HBase and the multiple use cases for HBase there today. You’ll also learn how Salesforce.com works with the HBase community, and get a detailed look into its operational environment.

We then break up into sessions where you will be able to learn about operations best practices, new features in depth, and what is being built on top of HBase. You can also hear how Pinterest builds large scale HBase apps up in AWS, HBase at Bloomberg, OPower, RocketFuel, Xiaomi, Yahoo! and others, as well as a nice little vignette on how HBase is an enabling technology at OCLC replacing all instances of Oracle powering critical services for the libraries of the world.

There will be an optional pre-conference prep for attendees who are new to HBase by Jesse Anderson (Cloudera University).  Jesse’s talk will offer a brief Cliff’s Notes-level HBase overview before the General Session that covers architecture, API, and schema design.

Register now for HBaseCon 2014 at: https://hbasecon2014.secure.mfactormeetings.com/registration/

A big shout out goes out to our sponsors – Continuuity, Hortonworks, Intel, LSI, MapR, Salesforce.com, Splice Machine, WibiData (Gold); Facebook, Pepperdata (Silver); ASF (Community); O’Reilly Media, The Hive, NoSQL Weekly (Media) — without which HBaseCon would be impossible!

Thursday August 08, 2013

Taking the Bait

Lars Hofhansl, Andrew Purtell, and Michael Stack

HBase Committers

Information Week recently published an article titled “Will HBase Dominate NoSQL?”. Michael Hausenblas of MapR argues the ‘For’ HBase case and Jonathan Ellis of Apache Cassandra and vendor DataStax argues ‘Against’.

It is easy to dismiss this 'debate' as vendor sales talk and just get back to using and improving Apache HBase, but this article is a particularly troubling example. Here in both the ‘For’ and ‘Against’ arguments slight is being cast on the work of the HBase community. Here are some notes by way of redress:

First, Michael argues Hadoop is growing fast and because HBase came out of Hadoop and is tightly associated, ergo, HBase is on the up and up.  It is easy to make this assumption if you are not an active participant in the HBase community (We have also come across the inverse where HBase is driving Hadoop adoption). Michael then switches to a tired HBase versus Cassandra bake off, rehashing the old consistency versus eventual-consistency wars, ultimately concluding that HBase is ‘better’ simply because Facebook, the Cassandra wellspring, dropped it to use HBase instead as the datastore for some of their large apps. We would not make that kind of argument. Whether or not one should utilize Apache HBase or Apache Cassandra depends on a number of factors and is, like with most matters of scale, too involved a discussion for sound bites.

Then Michael does a bait-and-switch where he says “..we’ve created a ‘next version’ of enterprise HBase.... We brought it into GA under the label M7 in May 2013”.  The ‘We’ in the quote does not refer to the Apache HBase community but to Michael’s employer, MapR Technologies and the ‘enterprise HBase’ he is talking of is not Apache HBase.  M7 is a proprietary product that, to the best of our knowledge, is fundamentally different architecturally from Apache HBase. We cannot say more because of the closed source nature of the product. This strikes us as an attempt to attach the credit and good will that the Apache HBase community have all built up over the years through hard work and contributions to a commercial closed source product that is NOT Apache HBase.

Now let us address Jonathan’s “Against” argument.  Some of Jonathan’s claims are no longer true: “RegionServer failover takes 10 to 15 minutes” (see HDFS-3703 and HDFS-3912), or highly subjective: “Developing against HBase is painful.”  (In our opinion, our client API is simpler and easier to use than the commonly used Cassandra client libraries.) Otherwise, we find nothing here that has not been hashed and rehashed out over the years in forums ranging from mailing lists to Quora. Jonathan is mostly listing out provence, what comes of our being tightly coupled to Apache Hadoop’s HDFS filesystem and our tracing the outline of the Google BigTable Architecture. HBase derives many benefits from HDFS and inspiration from BigTable. As a result, we excel at some use cases that would be problematic for Cassandra. The reverse is also true. Neither we nor HDFS are standing still as the hardware used by our users evolves, or as new users bring experience from new use cases.

He highlights a difference in approach.

We see our adoption of Apache HDFS, our integration with Apache Hadoop MapReduce, and our use of Apache ZooKeeper for coordination as a sound separation of concerns. HBase is built on battle tested components. This is a feature, not a bug.

Where Jonathan might see a built in query language and secondary indexing facility as necessary complications to core Cassandra, we encourage and support projects like Salesforce’s Phoenix as part of a larger ecosystem centered around Apache HBase. The Phoenix guys are able to bring domain expertise to that part of the problem while we (HBase) can focus on providing a stable and performant storage engine core. Part of what has made Apache Hadoop so successful is its ecosystem of supporting and enriching projects - an ecosystem that includes HBase. An ecosystem like that is developing around HBase.

When Jonathan veers off to talk of the HBase community being “fragmented” with divided “[l]eadership”, we think perhaps what is being referred to is the fact that the Apache HBase project is not an “owned” project, a project led by a single vendor.  Rather it is a project where multiple teams from many organizations - Cloudera, Hortonworks, Salesforce, Facebook, Yahoo, Intel, Twitter, and Taobao, to name a few - are all pushing the project forward in collaboration. Most of the Apache HBase community participates in shared effort on two branches - what is about to be our new 0.96 release, and on another branch for our current stable release, 0.94. Facebook also maintains their own branch of Apache HBase in our shared source control repository. This branch has been a source of inspiration and occasional direct contribution to other branches. We make no apologies for finding a convenient and effective collaborative model according to the needs and wishes of our community. To other projects driven by a single vendor this may seem suboptimal or even chaotic (“fragmented”).

We’ll leave it at that.

As always we welcome your participation in and contributions to our community and the Apache HBase project. We have a great product and a no-ego low-friction decision making process not beholden to any particular commercial concern. If you have an itch that needs scratching and wonder if Apache HBase is a solution, or, if you are using Apache HBase but feel it could work better for you, come talk to us.



Hot Blogs (today's hits)

Tag Cloud