Apache HBase

Thursday December 29, 2016

HGraphDB: Apache HBase As An Apache TinkerPop Graph Database

Robert Yokota is a Software Engineer at Yammer.

An earlier version of this post was published on Robert's blog.

Be sure to also check out the excellent follow-on post, Graph Analytics on HBase with HGraphDB and Giraph.

The use of graph databases is common among social networking companies. A social network can easily be represented as a graph model, so a graph database is a natural fit. For instance, Facebook has a graph database called TAO, Twitter has FlockDB, and Pinterest has Zen. At Yammer, an enterprise social network, we rely on Apache HBase for much of our messaging infrastructure, so I decided to see if HBase could also be used for some graph modelling and analysis.

Below I put together a wish list of what I wanted to see in a graph database.

  • It should be implemented directly on top of HBase.
  • It should support the TinkerPop 3 API.
  • It should allow the user to supply IDs for both vertices and edges.
  • It should allow user-supplied IDs to be either strings or numbers.
  • It should allow property values to be of arbitrary type, including maps, arrays, and serializable objects.
  • It should support indexing vertices by label and property.
  • It should support indexing edges by label and property, specific to a given vertex.
  • It should support range queries and pagination with both vertex indices and edge indices.

I did not find a graph database that met all of the above criteria. For instance, Titan is a graph database that supports the TinkerPop API, but it is not implemented directly on HBase. Rather, it is implemented on top of an abstraction layer that can be integrated with Apache HBase, Apache Cassandra, or Berkeley DB as its underlying store. Also, Titan does not support user-supplied IDs. Apache S2Graph (incubating) is a graph database that is implemented directly on HBase, and it supports both user-supplied IDs and indices on edges, but it does not yet support the TinkerPop API, nor does it support indices on vertices.

This led me to create HGraphDB, a TinkerPop 3 layer for HBase. It provides support for all of the above bullet points. Feel free to try it out if you are interested in using HBase as a graph database.
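To make the last two wish-list items more concrete, here is a minimal sketch of how a vertex-specific edge index with range queries and pagination could be laid out on a store that keeps rows sorted by key, as HBase does. It is an illustration only, not HGraphDB's actual schema or API: the `SortedStore` class, the `add_edge` and `edges_in_range` helpers, the composite key layout, and the `since` property are all assumptions made for this example, and an in-memory sorted list stands in for an HBase table.

```python
import bisect


class SortedStore:
    """In-memory stand-in for a table whose rows are kept sorted by key."""

    def __init__(self):
        self._keys = []    # row keys, always kept in sorted order
        self._values = {}  # row key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start, stop, limit=None, after=None):
        """Return rows with start <= key < stop in key order.

        `after` (exclusive) resumes a scan past the last key of the previous page.
        """
        lo = bisect.bisect_left(self._keys, start)
        if after is not None:
            lo = max(lo, bisect.bisect_right(self._keys, after))
        hi = bisect.bisect_left(self._keys, stop)
        keys = self._keys[lo:hi]
        if limit is not None:
            keys = keys[:limit]
        return [(k, self._values[k]) for k in keys]


def add_edge(store, out_v, label, edge_id, in_v, since):
    # Hypothetical key layout: (source vertex, edge label, property value, edge ID).
    # All edges of one vertex and label sit next to each other, ordered by `since`.
    store.put((out_v, label, since, edge_id), in_v)


def edges_in_range(store, out_v, label, lo, hi, limit=None, after=None):
    # A prefix tuple sorts before every longer key sharing that prefix,
    # so these bounds select lo <= since < hi for the given vertex and label.
    return store.scan((out_v, label, lo), (out_v, label, hi), limit, after)


store = SortedStore()
add_edge(store, "alice", "knows", "e1", "bob", 2010)
add_edge(store, "alice", "knows", "e2", "carol", 2012)
add_edge(store, "alice", "knows", "e3", "dave", 2014)
add_edge(store, "bob", "knows", "e4", "alice", 2012)

# First page of two results, then the next page resumed from the last key seen.
page1 = edges_in_range(store, "alice", "knows", 2010, 2015, limit=2)
page2 = edges_in_range(store, "alice", "knows", 2010, 2015, limit=2,
                       after=page1[-1][0])
```

Because the source vertex leads the key, a range query touches only that vertex's edges, and paging needs nothing more than the last key returned. A real HBase-backed index would also have to encode keys as bytes in a way that preserves this ordering, which the sketch leaves out.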


Yokota, great inspiration! Keep in touch. Do you know any good books on real-time Hadoop applications? Please feel free to share, sir!

Posted by chaitanya on December 29, 2016 at 09:08 PM GMT #

How do Gremlin and HBase compare, and what test approaches are there? Could you please let us know as soon as possible? Are there any examples or queries, or an expert's blog that answers my question?

Posted by jagan k on June 26, 2017 at 06:58 AM GMT #

Have you tested HGraphDB's performance? Is it better than JanusGraph? Do you have any performance numbers?

Posted by haiai deng on December 05, 2018 at 05:58 AM GMT #
