Apache Hama

Thursday Jul 15, 2010

How will Hama BSP be different from Pregel?

Firstly, why did we use HBase?

Until last year, we were researching a distributed matrix/graph computing package based on Map/Reduce.

As you know, Hadoop consists of HDFS, a distributed file system designed for commodity servers in a shared-nothing model (also called a data partitioning model), and a distributed programming model called Map/Reduce. Map/Reduce is certainly a high-performance parallel data processing engine, but it is not well suited to complex numerical or relational processing that requires many iterations or heavy inter-node communication. So we used HBase as shared storage (a shared memory model).

Why BSP instead of Map/Reduce and HBase?

However, some problems remained:

  • OS overhead of running shared storage software (HBase)
  • Limitations of HBase's capabilities (especially the size of column qualifiers)
  • Growing code complexity

Therefore, we began to consider a message-passing model and decided to adopt the BSP (Bulk Synchronous Parallel) model, inspired by Pregel as described on the Google Research Blog.
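To make the model concrete, here is a minimal Python sketch of BSP supersteps (local computation, message sending, barrier synchronization), simulated with threads in a single process. It is not Hama's API: the function name `bsp_ring_max` and the ring-shaped message pattern are hypothetical, chosen only to show that messages sent during one superstep become visible only after the barrier.

```python
import threading

def bsp_ring_max(values):
    """Hypothetical example: find the global maximum over a ring of peers.

    Each superstep has three phases:
      1. local computation on messages received in the previous superstep,
      2. message sending (buffered, not visible until the barrier),
      3. barrier synchronization, after which buffered messages are delivered.
    """
    n = len(values)
    state = list(values)              # one local value per peer
    inbox = [[] for _ in range(n)]    # messages readable in the current superstep
    outbox = [[] for _ in range(n)]   # messages sent in the current superstep
    lock = threading.Lock()

    def deliver():
        # Run by exactly one thread after all peers reach the barrier and
        # before any is released: this superstep's messages become the
        # next superstep's inbox.
        for i in range(n):
            inbox[i] = outbox[i]
            outbox[i] = []

    barrier = threading.Barrier(n, action=deliver)

    def peer(pid):
        for _ in range(n):  # n supersteps let every value travel the whole ring
            # 1. Local computation on the previous superstep's messages.
            for msg in inbox[pid]:
                state[pid] = max(state[pid], msg)
            # 2. Send the current value to the next peer on the ring.
            with lock:
                outbox[(pid + 1) % n].append(state[pid])
            # 3. Wait until every peer has finished this superstep.
            barrier.wait()

    threads = [threading.Thread(target=peer, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state
```

For example, `bsp_ring_max([3, 9, 1, 4])` returns `[9, 9, 9, 9]`: with a ring of n peers, each value needs n - 1 hops, so n supersteps are enough for every peer to see the maximum.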

What is Pregel?

As I understand it, Pregel is graph-specific: a large-scale graph computing framework based on the BSP model.
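The vertex-centric style used by Pregel can be sketched the same way. Below is a single-process Python simulation of breadth-first search, assuming a hypothetical `pregel_bfs` function rather than Pregel's real API: in each superstep a vertex combines the messages it received, updates its value if it improved, and messages its neighbors; a vertex with nothing new to say stays halted, and the run ends when no messages are in flight.

```python
from collections import defaultdict

INF = float("inf")

def pregel_bfs(edges, source):
    """Hypothetical example: hop distances from `source` in Pregel style."""
    # Build out-adjacency lists from directed (u, v) edges.
    adjacency = defaultdict(list)
    vertices = {source}
    for u, v in edges:
        adjacency[u].append(v)
        vertices.update((u, v))
    distance = {v: INF for v in vertices}   # per-vertex values
    messages = {source: [0]}                # superstep 0: only the source is active
    while messages:                         # stop when no messages are in flight
        next_messages = defaultdict(list)
        for vertex, incoming in messages.items():
            candidate = min(incoming)       # combine incoming messages
            if candidate < distance[vertex]:
                distance[vertex] = candidate
                for neighbor in adjacency[vertex]:
                    next_messages[neighbor].append(candidate + 1)
            # No improvement: send nothing and effectively vote to halt.
        messages = next_messages            # barrier: delivered next superstep
    return distance
```

With edges `[(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]` and source `0`, vertex 4 ends at distance 3, and any vertex unreachable from the source keeps distance `inf`.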

How will Hama BSP be different from Pregel?

Hama BSP is a computing engine based on the BSP model, like Pregel, and it will be compatible with existing HDFS clusters and, in the future, with any FileSystem or database. However, we believe the BSP computing model is not limited to graph problems; like Map/Reduce, it can be used for a wide range of distributed software. Beyond graphs, many other algorithms run into the same problems that graph processing does on Map/Reduce. In fact, the BSP model has been studied for many years in the field of matrix computation as well.
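As an illustration of BSP outside graph processing: the core of many iterative matrix methods is a repeated matrix-vector product. The sketch below assumes a hypothetical row-block partitioning of the matrix across `num_peers` peers and simulates each superstep sequentially in Python. Every peer computes its slice of y = A x locally and broadcasts it, and after the barrier each peer holds the full new vector. The function `bsp_matvec` is illustrative only, not part of Hama.

```python
def bsp_matvec(matrix, x, num_peers, supersteps):
    """Hypothetical example: repeated y = A x with rows split across peers."""
    n = len(matrix)
    # Contiguous row blocks [lo, hi), one per peer.
    bounds = [(p * n // num_peers, (p + 1) * n // num_peers)
              for p in range(num_peers)]
    local_x = [list(x) for _ in range(num_peers)]  # each peer's copy of x
    for _ in range(supersteps):
        # Local computation: each peer multiplies only its own rows.
        sent = []
        for p, (lo, hi) in enumerate(bounds):
            part = [sum(a * b for a, b in zip(matrix[i], local_x[p]))
                    for i in range(lo, hi)]
            sent.append((lo, part))  # broadcast this slice to all peers
        # Barrier: every slice reaches every peer, which assembles the vector.
        full = [0] * n
        for lo, part in sent:
            full[lo:lo + len(part)] = part
        local_x = [list(full) for _ in range(num_peers)]
    return local_x[0]
```

For example, `bsp_matvec([[2, 0], [0, 3]], [1, 1], num_peers=2, supersteps=2)` returns `[4, 9]`. Power iteration and PageRank repeat exactly this superstep, adding a normalization step after the barrier.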

Therefore, we are trying to implement a more general BSP computing solution. Hama will consist of the BSP computing engine and a small set of examples (e.g., matrix inversion, PageRank, BFS).

Learn about Hama by reading the documentation.

Comments:

When will Hama's BSP be able to store matrix information and be able to execute matrix operations? thanks, Maarten

Posted by Maarten Ectors on December 02, 2010 at 09:37 AM GMT+00:00 #

I guess it'll take some (5~6) months.

Posted by edwardyoon on December 02, 2010 at 10:49 PM GMT+00:00 #

Thank you for the reply. I hope you get more community support because Hama can be a core component of many solutions...

Posted by Maarten on December 09, 2010 at 10:15 AM GMT+00:00 #
