Thursday March 05, 2015

Apache Hama now supports YARN, runs at Samsung Electronics

The Apache Hama team is pleased to announce that we now support YARN in addition to Mesos (thanks to Minho Kim, who is a main contributor to the YARN module).

Apache Hama is a high-performance BSP computing engine that can be used to run compute-intensive general scientific BSP applications, Pregel-like graph applications (after Google’s Pregel), and machine learning algorithms.

YARN is the resource management technology that lets multiple computing frameworks run on the same Hadoop cluster using the same underlying storage. So, for example, a company could analyze the same data using MapReduce, Spark, and Apache Hama.

“From the next release, you’ll be able to submit scientific BSP applications to existing open source Hadoop, CDH, and HDP clusters without any installation,” said Edward J. Yoon (@eddieyoon), the original creator of Apache Hama.

Meanwhile, we’re also working on supporting HPC environments such as InfiniBand and GPUs (according to General Dynamics[1], they have already demonstrated a 10x performance improvement of Apache Hama on an HPC cluster), and we also plan to support deployment and automation configurations for hybrid clouds, to solve various problems in manufacturing engineering, science, finance, and research.

This contribution comes mainly from Samsung Electronics. “Unlike most web services companies, our challenge is numerical or signal data, not text data. That’s why we’re investing in high-performance computing for advanced scientific analytics,” said SeungHun Jeon, Head of Cloud Tech Lab at Samsung Electronics.

“Since we build our own analytics platform in the cloud by leveraging open source technologies such as Apache Hadoop, Storm, and Hama, we intend to keep making contributions to the open source communities,” added Hyok S. Choi, a Principal Software Engineer at Samsung Electronics.

About Apache Hama

Apache Hama[2] was established in 2012 as a Top-Level Project of The Apache Software Foundation. It provides a high-performance BSP[3] computing engine on top of Hadoop.

Tuesday March 06, 2012

Apache Hama 0.4-incubating Released!

The Hama team is pleased to announce the release of Apache Hama 0.4-incubating under the Apache Incubator.

Hama is a pure BSP (Bulk Synchronous Parallel) computing framework on top of HDFS (Hadoop Distributed File System) for massive scientific computations such as matrix, graph, and network algorithms.

This release includes:

* Multiple tasks per node
* Input/Output Formatter
* Stabilized Barrier Synchronization
* Message Combiners
* Improved examples
* Benchmark test results

Thanks to the Hama and Apache Incubator communities for helping grow the project!

Thursday July 15, 2010

How will Hama BSP differ from Pregel?

Firstly, why did we use HBase?

Until last year, we tried to implement distributed matrix/graph computing algorithms based on Map/Reduce.

As you know, Hadoop consists of HDFS, which is designed for commodity servers as a shared-nothing model (also termed a data partitioning model), and a distributed programming model called Map/Reduce. Map/Reduce is a high-performance parallel data processing engine, to be sure, but it is not well suited to complex numerical/relational processing that requires many iterations or inter-node communication. So, we used HBase as shared storage (a shared memory model).

Why BSP instead of Map/Reduce and HBase?

However, there were still problems:

  • OS overhead of running shared storage software (HBase)
  • Limitations of HBase's capabilities (especially the size of column qualifiers)
  • Growth of code complexity

Therefore, we started to consider a message-passing model, and decided to adopt the BSP (Bulk Synchronous Parallel) model, inspired by Pregel as described on the Google Research Blog.
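
To make the message-passing idea concrete, here is a minimal single-process sketch of BSP supersteps in Python. It is only an illustration of the model, not Hama's API: the function name run_bsp, the peer numbering, and the choice of summing numbers are all assumptions made for this example. Each peer computes locally, sends messages, and then waits at a barrier; messages sent in one superstep are only visible in the next.

```python
from collections import defaultdict

def run_bsp(num_peers, local_values, max_supersteps=10):
    """Simulate BSP supersteps (hypothetical helper, not Hama's API).

    Superstep 0: every peer sends its local partial sum to peer 0.
    Superstep 1: peer 0 adds the partial sums and broadcasts the total.
    Superstep 2: every peer reads the total from its inbox.
    """
    inbox = {p: [] for p in range(num_peers)}
    totals = {}
    for superstep in range(max_supersteps):
        outbox = defaultdict(list)
        for peer in range(num_peers):
            if superstep == 0:
                # Local computation, then a message to peer 0.
                outbox[0].append(sum(local_values[peer]))
            elif superstep == 1 and peer == 0:
                # Peer 0 sees the messages delivered at the last barrier.
                total = sum(inbox[0])
                for dest in range(num_peers):
                    outbox[dest].append(total)
            elif superstep == 2:
                totals[peer] = inbox[peer][0]
        # Barrier synchronization: deliver this superstep's messages
        # so they become visible in the next superstep.
        inbox = {p: outbox.get(p, []) for p in range(num_peers)}
        if not outbox:
            break  # no messages in flight, so the job is finished
    return totals

if __name__ == "__main__":
    print(run_bsp(3, [[1, 2], [3], [4, 5, 6]]))
```

Note that no shared storage is involved: peers exchange data only through messages, which is what removes the need for HBase in the approach described above.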

What is Pregel?

As I understand it, Pregel is graph-specific: a large-scale graph computing framework based on the BSP model.
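
As a rough illustration of what "graph-specific" means here, the sketch below simulates a vertex-centric computation in the style Pregel is known for: every vertex repeatedly takes the maximum of the values it receives and forwards it to its neighbours until no messages remain. This is a toy single-process simulation, not Pregel's or Hama's actual API; the function name propagate_max and the dictionary-based graph layout are assumptions for this example.

```python
def propagate_max(graph, values):
    """Vertex-centric maximum propagation (illustrative sketch only).

    graph:  vertex -> list of out-neighbours (every neighbour must be a key)
    values: vertex -> initial number
    Returns the final value held by each vertex.
    """
    values = dict(values)
    # Superstep 0: every vertex sends its value along its out-edges.
    inbox = {v: [] for v in graph}
    for v, neighbours in graph.items():
        for n in neighbours:
            inbox[n].append(values[v])
    # Later supersteps: a vertex is active only if it received messages.
    while any(inbox.values()):
        next_inbox = {v: [] for v in graph}
        for v, messages in inbox.items():
            if messages and max(messages) > values[v]:
                # The value changed, so tell the neighbours.
                values[v] = max(messages)
                for n in graph[v]:
                    next_inbox[n].append(values[v])
            # Otherwise the vertex stays quiet ("votes to halt").
        inbox = next_inbox  # barrier between supersteps
    return values

if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    print(propagate_max(graph, {"a": 3, "b": 6, "c": 2, "d": 1}))
```

The per-vertex compute step and the edge-based messaging are what tie this style to graphs; the underlying superstep-and-barrier structure is plain BSP.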

How will Hama BSP differ from Pregel?

Hama BSP is a computing engine based on the BSP model, like Pregel, and it will be compatible with existing HDFS clusters, and with any FileSystem or database in the future. However, we believe that the BSP computing model is not limited to graph problems; it can be used for a wide range of distributed software, much like Map/Reduce. Beyond graphs, there are many other algorithms that run into the same problems as graph processing when implemented with Map/Reduce. In fact, the BSP model has been researched for many years in the field of matrix computation, too.

Therefore, we're trying to implement a more generalized BSP computing solution. Hama will consist of the BSP computing engine and a small set of examples (e.g., matrix inversion, PageRank, BFS, etc.).

Learn about Hama by reading the documentation.

Friday April 30, 2010

We've been featured on BSP Worldwide.

We've been featured on BSP Worldwide: http://www.bsp-worldwide.org/bspww3000.html, thanks to Prof. Rob Bisseling.

Dear Edward,

I have put a link from  http://www.bsp-worldwide.org/bspww3000.html 
to your page and to the paper. 

I read it and find it very interesting. 
I heard a talk by Greg Malewicz from Google (Pregel) who is very enthusiastic about BSP. 
Suddenly I note a high interest in BSP everywhere.

I wish you good luck with your project.

best wishes,



