Apache Hama

Wednesday Mar 05, 2014

[ANNOUNCE] Hama 0.6.4 has been released.

The Hama team is pleased to announce the Hama 0.6.4 release.

Apache Hama is a pure BSP (Bulk Synchronous Parallel) computing framework on top of HDFS (Hadoop Distributed File System) for massive scientific computations such as matrix, graph and network algorithms.

This release improves memory efficiency roughly threefold over the previous release (without significant performance degradation) and adds runtime message compression.

The artifacts are published and ready for you to download[1] either from the Apache mirrors or from the Maven repository. We welcome your help, feedback, and suggestions. For more information on how to report problems, and to get involved, visit the project website[2] and wiki[3].

Thanks.

1. http://www.apache.org/dist/hama/
2. http://hama.apache.org
3. http://wiki.apache.org/hama/

Wednesday Dec 04, 2013

Running Hama K-Means in 5 minutes

As you might already know, the Apache Hama project provides a set of machine learning algorithms that can be applied to very large-scale data in multiple domains.

In this post, I explain how to run the BSP-based K-Means algorithm using Apache Hama, assuming that you have already installed and tested a Hama cluster.

1. Download the Iris data set [Data set Information].

2. Then run K-Means (the TRUNK version is recommended):

  % $HAMA_HOME/bin/hama jar hama-examples-x.x.x.jar kmeans /tmp/kmeans.txt /tmp/result 10 3
  ...
  [5.1, 3.5, 1.4, 0.2] belongs to cluster 2
  [4.9, 3.0, 1.4, 0.2] belongs to cluster 2
  [4.7, 3.2, 1.3, 0.2] belongs to cluster 2
  [4.6, 3.1, 1.5, 0.2] belongs to cluster 2
  [5.0, 3.6, 1.4, 0.2] belongs to cluster 2
  ...
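
To make the output above easier to read, here is a minimal single-process sketch of what a K-Means run computes. It is not Hama's implementation: the function name `kmeans`, the "first k points" initialization, and the six sample rows are assumptions chosen for illustration. The trailing `10 3` in the command above presumably correspond to the maximum number of iterations and the number of clusters, which is how the sketch uses them.

```python
import math


def kmeans(points, k, max_iterations):
    """Plain K-Means; centers start as the first k points (an assumption)."""
    centers = [list(p) for p in points[:k]]
    assignment = [None] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment


# Hypothetical sample rows in the same four-column shape as the Iris data.
points = [
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
    [4.9, 3.0, 1.4, 0.2],
    [6.4, 3.2, 4.5, 1.5],
    [5.8, 2.7, 5.1, 1.9],
]
centers, assignment = kmeans(points, k=3, max_iterations=10)
for p, a in zip(points, assignment):
    print(f"{p} belongs to cluster {a}")
```

Cluster numbers are arbitrary labels, so a different initialization can produce the same grouping under different numbers.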


And here's a performance comparison with Mahout.

Friday Jul 06, 2012

Apache Hama 0.5.0 Released

The Apache Hama PMC is pleased to announce the release of Apache Hama 0.5.0.

Apache Hama is a BSP (Bulk Synchronous Parallel) computing framework
on top of HDFS (Hadoop Distributed File System) for massive scientific
computations such as matrix, graph and network algorithms.

This is the first release as a top-level project. It contains two
significant new features (a message compressor and a complete clone of
Google's Pregel) and many improvements to system performance and
durability.

The artifacts are published and ready for you to download[1] either
from the Apache mirrors or from the Maven repository. For more
details, please take a look at our website[2] and wiki[3].

Many thanks to the Hama community for making this release possible.

1. http://www.apache.org/dist/hama/
2. http://hama.apache.org
3. http://wiki.apache.org/hama/

Tuesday Mar 06, 2012

Apache Hama 0.4-incubating Released!

Hi all,

The Hama team is pleased to announce the release of Apache Hama 0.4-incubating under the Apache Incubator.

Hama is a pure BSP (Bulk Synchronous Parallel) computing framework on top of HDFS (Hadoop Distributed File System) for massive scientific computations such as matrix, graph and network algorithms.

This release includes:

* Multiple tasks per node
* Input/Output Formatter
* Stabilized Barrier Synchronization
* Message Combiners
* Improved examples
* Benchmark test results

Thanks to the Hama and Apache Incubator communities for helping grow the project!

Sunday Jul 31, 2011

Apache Hama 0.3-incubating Released!

Hi all,

The Hama team is pleased to announce the release of Apache Hama
0.3-incubating under the Apache Incubator.

Hama is a distributed computing framework based on BSP (Bulk
Synchronous Parallel)[1] computing techniques for massive scientific
computations.

This release includes:

  • Added LocalBSPRunner
  • Added web UI for BSP cluster and job monitoring
  • Added more practical examples e.g., Shortest Path Problem[2], PageRank[3]
  • Improved performance with BSPMessageBundle
  • Switched from Ant to Maven

You can download it from the download page of the Hama website[4].

Thanks to the Hama and Apache Incubator communities for helping grow the project.

1. http://en.wikipedia.org/wiki/Bulk_synchronous_parallel
2. http://wiki.apache.org/hama/SSSP
3. http://wiki.apache.org/hama/PageRank
4. http://incubator.apache.org/hama/downloads.html

Friday Jun 03, 2011

Apache Hama 0.2.0-incubating Released!

The Hama team is pleased to announce the release of Apache Hama 0.2.0-incubating under the Apache Incubator.

Hama is a distributed computing framework based on BSP (Bulk Synchronous Parallel)[1] computing techniques for massive scientific computations.

This first release includes:

  • BSP computing framework and its examples
  • CLI-based tool for managing and monitoring BSP jobs

You can download it from the download page of the Hama website[2].

Thanks to the Hama and Apache Incubator communities for helping grow the project.

1. http://en.wikipedia.org/wiki/Bulk_synchronous_parallel
2. http://incubator.apache.org/hama/downloads.html

Monday Aug 02, 2010

Apache Hama in an academic paper.

Abstract—APPLICATION. Various scientific computations have become so complex, and thus computation tools play an important role. In this paper, we explore the state-of-the-art framework providing high-level matrix computation primitives with MapReduce through the case study approach, and demonstrate these primitives with different computation engines to show the performance and scalability. We believe the opportunity for using MapReduce in scientific computation is even more promising than the success to date in the parallel systems literature.

http://csl.skku.edu/papers/CS-TR-2010-330.pdf

Thursday Jul 15, 2010

How will Hama BSP differ from Pregel?

Firstly, why did we use HBase?

Until last year, we were researching a distributed matrix/graph computing package based on Map/Reduce.

As you know, Hadoop consists of HDFS, which is designed for commodity servers as a shared-nothing model (also termed a data-partitioning model), and a distributed programming model called Map/Reduce. Map/Reduce is certainly a high-performance parallel data processing engine, but it is not well suited to complex numerical or relational processing that requires many iterations or inter-node communication. So we used HBase as shared storage (a shared-memory model).

Why BSP instead of Map/Reduce and HBase?

However, there were still problems:

  • OS overhead of running shared storage software (HBase)
  • Limitations of HBase's capabilities (especially the size of column qualifiers)
  • Growth of code complexity

Therefore, we started to consider a message-passing model and decided to adopt the BSP (Bulk Synchronous Parallel) model, inspired by the Pregel post on the Google Research Blog.
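
To make the message-passing idea concrete, here is a minimal sketch of BSP supersteps, simulated sequentially in one process. It does not use any Hama API; the function name `run_bsp` and the "agree on the global maximum" task are assumptions picked for illustration. Each superstep has local computation, message sending, and a barrier, and messages sent in one superstep become readable only in the next.

```python
def run_bsp(local_values):
    """Simulate BSP peers that agree on the global maximum in two supersteps."""
    n = len(local_values)
    inboxes = [[] for _ in range(n)]  # messages readable in the current superstep
    results = [None] * n

    for superstep in range(2):
        outboxes = [[] for _ in range(n)]  # messages sent during this superstep
        for peer in range(n):
            if superstep == 0:
                # Local computation + communication: send my value to every peer.
                for target in range(n):
                    outboxes[target].append(local_values[peer])
            else:
                # Messages sent in superstep 0 are visible only now.
                results[peer] = max(inboxes[peer])
        # Barrier synchronization: all peers finish, then messages are delivered.
        inboxes = outboxes
    return results


print(run_bsp([3, 9, 4]))  # every peer ends up holding the same value
```

No shared storage is involved: peers exchange only messages, which is the property that let us drop HBase from the design.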

What is Pregel?

According to my understanding, Pregel is graph-specific: a large-scale graph computing framework based on the BSP model.
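
As a rough illustration of the vertex-centric style, here is a small single-process sketch of single-source shortest paths. It is neither Pregel's nor Hama's API; the function name `pregel_sssp`, the adjacency-list format, and the sample graph are assumptions, and edge weights are assumed to be non-negative. A vertex acts only when it receives messages, and the run ends when no messages are in flight.

```python
import math


def pregel_sssp(edges, source):
    """Vertex-centric single-source shortest paths, one loop pass per superstep."""
    vertices = set(edges) | {t for outs in edges.values() for t, _ in outs}
    distance = {v: math.inf for v in vertices}
    inbox = {source: [0]}  # superstep 0: only the source has a message

    while inbox:  # stop when no vertex receives a message (all have halted)
        outbox = {}
        for vertex, messages in inbox.items():
            candidate = min(messages)
            if candidate < distance[vertex]:
                distance[vertex] = candidate
                # Value changed: tell each out-neighbor the new path length.
                for target, weight in edges.get(vertex, []):
                    outbox.setdefault(target, []).append(candidate + weight)
            # Otherwise the vertex stays halted and sends nothing.
        inbox = outbox  # barrier: messages arrive in the next superstep
    return distance


# Hypothetical graph: vertex -> list of (neighbor, edge weight).
graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
print(pregel_sssp(graph, "a"))
```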

How will Hama BSP differ from Pregel?

Hama BSP is a computing engine based on the BSP model, like Pregel, and it will be compatible with existing HDFS clusters, or with any file system or database in the future. However, we believe that the BSP computing model is not limited to graph problems; it can be used widely for distributed software, much like Map/Reduce. Beyond graphs, there are many other algorithms that run into the same problems as graph processing when implemented with Map/Reduce. In fact, the BSP model has been researched for many years in the field of matrix computation, too.
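
As one example outside graph processing, here is a sketch of a row-partitioned matrix-vector product written as two BSP supersteps. It is only an illustration under stated assumptions, not code from Hama: the function name `bsp_matvec`, the partitioning scheme, and the sample data are all hypothetical, and the peers are simulated sequentially.

```python
def bsp_matvec(row_blocks, x_parts):
    """Row-partitioned y = A x: peer i owns row_blocks[i] and x_parts[i]."""
    n = len(row_blocks)

    # Superstep 0: every peer sends its slice of x (tagged with its id) to all peers.
    inboxes = [[] for _ in range(n)]
    for peer in range(n):
        for target in range(n):
            inboxes[target].append((peer, x_parts[peer]))
    # Barrier: after it, each inbox holds every slice of x.

    # Superstep 1: each peer rebuilds x and multiplies its own rows locally.
    y_parts = []
    for peer in range(n):
        x = [v for _, part in sorted(inboxes[peer]) for v in part]
        y_parts.append(
            [sum(a * b for a, b in zip(row, x)) for row in row_blocks[peer]]
        )
    return y_parts


# Hypothetical 3x3 matrix split across two peers, with x = [1, 2, 3].
A_blocks = [[[1, 2, 0], [0, 1, 0]], [[2, 0, 1]]]
x_parts = [[1, 2], [3]]
print(bsp_matvec(A_blocks, x_parts))
```

An iterative method would simply repeat this pair of supersteps, which is where Map/Reduce pays the cost of a new job per iteration.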

Therefore, we're trying to implement a more generalized BSP computing solution. Hama will consist of the BSP computing engine and a small set of examples (e.g., matrix inversion, PageRank, BFS, etc.).

Learn about Hama by reading the documentation.

Friday Apr 30, 2010

We've been introduced on BSP Worldwide.

We've been introduced on BSP Worldwide: http://www.bsp-worldwide.org/bspww3000.html, via Prof. Rob Bisseling.

Dear Edward,

I have put a link from  http://www.bsp-worldwide.org/bspww3000.html 
to your page and to the paper. 

I read it and find it very interesting. 
I heard a talk by Greg Malewicz from Google (Pregel) who is very enthousiastic about BSP. 
Suddenly I note a high interest in BSP everywhere.

I wish you good luck with your project.

best wishes,
Rob
