Apache Hama

Thursday July 09, 2015

Google's DistBelief Clone Project on Apache Hama

Deep Learning has become a household buzzword these days. Google, Microsoft, and Tencent have developed distributed deep learning systems, but these systems are closed source. Many open source projects, such as DeepDist, Caffe, ..., etc., support data parallelism only. In this blog post, I introduce the Artificial Neural Network implementation in the Apache Hama ML package and the future design plan for supporting both data and model parallelism.

1. Artificial Neural Network of Hama ML Package

The latest Apache Hama provides distributed training of an Artificial Neural Network using its BSP computing engine (the initial code was contributed by Yexi Jiang, a Hama committer, Facebook). In general, the training data is stored in HDFS and distributed across multiple machines. In Hama, two kinds of components are involved in the training procedure: the master task and the groom tasks. The master task is in charge of merging the model update information and sending it to all the groom tasks. The groom tasks are in charge of calculating the weight updates according to the training data.

The training procedure is iterative, and each iteration consists of two phases: update weights and merge update. In the update weights phase, each groom task first updates its local model according to the message received from the master task. It then computes the weight updates locally on its assigned data partition (mini-batch SGD) and finally sends the updated weights to the master task. In the merge update phase, the master task updates the model according to the messages received from the groom tasks and then distributes the updated model to all groom tasks. The two phases alternate until the termination condition is met (e.g., a specified number of iterations is reached).
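
To make the two-phase loop concrete, here is a minimal, stand-alone Java sketch of the idea (not the actual Hama ML code, which runs each groom as a BSP task): each groom computes a local update on its own partition, and the master merges the updates by averaging and redistributes the merged model.

  import java.util.Arrays;
  import java.util.List;

  /**
   * Conceptual illustration of the two-phase training iteration described
   * above; the real implementation lives in the Hama ML package.
   */
  public class TwoPhaseTrainingSketch {

    // Merge update phase (master task): average the updates from all grooms.
    static double[] mergeUpdates(List<double[]> groomUpdates) {
      double[] merged = new double[groomUpdates.get(0).length];
      for (double[] update : groomUpdates) {
        for (int i = 0; i < merged.length; i++) {
          merged[i] += update[i] / groomUpdates.size();
        }
      }
      return merged;
    }

    // Update weights phase (groom task): apply the merged model, then compute
    // a new local update from the assigned partition (gradient step stubbed).
    static double[] localUpdate(double[] model, double[][] partition, double learningRate) {
      double[] updated = Arrays.copyOf(model, model.length);
      for (double[] example : partition) {
        for (int i = 0; i < updated.length; i++) {
          updated[i] -= learningRate * example[i]; // placeholder "gradient"
        }
      }
      return updated;
    }

    public static void main(String[] args) {
      double[] model = {0.0, 0.0, 0.0};
      double[][] partition1 = {{0.1, 0.2, 0.3}};
      double[][] partition2 = {{0.3, 0.1, 0.2}};
      for (int iteration = 0; iteration < 10; iteration++) {
        double[] u1 = localUpdate(model, partition1, 0.1); // groom task 1
        double[] u2 = localUpdate(model, partition2, 0.1); // groom task 2
        model = mergeUpdates(Arrays.asList(u1, u2));       // master merges and redistributes
      }
      System.out.println(Arrays.toString(model));
    }
  }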

The model is designed in a hierarchical way: the base class is more abstract than the derived classes, so the structure of the ANN model can be freely set by the user as long as it is a layered model. Therefore, the Perceptron, Auto-encoder, Linear regressor, and Logistic regressor can all be uniformly represented as an ANN; a Logistic regressor, for example, is simply a layered model with a sigmoid output layer and a cross-entropy cost.

2. Future Plan for Large Scale Deep Neural Network

2.1 Architecture

As described in the section above, currently only data parallelism is used. Each node holds a copy of the model. In each iteration, the computation is conducted on every node, a final aggregation is conducted on one node, and the updated model is then synchronized back to all nodes. Performance aside, this imposes a hard constraint: the entire set of parameters must fit into the memory of a single machine.

Here is a tentative near-future plan I propose for applications that need a large model with huge memory consumption, moderate computational power per mini-batch, and lots of training data. The basic idea behind combining data and model parallelism is to use a remote Parameter Server to parallelize model creation and distribute training across machines, and to use region barrier synchronization per task group, instead of global barrier synchronization, so that asynchronous mini-batches can be performed within a single BSP job. Each task group works asynchronously and trains a large-scale neural network model on its assigned data sets in the BSP paradigm. The diagram below shows an example with 3 task groups:



Each task asynchronously asks the Parameter Server, which stores the parameters across distributed machines, for an updated copy of its model, computes the gradients on the assigned data, and sends the updated gradients back to the Parameter Server. This architecture is inspired by Google's DistBelief (Jeff Dean et al., 2012).
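
As a rough illustration of this proposal, the worker-side loop might look like the hypothetical sketch below. The ParameterServer interface and its pull/push methods are assumptions made for this design discussion; nothing like this exists in Hama yet.

  /**
   * Hypothetical worker-side loop against a remote parameter server, in the
   * spirit of the DistBelief-style design described above. Names and
   * signatures are illustrative only.
   */
  public class ParameterServerWorkerSketch {

    // Illustrative client-side view of the remote parameter server.
    interface ParameterServer {
      double[] pull(int modelPartitionId);                 // fetch the latest parameters
      void push(int modelPartitionId, double[] gradients); // send gradients back
    }

    // Placeholder gradient computation on one mini-batch.
    static double[] computeGradients(double[] params, double[][] miniBatch) {
      double[] grads = new double[params.length];
      for (double[] example : miniBatch) {
        for (int i = 0; i < grads.length; i++) {
          grads[i] += example[i] - params[i];
        }
      }
      return grads;
    }

    // Each task in a task group runs this loop asynchronously; only the tasks
    // inside one group synchronize with each other (region barrier), never globally.
    static void trainPartition(ParameterServer server, int modelPartitionId,
                               double[][] assignedData, int steps) {
      for (int step = 0; step < steps; step++) {
        double[] params = server.pull(modelPartitionId);         // updated copy of the model
        double[] grads = computeGradients(params, assignedData); // local mini-batch computation
        server.push(modelPartitionId, grads);                    // asynchronous update
      }
    }
  }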

2.2 Neuron-centric programming model

The new programming API proposed by Edward J. Yoon will provide two user-defined functions with which the user can define the characteristics of the artificial neural network model: an activation function and a cost function. Each function can be implemented by extending the ActivationFunction and CostFunction abstract classes, as shown below:

  /**
   * User-defined sigmoid activation function
   */
  public static class Sigmoid extends ActivationFunction {
    @Override
    public double apply(double input) {
      return 1.0 / (1 + Math.exp(-input));
    }

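    // Note: the argument passed in here appears to be the sigmoid output
    // f(x) rather than x itself, so the derivative is f(x) * (1 - f(x)).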
    @Override
    public double applyDerivative(double input) {
      return input * (1 - input);
    }
  }
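
A cost function could be defined in the same way. The sketch below is only an illustration of the idea: the exact CostFunction method signatures (a cost value and its derivative computed from the target and the actual output) are assumptions here and may differ from the final API.

  /**
   * Illustrative user-defined cross-entropy cost function; the method
   * signatures are assumed for this sketch.
   */
  public static class CrossEntropy extends CostFunction {
    @Override
    public double apply(double target, double actual) {
      // cross-entropy cost for a single output unit
      return -target * Math.log(actual) - (1 - target) * Math.log(1 - actual);
    }

    @Override
    public double applyDerivative(double target, double actual) {
      // derivative of the cross-entropy cost with respect to the actual output
      return (actual - target) / (actual * (1 - actual));
    }
  }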

The following properties are specified in the Job configuration:

  • The model topology: the number of neurons in each layer (besides the bias neuron); the type of squashing (activation) function; the degree of parallelization for each layer.
  • The learning rate: specifies how aggressively the model learns from the training instances. A large value can accelerate the learning process but decreases the chance of convergence. A value in the range (0, 0.5] is recommended.
  • The momentum weight: similar to the learning rate, a large momentum weight can accelerate the learning process but decreases the chance of convergence. A value in the range (0, 0.5] is recommended.
  • The regularization weight: a large value can decrease the variance of the model but increases the bias at the same time. As this parameter is sensitive, it is better to set it to a very small value, say, 0.001.

The following is a sample design showing how a job for training a three-layer neural network could be created:

  public static void main(String[] args) throws Exception {
    ANNJob ann = new ANNJob();

    // set learning rate and momentum weight
    ann.setLearningRate(0.1);
    ann.setMomentumWeight(0.1);

    // initialize the topology of the model
    // set the activation function and parallel degree.
    ann.addLayer(featureDimension, Sigmoid.class, numOfTasks);
    ann.addLayer(featureDimension, Sigmoid.class, numOfTasks);
    ann.addLayer(labelDimension, Sigmoid.class, numOfTasks);

    // set the cost function to evaluate the error
    ann.setCostFunction(CrossEntropy.class);
    ...
  }

In closing this blog post, I would like to request your feedback on these design ideas. Please feel free to drop a comment or send a mail to our dev@hama.apache.org mailing list. Thanks.

Sunday June 14, 2015

Apache Hama announces v0.7 Release!

Apache Hama team is pleased to announce the release of Hama v0.7 with new features and improvements.

Hama is a High-Performance BSP computing engine, which can be used to perform compute-intensive general scientific BSP applications, Google’s Pregel-like graph applications, and machine learning algorithms.

What are the major changes from the last release?

The most important new feature of this release is support for Mesos and Yet Another Resource Negotiator (YARN), so you're able to submit your BSP applications to existing open source and enterprise clusters, e.g., CDH, HDP, and Mesosphere, without any additional installation. In addition, we reinforced the machine learning package by adding algorithms such as Max-Flow, K-Core, ANN, ..., etc.

There are also big improvements in the queue and messaging systems. We now use our own outgoing/incoming message manager instead of Java's built-in queues. It stores messages in serialized form in a set of bundles (or a single bundle) to reduce memory usage and RPC overhead. Unsafe serialization is used to serialize Vertex objects and their messages more quickly. Another important improvement is the enhanced graph package. Instead of sending each message individually, we package the messages per vertex and send the packaged message to its assigned destination node. With this we achieved a significant improvement in the performance of graph applications. The attached benchmarks test the scalability and performance of the PageRank algorithm on a randomly generated graph with 1 billion edges, using Apache Hama and Giraph on a 30-node Amazon EMR cluster. Note that aggregators were used for detecting the convergence condition in the case of Apache Hama.
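
To illustrate the packaging idea, here is a conceptual sketch (not Hama's actual message manager): outgoing vertex messages are grouped by the peer that hosts the target vertex, so each peer receives one bundle per superstep instead of many individual messages.

  import java.util.ArrayList;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;

  /**
   * Conceptual sketch of message bundling: group outgoing messages by
   * destination peer so one RPC carries many messages.
   */
  public class MessageBundlingSketch {

    public static class VertexMessage {
      final String targetVertex;
      final double value;

      public VertexMessage(String targetVertex, double value) {
        this.targetVertex = targetVertex;
        this.value = value;
      }
    }

    // vertexToPeer maps a vertex id to the peer that owns it (e.g., via
    // hash partitioning); the result holds one bundle per destination peer.
    static Map<String, List<VertexMessage>> bundleByPeer(
        List<VertexMessage> outgoing, Map<String, String> vertexToPeer) {
      Map<String, List<VertexMessage>> bundles = new HashMap<>();
      for (VertexMessage msg : outgoing) {
        String peer = vertexToPeer.get(msg.targetVertex);
        if (!bundles.containsKey(peer)) {
          bundles.put(peer, new ArrayList<VertexMessage>());
        }
        bundles.get(peer).add(msg);
      }
      return bundles;
    }
  }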




What’s Next?

After a month of testing and benchmarking, this version brings substantial performance improvements together with important bug fixes that significantly improve the platform's stability. We look forward to adding more features and seeing our community grow. The primary objectives of the technical plan are:

  • Add a stream input format for listening to messages coming from third-party applications, and incremental learning algorithms.
  • Improve system reliability, e.g., fault tolerance, HA, ..., etc.
  • More machine learning algorithms, such as ensemble classifiers, SVM, DNN, ..., etc.

Where can I download it?

The release artifacts are published and ready for you to download either from the Apache mirrors or from the Maven repository. We welcome your help, feedback, and suggestions. For more information on how to report problems, and to get involved, visit the Hama project website[1] and wiki[2].

[1]. Apache Hama Website: https://hama.apache.org/
[2]. Apache Hama Wiki: https://wiki.apache.org/hama/

Thursday March 05, 2015

Apache Hama now supports YARN, runs at Samsung Electronics

The Apache Hama team is pleased to announce that we're now supporting not only Mesos but also YARN (thanks to Minho Kim, the main contributor of the YARN module).

Apache Hama is a High-Performance BSP computing engine, which can be used to perform compute-intensive general scientific BSP applications, Google’s Pregel-like graph applications, and machine learning algorithms.

YARN is the resource management technology that lets multiple computing frameworks run on the same Hadoop cluster using the same underlying storage. So, for example, a company could analyze the data using MapReduce, Spark, and Apache Hama.

“From the next release, you’ll be able to submit scientific BSP applications to existing open source Hadoop, CDH, and HDP clusters without any installation,” said Edward J. Yoon (@eddieyoon), the original creator of Apache Hama.

Meanwhile, we're also working on supporting HPC environments such as InfiniBand and GPUs. According to General Dynamics[1], they have already demonstrated a 10x performance improvement of Apache Hama on an HPC cluster. We also plan to support deployment and configuration automation on hybrid clouds for solving various problems in manufacturing engineering, science, finance, and research.

This contribution comes mainly from Samsung Electronics. “Unlike most web services companies, our challenge is numerical or signal data, not text data. That’s why we’re investing in High-Performance Computing for scientific advanced analytics,” said SeungHun Jeon, Head of the Cloud Tech Lab at Samsung Electronics.

“Since we build our own analytics platform in the cloud by leveraging open source technologies such as Apache Hadoop, Storm, and Hama, we intend to keep making contributions to the open source communities,” added Hyok S. Choi, a Principal Software Engineer at Samsung Electronics.

About Apache Hama

Apache Hama[2] was established in 2012 as a Top-Level Project of The Apache Software Foundation. It provides a high-performance BSP[3] computing engine on top of Hadoop.

1. http://www.gd-ais.com/News/General-Dynamics-at-SC14-Delivering-Real-time-Intelligence-with-High-Performance-Data-Analytics
2. http://hama.apache.org/
3. http://en.wikipedia.org/wiki/Bulk_synchronous_parallel

Wednesday March 05, 2014

[ANNOUNCE] Hama 0.6.4 has been released.

The Hama team is pleased to announce the Hama 0.6.4 release.

Apache Hama is a pure BSP (Bulk Synchronous Parallel) computing framework on top of HDFS (Hadoop Distributed File System) for massive scientific computations such as matrix, graph and network algorithms.

This release improves memory usage by a factor of three compared to the previous release (without significant performance degradation) and adds runtime message compression.

The artifacts are published and ready for you to download[1] either from the Apache mirrors or from the Maven repository. We welcome your help, feedback, and suggestions. For more information on how to report problems, and to get involved, visit the project website[2] and wiki[3].

Thanks.

1. http://www.apache.org/dist/hama/
2. http://hama.apache.org
3. http://wiki.apache.org/hama/

Wednesday December 04, 2013

Running Hama K-Means in 5 minutes

As you might already know, the Apache Hama project provides a set of machine learning algorithms that can be applied to applications with very large-scale data in multiple domains.

In this post, I explain how to run the BSP-based K-Means algorithm using Apache Hama, assuming that you have already installed and tested a Hama cluster.

1. Download the Iris data set [Data set Information].

2. Then, run KMeans (the TRUNK version is recommended):

  % $HAMA_HOME/bin/hama jar hama-examples-x.x.x.jar kmeans /tmp/kmeans.txt /tmp/result 10 3
  ...
  [5.1, 3.5, 1.4, 0.2] belongs to cluster 2
  [4.9, 3.0, 1.4, 0.2] belongs to cluster 2
  [4.7, 3.2, 1.3, 0.2] belongs to cluster 2
  [4.6, 3.1, 1.5, 0.2] belongs to cluster 2
  [5.0, 3.6, 1.4, 0.2] belongs to cluster 2
  ...


And here's a performance comparison with Mahout.

Friday July 06, 2012

Apache Hama 0.5.0 Released

The Apache Hama PMC is pleased to announce the release of Apache Hama 0.5.0.

Apache Hama is a BSP (Bulk Synchronous Parallel) computing framework
on top of HDFS (Hadoop Distributed File System) for massive scientific
computations such as matrix, graph and network algorithms.

This release is the first as a top-level project. It contains two
significant new features (a message compressor and a complete clone of
Google's Pregel) and many improvements in computing system
performance and durability.

The artifacts are published and ready for you to download[1] either
from the Apache mirrors or from the Maven repository. For more
details, please take a look at our website[2] and wiki[3].

Many thanks to the Hama community for making this release possible.

1. http://www.apache.org/dist/hama/
2. http://hama.apache.org
3. http://wiki.apache.org/hama/

Tuesday March 06, 2012

Apache Hama 0.4-incubating Released!

Hi all,

The Hama team is pleased to announce the release of Apache Hama 0.4-incubating under the Apache Incubator.

Hama is a pure BSP (Bulk Synchronous Parallel) computing framework on top of HDFS (Hadoop Distributed File System) for massive scientific computations such as matrix, graph and network algorithms.

This release includes:

* Multiple tasks per node
* Input/Output Formatter
* Stabilized Barrier Synchronization
* Message Combiners
* Improved examples
* Benchmark test results

Thanks to the Hama and Apache Incubating community for helping grow the project!

Sunday July 31, 2011

Apache Hama 0.3-incubating Released!

Hi all,

The Hama team is pleased to announce the release of Apache Hama
0.3-incubating under the Apache Incubator.

Hama is a distributed computing framework based on BSP (Bulk
Synchronous Parallel)[1] computing techniques for massive scientific
computations.

This release includes:

  • Added LocalBSPRunner
  • Added web UI for BSP cluster and job monitoring
  • Added more practical examples e.g., Shortest Path Problem[2], PageRank[3]
  • Performance has improved with BSPMessageBundle
  • Switched from Ant to Maven

It can be downloaded from the downloads page of the Hama website[4].

Thanks to the Hama and Apache Incubating community for helping grow the project.

1. http://en.wikipedia.org/wiki/Bulk_synchronous_parallel
2. http://wiki.apache.org/hama/SSSP
3. http://wiki.apache.org/hama/PageRank
4. http://incubator.apache.org/hama/downloads.html

Friday June 03, 2011

Apache Hama 0.2.0-incubating Released!

The Hama team is pleased to announce the release of Apache Hama 0.2.0-incubating under the Apache Incubator.

Hama is a distributed computing framework based on BSP (Bulk Synchronous Parallel) computing techniques for massive scientific computations.

This first release includes:

  • BSP computing framework and its examples
  • CLI-based managing and monitoring tool of BSP job

It can be downloaded from the downloads page of the Hama website[2].

Thanks to the Hama and Apache Incubating community for helping grow the project.

1. http://en.wikipedia.org/wiki/Bulk_synchronous_parallel
2. http://incubator.apache.org/hama/downloads.html

Monday August 02, 2010

Apache Hama in academic paper.

Abstract—APPLICATION. Various scientific computations have become so complex, and thus computation tools play an important role. In this paper, we explore the state-of-the-art framework providing high-level matrix computation primitives with MapReduce through the case study approach, and demonstrate these primitives with different computation engines to show the performance and scalability. We believe the opportunity for using MapReduce in scientific computation is even more promising than the success to date in the parallel systems literature.

http://csl.skku.edu/papers/CS-TR-2010-330.pdf

Thursday July 15, 2010

How will Hama BSP differ from Pregel?

Firstly, why did we use HBase?

Until last year, we tried to implement distributed matrix/graph computing algorithms based on Map/Reduce.

As you know, Hadoop consists of HDFS, which is designed for commodity servers following a shared-nothing model (also termed the data partitioning model), and a distributed programming model called Map/Reduce. Map/Reduce is a high-performance parallel data processing engine, to be sure, but it's not good for complex numerical/relational processing that requires many iterations or inter-node communications. So, we used HBase as shared storage (a shared memory model).

Why BSP instead of Map/Reduce and HBase?

However, there were still problems, as listed below:

  • The OS overhead of running shared storage software (HBase)
  • The limitations of HBase (especially the size of a column qualifier)
  • Growth of code complexity

Therefore, we started to consider the message-passing model and decided to adopt the BSP (Bulk Synchronous Parallel) model, inspired by Pregel, which we learned about from the Google Research Blog.

What's the Pregel?

According to my understanding, Pregel is graph-specific: a large-scale graph computing framework based on the BSP model.

How will Hama BSP differ from Pregel?

Hama BSP is a computing engine based on the BSP model, like Pregel, and it'll be compatible with existing HDFS clusters, or any file system and database in the future. However, we believe that the BSP computing model is not limited to graph problems; it can be used broadly in distributed software, much like Map/Reduce. Beyond graphs, there are many other algorithms that face the same difficulties as graph processing on Map/Reduce. In fact, the BSP model has also been researched for many years in the field of matrix computation.

Therefore, we're trying to implement a more generalized BSP computing solution. Hama will consist of the BSP computing engine and a small set of examples (e.g., matrix inversion, PageRank, BFS, ..., etc.).

Learn about Hama by reading the documentation.

Friday April 30, 2010

We're introduced on BSP Worldwide.

We've been introduced on BSP Worldwide: http://www.bsp-worldwide.org/bspww3000.html, via Prof. Rob Bisseling.

Dear Edward,

I have put a link from  http://www.bsp-worldwide.org/bspww3000.html 
to your page and to the paper. 

I read it and find it very interesting. 
I heard a talk by Greg Malewicz from Google (Pregel) who is very enthusiastic about BSP.
Suddenly I note a high interest in BSP everywhere.

I wish you good luck with your project.

best wishes,
Rob
