Apache MXNet (incubating)

Tuesday December 05, 2017

Milestone 1.0.0 Release for Apache MXNet

We are excited about the availability of the milestone 1.0.0 release of the Apache MXNet deep learning engine. The new capabilities in this release (1) simplify training and deploying deep learning models and (2) implement cutting-edge performance enhancements, providing the following benefits to users:

  1. MXNet is faster: The 1.0.0 release includes cutting-edge features that optimize the performance of training and inference. Gradient compression enables users to train models up to five times faster by reducing the communication bandwidth between compute nodes, without loss in convergence rate or accuracy. For speech recognition acoustic modeling, such as the models behind the Alexa voice, this feature can reduce network bandwidth by up to three orders of magnitude during training. With support for the NVIDIA Collective Communication Library (NCCL), users can train models about 20% faster on multi-GPU systems.
    • Optimize network bandwidth with gradient compression: In distributed training, each machine must communicate frequently with the others to update the weight vectors and thereby collectively build a single model, leading to high network traffic. The gradient compression algorithm enables users to train models up to five times faster by compressing the model updates communicated by each instance (see the first sketch after this list).
    • Optimize training performance by taking advantage of NCCL: NCCL implements multi-GPU and multi-node collective communication primitives that are performance-optimized for NVIDIA GPUs, providing routines that achieve high bandwidth over the interconnects between GPUs. MXNet supports NCCL to train models about 20% faster on multi-GPU systems.
  2. MXNet is easier to use: The 1.0.0 release includes an advanced indexing capability that enables users to perform matrix operations in a more intuitive manner.
    • Advanced indexing for array operations in MXNet: It is now more intuitive for developers to leverage the powerful array operations in MXNet by applying their existing knowledge of NumPy/SciPy arrays. For example, both MXNet NDArray and NumPy ndarray are supported as indices, e.g. a[mx.nd.array([1, 2], dtype='int32')] (see the second sketch after this list).
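
A minimal sketch of how the two speed features are switched on, assuming MXNet 1.0 built with NCCL support; the 2-bit compression parameters are illustrative, and the distributed kvstore only works when workers are started through MXNet's distributed launch tooling:

    import mxnet as mx

    # Single machine, multiple GPUs: the NCCL-backed kvstore aggregates
    # gradients using NCCL's optimized collective primitives.
    kv = mx.kv.create('nccl')

    # Distributed training: enable 2-bit gradient compression so each
    # worker quantizes its gradient updates before sending them, reducing
    # communication bandwidth. The threshold value is illustrative only.
    kv_dist = mx.kv.create('dist_sync')
    kv_dist.set_gradient_compression({'type': '2bit', 'threshold': 0.5})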
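
And a minimal sketch of the advanced indexing behavior; the array contents are illustrative:

    import mxnet as mx
    import numpy as np

    a = mx.nd.array([[1, 2], [3, 4], [5, 6]])

    # Index with an MXNet NDArray of integer indices.
    print(a[mx.nd.array([1, 2], dtype='int32')].asnumpy())  # [[3. 4.] [5. 6.]]

    # A NumPy ndarray works as an index too, mirroring NumPy fancy indexing.
    print(a[np.array([0, 2])].asnumpy())  # [[1. 2.] [5. 6.]]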

MXNet has helped developers and researchers make progress with everything from language translation to autonomous vehicles and behavioral biometric security. We are excited to see the broad base of users building production artificial intelligence applications powered by neural network models developed and trained with MXNet. For example, the autonomous driving company TuSimple recently piloted a self-driving truck on a 200-mile journey from Yuma, Arizona to San Diego, California using MXNet. This release also includes a full-featured and performance-optimized version of the Gluon programming interface. Its ease of use, combined with an extensive set of tutorials, has led to significant adoption among developers new to deep learning. The flexibility of the interface has also driven interest within the research community, especially in the natural language processing domain.

Getting started with MXNet

Getting started with MXNet is simple. To learn more about the Gluon interface and deep learning, you can reference this comprehensive set of tutorials, which covers everything from an introduction to deep learning to how to implement cutting-edge neural network models. If you’re a contributor to a machine learning framework, check out the interface specs on GitHub.

Wednesday November 01, 2017

Apache MXNet 0.12 Release Adds Support for New NVIDIA Volta GPUs and Sparse Tensor

We are excited about the availability of Apache MXNet version 0.12. With this release, MXNet adds two important new features: support for NVIDIA Volta GPUs and support for sparse tensors.

Support for NVIDIA Volta GPU Architecture

The MXNet v0.12 release adds support for NVIDIA Volta V100 GPUs, enabling users to train convolutional neural networks up to 3.5 times faster than on Pascal GPUs. Training a neural network involves trillions of floating-point (FP) multiplications and additions, which have typically been performed in single precision (FP32) to achieve high accuracy. However, recent research has shown that the same accuracy can be achieved using half-precision (FP16) data types.

The Volta GPU architecture introduces Tensor Cores. Each Tensor Core can execute 64 fused multiply-add operations per clock, which roughly quadruples the CUDA core FLOPS per clock per core. Each Tensor Core performs D = A x B + C, where A and B are half-precision matrices, while C and D can be either half- or single-precision matrices, thereby enabling mixed-precision training. Mixed-precision training allows users to achieve optimal training performance without sacrificing accuracy by using FP16 for most of the layers of a network and higher-precision data types only when necessary.

You can take advantage of Volta Tensor Cores to enable FP16 training in MXNet by passing the flag "--dtype float16" to the MXNet training script. For example, you can invoke the ImageNet training script as: python train_imagenet.py --dtype float16
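
The training scripts implement a standard mixed-precision pattern that you can also write by hand with the Symbol API: cast the input to float16 so the bulk of the network runs in half precision, then cast back to float32 before the loss, where the extra precision matters. A minimal sketch, with illustrative layer sizes:

    import mxnet as mx

    data = mx.sym.Variable('data')
    # Cast the input to float16 so subsequent layers run in half precision
    # and can use Volta Tensor Cores.
    data = mx.sym.Cast(data=data, dtype='float16')
    fc1 = mx.sym.FullyConnected(data=data, num_hidden=128, name='fc1')
    act1 = mx.sym.Activation(data=fc1, act_type='relu')
    fc2 = mx.sym.FullyConnected(data=act1, num_hidden=10, name='fc2')
    # Cast back to float32 before the softmax for numerical stability.
    fc2 = mx.sym.Cast(data=fc2, dtype='float32')
    net = mx.sym.SoftmaxOutput(data=fc2, name='softmax')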

Sparse Tensor Support

MXNet v0.12 adds support for sparse tensors, allowing developers to store and operate on tensors with mostly-zero entries in a storage- and compute-efficient manner and thereby train deep learning models faster. MXNet v0.12 supports two major sparse data formats: Compressed Sparse Row (CSR) and Row Sparse (RSP). The CSR format is optimized to represent matrices with a large number of columns where each row has only a few non-zero elements, whereas the RSP format is optimized to represent matrices with a huge number of rows where most of the row slices are entirely zero. For example, the CSR format can be used to encode the feature vectors of input data for a recommendation engine, and the RSP format can be used to perform the sparse gradient updates during training.

This release enables sparse support on CPU for the most commonly used operators, such as matrix dot product and element-wise operators. Sparse support for more operators will be added in future releases.
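
A minimal sketch of the new sparse API; the array contents are illustrative:

    import mxnet as mx

    # Convert a mostly-zero matrix to the Compressed Sparse Row format.
    dense = mx.nd.array([[0, 1, 0, 0],
                         [0, 0, 0, 0],
                         [2, 0, 0, 3]])
    csr = dense.tostype('csr')

    # A sparse-dense dot product touches only the non-zero entries
    # instead of every element of the full matrix.
    weights = mx.nd.ones((4, 2))
    print(mx.nd.dot(csr, weights).asnumpy())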

Follow these tutorials to learn how to use the new sparse operators in MXNet.

Get Apache MXNet 0.12 from the downloads page. Read more about this release in the release notes.

If you want to learn more about MXNet, visit https://mxnet.incubator.apache.org/. Finally, you are welcome to join the dynamic and growing MXNet community, and to invite your friends, by subscribing to dev@mxnet.incubator.apache.org.
