Apache MXNet (incubating)
Milestone 1.0.0 Release for Apache MXNet
We are excited about the availability of the milestone 1.0.0 release of the Apache MXNet deep learning engine. The new capabilities in this release (1) simplify training and deploying deep learning models, and (2) enable implementation of cutting-edge performance enhancements. They provide the following benefits to users:
- MXNet is faster: The 1.0.0 release includes cutting-edge features that optimize the performance of training and inference. Gradient compression enables users to train models up to five times faster by reducing communication bandwidth between compute nodes without loss in convergence rate or accuracy. For speech recognition acoustic modeling, such as for the Alexa voice, this feature can reduce network bandwidth by up to three orders of magnitude during training. With support for the NVIDIA Collective Communication Library (NCCL), users can train a model 20% faster on multi-GPU systems.
- Optimize network bandwidth with gradient compression: In distributed training, each machine must communicate frequently with the others to update the weight vectors and thereby collectively build a single model, which leads to high network traffic. The gradient compression algorithm enables users to train models up to five times faster by compressing the model changes communicated by each instance.
- Optimize training performance by taking advantage of NCCL: NCCL implements multi-GPU and multi-node collective communication primitives that are performance-optimized for NVIDIA GPUs. Its communication routines are tuned to achieve high bandwidth over the interconnects between multiple GPUs. MXNet supports NCCL to train models about 20% faster on multi-GPU systems.
- Advanced indexing for array operations in MXNet: It is now more intuitive for developers to use the powerful array operations in MXNet. They can use the advanced indexing capability by drawing on existing knowledge of NumPy/SciPy arrays. For example, an MXNet NDArray or a NumPy ndarray can be used as an index, e.g. a[mx.nd.array([1, 2], dtype='int32')].
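To make the gradient compression bullet above more concrete, here is a minimal sketch of one common scheme: threshold quantization with residual accumulation. It is written in plain Python as an illustration under stated assumptions and is not MXNet's implementation. The function name `compress_with_residual`, the threshold of 0.5, and the sample gradient values are all hypothetical.

```python
def compress_with_residual(gradient, residual, threshold):
    """Quantize each gradient value to -threshold, 0, or +threshold.

    The quantization error is stored in `residual` (updated in place) and
    added back on the next call, so small updates are delayed, not lost.
    """
    quantized = []
    for i, g in enumerate(gradient):
        total = g + residual[i]          # fold in error left from earlier steps
        if total >= threshold:
            q = threshold
        elif total <= -threshold:
            q = -threshold
        else:
            q = 0.0                      # too small to send this round
        residual[i] = total - q          # remember what was not sent
        quantized.append(q)
    return quantized


# Hypothetical gradient, repeated over two training steps.
threshold = 0.5
residual = [0.0, 0.0, 0.0]
gradient = [0.25, -0.75, 0.125]

step1 = compress_with_residual(gradient, residual, threshold)
step2 = compress_with_residual(gradient, residual, threshold)
print(step1)     # values actually communicated in step 1
print(step2)     # values actually communicated in step 2
print(residual)  # error still held back locally
```

Because every communicated value is one of three levels, it can be encoded in two bits instead of a 32-bit float, which is where the bandwidth saving in a scheme like this would come from. The first element shows the role of the residual: 0.25 is below the threshold in step 1, but it accumulates and is sent as 0.5 in step 2.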
MXNet has helped developers and researchers make progress with everything from language translation to autonomous vehicles and behavioral biometric security. We are excited to see the broad base of users who are building production artificial intelligence applications powered by neural network models developed and trained with MXNet. For example, the autonomous driving company TuSimple recently piloted a self-driving truck on a 200-mile journey from Yuma, Arizona to San Diego, California using MXNet. This release also includes a full-featured and performance-optimized version of the Gluon programming interface. Its ease of use, combined with the extensive set of tutorials, has led to significant adoption among developers new to deep learning. The flexibility of the interface has driven interest within the research community, especially in the natural language processing domain.
Getting started with MXNet
Getting started with MXNet is simple. To learn more about the Gluon interface and deep learning, you can reference this comprehensive set of tutorials, which covers everything from an introduction to deep learning to how to implement cutting-edge neural network models. If you’re a contributor to a machine learning framework, check out the interface specs on GitHub.
Apache MXNet 0.12 Release Adds Support for New NVIDIA Volta GPUs and Sparse Tensor
We are excited about the availability of Apache MXNet version 0.12. With this release, MXNet adds two important new features: support for NVIDIA Volta GPUs and support for sparse tensors.
The MXNet v0.12 release adds support for NVIDIA Volta V100 GPUs, enabling users to train convolutional neural networks up to 3.5 times faster than on the Pascal GPUs. Trillions of floating-point (FP) multiplications and additions for training a neural network have typically been done using single precision (FP32) to achieve high accuracy. However, recent research has shown that the same accuracy can be achieved using half-precision (FP16) data types.
The Volta GPU architecture introduces Tensor Cores. Each Tensor Core can execute 64 fused multiply-add (FMA) operations per clock, which roughly quadruples the CUDA core FLOPS per clock per core. Each Tensor Core performs D = A × B + C, where A and B are half-precision matrices, while C and D can be either half-precision or single-precision matrices, which is what makes mixed-precision training possible. Mixed-precision training allows users to achieve optimal training performance without sacrificing accuracy by using FP16 for most of the layers of a network, and higher-precision data types only when necessary.
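The following sketch illustrates why it matters that C and D may be single precision. It emulates only the rounding behaviour of a scalar version of D = A × B + C in plain Python, using the standard `struct` module to round values to FP16 and FP32. It is not a model of how Tensor Cores are implemented, and the helper names (`to_fp16`, `to_fp32`, `fma_dot`) and the input sizes are assumptions chosen for illustration.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]


def fma_dot(a, b, c, accumulate):
    """Compute d = sum(a_i * b_i) + c with FP16 inputs.

    `accumulate` rounds the running sum after every addition, so it
    selects the precision of the accumulator (the C and D operands).
    """
    acc = accumulate(c)
    for x, y in zip(a, b):
        # The product of two FP16 values is exact in a Python float.
        prod = to_fp16(x) * to_fp16(y)
        acc = accumulate(acc + prod)
    return acc


# Hypothetical inputs: 4096 products that are each exactly 1.0.
a = [1.0] * 4096
b = [1.0] * 4096

d_half = fma_dot(a, b, 0.0, to_fp16)    # FP16 accumulator
d_single = fma_dot(a, b, 0.0, to_fp32)  # FP32 accumulator
print(d_half, d_single)
```

With an FP16 accumulator the sum stalls at 2048, because above that value adjacent FP16 numbers are 2 apart and adding 1 rounds back down. With an FP32 accumulator the same FP16 inputs give the exact result, 4096.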
You can take advantage of Volta Tensor Cores to enable FP16 training in MXNet by passing a single option, "--dtype float16", to the MXNet training script. For example, you can invoke the ImageNet training script with the command:
train_imagenet.py --dtype float16
MXNet v0.12 adds support for sparse tensors, allowing developers to perform sparse matrix operations in a storage- and compute-efficient manner and to train deep learning models faster. MXNet v0.12 supports two major sparse data formats: Compressed Sparse Row (CSR) and Row Sparse (RSP). The CSR format is optimized to represent matrices with a large number of columns where each row has only a few non-zero elements. The RSP format is optimized to represent matrices with a huge number of rows where most of the row slices are entirely zero. For example, the CSR format can be used to encode the feature vectors of input data for a recommendation engine, whereas the RSP format can be used to perform sparse gradient updates during training. This release enables sparse support on CPU for the most commonly used operators, such as matrix dot product and element-wise operators. Sparse support for more operators will be added in future releases.
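As a rough illustration of the two layouts described above, the sketch below converts a small dense matrix to each format in plain Python. It does not use MXNet, and the function names `to_csr` and `to_row_sparse` and the sample matrix are hypothetical. It only shows what each format stores.

```python
def to_csr(dense):
    """Return (data, indices, indptr) for a dense list-of-lists matrix.

    data    : non-zero values, row by row
    indices : column index of each value in `data`
    indptr  : row i occupies data[indptr[i]:indptr[i + 1]]
    """
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(col)
        indptr.append(len(data))
    return data, indices, indptr


def to_row_sparse(dense):
    """Return (data, indices): the non-zero rows and their row numbers."""
    indices = [i for i, row in enumerate(dense) if any(v != 0 for v in row)]
    data = [list(dense[i]) for i in indices]
    return data, indices


# Hypothetical 3 x 4 matrix whose middle row is entirely zero.
dense = [
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [5, 0, 0, 7],
]

csr_data, csr_indices, csr_indptr = to_csr(dense)
rsp_data, rsp_indices = to_row_sparse(dense)
print(csr_data, csr_indices, csr_indptr)
print(rsp_data, rsp_indices)
```

CSR keeps individual non-zero elements, so it suits wide rows with few non-zeros. The row sparse layout keeps whole rows and simply skips the all-zero ones, which matches gradient updates that touch only a few rows of a large weight matrix.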
Follow these tutorials to learn how to use the new sparse operators in MXNet.
Or, you can download and try out MXNet easily using one of the options below:
- The Pip package can be found here: https://pypi.python.org/pypi/mxnet
- The Docker Images can be found here: https://hub.docker.com/u/mxnet/
If you want to learn more about MXNet, visit https://mxnet.incubator.apache.org/. Finally, you are welcome to join the dynamic and growing MXNet community, and to invite your friends, by subscribing to firstname.lastname@example.org