Apache MXNet (incubating)

Monday September 17, 2018

Announcing Apache MXNet (incubating) 1.3.0 Release

Today the Apache MXNet community is pleased to announce the 1.3 release of the Apache MXNet deep learning framework. We would like to thank the Apache MXNet community for all their valuable contributions towards the MXNet 1.3 release.

With this release, MXNet gains Gluon package enhancements, ONNX export support, experimental Clojure bindings, TensorRT integration, and many more features and usability improvements! In this blog post, we briefly summarize some of the high-level features and improvements. For a comprehensive list of major features and bug fixes, read the Apache MXNet 1.3.0 release notes.

[Image: mxnet-1.3.0.png]

Gluon package enhancements

Gluon RNN layers are now hybridizable: With this feature, Gluon RNN layers such as gluon.rnn.RNN, gluon.rnn.LSTM and gluon.rnn.GRU can be converted to HybridBlocks. Now, many dynamic networks that are based on Gluon RNN layers can be completely hybridized, exported and used in the inference APIs in other language bindings such as C/C++, Scala, R, etc.
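
As a minimal sketch of what this looks like (the layer sizes and input shapes here are arbitrary):

```python
import mxnet as mx
from mxnet import gluon

# A Gluon LSTM layer can now be hybridized like any other HybridBlock.
lstm = gluon.rnn.LSTM(hidden_size=100, num_layers=2)
lstm.initialize()
lstm.hybridize()

x = mx.nd.random.uniform(shape=(5, 3, 50))  # (seq_len, batch, input_size)
out = lstm(x)

# After a forward pass, the hybridized block can be exported for use
# from the inference APIs in other language bindings.
lstm.export('lstm')
```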

Support for sparse tensor: Gluon HybridBlocks now support hybridization with sparse operators. To enable sparse gradients in gluon.nn.Embedding, simply set sparse_grad=True. Furthermore, gluon.contrib.nn.SparseEmbedding provides an example of leveraging sparse parameters to reduce communication cost and memory consumption for multi-GPU training with large embeddings.
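
A minimal sketch of enabling sparse gradients in an embedding layer (the vocabulary size and dimensions are arbitrary):

```python
import mxnet as mx
from mxnet import gluon

# With sparse_grad=True, only the rows that are actually looked up
# receive gradient updates, saving memory and communication.
embed = gluon.nn.Embedding(input_dim=1000000, output_dim=128,
                           sparse_grad=True)
embed.initialize()

tokens = mx.nd.array([4, 8, 15])
vectors = embed(tokens)  # shape: (3, 128)
```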

Support for Synchronized Cross-GPU Batch Norm: Gluon now supports Synchronized Batch Normalization, available as gluon.contrib.nn.SyncBatchNorm. This enables stable training on large-scale networks with high memory consumption such as FCN for image segmentation.
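
A sketch of dropping SyncBatchNorm into a network, assuming a hypothetical 4-GPU setup:

```python
from mxnet import gluon
from mxnet.gluon.contrib.nn import SyncBatchNorm

# SyncBatchNorm is a drop-in replacement for nn.BatchNorm that
# synchronizes mean/variance statistics across devices.
net = gluon.nn.HybridSequential()
net.add(gluon.nn.Conv2D(channels=64, kernel_size=3),
        SyncBatchNorm(num_devices=4),  # match the number of GPUs in use
        gluon.nn.Activation('relu'))
```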

Updated Gluon model zoo: The Gluon Vision Model Zoo now provides MobileNetV2 pre-trained models, and existing pre-trained models have been updated to deliver state-of-the-art performance for all ResNet v1, ResNet v2, vgg16, vgg19, vgg16_bn, and vgg19_bn models.
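
Loading one of the new pre-trained models takes a single call; for example, a minimal sketch for MobileNetV2 with width multiplier 1.0:

```python
from mxnet.gluon.model_zoo import vision

# Downloads ImageNet pre-trained weights on first use.
net = vision.mobilenet_v2_1_0(pretrained=True)
```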

Introducing new Clojure bindings with MXNet

MXNet now has experimental support for the Clojure programming language. The MXNet Clojure package brings state-of-the-art deep learning to the Clojure community, enabling Clojure developers to write seamless tensor/matrix computations and execute them on multiple CPUs or GPUs. Users can now construct and customize state-of-the-art deep learning models in Clojure and apply them to tasks such as image classification and data science challenges. To start using the Clojure package in MXNet, check out the Clojure tutorials and Clojure API documentation.

Introducing control flow operators

This is the first step towards optimizing dynamic neural networks with variable computation graphs. This release adds symbolic and imperative control flow operators such as foreach, while_loop and cond. To learn more about how to use these operators, check out the Control Flow Operators tutorial.
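
As a small sketch of the imperative form, here is a running sum over a sequence implemented with foreach:

```python
import mxnet as mx

# The body receives one slice of the input plus the loop states, and
# returns (output for this step, new states).
def step(data, states):
    total = states[0] + data
    return total, [total]

seq = mx.nd.arange(6).reshape((3, 2))
outputs, final_states = mx.nd.contrib.foreach(step, seq,
                                              [mx.nd.zeros((2,))])
# outputs[i] holds the cumulative sum of seq[0..i]
```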

Performance improvements

TensorRT runtime integration: TensorRT provides significant acceleration of model inference on NVIDIA GPUs compared to running the full graph in MXNet with unfused GPU operators. In addition to faster fp32 inference, TensorRT optimizes fp16 inference and is capable of int8 inference (provided the quantization steps are performed). Besides increasing throughput, TensorRT significantly reduces inference latency, especially for small batches. With the 1.3 release, MXNet introduces an experimental runtime integration with TensorRT to accelerate inference. Follow the MXNet-TensorRT article on the MXNet developer wiki to learn more about how to use this feature.
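
A rough sketch of what enabling the integration looks like, adapted from the developer wiki; the 'resnet-50' checkpoint name is a placeholder, and since the feature is experimental the exact API may change:

```python
import os
import mxnet as mx

# The TensorRT graph pass must be enabled before binding the symbol.
os.environ['MXNET_USE_TENSORRT'] = '1'

sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-50', 0)
all_params = {k: v.as_in_context(mx.gpu(0))
              for k, v in {**arg_params, **aux_params}.items()}

# tensorrt_bind returns an executor whose supported subgraphs run
# through TensorRT instead of the regular MXNet GPU operators.
executor = mx.contrib.tensorrt.tensorrt_bind(sym, ctx=mx.gpu(0),
                                             all_params=all_params,
                                             data=(1, 3, 224, 224),
                                             grad_req='null',
                                             force_rebind=True)
```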

MKL-DNN enhancements: MKL-DNN is an open-source library from Intel that contains a set of CPU-optimized deep learning operators. The previous release introduced MKL-DNN integration to accelerate training and inference on CPU. The 1.3 release extends this support to the sigmoid, tanh, and softrelu activation functions.
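
On a CPU build with MKL-DNN enabled, these activations go through the optimized kernels transparently; a quick sketch:

```python
import mxnet as mx

x = mx.nd.random.uniform(-1, 1, shape=(32, 128))
for act in ('sigmoid', 'tanh', 'softrelu'):
    y = mx.nd.Activation(x, act_type=act)  # MKL-DNN-accelerated on CPU
```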

ONNX export support

Export MXNet models to ONNX format: MXNet 1.2 provided users a way to import ONNX models into MXNet for inference. More details are available in this ONNX blog post. With the latest 1.3 release, users can now export MXNet models into ONNX format and import those models into other deep learning frameworks for inference! Check out the MXNet to ONNX exporter tutorial to learn more about how to use the mxnet.contrib.onnx API.
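
A minimal sketch of the exporter, where the symbol and params file names are placeholders for a checkpoint saved with mx.model.save_checkpoint:

```python
import numpy as np
from mxnet.contrib import onnx as onnx_mxnet

# Converts a saved MXNet model to an ONNX file on disk.
onnx_file = onnx_mxnet.export_model(sym='resnet-symbol.json',
                                    params='resnet-0000.params',
                                    input_shape=[(1, 3, 224, 224)],
                                    input_type=np.float32,
                                    onnx_file_path='resnet.onnx')
```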

Other experimental features

Apart from what we have covered above, MXNet now has experimental support for:

  1. A new GPU memory pool type that is better suited to workloads with dynamically shaped inputs and outputs. Set the environment variable MXNET_GPU_MEM_POOL_TYPE=Round to enable it (see the sketch after this list).

  2. A topology-aware AllReduce approach for single-machine multi-GPU training, which trains AlexNet and VGG up to 6.6x and 5.9x faster, respectively, than MXNet 1.2. Activate this feature using the “control the data communication” environment variables.

  3. Improved Scala APIs that focus on type safety and a better user experience. Symbol.api and NDArray.api bring a new set of functions with complete signatures, and the documentation for all of the arguments integrates directly with IntelliJ IDEA. The new and improved Scala examples demonstrate usage of these new APIs.
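
The memory pool is selected through the environment, so it needs to be set before MXNet allocates GPU memory; a minimal sketch:

```python
import os

# Must be set before MXNet allocates its first GPU buffer.
os.environ['MXNET_GPU_MEM_POOL_TYPE'] = 'Round'

import mxnet as mx
x = mx.nd.zeros((128, 128), ctx=mx.gpu(0))
```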

Check out further details on these features in the full release notes.

Maintenance improvements

In addition to adding and extending new functionality, this release also focuses on stability and refinement.

The community fixed 130 unstable tests, improving MXNet’s stability and reliability. The release also introduces the MXNet Model Backwards Compatibility Checker, an automated test on MXNet’s continuous integration platform that verifies saved models’ backward compatibility. This helps ensure that models created with older versions of MXNet can be loaded and used with newer versions.

Getting started with MXNet

Getting started with MXNet is simple: visit the install page to get started. PyPI packages are available for Windows, Linux, and Mac.

To learn more about the MXNet Gluon package and deep learning, you can follow our 60-minute crash course and then complete this comprehensive set of tutorials, which covers everything from an introduction to deep learning to how to implement cutting-edge neural network models. You can also check out the many MXNet tutorials, MXNet blog posts (in Chinese), and the MXNet YouTube channel (in Chinese). Have fun with MXNet 1.3.0!

Acknowledgments

We would like to thank everyone who contributed to the 1.3.0 release:

Aaron Markham, Abhinav Sharma, access2rohit, Alex Li, Alexander Alexandrov, Alexander Zai, Amol Lele, Andrew Ayres, Anirudh Acharya, Anirudh Subramanian, Ankit Khedia, Anton Chernov, aplikaplik, Arunkumar V Ramanan, Asmus Hetzel, Aston Zhang, bl0, Ben Kamphaus, brli, Burin Choomnuan, Burness Duan, Caenorst, Cliff Woolley, Carin Meier, cclauss, Carl Tsai, Chance Bair, chinakook, Chudong Tian, ciyong, ctcyang, Da Zheng, Dang Trung Kien, Deokjae Lee, Dick Carter, Didier A., Eric Junyuan Xie, Faldict, Felix Hieber, Francisco Facioni, Frank Liu, Gnanesh, Hagay Lupesko, Haibin Lin, Hang Zhang, Hao Jin, Hao Li, Haozhi Qi, hasanmua, Hu Shiwen, Huilin Qu, Indhu Bharathi, Istvan Fehervari, JackieWu, Jake Lee, James MacGlashan, jeremiedb, Jerry Zhang, Jian Guo, Jin Huang, jimdunn, Jingbei Li, Jun Wu, Kalyanee Chendke, Kellen Sunderland, Kovas Boguta, kpmurali, Kurman Karabukaev, Lai Wei, Leonard Lausen, luobao-intel, Junru Shao, Lianmin Zheng, Lin Yuan, lufenamazon, Marco de Abreu, Marek Kolodziej, Manu Seth, Matthew Brookhart, Milan Desai, Mingkun Huang, miteshyh, Mu Li, Nan Zhu, Naveen Swamy, Nehal J Wani, PatricZhao, Paul Stadig, Pedro Larroy, perdasilva, Philip Hyunsu Cho, Pishen Tsai, Piyush Ghai, Pracheer Gupta, Przemyslaw Tredak, Qiang Kou, Qing Lan, qiuhan, Rahul Huilgol, Rakesh Vasudevan, Ray Zhang, Robert Stone, Roshani Nagmote, Sam Skalicky, Sandeep Krishnamurthy, Sebastian Bodenstein, Sergey Kolychev, Sergey Sokolov, Sheng Zha, Shen Zhu, Sheng-Ying, Shuai Zheng, slitsey, Simon, Sina Afrooze, Soji Adeshina, solin319, Soonhwan-Kwon, starimpact, Steffen Rochel, Taliesin Beynon, Tao Lv, Thom Lane, Thomas Delteil, Tianqi Chen, Todd Sundsted, Tong He, Vandana Kannan, vdantu, Vishaal Kapoor, wangzhe, xcgoner, Wei Wu, Wen-Yang Chu, Xingjian Shi, Xinyu Chen, yifeim, Yizhi Liu, YouRancestor, Yuelin Zhang, Yu-Xiang Wang, Yuan Tang, Yuntao Chen, Zach Kimberg, Zhennan Qin, Zhi Zhang, zhiyuan-huang, Ziyue Huang, Ziyi Mu, Zhuo Zhang.

… and thanks to all of the Apache MXNet community supporters, spreading knowledge and helping to grow the community!

Thursday May 24, 2018

Apache MXNet 1.2.0 Release is out!


Today the Apache MXNet community announced the 1.2 release of the Apache MXNet deep learning framework. The new capabilities in MXNet provide the following benefits to users:


  1. MXNet is easier to use

    • New Scala inference APIs: This release includes new Scala inference APIs, which offer an easy-to-use, idiomatic, and thread-safe high-level interface for performing predictions with deep learning models trained with MXNet.

    • Exception Handling Support for Operators: MXNet now transports backend C++ exceptions to the different language front-ends and prevents crashes when exceptions are thrown during operator execution.

  2. MXNet is faster

    • MKL-DNN integration: MXNet now integrates with Intel MKL-DNN to accelerate neural network operators: Convolution, Deconvolution, FullyConnected, Pooling, Batch Normalization, Activation, LRN, and Softmax, as well as some common operators such as sum and concat. This integration allows NDArray to contain data with MKL-DNN layouts and minimizes data layout conversions to get maximal performance from MKL-DNN. Currently, the MKL-DNN integration is still experimental.

    • Enhanced FP16 support: MXNet now supports distributed mixed-precision training with FP16. It supports storing a master copy of the weights in float32 via the multi_precision mode of the optimizers, and the speed of float16 operations on x86 CPUs has been improved by 8x through the F16C instruction set.

  3. MXNet provides easy interoperability

    • Import ONNX models into MXNet: A new ONNX module in MXNet offers an easy-to-use API for importing ONNX models into MXNet's symbolic interface; a minimal sketch follows this list. Check out the example on how you can use this API to import ONNX models and perform inference with MXNet. Currently, the ONNX-MXNet import module is still experimental.
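
A minimal sketch of the import API, where 'model.onnx' is a placeholder path:

```python
from mxnet.contrib import onnx as onnx_mxnet

# Returns the symbol plus the argument and auxiliary parameters,
# ready to bind for inference.
sym, arg_params, aux_params = onnx_mxnet.import_model('model.onnx')
```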

Getting started with MXNet


Getting started with MXNet is simple. To learn more about the Gluon interface and deep learning, you can reference this comprehensive set of tutorials, which covers everything from an introduction to deep learning to how to implement cutting-edge neural network models. If you’re a contributor to a machine learning framework, check out the interface specs on GitHub.

Saturday March 03, 2018

1.1.0 Release Makes Apache MXNet Faster and More Scalable

We are excited about the availability of the 1.1.0 release of Apache MXNet. Deep learning is a technique used to understand patterns in large datasets using algorithms inspired by biological neurons, and it has driven recent advances in artificial intelligence. MXNet is a fast and scalable deep learning framework for training and prediction with easy-to-use, concise APIs across multiple programming languages, including Python, R, Scala, and C++. Developers can use the Python or R APIs to develop and train neural network models to make accurate predictions. When it is time to integrate a trained model into an application, they can use any of the MXNet APIs to load the model and make predictions. This includes the high-level Scala inference API released today, which maintains Scala idiomatic conventions and supports multi-threaded architectures. In addition, it supports all MXNet operators and has comprehensive documentation and examples.

With the 1.1.0 release, MXNet makes it easier for developers to build vocabularies and load pre-trained word embeddings through a new experimental API. We also added the `sparse.dot` operator to enhance sparse tensor support, and made several API changes to improve the user experience; for example, we added a `lazy_update` option for the standard `SGD` and `Adam` optimizers with `row_sparse` gradients.
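
A small sketch of both additions; the shapes and hyperparameters here are arbitrary:

```python
import mxnet as mx

# sparse.dot: dot product with a CSR matrix on the left-hand side.
csr = mx.nd.array([[1, 0, 0, 2],
                   [0, 0, 3, 0]]).tostype('csr')
dense = mx.nd.random.uniform(shape=(4, 3))
out = mx.nd.sparse.dot(csr, dense)  # shape: (2, 3)

# lazy_update: with row_sparse gradients, only the touched rows of the
# weight are updated by the optimizer.
opt = mx.optimizer.SGD(learning_rate=0.1, lazy_update=True)
```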

MXNet is now faster and more scalable. We improved GPU inference speed by 20% when the batch size is 1. Improved batching for GEMM/TRSM operators with large matrices on GPU makes model training faster, and we added multi-threading for the broadcast_reduce class of operators on CPU.

Tuesday December 05, 2017

Milestone 1.0.0 Release for Apache MXNet

We are excited about the availability of the milestone 1.0.0 release of the Apache MXNet deep learning engine. These new capabilities (1) simplify training and deploying deep learning models, and (2) enable implementation of cutting-edge performance enhancements. The new capabilities in MXNet provide the following benefits to users:

  1. MXNet is faster: The 1.0.0 release includes implementations of cutting-edge features that optimize the performance of training and inference. Gradient compression enables users to train models up to five times faster by reducing the communication bandwidth between compute nodes, without loss in convergence rate or accuracy. For speech recognition acoustic modeling, such as that behind the Alexa voice, this feature can reduce network bandwidth by up to three orders of magnitude during training. With support for the NVIDIA Collective Communication Library (NCCL), users can train a model 20% faster on multi-GPU systems.
    • Optimize network bandwidth with gradient compression: In distributed training, each machine must communicate frequently with others to update the weight-vectors and thereby collectively build a single model, leading to high network traffic. Gradient compression algorithm enables users to train models up to five times faster by compressing the model changes communicated by each instance.
    • Optimize the training performance by taking advantage of NCCL: NCCL implements multi-GPU and multi-node collective communication primitives that are performance-optimized for NVIDIA GPUs, providing communication routines tuned to achieve high bandwidth over the interconnect between multiple GPUs. MXNet supports NCCL to train models about 20% faster on multi-GPU systems.
  2. MXNet is easier to use: The 1.0.0 release includes an advanced indexing capability that enables users to perform matrix operations in a more intuitive manner.
    • Advanced indexing for array operations in MXNet: It is now more intuitive for developers to leverage the powerful array operations in MXNet. They can use the advanced indexing capability by leveraging their existing knowledge of NumPy/SciPy arrays. For example, both MXNet NDArray and NumPy ndarray are supported as indices, e.g. a[mx.nd.array([1, 2], dtype='int32')].
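
A quick sketch of the NumPy-style fancy indexing described above:

```python
import numpy as np
import mxnet as mx

a = mx.nd.arange(12).reshape((3, 4))

# An MXNet NDArray works as an index...
rows = a[mx.nd.array([1, 2], dtype='int32')]   # picks rows 1 and 2

# ...and so does a NumPy ndarray.
same = a[np.array([1, 2], dtype='int32')]
```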

MXNet has helped developers and researchers make progress with everything from language translation to autonomous vehicles and behavioral biometric security. We are excited to see the broad base of users that are building production artificial intelligence applications powered by neural network models developed and trained with MXNet. For example, the autonomous driving company TuSimple recently piloted a self-driving truck on a 200-mile journey from Yuma, Arizona to San Diego, California using MXNet. This release also includes a full-featured and performance-optimized version of the Gluon programming interface. Its ease of use, combined with the extensive set of tutorials, has led to significant adoption among developers new to deep learning. The flexibility of the interface has driven interest within the research community, especially in the natural language processing domain.

Getting started with MXNet

Getting started with MXNet is simple. To learn more about the Gluon interface and deep learning, you can reference this comprehensive set of tutorials, which covers everything from an introduction to deep learning to how to implement cutting-edge neural network models. If you’re a contributor to a machine learning framework, check out the interface specs on GitHub.

Wednesday November 01, 2017

Apache MXNet 0.12 Release Adds Support for New NVIDIA Volta GPUs and Sparse Tensor

We are excited about the availability of Apache MXNet version 0.12. With this release, MXNet adds two important new features: support for NVIDIA Volta GPUs and support for sparse tensors.

Support for NVIDIA Volta GPU Architecture

The MXNet v0.12 release adds support for NVIDIA Volta V100 GPUs, enabling users to train convolutional neural networks up to 3.5 times faster than on the Pascal GPUs. Trillions of floating-point (FP) multiplications and additions for training a neural network have typically been done using single precision (FP32) to achieve high accuracy. However, recent research has shown that the same accuracy can be achieved using half-precision (FP16) data types.

The Volta GPU architecture introduces Tensor Cores. Each Tensor Core can execute 64 fused multiply-add operations per clock, which roughly quadruples the CUDA core FLOPS per clock per core. Each Tensor Core performs D = A x B + C, where A and B are half-precision matrices, while C and D can be either half- or single-precision matrices, thereby performing mixed-precision training. The new mixed-precision training allows users to achieve optimal training performance without sacrificing accuracy, by using FP16 for most layers of a network and higher-precision data types only when necessary.

You can take advantage of Volta Tensor Cores to enable FP16 training in MXNet by passing a single flag, --dtype float16, to the MXNet training script. For example, you can invoke the ImageNet training script with: train_imagenet.py --dtype float16

Sparse Tensor Support

MXNet v0.12 adds support for sparse tensors to efficiently store and compute tensors allowing developers to perform sparse matrix operations in a storage and compute-efficient manner and train deep learning models faster. MXNet v0.12 supports two major sparse data formats: Compressed Sparse Row (CSR) and Row Sparse (RSP). The CSR format is optimized to represent matrices with a large number of columns where each row has only a few non-zero elements. The RSP format is optimized to represent matrices with a huge number of rows where most of the row slices are complete zeros. For example, the CSR format can be used to encode the feature vectors of input data for a recommendation engine, whereas the RSP format can be used to perform the sparse gradient updates during training. This release enables sparse support on CPU for most commonly used operators such as matrix dot product and element-wise operators. Sparse support for more operators will be added in future releases.
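
As a small sketch of the two formats (using the tostype conversion rather than the raw constructors, whose exact signatures were still settling around this release):

```python
import mxnet as mx

dense = mx.nd.array([[7, 0, 8, 0],
                     [0, 0, 0, 0],
                     [0, 9, 0, 0]])

csr = dense.tostype('csr')         # Compressed Sparse Row
rsp = dense.tostype('row_sparse')  # Row Sparse

# Sparse matrix dot product on CPU.
out = mx.nd.sparse.dot(csr, dense.T)
```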

Follow these tutorials to learn how to use the new sparse operators in MXNet.

Get Apache MXNet 0.12 from the downloads page. Read more about this release in the release notes.

If you want to learn more about MXNet, visit https://mxnet.incubator.apache.org/. Finally, you are welcome to join the dynamic and growing MXNet community, and to invite your friends too, by subscribing to dev@mxnet.incubator.apache.org.
