Apache Hadoop

Wednesday October 07, 2020

Ozone - The journey so far

Shashikant Banerjee, Mukul Kumar Singh

Apache Hadoop Ozone is a highly scalable, redundant, distributed object-store. Ozone is designed to work well with the existing Apache Hadoop ecosystem applications like Hive, Spark etc. Moreover, it is designed for ease of operational use and scales to thousands of nodes and billions of objects in a single cluster. Ozone supports a Hadoop Compatible File System interface as well as the S3 protocol.


The Apache Hadoop Distributed File System (HDFS) has been the de facto file system for big data and works best when most files are large (tens to hundreds of MBs). HDFS suffers from the well-known small-files limitation and struggles when the file count goes beyond 300 million. Ozone was designed to address HDFS’s limitations while working seamlessly with existing Hadoop applications.


Ozone releases are named after US national parks. With every minor release, we move to the next letter for the release name and choose a new national park starting with that letter.


Arches (0.2.1)

New Features

  • Support for OzoneFileSystem
  • Integration with Hive, Spark, YARN, and MapReduce
  • Data pipeline handling and recovery

This was the first release of Ozone; it came out on Oct 1, 2018. The release had basic Ozone functionality working: users could create Ozone volumes, buckets, and keys via the Ozone shell interface. The Ozone Filesystem interface was also functional, so YARN, Hive, Spark, and MapReduce applications could work against Ozone FS seamlessly. The release supported the REST and RPC protocols and shipped Java client libraries for both. Data pipeline handling and recovery in case of node failures were built into the system as well.


Acadia (0.3.0)

New Features

  • Support for S3 protocol via S3 Gateway

The next release came out on Nov 22, 2018. It added a new S3-compatible REST server, which ensured that Ozone could be used from any S3-compatible tool, such as the AWS CLI and the AWS Java SDK. For example, with this release a user could create buckets using the AWS CLI, or use Goofys, an S3 FUSE driver, to mount any Ozone bucket as a POSIX file system. It also supported a minimally complete S3 API set, which includes GET, HEAD, and DELETE operations on buckets as well as objects.
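As a rough illustration of what S3 compatibility means for a client, the sketch below creates a bucket through an S3-compatible endpoint using boto3, the AWS SDK for Python (the post itself names the AWS CLI and the AWS Java SDK; Python is used here only for brevity). The endpoint `http://localhost:9878`, the bucket name, and the helper names are assumptions made for this example, as is a running S3 Gateway with credentials available in the environment.

```python
def make_s3_client(endpoint_url="http://localhost:9878"):
    """Build a boto3 S3 client pointed at an S3-compatible endpoint.

    The default endpoint is an assumption; substitute your gateway address.
    boto3 is imported lazily so the helper below can be used without it.
    """
    import boto3  # third-party dependency: pip install boto3

    return boto3.client("s3", endpoint_url=endpoint_url)


def create_and_check_bucket(client, bucket):
    """Create a bucket, then confirm it exists with a HEAD request.

    `client` only needs create_bucket() and head_bucket() methods,
    which a boto3 S3 client provides.
    """
    client.create_bucket(Bucket=bucket)
    client.head_bucket(Bucket=bucket)  # boto3 raises ClientError if it is missing
    return bucket


# Hypothetical usage against a running gateway:
#   s3 = make_s3_client()
#   create_and_check_bucket(s3, "bucket1")
```

Passing the client in as a parameter keeps the bucket logic independent of how the connection is built, so the same helper works with any object that exposes the two calls.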

The release incorporated significant OzoneFilesystem stability improvements. Ozone data pipeline handling and recovery procedures were improved and streamlined.

Badlands (0.4.0)

New Features

  • Support for Apache Hadoop Security (Authentication/Authorization/Encryption)
  • Support for Auditing

The next release in the chain was 0.4.0, which came out on May 7, 2019. This release was primarily driven by the Security feature of Ozone. Kerberos-based authentication support for Ozone was added. It had support for Hadoop Delegation Tokens and Block Tokens, whose purpose is to prevent unauthorized access while keeping the protocol lightweight and without sharing the secret over the wire. Similarly, S3Tokens were added, which are used for every S3 client request.

Certificate infrastructure for Ozone was plugged in to provide certificate-based authentication for Ozone service components. Transparent Data Encryption (TDE) support also arrived in this release, allowing data blocks to be encrypted at rest, along with Apache Ranger support to control authorization. YARN, Hive, and Spark were able to work seamlessly in a secure Ozone cluster environment.

This release also included support for Audit Log which functionally completes the security ecosystem in Ozone. It came with a custom audit parser - a SQLite based command-line utility to parse/query audit logs with predefined templates and options for a custom query.

Support for S3A filesystem as well as for S3 gateway Multipart upload API were also major highlights of this release.

Biscayne (0.4.1)

New Features

  • Kubernetes Integration
  • Support for Native as well as Ranger ACLs

The next release came out on Oct 13, 2019, and brought native Kubernetes (K8s) support to Ozone. The Ozone distribution package contains all the resource files required to deploy Ozone on Kubernetes, which makes Ozone a first-class citizen on Kubernetes clusters.

This release also plugged in support for native ACLs, which can be used independently or along with Apache Ranger. If Apache Ranger is enabled, then ACLs will be checked first with Ranger and then Ozone’s internal ACLs will be evaluated. Ozone ACLs are a superset of POSIX and S3 ACLs.

Crater Lake (0.5.0)

New Features

  • Support for Topology awareness
  • Support for GDPR Right to Erasure
  • Scale testing up to 1 billion objects

The next release came out on Mar 24, 2020. This was the first Beta release for Ozone. Network topology awareness for block placement was added in this release. Support for the GDPR Right to Erasure was also one of the highlights. With major stability and performance improvements in the IO pipeline, Ozone was tested and verified to work seamlessly at a scale of more than 1 billion keys.

Denali (1.0.0)

New Features

  • HA support for Ozone Manager
  • Ozone OFS (New Filesystem scheme)
  • Support for SSL
  • Recon

This is the latest release of Ozone and is the GA release. This release went out on Sep 2, 2020. It represents a point of API stability and quality that we consider production-ready. This release also added support for Recon which is a Management & Administrative daemon inside Ozone which enables continuous monitoring of an Ozone cluster. It also periodically checks on the correctness and stability of an Ozone cluster and reports any anomalies.
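To make the new OFS scheme from the feature list a little more concrete, here is a minimal sketch that extracts the volume, bucket, and key from an Ozone filesystem URI. It assumes the layouts `ofs://<om-host>/<volume>/<bucket>/<key>` for OFS and `o3fs://<bucket>.<volume>.<om-host>/<key>` for the older scheme; these layouts are not stated in this post and should be checked against the Ozone documentation for your release. The type and function names are hypothetical.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class OzoneLocation(NamedTuple):
    volume: str
    bucket: str
    key: str


def parse_ozone_uri(uri):
    """Split an ofs:// or o3fs:// URI into volume, bucket, and key.

    Assumed layouts (verify against your Ozone version):
      ofs://<om-host>/<volume>/<bucket>/<key>
      o3fs://<bucket>.<volume>.<om-host>/<key>
    """
    parts = urlsplit(uri)
    path = parts.path.lstrip("/")

    if parts.scheme == "ofs":
        # Volume and bucket are the first two path components.
        segments = path.split("/", 2)
        if len(segments) < 2 or not segments[0] or not segments[1]:
            raise ValueError(f"ofs URI needs a volume and a bucket: {uri}")
        key = segments[2] if len(segments) > 2 else ""
        return OzoneLocation(segments[0], segments[1], key)

    if parts.scheme == "o3fs":
        # Bucket and volume are the first two labels of the authority.
        labels = (parts.hostname or "").split(".", 2)
        if len(labels) < 2 or not labels[0] or not labels[1]:
            raise ValueError(f"o3fs URI needs bucket.volume in the host: {uri}")
        return OzoneLocation(labels[1], labels[0], path)

    raise ValueError(f"unsupported scheme: {parts.scheme!r}")
```

Under these assumptions, `ofs://om1/vol1/bucket1/dir/file.txt` and `o3fs://bucket1.vol1.om1/dir/file.txt` both resolve to volume `vol1`, bucket `bucket1`, and key `dir/file.txt`. The difference is that an OFS URI names only the service, so one filesystem root can span every volume and bucket.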

The Ozone journey thus far has been equally exciting and challenging due to the goals that we as a community had set out to achieve. Testing and integration with diverse tools and products have helped enhance and stabilize the system. Ozone is now truly production-ready for use alongside other Hadoop applications.

Some of the important features in the upcoming releases:

  • Support for upgrades and introduction of layout format in OM, SCM and DNs
  • Ozone File System improvements for Directory delete and rename
  • Decommissioning of Datanodes
  • Support for large capacity disks on datanodes

Some of the features further ahead on the roadmap are:

  • SCM High Availability
  • Erasure Coding in Ozone to provide storage efficiency
  • NFS to support new POSIX-related workloads

More detailed information regarding Apache Ozone releases and roadmap can be found here: https://cwiki.apache.org/confluence/display/HADOOP/Ozone+Road+Map.

Tuesday July 28, 2020

[ANNOUNCE] Apache Hadoop 3.3.0 release

It gives me great pleasure to announce that the Apache Hadoop community has voted to release Apache Hadoop 3.3.0.

Apache Hadoop 3.3.0 is the first release of the Apache Hadoop 3.3 line for the year 2020, and it includes 2148 fixes since the previous Hadoop 3.2.0 release.

Of these fixes:

  • 525 in Hadoop Common
  • 804 in HDFS
  • 763 in YARN
  • 56 in MapReduce

Apache Hadoop 3.3.0 contains a number of significant features and enhancements. A few of them are noted below.

    - Support ARM: This is the first release to support ARM architectures.

    - Upgrade protobuf from 2.5.0: Protobuf was upgraded to 3.7.1, as protobuf-2.5.0 reached EOL.

    - S3A Enhancements: Lots of enhancements to the S3A code, including Delegation Token support, better handling of 404 caching, S3Guard performance, and resilience improvements.

    - ABFS Enhancements: Address issues which surface in the field, tune things which need tuning, and add more tests where appropriate. Improve docs, especially troubleshooting.

    - Tencent Cloud COS File System Implementation: Tencent Cloud is one of the top two cloud vendors in the China market, and its object store COS is widely used among China’s cloud users. The COSN filesystem supports Tencent Cloud COS natively in Hadoop.

    - Java 11 Runtime Support: Java 11 runtime support is completed.

    - HDFS RBF Stabilization: HDFS Router now supports security. Also contains many bug fixes and improvements.

    - DNS Resolution to Support Nameservices to IP: DFS clients can use a single domain name to discover servers (namenodes/routers/observers) instead of explicitly listing out all hosts in the config.

    - Scheduling of opportunistic containers: Scheduling of opportunistic containers through the central RM (YARN-5220), through distributed scheduling (YARN-2877), as well as the scheduling of containers based on actual node utilization (YARN-1011) and container promotion/demotion (YARN-5085).

    - Application Catalog for YARN applications: Application catalog system which provides an editorial and search interface for YARN applications. This improves the usability of YARN for managing the life cycle of applications.

    * For major changes included in the Hadoop 3.3 line, please refer to the Hadoop 3.3.0 main page [1].
    * For more details about fixes in the 3.3.0 release, please read the CHANGELOG [2] and RELEASENOTES [3].

    The release news is also posted on the Hadoop website; you can go to the downloads section directly [4].

    Many thanks to everyone who contributed to the release, and everyone in the Apache Hadoop community! This release is a direct result of your great contributions.

    [1] https://hadoop.apache.org/docs

    [2] https://hadoop.apache.org/docs/r3.3.0/hadoop-project-dist/hadoop-common/release/3.3.0/CHANGELOG.3.3.0.html

    [3] https://hadoop.apache.org/docs/r3.3.0/hadoop-project-dist/hadoop-common/release/3.3.0/RELEASENOTES.3.3.0.html

    [4] https://hadoop.apache.org/releases.html

    Thursday August 22, 2019

    Hadoop Community Meetup @ Beijing, Aug 2019

    Hadoop Community Meetup @ Beijing

    Author: Junping Du (Tencent) & Wangda Tan (Cloudera)


    On Aug 11, 2019, Hadoop developers and users gathered at Tencent’s Sigma Center office in Beijing to share their latest work, with 12 presentations by engineers from Tencent, Cloudera, Alibaba, Didi, Xiaomi, Meituan, ByteDance (parent company of TikTok, Toutiao, etc.), JD.com, and Huawei. This was also the first Hadoop community meetup hosted by Apache Hadoop PMC members.


    The meetup drew tremendous interest. There were 200 spots available for in-person registration, and they were fully booked within 10 minutes. We had 150+ attendees in person, and 3000+ attendees joined the live online sessions.

    Participants came in person from dozens of companies and universities, many of them flying in from Shanghai, Hangzhou, Shenzhen, and even the San Francisco Bay Area!


    1. Hadoop Community Update And Roadmaps

    Junping Du @ Tencent and Wangda Tan @ Cloudera talked about Hadoop community updates and roadmaps.


    Junping Du @ Tencent



    Wangda Tan @ Cloudera

    Junping introduced recent trends in the storage field, such as better scalability and moving to the cloud. He talked about features like RBF (Router Based Federation), NameNode scalability improvements, cloud connector improvements, and Ozone.

    Wangda talked about recent trends in the compute field, such as better scalability, moving to cloud-native environments, containerization work, and support for machine-learning use cases. He talked about the global scheduling framework for better scheduling throughput and placement quality; recent containerization work in YARN, such as runc and the interactive Docker shell; and YARN-on-cloud initiatives from the community, such as autoscaling and graceful decommissioning. Wangda also talked about Submarine and its release plans.

    Finally, Wangda looked back at the releases in 2018/2019 and shared the tentative Hadoop release plan for 2019, including 3.1.3, 3.2.1, and what’s coming in 3.3.0.

    2. Ozone: Hadoop native object store

    Sammi (Yi) Chen @ Tencent talked about the native object store project from the Hadoop community.


    Ozone is a strongly consistent distributed object store service. It offers the same level of reliability, consistency, and usability as HDFS. Because it supports the S3 interface, it is useful not only for on-prem big-data workloads but also as a good option for moving big data to the cloud.

    Sammi talked about the architecture of Ozone and what’s new in the Ozone 0.5 release.

    3. YARN 3.x in Alibaba

    Tao Yang from Alibaba talked about Hadoop use cases in Alibaba. He also talked about how new features in YARN 3.x are being used to address them. Tao covered features like preemption, scheduling, resource over-commitment, scheduling diagnostics, and mixed deployment of online/offline workloads. He also talked about how new features in YARN help run Apache Flink on YARN better.


    Tao talked about many interesting features such as MultiNodeLookupPolicy, which can help schedule jobs on a pluggable node sorter.

    4. HDFS Best Practices learned from Didi’s production environment.

    Hui Fei from Didi talked about HDFS best practices learned from Didi’s large scale (hundreds of PBs) production environment.

    Hui first talked about storage use cases and scale in Didi’s environment. Then Hui talked about functionalities and improvements Didi’s Hadoop team built on top of Hadoop HDFS 2.7.2 such as: Security, NameNode Federation, Balancer, etc.


    Hui also talked about the status of upgrading the production cluster from Hadoop 2.7.2 to Hadoop 3.2.0. The primary driver of the upgrade is to save storage space: Didi wants to use features like Erasure Coding in Hadoop 3.x.

    Didi has upgraded a test cluster (100+ nodes) from 2.7.2 to 3.2.0, and has a backup cluster with 2k+ nodes running Hadoop 3.1.1 that will be rolling-upgraded to 3.2.0. A primary cluster with 10K+ nodes (with 5 namespaces) will start upgrading to 3.2.0 starting in Oct.

    5. Submarine: A one-stop, cross-platform machine learning platform

    Xun Liu @ NetEase and Zhankun Tang @ Cloudera talked about the background, current status, and future of the Submarine project.


    Zhankun Tang @ Cloudera



    Xun Liu @ NetEase

    Machine learning involves many components, such as data preprocessing, feature extraction, model training/serving/management, and distributed workload management. The Submarine project, started by the Hadoop community, aims to cover these by focusing on the notebook experience. With Submarine, data scientists and machine learning engineers don’t need to understand lower-level platforms such as YARN, K8s, or Docker containers.

    Zhankun showed a new feature called mini-submarine, which allows developers to try Submarine locally without installing a YARN cluster.

    Xun did demos for:

          Integration of Submarine + Zeppelin notebook.

          New Submarine web UI to allow data scientists to run jobs, manage models, etc. in a unified user experience.

    Xun also talked about companies reported to be using Submarine in production, such as NetEase, LinkedIn, Dahua, Ke.com, and JD.com.

    6. Hadoop Improvements in Xiaomi

    Chen Zhang and Kang Zhou from Xiaomi talked about how Hadoop is being used in Xiaomi. They talked about improvements to HDFS’s performance and scalability, and about problems and solutions encountered when trying to platformize YARN.

    On the HDFS side, Chen talked about their improvements to HDFS federation, such as lowering the business impact of upgrading from a single NameNode to federated NameNodes. They have also improved NameNode performance, which now supports 600 million objects (files + blocks) in a single NameNode.


    On the YARN side, Kang talked about usability improvements, such as those to RMStateStore and the History Server. He also talked about multi-cluster management tools, such as a unified client/RM-UI for multiple clusters, and about scheduling optimizations, such as caching resource usage and improving utilization and preemption.

    7. Key Customizations of YARN @ ByteDance

    Yakun Li from ByteDance talked about customizations of their YARN clusters to handle an extra-large-scale, multi-cluster environment, including utilization improvements, stabilization, optimizations for streaming/model-training environments, and multi-datacenter issues.


    For scheduling, Yakun also talked about how they implemented Gang Scheduling in YARN, which schedules per application instead of per node and can achieve low latency with hard/soft constraints. He also talked about their multi-threaded version of FairScheduler, which can push the number of container allocations per second up to 3k.

    For mixed-workload (batch, streaming, ML) deployments, Yakun said they have adopted Docker on YARN to isolate dependencies, added support for CPUSET/NUMA, and temporarily skip nodes whose physical utilization is too high. All these efforts help mixed workloads run well in the same cluster.
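The talk summary does not describe ByteDance's implementation, so the following is only a generic sketch of the idea usually meant by gang scheduling: an application's containers are placed all together or not at all, rather than being handed out node by node. The function name, the single-resource model, and the first-fit-decreasing placement rule are all assumptions made for illustration.

```python
def gang_allocate(free_by_node, requests):
    """Place every request of one application, or none of them.

    free_by_node: dict mapping node name to free resource units.
    requests: list of container sizes for a single application.
    Returns a list of node names aligned with `requests`, or None if the
    whole gang cannot be placed. The input dict is never modified, so a
    failed attempt leaves no partial allocation behind.

    Placement is greedy first-fit on the largest requests first. This is a
    heuristic: it can report failure even when some other packing would fit.
    """
    remaining = dict(free_by_node)  # work on a copy for all-or-nothing behavior
    placement = [None] * len(requests)
    largest_first = sorted(range(len(requests)),
                           key=lambda i: requests[i], reverse=True)

    for i in largest_first:
        node = next((name for name, free in remaining.items()
                     if free >= requests[i]), None)
        if node is None:
            return None  # one container does not fit, so reject the whole gang
        remaining[node] -= requests[i]
        placement[i] = node

    return placement
```

With free capacity `{"n1": 4, "n2": 2}`, a gang of three 2-unit containers is placed as `["n1", "n1", "n2"]`, while a gang of two 3-unit containers is rejected outright instead of receiving one container and waiting for the other.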

    8. YuniKorn: A New Unified Scheduler for Both YARN and K8s

    Weiwei Yang and Wangda Tan from Cloudera talked about their work on a new scheduler named YuniKorn (https://github.com/cloudera/yunikorn-core) and how it can benefit both the YARN and K8s communities.


    Weiwei Yang (Right) and Wangda Tan (Left) from Cloudera

    The scheduler of a container orchestration system, such as YARN or Kubernetes, is a critical component that users rely on to plan resources and manage applications. The two systems’ schedulers have different characteristics to support different workloads.

    YARN schedulers are optimized for high-throughput, multi-tenant batch workloads. They can scale up to 50k nodes per cluster and schedule 20k containers per second. Kubernetes schedulers, on the other hand, are optimized for long-running services, but many features, like hierarchical queues, fair resource sharing, and preemption, are either missing or not mature enough at this point.

    However, underneath they are responsible for the same job: deciding resource allocations. The speakers mentioned the need to run services on YARN as well as jobs on Kubernetes. This motivated them to create a universal scheduler that works for both YARN and Kubernetes and is configured in the same way.

    In this talk, Weiwei and Wangda talked about their efforts to design and implement the universal scheduler. They have already integrated it with Kubernetes, and YARN integration is a work in progress. This scheduler brings long-wanted features such as hierarchical queues, fairness between users/jobs/queues, and preemption to Kubernetes, and it brings service scheduling enhancements to YARN. Most importantly, it gives YARN and Kubernetes the opportunity to share the same user experience for scheduling big data workloads, and any improvement to this universal scheduler can benefit both the Kubernetes and YARN communities.

    9. HDFS cluster improvements and optimization practices in Meituan Dianping

    Xiaoqiao He from Meituan Dianping talked about the scalability of their Hadoop clusters. The clusters have kept growing since 2015, and by now there are more than 30k nodes in them.


    He shared many details and practices regarding the infrastructure of physical deployments, especially solutions for clusters spanning multiple regions. In the last part, Xiaoqiao showed some practices for optimizing HDFS clusters, such as improving the NameNode restart process and rebalancing NameNode workload.

    10. Evolution of YARN in JD.com

    Wanqiang Ji from JD.com talked about how YARN evolves to support JD.com’s business needs.


    In the last 3 years, the maximum number of nodes in a single YARN cluster has scaled from 3k to 5k, 10k, and then 16k. Internally, there is work to balance resources between YARN and K8s clusters. There are also improvements to container eviction policies to make sure nodes won’t crash or restart when a machine’s physical utilization grows above a certain level.

    11. Lessons learned from large scale YARN cluster operation @ Tencent

    Jun Gong and Dongdong Chen from Tencent talked about their work to support large-scale YARN clusters inside Tencent.


    Gong Jun @ Tencent


    Dongdong Chen @ Tencent

    Jun and Dongdong shared that, inside Tencent, they widely use SLS (the YARN Scheduler Load Simulator) to find scheduler bottlenecks, and many of the resulting scheduler improvements have been contributed back to the community. After optimization, their production cluster has 2k+ queues, 8K+ nodes, and 5k+ concurrent jobs. They achieve 3k+ container allocations per second and more than 100 million container allocations per day.

    Also, Jun and Dongdong shared how they use YARN CGroups parameters to fine-tune CPU/Memory/Network shares for launched YARN containers in a multi-tenant cluster.
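To put the two throughput figures side by side, a small back-of-the-envelope calculation helps. It takes the quoted numbers at face value (both were reported as lower bounds) and assumes nothing beyond a day having 86,400 seconds; the function names are made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_per_second(allocations_per_day):
    """Average allocation rate implied by a daily total."""
    return allocations_per_day / SECONDS_PER_DAY


def sustained_per_day(allocations_per_second):
    """Daily total if a per-second rate were held for a full day."""
    return allocations_per_second * SECONDS_PER_DAY


print(round(average_per_second(100_000_000), 1))  # 1157.4
print(sustained_per_day(3_000))                   # 259200000
```

So 100 million allocations per day averages out to roughly 1.2k per second, while 3k per second held all day would give about 259 million. The per-second figure is therefore about 2.6 times the daily average, which is consistent with it being a peak rate, although the summary does not say so explicitly.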

    12. Run Spark and Hadoop on ARM

    Rui Chen and Sheng Liu from Huawei shared their work on running Spark and Hadoop on ARM.


    Rui Chen



    Sheng Liu


    Rui and Sheng shared the motivation for running Hadoop and Spark on the ARM platform: high performance and power efficiency. They then shared the status of ARM support for Hadoop and Spark, along with details of building the release tarball on ARM, including parameters and issues. In the last part, they described how Hadoop/Spark release work can ensure proper testing on the ARM platform, and said they were building a community called OpenLab to make the process smoother.



    Thanks to everyone who contributed to this successful event in one way or another, including the following speakers:

    Sammi Chen, Jun Gong and Dongdong Chen from Tencent,

    Weiwei Yang, Zhankun Tang from Cloudera,

    Wanqiang Ji from Jingdong,

    Tao Yang from Alibaba,

    Chen Zhang and Kang Zhou from Xiaomi,

    Hui Fei from Didi,

    Rui Chen and Sheng Liu from Huawei,

    Xiaoqiao He from Meituan Dianping,

    Yakun Li from ByteDance,

    and Xun Liu from NetEase.

    And special thanks to Chunyu Wang, Summer Xia, and Katty Ma for organizing the meetup!

    Thursday August 09, 2018

    [ANNOUNCE] Apache Hadoop 3.1.1 release

    It gives me great pleasure to announce that the Apache Hadoop community has voted to release Apache Hadoop 3.1.1.

    Hadoop 3.1.1 is the first stable maintenance release for the year 2018 in the Hadoop-3.1 line, and brings a number of enhancements.


    3.1.1 is the first stable release of the 3.1 line that is production-ready.

    The Hadoop community fixed 435 JIRAs [1] in total as part of the 3.1.1 release. Of these fixes:

  • 60 in Hadoop Common
  • 139 in HDFS
  • 223 in YARN
  • 13 in MapReduce

    Apache Hadoop 3.1.1 contains a number of significant features and enhancements. A few of them are noted below.

  • ENTRY_POINT support for Docker containers.
  • Restart policy support for YARN native services.
  • Capacity Scheduler: Intra-queue preemption for fairness ordering policy.
  • Stabilization works for scheduler / YARN service / docker support, etc.

    Please see the Hadoop 3.1.1 CHANGES for the detailed list of issues resolved. The release news is posted on the Apache Hadoop website too; you can go to the downloads section.

    Many thanks to everyone who contributed to the release, and everyone in the Apache Hadoop community! The release is a result of direct and indirect efforts from many contributors. Listed below are those who contributed directly by submitting patches and reporting issues.

    Abhishek Modi, Ajay Kumar, Akhil PB, Akira Ajisaka, Allen Wittenauer, Anbang Hu, Andrew Wang, Arpit Agarwal, Atul Sikaria, BELUGA BEHR, Bharat Viswanadham, Bibin A Chundatt, Billie Rinaldi, Bilwa S T, Botong Huang, Brahma Reddy Battula, Brook Zhou, CR Hota, Chandni Singh, Chao Sun, Charan Hebri, Chen Liang, Chetna Chaudhari, Chun Chen, Daniel Templeton, Davide Vergari, Dennis Huo, Dibyendu Karmakar, Ekanth Sethuramalingam, Eric Badger, Eric Yang, Erik Krogen, Esfandiar Manii, Ewan Higgs, Gabor Bota, Gang Li, Gang Xie, Genmao Yu, Gergely Novák, Gergo Repas, Giovanni Matteo Fumarola, Gour Saha, Greg Senia, Haibo Yan, Hanisha Koneru, Hsin-Liang Huang, Hu Ziqian, Istvan Fajth, Jack Bearden, Jason Lowe, Jeff Zhang, Jian He, Jianchao Jia, Jiandan Yang , Jim Brennan, Jinglun, John Zhuge, Joseph Fourny, K G Bakthavachalam, Karthik Palanisamy, Kihwal Lee, Kitti Nanasi, Konstantin Shvachko, Lei (Eddy) Xu, LiXin Ge, Lokesh Jain, Lukas Majercak, Miklos Szegedi, Mukul Kumar Singh, Namit Maheshwari, Nanda kumar, Nilotpal Nandi, Pavel Avgustinov, Prabhu Joseph, Prasanth Jayachandran, Robert Kanter, Rohith Sharma K S, Rushabh S Shah, Sailesh Patel, Sammi Chen, Sean Mackrory, Sergey Shelukhin, Shane Kumpf, Shashikant Banerjee, Siyao Meng, Sreenath Somarajapuram, Steve Loughran, Suma Shivaprasad, Sumana Sathish, Sunil Govindan, Surendra Singh Lilhore, Szilard Nemeth, Takanobu Asanuma, Tao Jie, Tao Yang, Ted Yu, Thomas Graves, Thomas Marquardt, Todd Lipcon, Vinod Kumar Vavilapalli, Wangda Tan, Wei Yan, Wei-Chiu Chuang, Weiwei Yang, Wilfred Spiegelenburg, Xiao Chen, Xiao Liang, Xintong Song, Xuan Gong, Yang Wang, Yesha Vora, Yiqun Lin, Yiran Wu, Yongjun Zhang, Yuanbo Liu, Zian Chen, Zoltan Haindrich, Zsolt Venczel, Zuoming Zhang, fang zhenyi, john lilley, jwhitter, kyungwan nam, liaoyuxiangqin, liuhongtong, lujie, skrho, yanghuafeng, yimeng, Íñigo Goiri.

    [1] JIRA query: project in (YARN, HADOOP, MAPREDUCE, HDFS) AND resolution = Fixed AND fixVersion = 3.1.1 ORDER BY key ASC, updated ASC, created DESC, priority DESC

    Friday April 06, 2018

    [ANNOUNCE] Apache Hadoop 3.1.0 release

    It gives us great pleasure to announce that the Apache Hadoop community has voted to release Apache Hadoop 3.1.0.

    Hadoop 3.1.0 is the first minor release for the year 2018 in the Hadoop-3.x line, and brings a number of enhancements.


    This release is *not* yet ready for production use. Critical issues are being ironed out via testing and downstream adoption. Production users should wait for a 3.1.1/3.1.2 release.

    The Hadoop community fixed 768 JIRAs (https://s.apache.org/apache-hadoop-3.1.0-all-tickets) in total as part of the 3.1.0 release. Of these fixes:

    • 141 in Hadoop Common
    • 266 in HDFS
    • 329 in YARN
    • 32 in MapReduce

    Apache Hadoop 3.1.0 contains a number of significant features and enhancements. A few of them are noted below.

    Hadoop Common

    • HADOOP-14831 / HADOOP-14531 / HADOOP-14825 / HADOOP-14325. S3/S3A/S3Guard related improvements.

    Hadoop HDFS

    • HDFS-9806 - HDFS block replicas to be provided by an external storage system

    Hadoop YARN

    • YARN-6223. First class GPU support on YARN
    • YARN-5983. First class FPGA support on YARN
    • YARN-5079 / YARN-4793 / YARN-4757 / YARN-6419. YARN native service support
    • YARN-6592. Rich placement constraints in YARN
    • YARN-5881. Enable configuration of queue capacity in terms of absolute resources for Capacity Scheduler.
    • YARN-7117. Capacity Scheduler: Support auto-creation of leaf queues while doing queue mapping

    Please see the Hadoop 3.1.0 CHANGES for the detailed list of issues resolved. The release news is posted on the Apache Hadoop website too; you can go to the downloads section directly.

    Many thanks to everyone who contributed to the release, and everyone in the Apache Hadoop community! The release is a result of direct and indirect efforts from many contributors. Listed below are those who contributed directly by submitting patches.

    Aaron Fabbri, Aaron Gresch, Abdullah Yousufi, Ajay Kumar, Ajith S, Akhil PB, Aki Tanaka, Akira Ajisaka, Alessandro Andrioni, Allen Wittenauer, Andras Bokor, Andrew Wang, Anis Elleuch, Anu Engineer, Arpit Agarwal, Arun Suresh, Atul Sikaria, BELUGA BEHR, Bharat Viswanadham, Bibin A Chunda, Billie Rinaldi, Botong Huang, Brahma Reddy Battula, Carlo Curino, Chandni Singh, Chang Li, Chao Sun, Charan Hebri, Chen Hongfei, Chen Liang, Chetna Chaudhari, Chris Douglas, Da Ding, Daniel Templeton, Daryn Sharp, Devaraj K, Dmitry Chuyko, Ekanth S, Elek, Marton, Ellen Hui, Eric Badger, Eric Payne, Eric Yang, Erik Krogen, Esther Kundin, Ewan Higgs, Gabor Bota, Genmao Yu, Gera Shegalov, Gergely Novák, Gergo Repas, Gour Saha, Greg Phillips, Grigori Rybkine, Haibo Chen, Haibo Yan, Hanisha Koneru, He Xiaoqiao, Igor Dvorzhak, Jack Bearden, Jason Lowe, Jian He, Jiandan Yang, Jianfei Jiang, Jim Brennan, Jinjiang Ling, Joe McDonnell, Johan Gustavsson, Johannes Alberti, John Zhuge, Jonathan Eagles, Jonathan Hung, Jongyoul Lee, Juan Rodríguez Hortalá, Junping Du, Kai Sasaki, Kannapiran Srinivasan, Karthik Kambatla, Karthik Palaniappan, Karthik Palanisamy, Keqiu Hu, Kihwal Lee, Konstantin Shvachko, Konstantinos Karanasos, Kuhu Shukla, Lei (Eddy) Xu, LiXin Ge, Lokesh Jain, Lukas Majercak, Manikandan R, Manoj Govindassamy, Masahiro Tanaka, Mikhail Erofeev, Miklos Szegedi, Ming Ma, Mukul Kumar Singh, Nanda kumar, Nandor Kollar, Nishant Bangarwa, Oleg Danilov, Panagiotis Garefalakis, PandaMonkey, Peter Bacsko, Ping Liu, Prabhu Joseph, Rahul Pathak, Rajesh Balamohan, Ray Chiang, Remi Catherino, Robert Kanter, Rohith Sharma K S, Rui Li, Rushabh S Shah, SammiChen, Sampada Dehankar, Santhosh G Nayak, Sean Mackrory, Sean Po, Sen Zhao, Shane Kumpf, Sharad Sonker, Shashikant Banerjee, Shawna Martell, Sivaguru Sankaridurg, Soumabrata Chakraborty, Stephen O'Donnell, Steve Loughran, Steven Rand, Subru Krishnan, Suma Shivaprasad, Sunil G, Surendra Singh Lilhore, Szilard Nemeth, Takanobu Asanuma, Tanuj 
Nayak, Tao Yang, Tarun Parimi, Thejas M Nair, Thomas Marquard, Todd Lipcon, Tsz Wo Nicholas Sze, Varada Hemeswari, Varun Saxena, Varun Vasudev, Vasudevan Skm, Vinayakumar B, Vipin Rathor, Virajith Jalaparti, Vishwajeet Dusane, Wangda Tan, Wei Yan, Weiwei Wu, Weiwei Yang, Wilfred Spiegelenburg, Xiao Chen, Xiao Liang, Xiaoyu Yao, Xuan Gong, Yang Wang, Yeliang Cang, Yesha Vora, Yiqun Lin, Yonger, Yongjun Zhang, Young Chen, Yu-Tang Lin, Yufei Gu, Zhankun Tang, Zian Chen, Zsolt Venczel, chencan, fang zhenyi, hu xiaodong, kartheek muthyala, liaoyuxiangqin, ligongyi, liuhongtong, lovekesh bansal, lufei, lujie, maobaolong, shanyu zhao, tartarus, usharani, wangzhiyuan, wujinhu, Íñigo Goiri.


    Wangda Tan and Vinod Kumar Vavilapalli

    Tuesday December 12, 2017

    Welcome to Apache™ Hadoop®!

    The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.


