Entries tagged [big]

Monday August 30, 2021

The Apache Drill Project Announces Apache® Drill(TM) v1.19 Milestone Release

Open Source, enterprise-grade, schema-free Big Data SQL query engine used by thousands of organizations, including Ant Group, Cisco, Ericsson, Intuit, MicroStrategy, Tableau, TIBCO, TransUnion, Twitter, and more.

Wilmington, DE —30 August 2021— The Apache Drill Project announced the release of Apache® DrillTM v1.19, the schema-free Big Data SQL query engine for Apache Hadoop®, NoSQL, and Cloud storage.

"Drill 1.19 is our biggest release ever," said Charles Givre, Vice President of Apache Drill. "With an already short learning curve, Drill 1.19 makes it even easier for users to quickly query, analyze, and visualize data from disparate sources and complex data sets.”

An "SQL-on-Hadoop" engine, Apache Drill is easy to deploy, highly performant, able to quickly process trillions of records, and scalable from a single laptop to a 1000-node cluster. With its schema-free JSON model (the first distributed SQL query engine of its kind), Drill is able to query complex semi-structured data in situ without requiring users to define schemas or transform data. It provides plug-and-play integration with existing Hive and HBase deployments, and is extensible out-of-the-box to access multiple data sources, such as S3 and Apache HDFS, HBase, and Hive. Additionally, Drill can directly query data from REST APIs to include platforms like SalesForce and ServiceNow. 

Drill supports the ANSI SQL 2003 standard syntax ecosystem as well as dozens of NoSQL databases and file systems, including Apache HBase, MongoDB, Elasticsearch, Cassandra, REST APIs, , HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, NAS,  local files, and more. Drill leverages familiar BI tools (such as Apache Superset, Tableau, MicroStrategy, QlikView and Excel) as well as data virtualization and visualization tools, and runs interactive queries on Hive tables with different Hive metastores.

Apache Drill v1.19
Drill is designed from the ground up to support high-performance analysis on rapidly evolving data on modern Big Data applications. v1.19 reflects more than 100 changes, improvements, and new features that include:

  • New Connectors for Apache Cassandra, Elasticsearch, and Splunk.

  • New Format Reader for XML without schemas

  • Added Avro support for Kafka plugin

  • Integrated password vault for secure credential storage

  • Support for Linux ARM64 systems

  • Added limit pushdowns for file systems, HTTP REST APIs and MongoDB

  • Added streaming for Drill's REST API

  • Integration with Apache Airflow


Developers, analysts, business users, and data scientists use Apache Drill for data exploration and analysis for its enterprise-grade reliability, security, and performance. Drill's flexibility and ease-of-use have attracted thousands of users that include Ant Group, Cardlytics, Cisco, Ericsson, Intuit, MicroStrategy, Qlik, Tableau, TIBCO, TransUnion, Twitter, National University of Singapore, and more.

"Individuals, businesses, and organizations of all types rely on Apache Drill's rich functionality," added Givre. "We invite everyone to participate in our user and developer lists as well as our Slack channel, and contribute to the project to build on our momentum and help improve the future experience for all Drill users."

Catch Apache Drill in action at ApacheCon@Home, taking place online 21-23 September 2021. For more information and to register, visit https://www.apachecon.com/ .

Availability and Oversight
Apache Drill software is released under the Apache License v2.0 and is overseen by a volunteer, self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases.

About Apache Drill
Apache Drill is the Open Source, schema-free Big Data SQL query engine for Apache Hadoop, NoSQL, and Cloud storage. For more information, including documentation and ways to become involved with Apache Drill, visit http://drill.apache.org/ , https://twitter.com/ApacheDrill , and https://apache-drill.slack.com/ .

© The Apache Software Foundation. "Apache", "Drill", "Apache Drill", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

#  #  #

Monday August 02, 2021

The Apache Software Foundation Announces Apache® Pinot™ as a Top-Level Project

Open Source distributed real-time Big Data analytics infrastructure in use at Amazon-Eero, Doordash, Factual/FourSquare, LinkedIn, Stripe, Uber, Walmart, Weibo, and WePay, among others.

Wilmington, DE —2 August 2021— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Pinot™ as a Top-Level Project (TLP).

Apache Pinot is a distributed Big Data analytics infrastructure created to deliver scalable real-time analytics at high throughput with low latency. The project was first created at LinkedIn in 2013, open-sourced in 2015, and entered the Apache Incubator in October 2018.

"We are pleased to successfully adopt 'the Apache Way' and graduate from the Apache Incubator," said Kishore Gopalakrishna, Vice President and original co-creator of Apache Pinot. "Pinot initially pushed the boundaries of real-time analytics by delivering insights to millions of Linkedin users. Today, as an Apache Top-Level Project, Pinot is in the hands of developers across the globe who are building it to power several user-facing  analytical applications and unlock the value of data within their organizations."

Scalable to trillions of records, Apache Pinot’s online analytical processing (OLAP) ingests both online and offline data sources from Apache Kafka, Apache Spark, Apache Hadoop HDFS, flat files, and Cloud storages in real time. Pinot is able to ingest millions of events and serve thousands of queries per second, and provide unified analytics in a distributed, fault-tolerant fashion. Features include:

  • Speed —answers OLAP queries with low latency on real-time data

  • Pluggable indexing —Sorted, Inverted, Text Index, Geospatial Index, JSON Index, Range Index, Bloom filters

  • Smart Materialized Views - Fast Aggregations via star-tree index

  • Supports different stream systems with near real-time ingestion —with Apache Kafka, Confluent Kafka, and Amazon Kinesis, as well as customizable input format, with out-of the box support for Avro and JSON formats

  • Highly available, horizontally scalable, and fault tolerant

  • Supports lookup joins natively and full joins using PrestoDB/Trino

Apache Pinot is used to power internal and external analytics at Adbeat, Amazon-Eero, Cloud Kitchens, Confluera, Doordash, Factual/FourSquare, Guitar Center, LinkedIn, Publicis Sapient, Razorpay, Scale Unlimited, Startree, Stripe, Traceable, Uber, Walmart, Weibo, WePay, and more.

Examples of how Apache Pinot helps organizations across numerous verticals include: 1) a fintech company uses Pinot to achieve financial data visibility across 500+ terabytes of data and sustain half million queries per second with financial transactions; 2) a food delivery service leveraged Pinot in the midst of the COVID-19 pandemic to analyze real-time data to provide a socially-distanced pick-up experience for its riders and restaurants; and 3) a large retail chain with geographically distributed franchises and stores uses Pinot for revenue-generating opportunities by analyzing real-time data for internal use cases, as well as real-time cart analysis to increase sales.

"We rely on Apache Pinot for all our real-time analytics needs at LinkedIn," said Kapil Surlaker, Vice President of Engineering at LinkedIn. "It's battle-tested at LinkedIn scale for hundreds of our low-latency analytics applications. We believe Apache Pinot is the best tool out there to build site-facing analytics applications and we will continue to contribute heavily and collaborate with the Apache Pinot community. We are very happy to see that it's now a Top-level Apache project."

"We use Apache Pinot in our real-time analytics platform to power external user-facing applications and critical operational dashboards," said Ujwala Tulshigiri, Engineering Manager at Uber. "With Pinot's multi-tenancy support and horizontal scalability, we have scaled to hundreds of use cases that run complex aggregations queries on terabytes of data at millisecond latencies, with the minimal overhead of cluster management."

"We've been using Apache Pinot since last year, and it's been a huge win for our client’s dashboard project," said Ken Krugler, President of Scale Unlimited. "Pinot's ability to rapidly generate aggregation results over billions of records, with modest hardware requirements, was critical for the success of the project. We've also been able to provide patches to add functionality and fix issues, which the Pinot community has quickly integrated and released. There was never any doubt in our minds that Pinot would graduate from the Apache incubator and become a successful top-level project."

"Last year, we started without analytics built into our product," said Pradeep Gopanapalli, technical staff member at Confluera. "By the end of the year, we were using Apache Pinot for real-time analytics in production. Not many of our competitors can even dream of having such results. We are very happy with our choice."

"Pinot is critical to our real-time analytics platform and allowed us to scale without degrading latency," said software engineer Elon Azoulay. "Pinot enables us to onboard large datasets effortlessly, run complex queries which return in milliseconds and is super reliable. We would like to emphasize how helpful and engaged the community is and are certain that we made the right choice with Pinot, it continues to impress us and satisfy our real-time analytics needs."

"We created Pinot at LinkedIn with the goal of tackling the low-latency OLAP problem for site-facing use cases at scale. We evolved it to solve numerous OLAP use cases, and open-sourced it because there aren't many technologies in that domain," said Subbu Subramaniam, member of the Apache Pinot Project Management Committee, and Senior Staff Engineer at LinkedIn. "It is heart-warming to see such a wide adoption and great contributions from the community in improving Pinot over time."

"We are at the beginning of this transformation and we cannot wait to see every software company build real-time applications using Apache Pinot," added Gopalakrishna. "We welcome everyone to join our community Slack channel and contribute to the project."

Catch Apache Pinot in action at ApacheCon Asia online on 7 August 2021. For more information and to register, visit https://www.apachecon.com/acasia2021/

Availability and Oversight
Apache Pinot software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Pinot, visit http://pinot.apache.org/ and https://twitter.com/ApachePinot

About the Apache Incubator
The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects enter the ASF through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation is the world’s largest Open Source foundation, stewarding 227M+ lines of code and providing more than $22B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 850+ individual Members and 200 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with 8,200+ Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors that include Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Capital One, Cloudera, Comcast, Confluent, Didi Chuxing, Facebook, Google, Huawei, IBM, Indeed, Microsoft, Namebase, Pineapple Fund, Red Hat, Reprise Software, Talend, Tencent, Target, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Pinot", "Apache Pinot", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Thursday April 08, 2021

The Apache Software Foundation Announces Apache® DolphinScheduler™ as a Top-Level Project

Open Source distributed Big Data visual workflow scheduler system in use at thousands of organizations, including Budweiser, China Unicom, IDG Capital, IBM China, JD.com, Lenovo, New Oriental, Nokia China, Qihoo 360, SF Express, and Tencent, among others.


Wilmington, DE —8 April 2021— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® DolphinScheduler™ as a Top-Level Project (TLP).


Apache DolphinScheduler is a distributed, extensible visual Big Data workflow scheduler system. The project was first created at Analysys in December 2017, and entered the Apache Incubator in August 2019.


"We learned a lot about becoming a strong Open Source project during our time in the Apache Incubator," said Lidong Dai, Vice President of Apache DolphinScheduler. "Our incubation mentors helped guide us with developing our project and community the Apache Way. We are pleased to have graduated as an Apache Top-Level Project."


As a distributed and extensible data workflow scheduler platform with rich directed acyclic graph (DAG) visual interfaces, DolphinScheduler solves complex task dependencies and triggers in the data pipeline. Out-of-the-box, its easy-to-extend processing connects numerous systems to 100,000-level data task scheduling. Apache DolphinScheduler is:

  • Cloud Native —support multi-cloud/data center workflow  management, also supports Kubernetes, Docker deployment and custom task types, distributed scheduling, with overall scheduling capability increased linearly with the scale of the cluster

  • Highly Reliable —decentralized multi-master and multi-worker, high availability, supported by itself, overload processing

  • User-Friendly —all process definition operations are visualized, defines key information at a glance, one-click deployment

  • Supports Rich Scenarios —includes streaming, pause, recover operation, multi-tenant, and additional task types such as spark, hive, mr, shell, python, flink, sub_process, and more.

"Apache DolphinScheduler is designed for cloud-native," added Dai. "We are proud to have built a reliable and cloud friendly data workflow system while using next generation architecture and smart UI design."


Apache DolphinScheduler has more than 4,000 users in China, with Internet companies and banks forming a large percentage of users. Users include Budweiser, China Unicom, IDG Capital, IBM China, JD.com, Lenovo, New Oriental, Nokia China, Qihoo 360, SF Express, and Tencent, among others.


"Apache DolphinScheduler is an excellent data workflow open-source product," said Zhengjun Yin, Architect at China Unicom. "Its community is very friendly and gives us strong support. We save the cost of hundreds of human-months by using DolphinScheduler!"


"Apache DolphinScheduler is amazing," said Xide Gu, Architect at JD Logistics. "JD Logistics used Apache DolphinScheduler as  a stable and powerful platform to connect and control the data flow from various data sources in JDL, such as SAP Hana and Hadoop. It offers open API, easy plug-in and stable data flow development and scheduler environment. DolphinScheduler really helps JD Logistics data team accelerate development efficiency in many Agile BI projects!"

"I am honored to guide the DolphinScheduler community from day one of the incubating. In the past 1.5 years, it grows fast and healthy," said Sheng Wu, ASF Board Member and DolphinScheduler Incubator Champion. "They learned the Apache culture quickly, and have great executive capability. It is great to see the project graduating from the incubator with a diverse and active community. Being a top-level project is a new beginning for you, look forward to becoming a global and powerful project." "I am honored to witness the entire process of DolphinScheduler from open source to entry into the Apache incubator, and then to graduation to become an independent Apache top-level project," said Shi Shaofeng, Member of the Apache Kylin and Apache Incubator Project Management Committees. "During more than one year, the participants in the DolphinScheduler community have been adhering to the open-source spirit, constantly innovating and making progress. The developers and contributors join in the community constantly and make DolphinScheduler, a big data scheduling tool created by the Chinese, become more and more perfect, more and more users, and enter a virtuous cycle of development. It is expected that after graduation from the incubator, she will continue to move forward under the management of PMCs and create more value for society and the public through open-source software." "Congratulations to open source project DolphinScheduler for graduating from the Apache incubator and becoming ASF's top project," said Chen Liang, Vice President of Apache CarbonData. "DolphinScheduler has been developing the community in accordance with the Apache Way and has attracted many open-source developers to join. With the joint efforts of community members, the project has become more and more mature. Best wishes to the DolphinScheduler community!"


"We look forward to diversifying the Apache DolphinScheduler community with seed users from all over the world," added Dai. "Those interested in participating are welcome to reach out to us on our project mailing lists and other channels."


Catch Apache DolphinScheduler in action at its global MeetUp, held online in collaboration with the Apache ShardingSphere community, on 15 May 2021. Members of the DolphinScheduler and ShardingSphere Project Management Committees will share features and use cases on both projects in English. To register, visit https://www.meetup.com/dolphinscheduler-meetup-group/

Availability and Oversight

Apache DolphinScheduler software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache DolphinScheduler, visit https://DolphinScheduler.apache.org/ , https://twitter.com/DolphinSchedule , and https://asf-dolphinscheduler.slack.com/ .


About the Apache Incubator

The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects enter the ASF through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/ 


About The Apache Software Foundation (ASF)

Established in 1999, The Apache Software Foundation is the world’s largest Open Source foundation, stewarding 227M+ lines of code and providing more than $20B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 813 individual Members and 200 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with nearly 8,100 Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Budget Direct, Capital One, Cloudera, Comcast, Confluent, Didi Chuxing, Facebook, Google, Handshake, Huawei, IBM, Microsoft, Namebase, Pineapple Fund, Red Hat, Reprise Software, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF  


© The Apache Software Foundation. "Apache", "DolphinScheduler", "Apache DolphinScheduler", "ShardingSphere", "Apache ShardingSphere", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Tuesday February 16, 2021

The Apache Software Foundation Announces Apache® Gobblin™ as a Top-Level Project

Open Source distributed Big Data integration framework in use at Apple, CERN, Comcast, Intel, LinkedIn, Nerdwallet, PayPal, Prezi, Roku, Sandia National Labs, Swisscom, Verizon, and more.

Wilmington, DE —16 February 2021— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Gobblin™ as a Top-Level Project (TLP).

Apache Gobblin is a distributed Big Data integration framework used in both streaming and batch data ecosystems. The project originated at LinkedIn in 2014, was open-sourced in 2015, and entered the Apache Incubator in February 2017.

"We are excited that Gobblin has completed the incubation process and is now an Apache Top-Level Project," said Abhishek Tiwari, Vice President of Apache Gobblin and software engineering manager at LinkedIn. "Since entering the Apache Incubator, we have completed four releases and grown our community the Apache Way to more than 75 contributors from around the world."

Apache Gobblin is used to integrate hundreds of terabytes and thousands of datasets per day by simplifying the ingestion, replication, organization, and lifecycle management processes across numerous execution environments, data velocities, scale, connectors, and more.

"Originally creating this project, seeing it come to life and solve mission-critical problems at many companies has been a very gratifying experience for me and the entire Gobblin team," said Shirshanka Das, Founder and CTO at Acryl Data, and member of the Apache Gobblin Project Management Committee.

As a highly scalable data management solution for structured and byte-oriented data in heterogeneous data ecosystems, Apache Gobblin makes the arduous task of creating and maintaining a modern data lake easy. It supports the three main capabilities required by every data team: 

  • Ingestion and export of data from a variety of sources and sinks into and out of the data lake while supporting simple transformations. 
  • Data Organization within the lake (e.g. compaction, partitioning, deduplication).
  • Lifecycle and Compliance Management of data within the lake (e.g. data retention, fine-grain data deletions) driven by metadata.

"Apache Gobblin supports deployment models all the way from a single-process standalone application to thousands of containers running in cloud-native environments, ensuring that your data plane can scale with your company’s growth," added Das.

Apache Gobblin is in use at Apple, CERN, Comcast, Intel, LinkedIn, Nerdwallet, PayPal, Prezi, Roku, Sandia National Laboratories, Swisscom, and Verizon, among many others.

"We chose Apache Gobblin as our primary data ingestion tool at Prezi because it proved to scale, and it is a swiss army knife of data ingestion," said Tamas Nemeth, Tech Lead and Manager at Prezi. "Today, we ingest, deduplicate, and compact more than 1200 Apache Kafka topics with its help, and this number is still growing. We are looking forward to continuing to contribute to the project and helping the community enable other companies to use Apache Gobblin."

"Apache Gobblin has been at the center stage of the data management story at LinkedIn. We leverage it for various use-cases ranging from ingestion, replication, compaction, retention, and more," said Kapil Surlaker, Vice President of Engineering at LinkedIn. "It is battle-tested and serves us well at exabyte scale. We firmly believe in the data wrangling capabilities that Gobblin has to offer, and we will continue to contribute heavily and collaborate with the Apache Gobblin community. We are happy to see that Gobblin has established itself as an industry standard and is now an Apache Top-Level Project."

"Open community and meritocracy are the key drivers for Apache Gobblin's success," added Tiwari. "We invite everyone interested in the data management space to join us and help shape the future of Gobblin."

Catch Apache Gobblin in action in the upcoming hackathon planned for late Q1 2021. Details will be posted on the Apache Gobblin mailing lists and Twitter feed listed below.

Availability and Oversight
Apache Gobblin software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Gobblin, visit https://gobblin.apache.org/ and https://twitter.com/ApacheGobblin 

About the Apache Incubator
The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects enter the ASF through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/ 

About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation is the world’s largest Open Source foundation, stewarding 227M+ lines of code and providing more than $20B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 813 individual Members and 206 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with nearly 8,000 Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Budget Direct, Capital One, Cloudera, Comcast, Didi Chuxing, Facebook, Google, Handshake, Huawei, IBM, Microsoft, Pineapple Fund, Red Hat, Reprise Software, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF 

© The Apache Software Foundation. "Apache", "Gobblin", "Apache Gobblin", "Hadoop", "Apache Hadoop", "MapReduce", "Apache MapReduce", "Mesos", "Apache Mesos", "YARN", "Apache YARN", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday February 03, 2021

The Apache Software Foundation Announces Apache® DataSketches™ as a Top-Level Project

Open Source high-performance Big Data streaming algorithm library in use at Nielsen Identity, Permutive, Splice Machine, and Verizon Media, among others.

Wilmington, DE —3 February 2021— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® DataSketches™ as a Top-Level Project (TLP).

Apache DataSketches is a highly performant Big Data analysis library for scalable approximate algorithms. The project originated at Yahoo in 2012, was open-sourced in 2015, and entered the Apache Incubator in March 2019.

"We are excited to be part of the ASF," said Lee Rhodes, Vice President of Apache DataSketches. "We have learned a great deal from the incubation process and look forward to working with new users of our library that want to take advantage of sketching technology."

Apache DataSketches’s library of specialized streaming algorithms —known as sketches— comprise small data structures that process data at massive scale. Sketches are ideal for queries that cannot afford the time or huge compute resources needed to generate exact results. Where approximate results are acceptable, sketches are the only viable alternative for interactive queries with real-time analysis. Apache DataSketches is:

  • Fast —produces approximate results at orders of magnitude faster than traditional methods -- user configurable size vs accuracy tradeoff;
  • Efficient —sketch algorithms process data in a single pass for both real-time and batch;
  • Mergeable —allows for parallelization;
  • Optimized for large-scale computing environments that process Big Data —such as Apache Hadoop, Apache Spark, Apache Druid, Apache Hive, Apache Pig, PostgreSQL;
  • Binary compatible across multiple languages and platforms —available in Java, C++, and Python;
  • Expanded Analysis —including count distinct with set operations, quantiles, most frequent items (heavy hitters), matrix computations, and more; and
  • Mathematically defined and proven error properties —provides a priori and a posteriori error estimation and upper and lower bounds with statistically derived confidence intervals.

Apache DataSketches is used in large-scale computing environments such as Nielsen Identity, Permutive, Splice Machine, and Verizon Media, among others, as well as Apache Druid and Apache Pinot (incubating).

"The Apache DataSketches project takes powerful algorithms for data summarization and analysis, and makes them available to everyone," said Professor Graham Cormode of the University of Warwick. "While these methods are tremendously useful in practice, their descriptions were previously only in highly technical scientific papers. This project has made robust, dependable and well-documented implementations available to all. Already the library has been used for a wide range of applications, including service quality, monitoring, ad analytics and the sciences."

"Using Apache DataSketches has enabled Apache Druid users to perform common tasks such as quantiles and unique counting in a highly performant and efficient manner," said Gian Merlino, Vice President of Apache Druid. "We have worked closely together over the years to make the power of DataSketches accessible to Apache Druid users, helping us provide real-time analytics at scale."

"Sketches are fundamental to calculating many of our key company metrics," said Tom Miller, Director of Software Development Engineering at Verizon Media. "It allows us to greatly simplify our data processing and reduce storage costs by allowing us to calculate non-additive metrics across user specified dimension combinations at report time instead of having to either retain raw data or pre-calculate for each set of dimensions."

"Combining Apache Druid and DataSketches allows us to provide our customers real-time insights into their target audiences and advertising campaigns," said Yakir Buskilla, Senior Vice President of Research and Development and General Manager Israel at Nielsen Identity. "The ability to evaluate set expressions make the Theta Sketch especially powerful for multi-set cardinality estimation as well as funnel analysis."

“Apache DataSketches has provided us with a solid theoretical foundation upon which we are able to store and process data at scale - in a simple, fast and cost-efficient manner," said David Cromberge, Senior Software Engineer at Permutive. "It has been a pleasure to engage with their creators and community who have been helpful at every step of the way.”

"We use DataSketches's Theta-Sketches for distinct-count aggregations that are used to solve large multi-set cardinality approximation," said Mayank Shrivastava, Committer and member of the Apache Pinot (incubating) Podling Project Management Committee. "The ability to evaluate set expressions make the Theta Sketch especially powerful for multi-set cardinality estimation as well as funnel analysis."

"We welcome those interested in streaming algorithms to visit us, learn about this exciting technology, and contribute to Apache DataSketches to make our project even better," added Rhodes.

Availability and Oversight
Apache DataSketches software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache DataSketches, visit https://datasketches.apache.org .

About the Apache Incubator
The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects enter the ASF through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/ .

About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation is the world’s largest Open Source foundation, stewarding 227M+ lines of code and providing more than $20B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 813 individual Members and 206 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with nearly 8,000 Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Budget Direct, Capital One, Cloudera, Comcast, Didi Chuxing, Facebook, Google, Handshake, Huawei, IBM, Microsoft, Pineapple Fund, Red Hat, Reprise Software, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF .

© The Apache Software Foundation. "Apache", "DataSketches", "Apache DataSketches", "Druid", "Apache Druid", "Hadoop", "Apache Hadoop", "Hive", "Apache Hive", "Pig", "Apache Pig", "Pinot (incubating)", "Apache Pinot (incubating)", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Thursday January 21, 2021

The Apache Software Foundation Announces Apache® Superset™ as a Top-Level Project

Open Source enterprise-grade Big Data visualization and business intelligence Web application in use at Airbnb, American Express, Dropbox, Lyft, Netflix, Nielsen, Rakuten Viki, Twitter, and Udemy, among others.

Wilmington, DE —21 January 2021— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Superset™ as a Top-Level Project (TLP).

Apache Superset is a modern, Open Source data exploration and visualization platform that  enables users to easily and quickly build and explore dashboards using its simple no-code visualization builder and state-of-the-art SQL editor. The project originated at Airbnb in 2015 and entered into the Apache Incubator program in May 2017.

"It's been amazing to be an active part of growing a welcoming, diverse and engaged community over the past five years while following the ASF principles around inclusion, openness and collaboration," said Maxime Beauchemin, Vice President of Apache Superset. "At the scale and level of diversity that the Superset project has achieved, it's critical to have a solid governance model in place like the one prescribed by the ASF."

Apache Superset v1.0
Superset helps streamline the analytics process by providing an intuitive interface to rapidly explore and visualize datasets, create interactive dashboards, and model real-time business intelligence insights at scale. The platform integrates with most SQL speaking data sources, including modern cloud-native databases, data warehouses, and engines at petabyte scale. 

The Project also celebrates a major milestone with the release of Apache Superset 1.0. Features include: 

  • Rich library of visualizations with support for integrating custom visualizations
  • Thin caching layer to optimize performance of charts and dashboards 
  • Code-free visualization builder
  • State-of-the-art SQL editor and metadata workflow
  • Extensible enterprise authentication and security model 
  • Easy-to-use, lightweight semantic layer
  • Notification alerts and scheduled reports


"Apache Superset 1.0 is a solid, mature, self-standing solution that fully solves business intelligence and data visualization needs for modern data teams," added Beauchemin. "Superset not only covers the table stakes, but also offers guarantees, features and a fresh approach that existing BI solutions can't match."

Apache Superset is in use at Airbnb, American Express, Dropbox, Lyft, Netflix, Nielsen, Rakuten Viki, Twitter, and Udemy, among others. A list of known users is available at https://github.com/apache/superset/blob/master/INTHEWILD.md .

"Apache Superset helps Airbnb democratize data insights and make data-informed decisions," said Jeff Feng, Product Lead at Airbnb and member of the Apache Superset Project Management Committee. "Superset uniquely connects SQL analysis with data exploration for thousands of our employees each week. It also serves as a flexible and reliable platform for visualizing metrics, helping executives and knowledge workers see and understand data."

"We had an amazing journey with Superset at Dropbox," said Chloe Wang, Senior Product Manager, Data Insights Platform at Dropbox. "Superset got introduced in 2019 and soon became the most widely adopted query engine within the analytical organization. As a result, our analysts are able to make timely and high confidence product decisions."

"Before Superset, we were paying for a patchwork of proprietary tools and we kept running into limitations when it came to customizing charts and dashboards," said Amit Miran, Software Team Lead for Media Application Framework group at Nielsen. "Once the Superset project supported adding of custom visualizations, that was the turning point for us at Nielsen to start adopting Superset in large projects. We’re very excited about native dashboard filters and future support for cross filtering, which will make our viz plugins even more powerful. The excitement for the project drove me to become involved in my first open source project."

"Apache Superset is an amazing project that enables engineers to easily execute data analysis," said Grace Guo, member of the Apache Superset Project Management Committee. "I have been a Superset user and a Superset builder for a few years. I run queries in SQL Lab, visualize data using one of the many supported chart types, and build dashboards, specifically focusing on performance and product adoption metrics. As an engineer, I appreciate the ability to contribute to the product. If I see some area to improve, or need a feature which doesn’t exist, I am happy to create a PR to fix it for myself and benefit other users."

"Apache Superset’s strength lies in its community," added Beauchemin. "We invite those interested in data visualization to join our mailing lists and help shape future versions of Superset."

Learn more about the latest in v1.0 at the Apache Superset community global MeetUp on 28 January. Registration is open to all and free of charge https://s.apache.org/3cm4f 


Availability and Oversight
Apache Superset software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Superset, visit https://superset.apache.org/


About the Apache Incubator
The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects enter the ASF through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation is the world’s largest Open Source foundation, stewarding 227M+ lines of code and providing more than $20B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 813 individual Members and 206 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with nearly 8,000 Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Budget Direct, Capital One, Cloudera, Comcast, Didi Chuxing, Facebook, Google, Handshake, Huawei, IBM, Microsoft, Pineapple Fund, Red Hat, Reprise Software, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF


© The Apache Software Foundation. "Apache", "Superset", "Apache Superset", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Thursday June 04, 2020

The Apache Software Foundation Announces Apache® Hudi™ as a Top-Level Project

Open Source data lake technology for stream processing on top of Apache Hadoop in use at Alibaba, Tencent, Uber, and more.

Wakefield, MA —4 June 2020— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Hudi™ as a Top-Level Project (TLP).

Apache Hudi (Hadoop Upserts Deletes and Incrementals) data lake technology enables stream processing on top of Apache Hadoop compatible cloud stores & distributed file systems. The project was originally developed at Uber in 2016 (code-named and pronounced "Hoodie"), open-sourced in 2017, and submitted to the Apache Incubator in January 2019.

"Learning and growing the Apache way in the incubator was a rewarding experience," said Vinoth Chandar, Vice President of Apache Hudi. "As a community, we are humbled by how far we have advanced the project together, while at the same time, excited about the challenges ahead."

Apache Hudi is used to manage petabyte-scale data lakes using stream processing primitives like upserts and incremental change streams on Apache Hadoop Distributed File System (HDFS) or cloud stores. Hudi data lakes provide fresh data while being an order of magnitude efficient over traditional batch processing. Features include:

  • Upsert/Delete support with fast, pluggable indexing
  • Transactionally commit/rollback data
  • Change capture from Hudi tables for stream processing
  • Support for Apache Hive, Apache Spark, Apache Impala and Presto query engines
  • Built-in data ingestion tool supporting Apache Kafka, Apache Sqoop and other common data sources
  • Optimize query performance by managing file sizes, storage layout
  • Fast row based ingestion format with async compaction into columnar format
  • Timeline metadata for audit tracking

Apache Hudi is in use at organizations such as Alibaba Group, EMIS Health, Linknovate, Tathastu.AI, Tencent, and Uber, and is supported as part of Amazon EMR by Amazon Web Services. A partial list of those deploying Hudi is available at https://hudi.apache.org/docs/powered_by.html

"We are very pleased to see Apache Hudi graduate to an Apache Top-Level Project. Apache Hudi is supported in Amazon EMR release 5.28 and higher, and enables customers with data in Amazon S3 data lakes to perform record-level inserts, updates, and deletes for privacy regulations, change data capture (CDC), and simplified data pipeline development," said Rahul Pathak, General Manager, Analytics, AWS. “We look forward to working with our customers and the Apache Hudi community to help advance the project."

"At Uber, Hudi powers one of the largest transactional data lakes on the planet in near real time to provide meaningful experiences to users worldwide," said Nishith Agarwal, member of the Apache Hudi Project Management Committee. "With over 150 petabytes of data and more than 500 billion records ingested per day, Uber’s use cases range from business critical workflows to analytics and machine learning."

"Using Apache Hudi, end-users can handle either read-heavy or write-heavy use cases, and Hudi will manage the underlying data stored on HDFS/COS/CHDFS using Apache Parquet and Apache Avro," said Felix Zheng, Lead of Cloud Real-Time Computing Service Technology at Tencent.

"As cloud infrastructure becomes more sophisticated, data analysis and computing solutions gradually begin to build data lake platforms based on cloud object storage and computing resources," said Li Wei, Technical Lead on Data Lake Analytics, at Alibaba Cloud. "Apache Hudi is a very good incremental storage engine that helps users manage the data in the data lake in an open way and accelerate users' computing and analysis."

"Apache Hudi is a key building block for the Hopsworks Feature Store, providing versioned features, incremental and atomic updates to features, and indexed time-travel queries for features," said Jim Dowling, CEO/Co-Founder at Logical Clocks. "The graduation of Hudi to a top-level Apache project is also the graduation of the open-source data lake from its earlier data swamp incarnation to a modern ACID-enabled, enterprise-ready data platform."

"Hudi's graduation to a top-level Apache project is a result of the efforts of many dedicated contributors in the Hudi community," said Jennifer Anderson, Senior Director of Platform Engineering at Uber. "Hudi is critical to the performance and scalability of Uber's big data infrastructure. We're excited to see it gain traction and achieve this major milestone."

"Thus far, Hudi has started a meaningful discussion in the industry about the wide gaps between data warehouses and data lakes. We have also taken strides to bridge some of them, with the help of the Apache community," added Chandar. "But, we are only getting started with our deeply technical roadmap. We certainly look forward to a lot more contributions and collaborations from the community to get there. Everyone’s invited!"

Catch Apache Hudi in action at Virtual Berlin Buzzwords 7-12 June 2020, as well as at MeetUps, and other events.

Availability and Oversight
Apache Hudi software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Hudi, visit http://hudi.apache.org/ and https://twitter.com/apachehudi 

About the Apache Incubator
The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects enter the ASF through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/ 

About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation (ASF) is the world’s largest Open Source foundation, stewarding 200M+ lines of code and providing more than $20B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 765 individual Members and 206 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with 7,600 Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Budget Direct, Capital One, CarGurus, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Huawei, IBM, Indeed, Inspur, Leaseweb, Microsoft, Pineapple Fund, Red Hat, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF 

© The Apache Software Foundation. "Apache", "Hudi", "Apache Hudi", "Hadoop", "Apache Hadoop", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday May 13, 2020

The Apache Software Foundation Announces the 10th Anniversary of Apache® HBase™

Open Source distributed, scalable Big Data store celebrates a decade of processing zettabytes of data across highly scalable large tables for the Apache Hadoop ecosystem 

Wakefield, MA —13 May 2020— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the tenth Anniversary of Apache® HBase, the distributed, scalable data store for the Apache Hadoop Big Data ecosystem.

"The success of Apache HBase is the success of Open Source," said Duo Zhang, Vice President of Apache HBase. "Ten years after graduating as a TLP, HBase is still among the most active projects at the ASF. We have hundreds of contributors all around the world. We speak different languages, we have different skills, but we all work together to make HBase better and better. Ten year anniversary is not the end, but a new beginning, I believe our strong community will lead the project to a bright future."

HBase originated at Powerset in 2006 as an Open Source system to run on Apache Hadoop’s Distributed File System (HDFS), similar to how BigTable ran on top of the Google File System. In 2007, a significant code contribution was added to the Apache Hadoop codebase and was integrated into the Apache Hadoop 0.15.0 release later that year. Development on HBase continued as a sub-project of Apache Hadoop, and graduated as an Apache Top-Level Project (TLP) in April 2010.

An Open Source, versioned, non-relational database, Apache HBase provides low latency random access to very large tables —billions of rows and millions of columns— atop clusters of non-specialized, commodity hardware. HBase reads, writes, and processes structured, semi-structured, and unstructured data in real-time environments.

Apache HBase is in use at thousands of organizations, including Adobe, Airbnb, Alibaba, Bloomberg, Flipkart, Huawei, HP, Hubspot, IBM, Microsoft, NetEase, Pinterest, Salesforce, Shopee, Tencent, Twitter, Xiaomi, and Yahoo! (now Verizon Media), among others.

Testimonials

"Congratulations on the 10th birthday of Apache HBase! Alibaba started to use HBase since January 2011 and has witnessed its growth and come along with the community through the years. The Apache HBase community has always been an open and powerful team that produced many stable, production-ready and widely used versions. Today at Alibaba, we have HBase clusters with more than 10k nodes serving hundreds of petabytes of data, as well as  more than 1,000 enterprise HBase users on Alibaba Cloud. We will continue collaborating with and contributing to the HBase community and wish us all ongoing success in future!"
—Chunhui Shen and Yu Li, members of the HBase team at Alibaba

"I have worked with Apache HBase for many years and I think it is a great product. it does what it says on the tin so to speak. Ironically if you look around the NoSQL competitors, most of them are supported by start-ups, whereas HBase is only supported as part of Apache suite of products by vendors like Cloudera, Hortonworks, MapR, etc. For those who would prefer to use SQL on top, there is Apache Phoenix around which makes life easier for the most SQL-savvy world to work on HBase: problem solved. For TCO, HBase is still value for money compared to others. You don't need expensive RAM or SSD with HBase. That makes it easy to onboard it in no time. Also HBase can be used in a variety of different business applications, whereas other commercial ones  are focused on narrower niche markets. Least but last happy 10th anniversary and hope HBase will go from strength to strength and we will keep using it for years to come!"
—Dr. Mich Talebzadeh, Chief Data Architect, Big Data

"Congratulations on the 10th anniversary of Apache HBase! Xiaomi started to use HBase in 2012, when our business started booming. Many key Xiaomi products and services, as well as Xiaomi's data analytics platform, require a new system to provide quick and random access to billions of rows of structured and semi-structured data. Traditional solutions are not able to handle the large volume of data brought by the quickly increasing Xiaomi user base. Among several available options, we choose HBase not only because it provides a rich set of features and excellent performance specs, but also because it has a very active, open and friendly community. Embracing open source has been part of Xiaomi's engineering culture, and our deep involvement in the development of Apache HBase demonstrates the best practices of Xiaomi's open source strategy. In the past several years, we have contributed tons of bug fixes and important features to HBase, and, in the meantime, we have contributed 9 committers and 3 PMC members to the HBase community. Looking forward, we will continue to work closely with the Apache HBase community to help the project grow, and we wish Apache HBase a wonderful future!"
—Dr. Baoqiu Cui, Vice President of Xiaomi Corporation and Technical Committee Chairman

“Congratulations on the 10th anniversary of Apache HBase, it’s great to see how the project has developed over the years and continues to have good community support around it! Salesforce has a large global footprint of Apache HBase in production storing multiple petabytes of customer data and serving several billions of queries per day for a wide variety of use cases including security, monitoring, collaboration portals, and performance caches to scale over RDBMS limitations. HBase has played a major role in Salesforce’s customer success in the BigData storage space and we continue to invest in it as one of the pillars of our multi-substrate database strategy along with Apache Phoenix for SQL access to data stored in HBase. We have contributed many features and bug fixes to HBase over the last several years, and we look forward to continue working with the Apache HBase community to develop the project further. Here’s to many more successful years for Apache HBase!”
—Sanjeev Lakshmanan, Senior Director, Software Development, Salesforce

“Happy 10th Apache HBase! It was around 8 years ago that we started looking at HBase to include as part of our Hosted Big Data Services stack. Fast-forward to today and it continues to be a critical offering in our stack, powering a diverse set of use cases and workloads such as ad targeting, content personalization, analytics, security, monitoring, etc. HBase enables these diverse workloads thanks to it’s high-scalability, feature set and performance, all of which have been continuously refined through the years. In turn our footprint continues to grow storing petabytes of data across thousands of machines. Our success is in part thanks to the project’s success as we benefit from our collaborations, the contributions and other efforts by the community (eg mailing list, meetups, HBaseCon, etc). This is a testament to the open, friendly and dedicated community around Apache HBase which is necessary for the success of any open source project. We wish the project continued success for years to come as we continue to collaborate with and be part of the community cultivating the project.”
—Francis Liu and Thiruvel Thirumoolan,  HBase Big Data Team Members, Yahoo! (now Verizon Media)

“Congratulations on the 10th anniversary of Apache HBase! It’s great to see how this project has evolved from a big data project to one that runs business critical systems and continues to accelerate with a growing community and increasing pace of development! Cloudera has over 500 customers in production using it for a range of use cases ranging from mission critical transactional applications to supporting data warehousing. Our largest customers have footprints in excess of 7,000 nodes storing over 70PB of data. Our customers choose HBase because of its resilience with some customers able to realize 100% application uptime using HBase (over the past 3 years). We plan to continue to invest in HBase (and Apache Phoenix) to ensure that we can continue to both broaden support for a variety of hybrid transactional and analytical use cases and deepen support for existing use cases. Here's to many more successful years!"
—Arun C. Murthy, Chief Product Officer, Cloudera

“Many Congratulations to the Apache HBase community on the 10th anniversary. Apache HBase provides rich functions and excellent performance, and has an open and friendly community. Huawei started using HBase since 2010: HBase is widely used by multiple solutions of Huawei running on more than 10,000 nodes, storing hundreds of PBs data to meet our requirements. Huawei FusionInsight provides the Best Practices of Huawei for HBase, which serves a lot of customers across many industries such as finance, operators, government, energy, medical, manufacturing, and transportation. Meanwhile, Huawei team members contributed a lot of bug fixes and features to HBase, successfully hosted the first HBase Asia Technology Conference HBaseCon Asia 2017 at Shenzhen. Going forward, Huawei will continue to work closely with the Apache HBase community to promote community development.”
—Wei Zhi, Kai Mo and Pankaj Kumar, members of the HBase team at Huawei

“Happy 10th anniversary, HBase! At Ultra Tendency, you have been the backbone of our Dual Lambda Streaming Architecture for many years! You have served billions of queries to our customers without interruption and at low latency. Your architecture guaranteed that you were always there when we needed you, never letting us or our customers down. You are the reason why our European clients today are running flourishing new business models backed by low-latency streaming products. Our committers and contributors will continue to fix bugs and provide feature enhancements. Ultra Tendency wishes you a bright and successful future!”
—Jan Hentschel, Chief Information Officer, Ultra Tendency

“Congratulations on the 10th anniversary of Apache HBase, I can't believe it's been 10 years since the first day when I tried to use Apache HBase and its ecosystem to help the business and company. Also, it is so great to see many colleagues and friends work, discuss, cooperate together to make this system become better. Some of them also make great career development and some are still progress. Shopee, one of the biggest e-commerce platforms in Southeast Asia, has several large Apache HBase clusters in production to support businesses that depend on several billions of queries per day. Apache HBase has played a significant role in Shopee and it is still in expansion along with the business growth of Shopee. Apache HBase, as well as the community, helps us a lot and we also will continue to make contributions to Apache HBase. Looking forward to keeping working with the Apache HBase community to develop the project and its ecosystem further.”
—Li Luo, Manager of Data Infra department, Shopee

”At Microsoft, our mission is to empower every person and every organization on the planet to achieve more, and it’s this mission that drives our commitment to open source. Congratulations to the Apache HBase community on its 10th anniversary. Microsoft has been part of the vibrant HBase community since 2014, today we are proud to serve the numerous enterprise customers across industries who are leveraging HBase in Azure HDInsight for their most critical business applications.”
—Tomas Talius, Director of Engineering, Azure Data Services, Microsoft

Availability and Oversight
Apache HBase software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache HBase, visit http://hbase.apache.org/ and https://twitter.com/HBase 

About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation (ASF) is the world’s largest Open Source foundation, stewarding 200M+ lines of code and providing more than $20B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 813 individual Members and 206 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with 7,600+ Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, ARM, Baidu, Bloomberg, Budget Direct, Capital One, CarGurus, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Huawei, IBM, Indeed, Inspur, Leaseweb, Microsoft, ODPi, Pineapple Fund, Private Internet Access, Red Hat, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF 


© The Apache Software Foundation. "Apache", "HBase", "Apache HBase", "Hadoop", "Apache Hadoop", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Tuesday February 19, 2019

The Apache® Software Foundation Announces Apache Arrow™ Momentum

Open Source Big Data in-memory columnar layer adopted by dozens of Open Source and commercial technologies; exceeded 1,000,000 monthly downloads within first three years as an Apache Top-Level Project

Wakefield, MA —19 February 2019— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, today announced momentum with Apache® Arrow™, the Open Source Big Data in-memory columnar layer.

Since the founding of the project in January 2016, Apache Arrow has quickly become the defacto standard for representing and processing analytical data in memory, accelerating analytical processing and interchange by more than 100x.

"When we became a Top-Level Project, we projected that the majority of the world's data will be processed through Arrow within the next decade," said Jacques Nadeau, Vice President of Apache Arrow. "In just three years time, we are proud to see Arrow's substantial industry adoption and increased value across a wide range of analytical, machine learning, and artificial intelligence workloads."

Highlights of Apache Arrow's success include:

Industry Adoption —more than 20 major technologies adopted Arrow to accelerate in-memory analytics, including Apache Spark, NVIDIA RAPIDS, pandas, and Dremio, among others. A list of known Open Source and commercial implementations can be found at https://arrow.apache.org/powered_by/

Millions of Downloads —leveraging and integrating Apache Arrow into many other technologies has bolstered downloads to more than 1,000,000 each month.

New Language Support —as a cross-language development platform, supporting multiple programming languages is paramount. Apache Arrow has grown from supporting one language to eleven different languages today; they include C++, Java, Python, R, C#, Javascript, and Ruby, among others.

Seamless Data Format Support —Arrow supports different data types, both simple and nested, located in arbitrary memory such as regular system RAM, memory-mapped files or on-GPU memory. In addition, it can ingest data from popular storage formats such as Apache Parquet, CSV files, Apache ORC, JSON, and more.

Major Code Donations —Apache Arrow's new features and expanded functionality are due in part to code and component donations that include:
  • C# Library
  • Gandiva LLVM-based Expression Compiler
  • Go Library
  • Javascript Library
  • Plasma Shared Memory Object Store
  • Ruby Libraries (Apache Arrow and Apache Parquet)
  • Rust Libraries (Parquet and DataFusion Query Engine)
Community and Contributor Growth —over the past 12 months, nearly 300 individuals have submitted more than 3,000 contributions that have grown the Apache Arrow code base by 300,000 lines of code. The Arrow community is welcoming approximately 10 new contributors each month.


In January the project announced its most recent release, Apache Arrow 0.12.0, which reflects more than 600 enhancements developed during Q4 2018. The Apache Arrow community is actively working on a number of impactful new initiatives that include solving high performance analytical problems and allowing for more efficient data distribution across entire clusters.

"Apache Arrow's rapid industry adoption and developer community growth supports our original thesis of the importance of a language-independent open standard for columnar data," said Wes McKinney, member of the Apache Arrow Project Management Committee, and creator of Python's pandas project. "Additionally, we are seeing productive collaborations take place not only between programming languages but also between the database systems and data science worlds. We look forward to welcoming more data system developers into our community."

About Apache Arrow
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust.

Availability and Oversight
Apache Arrow software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Arrow, visit http://arrow.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 730 individual Members and 7,000 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official global conference series. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Anonymous, ARM, Baidu, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Hortonworks, Huawei, IBM, Indeed, Inspur, LeaseWeb, Microsoft, Oath, ODPi, Pineapple Fund, Pivotal, Private Internet Access, Red Hat, Target, Tencent, Union Investment, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Arrow", "Apache Arrow", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday January 23, 2019

The Apache Software Foundation Announces Apache® Hadoop® v3.2.0

Pioneering Open Source distributed enterprise framework powers US$166B Big Data ecosystem

Wakefield, MA —23 January 2019— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, today announced Apache® Hadoop® v3.2.0, the latest version of the Open Source software framework for reliable, scalable, distributed computing.

Now in its 11th year, Apache Hadoop is the foundation of the US$166B Big Data ecosystem (source: IDC) by enabling data applications to run and be managed on large hardware clusters in a distributed computing environment. "Apache Hadoop has been at the center of this big data transformation, providing an ecosystem with tools for businesses to store and process data on a scale that was unheard of several years ago," according to Accenture Technology Labs.

"This latest release unlocks the powerful feature set the Apache Hadoop community has been working on for more than nine months," said Vinod Kumar Vavilapalli, Vice President of Apache Hadoop. "It further diversifies the platform by building on the cloud connector enhancements from Apache Hadoop 3.0.0 and opening it up for deep learning use-cases and long-running apps."

Apache Hadoop 3.2.0 highlights include:
  • ABFS Filesystem connector —supports the latest Azure Datalake Gen2 Storage;
  • Enhanced S3A connector —including better resilience to throttled AWS S3 and DynamoDB IO;
  • Node Attributes Support in YARN —helps to tag multiple labels on the nodes based on its attributes and supports placing the containers based on expression of these labels;
  • Storage Policy Satisfier  —supports HDFS (Hadoop Distributed File System) applications to move the blocks between storage types as they set the storage policies on files/directories; 
  • Hadoop Submarine —enables data engineers to easily develop, train and deploy deep learning models (in TensorFlow) on very same Hadoop YARN cluster;
  • C++ HDFS client —helps to do async IO to HDFS which helps downstream projects such as Apache ORC;
  • Upgrades for long running services —supports in-place seamless upgrades of long running containers via YARN Native Service API (application program interface) and CLI (command-line interface).

"This is one of the biggest releases in Apache Hadoop 3.x line which brings many new features and over 1,000 changes," said Sunil Govindan, Apache Hadoop 3.2.0 release manager. "We are pleased to announce that Apache Hadoop 3.2.0 is available to take your data management requirements to the next level. Thanks to all our contributors who helped to make this release happen."

Apache Hadoop is widely deployed at numerous enterprises and institutions worldwide, such as Adobe, Alibaba, Amazon Web Services, AOL, Apple, Capital One, Cloudera, Cornell University, eBay, ESA Calvalus satellite mission, Facebook, foursquare, Google, Hortonworks, HP, Huawei, Hulu, IBM, Intel, LinkedIn, Microsoft, Netflix, The New York Times, Rackspace, Rakuten, SAP, Tencent, Teradata, Tesla Motors, Twitter, Uber, and Yahoo. The project maintains a list of educational and production users, as well as companies that offer Hadoop-related services at https://wiki.apache.org/hadoop/PoweredBy

Global Knowledge hails, "...the open-source Apache Hadoop platform changes the economics and dynamics of large-scale data analytics due to its scalability, cost effectiveness, flexibility, and built-in fault tolerance. It makes possible the massive parallel computing that today's data analysis requires."

Hadoop is proven at scale: Netflix captures 500+B daily events using Apache Hadoop. Twitter uses Apache Hadoop to handle 5B+ sessions a day in real time. Twitter’s 10,000+ node cluster processes and analyzes more than a zettabyte of raw data through 200B+ tweets per year. Facebook’s cluster of 4,000+ machines that store 300+ petabytes is augmented by 4 new petabytes of data generated each day. Microsoft uses Apache Hadoop YARN to run the internal Cosmos data lake, which operates over hundreds of thousands of nodes and manages billions of containers per day.

Transparency Market Research recently reported that the global Hadoop market is anticipated to rise at a staggering 29% CAGR with a market valuation of US$37.7B by the end of 2023.

Apache Hadoop remains one of the most active projects at the ASF: it ranks #1 for Apache project repositories by code commits, and is the #5 repository by size (3,881,797 lines of code).

"The Apache Hadoop community continues to go from strength to strength in further driving innovation in Big Data," added Vavilapalli. "We hope that developers, operators and users leverage our latest release in fulfilling their data management needs."

Catch Apache Hadoop in action at the Strata conference, 25-28 March 2019 in San Francisco, and dozens of Hadoop MeetUps held around the world, including on 30 January 2019 at LinkedIn in Sunnyvale, California.

Availability and Oversight
Apache Hadoop software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Hadoop, visit http://hadoop.apache.org/ and https://twitter.com/hadoop

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 730 individual Members and 7,000 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official global conference series. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Anonymous, ARM, Baidu, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Hortonworks, Huawei, IBM, Indeed, Inspur, LeaseWeb, Microsoft, Oath, ODPi, Pineapple Fund, Pivotal, Private Internet Access, Red Hat, Target, Tencent, and Union Investment. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Hadoop", "Apache Hadoop", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Tuesday January 08, 2019

The Apache Software Foundation Announces Apache® Airflow™ as a Top-Level Project

Open Source Big Data workflow management system in use at Adobe, Airbnb, Etsy, Google, ING, Lyft, PayPal, Reddit, Square, Twitter, and United Airlines, among others.

Wakefield, MA —8 January 2019— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Airflow™ as a Top-Level Project (TLP).

Apache Airflow is a flexible, scalable workflow automation and scheduling system for authoring and managing Big Data processing pipelines of hundreds of petabytes. Graduation from the Apache Incubator as a Top-Level Project signifies that the Apache Airflow community and products have been well-governed under the ASF's meritocratic process and principles.

"Since its inception, Apache Airflow has quickly become the de-facto standard for workflow orchestration," said Bolke de Bruin, Vice President of Apache Airflow. "Airflow has gained adoption among developers and data scientists alike thanks to its focus on configuration-as-code. That has gained us a community during incubation at the ASF that not only uses Apache Airflow but also contributes back. This reflects Airflow’s ease of use, scalability, and power of our diverse community; that it is embraced by enterprises and start-ups alike, allows us to now graduate to a Top-Level Project."

Apache Airflow is used to easily orchestrate complex computational workflows. Through smart scheduling, database and dependency management, error handling and logging, Airflow automates resource management, from single servers to large-scale clusters. Written in Python, the project is highly extensible and able to run tasks written in other languages, allowing integration with commonly used architectures and projects such as AWS S3, Docker, Apache Hadoop HDFS, Apache Hive, Kubernetes, MySQL, Postgres, Apache Zeppelin, and more. Airflow originated at Airbnb in 2014 and was submitted to the Apache Incubator March 2016.

Apache Airflow is in use at more than 200 organizations, including Adobe, Airbnb, Astronomer, Etsy, Google, ING, Lyft, NYC City Planning, Paypal, Polidea, Qubole, Quizlet, Reddit, Reply, Solita, Square, Twitter, and United Airlines, among others. A list of known users can be found at https://github.com/apache/incubator-airflow#who-uses-apache-airflow

"Adobe Experience Platform is built on cloud infrastructure leveraging open source technologies such as Apache Spark, Kafka, Hadoop, Storm, and more," said Hitesh Shah, Principal Architect of Adobe Experience Platform. "Apache Airflow is a great new addition to the ecosystem of orchestration engines for Big Data processing pipelines. We have been leveraging Airflow for various use cases in Adobe Experience Cloud and will soon be looking to share the results of our experiments of running Airflow on Kubernetes." 

"Our clients just love Apache Airflow. Airflow has been a part of all our Data pipelines created in past 2 years acting as the ring-master and taming our Machine Learning and ETL Pipelines," said Kaxil Naik, Data Engineer at Data Reply. "It has helped us create a Single View for our client's entire data ecosystem. Airflow's Data-aware scheduling and error-handling helped automate entire report generation process reliably without any human-intervention. It easily integrates with Google Cloud (and other major cloud providers) as well and allows non-technical personnel to use it without a steep learning curve because of Airflow’s configuration-as-a-code paradigm."

"With over 250 PB of data under management, PayPal relies on workflow schedulers such as Apache Airflow to manage its data movement needs reliably," said Sid Anand, Chief Data Engineer at PayPal. "Additionally, Airflow is used for a range of system orchestration needs across many of our distributed systems: needs include self-healing, autoscaling, and reliable [re-]provisioning."

"Since our offering of Apache Airflow as a service in Sept 2016, a lot of big and small enterprises have successfully shifted all of their workflow needs to Airflow," said Sumit Maheshwari, Engineering Manager at Qubole. "At Qubole, not only are we a provider, but also a big consumer of Airflow as well. For example, our whole Insight and Recommendations platform is built around Airflow only, where we process billions of events every month from hundreds of enterprises and generate insights for them on big data solutions like Apache Hadoop, Apache Spark, and Presto. We are very impressed by the simplicity of Airflow and ease at which it can be integrated with other solutions like clouds, monitoring systems or various data sources."

"At ING, we use Apache Airflow to orchestrate our core processes, transforming billions of records from across the globe each day," said Rob Keevil, Data Analytics Platform Lead at ING WB Advanced Analytics. "Its feature set, Open Source heritage and extensibility make it well suited to coordinate the wide variety of batch processes we operate, including ETL workflows, model training, integration scripting, data integrity testing, and alerting. We have played an active role in Airflow development from the onset, having submitted hundreds of pull requests to ensure that the community benefits from the Airflow improvements created at ING.  We are delighted to see Airflow graduate from the Apache Incubator, and look forward to see where this exciting project will be taken in future!"

"We saw immediately the value of Apache Airflow as an orchestrator when we started contributing and using it," said Jarek Potiuk, Principal Software Engineer at Polidea. "Being able to develop and maintain the whole workflow by engineers is usually a challenge when you have a huge configuration to maintain. Airflow allows your DevOps to have a lot of fun and still use the standard coding tools to evolve your infrastructure. This is 'infrastructure as a code' at its best."

"Workflow orchestration is essential to the (big) data era that we live in," added de Bruin. "The field is evolving quite fast and the new data thinking is just starting to make an impact. Apache Airflow is a child of the data era and therefore very well positioned, and is also young so a lot of development can still happen. Airflow can use bright minds from scientific computing, enterprises, and start-ups to further improve it. Join the community, it is easy to hop on!"

Availability and Oversight
Apache Airflow software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Airflow, visit http://airflow.apache.org/ and https://twitter.com/ApacheAirflow

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 730 individual Members and 7,000 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Anonymous, ARM, Baidu, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Hortonworks, Huawei, IBM, Indeed, Inspur, LeaseWeb, Microsoft, Oath, ODPi, Pineapple Fund, Pivotal, Private Internet Access, Red Hat, Target, Tencent, and Union Investment. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Airflow", "Apache Airflow", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday December 12, 2018

The Apache Software Foundation Announces Apache® Griffin™ as a Top-Level Project

Open Source Big Data quality solution in use at eBay, Expedia, Huawei, JD.com, Meituan, PayPal, Pingan Bank, PPDAI, VIP.com, VMWare, and more.

Wakefield, MA —12 December 2018— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Griffin™ as a Top-Level Project (TLP).

Apache Griffin is a robust Open Source Big Data quality solution for distributed data systems at any scale. It provides a unified process to measure data quality from different perspectives, as well as building and validating trusted data assets in both streaming or batch contexts. Griffin originated at eBay and entered the Apache Incubator in December 2016.

"We are very proud of Griffin reaching this important milestone," said William Guo, Vice President of Apache Griffin. "By actively improving Big Data quality, Griffin helps build trusted data assets, therefore boosting your confidence in your business." 

Apache Griffin enables data scientists/analysts to handle data quality issues by:
  • Defining –specifying data quality requirements such as accuracy, completeness, timeliness, profiling, etc.;

  • Measuring –source data ingested into the Griffin computing cluster will apply data quality measurement based on user-defined requirements; and

  • Applying Metrics –data quality reports as metrics will be exported to designated destination.

In addition, Griffin allows users to easily onboard new requirements into the platform and write comprehensive logic to further define their data quality. 

Apache Griffin is in use in high volume, high demand environments at 163.com/Netease, eBay, Expedia, Huawei, JD.com, Meituan, PayPal, Pingan Bank, PPDAI, VIP.com, and VMWare, among others.

"eBay contributed Griffin to the Apache Incubator in December 2016 to ensure its future development in a community-driven manner. It started with the idea on how eBay could address the data quality issue across multiple systems, especially in streaming context," said Vivian Tian, VP of eBay, GM - China Center of Excellence. "Griffin brings data quality solution to data ecosystem and ensure data applications have a solid quality foundation. We are extremely happy to see Griffin graduate as an Apache Top Level Project, and look forward to continued innovation and collaboration with the Apache community."

"We have been using Apache Griffin for about two years, monitoring 1000+ tables with data quality metrics, and are very happy to see it graduate to a Top-Level Project," said Chao Zhu, Senior Director at VIPshop Finance. "Apache Griffin and its data quality DSL can help us easily identify data quality issues instantly on our big data platform. In addition, Apache Griffin's architecture is highly extensible. We are looking forward to using it in real time data quality management system. We also look forward to contribute some of our minor enhancement to Griffin back to the community."

"We appreciate the Griffin project which really helps so much in our daily data jobs.After years of struggling with the complexity of data quality issues, we turned to Apache Griffin for a new platform that would simplify our data quality pipeline," said Jianfeng Liu, Director of Real-time Data Department at PPDAI. "Because of Apache Griffin's unified model for both batch and stream processing, we've been able to replace legacy systems with one solution that works seamlessly in our production environment. Griffin DSLs have allowed us to dramatically simplify our pipeline and to reduce our efforts a lot. I'm very proud and excited to see that the project is graduating."

"Apache Griffin is one of the best data quality solutions which my team has been used so far. It has been an exciting journey seeing the Griffin community evolve rapidly. And many people iteratively adopting it and contributing to newer capabilities," said Austin Sun, Senior Engineering Manager, Enterprise Service Platform at PayPal. "In PayPal risk domain, we benefit a lot from Apache Griffin to provide high quality data to make precise decision and protect our customer. In addition to PayPal risk, I knew there are several other corporates also leverages core capability from Griffin as their data quality solution. It’s my great honor to witness Griffin grows to a top level project. Way to go, Griffin."

"Apache Griffin project is yet another showcase how community over code can work for projects coming out from internal usages of companies into the open source," said Henry Saputra, ASF member and Incubator Mentor for Apache Griffin. "I am proud to be the part of the projects and mentors for the project when it was being contributed from eBay, in addition to several other projects already donated to ASF such as Apache Kylin and Eagle. The team has worked tremendously hard to adapt the Apache Way, and also shown great respect for the open source community in all the processes for design, development, and release processes.As a Top-Level Project I believe the PMC will help lead the project to much more success in the future."

"Graduation is not the end, it is the beginning of another journey. We hope to take Apache Griffin to the next level with a wider set of features and users," added Guo. "We welcome anyone to join our efforts by helping with product design, documentation, code, technical discussions or promoting Apache Griffin in The Apache Way."

Availability and Oversight
Apache Griffin software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Griffin, visit http://griffin.apache.org/ and https://twitter.com/apachegriffin

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 730 individual Members and 6,800 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Anonymous, ARM, Baidu, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Hortonworks, Huawei, IBM, Indeed, Inspur, LeaseWeb, Microsoft, Oath, ODPi, Pineapple Fund, Pivotal, Private Internet Access, Red Hat, Target, Tencent, and Union Investment. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Griffin", "Apache Griffin", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Thursday August 23, 2018

The Apache Software Foundation Announces Apache® HAWQ® as a Top-Level Project

Advanced Big Data query engine and analytic database in use at Alibaba, Haier, VMWare, ZTESoft, and hundreds more.

Wakefield, MA —23 August 2018— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® HAWQ® as a Top-Level Project (TLP).

Apache HAWQ is an advanced enterprise SQL-on-Hadoop query engine and analytic database. It combines the key technological advantages of MPP database with the scalability and convenience of Apache Hadoop. HAWQ reads data from and writes data to HDFS natively, delivers industry-leading performance and linear scalability, and provides users with a complete, standards compliant SQL interface.

"We are very excited to see Apache HAWQ graduate as a Top-Level Project and we would like to thank our Incubation mentors for all their help," said Dr. Lei Chang, Vice President of Apache HAWQ. "This is a huge milestone that reflects the collective contributions from the growing global community to deliver a world-class SQL engine for analytics."

HAWQ operates natively in Apache Hadoop to provide users the tools to confidently and successfully interact with petabyte-range data sets. Features include:
  • Exceptional performance: parallel processing architecture delivers high performance throughput and low latency —potentially near real time— query responses that can scale to petabyte-sized datasets;
  • Robust ANSI SQL compliance: leverage familiar skills. Achieve higher levels of compatibility for SQL-based applications and BI/data visualization tools. Execute complex queries and joins, including roll-ups and nested queries; and 
  • Apache Hadoop ecosystem integration: integrate and manage with Apache YARN. Provision with Apache Ambari. Interface with Apache HCatalog. Supports Apache Parquet, Apache HBase, and others. Easily scales nodes up or down to meet performance or capacity requirements.

Apache HAWQ is in use at Alibaba, Haier, VMware, ZTESoft, and hundreds of users around the world.

"We admire Apache HAWQ's flexible framework and ability to scale up in a Cloud ecosystem. HAWQ helps those seeking a heterogeneous computing system to handle ad-hoc queries and heavy batch workloads," said Kuien Liu, Computing Platform Architect at Alibaba. "Alibaba encourages more and more engineers to continue to embrace Open Source, and Apache HAWQ stands out as a star project. We are proud to have been collaborating with this community since 2015."

"Haier Group has deployed clusters of more than 30 nodes in the production environment from the very beginning of HAWQ," said Xiaoliang Wu, Big Data Architect at Haier. "We use HAWQ as an ad-hoc query and batch computation engine in areas such as social network services and IOT. Because of its superior scalability and stability, HAWQ greatly improves development efficiency and reduces operation and maintenance costs. We believe that Apache HAWQ is a very competitive product in the SQL-On-Hadoop field."

"We have been using Apache HAWQ at VMware for 4 years now," said Dominie Jacob, Lead Big Data Engineer at VMware Inc. "It is easy to manage and scale using Apache Ambari, and easy to provision and attach more nodes based on demand. Being virtualized, it is easy to provision and attach more nodes based on demand. In our BI Big Data world, HAWQ is the primary database for accessing the Hadoop datasets, building models, and executing predictive model workflows. HAWQ is working seamlessly with billions of records, thousands of Tables/Functions/Tableau-Reports, and hundreds of users. The demand for HAWQ is increasing. As VMware always encourages us to pick up and contribute back to Open Source technologies, we would love to collaborate with this community and see more enhancements. In our BI space, HAWQ is one of the top priorities."

"Apache HAWQ is an attractive technology for Big Data applications," said Zixu Zhao, Architect at ZTESoft. "HAWQ serves as the foundation of our Big Data platform and it has been used in a lot of applications, such as interactive analytics and BI on telecom data. We congratulate HAWQ on becoming an Apache Top-Level Project."

"Becoming an Apache Top-Level Project is an important milestone," added Chang. "There is much work ahead of us, and we look forward to growing the HAWQ community and codebase."

Catch Apache HAWQ in action at ApacheCon North America 24-27 September 2018 http://apachecon.com/acna18/ .

Availability and Oversight

Apache HAWQ software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache HAWQ, visit http://hawq.apache.org/ and https://twitter.com/ApacheHAWQ .

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 730 individual Members and 6,800 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Anonymous, ARM, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Facebook, Google, Hortonworks, Huawei, IBM, Indeed, Inspur, LeaseWeb, Microsoft, Oath, ODPi, Pineapple Fund, Pivotal, Private Internet Access, Red Hat, Target, and Union Investment. For more information, visit http://apache.org/ and https://twitter.com/TheASF


© The Apache Software Foundation. "Apache", "HAWQ", "Apache HAWQ", "Ambari", "Apache Ambari", "Hadoop", "Apache Hadoop", "HBase", "Apache HBase", "HCatalog", "Apache HCatalog", "Parquet", "Apache Parquet", "YARN", "Apache YARN", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday January 10, 2018

The Apache Software Foundation Announces Apache® Trafodion™ as a Top-Level Project

Mature Big Data database management system for working in SQL at Apache Hadoop-scale levels in use China Mobile, China Unicom, Dell EMC, Esgyn Corporation, and Millersoft Limited, among others.

Forest Hill, MD —10 January 2018— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Trafodion™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache Trafodion extends Apache Hadoop to guarantee transactional integrity and operational workloads for new kinds of Big Data applications that run on Hadoop.

 "We are very excited to have been established as an Apache Top-Level Project," said Pierre Smits, Vice President of Apache Trafodion. "Graduation is a terrific milestone that culminates 2.5 years of contributions from around the globe to establishing a growing community committed to delivering a high-grade OLTP solution on top of the Apache Hadoop ecosystem."

Building on the scalability, elasticity, and flexibility of Hadoop, Trafodion (meaning "transactions" in Welsh) is the first integrated Open Source solution that delivers on the promise of integrated transactional and analytical systems (OLTP/OLAP) for Apache Hadoop. Trafodion's features include:
  • Fully functional ANSI SQL support, leveraging existing SQL skills;
  • Distributed ACID data protection, guaranteeing data consistency across multiple tables and rows;
  • Compile-Time and Run-Time Optimizers, delivering performance improvements for OLTP workloads;
  • Parallel-aware Query Optimizer, supporting large data sets;
  • Apache Spark integration, supporting streaming analysis;
  • Interoperability with existing Apache Hadoop tools and solutions, such as Hive, Ambari, Flume, Kafka, and Oozie; and 
  • Apache Hadoop and Linux distribution neutrality.

Trafodion originated at HP-IT in 2013, and was donated to the Apache Incubator in May 2015. The project has had four official releases since entering the Apache Incubator. 

Apache Trafodion is in use at China Mobile, China Unicom, Dell EMC, Esgyn Corporation, and Millersoft Limited, among others.

"As a member of the HP Core Team responsible for releasing Trafodion to The Apache Software Foundation, and responsible for the project’s name, I'm thrilled to see the Trafodion community be recognized with this major achievement. Congratulations to all who made it possible," said Ken Holt, COO at Esgyn Corporation. "Trafodion is the heart of EsgynDB, and the community is like its lifeblood — we at Esgyn are committed to continue to grow and support the community."

"Congratulations to the Trafodion community for becoming an Apache Top-Level Project," said Tianduo Gao, Senior Development Engineer of Software Technology (Suzhou) at China Mobile. "We are planning to use Trafodion to expand the business of China Mobile's Big Data platform: our data statistics of 4G real-time business in the country and provinces are more efficient than ever before."

"Becoming a core Apache Project is a major step forward for Trafodion. It will give Millersoft the confidence to introduce the technology to our Big Data clients," said Calum Miller, Director of Millersoft Limited. "Testing of our Open Source Data Vault engine running on top of Apache Trafodion is going well and we look forward to announcing a fully integrated product shortly."

"Apache Trafodion enhanced the operational efficiency of our Big Data platforms, and brought us better customer experience and broader application scenarios," said Charles Yu, Managing Director, Application Services at Dell EMC.

"Congratulations to Trafodion for officially becoming part of the Apache open source ecosystem," said Qingquan Gu, Senior Development Engineer of Internet of Things Marketing Service Center at China Unicom. "Using Trafodion provided China Unicom with the ability to build and integrate Big Data platforms, enhanced our operational efficiency, and brought us better customer experience."

"Becoming an Apache Top-Level Project is only the beginning," added Smits. "We are looking forward to growing the Trafodion community, reaching new adopters and contributors, and fostering a strong ecosystem around the project."

Availability and Oversight
Apache Trafodion software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Trafodion, visit http://trafodion.apache.org/ and https://twitter.com/Trafodion

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 680 individual Members and 6,300 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Facebook, Google, Hewlett Packard, Hortonworks, Huawei, IBM, Inspur, iSIGMA, ODPi, LeaseWeb, Microsoft, PhoenixNAP, Pivotal, Private Internet Access, Red Hat, Serenata Flowers, Target, Union Investment, WANdisco, and Yahoo. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Trafodion", "Apache Trafodion", "Hadoop", "Apache Hadoop", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Thursday December 14, 2017

The Apache Software Foundation Announces Apache® Hadoop® v3.0.0 General Availability

Ubiquitous Open Source enterprise framework maintains decade-long leading role in $100B annual Big Data market

Forest Hill, MD —14 December 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, today announced Apache® Hadoop® v3.0.0, the latest version of the Open Source software framework for reliable, scalable, distributed computing.

Over the past decade, Apache Hadoop has become ubiquitous within the greater Big Data ecosystem by enabling firms to run and manage data applications on large hardware clusters in a distributed computing environment.

"This latest release unlocks several years of development from the Apache community," said Chris Douglas, Vice President of Apache Hadoop. "The platform continues to evolve with hardware trends and to accommodate new workloads beyond batch analytics, particularly real-time queries and long-running services. At the same time, our Open Source contributors have adapted Apache Hadoop to a wide range of deployment environments, including the Cloud."

"Hadoop 3 is a major milestone for the project, and our biggest release ever," said Andrew Wang, Apache Hadoop 3 release manager. "It represents the combined efforts of hundreds of contributors over the five years since Hadoop 2. I'm looking forward to how our users will benefit from new features in the release that improve the efficiency, scalability, and reliability of the platform."

Apache Hadoop 3.0.0 highlights include:
  • HDFS erasure coding —halves the storage cost of HDFS while also improving data durability;
  • YARN Timeline Service v.2 (preview) —improves the scalability, reliability, and usability of the Timeline Service;
  • YARN resource types —enables scheduling of additional resources, such as disks and GPUs, for better integration with machine learning and container workloads;
  • Federation of YARN and HDFS subclusters transparently scales Hadoop to tens of thousands of machines;
  • Opportunistic container execution improves resource utilization and increases task throughput for short-lived containers. In addition to its traditional, central scheduler, YARN also supports distributed scheduling of opportunistic containers; and 
  • Improved capabilities and performance improvements for cloud storage systems such as Amazon S3 (S3Guard), Microsoft Azure Data Lake, and Aliyun Object Storage System.

Hadoop 3.0.0 has already undergone extensive testing and integration with the broader Open Source ecosystem at The Apache Software Foundation. With this release, its community of developers and users promote this release series out of beta.

Apache Hadoop is widely deployed at numerous enterprises and institutions worldwide, such as Adobe, Alibaba, Amazon Web Services, AOL, Apple, Capital One, Cloudera, Cornell University, eBay, ESA Calvalus satellite mission, Facebook, foursquare, Google, Hortonworks, HP, Hulu, IBM, Intel, LinkedIn, Microsoft, Netflix, The New York Times, Rackspace, Rakuten, SAP, Tencent, Teradata, Tesla Motors, Twitter, Uber, and Yahoo. The project maintains a list of known users at https://wiki.apache.org/hadoop/PoweredBy

"It's tremendous to see this significant progress, from the raw tool of eleven years ago, to the mature software in today's release," said Doug Cutting, original co-creator of Apache Hadoop. "With this milestone, Hadoop better meets the requirements of its growing role in enterprise data systems.  The Open Source community continues to respond to industrial demands."

Apache Hadoop's diverse community enjoys continued growth amongst the ASF's most active projects, and remains at the forefront of more than three dozen Apache Big Data projects.

Apache Hadoop committer history

Apache Hadoop has received countless awards, including top prizes at the Media Guardian Innovation Awards and Duke's Choice Awards, and has been hailed by industry analysts:

"...the lifeblood of organizational analytics…" —Gartner

"Hadoop Is Here To Stay" —Forrester

"...today Hadoop is the only cost-sensible and scalable open source alternative to commercially available Big Data management packages. It also becomes an integral part of almost any commercially available Big Data solution and de-facto industry standard for business intelligence (BI)." —MarketAnalysis.com/Market Research Media

"...commanding half of big data’s $100 billion annual market value...Hadoop is the go-to big data framework." —BigDataWeek.com

"Hadoop, and its associated tools, is currently the 'big beast' of the big data world and the Hadoop environment is undergoing rapid development..." —Bloor Research


"The opportunity to effect meaningful, even fundamental change in the Apache Hadoop project remains open," added Douglas. "Our new contributors uprooted the project from its historical strength in Web-scale analytics by introducing powerful, proven abstractions for data management, security, containerization, and isolation. Apache Hadoop drives innovation in Big Data by growing its community. We hope this latest release continues to draw developers, operators, and users to the ASF."

Catch Apache Hadoop in action at the Strata Data Conference in San Jose, CA, 5-8 March 2018, and at dozens of Hadoop Meetups held around the world.

Availability and Oversight
Apache Hadoop software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Hadoop, visit http://hadoop.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server —the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 680 individual Members and 6,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Facebook, Google, Hortonworks, Huawei, IBM, Inspur, iSIGMA, ODPi, LeaseWeb, Microsoft, PhoenixNAP, Pivotal, Private Internet Access, Red Hat, Serenata Flowers, Target, Union Investment, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Hadoop", "Apache Hadoop", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Calendar

Search

Hot Blogs (today's hits)

Tag Cloud

Categories

Feeds

Links

Navigation