The Apache Software Foundation Blog

Tuesday January 10, 2017

The Apache Software Foundation Announces Apache® Beam™ as a Top-Level Project

Unified programming model for batch and streaming Big Data processing, handling data of any scale, and providing portability across multiple execution engines and environments.

Forest Hill, MD —10 January 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Beam™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache Beam is a unified programming model for both batch and streaming data processing. It includes software development kits in Java and Python for defining the data processing pipelines, as well as runners to execute them on several execution engines, including Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.

"Graduation is an exciting milestone for Apache Beam," said Davor Bonaci, Vice President of Apache Beam. "Becoming a top-level project is a recognition of the amazing growth of the Apache Beam community, both in terms of size and diversity. Together we are pushing forward the state of the art in distributed data processing and, at the same time, enhancing the ability to interconnect additional storage/messaging systems and execution engines."

The technology behind Apache Beam evolved in large part from Google's internal work on data processing, tracing its roots all the way back to Google's initial MapReduce system and its fundamental changes to the science of distributed data processing. It also reflects modern advances in data processing, embodied in Google's FlumeJava and MillWheel systems, and culminating with the unified programming model of Google Cloud Dataflow, which became the heart of Apache Beam.

This unified programming model can easily and intuitively express data processing pipelines for everything from simple batch-based data ingestion to complex event-time-based stream processing. The abstractions in the model are designed to support efficient parallel execution, while also cleanly separating the user's processing logic from details of the underlying engine.

Raising the level of abstraction allows a single Apache Beam pipeline to run, without modification, on multiple execution engines. This portability across diverse execution engines is just one of many extensibility points that let Apache Beam integrate with the broader Apache and Big Data ecosystems. Besides runners, developers can already easily add support for additional IO connectors, libraries of transformations, SDKs, and even domain-specific extensions.
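
As a rough illustration of the separation described above, the following Python sketch defines a pipeline once and hands it unchanged to two different runners. It is a toy model only and does not use the Apache Beam SDK: the names Pipeline, LocalRunner, ThreadedRunner, and build_word_pipeline are hypothetical and exist purely for this example.

```python
# Toy illustration only: this is NOT the Apache Beam API.
# All class and function names here are hypothetical.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class Pipeline:
    """An ordered list of element-wise steps, with no knowledge of how they run."""

    def __init__(self):
        self.steps = []

    def flat_map(self, fn):
        # fn takes one element and returns an iterable of output elements.
        self.steps.append(fn)
        return self


class LocalRunner:
    """Applies every step sequentially in the calling thread."""

    def run(self, pipeline, elements):
        for fn in pipeline.steps:
            elements = [out for item in elements for out in fn(item)]
        return elements


class ThreadedRunner:
    """Applies each step with a thread pool; the pipeline itself is unchanged."""

    def run(self, pipeline, elements):
        with ThreadPoolExecutor(max_workers=4) as pool:
            for fn in pipeline.steps:
                # pool.map preserves input order, so results match LocalRunner.
                chunks = pool.map(lambda item: list(fn(item)), elements)
                elements = [out for chunk in chunks for out in chunk]
        return elements


def build_word_pipeline():
    # The user's processing logic: split lines into words, then lowercase them.
    return (
        Pipeline()
        .flat_map(lambda line: line.split())
        .flat_map(lambda word: [word.lower()])
    )


lines = ["Beam runs batch", "beam runs streaming"]
pipeline = build_word_pipeline()

# The same pipeline object is executed by two different "engines".
local_counts = Counter(LocalRunner().run(pipeline, lines))
threaded_counts = Counter(ThreadedRunner().run(pipeline, lines))
```

In this sketch the processing logic lives entirely in build_word_pipeline, while the choice of runner is made only at execution time, which mirrors in miniature the decoupling the programming model aims for.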

"Apache Beam helps us make stream processing accessible to a broad audience of data engineers, by offering an API which is comprehensive, easy to reason about and at the same time fully decoupled from the underlying execution engine," said Assaf Pinhasi, Director of Big Data Platform at PayPal. "Our data engineers can now focus on what they do best – i.e. express their processing pipelines easily, and not have to worry about how these get translated to the complex underlying engine they run on."

"The graduation of Apache Beam as a top-level project is a great achievement and, in the fast-paced Big Data world we live in, recognition of the importance of a unified, portable, and extensible abstraction framework to build complex batch and streaming data processing pipelines," said Laurent Bride, Chief Technology Officer at Talend. "Customers don't like to be locked-in, so they will appreciate the runtime flexibility Apache Beam provides. With four mature runners already available and I'm sure more to come, Beam represents the future and will be a key element of Talend's strategic technology stack moving forward."

"We applaud the Apache Beam working group for its success in creating a unified and consistent platform for building portable data processing pipelines," said Fausto Ibarra, Director of Product Management, Google Cloud Platform. "We believe that we all have a responsibility to share what we're learning, and we are proud and delighted to witness the successful collaboration to build not only a powerful programming model for processing data from bounded and unbounded sources, but also a portability layer for running pipelines on many processing engines, including Apache Spark, Apache Flink, Apache Apex, and Google Cloud Dataflow. Apache Beam's graduation to Top Level Project is a well-deserved recognition for the individuals and companies who contributed to the project."

"Apache Beam represents a principled approach for analyzing data streams, simplifying a range of complex data processing concepts and providing developers with a flexible, straightforward model," said Kostas Tzoumas, Co-founder and Chief Executive Officer at data Artisans. "The Apache Flink community wrote one of the first Beam runners, and those of us at data Artisans have been contributing to the Beam project since its inception."

"The Apache Beam community has quickly adopted the Apache Way and been very welcoming to new contributors and ideas. It also encourages communication across other projects that collaborate under the Beam umbrella," said Thomas Weise, Vice President of Apache Apex, and Chief Technology Officer/Co-Founder of Atrato. "Beam helps the wider ecosystem by establishing common terminology and well thought through concepts that reflect in multiple runners and even the native API of the underlying engines."

"In my work at Apache, I have rarely seen an incubating project build a community as well as the Apache Beam project has done," said Ted Dunning, Vice President of Apache Incubator, and Chief Application Architect at MapR Technologies. "The way that they have been able to complement and enhance other streaming data projects is really a credit to everyone involved."

"We'd like to invite you to consider joining us on this exciting ride, whether as a user or a contributor, as we work towards our first release with API stability," added Bonaci. "If you'd like to try out Apache Beam today, check out the latest 0.4.0 release. We welcome contribution and participation from anyone through our mailing lists, issue tracker, pull requests, and events."

Catch Apache Beam in action at numerous face-to-face meetups and conferences, including Apache: Big Data North America 2017, DataWorks Summit and Hadoop Summit Munich 2017, Strata + Hadoop World San Jose and London 2017.

Availability and Oversight
Apache Beam software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For project updates, downloads, documentation, and ways to become involved with Apache Beam, visit https://beam.apache.org/ and @ApacheBeam.

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 620 individual Members and 5,900 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, OPDi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Beam", "Apache Beam", "Apache Apex", "Apex", "Apache Flink", "Flink", "Apache Spark", "Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

The Apache Software Foundation Announces Apache® Eagle™ as a Top-Level Project

Intelligent Big Data monitoring and alerting solution in use at high volume, high demand Websites, platforms, and organizations such as eBay, PayPal, Dataguise, and YHD.com, among others.

Forest Hill, MD —10 January 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Eagle™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache Eagle is an Open Source monitoring and alerting solution for instantly identifying security and performance issues on Big Data platforms such as Apache Hadoop, Apache Spark, and more.

"We are proud to complete the incubation process and graduate as an Apache Top-Level Project," said Edward Zhang, Vice President of Apache Eagle. "The community is actively improving product coverage for analyzing various performance and security issues in large Hadoop clusters."

Eagle was first developed at eBay to solve the monitoring problem for a large-scale Hadoop cluster. The eBay team soon realized it would be useful to the whole community, and submitted the project to the Apache Incubator in October 2015. Since then, the project has gained a lot of traction among developers and organizations for its broad usage scenarios, such as system/service monitoring, application performance monitoring, and security breach detection.

Apache Eagle features include:
  • Highly extensible - Apache Eagle builds its core framework around the application concept; the application itself includes the logic for monitoring source data collection, pre-processing and normalization. Developers can easily develop out-of-the-box monitoring applications using Eagle's application framework, and deploy them into Eagle.
  • Scalable - the project’s fundamental runtime is based on proven Big Data technologies, and applies a scalable core to make it adaptive according to the throughput of the data stream as well as the number of monitored applications.
  • Real-time - provides a state-of-the-art alert engine to identify security breaches and performance issues.
  • Dynamic - users can freely enable or disable a monitoring application and dynamically change their alert policies without any impact to the underlying runtime.

"It is exciting to see increasing deployments of Apache Eagle, along with great use cases and contributions back to the project," added Zhang.

"Apache Eagle is a highly scalable and extensible technology platform to support the ever growing needs of intelligent monitoring and alerting in a massively distributed computing environment," said Debashis Saha, CTO and EVP at Jiff Inc. "As the founding executive sponsor of this project at eBay, I am proud to see the community continue to expand the capabilities by supporting complex and diverse use cases for monitoring in security, infrastructure, networking and distributed services in Apache Eagle. Congratulations to the team and the community in graduating to an Apache top level project."

"As a leader in data-centric security with a focus on cloud and Big Data technologies, Dataguise is proud to be part of the Eagle committers group. DgSecure Monitor, our sensitivity-aware monitoring product, uses Apache Eagle as the core engine," said Subra Ramesh, VP of Products and Engineering at Dataguise Inc. "Apache Eagle's flexible architecture, proven scalability, and cutting-edge design have enabled DgSecure Monitor to be a highly responsive and scalable solution for both on-premises and cloud deployments. We look forward to continued involvement with Eagle as it has now become a top-level Apache project."

"We have been using Apache Eagle for about a year, and are very happy to see it graduate to a Top-Level Project. Apache Eagle and its low latency real-time alert engine can help us easily identify security and performance issues instantly on Hadoop platform," said Anson Zhong, Senior Vice President of Tech Department at YHD.com. "In addition, Eagle's architecture is highly extensible. We are looking forward to using it in real time risk management system."

"Apache Eagle is a great monitoring and alerting solution designed for large-scale distributed environment," said Chad Chun, Director of Analytics Data Infrastructure at eBay. "It was originally intended for security monitoring and quickly became a generic solution for allowing domain experts to create their own monitoring applications on top of Eagle. This is a wonderful design for easily leveraging the power of community to create and share applications. Looking forward to the tremendous adoption in the industry."

"The Apache Eagle community has done a tremendous job throughout the incubation process, and I'm thrilled to see it graduate to a Top-Level Project," said P. Taylor Goetz, ASF Member and Apache Eagle Project Management Committee member. "Eagle fills a very important role in providing top-notch security and performance monitoring and alerting for Big Data deployments. The Eagle project has built a robust, sustainable community and demonstrated a firm understanding of the Apache Way. I look forward to further innovation as the Eagle community marks this important milestone."

"It is great to see Apache Eagle graduate to a Top Level Project within a year of time," said Seshu Adunuthula, Senior Director of Data Platforms at eBay. "It is a great product with unique position to fill the gap of monitoring and alerting large-scale distributed computing environment which is well architected to allow communities to easily implement monitoring and alerting applications on different technical domains such as networking and database clusters. I would love to see the community grow fast in the next coming years!"

The project welcomes contributions and community participation through mailing lists, Slack channel, face-to-face Meetups, and other events.

Availability and Oversight
Apache Eagle software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For project updates, downloads, documentation, and ways to become involved with Apache Eagle, visit http://eagle.apache.org and @TheApacheEagle.

© The Apache Software Foundation. "Apache", "Eagle", "Apache Eagle", "Apache Hadoop", "Hadoop", "Apache Spark", "Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

