Entries tagged [big]

Wednesday May 17, 2017

The Apache Software Foundation Announces Apache® Beam™ v2.0.0

Open Source unified programming model for batch and streaming Big Data processing in use at Google Cloud, PayPal, and Talend, among others.

Forest Hill, MD —17 May 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the availability of Apache® Beam™ v2.0.0, the first stable release of the unified programming model for both batch and streaming Big Data processing.

An Apache Top-Level Project (TLP) since December 2016, Beam includes Java and Python software development kits used to define data processing pipelines and runners to execute them on Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow, among other execution engines.

Apache Beam has its roots in Google's internal work on data processing over the last decade, evolving from the initial MapReduce system, through FlumeJava and MillWheel, into Google Cloud Dataflow v1.x, which defined the unified programming model that became the heart of Apache Beam.

"The first stable release is an important milestone for the Apache Beam community," said Davor Bonaci, Vice President of Apache Beam. "This is a statement from the community that it intends to maintain API stability with all releases for the foreseeable future, making Beam suitable for enterprise deployment."

Apache Beam v2.0.0 improves user experience across the project, focusing on seamless portability across execution environments, including engines, operating systems, on-premises clusters, cloud providers, and data storage systems. Other highlights include:
  • API stability and future compatibility within this major version;
  • Stateful data processing paradigms that unlock efficient, data-dependent computations;
  • Support for user-extensible file systems, with built-in support for Hadoop Distributed File System, among others; and
  • A metrics subsystem for deeper insight into pipeline execution.

Apache Beam is in use at Google Cloud, PayPal, and Talend, among others.

"Apache Beam is a mature data processing API for the enterprise, with powerful semantics that solve real-world challenges of stream processing," said Tomer Pilossof, Big Data Manager at PayPal. "With Beam, we provide data processing solutions for a wide range of customers within the PayPal organization."

"We at Talend are thrilled to have contributed to Apache Beam reaching the 2.0.0 milestone and its first official stable release," said Laurent Bride, Chief Technology Officer at Talend. "Apache Beam is now part of the foundation of Talend products. Recently, we released Talend Data Preparation for Big Data which leverages Beam to create transformation pipelines that are portable across many execution engines. Later this year, we plan to deliver Talend Data Streams, taking the Apache Beam integration one step further by utilizing its powerful streaming semantics. Whether for batch, streaming, or real-time use cases, Apache Beam is a powerful framework that delivers the flexibility and advanced functionality our customers need."

"We congratulate the Apache Beam community for reaching the key milestone of a first stable release," said William Vambenepe, Lead Product Manager for Big Data, Google Cloud. "We look forward to our Google Cloud Dataflow customers taking full advantage of Beam's powerful programming model and newest features to run their data processing pipelines on Google Cloud."

Apache Beam v2.0.0 is making its debut at Apache: Big Data, taking place this week in Miami, FL, with four sessions featuring Apache Beam. Apache Beam will also be highlighted at numerous face-to-face meetups and conferences, including the Future of Data San Jose meetup, Strata Data Conference London, Berlin Buzzwords, and DataWorks Summit San Jose.

"I'd like to invite everyone to try out Apache Beam v2.0.0 today and consider joining our vibrant community," added Bonaci. "We welcome feedback, contribution and participation through our mailing lists, issue tracker, pull requests, and events."

Availability and Oversight
Apache Beam software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Beam, visit https://beam.apache.org/ and https://twitter.com/ApacheBeam

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server -- the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 680 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Beam", "Apache Beam", "Apex", "Apache Apex", "Flink", "Apache Flink", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Monday May 15, 2017

The Apache Software Foundation Announces Apache® Samza™ v0.13

Open Source Big Data distributed stream processing framework in production at Intuit, LinkedIn, Netflix, Optimizely, Redfin, and Uber, among other organizations.

Forest Hill, MD —15 May 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the availability of Apache® Samza™ v0.13, the latest version of the Open Source Big Data distributed stream processing framework.

An Apache Top-Level Project (TLP) since January 2015, Samza is designed to provide support for fault-tolerant, large scale stream processing. Developers use Apache Samza to write applications that consume streams of data and to help organizations understand and respond to their data in real-time. Apache Samza offers a unified API to process streaming data from pub-sub messaging systems like Apache Kafka and batch data from Apache Hadoop.

"The latest 0.13 release takes Apache Samza's data processing capabilities to the next level with multiple new features," said Yi Pan, Vice President of Apache Samza. "It also improves the simplicity and portability of real-time applications."

Apache Samza powers several real-time data processing needs including real-time analytics on user data, message routing, combating fraud, anomaly detection, performance monitoring, real-time communication, and more. Apache Samza can process up to 1.1 million messages per second on a single machine. v0.13 highlights include:
  • A higher-level API that developers can use to express complex processing pipelines on streams more concisely;
  • Support for running Samza applications as a lightweight embedded library without relying on YARN;
  • Support for flexible deployment options;
  • Support for rolling upgrade of running Samza applications;
  • Improved monitoring and failure detection using a built-in heartbeat mechanism;
  • Better integration with other cluster-manager frameworks and environments; and
  • Several bug fixes that improve the reliability, stability, and robustness of data processing.

Organizations such as Intuit, LinkedIn, Netflix, Optimizely, Redfin, TripAdvisor, and Uber rely on Apache Samza to power complex data architectures that process billions of events each day. A list of user organizations is available at https://cwiki.apache.org/confluence/display/SAMZA/Powered+By

"Apache Samza is a highly performant stream/data processing system that has been battle tested over the years of powering mission critical applications in a wide range of businesses," said Kartik Paramasivam, Head of Streams Infrastructure, and Director of Engineering at LinkedIn. "With this 0.13 release, the power of Samza is no longer limited to YARN based topologies. It can now be used in any hosting environment. In addition, it now has a new higher level API that makes it significantly easier to create arbitrarily complex processing pipelines."

"Apache Samza has been powering near real-time use cases at Uber for the last year and a half," said Chinmay Soman, Staff Software Engineer at Uber. "This ranges from analytical use cases such as understanding business metrics, feature extraction for machine learning as well as some critical applications such as Fraud detection, Surge pricing and Intelligent promotions. Samza has been proven to be robust in production and is currently processing about billions of messages per day, accounting for 100s of TB of data flowing through the system." 

"At Optimizely, we have built the world’s leading experimentation platform, which ingests billions of click-stream events a day from millions of visitors for analysis," said Vignesh Sukumar, Senior Engineering Manager at Optimizely. "Apache Samza has been a great asset to Optimizely's Event ingestion pipeline allowing us to perform large scale, real time stream computing such as aggregations (e.g. session computations) and data enrichment on a multiple billion events/day scale. The programming model, durability and the close integration with Apache Kafka fit our needs perfectly."

"It has been a phenomenal experience engaging with this vibrant international community of users and contributors, and I look forward to our continued growth. It is a great time to be involved in the project and we welcome new contributors to the Samza community," added Pan.

Catch Apache Samza in action at Apache: Big Data, 16-18 May 2017 in Miami, FL http://apachecon.com/ , where the community will be showcasing how Samza simplifies stream processing at scale.

Availability and Oversight
Apache Samza software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Samza, visit http://samza.apache.org/ , https://blogs.apache.org/samza/ , and https://twitter.com/samzastream

© The Apache Software Foundation. "Apache", "Hadoop", "Apache Hadoop", "Kafka", "Apache Kafka", "Samza", "Apache Samza", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Monday May 01, 2017

The Apache Software Foundation Announces Apache® CarbonData™ as a Top-Level Project

Open Source Big Data analytics accelerator in use at Bank of Communications, Hulu, Huawei, SAIC Motor, and Zhejiang Mobile, among others.

Forest Hill, MD –1 May 2017– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® CarbonData™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache CarbonData is an indexed columnar store file format for fast analytics on Big Data platforms (including Apache Hadoop and Apache Spark, among others) that helps speed up queries over petabytes of data by an order of magnitude.

"We are very proud to complete the incubation process and graduate as an Apache Top-Level Project," said Liang Chen, Vice President of Apache CarbonData. "The CarbonData community grew rapidly over last ten months, both in terms of size and diversity. Since entering the Apache Incubator, we have completed 4 releases, and exceeded 90 contributors from 10 different organizations."

With the aim of using a unified file format to satisfy all kinds of data analysis cases, Apache CarbonData seamlessly integrates with Hadoop and Spark to improve Big Data analysis efficiency. In benchmarks, CarbonData's interactive queries run approximately 10x faster than on standard column-oriented SQL-on-Hadoop data stores.

Highlights include:

  • Unique data organization to allow faster filtering and better compression;
  • Multi-level indexing to enable faster search and speed up query processing;
  • Deep Apache Spark integration for dataframe + SQL compliance;
  • Advanced push down optimization to minimize the amount of data being read, processed, converted, transmitted, and shuffled;
  • Efficient compression and global encoding schemes to further improve aggregation query performance;
  • Dictionary encoding for reduced storage space and faster processing; and
  • Data update + delete support using standard SQL syntax.
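To illustrate the dictionary encoding highlight above, the following is a minimal, generic Python sketch of the technique. It is not CarbonData's implementation or file format; the function names and the sample column are hypothetical and chosen only for illustration. It shows how repeated column values can be stored as small integer codes, and how an equality filter can then compare integers instead of strings.

```python
def dictionary_encode(values):
    """Replace each distinct value with a small integer code.

    Hypothetical helper for illustration; not part of any CarbonData API.
    """
    dictionary = {}  # value -> code, assigned in first-seen order
    codes = []
    for value in values:
        # setdefault stores the next free code only if the value is new
        code = dictionary.setdefault(value, len(dictionary))
        codes.append(code)
    return dictionary, codes


def filter_equals(dictionary, codes, wanted):
    """Return the row indexes whose value equals `wanted`."""
    code = dictionary.get(wanted)
    if code is None:
        # The value never occurs in the column, so no row can match.
        return []
    return [row for row, c in enumerate(codes) if c == code]


# Sample column (made-up data).
cities = ["Shanghai", "Beijing", "Shanghai", "Shenzhen", "Beijing", "Shanghai"]
dictionary, codes = dictionary_encode(cities)
matching_rows = filter_equals(dictionary, codes, "Shanghai")
```

In this sketch each distinct string is stored once in the dictionary while the column itself holds only integers, which is the general reason dictionary encoding can reduce storage and speed up filtering.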


Apache CarbonData is in use at an array of organizations, including Bank of Communications, medical/pharma social platform DXY, Hulu, Huawei, group online retailer MEITUAN, SAIC Motor, and Zhejiang Mobile, among others.

"CarbonData has very good performance as a ‘SQL on Hadoop’ solution," said Tan Sheng, Director of SAIC Motor’s Big Data team. "It is suitable for SAIC Motor to adopt as a central Big Data platform component. Not only do we use Apache CarbonData, we also actively participate in its community as contributors." 

"Apache CarbonData is great, as it helped our audit business to improve performance 7-10X based on 14 billion rows of data," said Wei Zhao, Senior Engineer at Bank of Communications.

"Apache CarbonData is very suitable for our filter query cases, and has averaged 20x improvement on performance," said William Zhu, Architecture team member at DXY. "And, as CarbonData supports data update and delete, this feature is very useful. We would consider CarbonData as our all-in-one solution to unify all analysis data."

CarbonData was first developed at Huawei in 2013. The project was submitted to the Apache Incubator in June 2016, and had its first official release two months later. The project won top honors in the Big Data category of the Black Duck 2016 Open Source Rookies of the Year awards.

"Apache CarbonData is a great example of the value of the incubation process," said Jean-Baptiste Onofré, Apache CarbonData Incubator Mentor and Project Management Committee member. "Helping grow the CarbonData developer and user communities has increased our visibility, which allowed us to extend our use cases and tests, and gather new ideas. The initial CarbonData committers did (and are still doing) great work to welcome new users and contributors, clearly understanding it's a step forward for the project."

"We will continue to put our efforts towards optimizing data format efficiency for the Big Data ecosystem and provide a unified and high-performance data storage solution," added Chen. "The Apache CarbonData community welcomes interested contributors to work with us on our journey forward."

Catch Apache CarbonData in action at ApacheCon (16-18 May/Miami), and Spark Summit (5-7 June/San Francisco).

Availability and Oversight
Apache CarbonData software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache CarbonData, visit http://carbondata.apache.org/ , https://twitter.com/ApacheCarbonDat , and https://www.facebook.com/carbondata/

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 620 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "CarbonData", "Apache CarbonData", "Hadoop", "Apache Hadoop", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # # 

Wednesday February 08, 2017

The Apache Software Foundation Announces Apache® Ranger™ as a Top-Level Project

Big Data security management framework for the Apache Hadoop ecosystem in use at ING, Protegrity, and Sprint, among other organizations.

Forest Hill, MD —8 February 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Ranger™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

The latest addition to the ASF’s more than three dozen projects in Big Data, Apache Ranger is a centralized framework used to define, administer and manage security policies consistently across Apache Hadoop components. Ranger also offers the most comprehensive security coverage, with native support for numerous Apache projects, including Atlas (incubating), HBase, HDFS, Hive, Kafka, Knox, NiFi, Solr, Storm, and YARN. 

"Graduating to a Top-Level Project reflects the maturity and growth of the Ranger Community," said Selvamohan Neethiraj, Vice President of Apache Ranger. "We are pleased to celebrate a great milestone and officially play an integral role in the Apache Big Data ecosystem."

Apache Ranger provides a simple and effective way to set access control policies and audit data access across the entire Hadoop stack by following industry best practices. One of the key benefits of Ranger is that access control policies can be managed by security administrators from a single place and consistently across the Hadoop ecosystem. Ranger also enables the community to add new systems for authorization, even outside the Hadoop ecosystem, through a robust plugin architecture that can be extended with minimal effort. In addition, Apache Ranger provides many advanced features, such as:
  • Ranger Key Management Service (compatible with Hadoop’s native KMS API to store and manage encryption keys for HDFS Transparent Data Encryption);
  • Dynamic column masking and row filtering;
  • Dynamic policy conditions (such as prohibition of toxic joins);
  • User context enrichers (such as geo-location and time of day mappings); and
  • Classification or tag based policies for Hadoop ecosystem components via integration with Apache Atlas.

"As early adopters of Apache Ranger and having contributed to Apache Ranger, we have come to rely upon Apache Ranger as a key part of our security infrastructure for data," said Ferd Scheepers, Chief Information Architect at ING. "We are therefore pleased to learn that the project has now graduated to a TLP project through the efforts of the Apache community. We believe that Apache Ranger represents the best-in-class Open Source security framework for authorization, encryption management, and auditing across Hadoop ecosystem. We laud the community's efforts in building an extensible and enterprise grade architecture for Apache Ranger, and for innovative features such as tag or classification based security (built in conjunction with Apache Atlas). We congratulate the Apache Ranger community on achieving this significant milestone and are confident Apache Ranger will evolve into the de-facto standard for security stack across the Hadoop ecosystem."

"As heavy users of Apache Ranger in production, we are pleased to see the project become a TLP through validation across community efforts," said Timothy R. Connor, Big Data & Advanced Analytics Manager at Sprint. "Apache Ranger has built a next generation ABAC model for authorization along with a robust data-centric Open Source security framework supporting advanced security capabilities such as dynamic row filtering and column masking. All of these point to Apache Ranger maturing into a robust and comprehensive security product for authorization, encryption management and auditing through the Apache community."

"It's great to see Apache Ranger become a TLP," said Dominic Sartorio, Senior Vice President of Products & Development at Protegrity. "Apache Ranger's comprehensive auditing and broad authorization coverage across the Hadoop ecosystem, along with its highly scalable and extensible architecture and rich set of APIs, integrates very well with Protegrity's fine grained data protection capabilities. Our continued collaboration with the Apache Ranger community will help meet the data security requirements of the next generation of enterprise-grade production Hadoop deployments."

"As organizations entrust their enterprise data to Open Source data platforms such as Apache Hadoop, there is a critical need to use the most innovative techniques to safeguard this data," said Alan Gates, Co-Founder of Hortonworks and Apache Ranger incubation mentor. "Apache Ranger community has taken the original, proprietary code base and used it to build a new and successful Apache project that employs an attribute-based approach to define and enforce authorization policies. This modern approach is a combination of subject, action, resource, and environment and goes beyond role-based access control techniques exclusively based on organizational roles - permissions mapping. It has been a pleasure to be their mentor in this process and help them learn the Apache way."
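The attribute-based approach described above, in which a decision combines subject, action, resource, and environment, can be sketched generically in Python. This is an illustrative assumption only: it does not use or reproduce Apache Ranger's policy model or APIs, and every name in it (Request, is_allowed, the sample policy and its attributes) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """One access request, described by four groups of attributes."""
    subject: dict      # who is asking, e.g. {"department": "analytics"}
    action: str        # what they want to do, e.g. "read"
    resource: dict     # what they want to touch, e.g. {"tag": "sales"}
    environment: dict  # context of the request, e.g. {"hour": 10}


def analysts_read_sales_in_office_hours(req):
    """Hypothetical policy: analysts may read sales-tagged data from 09:00 to 17:59."""
    return (
        req.subject.get("department") == "analytics"
        and req.action == "read"
        and req.resource.get("tag") == "sales"
        and 9 <= req.environment.get("hour", -1) < 18
    )


def is_allowed(request, policies):
    """Grant access if at least one policy matches; deny by default."""
    return any(policy(request) for policy in policies)


policies = [analysts_read_sales_in_office_hours]

daytime_read = Request({"department": "analytics"}, "read", {"tag": "sales"}, {"hour": 10})
night_read = Request({"department": "analytics"}, "read", {"tag": "sales"}, {"hour": 22})
daytime_write = Request({"department": "analytics"}, "write", {"tag": "sales"}, {"hour": 10})
```

Unlike a purely role-based check, the same subject receives different answers here depending on the action and on the time of day carried in the environment attributes.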

"More and more users are adopting Apache Ranger to secure data in the Hadoop ecosystem," added Neethiraj. "We look forward to welcoming new Ranger users to our mailing lists and community events."

Availability and Oversight
Apache Ranger software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For project updates, downloads, documentation, and ways to become involved with Apache Ranger, visit https://ranger.apache.org/ and @ApacheRanger.

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server -- the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 620 individual Members and 5,900 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Ranger", "Apache Ranger", "HBase", "Apache HBase", "HDFS", "Apache HDFS", "Hive", "Apache Hive", "Kafka", "Apache Kafka", "Knox", "Apache Knox", "NiFi", "Apache NiFi", "Solr", "Apache Solr", "Storm", "Apache Storm", "YARN", "Apache YARN", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #


Wednesday July 27, 2016

The Apache Software Foundation Announces Apache® Twill™ as a Top-Level Project

Open Source abstraction layer over Apache Hadoop® YARN simplifies developing distributed Hadoop applications.

Forest Hill, MD –27 July 2016– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Twill™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache Twill is an abstraction over Apache Hadoop® YARN that reduces the complexity of developing distributed Hadoop applications, allowing developers to focus more on their application logic.

"The Twill community is excited to graduate from the Apache Incubator to a Top-Level Project," said Terence Yim, Vice President of Apache Twill and Software Engineer at Cask. "We are proud of the innovation, creativity and simplicity Twill demonstrates. We are also very excited to bring a technology so versatile in Hadoop into the hands of every developer in the industry."

Apache Twill provides rich built-in features for common distributed applications for development, deployment, and management, greatly easing Hadoop cluster operation and administration.

"Enterprises use big data technologies - and specifically Hadoop - to drive more value," said Patrick Hunt, member of the Apache Software Foundation and Senior Software Engineer at Cloudera. "Apache Twill helps streamline and reduce complexity of developing distributed applications and its graduation to an Apache Top-Level Project means more people will be able to take advantage of Apache Hadoop YARN more easily."

"This is an exciting and major milestone for Apache Twill," said Keith Turner, member of the Apache Fluo (incubating) Project Management Committee, which used Twill in the development of Fluo, an Open Source project that makes it possible to update the results of a large-scale computation, index, or analytic as new data is discovered. "Early in development, we knew we needed a standard way to launch Fluo across a cluster, and we found Twill. With Twill, we quickly and easily had Fluo running across many nodes on a cluster." 

Apache Twill is in production by several organizations across various industries, easing distributed Hadoop application development and deployment.

Twill originated at Cask in early 2013. After 7 major releases, the project was submitted to the Apache Incubator in November of 2013.

"Apache Twill has come a long way through The Apache Software Foundation, and we're thrilled it has become an ASF Top-Level Project," said Nitin Motgi, CTO of Cask. "Apache Twill has become a key component behind the Cask Data Application Platform (CDAP), using YARN containers and Java threads as the processing abstraction. CDAP is an Open Source integration and application platform that makes it easy for developers and organizations to quickly build, deploy and manage data applications on Apache Hadoop and Apache Spark."

"The Apache Twill community worked extremely well within the incubator environment, developing and collaborating openly to follow The Apache Way," said Henry Saputra, ASF Member and member of the Apache Twill Project Management Committee. "There is a tremendous demand for effective APIs and virtualization for developing big data applications and Apache Twill fills that need perfectly. We’re looking forward to continuing the journey with Apache Twill as a Top-Level Project."

Catch Apache Twill in action at:
  • JavaOne, 18-22 September 2016 in San Francisco
  • Strata+Hadoop World, 27-29 September 2016 in New York City

Availability and Oversight
Apache Twill software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Twill, visit http://twill.apache.org/ and follow @ApacheTwill

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 550 individual Members and 5,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, OPDi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Twill", "Apache Twill", "Hadoop", "Apache Hadoop", "Apache Hadoop YARN", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Tuesday July 26, 2016

The Apache Software Foundation Announces Apache® Kudu™ as a Top-Level Project

Open Source columnar storage engine enables fast analytics across the Internet of Things, time series, cybersecurity, and other Big Data applications in the Apache Hadoop ecosystem

Forest Hill, MD –25 July 2016– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Kudu™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache Kudu is an Open Source columnar storage engine built for the Apache Hadoop ecosystem designed to enable flexible, high-performance analytic pipelines.

"Under the Apache Incubator, the Kudu community has grown to more than 45 developers and hundreds of users," said Todd Lipcon, Vice President of Apache Kudu and Software Engineer at Cloudera. "Recognizing the strong Open Source community is a testament to the power of collaboration and the upcoming 1.0 release promises to give users an even better storage layer that complements Apache HBase and HDFS."

Optimized for lightning-fast scans, Kudu is particularly well suited to hosting time-series data and various types of operational data. In addition to its impressive scan speed, Kudu supports many operations available in traditional databases, including real-time insert, update, and delete operations. Kudu enables a "bring your own SQL" philosophy and can be accessed by multiple query engines, including other Apache projects such as Drill, Spark, and Impala (incubating).

Apache Kudu is in use at diverse companies and organizations across many industries, including retail, online service delivery, risk management, and digital advertising.

"Using Apache Kudu alongside interactive SQL tools like Apache Impala (incubating) has allowed us to deploy a next-generation platform for real-time analytics and online reporting," said Baoqiu Cui, Chief Architect at Xiaomi. "Apache Kudu has been deployed in production at Xiaomi for more than six months and has enabled us to improve key reliability and performance metrics for our customers. Kudu's graduation to a Top-Level Project allows companies like ours to operate a hybrid architecture without complexity. We look forward to continuing to contribute to its success."

"We are already seeing the many benefits of Apache Kudu. In fact we're using its combination of fast scans and fast updates for upcoming releases of our risk solutions," said Cory Isaacson, CTO at Risk Management Solutions, Inc. "Kudu is performing well, and RMS is proud to have contributed to the project’s integration with Apache Spark."

"The Internet of Things, cybersecurity and other fast data drivers highlight the demands that real-time analytics place on Big Data platforms," said Arvind Prabhakar, Apache Software Foundation member and CTO of StreamSets. "Apache Kudu fills a key architectural gap by providing an elegant solution spanning both traditional analytics and fast data access. StreamSets provides native support for Apache Kudu to help build real-time ingestion and analytics for our users."

"Graduation to a Top-Level Project marks an important milestone in the Apache Kudu community, but we are really just beginning to achieve our vision of a hybrid storage engine for analytics and real-time processing," added Lipcon. "As our community continues to grow, we welcome feedback, use cases, bug reports, patch submissions, documentation, new integrations, and all other contributions."

The Apache Kudu project welcomes contributions and community participation through mailing lists, a Slack channel, face-to-face MeetUps, and other events. Catch Apache Kudu in action at Strata + Hadoop World, 26-29 September 2016 in New York. 

Availability and Oversight
Apache Kudu software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For project updates, downloads, documentation, and ways to become involved with Apache Kudu, visit http://kudu.apache.org/ , @ApacheKudu, and http://kudu.apache.org/blog/.

© The Apache Software Foundation. "Apache", "Kudu", "Apache Kudu", "Drill", "Apache Drill", "Hadoop", "Apache Hadoop", "Apache Impala (incubating)", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday June 29, 2016

The Apache Software Foundation Announces Apache® OODT™ v1.0

Open Source Big Data middleware metadata framework in use at Children's Hospital Los Angeles Virtual Pediatric Intensive Care Unit, DARPA MEMEX and XDATA, NASA Jet Propulsion Laboratory, and the National Cancer Institute, among others.

Forest Hill, MD —29 June 2016— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the availability of Apache® OODT™ v1.0, the Big Data middleware metadata framework.

OODT is a grid middleware framework for science data processing, information integration, and retrieval. As "middleware for metadata" (and vice versa), OODT is used for computer processing workflow, hardware and file management, information integration, and linking databases. The OODT architecture allows distributed computing and data resources to be searchable and utilized by any end user.

"Apache OODT 1.0 is a great milestone in this project," said Tom Barber, Vice President of Apache OODT. "Effectively managing data pools has historically been problematic for some users, and OODT addresses a number of the issues faced. v1.0 allows us to prepare for some big changes within the platform with new UI designs for user-facing apps and data flow processing under the hood. It's an exciting time in the data management sector and we believe Apache OODT can be at the forefront of it."

OODT 1.0 signals a stage in the project where the initial scope of the platform is feature-complete and ready for general consumption. v1.0 features include:
  • Data ingestion and processing;
  • Automatic data discovery and metadata extraction;
  • Metadata management;
  • Workflow processing and support; and
  • Resource management

Originally created at NASA Jet Propulsion Laboratory in 1998 as a way to build a national framework for data sharing, OODT has been instrumental to the National Cancer Institute's Early Detection Research Network for managing distributed scientific data sets across 20+ institutions nationwide for more than a decade.

Apache OODT is in use in many scientific data system projects in Earth science, planetary science, and astronomy at NASA, such as the Lunar Mapping and Modeling Project (LMMP), NPOESS Preparatory Project (NPP) Sounder PEATE Testbed, the Orbiting Carbon Observatory-2 (OCO-2) project, and the Soil Moisture Active Passive mission testbed. In addition, OODT is used for large-scale data management and data preparation tasks in the DARPA MEMEX and XDATA efforts, and for supporting research and data analysis within the pediatric intensive care domain in collaboration with Children's Hospital Los Angeles (CHLA) and its Laura P. and Leland K. Whittier Virtual Pediatric Intensive Care Unit (VPICU), among many other applications.

"To watch Apache OODT grow from an internal NASA project to 1.0 where it is today and dozens of releases is an amazing feat. I truly believe having it at the ASF has allowed it to grow and prosper. We are doubling down on our commitment to Apache OODT, investing in its enhancement and use in several national-scale projects," said Chris Mattmann, member of the Apache OODT Project Management Committee, and Chief Architect, Instrument and Science Data Systems Section at NASA JPL. "Apache OODT processes some of the world's biggest data sets, distributes and manages them, and makes sure science happens in a timely and accurate fashion."

OODT entered the Apache Incubator in January 2010, and graduated as a Top-Level Project in November 2010.

Catch Apache OODT in action at ApacheCon Europe, 14-18 November 2016 in Seville, Spain http://apachecon.com/ .

Availability and Oversight
Apache OODT software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache OODT, visit http://oodt.apache.org/ and https://twitter.com/apache_oodt

© The Apache Software Foundation. "Apache", "OODT", "Apache OODT", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

The Apache Software Foundation Announces Apache® Bahir™ as a Top-Level Project

Bolsters Big Data processing by providing extensions to distributed analytic platforms such as Apache Spark.

Forest Hill, MD –29 June 2016– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Bahir™ has become a Top-Level Project (TLP).

Apache Bahir bolsters Big Data processing by serving as a home for existing connectors that originated under Apache Spark, as well as by providing additional extensions/plugins for other related distributed, storage, and query execution systems.

"Apache Bahir is a new community that aims to be a place to curate extensions related to distributed analytic platforms following the Apache Governance," said Luciano Resende, Vice President of Apache Bahir and an Architect at IBM contributing to The Apache Software Foundation for over 10 years. "The project is initially offering a few Apache Spark extensions but it is definitely open for expanding to other platforms such as Apache Beam, Apache Flink and others."

Bahir code is extracted from the Apache Spark project and has been spun out as a standalone project to provide implementations for different Spark-related extensions/plugins, connectors, and other pluggable components. Current extensions include:
  •  streaming-akka (akka: Open Source toolkit and runtime simplifying the construction of concurrent and distributed applications on the Java Virtual Machine)
  •  streaming-mqtt (mqtt: lightweight messaging protocol for small sensors and mobile devices, optimized for high-latency or unreliable networks)
  •  streaming-twitter (Twitter: online social networking service; Bahir allows the processing of social data from Twitter)
  •  streaming-zeromq (zeromq: a high-performance asynchronous messaging library, aimed at use in distributed or concurrent applications)

In addition, Apache Bahir has a strong relationship with different storage layers; the project intends to extend that relationship to a number of other ASF projects and Apache-licensed initiatives.

"We are very interested in streaming-mqtt for remote sensing applications and control/monitoring. We have a lot of Big Data needs in Earth science especially in remote and difficult to access environments and plugins such as streaming-mqtt from Bahir provide a readily accessible and Apache-based solution to that," said Chris Mattmann, member of the Apache Bahir Project Management Committee, and Chief Architect, Instrument and Science Data Systems Section at NASA Jet Propulsion Laboratory.

"We are very motivated to increase the size and diversity of the Apache Bahir community," added Resende. "We welcome feedback, use cases, bug reports, patch submissions, code contributions, documentation, new extension proposals, and other ways to participate."

Availability and Oversight
Apache Bahir software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Bahir, visit http://bahir.apache.org/

© The Apache Software Foundation. "Apache", "Bahir", "Apache Bahir", "Spark", "Apache Spark" and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday May 25, 2016

The Apache Software Foundation Announces Apache® Zeppelin™ as a Top-Level Project

Open Source Big Data analytics and visualization tool for distributed, interactive, and collaborative systems using Apache Flink, Apache Hadoop, Apache Spark, and more.

Forest Hill, MD –25 May 2016– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Zeppelin™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache Zeppelin is a modern, web-based notebook that enables interactive data analytics. Notebooks help developers, data scientists, and related users to handle data efficiently without worrying about command lines and cluster details.

"The Zeppelin community is pleased to graduate from the Apache Incubator," said Lee Moon Soo, Vice President of Apache Zeppelin. "With 118 worldwide contributors and widespread adoption in numerous commercial products, we are proud to officially be a part of the Apache Big Data ecosystem."

Zeppelin's collaborative data analytics and visualization capabilities make data exploration, visualization, sharing, and collaboration easy over distributed, general-purpose data processing systems that use Apache Flink, Apache Hadoop, and Apache Spark, among other Big Data platforms.

Apache Zeppelin is:
  • Multi-purpose --features data ingestion, exploration, analysis, visualization, and collaboration;
  • Robust --supports 20+ backend systems, including Apache Spark, Apache Flink, Apache Hive, Python, R, and any JDBC (Java Database Connectivity) source;
  • Easy to deploy --built on top of modern Web technologies (provides built-in Apache Spark integration, eliminating the need to build a separate module, plugin, or library);
  • Easy to use --with built-in visualizations and dynamic forms;
  • Flexible --allows users to mix different languages, exchange data between backends, adjust the layout;
  • Extensible --with pluggable architecture for interpreters, notebook storages, authentication, and visualizations (in progress); and
  • Advanced --allows interaction between custom visualizations and cluster resources

"With Apache Zeppelin, a wide range of users can make beautiful data-driven, interactive, and collaborative documents with SQL, Scala, and more," added Soo.

Apache Zeppelin is in use at an array of organizations and solutions, including Amazon Web Services, Hortonworks, JuJu, and Twitter, among others.

"Congratulations to Apache Zeppelin community on graduation," said Tim Hall, Vice President of Product Management at Hortonworks. "Several members of our team have been working over the past year in the Zeppelin community to make it enterprise ready. We are excited to be associated with this community and look forward to helping our customers get the best insights out of their data with Apache Zeppelin."

"Apache Zeppelin is becoming an important tool at Twitter for creating and sharing interactive data analytics and visualizations," said Prasad Wagle, Technical Lead in the Data Platform team at Twitter. "Since it integrates seamlessly with all the popular data analytics engines, it is very easy to create and share reports and dashboards. With its extensible architecture and a vibrant Open Source community, I am looking forward to Apache Zeppelin advancing the state of the art in data analytics and visualization."

"Apache Zeppelin is the major user-facing piece of Memcore’s in-memory data processing Cloud offering. Building a technology stack might be quite exciting engineering challenge, however, if users can’t visualize and work with the data conveniently, it is as good as not having the data at all. Apache Zeppelin enables efficient user acquisition by anyone trying to build new products or service offerings in the Big- and Fast- Data markets, making innovations, collaboration, and development easier for anyone," said Dr. Konstantin Boudnik, Founder and CEO of Memcore.io. "I am very excited to see Apache Zeppelin graduating as an ASF Top Level Project. This shows that more people are joining the community, bringing the project to a new level, and adding more integration points with existing data analytics and transactional software systems. This directly benefits the community at-large."

Apache Zeppelin originated in 2013 at NFLabs as Peloton, a commercial data analytics product. Since entering the Apache Incubator in December 2014, the project has had three releases, and twice participated in Google Summer of Code under the Apache umbrella.

"It was an honor to help with the incubation of Zeppelin," said Ted Dunning, Vice President of the Apache Incubator. "I have been very impressed with the Zeppelin community and the software they have built. I see Apache Zeppelin being adopted all over the place where people need to apply a notebook style to a wide variety of kinds of computing."

Catch Apache Zeppelin in action during Berlin Buzzwords, 7 June 2016 https://s.apache.org/mV8E

Availability and Oversight
Apache Zeppelin software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Zeppelin, visit http://zeppelin.apache.org/ and https://twitter.com/ApacheZeppelin

© The Apache Software Foundation. "Apache", "Zeppelin", "Apache Zeppelin", "Ambari", "Apache Ambari", "Flink", "Apache Flink", "Hadoop", "Apache Hadoop", "Hive", "Apache Hive", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Monday May 23, 2016

The Apache Software Foundation Announces Apache® TinkerPop™ as a Top-Level Project

Powerful Open Source Big Data graph computing framework in use at Amazon, DataStax, and IBM, among others.

Forest Hill, MD –23 May 2016– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® TinkerPop™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache TinkerPop is a graph computing framework that provides developers the tools required to build modern graph applications in any application domain and at any scale.

"Graph databases and mainstream interest in graph applications have seen tremendous growth in recent years," said Stephen Mallette, Vice President of Apache TinkerPop. "Since its inception in 2009, TinkerPop has been helping to promote that growth with its Open Source graph technology stack. We are excited to now do this same work as a top-level project within the Apache Software Foundation."

As a graph computing framework for both real-time, transactional graph databases (OLTP) and batch analytic graph processors (OLAP), TinkerPop is useful for working with small graphs that fit within the confines of a single machine, as well as massive graphs that can only exist partitioned and distributed across a multi-machine compute cluster.

TinkerPop unifies these highly varied graph system models, giving developers less to learn, faster time to development, and less risk associated with both scaling their system and avoiding vendor lock-in.

The Power to Process One Trillion Edges
The central component of Apache TinkerPop is Gremlin, a graph traversal machine and language, which makes it possible to write complex queries (called traversals) that can execute as real-time OLTP queries, analytic OLAP queries, or a hybrid of the two.

Because the Gremlin language is separate from the Gremlin machine, TinkerPop serves as a foundation for any query language to work against any TinkerPop-enabled system. Much like the Java virtual machine is host to Java, Groovy, Scala, Clojure, and the like, the Gremlin traversal machine is already host to Gremlin, SPARQL, SQL, and various host language embeddings in Python, JavaScript, etc. Once a language is compiled to a Gremlin traversal, the Gremlin machine can evaluate it against a graph database or processor. Instantly, languages such as SPARQL can execute across a one thousand node cluster for long running analytic jobs touching large parts of the graph or sub-second queries within a small neighborhood.
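The separation between a query language and the machine that evaluates it can be illustrated with a small, self-contained Python sketch. This is a toy model only and does not use or reproduce any TinkerPop or Gremlin API: the step functions `out` and `dedup`, the path-style front end `compile_path`, the `evaluate` loop, and the sample graph are all hypothetical names and data introduced for illustration. It shows two different "languages" producing the same list of steps, which a single evaluator then runs against an in-memory graph.

```python
# Toy illustration only: none of these names are TinkerPop or Gremlin APIs.
from typing import Callable, Dict, Iterable, List, Tuple

# A tiny in-memory graph: vertex -> list of (edge label, target vertex).
Graph = Dict[str, List[Tuple[str, str]]]
# A step maps the current list of vertices to the next list of vertices.
Step = Callable[[Graph, List[str]], List[str]]


def out(label: str) -> Step:
    """Step that follows outgoing edges carrying the given label."""
    def run(graph: Graph, current: List[str]) -> List[str]:
        return [dst for v in current for (lbl, dst) in graph.get(v, []) if lbl == label]
    return run


def dedup() -> Step:
    """Step that drops repeated vertices while keeping first-seen order."""
    def run(graph: Graph, current: List[str]) -> List[str]:
        return list(dict.fromkeys(current))
    return run


def compile_path(expr: str) -> List[Step]:
    """Hypothetical front-end language: 'knows/created' becomes two out() steps."""
    return [out(label) for label in expr.split("/") if label]


def evaluate(graph: Graph, start: Iterable[str], steps: List[Step]) -> List[str]:
    """The 'machine': applies compiled steps in order, whatever language produced them."""
    current = list(start)
    for step in steps:
        current = step(graph, current)
    return current


# Made-up sample data.
graph: Graph = {
    "marko": [("knows", "josh"), ("knows", "vadas")],
    "josh": [("created", "lop"), ("created", "ripple")],
    "vadas": [],
}

# Same evaluator, two ways of producing the step list.
from_embedding = evaluate(graph, ["marko"], [out("knows"), out("created"), dedup()])
from_path_language = evaluate(graph, ["marko"], compile_path("knows/created"))
print(from_embedding)
print(from_path_language)
```

In this sketch, adding another front-end language only requires a new function that emits a list of steps; the evaluator and the graph stay unchanged, which is the property the paragraph above describes.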

Apache TinkerPop is in use at organizations such as DataStax and IBM, among many others. Amazon.com is currently using TinkerPop and Gremlin to process its order fulfillment graph, which contains approximately one trillion edges.

The core Apache TinkerPop release provides production-ready, reference implementations of a number of different data systems including Neo4j (OLTP), Apache Giraph (OLAP), Apache Spark (OLAP), and Apache Hadoop (OLAP). However, the bulk of the implementations are maintained within the larger TinkerPop ecosystem. These implementations include commercial and Open Source graph databases and processors, Gremlin language variants for various programming languages on and off the Java Virtual Machine, visualization applications for graph analysis and many other tools and libraries. The TinkerPop ecosystem is richly supported with many options for developers to choose from.

TinkerPop originated in 2009 at the Los Alamos National Laboratory. After two major releases (TinkerPop1 in 2011 and TinkerPop2 in 2012), the project was submitted to the Apache Incubator in January 2015.

"Following in a long line of Apache projects that revolutionized entire industries, starting with the Apache HTTP Server, continuing with Web Services, search, and Big Data technologies, Apache TinkerPop will no doubt reshape the Graph Computing landscape," said Hadrian Zbarcea, co-Vice President of ASF Fundraising and Incubator Mentor of Apache TinkerPop. "While TinkerPop has just graduated as an ASF Top Level Project, it is already seven years old, a mature technology, backed by a number of vendors, a vibrant community, and absolutely brilliant developers."

The project welcomes those interested in contributing to Apache TinkerPop. For more information, visit http://tinkerpop.apache.org/docs/3.2.0-incubating/dev/developer/#_contributing

Availability and Oversight
Apache TinkerPop software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache TinkerPop, visit http://tinkerpop.apache.org/ and https://twitter.com/apachetinkerpop

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 550 individual Members and 5,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, OPDi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "TinkerPop", "Apache TinkerPop", "Apache HTTP Server", "Giraph", "Apache Giraph", "Hadoop", "Apache Hadoop", "Spark", "Apache Spark" and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Monday April 25, 2016

The Apache Software Foundation Announces Apache® Apex™ as a Top-Level Project

Open Source enterprise-grade unified Big Data stream and batch processing engine for Apache Hadoop in use at GE, Silver Spring Networks, and more.

Forest Hill, MD –25 April 2016– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Apex™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache Apex is a large scale, high throughput, low latency, fault tolerant, unified Big Data stream and batch processing platform for the Apache Hadoop® ecosystem.

"It is very exciting to see Apex after nearly 4 years since inception becoming an ASF top-level project," said Thomas Weise, Vice President of Apache Apex. "It opens the strong capabilities and potential of the platform to a wider audience and we’re looking forward to a growing community to continue driving innovation in the stream processing space."

Recognized by InfoWorld for its "blazing speed and simplified programmability," Apex works in conjunction with Apache Hadoop YARN, a resource management platform for working with Hadoop clusters.

Apex was originally created at DataTorrent Inc. in 2012 (coinciding with the first alpha release of YARN), and entered the Apache Incubator in August 2015.

Apex enables streaming analytics on Apache Hadoop with an enterprise-grade platform. It has been built to leverage the underlying infrastructure provided by YARN and HDFS (Hadoop Distributed File System), including resource management, multi-tenancy and security. 

Faster to Deployment
Apache Apex meets the demands of today's Big Data applications with real-time reporting, monitoring, and learning with millisecond data point precision. Its pipeline processing architecture can be used for real-time and batch processing in a unified architecture. Apex is highly performant, linearly scalable, fault tolerant, stateful, secure, distributed, easily operable with low latency, no data loss, and exactly-once semantics.

Apex streamlines development and productization of Hadoop applications and lowers the barrier-to-entry by enabling developers to write or re-use generic Java code, minimizing the specialized expertise needed to write Big Data applications. This allows organizations to maximize developer productivity, accelerate development of business logic, and reduce time to market.

"Apache Apex is an example of the latest generation of advanced stream processing software that adds significant technology and capabilities over previous options," said Ted Dunning, Vice President of the Apache Incubator, Apache Apex Incubator Mentor, and Chief Application Architect at MapR Technologies. "That this project came to Apache and is now a fully fledged project is very exciting."

Apex comes with a comprehensive library of reusable operators (functional building blocks) that can be leveraged to quickly create new and non-trivial applications. This also includes connectors to integrate with many external systems that include message buses, databases, file systems and social media feeds. Examples are Apache Cassandra, Apache HBase, JDBC, and Apache Kafka.

"Apache Apex is a battle-hardened technology, processing huge volumes of streaming data at some of the world’s largest enterprise and Internet companies," said technology advisor Eric Baldeschwieler. "Its successful Apache incubation has provided a tremendous boost to Apex, bringing many new members to its community of users and developers."

Enterprise Grade Unified Stream and Batch Processing
Apache Apex use cases include ingestion, fast real-time analytics, data movement, Extract-Transform-Load (ETL), fast batch, alerts, and real-time actions across diverse industries such as programmatic advertising, telecommunications, Internet of Things (IoT), and financial services.

"We are in the process of leveraging Big Data technologies to transform business processes and drive more value," explained Reid Levesque, Head of Solution Engineering at a financial services company. "We chose Apex to help us in this journey to do real-time ingestion and analytics on our various data sources and now we are proud to see it graduate to an Apache top level project."

Apex powers Big Data projects in production at numerous large enterprises such as GE Predix (IoT Cloud platform for industrial data and analytics); PubMatic (marketing automation software platform for publishers), and Silver Spring Networks (IoT solutions for smart cities).

"We at GE Predix data services have used Apex for our data pipeline product and look forward to our continued usage and contribution," said Parag Goradia, Executive Director of Predix Data Services. "We had great experience with Apache Apex and its capabilities. We believe Apex has a bright future as it will continue to solve big problems in the big data industry. We are proud to be associated with this project and excited that it is now in top level status."

"The Apex community has done a great job throughout the incubation process. They have built a robust community and demonstrated a firm understanding of The Apache Way," said P. Taylor Goetz, ASF Member and Apache Apex Incubator Mentor. "I'm pleased to see Apex graduate to a top-level project. These are exciting times in the world of stream processing."

"Congratulations to the Apache Apex community for working successfully through the incubation process and becoming part of the greater Apache Hadoop ecosystem," added Dunning.

Catch Apache Apex in action at:

Availability and Oversight
Apache Apex software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Apex, visit http://apex.apache.org/ and https://twitter.com/ApacheApex

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 550 individual Members and 5,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Apex", "Apache Apex", "Cassandra", "Apache Cassandra", "HBase", "Apache HBase", "Hadoop", "Apache Hadoop", "Kafka", "Apache Kafka", "YARN", "Apache YARN", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Tuesday March 08, 2016

The Apache® Software Foundation announces Apache Flink™ v1.0

Advanced distributed stream processing framework performs 50x faster than other real-time computation systems

Forest Hill, MD —8 March 2016— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the availability of Apache® Flink™ v1.0, the advanced Open Source distributed real-time stream processing system.

Apache Flink is an easy to use, yet sophisticated Open Source stream processor, with recent test results clocking in at least 50x faster than other distributed real-time computation systems.

"Releasing Flink 1.0 is the most important milestone in the project since graduation to a top-level Apache project one year ago," said Stephan Ewen, Vice President of Apache Flink and co-founder/CTO of data Artisans. "This is a collective achievement of more than 150 individuals that have contributed code to date."

Under The Hood
Flink uniquely supports a combination of features that include flexible windowing on event time, out-of-order stream handling, high availability, and exactly-once guarantees, together with high event throughput and low processing latency.
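To make "windowing on event time" and "out-of-order stream handling" concrete, here is a minimal Python sketch. It does not use Flink's API; the function name, the event tuples, and the 10-second window size are assumptions chosen for illustration. Events are grouped by the timestamp they carry rather than by the order in which they arrive, so a late-arriving event still lands in the correct window.

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # assumed tumbling-window size, for illustration only


def count_per_event_time_window(events, window=WINDOW_SECONDS):
    """Count events per tumbling window keyed by event time, not arrival order.

    `events` is an iterable of (event_timestamp_seconds, payload) tuples.
    """
    counts = defaultdict(int)
    for timestamp, _payload in events:
        # Align the timestamp to the start of the window it belongs to.
        window_start = (timestamp // window) * window
        counts[window_start] += 1
    return dict(sorted(counts.items()))


# Arrival order differs from event-time order: the event stamped 3 arrives last.
arrivals = [(1, "a"), (12, "b"), (15, "c"), (3, "d")]
print(count_per_event_time_window(arrivals))
```

A production stream processor also has to decide when a window is complete and can be emitted; this sketch sidesteps that question by processing a finite list.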

Version 1.0 furthers Apache Flink's maturity, making it significantly easier to program, deploy, and maintain Flink pipelines at scale by:
  • initiating backwards compatibility of public APIs against all 1.x.y versions;
  • introducing functionality for complex event processing (CEP);
  • supporting large state beyond memory limits;
  • supporting state versioning and savepoints; and 
  • improving the system's monitoring functionality

"Flink v1.0 is indeed a testament to the maturity of the platform, which now enjoys production use at Fortune Global 500, as well as leading tech companies," said Kostas Tzoumas, member of the Apache Flink Project Management Committee, and co-founder/CEO of data Artisans.

"Google congratulates the Apache Flink community for this achievement," said William Vambenepe, Lead Product Manager for Big Data on Google Cloud Platform. "Flink is unlocking the richness of stream processing at scale, and delivering on the promise of the Dataflow Programming Model for all users, anywhere. We look forward to continuing to work with the Flink community, including further unification of APIs as part of Apache Beam (incubating)."

"At King.com we are using Flink to process more than 30 billion events daily, leveraging Flink's stateful streaming abstractions," said Christofer Waldenström, Team Lead for Streaming Platform at King.com. "We find that Flink provides a convenient way to interact with real-time data for complex streaming use-cases involving large state beyond memory."

"Apache Flink proved to be a valuable framework in our day-to-day business. It helps us to process log events, aggregate tracking information, apply filters and decide upon message routing," said Christian Kreutzfeldt, Senior Solution Developer & Architect at Otto Group BI. "We are still excited to see how fast new applications can be implemented and deployed. Even complex requirements do not constitute a significant challenge. For the upcoming version 1.0 we are looking forward to see a stabilized API and the advanced monitoring features. Especially the back pressure monitoring could become a great tool to understand internal processing behavior much better. Furthermore from an enterprise user perspective we are happy to see that Apache Flink finally reached version 1.0 which typically opens the door to the broader enterprise market."

Flink originated at the Stratosphere research project that started in 2009 by the Technical University of Berlin, along with several other European universities. The project was submitted to the Apache Incubator in April 2014 and became an Apache Top-Level Project in December 2014. 

Today, Flink is among the ASF's dynamic Big Data projects, with more than 150 contributors to date, a wealth of production deployments, and commercial support by data Artisans, a company founded by the core team that originally developed Flink.

"The two things that have always struck me about Flink has been the excellence of the code and the excellence of the team," said Ted Dunning, Vice President of the Apache Incubator and Chief Application Architect at MapR. "This pattern is continuing with this release."

Get Involved!
Apache Flink welcomes contribution and community participation through mailing lists as well as attending face-to-face MeetUps, developer trainings, and the following events:
  • QCon (London, 7-9 March 2016)
  • Strata/Hadoop World (San Jose, 28-31 March 2016)
  • Hadoop Summit (Dublin, 13-14 April 2016)
  • Kafka Summit (San Francisco, 26 April 2016)
  • Apache: Big Data (Vancouver, 9-12 May 2016)
  • OSCON (Austin, TX, 18-19 May 2016)
  • Strata/Hadoop World (London, 31 May - 3 June 2016)
  • Berlin Buzzwords (Berlin, 5-7 June 2016)
  • Flink Forward (Berlin, 12-14 September 2016)

Availability and Oversight
Apache Flink software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Flink, visit http://flink.apache.org/ and https://twitter.com/ApacheFlink

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 550 individual Members and 5,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ or follow @TheASF on Twitter.

© The Apache Software Foundation. "Apache", "Apache Beam (incubating)", "Beam (incubating)", "Apache Cassandra", "Cassandra", "Apache Flink", "Flink", "Apache Hadoop", "Hadoop", "Apache HBase", "HBase", "Apache Kafka", "Kafka", "Apache MapReduce", "MapReduce", "Apache Storm", "Storm", "Apache YARN", "YARN", "ApacheCon", and their logos are registered trademarks or trademarks of The Apache Software Foundation in the U.S. and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday February 17, 2016

The Apache® Software Foundation Announces Apache Arrow™ as a Top-Level Project

Open source Big Data in-memory columnar layer accelerates analytical processing and interchange by more than 100x. 

Forest Hill, MD --17 Feb 2016-- The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache Arrow as a new Top-Level Project. 

A high-performance cross-system data layer for columnar in-memory analytics, Apache Arrow provides the following benefits for Big Data workloads:
  • Accelerates the performance of analytical workloads by more than 100x in some cases
  • Enables multi-system workloads by eliminating cross-system communication overhead

Initially seeded by code from the Apache Drill project, Apache Arrow was built on top of a number of Open Source collaborations, and establishes a de-facto standard for columnar in-memory processing and interchange.

"The Open Source community has joined forces on Apache Arrow," said Jacques Nadeau, Vice President of Apache Arrow and Vice President Apache Drill. "Developers from 13 major Open Source Big Data projects are already on board --by introducing a new era of columnar in-memory analytics, we anticipate the majority of the world's data will be processed through Arrow within the next few years."

Code committers to Apache Arrow include developers from Apache Big Data projects Calcite, Cassandra, Drill, Hadoop, HBase, Impala, Kudu (incubating), Parquet, Phoenix, Spark, and Storm as well as established and emerging Open Source projects such as Pandas and Ibis.

"Arrow's cross platform and cross system strengths will enable Python and R to become first-class languages across the entire Big Data stack," said Wes McKinney, creator of Pandas.

Apache Arrow accelerates analytical processing by providing a high performance columnar in-memory representation. A number of processing algorithms benefit greatly from this memory design. 

"A columnar in-memory data layer enables systems and applications to process data at full hardware speeds," said Todd Lipcon, original Apache Kudu creator and member of the Apache Arrow Project Management Committee. "Modern CPUs are designed to exploit data-level parallelism via vectorized operations and SIMD instructions. Arrow facilitates such processing."

In many workloads, 70-80% of CPU cycles are spent serializing and deserializing data. Arrow solves this problem by enabling data to be shared between systems and processes with no serialization, deserialization or memory copies.
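The idea of sharing data without serialization or copies can be sketched with Python's standard library alone. This is not Arrow's API or memory format; the column names and values are hypothetical. The sketch stores each column as one contiguous typed buffer and hands a second consumer a view of that buffer without copying it.

```python
from array import array

# Columnar layout: each column is one contiguous, typed buffer.
# Column names and values are made up for illustration.
columns = {
    "sensor_id": array("q", [1, 2, 3, 4]),            # 64-bit signed integers
    "reading": array("d", [20.5, 21.0, 19.5, 22.0]),  # 64-bit floats
}

# A memoryview exposes the same bytes to another consumer without copying.
shared = memoryview(columns["reading"])
tail = shared[2:]  # slicing a memoryview is also copy-free

# A write through the original buffer is visible through the view,
# which shows that both names refer to the same memory.
columns["reading"][3] = 23.0
print(list(tail))

# Column-wise work touches only the buffer it needs.
mean_reading = sum(columns["reading"]) / len(columns["reading"])
print(mean_reading)
```

The view and the original array never exchange serialized bytes; they simply read the same buffer, which is the property the paragraph above attributes to a shared columnar layer.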

"An industry-standard columnar in-memory data layer enables users to combine multiple systems, applications and programming languages in a single workload without the usual overhead," said Ted Dunning, Vice President of the Apache Incubator and member of the Apache Arrow Project Management Committee.

In addition to traditional relational data, Arrow supports complex data with dynamic schemas. For example, Arrow can handle JSON data which is commonly used in IoT workloads, modern applications and log files. Implementations are also available (or underway) for a number of programming languages including Java, C++ and Python to allow greater interoperability among a number of Big Data solutions.

"Real world use cases often include complex combinations of structured and rapidly growing complex-data. Already tested with Apache Drill, the efficient in-memory columnar representation and processing in Arrow will enable users to enjoy the performance of columnar processing with the flexibility of JSON," said Parth Chandra, member of the Apache Drill and Apache Arrow Project Management Committees.

Catch Apache Arrow in action at Strata + Hadoop World (San Jose: 30 March 2016, and London: 1-3 June 2016), as well as upcoming MeetUps and local events http://arrow.apache.org/events

Availability and Oversight
Apache Arrow software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Arrow, visit http://arrow.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 550 individual Members and 5,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Matt Mullenweg, Microsoft, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ or follow @TheASF on Twitter.

© The Apache Software Foundation. "Apache", "Apache Arrow", "Arrow", "Apache Calcite", "Calcite", "Apache Cassandra", "Cassandra", "Apache Drill", "Drill", "Apache Hadoop", "Hadoop", "Apache HBase", "HBase", "Apache Impala", "Impala", "Apache Kudu (incubating)", "Kudu (incubating)", "Apache Parquet", "Parquet", "Apache Phoenix", "Phoenix", "Apache Spark", "Spark", "Apache Storm", "Storm", "ApacheCon", and their logos are registered trademarks or trademarks of The Apache Software Foundation in the U.S. and/or other countries. All other brands and trademarks are the property of their respective owners.

# # # 

Wednesday January 27, 2016

The Apache® Software Foundation Announces Strong Momentum; Enters 2016 More Influential, Innovative, Efficient, and with a New Look

Forest Hill, MD –27 January 2016– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, today announced continued momentum in Open Source innovation and leadership.

"We continue to see strong adoption of Apache software across diverse sectors and categories," said ASF President Ross Gardler. "Whether users are seeking Open Source solutions or not, Apache ranks as a top choice. Our brand strength, proven leadership, and community-driven process enable individuals and market competitors alike to collaborate in a trusted environment. This, in turn, furthers our mission of developing top quality software that is bolstered by vibrant, active communities."

From Abdera to Zookeeper, the ASF is home to dozens of ubiquitous, best-in-class software projects such as Cordova, Flex, Lucene/Solr, Maven, OpenOffice, Tomcat, and the flagship Apache HTTP Server.

INNOVATION
Since its inception, the ASF has long been recognized as a leading source for Open Source network-server, network-client, and library tools that meet the demand for interoperable, adaptable, and sustainable solutions. Its reputation for producing reliable enterprise-grade software continues to grow dramatically across several categories, most notably Big Data, where Apache powerhouses such as Hadoop, Cassandra, Storm, and many others dominate the marketplace. According to Forrester Research, 100% of enterprises will embrace the Apache Hadoop ecosystem for data storage, processing, problem solving, and predictive analytics, particularly across Cloud environments.

Apache projects and categories that are experiencing widespread recognition include Ambari, Kafka, Mesos, and Spark --all winners of InfoWorld’s 2016 Technology of the Year award. In 2015, Apache products were featured more than 400 times in Gartner Magic Quadrant reports.

Projects and communities intending to become fully-fledged Apache projects do so through the Apache Incubator. This includes code donations from external organizations as well as established projects such as Groovy, which became an Apache Top-Level Project in November 2015. There are 47 projects currently undergoing development in the Apache Incubator, 60% of which are in the Big Data category. In addition, there are 39 initiatives in the Apache Labs innovation "sandbox" for testing technical concepts.

EFFICIENCY
Beginning with an inaugural membership of 21 individuals who oversaw the progress of the Apache HTTP Server, the ASF has grown to 588 individual members and 5,317 Committers collaborating across six continents. All development is done on a volunteer basis --some committers may be paid by their employers for their time and code contributions, but the ASF does not pay for software or project oversight. 

The private, 501(c)(3) non-profit charitable organization is funded through tax-deductible contributions from corporations, foundations, and private individuals. The ASF Sponsorship Program raises critical funds that help offset day-to-day operating expenses such as bandwidth and connectivity, servers and hardware, legal and accounting services, brand management and public relations, general office expenditures, and support staff.

Approximately 75% of the ASF's US$1M annual budget is dedicated to running critical infrastructure support services. Each day, millions of people across the globe access the ASF’s two dozen servers and 75 distinct hosts. A distributed team (comprising 10 rotating volunteers and 5 paid staff) on three continents keeps Apache services running 24x7x365 at near 100% uptime on an annual budget of less than US$5,000 per project.

"The ASF founders were both pragmatic and provident in establishing an operation that is rigorously cost-efficient," added Gardler. "It’s amazing that we’ve been able to scale 35,000% over 16 years with very limited resources."

NEW LOOK
The ASF also unveiled a new visual identity today, the result of more than a year of planning alongside a design collaborative led by Lisa Dae and Fran Lukesh, whose services were donated by HotWax Systems and Lucidworks respectively. 

OLD LOGO http://www.apache.org/images/asf_logo.gif

NEW LOGO https://www.apache.org/img/asf_logo.png

"We are proud to debut our new identity, which reflects the Foundation’s truly diverse nature," said Sally Khudairi, Vice President of Marketing and Publicity at the ASF. "The simplified logo and feather glyph pay homage to our legacy whilst reflecting our forward-focused energetic growth. We’re looking better than ever!"

The new identity will be unveiled in Brussels 30-31 January at the FOSDEM conference, with the brand roll out continuing across Apache projects and activities throughout the year.

© The Apache Software Foundation. "Apache", "Apache Abdera", "Apache Cassandra", "Apache Cordova", "Apache Flex", "Apache Hadoop", "Apache Hive", "Apache HTTP Server", "Apache Lucene/Solr", "Apache Maven", "Apache OpenOffice", "Apache Spark", "Apache Storm", "Apache Tomcat", "Apache Zookeeper", "ApacheCon", and their logos are registered trademarks or trademarks of The Apache Software Foundation in the U.S. and/or other countries. All other brands and trademarks are the property of their respective owners.

# # # 

Sunday December 13, 2015

Applications now Open for Travel Assistance to ApacheCon North America 2016

The Apache Software Foundation's Travel Assistance Committee (TAC) has announced that applications for travel assistance to ApacheCon North America 2016 are now open.

ApacheCon North America will take place 9-13 May 2016 in Vancouver, British Columbia, Canada.

This year's ApacheCon is split into two separate themed events:

 - Apache:Big Data --the Apache projects and people working in Big Data, ubiquitous computing, data engineering, and science
 - ApacheCon:Core --the technologies, projects, and communities driving the future of Open Source, ubiquitous and emerging Web solutions, and Cloud computing


Event details are available at http://apachecon.com/ ; the Calls for Participation for both events are also open.

Important: due to the short timeframe of each event, only applications from those who are able to attend BOTH events will be considered. Applications close on 2 March 2016. Please ensure your application contains as much supporting material as required to efficiently and accurately process your request.

The TAC exists to help those who would like to attend ApacheCon events but are unable to do so for financial reasons. For more information, and to apply, please visit http://www.apache.org/travel/. As with previous years, the TAC team anticipates a high volume of applications from a diverse range of backgrounds, and therefore encourages those considering applying to do so as soon as possible.

We look forward to seeing you in Vancouver!

-Sally Khudairi, on behalf of the TAC team
