Entries tagged [bigdata]

Tuesday February 19, 2019

The Apache® Software Foundation Announces Apache Arrow™ Momentum

Open Source Big Data in-memory columnar layer adopted by dozens of Open Source and commercial technologies; exceeded 1,000,000 monthly downloads within first three years as an Apache Top-Level Project

Wakefield, MA —19 February 2019— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, today announced momentum with Apache® Arrow™, the Open Source Big Data in-memory columnar layer.

Since the founding of the project in January 2016, Apache Arrow has quickly become the defacto standard for representing and processing analytical data in memory, accelerating analytical processing and interchange by more than 100x.

"When we became a Top-Level Project, we projected that the majority of the world's data will be processed through Arrow within the next decade," said Jacques Nadeau, Vice President of Apache Arrow. "In just three years time, we are proud to see Arrow's substantial industry adoption and increased value across a wide range of analytical, machine learning, and artificial intelligence workloads."

Highlights of Apache Arrow's success include:

Industry Adoption —more than 20 major technologies adopted Arrow to accelerate in-memory analytics, including Apache Spark, NVIDIA RAPIDS, pandas, and Dremio, among others. A list of known Open Source and commercial implementations can be found at https://arrow.apache.org/powered_by/

Millions of Downloads —leveraging and integrating Apache Arrow into many other technologies has bolstered downloads to more than 1,000,000 each month.

New Language Support —as a cross-language development platform, supporting multiple programming languages is paramount. Apache Arrow has grown from supporting one language to eleven different languages today; they include C++, Java, Python, R, C#, Javascript, and Ruby, among others.

Seamless Data Format Support —Arrow supports different data types, both simple and nested, located in arbitrary memory such as regular system RAM, memory-mapped files or on-GPU memory. In addition, it can ingest data from popular storage formats such as Apache Parquet, CSV files, Apache ORC, JSON, and more.

Major Code Donations —Apache Arrow's new features and expanded functionality are due in part to code and component donations that include:
  • C# Library
  • Gandiva LLVM-based Expression Compiler
  • Go Library
  • Javascript Library
  • Plasma Shared Memory Object Store
  • Ruby Libraries (Apache Arrow and Apache Parquet)
  • Rust Libraries (Parquet and DataFusion Query Engine)
Community and Contributor Growth —over the past 12 months, nearly 300 individuals have submitted more than 3,000 contributions that have grown the Apache Arrow code base by 300,000 lines of code. The Arrow community is welcoming approximately 10 new contributors each month.


In January the project announced its most recent release, Apache Arrow 0.12.0, which reflects more than 600 enhancements developed during Q4 2018. The Apache Arrow community is actively working on a number of impactful new initiatives that include solving high performance analytical problems and allowing for more efficient data distribution across entire clusters.

"Apache Arrow's rapid industry adoption and developer community growth supports our original thesis of the importance of a language-independent open standard for columnar data," said Wes McKinney, member of the Apache Arrow Project Management Committee, and creator of Python's pandas project. "Additionally, we are seeing productive collaborations take place not only between programming languages but also between the database systems and data science worlds. We look forward to welcoming more data system developers into our community."

About Apache Arrow
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust.

Availability and Oversight
Apache Arrow software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Arrow, visit http://arrow.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 730 individual Members and 7,000 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official global conference series. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Anonymous, ARM, Baidu, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Hortonworks, Huawei, IBM, Indeed, Inspur, LeaseWeb, Microsoft, Oath, ODPi, Pineapple Fund, Pivotal, Private Internet Access, Red Hat, Target, Tencent, Union Investment, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Arrow", "Apache Arrow", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday January 23, 2019

The Apache Software Foundation Announces Apache® Hadoop® v3.2.0

Pioneering Open Source distributed enterprise framework powers US$166B Big Data ecosystem

Wakefield, MA —23 January 2019— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, today announced Apache® Hadoop® v3.2.0, the latest version of the Open Source software framework for reliable, scalable, distributed computing.

Now in its 11th year, Apache Hadoop is the foundation of the US$166B Big Data ecosystem (source: IDC) by enabling data applications to run and be managed on large hardware clusters in a distributed computing environment. "Apache Hadoop has been at the center of this big data transformation, providing an ecosystem with tools for businesses to store and process data on a scale that was unheard of several years ago," according to Accenture Technology Labs.

"This latest release unlocks the powerful feature set the Apache Hadoop community has been working on for more than nine months," said Vinod Kumar Vavilapalli, Vice President of Apache Hadoop. "It further diversifies the platform by building on the cloud connector enhancements from Apache Hadoop 3.0.0 and opening it up for deep learning use-cases and long-running apps."

Apache Hadoop 3.2.0 highlights include:
  • ABFS Filesystem connector —supports the latest Azure Datalake Gen2 Storage;
  • Enhanced S3A connector —including better resilience to throttled AWS S3 and DynamoDB IO;
  • Node Attributes Support in YARN —helps to tag multiple labels on the nodes based on its attributes and supports placing the containers based on expression of these labels;
  • Storage Policy Satisfier  —supports HDFS (Hadoop Distributed File System) applications to move the blocks between storage types as they set the storage policies on files/directories; 
  • Hadoop Submarine —enables data engineers to easily develop, train and deploy deep learning models (in TensorFlow) on very same Hadoop YARN cluster;
  • C++ HDFS client —helps to do async IO to HDFS which helps downstream projects such as Apache ORC;
  • Upgrades for long running services —supports in-place seamless upgrades of long running containers via YARN Native Service API (application program interface) and CLI (command-line interface).

"This is one of the biggest releases in Apache Hadoop 3.x line which brings many new features and over 1,000 changes," said Sunil Govindan, Apache Hadoop 3.2.0 release manager. "We are pleased to announce that Apache Hadoop 3.2.0 is available to take your data management requirements to the next level. Thanks to all our contributors who helped to make this release happen."

Apache Hadoop is widely deployed at numerous enterprises and institutions worldwide, such as Adobe, Alibaba, Amazon Web Services, AOL, Apple, Capital One, Cloudera, Cornell University, eBay, ESA Calvalus satellite mission, Facebook, foursquare, Google, Hortonworks, HP, Huawei, Hulu, IBM, Intel, LinkedIn, Microsoft, Netflix, The New York Times, Rackspace, Rakuten, SAP, Tencent, Teradata, Tesla Motors, Twitter, Uber, and Yahoo. The project maintains a list of educational and production users, as well as companies that offer Hadoop-related services at https://wiki.apache.org/hadoop/PoweredBy

Global Knowledge hails, "...the open-source Apache Hadoop platform changes the economics and dynamics of large-scale data analytics due to its scalability, cost effectiveness, flexibility, and built-in fault tolerance. It makes possible the massive parallel computing that today's data analysis requires."

Hadoop is proven at scale: Netflix captures 500+B daily events using Apache Hadoop. Twitter uses Apache Hadoop to handle 5B+ sessions a day in real time. Twitter’s 10,000+ node cluster processes and analyzes more than a zettabyte of raw data through 200B+ tweets per year. Facebook’s cluster of 4,000+ machines that store 300+ petabytes is augmented by 4 new petabytes of data generated each day. Microsoft uses Apache Hadoop YARN to run the internal Cosmos data lake, which operates over hundreds of thousands of nodes and manages billions of containers per day.

Transparency Market Research recently reported that the global Hadoop market is anticipated to rise at a staggering 29% CAGR with a market valuation of US$37.7B by the end of 2023.

Apache Hadoop remains one of the most active projects at the ASF: it ranks #1 for Apache project repositories by code commits, and is the #5 repository by size (3,881,797 lines of code).

"The Apache Hadoop community continues to go from strength to strength in further driving innovation in Big Data," added Vavilapalli. "We hope that developers, operators and users leverage our latest release in fulfilling their data management needs."

Catch Apache Hadoop in action at the Strata conference, 25-28 March 2019 in San Francisco, and dozens of Hadoop MeetUps held around the world, including on 30 January 2019 at LinkedIn in Sunnyvale, California.

Availability and Oversight
Apache Hadoop software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Hadoop, visit http://hadoop.apache.org/ and https://twitter.com/hadoop

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 730 individual Members and 7,000 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official global conference series. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Anonymous, ARM, Baidu, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Hortonworks, Huawei, IBM, Indeed, Inspur, LeaseWeb, Microsoft, Oath, ODPi, Pineapple Fund, Pivotal, Private Internet Access, Red Hat, Target, Tencent, and Union Investment. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Hadoop", "Apache Hadoop", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Tuesday January 08, 2019

The Apache Software Foundation Announces Apache® Airflow™ as a Top-Level Project

Open Source Big Data workflow management system in use at Adobe, Airbnb, Etsy, Google, ING, Lyft, PayPal, Reddit, Square, Twitter, and United Airlines, among others.

Wakefield, MA —8 January 2019— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Airflow™ as a Top-Level Project (TLP).

Apache Airflow is a flexible, scalable workflow automation and scheduling system for authoring and managing Big Data processing pipelines of hundreds of petabytes. Graduation from the Apache Incubator as a Top-Level Project signifies that the Apache Airflow community and products have been well-governed under the ASF's meritocratic process and principles.

"Since its inception, Apache Airflow has quickly become the de-facto standard for workflow orchestration," said Bolke de Bruin, Vice President of Apache Airflow. "Airflow has gained adoption among developers and data scientists alike thanks to its focus on configuration-as-code. That has gained us a community during incubation at the ASF that not only uses Apache Airflow but also contributes back. This reflects Airflow’s ease of use, scalability, and power of our diverse community; that it is embraced by enterprises and start-ups alike, allows us to now graduate to a Top-Level Project."

Apache Airflow is used to easily orchestrate complex computational workflows. Through smart scheduling, database and dependency management, error handling and logging, Airflow automates resource management, from single servers to large-scale clusters. Written in Python, the project is highly extensible and able to run tasks written in other languages, allowing integration with commonly used architectures and projects such as AWS S3, Docker, Apache Hadoop HDFS, Apache Hive, Kubernetes, MySQL, Postgres, Apache Zeppelin, and more. Airflow originated at Airbnb in 2014 and was submitted to the Apache Incubator March 2016.

Apache Airflow is in use at more than 200 organizations, including Adobe, Airbnb, Astronomer, Etsy, Google, ING, Lyft, NYC City Planning, Paypal, Polidea, Qubole, Quizlet, Reddit, Reply, Solita, Square, Twitter, and United Airlines, among others. A list of known users can be found at https://github.com/apache/incubator-airflow#who-uses-apache-airflow

"Adobe Experience Platform is built on cloud infrastructure leveraging open source technologies such as Apache Spark, Kafka, Hadoop, Storm, and more," said Hitesh Shah, Principal Architect of Adobe Experience Platform. "Apache Airflow is a great new addition to the ecosystem of orchestration engines for Big Data processing pipelines. We have been leveraging Airflow for various use cases in Adobe Experience Cloud and will soon be looking to share the results of our experiments of running Airflow on Kubernetes." 

"Our clients just love Apache Airflow. Airflow has been a part of all our Data pipelines created in past 2 years acting as the ring-master and taming our Machine Learning and ETL Pipelines," said Kaxil Naik, Data Engineer at Data Reply. "It has helped us create a Single View for our client's entire data ecosystem. Airflow's Data-aware scheduling and error-handling helped automate entire report generation process reliably without any human-intervention. It easily integrates with Google Cloud (and other major cloud providers) as well and allows non-technical personnel to use it without a steep learning curve because of Airflow’s configuration-as-a-code paradigm."

"With over 250 PB of data under management, PayPal relies on workflow schedulers such as Apache Airflow to manage its data movement needs reliably," said Sid Anand, Chief Data Engineer at PayPal. "Additionally, Airflow is used for a range of system orchestration needs across many of our distributed systems: needs include self-healing, autoscaling, and reliable [re-]provisioning."

"Since our offering of Apache Airflow as a service in Sept 2016, a lot of big and small enterprises have successfully shifted all of their workflow needs to Airflow," said Sumit Maheshwari, Engineering Manager at Qubole. "At Qubole, not only are we a provider, but also a big consumer of Airflow as well. For example, our whole Insight and Recommendations platform is built around Airflow only, where we process billions of events every month from hundreds of enterprises and generate insights for them on big data solutions like Apache Hadoop, Apache Spark, and Presto. We are very impressed by the simplicity of Airflow and ease at which it can be integrated with other solutions like clouds, monitoring systems or various data sources."

"At ING, we use Apache Airflow to orchestrate our core processes, transforming billions of records from across the globe each day," said Rob Keevil, Data Analytics Platform Lead at ING WB Advanced Analytics. "Its feature set, Open Source heritage and extensibility make it well suited to coordinate the wide variety of batch processes we operate, including ETL workflows, model training, integration scripting, data integrity testing, and alerting. We have played an active role in Airflow development from the onset, having submitted hundreds of pull requests to ensure that the community benefits from the Airflow improvements created at ING.  We are delighted to see Airflow graduate from the Apache Incubator, and look forward to see where this exciting project will be taken in future!"

"We saw immediately the value of Apache Airflow as an orchestrator when we started contributing and using it," said Jarek Potiuk, Principal Software Engineer at Polidea. "Being able to develop and maintain the whole workflow by engineers is usually a challenge when you have a huge configuration to maintain. Airflow allows your DevOps to have a lot of fun and still use the standard coding tools to evolve your infrastructure. This is 'infrastructure as a code' at its best."

"Workflow orchestration is essential to the (big) data era that we live in," added de Bruin. "The field is evolving quite fast and the new data thinking is just starting to make an impact. Apache Airflow is a child of the data era and therefore very well positioned, and is also young so a lot of development can still happen. Airflow can use bright minds from scientific computing, enterprises, and start-ups to further improve it. Join the community, it is easy to hop on!"

Availability and Oversight
Apache Airflow software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Airflow, visit http://airflow.apache.org/ and https://twitter.com/ApacheAirflow

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 730 individual Members and 7,000 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Anonymous, ARM, Baidu, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Hortonworks, Huawei, IBM, Indeed, Inspur, LeaseWeb, Microsoft, Oath, ODPi, Pineapple Fund, Pivotal, Private Internet Access, Red Hat, Target, Tencent, and Union Investment. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Airflow", "Apache Airflow", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Thursday August 23, 2018

The Apache Software Foundation Announces Apache® HAWQ® as a Top-Level Project

Advanced Big Data query engine and analytic database in use at Alibaba, Haier, VMWare, ZTESoft, and hundreds more.

Wakefield, MA —23 August 2018— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® HAWQ® as a Top-Level Project (TLP).

Apache HAWQ is an advanced enterprise SQL-on-Hadoop query engine and analytic database. It combines the key technological advantages of MPP database with the scalability and convenience of Apache Hadoop. HAWQ reads data from and writes data to HDFS natively, delivers industry-leading performance and linear scalability, and provides users with a complete, standards compliant SQL interface.

"We are very excited to see Apache HAWQ graduate as a Top-Level Project and we would like to thank our Incubation mentors for all their help," said Dr. Lei Chang, Vice President of Apache HAWQ. "This is a huge milestone that reflects the collective contributions from the growing global community to deliver a world-class SQL engine for analytics."

HAWQ operates natively in Apache Hadoop to provide users the tools to confidently and successfully interact with petabyte-range data sets. Features include:
  • Exceptional performance: parallel processing architecture delivers high performance throughput and low latency —potentially near real time— query responses that can scale to petabyte-sized datasets;
  • Robust ANSI SQL compliance: leverage familiar skills. Achieve higher levels of compatibility for SQL-based applications and BI/data visualization tools. Execute complex queries and joins, including roll-ups and nested queries; and 
  • Apache Hadoop ecosystem integration: integrate and manage with Apache YARN. Provision with Apache Ambari. Interface with Apache HCatalog. Supports Apache Parquet, Apache HBase, and others. Easily scales nodes up or down to meet performance or capacity requirements.

Apache HAWQ is in use at Alibaba, Haier, VMware, ZTESoft, and hundreds of users around the world.

"We admire Apache HAWQ's flexible framework and ability to scale up in a Cloud ecosystem. HAWQ helps those seeking a heterogeneous computing system to handle ad-hoc queries and heavy batch workloads," said Kuien Liu, Computing Platform Architect at Alibaba. "Alibaba encourages more and more engineers to continue to embrace Open Source, and Apache HAWQ stands out as a star project. We are proud to have been collaborating with this community since 2015."

"Haier Group has deployed clusters of more than 30 nodes in the production environment from the very beginning of HAWQ," said Xiaoliang Wu, Big Data Architect at Haier. "We use HAWQ as an ad-hoc query and batch computation engine in areas such as social network services and IOT. Because of its superior scalability and stability, HAWQ greatly improves development efficiency and reduces operation and maintenance costs. We believe that Apache HAWQ is a very competitive product in the SQL-On-Hadoop field."

"We have been using Apache HAWQ at VMware for 4 years now," said Dominie Jacob, Lead Big Data Engineer at VMware Inc. "It is easy to manage and scale using Apache Ambari, and easy to provision and attach more nodes based on demand. Being virtualized, it is easy to provision and attach more nodes based on demand. In our BI Big Data world, HAWQ is the primary database for accessing the Hadoop datasets, building models, and executing predictive model workflows. HAWQ is working seamlessly with billions of records, thousands of Tables/Functions/Tableau-Reports, and hundreds of users. The demand for HAWQ is increasing. As VMware always encourages us to pick up and contribute back to Open Source technologies, we would love to collaborate with this community and see more enhancements. In our BI space, HAWQ is one of the top priorities."

"Apache HAWQ is an attractive technology for Big Data applications," said Zixu Zhao, Architect at ZTESoft. "HAWQ serves as the foundation of our Big Data platform and it has been used in a lot of applications, such as interactive analytics and BI on telecom data. We congratulate HAWQ on becoming an Apache Top-Level Project."

"Becoming an Apache Top-Level Project is an important milestone," added Chang. "There is much work ahead of us, and we look forward to growing the HAWQ community and codebase."

Catch Apache HAWQ in action at ApacheCon North America 24-27 September 2018 http://apachecon.com/acna18/ .

Availability and Oversight

Apache HAWQ software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache HAWQ, visit http://hawq.apache.org/ and https://twitter.com/ApacheHAWQ .

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 730 individual Members and 6,800 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Anonymous, ARM, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Facebook, Google, Hortonworks, Huawei, IBM, Indeed, Inspur, LeaseWeb, Microsoft, Oath, ODPi, Pineapple Fund, Pivotal, Private Internet Access, Red Hat, Target, and Union Investment. For more information, visit http://apache.org/ and https://twitter.com/TheASF


© The Apache Software Foundation. "Apache", "HAWQ", "Apache HAWQ", "Ambari", "Apache Ambari", "Hadoop", "Apache Hadoop", "HBase", "Apache HBase", "HCatalog", "Apache HCatalog", "Parquet", "Apache Parquet", "YARN", "Apache YARN", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Thursday May 17, 2018

The Apache® Software Foundation Announces Agenda, Keynotes, and Sponsors for ApacheCon™ North America 2018

Community-driven conference series to gather dozens of Apache projects and their communities in Montréal to share and learn about the latest Open Source innovations in Big Data, Cloud, Finance, IoT, Machine Learning, Search, Servers, and more in a collaborative, vendor-neutral environment

Wakefield, MA —17 May 2018— The Apache® Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the program for its official conference series, ApacheCon™, taking place 24-27 September 2018 in Montréal, Canada.

ApacheCon showcases the latest breakthroughs from ubiquitous Apache projects and upcoming innovations in the Apache Incubator, as well as Open Source development and leading community-driven projects "The Apache Way".

"The Apache Software Foundation continues to lead in community-driven development that accelerates innovation across the Open Source ecosystem," said Rich Bowen, Vice President of Conferences at the ASF. "ApacheCon delivers state-of-the-art content, directly from Apache project developer and user communities."

Over the past 19 years, ApacheCon has drawn attendees from more than 60 countries to learn about core Open Source technologies independent of business interests, corporate biases, or sales pitches. 

Topics include Big Data, Cloud, Community, Finance, Graphs, HTTP Server, IoT, Machine Learning, Messaging, NoSQL, Programming, Search, Security, Streaming, and more.

Dedicated Apache project tracks/co-located events include Apache Spark and Scala half-day workshop with IBM; CloudStack Collaboration Conference, TomcatCon, and Geospatial Track, organized by the Open Geospatial Consortium (OGC).

Sessions and Speakers
More than 100 presentations on dozens of popular Apache projects, including Avro, Beam, CouchDB, Drill, Fineract, Groovy, Hadoop, HAWQ, HBase, Hive, HTTP Server, Kafka, Karaf, Lucene, MyNewt, NetBeans, OpenNLP, ORC, Parquet, Phoenix, Ranger, RocketMQ, Solr, Spark, Struts, Unomi, YARN, and Zeppelin.

Featured projects undergoing development in the Apache Incubator include Druid, Dubbo, Gobblin, Hivemall, MXNet, OpenWhisk, PLC4X, Pulsar, S2Graph, and Traffic Control.

The complete schedule is available at https://apachecon.com/acna18/

Keynotes

Tuesday, 25 September

  • Cliff Schmidt, Apache Member, former ASF Board member, and Literacy Bridge founder on how Amplio uses technology to educate and improve the quality of life of people living in very difficult parts of the world.

  • Myrle Krantz, Apache Member and Vice President Apache Fineract, on how Open Source banking is helping the global fight against poverty.

Wednesday, 26 September

  • Bridget Kromhout, Principal Cloud Developer Advocate at Microsoft, on the really hard problem in software: the people.

  • Euan McLeod, ‎VP VIPER at ‎Comcast, on the many ways that Apache software delivers your favorite shows to your living room.

Events and Activities
ApacheCon will also feature popular sessions that include Lightning Talks, BarCampApache, Hackathon, PGP key signing, and ample networking opportunities with Apache projects and their communities.

Sponsors
ApacheCon is made possible by Platinum Sponsors Comcast and IBM; Gold Sponsor Amazon; Silver Sponsors Linode and Red Hat; and Bronze Sponsor ShapeBlue, who join Apache event sponsors CloudOps, Dito, GridGain, and Talener. Sponsorship opportunities are available for ApacheCon, as well as Apache Roadshow Berlin and Washington DC: contact help@apachecon.com for further details.

Venue
ApacheCon North America 2018 will be held at the Montréal Marriott Chateau Champlain in Montréal, Canada. ApacheCon has secured room blocks with special rates for sleeping rooms ($225 CAD per night, including WiFi) available through 24 August.

Registration
Registration for ApacheCon is open. Early registration incentives include savings of $225 off the general rate of $800 through 21 July, as well as special discounts for Apache Committers. 

Members of the media and analyst community who would like to reserve a complimentary press pass to ApacheCon should contact Sally Khudairi at press@apache.org.

About ApacheCon
ApacheCon is the official global conference series of The Apache Software Foundation. Since 2000 ApacheCon has been drawing participants at all levels to explore ”Tomorrow’s Technology Today” across 300+ Apache projects and their diverse communities. ApacheCon showcases the latest developments in ubiquitous Apache projects and emerging innovations through hands-on sessions, keynotes, real-world case studies, trainings, hackathons, community events, and more. For more information, visit http://apachecon.com/ and https://twitter.com/ApacheCon .

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server —the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 680 individual Members and 6,500 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, ARM, Baidu, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Facebook, Google, Hortonworks, Huawei, IBM, Indeed, Inspur, iSIGMA, ODPi, LeaseWeb, Microsoft, PhoenixNAP, Pivotal, Private Internet Access, Red Hat, Target, Union Investment, and Yahoo. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Avro", "Apache Avro", "Beam", "Apache Beam", "CouchDB", "Apache CouchDB", "Drill", "Apache Drill", "Fineract", "Apache Fineract", "Groovy", "Apache Groovy", "Hadoop", "Apache Hadoop", "HAWQ", "Apache HAWQ", "HBase", "Apache HBase", "Hive", "Apache Hive", "Apache HTTP Server", "Kafka", "Apache Kafka", "Karaf", "Apache Karaf", "Lucene", "Apache Lucene", "MyNewt", "Apache MyNewt", "NetBeans", "Apache NetBeans", "OpenNLP", "Apache OpenNLP", "ORC", "Apache ORC", "Parquet", "Apache Parquet", "Phoenix", "Apache Phoenix", "Ranger", "Apache Ranger", "RocketMQ", "Apache RocketMQ", "Solr", "Apache Solr", "Spark", "Apache Spark", "Struts", "Apache Struts", "Unomi", "Apache Unomi", "YARN", "Apache YARN", "Zeppelin", "Apache Zeppelin", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday April 18, 2018

The Apache Software Foundation Announces Apache® Oozie(TM) v5.0.0

Open Source workflow scheduler for Apache Hadoop used to build complex Big Data transformations.

Wakefield, MA —18 April 2018— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® OozieTM v5.0.0, the workflow scheduler for Apache Hadoop.

Apache Oozie is a scalable, reliable, and extensible Java Web application used for job workflow scheduling and operational services management within an Apache Hadoop cluster. Integrated with the Hadoop stack, Oozie supports jobs for Apache projects such as Spark, Hive, MapReduce, Pig, and Sqoop, and can also schedule system-specific jobs, such as Java programs and shell scripts. The project entered the Apache Incubator in 2011, and graduated as an Apache Top-Level Project in 2012.

"Apache Oozie 5's flagship feature, Oozie on YARN, started off as a 1 day hackathon project almost 4 years ago, and it's great to see that the Oozie community has taken it on and made it ready for everyone to use," said Robert Kanter, Vice President of Apache Oozie. "It's a big change to Oozie's architecture, and I think our users are going to be very happy with the benefits it brings."

Apache Oozie allows cluster administrators to build complex Big Data transformations out of multiple component tasks. This provides greater control over jobs and also makes it easier to repeat those jobs at predetermined intervals. 

Oozie combines multiple jobs sequentially into one logical unit of work through 1) Oozie Workflow jobs -- Directed Acyclic Graphs (DAGs) of actions; and 2) Oozie Coordinator jobs -- recurrent Oozie Workflow jobs triggered by time (frequency) and data availability. Apache Oozie 5.0.0 includes new features, bug fixes and minor improvements that include:
  • moved launcher from MapReduce mapper to YARN ApplicationMaster;
  • switched from Tomcat 6 to embedded Jetty 9;
  • updated third party libraries;
  • completely rewritten workflow graph generator;
  • JDK 8 support;
  • deprecated Instrumentation in favor of Metrics;
  • added indexes to speed up DB queries; and 
  • fixed CVE-2017-15712

The full list of new features can be found in the project release notes at https://oozie.apache.org/docs/5.0.0/release-log.txt

"Oozie 5 is a major milestone for the project," said Andras Piros, Apache Oozie committer and Apache Oozie v5.0 Release Manager. "We are proud to provide all the new functionality to big data administrators, data engineers, and data scientists who can leverage a faster, more streamlined, and more secure workflow orchestrator. Features like Oozie on YARN, Jetty 9 support, and ecosystem revamp enable Apache Hadoop users to create and schedule Hadoop jobs in an efficient and modern way not seen before."

"Oozie has long been a staple of a productive Apache Hadoop deployment, playing an important role in orchestrating the rest of the ecosystem. Oozie 5 represents the next step in where Oozie is headed," added Kanter. "The Apache Oozie community has already got some great features in the works for our next release. We welcome anyone who wants to contribute to join us in making Oozie the best it can be."

Availability and Oversight
Apache Oozie software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Oozie, visit http://oozie.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 680 individual Members and 6,500 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, ARM, Baidu, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Facebook, Google, Hortonworks, Huawei, IBM, Indeed, Inspur, iSIGMA, ODPi, LeaseWeb, Microsoft, PhoenixNAP, Pivotal, Private Internet Access, Red Hat, Target, Union Investment, and Yahoo. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Oozie", "Apache Oozie", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Thursday December 14, 2017

The Apache Software Foundation Announces Apache® Hadoop® v3.0.0 General Availability

Ubiquitous Open Source enterprise framework maintains decade-long leading role in $100B annual Big Data market

Forest Hill, MD —14 December 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, today announced Apache® Hadoop® v3.0.0, the latest version of the Open Source software framework for reliable, scalable, distributed computing.

Over the past decade, Apache Hadoop has become ubiquitous within the greater Big Data ecosystem by enabling firms to run and manage data applications on large hardware clusters in a distributed computing environment.

"This latest release unlocks several years of development from the Apache community," said Chris Douglas, Vice President of Apache Hadoop. "The platform continues to evolve with hardware trends and to accommodate new workloads beyond batch analytics, particularly real-time queries and long-running services. At the same time, our Open Source contributors have adapted Apache Hadoop to a wide range of deployment environments, including the Cloud."

"Hadoop 3 is a major milestone for the project, and our biggest release ever," said Andrew Wang, Apache Hadoop 3 release manager. "It represents the combined efforts of hundreds of contributors over the five years since Hadoop 2. I'm looking forward to how our users will benefit from new features in the release that improve the efficiency, scalability, and reliability of the platform."

Apache Hadoop 3.0.0 highlights include:
  • HDFS erasure coding —halves the storage cost of HDFS while also improving data durability;
  • YARN Timeline Service v.2 (preview) —improves the scalability, reliability, and usability of the Timeline Service;
  • YARN resource types —enables scheduling of additional resources, such as disks and GPUs, for better integration with machine learning and container workloads;
  • Federation of YARN and HDFS subclusters transparently scales Hadoop to tens of thousands of machines;
  • Opportunistic container execution improves resource utilization and increases task throughput for short-lived containers. In addition to its traditional, central scheduler, YARN also supports distributed scheduling of opportunistic containers; and 
  • Improved capabilities and performance improvements for cloud storage systems such as Amazon S3 (S3Guard), Microsoft Azure Data Lake, and Aliyun Object Storage System.

Hadoop 3.0.0 has already undergone extensive testing and integration with the broader Open Source ecosystem at The Apache Software Foundation. With this release, its community of developers and users promote this release series out of beta.

Apache Hadoop is widely deployed at numerous enterprises and institutions worldwide, such as Adobe, Alibaba, Amazon Web Services, AOL, Apple, Capital One, Cloudera, Cornell University, eBay, ESA Calvalus satellite mission, Facebook, foursquare, Google, Hortonworks, HP, Hulu, IBM, Intel, LinkedIn, Microsoft, Netflix, The New York Times, Rackspace, Rakuten, SAP, Tencent, Teradata, Tesla Motors, Twitter, Uber, and Yahoo. The project maintains a list of known users at https://wiki.apache.org/hadoop/PoweredBy

"It's tremendous to see this significant progress, from the raw tool of eleven years ago, to the mature software in today's release," said Doug Cutting, original co-creator of Apache Hadoop. "With this milestone, Hadoop better meets the requirements of its growing role in enterprise data systems.  The Open Source community continues to respond to industrial demands."

Apache Hadoop's diverse community enjoys continued growth amongst the ASF's most active projects, and remains at the forefront of more than three dozen Apache Big Data projects.

Apache Hadoop committer history

Apache Hadoop has received countless awards, including top prizes at the Media Guardian Innovation Awards and Duke's Choice Awards, and has been hailed by industry analysts:

"...the lifeblood of organizational analytics…" —Gartner

"Hadoop Is Here To Stay" —Forrester

"...today Hadoop is the only cost-sensible and scalable open source alternative to commercially available Big Data management packages. It also becomes an integral part of almost any commercially available Big Data solution and de-facto industry standard for business intelligence (BI)." —MarketAnalysis.com/Market Research Media

"...commanding half of big data’s $100 billion annual market value...Hadoop is the go-to big data framework." —BigDataWeek.com

"Hadoop, and its associated tools, is currently the 'big beast' of the big data world and the Hadoop environment is undergoing rapid development..." —Bloor Research


"The opportunity to effect meaningful, even fundamental change in the Apache Hadoop project remains open," added Douglas. "Our new contributors uprooted the project from its historical strength in Web-scale analytics by introducing powerful, proven abstractions for data management, security, containerization, and isolation. Apache Hadoop drives innovation in Big Data by growing its community. We hope this latest release continues to draw developers, operators, and users to the ASF."

Catch Apache Hadoop in action at the Strata Data Conference in San Jose, CA, 5-8 March 2018, and at dozens of Hadoop Meetups held around the world.

Availability and Oversight
Apache Hadoop software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Hadoop, visit http://hadoop.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server —the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 680 individual Members and 6,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Facebook, Google, Hortonworks, Huawei, IBM, Inspur, iSIGMA, ODPi, LeaseWeb, Microsoft, PhoenixNAP, Pivotal, Private Internet Access, Red Hat, Serenata Flowers, Target, Union Investment, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Hadoop", "Apache Hadoop", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Monday September 25, 2017

The Apache Software Foundation Announces Apache® RocketMQ™ as a Top-Level Project

Open Source distributed messaging and streaming Big Data platform in use at Alibaba Group, Didi Chuxing, S.F. Express, WeBank, Peking University, and Chinese Academy of Sciences, among others.

Forest Hill, MD –25 September 2017– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® RocketMQ™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache RocketMQ is an Open Source distributed messaging and streaming Big Data platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.

"I am very excited to see Apache RocketMQ as a Top-Level Project and I would like to thank our mentors for all their help, the Apache Incubator Project Management Committee for its advice and guidance, everyone in the RocketMQ community, and Alibaba for publishing the research upon which RocketMQ is based," said Xiaorui Wang, Vice President of Apache RocketMQ. "During the incubation process, the RocketMQ community worked very hard to develop high-quality distributed software for messaging and streaming, in an open and inclusive manner in accordance with the Apache Way."

RocketMQ originated at Alibaba in 2012, and, after handling 1.2 trillion concurrent online message transmissions in the Alibaba Nov. 11th Global Shopping Festival, was donated to the Apache Incubator in November 2016. Apache RocketMQ v4.0.0 was released in February 2017.

As a distributed messaging engine, RocketMQ features include:
  • Low latency; more than 99.6% response latency within 1 millisecond under high pressure;
  • Finance-oriented, high availability with tracking and auditing features;
  • Industry-sustainable, trillion-level message capacity guaranteed;
  • Vendor-neutral, support multiple messaging protocols like JMS and OpenMessaging;
  • Big Data friendly, batch transferring with versatile integration for flooding throughput; and
  • Massive accumulation, given sufficient disk space, accumulate messages without performance loss.

"RocketMQ was conceived from the outset as an open-source distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability," said Von Gosling, original co-creator of RocketMQ and Chief Architect of Aliware MQ at Alibaba Group. "It has been great to witness the growth of the RocketMQ community and codebase as an ASF incubating project, and I look forward to this continuing as a Top-Level Project. Today, more than 100 companies are using Apache RocketMQ, with more feedback coming from the community. According to our data, more than 80% of the project's contributions are from outside the donator Alibaba Group."

In addition to Alibaba Group, Apache RocketMQ is in use at hundreds of companies and research/educational institutions that include Didi Chuxing, S.F. Express, WeBank, Peking University, and Chinese Academy of Sciences, among others.

"Graduation from the Incubator marks an important milestone for the RocketMQ project," said Bruce Snyder, Apache RocketMQ Incubator Mentor and Director of Software Development at SAP Hybris. "This is recognition of the focus and hard work of the project members to learn The Apache Way and drive community around RocketMQ. I am honored to have helped guide the project to a successful graduation."

"At Didi, we have used Apache RocketMQ as storage engine to build MessageQueue service. Based on high availability and high performance of RocketMQ we provide high-quality service," said Neil Qi, Architect at Didi Chuxing. "I believe RocketMQ will become the best MessageQueue project in future."

"New participants are more than welcome to join the project, To serve the community better, we created and maintained two repositories, one as our kernel version and the other one is for community contributions. The community contributed some integrated projects with some other Apache TLPs like Apache Storm, Apache Ignite, Apache Spark and Apache Flume," said Xinyu "yukon" Zhou, member of the Apache RocketMQ Project Management Committee. "We enthusiastically look forward to working together with all contributors to Apache RocketMQ in order to advance the state-of-the-art distributed messaging engine."

Availability and Oversight
Apache RocketMQ software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache RocketMQ, visit http://rocketmq.apache.org/ and https://twitter.com/ApacheRocketMQ

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 650 individual Members and 6,200 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Facebook, Google, Hortonworks, HP, Huawei, IBM, Inspur, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "RocketMQ", "Apache RocketMQ", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Tuesday August 22, 2017

The Apache Software Foundation Announces Apache® MADlib™ as a Top-Level Project

Big Data machine-learning library used for scalable in-database analytics

Forest Hill, MD –22 August 2017– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® MADlib™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache MADlib is a comprehensive library for scalable in-database analytics. It provides parallel implementations of machine learning, graph, mathematical and statistical methods for structured and unstructured data.

"Graduating as a Top-Level Project is a very important milestone for Apache MADlib," said Aaron Feng, Vice President of Apache MADlib. "During the incubation process, the MADlib community worked very hard to develop high quality software for in-database analytics, in an open and inclusive manner in accordance with the Apache Way."

MADlib grew out of discussions between database engine developers, data scientists, IT architects and academics interested in new approaches to scalable, sophisticated in-database analytics. These discussions were written up in a paper from VLDB 2009 [1] that coined the term "MAD Skills" for data analysis. The MADlib software project began the following year as a collaboration between researchers at UC Berkeley and engineers and computer scientists at Pivotal (formerly EMC/Greenplum). In September 2015, MADlib joined the ASF community as an incubating project.

MADlib is deployed on a wide variety of industry and academic projects across many different verticals, including automotive, consumer, finance, government, healthcare, and telecommunications.

"MADlib was conceived from the outset as an open-source meeting ground for software developers, computing researchers and data scientists to collaborate on scalable, in-database machine learning and statistics," said Joe Hellerstein, Professor of Computer Science at UC Berkeley, Co-Founder and Chief Strategy Officer at Trifacta, and one of the original authors of MADlib. "It has been great to witness the growth of the MADlib community and codebase as an ASF incubating project, and I look forward to this continuing as a Top-Level Project."

"At Pivotal, we have seen our customers successfully deploy MADlib on large scale data science projects across a wide variety of industry verticals," said Elisabeth Hendrickson, Vice President, R&D for Data at Pivotal. "As MADlib graduates to a Top-Level Project at the ASF, we anticipate increased adoption in the enterprise given the mature level of the codebase and the active developer community."

"The potential of the Apache MADlib project is unbounded," said Jim Jagielski, Vice Chairman of the ASF. "The ability to perform in-depth and detailed analytics, on both structured and unstructured data, using SQL enables MADlib to be applicable in scenarios where others simply can't compete. As not only interest in, but real-world usage of, machine learning becomes common place, MADlib joins the growing roster of Apache projects that define innovation."

"Apache MADlib is a great example of the diversity at Apache," said Ted Dunning, Apache MADlib Incubator Mentor and Member of the ASF Board of Directors. "MADlib does state-of-the-art machine learning, but does as an inherent part of a database. This is a radical approach that can provide important design flexibility. I am excited to see MADlib become a fully fledged project at Apache."

"New participants are more than welcome to join the project," added Feng. "We enthusiastically look forward to working together with all contributors to Apache MADlib in order to advance the state-of-the-art of scale-out data science tools."

[1] http://dl.acm.org/citation.cfm?id=1687576

Availability and Oversight
Apache MADlib software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache MADlib, visit http://madlib.apache.org/ and https://twitter.com/ApacheMADlib

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 650 individual Members and 6,200 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Facebook, Google, Hortonworks, HP, Huawei, IBM, Inspur, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "MADlib", "Apache MADlib", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday July 26, 2017

The Apache Software Foundation Announces Apache® Fluo™ as a Top-Level Project

Newest addition to Apache Big Data ecosystem used for continual, incremental processing of data at petabyte scale

Forest Hill, MD –26 July 2017– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Fluo™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache Fluo is a distributed system for incrementally processing large data sets stored in Apache Accumulo (the sorted, distributed key/value store based on Google's Bigtable, built on top of Apache Hadoop, Apache Zookeeper, and Apache Thrift). With Fluo, users can continuously join new data into large existing data sets without reprocessing all data. Unlike batch and streaming frameworks, Fluo offers much lower latency and can operate on extremely large data sets.

"I am very excited to see Apache Fluo graduate and I would like to thank our mentors for all their help, the Apache Incubator Project Management Committee for its advice and guidance, everyone in the Fluo community, and Google for publishing the research upon which Fluo is based," said Keith Turner, Vice President of Apache Fluo. "As a result of collaboration within the community, we are graduating with a beautifully designed piece of software."

Based on Percolator (built on top of Bigtable to support incremental updates to the search index at Google), Fluo makes it possible to continually-update the results of a large-scale computation, index, or analytic as new data is discovered.

"Apache Fluo is a very clever piece of software, elegantly supplementing Apache Accumulo's ability to store and maintain very large indexes," said Christopher Tubbs, ASF Member and Committer on Apache Accumulo and Apache Fluo. "Its support of transactions enables Accumulo to solve a whole new set of big data problems, and its observer framework makes designing ingest workflows fun."

An example of how Fluo works is a use case of counting phrases in unique documents. This could be accomplished by two MapReduce jobs: one job to get a unique set of documents and a following job to count phrases. Where petabytes of documents are concerned, running both jobs for a small amount of new data is inefficient. Apache Fluo enables continuous, quick computations of these two joins as new data arrives, constantly emitting deltas of phrase counts. Anything could consume the emitted deltas. For example, a query system could be continuously updated using them.

"We are excited that Fluo is becoming a Top-Level Project at the Apache Software Foundation," said Dr. Adina Crainiceanu, Apache Rya (incubating) Committer and Associate Professor, Computer Science Department, United States Naval Academy. "Heartfelt congratulations to the Fluo community for achieving this important milestone. The Apache Rya project uses the observer framework in Fluo to cache and maintain answers to complex SPARQL queries for large RDF datasets. Using cached answers greatly improves Rya's performance for complex queries. Fluo complements Rya by allowing the incremental and continuous update of the cached answers. Fluo is particularly useful because it allows updates to happen as new data is ingested, reduces updates latency, avoids stale results, and circumvents the periodical reprocessing of the entire dataset. We are confident that Apache Fluo will become one of the important frameworks for updating indexing results in a dynamic data-acquiring context."

"Fluo fulfills an important role in the Apache Hadoop ecosystem, significantly expanding existing capabilities for working with large data sets," said Billie Rinaldi, ASF Member and former Vice President of Apache Accumulo. "I was excited to see this project come to the Apache Incubator, and am even more pleased to see it graduate to a top-level Apache project."

"We welcome new users and contributors to Apache Fluo," added Turner. "If you are interested in trying Fluo, check out the Fluo Tour on the project Website. Join our mailing lists to discuss how Fluo may be a good solution for your problem, as well as for help with debugging and finding starter issues."

Catch Apache Fluo in action and meet members of the Fluo community at Accumulo Summit, 16 October 2017 in Columbia, MD. http://accumulosummit.com/

Availability and Oversight
Apache Fluo software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Fluo, visit http://fluo.apache.org/ and https://twitter.com/ApacheFluo

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 620 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit https://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Fluo", "Apache Fluo", "Accumulo", "Apache Accumulo", "Rya", "Apache Rya", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Monday June 05, 2017

The Apache Software Foundation Announces Momentum With Apache® Hadoop® v2.8

Major release of the cornerstone of the Big Data ecosystem, from which dozens of Apache Big Data projects and countless industry solutions originate.

Forest Hill, MD —5 June 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today momentum with Apache® Hadoop® v2.8, the latest version of the Open Source software framework for reliable, scalable, distributed computing.

Now ten years old, Apache Hadoop dominates the greater Big Data ecosystem as the flagship project and community amongst the ASF's more than three dozen projects in the category.

"Apache Hadoop 2.8 maintains the project's momentum in its stable release series," said Chris Douglas, Vice President of Apache Hadoop. "Our community of users, operators, testers, and developers continue to evolve the thriving Big Data ecosystem at the ASF. We're committed to sustaining the scalable, reliable, and secure platform our greater Hadoop community has built over the last decade."

Apache Hadoop supports processing and storage of extremely large data sets in a distributed computing environment. The project has been regularly lauded by industry analysts worldwide for driving market transformation. Forrester Research estimates that firms will spend US$800M in Hadoop software and related services in 2017. According to Zion Market Research, the global Hadoop market is expected to reach approximately US$87.14B by 2022, growing at a CAGR of around 50% between 2017 and 2022.

Apache Hadoop 2.8 is the result of 2 years of extensive collaborative development from the global Apache Hadoop community. With 2,914 commits as new features, improvements and bug fixes since v2.7, highlights include:
  • Several important security related enhancements, including Hadoop UI protection of Cross-Frame Scripting (XFS) which is an attack that combines malicious JavaScript with an iframe that loads a legitimate page in an effort to steal data from an unsuspecting user, and Hadoop REST API protection of Cross site request forgery (CSRF) attack which attempt to force an authenticated user to execute functionality without their knowledge.

  • Support for Microsoft Azure Data Lake as a source and destination of data. This benefits anyone deploying Hadoop in Microsoft's Azure Cloud. The Azure Data Lake service was actually developed for Hadoop and analytics workloads.

  • The "S3A" client for working with data stored in Amazon S3 has been radically enhanced for scalability, performance, and security. The performance enhancements were driven by Apache Hive and Apache Spark benchmarks. In Hive TCP-DS benchmarks, Apache Hadoop is currently faster working with columnar data stored in S3  than Amazon EMR's closed-source connector. This shows the benefit of collaborative Open Source development.

  • Several WebHDFS related enhancements include integrated CSRF prevention filter in WebHDFS, support OAuth2 in WebHDFS, disallow/allow snapshots via WebHDFS, and more.

  • Integration with other applications has been improved with a separate jar for the hdfs-client than the hadoop-hdfs JAR with all the server side code. Downstream projects that access HDFS can depend on the hadoop-hdfs-client module to reduce the amount of transitive classpath dependencies.

  • YARN NodeManager Resource Reconfiguration through RM Admin CLI for a live cluster that allows YARN clusters to have a more flexible resource model especially for a Cloud deployment.

In addition to physical Hadoop clusters, where the majority of storage and computation lies, Apache Hadoop is very popular within Cloud infrastructures. Contributions from Apache Hadoop's diverse community includes improvements provided by Cloud infrastructure vendors and large Hadoop-in-Cloud users. These improvements include: Azure and S3 storage and YARN reconfiguration in particular, improve Hadoop's deployment on and integration with Cloud Infrastructures. The improvements in Hadoop 2.8 enable Cloud-deployed clusters to be more dynamic in sizing, adapting to demand by scaling up and down.

"My colleagues and I are happy that tests of Apache Hive and Hadoop 2.8 show that we are able to provide a similar experience reading data in from S3 as Amazon EMR, with its closed-source fork/rewrite of S3," said Steve Loughran, member of the Apache Hadoop Project Management Committee.

Hailed as a "Swiss army knife of the 21st century" by the Media Guardian Innovation Awards  and "the most important software you’ve never heard of…helped enable both Big Data and Cloud computing" by author Thomas Friedman, Apache Hadoop is used by an array of companies such as Alibaba, Amazon Web Services, AOL, Apple, eBay, Facebook, foursquare, IBM, HP, LinkedIn, Microsoft, Netflix, The New York Times, Rackspace, SAP,  Tencent, Teradata, Tesla Motors, Uber, and Twitter. Yahoo, an early pioneer, hosts the world's largest known Hadoop production environment to date, spanning more than 38,000 nodes.

Catch Apache Hadoop in action at DataWorks Summit 13-15 June 2017 in San Jose, CA.

Availability and Oversight
Apache Hadoop software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Hadoop, visit http://hadoop.apache.org/ and https://twitter.com/hadoop

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 680 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Hadoop", "Apache Hadoop", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday May 17, 2017

The Apache Software Foundation Announces Apache® Beam™ v2.0.0

Open Source unified programming model for batch and streaming Big Data processing in use at Google Cloud, PayPal, and Talend, among others.

Forest Hill, MD —17 May 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the availability of Apache® Beam™ v2.0.0, the first stable release of the unified programming model for both batch and streaming Big Data processing.

An Apache Top-Level Project (TLP) since December 2016, Beam includes Java and Python software development kits used to define data processing pipelines and runners to execute them on Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow, among other execution engines.

Apache Beam has its roots in Google's internal work on data processing over the last decade, evolving from the initial MapReduce system, through FlumeJava and MillWheel, into Google Cloud Dataflow v1.x, which defined the unified programming model that became the heart of Apache Beam.

"The first stable release is an important milestone for the Apache Beam community," said Davor Bonaci, Vice President of Apache Beam. "This is a statement from the community that it intends to maintain API stability with all releases for the foreseeable future, making Beam suitable for enterprise deployment."

Apache Beam v2.0.0 improves user experience across the project, focusing on seamless portability across execution environments, including engines, operating systems, on-premise clusters, cloud providers, and data storage systems. Other highlights include:
  • API stability and future compatibility within this major version;
  • Stateful data processing paradigms that unlock efficient, data-dependent computations;
  • Support for user-extensible file systems, with built-in support for Hadoop Distributed File System, among others; and
  • A metrics subsystem for deeper insight into pipeline execution.

Apache Beam is in use at Google Cloud, PayPal, and Talend, among others.

"Apache Beam is a mature data processing API for the enterprise, with powerful semantics that solve real-world challenges of stream processing," said Tomer Pilossof, Big Data Manager at PayPal. "With Beam, we provide data processing solutions for a wide range of customers within the PayPal organization."

"We at Talend are thrilled to have contributed to Apache Beam reaching the 2.0.0 milestone and its first official stable release," said Laurent Bride, Chief Technology Officer at Talend. "Apache Beam is now part of the foundation of Talend products. Recently, we released Talend Data Preparation for Big Data which leverages Beam to create transformation pipelines that are portable across many execution engines. Later this year, we plan to deliver Talend Data Streams, taking the Apache Beam integration one step further by utilizing its powerful streaming semantics. Whether for batch, streaming, or real-time use cases, Apache Beam is a powerful framework that delivers the flexibility and advanced functionality our customers need."

"We congratulate the Apache Beam community for reaching the key milestone of a first stable release," said William Vambenepe, Lead Product Manager for Big Data, Google Cloud. "We look forward to our Google Cloud Dataflow customers taking full advantage of Beam's powerful programming model and newest features to run their data processing pipelines on Google Cloud."

Apache Beam v2.0.0 is making its debut at Apache: Big Data, taking place this week in Miami, FL, with four sessions featuring Apache Beam. Apache Beam will also be highlighted at numerous face-to-face meetups and conferences, including the Future of Data San Jose meetup, Strata Data Conference London, Berlin Buzzwords, and DataWorks Summit San Jose.

"I'd like to invite everyone to try out Apache Beam v2.0.0 today and consider joining our vibrant community," added Bonaci. "We welcome feedback, contribution and participation through our mailing lists, issue tracker, pull requests, and events."

Availability and Oversight
Apache Beam software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Beam, visit https://beam.apache.org/ and https://twitter.com/ApacheBeam

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server -- the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 680 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Beam", "Apache Beam", "Apex", "Apache Apex", "Flink", "Apache Flink", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Monday May 01, 2017

The Apache Software Foundation Announces Apache® CarbonData™ as a Top-Level Project

Open Source Big Data analytics accelerator in use at Bank of Communications, Hulu, Huawei, SAIC Motor, Zhejiang Mobile, among others.

Forest Hill, MD –1 May 2017– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® CarbonData™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache CarbonData is an indexed columnar store file format for fast analytics on Big Data platforms (including Apache Hadoop, Apache Spark, among others) to help speed up queries an order of magnitude faster over petabytes of data.

"We are very proud to complete the incubation process and graduate as an Apache Top-Level Project," said Liang Chen, Vice President of Apache CarbonData. "The CarbonData community grew rapidly over last ten months, both in terms of size and diversity. Since entering the Apache Incubator, we have completed 4 releases, and exceeded 90 contributors from 10 different organizations."

With the aim of using a unified file format to satisfy all kinds of data analysis cases, Apache CarbonData seamlessly integrates with Hadoop and Spark to improve Big Data analysis efficiency. In benchmarks, CarbonData's faster interactive query helps in speeding up queries approximately 10x faster than standard column-oriented SQL on Hadoop data stores.

Highlights include:

  • Unique data organization to allow faster filtering and better compression;
  • Multi-level Indexing to enable faster search and speeding up query processing;
  • Deep Apache Spark Integration for dataframe + SQL compliance;
  • Advanced push down optimization to minimize the amount of data being read processed, converted, transmitted, and shuffled;
  • Efficient compression and global encoding schemes to further improve aggregation query performance;
  • Dictionary encoding for reduced storage space and faster processing; and
  • Data update + delete support using standard SQL syntax.


Apache CarbonData is in use at an array of organizations, including Bank of Communications, medical/pharma social platform DXY, Hulu, Huawei, group online retailer MEITUAN, SAIC Motor, Zhejiang Mobile, among others.

"CarbonData has very good performance as a ‘SQL on Hadoop’ solution," said Tan Sheng, Director of SAIC Motor’s Big Data team. "It is suitable for SAIC Motor to adopt as a central Big Data platform component. Not only do we use Apache CarbonData, we also actively participate in its community as contributors." 

"Apache CarbonData is great, as helped our audit business to improve 7-10X performance based on 14 billion rows of data," said Wei Zhao, Senior Engineer at Bank of Communications.

"Apache CarbonData is very suitable for our filter query cases, and has averaged 20x improvement on performance," said William Zhu, Architecture team member at DXY. "And, as CarbonData supports data update and delete, this feature is very useful. We would consider CarbonData as our all-in-one solution to unify all analysis data."

CarbonData was first developed at Huawei in 2013. The project was submitted to the Apache Incubator in June 2016, and had its first official release two months later. The project won top honors in the BlackDuck 2016 Open Source Rookies of the Year's Big Data category.

"Apache CarbonData is a great example of the value of the incubation process," said Jean-Baptiste Onofré, Apache CarbonData Incubator Mentor and Project Management Committee member. "Helping grow the CarbonData developer and user communities has increased our visibility, which allowed us to extend our use cases and tests, and gather new ideas. The initial CarbonData committers did (and are still doing) great work to welcome new users and contributors, clearly understanding it's a step forward for the project."

"We will continue to put our efforts towards optimizing data format efficiency for Big Data ecosystem and provide an unified and high performance data storage solution," added Liang. "The Apache CarbonData community welcomes interested contributors to work with us on our journey forward."

Catch Apache CarbonData in action at ApacheCon (16-18 May/Miami), and Spark Summit (5-7 June/San Francisco).

Availability and Oversight
Apache CarbonData software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache CarbonData, visit http://carbondata.apache.org/ , https://twitter.com/ApacheCarbonDat , and https://www.facebook.com/carbondata/

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 620 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "CarbonData", "Apache CarbonData", "Hadoop", "Apache Hadoop", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # # 

The Apache Software Foundation Announces Apache® Mahout™ v0.13.0

Open Source scalable machine learning and data mining library for Big Data artificial intelligence now more powerful and easier to use.

Forest Hill, MD —1 May 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the availability of Apache® MahoutTM v0.13.0, the latest version of the Open Source scalable machine learning library.

Apache Mahout provides an environment for quickly creating machine-learning applications that scale and run on the highest-performance parallel computation engines available. Mahout is the first scalable generalized tensor and linear algebra solving engine taking data scientists from interactive experiments to production use.

"Apache Mahout 0.13.0 is more powerful with its new algorithm framework that allows for easier implementation of machine learning algorithms," said Andrew Palumbo, Vice President of Apache Mahout. "The enhanced Mahout code base and development framework make machine learning even more accessible, which is a game changer in the field of artificial intelligence."

Mahout provides a wide variety of premade algorithms (Matrix Factorization, QR via ALS, SSVD, PCA, etc.) for Scala + Apache Spark, H2O, and Apache Flink, as well as on-GPU compute for performance improvements in very large tensor math. Apache Mahout provides the data science tools to automatically find meaningful patterns in Big Data sets by supporting the following main data science use cases:
  • Collaborative filtering – mines user behavior and makes product recommendations (such as eCommerce product recommenders);
  • Regression – estimates a numerical value based on values of other inputs;
  • Clustering – takes items in a particular class (such as Web pages or newspaper articles) and organizes them into naturally occurring groups, such that items belonging to the same group are similar to each other; and
  • Classifying – learns from existing categorizations and then assigns unclassified items to the best category.

New in v0.13.0
Apache Mahout now makes it easier to do matrix math on graphics cards, which is relevant for most modern machine-learning and deep-learning methods. In addition, v0.13.0 allows shared nothing computation on GPUs, on multi-core CPU, or in the JVM as appropriate, as well as a simplified framework for building new algorithms. As Mahout comprises an interactive environment and library that support generalized scalable linear algebra and include many modern machine-learning algorithms, the project has also collaborated with developers on other projects, including the Open Source linear algebra library ViennaCL, the Java wrapper library interface JavaCPP, and the graphics processor technology manufacturer NVIDIA to add CUDA bindings directly into Mahout for simplicity of development.

The v0.13.0 release reflects 62 separate JIRA issues from v0.12.2, including numerous enhancements to Mahout-Samsara, the vector math experimentation environment with R-like syntax that works at scale. Complete release notes are at http://mahout.apache.org/release-notes/Apache-Mahout-0.13.0-Release-Notes.pdf

Future versions of Mahout will include support for native iterative solvers, a more robust algorithm library, and smarter probing and optimization of multiplications, among other features.

A comprehensive list of users of Apache Mahout is available at https://mahout.apache.org/general/powered-by-mahout.html ; current users are mostly researchers and developers actively involved in building distributed machine-learning pipelines and tools.

"We thank our community of developers and users who helped make this milestone release possible, and welcome new contributors to help us advance machine learning," added Palumbo.

Catch Apache Mahout in action at Apache: Big Data, where attendees learn first-hand from many original project creators and companies from the greater Mahout community. Apache: Big Data will be held 16-18 May 2017 in Miami, FL. To register, and for more information, visit http://apachecon.com/

Availability and Oversight
Apache Mahout software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Mahout, visit http://mahout.apache.org/ and https://twitter.com/ApacheMahout

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 620 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Flink", "Apache Flink", "Mahout", "Apache Mahout", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Thursday January 26, 2017

ApacheCon: Tomorrow's Technology Today.

Since 1999, The Apache Software Foundation (ASF) has been recognized as a leading source for Open Source software and tools that meet the demand for interoperable, adaptable, and sustainable solutions. 

The all-volunteer ASF develops, stewards, and incubates dozens of enterprise-grade Open Source projects that power mission-critical applications in financial services, aerospace, publishing, government, healthcare, research, infrastructure, and more. From Abdera to ZooKeeper, the ASF's reliable, community-driven software continues to grow dramatically across many categories, including Cloud, IoT and Edge Computing, Artificial Intelligence and Deep Learning, Mobile, and Big Data, where the Apache Hadoop ecosystem dominates the marketplace.

Today, many of the ASF's 300+ projects serve as the backbone for some of the world's most visible and widely used applications in Big Data (Cassandra, Hadoop, Spark); Cloud (CouchDB, CloudStack, Mesos); Search and CMS (Derby, Jackrabbit, Lucene/Solr); DevOps and Build Management (Ant, Buildr, Maven); Web Frameworks (Flex, OFBiz, Struts); Servers (HTTP Web Server, Tomcat, Traffic Server); among others.

Apache technologies also command some of the most sought-after career skills. Hiring managers surveyed in the 2016 Open Source Jobs Report state that Open Source recruitment will increase more than any other part of their business over the next six months. Knowing which are the essential skills is simply not enough: keeping current with Apache projects and actively participating in their communities are essential in understanding how to address business needs and how to best apply the tips and tools available.

The ASF's official global conference series, ApacheCon, is one of the most effective ways to quickly gain competitive advantage. Over the past 17 years, ApacheCon has drawn attendees from more than 60 countries to learn about core Open Source technologies directly from Apache project developer and user communities --independent of business interests, corporate biases, or sales pitches. Highlights include:

  • Timely Content --learn first-hand from the largest collection of global Apache project communities through detailed sessions, and standalone tracks such as Apache: Big Data, Flex Project Summit, TomcatCon, Apache: IoT, and CloudStack Collaboration Conference. Breaking industry news? You'll hear it here first.

  • Innovation Insight --presentations from the Apache Incubator (the ASF's hub for Open Source innovations, where a record 64 projects are currently undergoing development) include the latest developments in data science, Cloud, embedded systems, and many other categories, as well as industry-specific areas such as climate, microfinances, and cryptography. Learn what's next.

  • Knowledge Exchange --meet the people behind dozens of Apache projects through ample networking opportunities including BarCampApache, hackathons, BoFs, and corridor discussions. Driving a project in new directions? Starting a new initiative? Ideation sparks here.

  • Education --gain the latest skills with in-depth tutorials, trainings, and workshops with low student-to-instructor ratio. Classes are often led by the original creators and companies behind some of the most popular projects in Open Source.

  • Sponsor Showcase and Expo --engage with some of the commercial products and service providers behind Apache project communities in a friendly, relaxed, non-sales environment. 

Join individual developers, Fortune 500 companies, start ups, educators, consultants, and Open Source enthusiasts at the largest dedicated gathering of the collective Apache community at ApacheCon North America 16-18 May 2016 in Miami, FL.

Important Dates:

 - Call for Participation now open through 11 February 2017 http://apachecon.com/
 - Travel Assistance applications open through 8 March 2017 https://www.apache.org/travel/
 - Conference Schedule published on 9 March 2017
 - Early registration and lodging incentives available through 12 March 2017 http://events.linuxfoundation.org/events/apachecon-north-america/attend/register-

For more information, visit http://apachecon.com/ and https://twitter.com/ApacheCon

# # #

Calendar

Search

Hot Blogs (today's hits)

Tag Cloud

Categories

Feeds

Links

Navigation