Entries tagged [hadoop]

Monday March 09, 2015

The Apache Software Foundation Announces Apache™ Tajo™ v0.10.0

Mature, robust, Open Source relational Big Data warehousing solution provides advanced "SQL-on-Hadoop®" functionality and support.

Forest Hill, MD —9 March 2015— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the availability of Apache™ Tajo™ v0.10.0, the latest version of the advanced Open Source data warehousing system in Apache Hadoop®.

Apache Tajo is used for low-latency and scalable ad-hoc queries, online aggregation, and ETL (extract-transform-load process) on large data sets stored on HDFS (Hadoop Distributed File System) and other data sources. By supporting SQL standards and leveraging advanced database techniques, Tajo allows direct control of distributed execution and data flow across a variety of query evaluation strategies and optimization opportunities.

"Tajo has evolved over the last couple of years into a mature 'SQL-on-Hadoop' engine," said Hyunsik Choi, Vice President of Apache Tajo. "The improved JDBC driver in this release allows users to easily access Tajo as if users use traditional RDBMSs. We have verified new JDBC driver on many commercial BI solutions and various SQL tools. It was easy and works successfully."

Tajo v0.10.0 reflects dozens of new features and improvements, including:

  • Oracle and PostgreSQL catalog store support
  • Direct JSON file support
  • HBase storage integration (allowing users to directly access HBase tables through Tajo)
  • Improved JDBC driver for easier use in JDBC applications (see the sketch below)
  • Improved Amazon S3 support


A complete overview of all new enhancements can be found in the project release notes at https://dist.apache.org/repos/dist/dev/tajo/tajo-0.10.0-rc1/relnotes.html
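
To illustrate the improved JDBC access described above, the following is a minimal sketch of querying Tajo from a plain Java JDBC client. It assumes a running Tajo master with the Tajo JDBC driver on the classpath; the host, port (26002 is Tajo's default client service port), database, and table names are placeholders.

    // Minimal sketch: querying Apache Tajo through its JDBC driver.
    // Host, port, database, and table names below are placeholders.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class TajoJdbcExample {
        public static void main(String[] args) throws Exception {
            // Register the Tajo JDBC driver shipped with the Tajo distribution.
            Class.forName("org.apache.tajo.jdbc.TajoDriver");
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:tajo://tajo-master-host:26002/default");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                     "SELECT name, SUM(score) AS total FROM scores GROUP BY name")) {
                while (rs.next()) {
                    System.out.println(rs.getString("name") + "\t" + rs.getLong("total"));
                }
            }
        }
    }

Because access goes through standard JDBC, the same connection settings can be reused from BI tools and generic SQL clients without any Tajo-specific code.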

Described as "a dark horse in the race for mass adoption" by GigaOM, Tajo is in use at numerous organizations worldwide, including Gruter, Korea University, Melon, NASA JPL Radio Astronomy and Airborne Snow Observatory projects, and SK Telecom for processing Web-scale data sets in real time.

Byeong Hwa Yun, Project Leader at Melon, said "Congratulations on the 0.10.0 release! Melon is the biggest music streaming service company in South Korea. We use Tajo as an ETL tool as well as an analytical processing system. In our experience, Tajo makes our ETL jobs 1.5x-10x faster than Hive does. In addition, the HBase storage integration in this release makes our analytics pipeline simpler. We hope that Tajo will play a large role in the Apache Hadoop ecosystem."

"I'm very happy with that Tajo has rapidly developed in recent years," said Jihoon Son, member of the Apache Tajo Project Management Committee. "One of the most impressive parts is the improved support on Amazon S3. Thanks to the EMR bootstrap, users can exploit Tajo's advanced SQL functionalities on AWS with just a few clicks."

Availability and Oversight
Apache Tajo software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Tajo, visit http://tajo.apache.org/ and https://twitter.com/ApacheTajo

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 500 individual Members and 4,500 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Budget Direct, Cerner, Citrix, Cloudera, Comcast, Facebook, Google, Hortonworks, HP, IBM, InMotion Hosting, iSigma, Matt Mullenweg, Microsoft, Pivotal, Produban, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and follow https://twitter.com/TheASF

# # #

© The Apache Software Foundation. "Apache", "Tajo", "Apache Tajo", "Hadoop", "Apache Hadoop", and the Apache Tajo logo are registered trademarks or trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.

Tuesday February 24, 2015

The Apache Software Foundation Announces Apache™ HBase™ v1.0

Stable version of Open Source, distributed Big Data store for Apache Hadoop features improved performance, ease of use, new availability guarantees, and future release compatibility.

Forest Hill, MD –24 February 2015– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the availability of Apache™ HBase™ v1.0, the distributed, scalable database for Apache™ Hadoop™ and HDFS™.

"Apache HBase v1.0 marks a major milestone in the project's development," said Michael Stack, Vice President of Apache HBase. "It is a monumental moment that the army of contributors who have made this possible should all be proud of. The result is a thing of collaborative beauty that also happens to power key, large-scale Internet platforms."

Dubbed the "Hadoop Database", HBase is used on top of Apache Hadoop and HDFS (Hadoop Distributed File System) for random, real-time read/write access for Big Data (billions of rows X millions of columns) across clusters of commodity hardware. HBase is used by Apple, Facebook, FINRA, Flipboard, Flurry, Pinterest, RocketFuel, Salesforce, Xiaomi, and Yahoo!, among many other organizations. 

Apache HBase has also fostered a healthy ecosystem of projects that run on top of it, such as Apache Phoenix, a SQL layer over HBase, and OpenTSDB, a time series database that uses HBase as its backing store.

"Medium- and high- scale services at hundreds of enterprises and some of the largest Internet companies today are backed by Apache HBase," explained Andrew Purtell, member of the Apache HBase Project Management Committee. "Chances are when using your computer or mobile device you interact with a system built with HBase many times daily without ever knowing it. The HBase 1.0 release appropriately acknowledges a maturity already achieved by the Apache HBase community and software both, and is a great occasion to learn more about HBase, how it can help you solve your scale data challenges, and the growing ecosystem of Open Source and commercial software that chooses HBase as foundation."

Apache HBase v1.0 is the result of 7 years of development, and reflects more than 1,500 changes and upgrades over the previous major release, Apache HBase 0.98.0. Notable new features include:

  • Improved performance without sacrificing stability;
  • Introduction of new APIs and reorganization of select client-side APIs;
  • Read availability using timeline consistent region replicas for new availability guarantees;
  • Online configuration change to enable reloading a subset of the server configuration without restarting the region servers; and
  • New look, enhanced usability, and radically revamped documentation.

(Please see the accompanying technical fact sheet at https://blogs.apache.org/hbase/entry/start_of_a_new_era for details on new functionality). 

Lars Hofhansl, Principal Architect at Salesforce.com, and member of the HBase Project Management Committee, said, "Over 13,000 JIRA issues were filed to get HBase where it is now. Going forward we have a clear compatibility story between major and minor versions."

"This is a very exciting moment for Apache HBase, and goes to show how far we have come as a community in stabilizing and maturing Apache HBase", said Francis Liu, Development Lead for Apache HBase at Yahoo. "HBase is an integral part of our technology stack powering numerous critical offstage processing use cases across our business in online advertising, search, communication, content personalization and targeting, and social, mobile and emerging products. Today, we operate some of the largest HBase clusters across a 3,000 server footprint, and look forward to working with the community with a stable release as a base to scale individual HBase clusters to millions of regions soon." 

"Hearty congratulations to the HBase community," said Ishan Chhabra, Lead of all things HBase at Rocketfuel Inc. "Apache HBase already powers our critical online applications and data pipelines over thousands of machines globally, and the community's relentless focus on stability and performance gives us the confidence to continue making it an integral part of our data stack as we scale to 10,000+ machines."

"Apache HBase is a critical data storage system at Pinterest, where we run it across thousands of nodes doing close to 10 million operations every second," said Raghavendra Prabhu, Head of Infrastructure at Pinterest. "HBase is the underlying technology behind Pinterest's Zen graph storage service, which powers key product features like the home feed, messages, notifications, network news and our interest graph. We are eagerly looking forward to the improvements in availability and reliability in HBase 1.0 and will continue to work with the community on improving it for large scale user facing workloads."

"HBase has been the cornerstone of our customer analytics platform Lily since late 2008", says Steven Noels, CTO of NGDATA. "Granted it was an adventurous choice at that time, but since then HBase has evolved and matured, reconfirming that choice time and time again. Seeing 1.0 (finally) shipping is a sign of confirmed adoption throughout all layers of the industry, from internet companies up to (in our case) large financial institutions, telcos and media companies. We are thankful to be part of such a strong, persistent and vibrant community development endeavour." 

"HBase-1.0.0 is the start of a new era," said Enis Söztutar, HBase v1.0 release manager and member of the Apache HBase Project Management Committee. "We have marked it as the next stable version of Apache HBase, and encourage all new users to start using this version."

"There is no rest for the wickedly talented set of contributors who made HBase 1.0," added Stack. "HBase 2.0 is already taking form in our master branch. Users can look forward to new orders of read/write and node count scaling and this time around they won't have to wait seven years on it shipping; HBase 2.0 will be out later this year."

Availability and Oversight
Apache HBase v1.0 is available immediately as a free download from http://www.apache.org/dyn/closer.cgi/hbase/. As with all Apache products, Apache HBase software is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For documentation and ways to become involved with Apache HBase, visit http://hbase.apache.org/ and @HBase on Twitter.

Get Involved!
The HBase community welcomes contributions and participation through various mailing lists as well as attending face-to-face MeetUps, trainings, and events. Catch Apache HBase in action at HBaseCon, taking place 7 May 2015 in San Francisco http://hbasecon.com/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 500 individual Members and 4,500 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Budget Direct, Cerner, Citrix, Cloudera, Comcast, Facebook, Google, Hortonworks, HP, IBM, InMotion Hosting, iSigma, Matt Mullenweg, Microsoft, Pivotal, Produban, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and follow https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Apache Hadoop", "Hadoop", "Apache HBase", "HBase", "Apache HDFS", "HDFS", "Apache Phoenix", "Phoenix", and "ApacheCon", are trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners. 

Monday January 19, 2015

The Apache Software Foundation Announces Apache™ Falcon™ as a Top-Level Project

Open Source Big Data processing and management solution for Apache Hadoop™ in use at Hortonworks, InMobi, and Talend, among others.

Forest Hill, MD –19 January 2015– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache™ Falcon™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache Falcon is a data processing and management solution for Apache Hadoop™, designed for data motion, coordination of data pipelines, lifecycle management, and data discovery. Falcon provides enterprises higher quality and predictable outcomes for their data by enabling end consumers to quickly onboard their data and its associated processing and management tasks on Hadoop clusters. The platform is successfully deployed across various industries, including advertising, healthcare, mobile applications, software solutions, and technology.

"Apache Falcon solves a very important and critical problem in the big data space. Graduation to TLP marks an important step in progression of the project," said Srikanth Sundarrajan, Vice President of Apache Falcon. "Falcon has a robust road map to ease the pain of application developers and administrators alike in authoring and managing complex data management and processing applications."

"Graduation of Apache Falcon's is a proud moment for the community who came together to solve a very relevant problem of data processing and management in Hadoop ecosystem," said Mohit Saxena, CTO and co-founder InMobi, one of the largest users of Apache Falcon. "I also want to applaud the efforts of contributors, committers and user community who actively pitched in the development of Falcon and it is only because of their conviction and efforts project has graduated. I am hoping promotion of Falcon to TLP will increase the contribution and adoption across the community and help Falcon achieve newer heights." 

Falcon represents a significant step forward in the Hadoop platform by enabling easy data management. Users of the Falcon platform simply define infrastructure endpoints, data sets, and processing rules declaratively. These declarative configurations are expressed in such a way that the dependencies between the configured entities are explicitly described. This information about inter-dependencies between the various entities allows Falcon to orchestrate and manage a range of data management functions.

"Falcon has evolved over the last couple of years into a mature data management solution for Apache Hadoop with many production deployments proving it to be very valuable for users to manage their data and associated processing on Hadoop clusters," said Venkatesh Seetharam, Apache Falcon Project Management Committee member. 

"As Hadoop usage patterns have matured, the highest value implementations are based on the data lake concept. Data lakes require prescriptive and reliable pipelines," explained Greg Pavlik, Vice President of Engineering at Hortonworks. "Apache Falcon represents the best and most mature --and therefore essential-- building block for modeling, managing and operating data lakes."

"Falcon has enabled our team to incrementally build up a complex pipeline comprised of over 90 processes and 200 feeds that would have been very challenging with Apache Oozie alone," said programmer Michael Miklavcic.

"I began to work on Falcon in my spare time for fun, but it quickly became interesting in relation to my job at Talend", said Jean-Baptise Onofré, Vice President of Apache Karaf and Software Architect at Talend. "As Talend DataIntegration provides features like CDC (Change Data Capture), and data notification, we are in the process of integrating Apache Falcon in Talend products." 

"Apache Falcon's graduation is a milestone for the project and a credit to its contributors. Its open, collaborative development has effected a robust community around software essential to the Hadoop ecosystem," said Chris Douglas, Falcon incubation mentor at the ASF. "By becoming a Top-Level Project, the ASF recognizes its demonstrated ability to self-govern. Congratulations to Falcon's users, to its contributors, and particularly to its new Project Management Committee on this achievement."

Availability and Oversight
As with all Apache products, Apache Falcon software is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For documentation and ways to become involved with Apache Falcon, visit http://falcon.apache.org/ and @ApacheFalcon on Twitter

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 500 individual Members and 4,500 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Budget Direct, Cerner, Citrix, Cloudera, Comcast, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, Matt Mullenweg, Microsoft, Pivotal, Produban, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ or follow https://twitter.com/TheASF.

© The Apache Software Foundation. "Apache", "Apache Falcon", "Falcon", "Apache Hadoop", "Hadoop", "Apache Oozie", "Oozie", "ApacheCon", and the Apache Falcon logo are trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.

# # #

Tuesday December 02, 2014

The Apache Software Foundation Announces Apache™ Drill™ as a Top-Level Project

World's first schema-free SQL query engine brings self-service data exploration to Apache Hadoop™

Forest Hill, MD –02 December 2014– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 200 Open Source projects and initiatives, announced today that Apache™ Drill™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles. 

Apache Drill is the world's first schema-free SQL query engine that delivers real-time insights by removing the constraint of building and maintaining schemas before data can be analyzed. Drill users can run interactive ANSI SQL queries on complex or constantly evolving data including JSON, Parquet, and HBase without ever worrying about schema definitions. As a result, Drill not only enables rapid application development on Apache Hadoop™ but also allows enterprise BI analysts to access Hadoop in a self-service fashion. 
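
To make the schema-free model concrete, here is a hypothetical sketch of querying a raw JSON file through Drill's JDBC driver. The connection string assumes a local embedded Drillbit, and the file path and field names are placeholders; the point is that no table definition or schema registration happens before the query runs.

    // Hypothetical sketch: schema-free SQL over a raw JSON file via Drill's JDBC driver.
    // Connection string, file path, and field names are placeholders.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class DrillJsonQuery {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.drill.jdbc.Driver");
            try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=local");
                 Statement stmt = conn.createStatement();
                 // No CREATE TABLE and no schema registration: Drill discovers the
                 // structure of the JSON records at query time.
                 ResultSet rs = stmt.executeQuery(
                     "SELECT t.name, t.address.city FROM dfs.`/tmp/customers.json` t LIMIT 10")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " - " + rs.getString(2));
                }
            }
        }
    }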

"Apache Drill's graduation is a testament to the maturity of the technology and a strong indicator of the active community that develops and supports it," said Jacques Nadeau, Vice President of Apache Drill. "Drill's vibrant community ensures that it will continue to evolve to meet the demands of self-service data exploration use cases." 

While providing faster time to value from data stored in Hadoop, Drill also reduces the burden on IT developers and administrators who prepare and maintain datasets for analysis. Analysts can explore data in real-time, pull in new datasets on the fly, and also use traditional BI tools to visualize the data easily – all by themselves. 

Inspired by Google's Dremel (an academic paper on interactive analysis of Web-scale datasets), and a vision to support modern big data applications, Drill entered the Apache Incubator in August 2012. The project currently has code contributions from individual committers representing MapR, Hortonworks, Pentaho, and Cisco, among others. 

"We see the Apache Top-Level Project status as a major milestone for Drill. With a growing user base and diverse community interest, we are excited that Drill will indeed be a game changer for Hadoop application developers and BI analysts alike," said Tomer Shiran, member of the Apache Drill Project Management Committee. 

Availability and Oversight
As with all Apache products, Apache Drill software is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For documentation and ways to become involved with Apache Drill, visit http://drill.apache.org and https://twitter.com/ApacheDrill

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than two hundred leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 500 individual Members and 4,500 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Budget Direct, Citrix, Cloudera, Comcast, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, Matt Mullenweg, Microsoft, Pivotal, Produban, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ or follow @TheASF on Twitter. 

© The Apache Software Foundation. "Apache", "Apache Drill", "Drill", "Hadoop", "Apache Hadoop", "ApacheCon," and the Apache Drill logo are trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners. 

# # # 

Monday September 29, 2014

The Apache Software Foundation Announces Apache™ Storm™ as a Top-Level Project

Easy-to-integrate distributed Open Source real-time computation framework adds reliable data processing capabilities to Apache Hadoop

Forest Hill, MD –29 September 2014– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 200 Open Source projects and initiatives, announced today that Apache™ Storm™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles. 

"Apache Storm's graduation is not only an indication of its maturity as a technology, but also of the robust, active community that develops and supports it," said P. Taylor Goetz, Vice President of Apache Storm. "Storm’s vibrant community ensures that Storm will continue to evolve to meet the demands of real-time stream processing and computation use cases."

Apache Storm is a high-performance, easy-to-implement distributed real-time computation framework for processing fast, large streams of data, adding reliable data processing capabilities to Apache Hadoop. Using Storm, a Hadoop cluster can efficiently process a full range of workloads, from real-time to interactive to batch. 

Storm was originally developed at BackType prior to being acquired by Twitter, and entered the Apache Incubator in September 2013. The project currently has code contributions from individual committers representing Hortonworks, Twitter, Verisign, and Yahoo, among others.

"Becoming a top level project is huge for Storm and a testament to how active and diverse our user and developer communities are. Four years ago Storm was nothing more than an idea and it's been incredible to watch its growth from being open-sourced through joining the Apache incubator and now through graduation," said Nathan Marz, original creator of Storm. 

"Today's announcement marks a major milestone in the continued evolution of Storm since Yahoo initiated the proposal to move it to Apache in 2012. We are proud of our continued contributions to Storm that have led to the hardening of security, multi-tenancy support, and increased scalability. Today, Apache Storm is widely adopted at Yahoo for real-time data processing needs including content personalization, advertising, and mobile development. It's thrilling to see the Hadoop ecosystem and community expand with the continued adoption of Storm," said Andrew Feng, Distinguished Architect at Yahoo.

"The Storm community has come together, has built some fantastic software and has now graduated to top-level.  This process has been a great example of open source community building at its best," said Ted Dunning, Apache Storm Incubator Mentor.

Storm is ideal for real-time data processing workloads, and is used to define information sources and manipulations that allow batch, distributed processing of streaming data. Benchmarked as processing one million 100-byte messages per second per node, Storm is fault-tolerant, scalable across clusters of machines, and easy to operate. Developers can write Storm topologies using any programming language, and production-suitable configurations are achievable within a day. In addition, Storm easily integrates with database systems, handling parallelization, partitioning, and retrying on failures where necessary.
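
As a rough illustration of what a topology looks like in code, the sketch below wires a toy spout to a toy bolt with a shuffle grouping and submits the result to an in-process LocalCluster using Storm's Java API. Both components are written for this example only, and note that Storm releases contemporary with this announcement used the backtype.storm package prefix rather than org.apache.storm.

    // Minimal Storm topology sketch: a toy spout feeding a sentence-splitting bolt.
    // (Storm 0.9.x-era releases use the backtype.storm package prefix instead.)
    import java.util.Map;
    import org.apache.storm.Config;
    import org.apache.storm.LocalCluster;
    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    public class SentenceSplitTopology {
        // Toy spout that endlessly emits the same sentence.
        public static class SentenceSpout extends BaseRichSpout {
            private SpoutOutputCollector collector;
            public void open(Map conf, TopologyContext ctx, SpoutOutputCollector c) { collector = c; }
            public void nextTuple() { collector.emit(new Values("the quick brown fox")); }
            public void declareOutputFields(OutputFieldsDeclarer d) { d.declare(new Fields("sentence")); }
        }

        // Toy bolt that splits each sentence into words.
        public static class SplitBolt extends BaseBasicBolt {
            public void execute(Tuple tuple, BasicOutputCollector collector) {
                for (String word : tuple.getString(0).split(" ")) collector.emit(new Values(word));
            }
            public void declareOutputFields(OutputFieldsDeclarer d) { d.declare(new Fields("word")); }
        }

        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("sentences", new SentenceSpout(), 1);
            builder.setBolt("split", new SplitBolt(), 2).shuffleGrouping("sentences"); // Storm handles partitioning
            new LocalCluster().submitTopology("sentence-split", new Config(), builder.createTopology());
        }
    }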

"Graduation to a top level project gives users the confidence that they can adopt Apache Storm knowing that it's backed by a robust, sustainable developer community and the governance framework and processes of the ASF," added Goetz. "As a Chair of the Project Management Committee for Storm, I focus much of my energy encouraging developers to contribute code and get involved in the Storm community. We encourage this collaboration because it is the lifeblood of rapid, reliable innovation."

Availability and Oversight
As with all Apache products, Apache Storm software is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For documentation and ways to become involved with Apache Storm, visit http://storm.apache.org/ and @Apache_Storm on Twitter

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than two hundred leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 450 individual Members and 4,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Budget Direct, Citrix, Cloudera, Comcast, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, Matt Mullenweg, Microsoft, Pivotal, Produban, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ or follow @TheASF on Twitter.

© The Apache Software Foundation. "Apache", "Apache Storm", "Storm", "ApacheCon", and the Apache Cayenne logo are trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.

# # #

Tuesday July 22, 2014

The Apache Software Foundation Announces Apache™ Tez™ as a Top-Level Project

Highly-efficient Open Source framework for Apache Hadoop® YARN-powered data processing applications in use at Microsoft, NASA, Netflix, and Yahoo, among others. 

Forest Hill, MD –22 July 2014– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 170 Open Source projects and initiatives, announced today that Apache™ Tez™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles. 

"Graduation to a top-level Apache project is a significant validation of the community momentum behind Tez," said Hitesh Shah, Vice President of Apache Tez. 
Apache Tez is an embeddable and extensible framework for building high-performance batch and interactive data processing engines and tools that require out-of-the-box integration with Apache Hadoop® YARN. Tez leverages Hadoop’s unparalleled ability to process petabyte-scale datasets, allowing projects in the Apache Hadoop ecosystem (such as Apache Hive and Apache Pig) and third-party software vendors to express fit-to-purpose data processing logic in a way that meets their unique demands for fast response times and extreme throughput. 

Tez's customizable execution architecture enables scalable, purpose-built data-processing computations, and also allows for dynamic performance optimizations based on real information about the data and the resources required to process it. 

Tez was originally developed by Hortonworks, and entered the Apache Incubator in February 2013. The project currently has code contributions from individuals representing Cloudera, Facebook, Hortonworks, LinkedIn, Microsoft, Twitter, and Yahoo. 

"I'm really happy to see the graduation of Apache Tez from the Incubator. The community has worked diligently to get to this point," said Chris Mattmann, Apache Tez Incubator Mentor, and Chief Architect, Instrument and Science Data Systems Section at NASA JPL. "Tez makes queries on Hadoop databases like Hive interactive, instead of batch oriented. Tez is similar to recently graduated projects in the Apache Big Data ecosystem including Apache Spark and also Apache Tajo, projects with similar goals of speeding up queries in Hadoop. My data science team at NASA is looking at Tez, Spark, and Tajo and evaluating them on projects in climate science and in radio astronomy." 

"Netflix builds its big data analytics platform in the cloud by leveraging open source technologies such as Apache Hadoop, Hive, Pig and more," said Cheolsoo Park, Senior Software Engineer at Netflix and Vice President of Apache Pig. "While MapReduce has served us well for years, Tez is a welcome improvement. Netflix has made significant contributions to the development of Pig-on-Tez alongside with Hortonworks, LinkedIn, and Yahoo. Based on our initial benchmark of Pig-on-Tez, it is nearly twice as fast as MapReduce for some of our heavy production jobs. This is a huge improvement in efficiency. We look forward to deploying Pig-on-Tez in production this year. We thank the Tez community for all your help and are excited that Tez has become an Apache top-level project." 

"Yahoo's business is built on Hadoop; it's essential to our ability to deliver personalized, delightful experiences for our users and create value for our advertisers," said Peter Cnudde, Vice President of Engineering, Yahoo. "We're committed to working closely with the Apache community to evolve the processing of Big Data at scale with technologies such as Apache Hive, Tez, and YARN." 

"It's fantastic to see Tez promoted to a top-level Apache project. Microsoft has invested in improving Hive performance by bringing innovation used in SQL Server to Hadoop, through contributions to Tez," said Eric Hanson, Principal Engineer in the HDInsight team at Microsoft and an Apache Hive Committer. "Hive on Tez enables major performance improvements of up to 100x, and we're happy it's available now on Microsoft Azure HDInsight, our Hadoop-based solution for the cloud." 

"Tez is on its way to becoming a cornerstone of core Apache projects like Apache Hive and Apache Pig and has been embraced by other important Open Source projects like Cascading. We look forward to continuing to grow our community and driving Tez adoption," added Shah. 

Availability and Oversight
As with all Apache products, Apache Tez software is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project’s day-to-day operations, including community development and product releases. For documentation and ways to become involved with Apache Tez, visit http://tez.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than one hundred and seventy leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 400 individual Members and 3,500 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Budget Direct, Citrix, Cloudera, Comcast, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, Matt Mullenweg, Microsoft, Pivotal, Produban, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ or follow @TheASF on Twitter

"Apache", "Hadoop", "Apache Hadoop", "Hive", "Apache Hive", "MapReduce", "Hadoop MapReduce", "Pig", "Apache Pig", "Tez", "Apache Tez", and "ApacheCon" are trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners. 

# # #

Friday May 30, 2014

The Apache Software Foundation Announces Apache™ Spark™ v1.0

Open Source large-scale, flexible, "Hadoop Swiss Army Knife" cluster computing framework offers enhanced data analysis and richer integration with other Apache projects

Forest Hill, MD –30 May 2014– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 170 Open Source projects and initiatives, announced today the availability of Apache Spark v1.0, the super-fast, Open Source large-scale data processing and advanced analytics engine.

Apache Spark has been dubbed a "Hadoop Swiss Army knife" for its remarkable speed and ease of use, allowing developers to quickly write applications in Java, Scala, or Python, using its built-in set of over 80 high-level operators. With Spark, programs can run up to 100x faster than Apache Hadoop MapReduce in memory.
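
As a small, hedged illustration of those high-level operators, the sketch below is a word count written against the Spark 1.x Java API (where flatMap returns an Iterable); the input and output paths and the local master URL are placeholders.

    // Minimal Spark word-count sketch using the 1.x-era Java API.
    // Input/output paths and the local master setting are placeholders.
    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class SparkWordCount {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("word-count").setMaster("local[2]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            JavaRDD<String> lines = sc.textFile("hdfs:///tmp/input.txt");       // placeholder input
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split(" ")))            // 1.x flatMap returns an Iterable
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey((a, b) -> a + b);                              // aggregation runs in memory

            counts.saveAsTextFile("hdfs:///tmp/word-counts");                   // placeholder output
            sc.stop();
        }
    }

The same pipeline can be expressed in Scala or Python with the equivalent operators, which is part of what the "Swiss Army knife" description refers to.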

"1.0 is a huge milestone for the fast-growing Spark community. Every contributor and user who's helped bring Spark to this point should feel proud of this release," said Matei Zaharia, Vice President of Apache Spark.

Apache Spark is well-suited for machine learning, interactive queries, and stream processing. It is 100% compatible with the Hadoop Distributed File System (HDFS), HBase, Cassandra, and any other Hadoop storage system, making existing data immediately usable in Spark. In addition, Spark supports SQL queries, streaming data, and complex analytics such as machine learning and graph algorithms out of the box.

New in v1.0, Apache Spark offers strong API stability guarantees (backward-compatibility throughout the 1.X series), a new Spark SQL component for accessing structured data, as well as richer integration with other Apache projects (Hadoop YARN, Hive, and Mesos).

Patrick Wendell, software engineer at Databricks and Apache Spark 1.0 release manager, explained, "In addition to providing long-term stability for Spark's core APIs, this release contains several new features. Spark 1.0 adds a unified submission tool for deploying applications on a local machine, Mesos, YARN, or a dedicated cluster. We've added a new module, Spark SQL, to provide schema-aware data modeling and SQL language support in Spark. Spark's machine learning library, MLlib, has been enhanced with several new algorithms. Spark's streaming and graph libraries have also seen major updates. Across the board, we've focused on building tools to empower the data scientists, statisticians and engineers who must grapple with large data sets every day."

Spark was originally developed at the UC Berkeley AMPLab, and its ease of use has made it a go-to solution for both small and large enterprise environments across a wide range of industries, including Alibaba, ClearStory Data, Cloudera, Databricks, IBM, Intel, MapR, Ooyala, and Yahoo, among others. Not only are organizations rapidly adopting and deploying Apache Spark, many contributors are committing code to the project as well.

"Apache Spark is an important big data technology in delivering a high performance analytics solution for the IT industry and satisfying the fast-growing customer demand," said Michael Greene, Vice President and General Manager of System Technologies and Optimization at Intel. "Intel is proud to participate in its development and we congratulate the community on this release."

"At NASA, we're really excited to leverage Spark and its highly interactive analytic capabilities and the speedups offered by 1.0 along with Spark SQL are going to help out critical projects looking at measurement of Snow in the Western US and also on projects related to Regional Climate Modeling and in Model Evaluation for the U.S. National Climate Assessment related Activities," said Chris Mattmann, an ASF Director, Chief Architect, Instrument and Science Data Systems Section at NASA JPL, and Adjunct Associate Professor at the University of Southern California. "I'm looking forward to designing Spark-related projects in my Software Architectures and in my Search Engines courses at USC as well. The community is one of our most active at the ASF and the interest has really peaked and these guys are doing a great job."

"We're continuing to see very fast growth — 102 individuals have contributed patches to this release over the past four months, which is our highest number of contributors ever," added Zaharia.

Availability and Oversight
As with all Apache products, Apache Spark software is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project’s day-to-day operations, including community development and product releases. For documentation and ways to become involved with Apache Spark, visit http://spark.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than one hundred and seventy leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 400 individual Members and 3,500 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Budget Direct, Citrix, Cloudera, Comcast, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, Matt Mullenweg, Microsoft, Pivotal, Produban, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ or follow @TheASF on Twitter.

"Apache", "Spark", "Apache Spark", and "ApacheCon" are trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.

 # # #

Wednesday October 16, 2013

The Apache Software Foundation Announces Apache™ Hadoop™ 2

Foundation of next-generation Open Source Big Data Cloud computing platform runs multiple applications simultaneously to enable users to quickly and efficiently leverage data in multiple ways at supercomputing speed.

Forest Hill, MD –16 October 2013– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of nearly 150 Open Source projects and initiatives, today announced Apache™ Hadoop™ 2, the latest version of the Open Source software framework for reliable, scalable, distributed computing.

A foundation of Cloud computing and at the epicenter of "big data" solutions, Apache Hadoop enables data-intensive distributed applications to work with thousands of nodes and exabytes of data. Hadoop enables organizations to more efficiently and cost-effectively store, process, manage and analyze the growing volumes of data being created and collected every day. Apache Hadoop connects thousands of servers to process and analyze data at supercomputing speed.

The project's latest release marks a major milestone more than four years in the making, and has achieved the level of stability and enterprise-readiness to earn the General Availability designation. 

"With the release of stable Hadoop 2, the community celebrates not only an iteration of the software, but an inflection point in the project's development. We believe this platform is capable of supporting new applications and research in large-scale, commodity computing," said Apache Hadoop Vice President Chris Douglas. "The Apache Software Foundation creates the conditions for innovative, community-driven technology like Hadoop to evolve. When that process converges, the result is inspiring."

"Hadoop 2 marks a major evolution of the open source project that has been built collectively by passionate and dedicated developers and committers in the Apache community who are committed to bringing greater usability and stability to the data platform," said Arun C. Murthy, release manager of Apache Hadoop 2 and Founder of Hortonworks Inc. "It has been an honor and pleasure to work with the community and a personal thrill to see our four years of work on YARN finally coming to fruition in the GA of Hadoop 2.  Hadoop is truly becoming a cornerstone of the modern data architecture by enabling organizations to leverage the value of all their data, including capturing net-new data types, to drive innovative new services and applications."

"What started out a few years ago as a scalable batch processing system for Java programmers has now emerged as the kernel of the operating system for big data," said original Hadoop creator and ASF Board member Doug Cutting. "Over a dozen Apache projects integrate with Hadoop, with ten more in the Apache Incubator poised to soon join their ranks."

Dubbed a "Swiss army knife of the 21st century" and named "Innovation of the Year" by the 2011 Media Guardian Innovation Awards, Apache Hadoop is widely deployed at enterprise organizations around the globe, including industry leaders from across the Internet and social networking landscape such as Amazon Web Services, AOL, Apple, eBay, Facebook, foursquare, HP, LinkedIn, Netflix, The New York Times, Rackspace, and Twitter. Other technology leaders such as Microsoft, IBM, Teradata, SAP have integrated Apache Hadoop into their offerings. Yahoo!, an early pioneer, hosts the world’s largest known Hadoop production environment to date, spanning more than 35,000 nodes.

Under the Hood
Apache Hadoop 2 reflects intensive community development, production experience, extensive testing, and feedback from hundreds of knowledgeable users, data scientists and systems engineers, bringing a highly stable, enterprise-ready release of the fastest-growing big data platform.

New in Hadoop 2 is the addition of YARN, which sits on top of HDFS and serves as a large-scale, distributed operating system for big data applications, enabling multiple applications to run simultaneously for more efficient support of data throughout its entire lifecycle. The culmination of many interim releases, the most current release --2.2.0-- is the first stable release in the 2.x line. Features include support for:
  • Apache Hadoop YARN, a cornerstone of next generation Apache Hadoop, for running both data-processing applications (e.g. Apache Hadoop MapReduce, Apache Storm etc.) and services (e.g. Apache HBase)
  • High Availability for Apache Hadoop HDFS
  • Federation for Apache Hadoop HDFS for significant scale compared to Apache Hadoop 1.x.
  • Binary Compatibility for existing Apache Hadoop MapReduce applications built for Apache Hadoop 1.x. 
  • Support for Microsoft Windows. 
  • Snapshots for data in Apache Hadoop HDFS (see the sketch after this list).
  • NFS-v3 Access for Apache Hadoop HDFS. 
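
As a brief, hedged illustration of the snapshot feature listed above, the sketch below drives an HDFS snapshot programmatically through the Hadoop 2 FileSystem API. The directory path and snapshot name are placeholders, and allowing snapshots on a directory is normally an administrator step.

    // Minimal sketch: creating an HDFS snapshot with the Hadoop 2 FileSystem API.
    // The directory path and snapshot name are placeholders.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class HdfsSnapshotExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
            Path dir = new Path("/data/events");        // placeholder directory

            FileSystem fs = FileSystem.get(conf);
            if (fs instanceof DistributedFileSystem) {
                // Snapshots must first be allowed on the directory (admin operation).
                ((DistributedFileSystem) fs).allowSnapshot(dir);
            }
            // Creates a read-only, point-in-time view under /data/events/.snapshot/before-reload
            fs.createSnapshot(dir, "before-reload");
        }
    }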

"The community has stepped up to the challenge of making Hadoop enterprise-ready, hardening the filesystem, providing high availability, adding critical security capabilities,and delivering integrations to enable consolidation of any kind or amount of enterprise data," said Aaron Myers, member of the Apache Hadoop Project Management Committee and Engineer at Cloudera.

"Today, with the announcement of Hadoop 2 and YARN, we've taken another step. Beyond the basic multitenancy customers have enjoyed for the past year, enabling them to mix batch, interactive and real-time workloads, they now have the ability to do so from within a stable foundational part of the Hadoop ecosystem. It's a testament to the community's work that now every distribution of Apache Hadoop will enjoy these benefits, ensuring that customers can deliver the applications they need, on a single Hadoop platform."

"It has been an honor and pleasure to work with the community and a personal thrill to see our four years of work on YARN finally coming to fruition in the GA of Hadoop 2," added Murthy. "Apache Hadoop is truly becoming a cornerstone of the modern data architecture by enabling organizations to leverage the value of all their data, including capturing net-new data types, to drive innovative new services and applications."

"A large portion of the credit for this success is due to Apache's open-source model, which has permitted a wide range of users and vendors to productively collaborate on a platform shared by all," added Cutting.

Availability and Oversight
As with all Apache products, Apache Hadoop software is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. Apache Hadoop release notes, source code, documentation, and related resources are available at http://hadoop.apache.org/.

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees nearly one hundred fifty leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 400 individual Members and 3,500 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including AMD, Basis Technology, Budget Direct, Citrix, Cloudera, Comcast, Facebook, Go Daddy, Google, HP, Hortonworks, Huawei, IBM, InMotion Hosting, Matt Mullenweg, Microsoft, PSW Group, Pivotal, WANdisco, and Yahoo!. For more information, visit http://www.apache.org/ or follow @TheASF on Twitter.

"Apache", "Apache Hadoop", "Hadoop", and "ApacheCon" are trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday July 24, 2013

The Apache Software Foundation Announces Apache™ Mesos™ as a Top-Level Project

Powerful Open Source cluster manager in use at Airbnb, Twitter, and the University of California, among others, for dynamic resource sharing across cluster applications

24 July 2013 --Forest Hill, MD-- The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of nearly 150 Open Source projects and initiatives, announced today that Apache Mesos has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run multiple frameworks, including Apache Hadoop, MPI, Hypertable, Jenkins, Storm, and Spark, as well as other applications and custom frameworks.

"It was our goal all along to see Mesos become a kernel of the infrastructure stack of the future," said Benjamin Hindman, Vice President of Apache Mesos. "The project’s graduation from the Apache Incubator is recognition that the software is mature and has brought together a diverse community to sustain it in the future."

Initially created at the University of California at Berkeley's AMPLab (the research center also responsible for the original development of Apache Spark) to manage resource sharing and isolation in data centers, Mesos acts as a layer of abstraction between applications and pools of servers. Mesos avoids the need to create separate clusters to run individual frameworks, instead making it possible to optimize how jobs are executed across shared machines.

Whilst in the Apache Incubator, Mesos had four releases and established an Open Source community according to The Apache Way of governance. Additional improvements to the project include its flexibility to support several application framework languages, and scalability that has been production-tested to thousands of nodes and simulated to tens of thousands of nodes and hundreds of frameworks.

Apache Mesos has proven to be reliable for use in production, and has already been adopted by several organizations for cluster management.

"Mesos is the cornerstone of our elastic compute infrastructure," explained Chris Fry, Senior Vice President of Engineering at Twitter. "It's how we build all our new services and is critical for Twitter's continued success at scale … one of the primary keys to our data infrastructure efficiency."

"We're using Mesos to manage cluster resources for most of our data infrastructure," said Brenden Matthews, Engineer at Airbnb and Apache Mesos Committer. "We run Chronos, Storm, and Hadoop on top of Mesos in order to process petabytes of data." (Chronos is an Airbnb-developed Mesos framework as a replacement for cron, and an example of how custom frameworks can be developed on Mesos to leverage its resource sharing).

"Community support for Apache Mesos is encouraging, particularly as more companies assess how they manage their clusters and look for more efficiency," added Hindman. "Now that we've graduated, we look forward to continuing to grow the number of Mesos adopters and fostering an ecosystem around the project."

Availability and Oversight
As with all Apache products, Apache Mesos software is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. Apache Mesos release notes, source code, documentation, and related resources are available at http://mesos.apache.org.

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees nearly one hundred fifty leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 400 individual Members and 3,500 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) not-for-profit charity, funded by individual donations and corporate sponsors including AMD, Basis Technology, Citrix, Cloudera, Facebook, Go Daddy, Google, HP, Hortonworks, Huawei, IBM, InMotion Hosting, Matt Mullenweg, Microsoft, PSW Group, VMware, WANdisco, and Yahoo!. For more information, visit http://www.apache.org/ or follow @TheASF on Twitter.

"Apache", "Apache Mesos", "Mesos", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday May 16, 2012

The Apache Software Foundation Announces Unprecedented Growth During First Quarter of 2012

Demand for best-in-class Open Source solutions drives landmark achievements

Forest Hill, MD –16 May 2012– The Apache Software Foundation (ASF), developers, stewards, and incubators of nearly 150 Open Source projects and initiatives, today announced key milestones achieved in the first quarter of 2012.

Recognized as one of the most compelling communities in Open Source for shepherding, developing, and incubating innovations "The Apache Way", the ASF is responsible for millions of lines of code overseen by an all-volunteer community across six continents. Apache technologies power more than half the Internet, petabytes of data, teraflops of operations, billions of objects, and enhance the lives of countless users and developers.

The record-setting first quarter marked new highs across an array of Apache initiatives, including Top-Level Projects, incubating innovations, sponsorship, individual and corporate contributions, and infrastructure. This unprecedented growth reinforces the broad-reaching success of the ASF's best-in-class software products, the power of the Apache brand, and its highly-emulated community development practices.

"Our landmark success can be attributed to Apache’s longstanding commitment to providing exceptional Open Source products, each with a stable codebase and an active community," said ASF President Jim Jagielski. "The ASF makes it easy for all contributors, regardless of any affiliations, to collaborate."

Top-Level Projects: the ASF's core activities [1] involve the development of its Top-Level Projects (TLPs), whose day-to-day activities are overseen by a self-selected team of active contributors to each project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. As of 2002, the process for establishing new TLPs has been through the Apache Incubator. On occasion, a sub-project of an existing TLP may graduate to become a new, standalone TLP.

New TLPs graduating from the Apache Incubator in Q1 2012 are Apache Accumulo, Apache BVal, Apache Empire-db, Apache Gora, Apache Lucy, Apache OpenNLP, Apache Rave, and Apache Sqoop. This brings the total of TLPs to 104, marking the first time more than 100 TLPs are in active development (the ASF has had 121 TLPs in total; 20 have been retired to the Apache Attic).

Apache projects range from Cloud computing and "Big Data" to Search and Semantics to application frameworks and build tools, meeting the strong demand for interoperable, adaptable, ubiquitous, and sustainable Open Source solutions. There have been 87 new TLP software releases since January 2012, with milestone releases from Apache Cassandra, Apache Hadoop, Apache HTTP Server, and Apache TomEE.

After six years in development, Big Data powerhouse Apache Hadoop released v1.0 in January 2012, bolstering its popularity "as measured by substantial growth in client inquiries, dramatic rises in attendance at industry events, increasing financial investments, and the introduction of products from leading data management and data integration software vendors," according to Gartner Vice President Merv Adrian [2]. In addition, IDC’s Worldwide Hadoop-MapReduce Ecosystem Software Forecast [3] predicts market growth for Apache Hadoop and supporting Big Data products will exceed 60% annually.

The ASF's flagship project, the Apache HTTP Server, remains the world’s leading Web server, powering an all-time record of more than 425 million websites globally [4], with more than 500 community-developed modules available to extend its functionality. In addition, the Apache HTTP Server celebrated its 17th Anniversary with the release of v2.4 in February 2012.

Apache Incubator: Open Source innovations intending to become fully-fledged Apache projects, including code donations from external organizations and existing external projects, must enter through the Apache Incubator [5]. Initiatives in development at the Apache Incubator --known as "podlings"-- comprise both the project’s codebase and community. There have been 19 new software releases from Apache Incubator podlings since January 2012.

A record 51 podlings are currently undergoing incubation, including Apache Bloodhound, Apache Cordova (formerly Callback), Apache Flex, Apache Giraph, and Apache Wave. Apache OpenOffice --the leading Open Source office productivity suite, and ASF’s first end-user-facing project-- successfully transitioned nearly 10 million lines of code in preparation for the release of OpenOffice v3.4, the first official Apache release under the auspices of The ASF. Apache OpenOffice v3.4 is now fully compliant under the Apache License v2, and was downloaded over 1 million times in its first week.

Over the past decade, 85 podlings have graduated from the Apache Incubator; 3 projects were retired, and 27 are considered dormant. New podlings in 2012 are Apache CloudStack, Apache DeviceMap, and Apache Syncope.

Sponsors: as a private, 501(c)(3) non-profit charitable organization, The ASF is funded through tax-deductible contributions from corporations, foundations, and private individuals [6]. For the past six years, The ASF Sponsorship Program [7] has helped offset day-to-day operating expenses such as bandwidth and connectivity, servers and hardware, legal and accounting services, marketing and public relations, general office expenditures, and support staff. New sponsors in 2012 include Citrix (Platinum), GoDaddy (Silver), Huawei (Silver), and Twitter (Bronze). They join Facebook, Google, Microsoft, and Yahoo! at the Platinum level; AMD, Hortonworks, HP, and IBM at the Gold level; Basis Technology, Cloudera, Matt Mullenweg, PSW Group, and SpringSource at the Silver level; and AirPlus International, BlueNog, Digital Primates, FuseSource, Intuit, Liip AG, Lucid Imagination, Talend, Two Sigma Investments, and WANdisco at the Bronze level.

Apache Members and Committers: in 1999, the ASF incorporated with an inaugural membership of 21 individuals who oversaw the progress of the Apache HTTP Server. This core group grew as developers contributed code, patches, or documentation. Some of these contributors were subsequently granted "Committer" status by the Membership [8], giving them the right to commit (write) directly to the code repository, vote on community-related decisions, and propose an active user for Committership. Those Committers [9] who demonstrate merit in the Foundation's growth, evolution, and progress are nominated for ASF Membership by existing Members. The meritocratic "Contributor-Committer-Member" approach is the central governing process [10] across the Apache ecosystem.

The ASF's 391 active Members (449 total elected; 55 emeritus, 3 deceased) will be voting in new ASF Members at the upcoming annual Members meeting. The Committership process is ongoing, with no formal nomination or election timeframe. There are currently 2,967 active Apache Committers.

Contributions: for a project to become hosted at Apache, it must be licensed to the ASF [11] via a grant or contributor agreement so that the ASF gains the necessary intellectual property rights for the development and distribution of its projects. Whilst all contributors of ideas, code, or documentation to the ASF must sign a Contributor License Agreement (CLA), copyright remains with the contributor --a license for reuse is given to the Foundation.

- Individuals --a signed Individual CLA (ICLA) is required before an individual is given commit rights to an ASF project to clearly define the terms under which intellectual property has been contributed, and allow the project to be defended should there be a legal dispute regarding its software. 251 Individual CLAs have been signed in 2012 thus far; a total of 4,651 have been signed overall.

- Corporations/Institutions --organizations whose employees are assigned to work on Apache projects as part of an employment agreement may sign a Corporate CLA (CCLA) to contribute intellectual property via the corporation. In addition, every developer must also sign their own ICLA to cover any of their contributions as an individual that are not owned by the corporation signing the CCLA. Companies that have signed CCLAs include Cloudera, Facebook, Hortonworks, kippdata GmbH, LinkedIn, and SoundCloud. In 2012 thus far, 17 new Corporate CLAs have been signed, totaling 384 overall.

Software Grants: individuals or corporations that donate a body of existing software or documentation to an Apache project need to execute a formal Software Grant Agreement (SGA) with the ASF. Typically, this is done after negotiating approval with the ASF Incubator or one of the PMCs, as the ASF will not accept software without a viable community to support a collaborative project. 9 SGAs have been signed during Q1 2012, with 337 SGAs on file.

Community Relations: in addition to Apache Members and Committers, countless developers and users contribute to the growth of Apache-based activities across the Open Source landscape. Two of the ASF's outreach-oriented committees include Community Development [12], which handles initiatives such as Google Summer of Code (17 TLPs and 9 Podlings are currently mentoring 41 student projects); and Conference Planning [13], which oversees Apache-themed BarCamps and MeetUps, and ApacheCon, the ASF's official conference, trainings, and expo. 2012 marks the return of ApacheCon Europe, taking place late Fall in Germany.

Infrastructure: a distributed team on two continents, comprising 10 rotating volunteers and 4 paid staff, keeps the ASF infrastructure [14] of roughly two dozen servers and more than 75 distinct hosts --accessed by millions of people across the globe-- running 24x7x365. The Apache Infrastructure team has released 99.96GB of artifacts so far this year.

"There's no stopping the interest in Apache-led projects --from the number of innovations in the Incubator, to best-in-breed solutions powering mission-critical applications, to the widespread popularity of the Apache License," added Jagielski. "The Apache community at-large is driving this momentum by providing code, documentation, bug reports, design feedback, testing, evangelizing, mentoring, and more. There’s always a way to contribute!"

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees nearly one hundred fifty leading Open Source projects, including Apache HTTP Server — the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 400 individual Members and 3,500 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) not-for-profit charity, funded by individual donations and corporate sponsors including AMD, Basis Technology, Citrix, Cloudera, Facebook, GoDaddy, Google, IBM, HP, Hortonworks, Huawei, Matt Mullenweg, Microsoft, PSW Group, SpringSource, and Yahoo!. For more information, visit http://www.apache.org/.

"Apache", "Apache ACE", "Apache Bloodhound", "Apache CloudStack", "Apache Cordova", "Apache Deltacloud", "Apache DeviceMap", "Apache Flex", "Apache Giraph", "Apache Hadoop", "Apache OpenOffice", "Apache Rave", "Apache Sqoop", "Apache Syncope", "Apache TomEE", "Apache Wave", and "ApacheCon" are trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.

Resources:
[1] Foundation Overview - http://www.apache.org/foundation/
[2] Apache Hadoop v1.0 press release - http://blogs.apache.org/foundation/date/20120104
[3] IDC Worldwide Hadoop-MapReduce Ecosystem Software Forecast - http://www.idg.com/www/pr.nsf/ByID/MBEN-8U6JAG
[4] Netcraft May 2012 Web Server Survey - http://news.netcraft.com/archives/2012/05/02/may-2012-web-server-survey.html
[5] Apache Incubator - http://incubator.apache.org/
[6] List of ASF Sponsors - http://www.apache.org/foundation/thanks.html
[7] ASF Sponsorship Program - http://apache.org/foundation/sponsorship.html
[8] ASF Members - http://apache.org/foundation/members.html
[9] ASF Committer Index - http://people.apache.org/committer-index.html
[10] How the ASF Works - http://www.apache.org/foundation/how-it-works.html
[11] Apache Licenses, CLAs, and Software Grants - http://www.apache.org/licenses/
[12] Apache Community Development - http://community.apache.org/
[13] Apache Conferences Committee - http://www.apache.org/foundation/conferences.html
[14] Apache Infrastructure - http://www.apache.org/dev/infrastructure.html

# # #

Monday April 02, 2012

The Apache Software Foundation Announces Apache Sqoop as a Top-Level Project

Open Source big data tool used for efficient bulk transfer between Apache Hadoop and structured datastores.

Forest Hill, MD --The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of nearly 150 Open Source projects and initiatives, today announced that Apache Sqoop has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the Project’s community and products have been well-governed under the ASF's meritocratic process and principles.

Designed to efficiently transfer bulk data between Apache Hadoop and structured datastores such as relational databases, Apache Sqoop allows data to be imported from external datastores and enterprise data warehouses into the Hadoop Distributed File System or related systems such as Apache Hive and HBase.

"The Sqoop Project has demonstrated its maturity by graduating from the Apache Incubator," explained Arvind Prabhakar, Vice President of Apache Sqoop. "With jobs transferring data on the order of billions of rows, Sqoop is proving its value as a critical component of production environments."

Building on the Hadoop infrastructure, Sqoop parallelizes data transfer for fast performance and best utilization of system and network resources. In addition, Sqoop allows fast copying of data from external systems to Hadoop to make data analysis more efficient and mitigates the risk of excessive load to external systems. 

"Connectivity to other databases and warehouses is a critical component for the evolution of Hadoop as an enterprise solution, and that's where Sqoop plays a very important role" said Deepak Reddy, Hadoop Manager at Coupons.com. "We use Sqoop extensively to store and exchange data between Hadoop and other warehouses like Netezza. The power of Sqoop also comes in the ability to write free-form queries against structured databases and pull that data into Hadoop."

"Sqoop has been an integral part of our production data pipeline" said Bohan Chen, Director of the Hadoop Development and Operations team at Apollo Group. "It provides a reliable and scalable way to import data from relational databases and export the aggregation results to relational databases."

Since entering the Apache Incubator in June 2011, Sqoop was quickly embraced as an ideal SQL-to-Hadoop data transfer solution. The Project provides connectors for popular systems such as MySQL, PostgreSQL, Oracle, SQL Server and DB2, and also allows for the development of drop-in connectors that provide high speed connectivity with specialized systems like enterprise data warehouses.
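
To give a flavor of how such an import is typically driven, the sketch below invokes Sqoop 1.x from Java using its command-line argument style. The JDBC URL, credentials, table name, HDFS path, and mapper count are illustrative placeholders, and the use of the Sqoop.runTool entry point is an assumption about embedding Sqoop rather than anything stated in this announcement.

    import org.apache.sqoop.Sqoop;

    // Minimal sketch of a parallel Sqoop import into HDFS.
    // All connection details, names, and paths below are placeholders.
    public class SqoopImportExample {
        public static void main(String[] args) {
            String[] importArgs = {
                "import",
                "--connect", "jdbc:mysql://db.example.com/sales",  // source RDBMS (placeholder)
                "--username", "etl_user",
                "--password", "secret",
                "--table", "orders",                               // table to copy
                "--target-dir", "/user/etl/orders",                // HDFS destination
                "--num-mappers", "4"                               // parallel transfer tasks
            };
            // Sqoop.runTool parses the arguments much as the sqoop CLI would
            // and returns the tool's exit code.
            int exitCode = Sqoop.runTool(importArgs);
            System.exit(exitCode);
        }
    }

Run as an ordinary Java application with the Sqoop and Hadoop jars on the classpath; the same arguments can equally be passed to the sqoop command-line tool.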

Craig Ling, Director of Business Systems at Tsavo Media, said, "We adopted the use of Sqoop to transfer data into and out of Hadoop with our other systems over a year ago. It is straightforward and easy to use, which has opened the door to allow team members to start consuming data autonomously, maximizing the analytical value of our data repositories."

Availability and Oversight

Apache Sqoop software is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. Apache Sqoop source code, documentation, mailing lists, and related resources are available at http://sqoop.apache.org/. A timeline of the project's history through graduation from the Apache Incubator is also available.

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees nearly one hundred fifty leading Open Source projects, including Apache HTTP Server — the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 350 individual Members and 3,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) not-for-profit charity, funded by individual donations and corporate sponsors including AMD, Basis Technology, Cloudera, Facebook, Google, IBM, HP, Hortonworks, Matt Mullenweg, Microsoft, PSW Group, SpringSource/VMware, and Yahoo!. For more information, visit http://www.apache.org/.

"Apache", "Apache Sqoop", and "ApacheCon" are trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.

#  #  #

Wednesday January 04, 2012

The Apache Software Foundation Announces Apache Hadoop™ v1.0

Open Source "Big Data" Cloud computing platform powers millions of compute-hours to process exabytes of data for Amazon.com, AOL, Apple, eBay, Facebook, foursquare, HP, IBM, LinkedIn, Microsoft, Netflix, The New York Times, Rackspace, Twitter, Yahoo!, and more.

4 January 2012 —FOREST HILL, MD— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of nearly 150 Open Source projects and initiatives, today announced Apache™ Hadoop™ v1.0, the Open Source software framework for reliable, scalable, distributed computing. The project’s latest release marks a major milestone six years in the making, and has achieved the level of stability and enterprise-readiness to earn the 1.0 designation.

A foundation of Cloud computing and at the epicenter of "big data" solutions, Apache Hadoop enables data-intensive distributed applications to work with thousands of nodes and exabytes of data. Hadoop enables organizations to more efficiently and cost-effectively store, process, manage and analyze the growing volumes of data being created and collected every day. Apache Hadoop connects thousands of servers to process and analyze data at supercomputing speed. 
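
For readers unfamiliar with the programming model behind such data-intensive applications, a minimal MapReduce "word count" job --the canonical Hadoop example-- is sketched below against the org.apache.hadoop.mapreduce API; class names and input/output paths are illustrative only, not drawn from this announcement.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Canonical MapReduce word count: the mapper emits (word, 1) pairs,
    // the reducer sums the counts for each word across the whole data set.
    public class WordCount {

        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);   // one occurrence of this word
                }
            }
        }

        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);      // total count for this word
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "word count");       // Hadoop 1.x style Job construction
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS input directory
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // e.g. an HDFS output directory
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

The same mapper and reducer code runs unchanged whether the cluster has one node or thousands; the framework handles splitting the input, scheduling tasks, and re-running failed work.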

"This release is the culmination of a lot of hard work and cooperation from a vibrant Apache community group of dedicated software developers and committers that has brought new levels of stability and production expertise to the Hadoop project," said Arun C. Murthy, Vice President of Apache Hadoop. "Hadoop is becoming the de facto data platform that enables organizations to store, process and query vast torrents of data, and the new release represents an important step forward in performance, stability and security.

"Originating with technologies developed by Yahoo, Google, and other Web 2.0 pioneers in the mid-2000s, Hadoop is now central to the big data strategies of enterprises, service providers, and other organizations," wrote James Kobielus in the independent Forrester Research, Inc. report, "Enterprise Hadoop: The Emerging Core Of Big Data" (October 2011).

Dubbed a "Swiss army knife of the 21st century" and named "Innovation of the Year" by the 2011 Media Guardian Innovation Awards, Apache Hadoop is widely deployed at organizations around the globe, including industry leaders from across the Internet and social networking landscape such as Amazon Web Services, AOL, Apple, eBay, Facebook, foursquare, HP, LinkedIn, Netflix, The New York Times, Rackspace, Twitter, and Yahoo!. Other technology leaders such as Microsoft and IBM have integrated Apache Hadoop into their offerings. Yahoo!, an early pioneer, hosts the world’s largest known Hadoop production environment to date, spanning more than 42,000 nodes.

"Achieving the 1.0 release status is a momentous achievement from the Apache Hadoop community and the result of hard development work and shared learnings over the years," said Jay Rossiter, senior vice president, Cloud Platform Group at Yahoo!. "Apache Hadoop will continue to be an important area of investment for Yahoo!. Today Hadoop powers every click at Yahoo!, helping to deliver personalized content and experiences to more than 700 million consumers worldwide."

"Apache Hadoop is in use worldwide in many of the biggest and most innovative data applications," said Eric Baldeschwieler, CEO of Hortonworks. "The v1.0 release combines proven scalability and reliability with security and other features that make Apache Hadoop truly enterprise-ready."

"Gartner is seeing a steady increase in interest in Apache Hadoop and related "big data" technologies, as measured by substantial growth in client inquiries, dramatic rises in attendance at industry events, increasing financial investments and the introduction of products from leading data management and data integration software vendors," said Merv Adrian, Research Vice President at Gartner, Inc. "The 1.0 release of Apache Hadoop marks a major milestone for this open source offering as enterprises across multiple industries begin to integrate it into their technology architecture plans."

Apache Hadoop v1.0 reflects six years of development, production experience, extensive testing, and feedback from hundreds of knowledgeable users, data scientists, and systems engineers, resulting in a highly stable, enterprise-ready release of the fastest-growing big data platform. It includes support for:

  • HBase (sync and flush support for transaction logging)
  • Security (strong authentication via Kerberos)
  • WebHDFS (a RESTful API to HDFS; a usage sketch follows this list)
  • Performance-enhanced access to local files for HBase
  • Other performance enhancements, bug fixes, and features
  • All version 0.20.205 and prior 0.20.2xx features
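
Because WebHDFS exposes HDFS over plain HTTP, it can be exercised without any Hadoop client libraries at all. The sketch below is a hypothetical probe that asks a NameNode for a file's status; the host name, port, path, and user name are placeholders, while the /webhdfs/v1 URL layout and the GETFILESTATUS operation belong to the WebHDFS REST API.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Minimal WebHDFS probe: fetches the JSON status of one HDFS file over HTTP.
    // Host, port, path, and user name are illustrative placeholders.
    public class WebHdfsStatus {
        public static void main(String[] args) throws Exception {
            // WebHDFS URLs follow http://<namenode>:<http-port>/webhdfs/v1/<path>?op=...
            URL url = new URL("http://namenode.example.com:50070"
                    + "/webhdfs/v1/user/demo/data.txt"
                    + "?op=GETFILESTATUS&user.name=demo");

            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");

            // The response body is a small JSON document describing the file
            // (length, owner, permissions, modification time, and so on).
            BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
            in.close();
            conn.disconnect();
        }
    }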

"We are excited to celebrate Hadoop's milestone achievement," said William Lazzaro, Director of Engineering at Concurrent Computer Corporation. "Implementing Hadoop at Concurrent has enabled us to transform massive amounts of real-time data into actionable business insights, and we continue to look forward to the ever-improving iterations of Hadoop."

"Hadoop, the first ubiquitous platform to emerge from the ongoing proliferation of Big Data and noSQL technologies, is set to make the transition from Web to Enterprise technology in 2012," said James Governor, co-founder of RedMonk, "driven by adoption and integration by every major vendor in the commercial data analytics market. The Apache Software Foundation plays a crucial role in supporting the platform and its ecosystem."

Availability and Oversight
As with all Apache products, Apache Hadoop software is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. Apache Hadoop release notes, source code, documentation, and related resources are available at http://hadoop.apache.org/.

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees nearly one hundred fifty leading Open Source projects, including Apache HTTP Server — the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 350 individual Members and 3,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) not-for-profit charity, funded by individual donations and corporate sponsors including AMD, Basis Technology, Cloudera, Facebook, Google, IBM, HP, Hortonworks, Matt Mullenweg, Microsoft, PSW Group, SpringSource/VMware, and Yahoo!. For more information, visit http://www.apache.org/.

"Apache", "Apache Hadoop", and "ApacheCon" are trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.

# # #

Media Contact:
Sally Khudairi
Vice President, The Apache Software Foundation
+1 617 921 8656 <press@apache.org>

Thursday March 24, 2011

Apache Hadoop Wins MediaGuardian "Innovator Of The Year" Award

Congratulations to the Apache Hadoop Project for winning the top prize at the 2011 MediaGuardian Innovation Awards in London!

Beating out nominees such as the iPad and WikiLeaks, Apache Hadoop was hailed by judges of the fourth annual Media Guardian Innovation Awards (Megas) as a "Swiss Army knife of the 21st Century" and a greater catalyst for innovation, "having the potential to completely change the face of media innovations across the globe."

ASF Chairman and original Hadoop creator Doug Cutting said, "It's great to see the continued recognition for the technology, and I am happy to accept the MediaGuardian Innovator of the Year award on behalf of this flourishing Apache community."

Kudos to all involved with the Apache Hadoop Project!

For more information on Apache Hadoop, visit http://hadoop.apache.org/
For more information on the Megas, please see http://www.guardian.co.uk/megas

# # #

Monday February 14, 2011

Apache Innovation Bolsters IBM's "Smartest Machine on Earth" in First-ever Man vs. Machine Competition on Jeopardy! Quiz Show

Apache UIMA and Apache Hadoop Advance Data Intelligence and Semantics Capabilities of Watson Supercomputer

Forest Hill, MD – 14 February 2011 – The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of nearly 150 Open Source projects and initiatives, today announced that Apache UIMA and Apache Hadoop play key roles in the data intelligence and analytic proficiency of the IBM Watson supercomputer, playing against human champions on the TV show "Jeopardy!".

Processing 80 trillion operations (teraflops) per second, Watson will access 200 million pages of content against 6 million logic rules to "understand" the nuances, meanings, and patterns in spoken human language, and compete in the trivia game show Jeopardy!. Contestants are presented with clues in the form of answers, and must phrase their responses as questions within a 5-second timeframe. 

Hundreds of Apache UIMA Annotators and thousands of algorithms help Watson --which runs disconnected from the Internet-- access vast databases to simultaneously comprehend clues and formulate answers. Watson then analyzes 500 gigabytes of preprocessed information to match potential meanings for the question and a potential answer to the question. Helping Watson do this are:

  • Apache UIMA: standards-based frameworks, infrastructure and components that facilitate the analysis and annotation of an array of unstructured content (such as text, audio and video). Watson uses Apache UIMA for real-time content analytics and natural language processing, to comprehend clues, find possible answers, gather supporting evidence, score each answer, compute its confidence in each answer, and improve contextual understanding (machine learning) – all under 3 seconds.

  • Apache Hadoop: software framework that enables data-intensive distributed applications to work with thousands of nodes and petabytes of data. A foundation of Cloud computing, Apache Hadoop enables Watson to access, sort, and process data in a massively parallel system (90+ server cluster/2,880 processor cores/16 terabytes of RAM/4 terabytes of disk storage).

The Watson system uses UIMA as its principal infrastructure for component interoperability and makes extensive use of the UIMA-AS scale-out capabilities that can exploit modern, highly parallel hardware architectures. UIMA manages all work flow and communication between processes, which are spread across the cluster. Apache Hadoop manages the task of preprocessing Watson's enormous information sources by deploying UIMA pipelines as Hadoop mappers, running UIMA analytics.
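
To make the annotator concept concrete, here is a deliberately tiny UIMA analysis component. It is a hypothetical illustration of the JCas programming model, not code from Watson, and the keyword it looks for is arbitrary.

    import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
    import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
    import org.apache.uima.jcas.JCas;
    import org.apache.uima.jcas.tcas.Annotation;

    // Toy UIMA annotator: marks every occurrence of a fixed keyword in the
    // document text with a plain Annotation. Real Watson annotators use rich,
    // custom type systems; this only shows the basic process(JCas) contract.
    public class KeywordAnnotator extends JCasAnnotator_ImplBase {

        private static final String KEYWORD = "Jeopardy";  // arbitrary example keyword

        @Override
        public void process(JCas jcas) throws AnalysisEngineProcessException {
            String text = jcas.getDocumentText();
            if (text == null) {
                return;
            }
            int start = text.indexOf(KEYWORD);
            while (start >= 0) {
                // Create a span annotation over the match and index it so that
                // downstream components in the pipeline can query it.
                Annotation ann = new Annotation(jcas, start, start + KEYWORD.length());
                ann.addToIndexes();
                start = text.indexOf(KEYWORD, start + KEYWORD.length());
            }
        }
    }

In a UIMA pipeline, many such components are chained together, each reading and adding annotations to the same CAS; scale-out frameworks such as UIMA-AS then distribute those pipelines across a cluster.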

"The success and influence of Watson clearly shows that open source in general, and specifically open source software developed and released by the ASF, is deeply entwined in all layers and aspects of technology," said ASF President Jim Jagielski. "Apache software is part of computing and information technology DNA, forming complete or integral solutions to advanced problems, and leveraging the software under the non-restrictive Apache License allows for extremely rapid development of cutting edge technology."

Watson faces off against record-breaking (human) Jeopardy champions Ken Jennings and Brad Rutter for the $1M grand prize 14-16 February 2011. 100% of Watson's winnings will be donated to charity; Rutter and Jennings have committed to donating 50% of their prizes.


Availability
All ASF products, including Apache UIMA and Apache Hadoop, are available to the public free of charge under the Apache License v2.0. Downloads, documentation, and related resources are available at http://www.apache.org/.


About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees nearly one hundred fifty leading Open Source projects, including Apache HTTP Server — the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 300 individual Members and 2,500 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) not-for-profit charity, funded by individual donations and corporate sponsors including AMD, Basis Technology, Cloudera, Facebook, Google, IBM, HP, Matt Mullenweg, Microsoft, SpringSource, and Yahoo!. For more information, visit http://www.apache.org/.

# # #

Contact:

Sally Khudairi
The Apache Software Foundation
press@apache.org
+1 617 921 8656
