The Apache Software Foundation Blog
Success at Apache: Meritocracy and Me.
by Tom Barber
When Sally asked for volunteers to help with a blog post series, "Success at Apache", I realised there was a very human story to tell about how the ASF helped me get to where I am today and, hopefully, where I'll go tomorrow. Over the years I have worked on and run a number of Open Source projects whilst working with an awful lot of Open Source software. One day I was browsing Slashdot, as you do. Yeah, I know a lot of people disparage it, but it's an awfully hard habit to kick, and without it I wouldn't have got involved in the ASF, so I owe it a lot. Anyway, that day I saw this article (https://it.slashdot.org/story/11/01/08/1544204/apache-to-steward-nasa-built-middleware). I had been working in the Open Source business intelligence industry for a few years at that point and spent a lot of time hacking around and managing data systems, so I wondered how I could get some help out of OODT (http://oodt.apache.org/). Also, as a kid I had always loved everything about space: I was a huge Apollo fan, had a small telescope, went to the total eclipse in the UK in 1999 and so on. I thought this OODT project would be a fun way for me to chat nonsense to a few NASA employees, find out how they did stuff and do a bit of Open Source hacking on the side, which would at least let me participate in some NASA-related development work. And so it began.
As I mentioned at the start, this blog series is about success at Apache, and hopefully this proves that success can come in a number of ways. The ASF was selected by NASA as the home for its data middleware platform, which shows that NASA deemed the incubation process, the license and the ecosystem acceptable; that is success for the Apache Foundation. Similarly, the foundation has proved very successful in helping people from a range of different walks of life into new lines of work, and that is exactly what happened to me and the reason I wanted to share my story about success at Apache.
= = =
"Success at Apache" is a new monthly blog series that focuses on the processes behind why the ASF "just works". 1) Project Independence https://s.apache.org/CE0V 2) All Carrot and No Stick https://s.apache.org/ykoG 3) Asynchronous Decision Making https://s.apache.org/PMvk 4) Rule of the Makers https://s.apache.org/yFgQ 5) JFDI --the unconditional love of contributors https://s.apache.org/4pjM
Posted at 01:05PM May 01, 2017 by Sally in SuccessAtApache
The Apache Software Foundation Announces Apache® CarbonData™ as a Top-Level Project
Open Source Big Data analytics accelerator in use at Bank of Communications, Hulu, Huawei, SAIC Motor, Zhejiang Mobile, among others.
Forest Hill, MD –1 May 2017– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® CarbonData™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.
Apache CarbonData is an indexed columnar store file format for fast analytics on Big Data platforms (including Apache Hadoop and Apache Spark, among others) that helps speed up queries by an order of magnitude over petabytes of data.
"We are very proud to complete the incubation process and graduate as an Apache Top-Level Project," said Liang Chen, Vice President of Apache CarbonData. "The CarbonData community grew rapidly over last ten months, both in terms of size and diversity. Since entering the Apache Incubator, we have completed 4 releases, and exceeded 90 contributors from 10 different organizations."
With the aim of using a unified file format to satisfy all kinds of data analysis cases, Apache CarbonData seamlessly integrates with Hadoop and Spark to improve Big Data analysis efficiency. In benchmarks, CarbonData runs interactive queries approximately 10x faster than standard column-oriented SQL on Hadoop data stores.
Features include:
- Unique data organization to allow faster filtering and better compression;
- Multi-level Indexing to enable faster search and speed up query processing;
- Deep Apache Spark Integration for dataframe + SQL compliance;
- Advanced push down optimization to minimize the amount of data being read, processed, converted, transmitted, and shuffled;
- Efficient compression and global encoding schemes to further improve aggregation query performance;
- Dictionary encoding for reduced storage space and faster processing; and
- Data update + delete support using standard SQL syntax.
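To make the dictionary-encoding item in the list above concrete, here is a minimal Python sketch of the general technique. It is not CarbonData's implementation or file format: the function names (`dictionary_encode`, `dictionary_decode`, `filter_equals`) and the sample column are hypothetical and chosen only for illustration. The idea is that repeated string values are stored once in a dictionary and each row holds a small integer code, so filters can compare integers instead of strings.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Map each distinct value to an integer code (illustrative sketch only)."""
    dictionary: List[str] = []     # code -> original value
    code_of: Dict[str, int] = {}   # original value -> code
    codes: List[int] = []
    for value in values:
        if value not in code_of:
            # First time we see this value: assign the next free code.
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes


def dictionary_decode(dictionary: List[str], codes: List[int]) -> List[str]:
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


def filter_equals(dictionary: List[str], codes: List[int], target: str) -> List[int]:
    """Return row indices whose value equals target, comparing integers only."""
    if target not in dictionary:
        return []
    target_code = dictionary.index(target)
    return [row for row, code in enumerate(codes) if code == target_code]


# Hypothetical sample column with many repeated values.
column = ["Shanghai", "Beijing", "Shanghai", "Shenzhen", "Beijing", "Shanghai"]
dictionary, codes = dictionary_encode(column)
matching_rows = filter_equals(dictionary, codes, "Beijing")
```

Here the six-row column is stored as three distinct strings plus six small integers, and the equality filter only touches the integer codes. Real columnar formats add further layers (for example, compressing the code column), which this sketch does not attempt to show.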
Apache CarbonData is in use at an array of organizations, including Bank of Communications, medical/pharma social platform DXY, Hulu, Huawei, group online retailer MEITUAN, SAIC Motor, and Zhejiang Mobile, among others.
"CarbonData has very good performance as a ‘SQL on Hadoop’ solution," said Tan Sheng, Director of SAIC Motor’s Big Data team. "It is suitable for SAIC Motor to adopt as a central Big Data platform component. Not only do we use Apache CarbonData, we also actively participate in its community as contributors."
"Apache CarbonData is great, as helped our audit business to improve 7-10X performance based on 14 billion rows of data," said Wei Zhao, Senior Engineer at Bank of Communications.
"Apache CarbonData is very suitable for our filter query cases, and has averaged 20x improvement on performance," said William Zhu, Architecture team member at DXY. "And, as CarbonData supports data update and delete, this feature is very useful. We would consider CarbonData as our all-in-one solution to unify all analysis data."
CarbonData was first developed at Huawei in 2013. The project was submitted to the Apache Incubator in June 2016, and had its first official release two months later. The project won top honors in the BlackDuck 2016 Open Source Rookies of the Year's Big Data category.
"Apache CarbonData is a great example of the value of the incubation process," said Jean-Baptiste Onofré, Apache CarbonData Incubator Mentor and Project Management Committee member. "Helping grow the CarbonData developer and user communities has increased our visibility, which allowed us to extend our use cases and tests, and gather new ideas. The initial CarbonData committers did (and are still doing) great work to welcome new users and contributors, clearly understanding it's a step forward for the project."
"We will continue to put our efforts towards optimizing data format efficiency for Big Data ecosystem and provide an unified and high performance data storage solution," added Liang. "The Apache CarbonData community welcomes interested contributors to work with us on our journey forward."
Catch Apache CarbonData in action at ApacheCon (16-18 May/Miami), and Spark Summit (5-7 June/San Francisco).
Availability and Oversight
Apache CarbonData software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache CarbonData, visit http://carbondata.apache.org/ , https://twitter.com/ApacheCarbonDat , and https://www.facebook.com/carbondata/
About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 620 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. "Apache", "CarbonData", "Apache CarbonData", "Hadoop", "Apache Hadoop", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Posted at 11:01AM May 01, 2017 by Sally in General
The Apache Software Foundation Announces Apache® Mahout™ v0.13.0
- Collaborative filtering – mines user behavior and makes product recommendations (such as eCommerce product recommenders);
- Regression – estimates a numerical value based on values of other inputs;
- Clustering – takes items in a particular class (such as Web pages or newspaper articles) and organizes them into naturally occurring groups, such that items belonging to the same group are similar to each other; and
- Classifying – learns from existing categorizations and then assigns unclassified items to the best category.
Posted at 10:00AM May 01, 2017 by Sally in General