
Thursday September 16, 2021

Success at Apache: from Mentee to PMC

by Ephraim Anierobi

This post is about how I became a committer and a Project Management Committee (PMC) member of Apache Airflow. It offers guidance to those who are new to programming, new to contributing to open-source projects, and want to become committers and PMC members in their respective Apache projects.

About a year and a half after changing my career from electrical engineering to software development, I became a committer and a Project Management Committee member of Apache Airflow. Becoming a committer and a PMC member is a reward and a kind of validation that you are on the right path in your journey.

On February 16, 2021, I accepted an invitation to become a committer in Apache Airflow. It came as a surprise, as I was not expecting it. Six months down the line, I received another surprise invitation to become a PMC member in Apache Airflow.

These are impressive feats for me because before contributing to Apache Airflow, I didn't have experience working with other programmers. I was making websites and taught a few friends of mine how to make their own. I didn't have a mentor, and no one had ever seen my code to advise me whether to continue on my journey or drop the idea of becoming a programmer.

While I wanted to work with experienced programmers to improve my skills, I feared that people who saw my code would talk me down. I almost gave up on my journey, only to come across an Outreachy post on Twitter looking for interns for open source projects. Outreachy is a tech diversity program that provides three months of paid, remote internships to people underrepresented in tech.

I was ready to change my career and was looking for mentorship, but couldn't find an internship that could help me get started in my journey. In Nigeria, where I live, your location affects your chances of getting an entry-level job. I was not close to the major cities.

So I applied for an internship through Outreachy. 

There are two application processes. The initial application involves explaining your background and why you should be accepted into the program. You must pass the initial application before you can proceed to the next. The second application process (called the contribution period) is where you choose an open source project that matches your skill set and then contribute to it. You must make a minimum number of contributions before you can be accepted.

That was how I found Apache Airflow.

You can imagine the joy I felt when I was accepted into the program.

Here are things I did which I believe would help you in your journey to becoming an Apache committer and a PMC member.

Asking Questions

Asking questions is the fastest way to learn. Don't be afraid to ask questions if you do not understand something. I ask questions a lot and I always get answers, but I didn't start by asking questions: I made 40 commits to the repository without understanding what Airflow does. It was not until I joined my new employer, Astronomer, that I learned what a DAG is and what a data pipeline is. Now I can easily reproduce issues by following someone's description. I wish I had asked questions earlier; I could have had more experience by now!

Start small

If you are like me, with little experience, start by contributing to minor issues. Find good first issues and work on them. You don't have to wait until you can make a large change before contributing.

While working on the REST API project, which I got hired by Outreachy to do, I was looking at the codebase. I started with Airflow providers because they were easy for me to understand. There were so many requests about providers at the time, and I started looking into them, reading the codebase, and helping with the providers. I didn't go into the core straight away; I avoided it. My first PR was a simple database migration during the Outreachy contribution period.

Refactor code

Airflow is complex, and I'm still learning it. Just last week I learned how the execution date works. I know there are many other things I have not yet understood well, but refactoring has helped me understand a lot.

When I had to work on the scheduler, I found the file was so large that I went back and forth without making progress. I worked on separating it into smaller files, and I'm glad I did, because after that I could contribute. I recommend refactoring code, but do not take on large refactorings. Do a little at a time, with the aim of understanding the project. Avoid the core of the project if you are just starting.

Issues

One thing about issues is that most reporters tell you how to reproduce them. Most times, you will find that the issue is quite easy to fix. I usually jump on those and fix them. Other times, I have had to contact my superiors before I could fix an issue.

Looking at reported issues has the added advantage of teaching you how the software works in the real world. Try to reproduce as many issues as possible. It adds to your knowledge.

Pull Requests

Here's where you can learn a great deal. I start my day by looking at the PRs. Most PRs link to issues. I read the issues and study the PRs. I must admit that some of these PRs are just too complex for me. If I don't understand one, sometimes I ask questions; other times I go to the next PR. When I move on, I record the topic that made me move on and plan to read about it some other time.

When you make a PR, ask for reviews in the community's communication channel. Airflow uses Slack and the mailing list for communications. You should ask for reviews in the Slack channel and not on the mailing list. The reviews not only give information on how to fix the problem but also teach you best practices in programming.

Culture

The ASF has a code of conduct that covers the Foundation's activities as well as the projects. Read it first.

Among the many things you will learn in Apache Airflow is communication: how to communicate with people in a civil manner. Spend time reading PR reviews; you will learn a lot, especially how to ask people to make changes to their code.

Conclusion

You don't have to wait for an invitation to contribute to an Apache project. You don't have to become an Outreachy intern to get involved with something you're interested in.

Don't be afraid to make a PR, because nobody will penalize you if you're wrong. I know the feeling that people may think you are not good enough. Forget it: they know you are new to the field. And if you are thinking that they don't know your level in the language, forget that too: they know you are still a junior because it shows in your code. I can't count how many times I have had code reviews that showed me a better way to implement the code. Be open-minded, make mistakes, and excel.


Ephraim Anierobi started to work on the Apache Airflow project as an Outreachy Intern in May 2020. He became a committer in February 2021 and a member of the Apache Airflow Project Management Committee (PMC) in August 2021. He is a software engineer at Astronomer.

= = =

"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works" https://blogs.apache.org/foundation/category/SuccessAtApache

Monday August 30, 2021

The Apache Drill Project Announces Apache® Drill(TM) v1.19 Milestone Release

Open Source, enterprise-grade, schema-free Big Data SQL query engine used by thousands of organizations, including Ant Group, Cisco, Ericsson, Intuit, MicroStrategy, Tableau, TIBCO, TransUnion, Twitter, and more.

Wilmington, DE —30 August 2021— The Apache Drill Project announced the release of Apache® Drill™ v1.19, the schema-free Big Data SQL query engine for Apache Hadoop®, NoSQL, and Cloud storage.

"Drill 1.19 is our biggest release ever," said Charles Givre, Vice President of Apache Drill. "With an already short learning curve, Drill 1.19 makes it even easier for users to quickly query, analyze, and visualize data from disparate sources and complex data sets."

An "SQL-on-Hadoop" engine, Apache Drill is easy to deploy, highly performant, able to quickly process trillions of records, and scalable from a single laptop to a 1000-node cluster. With its schema-free JSON model (the first distributed SQL query engine of its kind), Drill is able to query complex semi-structured data in situ without requiring users to define schemas or transform data. It provides plug-and-play integration with existing Hive and HBase deployments, and is extensible out-of-the-box to access multiple data sources, such as S3 and Apache HDFS, HBase, and Hive. Additionally, Drill can directly query data from REST APIs, including platforms like Salesforce and ServiceNow.
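To make the idea of schema-free, in-situ querying concrete, here is a minimal sketch (not taken from the announcement) of submitting a SQL query against a raw JSON file through Drill's REST interface. It assumes a Drill instance listening on the default web port at http://localhost:8047 and uses the sample employee.json file on Drill's classpath; the helper names build_drill_query_request and run_drill_query are hypothetical and exist only for illustration.

```python
import json
import urllib.request


def build_drill_query_request(sql, base_url="http://localhost:8047"):
    """Build (but do not send) a POST request for Drill's /query.json endpoint.

    base_url is an assumption: a local Drill instance on its default web port.
    """
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_drill_query(sql, base_url="http://localhost:8047"):
    """Send the query and return the result rows as a list of dicts."""
    request = build_drill_query_request(sql, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["rows"]


if __name__ == "__main__":
    # No schema is declared anywhere: the JSON file is queried as it is.
    for row in run_drill_query("SELECT * FROM cp.`employee.json` LIMIT 3"):
        print(row)
```

The query names the file directly in the FROM clause, which is the point of the schema-free model described above: there is no table definition or load step before the first SELECT.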

Drill supports the ANSI SQL 2003 standard syntax ecosystem as well as dozens of NoSQL databases and file systems, including Apache HBase, MongoDB, Elasticsearch, Cassandra, REST APIs, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, NAS, local files, and more. Drill leverages familiar BI tools (such as Apache Superset, Tableau, MicroStrategy, QlikView and Excel) as well as data virtualization and visualization tools, and runs interactive queries on Hive tables with different Hive metastores.

Apache Drill v1.19
Drill is designed from the ground up to support high-performance analysis on rapidly evolving data on modern Big Data applications. v1.19 reflects more than 100 changes, improvements, and new features that include:

  • New Connectors for Apache Cassandra, Elasticsearch, and Splunk
  • New Format Reader for XML without schemas
  • Added Avro support for Kafka plugin
  • Integrated password vault for secure credential storage
  • Support for Linux ARM64 systems
  • Added limit pushdowns for file systems, HTTP REST APIs and MongoDB
  • Added streaming for Drill's REST API
  • Integration with Apache Airflow


Developers, analysts, business users, and data scientists use Apache Drill for data exploration and analysis for its enterprise-grade reliability, security, and performance. Drill's flexibility and ease-of-use have attracted thousands of users that include Ant Group, Cardlytics, Cisco, Ericsson, Intuit, MicroStrategy, Qlik, Tableau, TIBCO, TransUnion, Twitter, National University of Singapore, and more.

"Individuals, businesses, and organizations of all types rely on Apache Drill's rich functionality," added Givre. "We invite everyone to participate in our user and developer lists as well as our Slack channel, and contribute to the project to build on our momentum and help improve the future experience for all Drill users."

Catch Apache Drill in action at ApacheCon@Home, taking place online 21-23 September 2021. For more information and to register, visit https://www.apachecon.com/ .

Availability and Oversight
Apache Drill software is released under the Apache License v2.0 and is overseen by a volunteer, self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases.

About Apache Drill
Apache Drill is the Open Source, schema-free Big Data SQL query engine for Apache Hadoop, NoSQL, and Cloud storage. For more information, including documentation and ways to become involved with Apache Drill, visit http://drill.apache.org/ , https://twitter.com/ApacheDrill , and https://apache-drill.slack.com/ .

© The Apache Software Foundation. "Apache", "Drill", "Apache Drill", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Tuesday February 16, 2021

The Apache Software Foundation Announces Apache® Gobblin™ as a Top-Level Project

Open Source distributed Big Data integration framework in use at Apple, CERN, Comcast, Intel, LinkedIn, Nerdwallet, PayPal, Prezi, Roku, Sandia National Labs, Swisscom, Verizon, and more.

Wilmington, DE —16 February 2021— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Gobblin™ as a Top-Level Project (TLP).

Apache Gobblin is a distributed Big Data integration framework used in both streaming and batch data ecosystems. The project originated at LinkedIn in 2014, was open-sourced in 2015, and entered the Apache Incubator in February 2017.

"We are excited that Gobblin has completed the incubation process and is now an Apache Top-Level Project," said Abhishek Tiwari, Vice President of Apache Gobblin and software engineering manager at LinkedIn. "Since entering the Apache Incubator, we have completed four releases and grown our community the Apache Way to more than 75 contributors from around the world."

Apache Gobblin is used to integrate hundreds of terabytes and thousands of datasets per day by simplifying the ingestion, replication, organization, and lifecycle management processes across numerous execution environments, data velocities, scale, connectors, and more.

"Originally creating this project, seeing it come to life and solve mission-critical problems at many companies has been a very gratifying experience for me and the entire Gobblin team," said Shirshanka Das, Founder and CTO at Acryl Data, and member of the Apache Gobblin Project Management Committee.

As a highly scalable data management solution for structured and byte-oriented data in heterogeneous data ecosystems, Apache Gobblin makes the arduous task of creating and maintaining a modern data lake easy. It supports the three main capabilities required by every data team: 

  • Ingestion and export of data from a variety of sources and sinks into and out of the data lake while supporting simple transformations. 
  • Data Organization within the lake (e.g. compaction, partitioning, deduplication).
  • Lifecycle and Compliance Management of data within the lake (e.g. data retention, fine-grained data deletions) driven by metadata.

"Apache Gobblin supports deployment models all the way from a single-process standalone application to thousands of containers running in cloud-native environments, ensuring that your data plane can scale with your company’s growth," added Das.

Apache Gobblin is in use at Apple, CERN, Comcast, Intel, LinkedIn, Nerdwallet, PayPal, Prezi, Roku, Sandia National Laboratories, Swisscom, and Verizon, among many others.

"We chose Apache Gobblin as our primary data ingestion tool at Prezi because it proved to scale, and it is a Swiss Army knife of data ingestion," said Tamas Nemeth, Tech Lead and Manager at Prezi. "Today, we ingest, deduplicate, and compact more than 1200 Apache Kafka topics with its help, and this number is still growing. We are looking forward to continuing to contribute to the project and helping the community enable other companies to use Apache Gobblin."

"Apache Gobblin has been at the center stage of the data management story at LinkedIn. We leverage it for various use-cases ranging from ingestion, replication, compaction, retention, and more," said Kapil Surlaker, Vice President of Engineering at LinkedIn. "It is battle-tested and serves us well at exabyte scale. We firmly believe in the data wrangling capabilities that Gobblin has to offer, and we will continue to contribute heavily and collaborate with the Apache Gobblin community. We are happy to see that Gobblin has established itself as an industry standard and is now an Apache Top-Level Project."

"Open community and meritocracy are the key drivers for Apache Gobblin's success," added Tiwari. "We invite everyone interested in the data management space to join us and help shape the future of Gobblin."

Catch Apache Gobblin in action in the upcoming hackathon planned for late Q1 2021. Details will be posted on the Apache Gobblin mailing lists and Twitter feed listed below.

Availability and Oversight
Apache Gobblin software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Gobblin, visit https://gobblin.apache.org/ and https://twitter.com/ApacheGobblin 

About the Apache Incubator
The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects enter the ASF through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/ 

About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation is the world’s largest Open Source foundation, stewarding 227M+ lines of code and providing more than $20B worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 813 individual Members and 206 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with nearly 8,000 Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Budget Direct, Capital One, Cloudera, Comcast, Didi Chuxing, Facebook, Google, Handshake, Huawei, IBM, Microsoft, Pineapple Fund, Red Hat, Reprise Software, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF 

© The Apache Software Foundation. "Apache", "Gobblin", "Apache Gobblin", "Hadoop", "Apache Hadoop", "MapReduce", "Apache MapReduce", "Mesos", "Apache Mesos", "YARN", "Apache YARN", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday February 03, 2021

The Apache Software Foundation Announces Apache® DataSketches™ as a Top-Level Project

Open Source high-performance Big Data streaming algorithm library in use at Nielsen Identity, Permutive, Splice Machine, and Verizon Media, among others.

Wilmington, DE —3 February 2021— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® DataSketches™ as a Top-Level Project (TLP).

Apache DataSketches is a highly performant Big Data analysis library for scalable approximate algorithms. The project originated at Yahoo in 2012, was open-sourced in 2015, and entered the Apache Incubator in March 2019.

"We are excited to be part of the ASF," said Lee Rhodes, Vice President of Apache DataSketches. "We have learned a great deal from the incubation process and look forward to working with new users of our library that want to take advantage of sketching technology."

Apache DataSketches’s library of specialized streaming algorithms —known as sketches— comprises small data structures that process data at massive scale. Sketches are ideal for queries that cannot afford the time or huge compute resources needed to generate exact results. Where approximate results are acceptable, sketches are the only viable alternative for interactive queries with real-time analysis. Apache DataSketches is:

  • Fast —produces approximate results orders of magnitude faster than traditional methods, with a user-configurable size vs. accuracy tradeoff;
  • Efficient —sketch algorithms process data in a single pass for both real-time and batch;
  • Mergeable —allows for parallelization;
  • Optimized for large-scale computing environments that process Big Data —such as Apache Hadoop, Apache Spark, Apache Druid, Apache Hive, Apache Pig, PostgreSQL;
  • Binary compatible across multiple languages and platforms —available in Java, C++, and Python;
  • Expanded Analysis —including count distinct with set operations, quantiles, most frequent items (heavy hitters), matrix computations, and more; and
  • Mathematically defined and proven error properties —provides a priori and a posteriori error estimation and upper and lower bounds with statistically derived confidence intervals.
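The properties listed above (single pass, mergeable, a size vs. accuracy tradeoff) can be illustrated with a toy example. The following is a minimal sketch of a K-minimum-values (KMV) distinct-count estimator written for illustration only; it is not the Apache DataSketches API or its implementation, and the class name KMVSketch and its methods are hypothetical. It assumes SHA-256 as a stand-in hash function and keeps only the k smallest hash values seen.

```python
import hashlib
import heapq


class KMVSketch:
    """Toy K-minimum-values sketch for approximate distinct counting."""

    def __init__(self, k=1024):
        self.k = k            # larger k: more memory, lower error
        self._heap = []       # max-heap (negated values) of the k smallest hashes
        self._members = set() # the same hashes, for duplicate detection

    @staticmethod
    def _hash(item):
        # Map an item to a pseudo-uniform value in [0, 1).
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def update(self, item):
        """Process one item; each item is looked at exactly once."""
        h = self._hash(item)
        if h in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Replace the current largest retained hash with the smaller one.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def merge(self, other):
        """Return a new sketch summarizing the union of both input streams."""
        merged = KMVSketch(min(self.k, other.k))
        for h in heapq.nsmallest(merged.k, self._members | other._members):
            heapq.heappush(merged._heap, -h)
            merged._members.add(h)
        return merged

    def estimate(self):
        """Estimate the number of distinct items seen."""
        if len(self._heap) < self.k:
            return float(len(self._heap))  # exact while under capacity
        return (self.k - 1) / -self._heap[0]


if __name__ == "__main__":
    left, right = KMVSketch(), KMVSketch()
    for user_id in range(0, 60_000):
        left.update(user_id)
    for user_id in range(40_000, 100_000):
        right.update(user_id)
    print("estimated distinct users:", round(left.merge(right).estimate()))
```

Because each sketch stores at most k values regardless of stream length, two sketches built on separate machines can be merged afterwards, which is what makes parallel processing possible. The production library offers far more refined algorithms and proven error bounds than this toy version.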

Apache DataSketches is used in large-scale computing environments such as Nielsen Identity, Permutive, Splice Machine, and Verizon Media, among others, as well as Apache Druid and Apache Pinot (incubating).

"The Apache DataSketches project takes powerful algorithms for data summarization and analysis, and makes them available to everyone," said Professor Graham Cormode of the University of Warwick. "While these methods are tremendously useful in practice, their descriptions were previously only in highly technical scientific papers. This project has made robust, dependable and well-documented implementations available to all. Already the library has been used for a wide range of applications, including service quality, monitoring, ad analytics and the sciences."

"Using Apache DataSketches has enabled Apache Druid users to perform common tasks such as quantiles and unique counting in a highly performant and efficient manner," said Gian Merlino, Vice President of Apache Druid. "We have worked closely together over the years to make the power of DataSketches accessible to Apache Druid users, helping us provide real-time analytics at scale."

"Sketches are fundamental to calculating many of our key company metrics," said Tom Miller, Director of Software Development Engineering at Verizon Media. "It allows us to greatly simplify our data processing and reduce storage costs by allowing us to calculate non-additive metrics across user specified dimension combinations at report time instead of having to either retain raw data or pre-calculate for each set of dimensions."

"Combining Apache Druid and DataSketches allows us to provide our customers real-time insights into their target audiences and advertising campaigns," said Yakir Buskilla, Senior Vice President of Research and Development and General Manager Israel at Nielsen Identity. "The ability to evaluate set expressions makes the Theta Sketch especially powerful for multi-set cardinality estimation as well as funnel analysis."

"Apache DataSketches has provided us with a solid theoretical foundation upon which we are able to store and process data at scale - in a simple, fast and cost-efficient manner," said David Cromberge, Senior Software Engineer at Permutive. "It has been a pleasure to engage with their creators and community who have been helpful at every step of the way."

"We use DataSketches's Theta-Sketches for distinct-count aggregations that are used to solve large multi-set cardinality approximation," said Mayank Shrivastava, Committer and member of the Apache Pinot (incubating) Podling Project Management Committee. "The ability to evaluate set expressions makes the Theta Sketch especially powerful for multi-set cardinality estimation as well as funnel analysis."

"We welcome those interested in streaming algorithms to visit us, learn about this exciting technology, and contribute to Apache DataSketches to make our project even better," added Rhodes.

Availability and Oversight
Apache DataSketches software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache DataSketches, visit https://datasketches.apache.org .


© The Apache Software Foundation. "Apache", "DataSketches", "Apache DataSketches", "Druid", "Apache Druid", "Hadoop", "Apache Hadoop", "Hive", "Apache Hive", "Pig", "Apache Pig", "Pinot (incubating)", "Apache Pinot (incubating)", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Friday January 01, 2021

Apache in 2020 - By The Digits


Whilst 2020 has been quite a challenging year world-wide, the all-volunteer Apache community has demonstrated commendable strength, resilience, and commitment to our tenet of "Community Over Code" — 


  • 238 Apache Projects, sub-projects, incubating podlings, and their communities produced nearly 3,500 releases across dozens of categories. Release Categories: API Gateways, Application Performance Management, Big Data, Blockchain, Build Management, Cloud Computing, Content, Cryptography, Customer Profile Platform, Databases, eMail, Enterprise Resource Planning, FinTech, Identity Management, Integrated Development Environments, Integration, IoT, Libraries, Logging, Machine Learning, Messaging, Natural Language Processing, Operating Systems, Programming Languages, Remote Desktop Gateway, Search, Security Frameworks, Servers, Services Framework, Templating, Testing, Version Control, Web Conferencing, Web Crawlers, Web Frameworks, and more.



  • Apache events moved online, and attracted our most diverse and greatest number of participants. ApacheCon@Home drew nearly 5,750 participants from more than 150 countries, who enjoyed 300+ sessions across 27 tracks. A staggering 1.5M+ viewers tuned in to the Apache Roadshow/China over its 2-day online event.


Additional highlights:


Apache Projects —https://projects.apache.org/


  • Total number of projects + sub-projects - 342
  • Top-Level Projects - 199
  • Podlings undergoing development in the Apache Incubator - 41
  • New Top-Level Projects that graduated from the Incubator - 10 


Community/People —http://home.apache.org/


The ASF’s merit-driven "Contributor-Committer-Member" progression is the central governing process across the Apache ecosystem. The core Apache Group of 21 individual Members grew with developers who contributed code, patches, or documentation. Some of these contributors were subsequently granted Committer status by the Membership, and provided access to: 1) commit code directly to Apache repositories; 2) vote on community-related decisions; and 3) propose an active user for Committership. Today, ASF Committers contribute not just code and documentation, but also an array of initiatives that provide value across the greater Apache ecosystem, including Project promotion and community development through mentoring, events, and diversity and inclusion programs. Those Committers who demonstrate merit in the Foundation's growth, evolution, and progress are nominated for ASF Membership by existing members.


The Apache community continues to grow: 


  • We welcomed 3,612 contributors in 2020, 51.87% of whom were newcomers to Apache.
  • 905 individuals earned Committer status, bringing the total to 8,022.
  • 34 individuals were elected as new ASF Members, bringing the total to 813.


Apache Projects/Code —https://projects.apache.org/statistics.html


3,258 Apache Committers changed 117,350,563 lines of code over 247,451 commits.


Top 5 Committers

  • Andrea Cosentino (6,357 commits; 2,003,123 lines changed)
  • Jean-Baptiste Onofré (3,120 commits; 735,656 lines changed)
  • Claus Ibsen (2,838 commits; 1,919,860 lines changed)
  • Mark Thomas (2,360 commits; 185,548 lines changed)
  • Gary Gregory (2,188 commits; 234,845 lines changed)


Top 5 Apache Project Repositories by Size (Lines of Code)


  • Tuweni (incubating; 7,822,771); Tuweni is Apache's first project in the Blockchain space
  • Flex (7,007,693)
  • NetBeans (6,582,707)
  • OpenOffice (6,376,683)
  • Hadoop (3,521,559)

Top 5 Apache Project Repositories by Commits


  • Camel
  • Flink
  • Airflow
  • Lucene/Solr
  • Spark


GitHub: Top 5 Most Active Apache Project Sources (clones)


  • Thrift
  • Beam
  • Arrow
  • Geode
  • Cordova


GitHub: Top 5 Most Active Apache Project Sources (visits)


  • Spark
  • Flink
  • Kafka
  • Beam
  • Camel



Mailing Lists —https://lists.apache.org/


"If it didn’t happen on-list, it didn’t happen"


The ASF’s day-to-day operations, including Apache project and community development, take place on ~1,450 public and ~700 private mailing lists.


In 2020, 18,388 authors sent 2,139,458 emails on 774,364 topics.


Top 5 most active Apache Project user@ mailing lists


  • Flink
  • Lucene-Solr
  • OpenMeetings
  • Ignite
  • Tomcat


Top 5 most active Apache Project dev@ mailing lists


  • Tomcat
  • Flink
  • Royale
  • James
  • Beam


Contributor License Agreements and Software Grants —https://www.apache.org/licenses/


Individuals who are granted write access to the Apache repositories must submit an Individual Contributor License Agreement (ICLA). Corporations that have assigned employees to work on Apache projects as part of an employment agreement may sign a Corporate CLA (CCLA) for contributing intellectual property via the corporation. Individuals or corporations donating a body of existing software or documentation to one of the Apache projects need to execute a formal Software Grant Agreement (SGA) with the ASF. Over the past year, the ASF received:


  • ICLAs - 708
  • CCLAs - 35
  • Grants - 35


Sponsorship and Individual Support —http://apache.org/foundation/contributing.html


The ASF benefits from the generosity of hundreds of individual donors and corporate Sponsors, whose support helps offset the ASF's day-to-day expenses for Accounting, Fundraising, Infrastructure, Legal, Marketing & Publicity, and other services.


ASF Sponsors provide financial backing for the ASF's operations. They are:


PLATINUM: Amazon Web Services, Facebook, Comcast, Google, Huawei, Pineapple Fund, Tencent, and Verizon Media.


GOLD: Anonymous, Baidu, Bloomberg, Cloudera, Handshake, IBM, Reprise Software, Union Investment, and Workday.


SILVER: Aetna, Alibaba Cloud Computing, Budget Direct, Capital One, Cerner, Inspur, Red Hat, and Target.


BRONZE: Airport Rentals, The Blog Starter, Bookmakers, Cash Store, Bestecasinobonussen.nl, Casino2k, Curity, The Economic Secretariat, Gundry MD, Host Advice, HostChecka.com, Indian Online Casino, Journal Review, LeoVegas, Miro-Kredit AG, Mutuo Kredit AG, Online Holland Casino, ProPrivacy, PureVPN, RX-M, SCAMS.info, SevenJackpots.com, Software Guru, Start a Blog by Ryan Robinson, Talend, The Best VPN, Top10VPN, Twitter, and Xplenty.


ASF Targeted Sponsors provide the Foundation with non-financial contributions for specific operational activities or programs. They include:


TARGETED PLATINUM: Amazon Web Services, CloudBees, DLA Piper, JetBrains, LeaseWeb, Microsoft, OSU Open Source Labs, Sonatype, and Verizon Media.


TARGETED GOLD: Atlassian, The CryptoFund, Datadog, PhoenixNAP, and Quenda.


TARGETED SILVER: HotWax Systems, Manning Publications, and Rackspace.


TARGETED BRONZE: Bintray, Education Networks of America, Friend of Apache Cordova, Google, Hopsie, No-IP, PagerDuty, Peregrine Computer Consultants Corporation, Sonic.net, SURFnet, and Virtru.



Apache Members, Committers, contributors, users, supporters, and Sponsors further the ASF’s mission of providing Open Source software for the public good. Help keep Apache software accessible to everyone by making a contribution* to the ASF https://donate.apache.org/ , becoming a Sponsor, or adding us to your Corporate Giving program. Please visit http://apache.org/foundation/contributing.html for more information.


Best wishes for a stellar 2021!



* The ASF is a US 501(c)(3) not-for-profit charitable organization, whose tax identification number is 47-0825376. The ASF is recognized by Charity Navigator and cited with the Gold Seal of Transparency by GuideStar.


# # #

Thursday June 04, 2020

The Apache Software Foundation Announces Apache® Hudi™ as a Top-Level Project

Open Source data lake technology for stream processing on top of Apache Hadoop in use at Alibaba, Tencent, Uber, and more.

Wakefield, MA —4 June 2020— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Hudi™ as a Top-Level Project (TLP).

Apache Hudi (Hadoop Upserts Deletes and Incrementals) data lake technology enables stream processing on top of Apache Hadoop compatible cloud stores & distributed file systems. The project was originally developed at Uber in 2016 (code-named and pronounced "Hoodie"), open-sourced in 2017, and submitted to the Apache Incubator in January 2019.

"Learning and growing the Apache way in the incubator was a rewarding experience," said Vinoth Chandar, Vice President of Apache Hudi. "As a community, we are humbled by how far we have advanced the project together, while at the same time, excited about the challenges ahead."

Apache Hudi is used to manage petabyte-scale data lakes using stream processing primitives like upserts and incremental change streams on Apache Hadoop Distributed File System (HDFS) or cloud stores. Hudi data lakes provide fresh data while being an order of magnitude more efficient than traditional batch processing. Features include:

  • Upsert/Delete support with fast, pluggable indexing
  • Transactionally commit/rollback data
  • Change capture from Hudi tables for stream processing
  • Support for Apache Hive, Apache Spark, Apache Impala and Presto query engines
  • Built-in data ingestion tool supporting Apache Kafka, Apache Sqoop and other common data sources
  • Optimize query performance by managing file sizes, storage layout
  • Fast row based ingestion format with async compaction into columnar format
  • Timeline metadata for audit tracking

Apache Hudi is in use at organizations such as Alibaba Group, EMIS Health, Linknovate, Tathastu.AI, Tencent, and Uber, and is supported as part of Amazon EMR by Amazon Web Services. A partial list of those deploying Hudi is available at https://hudi.apache.org/docs/powered_by.html

"We are very pleased to see Apache Hudi graduate to an Apache Top-Level Project. Apache Hudi is supported in Amazon EMR release 5.28 and higher, and enables customers with data in Amazon S3 data lakes to perform record-level inserts, updates, and deletes for privacy regulations, change data capture (CDC), and simplified data pipeline development," said Rahul Pathak, General Manager, Analytics, AWS. "We look forward to working with our customers and the Apache Hudi community to help advance the project."

"At Uber, Hudi powers one of the largest transactional data lakes on the planet in near real time to provide meaningful experiences to users worldwide," said Nishith Agarwal, member of the Apache Hudi Project Management Committee. "With over 150 petabytes of data and more than 500 billion records ingested per day, Uber’s use cases range from business critical workflows to analytics and machine learning."

"Using Apache Hudi, end-users can handle either read-heavy or write-heavy use cases, and Hudi will manage the underlying data stored on HDFS/COS/CHDFS using Apache Parquet and Apache Avro," said Felix Zheng, Lead of Cloud Real-Time Computing Service Technology at Tencent.

"As cloud infrastructure becomes more sophisticated, data analysis and computing solutions gradually begin to build data lake platforms based on cloud object storage and computing resources," said Li Wei, Technical Lead on Data Lake Analytics, at Alibaba Cloud. "Apache Hudi is a very good incremental storage engine that helps users manage the data in the data lake in an open way and accelerate users' computing and analysis."

"Apache Hudi is a key building block for the Hopsworks Feature Store, providing versioned features, incremental and atomic updates to features, and indexed time-travel queries for features," said Jim Dowling, CEO/Co-Founder at Logical Clocks. "The graduation of Hudi to a top-level Apache project is also the graduation of the open-source data lake from its earlier data swamp incarnation to a modern ACID-enabled, enterprise-ready data platform."

"Hudi's graduation to a top-level Apache project is a result of the efforts of many dedicated contributors in the Hudi community," said Jennifer Anderson, Senior Director of Platform Engineering at Uber. "Hudi is critical to the performance and scalability of Uber's big data infrastructure. We're excited to see it gain traction and achieve this major milestone."

"Thus far, Hudi has started a meaningful discussion in the industry about the wide gaps between data warehouses and data lakes. We have also taken strides to bridge some of them, with the help of the Apache community," added Chandar. "But, we are only getting started with our deeply technical roadmap. We certainly look forward to a lot more contributions and collaborations from the community to get there. Everyone’s invited!"

Catch Apache Hudi in action at Virtual Berlin Buzzwords 7-12 June 2020, as well as at MeetUps, and other events.

Availability and Oversight
Apache Hudi software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Hudi, visit http://hudi.apache.org/ and https://twitter.com/apachehudi 

About the Apache Incubator
The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects enter the ASF through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/ 

About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation (ASF) is the world’s largest Open Source foundation, stewarding 200M+ lines of code and providing more than $20B worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 765 individual Members and 206 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with 7,600 Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Budget Direct, Capital One, CarGurus, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Huawei, IBM, Indeed, Inspur, Leaseweb, Microsoft, Pineapple Fund, Red Hat, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Hudi", "Apache Hudi", "Hadoop", "Apache Hadoop", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday January 01, 2020

Apache in 2019 - By The Digits

What an accomplished year for The Apache Software Foundation: 2019 heralded 20 years of Open Source leadership "The Apache Way". Our rallying cry of "Community Over Code" informs everything we do, with billions worldwide benefiting from more than $20B worth of community-led software, provided 100% free-of-charge. Highlights include:

Apache Projects —https://projects.apache.org/

  • Total number of projects + sub-projects - 339
  • Top-Level Projects - 206
  • Podlings in the Apache Incubator - 46
  • ASF Committees (non-Projects) - 11
  • Other groups, including operations/support - 60


Community/People —http://home.apache.org/

  • Apache Committers - 7,203 (7,038 active)
  • ASF Members (individuals) - 765
  • New Members elected - 40


Apache Projects/Code —https://projects.apache.org/statistics.html

3,081 Apache Committers changed 59,309,787 lines of code over 171,689 commits, with an all-time high of 12,250 individuals contributing to Apache projects this year.


Profile of Apache Committers in 2019



More than 75% of contributors in 2019 were new to Apache


Top 5 Committers
  1. Andrea Cosentino (3,841 commits; 588,217 lines changed)
  2. Tilman Hausherr (2,791 commits; 64,805 lines changed)
  3. Claus Ibsen (2,562 commits; 628,919 lines changed)
  4. Jean-Baptiste Onofré (2,498 commits; 81,563 lines changed)
  5. Mark Thomas (2,452 commits; 331,234 lines changed)

Top 5 Apache Project Repositories by Commits
  1. Camel
  2. HBase
  3. Flink
  4. Beam
  5. Hadoop

Top 5 Apache Project Repositories by Size (Lines of Code)
  1. NetBeans (8,354,466)
  2. OpenOffice (7,828,646)
  3. Flex (whiteboard: 5,233,277)
  4. Mynewt (core: 4,108,323)
  5. Flex (SDK: 3,933,522)

Mailing Lists —https://lists.apache.org/
  • Total number of mailing lists - 1,399
  • 19,385 authors sent 2,116,421 emails on 1,034,478 topics

Top 5 most active Apache user@ mailing lists
  1. Flink
  2. Lucene-Solr
  3. Ignite
  4. Kafka
  5. Tomcat

Top 5 most active Apache dev@ mailing lists
  1. Beam
  2. Flink
  3. Tomcat
  4. Royale
  5. NetBeans

Contributor License Agreements and Software Grants —https://www.apache.org/licenses/

We welcomed an average of 187 new code contributors and 1,670 new people filing issues each month during 2019. Individuals who are granted write access to the Apache repositories must submit an Individual Contributor License Agreement (ICLA). Corporations that have assigned employees to work on Apache projects as part of an employment agreement may sign a Corporate CLA (CCLA) for contributing intellectual property via the corporation. Individuals or corporations donating a body of existing software or documentation to one of the Apache projects need to execute a formal Software Grant Agreement (SGA) with the ASF. 
  • ICLAs - 759
  • CCLAs - 34
  • Grants - 40

Sponsorship and Individual Support —http://apache.org/foundation/contributing.html

The generous support of hundreds of individual donors and Sponsors helps offset the ASF's day-to-day operating expenses that include Infrastructure, Accounting, Legal, Fundraising, Marketing & Publicity, and other services.

ASF Sponsors provide financial backing for the ASF's operations.

  • Platinum: Amazon Web Services, Cloudera, Comcast, Facebook, Google, Leaseweb, Microsoft, Pineapple Fund, Tencent, and Verizon Media.
  • Gold: Anonymous, ARM, Bloomberg, Handshake, Huawei, IBM, Indeed, Union Investment, and Workday.
  • Silver: Aetna, Alibaba Cloud Computing, Baidu, Budget Direct, Capital One, CarGurus, Cerner, Inspur, ODPi, Private Internet Access, Red Hat, and Target.
  • Bronze: Airport Rentals, Bestecasinobonussen.nl, The Blog Starter, Bookmakers, Cash Store, Casino2k, Cloudsoft, The Economic Secretariat, Emerio, Footprints Recruiting, Gundry MD, HostChecka.com, HostingAdvice.com, Journal Review, LeoVegas Indian Online Casino, Host Advice, Mutuo Kredit AG, Online Holland Casino, ProPrivacy, PureVPN, RX-M, SCAMS.info, Site Builder Report, Start a Blog by Ryan Robinson, Talend, The Best VPN, Top10VPN, Twitter, and Web Hosting Secret Revealed.

ASF Targeted Sponsors provide the Foundation with non-financial contributions for specific activities or programs.

  • Targeted Platinum: CloudBees, DLA Piper, JetBrains, Microsoft, OSU Open Source Labs, Sonatype, and Verizon Media.
  • Targeted Gold: Atlassian, The CryptoFund, Datadog, PhoenixNAP, and Quenda.
  • Targeted Silver: Amazon Web Services, HotWax Systems, and Rackspace.
  • Targeted Bronze: Bintray, Education Networks of America, Google, Hopsie, No-IP, PagerDuty, Peregrine Computer Consultants Corporation, Sonic.net, SURFnet, and Virtru.


Collectively, our Members, Committers, contributors, users, supporters, and sponsors further our mission of providing Open Source software for the public good. Learn more about The Apache Software Foundation's activities in the FY2019 Annual Report https://s.apache.org/FY2019AnnualReport

Help keep Apache software accessible to everyone: to sponsor or make a contribution* to the ASF, please visit http://apache.org/foundation/contributing.html

Here's to a brilliant 2020!

* The ASF is a US 501(c)(3) not-for-profit charitable organization, whose tax identification number is 47-0825376. The ASF is recognized by Charity Navigator and cited with the Gold Seal of Transparency by GuideStar.

# # #

Monday June 24, 2019

Statement by The Apache Software Foundation Board of Directors

It is with a mix of sadness and appreciation that the ASF Board accepted the resignations of Board Member Jim Jagielski, Chairman Phil Steitz, and Executive Vice President Ross Gardler last month.

As an ASF co-founder, Jim has held every officer position since the Foundation’s incorporation, with the exception of a one-year break in 2018. He has played a substantial role in the development and success of the organization and is a recognized advocate of Open Source at the developer and corporate levels.

An ASF Member since 2005, Phil was instrumental in the adoption, growth, and ubiquity of Apache Java projects across many industries, most visibly financial services. He served as Vice President Apache Commons for four years, and as ASF Chairman August 2017 - May 2019.

Ross has been championing The Apache Way to governments, corporations, and educational institutions for nearly two decades. Since becoming an ASF Member in 2005, he served as Vice President of Community Development (2009-2012), ASF Director and President (2015-2016), and ASF Executive Vice President October 2016 - May 2019.

We laud their contributions to many of the ASF's achievements over the past two decades [1]. Their motivation, vision, and passion are truly inspiring. Whilst we will greatly miss their day-to-day leadership at the executive level, we are heartened that the Foundation will continue to benefit through their participation as ASF Members.

We look forward to the next chapter of the ASF as we continue to support the Foundation and hundreds of Apache projects and their communities who advance our mission of providing software for the public good at 100% no cost.

We are committed to ensuring the Foundation remains effective and stable. It's a unique opportunity in the ASF's history to build upon the accomplishments of past Boards, apply new methodologies, and work through diverse perspectives with the aim of helping the ASF continue its successful trajectory.

We appreciate your trust and are happy to discuss our progress at our upcoming ApacheCons in Las Vegas and Berlin [2].

[1] https://s.apache.org/ASF20thAnniversary
[2] https://www.apachecon.com/

# # #

Thursday March 28, 2019

Announcing New ASF Board of Directors

At The Apache Software Foundation (ASF) Members' Meeting held this week, the following individuals were elected to the ASF Board of Directors:

 - Rich Bowen
 - Shane Curcuru
 - Jim Jagielski
 - Myrle Krantz
 - Daniel Ruggeri
 - Craig Russell
 - Roman Shaposhnik
 - Phil Steitz
 - Joan Touzet

Bertrand Delacretaz, Isabel Drost-Fromm, Ted Dunning, Brett Porter, and Mark Thomas chose not to stand for re-election this year. The Foundation thanks them for their service, and welcomes our new and returning directors.

An overview of the ASF's governance, along with the complete list of ASF Board of Directors, Executive Officers, and Project/Committee Vice Presidents, can be found at http://apache.org/foundation/

For more information on the Foundation's operations and structure, see http://apache.org/foundation/how-it-works.html#structure

# # #

Monday March 18, 2019

Apache Software Foundation Platinum Sponsor Profile: Leaseweb

with Robert van der Meulen, Global Product Strategy Lead at Leaseweb


Robert is Global Product Strategy Lead at Leaseweb. Fascinated by technology, Robert studied computer science, and after his studies, he delved into the then relatively young and rapidly developing internet technology. He soon understood that the internet would be at the center of almost everything we do and wanted to be part of it. Robert is passionate about using technology to improve people's lives. He contributed to the Debian project as a developer, later introduced Apache CloudStack at Leaseweb, and has been active in the open source community for quite some time. During his 9 years at Leaseweb, he worked hard to make sure digital transformation, from how we communicate to how we do business, is part of the company mission. Follow @Leaseweb on Twitter.


"Many Apache projects are being built by – mostly – volunteers and motivated individuals, and the world can use, change and develop all of those. It's important to support the people that make this possible."


How did Leaseweb's work with Open Source begin?

There's a long history of open source within Leaseweb. When you do large-scale hosting, open source operating systems, tools, and applications are always pretty much part of your product – and working on open source projects brings mutual benefits. This can mean running a mirror for your customers and "the outside world", fixing and reporting bugs, or helping with actual features or changes to "products" you use. As our services portfolio grew, we started using and contributing to more open source projects, especially when Cloud was becoming a bigger part of the portfolio – bringing the need for more middleware and (platform) management.

Why Apache? How is Leaseweb involved with the ASF, and for how long?

If you count the Apache HTTP Server predating the ASF, we've probably been using ASF software in one way or another for as long as we have existed – which is incidentally pretty close to the age of the ASF (we celebrated our 20th anniversary about 2 years ago). Our contributions and use of ASF projects grew significantly when Apache CloudStack became part of the portfolio. CloudStack being open source and managed by the ASF after cloud.com and Citrix adventures gave us a nudge to start using it more. I'd say CloudStack is a significant part of our Cloud portfolio right now – we run large deployments all over the world, often supporting critical customer applications.

Why is support for foundations such as the ASF important? How does helping the ASF help Leaseweb?

Support for foundations such as the ASF is important because those foundations are important :-) . Any big open source project at some point needs the infrastructure to continue to run – and it's great if a project can rely on an organization like the ASF for that infrastructure so the focus can be on making the project great. Open source projects can grow and be more successful if they can more easily deal with governance, financials and administration, as well as tangible infrastructure and tools. Helping an organization like the ASF helps the ASF projects all over, which has an impact on the software we use as part of our products.

What sets the ASF apart from other software foundations or consortia?

The simple distinguishing factor is the size of the ASF. There's a vast number of projects which are part of the ASF, and most of those are significant in the free software "portfolio". Many of those components and projects are used in a less visible way, being part of the hard-core infrastructure of the Internet – but without them, many tools and devices we use would not be able to function. What I like about the ASF, is that it embodies the open source spirit by valuing consensus and being an enabler for the projects that are part of the Foundation. 

What does "The Apache Way" mean to Leaseweb? What makes The Apache Way special?

There's overlap between The Apache Way and the values we try to stick to in Leaseweb. 

Important ones are Merit, Open and Consensus – in part due to our backgrounds. We value facts, deeply dislike assumptions, and like to make decisions based on proper motivation and data. Growth makes people happy and gives great results, so we like people to do more of what they're good at – and to get even better at it. This also means being open to new opinions and ideas – executing on them because they make sense, not because of where they come from. I think those values are recognizable in many open source communities as well as in The Apache Way.

Do you have any guidelines for promoting innovation? There is no limit with Open Source: how do you stay focused?

Open source projects are often tools for us or part of a product offering or service – which means we'd look at what tools fit best to get to where we want to be. Innovation is more a natural part of that, as we need it to continue to offer the right services to our markets. It's pretty much impossible to do that without constantly innovating and adding new features and products. Focus is a more difficult one; a large part of that focus is driven by our target markets and customers, so listening to that market and those customers to figure out what they want is super important. That information, along with keeping a close eye on market trends, gives us direction and focus from a product and services perspective. From a more technical angle, great engineers are always looking for ways to make their systems and services better :-) . Lots of input (and obviously the execution!) comes from the product teams – either by constant optimization and small changes, or by bigger business cases or ideas that can be executed. 

How does Apache fit into Leaseweb's long-term strategy/plans?

A number of our leading Cloud products are based on Apache software. We use Apache CloudStack for various private cloud and VPS offerings, and those platforms are continually growing and evolving – and we keep adding more with most of the new locations we open. Along with the CloudStack platforms, hosting environments obviously have many deployments using Apache web servers. Within our technical teams we consume lots of different Apache projects and actively contribute to a number of them (we have a dedicated CloudStack team that includes one of the Apache CloudStack PMC members). Every software solution has its limits, and obviously this goes for CloudStack too – but we're also happy we can change or help change the things that could be better.

Money is just one way to support the ASF. How else do you contribute? What recommendations do you have for others to participate?

An important part of the ASF support is the platform we provide; it's obvious that The Apache Software Foundation would run their infrastructure on Apache CloudStack, but I'm happy the Leaseweb team and infrastructure is delivering and maintaining that particular CloudStack setup. There's a sense of pride; the people building the software are trusting our services to do part of that on. Next to that, it proves to me we provide a set of services and products we can be proud of and one that fits the requirements of a bunch of pretty hardcore Cloud developers.

How does it feel to be able to offer this level of support?

It feels great! It's important to, if you have the opportunity, give something back. Many Apache projects are being built by – mostly – volunteers and motivated individuals, and the world can use, change and develop all of those. It's important to support the people that make this possible.

Are there any other thoughts on the experience of being a large-scale donor that you would like to share? What else do we need to know?

Not much. I personally really enjoy seeing what happens with the support we provide – what projects it makes possible, what things it makes easier or better. Tangible insight into the results is a big motivator as well as a proof point.

# # #

Sponsors of The Apache Software Foundation such as Leaseweb enable the all-volunteer ASF to ensure its 300+ community-driven software products remain available to billions of users around the world at no cost, and to incubate the next generation of Open Source innovations. For more information on sponsorship and on ways to support the ASF, visit http://apache.org/foundation/contributing.html .

Tuesday February 05, 2019

Success at Apache: For Love or Money: Volunteer vs. Professional Open Source

EDITOR'S NOTE: "Success at Apache" reflects diverse, personal experiences from our community members, with particular focus on the people and processes behind why the ASF "just works". The post below is the result of a discussion with the author that originated in early September 2018 and remained unpublished as its tone deviates from the general tenor of this series. Over the past few months, this topic has increased in visibility and relevance with the greater community, and we have made an exception in publishing due to its timeliness and representation of the evolution of Open Source communities, both within and without Apache.

by Rich Bowen

A few weeks ago, a colleague asked me what I believed to be the biggest threat facing Open Source today. I answered that I think it's full-time Open Source developers, and the effect they have on part-time volunteer developers.

Long ago (it actually hasn't been very long, it just seems that way sometimes) Open Source was developed primarily by part-time hobbyist developers, working on their evenings and weekends on things that they were passionate about. Sure, there were full-time developers, but they were in the minority. Those of us working a few hours on the weekends envied them greatly. We wished that we, too, could get paid to do the thing that we love.

Now, 20 years on, the overwhelming majority of Open Source development is done by full-timers, working 9-5 on Open Source software. And those who are working nights and weekends are often made to feel that they are less important than those that are putting in the long hours.

Most of the time, this is unintentional. The full-timers are not intentionally marginalizing the part-timers. It just happens as a result of the time that they're able to put into it.

Imagine, if you will, that you're an "evenings-and-weekends" contributor to a project. You have an idea to add a new feature, and you propose it on the mailing list, as per your project culture. And you start working on it, a couple of hours on Friday evening, and then a few more hours on Saturday morning before you have to mow the lawn and take your kids to gymnastics practice. Then there's the cross country meet, and next thing you know, it's Monday morning, and you're back at work.

All week you think about what you're going to do next weekend.

But, Friday evening comes, and you `git pull`, and, lo and behold, one of the full-timers has taken your starting point, turned it in a new direction, completed the feature, and there's been a new release of the project. All while you were punching the clock on your unrelated job.

This is great for the product, of course. It moves faster. Users get features faster. Releases come out faster.

But, meanwhile, you have been told pretty clearly that your contribution wasn't good enough. Or, at the very least, that it wasn't fast enough.

The Cost of Professionalism

And of course there are lots of other benefits, too. Open Source code, as a whole, is probably better than it used to be, because people have more time to focus. The features are more driven by actual use cases, since there's all sorts of customer feedback that goes into the road map. But the volunteerism that made Open Source work in the first place is getting slowly squelched.

This is happening daily across the Open Source world, and MOST of it is unintentional. People are just doing their jobs, after all.

We are also starting to see places where projects are actively shunning the part-timers, because they are not pulling their weight. Indeed, in recent weeks I've been told this explicitly by a prominent developer on a project that I follow. He feels that the part-timers are stealing his glory, because they have their names on the list of contributors, but they aren't keeping up with the volume of his contributions.

But, whether or not it is intentional, I worry about what this will do to the culture of open source as a whole. I do not in any way begrudge the full-timers their jobs. It's what I dreamed of for *years* when I was an evenings-and-weekends open source developer, and it's what I have now. I am *thrilled* to be paid to work full time in the Open Source world. But I worry that most new Open Source projects are completely corporate driven, and have none of the passion, the scratch-your-own-itch, and the personal drive with which Open Source began.

While most of the professional Open Source developers I have worked with in my years at Red Hat have been passionate and personally invested in the projects that they work on, there's a certain percentage of them for whom this is just a job. If they were reassigned to some other project tomorrow, they'd switch over with no remorse. And I see this more and more as the years go by. Open Source as a passion project is giving way to developers who work on whatever their manager has assigned and have no personal investment whatsoever.

This doesn't in any way mean that they are bad people. Work is honorable.

I just worry about what effect this will have, and what Open Source will look like 20 years from now.


Rich Bowen has been doing open source-y stuff since about 1995, and has been a member of the Apache Software Foundation since 2002. He currently serves on the ASF Board of Directors. By day, he's the CentOS Community Manager, working for Red Hat. Read Rich's earlier post, "Success at Apache: Wearing Small Hats".

# # #

"Success at Apache" is a monthly blog series that focuses on the people and processes behind why the ASF "just works". https://blogs.apache.org/foundation/category/SuccessAtApache

Monday January 28, 2019

The Apache Software Foundation Welcomes Amazon Web Services (AWS) as its Newest Sponsor

The Apache Software Foundation (ASF) today welcomed Amazon Web Services (AWS), the latest company to sponsor the ASF at the Platinum level.

"We are pleased to have AWS as a Platinum Sponsor," said ASF Chairman Phil Steitz. "In addition to helping support the ASF’s day-to-day operations through a Platinum sponsorship, AWS has been a Silver-level Targeted Sponsor providing the ASF Infrastructure team with AWS Cloud credits over the past two years. More than 350 Apache projects and their communities directly benefit from the generosity of ASF Sponsors."

AWS joins the following Sponsors:

Platinum level -- Cloudera, Comcast, Facebook, Google, LeaseWeb, Microsoft, Oath, Pineapple Fund, and Tencent;

Gold level -- Anonymous, ARM, Bloomberg, Handshake, Hortonworks, Huawei, IBM, Indeed, Pivotal, and Union Investment;

Silver level -- Aetna, Alibaba Cloud Computing, Baidu, Budget Direct, Capital One, Cerner, Inspur, ODPi, Private Internet Access, Red Hat, and Target;

Bronze level -- Airport Rentals, The Blog Starter, Bookmakers, Cash Store, Casino Bonus, Casino2k, Cloudsoft, Emerio, Footprints Recruiting, HostChecka.com, HostingAdvice.com, HostPapa Web Hosting, The Linux Foundation, Mobile Slots, Mutuo Kredit, Online Holland Casino, RX-M, SCAMS.info, Site Builder Report, Talend, The Best VPN, Twitter, and Web Hosting Secret Revealed.

For more information on becoming a Sponsor of the ASF, please see http://apache.org/foundation/sponsorship.html

# # #

Tuesday January 15, 2019

Apache Software Foundation Gold Sponsor Profile: Bloomberg

with Kevin Fleming, Head of Open Source Community Engagement and Member of the CTO Office at Bloomberg


Kevin has spent more than 20 years in the technology industry. In 2004, he started a VoIP service provider company and chose Asterisk as his platform. Nine months later, he was offered a position at Digium to work on Asterisk full-time. After seven years of developing and managing the Asterisk project, and helping to design and build the Asterisk SCF project, he moved on to Bloomberg LP, where he works with various teams to help produce and support Bloomberg's open software, used by its customers and partners to integrate with the Bloomberg Terminal. Follow @realkpfleming and @TechAtBloomberg on Twitter.


“ASF's very explicit statement that every participant in a project is participating personally is really a big differentiating factor to the other models.”


How did Bloomberg's work with Open Source begin?

Bloomberg has been a consumer of Open Source software for decades, but our involvement as a community collaborator and contributor began in earnest about seven years ago. That was a result of a company-wide decision to begin using Open Source tools in the development of our applications and infrastructure when possible, instead of commercial or proprietary tools.

Open Source tools are important to us because, when a tool’s source code is available, you have the flexibility to understand how it works, to modify it, and to support it by yourself, and you don’t have to rely on a vendor who might go away or whose priorities might change. That's not to say that there aren’t vendors who support their Open Source tools, because there are, but we're not locked into a channel model. Open Source tools give us control over our own destiny. If something becomes important enough to us, we can form an internal team to provide support and enhancements for that tool and the team can contribute those enhancements back to the Open Source community, if appropriate.

Why Apache? How is Bloomberg involved with the ASF, and for how long?

Bloomberg has been involved with the ASF for almost all of the seven years that we've been an active participant in the Open Source community. Apache is the home of dozens of Open Source projects that are fundamentally important to us. These tools support our data science workflows, data processing workflows, and web services, as well as other internal and external services that the company operates.

It's important to us that the organization providing a project's home is able to effectively support the project, and we want developers that work on projects in the Apache organization to focus on actually developing software, without having to spend time managing infrastructure (like bug trackers, source code repositories, and related tools). Contributors to Apache projects can focus on their own project because of the support from the foundation that these projects receive.

Why is support for foundations such as the ASF important? How does helping the ASF help Bloomberg?

Supporting organizations like the ASF is important to Bloomberg because of its governance model. Every participant in one of these Open Source projects has equal representation and footing, and developers are valued based on the merits of their contribution. Projects operating under a different governance model might not offer the same type of participation for developers unless their company has made a significant financial contribution to the organization. Bloomberg participates in those types of organizations as well, but we strongly prefer those that allow everyone to participate. Not only can input come from a broader community, but this also allows contributors with varied levels of experience to participate in projects.

Contributors on these projects aren't only developers; they also include people with varied skills like documentation, project management, and marketing. Sometimes a project's decision-makers don’t write any code. While useful tools are developed within both governance models, the way an Apache project's roadmap is set, planned, and decided upon is what's important to us.

What sets the ASF apart from other software foundations or consortia?

The one really significant difference from other charitable foundations that we also support is the ASF's governance model. ASF's very explicit statement that every participant in a project is participating personally is really a big differentiating factor to the other models. 

What does 'The Apache Way' mean to Bloomberg? What makes 'The Apache Way' special?

As a result, meetings held to discuss the next major phase of an Apache project's development don't feel like any company has a representative group. ASF's policy states that you're there representing your own personal interests and not anyone else's. Even though I may know where everyone in the room works, it doesn’t matter and is not relevant. That doesn’t mean you can't say 'the company I work for uses this piece of software in a certain way,' and obviously the way you use the software will impact your opinion, but you don’t make suggestions because that’s what your company wants. Your decisions about a project should be influenced by what you think is best for the broader community. A contributor's entire currency is based on their contributions and how the community values their participation.

Do you have any guidelines for promoting innovation? There is no limit with Open Source: how do you stay focused?

We often have to consider different solutions because of changing business requirements, new types of systems to manage, or some other reason. We strongly encourage people to take a few steps back from their daily work of maintaining our systems to think about what the service may need to support in the future. We want to know if we're using the right tool, if we should use something else, or if we should improve an existing tool.

We strongly encourage people to consider that they're not just delivering a service today and that they don't want that service to be irrelevant to clients six months down the road. They need to build applications and provide databases and services that have the right functionality to meet our clients' present needs. But you also have to choose solutions that can grow with you and that you can grow, and you have to be able to take technology in the direction you need it to go. You can do this more easily with Open Source than you can with a commercial, off-the-shelf solution.

How does Apache fit into Bloomberg's long-term strategy/plans?

It's strategically important to us to ensure that the ASF continues to be a place where new, interesting, and exciting Open Source projects want to go to as their home. As these projects evolve beyond just a few people putting code up on GitHub, they can receive the benefits and support that ASF has to offer – this is a good thing for Bloomberg and the greater Open Source community.

Money is just one way to support the ASF. How else do you contribute? What recommendations do you have for others to participate?

We participate in helping the ASF decide how best to support organizations like us, and we look for ways to further help the ASF through regular conversations with them. We help spread the word about how Apache projects are run and why that's valuable and important – marketing and evangelizing is another way we contribute to the overall health of the community and the projects to which our developers contribute.

There are a few different models for how people can contribute to Apache projects. We have some employees who contribute outside of work, in their personal time. On the other end of the spectrum, we have staff that spend most of their day working on Open Source projects, including reviewing patches that aren't contributed by Bloomberg employees. These Open Source projects are important to Bloomberg, and this work ensures that the projects move forward. Of course we also have team members who contribute to projects as the need arises, which may mean that they contribute to a dozen (or more) projects in any given year.

We also host collaborative weekend sprints that are focused on a specific Open Source project or group of projects. Between 50 and 100 people attend these events with project leaders so they can learn how to become contributors to that specific project, and we provide the support system so they can participate. We ask people to break up into teams so they can, while working as part of a small group, tackle open items on the project's 'to do' list. The groups dive in and try to figure out how to solve each problem, and experts are available to help when people get stuck or to merge participants' PRs on the spot.

For some of those attending, this may be their initial contribution experience, and it's important that this experience is very collaborative, friendly, and productive. The goal of these events is to train people to be Open Source contributors. While the sprint might not be for an Apache project, some participants end up contributing to Apache projects in the future. Once someone becomes a contributor, they become more comfortable with the process and know what to expect, and they're able to translate that experience to subsequent projects.

How does it feel to be able to offer this level of support?

We're pleased to be able to participate and encourage more companies to sponsor the Apache Software Foundation. Everyone wins when everyone collaborates.

# # #

Sponsors of The Apache Software Foundation such as Bloomberg enable the all-volunteer ASF to ensure its 300+ community-driven software products remain available to billions of users around the world at no cost, and to incubate the next generation of Open Source innovations. For more information on sponsorship and on ways to support the ASF, visit http://apache.org/foundation/contributing.html

Monday January 07, 2019

Success at Apache: Accidentally Finding Awesome

by Daniel Ruggeri

My involvement with the ASF started very simply. 

I was a server administrator who wanted to have a bug or two fixed and a feature or two added to the Apache HTTP Server (httpd). I spent some time learning the codebase a bit and submitted my first patch to the httpd dev list. I wasn't expecting much of a response, as my primary motivation was to avoid having to maintain the fixes locally, but I was willing to do it if I had to. Surprisingly enough, I got a quick response from "those developer folks" and was thanked for the contribution and given a bit of guidance on when it might hit the stable branch. Wow... cool: that was neat. 

Fast forward a few months and I found a few other bugs and even added a few features myself. Every time, "those folks" (who I certainly wasn't a part of because I just wasn't as brilliant) took the contribution and incorporated it into the code base or gave pointers on how to improve it. This was getting to be pretty cool, but I was convinced "they" were just happy for the free labor.

Around that time, I was hitting some serious career growth. Working with httpd so much at $dayjob, I got to the point where I knew the ins and outs of the proxy quite well. I was also challenged by a mentor to consider giving public talks. After mulling it over, I figured I'd give it a shot and since 98% of $dayjob revolved around the httpd proxy and cool things you could do with it, I submitted a talk to ApacheCon. Again, I was thinking, "Maybe 'those folks' don't have a full schedule and are seeking content."

Amazingly, "those folks" accepted the talk. I gave my first *real* public speech at ApacheCon. I remember it vividly: I was nervous as hell. I mean... could you believe it? The PRESIDENT/one of the founders of the ASF was in the audience. The guy that WROTE THE MANUAL AND BOOK about httpd was also sitting right there. Oh, by the way, the guy that WROTE THE BOOK about modules was across the aisle a few rows back. Not to mention the fact that I saw several name tags I recognized as 'heavy hitters' in the community. I couldn't believe it, but they actually took the time out of their day to hear what I had to say.

While I think the presentation was probably "OK" at best, I was still welcomed and even got to chat with "those folks" about the future of the project and where it might be taken. I also got to chat with folks from several other projects. I heard about this "Hadoop" thing from a newfound friend (didn't make much sense to me) and enjoyed some awesome meals with folks from other projects (I didn't even come close to understanding what they did, but wrote down the names for later research). I connected with a fellow server admin who had some cool ideas for httpd and spent some time brainstorming how to make our $dayjobs better.

I learned so many things in that first year at ApacheCon. 

Not too long afterwards, I was invited to be a committer on the httpd project. This floored me because, for the first time, I realized that this wasn't about sucking up free patches from the outside world. I started to see that the project was interested in me being part of it.

So, off I went. I continued submitting ever-more-interesting patches here and there, going to ApacheCons and giving ever-more-refined versions of my proxy talk and spending ever-more-time with other folks in the community. I did this really cool "barcamp" thing where we talked about whatever we wanted and got to expand our minds. 

There was also something about making the terrible (wonderful?) mistake of buying the first round the last night of the ‘con once, too: shenanigans all around. 

After a while, I didn't feel like just an outsider trying to run a better server at $dayjob. I felt like I was part of this bigger thing that was going on. I was hanging out with people I genuinely consider friends as opposed to "those folks". I was loving it and I wanted to share it. So I did the unthinkable... in a moment of boldness or temporary insanity enhanced by a faulty governor that avoids embarrassment, I gave a completely ad-hoc lightning talk titled "I love this community." And at that point... the cat was out of the bag and I was effectively 'all in'.

The really cool thing is that I wasn't actively trying to join the community. Heck, I didn't even realize a community was there. Instead the community was actually pulling me toward it. I had no idea of the *depth* and *richness* of where things would go. I started as an outsider with zero expectation other than making my life a bit easier at work and stumbled upon one of the most rewarding things I've found outside of blood family.

... THIS is why I love this community and THIS is why I want to serve it. So, as it follows, this is where the pride comes from. With a family this welcoming, it makes it easy to become more involved. So… the only thought to leave you with is “How are YOU going to get more involved?”

Daniel Ruggeri is an Open Source evangelist and lover of tech. At work, he is responsible for setting the direction of the Web and Cloud space for Mastercard and he spends his time playing with infrastructure and the code that powers it both inside the firewall and outside. He is a member of The ASF and has contributed code to Open Source projects from simple pet projects to widely utilized servers. As a lover of Open Source, he even teaches courses about Open Source Software Development (and will share the curriculum with you!).

# # #

"Success at Apache" is a monthly blog series that focuses on the people and processes behind why the ASF "just works". https://blogs.apache.org/foundation/category/SuccessAtApache

Tuesday January 01, 2019

Apache in 2018 - By The Digits

It's been a great year for the Apache community at-large. With nearly 200M lines of code under the ASF's stewardship, our ongoing success is the result of community-led development "The Apache Way", executed through the collaborative efforts of more than 300 Apache projects and their communities. Highlights include:

Apache Projects https://projects.apache.org/
  • Total number of projects + sub-projects - 328 (not including Apache Labs initiatives)
  • Top-Level Projects - 198
  • Podlings in the Apache Incubator - 51
  • Other groups, including operations/support - 62

Community/People —http://home.apache.org/
  • Apache Committers - 7,032 (6,693 active)
  • ASF Members (individuals) - 730
  • New Members elected - 44

Apache Projects/Code —https://projects.apache.org/statistics.html

3,208 Apache Committers changed 78,493,228 lines of code over 201,220 commits. We also welcomed 4,638 new code contributors and 15,861 new issue/pull request contributors.

Top 5 Apache Code Committers

  1. Andrea Cosentino (2,508 commits; 237,224 lines changed)
  2. Jean-Baptiste Onofré (2,098 commits; 1,208,851 lines changed)
  3. Duo Zhang (1,956 commits; 809,085 lines changed)
  4. Mark Thomas (1,823 commits; 179,883 lines changed)
  5. Tilman Hausherr (1,736 commits; 81,940 lines changed)

Top 5 Apache Project Repositories by Commits
  1. Hadoop
  2. HBase
  3. Beam
  4. Camel
  5. Flink
Top 5 Apache Project Repositories by Size (Lines of Code)
  1. OpenOffice (7,822,699)
  2. NetBeans (7,741,506)
  3. Flex (whiteboard: 5,233,722; SDK: 3,933,522)
  4. Mynewt (documentation: 4,381,072)
  5. Hadoop (3,881,797)

"If it didn't happen on-list, it didn't happen." https://lists.apache.org/
  • Total number of mailing lists - 1,131
  • 19,435 authors sent 1,497,005 emails on 505,793 topics
Top 5 most active Apache user@ mailing lists
  1. Flink
  2. Lucene
  3. Ignite
  4. Cassandra
  5. Kafka
Top 5 most active Apache dev@ mailing lists
  1. Beam
  2. Ignite
  3. Kafka
  4. Tomcat
  5. James

Contributor License Agreements and Software Grants —https://www.apache.org/licenses/

We welcomed an average of 387 new code contributors and 1,250 new people filing issues each month. Individuals who are granted write access to the Apache repositories must submit an Individual Contributor License Agreement (ICLA). Corporations that have assigned employees to work on Apache projects as part of an employment agreement may sign a Corporate CLA (CCLA) for contributing intellectual property via the corporation. Individuals or corporations donating a body of existing software or documentation to one of the Apache projects need to execute a formal Software Grant Agreement (SGA) with the ASF. 
  • ICLAs signed - 831
  • CCLAs signed - 35
  • Software Grants submitted - 25

Sponsorship and Individual Support —http://apache.org/foundation/contributing.html

Thank you to our hundreds of individual donors and Sponsors whose generous support helps offset the ASF's day-to-day operating expenses that include Infrastructure, Accounting, Fundraising, Marketing & Publicity, and more.
  • Platinum: Cloudera, Comcast, Facebook, Google, LeaseWeb, Microsoft, Oath, Pineapple Fund, and Tencent Cloud.

  • Gold: Anonymous, ARM, Bloomberg, Handshake, Hortonworks, Huawei, IBM, Indeed, Pivotal, and Union Investment.

  • Silver: Aetna, Alibaba Cloud Computing, Baidu, Budget Direct, Capital One, Cerner, Inspur, ODPi, Private Internet Access, Red Hat, and Target.

  • Bronze: Airport Rentals, The Blog Starter, Bookmakers, Cash Store, Casino Bonus, Casino2k, Cloudsoft, Emerio, Footprints Recruiting, HostChecka.com, HostingAdvice.com, HostPapa Web Hosting, The Linux Foundation, Mobile Slots, Mutuo Kredit AG, Online Holland Casino, RX-M, SCAMS.info, Site Builder Report, Talend, The Best VPN, Twitter, and Web Hosting Secret Revealed.

ASF Targeted Sponsors provide the Foundation with contributions for specific activities or programs.
  • Targeted Platinum: DLA Piper, Microsoft, Oath, OSU Open Source Labs, and Sonatype.

  • Targeted Gold: Atlassian, The CryptoFund, Datadog, PhoenixNAP, and Quenda.

  • Targeted Silver: Amazon Web Services, HotWax Systems, and Rackspace.

  • Targeted Bronze: Bintray, Education Networks of America, Google, Hopsie, No-IP, PagerDuty, Peregrine Computer Consultants Corporation, Sonic.net, SURFnet, and Virtru.


Together, our Members, Committers, contributors, users, supporters, and sponsors continue to build on our mission of providing Open Source software for the public good and are helping keep Apache software accessible to everyone.

Wishing you the best in 2019!

# # #
