Tuesday April 10, 2018

Success at Apache: Am I there yet? A n00b's perspective

by Charles Givre

Let me start out by saying that I am not a developer. I do have a technical background, but I hadn't coded in Java for at least 10 years before I got involved in the Apache Drill project. One has to wonder how, as a non-developer, I ended up as a committer for the Drill project. In this blog post, I'd like to share with you how I came to be involved with the Drill project.

But first, why Drill?

I first heard about Drill at an industry conference several years ago. I was speaking with Dr. Ellen Friedman about some data issues we were having, and she casually asked whether I had tried Drill. I had not heard of it at that point, so I did some research, and it seemed as if Drill could solve a lot of problems that my clients were having. But then I tried using it and kept getting stuck.

If you aren't familiar with Apache Drill, Drill is an SQL engine which allows you to query any kind of self-describing data. After experimenting with Drill for a while, I was impressed enough to think that the tool had major potential in security. One of the biggest problems that Drill solves is the need to Extract, Transform, Load (ETL) data into an analytic tool before actually doing analysis of that data. This ETL process adds no value to anything really, and costs large enterprises literally millions of dollars as well as adding unnecessary delays between the time data is ingested and when the data is actually available for analysis. In security applications, this delay directly translates into risk. The longer it takes to make your data available, the more time it will take to potentially find malicious activity and hence, more risk. Therefore, if you're able to query the data without having to do any kind of ETL or ingestion, you are lowering your risk as well as potentially saving millions of dollars.

Getting Involved

Unfortunately, when I started using Drill, I saw this potential, but I couldn't get it to work. My next step was to try to get assistance at my company. I pitched the ideas to my company leadership, but it proved very difficult to get the company to pull Java developers from revenue-generating projects to work on this "pie-in-the-sky", unproven project. After spending several months on this, I got really frustrated and decided that I was going to try to do it myself. However, I really had no idea what I was doing. I hadn't coded in Java for at least 10 years at the time, and had zero experience with modern Java development tools such as Maven and Git. What I did have was persistence, so I started asking for help and decided that I was going to dive right in and start adding the functionality that I felt Drill needed to be useful in security applications. I started working on something that someone else had started: the HTTPD format plugin for Drill. Most of the coding was done, but there was still enough left for me to get my hands dirty and start figuring things out.

What I learned

I still would not consider myself a developer, but after getting that particular item committed to the codebase, I learned a lot about how open source projects actually work, as well as about writing production-quality code. Since then, I've tried to add at least one bit of new functionality to each Drill release. I would encourage anyone who is interested in contributing to an Open Source project at the Apache Software Foundation to dive right in and start. There are still a lot of ideas I have for Drill, and I hope to have the time to see them through to implementation.

In conclusion, I'm fairly certain that my involvement with Drill and the Apache Software Foundation is really just beginning. I'm currently working on the O'Reilly book about Apache Drill with a fellow Drill committer. It is my hope that the book will spark additional interest in Apache Drill. Open Source software is at the heart of the ongoing data revolution which is dramatically expanding what is possible with data. I firmly believe that Apache Drill will have a role to play in this data revolution and I'm honored to have the opportunity to play a small role in developing Drill.

Charles Givre CISSP is a Lead Data Scientist at Deutsche Bank where he works in the Chief Information Security Office (CISO). Mr. Givre is an active data science instructor and regularly teaches classes about data science and security at various industry conferences, such as BlackHat. Mr. Givre is a committer for the Apache Drill project and together with Mr. Paul Rogers, is working on the forthcoming O’Reilly book about Apache Drill. He can be reached at cgivre(at)apache(dot)org.  

= = =

"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works" https://blogs.apache.org/foundation/category/SuccessAtApache

# # #

Monday March 05, 2018

Success at Apache: Open Innovation from a Non-native English Country

by Von Gosling

When I saw the "Success at Apache" series, I thought about writing something about my Open Source experience over the past few years as someone from a non-native English country. Last year, RocketMQ graduated from the Apache Incubator and became one of the Apache Top-Level Projects. As one of the original co-founders of RocketMQ, I was proud to see the Apache RocketMQ Open Source community grow ever more diverse. The Apache Software Foundation (ASF) is one of the most famous and respected technology brands, and thousands of companies base their software infrastructure on its projects. This is borne out by the worldwide download mirror activity in ASF statistics. As an early implementer and pioneer of Open Source in China, I counted Apache HTTP Server, Apache Tomcat, Apache Struts 1.x, and Apache Maven among my favorite software stacks when I was building distributed, high-performance websites.

Last year, I wrote an article about the road to the Apache TLP, which was published in China's InfoQ. Some people asked me how to be more 'Apache' and how to build a more diverse community. These are questions that concern many people. In this blog post, I will address how to be more collaborative around the world, especially in non-native English countries.

Open Communication

With more and more instant messaging apps appearing in the Android and iOS world, the younger generation prefers to communicate this way, and the habit has spread to the daily coding life of most people. But instant messaging is not search engine friendly, and in most cases it does not support multiple channels for multiple languages. I have been involved in many such local technology groups. Together we discussed what went wrong, explored ideas about how to solve it, and came up with a good solution. This method worked for all my past projects, but when we hoped to be more involved in Open Source around the world, it did not work well. I remember clearly that when RocketMQ began to discuss the process for its proposal, some people complained about what we would have to do in the local community. We learned much from this discussion in the community and thus found an effective solution. Hence, in the Apache RocketMQ community, we encourage users to ask questions on the user email list. To make the communication process effective, we answer each question in the language in which it was asked. With more and more committers coming from different countries, this solution will help to grow a more diverse community. But, as John Ament said in another "Success at Apache" post https://s.apache.org/x9Be --open communication isn't for everything. We also allow private communication between users and us, as some questions might not be proper to discuss publicly. But that isn't a part of the decision making process. Likewise, any discussion of individuals, whether positive or negative, should be conducted on the private list for a project.

Easy ways to be involved in the community

This is another top concern in the Open Source world. Some people may not know that in China there are many local communities around Apache projects, such as Apache HTTP Server, Apache Tomcat, Apache Spark, and Apache Hadoop. These projects have corresponding Chinese documentation. At the same time, we try our best to improve the English documents, and we consider the message behind every documentation page. If anyone finds wording that a native speaker could polish, whether minor or major, they can leave a message or send feedback to our dev or user email list. Besides documentation, we also hold programming marathons from time to time to get more involved with the community. Through such campaigns we find more interested users, especially those working on cross-domain technology. Recently, we opened up more tasks in the Google Summer of Code. Students will develop Open Source software full-time for three months. We will provide mentoring and project ideas, and in return have the chance to get new code developed and --most importantly-- to identify and bring in new committers. It is another chance for PMC members to learn how to improve, and for more students to get involved in the community easily.

In China, Internet giants like Alibaba are devoting themselves to Open Source projects. Hence, in my personal experience, it made sense to help more excellent Chinese projects come into the Incubator. Right before the Lunar New Year, another famous project from China, Dubbo, started its Apache journey. I am glad to be a local mentor and hope to continue to share what we have learned. Thanks to the ASF, more and more Open Source projects will benefit our daily coding. That has great appeal across the world's Open Source field.

Von Gosling is a senior technology manager working at Alibaba Group. He has extensive industry software development experience, especially in distributed technology, reliable Web architecture and performance tuning. He holds many patents in distributed systems, recommendation and other areas. He has been a frequent speaker at Open Source and architecture conferences worldwide, including ApacheCon and QCon. He has been the lead for messaging at Alibaba as well as the Tenth and Sixteenth CJK OSS Award recipient. He is the original Apache RocketMQ co-founder and Linux OpenMessaging Standard Initiator.

= = =

"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works".

1) Project Independence https://s.apache.org/CE0V
2) All Carrot and No Stick https://s.apache.org/ykoG
3) Asynchronous Decision Making https://s.apache.org/PMvk
4) Rule of the Makers https://s.apache.org/yFgQ
5) JFDI --the unconditional love of contributors https://s.apache.org/4pjM
6) Meritocracy and Me https://s.apache.org/tQQh
7) Learning to Build a Stronger Community https://s.apache.org/x9Be
8) Meritocracy. https://s.apache.org/DiEo
9) Lowering Barriers to Open Innovation https://s.apache.org/dAlg
10) Scratch your own itch. https://s.apache.org/Apah
11) What a Long Strange (and Great) Trip It's Been https://s.apache.org/gVuN
12) A Newbie's Narrative https://s.apache.org/A72H
13) Contributing to Open Source even with a high-pressure job https://s.apache.org/lM9O
14) Open Innovation from a Non-native English Country https://s.apache.org/lh61

# # # 

Monday February 26, 2018

Success at Apache: Contributing to Open Source even with a high-pressure job

by Anthony Shaw

I believe in the mission of the ASF for many reasons, but the first is the reason why I got into open-source software: free and open access to knowledge.

Back when I was age 12 (1998), I started to learn to program in dBase 4. dBase 4 and the compiler Clipper were not cheap, especially on a $5-a-week paper round. The box with the software was unwanted by a local company and it came with the manuals. We didn't have the internet at home yet, so I was left to go by the manual and what I could find from second-hand stores and office cleanout sales. For the next decade, I learnt to code based on what I could find, borrow and scavenge, until 2002 when I got a copy of Linux and assembled a couple of machines from unwanted parts from the village computer store.

This is where I discovered free and open-source software and really started to build on my coding skills.

My goals were to learn and to share what I'd learnt so that others could get to where they needed to go faster. It also helped that software skills were well sought-after in Europe, so it set me off on a career in IT.

20 years after I learnt to code, I've moved out of software engineering and into Learning and Development at Dimension Data, a 29,000-person technology company that operates in 49 countries across the world. My current role involves about 3 months a year of travel (15 countries typically), managing a department of over 30 people spread across 4 countries and 4 timezones, and delivering on large and complex initiatives with high degrees of change and short deadlines.

In 2016, after getting promoted into my current role, I made a choice that I would continue to contribute to the open-source projects I'd worked on for years. But I set myself 3 rules:

1. I would not take away from time with my family
2. I would not interfere with my work commitments
3. I would look after my health

My open-source contributions

For the past 4 years I've made around 1,000-2,000 contributions annually. These have consisted of bug fixes and other submissions to around 50 projects.

The largest contributions I've made have been to Apache Libcloud, a multi-cloud abstraction library written in Python. Initially this was driven by a work commitment to contribute an integration with the cloud API we'd designed, but I soon realised the power of the library. Going back to my original goal of free and open access to knowledge, I'd seen an alarming trend in the computing world. Proprietary APIs were driving what is known in the industry as "stickiness" or to be frank, lock-in.

Cloud lock-in means that anyone without access to a reliable network, without money, or unwilling to sign up to these contracts is being pushed out of advances in technology. I know developers who are students, who live in remote areas such as rural Australia, Asia and Africa, or who simply have little money.

Apache Libcloud's design means that you can design applications which can be deployed to OSS platforms like Apache CloudStack and OpenStack.

After finishing the work driver, I spent around 100 hours developing a container abstraction layer for Apache Libcloud, which meant that developers could write automation for OSS platforms like Kubernetes using the same API as they would with a public cloud provider.

This was all whilst managing family time, work commitments and my health.

These are my 3 tips for maintaining contributions with a high-pressure job:

1. Pick a project that you care about

This is the most important. Something that just sparks your curiosity is good fun, but long-term interest often dwindles. I've been a victim of "ooh shiny thing" many times in the past, but as my career has taken off, I've had to develop the discipline to stop myself from writing my own scripting language, or building an automated sprinkler system from scratch. I stop and remind myself that I might have the time this second, but what about next week and next month? Stop and prioritise.

Prioritise projects that mean something to you.

The 2 OSS projects I commit the most to are Apache Libcloud and SaltStack. I believe in Apache Libcloud's mission of giving open-access to cloud platforms. My SaltStack contributions have been focused around cloud abstraction, networking API abstraction and other fixes and utilities that make it easier for developers and end-users.

The difference between picking something shiny and something you believe in is that long-term you commit more and you find it easier to jump in and help when you can. But how do you find the time?

2. Choosing your tasks wisely and making time

I get asked this question all the time: "how do you find the time?" When I try and convince people to contribute to OSS, the response is always about time.

Get rid of the things that don't add value

If you can afford to, hire help to give you back time in your week. Not only does open-source help with your skills and knowledge, but it increases your value to a potential employer. Hiring someone to blow the leaves, or help with the chores once a week doesn't need to cost a lot, but if you work out how much value you can get back from that time it often makes sense.

Another thing I've been strict about is binge-watching TV series and gaming. Playing 100-hours of the latest game might be fun, but I find developing more rewarding in the medium-to-long term. Find ways to unwind that don't consume so much time, like meditation, exercise, or reading.

But, if you do need to put your feet up and watch some TV for a few hours, don't feel guilty about it. 

Work smart, not hard

When I do sit down to contribute something, I'll have carefully planned and thought through what I'm going to do, what I'm going to test and how I'm going to structure it. I try and complete tasks quickly, with foresight and a goal. Once I've completed this 1 module, with tests, I'll submit my contribution. Don't try and refactor the whole project over a weekend. Keep it simple.

But we all know sometimes the best plans go out the window. If you find yourself going down one of those rabbit holes, where you can't get something to compile or you can't debug one of those zombie bugs we love so much as developers.

Stop yourself.

You can easily sit until 3am banging your head against the wall trying to figure it out. This was my advice when I used to manage development teams. If you get stuck, take a break, ask for help and if that still doesn't work, move onto something else. 

Sometimes I pause working on a task if I can't figure it out. Pause for an hour, a week, or even a whole year. When you have one of those "aha" moments, you go back in and finish the job.

It saves time, it delivers better software and it's a good skill to have as a developer.

Find time

A contribution comes down to 3 things:

1. An idea
2. An understanding
3. A "change", like a fix, feature, test, code-review, documentation etc.

The ideas come to me through reading, listening to users or looking at bug submissions. I do this as and when I have a spare minute. This is normally on my lunch break, when I'm waiting for someone or something. 

The time for understanding I get by listening to podcasts and talking to people at conferences. I get a few hours a week in the car and I spend time doing some chores. During that time I always have headphones on to listen to the newest Python podcast or OSS update.

The time to sit down and write, code, or test comes for me on the plane (where I'm writing this blog post!). Last year I did enough miles in the air to fly around the world 8 times; most of that time was spent coding, relaxing or sleeping. Aside from that, in airport lounges, on the train or while waiting for people, I'll whip out my laptop. On any plane that has Wi-Fi I can push changes; otherwise, the minute we land I'll have a laptop open and running git push.

Weekend-time is off limits unless I'm travelling or I'm alone. That's rule 1 -- do not take away from time with the family.

3. Managing your workload and avoiding burnout

There are 2 components to this: managing your work commitment and managing your contributions. You need to do both to succeed.

It's ok to stop and take a break. There is always a pull-request to merge, a bug to inspect, and an email from an end-user. If you need to take a break for a while, talk to the team, ask for help and be frank. We're all in the same boat; contribution is optional.

So many times I see people contributing who feel like they have a complete obligation to test and fix bugs at 2am and then go to work at 8am. This is normally because they care about the project, they care about quality and they care about their reputation, but sometimes you need to step back.

A strong project community will step up and help. If you know that work is going to be tough for the next few months, tell the team and set yourself a limit. Wind back for a bit until things calm down. 

Managing work commitments is tough, because there are often financial consequences (or at least a perception of them).

After 7 hours, you're not really adding value. I used to have a lounge-chair next to my desk and now I have a hammock as I work from home. After a few hours of solid concentration I'll happily go and sit down and do nothing for an hour. Your brain needs a break. Sure, you'll get the odd "working hard" jab from a passer-by, but I'm working smarter, not harder. Once I'm refreshed I'll finish the next task about 30-40% quicker, to a better level of quality and insight. On the occasions I've done 12-14 hour work days, my brain starts shutting down to conserve energy. Critical thinking is the first thing to switch off, followed by logical thinking; this is where you make mistakes and deliver work that is below the quality you'd normally expect.

I live close to the beach, so my time out is going for a swim in the ocean or spending a bit of time with my family. As a manager I also see a responsibility to make it clear that it's encouraged to step back and recharge. I'll just post in our chat-channel to say that I'll be offline for a couple of hours as I'm going to the beach mid-afternoon. I don't feel guilty about it and I hope they do the same.

Learn how to say no and don't feel guilty about it. When I coach people on this I ask, "who asked you to do this? Was no an option? What value is there in delivering this? What is the consequence of not doing it? Who else could do it?"

Everyone wants to be helpful and indispensable, but your reliability is just as important to your reputation and what you deliver.


Look after your health, be smart with your time and contribute for a cause.

Anthony Shaw is the Group Director of Innovation and Talent Development at Dimension Data, an NTT company. Anthony is an open-source advocate, member of the Apache Software Foundation and Python Software Foundation and active contributor to over 20 open-source projects including Apache Libcloud and SaltStack. At Dimension Data, Anthony is driving digital transformation for Dimension Data’s global clients across 50 countries and 30,000 employees. Key initiatives are software skills, automation, DevOps and Cloud. Anthony is based in Sydney, Australia and blogs about skills, software and automation to 170,000 readers annually.


# # # 

Monday February 05, 2018

Success at Apache: A Newbie’s Narrative

by Kuhu Shukla

As I sit at my desk on a rather frosty morning with my coffee, looking up new JIRAs from the previous day in the Apache Tez project, I feel rather pleased. The latest community release vote is complete, the bug fixes that we so badly needed are in and the new release that we tested out internally on our many thousand strong cluster is looking good. Today I am looking at a new stack trace from a different Apache project process and it is hard to miss how much of the exceptional code I get to look at every day comes from people all around the globe. A contributor leaves a JIRA comment before he goes on to pick up his kid from soccer practice while someone else wakes up to find that her effort on a bug fix for the past two months has finally come to fruition through a binding +1.

Yahoo (which joined AOL, HuffPost, Tumblr, Engadget, and many more brands to form the Verizon subsidiary Oath last year) has been at the frontier of open source adoption and contribution since before I was in high school. So while I have no historical trajectories to share, I do have a story on how I found myself in an epic journey of migrating all of Yahoo's jobs from Apache MapReduce to Apache Tez, a then-new DAG-based execution engine.

Oath's grid infrastructure is driven through and through by Apache technologies, be it storage through HDFS, resource management through YARN, job execution frameworks such as Tez, or user interface engines such as Hive, Hue, Pig, Sqoop, Spark and Storm. Our grid solution is specifically tailored to Oath's business-critical data pipeline needs using the polymorphic technologies hosted, developed and maintained by the Apache community.

On the third day of my job at Yahoo in 2015, I received a YouTube link to An Introduction to Apache Tez. I watched it carefully, trying to keep up with all the questions I had, and recognized a few names from my academic readings of YARN ACM papers. I continued to ramp up on YARN and HDFS, the foundational Apache technologies Oath heavily contributes to even today. For the first few weeks I spent time picking out my favorite (necessary) mailing lists to subscribe to and getting started on setting up a pseudo-distributed Hadoop cluster. I continued to find my footing with newbie contributions, being ever more careful with whitespace in my patches. One thing was clear: Tez was the next big thing for us. By the time I could truly call myself a contributor in the Hadoop community, nearly 80-90% of the Yahoo jobs were running with Tez. But just like hiking up the Grand Canyon, the last 20% is where all the pain was. Being a part of the solution to this challenge was a happy prospect, and thankfully contributing to Tez became a goal in my next quarter.

The next sprint planning meeting ended with me getting my first major Tez assignment – progress reporting. The progress reporting in Tez was non-existent – "Just needs an API fix,"  I thought. Like almost all bugs in this ecosystem, it was not easy. How do you define progress? How is it different for different kinds of outputs in a graph? The questions were many.

I, however, did not have to go far to get answers. The Tez community actively came to a newbie's rescue, finding answers and posing important questions. I started attending the bi-weekly Tez community sync up calls and asking existing contributors and committers for course correction. Suddenly the team was much bigger, the goals much more chiseled. This was new to anyone like me who came from the networking industry, where the most open parts of the code are the RFCs and the implementation details are often hidden. These meetings served as a clean room for our coding ideas and experiments. Ideas were shared, to the extent of which data structure we should pick and what a future user of Tez would take from it. In between the usual status updates, extensive knowledge transfers were made.

Oath uses Apache Pig and Apache Hive extensively and most of the urgent requirements and requests came from Pig and Hive developers and users. Each issue led to a community JIRA and as we started running Tez at Oath scale, new feature ideas and bugs around performance and resource utilization materialized. Every year most of the Hadoop team at Oath travels to the Hadoop Summit where we meet our cohorts from the Apache community and we stand for hours discussing the state of the art and what is next for the project. One such discussion set the course for the next year and a half for me.

We needed an innovative way to shuffle data. Frameworks like MapReduce and Tez have a shuffle phase in their processing life cycle wherein the data from upstream producers is made available to downstream consumers. Even though Apache Tez was designed with a feature set corresponding to optimization requirements in Pig and Hive, the Shuffle Handler Service was retrofitted from MapReduce at the time of the project's inception. With several thousands of jobs on our clusters leveraging these features in Tez, the Shuffle Handler Service became a clear performance bottleneck. So as we stood talking about our experience with Tez with our friends from the community, we decided to implement a new Shuffle Handler for Tez. All the conversation points were now tracked through an umbrella JIRA, TEZ-3334, and the to-do list was long. I picked a few JIRAs and as I started reading through I realized, this is all new code I get to contribute to and review. There might be a better way to put this, but to be honest it was just a lot of fun! All the white boards were full, the team took walks post lunch and discussed how to go about defining the API. Countless hours were spent debugging hangs while fetching data and looking at stack traces and Wireshark captures from our test runs. Six months in and we had the feature on our sandbox clusters. There were moments ranging from sheer frustration to absolute exhilaration with high fives as we continued to address review comments and fix big and small issues with this evolving feature.
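For readers new to the term, the shuffle phase described above can be illustrated with a toy, in-memory sketch. It is not how Tez or the Shuffle Handler Service is implemented; the function names, the CRC32-based partitioning and the sample data are all assumptions made purely for illustration.

```python
from collections import defaultdict
from zlib import crc32


def partition_for(key: str, num_consumers: int) -> int:
    """Pick a consumer for a key (hypothetical helper; crc32 is stable across runs)."""
    return crc32(key.encode("utf-8")) % num_consumers


def shuffle(producer_outputs, num_consumers):
    """Route every (key, value) pair from upstream producers to one downstream consumer."""
    consumers = [defaultdict(list) for _ in range(num_consumers)]
    for output in producer_outputs:
        for key, value in output:
            consumers[partition_for(key, num_consumers)][key].append(value)
    return consumers


# Two made-up upstream producers, each emitting (word, count) pairs.
producer_outputs = [
    [("apple", 1), ("pear", 1)],
    [("apple", 1), ("fig", 1)],
]
consumers = shuffle(producer_outputs, num_consumers=2)

# Each downstream consumer can now aggregate its own keys independently.
totals = {key: sum(values) for c in consumers for key, values in c.items()}
```

Because the partition depends only on the key, all values for a given key land on the same consumer. In a real cluster this step moves data over the network between machines, which is why a shuffle service can become the bottleneck at scale.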

As much as owning your code is valued everywhere in the software community, I would never go on to say “I did this!” In fact, “we did!” It is this strong sense of shared ownership and fluid team structure that makes the open source experience at Apache truly rewarding. This is just one example. A lot of the work that was done in Tez was leveraged by the Hive and Pig community and cross Apache product community interaction made the work ever more interesting and challenging. Triaging and fixing issues with the Tez rollout led us to hit a 100% migration score last year and we also rolled the Tez Shuffle Handler Service out to our research clusters. As of last year we have run around 100 million Tez DAGs with a total of 50 billion tasks over almost 38,000 nodes.

In 2018 as I move on to explore Hadoop 3.0 as our future release, I hope that if someone outside the Apache community is reading this, it will inspire and intrigue them to contribute to a project of their choice. As an astronomy aficionado, going from a newbie Apache contributor to a newbie Apache committer was very much like looking through my telescope - it has endless possibilities and challenges you to be your best.

Kuhu Shukla is a software engineer at Oath and did her Masters in Computer Science at North Carolina State University. She works on the Big Data Platforms team on Apache Tez, YARN and HDFS with a lot of talented Apache PMCs and Committers in Champaign, Illinois. A recent Apache Tez Committer herself, she continues to contribute to YARN and HDFS and spoke at the 2017 Dataworks Hadoop Summit on "Tez Shuffle Handler : Shuffling At Scale With Apache Hadoop". Prior to that she worked on Juniper Networks' router and switch configuration APIs. She likes to participate in open source conferences and women in tech events. In her spare time she loves singing Indian classical and jazz, laughing, whale watching, hiking and peering through her Dobsonian telescope.


# # #  

Tuesday December 12, 2017

Success at Apache: What a Long Strange (and Great) Trip It's Been

By Jim Jagielski

It is normally during this time of year that people get awfully retrospective. We look over the last 12 months and come to terms with what kind of year it has been. We congratulate ourselves on the good and (hopefully) learn from the bad. We basically assess the ending year and start planning, even a little bit, for the one to come.

In general, we reminisce.

I am thinking not about 2017, however, but instead about 1995 and the origins of The Apache Software Foundation. And what a long, strange, and great trip it's been. And how incredibly lucky I've been to be a part of it.

A common saying is that success is mostly about being there at the right place at the right time, and although I'm not sure about the "success" part, it certainly applies to me. At the time I was working at NASA and was starting off a side business as an ISP and Web Hoster, and using the old NCSA web-server. I had created a small reputation for myself as an "expert" on a flavor of UNIX called A/UX, which was Apple's UNIX offering at the time. In addition to being the editor of the FAQ for A/UX, I also ported a bunch of "free software" to that platform and that's how I got started with Apache, providing patches to support A/UX, which is what I used as my web hosting platform. It was really no different than what I did for other software projects at the time.

And then something wonderful happened. I got hooked.

I really, really enjoyed the people I was collaborating with. I wasn't an "outsider" providing patches, I was part of the inner circle. I was a full-fledged member of the Apache Group. I started to understand just how all this really could change the world, and how I could maybe be a small part of it.

As a result, Apache changed my life, literally. Instead of doing software development as a way of "getting my job done" (at NASA, I was a power system engineer, and so I would code modeling and simulation software for spacecraft solar arrays, batteries and orbital mechanics), I started doing software development as my job, in addition to my hobby. Apache and Open Source became a huge part of my life, and my career changed to focus almost exclusively on Open Source, a change that continues to this day.

During this time I've been fortunate enough to work with, and learn from, extremely talented people. Not only related to code, but legal matters, inter-personal skills, presentation skills, etc. I've had opportunities that I never imagined and met people I never would have met otherwise. I've made great friends. I've been mentored by incredibly giving people and have mentored in return. And have seen my mentees become mentors themselves.

Over the years, I've seen Apache grow from a ragtag group of people working on a web server to one of the leading Open Source foundations in the world with more than 300 projects under our belt. I've been blessed to serve on the board of the ASF for every single year since we incorporated in 1999, seeing 2nd and now 3rd "generation" Apache Members take the reins.

The Open Source movement, and especially Apache, have given more to me than I could ever pay back, and that is why I still volunteer and contribute. Of course, to be honest, I still get a kick out of it, and love what I am doing, and continue to enjoy the opportunities and, especially, the people that I get to work with.

But, you see, I'm nothing special. All this is also open and available to you. You too can change the world, and have your world changed in return. We all have talents that can be shared, talents that can be recognized and rewarded. Apache is a family, always looking for new family members. 

So take that first step. Find a project and community you want to be a part of. Jump in. Have fun. Grow. Learn. Teach. Live.

But just be prepared to get hooked, and have your life change.

Jim Jagielski is a well-known and acknowledged expert and visionary in Open Source, an accomplished coder, and a frequent, engaging presenter on all things Open, Web and Cloud related. As a developer, he’s made substantial code contributions to just about every core technology behind the Internet and Web; in 2012 he was awarded the O’Reilly Open Source Award, and in 2015 he received the Innovation Luminary Award from the EU. He is likely best known as one of the developers and co-founders of the Apache Software Foundation, where he has previously served as both Chairman and President and where he’s been on the Board Of Directors since day one. Currently he is Vice-Chairman. He's served as President of the Outercurve Foundation and was also a director of the Open Source Initiative (OSI). Up until recently, he worked at Capital One as a Sr. Director in the Tech Fellows program. He credits his wife Eileen with keeping him sane.

= = =

"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works". 1) Project Independence https://s.apache.org/CE0V 2) All Carrot and No Stick https://s.apache.org/ykoG 3) Asynchronous Decision Making https://s.apache.org/PMvk 4) Rule of the Makers https://s.apache.org/yFgQ 5) JFDI --the unconditional love of contributors https://s.apache.org/4pjM 6) Meritocracy and Me https://s.apache.org/tQQh 7) Learning to Build a Stronger Community https://s.apache.org/x9Be 8) Meritocracy. https://s.apache.org/DiEo 9) Lowering Barriers to Open Innovation https://s.apache.org/dAlg 10) Scratch your own itch. https://s.apache.org/Apah 11) What a Long Strange (and Great) Trip It's Been https://s.apache.org/gVuN

# # # 

Wednesday October 25, 2017

Success at Apache: Scratch Your Own Itch.

By Ignasi Barrera

Recently I was at an industry conference and was happy to see many people stopping by the Apache booth. I was pleased that they were familiar with the Apache brand, yet puzzled to learn that so many were unfamiliar with The Apache Software Foundation (ASF).

It's important to recognize not just Apache's diverse projects and communities, but also the entity behind their success.

Gone are the days when software, and technology in general, was developed privately for the benefit of the few. As technology evolves, the challenges we face become more complex, and the only way to effectively move forward to create the technology of the future is to collaborate and work together. Open Source is a perfect framework for that, and organizations like the ASF carry out a decisive role in protecting its spirit and principles.

The ASF's mission is to provide software for the public good. We take it one step further, by giving all our Open Source software away for free. According to this mission, the foundation was established back in 1999 as a US 501(c)(3) non-profit charitable organization, and constitutes an independent legal entity to which companies and individuals can donate resources and be assured that those resources will be used for the public benefit. Its all-volunteer nature, along with the meritocracy model followed by its communities, are the pillars of the neutral, trusted space where Apache software is developed.

We strongly believe that good software is built by strong communities. Successful Open Source projects are the result of the work and collaboration in their communities and the people behind them. It is all about the people. Experience has shown us that helping people work together as peers is key in producing software in a sustainable way, and we have collected the lessons learned all these years in what we call "The Apache Way".

This Apache Way is a set of core behaviors all Apache projects follow that are designed to ensure projects are independent and diverse, and that anyone can participate no matter what gender, culture, time zone, employer, or even expertise they have. One can start collaborating with a project by contributing patches or implementing new features, but merit is not only measured by code contributions. Helping users, improving documentation, promoting the project, and other non-coding activities are very valuable and recognized as such, and the recognition of this merit and involvement is expressed by granting more privileges in the project: from commit access, to invitations to join the Project Management Committee, to invitations to join the ASF Membership. One of the great differentiators between the ASF and other open source foundations is that the ASF does not dictate the technical direction of its projects: each Apache project is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides its project's day-to-day operations, including community development and product releases. Meritocracy drives the growth of the communities, and ensures anyone can contribute to projects that are ruled by the people who are involved in them and really care about them.

Learning to work this way is not always easy, though. Projects come to the Foundation from very different backgrounds and whilst some of them already have communities that are used to collaborating in open ways, others find it challenging to embrace these core behaviors. The Apache Incubator is the main entry point for codebases and their communities wishing to officially become part of the Foundation, and is where they learn how to put all these principles into practice. Some will find this way of working a good way to rule a project and will graduate as an Apache top-level project; some may find that the Foundation is not the best option for them and choose to leave. Both options are good outcomes, as projects will have invested time in thinking about their community model and how they want governance to be, and this always benefits the Open Source world.

This Open Source model not only exists to create sustainable Open Source projects, but also to meet the expectations of the rest of the world. Software developed at Apache comes with a set of guarantees granted by the popular and business-friendly Apache License, but also with others that are the product of this open governance model, such as project independence or a well-defined project lifecycle. The ASF not only defines how projects operate while active, but also what happens when a project reaches its end-of-life, which is also important for adoption but often not considered by Open Source projects.

These guarantees, along with the reputation earned by many years of producing high-quality open source software, make the 300+ freely available Apache projects, from Abdera to HTTP Server to Hadoop to ZooKeeper, a trusted choice for individuals and companies looking for Open Source solutions.

The saying "Scratch Your Own Itch" is popular in the tech space, and is an integral principle at the ASF. Apache Committers have a responsibility to the community to help create a product that will outlive the interest of any particular volunteer, as well as for helping to grow and maintain the health of the Apache community.

As an ASF Member, I'm helping with project outreach and mentoring new individuals that make up the greater Apache community.

The Apache Software Foundation provides a safe place for Open Source development, and will keep evolving as technology evolves, welcoming all kinds of projects and communities, and helping people embrace Open Source. Let's see what the future holds for the Open Source world and how we can contribute to making it a better place. Scratch your own itch.

Ignasi Barrera is a long-term Open Source contributor and became involved with the ASF in 2013, when jclouds was first submitted to the Apache Incubator. He is a member of the Apache jclouds Project Management Committee and still actively contributes to the project. Ignasi became an ASF Member in 2015, and helps with community development activities and the promotion of Open Source. 

= = =

"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works". 1) Project Independence https://s.apache.org/CE0V 2) All Carrot and No Stick https://s.apache.org/ykoG 3) Asynchronous Decision Making https://s.apache.org/PMvk 4) Rule of the Makers https://s.apache.org/yFgQ 5) JFDI --the unconditional love of contributors https://s.apache.org/4pjM 6) Meritocracy and Me https://s.apache.org/tQQh 7) Learning to Build a Stronger Community https://s.apache.org/x9Be 8) Meritocracy. https://s.apache.org/DiEo 9) Lowering Barriers to Open Innovation https://s.apache.org/dAlg 10) Scratch your own itch. https://s.apache.org/Apah

# # #

Monday October 02, 2017

Success at Apache: All My Roads Led to Apache

by Pat Ferrel

I became involved with Apache in 2011, after several years in startups where, as CTO, I felt too removed from building things. Looking for a change, I was keenly aware that the most interesting thing about the startups was our early use of Machine Learning techniques, and I wanted to see if building ML solutions for companies new to the field might be more satisfying. I started by spending nearly a year researching the type of applications we had needed in the startups: Natural Language Processing (NLP), text analysis, clustering, and classification. In those days Apache Mahout http://mahout.apache.org/ had several good solutions that were designed for Big Data and approachable by an individual. These ideas seem fairly commonplace now but were in their early days only 6 years ago.

Given a great platform to experiment with, I built a web site to advertise expertise in ML but also to showcase many examples from my experiments, including a topic-oriented content site based on clustered and classified text that used NLP to add entities to text. I blogged about things I had learned and techniques that produce results.

Then I got the first contact about a project and it was from a completely unexpected direction: recommenders. Fortunately Apache Mahout then had the state-of-the-art OSS suite of recommenders so I took the consulting job. The company had rolled their own recommender and was selling it as a service but it was old and they wanted to investigate replacing it. 

Welcome to Big Data

The nature of recommenders means you deal with huge amounts of data because you have to track several million people’s actions over years. We had data from a large online retailer and were tasked with using this data to beat the in-house recommender. Specifically they wanted to see if they could improve performance (better results and faster compute times) and get something easier to maintain. 

The first job of a good consultant is to define the problem and outline a path to resolution that fits with the company’s competencies. To me this meant looking at the current system and the expertise of the people working on it. We had Data Scientists and Java Software Developers who knew what it was like to deal with Big Data. They had a highly performant method for gathering data and were quite good at running Apache Hadoop-based analytics. This was seldom the case back then but happily allowed me to look at less turnkey applications and assume the use of important Apache tools.

We agreed on a plan and the basic building blocks, including a method for comparing results. I did the research and proposed several candidates for the tests, including the Apache Mahout recommenders. It was pretty easy to rank the recommender engines we had and do some exploration of parameter tuning and choices to get our best "challenger" results. The nice thing is that we beat the old threadbare in-house recommender by a significant amount (12%). The winner was the Apache Mahout Cooccurrence Recommender using the Log-Likelihood Ratio as the core cooccurrence metric. This was even though we had tested against several Matrix Factorization recommenders, including Mahout's.
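For readers curious about what a Log-Likelihood Ratio cooccurrence metric looks like, here is a minimal Python sketch. It is not the Mahout code and not what we ran in the tests; it is the commonly described entropy formulation of the LLR score over a 2x2 table of counts, and the function and parameter names (`log_likelihood_ratio`, `k11` and so on) are chosen for this illustration only. Here `k11` is assumed to be the number of users who interacted with both items, `k12` and `k21` the users who interacted with only one of them, and `k22` the users who interacted with neither.

```python
import math


def x_log_x(x: float) -> float:
    """x * ln(x), with the usual convention that 0 * ln(0) is 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts: float) -> float:
    """Shannon entropy of the counts, scaled by their total (not normalized)."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11: float, k12: float, k21: float, k22: float) -> float:
    """LLR score for a 2x2 cooccurrence table.

    A score near zero means the two items co-occur about as often as chance
    would predict; a large score means the cooccurrence is surprising.
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row_entropy + column_entropy - matrix_entropy))


if __name__ == "__main__":
    # Hypothetical counts: items that look independent vs. strongly related.
    print(log_likelihood_ratio(10, 10, 10, 10))
    print(log_likelihood_ratio(100, 5, 5, 1000))
```

The appeal of a score like this is that it rewards cooccurrences that are statistically surprising rather than merely frequent, so very popular items do not dominate every recommendation.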

We need something new 

Up till this time I was only a user of Apache projects (discounting a few minor code contributions), but what I found in all the recommenders we studied was a fundamental problem that is still mostly unsolved today. We had data from a retailer that included user "buys" but also 100 times more user "views". None of the recommenders could deal with this multimodal data. I consulted the authors and maintainers of the Mahout recommenders and several others we had targeted. We got some suggestions, added them to our own ideas, and set out to test them. For various reasons that are beyond the scope of this post, none of the easy solutions helped; they actually produced worse results, so I fulfilled the contract and left with a feeling of unfinished business.

One of the mentors of Apache Mahout, Ted Dunning, had suggested a new idea during this time. There was something about it that seemed very intriguing. He had proposed a way to use one type of user behavior to predict another. This was an aha moment for me because it codified intuition. I remember the first time he wrote in email on the Mahout user mailing list the equation that crystallized it all. I began to imagine the implications; all sorts of new data that could be useful, not just "views" but contextual data like location, and enrichment data like tag or category preferences. These all seem to obviously have a bearing on recommendations but now we had a beautiful simple equation to test the intuition.
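To make the idea of one behavior predicting another a little more concrete, here is a toy Python sketch. It is only an illustration under simplifying assumptions: it uses raw counts of users where a real cross-occurrence recommender would apply a significance test such as the Log-Likelihood Ratio, and every name and data value in it (`cross_occurrence`, `recommend`, the `buys` and `views` dictionaries) is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Actions = Dict[str, Set[str]]        # user -> items the user acted on
Counts = Dict[Tuple[str, str], int]  # (primary item, indicator item) -> users


def cross_occurrence(primary: Actions, secondary: Actions) -> Counts:
    """Count users who did the primary action on one item and the
    secondary action on another (or the same) item."""
    counts: Counts = defaultdict(int)
    for user, primary_items in primary.items():
        for indicator in secondary.get(user, set()):
            for item in primary_items:
                counts[(item, indicator)] += 1
    return dict(counts)


def score(counts: Counts, history: Set[str]) -> Dict[str, int]:
    """Sum the counts of every indicator that appears in a user's history."""
    scores: Dict[str, int] = defaultdict(int)
    for (item, indicator), n in counts.items():
        if indicator in history:
            scores[item] += n
    return dict(scores)


def recommend(buys: Actions, views: Actions,
              buy_history: Set[str], view_history: Set[str]) -> Dict[str, int]:
    """Combine buy-to-buy cooccurrence with buy-to-view cross-occurrence."""
    totals: Dict[str, int] = defaultdict(int)
    parts = (score(cross_occurrence(buys, buys), buy_history),
             score(cross_occurrence(buys, views), view_history))
    for part in parts:
        for item, value in part.items():
            totals[item] += value
    return dict(totals)


# Hypothetical toy data: three users, their purchases and their page views.
buys = {"u1": {"phone"}, "u2": {"phone", "case"}, "u3": {"case"}}
views = {"u1": {"phone", "charger"}, "u2": {"charger", "case"}, "u3": {"case"}}

if __name__ == "__main__":
    # A visitor who has bought nothing but has viewed a charger.
    print(recommend(buys, views, set(), {"charger"}))
```

In this toy data the visitor has no purchase history at all, yet the "view" of a charger is enough to rank items for purchase, which is the kind of signal a buys-only recommender would never see.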

Becoming a Committer

I set out to hack the Mahout Cooccurrence Recommender to become a Correlated Cross-Occurrence (CCO) recommender. But without some way of testing the algorithm and code we couldn’t be sure it was worth including in Mahout. The datasets publicly available at the time did not have the kind of data we needed (there had been no direct use for it until then) so I scraped the film review site rottentomatoes.com to collect "fresh" and "rotten" reviews of movies. This gave us two different behaviors with very different meanings. Naively you might think to weight one positive and the other negative, and so did I, but that produced worse results than ignoring the "dislikes". However, when I ran cross-validation tests comparing the Mahout Cooccurrence Recommender using likes only to CCO using both user actions, we got some quite interesting results. The question was: do "dislikes" predict "likes"? When I got a 20% lift in predictive precision, we could conclude that they do. Not only was intuition right but the new algorithm could tease out the data to make use of it.

The hack was accepted into Mahout Examples and I was invited to become a committer. Then the world changed.

Apache Spark and Mahout-Samsara

When I became a committer Mahout was written on Apache Hadoop MapReduce in Java (as was my hack). But it had also become obvious to most Mahout committers that the future was with much more performant engines like Apache Spark. Committers Dmitriy Lyubimov and Sebastian Schelter had been working on a Spark version of Mahout. In an instant of project time virtually all committers saw this as the future of Mahout, if also a major pivot. 

In retrospect I'm not sure I've ever seen an Apache project change so much in so little time. Today Mahout is deprecating lots of old Hadoop MapReduce code as it falls from use and the new Mahout is truly new. The Mahout subtitle Samsara references the cycle of life, death, and rebirth in the Hindu tradition. Mahout started as algorithms written specifically for MapReduce; now Mahout-Samsara is a linear algebra DSL in Scala used to roll your own algorithms, with most of the interesting algorithms available in very simple DSL-based implementations. Mahout eventually took this transformation even further to include other compute engines like Apache Flink and is now running on GPUs. But I get ahead of things...

Those were exciting times and though I helped with the DSL I remained fixed on implementing CCO, which was first included in Mahout 0.10.0 in October 2014.


Now we had the CCO algorithm implemented on modern compute engines, but several other problems remained before we could actually deploy a recommender. This is because CCO creates a model that needs to be deployed on a special type of server that computes similarity in real time. In Machine Learning terms this is a K-Nearest Neighbors engine, known in concrete terms as Lucene, or its scalable server derivatives like Solr and Elasticsearch. A turnkey recommender also requires a highly performant massively scalable DB, like HBase. Putting these together we could get a nearly turnkey recommendation server that made use of multimodal real time user behavior. But I didn't see a candidate for all these in Apache and so looked elsewhere. This required an integration project, not Mahout, which integrated with other services but provided none of its own.

I found a project that included everything I needed and was Apache licensed but was run by a small startup called PredictionIO. They had a Machine Learning Server that was a framework for Templates that could implement a wide range of Algorithms. The Server also included nice high-level integrations with Elasticsearch (Lucene server), Spark, and HBase. In May of 2015 I had the first running CCO Server built on Mahout and a whole list of other Apache projects.

Back to Apache

PredictionIO was at the right place to get swept up in a major move to embrace ML/AI by Salesforce Inc. who bought them as part of the Einstein initiative. Since PIO was Apache licensed OSS it was still available and so was the Template I was calling the Universal Recommender. But there was a question now about the future of PIO; what would Salesforce do with it? The old team, that I had worked closely with, wanted to see the project move forward in OSS and Salesforce seemed to agree, but large corporations often have a mixed record in promoting their own OSS projects. In this case Salesforce decided to remove the question by submitting PredictionIO to the Apache Incubator.

The old team was joined by people like me from outside Salesforce to create a project that follows the Apache Way and is free of corporate dominance. I am a committer to PredictionIO, which has three releases under Apache Incubator vigilance and the Universal Recommender is now at v0.6.0, the most popular of PredictionIO Template Algorithms.

With the 3rd release of PIO from Apache we are now in the process of graduation to an Apache Top-Level Project, hatched by the Apache Incubator. I fully expect that we'll be celebrating soon.


My journey began with a specific problem to solve. Each step to produce the solution has led back to Apache in one way or another, through mentors, collaboration, use of, and commitment to several projects. But I now have my mature, scalable, performant, state-of-the-art, nearly turnkey Universal Recommender. Now we can ingest and get improvements from many types of behavior, enrichment data, and context--using it in real time to serve recommendations subject to robust business rules. My small consulting company ActionML (actionml.com) now has a powerful tool to solve real problems and we make a living (at least partly) by helping people deploy and tune it for their data.

This is a story of someone single-mindedly following a goal over several years. There are many ways to do this in the Software Development world, but not all OSS projects are open to bringing people in. The Apache Software Foundation most certainly is and openly recruits as diverse a group of committers and members as possible. If you want to make a difference and influence the course of an OSS project, Apache is a good place to look. Start by getting involved with a project of interest, make contributions, get involved in discussions. If the match is good you'll be invited in as a committer and move on from there. I think of Apache as a do-ocracy: if you do something of value it goes a long way towards being invited in.


Slides describing the CCO Algorithm: https://www.slideshare.net/pferrel/unified-recommender-39986309

IBM DevWorks Post on "Making one thing Predict Another": https://developer.ibm.com/dwblog/2017/mahout-spark-correlated-cross-occurences/

Apache Mahout CCO Implementation: http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html

Apache PredictionIO: http://predictionio.incubator.apache.org/

The Universal Recommender Template: http://predictionio.incubator.apache.org/gallery/template-gallery/

Professional Support for the Universal Recommender: http://actionml.com/universal-recommender

# # #

"Success at Apache" focuses on the processes behind why the ASF "just works". 1) Project Independence https://s.apache.org/CE0V 2) All Carrot and No Stick https://s.apache.org/ykoG 3) Asynchronous Decision Making https://s.apache.org/PMvk 4) Rule of the Makers https://s.apache.org/yFgQ 5) JFDI --the unconditional love of contributors https://s.apache.org/4pjM 6) Meritocracy and Me https://s.apache.org/tQQh 7) Learning to Build a Stronger Community https://s.apache.org/x9Be 8) Meritocracy. https://s.apache.org/DiEo 9) Lowering Barriers to Open Innovation https://s.apache.org/dAlg

Tuesday September 05, 2017

Success at Apache: Lowering Barriers to Open Innovation

By Luke Han

Over the past decade, I was a Java developer using many Apache projects such as Tomcat, Jakarta, Struts, and Velocity. In 2010 I stepped into the Big Data field and started to actively participate in Apache projects, and became an ASF Member 3 years ago. In addition to being the VP of Apache Kylin, I helped projects such as Apache Eagle and CarbonData move to the ASF, and have been a mentor for Apache Superset, Weex, and RocketMQ. Today, I'm co-founder/CEO of Kyligence (prior to that, I was Big Data Product Lead of eBay, and Chief Consultant of Actuate China).

Apache Kylin, as its name may suggest, originated from China ("Kylin": A powerful yet gentle fire-breathing creature in eastern mythology. Also written as Qilin. "Apache Kylin": OLAP on Hadoop, capable of analyzing petabytes of data within seconds http://kylin.apache.org/ ). I started this project with a few members in early 2015. 

As a pioneer of the first highly-recognized Apache project from the Eastern world, I was proud to see that, within 2 years, Kylin has helped over 500 organizations across the globe to solve their Big Data challenges. 

Before Kylin graduated from the Apache Incubator, the Kylin team faced a lot of cultural challenges. Since a great number of projects from China had failed in the past, we too received many questions and doubts from both the eastern and western worlds. As our native language is not English, communication with mentors became difficult during the coaching process. Fortunately, by fully embracing The Apache Way, Kylin was able to succeed with strong support from the Apache community members. Beyond the Kylin software itself, our team has also worked with those talented people to bring our Chinese voice to the world.

While developing high-quality software, we are helping more Westerners understand Eastern culture. I have had many chances to travel and meet people across the globe since I initiated Kylin. Some of them are Apache directors and mentors, some of them are developers and contributors. Some are from the US, Australia, Canada and Chile; some are from Japan and Taiwan. Some are impressed with Kylin, some are curious about Easterners’ attitude toward Open Source software. I asked them a lot of questions about The Apache Way, and they all generously coached me and my team with lovely and detailed answers. We, too, were able to reach consensus after intense and open debate. Kylin received much more encouragement and recognition than I expected.

As a VP of a Top-Level Project, my responsibility grew after Kylin graduated from the Apache Incubator. Kylin gained more opportunities as bugs were fixed quickly and the software was tested frequently, as is the nature of Open Source software. In China’s famously large market, Apache Kylin has received feedback from many users and evolved fast. We received many suggestions from both developer and product perspectives. Beyond my expectations, many community members are passionately writing tools for Kylin and helping users better understand and use Kylin. Drawing on members’ ideas, we are also sharing our knowledge as a way to give back to the community.

Thanks to the ASF and everyone involved in the Open Source community, I have the opportunity to work with people that I’ve always admired and to make a difference in the world together. I feel that my team and I are deeply connected with such a warm, global, open community.

= = =

"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works". 1) Project Independence https://s.apache.org/CE0V 2) All Carrot and No Stick https://s.apache.org/ykoG 3) Asynchronous Decision Making https://s.apache.org/PMvk 4) Rule of the Makers https://s.apache.org/yFgQ 5) JFDI --the unconditional love of contributors https://s.apache.org/4pjM 6) Meritocracy and Me https://s.apache.org/tQQh 7) Learning to Build a Stronger Community https://s.apache.org/x9Be 8) Meritocracy. https://s.apache.org/DiEo

# # # 

Tuesday August 15, 2017

Success at Apache: Meritocracy.

By Kevin A. McGrail

The Apache Software Foundation is not a democracy.

It's an elitist organization that does not support an innate right to vote. We aren't capitalists because you can't buy a seat on our board. We aren't socialists since we place building working communities over software. Monarchy doesn't fit because Kings and Pawns work together as equals.

What we are is a Meritocracy. To be able to have a say, you have to prove your worth in a system of merit. Meritocracy is a key part of The Apache Way. With it, the ASF creates amazing software with amazing people that continues to change the way the world computes.

Merit has no basis in Age, Sex, Religion, Ethnicity, Race, Country of Origin, Sexual Preference, Social Status, Income Level, Lineage, and/or Physical/Cultural Traits*.

In honor of The Apache Way, the ASF has created two wristbands to share with the tech community. The first is silver and announces our Meritocracy. 

The second, because merit is NOT rooted in biological differences, is brash and bold in Red announcing "do I.T. like a girl".  The idea comes from Code Like A Girl wristbands coupled with a small bit of double entendre to start a conversation about improving inclusion.

If you'd like some wristbands, they'll be debossed in a single color like the pictures below. Just send me an email at kmcgrail(at)apache(dot)org. I'll try and send out as many as I can for free. If you like/hate the idea, feel free to send me an email as well and tell me what you would do differently.

* NOTE: We do take into serious account whether you are a Cat or a Dog person.

Kevin A. McGrail is a cybersecurity expert and Open Source advocate who loves stopping spammers. He got involved with the ASF when the Apache SpamAssassin project joined the foundation in 2004. Today he still helps the SpamAssassin project while also serving as an executive officer and VP of Fundraising.

= = =

"Success at Apache" is a new monthly blog series that focuses on the processes behind why the ASF "just works". 1) Project Independence https://s.apache.org/CE0V 2) All Carrot and No Stick https://s.apache.org/ykoG 3) Asynchronous Decision Making https://s.apache.org/PMvk 4) Rule of the Makers https://s.apache.org/yFgQ 5) JFDI --the unconditional love of contributors https://s.apache.org/4pjM 6) Meritocracy and Me https://s.apache.org/tQQh 7) Learning to Build a Stronger Community https://s.apache.org/x9Be

Monday June 05, 2017

Success at Apache: Learning to Build a Stronger Community

by John Ament

As the next entry in the "Success at Apache" series, I had to think about what kind of blog post I wanted to write. Given my personal focus, it made sense to focus on new projects coming in and the incubator. When I'm not busy dreaming up new ideas and working on personal projects, I'm helping new projects get into Apache, keeping their goals in alignment with the Apache Way http://apache.org/foundation/governance/ . I'm a member of a few different PMCs here at Apache, notably the Incubator. I'm a mentor to five different podlings right now. While my primary programming focus is on programming models, my podlings are all over the place. Starting a new project here at Apache can be a daunting task: how do I get in? What if I don't build a diverse community? Becoming a podling has more to do with the community than it does the technical aspects of the project. We don't expect you to be experts in it, but we do expect new projects to be experts in how their own software works. We want to teach you, and we want you to be receptive to learning about The Apache Software Foundation and its best practices.

I'm not sure if everyone does it, but I draw a lot of parallels between how an ASF project works and how an Agile team works. Agile teams start off as a bunch of people who don't really know each other but have assembled themselves into an informal team focused on solving a problem, or some number of problems, knowing that they can only do it together. They have common goals and objectives, but early on they lack the camaraderie to work together smoothly. Over time, they get to know one another, figure out each other's strengths and weaknesses, and learn to resolve issues together. A well-functioning team doesn't start out as one. It takes time and practice for them to work well - both together and as an outwardly facing unit.

Projects here at Apache follow the same type of maturity progression. Whether it's learning The Apache Way or learning to work with one another, it takes them time to mature and get into a good groove. 

Open Communication
The ASF is pretty big on open communication, wherever it's a sensible solution. We want to discuss with each other what we're doing, share ideas about how to solve problems, and come up with a good solution together, as a team, in an open manner.

This all ties into agile practices. We host stand ups to talk about what we're doing and see if others have an opinion about what we're doing.

When a project comes to Apache, the original authors need to remember that they're bringing in a lot of experience, and the expectation is that those existing contributors will help get new contributors from the outside - outside their organization specifically - to contribute to the project. By driving towards open communication, outside of your own organization, you're encouraging more people to participate. This sort of governance model ensures that all parties who can participate are aware of decisions being made.

Open Communication isn't for everything though. We need to remember to be respectful in our communications with others, and if it's felt that something's awry - speak privately. But remember that private conversation isn't part of the decision making process. Likewise, anytime we're talking about individuals, in either a positive or negative way, that discussion should be conducted on the private list for a project.

Turning Into a Well Oiled Machine
Once a project begins to grow, new people start to get attracted to it. As a community, you have to figure out how to work together. Building a community of diverse ideas and skills will ensure that new ideas keep flowing. Contributors can react quickly to a user's question on list and help them resolve the problem, put in an enhancement request or get a reported bug squashed in a following commit. Time is of the essence: a volunteer may only have availability to work on this right now.

There can't be a long, drawn-out, waterfall-style process when dealing with Open Source. At the same time, making sure there's a documented decision process, and sometimes an in-depth design, is critical for new and existing contributors alike to come to a shared understanding of what is being proposed.

Projects need to plan for longevity. Longevity comes in many forms. A strong backlog of features is important. Having a diverse set of committers is even more critical. You could even say that each helps create the other. Just like any feature set, we get to a point where the feature is complete enough that we can move on to another feature.  

How do you get there?
Apache's main way to get to this point is to incubate http://incubator.apache.org/ . You can't get there by yourselves; first-hand experience from existing Foundation members will help your community turn over a new leaf and adopt this way of working. We want you to be successful, as long as your project can dedicate itself to the practices that have been set forth within the Foundation.

New projects may be comfortable with a champion http://incubator.apache.org/incubation/Roles_and_Responsibilities.html#Champion who can work with them closely, answering their questions up front. While a lot of the pre-incubation chatter will happen off list, it is important that potential new podlings subscribe to the incubator general list http://incubator.apache.org/guides/lists.html#general+at+incubator.apache.org, understand the goings-on of a podling, and try to build their list of mentors http://incubator.apache.org/incubation/Roles_and_Responsibilities.html#Mentor in the open. Mentors are extremely important to a podling, and understanding their roles and why you need to pick great mentors is something your champion and the rest of the Incubator community can help explain. Participating in our public discussion lists is sometimes the first step to joining the foundation at a deeper level.

Where do we go next?
If you're a potential new project, feel free to reach out on the Incubator mailing lists http://incubator.apache.org/guides/lists.html#general+at+incubator.apache.org to get started. We'd love to hear from you and get you acquainted with The Apache Software Foundation.

If you're on an existing project, we want to hear your perspectives on how the Foundation works. You may want to reach out to dev@community http://community.apache.org/lists.html to let others know your thoughts, or even just subscribe and see what others have to say. We're all working together to make the foundation better. All the input we receive, both positive and negative, will help shape everyone's actions in the community.


Monday May 01, 2017

Success at Apache: Meritocracy and Me.

by Tom Barber

When Sally asked for volunteers to help with a blog post series "Success at Apache", I realised there was a very human story to tell about how the ASF helped me get to where I am today and hopefully where I'll go tomorrow. Over the years I have worked on and run a number of Open Source projects whilst working with an awful lot of Open Source software. One day I was browsing Slashdot, as you do. Yeah, I know a lot of people disparage it, but it's an awfully hard habit to kick, and without it I wouldn't have got involved in the ASF, so I owe it a lot. Anyway, one day when browsing Slashdot I saw this article (https://it.slashdot.org/story/11/01/08/1544204/apache-to-steward-nasa-built-middleware). I had been working in the Open Source business intelligence industry for a few years at that point and I spent a lot of time hacking around and managing data systems, so I wondered how I could get some help out of OODT (http://oodt.apache.org/). Also, as a kid I had always loved everything about space: I was a huge Apollo fan, had a small telescope, went to the total eclipse in the UK in 1999 and so on. I thought this OODT project would be a fun way for me to chat nonsense to a few NASA employees, find out how they did stuff and do a bit of Open Source hacking on the side, which would at least let me participate in some NASA related development work. And so it began.

For those of you who haven't heard of Apache OODT, it is a middleware layer for building data systems. Originally written by NASA JPL and then Open Sourced to The Apache Software Foundation, it provides data ingest capabilities, metadata extraction, data workflows and resource management. I started by asking pretty dumb questions on the IRC channel, posting stuff on the mailing lists and trying to figure out how this reasonably expansive stack of software even operated. Chris Mattmann and Sean Kelly guided me through the opening foray into OODT development and education. Eventually, having submitted a few bug fixes, I volunteered to be a release manager for an OODT release and that got me more involved. Not too long after, Sean asked if I fancied having a go at being the project chair, which I duly accepted. Behind the scenes cogs were turning, and in a matter of a few weeks I'd gone from a committer and PMC member to Chair to ASF member. It was certainly a hectic time, trying to keep up with all the new things I had to do, mailing lists to follow and so on, but what a period in my ASF experience. Lots of fun!

That was just over 2 years ago and I'm still happily stewarding the OODT folks and keeping the cogs turning, releases happening and Jira tickets triaged. Alongside that, it is truly an honour to be involved with the ASF as a member, and although the politics can get tedious, the foundation is an amazing place for people to learn to work on great software as part of a distributed team. 18 months ago I was getting a little jaded with the monotony of the BI work I was doing. There are only so many sales databases and budget reports one guy can take, and after being in BI 8 or so years I felt like it was time for a change of scenery; I just didn't know what. So I blasted out an email to 10 or so people I knew or had had some contact with over the years who might be able to give me a job, or know someone who was looking for a Java developer, BI guy, Open Source advocate, that type of thing. I'd included some OODT folks on my email, not because I thought there was a chance of a job using it, but just in case they happened to know someone out in California needing some remote help. Everyone said no, except Chris Mattmann, who said if I could hold on a few months he might have something for me at NASA! That response floored me. I'd never even considered that as an option and knew that with a young family it would be highly unlikely I'd be able to move to California, so I played along assuming it would fall through. But as the process dragged on and contracts got drawn up, we got closer and closer to it becoming real and the excitement grew. There was the tangible possibility of me fulfilling at least a bit of a lifelong dream: no, I wouldn't be an astronaut, but there was the chance of employment by NASA.

Eventually, 6 months or so later, the paperwork was signed and I joined the ranks at NASA JPL, working as an Apache OODT and devops guy. What is great is that, having 10 years of business and development experience, I feel like I can very much make a positive contribution to the team, and in part that is down to what I have learnt developing and coding at the ASF. It has been an amazing experience and a wonderful 12 months. I never dreamt an opportunity like that would arise, and it is 100% down to the great work the ASF does in stewarding new projects through the incubation process and into mainstream adoption. Without the ASF I would likely still be a BI guy dealing with run-of-the-mill data warehouses. Instead I work on Genomics Search Engines, help hunt criminals on the dark web and a host of other stuff. Life sometimes throws you an opportunity that you don't expect and the ASF certainly facilitated that.

Last week I was in Pasadena, visiting the JPL facility, getting the guided tour and doing a bit of work. It was amazing talking to such a dedicated group of people who clearly have a big passion for what they do. Getting to see their mission control, the Mars rovers and various satellite mock-ups was awe-inspiring, but what excited me the most was getting to sit down and pick the brains of people with whom I have worked at the ASF for years yet never met in the flesh. Finally making that human connection means a lot.

What the ASF offers here is the ability to learn to work as a distributed team without the pressures of the "real world". Everyone at the ASF, pretty much, is a volunteer, and other volunteers recognise that, so it reduces the pressure. But whilst reducing the pressure it teaches you how to make binding decisions as a disparate group, how to keep records and how to ship good quality code whilst living in different timezones. At the ASF some of us might meet once or twice a year at ApacheCon, Fosdem or elsewhere, but largely all communication is done via mailing lists. This can cause issues when people "just want to get it done", but it also provides an immutable record of what is going on in a project and who said what. This proves equally useful out in the "real world", where you want to track business decisions or look up historical records of why a certain choice was made. Dealing with people who you don't work with on a daily basis also helps you think more about your communication style, what is fine to say and what isn't, and how you structure your communications, which is very important in a business setting. Do you know the person? Do they understand your nuances? Is English their native language? etc.

The other thing I find the ASF offers is understanding. Last year I was diagnosed with Asperger's at the age of 33, which is pretty late. What is nice is that, generally, people like to listen, and if you have something that affects your personal or professional life, people who you've met at the ASF will often lend an understanding ear to allow you to offload or discuss something completely unrelated to the project you might be working on. Or, like me, you can just stand up at the front of an ApacheCon lightning talk and tell everyone! Either way, you can generally find someone in the Apache family who will provide a sounding board for anything you want to discuss.

These days I spend my spare time still working on OODT stuff, but also doing a lot of public speaking and mentoring, and whenever I do I make sure I talk up the Apache Software Foundation, because it has given me the chance of a lifetime and one that I'll be forever grateful for. If you aren't involved in development here at the ASF, get involved. You don't have to be a coder; you just need to like helping out in a fun, Open Source community.

As I mentioned at the start, this blog series is about success at Apache, and hopefully this proves that success can come in a number of ways. The ASF was selected by NASA as the home for its data middleware platform, which proves that NASA deemed the incubation process, the license and the ecosystem acceptable. That is success for the Apache Software Foundation. Similarly, the foundation has proved very successful in helping people from a range of different walks of life into new lines of work. That is exactly what happened to me, and it is the reason I wanted to share my story about success at Apache.



