The Apache Software Foundation Blog

Monday February 26, 2018

Success at Apache: Contributing to Open Source even with a high-pressure job

by Anthony Shaw

I believe in the mission of the ASF for many reasons, but the first is the reason why I got into open-source software- free and open access to knowledge.

Back when I was age 12 (1998), I started to learn to program in dBase 4. dBase 4 and the compiler Clipper were not cheap, especially for a $5-a-week paper-round. The box with the software was unwanted by a local company and it came with the manuals. We didn't have the internet at home yet so I was left to go by the manual, and what I could find from second-hand stores and office cleanout sales. For the next decade, I learnt to case based on what I could find, borrow and scavange until in 2002 when I got a copy of Linux and assembled a couple of machines from unwanted parts from the village computer store.

This is where I discovered free and open-source software and really started to build on my coding skills.

My goals were to learn and to share what I'd learnt that others could get to where they needed to go faster. It also helped that software skills were well sought-after in Europe so it set off me in a career in IT.

20 years after I learnt to code, I've moved out of software-engineering and into Learning and Development at Dimension Data for a 29,000 person technology company that operates in 49 countries across the world. My current roles involves about 3-months a year of travel (15 countries typically), managing a department of over 30 people spread across 4 countries and 4 timezones and delivering on large and complex initiatives with high-degrees of change and short deadlines.

In 2016 I made a choice after getting promoted into my current role that I would continue to contribute to the open-source projects I'd worked on for years. But I set myself 3 rules;

1. I would not take away from time with my family
2. I would not interfere with my work commitments
3. I would look after my health

My open-source contributions

For the past 4 years I've made around 1,000-2,000 contributions annually. These have consisted of bug fixes, submissions, and to around 50 projects.

The largest contributions I've made have been to Apache Libcloud, a multi-cloud abstraction library written in Python. Initially this was driven by a work commitment to contribute an integration with the cloud API we'd designed, but I soon realised the power of the library. Going back to my original goal of free and open access to knowledge, I'd seen an alarming trend in the computing world. Proprietary APIs were driving what is known in the industry as "stickiness" or to be frank, lock-in.

Cloud lock-in means that anyone without access to a reliable network, money or willing to sign up to these contracts is being pushed out of advances in technology. I know developers that are students, in remote areas such as rural Australia, Asia and Africa, or those who simply have little money.

Apache Libcloud's design means that you can design applications which can be deployed to OSS platforms like Apache CloudStack and OpenStack.

After finishing the work driver around 100 hours developing a container abstraction layer for Apache Libcloud that meant that developers could write automation for OSS platforms like Kubernetes using the same API as you would with a public cloud provider.

This was all whilst managing family time, work commitment and my health.

These are my 3 tips for maintaining contributions with a high-pressure job:

1. Pick a project that you care about

This is the most important, something that just sparks your curiosity is good fun, but long term interest often dwindles. I've been victim of "ooh shiny thing" many times in the past, but as my career has taken off, I've had to develop the discipline to stop myself from writing my own scripting language, or building an automated sprinkler system from scratch. I stop and remind myself that I might have the time this second, but what about next week and next month? Stop and prioritise.

Prioritise projects that mean something to you.

The 2 OSS projects I commit the most to are Apache Libcloud and SaltStack. I believe in Apache Libcloud's mission of giving open-access to cloud platforms. My SaltStack contributions have been focused around cloud abstraction, networking API abstraction and other fixes and utilities that make it easier for developers and end-users.

The difference between picking something shiny and something you believe in is that long-term you commit more and you find it easier to jump in and help when you can. But how do you find the time?

2. Choosing your tasks wisely and making time

I get asked this question all the time, "how do you find the time". When I try and convince people to contribute to OSS the response is always about time.

Get rid of the things that don't add value

If you can afford to, hire help to give you back time in your week. Not only does open-source help with your skills and knowledge, but it increases your value to a potential employer. Hiring someone to blow the leaves, or help with the chores once a week doesn't need to cost a lot, but if you work out how much value you can get back from that time it often makes sense.

Another thing I've been strict about is binge-watching TV series and gaming. Playing 100-hours of the latest game might be fun, but I find developing more rewarding in the medium-to-long term. Find ways to unwind that don't consume so much time, like meditation, exercise, or reading.

But, if you do need to put your feet up and watch some TV for a few hours, don't feel guilty about it. 

Work smart, not hard

When I do sit down to contribute something, it'll have been carefully planned and thought through what I'm going to do, what I'm going to test and how I'm going to structure it. I try and complete tasks quickly, with foresight and a goal. Once I've completed this 1 module, with tests, I'll submit my contribution. Don't try and refactor the whole project over a weekend. Keep it simple.

But we all know sometimes the best plans go out the window. If you find yourself going down one of those rabbit holes, where you can't get something to compile or you can't debug one of those zombie bugs we love so much as developers.

Stop yourself.

You can easily sit until 3am banging your head against the wall trying to figure it out. This was my advice when I used to manage development teams. If you get stuck, take a break, ask for help and if that still doesn't work, move onto something else. 

Sometimes I pause working on a task if I can't figure it out. Pause for an hour, a week, or even a whole year. When you have one of those "aha" moments, you go back in and finish the job.

It saves time, it delivers better software and it's a good skill to have as a developer.

Find time

A contribution comes down to 3 things:

1. An idea
2. An understanding
3. A "change", like a fix, feature, test, code-review, documentation etc.

The ideas come to me through reading, listening to users or looking at bug submissions. I do this as and when I have a spare minute. This is normally on my lunch break, when I'm waiting for someone or something. 

The time for understanding I get by listening to podcasts and talking to people at conferences. I get a few hours a week in the car and I spend time doing some chores. During that time I always have headphones on to listen the newest Python podcast or OSS update.

The time to sit down and write, code, or test comes for me on the plane (where I'm writing this blog post!). Last year I did enough miles in the air to fly around the world 8 times, most of that time was spent coding, relaxing or sleeping. Aside from that, time spent in airport lounges, on the train or waiting for people I'll whip out my laptop. Any plane that has Wi-Fi I can push changes, else the minute we land I'll have a laptop open and running git push.

Weekend-time is off limits unless I'm travelling or I'm alone. That's rule 1 -- do not take away from time with the family.

3. Managing your workload and avoiding burnout

There are 2 components to this, managing your work commitment and managing your contributions. You need to do both to succeed. 

It's ok to stop and take a break. There is always a pull-request to merge, a bug to inspect, and an email from an end-user. If you need to take a break for a while, talk to the team, ask for help and be frank. We're all in the same boat, contribution is optional. 

So many times I see people contribution feeling like they have a complete obligation to test and fix bugs at 2am 
and then go to work at 8am. This is normally because they care about the project, they care about quality and they care about their reputation but sometimes you need to step back.

A strong project community will step up and help. If you know that work is going to be tough for the next few months, tell the team and set yourself a limit. Wind back for a bit until things calm down. 

Managing work commitments is tough, because there are often financial consequences (or at least a perception of them).

After 7 hours, you're not really adding value. I used to have a lounge-chair next to my desk and now I have a hammock as I work from home. After a few hours of solid concentration I'll happily go and sit down and do nothing for an hour. Your brain needs a break, sure you'll get the odd "working hard" jab from a passer by but I'm working smarter not harder. Once I'm refreshed I'll finish the next task about 30-40% quicker, to a better level of quality and insight. On the occasion I've done 12-14 hour work days, my brain is shutting down to conserve energy and your critical thinking is the first thing to switch off. Followed by logical thinking, this is where you make mistakes and deliver work that is less than a quality you'd normally expect.

I live close to the beach so my time out is going for a swim in the ocean or spending a bit of time with my family. As a manager I also see a responsibility to make it clear that it's encouraged to step back and recharge. Just in our chat-channel to say that I'll be offline for a couple of hours as I'm going to the beach mid-afternoon. I don't feel guilty about it and I hope they do the same.

Learn how to say no and don't feel guilty about it. When I coach people on this I ask, "who asked you to do this? Was no an option? What value is there in delivering this? What is consequence of not doing it? Who else could do it?"

Everyone wants to be helpful and indispensible, but your reliability is just as important to your reputation and what you deliver. 


Look after your health, be smart with your time and contribute for a cause.

Anthony Shaw is the Group Director of Innovation and Talent Development at Dimension Data, an NTT company. Anthony is an open-source advocate, member of the Apache Software Foundation and Python Software Foundation and active contributor to over 20 open-source projects including Apache Libcloud and SaltStack. At Dimension Data, Anthony is driving digital transformation for Dimension Data’s global clients across 50 countries and 30,000 employees. Key initiatives are software skills, automation, DevOps and Cloud. Anthony is based in Sydney, Australia and blogs about skills, software and automation to 170,000 readers annually.

= = =

"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works". 1) Project Independence 2) All Carrot and No Stick 3) Asynchronous Decision Making 4) Rule of the Makers 5) JFDI --the unconditional love of contributors 6) Meritocracy and Me 7) Learning to Build a Stronger Community 8) Meritocracy. 9) Lowering Barriers to Open Innovation 10) Scratch your own itch. 11) What a Long Strange (and Great) Trip It's Been 12) A Newbie's Narrative 13) Contributing to Open Source even with a high-pressure job

# # # 

Monday February 05, 2018

Success at Apache: A Newbie’s Narrative

by Kuhu Shukla

As I sit at my desk on a rather frosty morning with my coffee, looking up new JIRAs from the previous day in the Apache Tez project, I feel rather pleased. The latest community release vote is complete, the bug fixes that we so badly needed are in and the new release that we tested out internally on our many thousand strong cluster is looking good. Today I am looking at a new stack trace from a different Apache project process and it is hard to miss how much of the exceptional code I get to look at every day comes from people all around the globe. A contributor leaves a JIRA comment before he goes on to pick up his kid from soccer practice while someone else wakes up to find that her effort on a bug fix for the past two months has finally come to fruition through a binding +1.

Yahoo – which joined AOL, HuffPost, Tumblr, Engadget, and many more brands to form the Verizon subsidiary Oath last year – has been at the frontier of open source adoption and contribution since before I was in high school. So while I have no historical trajectories to share, I do have a story on how I found myself in an epic journey of migrating all of Yahoo jobs from Apache MapReduce to Apache Tez, a then new DAG based execution engine.

Oath grid infrastructure is through and through driven by Apache technologies be it storage through HDFS, resource management through YARN, job execution frameworks with Tez and user interface engines such as Hive, Hue, Pig, Sqoop, Spark, Storm. Our grid solution is specifically tailored to Oath's business-critical data pipeline needs using the polymorphic technologies hosted, developed and maintained by the Apache community.

On the third day of my job at Yahoo in 2015, I received a YouTube link on An Introduction to Apache Tez. I watched it carefully trying to keep up with all the questions I had and recognized a few names from my academic readings of Yarn ACM papers. I continued to ramp up on YARN and HDFS, the foundational Apache technologies Oath heavily contributes to even today. For the first few weeks I spent time picking out my favorite (necessary) mailing lists to subscribe to and getting started on setting up on a pseudo-distributed Hadoop cluster. I continued to find my footing with newbie contributions and being ever more careful with whitespaces in my patches. One thing was clear – Tez was the next big thing for us. By the time I could truly call myself a contributor in the Hadoop community nearly 80-90% of the Yahoo jobs were now running with Tez. But just like hiking up the Grand Canyon, the last 20% is where all the pain was. Being a part of the solution to this challenge was a happy prospect and thankfully contributing to Tez became a goal in my next quarter.

The next sprint planning meeting ended with me getting my first major Tez assignment – progress reporting. The progress reporting in Tez was non-existent – "Just needs an API fix,"  I thought. Like almost all bugs in this ecosystem, it was not easy. How do you define progress? How is it different for different kinds of outputs in a graph? The questions were many.

I, however, did not have to go far to get answers. The Tez community actively came to a newbie's rescue, finding answers and posing important questions. I started attending the bi-weekly Tez community sync up calls and asking existing contributors and committers for course correction. Suddenly the team was much bigger, the goals much more chiseled. This was new to anyone like me who came from the networking industry, where the most open part of the code are the RFCs and the implementation details are often hidden. These meetings served as a clean room for our coding ideas and experiments. Ideas were shared, to the extent of which data structure we should pick and what a future user of Tez would take from it. In between the usual status updates and extensive knowledge transfers were made. 

Oath uses Apache Pig and Apache Hive extensively and most of the urgent requirements and requests came from Pig and Hive developers and users. Each issue led to a community JIRA and as we started running Tez at Oath scale, new feature ideas and bugs around performance and resource utilization materialized. Every year most of the Hadoop team at Oath travels to the Hadoop Summit where we meet our cohorts from the Apache community and we stand for hours discussing the state of the art and what is next for the project. One such discussion set the course for the next year and a half for me.

We needed an innovative way to shuffle data. Frameworks like MapReduce and Tez have a shuffle phase in their processing life cycle wherein the data from upstream producers is made available to downstream consumers. Even though Apache Tez was designed with a feature set corresponding to optimization requirements in Pig and Hive, the Shuffle Handler Service was retrofitted from MapReduce at the time of the project's inception. With several thousands of jobs on our clusters leveraging these features in Tez, the Shuffle Handler Service became a clear performance bottleneck. So as we stood talking about our experience with Tez with our friends from the community, we decided to implement a new Shuffle Handler for Tez. All the conversation points were tracked now through an umbrella JIRA TEZ-3334 and the to-do list was long. I picked a few JIRAs and as I started reading through I realized, this is all new code I get to contribute to and review. There might be a better way to put this, but to be honest it was just a lot of fun! All the white boards were full, the team took walks post lunch and discussed how to go about defining the API. Countless hours were spent debugging hangs while fetching data and looking at stack traces and Wireshark captures from our test runs. Six months in and we had the feature on our sandbox clusters. There were moments ranging from sheer frustration to absolute exhilaration with high fives as we continued to address review comments and fixing big and small issues with this evolving feature.

As much as owning your code is valued everywhere in the software community, I would never go on to say “I did this!” In fact, “we did!” It is this strong sense of shared ownership and fluid team structure that makes the open source experience at Apache truly rewarding. This is just one example. A lot of the work that was done in Tez was leveraged by the Hive and Pig community and cross Apache product community interaction made the work ever more interesting and challenging. Triaging and fixing issues with the Tez rollout led us to hit a 100% migration score last year and we also rolled the Tez Shuffle Handler Service out to our research clusters. As of last year we have run around 100 million Tez DAGs with a total of 50 billion tasks over almost 38,000 nodes.

In 2018 as I move on to explore Hadoop 3.0 as our future release, I hope that if someone outside the Apache community is reading this, it will inspire and intrigue them to contribute to a project of their choice. As an astronomy aficionado, going from a newbie Apache contributor to a newbie Apache committer was very much like looking through my telescope - it has endless possibilities and challenges you to be your best.

Kuhu Shukla is a software engineer at Oath and did her Masters in Computer Science at North Carolina State University. She works on the Big Data Platforms team on Apache Tez, YARN and HDFS with a lot of talented Apache PMCs and Committers in Champaign, Illinois. A recent Apache Tez Committer herself she continues to contribute to YARN and HDFS and spoke at the 2017 Dataworks Hadoop Summit on "Tez Shuffle Handler : Shuffling At Scale With Apache Hadoop". Prior to that she worked on Juniper Networks' router and switch configuration APIs. She likes to participate in open source conferences and women in tech events. In her spare time she loves singing Indian classical and jazz, laughing, whale watching, hiking and peering through her Dobsonian telescope.

= = =

"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works". 1) Project Independence 2) All Carrot and No Stick 3) Asynchronous Decision Making 4) Rule of the Makers 5) JFDI --the unconditional love of contributors 6) Meritocracy and Me 7) Learning to Build a Stronger Community 8) Meritocracy. 9) Lowering Barriers to Open Innovation 10) Scratch your own itch. 11) What a Long Strange (and Great) Trip It's Been 12) A Newbie's Narrative

# # #  



Hot Blogs (today's hits)

Tag Cloud