The Apache Software Foundation Blog

Monday December 03, 2018

Success at Apache: Cookie Monster

by Isabel Drost-Fromm

As a researcher interested in machine learning and Web and social graphs, I joined the Nutch mailing lists back in 2005, when the project was still on SourceForge. I started tinkering with Nutch Writables to store the data I needed for my analysis, something that some may know today as Hadoop Writables. The Nutch wiki still has a link to the material I was able to publish from those experiments: https://wiki.apache.org/nutch/AcademicArticles

After leaving academia I remained on the Nutch and Lucene mailing lists, until one day I saw the idea of an "Apache Text" project mentioned: https://lists.apache.org/thread.html/ac22faddbef946b66d544e590fe1b2a54b60215c98cc38a2f995ee06@1176254016@%3Cdev.lucene.apache.org%3E . I got in touch with Grant Ingersoll, and over the course of half a year that vague idea turned into a plan for a scalable machine learning project at Apache: scalable in terms of community and dataset size, but also commercially friendly when it comes to licensing. Apache Mahout was created.

Some ideas take on a life of their own. The story I'm going to tell has little to do with great technical or economic achievements that were made with software developed at The Apache Software Foundation. However, it has a lot to do with the kind of cross-community links that exist between projects at Apache. It also has a lot to do with the fact that there are people active in Apache projects for whom the project is more than merely a day job.

But let's start at the beginning: a little over a year ago, in April or May 2017, Stefan Rudnitzki, one of my then-new colleagues at Europace AG, was showing me around the office, mentioning in particular that there's space for meetups of 100 to 200 people. It was the year when it was unclear whether or not there would be an ApacheCon EU. The combination of those two pieces of information put an interesting idea in our heads: why not bring people interested in the ASF to Berlin and have them discuss topics like cross-community work, what happens behind the scenes, OSS economics, decentralized project management, and coordinating work without discretionary power?

As a first step, we ran a rough version of the idea past a handful of friends at Apache and received encouragement. The idea got bigger, new aspects were added, and we thought "Let's get more specific!".

Following a working-backwards model, the next thing we wrote was a press release (in big, bold, red letters marked as "draft, imaginary, DO NOT PUBLISH!!!!!!!") describing a conference on all things open source behind the scenes. The format helped identify important open questions, like:

  • "We don't have a name for the event yet!"
  • "We need to decide on a date."
  • "We need to come up with a clearer list of topics to cover."
  • "What's our target audience?"
  • "If this is a full day event – what will we do about catering?"

What helped me personally was having learnt from Sally, in her ASF media training, what a real press release should actually look like.

As for the name missing from the initial press release draft: after weeks of trying several approaches to come up with a catchy name, I went to pick up my child from kindergarten. What caught my eye was a poster announcing a benefit concert to raise donations for better equipment and toys, an *a cappella* concert: .oO(FOSS A Cappella?) .oO(FOSS Backstage?)

We first ran the press-release version of the vision by Europace: although people here run after-hours meetups fairly regularly, hosting an entire full-day conference is a different scale. After the idea had been approved here, we ran it by the Apache Community Development mailing list, which we used to keep the current planning status transparent and public.

With the idea out in the open, it grew beyond something that could easily be run as a small side project. Years ago, to create Berlin Buzzwords, I had worked together with an event agency called newthinking communications GmbH. It was founded in 2003 by Andreas Gebhard and Markus Beckedahl with the aim of creating a network at the interface between digital technologies and society. Today, its focus lies on organising events such as Berlin Buzzwords and FOSS Backstage, as well as on content management services (based on Drupal) for NGOs and political parties. So I got in touch with newthinking and was delighted to receive "Sure, we are going to help out" as an answer.

So, what about the cookies? One of the first offers I received after announcing that we would run a full-day FOSS Backstage Micro Summit in November 2017 was: "If you need support with providing cookies for the coffee break, I'm happy to bake some, if there are no more than 40 attendees." Half-jokingly, I responded that I would add another 40 cookies if someone sent me a 3D model of an ASF feather cookie cutter. Lo and behold, the next thing I knew, someone sent me a model file for an ASF cookie cutter (which has since even made its way to the then VP for trademarks, who was interested in putting it to good use himself). Just a few weeks later I attended the Open Source Summit in Prague. Guess what happened? Someone who knew I'd be there brought some printed cookie cutters with him from Australia.

Since then, we have held a one-day, two-track FOSS Backstage Micro Summit in November 2017, kindly hosted by Europace AG. I was able to talk several people into baking ASF cookies (including sugar coatings in the appropriate colours). In addition, with the support of newthinking communications GmbH, the ASF planners, and the ASF Community Development PMC, an Apache Roadshow was co-located with the full FOSS Backstage conference in June this year: a two-day, multi-track event featuring keynotes by Danese Cooper and Shane Coughlan, a host of speakers with all sorts of relevant and inspiring stories to share, and fishbowl discussions on topics like Open Source monetization. One of the loveliest pieces of feedback we received: "This doesn't feel like an inaugural conference, given the professional organisation. You surely did manage to successfully invite people from a great variety of FOSS projects and foundations."

Having a press release draft ready was helpful when we started to drum up interest for the real event: with all the details filled in and the "Draft / Do not share" warning removed, it was sent to the press and published for real.

We started with a scope covering FOSS economics, decentralised organisation, cross-cultural team-building, volunteer motivation, licensing, and legal topics. In 2019 we want to align these aspects towards InnerSource, collaboration principles, and modern work models, so that teams, companies, and organisations can learn from the experiences we all gain while working on Open Source projects. We are glad that newthinking GmbH will back the event again next year.


Isabel Drost-Fromm is a member (and currently a board member) of the Apache Software Foundation and a co-founder of Apache Mahout, and has mentored several incubating projects. Interested in all things search and text mining, with a thorough background in open source project management and open collaboration, she works at Europace AG as Open Source Strategist. True to the nature of people living in Berlin, she loves having friends fly in for a brief visit; as a result, she co-founded and is still one of the creative heads behind both Berlin Buzzwords, a tech conference on all things search, scale, and storage, and FOSS Backstage, a conference on all things Free and Open Source behind the scenes and how they interrelate with business and InnerSource.

= = =

"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works" https://blogs.apache.org/foundation/category/SuccessAtApache

Monday November 05, 2018

Success at Apache: Wearing Small Hats

by Rich Bowen

Within The Apache Software Foundation, many of us have different roles. I am a committer on the Apache httpd project, and also a PMC member on that project. I am the Vice President, Conferences. I am a board member. And I’m a member of the Foundation. I'm also an employee of Red Hat, and may, at times, be perceived to be speaking for my employer.

I am a father, husband, brother, son, employee, and so on. How I interact with my daughter is very different from how I interact with my manager. I use different language, wield different authority, and expect different results.


Ten years ago at ApacheCon in Oakland, Bertrand Delacretaz gave a talk about hats. We all laughed a lot, but he was making a serious point. At the Apache Software Foundation, and indeed in life, we all wear many different hats.

However, whereas it's pretty clear, in real life, whether I’m addressing my daughter or my manager, on Apache mailing lists it's seldom, if ever, clear which hat I'm wearing in any given situation.

I like to operate on the following principle when communicating in the Apache community: Wear the smallest hat possible for the situation, but assume that everyone is seeing the biggest hat possible.

So, what does that mean?

In the list above of my Apache hats (Committer, PMC Member, Foundation Member, V.P. Conferences, Director), there are various levels of authority. As a project committer, I can make code changes, but as a PMC member, I can reject other people’s changes. As a Foundation Member, I can express an opinion, but as a Director, I can state the official position of the Foundation.

The difficulty comes when, on a mailing list, I say something, intending it to be my personal opinion (i.e., Foundation Member hat) and someone reads it as the official position of the Foundation (i.e., Foundation Director hat).

Thus, in any given situation, I have an obligation to wield the smallest stick appropriate to the situation, and also to clearly communicate in what capacity I am speaking, if there's any chance of confusion, by saying things like "speaking as a member, and expressing my private opinion …", or "It is the opinion of the Board of Directors that …". And since there's always a chance of confusion, due to many factors, it's worthwhile to make this clarification almost every time if you're in a position where you do, in fact, wear multiple hats.

By wearing the smallest hat possible (i.e., speaking with the voice that carries the least authority), you leave other people free to express their own dissenting opinions without feeling that they have already been overruled. This is in line with our culture of providing a level playing field, where all voices are equal, and all opinions are weighed the same.

Rich Bowen has been doing open source-y stuff since about 1995, and has been a member of the Apache Software Foundation since 2002. He currently serves on the ASF Board of Directors. By day, he's the CentOS Community Manager, working for Red Hat.

= = =

Monday October 01, 2018

Success at Apache: carrying forward the benefits

by Mikael Ståldal

Back in 2013, I worked as a software developer and architect at a small IT company. We needed a logging framework for the Java-based server platform we were developing. Initially, we tried to use the java.util.logging framework built into the Java platform, but we soon realized that it was insufficient for our needs.

Looking for alternatives, I found Apache Log4j 1.2 and Logback 1.0. I was not very impressed with either of them, and a bit frustrated with the fragmentation of logging frameworks for Java.

Then I discovered Apache Log4j 2, which was still in beta but looked promising, and I decided to try it out. It worked well for us, and I decided to get involved to improve and contribute to it. I hoped that it would one day become the standard logging framework for the Java platform.

We used Graylog to aggregate and analyze log events, and I developed a plugin for Log4j 2 to make it easier to integrate with Graylog. This plugin, GelfLayout, was contributed to the project and became part of Log4j 2. Then I was invited to become a committer in the Log4j 2 project.

In 2015 I got a new job at another IT company. We also needed a Java-based logging framework, and I introduced Log4j 2. We found the plugin I had developed at my previous job useful. Later, I developed another plugin for our needs, KafkaAppender, and contributed it to the project. Then I was invited into the Apache Logging project (as a committer in September 2015, and later as a member of the Project Management Committee in June 2016).

So, thanks to open source and the Apache Software Foundation, I was able to develop a software component when working for one employer, and then continue to use it at my next employer.

How can you do this yourself? Most likely you already use some open source software at work, and there are probably cases where it could be changed to fit your needs better. Make use of your right to make changes, and consider contributing them upstream if they constitute generic improvements. That will benefit the larger community by improving widespread software. It will also benefit the company you work for, since it will have less of a maintenance burden when newer versions are released. And finally, it will benefit your future self when you work for another company using the same open source software. Read more about how to contribute to an Apache open source project here: http://apache.org/foundation/getinvolved.html


Mikael Ståldal works as a software developer, developing mostly in Java and Scala. He has worked at various IT companies in Stockholm. Mikael is also a committer and PMC member of the Apache Logging Services project.

= = =

Wednesday September 05, 2018

Success at Apache — 赢在 Apache: If it helps others, all the better.

by Sally Khudairi, with contributions by Ignasi Barrera, Von Gosling, Luke Han, Kevin A. McGrail, and Anthony Shaw. Translations by Ted Liu and David Zhenwei Dong.

I became active in The Apache Software Foundation at its inception in 1999. I am responsible for elevating the ASF's visibility, and supporting the Foundation by counseling 350+ Apache projects and their communities in the areas of messaging, outreach, and engagement.

As a global, virtual, and diverse community, the ASF relies on countless Apache Members, Committers, and Contributors to help share our values and explain our processes with others. We have grown from a single project to hundreds of projects and communities https://projects.apache.org/committees.html?date through "The Apache Way": an inclusive process and judicious reinforcement of "Community Over Code". 

We launched the "Success at Apache" blog series following the Media & Analyst Training at ApacheCon Seville in 2016. I asked ASF Board Member and VP Brand Management Mark Thomas what he thought were some of the reasons the ASF "just works". His immediate response was: "project independence". I asked if he'd put that in writing, in his own words, as our community members' personal experiences help others see The Apache Way through their unique perspectives. Shortly afterwards, we published "Project Independence" https://blogs.apache.org/foundation/entry/success_at_apache_project_independence and "Success at Apache" was born https://blogs.apache.org/foundation/feed/entries/atom?cat=SuccessAtApache

Whilst English is the ASF's official language, localization often helps foster understanding, encourage adoption, and onboard new contributors more quickly. A while back, ASF Member Ted Liu told me that he and some of his coworkers had translated a handful of our "Success at Apache" blog posts into Mandarin Chinese. He asked if that would be useful to us.

Why yes, of course: new approaches to promote and propagate The Apache Way are always appreciated.

. . . 

赢在 Apache ("Success at Apache" in Chinese)

- Meritocracy, by Kevin A. McGrail, Vice President of ASF Fundraising. Translation by David Zhenwei Dong.

- Lowering Barriers to Open Innovation, by Luke Han, Vice President of Apache Kylin. Translation by David Zhenwei Dong.

- Scratch Your Own Itch, by Ignasi Barrera, a member of the Apache jclouds PMC. Translation by Ted Liu.

- Contributing to Open Source Even with a High-pressure Job, by Anthony Shaw, contributor to over 20 Open Source projects, including Apache Libcloud. Translation by Ted Liu.

- Open Innovation from a Non-native English Country, by Von Gosling, original co-founder of Apache RocketMQ. Translation by Ted Liu.

. . .

Ted's action reflects one of our greatest successes at Apache: the mindset of "If this is helpful to me, that's good ('scratch your own itch'). If it helps others, all the better."

After all, we didn't become the world's largest Open Source foundation by not being helpful. There's always something needing to be done in an all-volunteer community: if you'd like to help the ASF, we will happily accept your assistance where possible. Plus, helping others feels pretty great.

Whether contributing code and writing documentation http://apache.org/foundation/getinvolved.html , mentoring community members http://community.apache.org/ , supporting the ASF through an individual donation or corporate sponsorship http://apache.org/foundation/contributing.html , or serving in myriad other ways https://helpwanted.apache.org/ , we thank you.

For an immersive, rewarding experience with dozens of Apache projects and hundreds of user and developer community members, consider joining us at ApacheCon in Montreal 24-27 September 2018 http://apachecon.com/ . This year's event is extra special, as we're celebrating the 20th Anniversary of ApacheCon —huzzah! All are welcome https://blogs.apache.org/comdev/entry/my-first-experience-of-apachecon

We look forward to seeing you there and sharing your Success at Apache.


Sally Khudairi is Vice President of Marketing & Publicity at The Apache Software Foundation (ASF) where, in 2002, she was elected its first female and non-technical Member. Over her 25-year career on the Web, Khudairi has been lauded as a dynamic communications strategist and expert in next-generation innovations, and has played an integral role in building campaigns for some of the industry’s most prominent standards and organizations. Prior to launching the ASF in 1999, Khudairi was deputy to Sir Tim Berners-Lee as Head of Communications at the World Wide Web Consortium (W3C), overseeing the launch of 17 specifications that include PNG, CSS, HTML4 and XML. She is Managing Director/Luxury & Technology Practice lead at HALO Worldwide and Founder/Chief Marketing Officer at OptDyn.

= = =

Tuesday August 07, 2018

Success at Apache: the Apache Legal Shield - a pragmatic view

by Bertrand Delacretaz

I became active in the ASF in 2001 via Gianugo Rabellino: he was the one who started the discussions with Apache FOP about me donating the jfor XSL-FO to RTF converter that I had developed earlier. It was already too late to uninvent RTF, which is a terrible format, but I digress. I am currently a member of the Board of Directors of the ASF and have been doing a lot of thinking (and presentations) about what makes the ASF tick in terms of collaboration and Shared Neurons.

Section 12.1 of the Apache Bylaws https://www.apache.org/foundation/bylaws describes the legal protection that the Apache Software Foundation provides to our directors, officers and members.

I'm by no means a lawyer, however, and that language is a bit hard for me to parse, so I thought I'd try to clarify what this means for our contributors and learn more about it in the process.

If you go into detail there's certainly more to it, but I think the items below are the absolute basics that every PMC member https://www.apache.org/foundation/how-it-works.html should understand in order to benefit from the legal shield that the Foundation provides.

What is a "Legal Shield"?

An important goal of the Apache Bylaws and policies is to isolate our contributors from any legal action that might be taken against the Foundation, if they act as specified in those policies.

That's what we mean by "legal shield": a way for our individual volunteers to be sheltered from lawsuits directed at the Foundation's projects, as mentioned in our "How the ASF works" document https://www.apache.org/foundation/how-it-works.html .

Acts of the Foundation

The first thing is to make sure our software releases are "Acts of the Foundation" as opposed to something that people do in their own name. This is natural if we follow our release policy https://www.apache.org/legal/release-policy.html , which defines a simple release approval process for releasing source code that makes the project's PMC https://www.apache.org/foundation/how-it-works.html responsible for the release, as opposed to our individual contributors and release managers.

This means that if the released software is ever involved in legal action and someone has to testify or produce information as part of a subpoena, or worse, it's the Foundation that is in charge of that and not our individual contributors. These things happen from time to time; not very often, but they can represent a lot of work and aggravation that none of us are looking for. The 2011 subpoena to Apache around Java and Android http://www.groklaw.net/articlebasic.php?story=20110509221136468 is just one example of that. Having to produce documents reflecting all communications between someone and Apache: how fun is that?

The goal of our release process is to make it very clear what an Apache Release is, and also clarify that anyone using our software in other ways, by getting it directly from our code repositories for example, does so at their own risk. If it's not an Apache Release we didn't give it to them, they grabbed it on their own initiative and have to accept the consequences of that.

The Rest is for Contributors

This leads to a second and related item: developer builds, which happen much more often than releases, often daily, and that people can easily download and use.

Those builds are meant for contributors to our projects, to use in development and testing as part of their contribution activities.

To avoid any confusion, it is important to clearly label them as such, and to draw a clear line between them and official Apache Releases. They should only be advertised in places where developers who are part of our communities (as opposed to the general public) can see them, and with suitable disclaimers.

In our world of continuous deployment and automated builds, the lines between what's a release and what's just tagged code that works for someone are often blurred. That's totally fine from a technical point of view, and often desirable when one wants to move fast, but we shouldn't forget about the possible legal implications of distributing software.

Let's make sure we take advantage of the well-designed Apache Legal Shield that the Foundation provides to us, by strictly following our release policy and clearly specifying what is what in terms of downloadable software.

I never thought I'd write a blog post on a legal topic, so here's the FUN DISCLAIMER: as mentioned, I am by no means a lawyer, and the above should not be considered legal advice, just a pragmatic view that can hopefully help our contributors better understand the related issues. For legal advice, consult your own legal advisor! And if you're thirsty after reading all this, get a drink and give a toast to the ASF and its founders!

Many thanks to the fellow Apache members who provided feedback and additional ideas for this post.

. . . 

Bertrand Delacretaz works as a Principal Scientist with the Adobe Research team in Basel, Switzerland. He spends a good portion of his time advocating and implementing Open Development as a way to make geographically dispersed teams more efficient and more fun for his coworkers. Bertrand is also an active Member of the Apache Software Foundation, currently on his tenth term on the Foundation's Board of Directors (Fiscal Year 2018-2019).

= = =

Monday July 09, 2018

Success at Apache: The Apache Way for Executives

by Alex Karasulu

I'm a long time member of the Apache Software Foundation and have been an executive officer of several corporations over the course of the past 20 years. I've co-founded several projects in the community and mentored several others.

The "Apache Way" has benefited several aspects of my life, however I never imagined it would help make me a better executive. Even non-technical executives, in organizations totally outside of the realm of technology, can benefit from the Zen of the Apache Way.

Life is hard when you're stupid

I was involved in a number of early dot com startups as an executive; however, that was before my involvement with Apache and long before any exposure to the Apache Way. To this day, I remember how opportunistic decisions for short-term gains, and a lack of collaboration, openness, and communication, kept causing friction that made my job, and ultimately my life, much harder than it had to be.

Learning while on the job

My exposure to the philosophy began early, while I was still lurking on mailing lists, but picked up while incubating the Apache Directory Project, where I worked with others to grow an active community. Meanwhile, I was the Chief Technology Officer of a large financial services company called Alliance Capital Partners. It was 2002, and the first time I had to conduct myself as a C-Suite executive in an enterprise that was obviously not a technology company. Incidentally, the lack of hands-on coding got me working on a pet project that ultimately became the Apache Directory Server and Apache MINA. The project was medicine to keep me sane and technically up to date. Unbeknownst to me, this would save my career, not as a developer, but as an executive.

The Apache Way makes life easier

The first and most important lesson I learned from the Apache Community was to avoid short-term gains that were unsustainable in the long term. This very important core principle derives in part from the concept of "community over code". It does not matter how much code you write, or how good your code is, if you cannot get along, compromise, and communicate respectfully with your peers. The code does not write itself; it's the community behind it that keeps the code alive. Involving only the most technically proficient contributors should never trump the need to build a sustainable community. I often saw projects suffer when self-centered yet skilled coders were added as committers for short-term gain, to the detriment of a healthy, sustainable community. So, as a corollary to community over code, avoid short-term gains that get in the way of the long-term sustainability of an organization's culture. This has immense applications for any executive in both technical and non-technical fields.

While growing my new development organization at this financial services company, I decided to avoid hiring people who seemed to be very skilled technically but lacked the desire or social skills to collaborate with others. Thanks to my experiences at Apache, I could tell them apart much better than I could before. I was also calmer and less anxious when hiring to fill gaps on the team. It was better not to have the resource than to introduce a bad apple onto the team.

This was contrary to how I had operated earlier, and it started producing great results. The application of this basic principle led to a solid team that worked together better than ever before. Through collaboration, they were able to leverage each other's skills to outperform any one skilled developer. This is all thanks to the concept of community over code, where social skills and collaboration were stressed more than technical skills. In the end, being kind, listening, and asking smart questions begets the kind of collaboration needed to build complex software.

Not only did this help with developers, it also worked with teams that did not produce code, like the project managers in the CTO's office. The rule is golden, and IMHO should be applied to any executive's decision-making process regardless of the nature of the business or topic at hand.

Inner Source is the Apache Way

Executives drive the architecture and cultural direction of their organizations and the Apache Way provides a solid framework to create healthy foundations through open collaboration, communication and the availability of knowledge for everyone to participate.

Several very successful technology companies have adopted the Apache Way without really realizing they're doing so. In 2000, Tim O'Reilly coined the term Inner Source https://en.wikipedia.org/wiki/Inner_source to describe applying Open Source principles within any organization. Tim was essentially talking about applying the Apache Way within organizations. The Apache Way has proven itself at companies like IBM, Google, Microsoft, SAP, PayPal, and even financial institutions like Capital One, which have adopted the Inner Source methodology, which is one and the same.

Without going into the details, of which we in the Apache Community are intimately aware (using it daily within our projects), I would like to stress how important it is for executives outside of Apache to understand the approach. The Apache Way can save organizations from all-out disaster, not to mention billions of dollars, through its impact on the quality of the services and products they produce. Again, this does not only apply to companies in technological sectors. Capital One, a financial services company, has also used Open Source methods extremely successfully for internal projects https://www.oreilly.com/ideas/using-open-source-methods-for-internal-software-projects .

Conclusions

The Apache Way provides several benefits to executives aware of the approach. Executives can directly integrate the principles of the Apache Way into their own thinking to improve their potential for personal success. However, the biggest value comes from the cultural framework it produces for the entire organization, and to leverage it in their organizations, executives must be aware of it. The Apache Way has personally helped me grow as an effective executive, and it can help others as well. It also provides a compass for how to properly build effective organizations, not only technical ones.


Alex Karasulu is an entrepreneur with over 25 years of experience in the software industry and a recognized leader in the Open Source community. He is widely known as the original author of the Apache Directory Server, used by IBM both as the foundation of the Rational Directory Server and also integrated into the WebSphere Application Server. Alex co-founded several Apache projects, including MINA and Felix, which, along with their communities, thrive independently past his day-to-day involvement in the projects. He is the founder of Safehaus, where he authored the first low-resource mobile OTP algorithms in Open Source with the OATH community, which were later adopted by Google in their Authenticator product. In addition to IBM, Atlassian, Cisco, and Polycom are just a few of the many companies that sell commercial hardware and software solutions that bundle or embed software and products that Alex has created. Alex holds a BSc. in Computer Science and Physics from Columbia University. He is the founder and co-CEO of OptDyn.

= = =

"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works" https://blogs.apache.org/foundation/category/SuccessAtApache

Monday June 04, 2018

Success at Apache: the Chance to Influence the World

by Weiwei Yang

I submitted my first patch to Apache Hadoop in 2015, a very simple bug fix with just a few lines of changes. However, I still vividly remember the great sense of accomplishment I felt when the patch was accepted. It was not about how big the change was, but because I knew even a small change would help a lot of people. This is what I like best about working in Open Source: the work I've done has the chance to influence the world.

As of today, I have contributed nearly 200 patches to Apache Hadoop, over 20k lines of code. I still feel happy when the community accepts my patches. I believe that such passion is essential for an individual contributor to find their way to Apache. Unless your company pays you to work on Open Source, you must find that sense of accomplishment in the work itself; otherwise the commitment won't last. In my case, it took over 3 years before I received commit privileges for Hadoop. In retrospect, it was a tough and challenging journey, but one of fast growth. I am glad I did not give up and finally got to where I am now.

If you are hired by a commercial company that sells products or services powered by Open Source software, then congratulations: you are on a shortcut to Apache. Such companies usually have a strong team working directly on Open Source projects, and a lot of committers. As a member of such an organization, you will have more time to work on the project, faster feedback on your patches, opportunities to participate in more discussions, and much deeper involvement. Unfortunately, I was not working for such a company. Moreover, my native language is not English, and there is a big timezone gap between me and the majority of people in the community. That made my path to Apache much more difficult. I believe there are many people, just like me 3 years ago, who are willing to contribute but find it hard to do so. In this post, I will share some tips on how to work with the Apache community and how to grow into a committer.

First, it's important to know the information that is public to everyone. Every Open Source project has its own tutorials on how to contribute; be sure you have read them before working on any patches. Those documents generally tell you how to contribute code in the "Apache" way, and how to collaborate with the community.

Second, don't hesitate to fix bugs. In fact, I suggest beginning with bug fixes. You may find bugs in your daily work, or among those somebody has reported to the community. Big or small, bugs must be fixed, so fixing them makes it easier to get attention from the community. In an Open Source community, everyone volunteers to review other people's patches, so don't be upset if nobody gets to your patch quickly; try gently pinging the committers who work in that area. But never push them for anything, and always be polite.

More involvement. There are many ways to get more involved. First, if a community holds a MeetUp once in a while, try to attend, even if you are remote or it falls at an inconvenient local time. Such MeetUps can help you gather information on the development status, the current community focus, etc. They also help others get familiar with your face. Second, try to participate in more discussions. These could be discussions on mailing lists, in issue tracking systems, or in a Web conference about a particular issue or design. In my opinion, this is the hardest part, especially for contributors from overseas.

Be self-motivated and passionate. Nobody forces you to work on Open Source projects, so you need to keep motivating yourself. Beyond the sense of accomplishment I mentioned at the start of this post, there are more ways to stay self-motivated. Working in the community gives you the chance to work in a diverse environment and meet people from different companies and different countries; you can get as many chances as you want to solve real, difficult problems and improve your skills; and you can build your reputation in the community, which also helps your career development.

I truly hope my experiences will help people. I am now working at Alibaba Group, which gives me even more reason to write this post. I see a lot of talented people around me; they have solid skills, and they have done and are doing a lot of work to make Hadoop better. They are open to contributing back but are having various difficulties working with the community. I am committed to helping grow this community, and I do believe an open and diverse community will help the project thrive.


Weiwei Yang is a Staff Engineer working at Alibaba Group. He has been working in the Big Data area for over 8 years, most of that time on Apache Hadoop. He has contributed to several Apache projects, such as YARN, HDFS, MapReduce, Ambari and Slider, and is an active Hadoop committer. At present, he is working in Alibaba’s data infrastructure team and is focusing on evolving Apache YARN to support mixed workloads and to improve performance and cluster utilization. Prior to that, he worked at IBM for several years and won multiple Open Source contribution awards.

= = =

"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works" https://blogs.apache.org/foundation/category/SuccessAtApache

# # #

Monday May 07, 2018

Success at Apache: Dip into the Apache Way

by Nick Couchman

Like other recent contributors to this blog, I am not a developer by trade. My day job is as a Linux Systems Engineer and team manager, and, truth be told, my programming skills are not something I would rely on to make a living. Despite these facts, I've found something beyond acceptance in being a part of the Apache Guacamole project: mentoring.

Most of my experience with The Apache Software Foundation (ASF) has been with retrieving the Apache Web Server (httpd) http://httpd.apache.org/ from the download page, and getting involved with the ASF was more accidental than anything else. I've brushed shoulders with the Guacamole project http://guacamole.apache.org/ several times over the past decade. As a systems administrator/engineer, and one who prefers Linux to some of the commercial alternatives, I'm always happy to see software produced that is truly cross-platform, and, as many current trends are demonstrating, Web browser applications are the pinnacle of cross-platform applications. I used Guacamole in various applications at my place of employment, but always saw opportunities to improve it: add a feature here or there, make it more administrator- or user-friendly, etc.

After a recent job change, I found myself with a little more free time than I had previously had, and a desire to do something productive with that time. I started thinking about how I could give back to the Open Source community. I've long been a user of many software packages made freely available to the world, and my appreciation for the developers and companies that produce and support these efforts had, for a while, made me want to return the favor. I also needed to challenge myself and fill some of my free time, and growing my programming skills seemed like a good way to accomplish these goals.

When I settled on Guacamole, I found that it had entered the Apache Incubator http://incubator.apache.org/ program in an effort to get the project accepted by The Apache Software Foundation. I thought that was cool, but didn’t think much more of it at the time, and I knew little about the organization. The Incubator program helps potential ASF projects learn how to create a culture and community that encourage development and interaction.

This culture is created, in large part, by the Apache Way, a set of guiding principles and behaviors for projects within the ASF. One of the biggest keys to my success, thus far, in contributing to the Guacamole project is mentoring. It is not a behavior or principle officially outlined in the Apache Way documents, but rather a byproduct of those principles. It seems very human to be dismissive of people who don't measure up to our standards in one way or another, and my programming skills are, by far, the weakest of any of the current contributors to the Guacamole project. However, instead of ridiculing, dismissing, or discouraging me, the other developers within the project have been accepting and helpful, and have provided guidance.

And, as with any good educational opportunity, they don't do this by giving me the answers or telling me how to do something; they do it by providing examples, references, and pointers that help me think through the why and find my way to the how of writing better code. The result? I still wouldn't rely on my programming skills for my day job, but I've come a long way in the 18 months that I've been a part of the project, and the code I write today is better than when I started.

Finally, this involvement actually makes me better at my day job. Not only does it give me a stronger appreciation for the effort that goes into writing the software that I use on a regular basis, but, more practically, it gives me a stronger set of skills for debugging problems and tracking down bugs that occur. I'm better able to locate the actual cause of problems, provide useful descriptions of those problems, and interact with the software engineers and developers in various places responsible for writing, improving, and supporting those applications.

At this point, my involvement with The Apache Software Foundation is limited to the Guacamole project, and will probably stay that way for the foreseeable future. Still, it's great to be involved with an organization that has a very diverse community of developers and projects, and to know that, should I choose to add another challenge to my life, there are other projects out there that would welcome my involvement and would provide similarly positive experiences, helping me grow in my ability to give back to the open source community. If you're itching to dust off or learn some programming skills, I encourage you to look at the many Apache Software Foundation projects available and jump into one of the communities. You'll almost certainly want to join one of the project's mailing lists, and your involvement can grow from there.

Nick Couchman is a Senior Linux Systems Engineer and Technical Team Lead for a major cosmetics conglomerate, and spends his days trying to convince everyone that they should run more Linux and less...other stuff. He spends his evenings with his family, and his increasingly small amounts of free time contributing to the Apache Guacamole project and learning how to write C, Java, and JavaScript.

= = =

"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works" https://blogs.apache.org/foundation/category/SuccessAtApache

# # # 

Tuesday April 10, 2018

Success at Apache: Am I there yet? A n00b's perspective

by Charles Givre

Let me start out by saying that I am not a developer. I do have a technical background, but I hadn't coded in Java for at least 10 years before I got involved in the Apache Drill project. One has to wonder how, as a non-developer, I ended up as a committer for the Drill project. In this blog post, I'd like to share with you how I came to be involved with the Drill project.

But first, why Drill?

I first heard about Drill at an industry conference several years ago. I was speaking with Dr. Ellen Friedman about some data issues we were having, and she casually asked whether I had tried Drill. I had not heard of it at that point, so I did some research, and it seemed as if Drill could solve a lot of problems that my clients were having. But when I tried using it, I kept getting stuck.

If you aren't familiar with Apache Drill, Drill is an SQL engine that allows you to query any kind of self-describing data. After experimenting with Drill for a while, I was impressed enough to think that the tool had major potential in security. One of the biggest problems that Drill solves is the need to Extract, Transform, Load (ETL) data into an analytic tool before actually analyzing that data. This ETL process adds no real value, costs large enterprises literally millions of dollars, and adds unnecessary delays between the time data is ingested and when it is actually available for analysis. In security applications, this delay translates directly into risk: the longer it takes to make your data available, the longer it may take to find malicious activity, and hence the greater the risk. Therefore, if you're able to query the data without any kind of ETL or ingestion, you are lowering your risk as well as potentially saving millions of dollars.

Getting Involved

Unfortunately, when I started using Drill, I saw this potential, but I couldn't get it to work. My next step was to try to get assistance at my company. I pitched the idea to my company leadership, but it proved very difficult to get the company to pull Java developers from revenue-generating projects to work on this "pie-in-the-sky", unproven project. After spending several months on this, I got really frustrated and decided to try to do it myself; however, I really had no idea what I was doing. I hadn't coded in Java for at least 10 years at the time, and had zero experience with modern Java development tools such as Maven and Git. What I did have was persistence, so I started asking for help and decided to dive right in and start adding the functionality I felt Drill needed to be useful in security applications. I started working on something that someone else had started: the HTTPD format plugin for Drill. Most of the coding was done, but there was still enough left for me to get my hands dirty and start figuring things out.

What I learned

I still would not consider myself a developer, but after getting that particular item committed to the codebase, I learned a lot about how open source projects actually work, as well as about writing production-quality code. Since then, I've tried to add at least one bit of new functionality to each Drill release. I would encourage anyone who is interested in contributing to an Open Source project at the Apache Software Foundation to dive right in and start. There are still a lot of ideas I have for Drill, and I hope to find the time to see them through to implementation.

In conclusion, I'm fairly certain that my involvement with Drill and the Apache Software Foundation is really just beginning. I'm currently working on the O'Reilly book about Apache Drill with a fellow Drill committer. It is my hope that the book will spark additional interest in Apache Drill. Open Source software is at the heart of the ongoing data revolution which is dramatically expanding what is possible with data. I firmly believe that Apache Drill will have a role to play in this data revolution and I'm honored to have the opportunity to play a small role in developing Drill.

Charles Givre CISSP is a Lead Data Scientist at Deutsche Bank where he works in the Chief Information Security Office (CISO). Mr. Givre is an active data science instructor and regularly teaches classes about data science and security at various industry conferences, such as BlackHat. Mr. Givre is a committer for the Apache Drill project and together with Mr. Paul Rogers, is working on the forthcoming O’Reilly book about Apache Drill. He can be reached at cgivre(at)apache(dot)org.  

= = =

"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works" https://blogs.apache.org/foundation/category/SuccessAtApache

# # #

Monday March 05, 2018

Success at Apache: Open Innovation from a Non-native English Country

by Von Gosling

When I saw the "Success at Apache" series, I thought about writing about my Open Source experience over these past few years as someone from a non-native-English-speaking country. Last year, RocketMQ graduated from the Apache Incubator and became an Apache Top-Level Project. As one of the original co-founders of RocketMQ, I was proud to see the Apache RocketMQ community grow ever more diverse. The Apache Software Foundation (ASF), one of the most famous technology brands, hosts projects on which thousands of companies base their software infrastructure. This is evident from the worldwide download mirror activity in the ASF statistics. As an early implementer and pioneer of Open Source in China, I found Apache HTTP Server, Apache Tomcat, Apache Struts 1.x, and Apache Maven to be my favorite software stacks when I was building distributed, high-performance websites.

Last year, I wrote an article about the road to becoming an Apache TLP, which was published in China’s InfoQ. Some people asked me how to be more ‘Apache’ and how to build a more diverse community. These are questions that many people are concerned about. In this blog post, I will address how to collaborate more effectively around the world, especially from non-native-English-speaking countries.

Open Communication
As more and more instant messaging apps appear in the Android and iOS world, the younger generation prefers to communicate that way, and the habit has spread into the daily coding life of most people. However, such conversations are not search-engine friendly, and in most cases the apps do not support multiple channels for multiple languages. I have been involved in many such local technology groups; together we have discussed what went wrong, explored ideas about how to solve it, and come up with good solutions. This method worked for all my past projects, but when we wanted to become more involved in Open Source around the world, it did not work well. I remember clearly that when RocketMQ began discussing its proposal, some people complained about what we were doing in the local community. We learned a lot from this discussion in the community and, as a result, found an effective solution. In the Apache RocketMQ community, we encourage users to ask questions on the user mailing list. To make communication effective, we answer each question in the language in which it was asked. With more and more committers coming from different countries, this approach will help grow a more diverse community. But, as John Ament said in another "Success at Apache" post https://s.apache.org/x9Be , open communication isn't for everything. We also allow private communication between users and us, as some questions might not be appropriate to discuss publicly. But such private communication isn't part of the decision-making process. Likewise, any discussion of individuals, whether positive or negative, should be conducted on the project's private list.

Easy ways to be involved in the community
This is another top concern in the Open Source world. Some people may not know that in China there are many local communities around Apache projects, such as Apache HTTP Server, Apache Tomcat, Apache Spark, and Apache Hadoop. Such projects have corresponding Chinese documentation. At the same time, we try our best to improve the English documentation, and we pay attention to the message behind every documentation page. If anyone finds something in the wording, minor or major, that a native speaker would polish, they can leave a message or send feedback to our dev or user mailing list. Besides documentation, we also hold occasional programming marathons to get more people involved with the community. Through such campaigns we can find more interested users, especially those with cross-domain technology interests. Recently, we offered more tasks in Google Summer of Code, in which students develop Open Source software full-time for three months. We provide mentoring and project ideas, and in return have the chance to get new code developed and, most importantly, to identify and bring in new committers. It is another chance for PMC members to learn how to improve, and for more students to get involved in the community easily.

In China, Internet giants like Alibaba are devoting themselves to Open Source projects, so, based on my personal experience, it made sense to help more excellent Chinese projects come into the Incubator. Right before the Lunar New Year, another well-known project from China, Dubbo, started its Apache journey. I am glad to be a local mentor and hope to continue sharing what we have learned. Thanks to the ASF, more and more Open Source projects will benefit our daily coding. That has great appeal across the world’s Open Source field.

Von Gosling is a senior technology manager working at Alibaba Group. He has extensive software development experience in industry, especially in distributed technology, reliable Web architecture and performance tuning. He holds many patents in distributed systems, recommendation and other areas. He has been a frequent speaker at Open Source and architecture conferences worldwide, including ApacheCon and QCon. He has been the lead for messaging at Alibaba and is a recipient of the Tenth and Sixteenth CJK OSS Awards. He is the original co-founder of Apache RocketMQ and the initiator of the Linux OpenMessaging Standard.

= = =

"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works". 1) Project Independence https://s.apache.org/CE0V 2) All Carrot and No Stick https://s.apache.org/ykoG 3) Asynchronous Decision Making https://s.apache.org/PMvk 4) Rule of the Makers https://s.apache.org/yFgQ 5) JFDI --the unconditional love of contributors https://s.apache.org/4pjM 6) Meritocracy and Me https://s.apache.org/tQQh 7) Learning to Build a Stronger Community https://s.apache.org/x9Be 8) Meritocracy. https://s.apache.org/DiEo 9) Lowering Barriers to Open Innovation https://s.apache.org/dAlg 10) Scratch your own itch. https://s.apache.org/Apah 11) What a Long Strange (and Great) Trip It's Been https://s.apache.org/gVuN 12) A Newbie's Narrative https://s.apache.org/A72H 13) Contributing to Open Source even with a high-pressure job https://s.apache.org/lM9O 14) Open Innovation from a Non-native English Country https://s.apache.org/lh61

# # # 

Monday February 26, 2018

Success at Apache: Contributing to Open Source even with a high-pressure job

by Anthony Shaw

I believe in the mission of the ASF for many reasons, but the first is the reason I got into open-source software: free and open access to knowledge.

Back when I was 12 (1998), I started learning to program in dBase 4. dBase 4 and the Clipper compiler were not cheap, especially on a $5-a-week paper round. I got a boxed copy of the software that a local company no longer wanted, and it came with the manuals. We didn't have the internet at home yet, so I was left to go by the manual and whatever I could find from second-hand stores and office cleanout sales. For the next decade, I learnt to code from whatever I could find, borrow and scavenge; in 2002 I got a copy of Linux and assembled a couple of machines from unwanted parts from the village computer store.

This is where I discovered free and open-source software and really started to build on my coding skills.

My goals were to learn, and to share what I'd learnt so that others could get where they needed to go faster. It also helped that software skills were much sought after in Europe, which set me off on a career in IT.

20 years after I learnt to code, I've moved out of software engineering and into Learning and Development at Dimension Data, a 29,000-person technology company that operates in 49 countries across the world. My current role involves about 3 months a year of travel (typically 15 countries), managing a department of over 30 people spread across 4 countries and 4 timezones, and delivering large and complex initiatives with high degrees of change and short deadlines.

In 2016, after being promoted into my current role, I decided that I would continue to contribute to the open-source projects I'd worked on for years. But I set myself 3 rules:

1. I would not take away from time with my family
2. I would not interfere with my work commitments
3. I would look after my health

My open-source contributions

For the past 4 years I've made around 1,000-2,000 contributions annually. These have consisted of bug fixes and other submissions to around 50 projects.

The largest contributions I've made have been to Apache Libcloud, a multi-cloud abstraction library written in Python. Initially this was driven by a work commitment to contribute an integration with the cloud API we'd designed, but I soon realised the power of the library. Going back to my original goal of free and open access to knowledge, I'd seen an alarming trend in the computing world. Proprietary APIs were driving what is known in the industry as "stickiness" or to be frank, lock-in.

Cloud lock-in means that anyone without access to a reliable network or money, or anyone unwilling to sign up to these contracts, is being pushed out of advances in technology. I know developers who are students, who live in remote areas such as rural Australia, Asia and Africa, or who simply have little money.

Apache Libcloud's design means that you can build applications that can be deployed to OSS platforms like Apache CloudStack and OpenStack.

After finishing the work-driven integration, I spent around 100 hours developing a container abstraction layer for Apache Libcloud, which meant that developers could write automation for OSS platforms like Kubernetes using the same API they would use with a public cloud provider.

This was all whilst managing family time, work commitments and my health.

These are my 3 tips for maintaining contributions with a high-pressure job:

1. Pick a project that you care about

This is the most important tip. Something that just sparks your curiosity is good fun, but long-term interest often dwindles. I've been a victim of "ooh shiny thing" many times in the past, but as my career has taken off, I've had to develop the discipline to stop myself from writing my own scripting language or building an automated sprinkler system from scratch. I stop and remind myself that I might have the time this second, but what about next week and next month? Stop and prioritise.

Prioritise projects that mean something to you.

The 2 OSS projects I commit to most are Apache Libcloud and SaltStack. I believe in Apache Libcloud's mission of giving open access to cloud platforms. My SaltStack contributions have focused on cloud abstraction, networking API abstraction, and other fixes and utilities that make things easier for developers and end-users.

The difference between picking something shiny and something you believe in is that long-term you commit more and you find it easier to jump in and help when you can. But how do you find the time?

2. Choosing your tasks wisely and making time

I get asked this question all the time: "How do you find the time?" When I try and convince people to contribute to OSS, the response is always about time.

Get rid of the things that don't add value

If you can afford to, hire help to give you back time in your week. Not only does open-source help with your skills and knowledge, but it increases your value to a potential employer. Hiring someone to blow the leaves, or help with the chores once a week doesn't need to cost a lot, but if you work out how much value you can get back from that time it often makes sense.

Another thing I've been strict about is limiting binge-watching TV series and gaming. Playing 100 hours of the latest game might be fun, but I find developing more rewarding in the medium-to-long term. Find ways to unwind that don't consume so much time, like meditation, exercise, or reading.

But, if you do need to put your feet up and watch some TV for a few hours, don't feel guilty about it. 

Work smart, not hard

When I do sit down to contribute something, I'll have carefully planned and thought through what I'm going to do, what I'm going to test and how I'm going to structure it. I try to complete tasks quickly, with foresight and a goal. Once I've completed that one module, with tests, I'll submit my contribution. Don't try to refactor the whole project over a weekend. Keep it simple.

But we all know that sometimes the best plans go out the window. Sometimes you find yourself going down one of those rabbit holes, where you can't get something to compile or you can't debug one of those zombie bugs we developers love so much.

Stop yourself.

You can easily sit until 3am banging your head against the wall trying to figure it out. This was my advice when I used to manage development teams. If you get stuck, take a break, ask for help and if that still doesn't work, move onto something else. 

Sometimes I pause working on a task if I can't figure it out. Pause for an hour, a week, or even a whole year. When you have one of those "aha" moments, you go back in and finish the job.

It saves time, it delivers better software and it's a good skill to have as a developer.

Find time

A contribution comes down to 3 things:

1. An idea
2. An understanding
3. A "change", like a fix, feature, test, code-review, documentation etc.

The ideas come to me through reading, listening to users or looking at bug submissions. I do this as and when I have a spare minute, normally on my lunch break or when I'm waiting for someone or something.

I get the time for understanding by listening to podcasts and talking to people at conferences. I get a few hours a week in the car, and I spend some time doing chores. During that time I always have headphones on to listen to the newest Python podcast or OSS update.

The time to sit down and write, code, or test comes for me on the plane (where I'm writing this blog post!). Last year I flew enough miles to go around the world 8 times, and most of that time was spent coding, relaxing or sleeping. Aside from that, whenever I'm in an airport lounge, on the train or waiting for people, I'll whip out my laptop. On any plane that has Wi-Fi I can push changes; otherwise, the minute we land I'll have my laptop open and be running git push.

Weekend-time is off limits unless I'm travelling or I'm alone. That's rule 1 -- do not take away from time with the family.

3. Managing your workload and avoiding burnout

There are 2 components to this: managing your work commitments and managing your contributions. You need to do both to succeed.

It's ok to stop and take a break. There is always a pull-request to merge, a bug to inspect, and an email from an end-user. If you need to take a break for a while, talk to the team, ask for help and be frank. We're all in the same boat, contribution is optional. 

So many times I see contributors who feel a complete obligation to test and fix bugs at 2am and then go to work at 8am. This is normally because they care about the project, they care about quality and they care about their reputation, but sometimes you need to step back.

A strong project community will step up and help. If you know that work is going to be tough for the next few months, tell the team and set yourself a limit. Wind back for a bit until things calm down. 

Managing work commitments is tough, because there are often financial consequences (or at least a perception of them).

After 7 hours, you're not really adding value. I used to have a lounge chair next to my desk, and now that I work from home I have a hammock. After a few hours of solid concentration I'll happily go and sit down and do nothing for an hour. Your brain needs a break. Sure, you'll get the odd "working hard" jab from a passer-by, but I'm working smarter, not harder. Once I'm refreshed I'll finish the next task about 30-40% quicker, with better quality and insight. On the occasions I've done 12-14 hour work days, my brain has started shutting down to conserve energy. Critical thinking is the first thing to switch off, followed by logical thinking, and this is where you make mistakes and deliver work below the quality you'd normally expect.

I live close to the beach, so my time out is going for a swim in the ocean or spending a bit of time with my family. As a manager I also see it as my responsibility to make clear that stepping back to recharge is encouraged. I'll simply post in our chat channel that I'll be offline for a couple of hours because I'm going to the beach mid-afternoon. I don't feel guilty about it, and I hope my team does the same.

Learn how to say no and don't feel guilty about it. When I coach people on this I ask, "Who asked you to do this? Was no an option? What value is there in delivering this? What is the consequence of not doing it? Who else could do it?"

Everyone wants to be helpful and indispensable, but your reliability is just as important to your reputation as what you deliver.

Conclusion

Look after your health, be smart with your time and contribute for a cause.

Anthony Shaw is the Group Director of Innovation and Talent Development at Dimension Data, an NTT company. Anthony is an open-source advocate, member of the Apache Software Foundation and Python Software Foundation and active contributor to over 20 open-source projects including Apache Libcloud and SaltStack. At Dimension Data, Anthony is driving digital transformation for Dimension Data’s global clients across 50 countries and 30,000 employees. Key initiatives are software skills, automation, DevOps and Cloud. Anthony is based in Sydney, Australia and blogs about skills, software and automation to 170,000 readers annually.

= = =

"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works". 1) Project Independence https://s.apache.org/CE0V 2) All Carrot and No Stick https://s.apache.org/ykoG 3) Asynchronous Decision Making https://s.apache.org/PMvk 4) Rule of the Makers https://s.apache.org/yFgQ 5) JFDI --the unconditional love of contributors https://s.apache.org/4pjM 6) Meritocracy and Me https://s.apache.org/tQQh 7) Learning to Build a Stronger Community https://s.apache.org/x9Be 8) Meritocracy. https://s.apache.org/DiEo 9) Lowering Barriers to Open Innovation https://s.apache.org/dAlg 10) Scratch your own itch. https://s.apache.org/Apah 11) What a Long Strange (and Great) Trip It's Been https://s.apache.org/gVuN 12) A Newbie's Narrative https://s.apache.org/A72H 13) Contributing to Open Source even with a high-pressure job https://s.apache.org/lM9O

# # # 

Monday February 05, 2018

Success at Apache: A Newbie’s Narrative

by Kuhu Shukla

As I sit at my desk on a rather frosty morning with my coffee, looking up new JIRAs from the previous day in the Apache Tez project, I feel rather pleased. The latest community release vote is complete, the bug fixes that we so badly needed are in, and the new release that we tested internally on our cluster of many thousands of nodes is looking good. Today I am looking at a new stack trace from a process in a different Apache project, and it is hard to miss how much of the exceptional code I get to look at every day comes from people all around the globe. A contributor leaves a JIRA comment before he goes on to pick up his kid from soccer practice, while someone else wakes up to find that her two months of effort on a bug fix has finally come to fruition through a binding +1.

Yahoo (which joined AOL, HuffPost, Tumblr, Engadget, and many more brands to form the Verizon subsidiary Oath last year) has been at the frontier of open source adoption and contribution since before I was in high school. So while I have no historical trajectories to share, I do have a story about how I found myself on an epic journey of migrating all of Yahoo's jobs from Apache MapReduce to Apache Tez, a then-new DAG-based execution engine.

Oath's grid infrastructure is driven through and through by Apache technologies, be it storage through HDFS, resource management through YARN, job execution frameworks such as Tez, or user interface engines such as Hive, Hue, Pig, Sqoop, Spark and Storm. Our grid solution is specifically tailored to Oath's business-critical data pipeline needs using the polymorphic technologies hosted, developed and maintained by the Apache community.

On the third day of my job at Yahoo in 2015, I received a YouTube link to An Introduction to Apache Tez. I watched it carefully, trying to keep up with all the questions I had, and recognized a few names from my academic reading of YARN ACM papers. I continued to ramp up on YARN and HDFS, the foundational Apache technologies Oath heavily contributes to even today. For the first few weeks I spent time picking out my favorite (necessary) mailing lists to subscribe to and setting up a pseudo-distributed Hadoop cluster. I continued to find my footing with newbie contributions, being ever more careful with whitespace in my patches. One thing was clear: Tez was the next big thing for us. By the time I could truly call myself a contributor in the Hadoop community, 80-90% of the Yahoo jobs were running on Tez. But just like hiking up the Grand Canyon, the last stretch is where all the pain was. Being a part of the solution to this challenge was a happy prospect, and thankfully contributing to Tez became one of my goals for the next quarter.

The next sprint planning meeting ended with me getting my first major Tez assignment: progress reporting. Progress reporting in Tez was non-existent. "Just needs an API fix," I thought. Like almost all bugs in this ecosystem, it was not easy. How do you define progress? How is it different for different kinds of outputs in a graph? The questions were many.

I, however, did not have to go far to get answers. The Tez community actively came to a newbie's rescue, finding answers and posing important questions. I started attending the bi-weekly Tez community sync-up calls and asking existing contributors and committers for course correction. Suddenly the team was much bigger and the goals much more chiseled. This was new to someone like me who came from the networking industry, where the most open parts of the code are the RFCs and the implementation details are often hidden. These meetings served as a clean room for our coding ideas and experiments. Ideas were shared down to which data structure we should pick and what a future user of Tez would take from it. In between the usual status updates, extensive knowledge transfers took place.

Oath uses Apache Pig and Apache Hive extensively and most of the urgent requirements and requests came from Pig and Hive developers and users. Each issue led to a community JIRA and as we started running Tez at Oath scale, new feature ideas and bugs around performance and resource utilization materialized. Every year most of the Hadoop team at Oath travels to the Hadoop Summit where we meet our cohorts from the Apache community and we stand for hours discussing the state of the art and what is next for the project. One such discussion set the course for the next year and a half for me.

We needed an innovative way to shuffle data. Frameworks like MapReduce and Tez have a shuffle phase in their processing life cycle wherein the data from upstream producers is made available to downstream consumers. Even though Apache Tez was designed with a feature set corresponding to optimization requirements in Pig and Hive, the Shuffle Handler Service was retrofitted from MapReduce at the time of the project's inception. With several thousand jobs on our clusters leveraging these features in Tez, the Shuffle Handler Service became a clear performance bottleneck. So as we stood talking about our experience with Tez with our friends from the community, we decided to implement a new Shuffle Handler for Tez. All the conversation points were now tracked through the umbrella JIRA TEZ-3334, and the to-do list was long. I picked a few JIRAs, and as I started reading through them I realized that this was all new code I would get to contribute to and review. There might be a better way to put this, but to be honest it was just a lot of fun! All the whiteboards were full, and the team took walks after lunch to discuss how to go about defining the API. Countless hours were spent debugging hangs while fetching data and looking at stack traces and Wireshark captures from our test runs. Six months in, we had the feature on our sandbox clusters. There were moments ranging from sheer frustration to absolute exhilaration with high fives as we continued to address review comments and fix big and small issues with this evolving feature.
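To make the idea of a shuffle concrete, here is a toy, in-memory sketch in Python, under the assumption of a simple hash-partitioned shuffle: each upstream producer emits key/value records, a stable hash picks which downstream consumer owns each key, and every consumer ends up with all the values for its keys. The names `shuffle` and `partition_for` are hypothetical. This is not how Tez's Shuffle Handler is implemented; a real shuffle service serves producer output to consumers over the network and adds buffering and fault handling on top.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical types for illustration: a producer's output is a list of
# (key, value) records; each consumer receives key -> list of values.
Record = Tuple[str, int]
ConsumerInput = Dict[str, List[int]]


def partition_for(key: str, num_consumers: int) -> int:
    """Pick the consumer that owns a key.

    zlib.crc32 is used instead of hash() because Python salts string
    hashes per process, and a shuffle needs every producer to agree.
    """
    return zlib.crc32(key.encode("utf-8")) % num_consumers


def shuffle(producer_outputs: Iterable[List[Record]],
            num_consumers: int) -> Dict[int, ConsumerInput]:
    """Route every record from every producer to its owning consumer."""
    consumers: Dict[int, Dict[str, List[int]]] = defaultdict(lambda: defaultdict(list))
    for output in producer_outputs:
        for key, value in output:
            consumers[partition_for(key, num_consumers)][key].append(value)
    # Convert the nested defaultdicts to plain dicts for a clean result.
    return {c: dict(groups) for c, groups in consumers.items()}


if __name__ == "__main__":
    producers = [
        [("apple", 1), ("banana", 2)],  # output of producer task 0
        [("apple", 3), ("cherry", 4)],  # output of producer task 1
    ]
    for consumer, groups in sorted(shuffle(producers, num_consumers=2).items()):
        print(consumer, groups)
```

MapReduce-style shuffles also sort each partition by key so that consumers can merge-read it; that step is omitted here to keep the routing idea visible.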

As much as owning your code is valued everywhere in the software community, I would never go on to say “I did this!” In fact, “we did!” It is this strong sense of shared ownership and fluid team structure that makes the open source experience at Apache truly rewarding. This is just one example. A lot of the work that was done in Tez was leveraged by the Hive and Pig communities, and interaction across Apache project communities made the work ever more interesting and challenging. Triaging and fixing issues with the Tez rollout led us to hit a 100% migration score last year, and we also rolled the Tez Shuffle Handler Service out to our research clusters. As of last year we have run around 100 million Tez DAGs with a total of 50 billion tasks over almost 38,000 nodes.

In 2018 as I move on to explore Hadoop 3.0 as our future release, I hope that if someone outside the Apache community is reading this, it will inspire and intrigue them to contribute to a project of their choice. As an astronomy aficionado, going from a newbie Apache contributor to a newbie Apache committer was very much like looking through my telescope - it has endless possibilities and challenges you to be your best.

Kuhu Shukla is a software engineer at Oath and earned her Master's in Computer Science at North Carolina State University. She works on the Big Data Platforms team on Apache Tez, YARN and HDFS with a lot of talented Apache PMCs and Committers in Champaign, Illinois. A recent Apache Tez Committer herself, she continues to contribute to YARN and HDFS and spoke at the 2017 Dataworks Hadoop Summit on "Tez Shuffle Handler : Shuffling At Scale With Apache Hadoop". Prior to that she worked on Juniper Networks' router and switch configuration APIs. She likes to participate in open source conferences and women in tech events. In her spare time she loves singing Indian classical and jazz, laughing, whale watching, hiking and peering through her Dobsonian telescope.

# # #  

Tuesday December 12, 2017

Success at Apache: What a Long Strange (and Great) Trip It's Been

By Jim Jagielski

It is normally during this time of year that people get awfully retrospective. We look over the last 12 months and come to terms with what kind of year it has been. We congratulate ourselves on the good and (hopefully) learn from the bad. We basically assess the ending year and start planning, even a little bit, for the one to come.

In general, we reminisce.

I am thinking not about 2017, however, but about 1995 and the origins of The Apache Software Foundation. And what a long, strange, and great trip it's been. And how incredibly lucky I've been to be a part of it.

A common saying is that success is mostly about being in the right place at the right time, and although I'm not sure about the "success" part, it certainly applies to me. At the time I was working at NASA, was starting a side business as an ISP and Web Hoster, and was using the old NCSA web server. I had built a small reputation for myself as an "expert" on a flavor of UNIX called A/UX, which was Apple's UNIX offering at the time. In addition to being the editor of the FAQ for A/UX, I also ported a bunch of "free software" to that platform, and that's how I got started with Apache, providing patches to support A/UX, which is what I used as my web hosting platform. It was really no different than what I did for other software projects at the time.

And then something wonderful happened. I got hooked.

I really, really enjoyed the people I was collaborating with. I wasn't an "outsider" providing patches, I was part of the inner circle. I was a full fledged member of the Apache Group. I started to really understand just how all this really could change the world, and how I could maybe be a small part of it.

As a result, Apache changed my life, literally. Instead of doing software development as a way of "getting my job done" (at NASA, I was a power system engineer, and so I would code modeling and simulation software for spacecraft solar arrays, batteries and orbital mechanics), I started doing software development as my job, in addition to my hobby. Apache and Open Source became a huge part of my life, and my career shifted to focus almost entirely on Open Source, a change that continues to this day.

During this time I've been fortunate enough to work with, and learn from, extremely talented people, not only in code, but also in legal matters, interpersonal skills, presentation skills and more. I've had opportunities that I never imagined and met people I never would have met otherwise. I've made great friends. I've been mentored by incredibly giving people and have mentored in return, and I have seen my mentees become mentors themselves.

Over the years, I've seen Apache grow from a rag-tag group of people working on a web server to one of the leading Open Source foundations in the world, with more than 300 projects under our belt. I've been blessed to serve on the board of the ASF every single year since we incorporated in 1999, seeing 2nd and now 3rd "generation" Apache Members take the reins.

The Open Source movement, and especially Apache, have given more to me than I could ever pay back, and that is why I still volunteer and contribute. Of course, to be honest, I still get a kick out of it, and love what I am doing, and continue to enjoy the opportunities and, especially, the people that I get to work with.

But, you see, I'm nothing special. All this is also open and available to you. You too can change the world, and have your world changed in return. We all have talents that can be shared, talents that can be recognized and rewarded. Apache is a family, always looking for new family members. 

So take that first step. Find a project and community you want to be a part of. Jump in. Have fun. Grow. Learn. Teach. Live.

But just be prepared to get hooked, and have your life change.


Jim Jagielski is a well known and acknowledged expert and visionary in Open Source, an accomplished coder, and a frequent, engaging presenter on all things Open, Web and Cloud related. As a developer, he’s made substantial code contributions to just about every core technology behind the Internet and Web, and in 2012 was awarded the O’Reilly Open Source Award and in 2015 received the Innovation Luminary Award from the EU. He is likely best known as one of the developers and co-founders of the Apache Software Foundation, where he has previously served as both Chairman and President and where he’s been on the Board of Directors since day one. Currently he is Vice-Chairman. He's served as President of the Outercurve Foundation and was also a director of the Open Source Initiative (OSI). Until recently, he worked at Capital One as a Sr. Director in the Tech Fellows program. He credits his wife Eileen with keeping him sane.

# # # 

Wednesday October 25, 2017

Success at Apache: Scratch Your Own Itch.

By Ignasi Barrera

Recently I was at an industry conference and was happy to see many people stopping by the Apache booth. I was pleased that they were familiar with the Apache brand, yet puzzled to learn that so many were unfamiliar with The Apache Software Foundation (ASF).

It's important to recognize not just Apache's diverse projects and communities, but also the entity behind their success.

Gone are the days when software, and technology in general, was developed privately for the benefit of the few. As technology evolves, the challenges we face become more complex, and the only way to effectively move forward to create the technology of the future is to collaborate and work together. Open Source is a perfect framework for that, and organizations like the ASF play a decisive role in protecting its spirit and principles.

The ASF's mission is to provide software for the public good. We take it one step further by giving all our Open Source software away for free. In line with this mission, the foundation was established back in 1999 as a US 501(c)(3) non-profit charitable organization, and constitutes an independent legal entity to which companies and individuals can donate resources and be assured that those resources will be used for the public benefit. Its all-volunteer nature, along with the meritocracy model followed by its communities, are the pillars of the neutral, trusted space where Apache software is developed.

We strongly believe that good software is built by strong communities. Successful Open Source projects are the result of the work and collaboration in their communities and the people behind them. It is all about the people. Experience has shown us that helping people work together as peers is key in producing software in a sustainable way, and we have collected the lessons learned all these years in what we call "The Apache Way".

This Apache Way is a set of core behaviors all Apache projects follow that are designed to ensure projects are independent and diverse, and that anyone can participate no matter what gender, culture, time zone, employer, or even expertise they have. One can start collaborating with a project by contributing patches or implementing new features, but merit is not only measured by code contributions. Helping users, improving documentation, promoting the project, and other non-coding activities are very valuable and recognized as such, and the recognition of this merit and involvement is expressed by granting more privileges in the project: from commit access, to invitations to join the Project Management Committee, to invitations to join the ASF Membership. One of the great differentiators between the ASF and other open source foundations is that the ASF does not dictate the technical direction of its projects: each Apache project is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides its project's day-to-day operations, including community development and product releases. Meritocracy drives the growth of the communities, and ensures anyone can contribute to projects that are governed by the people who are involved and really care about them.

Learning to work this way is not always easy, though. Projects come to the Foundation from very different backgrounds, and whilst some of them already have communities that are used to collaborating in open ways, others find it challenging to embrace these core behaviors. The Apache Incubator is the main entry point for codebases and their communities wishing to officially become part of the Foundation, and is where they learn how to put all these principles into practice. Some will find this a good way to govern a project and will graduate as an Apache top-level project; some may find that the Foundation is not the best option for them and choose to leave. Both are good outcomes, as projects will have invested time in thinking about their community model and how they want governance to be, and this always benefits the Open Source world.

This Open Source model not only exists to create sustainable Open Source projects, but also to meet the expectations of the rest of the world. Software developed at Apache comes with a set of guarantees granted by the popular and business-friendly Apache License, but also with others that are the product of this open governance model, such as project independence or a well-defined project lifecycle. The ASF not only defines how projects operate while active, but also what happens when a project reaches its end-of-life, which is also important for adoption but often not considered by Open Source projects.

These guarantees, along with the reputation earned by many years of producing high-quality open source software, make the 300+ freely available Apache projects, from Abdera to HTTP Server to Hadoop to ZooKeeper, a trusted choice for individuals and companies looking for Open Source solutions.

The saying "Scratch Your Own Itch" is popular in the tech space, and is an integral principle at the ASF. Apache Committers have a responsibility to the community to help create a product that will outlive the interest of any particular volunteer, as well as for helping to grow and maintain the health of the Apache community.

As an ASF Member, I'm helping with project outreach and mentoring new individuals that make up the greater Apache community.

The Apache Software Foundation provides a safe place for Open Source development, and will keep evolving as technology evolves, welcoming all kinds of projects and communities, and helping people embrace Open Source. Let's see what the future holds for the Open Source world and how we can contribute to making it a better place. Scratch your own itch.


Ignasi Barrera is a long-term Open Source contributor and became involved with the ASF in 2013, when jclouds was first submitted to the Apache Incubator. He is a member of the Apache jclouds Project Management Committee and still actively contributes to the project. Ignasi became an ASF Member in 2015, and helps with community development activities and the promotion of Open Source. 

# # #

Monday October 02, 2017

Success at Apache: All My Roads Led to Apache

by Pat Ferrel

I became involved with Apache in 2011, after several years in startups where, as CTO, I felt too removed from building things. Looking for a change, I was keenly aware that the most interesting thing about the startups was our early use of Machine Learning techniques, and I wanted to see whether building ML solutions for companies new to the field might be more satisfying. I started by spending nearly a year researching the types of applications we had needed in the startups: Natural Language Processing (NLP), text analysis, clustering, and classification. In those days Apache Mahout http://mahout.apache.org/ had several good solutions that were designed for Big Data and approachable by an individual. These ideas seem fairly commonplace now but were still in their early days only 6 years ago.

Given a great platform to experiment with, I built a web site to advertise expertise in ML but also to showcase many examples from my experiments, including a topic-oriented content site based on clustered and classified text that used NLP to add entities to text. I blogged about things I had learned and techniques that produce results.

Then I got the first contact about a project and it was from a completely unexpected direction: recommenders. Fortunately Apache Mahout then had the state-of-the-art OSS suite of recommenders so I took the consulting job. The company had rolled their own recommender and was selling it as a service but it was old and they wanted to investigate replacing it. 

Welcome to Big Data

The nature of recommenders means you deal with huge amounts of data because you have to track several million people’s actions over years. We had data from a large online retailer and were tasked with using this data to beat the in-house recommender. Specifically they wanted to see if they could improve performance (better results and faster compute times) and get something easier to maintain. 

The first job of a good consultant is to define the problem and outline a path to resolution that fits with the company’s competencies. To me this meant looking at the current system and the expertise of the people working on it. We had Data Scientists and Java Software Developers who knew what it was like to deal with Big Data. They had a highly performant method for gathering data and were quite good at running Apache Hadoop-based analytics. This was seldom the case back then, and it happily allowed me to look at less turnkey applications and assume the use of important Apache tools.

We agreed on a plan and the basic building blocks, including a method for comparing results. I did the research and proposed several candidates for the tests, including the Apache Mahout recommenders. It was pretty easy to rank the recommender engines we had and do some exploration of parameter tuning and choices to get our best "challenger" results. The nice thing is that we beat the old, threadbare in-house recommender by a significant amount (12%). The winner was the Apache Mahout Cooccurrence Recommender using the Log-Likelihood Ratio as the core cooccurrence metric, even though we had tested against several Matrix Factorization recommenders, including Mahout's.
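Since the paragraph names the Log-Likelihood Ratio as the metric behind the winning recommender, here is a minimal Python sketch of that statistic in its usual 2x2 contingency-table form (Ted Dunning's G-squared test). The function names are illustrative and are not Mahout's API; Mahout ships its own Java implementation. For a pair of items, `k11` counts users who interacted with both, `k12` and `k21` users who interacted with only one of them, and `k22` users who interacted with neither. A higher score means the counts depart further from what independence would predict, which in a recommender usually flags items that cooccur more often than chance.

```python
import math


def x_log_x(x: float) -> float:
    """Return x * ln(x), using the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def unnormalized_entropy(*counts: int) -> float:
    """N * H(p) for counts summing to N, i.e. N ln N - sum(k ln k)."""
    total = sum(counts)
    return x_log_x(total) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """G-squared statistic for a 2x2 contingency table of user counts.

    k11: users who interacted with both item A and item B
    k12: users who interacted with A but not B
    k21: users who interacted with B but not A
    k22: users who interacted with neither
    """
    row_entropy = unnormalized_entropy(k11 + k12, k21 + k22)
    col_entropy = unnormalized_entropy(k11 + k21, k12 + k22)
    matrix_entropy = unnormalized_entropy(k11, k12, k21, k22)
    # Clamp tiny negative values that floating-point rounding can produce.
    return max(0.0, 2.0 * (row_entropy + col_entropy - matrix_entropy))


if __name__ == "__main__":
    # Counts exactly match independence, so the score is (close to) zero.
    print(log_likelihood_ratio(10, 10, 10, 10))
    # The two items always appear together: 40 * ln(2), about 27.73.
    print(log_likelihood_ratio(10, 0, 0, 10))
```

In a cooccurrence recommender one would compute this score for each candidate item pair and keep only the highest-scoring pairs as indicators; the threshold or top-N cutoff is a tuning choice.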

We need something new 

Up till this time I was only a user of Apache projects (discounting a few minor code contributions), but what I found in all the recommenders we studied is a fundamental problem that is still mostly unsolved today. We had data from a retailer that included user "buys" but also 100 times more user "views". None of the recommenders could deal with this multimodal data. I consulted the authors and maintainers of the Mahout recommenders and several others we had targeted. We got some suggestions, added them to our own ideas and set out to test them. For various reasons that are beyond the scope of this post, none of the easy solutions helped; they actually produced worse results. So although I had fulfilled the contract, I left with a feeling of unfinished business.

One of the mentors of Apache Mahout, Ted Dunning, had suggested a new idea during this time. There was something about it that seemed very intriguing. He had proposed a way to use one type of user behavior to predict another. This was an aha moment for me because it codified intuition. I remember the first time he wrote the equation that crystallized it all in an email to the Mahout user mailing list. I began to imagine the implications: all sorts of new data could be useful, not just "views" but contextual data like location, and enrichment data like tag or category preferences. These all seem obviously to have a bearing on recommendations, but now we had a beautiful, simple equation to test the intuition.
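For readers who want to see the shape of that idea, the cross-occurrence recommender is often written along the following lines in Mahout's documentation; this is a paraphrase of the published formulation, not necessarily the exact equation from that email. Here A is a users-by-items matrix of the primary action (say "buys"), B is a users-by-items matrix of a secondary action (say "views"), and h_a and h_b are one user's history vectors for those two actions:

```latex
r = [A^{\top}A]\,h_a + [A^{\top}B]\,h_b
```

A^T A is the ordinary cooccurrence matrix, while A^T B is the cross-occurrence matrix that lets one behavior predict another; further actions would add further terms such as [A^T C] h_c. In practice the entries are not used as raw counts but are filtered by a significance test such as the log-likelihood ratio, so that only meaningful correlations contribute to the score vector r.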

Becoming a Committer

I set out to hack the Mahout Cooccurrence Recommender to become a Correlated Cross-Occurrence (CCO) recommender. But without some way of testing the algorithm and code we couldn’t be sure it was worth including in Mahout. The datasets publicly available at the time did not have the kind of data we needed (there had been no direct use for it until then), so I scraped the film review site rottentomatoes.com to collect "fresh" and "rotten" reviews of movies. This gave us two different behaviors with very different meanings. Naively, you might think to weight one positive and the other negative, and so did I, but that produced worse results than ignoring the "dislikes". However, when I ran cross-validation tests comparing the Mahout Cooccurrence Recommender using likes only to CCO using both user actions, we got some quite interesting results. The question was: do "dislikes" predict "likes"? When I got a 20% lift in predictive precision, we could conclude that they do. Not only was intuition right, but the new algorithm could tease out the data to make use of it.

The hack was accepted into Mahout Examples and I was invited to become a committer. Then the world changed.

Apache Spark and Mahout-Samsara

When I became a committer Mahout was written on Apache Hadoop MapReduce in Java (as was my hack). But it had also become obvious to most Mahout committers that the future was with much more performant engines like Apache Spark. Committers Dmitriy Lyubimov and Sebastian Schelter had been working on a Spark version of Mahout. In an instant of project time virtually all committers saw this as the future of Mahout, if also a major pivot. 

In retrospect I'm not sure I've ever seen an Apache project change so much in so little time. Today Mahout is deprecating lots of old Hadoop MapReduce code as it falls out of use, and the new Mahout is truly new. The Mahout subtitle Samsara references the cycle of life, death, and rebirth in the Hindu tradition. Mahout started as algorithms written specifically for MapReduce; now Mahout-Samsara is a linear algebra DSL in Scala for rolling your own algorithms, with most of the interesting algorithms having very simple DSL-based implementations. Mahout eventually took this transformation even further to include other compute engines like Apache Flink and is now running on GPUs. But I get ahead of things...

Those were exciting times and though I helped with the DSL I remained fixed on implementing CCO, which was first included in Mahout 0.10.0 in October 2014.

PredictionIO

Now we had the CCO algorithm implemented on modern compute engines, but several other problems remained before we could actually deploy a recommender. This is because CCO creates a model that needs to be deployed on a special type of server that computes similarity in real time. In Machine Learning terms this is a K-Nearest Neighbors engine, known in concrete terms as Lucene, or its scalable server derivatives like Solr and Elasticsearch. A turnkey recommender also requires a highly performant, massively scalable DB, like HBase. Putting these together we could get a nearly turnkey recommendation server that made use of multimodal, real-time user behavior. But I didn't see a single Apache project that combined all of these, and so looked elsewhere. This required an integration project rather than Mahout, which integrated with other services but provided none of its own.

I found a project that included everything I needed and was Apache licensed, but was run by a small startup called PredictionIO. They had a Machine Learning Server that was a framework for Templates that could implement a wide range of algorithms. The Server also included nice high-level integrations with Elasticsearch (a Lucene server), Spark, and HBase. In May of 2015 I had the first running CCO Server, built on Mahout and a whole list of other Apache projects.

Back to Apache

PredictionIO was in the right place to be swept up in a major move by Salesforce Inc. to embrace ML/AI: Salesforce bought the company as part of its Einstein initiative. Since PredictionIO (PIO) was Apache-licensed OSS, it was still available, and so was the Template I was calling the Universal Recommender. But there was now a question about the future of PIO: what would Salesforce do with it? The old team, whom I had worked closely with, wanted to see the project move forward as OSS, and Salesforce seemed to agree, but large corporations often have a mixed record in promoting their own OSS projects. In this case Salesforce settled the question by submitting PredictionIO to the Apache Incubator.

The old team was joined by people like me from outside Salesforce to create a project that follows the Apache Way and is free of corporate dominance. I am a committer to PredictionIO, which has made three releases under the oversight of the Apache Incubator, and the Universal Recommender, the most popular of the PredictionIO Template algorithms, is now at v0.6.0.

With the third release of PIO from Apache, we are now in the process of graduating from the Apache Incubator to become an Apache Top-Level Project. I fully expect that we'll be celebrating soon.

Postscript

My journey began with a specific problem to solve. Each step toward the solution has led back to Apache in one way or another: through mentors, through collaboration, and through the use of, and commitment to, several projects. I now have my mature, scalable, performant, state-of-the-art, nearly turnkey Universal Recommender. Now we can ingest and benefit from many types of behavior, enrichment data, and context, using them in real time to serve recommendations subject to robust business rules. My small consulting company, ActionML (actionml.com), now has a powerful tool to solve real problems, and we make a living (at least partly) by helping people deploy and tune it for their data.

This is a story of someone single-mindedly following a goal over several years. There are many ways to do this in the software development world, but not all OSS projects are open to bringing people in. The Apache Software Foundation most certainly is, and it openly recruits as diverse a group of committers and members as possible. If you want to make a difference and influence the course of an OSS project, Apache is a good place to look. Start by getting involved with a project of interest: make contributions and take part in discussions. If the match is good, you'll be invited in as a committer and can move on from there. I think of Apache as a do-ocracy: if you do something of value, it goes a long way towards getting you invited in.

References

Slides describing the CCO Algorithm: https://www.slideshare.net/pferrel/unified-recommender-39986309

IBM DevWorks Post on "Making one thing Predict Another": https://developer.ibm.com/dwblog/2017/mahout-spark-correlated-cross-occurences/

Apache Mahout CCO Implementation: http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html

Apache PredictionIO: http://predictionio.incubator.apache.org/

The Universal Recommender Template: http://predictionio.incubator.apache.org/gallery/template-gallery/

Professional Support for the Universal Recommender: http://actionml.com/universal-recommender

# # #

"Success at Apache" focuses on the processes behind why the ASF "just works". 1) Project Independence https://s.apache.org/CE0V 2) All Carrot and No Stick https://s.apache.org/ykoG 3) Asynchronous Decision Making https://s.apache.org/PMvk 4) Rule of the Makers https://s.apache.org/yFgQ 5) JFDI --the unconditional love of contributors https://s.apache.org/4pjM 6) Meritocracy and Me https://s.apache.org/tQQh 7) Learning to Build a Stronger Community https://s.apache.org/x9Be 8) Meritocracy. https://s.apache.org/DiEo 9) Lowering Barriers to Open Innovation https://s.apache.org/dAlg
