Entries tagged [focuses]

Monday February 05, 2018

Success at Apache: A Newbie’s Narrative

by Kuhu Shukla

As I sit at my desk on a rather frosty morning with my coffee, looking up new JIRAs from the previous day in the Apache Tez project, I feel rather pleased. The latest community release vote is complete, the bug fixes that we so badly needed are in and the new release that we tested out internally on our many thousand strong cluster is looking good. Today I am looking at a new stack trace from a different Apache project process and it is hard to miss how much of the exceptional code I get to look at every day comes from people all around the globe. A contributor leaves a JIRA comment before he goes on to pick up his kid from soccer practice while someone else wakes up to find that her effort on a bug fix for the past two months has finally come to fruition through a binding +1.

Yahoo – which joined AOL, HuffPost, Tumblr, Engadget, and many more brands to form the Verizon subsidiary Oath last year – has been at the frontier of open source adoption and contribution since before I was in high school. So while I have no historical trajectories to share, I do have a story on how I found myself in an epic journey of migrating all of Yahoo jobs from Apache MapReduce to Apache Tez, a then new DAG based execution engine.

Oath grid infrastructure is through and through driven by Apache technologies be it storage through HDFS, resource management through YARN, job execution frameworks with Tez and user interface engines such as Hive, Hue, Pig, Sqoop, Spark, Storm. Our grid solution is specifically tailored to Oath's business-critical data pipeline needs using the polymorphic technologies hosted, developed and maintained by the Apache community.

On the third day of my job at Yahoo in 2015, I received a YouTube link on An Introduction to Apache Tez. I watched it carefully trying to keep up with all the questions I had and recognized a few names from my academic readings of Yarn ACM papers. I continued to ramp up on YARN and HDFS, the foundational Apache technologies Oath heavily contributes to even today. For the first few weeks I spent time picking out my favorite (necessary) mailing lists to subscribe to and getting started on setting up on a pseudo-distributed Hadoop cluster. I continued to find my footing with newbie contributions and being ever more careful with whitespaces in my patches. One thing was clear – Tez was the next big thing for us. By the time I could truly call myself a contributor in the Hadoop community nearly 80-90% of the Yahoo jobs were now running with Tez. But just like hiking up the Grand Canyon, the last 20% is where all the pain was. Being a part of the solution to this challenge was a happy prospect and thankfully contributing to Tez became a goal in my next quarter.

The next sprint planning meeting ended with me getting my first major Tez assignment – progress reporting. The progress reporting in Tez was non-existent – "Just needs an API fix,"  I thought. Like almost all bugs in this ecosystem, it was not easy. How do you define progress? How is it different for different kinds of outputs in a graph? The questions were many.

I, however, did not have to go far to get answers. The Tez community actively came to a newbie's rescue, finding answers and posing important questions. I started attending the bi-weekly Tez community sync up calls and asking existing contributors and committers for course correction. Suddenly the team was much bigger, the goals much more chiseled. This was new to anyone like me who came from the networking industry, where the most open part of the code are the RFCs and the implementation details are often hidden. These meetings served as a clean room for our coding ideas and experiments. Ideas were shared, to the extent of which data structure we should pick and what a future user of Tez would take from it. In between the usual status updates and extensive knowledge transfers were made. 

Oath uses Apache Pig and Apache Hive extensively and most of the urgent requirements and requests came from Pig and Hive developers and users. Each issue led to a community JIRA and as we started running Tez at Oath scale, new feature ideas and bugs around performance and resource utilization materialized. Every year most of the Hadoop team at Oath travels to the Hadoop Summit where we meet our cohorts from the Apache community and we stand for hours discussing the state of the art and what is next for the project. One such discussion set the course for the next year and a half for me.

We needed an innovative way to shuffle data. Frameworks like MapReduce and Tez have a shuffle phase in their processing life cycle wherein the data from upstream producers is made available to downstream consumers. Even though Apache Tez was designed with a feature set corresponding to optimization requirements in Pig and Hive, the Shuffle Handler Service was retrofitted from MapReduce at the time of the project's inception. With several thousands of jobs on our clusters leveraging these features in Tez, the Shuffle Handler Service became a clear performance bottleneck. So as we stood talking about our experience with Tez with our friends from the community, we decided to implement a new Shuffle Handler for Tez. All the conversation points were tracked now through an umbrella JIRA TEZ-3334 and the to-do list was long. I picked a few JIRAs and as I started reading through I realized, this is all new code I get to contribute to and review. There might be a better way to put this, but to be honest it was just a lot of fun! All the white boards were full, the team took walks post lunch and discussed how to go about defining the API. Countless hours were spent debugging hangs while fetching data and looking at stack traces and Wireshark captures from our test runs. Six months in and we had the feature on our sandbox clusters. There were moments ranging from sheer frustration to absolute exhilaration with high fives as we continued to address review comments and fixing big and small issues with this evolving feature.

As much as owning your code is valued everywhere in the software community, I would never go on to say “I did this!” In fact, “we did!” It is this strong sense of shared ownership and fluid team structure that makes the open source experience at Apache truly rewarding. This is just one example. A lot of the work that was done in Tez was leveraged by the Hive and Pig community and cross Apache product community interaction made the work ever more interesting and challenging. Triaging and fixing issues with the Tez rollout led us to hit a 100% migration score last year and we also rolled the Tez Shuffle Handler Service out to our research clusters. As of last year we have run around 100 million Tez DAGs with a total of 50 billion tasks over almost 38,000 nodes.

In 2018 as I move on to explore Hadoop 3.0 as our future release, I hope that if someone outside the Apache community is reading this, it will inspire and intrigue them to contribute to a project of their choice. As an astronomy aficionado, going from a newbie Apache contributor to a newbie Apache committer was very much like looking through my telescope - it has endless possibilities and challenges you to be your best.

Kuhu Shukla is a software engineer at Oath and did her Masters in Computer Science at North Carolina State University. She works on the Big Data Platforms team on Apache Tez, YARN and HDFS with a lot of talented Apache PMCs and Committers in Champaign, Illinois. A recent Apache Tez Committer herself she continues to contribute to YARN and HDFS and spoke at the 2017 Dataworks Hadoop Summit on "Tez Shuffle Handler : Shuffling At Scale With Apache Hadoop". Prior to that she worked on Juniper Networks' router and switch configuration APIs. She likes to participate in open source conferences and women in tech events. In her spare time she loves singing Indian classical and jazz, laughing, whale watching, hiking and peering through her Dobsonian telescope.

= = =

"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works". 1) Project Independence https://s.apache.org/CE0V 2) All Carrot and No Stick https://s.apache.org/ykoG 3) Asynchronous Decision Making https://s.apache.org/PMvk 4) Rule of the Makers https://s.apache.org/yFgQ 5) JFDI --the unconditional love of contributors https://s.apache.org/4pjM 6) Meritocracy and Me https://s.apache.org/tQQh 7) Learning to Build a Stronger Community https://s.apache.org/x9Be 8) Meritocracy. https://s.apache.org/DiEo 9) Lowering Barriers to Open Innovation https://s.apache.org/dAlg 10) Scratch your own itch. https://s.apache.org/Apah 11) What a Long Strange (and Great) Trip It's Been https://s.apache.org/gVuN 12) A Newbie's Narrative https://s.apache.org/A72H

# # #  

Tuesday December 12, 2017

Success at Apache: What a Long Strange (and Great) Trip It's Been

By Jim Jagielski

It is normally during this time of year that people get awful retrospective. We look over the last 12 months and come to terms with what kind of year it has been. We congratulate ourselves on the good and (hopefully) learn from the bad. We basically assess the ending year and start planning, even a little bit, on the one to come.

In general, we reminisce.

I am thinking not about 2017, however, but instead of 1995 and the origins of The Apache Software Foundation. And what a long, strange, and great trip it's been. And how incredibly lucky I've been to be a part of it.

A common saying is that success is mostly about being there at the right place at the right time, and although I'm not sure about the "success" part, it certainly applies to me. At the time I was working at NASA and was starting off a side business as an ISP and Web Hoster, and using the old NCSA web-server. I had created a small reputation for myself as an "expert" on a flavor of UNIX called A/UX, which was Apple's UNIX offering at the time. In addition to being the editor of the FAQ for A/UX, I also ported a bunch of "free software" to that platform and that's how I got started with Apache, providing patches to support A/UX, which is what I used as my web hosting platform. It was really no different than what I did for other software projects at the time.

And then something wonderful happened. I got hooked.

I really, really enjoyed the people I was collaborating with. I wasn't an "outsider" providing patches, I was part of the inner circle. I was a full fledged member of the Apache Group. I started to really understand just how all this really could change the world, and how I could maybe be a small part of it.

As a result, Apache changed my life, literally. Instead of doing software development as a way of "getting my job done" (at NASA, I was a power system engineer, and so I would code modeling and simulation software for spacecraft solar arrays, batteries and orbital mechanics), I starting doing software development as my job, in addition to my hobby. Apache and Open Source became a huge part of my life, and my career changed to focus on Open Source almost primarily, a change that continues to this day.

During this time I've been fortunate enough to work with, and learn from, extremely talented people. Not only related to code, but legal matters, inter-personal skills, presentation skills, etc. I've had opportunities that I never imagined and met people I never would have had expected otherwise. I'm made great friends. I've been mentored by incredibly giving people and have mentored in return. And have seen my mentees become mentors themselves.

Over the years, I've seen Apache grow from a rag-tagged group of people working on a web server to one of the leading Open Source foundations in the world with more than 300 projects under our belt. I've been blessed to serve on the board of the ASF for every single year since we incorporated in 1999, seeing 2nd and now 3rd "generation" Apache Members take on the reins.

The Open Source movement, and especially Apache, have given more to me than I could ever pay back, and that is why I still volunteer and contribute. Of course, to be honest, I still get a kick out of it, and love what I am doing, and continue to enjoy the opportunities and, especially, the people that I get to work with.

But, you see, I'm nothing special. All this is also open and available to you. You too can change the world, and have your world changed in return. We all have talents that can be shared, talents that can be recognized and rewarded. Apache is a family, always looking for new family members. 

So take that first step. Find a project and community you want to a part of. Jump in. Have fun. Grow. Learn. Teach. Live.

But just be prepared to get hooked, and have your life change.

Jim Jagielski is a well known and acknowledged expert and visionary in Open Source, an accomplished coder, and frequent engaging presenter on all things Open, Web and Cloud related. As a developer, he’s made substantial code contributions to just about every core technology behind the Internet and Web and in 2012 was awarded the O’Reilly Open Source Award and in 2015 received the Innovation Luminary Award from the EU. He is likely best known as one of the developers and co-founders of the Apache Software Foundation, where he has previously served as both Chairman and President and where he’s been on the Board Of Directors since day one. Currently he is Vice-Chairman. He's served as President of the Outercurve Foundation and was also a director of the Open Source Initiative (OSI). Up until recently, he worked at Capital One as a Sr. Director in the Tech Fellows program. He credits his wife Eileen in keeping him sane. 

= = =

"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works". 1) Project Independence https://s.apache.org/CE0V 2) All Carrot and No Stick https://s.apache.org/ykoG 3) Asynchronous Decision Making https://s.apache.org/PMvk 4) Rule of the Makers https://s.apache.org/yFgQ 5) JFDI --the unconditional love of contributors https://s.apache.org/4pjM 6) Meritocracy and Me https://s.apache.org/tQQh 7) Learning to Build a Stronger Community https://s.apache.org/x9Be 8) Meritocracy. https://s.apache.org/DiEo 9) Lowering Barriers to Open Innovation https://s.apache.org/dAlg 10) Scratch your own itch. https://s.apache.org/Apah 11) What a Long Strange (and Great) Trip It's Been https://s.apache.org/gVuN

# # # 

Wednesday October 25, 2017

Success at Apache: Scratch Your Own Itch.

By Ignasi Barrera

Recently I was at an industry conference and was happy to see many people stopping by the Apache booth. I was pleased that they were familiar with the Apache brand, yet puzzled to learn that so many were unfamiliar with The Apache Software Foundation (ASF).

It's important to recognize not just Apache's diverse projects and communities, but also the entity behind their success.

Gone are the days when software, and technology in general, was developed privately for the benefit of the few. As technology evolves, the challenges we face become more complex, and the only way to effectively move forward to create the technology of the future is to collaborate and work together. Open Source is a perfect framework for that, and organizations like the ASF carry out a decisive role in protecting its spirit and principles.

The ASF's mission is to provide software for the public good. We take it one step further, by giving all our Open Source software away for free. According to this mission, the foundation was established back in 1999 as a US 501(c)(3) non-profit charitable organization, and constitutes an independent legal entity to which companies and individuals can donate resources and be assured that those resources will be used for the public benefit. Its all-volunteer nature, along with the meritocracy model followed by its communities, are the pillars of the neutral, trusted space where Apache software is developed.

We strongly believe that good software is built by strong communities. Successful Open Source projects are the result of the work and collaboration in their communities and the people behind them. It is all about the people. Experience has shown us that helping people work together as peers is key in producing software in a sustainable way, and we have collected the lessons learned all these years in what we call "The Apache Way".

This Apache Way is a set of core behaviors all Apache projects follow that are designed to ensure projects are independent and diverse, and that anyone can participate no matter what gender, culture, time zone, employer, or even expertise they have. One can start collaborating with a project by contributing patches or implementing new features, but merit is not only measured by code contributions. Helping users, improving documentation, promoting the project, and other non-coding activities are very valuable and recognized as such, and the recognition of this merit and implication is expressed by granting more privileges in the project: from commit access, to invitations to join the Project Management Committee, to invitations to join the ASF Membership. One of the great differentiators between the ASF and other open source foundations is that the ASF does not dictate the technical direction of its projects: each Apache project is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides their respective project's day-to-day operations, including community development and product releases. Meritocracy drives the growth of the communities, and ensures anyone can contribute to projects that are ruled by the people who is involved and really cares about them.

Learning to work this way is not always easy, though. Projects come to the Foundation from very different backgrounds and whilst some of them already have communities that are used to collaborate in open ways, others find it challenging to embrace these core behaviors. The Apache Incubator is the main entry point for codebases and their communities wishing to officially become part of the Foundation, and is where they learn how to put all these principles in practice. Some will find this way of working a good way to rule a project and will graduate as an Apache top-level project, some may find that the Foundation is not the best option for them and choose to leave. Both options are good outcomes, as projects will have invested time in thinking about their community model and how they want governance to be, and this always benefits the Open Source world.

This Open Source model not only exists to create sustainable Open Source projects, but also to meet the expectations of the rest of the world. Software developed at Apache comes with a set of guarantees granted by the popular and business-friendly Apache License, but also with others that are the product of this open governance model, such as project independence or a well-defined project lifecycle. The ASF not only defines how projects operate while active, but also what happens when a project reaches its end-of-life, which is also important for adoption but often not considered by Open Source projects.

These guarantees, along with the reputation earned by many years of producing high-quality open source software, make the +300 freely available Apache projects, from Abdera to HTTP Server to Hadoop to Zookeeper, a trusted choice for individuals and companies looking for Open Source solutions.

The saying "Scratch Your Own Itch" is popular in the tech space, and is an integral principle at the ASF. Apache Committers have a responsibility to the community to help create a product that will outlive the interest of any particular volunteer, as well as for helping to grow and maintain the health of the Apache community.

As an ASF Member, I'm helping with project outreach and mentoring new individuals that make up the greater Apache community.

The Apache Software Foundation provides a safe place for Open Source development, and will keep evolving as technology evolves, welcoming all kinds of projects and communities, and helping people embrace Open Source. Let's see what the future holds for the Open Source world and how we can contribute to making it a better place. Scratch your own itch.

Ignasi Barrera is a long-term Open Source contributor and became involved with the ASF in 2013, when jclouds was first submitted to the Apache Incubator. He is a member of the Apache jclouds Project Management Committee and still actively contributes to the project. Ignasi became an ASF Member in 2015, and helps with community development activities and the promotion of Open Source. 

= = =

"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works". 1) Project Independence https://s.apache.org/CE0V 2) All Carrot and No Stick https://s.apache.org/ykoG 3) Asynchronous Decision Making https://s.apache.org/PMvk 4) Rule of the Makers https://s.apache.org/yFgQ 5) JFDI --the unconditional love of contributors https://s.apache.org/4pjM 6) Meritocracy and Me https://s.apache.org/tQQh 7) Learning to Build a Stronger Community https://s.apache.org/x9Be 8) Meritocracy. https://s.apache.org/DiEo 9) Lowering Barriers to Open Innovation https://s.apache.org/dAlg 10) Scratch your own itch. https://s.apache.org/Apah

# # #

Monday October 02, 2017

Success at Apache: All My Roads Led to Apache

by Pat Ferrel

I became involved with Apache in 2011. After several years in startups where, as CTO, I felt too removed from building things. Looking for a change, I was keenly aware that the most interesting thing about the startups was our early use of Machine Learning techniques and I wanted to see if building ML solutions, for companies new to the field might not be more satisfying. I started by spending nearly a year in researching the type of applications we had needed in the startups: Natural Language Processing (NLP), text analysis, clustering, and classification. In those days Apache Mahout http://mahout.apache.org/ had several good solutions that were designed for Big Data and approachable by an individual. These ideas seem fairly commonplace now but were in early days only 6 years ago.

Given a great platform to experiment with, I built a web site to advertise expertise in ML but also to showcase many examples from my experiments, including a topic-oriented content site based on clustered and classified text that used NLP to add entities to text. I blogged about things I had learned and techniques that produce results.

Then I got the first contact about a project and it was from a completely unexpected direction: recommenders. Fortunately Apache Mahout then had the state-of-the-art OSS suite of recommenders so I took the consulting job. The company had rolled their own recommender and was selling it as a service but it was old and they wanted to investigate replacing it. 

Welcome to Big Data

The nature of recommenders means you deal with huge amounts of data because you have to track several million people’s actions over years. We had data from a large online retailer and were tasked with using this data to beat the in-house recommender. Specifically they wanted to see if they could improve performance (better results and faster compute times) and get something easier to maintain. 

The first job of a good consultant is to define the problem and outline a path to resolution that fits with the company’s competencies. To me this meant looking at the current system and the expertise of the people working on it. We had Data Scientists and Java Software Developers who knew what it was like to deal with Big Data. They had a highly performant method for gathering data and were quite good at running Apache Hadoop-based analytics. This was seldom the case back then but happily allowed me to look at less turnkey applications and assume the use of important Apache tools.

We agreed on a plan and the basic building blocks including a method for comparing results. I did the research and proposed several candidates for the tests including the Apache Mahout recommenders. It was pretty easy to rank the recommender engines we had and do some exploration of parameter tuning and choices to get our best "challenger" results. The nice thing is that we beat the old threadbare in-house recommender by a significant amount (12%). The winner was the Apache Mahout Cooccurrence Recommender using the Log-Likelihood Ratio as the core cooccurrence metric. This even though we had tested against several Matrix Factorization recommenders, including Mahout's. 

We need something new 

Up till this time I was only a user of Apache projects (discounting a few minor code contributions) but what I found in all recommenders we studied is a fundamental problem that is still mostly unsolved today. We had data from a retailer that included user "buys" but also 100 times more user "views". None of the recommenders could deal with this multimodal data. I consulted the authors and maintainers of the Mahout recommenders and several others we had targeted. We got some suggestions added them to our own ideas and set out to test them. For various reasons, that are beyond the scope of this post, none of the easy solutions helped and actually produced worse results so I had fulfilled the contract and left with a feeling of unfinished business.

One of the mentors of Apache Mahout, Ted Dunning, had suggested a new idea during this time. There was something about it that seemed very intriguing. He had proposed a way to use one type of user behavior to predict another. This was an aha moment for me because it codified intuition. I remember the first time he wrote in email on the Mahout user mailing list the equation that crystallized it all. I began to imagine the implications; all sorts of new data that could be useful, not just "views" but contextual data like location, and enrichment data like tag or category preferences. These all seem to obviously have a bearing on recommendations but now we had a beautiful simple equation to test the intuition.

Becoming a Committer

I set out to hack the Mahout Cooccurrence Recommender to become a Correlated Cross-Occurrence (CCO) recommender. But without some way of testing the algorithm and code we couldn’t be sure it was worth including in Mahout. The datasets publicly available at the time did not have the kind of data we needed (there had been no direct use for it until then) so I scraped the film review site rottentomatoes.com to collect "fresh" and "rotten" reviews of movies. This gave us two different behaviors with very different meanings. Naively you might think, weight one positive and the other negative and so did I but that produced worse results than ignoring the "dislikes". However when I ran cross-validation tests comparing the Mahout Cooccurrence Recommender using likes only, to CCO using both user actions, we got some quite interesting results. The question was: do "dislikes" predict "likes" and when I got 20% lift in predictive precision we could conclude that they do. Not only was intuition right but the new algorithm could tease out the data to make use of it.

The hack was accepted into Mahout Examples and I was invited to become a committer. Then the world changed.

Apache Spark and Mahout-Samsara

When I became a committer Mahout was written on Apache Hadoop MapReduce in Java (as was my hack). But it had also become obvious to most Mahout committers that the future was with much more performant engines like Apache Spark. Committers Dmitriy Lyubimov and Sebastian Schelter had been working on a Spark version of Mahout. In an instant of project time virtually all committers saw this as the future of Mahout, if also a major pivot. 

In retrospect I'm not sure I've ever seen an Apache project change so much in so little time. Today Mahout is deprecating lots of old Hadoop MapReduce code as it falls from use and the new Mahout is truly new. The Mahout subtitle Samsara, references the cycle of life, death, and rebirth in the Hindu tradition. Mahout started as algorithms written specifically for MapReduce, now Mahout-Samsara is a linear algebra DSL in Scala used to roll-your-own algorithms but with most interesting algorithms in very simple DSL-based implementations. Mahout eventually took this transformation even further to include other compute engines like Apache Flink and is now running on GPUs. But I get ahead of things...

Those were exciting times and though I helped with the DSL I remained fixed on implementing CCO, which was first included in Mahout 0.10.0 in October 2014.


Now we have the CCO algorithm implemented on modern compute engines but several other problems remained in order to actually deploy a recommender. This is because CCO creates a model that needs to be deployed on a special type of server that computes similarity in real time. In Machine Learning terms this is a K-Nearest Neighbors engine, known in concrete terms as Lucene, or it's scalable server derivatives like Solr and Elasticsearch. A turnkey recommender also requires a highly performant massively scalable DB, like HBase. Putting these together we could get a nearly turnkey recommendation server that made use of multimodal real time user behavior. But I didn't see a candidate for all these in Apache and so looked elsewhere. This required an integration project, not Mahout, which integrated with other services but provided none of its own.

I found a project that included everything I needed and was Apache licensed but was run by a small startup called PredictionIO. They had a Machine Learning Server that was a framework for Templates that could implement a wide range of Algorithms. The Server also included nice high-level integrations with Elasticsearch (Lucene server), Spark, and HBase. In May of 2015 I had the first running CCO Server build on Mahout and a whole list of other Apache projects.

Back to Apache

PredictionIO was at the right place to get swept up in a major move to embrace ML/AI by Salesforce Inc. who bought them as part of the Einstein initiative. Since PIO was Apache licensed OSS it was still available and so was the Template I was calling the Universal Recommender. But there was a question now about the future of PIO; what would Salesforce do with it? The old team, that I had worked closely with, wanted to see the project move forward in OSS and Salesforce seemed to agree, but large corporations often have a mixed record in promoting their own OSS projects. In this case Salesforce decided to remove the question by submitting PredictionIO to the Apache Incubator.

The old team was joined by people like me from outside Salesforce to create a project that follows the Apache Way and is free of corporate dominance. I am a committer to PredictionIO, which has three releases under Apache Incubator vigilance and the Universal Recommender is now at v0.6.0, the most popular of PredictionIO Template Algorithms.

With the 3rd release of PIO from Apache we are now in the process of graduation to an Apache Top-Level Project, hatched by the Apache Incubator. I fully expect that we'll be celebrating soon.


My journey began with a specific problem to solve. Each step to produce the solution has led back to Apache in one way or another, through mentors, collaboration, use of, and commitment to several projects. But I now have my mature scalable, performant, state-of-the-art nearly turnkey Universal Recommender.  Now we can ingest and get improvements from many types of behavior, enrichment data, and context--using it in real time to serve recommendations subject to robust business rules. My small consulting company ActionML actionml.com now has a powerful tool to solve real problems and we make a living (at least partly) by helping people deploy and tune it for their data.

This is a story of someone single mindedly following a goal over several years. There are many ways to do this in the Software Development world, but not all OSS projects are open to bringing people in. The Apache Software Foundation most certainly is and openly recruits as diverse a group of committers and members as possible. If you want to make a difference and influence the course of an OSS project Apache is a good place to look. Start by getting involved with a project of interest, make contributions, get involved in discussions. If the match is good you'll be invited in as a committer and move on from there. I think of Apache as a do-ocracy, if you do something of value it goes a long way towards being invited in.  


Slides describing the CCO Algorithm: https://www.slideshare.net/pferrel/unified-recommender-39986309

IBM DevWorks Post on "Making one thing Predict Another": https://developer.ibm.com/dwblog/2017/mahout-spark-correlated-cross-occurences/

Apache Mahout CCO Implementation: http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html

Apache PredictionIO: http://predictionio.incubator.apache.org/

The Universal Recommender Template: http://predictionio.incubator.apache.org/gallery/template-gallery/

Professional Support for the Universal Recommender: http://actionml.com/universal-recommender

# # #

"Success at Apache" focuses on the processes behind why the ASF "just works". 1) Project Independence https://s.apache.org/CE0V 2) All Carrot and No Stick https://s.apache.org/ykoG 3) Asynchronous Decision Making https://s.apache.org/PMvk 4) Rule of the Makers https://s.apache.org/yFgQ 5) JFDI --the unconditional love of contributors https://s.apache.org/4pjM 6) Meritocracy and Me https://s.apache.org/tQQh 7) Learning to Build a Stronger Community https://s.apache.org/x9Be 8) Meritocracy. https://s.apache.org/DiEo 9) Lowering Barriers to Open Innovation https://s.apache.org/dAlg

Tuesday September 05, 2017

Success at Apache: Lowering Barriers to Open Innovation

By Luke Han

Over the past decade, I was a Java developer using many Apache projects such as Tomcat, Jakarta, Struts, and Velocity. In 2010 I stepped into the Big Data field and started to actively participate in Apache projects, and became an ASF Member 3 years ago. In addition to being the VP of Apache Kylin, I helped projects such as Apache Eagle and CarbonData move to the ASF, and have been a mentor for Apache Superset, Weex, and RocketMQ. Today, I'm co-founder/CEO of Kyligence (prior to that, I was Big Data Product Lead of eBay, and Chief Consultant of Actuate China).

Apache Kylin, as its name may suggest, originated from China ("Kylin": A powerful yet gentle fire-breathing creature in eastern mythology. Also written as Qilin. "Apache Kylin": OLAP on Hadoop, capable of analyzing petabytes of data within seconds http://kylin.apache.org/ ). I started this project with a few members in early 2015. 

As a pioneer of the first highly-recognized Apache project from the Eastern world, I was proud to see that, within 2 years, Kylin has helped over 500 organizations across the globe to solve their Big Data challenges. 

Before Kylin graduated from the Apache Incubator, the Kylin team faced a lot of cultural challenges. Since a great number of projects from China had failed in the past, we too received many questions and doubts from both eastern and western worlds. As our native language is not English, communication with mentors did become difficult during the coaching process. Fortunately, by fully embracing The Apache Way, Kylin is able to succeed with strong support from the Apache community members. Much more beyond the Kylin software, our team has also worked with those talented people in a way to spread our Chinese voice to the world. 

While developing high-quality software, we are engaging more Westerners to understand the Eastern culture. I had many chances to travel and meet people across the globe since I initiated Kylin. Some of them are Apache directors and mentors, some of them are developers and contributors. Some are from US, Australia, Canada and Chile; some are from Japan and Taiwan. Some are impressed with Kylin, some are curious about Easterners’ attitude toward Open Source software. I asked them a lot of questions about The Apache Way, and they all generously coached me and my team with lovely and detailed answers. We too could reach consensuses after intensive and open arguments. Kylin received much more encouragement and recognition than I expected.

As a VP of a Top-Level Project, my responsibility grew after Kylin graduated from the Apache Incubator. Kylin faced more opportunities as it has been bug-fixed quickly and tested frequently, with the nature of an Open Source software. In the China’s well-knowingly-big market, Apache Kylin has received many users’ feedback and evolved fast. We received many suggestions from both developers’ perspective and products’ perspective. Beyond my expectation, many community members are passionately writing tools for Kylin and helping users better understand and use Kylin. Assembling members’ ideas, we are also sharing our knowledge as a way to give back to the community. 

Thanks to ASF and everyone involved in the Open Source community, I have the opportunity to work with people that I’ve always admired and make a difference in the world all together. I feel I and my team are deeply connected with such warm, global, open community.

= = =

"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works". 1) Project Independence https://s.apache.org/CE0V 2) All Carrot and No Stick https://s.apache.org/ykoG 3) Asynchronous Decision Making https://s.apache.org/PMvk 4) Rule of the Makers https://s.apache.org/yFgQ 5) JFDI --the unconditional love of contributors https://s.apache.org/4pjM 6) Meritocracy and Me https://s.apache.org/tQQh 7) Learning to Build a Stronger Community https://s.apache.org/x9Be 8) Meritocracy. https://s.apache.org/DiEo

# # # 

Tuesday August 15, 2017

Success at Apache: Meritocracy.

By Kevin A. McGrail

The Apache Software Foundation is not a democracy.

It's an elitist organization that does not support an innate right to vote. We aren't capitalists because you can't buy a seat on our board. We aren't socialists since we place building working communities over software. Monarchy doesn't fit because Kings and Pawns work together as equals.

What we are is a Meritocracy. To be able to have a say, you have to prove your worth in a system of merit. Meritocracy is a key part of The Apache Way. With it, the ASF creates amazing software with amazing people that continues to change the way the world computes.

Merit has no basis on Age, Sex, Religion, Ethnicity, Race, Country of Origin, Sexual Preference, Social Status, Income Level, Lineage, and/or Physical/Cultural Traits*.

In honor of The Apache Way, the ASF has created two wristbands to share with the tech community. The first is silver and announces our Meritocracy. 

The second, because merit is NOT rooted in biological differences, is brash and bold in Red announcing "do I.T. like a girl".  The idea comes from Code Like A Girl wristbands coupled with a small bit of double entendre to start a conversation about improving inclusion.

If you'd like some wristbands, they'll be debossed in a single color like the pictures below. Just send me an email at kmcgrail(at)apache(dot)org. I'll try and send out as many as I can for free. If you like/hate the idea, feel free to send me an email as well and tell me what you would do differently.

* NOTE: We do take into serious account whether you are a Cat or a Dog person.

Kevin A. McGrail is a cybersecurity expert and Open Source advocate who loves stopping spammers. He got involved with the ASF when the Apache SpamAssassin project joined the foundation in 2004. Today he still helps the SpamAssassin project while also serving as an executive officer and VP of Fundraising.

= = =

"Success at Apache" is a new monthly blog series that focuses on the processes behind why the ASF "just works". 1) Project Independence https://s.apache.org/CE0V 2) All Carrot and No Stick https://s.apache.org/ykoG 3) Asynchronous Decision Making https://s.apache.org/PMvk 4) Rule of the Makers https://s.apache.org/yFgQ 5) JFDI --the unconditional love of contributors https://s.apache.org/4pjM 6) Meritocracy and Me https://s.apache.org/tQQh 7) Learning to Build a Stronger Community https://s.apache.org/x9Be

Monday June 05, 2017

Success at Apache: Learning to Build a Stronger Community

by John Ament

As the next line in the series of "Success at Apache", I had to think about what kind of blog post I wanted to write.  Given my personal focus, it made sense to focus on new projects coming in and the incubator.  When I'm not busy dreaming up new ideas and working on personal projects, I'm helping new projects get in to Apache, keeping their goals in alignment with the Apache Way http://apache.org/foundation/governance/ . I'm a member of a few different PMCs here at Apache, notably the Incubator. I'm a mentor to five different podlings right now. While my primary programming focus is on programming models, my podlings are all over the place. Starting a new project here at Apache can be a daunting task: how do I get in? What if I don't build a diverse community?  Becoming a podling has more to do with the community than it does the technical aspects of the project. We don't expect you to be experts in it, but we do expect new projects to be experts in how their own software works. We want to teach you, and we want you to be receptive to learning about The Apache Software Foundation and its best practices.

I'm not sure if everyone does it, but I build a lot of parallels between how an ASF project works and how an Agile team works. Agile teams start off as a bunch of people who don't really know each other but have assembled themselves into an informal team focused on solving a problem, or some number of problems, knowing that they can only do it together. They have common goals and objectives, but lack camaraderie early on to be able to work together smoothly. Over time, they get to know one another, figure out strengths and weaknesses and can resolve issues together. A well-functioning team isn't one at the beginning. It takes time and practice for them to work well - both together and as an outwardly facing unit.

Projects here at Apache follow the same type of maturity progression. Whether it's learning The Apache Way or learning to work with one another, it takes them time to mature and get into a good groove. 

Open Communication
The ASF is pretty big on open communication, wherever it's a sensible solution. We want to discuss with each other what we're doing, ideas around how to solve it and come up with a good solution together, as a team, in an open manner.  

This all ties into agile practices. We host stand ups to talk about what we're doing and see if others have an opinion about what we're doing.

When a project comes to Apache, the original authors need to remember that they're bringing in a lot of experience, and the expectation is that those existing contributors must help get new contributors from the outside - outside their organization specifically - to contribute into the project. By driving towards open communication, outside of your own organization, you're encouraging more people to participate. This sort of governance model ensures that all parties who can participate are aware of decisions being made.

Open Communication isn't for everything though. We need to remember to be respectful in our communications with others and if it's felt that something’s awry - speak privately. But remember that isn't part of the decision making process. Likewise, anytime we're talking about individuals in either a positive or negative way that should be conducted on the private list for a project.

Turning Into a Well Oiled Machine
Once a project begins to grow, new people start to get attracted to it. As a community, you have to figure out how to work together. Building a community of diverse ideas and skills will ensure that new ideas keep flowing. Contributors can react quickly to a user's question on list and help them resolve the problem, put in an enhancement request or get a bug report squashed in a following commit. Time is of the essence right now because I have availability to work on this.

There can't be a long drawn out waterfall style process when dealing with Open Source. At the same time, making sure there's a documented decision process and in sometimes an in depth design is critical for both new contributors and existing alike to come to a shared understanding of what is being proposed.

Projects need to plan for longevity. Longevity comes in many forms. A strong backlog of features is important. Having a diverse set of committers is even more critical. You could even say that each helps create the other. Just like any feature set, we get to a point where the feature is complete enough that we can move on to another feature.  

How do you get there?
Apache's main way to go to these points is to incubate http://incubator.apache.org/ . You can't get to this point by yourselves, experiencing with first-hand from existing Foundation members will help get your community to turn a new leaf and adopt this way of working. We want you to be successful, as long as your project can dedicate itself to the practices that have been set forth within the Foundation.

New projects may be comfortable with a champion http://incubator.apache.org/incubation/Roles_and_Responsibilities.html#Champion that can work with them closely, answering their questions up front. While a lot of the pre-incubation chatter will happen off list, it is important that potential new podlings subscribe to the incubator general list http://incubator.apache.org/guides/lists.html#general+at+incubator.apache.org and understand both the goings on of a podling as well as try to build their list of mentors http://incubator.apache.org/incubation/Roles_and_Responsibilities.html#Mentor in the open. Mentors are extremely important to a podling, and understanding their roles and why you need to pick great mentors is something your champion and the rest of the Incubator community can help explain. Participating in our public discussion lists is sometimes the first step to joining the foundation at a deeper level.

Where do we go next?
If you're a potential new project, feel free to reach out on the Incubator mailing lists http://incubator.apache.org/guides/lists.html#general+at+incubator.apache.org to get started. We'd love to hear from you and get you acquainted with The Apache Software Foundation.

If you're on an existing project, we want to hear your perspectives on how the Foundation works. You may want to reach out to dev@community http://community.apache.org/lists.html to let others know your thoughts, or even just subscribe and see what others have to say. We're all working together to make the foundation better. The more input we receive, both the positive and the negative, will help shape everyone's actions in the community.

= = =

"Success at Apache" is a new monthly blog series that focuses on the processes behind why the ASF "just works". 1) Project Independence https://s.apache.org/CE0V 2) All Carrot and No Stick https://s.apache.org/ykoG 3) Asynchronous Decision Making https://s.apache.org/PMvk 4) Rule of the Makers https://s.apache.org/yFgQ 5) JFDI --the unconditional love of contributors https://s.apache.org/4pjM 6) Meritocracy and Me https://s.apache.org/tQQh

Monday May 01, 2017

Success at Apache: Meritocracy and Me.

by Tom Barber

When Sally asked for volunteers to help with a blog post series "Success at Apache" I realised there was a very human story to tell about how the ASF helped me get to where I am today and hopefully where I'll go tomorrow. Over the years I have worked on and run a number of Open Source projects whilst working with an awful lot of Open Source software. One day I was browsing Slashdot as you do, yeah I know a lot of people disparage it, but it's an awfully hard habit to kick, and without it I wouldn't have got involved in the ASF so I owe it a lot. Anyway, one day when browsing Slashdot I saw this article (https://it.slashdot.org/story/11/01/08/1544204/apache-to-steward-nasa-built-middleware), I had been working in the Open Source business intelligence industry for a few years at that point and I spent a lot of time hacking around and managing data systems, so I wondered how I could get some help out of OODT (http://oodt.apache.org/). Also as a kid I had always loved everything about space, I was a huge Apollo fan, had a small telescope, went to the total eclipse in the UK in 1999 and so on. I thought this OODT project would be a fun way for me to chat nonsense to a few NASA employees, find out how they did stuff and do a bit of Open Source hacking on the side, which would at least let me participate in some NASA related development work, and so it began.

For those of you who haven't heard of Apache OODT it is a middleware layer for building data systems. Originally written by NASA JPL and then Open Sourced to The Apache Software Foundation it provides data ingest capabilities, metadata extraction, data workflows and resource management. I started by asking pretty dumb questions on the IRC channel, posting stuff on the mailing lists and trying to figure out how this reasonably expansive stack of software even operated. Chris Mattmann and Sean Kelly guided me through the opening foray into OODT development and education. Eventually, having submitted a few bug fixes, I volunteered to be a release manager for an OODT release and that got me more involved. Not too long after Sean asked if I fancied having a go at being the project chair, which I duly accepted. Behind the scenes cogs were turning and in a matter of a few weeks, I'd gone from a committer and PMC member to Chair to ASF member, it was certainly a hectic time, trying to keep up with all the new things I had to do, mailing lists to follow and so on, but what a period in my ASF experience, lots of fun!

That was just over 2 years ago and I'm still happily stewarding the OODT folks and keeping the cogs turning, releases happening and Jira tickets triaged. Alongside that it is truly an honour to be involved with the ASF as a member and although the politics can get tedious, the foundation is an amazing place for people to learn to work on great software as part of a distributed team. 18 months ago I was getting a little jaded with the monotony of the BI work I was doing, there are only so many sales databases and budget reports one guy can take and after being in BI 8 or so years I felt like it was time for a change of scenery, I just didn't know what. So I blasted out an email to 10 or so people I knew or had had some contact with over the years who might be able to give me a job, or know someone who was looking for a Java developer, BI guy, Open Source advocate, that type of thing. I'd included some OODT folks on my email, not because I thought there was a chance of a job using it, but just in case they happened to know someone out in California needing some remote help. Everyone said no, except Chris Mattmann who said if I could hold on a few months he might have something for me at NASA! That response floored me, I'd never even considered that as an option and knew that with a young family it would be highly unlikely I'd be able to move to California, so I played along assuming it would fall through. But the as the process dragged on and contracts got drawn up we got closer and closer to it becoming real and the excitement grew, there was the tangible possibility of me fulfilling at least a bit of a life long dream, no I wouldn't be an astronaut, but there was the chance of employment by NASA.

Eventually 6 months or so later, the paperwork was signed and I joined the ranks at NASA JPL, working as an Apache OODT and devops guy. What is great is that having 10 years of business and development experience, I feel like I can very much make a positive contribution to the team, and in part that is down to what I have learnt developing and coding at the ASF. It has been an amazing experience  and a wonderful 12 months. I never dreamt an opportunity like that would arise and it is 100% down to the great work the ASF does in stewarding new projects through the incubation process and into mainstream adoption. Without the ASF I would likely still be a BI guy dealing with run of the mill data warehouses, instead I work on Genomics Search Engines, help hunt criminals on the dark web and a host of other stuff. Life sometimes throws you an opportunity that you don't expect and the ASF certainly facilitated that.

Last week I was in Pasadena, visiting the JPL facility and getting the guided tour, and doing a bit of work. It was amazing talking to such a dedicated group of people who clearly have a big passion for what they do. Getting to see their mission control, the mars rovers and various satellite mock ups was awe inspiring but what excited me the most was getting to sit down and pick the brains of people with whom I have worked with at the ASF for years yet not met in the flesh. Finally making that human connection means a lot.

What the ASF offers here is the ability to learn to work as a distributed team without the pressures of the "real world". Everyone at the ASF, pretty much, is a volunteer and other volunteers recognise that, and so it reduces the pressure, but whilst reducing the pressure it teaches you how to make binding decisions as a disparate group, how to keep records and how to ship good quality code whilst living in different timezones. At the ASF some of us might meet once or twice a year at ApacheCon, Fosdem or elsewhere, but largely all communications is done via mailing list. This can cause issues when people "just want to get it done" but it also provides an immutable record of what is going on in a project and who said what. This proves equally useful out in the "real world" where you want to track business decisions or look up historical records of why a certain choice was made. Also dealing with people who you don't work with on a daily basis also helps you think more about your communication style, what is fine to say and what isn't and also how you structure your communications, which is also very important in a business setting. Do you know the person? Do they understand your nuances? Is English their native language? etc

The other thing I find the ASF offers is understanding. Last year I was diagnosed with Aspergers at the age of 33, which is pretty late. What is nice is that generally, people like to listen, and if you have something that affects your personal or professional life, people who you've met at the ASF will often lend an understanding ear to allow you to off load or discuss something completely unrelated to the project you might be working on. Or like me, you can just stand up at the front of an ApacheCon lightning talk and tell everyone! Either way, you can generally find someone in the Apache family who will provide a sounding board for anything you want to discuss.

These days I spend my spare time still working on OODT stuff, but also doing a lot of public speaking and mentoring and whenever I do I make sure I talk up the Apache Software Foundation because it has given me the chance of a life time and one that I'll be forever grateful for. If you aren't involved in development here at the ASF, get involved, you don't have to be a coder, you just need to like helping out in a fun, Open Source community.

As I mentioned at the start, this blog series is about success at Apache, hopefully this proves that success can come in a number of ways, the ASF was selected by NASA as the home for its data middleware platform, that proves that the NASA deemed the incubation process, the license and ecosystem acceptable, that is success the the Apache Foundation. Similarly the foundation has proved very successful in placing people into employment from a range of different walks of life into new lines of work, and that is exactly what happened to me and the reason I wanted to share my story about success at Apache.

= = =

"Success at Apache" is a new monthly blog series that focuses on the processes behind why the ASF "just works". 1) Project Independence https://s.apache.org/CE0V 2) All Carrot and No Stick https://s.apache.org/ykoG 3) Asynchronous Decision Making https://s.apache.org/PMvk 4) Rule of the Makers https://s.apache.org/yFgQ 5) JFDI --the unconditional love of contributors https://s.apache.org/4pjM



Hot Blogs (today's hits)

Tag Cloud