The Apache Software Foundation Blog
Sponsor Success at Apache: Exploration and Practice of the Apache Way in Tencent
by Mark Shan, Chairman of Tencent Open Source Alliance and Tencent Cloud Open Source Ecosystem General Manager
The Apache Software Foundation (ASF) manages more than 227 million lines of code, has 206 project management committees, leads more than 350 Apache projects and operates through a merit system, with more than 850 members, 8,100+ committers, and tens of thousands of contributors.
Previously the Apache Group, the ASF has grown to one of the largest open source foundations in the world today. It has built the well-known "Apache Way" through its leadership, sound community, and merit thinking, resulting in a set of schemes that promote the sustainable development of open source communities and guide the practice of open source projects.projects.
Since Tencent Open Source was created 11 years ago, a large number of Tencent engineers have formed a deep connection with the Apache community by participating and contributing to Apache projects. Furthermore, by learning from the Apache Way, Tencent is going through its open source journey.
At ApacheCon@Home 2021, I shared how Tencent thinks, explores, and practices open source according to the Apache Way. Below is a synopsis of this presentation:
The Apache Way's Importance in Community Building
The Apache Way is difficult to define. Although the Apache Way has evolved somewhat over the years, the original intention of "high transparency" has remained unchanged. In Mark Shan’s view, Tencent's learning experience from the Apache Way can be summarized into five main points:
- Everyone has the opportunity to participate and can become a contributor. Contributors can increase their impact and personal growth through their contributions to projects.
- The ASF has a structure that encourages contributions from everyone, regardless of employer. This means that, for example, committer and PMC roles are open to anyone who earns the title. Tencent encourages its engineers to participate in the Apache community actively.
- Understanding and practicing open communication is extremely important. Because open source is the collaboration of a global community, Tencent engineers are able to participate in the Apache open source project through asynchronous collaboration using the mailing list. Code and decision-related communication are open and transparent.
- Reaching consensus when making decisions is strongly encouraged. Consensus can maintain project momentum and productivity. But when a complete consensus is impossible, voting or other coordination is available to arrive at binding decisions.
- The most important point is that the Apache community's motto, "community over code", is often emphasized. Because a healthy community is more important than simply good code. A strong and healthy community can always correct code problems, while an unhealthy community may struggle to maintain the code repository in a sustainable manner. In addition, flexibility is also an integral part of ASF's sustainable open source success.
Tencent's Open Source Way —Inspired by the Apache Way Apache projects and their communities are unique and diverse. In the community-led development process, Apache members formed the Apache Way based on their experience. Many of Tencent's open source practices and results are executed following the model of the Apache Way. After many years of incorporating open source into its culture, Tencent has formed an approach, "open collaboration, open source for good," which reflects Tencent's value and vision, and is honed based on open source best practices. Practicing the Apache Way: Contributing and Donating to open source projects Tencent engineers have been contributing to and helping to lead many ASF projects, including: 1) Big Data Over the past four years, several Tencent engineers helped lead releases for Apache Hadoop 2.8.4 and 2.8.5, Apache Ozone 1.0.0 (Ozone was a sub-project of Apache Hadoop, and became a Top-Level Project in 2020), and Apache Spark 2.3.2. Tencent engineers contributed more than 20 features and optimizations to several versions of Hadoop, and are core contributors in multiple Apache computing and AI frameworks that include Flink, HBase, Hive, MXNet, and Parquet. 2) Middleware In 2019, Tencent donated TubeMQ, its self-developed trillion-level big data component, to the ASF through the Apache Incubator. In 2021 the project officially changed its name to Apache InLong, as part of its incubation process. Tencent Applications based on Apache Projects
In addition to self-developed tools, Tencent widely uses ASF open source projects in its various business systems, with particular focus in big data, API gateways, and observability. As one of the largest daily real-time computing companies, Tencent’s overall big data platform exceeds 5 million cores, and must support single-day access message volume exceeding 55 trillion, real-time computing volume exceeding 65 trillion, and daily analysis tasks reaching 15 million.
More than two dozen Apache projects are used in applications like WeChat, QQ, and Tencent Cloud. These big data projects are used for data transmission, storage, computing, and analysis demand scenarios, supported by other technical fields of service governance that include API gateways and observability.
For big data, Apache Ozone is one of the key projects that provides support for Tencent's business scenarios that require large amounts of data and traffic. As early adopters, Tencent's big data platform deployed an Ozone cluster with more than 1,000 nodes as the back-end storage for big data applications and also uses Ozone as the main storage solution for some private data warehouse projects.
Today, Tencent is connecting more and more businesses to Ozone, including data warehouses, machine learning platforms, Kubernetes cluster mounting volume, and more. Ozone helps Tencent operate stably, steadily, and at scale: of more than a thousand units without manual operation and maintenance intervention. In the process of verification and improvement, Tencent has done a lot of optimization work to improve performance and stability.
In addition, Apache Pulsar integrates messaging, storage and functional computing, and it adopts an architecture that separates storage and computing. Pulsar has successfully supported a large number of data and traffic business scenarios within Tencent Cloud and has some practical experience in cloud native environments, such as solving rapid and dynamic expansion and contraction, improving the utilization of cluster resources, and cluster forms.
For API Gateways, Apache APISIX offers high performance, a friendly developer experience, and rich plug-in capabilities, which is why Tencent's internal business chose it. Using APISIX, Tencent internally shares its self-developed cloud system components to solve business pain points and provide efficient API gateway services; externally, Tencent engineers contribute to the Apache APISIX open source community, expanding its influence, and leads the development of the open source community.
For Observability, Apache SkyWalking's application in Tencent's internal observable platform provides great convenience for selecting client tracing reporting of the microservices system, as well as the mechanism of computing storage separation and multi-layer query to provide very excellent performance output.
Building Tencent Open Source with 3 Major Projects
In addition to sponsoring the ASF at the Platinum level, Tencent is currently active in more than 10 foundations worldwide as a top member, including the Linux Foundation and the CNCF Foundation. Tencent is actively involved with open source communities such as Kubernetes, Spring Cloud, MariaDB, and it is also promoting the implementation of open source projects, products and solutions.
Foundations provide intellectual property management framework, code repositories, issue tracking, technical guidance, project governance, financial and public relations, and other services. By participating in these foundations, Tencent has learned many more mature open source organizational governance models and used them to guide the process to open source its internal projects to the world.
Tencent has open sourced more than 130 independent cloud native, big data, artificial intelligence, database, and other projects to the world. These projects obtained more than 370,000 GitHub Stars, and have exceeded 2,000 contributors.
By open sourcing projects across internal departments and to the world, actively collaborating with developers and communities around the world, and training open source talents, Tencent's open source ecosystem continuously improves and grows.
Tencent Cloud’s future will focus on further strengthening the construction of Tencent's open source ecosystem through three major projects:
1) WeOpen: Tencent Open Source community
Tencent Cloud aims to build WeOpen, a platform for open source communication, promotion, and project incubation.
WeOpen is committed to connecting enthusiasts, practitioners, and leaders with global open source foundations to initiate new projects, co-create communities, and hold activities that extend open source culture and inspire the global open source ecosystem to flourish.
2) Establish an industry open source joint laboratory
The open source laboratory is the landing place for the actual projects. Tencent Cloud plans to successively establish industry joint open source laboratories with well-known Chinese universities and open source organizations to provide a platform for students, researchers, and developers in the enterprise to contribute code and scenarios for open source projects to realize in the industry.
Starting with the "Rhino Bird Open Source Talent Cultivation Plan" held by Tencent this year, open source courses and practice training programs for college students help popularize open source culture, encourage contributions, and further the open source talent ecosystem.
3) Release "Cloud Native Open Source White Paper"
At the Cloud Native Industry Conference last May, Tencent Cloud and China's Institute of Information and Communications Technology announced the official preparation of the Cloud Native Open Source White Paper, which is expected to be released by the end of the year.
Tencent welcomes open source practitioners and companies to join their efforts in the above projects.
For more than 22 years, the ASF has proven that the Apache Way is one of the best practices for building an open organization that balances organizational structure and flexibility. Tencent continues to expand its own open source concepts, methodology, and ecosystem building, with plans to participate in more universities, companies, and organizations, while embracing the Apache Way into the future.
Mark Shan has a long career and practical experience in cloud-native, microservices, big data, edge computing, and open-source ecosystem. As the chairperson of Tencent Open Source Alliance, he works full of passion to build the ecosystem for Tencent Open Source and makes great efforts to accelerate innovation in technology and product with the open-source way. At Tencent Cloud, Mark leads the open-source team and works with organizations and communities including Apache Software Foundation,Linux Foundation, Open Atom Foundation, CAICT, COPU and others to build open-source ecosystem. He is also the observer of Linux Foundation Board, chairperson of TARS Foundation, TOC member of Open Atom Foundation and Magnolia Open Source Community, TSC member of Akraino Edge Stack, fellow of China Cloud Native Industry Alliance, advisor of Open Source Community.
= = =
"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works". "Sponsor Success at Apache" features insights and experiences by select ASF Sponsors https://apache.org/foundation/thanks.html
For more Success at Apache posts, visit https://blogs.apache.org/foundation/category/SuccessAtApache
Posted at 06:44PM Nov 11, 2021 by Sally Khudairi in SuccessAtApache | |
Success at Apache: from Mentee to PMC
by Ephraim Anierobi
This post is about how I became a committer and a Project Management Committee (PMC) member of Apache Airflow, and provides guidance to those new to programming, are new to contributing to open-source projects, and want to become committers and PMC members in their respective Apache projects.
About a year and a half after changing my career from electrical engineering to software development, I became a committer and a Project Management Committee member of Apache Airflow. Becoming a committer and a PMC member is a reward and a kind of validation that you are on the right part of your journey.
On February 16, 2021, I accepted an invitation to become a committer in Apache Airflow. It came as a surprise, as I was not expecting it. Six months down the line, I received another surprise invitation to become a PMC member in Apache Airflow.
These are impressive feats for me because before contributing to Apache Airflow, I didn't have experience working with other programmers. I was making websites and taught a few friends of mine how to make their own. I didn't have a mentor, and no one has ever seen my code to advise whether to continue on my journey or drop the idea of becoming a programmer.
While I desired to work with experienced programmers to improve my skills, I feared people seeing my code would talk me down. I almost gave up on my journey only to come across an Outreachy post on Twitter looking for interns for open source projects. Outreachy is a tech diversity program that provides three months of paid, remote internships to people underrepresented in tech.
I was ready to change my career and was looking for mentorship, but couldn't find an internship that could help me get started in my journey. In Nigeria where I'm living, your location affects your chances of getting an entry-level job. I was not close to the major cities.
So I applied for an internship through Outreachy.
There are two application processes. The initial application involves explaining your background and why you should be accepted into the program. You must pass the initial application before you could proceed to the next. The second application process (called the contribution period) is where you choose an open source project that matches your skill sets and then contribute to it. You must have some minimum contributions before you could be accepted.
That was how I found Apache Airflow.
You could imagine the joy I had when I was accepted into the program.
Here are things I did which I believe would help you in your journey to becoming an Apache committer and a PMC member.
Asking questions is the fastest way to learn. Don't be afraid to ask questions if you do not understand something. I ask questions a lot and I always get answers, but I didn't start by asking questions: I made 40 commits to the repository without understanding what Airflow does. It was not until I joined my new employer Astronomer that I learned what DAG is and what a data pipeline is. Now I can easily reproduce issues following someone's descriptions. I wish I had asked questions earlier --I could have had more experience by now!
If you are like me, with little experience, start contributing from the minor issues. Find good first issues and work on them. You don't have to wait to contribute a large change before contributing.
While working on the REST API project, which I got hired by Outreachy to do, I was looking at the codebase. I started with Airflow providers because it was easy for me to understand. There were so many requests about providers at the time and I started looking into it, reading the code base, and helping with the providers. I didn't go into the core straight up; I avoided it. My first PR was on simple database migration during the Outreachy contribution period.
Airflow is complex. Till now, I'm still learning it. Just last week I learned about how the execution date works. I know there are a lot of other things I have not understood very well but refactoring helped me to understand a lot.
When I was to work in the scheduler, I found the file was so large that I went back and forth without progress. I worked on separating the files and I'm glad I did because after that I could contribute. I recommend refactoring code but do not go into large refactoring. A little at a time, with the hope to understand the project. Avoid the core of the project if you are just starting.
One thing about issues is that most reporters would tell you how to reproduce them. Most times, you would find that the issue is quite easy to fix. I usually jump on those and fix them. Other times, I had to contact my superiors before I could fix it.
Looking at reported issues gives an added advantage that you could learn how the software works in the real world. Try to reproduce as many issues as possible. It adds to your knowledge.
Here's where you can learn a great deal. I start my day by looking at the PRs. Most PRs link to issues. I read the issues and study PRs. I must admit that some of these PRs are just too complex for me. If I don't understand it, sometimes I ask questions, other times I go to the next PR. When I jump to the next PR, I record the topic that made me jump to the next and plan on reading about it some other time.
When you make a PR, ask for reviews in the community channel of communication. Airflow uses Slack and the mailing list for communications. You should ask for reviews in the slack channel and not the mailing list. The reviews not only give information on how to fix the problem but also teach you best practices in programming.
The ASF has a code of conduct that covers the Foundations activities as well as the projects. Read it first.
Among many other things, you would learn in Apache Airflow is communication. How to communicate with people in a civil manner. Spend time reading PR reviews, you will learn a lot and especially how to ask people to make changes to their code.
You don't have to wait for an invitation to contribute to an Apache project. You don't have to become an Outreachy intern to get involved with something you're interested in.
Don't be afraid to make a PR because nobody will penalize you if you're wrong. I know the feeling that people may think you are not good enough, forget it, they know you are new to the field and if you are thinking that they don't know your level in the language, forget it too, they know you are still a junior because it says so in your code. I can't count how many times I have had code reviews that showed me a better way to implement the code. Be open-minded, make mistakes, and excel.
Ephraim Anierobi started to work on the Apache Airflow project as an Outreachy Intern in May 2020. He became a committer in February 2021 and a member of the Apache Airflow Project Management Committee (PMC) in August 2021. He is a software engineer at Astronomer.
Posted at 02:32PM Sep 16, 2021 by Sally Khudairi in SuccessAtApache | |
Success at Apache: Security in Practice
by Jarek Potiuk
This post is about the Apache Software Foundation's Security process and security mindset of the Apache Software project’s PMC put to the best use in practice. From this post you can learn why security practices we apply at our projects are important and how they work when they are applied correctly and when the right security-driven mindset is applied by the PMCs but also how important it is for the users of the Apache Software Foundation projects to keep their software updated - including latest security fixes.
The idea of this article was triggered by a recent blog post of the security researcher Ian Caroll that has earned USD 13.000 on bug bounties by simply following up the results of Apache Security process applied by the Apache Airflow PMC. This saved quite a few businesses a lot of trouble, but it was only possible due to the foundations laid down by the ASF and the PMC of the project.
Here is what Ian Caroll has to say about it: “This issue was a great example of how ASF's transparent way of fixing and disclosing vulnerabilities worked to protect users of their software, and gave many organizations a wake-up call on ensuring they upgrade and protect their open-source software.”
Apache Airflow is one of the most common orchestration software used in the industry currently, and due to its nature, it sounds like an important vector of attack - if you run it internally in your company, you are likely to interact with pretty much all your systems, and if you manage to break in through Airflow, it might cascade into as many systems you connect to. Therefore the Apache Airflow PMC takes security very seriously. So seriously that we have the whole discussion panel about Apache Airflow Security at the Airflow Summit that is coming soon - July 8-16th.
This post's main point is to show how important it is to follow the security best practices for all the software lifecycle and how important it is to think about it at every step of building and releasing the software (and beyond).
Let's start from the very beginning: making sure the code development process is secure. Like most of the ASF projects, the Apache Airflow project is developed in GitHub and together with a growing number of projects we use GitHub Actions to run continuous integration. There are a number of best practices and security hardening practices published by Github that you should follow when you run your CI with GitHub Actions, and we rigorously follow them, including monitoring of the "Security blog of GitHub" and following it’s advisories.
And we have not stopped there. We actively think and discuss the potential security threats and ways how - for example supply chain attacks can be performed on our project, and we share our findings at the discussion mailing lists of the ASF and introducing recommendations for all ASF projects to make use of the best practices. One of the results there is documenting the practices and sharing them at the firstname.lastname@example.org. But we also raised a few security issues to GitHub and as a result of that (at least that’s the feedback we got from GitHub) they implemented some improvements that we apply in practice. The recent example of that is a change implemented by GitHub to allow control of permissions of the GitHub Token used during the CI build which resulted in this PR. Few months ago, we raised concern that having the blanket "write" permission is quite dangerous, and GitHub responded and implemented the change, which allowed us to limit the scope of tokens used for our builds and increase protection against a wide range of attacks - with the supply-chain attacks being recently the most prominent ones, leading to ransomware threats and millions of dollars paid to hackers.
This is where the security mindset for the Apache Airflow PMC starts with and this lays the foundation for the next steps where the Apache Software Foundation takes a crucial role in - releasing the software and monitoring for security vulnerabilities. The ASF has a rather well established process for disclosing and following up with security vulnerabilities for the ASF projects. One that is very straightforward and simple to follow for everyone involved - starting from security researchers, who raise those issues, going through the voluntary (!) security team of the ASF that has to handle (from the upcoming annual report) 387 reports of possible vulnerabilities spanned across 95 of the top level ASF projects, which led to 155 CVEs (Common Vulnerabilities and Exposures) assigned, and end up with the PMC that has to handle solving the issues and follow up with reporting. Heck, ASF even introduced an internal portal to report and keep track of all the CVEs as well as report the yearly security summary report and video.
This process is very clear about responsible disclosure and publishing the vulnerabilities, the way how security researchers, the ASF security team and PMC can collaborate when security is discovered. Quite a recent experience there was discovering and announcing CVE-2021-29621: User enumeration in database authentication in Flask-AppBuilder. This issue was reported to the ASF - following the process - by Dolev Farhi he responsibly disclosed it together with proof-of-concept reproducible scenario that allowed us to quickly verify that the issue exists and (more importantly) that allowed us to verify that the issue is fixed when we fixed it.
At the end of the process this is the message we got from Dolev: "Truly enjoyed working with you. Thanks so much for your help in bringing this to closure and making Airflow what it is."
The CVE was an interesting one because it was not an issue with the Airflow code, but it was introduced by a dependency of Airflow - Flask-AppBuilder. Fortunately the process is built in the way that we can involve and collaborate with other projects in solving it, and we got excellent support from Daniel Gaspar. We tried and tested the fix locally, provided it to Daniel which let Daniel quickly implement it and release a new version of Flask AppBuilder fixing it. This was also important for the Apache Superset project (Daniel is a PMC there as well) which also uses Flask-AppBuilder and suffered from the same vulnerability. This shows how security is a distributed issue and how much cooperation is important and how much a good security process should embrace it. I truly enjoyed cooperation with Daniel, and Dolev as we helped to test release candidate of Flask AppBuilder. Later on, when the CVE was published, we announced it following the regular announcement process.
Here is what Daniel has to say about it: "A great example of multiple open source projects working together, elevating each other to higher quality. The whole is greater than the sum of the parts. Got a clear report with a proposed fix, reproducible steps all backed by the ASF security process, it was a breeze to fix and release."
This leads to the most important point. We can do only as much as we can when it comes to developing and releasing our software. But then it’s up to our users to upgrade to the latest versions. If they don’t, they remain vulnerable. This was the actual reason for the blog post I mentioned initially - despite announcing a CVE-2020-17526 and releasing a fixed version a long time ago, many of our users did not follow the announcements and did not upgrade to the latest version of Airflow. I must stress here the importance of this step - as long as our users do not upgrade to fixed versions, there is not much we can do to help them. It's all in our users' hands! This time it ended up with just USD 13.000 paid to Ian in the form of bounties, because Ian is a responsible security researcher (so called "white hat"). But imagine some bad characters doing the same thing Ian did.
Of course we understand that this might sometimes be difficult to migrate to newer versions of a software, but here we also have another solution that we applied last year, and one that might seem surprising at first, but makes perfect sense when you look at the consequences. Consistent versioning and release support predictability. When we announced Airflow 2.0 last year, there was a small but important change we introduced - full support for Semantic Versioning which we follow rigorously since. We also published a predictable version lifecycle. Why is this important ? Because the users might be pretty sure that they can safely upgrade “patchlevel” version of Airflow when it gets released without even thinking about potential migration problems. Also when you release the "feature" - minor version of Airflow, we promise it is backwards-compatible and even if the migration process might be a bit longer, they can apply it without worrying about spending a lot of time for the migration of their DAGs (DAGs are the users workflow definitions that some of our customers have many thousands of as their entire data processing is orchestrated by Airflow).
We also publish (and will continue to) the support schedule for our major releases, so that the users can be prepared and plan migration to new major releases in advance. As with all software we sometimes will implement backwards-incompatible changes which will cause our users to spend more time on migrations. Those old releases will stop receiving security fixes at some date and the best you can do as a user is to migrate to the supported version before the date!
Which leads to the last and most important point in this article. If you are a diligent reader and look at the announcement I mentioned above for CVE-2021-29621, you will see that the fix for that is only released for Airflow 2 series. Why? Because Airflow 1.10 just reached its end-of-life on June 17th 2021. When we released Airflow 2, half a year ago, we agreed in the community that we will only support Airflow 1.10 with critical/security fixes for 6 months. And we did - for example the CVE-2020-17526 has been addressed in the Airflow 1.10.14.
But this time is over now. This is the first security vulnerability that we addressed only for Airflow 2. If you are still using Airflow 1.10 - you are on your own now. You are no longer protected by the security process of the ASF, the security team of ASF and airflow PMC. What’s even more - security researchers who raise the issues, even if they find it, might not be eager to responsibly disclose it, knowing also that the issue will not be fixed anyway. When you read about the next ransomware attack and millions of dollars paid, think if you would like one day your company to face this kind of dilemma. Even if it costs time and money to keep your software updated, preventing this kind of problem is far cheaper than dealing with the consequences of such an attack.
Upgrade NOW! to the latest release of Airflow 2 and keep on doing it for the future releases!
Be sure to join us at Airflow Summit online 8-16 July https://airflowsummit.org/ --registration is free and open to all.
# # #
Jarek Potiuk started to work on the Apache Airflow project in September 2018. He became an Apache Airflow committer in April 2019 and a member of the Apache Airflow Project Management Committee (PMC) in October 2019. He was elected an ASF Member in April 2021. He is an Apache project mentor in Outreachy and Google Summer of Code and was a mentor in Google Season of Docs. Jarek is an independent Open Source Contributor and Advisor and always keen on making it easier for people with different backgrounds to join OSS projects. = = = "Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works" https://blogs.apache.org/foundation/category/SuccessAtApache
Posted at 04:52PM Jun 23, 2021 by Sally Khudairi in SuccessAtApache | |
Sponsor Success at Apache: The Fork
by Wei Zhou
I joined the Apache CloudStack community in 2012 and became a committer in 2013, eventually becoming a PMC (Project Management Committee) member in 2017. My journey to becoming a PMC was both physical and literal, and included several forks in the road. The forks presented themselves in all aspects of my journey – although the literal forks came later, mainly because my journey began in China.
In 2010 I was working at China Mobile, the world's largest mobile network operator, in Beijing as the manager of a cloud project based on OpenNebula (another Open Source project). A year later, my partner received her PhD in the Netherlands and began working in Belgium, so I started looking for new work opportunities in the area.
In 2012 I visited Europe and had a few interviews, but it was difficult at the time as my English was quite poor.
I was committed to finding a good job and moving to Europe, which meant I needed to improve my English quickly. I studied the language, left China Mobile, and moved to Belgium permanently. It took me seven months to became fluent in English. I re-interviewed at the companies that had rejected me seven months prior and landed a position in Amsterdam at Leaseweb as a Cloud Innovation Engineer. At that time, we had two public cloud platforms based on Apache CloudStack. I was mainly working on the research of Apache CloudStack.
In the first two months I fixed some bugs we found in our productions. Thanks to the CloudStack community, I also received tons of help as I began contributing my changes to the mainstream. In 2013 was invited to be a committer, 3 months after my first submission. It was a huge surprise and a massive honor for me, and I began pushing my changes for new features and bug fixes much more quickly.
A year later, in 2014, Leaseweb released its first private cloud based on Apache CloudStack. It was very welcomed by many customers. As this was happening, we began finding issues with CloudStack as customers were requesting more features and functionalities. The same year, Apache CloudStack moved code repositories to GitHub.com and started using GitHub pull quests to review and merge commits. While all commits should be reviewed and approved by other commits before they are merged into the mainstream, we had already made many changes at Leaseweb and could not wait for next release. Because of this we created our own fork containing all our changes and bug fixes.
We developed very quickly, and our process was much faster than the review/merge process of Apache CloudStack. The gap between our fork and the community was getting bigger and bigger. When we decided to upgrade from CloudStack 4.2.1 to CloudStack 4.7.1, we had to spend half of a year just to port all of our changes in our fork based on CloudStack 4.2.1 to new fork based on CloudStack 4.7.1. The same problem happened again when we tried to upgrade to CloudStack 4.14, and we had to spend around one year to port all of our changes. The lesson we learned from these two upgrades was that we needed to contribute more to the community and maintain a fork as small as possible. After realizing this, we contributed all of our features and bug fixes to the community by creating many GitHub PRs. Some PRs have now already been merged into mainstream, while others are still in review.
My colleague recently asked me, “If you could go back in time, would you still make the Leaseweb fork?” My answer is yes, I would do it again. A fork makes us more flexible, as we can offer more stable production and more functionalities to our customers. However, if I could go back in time, I would have spent much more time contributing our changes to the community. I’ve learned that the gap between the fork and the community should be less than 100 commits.
We learned so much from these two painstakingly long ports and have implemented the above advice. From now on, the Leaseweb fork only contains features we have developed in the past and bug fixes. For new features, we will always contribute to community and deploy to our production only if it is merged into the mainstream. By doing this, we will be able to upgrade to the next CloudStack release much easier, and will benefit more from the community (e.g., more bug fixes, more features by other contributors). When we contribute to the community, we also benefit from knowledge sharing and the contributions from others.
Wei Zhou has been an Apache committer since 2013 and a PMC member since 2017. He has a Masters in Computer Applied Technology from the University of Science and Technology of China, and a PHD in Computer Organization and Architecture from the Institute of Computing Technology, Chinese Academy of Sciences. Wei specializes in all things computers and has over 10 years of experience in software development. He is a Principal Cloud Engineer at Leaseweb.
= = =
"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works". "Sponsor Success at Apache" features insights and experiences by select ASF Sponsors https://apache.org/foundation/thanks.html
For more Success at Apache posts, visit https://blogs.apache.org/foundation/category/SuccessAtApache
Posted at 04:12PM Apr 12, 2021 by Sally Khudairi in SuccessAtApache | |
Inside Infra: Chris Lambertus --Part II
So, in the scope of the team, I understand that you're a more "senior" developer. Not that you know better; it's not an issue of better or worse, but you're more seasoned. How does ASF compare to other groups that you've worked with? Are there special technical requirements or special security issues you have to be concerned with? Especially as we mentioned before, it seems like there's an unlimited number of project development environments. Are there certain things that you have to consider or accommodate or do that's so different with ASF that you've never experienced before? Can you give a little bit of a frame of reference for folks unfamiliar with how it is within the ASF?
First of all, I'm not a developer. I am terrible at programming. Absolutely, I'm awful at it. I don't consider myself a developer in any way, shape, or form. I am a system administrator, 100%.
...Administrator. Okay, so, you're a more "senior" sysadmin then.
I hesitate to use the word senior, because it has some implications in the industry that I don't necessarily feel are appropriate for the ASF. I believe that I have been doing it longer than most other people on the team just as a career. I'm guessing that's probably what you mean by that.
Right. That's why I used the word "seasoned" also. It's hard because some people go, "Are you saying I'm old, or are you saying hierarchical, that I'm above others?" It's a hard way of describing it, because some folks have been programming or dealing with computers since there were kids, others later in life, but you guys are all moving in the same direction. So, how does one describe it?
Yeah, I think seasoned is a good word. Just like I said, I've been working in the industry as a system administrator since 1992, pretty much continuously with some brief changes in the 2000s. It's not here nor there. So, it's not hierarchical. Everybody is equivalent in terms of the Infra team. Nobody's above anybody else or below anybody else, right?
...I was wondering how is the ASF different from other groups you've worked with.
All right. It's actually not all that different. There are a couple of things that make it unique. Well, a number of things that make it unique. One is that it's completely remote and completely geographically dispersed. Two is that the participants on the team are all from very different backgrounds and cultures and countries, which is fairly unusual for a system admin team, a small system admin team, I would say. But beyond that, it actually shares quite a lot of things that I typically see in system administration teams. There's a central job board, if you will, like the Jira stuff. There's a communications channel. We have Slack.
There's a nominal leader in Greg, that directs the general movement of the barge. Yeah, by and large, it's pretty similar with most environments that I've worked in. I mean, some are much different. Some are very corporate, some are very open. Yeah, now I remember one of your previous questions --one of the biggest challenges that I found is the openness.
The ASF for quite some time has been incredibly public with its configurations, with its systems, with its documentation. These types of things are very unusual in the corporate world or in commercial IT. Typically, you would never make that stuff public. The fact that it is and has been at the ASF, that's been a challenge for me. It's an unusual way to maintain systems. It's got some downsides. Having that stuff available can be concerning at times.
...How so? Help me understand this, because I've been with the ASF forever. What you're mentioning right now reminds me of about 10 years ago, something failed in Infrastructure. I can't remember what it was, but it was a big thing. People were talking about it. It was even in the press at the time. It wasn't catastrophic, but it was big. We actually wrote a blog post about it and we presented about it at ApacheCon. From a marketing perspective and a media perspective, I was uncomfortable, because from a corporate perspective, you don't do that. The fact that we not only encouraged it but published it and educated everyone about it, admitted it, ate it all, we took responsibility, 100%: "Here's what failed. Here's what happened. Here's what we did." People found this to be extremely refreshing, extremely helpful, and it was totally eye opening for me. I had no concept of anything like that before, and I'd been with the ASF for like 10 years already. I've never seen us opening the kimono at that capacity. So, I'm coming at it from a slightly different perspective as you. I understand you don't want to have your config files public. Obviously, that can put you at a different level of exposure and risk.
...Is that required, or is that just part of our culture saying, "This is what we do"?
It's definitely part of the culture. My background is heavily in computer security. Coming on board to the ASF and seeing all this stuff out in the open to me was... I couldn't believe my eyes. "You're doing what?" So, I've actually worked quite a lot to reel that into some extent, because even 10 years ago was nothing like what's happening today in the world of computer security, in the terms of the threats, in terms of what people are looking for, what people are doing, and what people are capable of doing, right? Even to benevolent organizations like ASF, it's distressing.
So, one of the things that I've really tried to encourage is it's okay to be open to some extent, but you have to have some common sense about your security exposure. That's what I've been trying to do just for the entire time that I've been here is just to try and reel some of that in without losing the culture, because I think the culture is valuable. Like you said, the incident that happened whenever that was, I think it was a right decision for the time. Would you do that today? Probably not.
It's not because you wanted to cover something up, but it's because you want to limit your exposure. Yeah, so it's a different culture now, not the ASF, but the world in general. You have to keep that in mind as you move through the day to make sure that you are minimizing your risk and minimizing your security threat vectors.
All right. Have you had instances where a project has basically treated you as their dedicated resource? Has anyone made unusual demands of the team? I’m not asking you to name names, but I can imagine it can get out of hand with all these different projects, especially the corporate ones.
Absolutely. Yeah, the corporate ones are typically the biggest problems, because they come in with a much different mindset than somebody who's come in from developing an Open Source package and has brought it to the ASF. The corporate projects that we've seen really are the ones who are the purveyors of that mentality. They feel Infra is their personal resource, because they don't really have an understanding of the scope of the Foundation. They don't have an understanding of the amount of projects that Infra supports. So, I don't really fault them for that, because it's just a matter of education. They just need to understand where they are placed in terms of the Foundation, in terms of Infra's availability and scalability.
Once we've explained that to people, they get it. We typically don't have any problems after that. But there are a few projects that have come in and just persisted in wanting weird stuff. Some of the things you can provide. Some of the things, you just got to kick back and say, "Hey, this is not something..." Like I mentioned earlier, if it doesn't have a broad benefit to the Foundation, if it's something really specific to your project. Infra is probably not going to support that for you, because we can't support all these one-offs.
So, we'll say, "We'll give you a VM. You can do it yourself." That's worked out pretty well, but there’ve been a few cases even where people like Greg and David have had to go and talk to these projects and say, "Look, how you're approaching this is not appropriate. You need to pull it back. You need to rein it in." But that's really pretty uncommon. I would say just a basic education as people come through the Incubator is sufficient to dispel most of that.
Those kinds of projects... Do they stand down or they wind up hiring their own committers to do their Infra work? Do you have any idea as to how that works? I'm seeing more projects coming in with more diversity in their committership to take care of marketing stuff, for example. That's expected especially as they scale, but from the site administration side of things, Website stuff, it's a very interesting thing to observe. Some project sites’ information is stagnant ... they're focused on specifically developing code. Others are super productive in terms of getting stuff done. I'm always wondering how are they able to handle all this? Curious to see if you had ideas as to what's going on there ...
I will say this, documentation is hard, right? Writing code is comparatively easy, and it's a lot more fun. So, when you're developing a product, your natural instinct is to develop the product, not develop the documentation. So, you get a project that's only got a couple of active members. They're probably not going to spend most of their time writing documentation. They're going to spend most of their time trying to advance the code base. Even within Infra, that's been a huge challenge for us.
Now that we've hired Andrew (ASF Infra team member and technical writer Andrew Wetmore) to help us work on some of this documentation, it's becoming extremely clear as we work through it how much of that documentation has been untouched. It's been stale, for all the same reasons as these projects. Yeah. Some projects will say, "Hey, we need a documentation guy. That's what Infra said, we need a documentation guy." They'll find one. Maybe somebody will volunteer or maybe it's a corporate thing, whatever. So, yeah, I think it really depends on the project. Some people have the resources. Some projects have the resources, and some don't.
Yeah, it's interesting. Again, since day one, since the '90s, documentation has always been an issue for all projects, even when we started with just HTTPd. It's a constant issue.
If I was going to have money to do anything in a project, I would use it on documentation.
Documentation is often the thing we need the most. I mean, how is it going to work otherwise?
Yeah, I agree. Even from just a cognitive aspect, writing code and writing documentation are about polar opposites. The type of mind that goes and writes code isn't usually the type of mind that can write documentation or can write meaningful documentation. I'm guilty of it myself. I can't write documentation, I find it quite difficult. Where building packages and tying things together, and Puppet configuration management, is not difficult for me. So, it's a huge mind split between those two types of things. I absolutely agree that hiring somebody to do documentation is a great use of resources.
We've grown a lot during the time you've been with us, now six plus years. Other than scale, how has Infra changed over the years? What's unusual is that the team is getting smaller. I would presume as the Foundation is scaling upwards, you would have more team members. It's some crazy number: five people, six people, it's so small. It’s hard to understand how you guys handle everything.
Yes, six people, including Andrew and then Greg, right?
Including Andrew, that’s six, but Andrew doesn't handle the day-to-day Jira stuff anyway. He doesn't handle tickets. So, you really are a tiny group. From your perspective and your experience, would you say that that's a small group, considering the workload and the demand?
Yeah, I would say so. Probably based on my experience in other organizations, about half the size that it would be in a commercial environment. Well, to go to your original question there, in terms of what's changed, I think prior to David Nalley, I would say that Infra was extremely reactive. I think that's changed quite a lot. I think David has really brought an element of customer service and customer focus to the team that really had been somewhat lacking in the past.
So that was a proactive decision to go in and say, "We have to better serve our projects," right?
Yeah. I really do credit David with that. I think he brought a huge amount of that to the team and that mindset. It's really improved our relationships, Infra's relationships, with the projects. It's helped us develop tooling like Self-Service.The more that we can move off into those projects, do-it-yourself tooling, the better off we are, because it's less tickets that we have to handle. It's a constant juggle for us between dealing with legacy code, dealing with technical debt from years and years and years and years ago to doing modern things to bring out new tools, and all the while supporting projects.
In what areas are you guys experiencing bursts of growth or demand? Everyone has a slightly different perspective. I know CI comes up a lot in this arena. Greg's always saying (since I deal with ASF’s Sponsors), "We need more." Where do you feel Infra's growing at the highest rate or the most interesting rate? Where do you feel like that's happening?
Yeah, continuous integration was the first thing that came to mind when you said that. The more projects we have, the more need there is for CI. That's fairly linear. Other growth places are things like Infra VMs, machines that we run to support Infra services internally. Prior to the resources that we have now, we used to have a lot of monolithic systems, systems that would run a lot of things. Think of a machine like Minotaur, which used to run two dozen services on one machine. That's not a best practice at all.
Moving to aggressive use of configuration management Puppet, and making sure that systems are easily replicable with the configuration management, has allowed us to really build -- not quite micro services, but single purpose systems, which are a lot easier to maintain, a lot easier to scale than some of those monolithic systems. So, that's been a big growth area for us. Just the number of VMs, number of systems that we're maintaining, it's got to be in the hundreds at this point. I haven't counted. Yeah.
...These microservices that you're mentioning also reduce the single point of failure, which is critical. That keeps you guys scalable and keeps you up and running. That's important.
Yeah, that's right.
I'm curious when was the last time you guys had a fire drill type of thing, where everyone's hands on. You had something recently, right? A couple months ago, all hands on deck, there was something broken. You guys were able to resolve it pretty quickly, but that's uncommon, where something breaks in its entirety.
I don't want to say anything about this, because it's going to cause a problem.
...We can go off the record.
What I mean is I'm going to say it's fine, right? And then something's going to break.
...[laughing] You don't want to jinx it. Okay.
We have failures from time to time. We've had some situations where there's been a problem at a colo. One of our VM providers had an issue and we lost machines. We had to rebuild them with Puppet, our configuration management, and restore stuff from backup. It sucked, but it wasn't a disaster, right? Because we have the backups. We have the capacity. We have the configuration management. So, nobody had to wrack their brains: “How did this work? How did this go together?” We’ve made very, very big strides in avoiding that old mindset of ‘one guy set this up 10 years ago and nobody else knows how it works.’ We're very much trying to avoid that these days.
...Right, bus factor.
Yeah, yeah, yeah. The configuration management systems have been absolutely critical with that. So, that continues to grow. We continue to add to configuration management wherever possible and just make sure that those systems are able to be reconstituted wherever, whenever it's needed.
Cool, cool. Okay. What do you think people would be surprised to know about ASF Infra?
The other guys probably said the same thing, but probably the amount of stuff that we support from the number of people we have. I think that would probably surprise most people in the industry.
That's one answer. I think it was (Infra team member) Chris Thistlethwaite who said "that we exist", that you guys exist. People don't know how it happens. It's like magic. I've always talked about how Infra is this crazy-magic-impossible story. It's like The Little Engine That Could, because you guys are such a tiny group. You have such a good working relationship, and everyone is connected. From the outside, it seems like a completely seamless operation. There's this magic thing behind the scenes, and then you find it's only five, six people running it. That's mind blowing. It's incredible.
I hope that people have that perception. We do try to provide a unified front. In reality, there's not really any infighting in the team. We all generally know what needs to be done. We all generally agree on how to do it. So, the disagreements are fairly minor and not all that common.
Well, that in itself is unusual, right? Think about it. I mean, there's a lot of factions and politics and weirdness, but that tends to happen with larger groups. So, you guys make it work in a way that's awesome.
I think one of the things that makes that the way it is, is because we're all supporting the ASF, right? We're all here, because we support the Foundation, and we want the Foundation to succeed. So, that drives, I think, a lot of the direction and the way that we approach how we support the Foundation.
You guys have a very different common goal, right? You're there for the benefit of the Foundation with a capital F; Projects are there to work on their own thing. Of course, if they can help everybody else, that's good, too. But the focus is different.
...What is your favorite part of the job?
I have to say, the flexibility and the remote aspect of it, along with the constantly changing technology. There are a lot of opportunities to learn new things, and work on new technologies.
...You are all on call for certain periods throughout the week, right? So, because of your 7:00 to 11:00, are you ever on call overnight, or does that just not work out with schedules, or it doesn't matter?
Well, we rotate on call. So, you're on call for a week at a time, starting at, I think, 10:30 or 11:00 Pacific AM and then going through the following week. So, typically, what happens is you'll get the pages when you're on call, regardless of the time of day or night. But the way that it works out, typically, because we have folks in Europe, we have folks in the US, we have folks on the West Coast and the East Coast, that almost always there'll be somebody awake and available to answer.
Sometimes in the middle of the night, if my pager goes off at 2:00 in the morning, I'll look at my phone and I'll see that Humbedooh or Gavin is already working on it. Thanks, guys. Obviously, the same is reciprocated, right? If the phone goes off in the middle of their night and I see that they're on call but it's 3:00 in the morning, I'll grab a ticket if I can, I'll grab the call if I can. We just try to help each other out that way.
You guys are a true team: you have each other's backs in a way that again, is unusual to see. It's almost like family but even better, because even family has infighting and issues. You are there for each other, which is really, really cool to see.
Yeah, let's say we've had our disagreements, but it is a very familial atmosphere.
When you first came into the role, what was your biggest challenge? Was it what you thought it was? How was your experience?
It was an incredibly steep learning curve. When I first started here, we were in the middle of the transition from the "one guy who set up everything, a volunteer five years ago, nobody knows how it works" environment to a configuration management. We were just starting to get into that, and shore up some of our documentation at the time. For me, just coming in and learning all the different systems and all the different processes and all the different edge cases and one-offs and locations for things and who's who and all these, that was incredibly difficult. It took me probably at least a couple of years before I felt comfortable with most of the systems.
Even today, there's stuff out there where I'll be like: "I'm not sure what this means. Do you have any idea what's going on?" Because there's so many little pockets and holes and places and things and historical legacy stuff. It's very complicated. It's been organically grown over a long, long time.
...With a lot of different personalities and a lot of different processes, that is what's unusual. The "quilt" that makes Apache is so diverse.
What are you most proud of with your career with Infra so far?
I'm not really sure, to be honest. I don't tend to think of things like that. I can't really single out one thing and say, "Hey, I'm really particularly proud of that," or whatever. I try and take pride with all my work. Building better backup systems, I think, is definitely a big one. Just getting through some of this mail project has been good as well. When I finally got everything working, that was a pretty proud moment there. I felt pretty good about that. That was a complicated system. It's still a complicated system. I'm still not sure it all works right. That's why we have to test it. By and large, I'm feeling pretty good about it.
That's great. How would your coworkers describe you?
...[laughs] The response is the same with everyone. Everyone laughs, but grumpy is the first one I've ever heard.
I don't really talk too much. I'm not a super verbal person. So, I always seem to come across as grumpy on the chat systems there. It's a schtick, I guess, but it is fun. I'm not really grumpy. Well, most of the time.
What are the biggest threats or concerns that sysadmins need to watch out for? I don’t mean doom-and-gloom unless there’s actually doom-and-gloom ...A lot of non-Apache folks are curious what the Apache guys think. So, is there anything that you could share in terms of advice or trends that are coming up or something that people should be aware of moving forward?
Security, backups, disaster recovery, those are the keystones of any organization that you absolutely must have in place to sleep at night. If you don't have any one of those three, you're in grave danger of doom-and-gloom.
That makes sense. What is your greatest piece of advice for someone looking to have a job like yours?
Oh, boy. Run for the hills [laughing]. Work with as many different things as you can, learn as many different things as you can, and try not to get stuck doing one specific thing. I think in my career I've been such a jack of all trades that it's really helped me to be able to see and build systems that work with a lot of different technologies. You get some people coming in, they're IBM guys, like a specific subset of IBM AIX expertise or something, right? That's all they do. And then when the situation comes around, well, nobody's really using that anymore, you run into a problem, because you're not really marketable anymore. So, the advice that I would give anybody who's trying to get into the system administration field, be broad and learn as much as you can about as many different things as you can.
If you had a magic wand, what would you see happen with ASF Infra?
I think I'd probably just give us more resources. I mean, I don't really have any complaints, to be honest. I think if we had more, then we would do more.
...More ...machines or more cash or more team or more what?
All of those ...I think more cash. Being able to buy more physical compute resources would go a long way for us. We do rely so much on donations and donated resources that it can be a little bit daunting when that donation goes away and you have to scramble to fill the void. Staffing is a complicated one, because it is familial.
Having somebody new come on board, it's challenging. It's nice to have an additional person be able to work on stuff, but going through the process of integrating them into the team and teaching everything else, it's daunting, it's challenging. So, I think having more resources would be more important at least to me than having more staff, because I think we're doing all right with the staff that we have now. So, that's just my perspective.
= = =
Chris is based in California on UTC -8. His favorite thing to drink during the workday is ice water and the occasional Diet Pepsi.
Posted at 06:20PM Jan 24, 2021 by Sally Khudairi in SuccessAtApache | |
Inside Infra: Chris Lambertus --Part I
What's your name and how is it pronounced?
My name is Chris Lambertus (“Kris Lamb bert uhss”): it's pronounced exactly how it's spelled.
When and how did you get involved with the ASF?
I've been aware of the ASF probably at least since the inception of the ASF. I've been working in IT for quite a long while, and I've been very familiar with the ASF projects, because I use them daily in my career. I didn't actually get involved with the ASF until a buddy of mine who was working on the CloudStack project mentioned to me that (ASF VP Infrastructure) David Nalley was looking for somebody to do some contract Infra work. Long story short, I talked to David and I tossed in an application. I was eventually hired as a part time contractor.
...So CloudStack, we're talking about 2012, when CloudStack first came into the Apache Incubator, or was it after that?
I've been aware of the ASF probably since HTTPd, since the original Web server came out. I joined the team late 2014.
Explain your role within the Infra team —how did you get here? Were they looking for someone who specializes in something particular?
My understanding was David was really looking for somebody that had a background in production systems engineering and had been doing it for a long time in a production environment. That's something that I had been doing since 1992: I've been essentially a professional production systems administrator. I knew that skill set was definitely in line with what David was looking for. I think I brought that to the table pretty well. That's basically what I've been doing ever since as a contractor.
What are you responsible for specifically?
That's a complex question, because the ASF Infra sysadmins are essentially responsible for everything. All of us are all responsible for all of the things. We do tend to specialize a little bit. My current project is probably reengineering the mail system: it's the largest one I'm working on right now. I do tend to focus a lot of my efforts on backups. Beyond that, I do a lot of JIRA and Confluence work with Gavin (ASF Infra team member Gavin McDonald). But Puppet, configuration management, again, all these things are things that all the Infra guys support.
In past interviews everyone has basically said, “we do everything”. How does it work? There's no hierarchy. Everyone does everything. Do queries come in and everyone jumps on them? Do you have a round-robin way of getting stuff done? How do you manage with so much going on with Infra? How do you cope with that?
“Cope with it” is an accurate term. Each of us has... I don't really want to call it a specialty, but definitely a focus. If some question comes in about the new mail routing or something that I've been specifically working on, that would go into my bucket as a priority. Certain people have history with certain types of projects. Gavin (Infra team member Gavin McDonald), for example, has been heavily involved with the Continuous Integration infrastructure for many, many, many, many years. So, he tends to be the font of knowledge for all things CI-related.
We tend to break things up that way. Some of the team members definitely have skillsets above and beyond general system administration work. Humbedooh (Infra team member Daniel Gruno’s username) is a very skilled programmer. He then ends up owning a lot of the software that Infra has developed and he has developed. So, questions regarding that, and specialty configurations related to software that he's written tend to go into his bucket. Because of the nature of the team, and because of the nature of the time zones that we're all in, the responsibility of dealing with issues follows on whoever is on call, first of all, and then whoever is awake and available, to handle any situation that comes up regardless of who "owns" the technology.
Describe a typical workday for you.
Apache work for me is basically: I wake up in the morning, hopefully not at 3:00 in the morning, but get out of bed and plop down in front of the computer. Essentially, my lifestyle is I've always been a computer guy. I've always been really focused on computer system administration, not only as my work but also as a hobby. So, I spend the vast majority of my day behind the computer, whether I'm working on Apache stuff or working on other projects, things like that. That'll go on until 11:00 at night. So, my "workday" is essentially me living my life and doing tasks as they arrive and doing projects as necessary and getting things done that need to get done.
...All the things.
Regardless of the time of day, yeah.
How do you keep your workload organized? Folks have all sorts of different systems. Are you an Evernote type of person, or do you keep your own journal? Do you have a certain system to help manage your workload?
Jira is the primary basis for managing my workload with the ASF. We've done a lot of work in terms of building technologies around Jira. Our service level agreement reporting tools I find extremely useful for seeing what's in the queue, what needs to be done, what hasn't been touched in a while, things like that. That really drives a lot of my day-to-day efforts in terms of replying to tickets and servicing customers.
In addition to that, I also use Jira to track my projects. So, if I have a project going on, that's usually a Jira ticket. And then I can go back and refer to those and see where things are, what needs to be done. I've never been a big one for lists of notes. I do have notes that I keep, but by and large, the things that are on the top of my stack maintain on the top of my brain at the same time. So, I don't feel like I forget a lot of things, but I don't take a lot of notes, which is what it is.
…And then there's people who have everything except the monitors covered in Post-it's.
I don't do a Post-it thing, but I have little text files everywhere with notes and things in them.
So, you all have day-to-day tasks that you manage, as well as things that require your immediate attention, as well as long term projects. In my earlier interviews with other Infra team members, everyone's been saying that I have to talk to you, because you're handling “The Email Project”. For those who aren't aware, standard operating procedure at the ASF is “if it didn't happen on-list, it didn't happen”. So, you have, if I'm understanding this correctly, 21 years’ worth of email archives that you're working on. What's going on with this project? What are you handling? Why is it so important?
Well, as you know, email is the lifeblood of the Foundation. Everything that happens here happens on a list. Because of that, the Foundation has amassed a very large quantity of email archives. Those archives are fundamental to the provenance of the Foundation. So, maintaining those and keeping those safe and available is really a top goal of the Infra team.
The mail project, such as it is, is essentially to upgrade and migrate our existing legacy email system to a modern, more supported system. The current email system as it stands was engineered by folks, volunteers, some staffers, I would guess, over 10 years ago, maybe 15 years ago, running on FreeBSD, which we don't really use too much anymore. Actually, we don't really use it at all. They used technologies that were interesting at the time, but are perhaps not so well supported today. So, a lot of it is modernization.
A lot of it is taking a lot of that old tribal knowledge that really doesn't exist anymore and bringing it into the modern era, documenting all the weird little settings that we have and all the edge cases that we manage in email, management of the list systems, mailing lists and their configuration, and making sure that gets upgraded, migrated, modernized. Doing that all in such a way that we don't a) lose anything, or b) suffer any downtime. So, it's a large project. That's really what I've been working on probably for the better part of the last two years, bringing that up to the present era.
You’re like the Titan Atlas: carrying the heavens on your shoulders. That's a massive, massive undertaking. Is there like a deadline for this —where's the end for this project? Is it never ending?
I feel more like Sisyphus than Atlas, but the deadline is as soon as possible. The thing that we're fighting against is the safety and longevity of the old technology. For quite some time, our primary concern was that the hardware that was running the old email system, which was 15 years old, was going to fail. In fact, it did. But fortunately, I basically copied the whole thing off to a separate colocation facility. So, we had an archive of it when it went down, and I was able to bring it all back up.
So, that wasn't a problem. I mean, it was a problem, but it wasn't a disaster as it could have been. So, the deadline is as soon as possible. But in reality, it's going to work until it stops working. I'm not sure how to better state that, because the technology is so old and we really need to get off of it and onto new technology. But there's no hard and fast timeline. Nobody's really cattle prodding me to get it done, but it's the absolute top priority that I have.
...That was actually my follow-up question. Is the “as soon as possible” official, or is this something you're setting for yourself because you just want to get it done?
Oh, that's definitely an official timeline. Yeah.
...I remember our first email servers were a machine under Brian Behlendorf’s desk at the Wired offices. So, we've come a long way since then.
We have, yes.
...You're handling this behemoth. Are you also dealing with the day-to-day putting out the fires, as well as everybody else?
The volume and scale of this project seems so huge. Again, the word 'cope' keeps coming to mind, because knowing what I know —and I don't even know— it's just scratching the tip of the iceberg: it seems astronomical in terms of scale and scope. Are you building everything from scratch for this project? Are you using any kind of commercial packages? This is a huge overhaul. Tell us more about it.
Multitasking has been in my blood for my entire life. I don't typically have a problem of splitting my time and my attention and my energies between multiple projects. You are absolutely right: this is a titanic project. It's one of the reasons why it's taken so long. Like I said, we've been working on this for several years at this point. The reason it's taken so long is twofold: One is I can't spend 100% of my attention on it or else I would go absolutely crazy. So I partition that. I partition my mind and my time, if you will. Just a little bit of time here working on this, working on this particular aspect of it, then I'll go work on some tickets. So, I'll go work on something else. If I was only working on the mail, then other things wouldn't get done, right?
I have to partition it that way. I think the main way I've tackled this type of project... Again, my experience in system administration going back so far, I've worked on a lot of very large scale projects. So, this is in the middle in terms of the scale. But the biggest thing is to break it down into multiple components as small a component as you really can. The first thing to do is to analyze the existing system. "What is it? How is it running? How is it tied together? How are these things all related? Where are the pieces? Where are the tendrils? How far do they go?" “Write that down.”
I started developing documentation that explained a lot of stuff. There was some documentation that existed. I take that and I carry it forward then into the new system. Okay, "what things do I want to keep? What things do I HAVE to keep? What things are legacy? What things don't we use anymore?" That process of discovery, of understanding how it was built, why it was built and what we're still using, and what we don't need to use anymore, is probably the vast majority of the work--just to understand it. Once that's done, we say, "Can we use the old technology, or do we need to use a different technology?"
In the case of the Foundation, we're extremely tied to the way that ezmlm, our mailing list system, works. ezmlm is extremely tied to Qmail. So, converting those into other tools, basically, I'll say, it's too complicated. With the amount of data that we have and the amount of dependence that we have on those configurations, migrating it to a different system would be incredibly difficult. So, what we've done is there are modern versions (and updates for) these pieces of software, ezmlm and Qmail.
What we've done is I've taken those packages and I built them for modern operating systems. I've patched them with current technology, TLS and various modern email stuff, and put that into configuration management and built a system that deploys all those packages in a reproducible fashion. So, at any time, I can just turn on a new machine. I could type in, "This machine is the new mail router," and run Puppet, our configuration management software, on it. It'll deploy all that software automatically.
That's probably the second part of this huge phase of developing this. The phase that we're in right now is testing it to make sure that it works the same way as the old one works. Once that's verified, then we can actually look at migrating the old data onto the new system and deploying it into production. I think that answered your question.
I think so, but it made me think of another question: How did this wind up being "your" project? Was this assigned to you? Did you jump on it going, "Yeah, I'm taking it"? How did you wind up with this?
That is a very good question. I don't really know. I think probably just because I had been working with... Back in, 2015 maybe, we were actually having this exact same discussion: "what do we need to do to migrate this EZMLM, all these mail archives, all this stuff to a new modern system?"
One of the things that we looked at was, "Can we transfer this? Can we translate this to something like Mailman or some newer type of mailing list management system?" We looked at a couple of options. The biggest problem we had was that the archivers were terrible. So, Humbedooh basically ended up writing this thing that became Pony Mail as the answer to that system. Ultimately, that turned out to be a great effort. I think it's going to take us a long way. But in the end, I was the one to continue to work on the email system. For whatever reason, I guess it just became my thing. Maybe because I was the only one willing to do it. I don't know.
...Is the legacy system going to be powered by Apache Pony Mail (incubating) at some point, or is it already in the process?
So yeah, lists.apache.org is our primary advertised archive system. That is what we're telling people to use. In terms of what happens to the old system, that remains a little bit under discussion. I don't know the ultimate disposition of that, but the current plan is lists.apache.org will be the primary access to the mail archives.
I noticed that Pony Mail goes back quite a bit, but it didn't originally go back as far as it does now in terms of the archives. I’m curious to see if everything eventually is going to be migrated to it.
Yes, yes, we actually have a plan to load the previous archives in there. We loaded a subset when we first started it up. I believe they go back to 2012 right now. So yes, we do have a plan to load the previous archives.
Great. I understand some Apache projects and their communities are always asking for new services. How does Infra decide which products you support? Who gets assigned to take the lead on introducing new services or new products? I understand that you develop your own custom solutions as well. How do these get divvied up? Is everything in queue? How does it get done?
When you're talking about a project requesting a service, I think the first thing we look at is, "Is this service extremely specific to this one project, or is it something that has broad appeal to the Foundation?" If it has broad appeal to the Foundation, we've got multiple requests for it, it's a service that we feel we can provide, given the amount of time that we have available, then it's something that we would consider doing.
Obviously, there's a lot of other thought that goes into that in terms of what it is, what it does, what it needs to do, who needs access to it, that we have to evaluate. But generally, if it has broad appeal to the Foundation, it would be something we would look into. If it doesn't, if it's something that's very specific to a certain project, what we typically recommend is that a project request their own VM. They can run the service themselves. That's typically how we’ve approached that in the past.
Has the team been in a situation where you're like, "Hey, this is a really cool thing, let's bring it in," and then throw it on projects or see if anybody wants to do it? Does the converse happen also, where you guys have insight as to something that's hot and new and you think that would be a great fit for Infra, but you have to find a "problem" to connect it to; or is that not something that you deal with? Is your work all reactive, or do you ever come into a situation where you say, "Long-term planning: we want to introduce something brand new"?
I think probably up until maybe five years ago, the work was almost entirely reactive. But the team and the processes that David (Nalley) now put together have really pushed us more in a direction of future planning, of taking the time and taking the mindset of, "What can we do long term to better support projects?" I think selfserve.apache.org is a great example of that. That's something that grew out of a small subset of tools. We got very positive feedback from Committers and Projects about using selfserve.apache.org.
That tool has grown extensively since it was developed. I think one of the best things that we provided recently is the .asf.yaml system, which allows projects to essentially set up 90% of their project metadata in GitHub. It lets them set labels. It lets them set notifications. It lets them set all kinds of things, all self-service. So, it's taken a huge load off of Infra in terms of responding to tickets, and also put a lot of that control in the hands of the projects. That's been incredibly well received. It's definitely, I think, one of the best things that we've done for projects in a while. I think it's a fantastic tool.
That's great. Now, it's also a new way of institutionalizing, so to speak, of "scratch your own itch", but in a way that that's a common deployment. You can do your own thing, but there's a common mechanism or method of doing it, because before —it was like the Wild West, back in the '90s— everyone's just doing their own thing. It didn't really matter, but it wouldn't scale properly: you guys can’t really support them because everyone's doing something and it was a one-off. It's interesting to see that selfserve.apache.org has standardized or unified that process.
Yeah, and one of the things that I really like about it too is because we have so many different projects --they're so varied-- the people that work on them are so varied in their skill sets and their desires and their interest level and their skill level and all this. What we want to be able to do is empower projects to use the tooling and take advantage of the skill sets that they have available. So, we don't want to arbitrarily enforce, "Oh, you must use this particular technology," but we also don't want random technologies to proliferate, like you alluded to, the Wild West. So, it's a very refined balance between, "How do you allow projects to do their own thing in a way that's scalable and supportable?" That's a complex task. It's difficult to manage. I think Self-Serve (selfserve.apache.org) goes a long way to support that.
Speaking of Self-Serve and other solutions the team is providing, the strategic process of figuring out where to go —direction— I know you have David (Nalley), I know you have Greg (ASF Infrastructure Administrator Greg Stein). Does the entire team participate in this? How does this work: is it top-down, or is it bottom-up? Are you guys saying, "Hey, there's a new thing that we should do"? I presume you don't have an annual strategy, but rather an ongoing rolling process; how do strategic decisions get made?
It's a collaborative effort, for sure. I think we do have an annual —when we get together at ApacheCon, we do tend to have a lot of discussions about strategy and about future direction. That's one of the things that we try to do as Infra with our team meetups, and with ApacheCon as well, to get together in person in a room and talk about where we're going to go, what we want to do. I say the process is collaborative, because sometimes it comes from the direction of Greg, or the Board, or David, or whoever. Sometimes it comes from a staffer saying, "Hey, it'd be cool if we could do this."
Sometimes it comes from Projects, or Committers, and they say, "Hey, can we go in this direction? I think it would be useful for X reasons." It just depends. By and large, the decisions for a future strategy are brought up by whoever thinks of it and are discussed within the team at a peer-to-peer level, right? We have very few situations where Greg or David or somebody will come down and say, "Thou shalt do it this way." Yeah, very uncommon to have that happen. It's a very collaborative environment, which I appreciate and works well for me.
So, in light of the pandemic, you guys didn't have your face-to-face. Did you do a virtual annual meeting? Or did it just not happen?
Well, we have a weekly team meeting. Yeah, we didn't do any virtual thing beyond that.
With so many projects at the ASF now, with 350 projects and initiatives and growing and so few of you in Infra, you must be constantly learning new things. How do you keep abreast of what's new? How do you close your skills gap? How do you stay ahead of everything?
I follow a few mailing lists, discussion boards, Reddit, and other similar sources. I typically learn new things when I need to implement new technology to solve a problem. "How do we provide 'X'?" I’ll go research it and learn that way. I also find out about new things from my hobby projects or other work.
...It's not like "I want to take Blah University to become certified in X" or anything like that, right? I mean, you’d do that from your own interest, but it's not something that's required of the job unless it comes up, right?
No, that's never been required of the job. Personally, I'm very much a self-directed learner. If I'm interested in something, I will absolutely seek out the resources to do so. I will say that there's not a lot of time for that stuff, at least not for me. I got a lot going on, right? So, having the time to sit down and take a class or go through that process, I find very difficult. I don't really learn that way very well either. So, class-based learning has never been for me.
...Not linear. Yeah.
Yeah. So, typically, if I want to learn something new —I've been trying to learn Python, because it's definitely a gap for me— I find it incredibly difficult, because it's very hard for me to sit down and watch a video on programming, right? I got to have a reason. I got to have a thing to do. I need to have a project that requires it. And then I go and I figure it out.
...Got it. So, it's purpose-driven education. You need an end result.
Yeah, exactly. That's how I've always operated.
[END OF PART I]
Posted at 02:09PM Jan 11, 2021 by Sally Khudairi in SuccessAtApache | |
Inside Infra: Andrew Wetmore --Part II
The "Inside Infra" series continues with members of the ASF Infrastructure team. Andrew Wetmore shares his experience in Part II of his interview with Sally Khudairi, ASF VP Marketing & Publicity.
Let's talk about your background and your road to the ASF. How did you become a technical writer and editor? What sorts of projects were you working on?
Well, let's see. I spent 20 years as an ordained minister and I was working for the Episcopal Church in the US and the Anglican Church in Canada. I got to the point where I preached my many thousands of sermons and it was time to stop. It was about then I moved over into QA and documentation with a company building healthcare software in DOS. That tells you how far back we are. One of my first great excitements was helping that team through the Y2K tensions. I got myself a bit smarter and took courses and had a lot of hands-on experience. I became proficient as a tester as well as a documenter. I worked over the next 15 years for a number of companies, large corporations, startups, and nonprofits, and leading teams or participating in teams, both for documentation and testing, but also at one point, I was the director of user experience. It was designing the front-end for a big complex project.
I've built applications from end to end, usually using Flex which compiled to Flash in the days when we could trust it, when we hadFlash to play with, ColdFusion for munging things around and communicating with the database and a MySQL database.
… That's a blast from the past with ColdFusion.
It's still around. There's a new ColdFusion. It's even brighter and shinier, I'm sure. I was just looking at it and thinking, "Gosh, I really should take a look at the tutorial and see if I still recognize anything."
I'm curious, when you tell people what you do, how would you describe the ASF to the uninitiated?
I would say it is a benevolent community home for a whole bunch of highly focused teams who are trying to do good stuff. The benevolent community home provides the support features that let those teams do their things without crashing into each other.
How do you explain what you do?
Do you know the movie Fifth Element?
… Yes. That's a cult film in the tech community. I've only seen parts of it superficially, I don't know it years after so I might have to watch it again.
You were very young. Your parents probably had to give you permission to go to it.
No. I'm older than you think.
In that movie, there's a sequence when a bad guy is explaining economics by knocking a drinking glass off his table and it breaks and there's a mess. Out from the baseboards of the wall come all these little robots, one robot with a broom, one robot with a dustpan and one robot with a vacuum cleaner and a duster and they go. They run around and they clean up the mess and they disappear back in the baseboard. That's me on the Infra team.
I'm the little guy with the dustpan.
… I love it. I understand that you are also very active with ApacheCon --were you involved in this past ApacheCon that we had in September?
I was. I thought Royale had some things to say that could be said and I looked around and nobody seemed to have the time or have paid attention to the fact that there should be a Royale track. I said, "Oh, there's going to be a Royale track and I guess I'll coordinate it."
… You volunteered to do that, you decided to do it, you just rolled up sleeves and dove it?
Well, I said to the team, "If nobody else will do it, I'm going to do it."
I was sure, I was so hoping, that someone else would say, "Oh, no, I'll do that," and then I'd be in a support role, but that didn't happen. I also engaged with the team that Rich had to put ApacheCon together, but in a very minor way. I didn't help as much as I felt I should have helped just from a lack of time.
… You were on the Planners list; you were involved with that as well?
Yeah, in the regular meetings and so on and testing out things like the --
… Hopin platform.
I have my own Hopin account now because I found it quite useful.
Was that your first ApacheCon or have you gone to a face-to-face event before?
I've never been to an ApacheCon until this one. Obviously I've never been in a face-to-face one because there hasn't been one since. In fact, the Infra team was going to have one of its annual face-to-face meetings. We were all geared up to do that just when the lockdown happened this year. I haven't even met my colleagues.
… That was actually one of my questions, so we'll get to that later; that's interesting too. Just before we leave ApacheCon, Apache Royale, you mentioned, for them, "it's code once and run anywhere".
That's the theory.
The Project is popular with folks who are programming for mobile devices and other applications. Is this something you're considering with your work that you're doing on apache.org? Is there any cross-pollination or is it completely on a content basis only?
I've not got to the point of suggesting that Apache as a group or that the Infrastructure team use Royale. One reason is that Royale is at the 0.98 stage release. It's really darn good, but it has some weaknesses still. I'm thinking that the suggestion, "Hey, why don't we use the thing which is after all an Apache project to power our front-ends?" should better happen once we're at the 1.0 version.
… I'm all about eating your own dog food, but when you're ready, right? Early on I'd always wondered why don't we require our projects to use Apache projects for everything and was constantly told, "That's not how we work. We don’t dictate what projects use --they’re free to use what they want." Very interesting position, compared to other groups. I was just curious, are you coming across things on the site and saying, "Oh, Royale will be a good fit for this."
One of the ways that Royale could be useful would be as the front-end for the Apache project pages. However, the Apache project landing pages are static. That's their primary thing. They're static HTML pages. Royale really shines when you're doing data-driven pages, when--
… You’re developing across platforms and devices
That's right. Sally logs in and Sally sees this, that and the other thing because of her role on the site. The Royale-built site dynamically knows what to show her. Andrew logs in and doesn't necessarily see what Sally sees.
… That's amazing.
It's cool. I've built apps like that using Flex that were serving people in many different roles, doing many different things on a common project without this huge massive duplicative pile of code to do it with. I could do my elevator pitch for Royale anytime, but that's not what this talk is about.
… No, but it's interesting because it's about site development. I was curious if it impacted what you're working on.
The main difference is that Royale has more power than a project site needs. Pushing them to use Royale because it's an Apache tool might be requiring people to use a shotgun to kill a fly.
With the expansion of the Web and how everyone's becoming a "publishing expert", many people want to break into technical writing and editing. They want to get their hands on site content and content development and don't know where to start. What do you think are some good entry points? Are there special considerations for Open Source, specifically Apache? Is there anything that people need to consider doing? Are you doing something that's very unique to you because you're doing that for the Foundation or is this a more common type of job that you could take anywhere?
I'm not a super expert on anything. I think that would be probably the theme of my life, but what I bring is curiosity and a willingness to try to see things, not just from my stance, but how would this look like to someone who doesn't have the same preexisting knowledge that I have. Any person who would like to become a documentation person for something really just needs to find something and say, "Hey, can I write about it?" Almost any project in the Apache galaxy would jump on a person who is willing to help with the documentation.
There's not a project here … well, there are a few projects that are very, very thorough with their documentation, but there are a lot of them where this is the thing that you're asking a technical developer to turn over and use another part of their brain and become a writer about the result of the technical development. That's not the easiest thing for most developers. If somebody with writing skills shows up and says, "I really like this cool thing you're doing. Could I write about it, so I could understand it better and maybe others could understand it better?" I think any team in Apache would embrace that person.
Since Day One for us, it's always been "documentation, documentation, documentation", but that’s often lacking. It's a challenge because you want people to do what they're best at and most comfortable with and happiest doing … which … the majority of folks want to develop code and yet we have an uptick with contributors that are non-code contributors. It's an interesting thing to see, "Where can they find the talents?"
In the "real world", the world of a corporation trying to make a buck off their code, they'd have X number of testers anda build and release person and X number of documenters and a middle manager that would help to make all this stuff happen. We don't have that structure. Indeed, the poor developers are called upon to try to put into words what they're doing. It's just tremendous if we have more --I want to say code sympathetic writers, not people who don't have any clue what's going on, but people who have some idea. At least they can ask the right question and say, "How come I don't understand what you're saying here? I think I tried to do it and I can't do it." Then the developer can say, "Oh, that's because you need to be logged in here. We forgot to write that down."
… Right. The obvious missing element to it.
Well, it's so clear in front of you that you can't see it. I flew in a plane once, years ago. The guy sitting beside me had been on the quality assurance team for one of the Gemini missions, the space missions, and this was the early days of manned spaceflight. He told me about how they were testing the escape procedure if something went wrong on the launch pad and they had this 14-step procedure that was attached in front of the astronaut. They thought, "This looks pretty good. We'll go through it." Sat someone down inside the spacesuit in the chair and said, "There's your procedure. Do it. This is step one. Do this. Two, three, four, five." Step six is blow the explosive bolts to release the door. Boom, bang, bang, bang, bang, the bolts go. The door flies away. With it goes the list.
… Right. Forgot about that. Vacuum. There goes the astronaut too with it.
… Wow: in the midst of it, you don't think about it?
Well, exactly. What I feel sad about is people who become excited in a software project and try to figure out how they can use it to solve their own problems or answer their own needs. They can't make it run or they can't make it do what they want and they can't figure out from the help docs how to even ask the question they want to ask and they give up and go away. That's a silent vote that we don't really hear.
… That is unfortunate. Unlike other Infra team members who have said to me; and I'm sure you read it --"everyone does everything".
I do nothing. I do nothing.
You're uniquely responsible for optimizing site content. Do you collaborate with any specific Infra team members? You said you talk to Greg. Is there someone you have to go to every time? Do you work with anyone else out of Infra?
I don't go to any one person because I really don't want to make that person roll his eyes when I contact them. It's a small team. First off, I usually lob out a question when I have an issue. I say, "I'm over here on page so-and-so. I haven't a clue what this thing relates to because it looks like it hasn't been touched in a few years. Who knows?" Sometimes, someone right away will say, "Oh, you do this and do that," or they'll say, "Oh, no. Drew knows about that or Gavin knows about that. Go ask him." Then I do that. And sometimes there's silence. Then I won't ask again on the list. I'll wait until the team meeting.
When we've got everyone on the call and it's my turn and say, "I'm stuck on this thing. Who can help me?" someone always steps up and says, "Yeah, put me down for that. I'll help you in a couple of days." All of these people can do everything. The codicil to that is that all day long they are doing everything. I don't want to be hauling on the same person's elbow all day long saying, "Help me with my little thing." I want to spread out my requests, so I don't pull away any one person too often from the essential tasks, the core tasks of keeping the Infrastructure running.
… Do you work with anyone outside of Infra or you only work within the Infra team to get your work done?
There are a number of people I consult, especially specifically with this migration off the CMS. I'm dealing with people on various projects. I probably have 25 conversations going on with people specific to their projects about what are the various pluses and minuses of the alternative technologies that are available or how do we even do step whatever in the list that's on the wiki page. Actually, I'm so proud of myself when I can actually answer one of those questions.
… How do you collaborate with everyone? Do you use certain tools or is it just email or Slack? How do you work with those folks?
Primarily Slack. Well, there are two things. Slack conversation is going on all the time. I also keep my eye out for whenever a page is updated on the confluence wiki Infra area. As soon as it is, like that little robot with the dustpan and the broom, I go and look at the page. I just scroll down it and maybe fix a little punctuation, change this word and that word to make it a little bit more cleare. Usually, I save it in such a way that they know I've been there. Sometimes if I'm just being really, really finicky, I turn off the thing that says, "Tell the team that you've done it." I'll just stealth in and change that run-on sentence into something more legible, but I don't want to draw attention to it.
… Is that more common than not, or …?
I'm not going to say.
… The "non-intrusive" thing. Describe your typical workday.
I get up around 5:00 or 5:30 and get on the Slack channel and see what's happening. The nice thing about my part of this work is that almost never do I check in and they say, "Andrew, you got to fix this." There's usually not a hanging message about an urgent issue. Then if there is no such hanging message, if I know that I have an ongoing project, like the top-level apache.org pages, I go and tackle the next one. That will keep me busy for many days to come. I keep the Slack channel going partly because, as I said before, I like to monitor what's on people's minds because I may say before they think of it, "Oh, do we need a page about that?"
I'm only working part time, so I only have basically a half of every day available for Apache. I tend to do a couple hours in the morning and another in the afternoon and another at the end of the day. I have another job with a small publishing house. As I'm working on that, I have one eye on the Apache Slack channel to make sure there isn't anything that requires me to jump over there really quick.
How do you keep your workload organized? It sounds like you're super immersed in everything.
Well, I've got, as I mentioned, a job jar page. Basically, it's a long, long list, a checklist of things that need to be done. It's divided into sections. There's a section of things that need to be done for which I need input from other team members. Then there's basically another section where it's just stuff I need to get around to doing. Then if the team comes up with something that they think I should be doing, they're not above adding an item to my list.
We are familiar with that. It's like, "Oh, I left many, many, many more items." How do you stay motivated? What are your challenges?
This is fun. It's partly fun because the ... Let me turn this around: I worked for a corporation at one point. This is fairly early in my career in software and I realized I didn't like what they were selling. I didn't like the way they were selling it. I didn't like what they hoped would happen with the stuff they were selling. It made it harder and harder to go into work. I was very happy when that relationship ended. Here, I am playing around at the edges of a very exciting machine shop or toy workshop with complicated gears and sliders and rheostats and bubbling things that I barely understand, but that I can be helpful with.
Really, I can't see any trouble with motivation. I don't have to say, "Oh, really, you have to go put in your hours." It's more like, "I know, I also have to do this thing outside, my editing job, but I really wanted to do this thing for Apache."
… That's awesome.
The nice thing is that the Infrastructure team does so much so well and almost making it look easy that any project in Apache that's really got itself organized to do its work is going to find success, because there's going to be no roadblock or brick wall or power failure that will keep them from it. That makes me feel like I'm engaged in a very small way in a very large good thing.
With regard to the Infra team, you're looking at it as an "inside outsider", right? You're someone who's working with Infrastructure, but you're not a sysadmin or not a DevOps person. Is this the first time ...
My camouflage is that we all have beards. I fit in that way.
[laughing] … is this the first time you've been part of an Infra team, because usually folks with your profile are usually part of a content team or marketing or PR or sales engineering or some other division that's more again outwardly facing from an acting standpoint. Is this the first time you're dealing with the underbelly?
Not purely. The very first software company I worked for, within a few weeks, I was in charge of build and release. That was as far down into the bowels of the code as I was. I wanted to go with that point, but it was certainly not outward facing or documentation related. At another company, I was in charge of localization across 17 languages. Although of course, there's words involved, it was very much words in terms of, "Will the German for this thing fit on the button we have for it?" I've been the inside-outside guy in other situations before.
Cool. Website administration, as we've been talking about, running Websites in general changed so much over the years. What has been the biggest change or hurdle that you've experienced?
Purely in Website administration or in dev?
... Anything for documentation, what you're doing now, and what it relates to administrating site content.
Gosh, I think the disappearance of paper for me as a writer is one of the big changes. Almost everything I do now I can do digitally. One of the companies I worked with, I helped supervise the transition from printed user guides. There was a great big room full of boxes of spiral-bound user manuals that we stopped doing. We moved over to a product which we could dynamically create, so we could create a manual for Sally and her role and a manual for Andrew and his role that would come off the same text source but would have different chapters with the stuff relevant to what they were doing. Getting away from the physical artifacts to me has been the biggest change in the writing world.
Remember, I work in a publishing company, when I'm not at Apache and what we turn out as books are very much in the physical world. I was just working with an author this past week where I'd send her a PDF of the very final, final, final, "This is the final version of the book before we go to press." She said, "I need a physical copy. Send me one."
We printed off this endless book and drove it to her house. Then she found two tiny little things that way and we fixed those and everybody was happy.
… That's good. When I teach media training, I require everyone to put their laptops away and write, even if it's on the tiny little hotel notepad, but write it down because your brain processes things differently when writing.
… Some people say, "I feel like I don't know how to write anymore." That's such a sad situation, but it’s our reality.
When I'm doing my own writing, first drafts are always physical. Pretty much always. Then the nice thing then is when you move it over to digital form, even if you try not to, you see ways you can improve it. You're already into version two with a better document because you wrote the thing first by hand and then you write it again on the keyboard.
You've seen a lot of transition in this space. How do you close your skills gaps in order to stay ahead of everything? How do you do that?
Boy, my skills gaps are larger than my skills. I'm wonderfully good at Apache Flex but it's not a skill much in demand now ... If I had known three years ago that I would be on the Infra team now, I would have learned Python, become very comfortable with it, because it's a Python shop. I'm learning Python on the side, not in order to become a coder, but just so I know what the others are talking about.
Got it. But that's not required for you to actually do your editing work?
No. Nowadays, the writing tools are good enough and versatile enough that they're almost transparent. If you learn how to write in Word or on Apple pages and it turns out you have to write using Markdown language in Git pages, that'll take you 90 seconds to figure it out and then you can do it.
Great, so that's not a problem. Earlier we're talking about you not having actually met the team: the offsite was cancelled, and you haven’t been to a previous ApacheCon. A huge advantage for the ASF is, as you know, especially during the pandemic, is that we've been virtual since Day One. We literally didn't have to change anything in order to maintain our day-to-day operations: it's just business as usual. Just keep going. For you, have there been any advantages or disadvantages of working remotely from the team? Do you think your work could be improved in any way or is it no big deal?
Because that's how the team works and because the expectation is asynchronous, that is you ask a question and you may not get your answer until the person four time zones away wakes up, it's not been a big impediment to me. If I need to have a private conversation with someone, I know how to ping them on Slack and we can open a private conversation. I will say I was working remotely from 2013, I guess. I moved from Boston up here to Nova Scotia holding on to the same job I had with the company I was working for. I was into working remotely from then and have continued pretty much without too much disruption.
When that job came to an end, I went primarily freelance. I was working with clients in Japan and Laos, Germany, all over the United States, South America and so on. Of course, I never met any of them face-to-face. This Apache experience ac,tually, I feel closer to this team because we're on Slack all the time than I had felt with many of the other teams I've worked with over the past decade.
… I love that. That's great.
We share cooking recipes. That's really important.
… I hear a lot about that: Infra’s cooking, drinking, and eating. I ask everyone this question and it tends to make folks pause, but what do you think people would be surprised to learn about ASF Infra?
I think ASF Infra is like the person who's not the main act in Cirque du Soleil. This is a big thing happening on Cirque du Soleil. Over in the corner is a person who's throwing two chainsaws and an apple, an egg and a baby, juggling those all at the same time and just doing that. It just happens and somehow it's essential to make it possible for everything else to happen, but that juggler is good enough to just make it happen and make it look easy. I watch what my team colleagues are doing. I'm just in awe of all the things they managed to do. The Apache teams are doing their things and something goes crazy on a server somewhere. We get an alert about it. Whoever is awake at that hour from the team jumps on it and maybe they call in another person. Then they realize it's because of this third thing over on this other server that's gone wrong and they fix that. All the individual project team might notice is that their email is delayed for a few minutes.
… Right. Everything else is being juggled in the background. They're not aware.
Juggling without dropping an egg or a baby.
… Incredible. If you had a magic wand, what would you see happen with what you're handling within Infra, with your role specifically because you're the only one doing that? What would you like to see change?
I don't know that I have a wish at this point. It's pretty straightforward. I guess I wish I could time travel back and learn Python and come back here and be at this point knowing Python, rather than trying to pick it up on the side.
What's your favorite part of the job?
My favorite part is when I hear back from people, "Oh, now I get it. I read that page again now that you've edited it and now I get the thing."
When something we've changed, something we've provided makes it possible for people to do what they need to do.
What was your biggest challenge when you came into the role, when you started? Was it a wall of, "Oh, my God"? What was it like?
I guess it was grandma's attic, trying to figure out what box to pick up first and feeling in the first weeks that I was working, I was afraid. I was very, very preoccupied with not offending anybody, with not implying that they were ... Not correcting their writing in such a way that I insulted them. It took me a little while to realize of course, they are perfectly happy to have their typo corrected, but it was a matter of ... In the first few weeks, I was still feeling out my colleagues and understanding how much they were going to appreciate some documentation help, how much they would find it as an intrusion or a waste of time.
What is your greatest piece of advice for aspiring technical writers and editors coming into this type of role? What advice would you give them?
I would say ask more questions. When you're stuck, don't presume it's your fault. Ask a question. Someone may say, "Oh, that's because X and we never said X."
What are you most proud of in your Infra career to date?
I haven't broken a single damn thing, except one thing and I'm not telling you what it is.
All right. How would your coworkers describe you?
I think the robot with the broom and dustpan. I think they bought that one. I suggested it.
What else do we need to know that I haven't asked?
I'm a playwright. I play the banjo. For $10, I won't play the banjo. At this point in my career, f you hand me a banjo and a cup of coffee, I'll be happy.
= = =
Andrew is based in Nova Scotia on UTC -4. His favorite thing to drink during the workday is countless cups of black tea, accompanied by homemade pumpkin-seed-flour bread served hot with butter.
Posted at 03:13PM Dec 14, 2020 by Sally Khudairi in SuccessAtApache | |
Inside Infra: Andrew Wetmore --Part I
The "Inside Infra" series with members of the ASF Infrastructure team continues with Part I of the interview with Andrew Wetmore, who shares his experience with Sally Khudairi, ASF VP Marketing & Publicity.
What is your name and how is it pronounced?
I'm An-drew Wet-more. The "Wetmore" is like a rainy day, very easy to pronounce.
When and how did you get involved with the ASF?
I was a Flex and ColdFusion developer. When Flex came to end-of-support with Adobe and they passed it over to Apache, I followed along. I wasn't an active committer: I was a participant in the Apache Flex project and contributing in my little ways here and there. Then when the Apache Royale project split off Apache Flex, I went there, but I was not an active, not a heavily significant contributor. That is, I was helping with documentation, a bit of testing, a bit of organizing and helping. I was truly surprised when I was invited to become a Committer. Then at some point, somebody noted on the Apache Royale list that the Apache Software Foundation Infrastructure Team was looking for a documentation person. I thought, "Well, that's interesting. Maybe I would be able to contribute to that."
I followed that up. I wrote to (ASF Infrastructure Administrator) Greg Stein and introduced myself and said, "Oh, I'd be interested if this is something that's happening." Then for one reason and another, nothing happened for quite a long time. That was fine. He told me nothing was going to happen for a while. He was migrating some monstrous mountain of something. Then when that long time was up, I pinged him again and said, "I'm still around if that's an interesting possibility," and we got talking. He did that wonderful interviewer thing of saying, "Well, if you were going to hire someone for this sort of a job, that has this heading, what sort of job description would you write?"
He made me write the job description. I thought, "this is cute: I'm happy to help. I don't know what person not me is going to get this job, but I'm happy to write what I think is a good job description for this thing." Truly, I really expected this to go out and a whole bunch of people to apply for it and that I would get a participation trophy. I was very pleased when I was invited to join the team.
… You got the real trophy.
Yes, I did.
You got involved when Flex came to Apache, so that goes back to 2011, you've been with the Foundation for nine years or so?
I was aware and downloading builds as soon as there were builds to download and participating. I was still building my own Flex stuff, but I don't think I was really contributing significantly until around maybe 2015. Then I didn't become a Committer until 2018.
The other things you were doing prior to Infra were limited to Apache Flex and then onto Royale?
Yeah, I had a glancing awareness of Apache. Without even thinking about it, of course, I was using Apache tools like Apache Tomcat packages, but I really had a distant but benevolent appreciation of Apache until I started to get more and more involved with Royale and began to understand from that angle all the things that the Foundation does to support these little projects that could not survive without it. Of course, now that I've become part of the Infrastructure team, I'm awestruck by the amount of work that the team does to support all these little projects, so they can do their thing.
It's interesting with Apache projects because they're mostly ingredient brands versus a customer-facing final product. Of course we do have those too, but the majority of them power something else. A lot of times people aren't aware until they're in it: then they’re like, "Oh, wow, Apache is everywhere."
Well, I keep trying to improve myself and I go and choose a product project at random and read its homepage and its "about our product" thing and see how far I can get before I've hit five things that I don't understand at all. I don't even understand what I would do with the thing that I do understand which is not a knock on those projects. It's what you just said, they're not end-user-facing. I, as a Flex developer, was a Flex developer. I was using Flex and now Royale to build other things, not getting in with the toolkit and adjusting and tweaking Flex or Royale.
Right, like a commercial product. Explain your role within the Infra team. What is it exactly? What's your title? When did you get there?
Let's see. It was at the very end of last year. I'm coming up on 11 months on the team. I started two weeks before the end of 2019. My title is either editor/writer or writer/editor. My job is ... well, the situation of the Infrastructure team from a point of view of a written material, of documentation, is that for the past 20 years people have been doing their very best they can to document what they do and how people should do things. They've been adding to that material and adding to that material. We have a built-up pile of stuff, some of which is no longer relevant, some of which contradicts other bits, some of which is written very provisionally for something that we now take for granted that we're always going to use.
My job is to go around with my little broom and dustpan and clean up things. I go and find a page in the Infrastructure documentation and do what I would do when I visit project pages. I start to read it. When I get lost, I start to edit it or I start to ask questions to the team and say, "Is this even a thing anymore?" The team sometimes says, "Oh, we were doing that in 2011, was it? I forget." We know that that page can go into history. There are other times when I run into things, there are many pages, far too many pages where there's a sentence like, "At this point, such and such is true." It doesn't say anywhere what "this point" is.
Then when I dig in a little bit, I find out it was again back several years, where either the conversation came to a stop or the conversation continued and whatever that uncertainty was on the page has been resolved. Now I'm able to update that page and say something that's more useful to the visitor like me who's coming now and doesn't want to read a historical document, so much as get help about doing the thing they want to do.
Right: "what does that mean for me?" This is the legacy dilemma.
Sure. Well, it's also a factor of people trying very hard to do too many things all at the same time. Let's just write enough explanation so people can get going, because surely they'll understand what we mean. That's often true for probably the vast majority of people who are active in an Apache project. The pages that stump me don't stump them.
We have this issue of tribal knowledge not scaling. Part of it is that people are moving on. Part of it is that processes are changing, technology is changing, people forgetting about pages. You're in a very interesting position. If I'm understanding it correctly it’s trying to rectify whatever the intention of what's historically been in place versus what people are doing now, even establishing relevance and trying to find out who's the de-facto source to go for this, considering the amount of contributors, is incredible.
There's another piece in this. I'm trying to come at my review of pages from the point of view of a person of good intention for whom English is not the first language. I'm trying to think, "What on Earth would one of my colleagues from Laos understand from this sentence that has a jokey little acronym or aphorism in it?" Then I tried to figure out how I can say that more clearly and not in a boring way, but in a more accessible way. The three flaws that all our documentation tends to have, one is to write in an academic way. One is to write with a heavy use of acronyms. One is to use a heavy serving of passive voice in the material.
Let me explain what I mean by those three. The academic style: so I have a page here that's going to tell you about X. If I'm an academic, I might write, "The intention of this page is to explain the modalities by which a user might accomplish X." As the reader, especially the reader for whom English is not the first language, I'd like to see, "Here's how to do X."
The second one is the acronyms. Even acronyms that should be, you would think would be obvious to everyone. "The PMC of a TLP." It's like reading Hebrew where there are no vowels, but a simple rule is to always spell out what the thing is and put the acronym in parentheses the first time you use it in a document. It's easy to forget if we know the acronym so very well ourselves.
The third thing is passive voice. Passive voice correlates strongly to the academic approach. It has a way of hiding who does what. "After the file is uploaded, it is passed to the server and verified." I don't know who does any of those things, if I'm the owner of that file, I don't know if I have to do any of those things or if I just sit back and watch them happen.
Sometimes when I ask—I actually enjoy doing this because it is my naïve or newbie character—I go to the team and I say, "Here's this sentence. Who is doing this?" In the conversation about that sentence, sometimes the team discovers that there's a difference of opinion about what part of the Infrastructure does it or whether someone has to do it. We might end up not only fixing the text but improving the process.
What I'm hearing, it's beyond just writer/editor. It's writer/interpreter.
I came into software through the QA door. Over 15 years, I ran teams that were generally documentation and quality assurance, documentation and testing. I think the two are very tightly connected. When I go to edit a document, to the extent that I'm competent to do so, I test it. If it says, "You go here and do that," I go over there and look, is the thing that I can do available there?
It does what it says on the tin. It doesn't correct that.
Well, we have innocent blind alleys that are built into our documentation out of good intentions, pages written about our software repositories, back just a few years, presumed that everything happens in a Subversion repository for projects.
For projects, almost everything happens in Git repositories now and the instructions of how to do something in the two repositories may be different. We need to go back and find those pages and make the path comfortable for someone who couldn't care less about Subversion, but they have to do something in Git.
Let's talk about the scale of what you're working on. How many pages are you handling?
Oh, good Lord, I don't know. I started out within Infrastructure itself, looking at several packages of pages. There was a set of pages on the Apache Website under the subhead dev, a set of pages on the Apache Website under infra.apache.org. There had been a set of pages under something called reference. There's a set of documents in a Subversion directory. Then there's a very large Infrastructure section of the Confluence wiki. One of the first things I had to work through is how did these relate to each other? How does anyone find their way? I proposed and the team seems to have accepted that the pages at the apache.org/dev area are the introduction. Here's what someone trying to figure out what goes on with dev and possibly the Infra team would need to look at.
The next step in is the infra.apache.org area. If you can't find what you need there, you follow the links through into the wiki. God help you if you have to go to the Subversion repository. Really, what should be in the Subversion repository is only, it seems to me, the instructions for restarting or rebuilding the Confluence wiki. I am gradually moving other stuff out of the Subversion repository.
This feels like a byproduct of individuals or projects or committees or communities actually having this "scratch your own itch" issue, right? "Hey, we need to document that somewhere on our thing and here's our particular experience." Apache is open enough. You can do that. The whole directive of integration or coordination or making sure that one hand is speaking to the other has never really been a mantra of ours. It's interesting to see as we're scaling, it reflects directly on what you’re experiencing.
There's the thing. If it were a team of six people in a room or at a project level, it wouldn't be a problem figuring out where to put what or how to fix a documentation collision. When you're talking about hundreds and hundreds and hundreds of people, some with decades of experience, literally two decades of experience in the Foundation, someone bopping in brand new can bring things to a standstill. People can get lost or disheartened.
The disorientation is very common. I hear people going, "How do I find …?" "Where do you start? Where do you go?" It's great that there's assistance.
In early January, I started with some thoughts. "I'll be a new person." "I'm going to read the apache.org site and go where it tells me." I got so depressed. I could not understand how the information on one page related to the information on this other page that I was looking at. There seemed to be two sets of instructions for doing many things. Part of my fun is trying to make it easier for people who come in: you have to help them find their way without giving up the whole thing and throwing the computer out the window.
Are you still at the audit and discovery stage or are you actually at the rewriting stage also? Are you in course-correction or new content?
I've moved almost everything out of the apache.org/dev area that shouldn't be there. What is left is introductory material. I've edited every page on the infra.apache.org area. I've edited almost every page, I think, on the wiki. I'm still digging around in the Subversion thing. I am doing a long march through the top level apache.org pages. I've been in most places, but I'm not done. Beyond this, I guess there are some areas I haven't touched or I need to ask permission. For instance, the Incubator has a set of interesting pages in which the material is clear, in large part. There are some suggestions I'd like to make for those pages, but I don't have any mandate to go in there and start changing things.
What I would do in that situation is write up a little sample report, "Here's how I would suggest changing this page," share it with the Incubator team and say, "This is yours to do as you'd like. Would you like some more?"
… Chances are they'd say yes.
Well, indeed, but I don't want to assume that. I know how I would feel if I turned around and found out someone was changing all my sentences.
I might very well feel that not having passed through the Incubator wouldn't have a good grip on what needs to be said.
It's really important for us to have these fresh eyes with respect to what the outside world is seeing: people who are new to it, how is their interpretation or misinterpretation? Again historically, there's been this issue of, "well, it's obvious". Not just has Apache evolved and the communities have evolved, but Open Source has evolved. The expectation is very different. Similar to what you were saying before with Subversion and Git, it's a completely different space now. We have to grow with that. I think again because we're not a corporation, we don't have these marching orders of "go, bring it up to speed or bring it in alignment." It's great that you were there to audit, align, and course-correct.
There are pluses for this that I had suspected but hadn't been sure I would find ... I'll give you an example. One of my projects is about Apache's content management system that's nearing end-of-life. We have to migrate all the projects that use that content management system to generate their Project websites to some other technology. We've been working on this for over a year, I think, but there were 40 or 50 projects that hadn't gotten started on that migration. I started conversations with them, saying, "Hey, are you going to move? What do you need help moving? What do you need to know?"
As I was getting feedback, I was able to improve the documentation we provide on the wiki on how to migrate your project off the CMS. Along the way, I've met some really interesting people and am having fascinating conversations with people deeply engaged in Projects; the output of which I know nothing. It's a lot of fun because these are very, very smart people. They're doing really significant stuff. I want to make it as easy as possible for them to turn from that highly significant stuff to this rather mundane thing of moving the way they built the Website from the current way, which is creaky but they know it, to a new way.
We long have had this "do your own thing" culture --no one's telling anyone that there's one, official, way to develop your Website. Hundreds of Apache Projects are developing their own Websites their own way. No doubt some that are using the current CMS, there is an opportunity to offer them a different development direction versus a giant arm sweep stating, "We're going to pull down 300 project sites and rebuild them all at once," as would be done in other organizations when they choose to rebrand or upgrade their CMS or backend. It's like little mushrooms popping up where everyone is producing their own site at their own pace, using their preferred tools. It's very, very interesting.
It's educational to me also because the Infra team has a series of recommendations, "We really recommend you go to this technology to build your Website." The subtext is because that gives you the most options, the most flexibility, and means Infra has to do less to hold things together, but then we have other Projects that say, "Oh, no, we don't like that. We really would like to use this." The Infrastructure response is, "Show us how you can possibly use that in a way that matches these requirements we have." For instance, that the landing page for the project Website has to be a thing that can be branded as [projectname].apache.org and hosted on our servers.
People, as a project, demonstrate, "Oh, they can use this technology" that we had not thought of, then we have the documentation for that and that might encourage some other project that doesn't like the vanilla package we're providing to migrate using this new thing. We're down to about 20 projects I think that haven't really gotten very far on their migration.
… Is there a deadline for that?
At some point, the content management system is just going to fall over. We'd like to get everyone out before that happens. We set the end of the year as, "Let's do this before the end of this year, but there's not a switch." There's not the end of a license or something like that that's going to happen. We have a little bit of wiggle room.
… We're not pulling the plug, so to speak.
No. We're not pulling the plug, especially since treasurer.a.o. hasn't moved and we wouldn't want to annoy them.
Right. Are there additional responsibilities that you take care of?
Well, that what I just described about helping or encouraging teams to migrate was not part of my job description. I just saw something that I could do that involved being engaged with projects to get them on the path, leaving the other team members available to do things that I can't do. I'm the least technologically savvy person on the team. I might as well do the stuff that involves words and interaction.
What's the process of sorting through 21 years of ASF history on apache.org? How far along are you? Is this a never-ending project or is there a specific milestone that you want to hit to say, "Hey, okay, we've done"? Is there an end in sight to this or …?
I think we'll get to a point where we'd say, "We're pretty well caught up. Now what?" That could happen within the next couple of months, but then remember, at that point, the process of doing doesn't end. Where new material is being created, technology changes. We're migrating server things from one kind of server to another kind of server. We have to document what that new server does. Git for instance, or GitHub, I guess, has provided a couple of new options for things projects can do. The Infra team has to learn how to support those, then we have to document them and help teams understand how they can use them for their benefit.
As long as the Foundation keeps doing stuff, the same problem of uncurated information silting up will recur. Hmm. That's my lifetime employment plan.
[laughs] Going to apache.org/dev, how did you decide where to start? Were there any active fires that you were told you had to put out or it was more a bunch of low simmer, "We'll get to that someday," types of sections of the site? Again, it just seems so like a Medusa situation. How did you decide to divide, conquer, and get started?
For apache.org/dev, I just started at the top file or the top link that said anything about dev and went into it. "Why is this out here? This looks very much like the same thing we say over here in infra.a.o. Why are we saying it twice in two different places in two different ways?" I started pretty much by grabbing anything. It's like the way you might go into your grandparents' attic when they're downsizing to a smaller house and you're going to help them move. All you can do is pick up the first box and see what's in it and give the best guess about where that should go.
… Then there's those people who just grab it and just donate everything.
… Not even looking through it, they're just purging and starting afresh.
Fairly early. I tried to elaborate that tiered idea of Infra information, so that if you land on dev, you're getting high-level stuff. If you go to info.a.o, you're getting more thorough stuff. If you need to, you can go over to the wiki to get code snippets or very detailed instructions. If you're an Infra team member, you go over there and get stuff. Only in the direst need you go down into the Subversion repository.
Before, everything was mixed around. Where the most essential and the least essential stuff was, was not consistent or logical.
Have you had to learn about the Apache Way of community-led development or other processes in order to get the job done? Even if they're talking about a technical thing, you're testing it out. Are you kicking the tires along the way saying, "Okay, this doesn't make sense," or are you not at that stage yet in terms of content?
I'm doing a fair bit of tire kicking. Of course, as a participant in the Flex and in the Royale projects, I've engaged myself to understand the Apache Way from them. The PMCs I work with modeled the management and development style of Apache. I learned it organically. I'm not seeing a conflict between what I learned on the Flex and Royale teams and the larger Apache Way of doing things. I think that's really good. You stumble into a small project with a very minor, very focused goal to do this thing, this bit of technology. You take in through your skin how to make decisions and how to share information and how to support each other.
… Continuity for the win: that's good to hear. What kind of influence do you have on content development? You said you're adjusting a page if it's not saying what it's supposed to do, but beyond that, are you saying, "Look, this really needs to take a different approach"? Are you deciding on your own? Is there a review committee that has to oversee every edit or is the process completely autonomous? How do you know what you're writing is factually correct? Who signs off on that?
The Infra team is in constant contact, 24-hour chatter every day on the Slack channel. There's an asynchronous conversation going on. When I run into something I don't understand ... Well, there are two things that happen. I can suggest things there that might be useful, but also when I notice people discussing something that's new or something went wrong and what they have to do to make it right, I often say, "Is that something we should write down, do you think? Where should we write it down?" That begins the conversation about documenting whatever the thing is.
One of the first things I created on the wiki page, the Infra wiki site, is a page for me called the job jar. Each time I come up with something that has to be written, I start a new item on a checklist and write in what that thing is to the best of my knowledge. Then, if I can't see any way to write because I don't have a clue what that thing is about, I go to the Slack channel or I go to the team meeting, which we have every Thursday, and say, "Who can help me write this? I just want you to blurt out the facts and then I'll turn it into pretty language." I can't direct that we have to write anything, but we work interactively.
If I write a new thing, I post it on the Slack channel. Someone will come back and say, "Well, you totally missed this thing. Here, let me fix it for you." We go back and forth like that until it's ready to make available to the larger public. Greg, of course, keeps a close eye on me, so I don't accidentally delete everything.
We review regularly what needs to be added or what can be sliced away because often, if you say less, you can communicate more.
Going back to "delete everything", when I first joined W3C 25 years ago, I remember making copies of everything because I was terrified that I was going to delete the Web's original history, there were thousands of legacy pages. Do you do the same thing? Do you make copies of things and edit that then just do merges? How do you actually do that?
I have a strong reliance in the team's guarantee that everything is version controlled. Actually, I'm more shy about changing things than they are to encourage me to do it. They just said, "Go ahead and do that. We'll fix it later." In that sense, I'm truly not afraid of deleting everything. I am afraid of inadvertently causing annoyance. I have an example: when I first started to move pages from a.o/dev area to the infra.a.o area, some of the pages I wanted to move that had titles that didn't really match what was in the document. God, I've got to improve this. I changed the name of the file at the new location. Then that was a pain because how do you redirect from the old location to the new direction?
I learned very quickly that I was causing trouble for my colleagues, but beyond that, I was causing trouble for people on projects who might have a link on their page to an Infra page. I really don't want to cause an information barrier because, in my mind, I'm making things more efficient. On the a.o/dev area, there are all sorts of pages sitting there now that are just stubs or just shells of their former selves. If you click on the link to go to that page, there's a little gearcranking and all of a sudden you're over at the same page at infra.a.o … most of the time. Sometimes it just does not work and then I have some sad people.
It's interesting you were saying about not wanting to upset people, but I think this is actually a parallel with good documentation and good data management. It becomes un-intrusive and a natural byproduct of your experience online. The whole point is you don't want to say, "Hey, there's some underhanded entity there that's controlling it." It's natural in terms of what you're seeing, what you're reading. In terms of comprehension, it's great UX. It's a very interesting comment that you made about you not wanting to ... this, "do no harm" approach --the outcome is very positive.
If you want to make a really highfalutin image, we're surgeons working on something together. There's a thing going on the table there that's going to ... Things are going to go bad if we don't do our job well. If I go moving around where the implements are that we're going to reach for on the tray from where they normally are just because I think they should be alphabetical or something, things are not going to go well for the patient.
… Someone might even die, right?
Fortunately, there's a limited amount of trouble I can cause because I'm not turning the nuts and bolts on the servers. Not yet.
But I am ... You asked earlier what sort of, I don't know, influence I have to bear, I'm in there asking questions whenever I can understand a question to ask about, "Shouldn't we update this list here of the servers? This doesn't look like it's been updated since 2014. Shouldn't we make this list more accessible to the people who have to look at it?" That makes it sound like my colleagues are bumbling along and inattentive. They're very attentive, they need to document what they're doing and they're very patient with me when I get fixated about a semicolon while they've done everything else right on that page, except that damn semicolon.
It's important. Both parties: that's a good dovetail of talent, right? You're talking about a page that hasn't been touched since 2014. We have pages that have been untouched since 2001. I'm sure you're coming across them.
Here's a situation that probably is of low impact, except when it has high impact. I've been reading the memorial pages for past committers. I got to a page that said many kind things about the person, "who died in a car accident this last week". This "last week".
… *When* was that, right?
Yeah. I'm making little reports on those pages and the people who have ownership of the pages have to decide what to do with those reports. I'm not going in and changing those pages, but I suggested, "Let's figure out the year at least, maybe the month and make that more accurate, so someone like me now visiting this memorial page about a person who died before I joined Apache can understand what happened." In some ways, that's important for remembering and honoring the people who have been with us and are gone. There is more painful stuff when we haven't updated something or we've left a sentence that says, "As of this writing, so and so is the case, but I don't know if it's going to be that way for long." Again, there's no date. I think it's scary.
… There's no frame of reference at all.
Exactly. It makes the whole thing provisional. We have, under this COVID-19 crisis right now in the province of Ontario in Canada, a very complex Website that purports to tell you if you're in the city of Toronto what you can do in different parts of Toronto, what the lockdown level is. At the very top of the pages, it says, "Latest information."
If you go in there, it's three months old. The latest information is elsewhere in the page. To me, it throws the whole thing. If I'm someone who's trying to find something out from that site, I tend not to believe any of it.
[END OF PART ONE]
Posted at 12:26PM Nov 29, 2020 by Sally Khudairi in SuccessAtApache | |
Inside Infra: Gavin McDonald --Part II
The "Inside Infra" series with members of the ASF Infrastructure team continues with Part II of the interview with Gavin McDonald, who shares his experience with Sally Khudairi, ASF VP Marketing & Publicity.
Yeah, as far as I'm concerned you're spot on with CI.
… That's you.
I mean it's not me totally. I have been concentrating on it more as others have been concentrating on other things, yes. Jenkins for example, we had this one --I call it a mega monolith of a service that had all the project services that was on one server, one Jenkins instance. And it was the same instance being upgraded for the last 10 years. So it was time to migrate it, so it's been migrated to five smaller Jenkins services.
BuildBot is also being upgraded and moved to a bigger server. You've got actually Travis is being used to its full capacity.
What does that mean, "to its full capacity"?
When Travis came out, projects started using it, and we're now at the stage where it's at the full capacity ASF is provided.
Oh okay. So if they're giving us 20 whatever, we're at 20.
Right. It's not unlimited. This is why a lot of projects have decided to start moving to GitHub actions, which is also not unlimited. So the more that's provided, the more is needed. I don't think we'll ever keep up with the pace.
Greg says that often. When I talk to him about donations and things like that coming from different companies, he just says "more" and "more" and "more", so he's not exaggerating, right?
No, no, he's not exaggerating.
... If we offer, they'll take it?
Yeah. There's ways in which projects can use CI in the same way they can use other things. And they will use everything that's given to them.
… Insatiable need.
Yeah. Not understanding that there's 300 other Projects that could be using those same services. But there's a few beginning to realize, and there's talks on certain mailing lists that how can we make this more efficient, how can we projects help each other in managing the best usage of these services that we've got. Because they don't want to have it all. They've just been creating whatever they feel is needed for their project. Then sometime later they realize, "Oh, I'm using 80% of what everyone's been given."
… So it's not malicious, just a lack of awareness. When you guys get a new service do you go, "Hi PMCs, we have this thing, there's 20 units available, use with your discretion," or does someone say, "Hey, Infra has this now. We're going to run and take it all," without realizing they're taking it all. How do you introduce new services to the projects? I'm curious because I never see that side of the activity.
Sometimes it's via the mailing list, users@infra. Projects can come to us and ask questions on that mailing list if it's not appropriate for a Jira ticket. People join that mailing list because they're interested in what Infra is up to. So we use that mailing list as a heads up for whatever it could be: "Jira is getting upgraded this weekend, or there's going to be some downtime on this", or it could be things like "okay we've now enabled GitHub actions across the board" or whatever. There were some new features added to one of our self serve things is asf.yaml, which I know you've spoken to Daniel Gruno about.
… I was really surprised to see that. It was exciting. Is that new for you to be announcing publicly like that, sharing outside of the ASF's mailing lists?
It is and it isn't. We used to do it all the time years ago. Then as we've become bigger, it's paid staff not volunteers, we're busy all the time. Blog posts got put out of the picture I guess for a while. So this really new cool feature that was provided, the code was provided by a volunteer via a GitHub pull request. And we looked at it, Daniel made some comments, the changes came backwards and forwards until it was ready to be committed the other day. And it's fantastic new features that projects keep asking us for all the time via Jira tickets. "Can you enable this? Can you check this checkbox?" It's work that we shouldn't have to be doing. So now we don't have to do it. They can just edit their own yaml file in their own GitHub repository and those GitHub features are enabled. It was worthy of a blog post.
… I was pleasantly surprised to see that. That is good. It's interesting because we often don't have enough time to do the work, put out the fires, deal with normal life. You have a family by the way, oh yeah let’s not forget that, but you also have to find time to share this sort of information with the outside world. For me, working on the Inside Infra project is amazing because not only do I get to learn about what's been going on, all these years with the ACF for me it's been kind of a black box in many instances with Infra, but I'm also hearing from outside folks who are saying, "This is great to learn," because how would we know otherwise? It's cool that you are able to share more and more however you can. I was really excited to see that blog post.
Hopefully there'll be more coming. Infra does have a blog and it's been not used as much as it should be I think.
When we publish, we'll be sure to get that out (https://blogs.apache.org/infra/). Let's talk about project requirements. Of course Apache projects have been setting the standard across countless "usual suspect" technologies: servers and Big Data and build tools and libraries and so much more. We now have incoming projects in Edge computing and IOT and blockchain and even hardware acceleration, all of which are coming in through the Incubator. Does Infra need to know anything technical about these topics in order to get the job done? Or do you just handle the back-end support and it doesn't matter what's coming, who's the project or the category they’re in? Are there instances that cause you to say, "We have to learn/do something completely different for this project"? Is there anything that's coming in that changes the way you're getting your work done?
… Great; so it doesn't matter.
No, it doesn't matter. Obviously the Foundation welcomes all types of technologies coming in, and Infra provides what it needs to provide. We don't need to know the ins and outs of 300 projects' code to get our own work done. That would be impossible anyway.
But when you came in, you came in through Apache Forrest: you came in with a project. Do you look at stuff that's coming in through the Incubator, out of curiosity, are you keeping an eye on that?
I do, yes, just on a personal level I do just because it's something I like.
… Okay, so it's a curiosity thing, rather than your job depends on it.
With so much evolving in this space, if you need to learn something new, is it top-down -- someone saying, "Hey Gavin, go to Jenkins University"? How do you figure out what needs to get done, and how do you learn how to do the job? How do you close your skills gaps?
I think each team member has their own way of learning more. Obviously more needs to be learned all the time, and that could be going to the particular pieces of software's Website, having a look through their documentation. I mean when we're implementing things, we're at a vendor's Website every day of the week, you know? We use Puppet a lot: it's core to what we do. We farm out to 30 Jenkins nodes or whatever, it would all be done through Puppet. So we need to keep up with what's happening with that project. So I'm on the Puppet Website looking through the documentation all the time. New versions are coming out, new features are being enabled, and that could be something that makes things easier for us, so we can implement that from our side. I do a lot of that reading. You talk about Jenkins University. I actually have--
… That exists? I was just making that up.
Well, there is a Cloudbees university. So in my own time I've actually done half a dozen courses myself on Jenkins through Cloudbees University. I'm keeping myself up-to-date and ahead of the curve on that. It's a deep interest of mine, so in my own time I take those courses.
Do you bring it back to the team and share it with them? Or is every man for himself? Do you guys have a lunch-and-learn --"Hey on Thursdays during our weekly call we're going to discuss topic X"? Is there anything like that that happens on a bigger scale, or is it on an individual basis?
Mostly individual basis, but obviously anything that people learn is going to be shared amongst the team at some point, it's going to be "oh I learned this today". Someone could be learning from someone through a PR (pull request), you know? A new piece of code comes into our infrastructure that someone committed, then the others ... everybody looks at everybody's code. So they would look at it and go, "Oh that's neat. Didn't know that." And that's because someone has gone out and learned it from some Website or some course they've done.
Right. Plus your team is super, super close-knit. I'm sure you are like, "Hey, I found this cool thing, you got to check it out" --you're sharing things together, right?
Yes, yes, we have our own Slack channel where we talk all sorts of things. That could be new software coming out, yeah, or new ways of doing things. Or new cooking methods.
That's the thing I've been hearing: the most common thing that everyone tells me is they talk about food and drinking --not code. That's good too, because that's the fabric that connects everyone together.
You can't talk about work all the time. It doesn't matter what industry you're in. If you're in an office somewhere or you're working in a restaurant. It doesn't matter what you're doing, you're not standing there talking work all day long. You're talking about your kids or what place is a good place to go to or have you seen this, have you seen this movie. You know? We do the same talking, we just do it online.
To that end, ASF since day one has been a virtual organization: anyone can work from anywhere, there's always these great stories about people meeting each other for the first time at ApacheCon after collaborating online for 10 years. It's a really cool thing to see. And for years you've been our man in Australia. Has anything changed with that as we're growing? Obviously you're no longer in Australia, but do we need to have an Infra presence on every continent? Where are things going with this? Has anything changed, or we're still business as usual since day one, because it doesn't really matter?
There's no specific place you need to live to do this job. Obviously global coverage is a good thing to have. When I was first employed, I'm pretty sure one of the bonus points was that I was in Australia. There was that one was US-based, then there's Australia-based, so you've got a fair bit of coverage there. So somebody can sleep while the other ones can then fight those fires. With there being five or six people now, there is still a bit of a gap, but not too much. People do all sorts of hours. And I plan on going back there sometime.
… I was going to say that we're going to dispatch someone to Siberia or someplace in order to cover the timezone. But you are going to go back you think?
Yeah, at some point.
… You miss it?
Yeah. The weather is nice, and we have family there.
This is a question I've been asking everyone. What are the biggest threats that infrastructure managers need to watch out for? Not necessarily a doom-and-gloom threat kind of lightning bolt coming out of the sky, unless there is something like that to be aware of. With the pandemic there's been all these security crises and all sorts of weird stuff happening. Is there anything in general you think people in this role need to know about or watch out for? Is there any advice that you think folks should be aware of?
Oh I have no idea. The pandemic hasn't really affected me in terms of the role. I mean being an all virtual organization, I'm not sure it's affected the Infrastructure team at all on a work level.
… Remarkably with that, we remained business as usual while everybody else was scrambling trying to figure out how to cope. So I agree with you on that. Another question I ask is what would you think people would be surprised to know about ASF Infra?
I'm sure the same answer has come out of everybody's mouth. That there's only five, six of us looking after this many projects. I don't really know, yeah. There's just so few of us I guess doing this.
As there are so few of you doing it, and you think that would be surprising for people to know, how many people do you think normally should be doing this work? I'm curious about that, because I don't understand how you guys are able to do this. For me there's always this awe of how you guys are making this work as I can't figure out how.
From my perspective I don't know any different. I've not worked at Google or Microsoft or any other tech company down the road. I've not worked in offices, so I've got no comparison. I wouldn't know whether it takes 20, 30, 40 people. All I know is that we could do with a few more people to take the strain off, because we're all working hard. There's not a day we can say, "Oh there's not much today."
… Are you on a schedule? Do you have days off, or are you on some level seven days a week?
It's different for everybody. Working from home which I've done for these last 10, 11 years is I guess you get used to it. You get up in the morning and you have your coffee, then you're straight into the work. That could be 7:00, 8:00 in the morning. Then you're doing bits and pieces. You're getting some work done. Then "oh hang on a minute, I need to go to the shops," so you can take an hour or two off and go to the shops. Then you come back, then you find yourself closing your laptop at 10:00, 11:00 at night.
… From bed.
Yeah. I like to work the weekends. If I wanted to take a Tuesday, Wednesday off or something like that, I'll just take Tuesday and Wednesday off. During these pandemic times, I haven't had to do much of that.
But the flexibility is crucial and it's super helpful, because you need to keep your mindset. I mean it is very easy to go mad. You could very easily get overwhelmed with this type of workload.
You need to try your best to balance it out. I know other people that work remotely say you must have your own office at home, your own space. And the rest of the family needs to know when you're in that office you're at work.
… Daddy is not accessible. That's not realistic either, right?
No, I mean 10 minutes ago I don't know if you heard my three year old coming up to me.
… I did, and I was going to ask you if you needed to stop because I heard the little one.
My office at the moment for this interview is the dining room table. The kids are upstairs.
You partially answered the next question that I ask everyone. Which is, if you had a magic wand, what would you see happen with ASF Infra?
… That's what you mentioned earlier without me asking, so more staff, more people. What kind of roles? More of a generalist thing, or do you need a database guy or a specific type of person?
Either of those would be nice. I mean we've got a dozen services that use databases, so we all know databases. I don't know how many of us are database admins. None of us are, I don't think. But we know enough that we need to know for our services that we run. So do we need a DB admin? I don't know. It could be that if a DB admin came along he would say or she would say, "Look at this, I can improve all this. You've been doing it wrong all these years." And maybe we have, I don't know. So you don't know you need somebody until somebody like that arrives.
Do I think we need somebody specific? Maybe another Python individual. Because we are focused on any new code that we write internally for our own benefit would be in Python. But Daniel is pretty good at that, and so is Greg.
Do we desperately need somebody like that, or do we need a generalist? Probably a generalist at this point.
… To handle the volume.
What's your favorite part of the job?
Interesting ... Let me think about that for a second. I think when you do something and somebody from one of the projects comes back and says, "Thanks very much. You've been a great help." I enjoy helping the projects.
I mean obviously I enjoy working on CI stuff. I enjoy maybe providing a new self-serve tool to help projects and help the team out. Various things. I enjoy the job. I've been here that long, obviously I enjoy the job. But yeah. It's nice when somebody from a community says good job.
… Is that infrequent? I know that we tend to be very high standards and very "that's expected" mentality. Do folks scrimp with the appreciation?
Yeah. Yeah pretty much. But it does happen. I've had a few "thanks very much, you've been great". Sometimes you need to work with a project not for an hour or a day, but sometimes you might need to work with them for a few weeks on something. It might be a migration or something like that. By the end of it you say, "Okay that's done, I'm going to leave your mailing list now or leave your Slack channel," then you get a message saying, "Thanks very much, we really appreciate your work."
That's great. When you first came into the role, when you first came Infra, what was your biggest challenge?
Hard question. I don't know to be honest. I mean the people that I was working with, whether it was paid staff or volunteers, I knew them anyway because I'd been volunteering previously. I don't know of any challenges to be honest. It was moved from volunteer role to paid role. Gradually over time I got to know the people, got to know the Infrastructure layout, where things were at a technical level.
... Do you think there was any sort of attitude shift towards you when you moved towards paid? It was new then: you were one of the earlier team members. Being paid anything kind of raised an eyebrow at the time. I remember that really clearly. Did you face anything like that? Were you challenged by that?
It wasn't an issue for me. If there were any attitude changes, I didn't notice. I don't think the volunteers leaving over the following couple of years was related.
… Well people burn out. And it's a lot of work. That's the other issue: "Hey I'm here to help out, I know Apache needs help, I'll lift or I'll help raise the building or whatever," --that's one thing, but the fact that the demand is constant, you can't expect people to be spending their free time to do all this work. We do need people to be dedicated and paid for. I'm all for that.
As you've been with us for such a long time, I'm sure there've been highlights for you. What's your proudest contribution or role or moment in this position?
That's a particularly hard question for me. I think I don't get proud over anything much to be honest. I enjoy doing what I do, but I don't have anything specific to call out. Getting my 10 year t-shirt was a proud moment, I guess.
… There you go. Are you saying that because you're shy or you're super humble?
I don't know.
… Or you're super hard on yourself? I mean there's that too: "That's it, that's all I've done."
No, I'm not one for calling out the self. That's probably the English in me, I guess.
I appreciate you as one of the gold star participants in my Media & Analyst Training, because you've done it so many times. But I noticed that in you, and that was part of the thing I was always trying to tweeze out of you. "Oh we need to talk about ourselves," and you're like, "No." A lot of people have that tendency to downplay or just not go around "rah rahing". I have to mention that's a very American thing to do, but it happens across the industry. Whether you're interviewing for a job or talking to someone at a cocktail party or whatever. Now even more so with social media, so it's the challenge of balancing that. I want you guys to get the recognition. You are doing massive work. You guys are heroes. I know it might sound weird just to hear that, but it's true.
Yeah. Yeah. I guess. Yeah, I don't have anything to call out. I mean if somebody said to me, if I was in ... not that I've had a job interview in over a decade, but I guess that's one of those questions, isn't it? "What are you most proud of in your last role? What did you accomplish? What was your best thing that you did?" And that's why I've not been good at interviews, because I wouldn't know what to say.
… That's where that t-shirt comes into play. I survived and I thrived over 10 years.
That's great. What about your co-workers, how would they describe you?
You see, we just talked about how quiet I am. But on the Thursday team meetings, I probably talk the most.
… I do not believe that. How do you mean?
It's a standing joke, "Oh this week's team meeting was only 20 minutes because Gavin didn't attend," you know? But I don't waffle on, I do talk about work. It's just I've got a lot to say.
That's good. You stick to the point, you're not waffling on. All right. What else do I need to know that I haven't asked? Is there anything specific that you want to highlight, a project you're working on, an achievement, something? Anything that I'm not touching upon that's like "I need to make sure that Sally knows X"? Is there anything you haven't touched upon that we haven't discussed?
No, I don't think so. I was thinking before the call, "What are we going to talk about?"
… Media training. There you go, you're preparing your speaking points.
… Ta-da, so that's my proud achievement. I got Gavin to think about speaking points.
Yeah. I mean yeah, as far as how the Infrastructure team are doing at the moment, are there any big things coming up? Not from me, I don't think. No.
I started thinking about this series when we were in Vegas (ApacheCon) last year. We had so many people show up at media training, I was thinking, "This is such an interesting group that no one ..." --you're not faceless, you're there, but it's so hard to discern the individual behind the group, right? And the group, like you said, is so small, so I think there is such an amazing story with you guys. And you're so integral to the Foundations actual operation. Because you're so ubiquitous, people don't think about it. And I think that's ... Maybe because I'm non-technical, for me it's this massive lift. It's like shooting a rocket up. It's incredible. So I certainly appreciate you guys. I think it's just amazing what you do, and you deserve a pat on the back and an attaboy and all sorts. I'm here to cheerlead.
You bet. Thank you so much. I really appreciate your time today. And thanks for doing this, and thanks for passing on the word with the team that it's not a painful experience. I'm also trying to encourage other members of the community to consider doing something like this. We do the "Success at Apache" posts, with individuals writing about themselves and their experiences. Apache wouldn't work if it weren't for the people. Right? It's all about the people. This is not from a PR perspective, but after 21 years I am always ... I keep wanting to know the same things about everyone. And I know it's not limited to me. If I'm curious, I'm sure other people are curious.
= = =
Gavin is based in the UK on UTC +0 (currently on CET). His favorite thing to drink during the workday is coffee with milk, no sugar, and consumes about 10 cups per day.
Posted at 03:23AM Nov 17, 2020 by Sally Khudairi in SuccessAtApache | |
Inside Infra: Gavin McDonald --Part I
The "Inside Infra" series with members of the ASF Infrastructure team continues with Gavin McDonald, who shares his experience with Sally Khudairi, ASF VP Marketing & Publicity.
Nice and easy one. Gavin McDonald. Just McDonald as in Big Mac and fries McDonald's. It's M and C, no Mac.
When and how did you get involved with the ASF?
That was back around about 2005. I was looking for something different to do than what I was doing. And I came across the Apache Forrest Project. I knew a little bit about XML and websites and stuff like that. So I started contributing to the Apache Forrest Project. And some months later they made me a Committer.
So you first got involved with the Forrest project, then at some point you became part of infra. How did that evolution happen?
That's me looking around for more things to do. I've always been involved in and interested in system administration work. My first real communications with the Infra team was whilst working on a Forrest Solaris Zone and needed some help with it. Shortly after that I started volunteering there.
First of all, I saw a huge number of tickets regarding mirrors, you know for our software downloads. I'd say it was probably around 150 tickets outstanding for mirrors wanting to join.
... One Hundred and Fifty...
Something like that; some of them had been outstanding for quite a while. At the time there was only one person being paid. There were volunteers obviously looking after the machines and stuff like that. Mirrors were sort of lagging behind as they were less important. So that was my in. I started off with getting karma to add all the mirrors.
There was a certain standard that mirrors have to have, certain configurations. So I was going backwards and forwards with the mirror providers and making sure they were up to scratch, then adding them into our configuration.
From then, I introduced BuildBot to Infrastructure. And I think maybe a year after that, this is now talking 2009, a position opened. I think more or less the rest of the Infrastructure volunteers said, "Gavin is doing the job anyway. Let's give it to him."
That was my interview.
Around October, November 2009 I became paid staff.
Are you the longest serving member of the current Infra team?
Yes. Last year at ApacheCon I got presented with a 10 year t-shirt. Next time there's a physical conference I'll be bringing it along.
10 years thumbs up: that's good! Explain the structure of the Infra team and your role in it.
There are six of us, plus Greg (Stein), our Infra Admin, and David (Nalley), VP Infra. One of them is a documentation guy, that's Andrew (Wetmore). The rest of us all various system administration devops work. We look through tickets, what's needed to be done, and obviously we're looking to improve our infrastructure uptime and software and updates. So we all do what's needed, basically. Everyone has various roles.
What's your role?
Well it's a bit of everything, I think. I have been concentrating quite a lot on the CI/CD side of things. That was written into my original contract, which is now not part of the contract. Basically that means the whole entire time I've been here, I've been involved in BuildBot and Jenkins and other CI/CD stuff, and I've been doing a lot of that lately as well. Migrating Jenkins over to new Cloudbees software, and on a whole load of VMs, mainly in AWS.
You mention that CI/CD is a key part of your role. Is that what you're specifically responsible for within Infra? Are you "the CI guy"? Are there other things you do? Everyone says to me, "Hey we do everything." That sounds amazing, but how is that possible? Do you do everything else in addition to the CI work?
Yeah pretty much. Yeah. Everyone can do pretty much everything that we touch on. Some just choose to do certain things that they're more capable of or more used to working with or they like it better. Nobody is told, "You're working on this."
That's interesting. Fill that part in: if there's six things that need to get done, but five of you are actually hands-on sysadmins, so you guys do what you like to do or what you prefer to do? No one says, "Okay you go handle that mail server"? How does it work?
Obviously there's 24 hours in a day and there's people all around the world. If there's an emergency going on or a mail server breaks down or something needs doing, then whoever's around at the time would step up and say, "Okay I'll take a look at that."
So everyone's pitching in --it's not, "Hey I'm not going to do it. Wait for ‘the mail server guy’."
No, no. Obviously I'm sort of known for the Jenkins and BuildBot stuff, but if I'm not around, everyone else can just jump in and get on with it.
So how did you become the Jenkins guy? How did you get to be the BuildBot guy? You were saying earlier that you kind of evolved into it because it was needed, but is this something that you've personally had interest in to start? Or is it just, "Hey there's a fire here, I need to put it out," and it became this cellular memory, a habit, it's now your thing.
I think a little bit of both. I started off introducing BuildBot not long after I started. Jenkins had already been going a little bit at that time, but I've been involved in that also since the start. And it's over the years become more important to projects. Back when it started, it was a nice-to-have sort of thing. There was none of this pipelines, and CI wasn't an integral part of releases, whereas these days it's more and more a requirement. Jenkins and BuildBot have gone from second-class citizens, if you like, to one of our core services that needs to be kept on top of all the time. It's one of the most important aspects of our infrastructure for projects. There's a great demand for it. And it's increasing all the time.
That's interesting to see it go from a supporting role to the lead demand. That's been what, over 11 years now?
Yeah, 11 years.
In earlier interviews, I spoke to Chris and Drew and Greg and Daniel (Infra team members Chris Thistlethwaite, Drew Foulks, Infra Administrator Greg Stein, and Daniel Gruno) and they've all given me their perspectives on the many areas that infra is involved with. Tell me about the scope of the work that you guys do, and how is it different from other Open Source foundations?
Not sure how I can answer that. I'm not involved in any other Open Source foundations.
Okay, well tell me how does Infra operate at the ASF? You support the Foundation, you support projects. How do you help?
We have, as you know, over 300 projects. Each of those requires a Website, each of those requires an area for their code, whether that be Subversion or Git. We obviously over the last couple of years have been more involved in supporting GitHub for Projects. And we have the Confluence wiki and Jira for the issue trackers. So all of the services that they need to operate as an Apache Project is what we offer them.
So every project needs a Website, as you said. Each Apache Project is responsible for their code, their communications, and their community. So they run their own Website, but Infra handles the backend? What is it that you do for them?
Yes, we handle the backend. We've got Web servers that all the Websites get published on, but they write their Website content, and that could be written in many different languages. So we support them being able to provide their Website content in whatever manner they want. This could be just plain HTML, it could be compiled in Maven or in Pelican, there's a million different things. GitHub pages. So we provide the publishing methods for them to be able to go from there ... most projects these days just want to be able to commit a change and leave it at that. Then that change automatically gets published to the Web via automated mechanisms at the backend, you know? We watch for a commit. That could be via a gitpubsub, could be via Jenkins, via BuildBot, GitHub Actions, all of these methods. We'll see a commit, and it'll publish it and build it if necessary before publishing. So they just commit a change and leave it at that.
So the magic that's associated with that automation, is that something you guys are building to support them? Or is it something that pre-exists? How are you integrating all these different languages or platforms? How is this happening?
Well, software like Jenkins and BuildBot ... those mechanisms we can provide pre-built code to watch their repositories for commits to their Website repository. It'll automatically build it, and then it'll push it to the websites. There's also recently GitHub actions will also, instead of being on Jenkins or BuildBot or Travis or any of those, GitHub actions will take a commit straight out of the GitHub repository. It'll do the building of the Website, then it'll push it to usually what's called an asf-site branch. And then we pick it up from there and publish it. The actual GitHub actions code themselves is written by the projects. So that's self-serve.
If there is a fail for that commit, who fixes it? Is it the Project’s responsibility or is that your responsibility? Who's under the hood dealing with that?
It depends. If it's a coding error, then it's theirs, the Projects. If there's some kind of hardware failure, or if there's a piece of software gone down, communications error, yeah, it's up to us to track that down and find out what happened.
I'm understanding a trend here. If you go to other foundation sites, they seem more “corporate” in the sense that everyone's site looks, feels, and performs the same way, they operate the same way and they tend to be under the same infrastructure altogether, right? They're not using 50 different CMS's.
... That in itself is highly unusual.
Oh okay, yeah. We don't mandate how projects make their Website look, or we don't mandate how they must build it.
That in itself, the autonomy to do what works best for the Project, I think is highly unusual.
Okay. That's good to know.
In terms of ASF Infra and other foundations, you guys don't sit together and compare notes or talk to each other or anything. A lot of groups copy us, so I presume there's little interaction other than socially, right? I didn't know if there was, "Hey, Linux Foundation does that. We should do the same thing," kind of thing. The ASF does its thing and so be it.
As far as I know, we have no interest in what other people are doing in terms of how they do things. We do things how we think it's best to do them for us and our Projects, how it works best for us. Whether other team members go off and have a look at how other foundations are doing things, I don't know. But I don't.
... Uniquely Apache.
In terms of services, what's the difference between what you offer for individual Apache Projects and their communities versus Foundation-level initiatives? I presume there's a difference --is the majority of your work serving the Projects? What's the percentage of work that you do that's for the Foundation versus Projects? Is it all for Projects? Or is it all considered one thing.
I don't really see a difference. All the work that needs doing is for the Project or Foundations as a whole. It's all the same to me.
What about incubating projects? Do they have special needs or requirements? How do you support them?
Not really unless they're coming into the Incubator with something they've always used that we don't do. Then we would look at that and decide whether it's something we can do for them or not. There's been a few projects that come in like maybe OpenOffice in 2011.
That was exactly in my head in terms of pre-existing groups that have pre-existing infrastructure. OpenOffice was a whole community altogether in a completely different way. How did Infra support them? What did you do? I knew that there were some issues with the codebase and licensing. What else did you do to support that project?
Oh that was a while ago.
... That's fine, I was just curious as to what you guys did. I just remember it was a huge lift from everybody, from all sides. Licensing and code and every aspect of that project coming in seemed to me to be very, very, very challenging, but we got through it. So that's great.
I know there was a lot of work bringing the code in, and not just from the licensing perspective, but also it was an enormous amount of code that needed to come in. I don't know whether they were in Subversion beforehand, but we provided them their space in Subversion and their Website space. I think a lot of the work was done by the project themselves.
Wow, wow. That was a lot of work. How do you handle Projects or communities that make unreasonable demands from the team? How do you guys deal with that?
There are some projects ask more of Infra than others. Some we never hear from at all. There's kind of a fine balance. Projects that are fairly new, we probably spend a bit more time with them helping them out, making sure they get all set up. They may ask new things, there may be some initial push backs, then all of a sudden there's another two or three projects interested in the same thing. So then we have to take a serious look and decide whether that's something we need to support ongoing.
We do get each of the team members I'm sure gets private PMs on Slack and emails and stuff like, "Hey, can you help me out with this?" Or whatever. Sometimes you just do it. But we're sort of encouraged to ask them to go through the proper channels via a Jira ticket or email to the appropriate list.
Not to name names, but have any Project's expectations been so unusual or so out of scope that it shocked you guys? Have you had situations where it's just been absolute, where you guys have been floored by it?
There's been maybe one or two projects that have just been incessant in their demands on Infra, as if we were their personal team. But we deal with it as in, "okay, slow down, what do you need? File a ticket." If they keep going on and on and on, then obviously we've got escalation levels. We can say, "Hang on," and we can pass that onto our boss and say these are being a bit unreasonable.
For those "colicky baby" types of projects, I've been hearing more and more about additional services being offered through Self-Serve. Are these guys able to take on Self-Serve and go, "Yeah that works for us and we'll do it." Have they been able to kind of self-satiate their needs, or has it always been "Infra do it for us"? How successful has Self-Serve been in terms of wicking away demand?
It's been hugely successful. You're referring to selfserve.apache.org: we introduced that three-four years ago maybe. It was a way to ... help the projects help themselves so they don't have to wait for Infra, because they know Infra is busy. Sometimes waiting two or three days for something is ... from their side of things they're like, "It's been two or three days. Still hasn't been done." But from our side of things, "it's only been open two or three days, what are they worried about?"
... "You're in the queue, wait."
Yeah: self-serve was introduced as a way for them to help them, and also it helps us, there's an awful lot of tickets now that don't get filed because of that. They can create their own Jira Project. They can create their own Confluence wiki. They can create their own Git repositories.
... On their own completely? Without intervention, without "mother may I?", anything? They just go do it?
Yep. There's an awful lot that they can do on their own. And we introduce more self-serve things all the time that otherwise we'd have hundreds more tickets if they weren't able to do that on their own. They can create their own mailing list now: they don't need us.
Do you have to be a PMC member to do that? Can any Committer can do that? Who gets to administer these types of services for projects?
I believe some of the self serve options are PMC chair, and others are PMC members.
… So not just some person who's like, "Hey I'm committing code, I'm going to go and futz around with the site and break something."
That's good. Controls obviously are necessary. This is terrific: what a huge difference.
We've got hundreds of projects that have successfully incubated and graduated under the Apache banner. How do you guys develop new products and services to help support that innovation? We get all sorts of projects coming into the Foundation. Going back to OpenOffice as an example, we've never had a project like this of that scale, and consumer-facing. There were so many different things about that that was so unique, and yet we said, "Yeah you're part of the Foundation, you're coming in, you're part of the family."
We’ve had to adapt as we grow. Is there a way for you guys in anticipation ... feel like you need to have a different type of runway in order to accommodate new projects coming to the ASF? Or do you deal with it as it comes along?
Infra is not in control of what projects come to the Foundation. We don't have a say in that. When a project comes to the foundation and they have different requirements, then that's when we get to know about it. And we would deal with it appropriately then.
Obviously there's growth and we know that there's going to be more and more projects coming to the ASF all the time. So we anticipate growth as such.
… So you are setting yourselves up to accommodate more growth, not specifically a matter of "we need more Jenkins" or whatever.
Right. I mean whatever it is that we are looking after, we need to know that that particular service is going to be able to connect with growth.
Got it. How many requests do you receive a day? In general in terms of what constitutes "hey we're slammed" versus a regular day of "we've got 40 things in the hopper", that's normal? What's the volume that you are dealing with?
I want to give you a figure as far as Jira is concerned, which is only one aspect of the things that we handle. Not everything is done by Jira tickets. But I'd say on an average month, we probably get between 150 to 200 tickets a month.
I've been on the Infra channel on Slack, and it's constant. It's nonstop.
Explain to me a typical workday. How do you manage between "hey I'm focused on a long-term project, this new request is coming in, Sally's hair is on fire because she needs help with a mailing list" and whatever else is going on? There's just constant demand on you guys. How do you not go crazy? How do you manage this?
We just get used to it, I guess. Obviously each individual handles their own time in their own way. At any one time there could be one, two, or all of us could be on Slack. So as requests come in on Slack, if it's a two minute, five minute job, we might just say, "Okay, all right, I'll sort that out for you now." Or if we feel it's going to be a little bit more in depth then we say, "Okay file a Jira ticket." Then one of us can pick that ticket up and take a look at it.
We do get people pinging individuals on Slack saying, "Can you help me with this?" Or whatever. Which is often negative to them in a way that they're narrowing their scope of help they can receive by targeting a specific individual. That person might be extremely busy for the next four or five hours, day and a half, whatever it is. And there's another four or five people that could help them with that question.
Typical day, obviously you get up, you check your emails, you see what's urgent, are there any fires to fight straightaway. You go on Slack, that stays open all day. As requests come in, you check Slack all day long. That's just one of those things. You check your tickets, your Jira tickets, what needs doing today, what can wait, or if you've got plenty of time then even the ones that can wait get done.
Whatever order you feel is most important. Then yes, everyone's got longer-term projects on. So myself personally, if I can spend a day or two on a long-term project, then get back to doing tickets, it's the way it is. If there's a lot going on in ticket land, then your project gets put on hold. If something breaks down ... The other week we had to move our Jira server because the hardware broke, so on a Sunday things broke down. Quickly fire up a new server and move everything across. Not sure anybody noticed, which is a good thing.
That's always a good thing. Business as usual, no one knows. With all this stuff coming at you and servers breaking down on the weekends, et cetera, how do you keep everything organized?
It depends on the day, I guess. Some days are good, some days are ... some days you can't see your hand in front of your face for things going on. Each day as it comes. There's no plan. I don't plan what I'm doing tomorrow. If there's a long-term project and I think things have slowed down, projects aren't asking for things, tickets are coming in slowly, I think I'll get on with my project tomorrow. Then you wake up tomorrow and something different happens. There's no real plan.
You don't use any special tools to keep your work checklist in order or anything like that other than the Jira?
I tried to use various products over the years. You've got Trello and these other kanban board type things. You actually got to open it up and fill it out, haven't you?
It's so interesting you say that because I think some people find that structured way of working extremely efficient, then it's exactly that solution for them. Spending the time to actually do it is taking away from doing other things ... so I don't know if that works for everyone.
It doesn't work for me. I did start one of these boards, but it doesn't fit in with the job. You've got ... "okay, this has got to be finished in three weeks, this has got to be finished in two days." And it sends you reminders and emails and this and that. I mean there is no time limits on things. We're not a software project. We don't have to release something next week.
… True, you don't have hard delivery dates.
Like you say, time is taken away by filling out these things that are supposed to help you organize. So I just don't do it anymore.
Do you have other challenges with that? Balancing everything and getting everything done?
No, feeling okay. I mean I'm still here.
That t-shirt is evidence, that's true. Since Day One, the ASF has been known for creating their own rules for success. They're like, "We're going to do it our way," right? And Infra --even before there was an official infra-- played an important role. You can't exist without that kind of support. How has --and you've been with the Foundation long enough to see patterns and changes --how has infra changed over the years?
Good question. When I officially came onboard as a contractor, I was the second contractor at the time. And everybody else was a volunteer. There were quite a few volunteers. And they were there a lot. At least a dozen people that were active as infrastructure volunteers, even though they knew that there were two people getting paid to do the same thing, they were still there. Still volunteering.
Over the years, things have gotten a bit more professional, I guess. The service requirements have become more of a professional level. Down time is ... years ago if something was down for a couple of hours, it was like "there were just volunteers that are handling it. They'll get to it when they can". But as more and more paid staff had come onboard, to a grand total of six, a reverse happened with volunteers. They've mostly gone. You've got now maybe two or three volunteers that have stuck around and been around for a while. Because there's paid staff doing it. It's changed as in "who wants to volunteer for something when there's people being paid to do it?"
Was this shift proactive or reactive? Was it a matter of the demand coming from a Project and for us to go, "Well we better change this," or was it a matter of we're feeling like we're having volunteer burnout or whatever and we need to make this a more professionally oriented organization? Do you recall how this shift happened?
It happened gradually over the years. As the Foundation grew, more projects came in, more hardware was required, more services are required, more hands-on time is required. So you increase the staff one by one to handle this. Then I think over time as volunteers start dwindling away, due to the fact that there's people getting paid to do it.
That's one aspect. The Foundation itself has a responsibility to the Projects to ensure that there is solid infrastructure there. So there's got to be a requirement that there's people there all the time to maintain this infrastructure. The Infrastructure team has become more professional over the years. The Projects have become customers, I guess. Volunteers are always welcome; at Infra we still have plenty of areas in which volunteers can help out. And, we don't bite!
Obviously the SLA is related to that shift too. They're becoming customers versus "we're all in it together and everybody figure out how to make it work". I'm sure the expectations also were higher, right? Because now you have a team, what's your excuse for not getting it done?
[END OF PART ONE]
Posted at 09:51PM Nov 02, 2020 by Sally Khudairi in SuccessAtApache | |
Inside Infra: Daniel Gruno --Part II
The "Inside Infra" series with members of the ASF Infrastructure team continues with Part II of the interview with Daniel Gruno, who shares his experience with Sally Khudairi, ASF VP Marketing & Publicity.
Posted at 06:26PM Oct 12, 2020 by Sally Khudairi in SuccessAtApache | |
Inside Infra: Daniel Gruno --Part I
The fourth interview in the "Inside Infra" series with members of the ASF Infrastructure team. Meet Daniel Gruno, who shares his experience with Sally Khudairi, ASF VP Marketing & Publicity.
Posted at 02:47PM Sep 07, 2020 by Sally Khudairi in SuccessAtApache | |
Success at Apache: I Became an Apache Solr Committer in 4,662 Days. Here’s how you can do it faster!
by Eric Pugh
On April 6th, 2020 I was invited to become a committer on the Apache Solr project. My journey to becoming a committer started in earnest 4662 days before that! On July 2nd, 2007, I opened SOLR-284, a ticket for adding content extraction to Solr.
A committer on an open source project under the Apache Foundation umbrella is someone who is trusted to contribute code to the project and to help manage and drive its ongoing development. It’s an honour to have been asked and I was very proud to accept the invitation!
So, you did the math, and you realized that it took me 153 months, or 13 years (rounding up), to become a committer, and you’re wondering “What if I don’t want to wait that long?” So here’s my quick cheat sheet on ways to become a committer on an open source project, illustrated by my own journey:
- Start by learning the culture of the project. How are decisions made? What tools do people use? What do the various acronyms mean? Join the mailing lists and read every commit.
- Start small and work your way in. Some great ways to do this are to:
- Take existing patches and test them. Update them to the latest code base. Document what you’ve learned.
- Take advantage of being new to a project to bring fresh eyes to the documentation. Every time you find yourself scratching your head on how something works, contribute a fix to the docs. It’s a powerful way to immediately contribute. This is the fastest way to get involved and involves the least cognitive load! See SOLR-2232 or this email thread.
- Answer questions on the mailing list! Being able to articulate reasonable responses to questions demonstrates how much you have learned.
- Bug fix, bug fix, bug fix! Pick bugs that have an obvious answer so that the “correct” solution is easy to figure out. If the right approach to solving it is very ambiguous, you probably won’t get much traction. Remember to remind committers to apply your fixes when they have the time! See SOLR-13965 and SOLR-11480 and SOLR-2611 and SOLR-2263.
- Ready to start slinging some code? Don’t go and refactor the core foundations of the project (at least not yet). Instead, be like a pilot fish and latch onto one of the core committers who is being very active in the project.
Embrace their vision, and start picking up tasks related to whatever major chunk of work they are doing. Write some unit tests. See about opportunities for refactoring. Do some manual testing over multiple platforms. Once they see that you’re contributing (and accelerating what they are pushing), then work to get some of your own tickets assigned to you under that vision. I’ve seen this lead directly to committership many times, and if I had followed this route, I might have joined sooner!
Here’s to the next 4,662 days of being active in the Apache Solr project!
Eric Pugh is a member of the ASF and a committer Apache Solr. He co-authored the book Apache Solr Enterprise Search Server. Eric is co-founder and CEO of OpenSource Connections, where he helps OSC clients, especially those in the ecommerce space, build their own search teams by leading projects and by acting as a trusted advisor. He also stewards Quepid, a platform for assessing and improving your search relevance.
[this post first appeared at https://opensourceconnections.com/blog/2020/07/10/i-became-a-solr-committer-in-4662-days-heres-how-you-can-do-it-faster/ ]
= = =
"Success at Apache" is a monthly blog series that focuses on the processes behind why the ASF "just works" https://blogs.apache.org/foundation/category/SuccessAtApache
Posted at 12:21PM Aug 03, 2020 by Sally Khudairi in SuccessAtApache | |
Inside Infra: Greg Stein --Part III
We were talking about ensuring that the team is up to speed with everything required of them...
So there certainly are skill gaps; this is one of the things I want to help motivate the team with, where if somebody says, "Hey, I want to go and investigate Ansible as a potential Puppet replacement," I say, "Go forward."
This would be similar to Google having their 20% projects. I'm sure you've heard of that.
It's almost the same where it's not 20%, maybe 5%, but it's the same as Google, no matter what they want to tell you, because everybody's got their job and you have to be really rigorous to carve out 20% of your time. And strictly speaking, it does actually make your Google manager a little upset if you carve out the entire 20%. But anyways, the concept is similar.
So for us it’s like, "Well, go in and investigate Ansible, see if it'll work for us and put your notes into the Wiki." That's how we make forward progress, up our game, and learn new skills. If someone says, "I want to go and figure this out," the response is almost always, "Okay. You go do it." There's certainly an allowance for people to learn new skills. But most of the time we simply rely on, say, Gavin (ASF Infrastructure team member Gavin McDonald), knowing more about JIRA configuration than the other guys.
That added component of sharing what you know, and adding it to the JIRA or to the Wiki actually is great because then everyone's learning. This is like the rising tide: everybody's learning about this, whether they're doing it perfectly or not. I think this is a very interesting process.
Yes, and that's also where Andrew (technical writer Andrew Wetmore) is helping us out. He’s organizing that information that we have learned, that we have documented, that we memorialized into the Wiki.
Because our (ASF’s) legacy is quite Medusa-like over all these years, it's interesting to see how everyone can get caught up and also contribute...you have to go back and deal with the legacy, but you also have to be able to move forward. To be able to bring others with you is brilliant. That's really cool.
The infrastructure has grown organically over 25 years from when Brian Behlendorf first said, "Hey, I have this server called hyperreal.org: you can run a CVS repository on it for the Web server."
That computer was under his desk at the Wired offices way back when, wasn’t it...
Yes it was. And it's just grown organically over those 25 years. Then we had Minotaur and it did six different things ... now it only does half of one and we've moved the stuff out onto newer machines and newer processes and this and that. But the organic growth means that we've got some really hairy stuff. Our move to Puppet --first Puppet 3, and now to Puppet 6-- at each step we're improving it and making it less hairy and more manageable and something that somebody can come along, look at, pick up and run with it from there. That makes it a lot easier, so that we don't have to spend 100% of our time cross training.
What are your thoughts on products, the hype cycle, where everyone's demanding Kubernetes, to use that as an example. Do you decide which products to provide support for, or is that up to Apache projects in the communities? You mentioned Ansible, just not too long ago, that was your internal decision to move. But I remember not long ago, GitHub entered into the landscape. How did that happen? How did you decide to make a move like that? That's a significant thing. Can you tell me a little bit about that?
It's a lot based on community input. So if we see a lot of people asking for a particular tool, we'll like, "Oh, hey, David, can you go and take a look at that and see if that's something…” Not David (ASF VP Infrastructure David Nalley), but Chris (Infrastructure team member Chris Lambertus) or somebody else. "Can you go take a look. Is that something that we can support? Because we're getting some queries about it."
And there's a little chicken and egg problem there that if the communities don't know to ask for the egg, we don't know whether to prep the chicken. It's like, “okay, wait, they don't even know to ask for a tool because we haven't said we will make this tool available, because we're not going to make the tool available until somebody asks”. But sometimes people file tickets like, "Can I get this set up?" and we'll go, "No."
Then six months later, somebody else will file a ticket: "Can I get this set up?" and we'll go," No." But after enough of those, we're like, "Maybe that's something that we really want to do." For GitHub, specifically that’s what happened there. Well, even before that Git, where we ran our own Git server, that was a volunteer that made that happen. That was, six years ago or so.
Well...the volunteer came along and said, "Well, I'll do this. I'm not going to take any time from Infra." There's been a couple things for the past few years where I've told people, "No, Infra will not work on that. But if you want to volunteer or find a volunteer, then we'll stand it up for testing." You know what I mean? Why not? So there's a couple things where people have stood up for test examples and there hasn't really been a lot of usage.
So, we're not going to support that. But something like Ansible is our own internal workflow and the tool we’ll experiment with, then to see if it'll improve our stuff. But from the community, they pretty much have to ask and it has to be a sustained ask. That's how we ended up with Travis CI: we actually pay for capacity in Travis CI, and that's based on community input.
So many people wanted to do their continuous integration through Travis that eventually we decided to pay for it. But it's tricky because some of these systems like Travis CI and others require certain permissions that we don't want to provide to the community. So we will want to hold those only within Infra. And so it gets hard to integrate certain tools. We've had to say no, but then again, we've found other ways to improve that so that we can lock down the permissions or use a proxy or other ways that we can route around some of these issues and then integrate the requested tool.
So further to that, have you been in a situation where a project or a community has made unreasonable demands of Infra or have expectations, where it's like, so over the top or so out of scope, it totally surprised you? Have you had something like this?
Nothing surprises me.
Nothing surprises you? Okay. Have you been in this situation? Like “was never going to happen”...
Yes, yes. There's been several times where one of the guys on the team is like, "Oh man, I got this ticket. I don't know what we want to do with this. Greg, go take a look." And I go and look at it and that's where I make that call: "Okay, is the Infra team going to take this on, or do I just say ‘no’ right now?"
So, yeah, there's been a number of times where I've said no and probably two or three times where I've gotten a little bit of pushback on that no. I say, "My answer is no, but here's how you escalate." I've had escalation a few times and I'm actually, mid-process --I'm dealing with one right now. So, I've said, "no, if you don't like my no, you can go to VP Infra and VP Infra is, probably going to tell you the same thing. And then you can go to the President. Right now those are actually the same person."
The same person is a double "no".
That really is the true escalation path. I have to describe that to people and say, "I don't think you're going to get what you want." If I'm the one that says, no, you probably are not going to get it because VP Infra and President, and after that is the Board. They're probably not going to say, "Greg is wrong. Yes, we'll give that to you." But it's there. There's been a couple of times where I said "No, you have to ask the Board for the budget for those additional virtual machines." They went to the board and said, "Can we have budget for three machines?" and the Board said, "Yes."
So Infra went ahead and gave them the three VMs that they had initially requested. Strictly speaking, we would track those machines against their budget, but that detail is more than what the actual budget was. So we don't spend that time doing that, but I have had to say, no. I have had to... There was Apache Maven: they were keeping a copy of Maven Central, and Maven Central is run by Sonatype...
Which is a commercial product...
Yes. They're using the trademark “Maven”, essentially a licensing agreement from us, a MOU. So with Maven Central, you could imagine if someone decides to just turn it off one day ...we wanted a copy. Apache Maven was making a copy of it, and it just started consuming so much disk space. We were like, "We can't support that growth rate. We can't support that even for the next six months. If you want to keep doing it, go ask the Board for money to keep doing it." They never did. We turned it off.
I wouldn't call that a ridiculous request --it was something where we didn't have to just say, "No, not going to do it. Bye." A lot of the requests are mostly just, "We aren't going to run that extra software. If you want: ask for a VM and you can run it, but we're not going to take responsibility for it."
Over the years, obviously ASF Infra has changed. Was this all reactive or was it also proactive? Do you plan for those changes as you go or has it all been in response to Project X or in response to X emergency?
The growth of Infrastructure and its movement from volunteer-only to paid staff was part of just the growth of Apache. The volunteers could no longer keep up and things, like account creation, used to take sometimes four weeks to get an account. You’d put in a request for an account, four weeks later, it would finally get created.
My gosh, that queue was crazy, huh?
Well, it wasn't even a long queue, it was simply that we didn't have volunteers making sure the queue stayed empty. Today it's down to one, two, maybe three days, and the account is created, because every day a staff member goes and creates the accounts first thing in the morning.
It was how I said that my day starts with looking at messages on Slack and then reading emails to see if there's stuff to handle. Well, one of the guys on staff, first thing he does in the morning is go and look at account creation. So he's been off and on pondering on a tool to make that easier for himself; he hasn't finished the tool, so he still has to do it manually. That's his incentive.
This is Chris Thistlethwaite. I say, "Chris, we can do something about that." And he says, "No, no, this is still my project. And every day when I run the script, it just makes me remember, I need to finish this."
So when the volunteers could not keep up with the amount of work, that's when we hired Joe Schaefer, then we hired another person, and hired another person. And so it was just trying to keep up with the rate of requests.
That's how we ended up with hiring six people. And then I'm half a person, like I said, I'm part-time. So, it's just the growth of Apache. I think we're in much better shape than when I started. We're ahead of the curve. We can stay ahead of the curve because one of the things that I can do because I don't fight the fires every day ... that's for all the guys who know their stuff. They fight the fires and I can look at if I need to go and ask for another head count. And that's how we ended up with Andrew (technical writer Andrew Wetmore): “Well, you know, what we really need is somebody to manage all this documentation.” This was part of Sam's (former ASF President Sam Ruby), “If you had some money, what would you do with it?” That's how the technical writer/editor came around, because we've got 20 years of organic growth. We had...let's just call it “organic documentation”. That revamping project is going really well, I think.
So, in what areas are you guys experiencing your biggest growth? As I was asking Chris and Drew, is there like a geographic influence on the demand? We’ve had a huge influx of users in China. Does any of that change the way or what you guys are doing? Or is it just more of everything?
Our biggest pain point, I would say, is continuous integration/continuous development: CI/CD. Jenkins, Travis, CircleCI, and things like this, where people make a change and they want that change built and tested. The more projects we get and the larger the communities get, the more changes and the more testing and the more building and the more this, more, more, more. It's kind of one of those things where it's “expand-to-fit”. So if we gave people 100 machines, they'd use 100 machines. If we doubled it to 200, they'd use all 200. It's just this rapacious need for CI machines. It's very hard to figure out how to plan around that other than just telling the communities, “No: we just don't have that much capacity: if you want to build it, do it on your own machine. You just can't use Apache hardware to do it.”
That's an unsatisfactory answer. That's been one of our hard problems and it's also kind of a newer problem: the development workflow that uses CI probably is just maybe five years old. Before that, certainly, automated building and testing was a thing, but it's really kind of grown into community workflow much, much more over the past five years, and more and more people are wanting to do it. The communities are growing. Apache is growing: we're just seeing the demand explode and it's a hard problem for us to solve.
China is the one case where we see regional issues, and that's because of the great firewall of China. Not because we're getting more Chinese developers, but because they have problems accessing our servers because they're located outside of China, and so we're looking at CDNs, a content distribution network to essentially make our content available closer to China. We've found that even with one of those CDN drop points in Hong Kong, they still have problems just reaching it there in Hong Kong, and so ... and we don't want to buy or lease or rent a server in China because doing business in China is too high of a hurdle for the Foundation.
You know, Microsoft and Google have to do business in China and they've got a pack of lawyers and a giant vault of money to deal with all the barriers. The Foundation does not, so it's also a hard problem to solve. We think we might be able to do it through Microsoft Azure, that they have a CDN that resides in China that Microsoft has done all that paperwork, so we're looking at that, but as far as regional things, it's not so much that we run into issues. We see Open Source communities in Europe and Brazil and Australia and Sri Lanka: none of them really have any problems because they don't have that firewall. It's not really about the Chinese people, but about the China firewall.
That's bigger than us. And that’s not something we can fire hose.
We do see little engagement from Japan and Brazil, and that is partly for language reasons and partly because the Brazil community is more about Free Software than Open Source software.
Yeah. They're very pro-FOSS.
Not OSS. But pro-free. And so, they're going to deal with the Free Software Foundation rather than the Apache Software Foundation.
I see. That’s an important distinction.
And then you also have the Portuguese language barrier. People contributing from Europe and India, Sri Lanka, etc., they pretty much know English and that's fine. A lot of the Brazilian developers do not know English...this is the same with the Japanese Open Source developers. Japanese and Brazilian, they tend to not know English, and so that kind of isolates them from the larger Open Source world, or Free Software world, in the case of Brazil.
Would we consider localizing anything that we do, or are we going to continue as-is, as the ASF is all English?
The Infrastructure team will not translate our documents to serve those other languages. That's just too high of a bar.
There are a couple groups that have user mailing lists that are not English and that's totally fine, and Infrastructure will... well, you don't have to file a ticket anymore. It's, again, back to selfserve.apache.org: “self-serve” on Apache will create a mailing list for users communicating in Brazilian Portuguese, for example, or communicating in Japanese. But Infra doesn't do anything about that, that's just the self-serve tools. We certainly can't support non-English, and I don't think that the Foundation itself is going to make any moves towards that.
Fair enough. So a lot of companies are really struggling to accommodate their teams working from home in response to the Coronavirus and all that. These stay-at-home orders are kind of shaking companies, but from day one, the ASF has always been a virtual organization. Has anything changed with your operation on that front? Has anything impacted the ASF's day-to-day, from this pandemic?
(chuckling) Not at all. I shouldn't laugh, but no. It really hasn't changed. We've been on our team channel for all three years, three and a half years that I've been here, and the world is burning down around us, but we still sit on the team channel.
Now, that said, (Infra team member) Daniel Gruno got stranded in Canada.
Right! He’s still there?
He's still doing work from Canada. This is why when he travels to Canada for two months at a time, I don't care, you know? Because if his butt is in a chair in Denmark or in a chair in Canada, it's the same butt, so, you know...
As long as you have connectivity and a computer, you can do it.
Right. But if he has to be offline for two months, I'd say no. Or if you want unpaid time off, well, I'm not going to pay you, of course. Certainly the discussions have changed, you know? I mean, going shopping. You know, some members are immuno-compromised and that had an effect on our team meeting that we were planning in Nashville: they were the first to say, “No way. I'm not going,” so, there’s that, but our day to day hasn't changed.
That's more of a social thing versus an operational thing. Safety first.
So the notion of, “Oh, I got to run out to the grocery store. I need to strap on a mask,” changes, but not the operation.
Right. Right. So...what do you think people would be surprised to know about ASF Infra?
I don't know if it'd be surprising, but we are global. We've got four people in the United States, one in Canada, one in Denmark, one used to be in Australia, but is now in the UK, which actually kind of hurt a little bit, because in Australia, that meant that we always had somebody in that time zone, but now we have kind of this gap of Australia/Asia time zones when...
A “Gavin” gap.
Yeah, well, I might be awake at that time, but I can't go and fix a MySQL server, so it does mean that we don't have that straight-up 24-hour coverage.
The notion that we are worldwide is kind of a neat thing about our team, and is what makes us pretty unique relative to other IT departments. I don't like being called an IT department, but that is essentially what we are.
What's the name of that TV show? The one that's about IT...
“The IT Crowd”, is that what you’re referring to? The British show?
Yeah. So, you know, that's a funny show, but mostly when you think “IT department”, you think of some corporate people with button-up shirts, but ...most of us, we're in our pajamas.
Good one. What's your favorite part of the job?
I definitely like the team and that's why, nominally I'm part-time, but I'm pretty much constantly on the team channel and interacting, and so I think I just put that down as volunteer hours, where before I might work on Apache Subversion, but now I hang out with the team or I write some little tool or something like that. That's definitely been one of the more rewarding changes. Up until I started with this, I'd been a director for 15-and-a-half years, and that was kind of how I contributed to Apache. Now my work for Infrastructure is a new way to contribute to the Foundation. I'm also part of a new community, where before I would hang out with the httpd community, APR community, the Subversion people ...now it's the Infra people and my hobby time is kind of blended in with my work time, and vice versa. I mean, when your work time can also be seen as a hobby time, that's pretty cool.
I do think it's the team that makes it interesting. That's what I like the most, and that I'm working with a new, interesting community to contribute to the Foundation.
Not only did you switch roles, you switched communities. What was your biggest challenge going into this new role?
I would say probably trying to delineate what I was going to handle for the guys and that I wasn't going to tell them what to do or how to do it. It's like, “OK, I'm here to assist, to unblock things, to enable you guys, rather than to block you or micromanage you.”
To earn that trust, that I wasn't going to be some pointy-haired boss telling them how to do their work. Now, I don't know if that was ever a problem for them, but that was certainly one of my initial concerns: how to properly create my role. This was the first time Apache's even had somebody fill in this role, so I also had to find the role, which is, again, why I came up with “Infrastructure Administrator”, is because I wanted to define it as an enabler role, as an administrator, so they could get their work done but I would not be their manager. I would not be their boss: I was simply there to enable them.
So, what are you most proud of in your infra career to date?
Ooh. I don't know. I would say by being hands-on, being the “hands” of Infra, it means that VP Infra didn't run away screaming.
David said in January 2016, maybe earlier, he was like, “No way. I'm out.” And after I was on the job for about two months, he said, “Huh. All right.”
And so I get that feedback from him, “You know, you make the VP Infra hat quite easy for me.” I think that's probably what I really like about taking on the role, is that one of our volunteers got to stay rather than drop it because it was just causing so much anxiety and pain and time and frustration. Otherwise, most of the stuff I do is really boring. Not to me, but I don't have “accomplishments”. I push paperwork, basically, so the other guys can do accomplishments.
Speaking of the other guys, how would your co-workers describe you?
I have no idea. I don't know. I really don't know. (laughing)
Where I just got done talking about what I saw as an issue, trying to frame what my role would be, it might have been fine with them and I was overly worried about it, but it’s hard for me to know. We don't do 360 reviews in Infra, so I don't get any feedback, really, from the team on what they think about myself or how I'm doing my job, so you'd have to ask them.
I have. Just kidding. So...what are the biggest “threats” that infrastructure managers or infrastructure administrators need to watch out for? What do you think is a “big thing” that people should be aware of, or is ASF so unique that you don’t feel like anyone really experiences what you experience?
There's our capacity issue with things like Travis, but I think you're asking a different question.
I am, but that's fine. What's your greatest piece of advice? What would you tell aspiring infra administrators?
Actually, one of my greatest fears is really, as a small charitable foundation, it's hard for us to compete with well-funded corporations and some well-funded start-ups.
Related to that, I touched on it earlier, is career development ...you go into Google or Microsoft and there's a career ladder; we simply don't have a career ladder. There's salary growth. There's bonuses. If you want to have a resume or a LinkedIn profile that shows changes in growth and titles and career ladder, we can't offer that, and that's going to cut out some people. It's a very hard problem for me to solve. You know, there's things I can maybe do, but I also want to keep the team egalitarian and sort of level, rather than, “Oh, well, this guy is now the team lead.”
Given what I talked about, our social aspects, because we are all equal peers, keeping everybody with the same title, same position on the ladder means that we are peers and it's a little easier to interact that way. It's a real, real difficult problem. You ask what's scary: that's scary.
But there's a counterpoint to that. You may not have a traditional career ladder path, but to say that you've worked in Infra for Apache carries weight. That's significant.
I believe it does, especially when you can demonstrate the hundred different types of tasks...
Well, that's exactly it. The breadth of work and the scale of what you guys do and the skill sets that you have to have and the fact that you have to play nice in the sandbox, all of it. The demand is immense, so to be able to be there and thrive and develop something from yourself in terms of a career is tremendous. Our team is exceptional. I mean, they're not expecting a linear ladder or something that others have.
You know, in other jobs, somebody might say, “I was a MySQL administrator.” Here, you're a MySQL administrator, PostgreSQL administrator… They had one role; here you've got dozens.
If you had a magic wand, what would you see happen with ASF infra?
I'd like to solve that CI problem. The other magic wand would be upgrading our mail server from 10-year-old technology to modern technology.
Is that happening or is that literally a wish list issue?
It's happening, but it's been happening for three years. The thing is that email is so central to the Foundation that we can't really experiment with that. There are certain things we can do, but most of it, not so much, and so it means that we're being super-careful. There's about 10-12 different moving parts to it, and we're upgrading each of those a little bit by a little bit, until we can finally pull that big, scary, Young Frankenstein lever to hit the lightning bolt, you know?
Yeah: I see the visual of that.
The magic wand would be to just make that all happen and make it work. Without the wand, it's going to take another 6-12 months.
Right. What else do we need to know that I haven't asked? What should I be aware of or what should I be sharing?
Oh, I don't know. This is where my creativity ends. Ask me a coding question.
Oh no coding questions. All right. Our time has also ended. Before we go, who should I be interviewing next?
I would say Daniel (Gruno), because his role ... he's 20-30% system administration. The rest is tool development, so that makes his role rather unique in the team.
Perfect. Thanks so much, Greg. I really appreciate it.
= = =
Greg is based in Austin on UTC -5. His favorite thing to drink during the workday is a big 32oz cup of Diet Mountain Dew.
Posted at 02:42PM Jul 17, 2020 by Sally Khudairi in SuccessAtApache | |
Inside Infra: Greg Stein --Part II
The "Inside Infra" interview continues with ASF Infrastructure Administrator Greg Stein, who shares his experience with Sally Khudairi, ASF VP Marketing & Publicity.
How or what would you describe the Infra "brand" to be?
I don't really know. I've never really thought about branding or marketing ourselves, so ...
Well, you guys have a certain persona, you have those funky t-shirts you wear at ApacheCon ...there's definitely some kind of street cred that's different from everybody else. I was curious to see if that's part of your natural sense of hip, or is that something that you guys deliberately planned for.
The t-shirts and other things go back to the team bonding kind of thing. We'll give ourselves an identity, but haven't tried to create or market ourselves. I think it is something that we do need to take some control over. We hired a part-time writer in December and he's been organizing our content to provide a better and more useful front to Infrastructure.
There were a lot of pages on www.apache.org that have now moved over to infra.apache.org. That creates a more coherent Web space, if you will. We can really talk about those different channels. "How do you reach Infrastructure? Do I go to the Slack channel or do I file a JIRA ticket: how do I decide?" So he's helping to, while I wouldn't say "market a new face", he's certainly helping people figure out who we are, what we do, what we can help with and getting that information organized.
Which is good. That's new. Even to have you guys featured in a project like this, it's unusual and it's refreshing. I'm personally curious, and I'm sure other people are also curious about what's behind Infra.
Right, right. Who are these crazy guys spread around the world that are keeping 200 machines up and running for all these different projects and committers and contributors?
So Andrew (technical writer Andrew Wetmore) is primarily going to work on the infrastructure docs until those are whipped into shape because a lot of the material that we have, a lot of the Webpages, is really infrastructure related. He has been working with the team on those pages. What's going to be harder though is when he's kind of at a stopping point for that, what to turn his focus to, and that would be www.apache. But then it gets a lot more difficult because when he wants to update the How It Works page, who does he talk to? Who's authoritative? He can do some edits for flow and word consistency, punctuation, clarity, right, but he can't really update the process.
Right. Right. That's the Foundation thing.
Yeah. But the problem is we don't really even have a concept of who's in charge of that How It Works page, who is, you know, it's just there's nobody that the foundation is willing to say, "That person controls that process." You know what I mean?
I totally do --I come across the same pages and people go, "Are they yours?" It's hard to determine not only evolving processes, but who signs off on this or who gets it. I hear you.
I've recommended for the past year, or three, that Marketing is the owner of DubDubDub (www.), but you know, that's the "face" of Apache. You know? But the raw content, as you point out, who approves the raw content.
I wake up, I look for email first, generally, sometimes I'll hop onto Slack because sometimes people ask me directly for something. Then I go look at email and sort through a number of different categories between direct team stuff, operations, the Apache Board, and then Apache in general. And then of course, if there's any vendor email to deal with. So there's a bunch of different categories in priority order. After I get through that initial work, then it's go and read all the back scroll in the team channel, which is anywhere from 200 to 400 lines of back scroll ...
Can you get any work done? Beyond just catching up on the communications?
Yes. But it does take like 30 minutes to read that back scroll. For me there's a lot in there about what the guys are doing and what they're working on, how to solve a particular problem when they're asking somebody else, "Hey, can you look at this? Can you help me with this?" But I don't, for the most part, "serve", you know ...they are the technical staff... I can do it: I have technical chops, but I let them do their jobs as they know best. I do like reading the back scroll because I'm also looking at it from the angle of "how is the team working together? Is that going well? Is there something that I need to poke and prod to improve how they're working? Are they getting jammed up on something that I can unblock for them so that they can get their work done?"
Stuff like that. That's what I look for when I go through that back scrolling, so it's important to me to read that back scroll. Most of the guys do tend to, when they first sign in in the morning, go back and scan for stuff where they might be needed. I've never really asked them how detailed they get, but I think pretty much everybody reads all of it to catch up, but they're going to be looking at it with a different lens than how I look at it. Mostly I'm looking at unblocking --are they running into problems that I can ease for them?
How do you keep your workload organized?
Fair enough. Again, there's a lot, so it's curious to me, like everything at Apache, with the exception of a handful of things, everything could be a priority, if you're always on fire and always running around, putting out fires, you know? It's funny when I've talked to the Infra guys and you also, you all have the same reaction to that question, which is the laugh. I think that's the nature of the beast with the ASF.
Yes. That really is the nature of system administration work. My career has been product development, and you can reasonably plot that out. You can say, "We're going to develop these five new features, which is going to take us between two and four months." We'll see...we might cut a feature to try and limit our time development. The feature is going to change, unless we'll plan in time for change. But system administration is very reactive, so it's a very different beast. This is where, like I said, we were kind of treading water with four people, but we could see as Apache was growing we were not going to be able to keep up. And we certainly weren't going to be able to move ahead of the curve and do things like selfserve.apache.org where, you know, before we would get a dozen tickets to create repositories and that took time. Now we don't have to do anything.
It's all selfserve.apache.org, but we had to write the tool first and have enough air time to get that tool written. So I think we're ahead of the curve. We're getting some of our longer-term initiatives done, but it is still a very reactive thing. For myself, my back office work is pretty straightforward and it's a lot of email and Website work, you know, going in, paying an invoice, putting in the infrastructure credit card, sending out a purchase order, stuff like verifying and improving payroll, that doesn't require me sitting down and writing Python scripts.
The other half of my job is being present on that channel because I also help to set priorities. When something comes up, I ask, "Is this a thing that we want to do? Do we want to take on this new task? Do we want to provide this new tool to the projects?" You know, like a project is going to say, "Well, we want to integrate this thing into our GitHub repository," and we go and review it. It may require permissions that we simply don't want to allow. So there's some of those kinds of policy kind of things that I also help with. And there's always being present to help set policies and priorities.
OK... so how do you work with (VP Infrastructure) David Nalley? Are you making the decisions? Infra is an unusual type of group as opposed to other areas of activity operationally at the ASF. How do you work together?
Correct: I'm the day-to-day, so I look at it like he's the brains and I'm the hands. That said, he's like the strategic brain and I do all the tactical decisions.
I make all the tactical decisions. I am an officer of the corporation. I can make any decision that I need to, related to Infrastructure. If I feel it's a little bit weird, then I'll bounce that off David, but for most of the stuff, he doesn't feel a need to inject himself in. He feels comfortable letting me go ahead and run with the things, and rely upon me asking when it seems a little sketchy.
That's good: that process suits both of your personalities, both your sensibilities. It sounds like a good fit.
I report to the VP of Infrastructure, and that is still David, even though he became Executive VP and is now (ASF) President. He still holds that title. He's asked me, "Well, Greg, maybe you should just be VP Infra," and I said, "No way." Because we're paid people, but the Foundation is all volunteers. I told him I do not want to be a VP, because I want to report to a volunteer. I think that I (and the team) should report to a volunteer that always has a volunteer eye on the Foundation's long-term goals.
Because I manage all the day-to-day, it's a very lightweight hat for him. That VP hat is a tiny aspect compared to his President hat. One day, he'll find somebody to take over that VP Infra hat, but I've essentially mandated to him that it has to be a volunteer position.
It's not that I see we're going to go all out of control and we need a check from a volunteer; I just want a volunteer to always be able to say, "Okay, you guys are a little bit crazy, let's redirect our long-term thinking more in line with what the Foundation wants," and have a volunteer interpret what the Foundation wants.
That perfectly dovetails into what folks referred to in our ("Trillions and Trillions Served") documentary, where they were talking about Greg Stein's famous "plan for the ASF for 50 years..." This super long-term vision, which again, everyone goes back and says, "Greg Stein said..." What does that mean exactly, and how does that translate to Infra, considering that you can't really plan that far out? How does that work?
Well, actually we can plan that far out. I wrote that "50 years" in one of my Director's statements, I think it was 2014 or 2012 ...maybe earlier. Where I was going in that Director statement was the Board doesn't deal with the communities. The Board is there to support the communities. So we want the Foundation to exist for 50 years so that these communities can continue to run and see through evolution.
Some communities are going to move to the Attic, new ones are going to come along, but we want the Foundation to be viable. To say "forever" is okay. Nobody can really put that in their brain. So I just said, "OK, we can think what 50 years means." That is long enough out, but still within people's brain capacity to think, "Okay, what _does_ 50 years mean?"
And so that's where I came up with that. What does the Board need to think about to ensure that we are here 50 years from now and our projects are successful and can run through their lifetime, lifecycles. Apache HTTP and Tomcat, I don't think they are ever going to go away, but you could see maybe in 30 years they might. There might be some other mechanism in computing that would obsolete them, but the model of Apache does need to exist for at least that long.
Now, within Infra, I think we actually can plan that far out because we have growth curves. We see what kinds of computing resources people need. So we can plan for project growth, for machine growth. We can do long-term planning on how we allocate machines among our various cloud resources that we have, and start to schedule those further out. None of that really affects our day to day, but it is something that we can project out a ways and think about what kinds of resources we are going to need two, three, five years from now.
There isn't anything really that we can do for 50 years, but we can keep it in mind. Okay, that is going to be a larger team. That is going to need a larger staff, a full time manager, a full time HR person, a full time... There's different things that will change over that time, but we can actually do some of that projection, although we haven't bothered.
I do the five year plans for the Board, but mostly that is a simple cost growth as opposed to actually changing the structure of the team or the role assignments, because like I said, I think probably within 10 years, we'll probably need to add one or two more staff on top of the head count of six that we have right now. And I think supporting that would still be fine for a part-time person like myself. But once it grows to 10 or 12, then I think it's going to need a real change. Where we need to have a full-time person managing and so, we'll need to adjust the budget considerably to make that happen.
But if we ever get there, the Foundation is going to be likely in a very different position. We're talking 10 years from now. And so, who knows.
So with more than 350 projects and initiatives as we've discussed before, how do you guys stay ahead of the demand? And again, if you're trying to plan for five, ten years out, you mentioned earlier cloud computing. Not so long ago, cloud computing was a novelty. How do you plan for this?
And that is where we try and move more things to selfserve.apache.org, where we look at the kinds of requests that we're getting. The kinds of tasks that we’re performing and find a way to automate that workflow and create more self-serve options for the kinds of tasks that we regularly get tickets on.
Where we used to get tickets on creating Git repositories, we get zero now and, and we can see over the past six months, we've had 20 tickets to do X, is there a way that we can automate that, so we don't have to get our hands on that ourselves and save our hands for doing things like machine upgrades, for rebalancing some of our computer resources, where things are running on an old operating system and we need to get that onto a newer version. Right now, all of our machines are managed by a system called Puppet, which does the basic configuration work for us. But today, we're on two different versions of Puppet, a really old one and a reasonably new one.
And we're trying to get everything migrated off the old stuff onto the new but once we finished that migration, we're going to have to start all over again, or maybe switch to a different tool. We're looking at a tool called Ansible to use instead of Puppet.
And so there's this never-ending ongoing set of tasks, but each time we do it, it reduces our workload by that much more. So when we upgrade from Puppet 3 to Puppet 6, we get an improvement in the maintainability of that server. And that means that we spend less time with that server going forward and have more time to do other things or to deal with project growth.
Regarding a scale of efficiency, how do you close your skills gaps? When I spoke to Chris and Drew, they both said, "We do everything." How do you do that? How do you know all of this? Do you look at this big picture and say, "Okay, we need a person to specialize in X and Y and Z," and then you send them out to learn about it? How do you cope with that?
The team definitely specializes. And the guys have specializations around different areas, but we do a little bit of cross training, but not a lot because as I mentioned, we've got like 200 machines, each individually doing their own thing. If we cross trained everybody in everything, we'd get nothing done. So, there's a little bit of cross training, but mostly some specialties. It does create a little bit of bus factor...
Which is very scary. I was just going to say, your bus factor is very scary. Talk about that.
The thing is that Puppet allows us to create configurations and that's in version control. If all of a sudden somebody leaves, another person can backfill them because if somebody leaves, it's not like they take their work with them: all the work is in version control. And so that work doesn't go with them, but we may need to backfill some education on that particular specialized area. For example, Chris (ASF Infra team member Chris Thistlethwaite) does a lot of our monitoring work. If he left, now we need somebody to get a little more familiar with NodePing and a little more familiar with Datadog, but that'll be like a week for somebody to pick that up.
It wouldn't be, "Oh my God, this is three years of expertise that we need to go backfill" ...we don't have anything that is that highly specialized.
Is that because the team is more well rounded or because you guys are more efficient or what about it? Because of technology evolution, or...
We don't deal with systems of that level of complexity. We've got 200 machines, like I said, each doing their thing, but it's not like we've got a cluster of 200 machines all trying to coordinate to create one particular outcome. It's, here's my SQL server, here's a JIRA server, here's a Puppet server. Things like that, where the amount of technology is pretty small in each little pocket ... but we just have a hundred pockets on our pants.
[END OF PART II]
Posted at 02:53PM Jun 29, 2020 by Sally Khudairi in SuccessAtApache | |