The Apache Software Foundation Blog

Monday November 02, 2020

Inside Infra: Gavin McDonald --Part I

The "Inside Infra" series with members of the ASF Infrastructure team continues with Gavin McDonald, who shares his experience with Sally Khudairi, ASF VP Marketing & Publicity.



"...The Foundation itself has a responsibility to the Projects to ensure that there is solid infrastructure there. So there's got to be a requirement that there's people there all the time to maintain this infrastructure. The Infrastructure team has become more professional over the years. The Projects have become customers, I guess. Volunteers are always welcome; at Infra we still have plenty of areas in which volunteers can help out."

All right, let's get started. What is your name and how is it pronounced?

Nice and easy one. Gavin McDonald. Just McDonald as in Big Mac and fries McDonald's. It's M and C, no Mac.

When and how did you get involved with the ASF?

That was back around about 2005. I was looking for something different to do than what I was doing. And I came across the Apache Forrest Project. I knew a little bit about XML and websites and stuff like that. So I started contributing to the Apache Forrest Project. And some months later they made me a Committer.

So you first got involved with the Forrest project, then at some point you became part of infra. How did that evolution happen?

That's me looking around for more things to do. I've always been involved in and interested in system administration work. My first real communications with the Infra team was whilst working on a Forrest Solaris Zone and needed some help with it. Shortly after that I started volunteering there. 

First of all, I saw a huge number of tickets regarding mirrors, you know for our software downloads. I'd say it was probably around 150 tickets outstanding for mirrors wanting to join.

... What?!

Yeah.

... One Hundred and Fifty...

Something like that; some of them had been outstanding for quite a while. At the time there was only one person being paid. There were volunteers obviously looking after the machines and stuff like that. Mirrors were sort of lagging behind as they were less important. So that was my in. I started off with getting karma to add all the mirrors.

There was a certain standard that mirrors have to have, certain configurations. So I was going backwards and forwards with the mirror providers and making sure they were up to scratch, then adding them into our configuration.

From then, I introduced BuildBot to Infrastructure. And I think maybe a year after that, this is now talking 2009, a position opened. I think more or less the rest of the Infrastructure volunteers said, "Gavin is doing the job anyway. Let's give it to him."

That was my interview.

Around October, November 2009 I became paid staff.

Are you the longest serving member of the current Infra team?

Yes. Last year at ApacheCon I got presented with a 10 year t-shirt. Next time there's a physical conference I'll be bringing it along.

10 years thumbs up: that's good! Explain the structure of the Infra team and your role in it.

There are six of us, plus Greg (Stein), our Infra Admin, and David (Nalley), VP Infra. One of them is a documentation guy, that's Andrew (Wetmore). The rest of us all various system administration devops work. We look through tickets, what's needed to be done, and obviously we're looking to improve our infrastructure uptime and software and updates. So we all do what's needed, basically. Everyone has various roles.

What's your role?

Well it's a bit of everything, I think. I have been concentrating quite a lot on the CI/CD side of things. That was written into my original contract, which is now not part of the contract. Basically that means the whole entire time I've been here, I've been involved in BuildBot and Jenkins and other CI/CD stuff, and I've been doing a lot of that lately as well. Migrating Jenkins over to new Cloudbees software, and on a whole load of VMs, mainly in AWS.

You mention that CI/CD is a key part of your role. Is that what you're specifically responsible for within Infra? Are you "the CI guy"? Are there other things you do? Everyone says to me, "Hey we do everything." That sounds amazing, but how is that possible? Do you do everything else in addition to the CI work?

Yeah pretty much. Yeah. Everyone can do pretty much everything that we touch on. Some just choose to do certain things that they're more capable of or more used to working with or they like it better. Nobody is told, "You're working on this."

That's interesting. Fill that part in: if there's six things that need to get done, but five of you are actually hands-on sysadmins, so you guys do what you like to do or what you prefer to do? No one says, "Okay you go handle that mail server"? How does it work?

Obviously there's 24 hours in a day and there's people all around the world. If there's an emergency going on or a mail server breaks down or something needs doing, then whoever's around at the time would step up and say, "Okay I'll take a look at that."

So everyone's pitching in --it's not, "Hey I'm not going to do it. Wait for ‘the mail server guy’."

No, no. Obviously I'm sort of known for the Jenkins and BuildBot stuff, but if I'm not around, everyone else can just jump in and get on with it.

So how did you become the Jenkins guy? How did you get to be the BuildBot guy? You were saying earlier that you kind of evolved into it because it was needed, but is this something that you've personally had interest in to start? Or is it just, "Hey there's a fire here, I need to put it out," and it became this cellular memory, a habit, it's now your thing.

I think a little bit of both. I started off introducing BuildBot not long after I started. Jenkins had already been going a little bit at that time, but I've been involved in that also since the start. And it's over the years become more important to projects. Back when it started, it was a nice-to-have sort of thing. There was none of this pipelines, and CI wasn't an integral part of releases, whereas these days it's more and more a requirement. Jenkins and BuildBot have gone from second-class citizens, if you like, to one of our core services that needs to be kept on top of all the time. It's one of the most important aspects of our infrastructure for projects. There's a great demand for it. And it's increasing all the time.

That's interesting to see it go from a supporting role to the lead demand. That's been what, over 11 years now?

Yeah, 11 years.

In earlier interviews, I spoke to Chris and Drew and Greg and Daniel (Infra team members Chris Thistlethwaite, Drew Foulks, Infra Administrator Greg Stein, and Daniel Gruno) and they've all given me their perspectives on the many areas that infra is involved with. Tell me about the scope of the work that you guys do, and how is it different from other Open Source foundations?

Not sure how I can answer that. I'm not involved in any other Open Source foundations.

Okay, well tell me how does Infra operate at the ASF? You support the Foundation, you support projects. How do you help?

We have, as you know, over 300 projects. Each of those requires a Website, each of those requires an area for their code, whether that be Subversion or Git. We obviously over the last couple of years have been more involved in supporting GitHub for Projects. And we have the Confluence wiki and Jira for the issue trackers. So all of the services that they need to operate as an Apache Project is what we offer them.

So every project needs a Website, as you said. Each Apache Project is responsible for their code, their communications, and their community. So they run their own Website, but Infra handles the backend? What is it that you do for them?

Yes, we handle the backend. We've got Web servers that all the Websites get published on, but they write their Website content, and that could be written in many different languages. So we support them being able to provide their Website content in whatever manner they want. This could be just plain HTML, it could be compiled in Maven or in Pelican, there's a million different things. GitHub pages. So we provide the publishing methods for them to be able to go from there ... most projects these days just want to be able to commit a change and leave it at that. Then that change automatically gets published to the Web via automated mechanisms at the backend, you know? We watch for a commit. That could be via a gitpubsub, could be via Jenkins, via BuildBot, GitHub Actions, all of these methods. We'll see a commit, and it'll publish it and build it if necessary before publishing. So they just commit a change and leave it at that.

So the magic that's associated with that automation, is that something you guys are building to support them? Or is it something that pre-exists? How are you integrating all these different languages or platforms? How is this happening?

Well, software like Jenkins and BuildBot ... those mechanisms we can provide pre-built code to watch their repositories for commits to their Website repository. It'll automatically build it, and then it'll push it to the websites. There's also recently GitHub actions will also, instead of being on Jenkins or BuildBot or Travis or any of those, GitHub actions will take a commit straight out of the GitHub repository. It'll do the building of the Website, then it'll push it to usually what's called an asf-site branch. And then we pick it up from there and publish it. The actual GitHub actions code themselves is written by the projects. So that's self-serve.

If there is a fail for that commit, who fixes it? Is it the Project’s responsibility or is that your responsibility? Who's under the hood dealing with that?

It depends. If it's a coding error, then it's theirs, the Projects. If there's some kind of hardware failure, or if there's a piece of software gone down, communications error, yeah, it's up to us to track that down and find out what happened.

I'm understanding a trend here. If you go to other foundation sites, they seem more “corporate” in the sense that everyone's site looks, feels, and performs the same way, they operate the same way and they tend to be under the same infrastructure altogether, right? They're not using 50 different CMS's.

Right.

... That in itself is highly unusual.

Oh okay, yeah. We don't mandate how projects make their Website look, or we don't mandate how they must build it.

That in itself, the autonomy to do what works best for the Project, I think is highly unusual.

Okay. That's good to know.

In terms of ASF Infra and other foundations, you guys don't sit together and compare notes or talk to each other or anything. A lot of groups copy us, so I presume there's little interaction other than socially, right? I didn't know if there was, "Hey, Linux Foundation does that. We should do the same thing," kind of thing. The ASF does its thing and so be it. 

As far as I know, we have no interest in what other people are doing in terms of how they do things. We do things how we think it's best to do them for us and our Projects, how it works best for us. Whether other team members go off and have a look at how other foundations are doing things, I don't know. But I don't.

... Uniquely Apache.

Yeah.

In terms of services, what's the difference between what you offer for individual Apache Projects and their communities versus Foundation-level initiatives? I presume there's a difference --is the majority of your work serving the Projects? What's the percentage of work that you do that's for the Foundation versus Projects? Is it all for Projects? Or is it all considered one thing.

I don't really see a difference. All the work that needs doing is for the Project or Foundations as a whole. It's all the same to me.

What about incubating projects? Do they have special needs or requirements? How do you support them?

Not really unless they're coming into the Incubator with something they've always used that we don't do. Then we would look at that and decide whether it's something we can do for them or not. There's been a few projects that come in like maybe OpenOffice in 2011.

That was exactly in my head in terms of pre-existing groups that have pre-existing infrastructure. OpenOffice was a whole community altogether in a completely different way. How did Infra support them? What did you do? I knew that there were some issues with the codebase and licensing. What else did you do to support that project?

Oh that was a while ago.

... That's fine, I was just curious as to what you guys did. I just remember it was a huge lift from everybody, from all sides. Licensing and code and every aspect of that project coming in seemed to me to be very, very, very challenging, but we got through it. So that's great.

I know there was a lot of work bringing the code in, and not just from the licensing perspective, but also it was an enormous amount of code that needed to come in. I don't know whether they were in Subversion beforehand, but we provided them their space in Subversion and their Website space. I think a lot of the work was done by the project themselves.

Wow, wow. That was a lot of work. How do you handle Projects or communities that make unreasonable demands from the team? How do you guys deal with that?

There are some projects ask more of Infra than others. Some we never hear from at all. There's kind of a fine balance. Projects that are fairly new, we probably spend a bit more time with them helping them out, making sure they get all set up. They may ask new things, there may be some initial push backs, then all of a sudden there's another two or three projects interested in the same thing. So then we have to take a serious look and decide whether that's something we need to support ongoing.

We do get each of the team members I'm sure gets private PMs on Slack and emails and stuff like, "Hey, can you help me out with this?" Or whatever. Sometimes you just do it. But we're sort of encouraged to ask them to go through the proper channels via a Jira ticket or email to the appropriate list.

Not to name names, but have any Project's expectations been so unusual or so out of scope that it shocked you guys? Have you had situations where it's just been absolute, where you guys have been floored by it?

There's been maybe one or two projects that have just been incessant in their demands on Infra, as if we were their personal team. But we deal with it as in, "okay, slow down, what do you need? File a ticket." If they keep going on and on and on, then obviously we've got escalation levels. We can say, "Hang on," and we can pass that onto our boss and say these are being a bit unreasonable.

For those "colicky baby" types of projects, I've been hearing more and more about additional services being offered through Self-Serve. Are these guys able to take on Self-Serve and go, "Yeah that works for us and we'll do it." Have they been able to kind of self-satiate their needs, or has it always been "Infra do it for us"? How successful has Self-Serve been in terms of wicking away demand?

It's been hugely successful. You're referring to selfserve.apache.org: we introduced that three-four years ago maybe. It was a way to ... help the projects help themselves so they don't have to wait for Infra, because they know Infra is busy. Sometimes waiting two or three days for something is ... from their side of things they're like, "It's been two or three days. Still hasn't been done." But from our side of things, "it's only been open two or three days, what are they worried about?"

... "You're in the queue, wait."

Yeah: self-serve was introduced as a way for them to help them, and also it helps us, there's an awful lot of tickets now that don't get filed because of that. They can create their own Jira Project. They can create their own Confluence wiki. They can create their own Git repositories. 

... On their own completely? Without intervention, without "mother may I?", anything? They just go do it?

Yep. There's an awful lot that they can do on their own. And we introduce more self-serve things all the time that otherwise we'd have hundreds more tickets if they weren't able to do that on their own. They can create their own mailing list now: they don't need us.

Do you have to be a PMC member to do that? Can any Committer can do that? Who gets to administer these types of services for projects?

I believe some of the self serve options are PMC chair, and others are PMC members.

… So not just some person who's like, "Hey I'm committing code, I'm going to go and futz around with the site and break something."

Yeah, no.

That's good. Controls obviously are necessary. This is terrific: what a huge difference.

Yeah definitely.

We've got hundreds of projects that have successfully incubated and graduated under the Apache banner. How do you guys develop new products and services to help support that innovation? We get all sorts of projects coming into the Foundation. Going back to OpenOffice as an example, we've never had a project like this of that scale, and consumer-facing. There were so many different things about that that was so unique, and yet we said, "Yeah you're part of the Foundation, you're coming in, you're part of the family."

Yeah.

We’ve had to adapt as we grow. Is there a way for you guys in anticipation ... feel like you need to have a different type of runway in order to accommodate new projects coming to the ASF? Or do you deal with it as it comes along?

Infra is not in control of what projects come to the Foundation. We don't have a say in that. When a project comes to the foundation and they have different requirements, then that's when we get to know about it. And we would deal with it appropriately then.

Obviously there's growth and we know that there's going to be more and more projects coming to the ASF all the time. So we anticipate growth as such.

… So you are setting yourselves up to accommodate more growth, not specifically a matter of "we need more Jenkins" or whatever.

Right. I mean whatever it is that we are looking after, we need to know that that particular service is going to be able to connect with growth.

Got it. How many requests do you receive a day? In general in terms of what constitutes "hey we're slammed" versus a regular day of "we've got 40 things in the hopper", that's normal? What's the volume that you are dealing with?

I want to give you a figure as far as Jira is concerned, which is only one aspect of the things that we handle. Not everything is done by Jira tickets. But I'd say on an average month, we probably get between 150 to 200 tickets a month.

I've been on the Infra channel on Slack, and it's constant. It's nonstop.

Yeah.

Explain to me a typical workday. How do you manage between "hey I'm focused on a long-term project, this new request is coming in, Sally's hair is on fire because she needs help with a mailing list" and whatever else is going on? There's just constant demand on you guys. How do you not go crazy? How do you manage this?

We just get used to it, I guess. Obviously each individual handles their own time in their own way. At any one time there could be one, two, or all of us could be on Slack. So as requests come in on Slack, if it's a two minute, five minute job, we might just say, "Okay, all right, I'll sort that out for you now." Or if we feel it's going to be a little bit more in depth then we say, "Okay file a Jira ticket." Then one of us can pick that ticket up and take a look at it.

We do get people pinging individuals on Slack saying, "Can you help me with this?" Or whatever. Which is often negative to them in a way that they're narrowing their scope of help they can receive by targeting a specific individual. That person might be extremely busy for the next four or five hours, day and a half, whatever it is. And there's another four or five people that could help them with that question.

Typical day, obviously you get up, you check your emails, you see what's urgent, are there any fires to fight straightaway. You go on Slack, that stays open all day. As requests come in, you check Slack all day long. That's just one of those things. You check your tickets, your Jira tickets, what needs doing today, what can wait, or if you've got plenty of time then even the ones that can wait get done.

Whatever order you feel is most important. Then yes, everyone's got longer-term projects on. So myself personally, if I can spend a day or two on a long-term project, then get back to doing tickets, it's the way it is. If there's a lot going on in ticket land, then your project gets put on hold. If something breaks down ... The other week we had to move our Jira server because the hardware broke, so on a Sunday things broke down. Quickly fire up a new server and move everything across. Not sure anybody noticed, which is a good thing.

That's always a good thing. Business as usual, no one knows. With all this stuff coming at you and servers breaking down on the weekends, et cetera, how do you keep everything organized?

It depends on the day, I guess. Some days are good, some days are ... some days you can't see your hand in front of your face for things going on. Each day as it comes. There's no plan. I don't plan what I'm doing tomorrow. If there's a long-term project and I think things have slowed down, projects aren't asking for things, tickets are coming in slowly, I think I'll get on with my project tomorrow. Then you wake up tomorrow and something different happens. There's no real plan.

You don't use any special tools to keep your work checklist in order or anything like that other than the Jira? 

I tried to use various products over the years. You've got Trello and these other kanban board type things. You actually got to open it up and fill it out, haven't you?

It's so interesting you say that because I think some people find that structured way of working extremely efficient, then it's exactly that solution for them. Spending the time to actually do it is taking away from doing other things ... so I don't know if that works for everyone.

It doesn't work for me. I did start one of these boards, but it doesn't fit in with the job. You've got ... "okay, this has got to be finished in three weeks, this has got to be finished in two days." And it sends you reminders and emails and this and that. I mean there is no time limits on things. We're not a software project. We don't have to release something next week.

… True, you don't have hard delivery dates.

Like you say, time is taken away by filling out these things that are supposed to help you organize. So I just don't do it anymore.

Do you have other challenges with that? Balancing everything and getting everything done?

No, feeling okay. I mean I'm still here.

That t-shirt is evidence, that's true. Since Day One, the ASF has been known for creating their own rules for success. They're like, "We're going to do it our way," right? And Infra --even before there was an official infra-- played an important role. You can't exist without that kind of support. How has --and you've been with the Foundation long enough to see patterns and changes --how has infra changed over the years?

Good question. When I officially came onboard as a contractor, I was the second contractor at the time. And everybody else was a volunteer. There were quite a few volunteers. And they were there a lot. At least a dozen people that were active as infrastructure volunteers, even though they knew that there were two people getting paid to do the same thing, they were still there. Still volunteering.

Over the years, things have gotten a bit more professional, I guess. The service requirements have become more of a professional level. Down time is ... years ago if something was down for a couple of hours, it was like "there were just volunteers that are handling it. They'll get to it when they can". But as more and more paid staff had come onboard, to a grand total of six, a reverse happened with volunteers. They've mostly gone. You've got now maybe two or three volunteers that have stuck around and been around for a while. Because there's paid staff doing it. It's changed as in "who wants to volunteer for something when there's people being paid to do it?"

Was this shift proactive or reactive? Was it a matter of the demand coming from a Project and for us to go, "Well we better change this," or was it a matter of we're feeling like we're having volunteer burnout or whatever and we need to make this a more professionally oriented organization? Do you recall how this shift happened?

It happened gradually over the years. As the Foundation grew, more projects came in, more hardware was required, more services are required, more hands-on time is required. So you increase the staff one by one to handle this. Then I think over time as volunteers start dwindling away, due to the fact that there's people getting paid to do it.

That's one aspect. The Foundation itself has a responsibility to the Projects to ensure that there is solid infrastructure there. So there's got to be a requirement that there's people there all the time to maintain this infrastructure. The Infrastructure team has become more professional over the years. The Projects have become customers, I guess. Volunteers are always welcome; at Infra we still have plenty of areas in which volunteers can help out. And, we don't bite!

Obviously the SLA is related to that shift too. They're becoming customers versus "we're all in it together and everybody figure out how to make it work". I'm sure the expectations also were higher, right? Because now you have a team, what's your excuse for not getting it done?

Right.

[END OF PART ONE]

Comments:

Post a Comment:
Comments are closed for this entry.

Calendar

Search

Hot Blogs (today's hits)

Tag Cloud

Categories

Feeds

Links

Navigation