The Apache Software Foundation Blog
Inside Infra: Chris Lambertus --Part II
So, in the scope of the team, I understand that you're a more "senior" developer. Not that you know better; it's not an issue of better or worse, but you're more seasoned. How does ASF compare to other groups that you've worked with? Are there special technical requirements or special security issues you have to be concerned with? Especially, as we mentioned before, it seems like there's an unlimited number of project development environments. Are there certain things that you have to consider or accommodate or do that are so different at the ASF that you've never experienced them before? Can you give a little bit of a frame of reference for folks unfamiliar with how it is within the ASF?
First of all, I'm not a developer. I am terrible at programming. Absolutely, I'm awful at it. I don't consider myself a developer in any way, shape, or form. I am a system administrator, 100%.
...Administrator. Okay, so, you're a more "senior" sysadmin then.
I hesitate to use the word senior, because it has some implications in the industry that I don't necessarily feel are appropriate for the ASF. I believe that I have been doing it longer than most other people on the team just as a career. I'm guessing that's probably what you mean by that.
Right. That's why I used the word "seasoned" also. It's hard because some people go, "Are you saying I'm old, or are you saying hierarchical, that I'm above others?" It's a hard way of describing it, because some folks have been programming or dealing with computers since they were kids, others later in life, but you guys are all moving in the same direction. So, how does one describe it?
Yeah, I think seasoned is a good word. Like I said, I've been working in the industry as a system administrator since 1992, pretty much continuously with some brief changes in the 2000s. It's neither here nor there. So, it's not hierarchical. Everybody is equivalent in terms of the Infra team. Nobody's above anybody else or below anybody else, right?
...I was wondering how is the ASF different from other groups you've worked with.
All right. It's actually not all that different. There are a couple of things that make it unique. Well, a number of things that make it unique. One is that it's completely remote and completely geographically dispersed. Two is that the participants on the team are all from very different backgrounds and cultures and countries, which is fairly unusual for a system admin team, a small system admin team, I would say. But beyond that, it actually shares quite a lot of things that I typically see in system administration teams. There's a central job board, if you will, like the Jira stuff. There's a communications channel. We have Slack.
There's a nominal leader in Greg, who directs the general movement of the barge. Yeah, by and large, it's pretty similar to most environments that I've worked in. I mean, some are much different. Some are very corporate, some are very open. Yeah, now I remember one of your previous questions: one of the biggest challenges that I've found is the openness.
The ASF for quite some time has been incredibly public with its configurations, with its systems, with its documentation. These types of things are very unusual in the corporate world or in commercial IT. Typically, you would never make that stuff public. The fact that it is and has been at the ASF, that's been a challenge for me. It's an unusual way to maintain systems. It's got some downsides. Having that stuff available can be concerning at times.
...How so? Help me understand this, because I've been with the ASF forever. What you're mentioning right now reminds me of something from about 10 years ago: something failed in Infrastructure. I can't remember what it was, but it was a big thing. People were talking about it. It was even in the press at the time. It wasn't catastrophic, but it was big. We actually wrote a blog post about it and we presented about it at ApacheCon. From a marketing perspective and a media perspective, I was uncomfortable, because from a corporate perspective, you don't do that. The fact that we not only encouraged it but published it and educated everyone about it, admitted it, ate it all, we took responsibility, 100%: "Here's what failed. Here's what happened. Here's what we did." People found this to be extremely refreshing, extremely helpful, and it was totally eye opening for me. I had no concept of anything like that before, and I'd been with the ASF for like 10 years already. I'd never seen us opening the kimono to that extent. So, I'm coming at it from a slightly different perspective than you. I understand you don't want to have your config files public. Obviously, that can put you at a different level of exposure and risk.
...Is that required, or is that just part of our culture saying, "This is what we do"?
It's definitely part of the culture. My background is heavily in computer security. Coming on board at the ASF and seeing all this stuff out in the open, to me, was... I couldn't believe my eyes. "You're doing what?" So, I've actually worked quite a lot to reel that in to some extent, because even 10 years ago, things were nothing like what's happening today in the world of computer security, in terms of the threats, in terms of what people are looking for, what people are doing, and what people are capable of doing, right? Even for benevolent organizations like the ASF, it's distressing.
So, one of the things that I've really tried to encourage is that it's okay to be open to some extent, but you have to have some common sense about your security exposure. That's what I've been trying to do for the entire time that I've been here: to try and reel some of that in without losing the culture, because I think the culture is valuable. Like you said, the incident that happened, whenever that was, I think it was the right decision for the time. Would you do that today? Probably not.
It's not because you wanted to cover something up, but it's because you want to limit your exposure. Yeah, so it's a different culture now, not the ASF, but the world in general. You have to keep that in mind as you move through the day to make sure that you are minimizing your risk and minimizing your security threat vectors.
All right. Have you had instances where a project has basically treated you as their dedicated resource? Has anyone made unusual demands of the team? I’m not asking you to name names, but I can imagine it can get out of hand with all these different projects, especially the corporate ones.
Absolutely. Yeah, the corporate ones are typically the biggest problems, because they come in with a much different mindset than somebody who's come in from developing an Open Source package and has brought it to the ASF. The corporate projects that we've seen really are the ones who are the purveyors of that mentality. They feel Infra is their personal resource, because they don't really have an understanding of the scope of the Foundation. They don't have an understanding of the number of projects that Infra supports. So, I don't really fault them for that, because it's just a matter of education. They just need to understand where they are placed in terms of the Foundation, in terms of Infra's availability and scalability.
Once we've explained that to people, they get it. We typically don't have any problems after that. But there are a few projects that have come in and just persisted in wanting weird stuff. Some of the things you can provide. Some of the things, you've just got to kick back and say, "Hey, this is not something..." Like I mentioned earlier, if it doesn't have a broad benefit to the Foundation, if it's something really specific to your project, Infra is probably not going to support that for you, because we can't support all these one-offs.
So, we'll say, "We'll give you a VM. You can do it yourself." That's worked out pretty well, but there’ve been a few cases even where people like Greg and David have had to go and talk to these projects and say, "Look, how you're approaching this is not appropriate. You need to pull it back. You need to rein it in." But that's really pretty uncommon. I would say just a basic education as people come through the Incubator is sufficient to dispel most of that.
Those kinds of projects... Do they stand down, or do they wind up hiring their own committers to do their Infra work? Do you have any idea as to how that works? I'm seeing more projects coming in with more diversity in their committership to take care of marketing stuff, for example. That's expected, especially as they scale, but from the site administration side of things, website stuff, it's a very interesting thing to observe. Some project sites' information is stagnant ... they're focused on specifically developing code. Others are super productive in terms of getting stuff done. I'm always wondering how they are able to handle all this. Curious to see if you had ideas as to what's going on there ...
I will say this, documentation is hard, right? Writing code is comparatively easy, and it's a lot more fun. So, when you're developing a product, your natural instinct is to develop the product, not develop the documentation. So, you get a project that's only got a couple of active members. They're probably not going to spend most of their time writing documentation. They're going to spend most of their time trying to advance the code base. Even within Infra, that's been a huge challenge for us.
Now that we've hired Andrew (ASF Infra team member and technical writer Andrew Wetmore) to help us work on some of this documentation, it's becoming extremely clear as we work through it how much of that documentation has been untouched. It's been stale, for all the same reasons as these projects. Yeah. Some projects will say, "Hey, we need a documentation guy. That's what Infra said, we need a documentation guy." They'll find one. Maybe somebody will volunteer or maybe it's a corporate thing, whatever. So, yeah, I think it really depends on the project. Some people have the resources. Some projects have the resources, and some don't.
Yeah, it's interesting. Again, since day one, since the '90s, documentation has always been an issue for all projects, even when we started with just HTTPd. It's a constant issue.
If I was going to have money to do anything in a project, I would use it on documentation.
Documentation is often the thing we need the most. I mean, how is it going to work otherwise?
Yeah, I agree. Even from just a cognitive aspect, writing code and writing documentation are about polar opposites. The type of mind that goes and writes code isn't usually the type of mind that can write documentation, or at least meaningful documentation. I'm guilty of it myself. I can't write documentation; I find it quite difficult. Whereas building packages and tying things together, and Puppet configuration management, are not difficult for me. So, it's a huge mind split between those two types of things. I absolutely agree that hiring somebody to do documentation is a great use of resources.
We've grown a lot during the time you've been with us, now six plus years. Other than scale, how has Infra changed over the years? What's unusual is that the team is getting smaller. I would presume as the Foundation is scaling upwards, you would have more team members. It's some crazy number: five people, six people, it's so small. It’s hard to understand how you guys handle everything.
Yes, six people, including Andrew and then Greg, right?
Including Andrew, that’s six, but Andrew doesn't handle the day-to-day Jira stuff anyway. He doesn't handle tickets. So, you really are a tiny group. From your perspective and your experience, would you say that that's a small group, considering the workload and the demand?
Yeah, I would say so. Probably based on my experience in other organizations, about half the size that it would be in a commercial environment. Well, to go to your original question there, in terms of what's changed, I think prior to David Nalley, I would say that Infra was extremely reactive. I think that's changed quite a lot. I think David has really brought an element of customer service and customer focus to the team that really had been somewhat lacking in the past.
So that was a proactive decision to go in and say, "We have to better serve our projects," right?
Yeah. I really do credit David with that. I think he brought a huge amount of that mindset to the team. It's really improved our relationships, Infra's relationships, with the projects. It's helped us develop tooling like Self-Service. The more we can move off to the projects themselves through do-it-yourself tooling, the better off we are, because it means fewer tickets that we have to handle. It's a constant juggle for us between dealing with legacy code and technical debt from years and years ago, doing modern things to bring out new tools, and all the while supporting projects.
In what areas are you guys experiencing bursts of growth or demand? Everyone has a slightly different perspective. I know CI comes up a lot in this arena. Greg's always saying (since I deal with ASF’s Sponsors), "We need more." Where do you feel Infra's growing at the highest rate or the most interesting rate? Where do you feel like that's happening?
Yeah, continuous integration was the first thing that came to mind when you said that. The more projects we have, the more need there is for CI. That's fairly linear. Other growth places are things like Infra VMs, machines that we run to support Infra services internally. Prior to the resources that we have now, we used to have a lot of monolithic systems, systems that would run a lot of things. Think of a machine like Minotaur, which used to run two dozen services on one machine. That's not a best practice at all.
Moving to aggressive use of configuration management with Puppet, and making sure that systems are easily replicable through that configuration management, has allowed us to really build not quite microservices, but single-purpose systems, which are a lot easier to maintain and a lot easier to scale than some of those monolithic systems. So, that's been a big growth area for us. Just the number of VMs, the number of systems that we're maintaining, it's got to be in the hundreds at this point. I haven't counted. Yeah.
...These microservices that you're mentioning also reduce the single point of failure, which is critical. That keeps you guys scalable and keeps you up and running. That's important.
Yeah, that's right.
I'm curious when was the last time you guys had a fire drill type of thing, where everyone's hands on. You had something recently, right? A couple months ago, all hands on deck, there was something broken. You guys were able to resolve it pretty quickly, but that's uncommon, where something breaks in its entirety.
I don't want to say anything about this, because it's going to cause a problem.
...We can go off the record.
What I mean is I'm going to say it's fine, right? And then something's going to break.
...[laughing] You don't want to jinx it. Okay.
We have failures from time to time. We've had some situations where there's been a problem at a colo. One of our VM providers had an issue and we lost machines. We had to rebuild them with Puppet, our configuration management, and restore stuff from backup. It sucked, but it wasn't a disaster, right? Because we have the backups. We have the capacity. We have the configuration management. So, nobody had to rack their brains: "How did this work? How did this go together?" We've made very, very big strides in avoiding that old mindset of 'one guy set this up 10 years ago and nobody else knows how it works.' We're very much trying to avoid that these days.
...Right, bus factor.
Yeah, yeah, yeah. The configuration management systems have been absolutely critical with that. So, that continues to grow. We continue to add to configuration management wherever possible and just make sure that those systems are able to be reconstituted wherever, whenever it's needed.
Cool, cool. Okay. What do you think people would be surprised to know about ASF Infra?
The other guys probably said the same thing, but probably the amount of stuff that we support from the number of people we have. I think that would probably surprise most people in the industry.
That's one answer. I think it was (Infra team member) Chris Thistlethwaite who said "that we exist", that you guys exist. People don't know how it happens. It's like magic. I've always talked about how Infra is this crazy-magic-impossible story. It's like The Little Engine That Could, because you guys are such a tiny group. You have such a good working relationship, and everyone is connected. From the outside, it seems like a completely seamless operation. There's this magic thing behind the scenes, and then you find it's only five, six people running it. That's mind blowing. It's incredible.
I hope that people have that perception. We do try to provide a unified front. In reality, there's not really any infighting in the team. We all generally know what needs to be done. We all generally agree on how to do it. So, the disagreements are fairly minor and not all that common.
Well, that in itself is unusual, right? Think about it. I mean, there are a lot of factions and politics and weirdness, but that tends to happen with larger groups. So, you guys make it work in a way that's awesome.
I think one of the things that makes that the way it is, is because we're all supporting the ASF, right? We're all here, because we support the Foundation, and we want the Foundation to succeed. So, that drives, I think, a lot of the direction and the way that we approach how we support the Foundation.
You guys have a very different common goal, right? You're there for the benefit of the Foundation with a capital F; Projects are there to work on their own thing. Of course, if they can help everybody else, that's good, too. But the focus is different.
...What is your favorite part of the job?
I have to say, the flexibility and the remote aspect of it, along with the constantly changing technology. There are a lot of opportunities to learn new things, and work on new technologies.
...You are all on call for certain periods throughout the week, right? So, because of your 7:00 to 11:00, are you ever on call overnight, or does that just not work out with schedules, or it doesn't matter?
Well, we rotate on call. So, you're on call for a week at a time, starting at, I think, 10:30 or 11:00 AM Pacific and going through to the following week. So, typically, what happens is you'll get the pages when you're on call, regardless of the time of day or night. But the way that it works out, because we have folks in Europe, we have folks in the US, we have folks on the West Coast and the East Coast, there'll almost always be somebody awake and available to answer.
Sometimes in the middle of the night, if my pager goes off at 2:00 in the morning, I'll look at my phone and I'll see that Humbedooh or Gavin is already working on it. Thanks, guys. Obviously, the same is reciprocated, right? If the phone goes off in the middle of their night and I see that they're on call but it's 3:00 in the morning, I'll grab a ticket if I can, I'll grab the call if I can. We just try to help each other out that way.
You guys are a true team: you have each other's backs in a way that again, is unusual to see. It's almost like family but even better, because even family has infighting and issues. You are there for each other, which is really, really cool to see.
Yeah, let's say we've had our disagreements, but it is a very familial atmosphere.
When you first came into the role, what was your biggest challenge? Was it what you thought it was? How was your experience?
It was an incredibly steep learning curve. When I first started here, we were in the middle of the transition from the "one guy who set up everything, a volunteer five years ago, nobody knows how it works" environment to configuration management. We were just starting to get into that, and to shore up some of our documentation at the time. For me, just coming in and learning all the different systems and all the different processes and all the different edge cases and one-offs and locations for things and who's who and all these, that was incredibly difficult. It took me probably at least a couple of years before I felt comfortable with most of the systems.
Even today, there's stuff out there where I'll be like: "I'm not sure what this means. Do you have any idea what's going on?" Because there's so many little pockets and holes and places and things and historical legacy stuff. It's very complicated. It's been organically grown over a long, long time.
...With a lot of different personalities and a lot of different processes, that is what's unusual. The "quilt" that makes Apache is so diverse.
What are you most proud of with your career with Infra so far?
I'm not really sure, to be honest. I don't tend to think of things like that. I can't really single out one thing and say, "Hey, I'm really particularly proud of that," or whatever. I try to take pride in all my work. Building better backup systems, I think, is definitely a big one. Just getting through some of this mail project has been good as well. When I finally got everything working, that was a pretty proud moment. I felt pretty good about that. That was a complicated system. It's still a complicated system. I'm still not sure it all works right. That's why we have to test it. By and large, I'm feeling pretty good about it.
That's great. How would your coworkers describe you?
...[laughs] The response is the same with everyone. Everyone laughs, but grumpy is the first one I've ever heard.
I don't really talk too much. I'm not a super verbal person. So, I always seem to come across as grumpy on the chat systems there. It's a schtick, I guess, but it is fun. I'm not really grumpy. Well, most of the time.
What are the biggest threats or concerns that sysadmins need to watch out for? I don’t mean doom-and-gloom unless there’s actually doom-and-gloom ...A lot of non-Apache folks are curious what the Apache guys think. So, is there anything that you could share in terms of advice or trends that are coming up or something that people should be aware of moving forward?
Security, backups, disaster recovery, those are the keystones of any organization that you absolutely must have in place to sleep at night. If you don't have any one of those three, you're in grave danger of doom-and-gloom.
That makes sense. What is your greatest piece of advice for someone looking to have a job like yours?
Oh, boy. Run for the hills [laughing]. Work with as many different things as you can, learn as many different things as you can, and try not to get stuck doing one specific thing. I think in my career I've been such a jack of all trades that it's really helped me to be able to see and build systems that work with a lot of different technologies. You get some people coming in, they're IBM guys, like a specific subset of IBM AIX expertise or something, right? That's all they do. And then when the situation comes around, well, nobody's really using that anymore, you run into a problem, because you're not really marketable anymore. So, the advice that I would give anybody who's trying to get into the system administration field, be broad and learn as much as you can about as many different things as you can.
If you had a magic wand, what would you see happen with ASF Infra?
I think I'd probably just give us more resources. I mean, I don't really have any complaints, to be honest. I think if we had more, then we would do more.
...More ...machines or more cash or more team or more what?
All of those ...I think more cash. Being able to buy more physical compute resources would go a long way for us. We do rely so much on donations and donated resources that it can be a little bit daunting when that donation goes away and you have to scramble to fill the void. Staffing is a complicated one, because it is familial.
Having somebody new come on board, it's challenging. It's nice to have an additional person be able to work on stuff, but going through the process of integrating them into the team and teaching them everything, it's daunting, it's challenging. So, I think having more resources would be more important, at least to me, than having more staff, because I think we're doing all right with the staff that we have now. So, that's just my perspective.
= = =
Chris is based in California on UTC -8. His favorite thing to drink during the workday is ice water and the occasional Diet Pepsi.
Posted at 06:20PM Jan 24, 2021 by Sally in SuccessAtApache