Entries tagged [software]

Friday July 24, 2020

The Apache News Round-up: week ending 24 July 2020

Happy Friday! We've had a great week within the Apache community. Here's what happened:

Inside Infra – the interview series with members of the ASF Infrastructure team.
 - Meet Greg Stein --Part III https://s.apache.org/InsideInfra-Greg3

ASF Board – management and oversight of the business affairs of the corporation in accordance with the Foundation's bylaws.
 - Next Board Meeting: 19 August 2020. Board calendar and minutes https://apache.org/foundation/board/calendar.html

ApacheCon™ – the ASF's official global conference series, bringing Tomorrow's Technology Today since 1998.
 - Registration is OPEN (and free) for ApacheCon@Home taking place online 29 September - 1 October. Join us! https://www.apachecon.com/acna2020/ 
 - Sponsorships available for ApacheCon@Home https://www.apachecon.com/acna2020/sponsors.html 

ASF Infrastructure – our distributed team on three continents keeps the ASF's infrastructure running around the clock.
 - 7M+ weekly checks yield uptime at 99.96%. Performance checks across 50 different service components spread over more than 250 machines in data centers around the world. http://www.apache.org/uptime/

Apache Code Snapshot – Over the past week, 407 Apache Committers changed 4,357,953 lines of code over 3,609 commits. Top 5 contributors, in order, are: Gary Gregory, Andrea Cosentino, Mark Miller, Zhang Yonglun, and Sebastian Bazley.      

Apache Project Announcements – the latest updates by category.

Big Data --
 - Apache Atlas 2.1.0 released https://atlas.apache.org/
 - Apache Calcite Avatica Go 5.0.0 released https://calcite.apache.org/avatica
 - Apache Druid 0.19.0 released https://druid.apache.org/
 - Apache NiFi Registry 0.7.0 released http://nifi.apache.org/registry
 - Apache Flink 1.11.1 released https://flink.apache.org/

Content --
 - Apache Jackrabbit Oak 1.22.4 released https://jackrabbit.apache.org/oak

Libraries --
 - Apache Commons Lang 3.11 released https://commons.apache.org/proper/commons-lang/
 - Apache Commons Geometry 1.0-beta1 released https://commons.apache.org/proper/commons-geometry/

Programming Languages --
 - Apache Groovy 2.4.20, 2.5.13, and 3.0.5 released http://groovy.apache.org/

Search --
 - Apache Lucene 8.6.0 and Solr 8.6.0 released http://lucene.apache.org/

Web Frameworks --
 - Apache Wicket 7.17.0 released https://wicket.apache.org/


Did You Know?

 - Did you know that Apache projects' ongoing sustainability is ensured through the generosity of our Sponsors and individual donors, whose support helps ensure that the ASF continues to provide more than $20B worth of software to the public-at-large at 100% no cost? http://donate.apache.org

 - Did you know that projects undergoing development in the Apache Incubator are in a variety of categories that include AI, Annotation, Big Data, Blockchain, Cryptography, Data Visualization, Distributed Computing, Email, Embedded Systems, Geospatial Data, Graphing, Hardware Acceleration, IoT, Messaging, Monitoring, Natural Language Processing, Scheduling, Streaming, Training, Usability Testing, and more? http://incubator.apache.org/projects/

 - Did you know that you can learn about Apache Beam, Calcite, Camel, CarbonData, Groovy, Hadoop, Karaf, Labs, NetBeans, OFBiz, OpenOffice, PLC4X, Rya, Spark, Tomcat, Unomi, and more in the "Apache Innovation" short? https://s.apache.org/ApacheInnovation

Apache Community Notices

 - Apache Month In Review: June 2020 – overview of events that have taken place within the Apache community https://s.apache.org/June2020

 - "Trillions and Trillions Served" – the documentary on the ASF filmed onsite at ApacheCon Las Vegas and Berlin in 2019 has been released: 1) full feature https://s.apache.org/Trillions-Feature 2) "Apache Everywhere" short https://s.apache.org/ApacheEverywhere 3) "Why Apache" teaser https://s.apache.org/ASF-Trillions 4) "Apache Innovation" shorts https://s.apache.org/ApacheInnovation

 - The Apache Software Foundation Statement on the COVID-19 Coronavirus Outbreak https://s.apache.org/COVID-19  

 - The Apache Software Foundation Celebrates 21 Years of Open Source Leadership https://s.apache.org/21stAnniversary

 - The Apache Software Foundation Operations Summary: Q3 FY2020 (November 2019 - January 2020) https://s.apache.org/r6s5u

 - Apache in 2019 - By The Digits https://s.apache.org/Apache2019Digits

 - The Apache Way to Sustainable Open Source Success https://s.apache.org/GhnI

 - Foundation Reports and Statements http://www.apache.org/foundation/reports.html

 - "Success at Apache" focuses on the people and processes behind why the ASF "just works". https://blogs.apache.org/foundation/category/SuccessAtApache

 - Inside Infra: the new interview series with members of the ASF infrastructure team --meet Chris Thistlethwaite https://s.apache.org/InsideInfra-Chris | Drew Foulks https://s.apache.org/InsideInfra-Drew | Greg Stein Part I https://s.apache.org/InsideInfra-Greg and Part II https://s.apache.org/InsideInfra-Greg2

 - Did you know that Beam Summit 2020 will be held 24-28 August online and free of charge? https://beamsummit.org/

 - Please follow/like/re-tweet the ASF on social media: @TheASF on Twitter (https://twitter.com/TheASF) and on LinkedIn at https://www.linkedin.com/company/the-apache-software-foundation

 - Do friend and follow us on the Apache Community Facebook page https://www.facebook.com/ApacheSoftwareFoundation/ and Twitter account https://twitter.com/ApacheCommunity

 - Are your software solutions Powered by Apache? Download & use our "Powered By" logos http://www.apache.org/foundation/press/kit/#poweredby

= = =

For real-time updates, sign up for Apache-related news by sending mail to announce-subscribe@apache.org and follow @TheASF on Twitter. For a broader spectrum from the Apache community, https://twitter.com/PlanetApache provides an aggregate of Project activities as well as the personal blogs and tweets of select ASF Committers.


Friday July 17, 2020

Inside Infra: Greg Stein --Part III

The close of the "Inside Infra" interview with ASF Infrastructure Administrator Greg Stein, who shares his experience with Sally Khudairi, ASF VP Marketing & Publicity. 




"Apache is growing: we're just seeing the demand explode and it's a hard problem for us to solve."



PART THREE.


We were talking about ensuring that the team is up to speed with everything required of them...


So there certainly are skill gaps; this is one of the things I want to help motivate the team with, where if somebody says, "Hey, I want to go and investigate Ansible as a potential Puppet replacement," I say, "Go forward." 


This would be similar to Google having their 20% projects. I'm sure you've heard of that.


Oh, yeah.


It's almost the same where it's not 20%, maybe 5%, but it's the same as Google, no matter what they want to tell you, because everybody's got their job and you have to be really rigorous to carve out 20% of your time. And strictly speaking, it does actually make your Google manager a little upset if you carve out the entire 20%. But anyways, the concept is similar.


So for us it’s like, "Well, go in and investigate Ansible, see if it'll work for us and put your notes into the Wiki." That's how we make forward progress, up our game, and learn new skills. If someone says, "I want to go and figure this out," the response is almost always, "Okay. You go do it." There's certainly an allowance for people to learn new skills. But most of the time we simply rely on, say, Gavin (ASF Infrastructure team member Gavin McDonald), knowing more about JIRA configuration than the other guys.


That added component of sharing what you know, and adding it to the JIRA or to the Wiki actually is great because then everyone's learning. This is like the rising tide: everybody's learning about this, whether they're doing it perfectly or not. I think this is a very interesting process.


Yes, and that's also where Andrew (technical writer Andrew Wetmore) is helping us out. He’s organizing that information that we have learned, that we have documented, that we memorialized into the Wiki.


Because our (ASF’s) legacy is quite Medusa-like over all these years, it's interesting to see how everyone can get caught up and also contribute...you have to go back and deal with the legacy, but you also have to be able to move forward. To be able to bring others with you is brilliant. That's really cool.


The infrastructure has grown organically over 25 years from when Brian Behlendorf first said, "Hey, I have this server called hyperreal.org: you can run a CVS repository on it for the Web server."


That computer was under his desk at the Wired offices way back when, wasn’t it...


Yes it was. And it's just grown organically over those 25 years. Then we had Minotaur and it did six different things ... now it only does half of one and we've moved the stuff out onto newer machines and newer processes and this and that. But the organic growth means that we've got some really hairy stuff. Our move to Puppet --first Puppet 3, and now to Puppet 6-- at each step we're improving it and making it less hairy and more manageable and something that somebody can come along, look at, pick up and run with it from there. That makes it a lot easier, so that we don't have to spend 100% of our time cross training.


What are your thoughts on products, the hype cycle, where everyone's demanding Kubernetes, to use that as an example. Do you decide which products to provide support for, or is that up to Apache projects in the communities? You mentioned Ansible, just not too long ago, that was your internal decision to move. But I remember not long ago, GitHub entered into the landscape. How did that happen? How did you decide to make a move like that? That's a significant thing. Can you tell me a little bit about that?


It's a lot based on community input. So if we see a lot of people asking for a particular tool, we'll be like, "Oh, hey, David, can you go and take a look at that and see if that's something…" Not David (ASF VP Infrastructure David Nalley), but Chris (Infrastructure team member Chris Lambertus) or somebody else. "Can you go take a look? Is that something that we can support? Because we're getting some queries about it."


And there's a little chicken and egg problem there that if the communities don't know to ask for the egg, we don't know whether to prep the chicken. It's like, “okay, wait, they don't even know to ask for a tool because we haven't said we will make this tool available, because we're not going to make the tool available until somebody asks”. But sometimes people file tickets like, "Can I get this set up?" and we'll go, "No."


Then six months later, somebody else will file a ticket: "Can I get this set up?" and we'll go, "No." But after enough of those, we're like, "Maybe that's something that we really want to do." For GitHub specifically, that’s what happened there. Well, even before that there was Git, where we ran our own Git server: that was a volunteer that made that happen. That was six years ago or so.


Well...the volunteer came along and said, "Well, I'll do this. I'm not going to take any time from Infra." There's been a couple things for the past few years where I've told people, "No, Infra will not work on that. But if you want to volunteer or find a volunteer, then we'll stand it up for testing." You know what I mean? Why not? So there's a couple things where people have stood up for test examples and there hasn't really been a lot of usage.


So, we're not going to support that. But something like Ansible is our own internal workflow, and a tool we’ll experiment with to see if it'll improve our stuff. But from the community, they pretty much have to ask and it has to be a sustained ask. That's how we ended up with Travis CI: we actually pay for capacity in Travis CI, and that's based on community input.


So many people wanted to do their continuous integration through Travis that eventually we decided to pay for it. But it's tricky because some of these systems like Travis CI and others require certain permissions that we don't want to provide to the community. So we will want to hold those only within Infra. And so it gets hard to integrate certain tools. We've had to say no, but then again, we've found other ways to improve that so that we can lock down the permissions or use a proxy or other ways that we can route around some of these issues and then integrate the requested tool.


So further to that, have you been in a situation where a project or a community has made unreasonable demands of Infra or have expectations, where it's like, so over the top or so out of scope, it totally surprised you? Have you had something like this?


Nothing surprises me.


Nothing surprises you? Okay. Have you been in this situation? Like “was never going to happen”...


Yes, yes. There's been several times where one of the guys on the team is like, "Oh man, I got this ticket. I don't know what we want to do with this. Greg, go take a look." And I go and look at it and that's where I make that call: "Okay, is the Infra team going to take this on, or do I just say ‘no’ right now?"


So, yeah, there's been a number of times where I've said no and probably two or three times where I've gotten a little bit of pushback on that no. I say, "My answer is no, but here's how you escalate." I've had escalation a few times and I'm actually mid-process --I'm dealing with one right now. So, I've said, "No, if you don't like my no, you can go to VP Infra, and VP Infra is probably going to tell you the same thing. And then you can go to the President. Right now those are actually the same person."


The same person is a double "no".


That really is the true escalation path. I have to describe that to people and say, "I don't think you're going to get what you want." If I'm the one that says no, you probably are not going to get it, because next is VP Infra and President, and after that is the Board. They're probably not going to say, "Greg is wrong. Yes, we'll give that to you." But it's there. There's been a couple of times where I said "No, you have to ask the Board for the budget for those additional virtual machines." They went to the Board and said, "Can we have budget for three machines?" and the Board said, "Yes."


So Infra went ahead and gave them the three VMs that they had initially requested. Strictly speaking, we would track those machines against their budget, but that detail is more than what the actual budget was. So we don't spend that time doing that, but I have had to say no. I have had to... There was Apache Maven: they were keeping a copy of Maven Central, and Maven Central is run by Sonatype...


Which is a commercial product...


Yes. They're using the trademark “Maven”, essentially a licensing agreement from us, an MOU. So with Maven Central, you could imagine if someone decides to just turn it off one day ...we wanted a copy. Apache Maven was making a copy of it, and it just started consuming so much disk space. We were like, "We can't support that growth rate. We can't support that even for the next six months. If you want to keep doing it, go ask the Board for money to keep doing it." They never did. We turned it off.


I wouldn't call that a ridiculous request --it was something where we didn't have to just say, "No, not going to do it. Bye." A lot of the requests are mostly just, "We aren't going to run that extra software. If you want: ask for a VM and you can run it, but we're not going to take responsibility for it."


Over the years, obviously ASF Infra has changed. Was this all reactive or was it also proactive? Do you plan for those changes as you go or has it all been in response to Project X or in response to X emergency?


The growth of Infrastructure and its movement from volunteer-only to paid staff was part of just the growth of Apache. The volunteers could no longer keep up, and things like account creation used to take sometimes four weeks. You’d put in a request for an account and, four weeks later, it would finally get created.


My gosh, that queue was crazy, huh?


Well, it wasn't even a long queue, it was simply that we didn't have volunteers making sure the queue stayed empty. Today it's down to one, two, maybe three days, and the account is created, because every day a staff member goes and creates the accounts first thing in the morning.


It was how I said that my day starts with looking at messages on Slack and then reading emails to see if there's stuff to handle. Well, one of the guys on staff, first thing he does in the morning is go and look at account creation. So he's been off and on pondering on a tool to make that easier for himself; he hasn't finished the tool, so he still has to do it manually. That's his incentive.


“Work quickly”...


This is Chris Thistlethwaite. I say, "Chris, we can do something about that." And he says, "No, no, this is still my project. And every day when I run the script, it just makes me remember, I need to finish this."


So when the volunteers could not keep up with the amount of work, that's when we hired Joe Schaefer, then we hired another person, and hired another person. And so it was just trying to keep up with the rate of requests. 


That's how we ended up with hiring six people. And then I'm half a person, like I said, I'm part-time. So, it's just the growth of Apache. I think we're in much better shape than when I started. We're ahead of the curve. We can stay ahead of the curve because one of the things that I can do because I don't fight the fires every day ... that's for all the guys who know their stuff. They fight the fires and I can look at if I need to go and ask for another head count. And that's how we ended up with Andrew (technical writer Andrew Wetmore): “Well, you know, what we really need is somebody to manage all this documentation.” This was part of Sam's (former ASF President Sam Ruby), “If you had some money, what would you do with it?” That's how the technical writer/editor came around, because we've got 20 years of organic growth. We had...let's just call it “organic documentation”. That revamping project is going really well, I think.


So, in what areas are you guys experiencing your biggest growth? As I was asking Chris and Drew, is there like a geographic influence on the demand? We’ve had a huge influx of users in China. Does any of that change the way or what you guys are doing? Or is it just more of everything?


Our biggest pain point, I would say, is continuous integration/continuous delivery: CI/CD. Jenkins, Travis, CircleCI, and things like this, where people make a change and they want that change built and tested. The more projects we get and the larger the communities get, the more changes and the more testing and the more building and the more this, more, more, more. It's kind of one of those things where it's “expand-to-fit”. So if we gave people 100 machines, they'd use 100 machines. If we doubled it to 200, they'd use all 200. It's just this rapacious need for CI machines. It's very hard to figure out how to plan around that other than just telling the communities, “No: we just don't have that much capacity: if you want to build it, do it on your own machine. You just can't use Apache hardware to do it.”


That's an unsatisfactory answer. That's been one of our hard problems and it's also kind of a newer problem: the development workflow that uses CI probably is just maybe five years old. Before that, certainly, automated building and testing was a thing, but it's really kind of grown into community workflow much, much more over the past five years, and more and more people are wanting to do it. The communities are growing. Apache is growing: we're just seeing the demand explode and it's a hard problem for us to solve.


China is the one case where we see regional issues, and that's because of the Great Firewall of China. Not because we're getting more Chinese developers, but because they have problems accessing our servers because they're located outside of China, and so we're looking at a CDN, a content distribution network, to essentially make our content available closer to China. We've found that even with one of those CDN drop points in Hong Kong, they still have problems just reaching it there in Hong Kong, and so ... and we don't want to buy or lease or rent a server in China because doing business in China is too high of a hurdle for the Foundation.


Oh? 


You know, Microsoft and Google have to do business in China and they've got a pack of lawyers and a giant vault of money to deal with all the barriers. The Foundation does not, so it's also a hard problem to solve. We think we might be able to do it through Microsoft Azure: they have a CDN that resides in China, and Microsoft has done all that paperwork, so we're looking at that. But as far as regional things, it's not so much that we run into issues. We see Open Source communities in Europe and Brazil and Australia and Sri Lanka: none of them really have any problems because they don't have that firewall. It's not really about the Chinese people, but about the China firewall.


That's bigger than us. And that’s not something we can fire hose.

 

We do see little engagement from Japan and Brazil, and that is partly for language reasons and partly because the Brazil community is more about Free Software than Open Source software. 


Yeah. They're very pro-FOSS.


Not OSS. But pro-free. And so, they're going to deal with the Free Software Foundation rather than the Apache Software Foundation.


I see. That’s an important distinction. 


And then you also have the Portuguese language barrier. People contributing from Europe and India, Sri Lanka, etc., they pretty much know English and that's fine. A lot of the Brazilian developers do not know English...this is the same with the Japanese Open Source developers. Japanese and Brazilian, they tend to not know English, and so that kind of isolates them from the larger Open Source world, or Free Software world, in the case of Brazil.


Would we consider localizing anything that we do, or are we going to continue as-is, as the ASF is all English?


The Infrastructure team will not translate our documents to serve those other languages. That's just too high of a bar.


There are a couple groups that have user mailing lists that are not English and that's totally fine, and Infrastructure will... well, you don't have to file a ticket anymore. It's, again, back to selfserve.apache.org: “self-serve” on Apache will create a mailing list for users communicating in Brazilian Portuguese, for example, or communicating in Japanese. But Infra doesn't do anything about that, that's just the self-serve tools. We certainly can't support non-English, and I don't think that the Foundation itself is going to make any moves towards that.


Fair enough. So a lot of companies are really struggling to accommodate their teams working from home in response to the Coronavirus and all that. These stay-at-home orders are kind of shaking companies, but from day one, the ASF has always been a virtual organization. Has anything changed with your operation on that front? Has anything impacted the ASF's day-to-day, from this pandemic?


(chuckling) Not at all. I shouldn't laugh, but no. It really hasn't changed. We've been on our team channel for all three years, three and a half years that I've been here, and the world is burning down around us, but we still sit on the team channel.


Now, that said, (Infra team member) Daniel Gruno got stranded in Canada.


Right! He’s still there?


He's still doing work from Canada. This is why when he travels to Canada for two months at a time, I don't care, you know? Because if his butt is in a chair in Denmark or in a chair in Canada, it's the same butt, so, you know...


As long as you have connectivity and a computer, you can do it. 


Right. But if he has to be offline for two months, I'd say no. Or if you want unpaid time off, well, I'm not going to pay you, of course. Certainly the discussions have changed, you know? I mean, going shopping. You know, some members are immuno-compromised and that had an effect on our team meeting that we were planning in Nashville: they were the first to say, “No way. I'm not going,” so, there’s that, but our day to day hasn't changed.


That's more of a social thing versus an operational thing. Safety first.


So the notion of, “Oh, I got to run out to the grocery store. I need to strap on a mask,” changes, but not the operation.


Right. Right. So...what do you think people would be surprised to know about ASF Infra?


I don't know if it'd be surprising, but we are global. We've got four people in the United States, one in Canada, one in Denmark, one used to be in Australia, but is now in the UK, which actually kind of hurt a little bit, because in Australia, that meant that we always had somebody in that time zone, but now we have kind of this gap of Australia/Asia time zones when...


A “Gavin” gap.


Yeah, well, I might be awake at that time, but I can't go and fix a MySQL server, so it does mean that we don't have that straight-up 24-hour coverage.


The notion that we are worldwide is kind of a neat thing about our team, and is what makes us pretty unique relative to other IT departments. I don't like being called an IT department, but that is essentially what we are. 


Surprise.


What's the name of that TV show? The one that's about IT...


“The IT Crowd”, is that what you’re referring to? The British show?


Yeah. So, you know, that's a funny show, but mostly when you think “IT department”, you think of some corporate people with button-up shirts, but ...most of us, we're in our pajamas.


Good one. What's your favorite part of the job?


I definitely like the team and that's why, nominally I'm part-time, but I'm pretty much constantly on the team channel and interacting, and so I think I just put that down as volunteer hours, where before I might work on Apache Subversion, but now I hang out with the team or I write some little tool or something like that. That's definitely been one of the more rewarding changes. Up until I started with this, I'd been a director for 15-and-a-half years, and that was kind of how I contributed to Apache. Now my work for Infrastructure is a new way to contribute to the Foundation. I'm also part of a new community, where before I would hang out with the httpd community, APR community, the Subversion people ...now it's the Infra people and my hobby time is kind of blended in with my work time, and vice versa. I mean, when your work time can also be seen as a hobby time, that's pretty cool.


I do think it's the team that makes it interesting. That's what I like the most, and that I'm working with a new, interesting community to contribute to the Foundation. 


Not only did you switch roles, you switched communities. What was your biggest challenge going into this new role?


I would say probably trying to delineate what I was going to handle for the guys and that I wasn't going to tell them what to do or how to do it. It's like, “OK, I'm here to assist, to unblock things, to enable you guys, rather than to block you or micromanage you.”


To earn that trust, that I wasn't going to be some pointy-haired boss telling them how to do their work. Now, I don't know if that was ever a problem for them, but that was certainly one of my initial concerns: how to properly create my role. This was the first time Apache's even had somebody fill in this role, so I also had to find the role, which is, again, why I came up with “Infrastructure Administrator”: I wanted to define it as an enabler role, as an administrator, so they could get their work done but I would not be their manager. I would not be their boss: I was simply there to enable them.


So, what are you most proud of in your Infra career to date?


Ooh. I don't know. I would say by being hands-on, being the “hands” of Infra, it means that VP Infra didn't run away screaming.


David said in January 2016, maybe earlier, he was like, “No way. I'm out.” And after I was on the job for about two months, he said, “Huh. All right.”


“I'm in!”


And so I get that feedback from him, “You know, you make the VP Infra hat quite easy for me.” I think that's probably what I really like about taking on the role, is that one of our volunteers got to stay rather than drop it because it was just causing so much anxiety and pain and time and frustration. Otherwise, most of the stuff I do is really boring. Not to me, but I don't have “accomplishments”. I push paperwork, basically, so the other guys can do accomplishments.


Speaking of the other guys, how would your co-workers describe you?


I have no idea. I don't know. I really don't know. (laughing)


Where I just got done talking about what I saw as an issue, trying to frame what my role would be, it might have been fine with them and I was overly worried about it, but it’s hard for me to know. We don't do 360 reviews in Infra, so I don't get any feedback, really, from the team on what they think about me or how I'm doing my job, so you'd have to ask them.


I have. Just kidding. So...what are the biggest “threats” that infrastructure managers or infrastructure administrators need to watch out for? What do you think is a “big thing” that people should be aware of, or is ASF so unique that you don’t feel like anyone really experiences what you experience?


There's our capacity issue with things like Travis, but I think you're asking a different question.


I am, but that's fine. What's your greatest piece of advice? What would you tell aspiring infra administrators?


Actually, one of my greatest fears is really, as a small charitable foundation, it's hard for us to compete with well-funded corporations and some well-funded start-ups.


Related to that, I touched on it earlier, is career development ...you go into Google or Microsoft and there's a career ladder; we simply don't have a career ladder. There's salary growth. There's bonuses. If you want to have a resume or a LinkedIn profile that shows changes in growth and titles and career ladder, we can't offer that, and that's going to cut out some people. It's a very hard problem for me to solve. You know, there's things I can maybe do, but I also want to keep the team egalitarian and sort of level, rather than, “Oh, well, this guy is now the team lead.”


Given what I talked about, our social aspects, because we are all equal peers, keeping everybody with the same title, same position on the ladder means that we are peers and it's a little easier to interact that way. It's a real, real difficult problem. You ask what's scary: that's scary.


But there's a counterpoint to that. You may not have a traditional career ladder path, but to say that you've worked in Infra for Apache carries weight. That's significant. 


I believe it does, especially when you can demonstrate the hundred different types of tasks...


Well, that's exactly it. The breadth of work and the scale of what you guys do and the skill sets that you have to have and the fact that you have to play nice in the sandbox, all of it. The demand is immense, so to be able to be there and thrive and develop something from yourself in terms of a career is tremendous. Our team is exceptional. I mean, they're not expecting a linear ladder or something that others have.


You know, in other jobs, somebody might say, “I was a MySQL administrator.” Here, you're a MySQL administrator, PostgreSQL administrator… They had one role; here you've got dozens. 


If you had a magic wand, what would you see happen with ASF Infra?


I'd like to solve that CI problem. The other magic wand would be upgrading our mail server from 10-year-old technology to modern technology.


Is that happening or is that literally a wish list issue?


It's happening, but it's been happening for three years. The thing is that email is so central to the Foundation that we can't really experiment with that. There are certain things we can do, but most of it, not so much, and so it means that we're being super-careful. There's about 10-12 different moving parts to it, and we're upgrading each of those a little bit by a little bit, until we can finally pull that big, scary, Young Frankenstein lever to hit the lightning bolt, you know?


Yeah: I see the visual of that.


The magic wand would be to just make that all happen and make it work. Without the wand, it's going to take another 6-12 months.


Right. What else do we need to know that I haven't asked? What should I be aware of or what should I be sharing?


Oh, I don't know. This is where my creativity ends. Ask me a coding question.


Oh no coding questions. All right. Our time has also ended. Before we go, who should I be interviewing next? 


I would say Daniel (Gruno), because his role ... he's 20-30% system administration. The rest is tool development, so that makes his role rather unique in the team.


Perfect. Thanks so much, Greg. I really appreciate it. 


= = =

Greg is based in Austin on UTC -5. His favorite thing to drink during the workday is a big 32oz cup of Diet Mountain Dew.


The Apache News Round-up: week ending 17 July 2020

Happy Friday! Let's take a look at what the Apache community has been up to over the past week:

"Trillions and Trillions Served" – the feature documentary on the ASF filmed onsite at ApacheCon Las Vegas and Berlin in 2019
 - Watch “Apache Innovation”, the fourth and final segment of the series https://s.apache.org/ApacheInnovation

ASF Board – management and oversight of the business affairs of the corporation in accordance with the Foundation's bylaws.
 - Next Board Meeting: 19 August 2020. Board calendar and minutes https://apache.org/foundation/board/calendar.html

ApacheCon™ – the ASF's official global conference series, bringing Tomorrow's Technology Today since 1998.
 - Registration is OPEN (and free) for ApacheCon@Home taking place online 29 September - 1 October. Join us! https://www.apachecon.com/acna2020/ 
 - Sponsorships available for ApacheCon@Home https://www.apachecon.com/acna2020/sponsors.html 

ASF Infrastructure – our distributed team on three continents keeps the ASF's infrastructure running around the clock.
 - 7M+ weekly checks yield uptime at 99.88%. Performance checks across 50 different service components spread over more than 250 machines in data centers around the world. http://www.apache.org/uptime/

Apache Code Snapshot – Over the past week, 405 Apache Committers changed 3,321,417 lines of code over 3,483 commits. Top 5 contributors, in order, are: Mark Miller, Shad Storhaug, Andi Huber, Andrea Cosentino, and Gary Gregory.  

Apache Project Announcements – the latest updates by category.

API --
 - The Apache Software Foundation Announces Apache® APISIX™ as a Top-Level Project https://s.apache.org/29wd9

Big Data --
 - Apache HBase 2.3.0 released https://hbase.apache.org/

Blockchain --
 - Apache Tuweni (Incubating) 1.1.0 released https://tuweni.apache.org/

Content --
 - Apache Jackrabbit Oak 1.32.0 released https://jackrabbit.apache.org/oak

Enterprise Processes Automation / ERP --
 - Apache OFBiz 17.12.04 released https://ofbiz.apache.org/

Libraries --
 - Apache Annotator (Incubating) 0.1.0 released https://annotator.apache.org/
 - Apache BVal 2.0.4 released https://bval.apache.org/
 - Apache Daffodil (Incubating) 2.7.0 released https://daffodil.apache.org/

Machine Learning --
 - Apache TVM (Incubating) 0.6.1 released https://tvm.apache.org/

Messaging --
 - Apache Curator 5.1.0 released https://curator.apache.org/

Web Frameworks --
 - The Apache Software Foundation Announces Apache® Wicket™ v9 https://s.apache.org/lpsmm
 - Apache Wicket 8.9.0 released https://wicket.apache.org/
 - Apache MyFaces Core 2.2.13 released http://myfaces.apache.org/


Did You Know?

 - Did you know that, as the world's largest Open Source foundation, the ASF is 7,800 Committers strong? https://projects.apache.org/timelines.html 

 - Did you know that the "Trillions and Trillions Served" feature documentary is now available with Chinese subtitles? https://www.bilibili.com/video/BV1Uz411i7MH 

 - Did you know that your employer can join companies such as American Express, Bloomberg, IBM, and Microsoft in matching contributions and volunteer hours made by their employees? http://apache.org/foundation/contributing.html

Apache Community Notices

 - "Trillions and Trillions Served" – four parts of the documentary on the ASF filmed onsite at ApacheCon Las Vegas and Berlin in 2019 have been released: 1) full feature https://s.apache.org/Trillions-Feature 2) "Apache Everywhere" short https://s.apache.org/ApacheEverywhere 3) "Why Apache" teaser https://s.apache.org/ASF-Trillions 4) “Apache Innovation” shorts https://s.apache.org/ApacheInnovation 

 - Apache Month In Review: June 2020 – overview of events that have taken place within the Apache community https://s.apache.org/June2020

 - The Apache Software Foundation Statement on the COVID-19 Coronavirus Outbreak https://s.apache.org/COVID-19 

 - The Apache Software Foundation Celebrates 21 Years of Open Source Leadership https://s.apache.org/21stAnniversary

 - The Apache Software Foundation Operations Summary: Q3 FY2020 (November 2019 - January 2020) https://s.apache.org/r6s5u

 - Apache in 2019 - By The Digits https://s.apache.org/Apache2019Digits

 - The Apache Way to Sustainable Open Source Success https://s.apache.org/GhnI

 - Foundation Reports and Statements http://www.apache.org/foundation/reports.html

 - "Success at Apache" focuses on the people and processes behind why the ASF "just works". https://blogs.apache.org/foundation/category/SuccessAtApache

 - Inside Infra: the new interview series with members of the ASF infrastructure team --meet Chris Thistlethwaite https://s.apache.org/InsideInfra-Chris | Drew Foulks https://s.apache.org/InsideInfra-Drew | Greg Stein Part I https://s.apache.org/InsideInfra-Greg and Part II https://s.apache.org/InsideInfra-Greg2 

 - Did you know that Beam Summit 2020 will be held 24-28 August online and free of charge? https://beamsummit.org/

 - Please follow/like/re-tweet the ASF on social media: @TheASF on Twitter (https://twitter.com/TheASF) and on LinkedIn at https://www.linkedin.com/company/the-apache-software-foundation

 - Do friend and follow us on the Apache Community Facebook page https://www.facebook.com/ApacheSoftwareFoundation/ and Twitter account https://twitter.com/ApacheCommunity

 - Are your software solutions Powered by Apache? Download & use our "Powered By" logos http://www.apache.org/foundation/press/kit/#poweredby

= = =

For real-time updates, sign up for Apache-related news by sending mail to announce-subscribe@apache.org and follow @TheASF on Twitter. For a broader spectrum from the Apache community, https://twitter.com/PlanetApache provides an aggregate of Project activities as well as the personal blogs and tweets of select ASF Committers.

Friday July 10, 2020

The Apache News Round-up: week ending 10 July 2020

We're more than halfway through the year, and the Apache community is wrapping up another great week. Let's take a look:

ASF Board – management and oversight of the business affairs of the corporation in accordance with the Foundation's bylaws.
 - Next Board Meeting: 15 July 2020. Board calendar and minutes https://apache.org/foundation/board/calendar.html

ApacheCon™ – the ASF's official global conference series, bringing Tomorrow's Technology Today since 1998.
 - Registration is OPEN (and free!) for ApacheCon@Home: combining ApacheCon North America and ApacheCon Europe in a new, online format! Taking place 29 September - 1 October. Join us! https://www.apachecon.com/acna2020/
 - FINAL CALL: CFP for ApacheCon@Home closes on 13 July https://www.apachecon.com/acna2020/cfp.html

ASF Infrastructure – our distributed team on three continents keeps the ASF's infrastructure running around the clock.
 - 7M+ weekly checks yield uptime at 99.95%. Performance checks across 50 different service components spread over more than 250 machines in data centers around the world. http://www.apache.org/uptime/

Apache Code Snapshot – Over the past week, 303 Apache Committers changed 752,461 lines of code over 2,537 commits. Top 5 contributors, in order, are: Jean-Baptiste Onofré, Andrea Cosentino, Ioan Eugen Stan, Sebastian Bazley, and Paul King.

Apache Project Announcements – the latest updates by category.

Content --
 - Apache Jackrabbit 2.21.2 released https://jackrabbit.apache.org/

Servers --
 - Apache Tomcat 7.0.105, 8.5.57, 9.0.37, and 10.0.0-M7 released https://tomcat.apache.org/


Did You Know?

 - Did you know that more than 100 developers at University Hospitals Leuven, one of Belgium's largest hospitals, created software (powered by Apache Subversion, Ant, Ivy, and many Apache Java libraries) that is used across the nexuzhealth joint venture of 25 hospitals to meet their rapidly-evolving needs as impacted by the COVID-19 pandemic? http://subversion.apache.org/ | http://ant.apache.org/ | https://ant.apache.org/ivy/  

 - Did you know that those wishing to support Apache may do so with a donation towards their ApacheCon@Home registration or by sponsoring the event? https://www.apachecon.com/acna2020/ 

 - Did you know that Airflow Summit continues over the next week? https://airflowsummit.org/

Apache Community Notices

 - "Trillions and Trillions Served" – three parts of the documentary on the ASF filmed onsite at ApacheCon Las Vegas and Berlin in 2019 have been released: 1) full feature https://s.apache.org/Trillions-Feature 2) "Apache Everywhere" short https://s.apache.org/ApacheEverywhere 3) "Why Apache" teaser https://s.apache.org/ASF-Trillions 

 - Apache Month In Review: June 2020 – overview of events that have taken place within the Apache community https://s.apache.org/June2020

 - The Apache Software Foundation Statement on the COVID-19 Coronavirus Outbreak https://s.apache.org/COVID-19 

 - The Apache Software Foundation Celebrates 21 Years of Open Source Leadership https://s.apache.org/21stAnniversary

 - The Apache Software Foundation Operations Summary: Q3 FY2020 (November 2019 - January 2020) https://s.apache.org/r6s5u

 - Apache in 2019 - By The Digits https://s.apache.org/Apache2019Digits

 - The Apache Way to Sustainable Open Source Success https://s.apache.org/GhnI

 - Foundation Reports and Statements http://www.apache.org/foundation/reports.html

 - "Success at Apache" focuses on the people and processes behind why the ASF "just works". https://blogs.apache.org/foundation/category/SuccessAtApache

 - Inside Infra: the new interview series with members of the ASF infrastructure team --meet Chris Thistlethwaite https://s.apache.org/InsideInfra-Chris | Drew Foulks https://s.apache.org/InsideInfra-Drew | Greg Stein Part I https://s.apache.org/InsideInfra-Greg and Part II https://s.apache.org/InsideInfra-Greg2 

 - Airflow Summit 2020 will be held online 6-17 July https://airflowsummit.org/

 - Did you know that Beam Summit 2020 will be held 24-28 August online and free of charge? https://beamsummit.org/

 - Please follow/like/re-tweet the ASF on social media: @TheASF on Twitter (https://twitter.com/TheASF) and on LinkedIn at https://www.linkedin.com/company/the-apache-software-foundation

 - Do friend and follow us on the Apache Community Facebook page https://www.facebook.com/ApacheSoftwareFoundation/ and Twitter account https://twitter.com/ApacheCommunity

 - Are your software solutions Powered by Apache? Download & use our "Powered By" logos http://www.apache.org/foundation/press/kit/#poweredby

= = =

For real-time updates, sign up for Apache-related news by sending mail to announce-subscribe@apache.org and follow @TheASF on Twitter. For a broader spectrum from the Apache community, https://twitter.com/PlanetApache provides an aggregate of Project activities as well as the personal blogs and tweets of select ASF Committers.

Friday July 03, 2020

The Apache News Round-up: week ending 3 July 2020

Welcome, July! We've had a great week within the Apache community. Here's what happened:

Inside Infra – the third interview in the series with members of the ASF Infrastructure team.
 - Meet Greg Stein --Part II https://s.apache.org/InsideInfra-Greg2

ASF Board – management and oversight of the business affairs of the corporation in accordance with the Foundation's bylaws.
 - Next Board Meeting: 15 July 2020. Board calendar and minutes https://apache.org/foundation/board/calendar.html

ApacheCon™ – the ASF's official global conference series, bringing Tomorrow's Technology Today since 1998.
 - NEW! ApacheCon@Home: combining ApacheCon North America and ApacheCon Europe in a new, online format! Taking place 29 September - 1 October, registration is OPEN and FREE, with donation options for those wishing to support the ASF. Join us! https://www.apachecon.com/acna2020/ 
 - CFP for ApacheCon@Home has been re-opened. Hurry! Presentation proposals due on 13 July. https://www.apachecon.com/acna2020/cfp.html

ASF Infrastructure – our distributed team on three continents keeps the ASF's infrastructure running around the clock.
 - 7M+ weekly checks yield uptime at 100%. Performance checks across 50 different service components spread over more than 250 machines in data centers around the world. http://www.apache.org/uptime/

Apache Code Snapshot – this week, 907 Apache contributors changed 1,367,670 lines of code over 3,904 commits. Top 5 contributors, in order, are: Gary Gregory, Kaxil Naik, Andrea Cosentino, Eugen Stan, and Sebastian Bazley.     

Apache Project Announcements – the latest updates by category.

Application Performance Monitor --
 - Apache SkyWalking Chart 3.0.0 and Python 0.1.0 released https://skywalking.apache.org/

Big Data --
 - Apache Avro 1.10.0 released https://avro.apache.org/
 - Apache Storm 2.2.0 released https://storm.apache.org/
 - Apache Kylin 3.1.0 released https://kylin.apache.org/

IoT --
 - Apache IoTDB (Incubating) 0.10.0 released https://iotdb.apache.org/

Network Client --
 - Apache Guacamole 1.2.0 released https://guacamole.apache.org/

Web Crawler --
 - Apache Nutch 1.17 released https://nutch.apache.org/


Did You Know?

 - Did you know that the following Apache projects are celebrating anniversaries this month? Three cheers to Tcl (20 years); DB (18 years); STeVe (8 years); JSPWiki (7 years); Celix and Tez (6 years); NiFi (5 years); Kudu (4 years); Fluo, MADlib, and Streams (3 years); OpenWhisk (1 year)! https://projects.apache.org/committees.html?date

 - Did you know that, as with all Apache software, registration to ApacheCon@Home is free of charge? We do have donation options for those who wish to support the ASF; thank you in advance for your participation. Join us! https://apachecon.com/acna2020/

 - Did you know that John Deere's data platform is powered by Apache Flink and Apache Spark to scalably receive and process millions of sensor measurements per second, and adapt to continually increasing volumes of data? https://flink.apache.org/ https://spark.apache.org/

Apache Community Notices

 - "Trillions and Trillions Served" – the feature documentary on the ASF filmed onsite at ApacheCon Las Vegas and Berlin in 2019 https://s.apache.org/Trillions-Feature  

 - The Apache Software Foundation Statement on the COVID-19 Coronavirus Outbreak https://s.apache.org/COVID-19  

 - The Apache Software Foundation Celebrates 21 Years of Open Source Leadership https://s.apache.org/21stAnniversary

 - Apache Month In Review: June 2020 – overview of events that have taken place within the Apache community https://s.apache.org/June2020

 - The Apache Software Foundation Operations Summary: Q3 FY2020 (November 2019 - January 2020) https://s.apache.org/r6s5u  

 - "Trillions and Trillions Served", the documentary on the ASF, is in post-production. Catch the teaser at https://s.apache.org/ASF-Trillions and "Apache Everywhere", the first "Trillions" "short" filmed onsite at ApacheCon Las Vegas and Berlin this past year https://youtu.be/nXtIti9jMFI

 - Apache in 2019 - By The Digits https://s.apache.org/Apache2019Digits

 - The Apache Way to Sustainable Open Source Success https://s.apache.org/GhnI

 - ASF Operations Summary: Q2 FY2020 (August - October 2019) https://s.apache.org/2kv2n

 - ASF Founders look back on 20 Years of the ASF https://blogs.apache.org/foundation/entry/our-founders-look-back-on

 - Foundation Reports and Statements http://www.apache.org/foundation/reports.html

 - ApacheCon: Tomorrow's Technology Today since 1998 http://s.apache.org/ApacheCon

 - "Success at Apache" focuses on the people and processes behind why the ASF "just works". https://blogs.apache.org/foundation/category/SuccessAtApache

 - Inside Infra: the new interview series with members of the ASF infrastructure team --meet Drew Foulks https://s.apache.org/InsideInfra-Drew

 - Did you know that Airflow Summit 2020 will be held 6-17 July online? https://airflowsummit.org/

 - Did you know that Beam Summit 2020 will be held 24-28 August online and free of charge? https://beamsummit.org/

 - Please follow/like/re-tweet the ASF on social media: @TheASF on Twitter (https://twitter.com/TheASF) and on LinkedIn at https://www.linkedin.com/company/the-apache-software-foundation

 - Do friend and follow us on the Apache Community Facebook page https://www.facebook.com/ApacheSoftwareFoundation/ and Twitter account https://twitter.com/ApacheCommunity

 - Find out how you can participate with Apache community/projects/activities --opportunities open with Apache Camel, Apache HTTP Server, and more! https://helpwanted.apache.org/

 - Are your software solutions Powered by Apache? Download & use our "Powered By" logos http://www.apache.org/foundation/press/kit/#poweredby

= = =

For real-time updates, sign up for Apache-related news by sending mail to announce-subscribe@apache.org and follow @TheASF on Twitter. For a broader spectrum from the Apache community, https://twitter.com/PlanetApache provides an aggregate of Project activities as well as the personal blogs and tweets of select ASF Committers.

Wednesday July 01, 2020

Apache Month in Review: June 2020

Welcome to the latest monthly overview of events from the Apache community. Here's a summary of what happened in June:

New this month --

 - "Trillions and Trillions Served" – the feature documentary on the ASF filmed onsite at ApacheCon Las Vegas and Berlin in 2019 https://s.apache.org/Trillions-Feature

 - ApacheCon™ – the ASF's official global conference series, bringing Tomorrow's Technology Today since 1998.
  -- Announcing ApacheCon @Home 2020: ApacheCon North America and Europe have been combined and will be held online 29 September - 1 October 2020. Join us! https://apachecon.com/acah2020

 - "Inside Infra" --a new interview series with members of the ASF Infrastructure team
  -- Meet Greg Stein --Part I https://s.apache.org/InsideInfra-Greg and Part II https://s.apache.org/InsideInfra-Greg2

 - ASF Statement on the COVID-19 Coronavirus Outbreak https://s.apache.org/COVID-19

 - Apache Month in Review: May 2020 https://s.apache.org/May2020


Important Dates --

 - Next Board Meeting: 15 July 2020. Board calendar and minutes http://apache.org/foundation/board/calendar.html

 - ApacheCon @Home 29 September - 1 October 2020


Infrastructure --

Our seven-member Infrastructure team on three continents oversees our highly-reliable, distributed network under the leadership of VP Infrastructure David Nalley and Infrastructure Administrator Greg Stein. ASF Infrastructure supports 300+ Apache projects and their communities across ~200 individual machines, 1,400+ repositories, 5-6PB in traffic annually, ~75M downloads per month, and 2-3M daily emails on 2,000+ lists. ASF Infra performs 7M+ weekly checks to ensure services are available around the clock. The average uptime in May was 99.92%. http://www.apache.org/uptime/

Committer Activity --

In May, 860 Apache Committers changed 19,454,137 lines of code over 16,319 commits. The Committers with the top 5 highest contributions, in order, were: Gary Gregory, Jean-Baptiste Onofré, Sebastian Bazley, Andrea Cosentino, and Claus Ibsen.

Project Releases and Updates --

New releases from Apache Archiva (Build Management); Beam (Big Data); Calcite (Big Data); Commons IO (Libraries); Commons BCEL (Libraries); Curator (Messaging); CXF Fediz (Libraries); Flink (Big Data); Fortress (Identity Management); HttpComponents Client (Servers); HttpComponents Core (Servers); Hudi (Big Data); Jackrabbit (Content); JSPWiki (Content); Libcloud (Cloud Computing); NetBeans (Integrated Development Environment); PDFBox (Content); Pulsar (Messaging); Qpid (Messaging); ShardingSphere (Big Data); Skywalking and Nginx (Application Performance Management); Tomcat (Servers); Traffic Control (Servers).

The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. Congratulations to Apache Hudi, which graduated as a Top-Level Project this month https://s.apache.org/odtwv. Welcome to Apache Pegasus (incubating) as the latest podling to enter the Incubator! We invite you to review the many projects currently in development in the Apache Incubator http://incubator.apache.org/   

# # #

To see our Weekly News Round-ups, visit https://blogs.apache.org/foundation/ and click on the calendar in the upper-right side (published every Friday) or hop directly to https://blogs.apache.org/foundation/category/Newsletter . For real-time updates, sign up for Apache-related news by sending mail to announce-subscribe@apache.org and follow @TheASF on Twitter. We appreciate your support!

Monday June 29, 2020

Inside Infra: Greg Stein --Part II

The "Inside Infra" interview continues with ASF Infrastructure Administrator Greg Stein, who shares his experience with Sally Khudairi, ASF VP Marketing & Publicity.




"Who are these crazy guys spread around the world that are keeping 200 machines up and running for all these different projects and committers and contributors?"



PART TWO.


How or what would you describe the Infra "brand" to be?


I don't really know. I've never really thought about branding or marketing ourselves, so ...


Well, you guys have a certain persona, you have those funky t-shirts you wear at ApacheCon ...there's definitely some kind of street cred that's different from everybody else. I was curious to see if that's part of your natural sense of hip, or is that something that you guys deliberately planned for.


The t-shirts and other things go back to the team bonding kind of thing. We'll give ourselves an identity, but haven't tried to create or market ourselves. I think it is something that we do need to take some control over. We hired a part-time writer in December and he's been organizing our content to provide a better and more useful front to Infrastructure.


There were a lot of pages on www.apache.org that have now moved over to infra.apache.org. That creates a more coherent Web space, if you will. We can really talk about those different channels. "How do you reach Infrastructure? Do I go to the Slack channel or do I file a JIRA ticket: how do I decide?" So he's helping to, while I wouldn't say "market a new face", he's certainly helping people figure out who we are, what we do, what we can help with and getting that information organized.


Which is good. That's new. Even to have you guys featured in a project like this, it's unusual and it's refreshing. I'm personally curious, and I'm sure other people are also curious about what's behind Infra.


Right, right. Who are these crazy guys spread around the world that are keeping 200 machines up and running for all these different projects and committers and contributors?


So Andrew (technical writer Andrew Wetmore) is primarily going to work on the infrastructure docs until those are whipped into shape because a lot of the material that we have, a lot of the Webpages, is really infrastructure related. He has been working with the team on those pages. What's going to be harder though is when he's kind of at a stopping point for that, what to turn his focus to, and that would be www.apache.org. But then it gets a lot more difficult because when he wants to update the How It Works page, who does he talk to? Who's authoritative? He can do some edits for flow and word consistency, punctuation, clarity, right, but he can't really update the process.


Right. Right. That's the Foundation thing.


Yeah. But the problem is we don't really even have a concept of who's in charge of that How It Works page, who is, you know, it's just there's nobody that the foundation is willing to say, "That person controls that process." You know what I mean?


I totally do --I come across the same pages and people go, "Are they yours?" It's hard to determine not only evolving processes, but who signs off on this or who gets it. I hear you.


I've recommended for the past year, or three, that Marketing is the owner of DubDubDub (www.), but you know, that's the "face" of Apache. You know? But the raw content, as you point out, who approves the raw content?


One thing that I asked Drew and Chris, and I'm always curious with people who are super busy and juggling 50 things, is to describe a typical workday for you.


I wake up, I look for email first, generally, sometimes I'll hop onto Slack because sometimes people ask me directly for something. Then I go look at email and sort through a number of different categories between direct team stuff, operations, the Apache Board, and then Apache in general. And then of course, if there's any vendor email to deal with. So there's a bunch of different categories in priority order. After I get through that initial work, then it's go and read all the back scroll in the team channel, which is anywhere from 200 to 400 lines of back scroll ...


Can you get any work done? Beyond just catching up on the communications?


Yes. But it does take like 30 minutes to read that back scroll. For me there's a lot in there about what the guys are doing and what they're working on, how to solve a particular problem when they're asking somebody else, "Hey, can you look at this? Can you help me with this?" But I don't, for the most part, "serve", you know ...they are the technical staff... I can do it: I have technical chops, but I let them do their jobs as they know best. I do like reading the back scroll because I'm also looking at it from the angle of "how is the team working together? Is that going well? Is there something that I need to poke and prod to improve how they're working? Are they getting jammed up on something that I can unblock for them so that they can get their work done?"


Stuff like that. That's what I look for when I go through that back scrolling, so it's important to me to read that back scroll. Most of the guys do tend to, when they first sign in in the morning, go back and scan for stuff where they might be needed. I've never really asked them how detailed they get, but I think pretty much everybody reads all of it to catch up, but they're going to be looking at it with a different lens than how I look at it. Mostly I'm looking at unblocking --are they running into problems that I can ease for them?


How do you keep your workload organized?


I don't.


Fair enough. Again, there's a lot, so it's curious to me, like everything at Apache, with the exception of a handful of things, everything could be a priority, if you're always on fire and always running around, putting out fires, you know? It's funny when I've talked to the Infra guys and you also, you all have the same reaction to that question, which is the laugh. I think that's the nature of the beast with the ASF.


Yes. That really is the nature of system administration work. My career has been product development, and you can reasonably plot that out. You can say, "We're going to develop these five new features, which is going to take us between two and four months." We'll see...we might cut a feature to try and limit our time development. The feature is going to change, unless we'll plan in time for change. But system administration is very reactive, so it's a very different beast. This is where, like I said, we were kind of treading water with four people, but we could see as Apache was growing we were not going to be able to keep up. And we certainly weren't going to be able to move ahead of the curve and do things like selfserve.apache.org where, you know, before we would get a dozen tickets to create repositories and that took time. Now we don't have to do anything.


It's all selfserve.apache.org, but we had to write the tool first and have enough air time to get that tool written. So I think we're ahead of the curve. We're getting some of our longer-term initiatives done, but it is still a very reactive thing. For myself, my back office work is pretty straightforward and it's a lot of email and Website work, you know, going in, paying an invoice, putting in the infrastructure credit card, sending out a purchase order, stuff like verifying and approving payroll, that doesn't require me sitting down and writing Python scripts.


The other half of my job is being present on that channel because I also help to set priorities. When something comes up, I ask, "Is this a thing that we want to do? Do we want to take on this new task? Do we want to provide this new tool to the projects?" You know, like a project is going to say, "Well, we want to integrate this thing into our GitHub repository," and we go and review it. It may require permissions that we simply don't want to allow. So there's some of those kinds of policy kind of things that I also help with. And there's always being present to help set policies and priorities.


OK... so how do you work with (VP Infrastructure) David Nalley? Are you making the decisions? Infra is an unusual type of group as opposed to other areas of activity operationally at the ASF. How do you work together?


Correct: I'm the day-to-day, so I look at it like he's the brains and I'm the hands. That said, he's like the strategic brain and I do all the tactical decisions.


I make all the tactical decisions. I am an officer of the corporation. I can make any decision that I need to, related to Infrastructure. If I feel it's a little bit weird, then I'll bounce that off David, but for most of the stuff, he doesn't feel a need to inject himself in. He feels comfortable letting me go ahead and run with the things, and rely upon me asking when it seems a little sketchy.


That's good: that process suits both of your personalities, both your sensibilities. It sounds like a good fit.


I report to the VP of Infrastructure, and that is still David, even though he became Executive VP and is now (ASF) President. He still holds that title. He's asked me, "Well, Greg, maybe you should just be VP Infra," and I said, "No way." Because we're paid people, but the Foundation is all volunteers. I told him I do not want to be a VP, because I want to report to a volunteer. I think that I (and the team) should report to a volunteer that always has a volunteer eye on the Foundation's long-term goals.


Because I manage all the day-to-day, it's a very lightweight hat for him. That VP hat is a tiny aspect compared to his President hat. One day, he'll find somebody to take over that VP Infra hat, but I've essentially mandated to him that it has to be a volunteer position.


It's not that I see we're going to go all out of control and we need a check from a volunteer; I just want a volunteer to always be able to say, "Okay, you guys are a little bit crazy, let's redirect our long-term thinking more in line with what the Foundation wants," and have a volunteer interpret what the Foundation wants.


That perfectly dovetails into what folks referred to in our ("Trillions and Trillions Served") documentary, where they were talking about Greg Stein's famous "plan for the ASF for 50 years..." This super long-term vision, which again, everyone goes back and says, "Greg Stein said..." What does that mean exactly, and how does that translate to Infra, considering that you can't really plan that far out? How does that work?


Well, actually we can plan that far out. I wrote that "50 years" in one of my Director's statements, I think it was 2014 or 2012 ...maybe earlier. Where I was going in that Director statement was the Board doesn't deal with the communities. The Board is there to support the communities. So we want the Foundation to exist for 50 years so that these communities can continue to run and see through evolution.


Some communities are going to move to the Attic, new ones are going to come along, but we want the Foundation to be viable. To say "forever" is okay. Nobody can really put that in their brain. So I just said, "OK, we can think what 50 years means." That is long enough out, but still within people's brain capacity to think, "Okay, what _does_ 50 years mean?"


And so that's where I came up with that. What does the Board need to think about to ensure that we are here 50 years from now and our projects are successful and can run through their lifetime, lifecycles. Apache HTTP and Tomcat, I don't think they are ever going to go away, but you could see maybe in 30 years they might. There might be some other mechanism in computing that would obsolete them, but the model of Apache does need to exist for at least that long.


Now, within Infra, I think we actually can plan that far out because we have growth curves. We see what kinds of computing resources people need. So we can plan for project growth, for machine growth. We can do long-term planning on how we allocate machines among our various cloud resources that we have, and start to schedule those further out. None of that really affects our day to day, but it is something that we can project out a ways and think about what kinds of resources we are going to need two, three, five years from now.


There isn't anything really that we can do for 50 years, but we can keep it in mind. Okay, that is going to be a larger team. That is going to need a larger staff, a full time manager, a full time HR person, a full time... There's different things that will change over that time, but we can actually do some of that projection, although we haven't bothered.


I do the five-year plans for the Board, but mostly that is simple cost growth as opposed to actually changing the structure of the team or the role assignments, because like I said, I think within 10 years we'll probably need to add one or two more staff on top of the head count of six that we have right now. And I think supporting that would still be fine for a part-time person like myself. But once it grows to 10 or 12, then I think it's going to need a real change: we'll need to have a full-time person managing, and so we'll need to adjust the budget considerably to make that happen.


But if we ever get there, the Foundation is going to be likely in a very different position. We're talking 10 years from now. And so, who knows.


So with more than 350 projects and initiatives as we've discussed before, how do you guys stay ahead of the demand? And again, if you're trying to plan for five, ten years out, you mentioned earlier cloud computing. Not so long ago, cloud computing was a novelty. How do you plan for this?


And that is where we try and move more things to selfserve.apache.org, where we look at the kinds of requests that we're getting and the kinds of tasks that we're performing, and find a way to automate that workflow and create more self-serve options for the kinds of tasks that we regularly get tickets on.


Where we used to get tickets on creating Git repositories, we get zero now. And we can see that over the past six months, we've had 20 tickets to do X: is there a way that we can automate that, so we don't have to get our hands on that ourselves and can save our hands for doing things like machine upgrades, or for rebalancing some of our computer resources, where things are running on an old operating system and we need to get that onto a newer version? Right now, all of our machines are managed by a system called Puppet, which does the basic configuration work for us. But today, we're on two different versions of Puppet, a really old one and a reasonably new one.


And we're trying to get everything migrated off the old stuff onto the new, but once we finish that migration, we're going to have to start all over again, or maybe switch to a different tool. We're looking at a tool called Ansible to use instead of Puppet.


And so there's this never-ending ongoing set of tasks, but each time we do it, it reduces our workload by that much more. So when we upgrade from Puppet 3 to Puppet 6, we get an improvement in the maintainability of that server. And that means that we spend less time with that server going forward and have more time to do other things or to deal with project growth.


Regarding a scale of efficiency, how do you close your skills gaps? When I spoke to Chris and Drew, they both said, "We do everything." How do you do that? How do you know all of this? Do you look at this big picture and say, "Okay, we need a person to specialize in X and Y and Z," and then you send them out to learn about it? How do you cope with that?


The team definitely specializes. The guys have specializations around different areas, and we do a little bit of cross training, but not a lot, because as I mentioned, we've got like 200 machines, each individually doing their own thing. If we cross trained everybody in everything, we'd get nothing done. So, there's a little bit of cross training, but mostly some specialties. It does create a little bit of bus factor...


Which is very scary. I was just going to say, your bus factor is very scary. Talk about that.


The thing is that Puppet allows us to create configurations and that's in version control. If all of a sudden somebody leaves, another person can backfill them because if somebody leaves, it's not like they take their work with them: all the work is in version control. And so that work doesn't go with them, but we may need to backfill some education on that particular specialized area. For example, Chris (ASF Infra team member Chris Thistlethwaite) does a lot of our monitoring work. If he left, now we need somebody to get a little more familiar with NodePing and a little more familiar with Datadog, but that'll be like a week for somebody to pick that up.


It wouldn't be, "Oh my God, this is three years of expertise that we need to go backfill" ...we don't have anything that is that highly specialized.


Is that because the team is more well rounded or because you guys are more efficient or what about it? Because of technology evolution, or...


We don't deal with systems of that level of complexity. We've got 200 machines, like I said, each doing their thing, but it's not like we've got a cluster of 200 machines all trying to coordinate to create one particular outcome. It's, here's a MySQL server, here's a JIRA server, here's a Puppet server. Things like that, where the amount of technology is pretty small in each little pocket ... but we just have a hundred pockets on our pants.

[END OF PART II]

Friday June 26, 2020

The Apache News Round-up: week ending 26 June 2020

Farewell, June --we're wrapping up the month with another great week. Here are the latest updates on the Apache community's activities:

ASF Board – management and oversight of the business affairs of the corporation in accordance with the Foundation's bylaws.
 - Next Board Meeting: 15 July 2020. Board calendar and minutes https://apache.org/foundation/board/calendar.html

ApacheCon™ – the ASF's official global conference series, bringing Tomorrow's Technology Today since 1998.
 - Notice on Apache 2020 Conferences https://s.apache.org/zgm8m 

ASF Infrastructure – our distributed team on three continents keeps the ASF's infrastructure running around the clock.
 - 7M+ weekly checks yield uptime at 100%. Performance checks across 50 different service components spread over more than 250 machines in data centers around the world. http://www.apache.org/uptime/

Apache Code Snapshot – this week, 917 Apache contributors changed 3,498,501 lines of code over 3,692 commits. Top 5 contributors, in order, are: Manfred Moser, Sebastian Bazley, Andrea Cosentino, Gary Gregory, and Claus Ibsen.      

Apache Project Announcements – the latest updates by category.

Application Performance Monitor --
 - Apache SkyWalking 8.0.1 and Nginx LUA 0.2.0 released https://skywalking.apache.org/

Big Data --
 - Apache Calcite Avatica 1.17.0 released https://calcite.apache.org/avatica

Build Management --
 - Apache Archiva 2.2.5 released https://archiva.apache.org/

Libraries --
 - Apache CXF Fediz 1.5.0 released http://cxf.apache.org/fediz.html

Messaging --
 - Apache Pulsar 2.6.0 released https://pulsar.apache.org/

Servers --
 - Apache Traffic Server 8.0.8 and 7.1.11 released https://trafficserver.apache.org/


Did You Know?

 - Did you know that Apache Cordova released OSX 6.0.0? https://cordova.apache.org/ 

 - Did you know that Apache Royale released the new, nifty Tour De Jewel component to demonstrate progress on pages? https://royale.apache.org/ 

 - Did you know that the Python agent for Apache SkyWalking is in development? http://skywalking.apache.org/ 


Apache Community Notices

 - "Trillions and Trillions Served" – the feature documentary on the ASF filmed onsite at ApacheCon Las Vegas and Berlin in 2019 https://s.apache.org/Trillions-Feature  

 - The Apache Software Foundation Statement on the COVID-19 Coronavirus Outbreak https://s.apache.org/COVID-19  

 - The Apache Software Foundation Celebrates 21 Years of Open Source Leadership https://s.apache.org/21stAnniversary

 - Apache Month In Review: May 2020 – overview of events that have taken place within the Apache community https://s.apache.org/May2020

 - The Apache Software Foundation Operations Summary: Q3 FY2020 (November 2019 - January 2020) https://s.apache.org/r6s5u  

 - "Trillions and Trillions Served", the documentary on the ASF, is in post-production. Catch the teaser at https://s.apache.org/ASF-Trillions and "Apache Everywhere", the first "Trillions" "short" filmed onsite at ApacheCon Las Vegas and Berlin this past year https://youtu.be/nXtIti9jMFI

 - Apache in 2019 - By The Digits https://s.apache.org/Apache2019Digits

 - The Apache Way to Sustainable Open Source Success https://s.apache.org/GhnI

 - ASF Operations Summary: Q2 FY2020 (August - October 2019) https://s.apache.org/2kv2n

 - ASF Founders look back on 20 Years of the ASF https://blogs.apache.org/foundation/entry/our-founders-look-back-on

 - Foundation Reports and Statements http://www.apache.org/foundation/reports.html

 - ApacheCon: Tomorrow's Technology Today since 1998 http://s.apache.org/ApacheCon

 - "Success at Apache" focuses on the people and processes behind why the ASF "just works". https://blogs.apache.org/foundation/category/SuccessAtApache

 - Inside Infra: the new interview series with members of the ASF infrastructure team --meet Drew Foulks https://s.apache.org/InsideInfra-Drew

 - Did you know that Airflow Summit 2020 will be held 6-17 July online? https://airflowsummit.org/

 - Did you know that Beam Summit 2020 will be held 24-28 August online and free of charge? https://beamsummit.org/

 - Please follow/like/re-tweet the ASF on social media: @TheASF on Twitter (https://twitter.com/TheASF) and on LinkedIn at https://www.linkedin.com/company/the-apache-software-foundation

 - Do friend and follow us on the Apache Community Facebook page https://www.facebook.com/ApacheSoftwareFoundation/ and Twitter account https://twitter.com/ApacheCommunity

 - Find out how you can participate with Apache community/projects/activities --opportunities open with Apache Camel, Apache HTTP Server, and more! https://helpwanted.apache.org/

 - Are your software solutions Powered by Apache? Download & use our "Powered By" logos http://www.apache.org/foundation/press/kit/#poweredby

= = =

For real-time updates, sign up for Apache-related news by sending mail to announce-subscribe@apache.org and follow @TheASF on Twitter. For a broader spectrum from the Apache community, https://twitter.com/PlanetApache provides an aggregate of Project activities as well as the personal blogs and tweets of select ASF Committers.

Friday June 19, 2020

The Apache News Round-up: week ending 19 June 2020

Happy Friday! Let's take a look at what the Apache community has been up to over the past week:

ASF Board – management and oversight of the business affairs of the corporation in accordance with the Foundation's bylaws.
 - Next Board Meeting: 15 July 2020. Board calendar and minutes https://apache.org/foundation/board/calendar.html

ApacheCon™ – the ASF's official global conference series, bringing Tomorrow's Technology Today since 1998.
 - Notice on Apache 2020 Conferences https://s.apache.org/zgm8m

ASF Infrastructure – our distributed team on three continents keeps the ASF's infrastructure running around the clock.
 - 7M+ weekly checks yield uptime at 99.72%. Performance checks across 50 different service components spread over more than 250 machines in data centers around the world. http://www.apache.org/uptime/

Apache Code Snapshot – this week, 902 Apache contributors changed 5,499,342 lines of code over 3,942 commits. Top 5 contributors, in order, are: Chunen Ni, Sebastian Bazley, Rupeng Wang, Gary Gregory, and Andrea Cosentino.      

Apache Project Announcements – the latest updates by category.

Cloud Computing --
 - Apache Libcloud 3.1.0 released http://libcloud.apache.org/

Servers --
 - Apache HttpComponents Client 5.0.1 GA released https://hc.apache.org/
 - Apache Traffic Control 4.1.0 released https://trafficcontrol.apache.org/


Did You Know?

 - Did you know that you can meet Apache APISIX (Incubating), catch up with Apache CloudStack, see what’s next with Apache HBase as the project celebrates its 10th Anniversary, and more? Only on Feathercast --the voice of the ASF https://feathercast.apache.org

 - Did you know that Tencent uses Apache Pulsar to process tens of billions of dollars in financial transactions each day? http://pulsar.apache.org/ 

 - Did you know that Apache Cordova has a major release for iOS? https://cordova.apache.org/ 
 

Apache Community Notices

 - "Trillions and Trillions Served" – the feature documentary on the ASF filmed onsite at ApacheCon Las Vegas and Berlin in 2019 https://s.apache.org/Trillions-Feature  

 - The Apache Software Foundation Statement on the COVID-19 Coronavirus Outbreak https://s.apache.org/COVID-19  

 - The Apache Software Foundation Celebrates 21 Years of Open Source Leadership https://s.apache.org/21stAnniversary

 - Apache Month In Review: May 2020 – overview of events that have taken place within the Apache community https://s.apache.org/May2020

 - The Apache Software Foundation Operations Summary: Q3 FY2020 (November 2019 - January 2020) https://s.apache.org/r6s5u  

 - "Trillions and Trillions Served", the documentary on the ASF, is in post-production. Catch the teaser at https://s.apache.org/ASF-Trillions and "Apache Everywhere", the first "Trillions" "short" filmed onsite at ApacheCon Las Vegas and Berlin this past year https://youtu.be/nXtIti9jMFI

 - Apache in 2019 - By The Digits https://s.apache.org/Apache2019Digits

 - The Apache Way to Sustainable Open Source Success https://s.apache.org/GhnI

 - ASF Operations Summary: Q2 FY2020 (August - October 2019) https://s.apache.org/2kv2n

 - ASF Founders look back on 20 Years of the ASF https://blogs.apache.org/foundation/entry/our-founders-look-back-on

 - Foundation Reports and Statements http://www.apache.org/foundation/reports.html

 - ApacheCon: Tomorrow's Technology Today since 1998 http://s.apache.org/ApacheCon

 - "Success at Apache" focuses on the people and processes behind why the ASF "just works". https://blogs.apache.org/foundation/category/SuccessAtApache

 - Inside Infra: the new interview series with members of the ASF infrastructure team --meet Drew Foulks https://s.apache.org/InsideInfra-Drew

 - Did you know that Airflow Summit 2020 will be held 6-17 July online? https://airflowsummit.org/

 - Did you know that Beam Summit 2020 will be held 24-28 August online and free of charge? https://beamsummit.org/

 - Please follow/like/re-tweet the ASF on social media: @TheASF on Twitter (https://twitter.com/TheASF) and on LinkedIn at https://www.linkedin.com/company/the-apache-software-foundation

 - Do friend and follow us on the Apache Community Facebook page https://www.facebook.com/ApacheSoftwareFoundation/ and Twitter account https://twitter.com/ApacheCommunity

 - Find out how you can participate with Apache community/projects/activities --opportunities open with Apache Camel, Apache HTTP Server, and more! https://helpwanted.apache.org/

 - Are your software solutions Powered by Apache? Download & use our "Powered By" logos http://www.apache.org/foundation/press/kit/#poweredby


Friday June 12, 2020

The Apache News Round-up: week ending 12 June 2020

Hurrah for Friday! We've had a great week within the Apache community. Here's what happened:

"Trillions and Trillions Served" – the feature documentary on the ASF filmed onsite at ApacheCon Las Vegas and Berlin in 2019 https://s.apache.org/Trillions-Feature 

Inside Infra – the third interview in the series with members of the ASF Infrastructure team.
 - Meet Greg Stein --Part I https://s.apache.org/InsideInfra-Greg

ASF Board – management and oversight of the business affairs of the corporation in accordance with the Foundation's bylaws.
 - Next Board Meeting: 17 June 2020. Board calendar and minutes https://apache.org/foundation/board/calendar.html

ApacheCon™ – the ASF's official global conference series, bringing Tomorrow's Technology Today since 1998.
 - Notice on Apache 2020 Conferences https://s.apache.org/zgm8m

ASF Infrastructure – our distributed team on three continents keeps the ASF's infrastructure running around the clock.
 - 7M+ weekly checks yield uptime at 99.98%. Performance checks across 50 different service components spread over more than 250 machines in data centers around the world. http://www.apache.org/uptime/

Apache Code Snapshot – this week, 350 committers changed 922,742 lines of code over 2,850 commits. Top 5 committers, in order of commits, are: Andrea Cosentino, Guillaume Nodet, Jark Wu, Raphaël Ouazana, and Michael Vorburger.             

Apache Project Announcements – the latest updates by category.

Big Data --
 - Apache ShardingSphere 4.1.1 released https://shardingsphere.apache.org/
 - Apache Beam 2.22.0 released https://beam.apache.org/
 - Apache Flink Stateful Functions 2.1.0 released https://flink.apache.org/

Content --
 - Apache Jackrabbit 2.20.1 released https://jackrabbit.apache.org/
 - Apache PDFBox 2.0.20 released https://pdfbox.apache.org/

Integrated Development Environment --
 - Apache NetBeans 12.0 released https://netbeans.apache.org/
 - Newly Identified Inactive Malware Campaign: Impact on Apache NetBeans https://blogs.apache.org/netbeans/entry/newly-identified-inactive-malware-campaign 

Libraries --
 - Apache Commons IO 2.7 released https://commons.apache.org/proper/commons-io/
 - Apache Commons BCEL 6.5.0 released https://commons.apache.org/proper/commons-bcel/

Messaging --
 - Apache Qpid JMS 0.52.0 released https://qpid.apache.org/

Servers --
 - Apache Tomcat 8.5.56, 9.0.36, and 10.0.0-M6 released https://tomcat.apache.org/
 - Apache HttpComponents Core 5.0.1 GA released https://hc.apache.org/


Did You Know?

 - Did you know that you could help with the next version of Apache OpenOffice? https://blogs.apache.org/OOo/entry/apache-openoffice-needs-your-help 

 - Did you know that Airflow Summit 2020 will be held 6-17 July online? https://airflowsummit.org/  

 - Did you know that Beam Summit 2020 will be held 24-28 August online and free of charge? https://beamsummit.org/
 

Apache Community Notices

 - The Apache Software Foundation Statement on the COVID-19 Coronavirus Outbreak https://s.apache.org/COVID-19  

 - The Apache Software Foundation Celebrates 21 Years of Open Source Leadership https://s.apache.org/21stAnniversary

 - Apache Month In Review: May 2020 – overview of events that have taken place within the Apache community https://s.apache.org/May2020

 - The Apache Software Foundation Operations Summary: Q3 FY2020 (November 2019 - January 2020) https://s.apache.org/r6s5u  

 - "Trillions and Trillions Served", the documentary on the ASF, is in post-production. Catch the teaser at https://s.apache.org/ASF-Trillions and "Apache Everywhere", the first "Trillions" "short" filmed onsite at ApacheCon Las Vegas and Berlin this past year https://youtu.be/nXtIti9jMFI

 - Apache in 2019 - By The Digits https://s.apache.org/Apache2019Digits

 - The Apache Way to Sustainable Open Source Success https://s.apache.org/GhnI

 - ASF Operations Summary: Q2 FY2020 (August - October 2019) https://s.apache.org/2kv2n

 - ASF Founders look back on 20 Years of the ASF https://blogs.apache.org/foundation/entry/our-founders-look-back-on

 - Foundation Reports and Statements http://www.apache.org/foundation/reports.html

 - ApacheCon: Tomorrow's Technology Today since 1998 http://s.apache.org/ApacheCon

 - "Success at Apache" focuses on the people and processes behind why the ASF "just works". https://blogs.apache.org/foundation/category/SuccessAtApache

 - Inside Infra: the new interview series with members of the ASF infrastructure team --meet Drew Foulks https://s.apache.org/InsideInfra-Drew

 - Please follow/like/re-tweet the ASF on social media: @TheASF on Twitter (https://twitter.com/TheASF) and on LinkedIn at https://www.linkedin.com/company/the-apache-software-foundation

 - Do friend and follow us on the Apache Community Facebook page https://www.facebook.com/ApacheSoftwareFoundation/ and Twitter account https://twitter.com/ApacheCommunity

 - Find out how you can participate with Apache community/projects/activities --opportunities open with Apache Camel, Apache HTTP Server, and more! https://helpwanted.apache.org/

 - Are your software solutions Powered by Apache? Download & use our "Powered By" logos http://www.apache.org/foundation/press/kit/#poweredby


Tuesday June 09, 2020

Inside Infra: Greg Stein --Part I

The third "Inside Infra" interview is with ASF Infrastructure Administrator Greg Stein, who shares his experience with Sally Khudairi, ASF VP Marketing & Publicity.




"We've got about 200 different machines and each one runs something different"



PART ONE.


What is your name --how is it pronounced?

Greg Stein. "Gregg St-eye-n"

When people need to find you, are you at gstein@? Has that always been your handle for everything?

Ever since high school, actually. I was gjs@ for a bit in college, but went back to gstein@. I started at Google early April 2004, and Gmail launched on April 1, so I was able to get my work email ID, gstein@gmail. So it’s great, but also rather annoying, because there are a lot of Gary Steins and Gertrude Steins and George Steins, and I get all of their email ... I get plane tickets, hotel reservations ... I got a proposal from the Gates Foundation once. I had some crazy bitter angry lady yelling at her husband as they were getting divorced, and she could rant. I mean, wow: that lady had a pirate's mouth.

But she didn't have his email address.

Apparently not.

When and how did you get involved with the ASF?

I left Microsoft in 1998, and the product group I was working in was building WebDAV into various Microsoft products. I thought the concept of WebDAV was very cool, and wanted the Open Source world to have it. That meant writing a module for the Apache Web Server. I think it was September 1998 when I started posting to the Apache mailing list and looking at how to plug in a WebDAV module. That was Apache 1.3 at the time. I developed a module called mod_dav for Apache 1.3, and when we started Apache 2.0 in 2000, I donated the module to Apache, and it became a standard module in Apache 2.0.

I remember that: I did the press release for that way back when. I knew you were connected with mod_dav, but didn't realize the path as to how you got there. It's very interesting.

That's what brought me to Apache, when they started putting together the foundation: it was in the Spring of '99. I remember asking Roy if I could be one of the first members of the foundation, and Roy's answer was basically like, "We already had the set of people locked in. You'll probably get nominated and voted in at our first member meeting," which occurred in September 1999. So yes, I was in that first batch of new members rather than the original membership.

You've been a member of the ASF much longer than you've been involved with ASF Infra. What were the previous hats you were wearing at the ASF? You've been here for a while, and have had a lot of different configurations.

This is true. So I'm a committer on HTTPd (Apache HTTP Server) and then a PMC Member, an ASF Member. I helped start the APR (Apache Portable Runtime) project with some of the other Web server committers, we pulled that out of HTTPd and created APR, and we used that for 2.0. We used APR, whereas Apache 1.3 was essentially the combination of the two, one big code base. Then Justin Erenkrantz and I started Apache Serf, and that was a high performance C-based client library for HTTP. But we didn't have three people in the community, so it couldn't really be an Apache project. So we took it out of Apache and started working on it on our own, and then eventually Subversion started to use Serf, and so we got more committers on Serf, and the community kind of built up around it because of Subversion. So we ran Serf externally, but just like it was an Apache community, it was Apache licensed and so on. Eventually we wanted to move it back into Apache, and I don't recall off hand, but we went straight to a TLP from our external project back to Apache Serf.

Early 2000, it was January or February, (ASF co-Founder) Brian Behlendorf approached me about helping with the network protocol for this new version control system they were starting at CollabNet, because he knew my background in HTTP and WebDAV. That “V” stands for versioning. I got involved with the Subversion project that Spring. That was also run as a very egalitarian Open Source project, very similar to how we run stuff at Apache. I was really the only Apache person, but Karl Fogel just knows how to run a great community, and so all those values that we cherish in communities at Apache were part of Subversion from day one, even though it was run by CollabNet. I was hired in 2001 to manage their development team. Eventually, CollabNet wanted to turn it into a vendor-neutral thing that wasn't only CollabNet, so they started a small LLC called the Subversion Corporation. Once the IP was transferred to the Subversion Corporation, people said, "Okay, let's move to Apache," because nobody wanted to deal with the overhead of the Subversion Corporation. We approached Apache at the end of 2009, and Subversion became Apache Subversion. I was the first VP for that. I think that's the only VP hat I've worn.

In 2001, I was elected to the Board at the Members meeting, and in 2002, Roy decided to step down as Chair and said, "Oh, Greg should be Chairman." He just kind of threw me under the bus that way, but I agreed, and that's when I became ASF Chairman. I was Chairman until 2007, which makes me the longest-running Chairman. I think Brett Porter did four years.

I think it was 2009 when you hosted us at the Harvard Club and Doug Cutting was appointed Chairman, but he said he didn't really want to travel, do much press stuff, or be a face of Apache. Roy came to the rescue, threw me under the bus again, and said, "Greg can be Vice Chairman, and we'll have the Vice Chairman do all that stuff." So I held the Vice Chairman role until September 2016, when I gave up my director position, the Vice Chairman position, and VP of Subversion, because that's when I became Infrastructure Administrator.

Over the years, I did a bunch of volunteer work for ASF Infrastructure. I helped out with what we call AP mail: adjusting moderators, changing aliases, things like that. So I've had AP mail access for quite a while when I was doing that. Upayavira wrote id.apache.org for people to review their Member records, change their passwords, etc. I helped him with some of that stuff. That was all written in Python, so I was able to help out.

Python before Python was popular.

I've been using Python since 1995, and I've contributed to Python itself. We set up the Python Software Foundation in 2001. When I say “we”, I mean myself and Dick Hardt from ActiveState. We took the Apache bylaws, and added a different class of membership to it so that companies would become... I forget what we called them, like corporate members or something. The normal people were called nominated members, as they were nominated by somebody else and voted in. But this gave corporations a vote at the table on the board and anything else that members would get a vote on. So the core of the Python Software Foundation came from Apache.

Back to ASF Infrastructure. In 2016 we had four people on staff in Infrastructure, and our volunteer VP of Infrastructure didn't have enough volunteer time to be able to provide support and management for those four people, plus we wanted to hire two more people. With six people, he was right out. So we spent a lot of 2016 trying to figure out how to create a “manager” for the Infra team. At the time the idea of an “executive director” type position was also thrown around, but a full-time position to manage four or six staff is completely overkill, and we certainly didn't have the budget for a full-time position. Somewhere around late August, there was an email that Ross (former ASF President Ross Gardler) sent, and I thought, "I can do that. That's a half-time job. I'm certainly happy to do it. I've managed engineering teams before." Now, Infra's not an engineering team, they don't really develop products, but it's pretty close to engineering management. At a minimum, it's personnel management, which I've been doing since the '90s.

So I threw my hat in the ring. Ross ran it by the Board and the team, and nobody raised a strong concern, so in his authority as President, he went ahead and hired me half-time. It was the day of our Board meeting --I resigned all my positions, and we appointed my replacement for Vice Chairman and my Director position that day, both of which I believe were Sam. He filled my role as Director, and I started as Infrastructure Administrator.

What does “Infrastructure Administrator” mean? What does it entail: are you hands-on coding solutions like the rest of the team? Are you solving problems? What do you do?

I chose the title because I didn't want to be called “manager”: I didn't want to feel like I'm the boss. I wanted to help with the administrative side, make sure the guys get paid, deal with the invoices, handle what you might call back office kind of stuff, and let the team focus on what they do best, which is the system administration. (ASF Infrastructure Team Member) Daniel Gruno does some development work in addition. I do a little bit of development work. For me, it's more like where in my hobby time I might work on Subversion, but now my hobby time is coding Infrastructure type stuff, so it's not really part of my work duties. I deal with salaries, raises, bonuses, getting the payroll done, and for our contractors, getting them paid. I also deal with third party contracts for things like Travis CI, for lists.apache.org ... that's with PonyMail. I make sure that our vendors get paid, and our contractors and employees get paid.

How is the Infra team structured, and how many are on the team?


We have five full-time people that work on Infrastructure: all five are system administrators. Daniel Gruno does maybe 30% system administration and 70% tool development. We don't develop any products, because we're not an Apache community. We write tools, but don't actually develop any products. This is why PonyMail is in the Incubator: it was originally written by one of the people on Infra, but we didn't want to run that as an Infra community. With only five people, we don't really want to be a community lead or anything like that.

The joke is if somebody wants to move into my position, they lose half their salary, because my position is part-time. It's not really a promotion: it would be a loss to do anything. So unlike a corporation with 10,000 people on staff, career development is a little more difficult. It's really a job for people that enjoy Apache and enjoy our mission, and also enjoy working with the other people on the team.

Who does ASF infra serve?

Our primary users are all the communities at Apache. We've got over 200 communities, and those are the primary users. I don't like calling them “customers”, but in a corporate world, they would probably be considered our customers, and we serve those users. There's 8,000 people with accounts that are working on different projects, but the user base is way, way larger than that, because people can file JIRA tickets and work on the wiki and do things like that without actually being an Apache Committer. So the user base is even larger. Then you start looking at all the people subscribed to all of our mailing lists, and that number goes even higher. There's probably 10% of our work which is also supporting the administrative side of the Foundation itself.

For the Board, your role in PR, and Trademarks, and Legal, and the office of the President and various other operational type stuff, we spend 5-10% of our time. A lot of what we do applies naturally across all of the user base, because the foundation uses the same tool set as our communities. Subversion, mailing lists, JIRA, Confluence, etc. We help with account creation, the LDAP management, what sort of permissions people have to access different things...

One of the neat things that we've done, and I've actually had a couple of communities ask us about it, is our GitBox setup where our projects can use GitHub. But then we also mirror all that source code back to Apache so that we have a copy of it for provenance tracking. And in case GitHub does something dumb, we have our own copy of the code. Any changes made on GitHub get sent to our mailing list or get mirrored into JIRA. Our projects can see all the activity on GitHub, and it gets mirrored into our mailing lists where we prefer that our community work is performed.

That's actually a pretty cool feature that we've done at Apache.


It's interesting to see communities outside of Apache that emulate structures and processes and solutions that the ASF has created. It's cool to see it even happening on an infrastructure level. How does ASF infra differ from other organizations or other open source foundations?

Most of them don't really have teams. Most projects out there do their work on GitHub, and don't have their own source control. They don't have account management, they don't run mailing lists. We do all this stuff that most Open Source projects just don't deal with.

They also don't have the scale that we have.


Yes. Because they're one project, and we have over 200 projects. Most projects have some repositories hanging out on GitHub or on GitLab, or wherever else that they might host: if somebody wants to run a demonstration of that project, they buy their own virtual machine on AWS, and pay that out of pocket. At Apache, all of our projects can have virtual machines hosted by Infra, where they install their software for demonstration purposes. They can point people at that VM, so they can check out the product in live motion. So that ability to run VMs is also pretty unique to the Foundation. When you look at the Linux Foundation or the Eclipse Foundation, those are a little bit different. They're not a charitable organization like us. They're a 501(c)(6), which is really like a trade association.

Like a consortium.

Yes, a consortium. I believe that they do have infra teams, but their business model is quite different from ours. If you look at Mozilla, they have the Mozilla Foundation, but that's kind of a shell; Mozilla Corporation essentially runs everything, and the foundation is like a legal shell wrapped around the corporation.

You mentioned earlier that we have 200 projects: you're referring to 200 Top-Level Projects (TLPs), but we also have sub projects and initiatives. At Apache, we have more than 350 different activities going on --you guys touch all of those. It's not like there's any aspect of ASF that you're not involved with or you're not supporting.


That's correct. And I say 200 because I'm thinking mostly from a TLP thing.

Irrespective of the existence of sub projects, you're still dealing with other communities and projects: there's more than just the 200. Hats off to you guys. It's quite a lot of work.

We've got about 200 different machines and each one runs something different. Some companies have 50 copies of a machine that they'll start up in the cloud, running some container --we never do that. Each individual machine is configured one by one and they're all different. And so 200 machines to support the 350 initiatives. It's a lot of heterogeneous work and that can be kind of distracting, but it's also very interesting because we do support such a wide variety of stuff for our projects.

There's what, five Infra team members, and we have 350 projects and initiatives going on. That's a lot of stuff happening: is it non-stop?

Yeah, it's nonstop. That's why we went from four to six people: we were sort of treading water, but we weren't really able to move forward on a number of our longer term initiatives. So when we went to six people in November 2016, that made us a lot more hands-on, if you will. That meant that we could actually make some progress on this longer term work that we wanted to accomplish. Some of that is like https://selfserve.apache.org/ , where people can get things done instead of filing a JIRA ticket and having us do the work for them.

Is that popular? Do people use it?

Oh, absolutely. When somebody opens a JIRA ticket to say, "Can I have this Git repository?" or "Can you create a JIRA Space for me?" we close the ticket and say, "Go to selfserve.apache.org". Before, everybody would file a ticket for a Git repository, file a ticket for JIRA, file a ticket for Confluence, or whatever; we just close them all down now, and they use selfserve.apache.org instead. We simply won't do those things anymore. So selfserve.apache.org is actually quite handy. And then about four months ago we added a feature called asf.yaml: it allows communities to control a lot of the finer grained aspects about how their repositories are used, like how do they publish Web pages from a repository, or if you make a change, where does the commit email go? Which mailing lists? Does it go to their development list? Or do they have a commit list? If somebody opens a PR on GitHub, where does notification of that go? Those used to all be tickets also, but people can deal with those just by editing a file in their repository now. So again, it reduces tickets, and that's our goal: these routine tasks that all the different communities want to perform, we want to move those into a self-serve mechanism so that we don't need to be hands-on all the time. And thus, we can support 350 different initiatives.
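
[Editor's illustration] The routing Greg describes, where a file in the repository decides which mailing list hears about which event, can be sketched roughly as follows. This is only a sketch under stated assumptions: the setting names, the list addresses, and the `notification_target` helper are hypothetical and are not taken from the interview or from the real asf.yaml schema.

```python
from typing import Dict

# Hypothetical per-repository settings, standing in for what a project
# might declare in a file in its own repository. The key names and the
# addresses are illustrative assumptions, not the actual asf.yaml format.
REPO_SETTINGS: Dict[str, Dict[str, str]] = {
    "notifications": {
        "commits": "commits@example.apache.org",
        "pullrequests": "dev@example.apache.org",
    }
}

# Assumed fallback when a project has not configured a given event type.
DEFAULT_LIST = "dev@example.apache.org"


def notification_target(settings: Dict[str, Dict[str, str]], event: str) -> str:
    """Return the mailing list that should receive a repository event."""
    routes = settings.get("notifications", {})
    return routes.get(event, DEFAULT_LIST)


if __name__ == "__main__":
    for event in ("commits", "pullrequests", "issues"):
        print(event, "->", notification_target(REPO_SETTINGS, event))
```

The point of the design is that changing a route becomes an ordinary commit by the project itself rather than a ticket for Infra.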

That's great to help empower the communities to take care of their own needs, whether they're minor or major, but that also encourages autonomy. So that's really helpful for you guys: you don't need to have a team of 40 people to support the day-to-day.

We do stay busy. You're talking about the influx and we get requests from people through email, through our Slack channel and through JIRA. Of course our monitoring system will tell us when something goes down, so our monitoring systems also give us more work to do, so it is kind of an endless string of queries. Depending on what the task is, each of those different channels is appropriate. For a quick task, hitting us up on Slack is totally fine, but if there's going to be several days of work, we like JIRA tickets so that we can track the work as it progresses.

How do you encourage the team? How do you keep them motivated? What were your challenges with such a huge load to carry: how do you keep everyone going?

One of the big benefits that we have for our team is actually that we're all remote, so we all sit on a Slack channel. We have a team-only channel that we use for communicating, "What's going on? What beer are you drinking today? What are you having for dinner?" I think about my days when I worked at Microsoft or at Google where I sat in the office by myself and it's a very individual experience that I used to have, but now, our team is there all the time on our channel. It's a very social experience: I think that makes for a much tighter team. And it provides a very different experience than what you get at a more “normal” company. That sort of team experience really helps keep people motivated.

People enjoy their jobs more. From a management standpoint, I can certainly say, "If people are sitting there talking about what they're going to make for lunch, there's a drag on the team and maybe we're not seeing the highest productivity possible," but I think it actually runs counter to that. Our team is actually more productive as a result of this great team bonding. We have a conference call once a week for 30-60 minutes. And we don't really have to: the team knows what everybody's doing because we're all doing it right in front of everybody. We all get the commit messages. We have our Slack channel. We see the changes to JIRA. We know what each person is doing, but having the call actually gives us a chance to speak to another human so you're not working in your basement all day without any human contact.

We actually have that once a week, if you will, forced human voice contact.

Did that evolve organically? Or was that something planned?

The team was already doing weekly status calls. When I started, I said, "We're going to keep doing that. We're not going to switch out for just, you know, a status email or anything." Before I started, I think they were doing a group edit on a status Web page or something. I don't know if they had calls, but today I mandate the call because I want the team to get together. We've also been doing the group get-togethers at ApacheCon. We got together at ApacheCon Miami, and then the next year in Montreal. Last year we skipped the whole conference format and just got together as a team in New Orleans for four nights.

It was great because it was just us without the distractions of the conference. The conference is good because the Infra team gets to meet the people that are their users, their customers, the people that we're actually trying to support, all those communities. And the people in the communities get to meet the team. You know, the people that asked, "Can you help me with X?" They get to put a face to those names.

There are times where one of the guys on the team will work with somebody in the community for a couple of weeks to track down some problem, get a virtual machine configured, whatever. All you see is a user ID and the kind of tone of their messages, but at the conference, you can actually put a face to that name, to that ID. That’s really good from a team standpoint. With the team bonding, we spent eight hours a day in this giant penthouse suite in New Orleans on the 30th floor looking out over the Mississippi River. It was very cool, it had space and a big dining table where we could all come in and work. And then I would go around the corner to Mother's and pick up—

Oh my gawd: the po boys ...the debris po boys.

Exactly, you know what I'm talking about.

I lived there. So, yes, I know.

It was literally a block away. So that was our lunch. Every day I was going down to Mother's, getting a big brown shopping bag full of food and bringing it to the room. We did go there and eat once so the guys could get out of the room for lunch, and each evening we would go as a team out for dinner. After dinner, it's like, “OK, do whatever you want. It's New Orleans.” That was a really good team experience. We were set to go to Nashville this year and then, you know, pandemic ensued. So we called it off.

It's funny: I stumbled across your channel on Slack and, if I remember this correctly, someone was talking about grilling a whole steer or something along those lines. You guys deal with a lot of beef, there’s a lot of meat in this group. So ...

In the team channel, there's a lot of stories about food and beer and other forms of alcohol. We eventually created a cooking channel on Slack because there's other people like Ruth (ASF Executive Vice President Ruth Suehle) and Shane (ASF Vice Chair Shane Curcuru) and others who also like talking about making food. We still have a lot of that discussion on the team channel, but we’ve now got a dedicated channel with a larger set of people talking foodie type of stuff, so that’s very cool.

You were also talking about motivation: I work with each of the guys to find out what they're interested in exploring. Whether it's a new tool or a new product or to write a new tool to improve our workflow, it's like, "What are you interested in? Okay, take point on that, do the research, go do the experimenting." So each of the guys has gotten generally one or two long-term projects that interest them that they want to work on.


[END OF PART ONE]

Friday June 05, 2020

The Apache News Round-up: week ending 5 June 2020

Welcome, June! We've had a great week within the Apache community. Here's what happened:

ASF Board – management and oversight of the business affairs of the corporation in accordance with the Foundation's bylaws.
 - Next Board Meeting: 17 June 2020. Board calendar and minutes https://apache.org/foundation/board/calendar.html

ApacheCon™ – the ASF's official global conference series, bringing Tomorrow's Technology Today since 1998.
 - Notice on Apache 2020 Conferences https://s.apache.org/zgm8m

ASF Infrastructure – our distributed team on three continents keeps the ASF's infrastructure running around the clock.
 - 7M+ weekly checks yield uptime at 100%. Performance checks across 50 different service components spread over more than 250 machines in data centers around the world. http://www.apache.org/uptime/

Apache Code Snapshot – this week, 918 Apache contributors changed 11,483,033 lines of code over 3,726 commits. Top 5 contributors, in order, are: Jean-Baptiste Onofré, Gary Gregory, Claus Ibsen, Andrea Cosentino, and Mark Thomas.          

Apache Project Announcements – the latest updates by category.

Big Data --
 - The Apache Software Foundation Announces Apache® Hudi™ as a Top-Level Project https://s.apache.org/odtwv

Content --
 - Apache JSPWiki 2.11.0.M7 released https://jspwiki-wiki.apache.org/

Identity Management --
 - Apache Fortress 2.0.5 released http://directory.apache.org/fortress/

Integrated Development Environment --
 - Newly Identified Inactive Malware Campaign: Impact on Apache NetBeans https://blogs.apache.org/netbeans/entry/newly-identified-inactive-malware-campaign 

Messaging --
 - Apache Curator 5.0.0 released https://curator.apache.org/
 - Apache Qpid Proton-J 0.33.5 and JMS AMQP 0-x 6.4.0 released https://qpid.apache.org/


Did You Know?

 - Did you know that the following Apache projects are celebrating anniversaries this month? SpamAssassin (16 years); Santuario (14 years); Commons and Wicket (13 years); Sling (11 years); Karaf (10 years); Flume and VCL (8 years); Mesos (7 years); Atlas and Mynewt (3 years) --many happy returns! https://projects.apache.org/committees.html?date 

 - Did you know that the first Pulsar Summit will be held 17-18 June? Sign up today at https://pulsar-summit.org 

 - Did you know that Feathercast is back? The voice of the ASF is up with new projects featured, such as HBase, Shiro, Kafka, SkyWalking, Ignite, Mahout, and more! https://feathercast.apache.org/ 
 

Apache Community Notices

 - The Apache Software Foundation Statement on the COVID-19 Coronavirus Outbreak https://s.apache.org/COVID-19  

 - The Apache Software Foundation Celebrates 21 Years of Open Source Leadership https://s.apache.org/21stAnniversary

 - Apache Month In Review: May 2020 – overview of events that have taken place within the Apache community https://s.apache.org/May2020

 - The Apache Software Foundation Operations Summary: Q3 FY2020 (November 2019 - January 2020) https://s.apache.org/r6s5u  

 - "Trillions and Trillions Served", the documentary on the ASF, is in post-production. Catch the teaser at https://s.apache.org/ASF-Trillions and "Apache Everywhere", the first "Trillions" "short" filmed onsite at ApacheCon Las Vegas and Berlin this past year https://youtu.be/nXtIti9jMFI

 - Apache in 2019 - By The Digits https://s.apache.org/Apache2019Digits

 - The Apache Way to Sustainable Open Source Success https://s.apache.org/GhnI

 - ASF Operations Summary: Q2 FY2020 (August - October 2019) https://s.apache.org/2kv2n

 - ASF Founders look back on 20 Years of the ASF https://blogs.apache.org/foundation/entry/our-founders-look-back-on

 - Foundation Reports and Statements http://www.apache.org/foundation/reports.html

 - ApacheCon: Tomorrow's Technology Today since 1998 http://s.apache.org/ApacheCon

 - "Success at Apache" focuses on the people and processes behind why the ASF "just works". https://blogs.apache.org/foundation/category/SuccessAtApache

 - Inside Infra: the new interview series with members of the ASF infrastructure team --meet Drew Foulks https://s.apache.org/InsideInfra-Drew

 - Please follow/like/re-tweet the ASF on social media: @TheASF on Twitter (https://twitter.com/TheASF) and on LinkedIn at https://www.linkedin.com/company/the-apache-software-foundation

 - Do friend and follow us on the Apache Community Facebook page https://www.facebook.com/ApacheSoftwareFoundation/ and Twitter account https://twitter.com/ApacheCommunity

 - Find out how you can participate with Apache community/projects/activities --opportunities open with Apache Camel, Apache HTTP Server, and more! https://helpwanted.apache.org/

 - Are your software solutions Powered by Apache? Download & use our "Powered By" logos http://www.apache.org/foundation/press/kit/#poweredby

= = =

For real-time updates, sign up for Apache-related news by sending mail to announce-subscribe@apache.org and follow @TheASF on Twitter. For a broader spectrum from the Apache community, https://twitter.com/PlanetApache provides an aggregate of Project activities as well as the personal blogs and tweets of select ASF Committers.


Thursday June 04, 2020

The Apache Software Foundation Announces Apache® Hudi™ as a Top-Level Project

Open Source data lake technology for stream processing on top of Apache Hadoop in use at Alibaba, Tencent, Uber, and more.

Wakefield, MA —4 June 2020— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Hudi™ as a Top-Level Project (TLP).

Apache Hudi (Hadoop Upserts Deletes and Incrementals) data lake technology enables stream processing on top of Apache Hadoop compatible cloud stores & distributed file systems. The project was originally developed at Uber in 2016 (code-named and pronounced "Hoodie"), open-sourced in 2017, and submitted to the Apache Incubator in January 2019.

"Learning and growing the Apache way in the incubator was a rewarding experience," said Vinoth Chandar, Vice President of Apache Hudi. "As a community, we are humbled by how far we have advanced the project together, while at the same time, excited about the challenges ahead."

Apache Hudi is used to manage petabyte-scale data lakes using stream processing primitives like upserts and incremental change streams on Apache Hadoop Distributed File System (HDFS) or cloud stores. Hudi data lakes provide fresh data while being an order of magnitude more efficient than traditional batch processing. Features include:

  • Upsert/Delete support with fast, pluggable indexing
  • Transactionally commit/rollback data
  • Change capture from Hudi tables for stream processing
  • Support for Apache Hive, Apache Spark, Apache Impala and Presto query engines
  • Built-in data ingestion tool supporting Apache Kafka, Apache Sqoop and other common data sources
  • Optimize query performance by managing file sizes, storage layout
  • Fast row based ingestion format with async compaction into columnar format
  • Timeline metadata for audit tracking

Apache Hudi is in use at organizations such as Alibaba Group, EMIS Health, Linknovate, Tathastu.AI, Tencent, and Uber, and is supported as part of Amazon EMR by Amazon Web Services. A partial list of those deploying Hudi is available at https://hudi.apache.org/docs/powered_by.html

"We are very pleased to see Apache Hudi graduate to an Apache Top-Level Project. Apache Hudi is supported in Amazon EMR release 5.28 and higher, and enables customers with data in Amazon S3 data lakes to perform record-level inserts, updates, and deletes for privacy regulations, change data capture (CDC), and simplified data pipeline development," said Rahul Pathak, General Manager, Analytics, AWS. "We look forward to working with our customers and the Apache Hudi community to help advance the project."

"At Uber, Hudi powers one of the largest transactional data lakes on the planet in near real time to provide meaningful experiences to users worldwide," said Nishith Agarwal, member of the Apache Hudi Project Management Committee. "With over 150 petabytes of data and more than 500 billion records ingested per day, Uber’s use cases range from business critical workflows to analytics and machine learning."

"Using Apache Hudi, end-users can handle either read-heavy or write-heavy use cases, and Hudi will manage the underlying data stored on HDFS/COS/CHDFS using Apache Parquet and Apache Avro," said Felix Zheng, Lead of Cloud Real-Time Computing Service Technology at Tencent.

"As cloud infrastructure becomes more sophisticated, data analysis and computing solutions gradually begin to build data lake platforms based on cloud object storage and computing resources," said Li Wei, Technical Lead on Data Lake Analytics, at Alibaba Cloud. "Apache Hudi is a very good incremental storage engine that helps users manage the data in the data lake in an open way and accelerate users' computing and analysis."

"Apache Hudi is a key building block for the Hopsworks Feature Store, providing versioned features, incremental and atomic updates to features, and indexed time-travel queries for features," said Jim Dowling, CEO/Co-Founder at Logical Clocks. "The graduation of Hudi to a top-level Apache project is also the graduation of the open-source data lake from its earlier data swamp incarnation to a modern ACID-enabled, enterprise-ready data platform."

"Hudi's graduation to a top-level Apache project is a result of the efforts of many dedicated contributors in the Hudi community," said Jennifer Anderson, Senior Director of Platform Engineering at Uber. "Hudi is critical to the performance and scalability of Uber's big data infrastructure. We're excited to see it gain traction and achieve this major milestone."

"Thus far, Hudi has started a meaningful discussion in the industry about the wide gaps between data warehouses and data lakes. We have also taken strides to bridge some of them, with the help of the Apache community," added Chandar. "But, we are only getting started with our deeply technical roadmap. We certainly look forward to a lot more contributions and collaborations from the community to get there. Everyone’s invited!"

Catch Apache Hudi in action at Virtual Berlin Buzzwords 7-12 June 2020, as well as at MeetUps, and other events.

Availability and Oversight
Apache Hudi software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Hudi, visit http://hudi.apache.org/ and https://twitter.com/apachehudi 

About the Apache Incubator
The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects enter the ASF through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/ 

About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation (ASF) is the world’s largest Open Source foundation, stewarding 200M+ lines of code and providing $20B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 765 individual Members and 206 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with 7,600 Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Budget Direct, Capital One, CarGurus, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Huawei, IBM, Indeed, Inspur, Leaseweb, Microsoft, Pineapple Fund, Red Hat, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF 

© The Apache Software Foundation. "Apache", "Hudi", "Apache Hudi", "Hadoop", "Apache Hadoop", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Monday June 01, 2020

Apache Month in Review: May 2020

Welcome to the latest monthly overview of events from the Apache community. Here's a summary of what happened in May:

New this month --

 - The Apache Software Foundation Welcomes 34 New Members https://s.apache.org/q14mx

 - Support Apache: help the ASF continue to provide $20B+ worth of software –at 100% no cost– for the public good https://s.apache.org/GivingTuesdayNow2020

 - Apache Everywhere: the first "short" from the "Trillions" documentary filmed onsite at ApacheCon Las Vegas and Berlin this past year https://youtu.be/nXtIti9jMFI 

 - Announcing New ASF Board of Directors https://s.apache.org/Board2020  

 - ASF Statement on the COVID-19 Coronavirus Outbreak https://s.apache.org/COVID-19 

 - Notice on Apache 2020 Conferences https://s.apache.org/zgm8m  

 - "Inside Infra" --a new interview series with members of the ASF Infrastructure team. Meet Drew Foulks https://s.apache.org/InsideInfra-Drew

 - Success at Apache: the monthly blog series that focuses on the people and processes behind why the ASF "just works".
   - Bringing the Apache Beam firefly to life by Julián Bruno https://s.apache.org/bmq4l
   - Remote Collaboration in the Time of Coronavirus by Marvin Humphrey https://s.apache.org/dkffj

 - Happy 10th Anniversary Apache HBase https://s.apache.org/m2pxf 

 - Apache Month in Review: April 2020 https://s.apache.org/Apr2020


Important Dates --

 - Next Board Meeting: 17 June 2020. Board calendar and minutes http://apache.org/foundation/board/calendar.html


Infrastructure --

Our seven-member Infrastructure team on three continents oversees our highly-reliable, distributed network under the leadership of VP Infrastructure David Nalley and Infrastructure Administrator Greg Stein. ASF Infrastructure supports 300+ Apache projects and their communities across ~200 individual machines, 1,400+ repositories, 5-6PB in traffic annually, ~75M downloads per month, and 2-3M daily emails on 2,000+ lists. ASF Infra performs 7M+ weekly checks to ensure services are available around the clock. The average uptime in May was 99.85%. http://www.apache.org/uptime/

Committer Activity --

In May, 905 Apache Committers changed 62,477,961 lines of code over 15,350 commits. The Committers with the top 5 highest contributions, in order, were: Andrea Cosentino, Manfred Moser, Claus Ibsen, Jean-Baptiste Onofré, and Liang Zhang. 


Project Releases and Updates --

New releases from Apache Ant (Build Management); Arrow (Big Data); Beam (Big Data); Calcite (Big Data); CloudStack (Cloud Computing); CouchDB (Big Data); Daffodil (Libraries); Druid (Big Data); Flink (Big Data); Groovy (Programming Languages); Ignite (Big Data); IoTDB (IoT); Impala (Databases); Jackrabbit (Content); jclouds (Cloud); JMeter (Testing); Kudu (Big Data); Kylin (Big Data); Log4j (Libraries); Lucene (Search); NLPCraft (Natural Language Processing); NuttX (Operating System); OFBiz (Enterprise Processes Automation / ERP); OpenMeetings (Web Conferencing); PLC4X (IoT); Qpid (Messaging); ShardingSphere (Big Data); Subversion (Version Control); Syncope (Identity Management); Tomcat (Servers); UIMA (Content); Wicket (Web Frameworks); ZooKeeper (Databases).

The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. Welcome new podlings AgensGraph (graph database), which entered the Incubator at the very end of April, as well as Liminal (machine learning), which was submitted just a few days ago. Congratulations to Apache ShardingSphere, which graduated as a Top-Level Project this month https://s.apache.org/315iv . We invite you to review the many projects currently in development in the Apache Incubator http://incubator.apache.org/

Community --

Apache Community Development ("ComDev") welcomes new participants to the Apache community and mentors them in "The Apache Way".

Budapest, Hungary, is named the latest Apache Local Community (ALC) Chapter https://s.apache.org/yot34 . The Apache Local Community program is a relatively new initiative, launched in early 2020. Four ALC chapters are up and running; for more information, including how to set up a local chapter for your community, visit https://s.apache.org/alc .


# # #

To see our Weekly News Round-ups, visit https://blogs.apache.org/foundation/ and click on the calendar in the upper-right side (published every Friday) or hop directly to https://blogs.apache.org/foundation/category/Newsletter . For real-time updates, sign up for Apache-related news by sending mail to announce-subscribe@apache.org and follow @TheASF on Twitter. We appreciate your support!

Friday May 29, 2020

The Apache News Round-up: week ending 29 May 2020

Farewell, May --we're wrapping up the month with another great week. Here are the latest updates on the Apache community's activities:

ASF Board – management and oversight of the business affairs of the corporation in accordance with the Foundation's bylaws.
 - Next Board Meeting: 17 June 2020. Board calendar and minutes https://apache.org/foundation/board/calendar.html

ApacheCon™ – the ASF's official global conference series, bringing Tomorrow's Technology Today since 1998.
 - Notice on Apache 2020 Conferences https://s.apache.org/zgm8m 
 - CFP EXTENDED for ApacheCon North America: submissions due 1 June https://www.apachecon.com/

ASF Infrastructure – our distributed team on three continents keeps the ASF's infrastructure running around the clock.
 - 7M+ weekly checks yield uptime at 99.74%. Performance checks across 50 different service components spread over more than 250 machines in data centers around the world. http://www.apache.org/uptime/

Apache Code Snapshot – this week, 900 Apache contributors changed 2,695,384 lines of code over 3,697 commits. Top 5 contributors, in order, are: Manfred Moser, Andrea Cosentino, Gary Gregory, Congxian Qiu, and Claus Ibsen.        

Apache Project Announcements – the latest updates by category.

Big Data --
 - Apache Arrow 0.17.1 released https://arrow.apache.org/
 - Apache Calcite 1.23.0 released https://calcite.apache.org/
 - Apache Ignite 2.8.1 released https://ignite.apache.org/
 - Apache Beam 2.21.0 released https://beam.apache.org/

Cloud Computing --
 - The Apache Software Foundation Announces Apache CloudStack v 4.14 https://s.apache.org/l5ps8

Content --
 - Apache Jackrabbit Oak 1.8.22 released https://jackrabbit.apache.org/

IoT --
 - Apache PLC4X 0.7.0 released https://plc4x.apache.org/ 

Search --
 - Apache Lucene 8.5.2 and Solr 8.5.2 released https://lucene.apache.org/

Version Control --
 - The Apache Software Foundation Announces Apache Subversion 1.14.0-LTS https://s.apache.org/osr65


Did You Know?

 - Did you know that Apache OpenOffice now features new extensions for Danish spellcheck and hyphenation dictionaries? https://openoffice.apache.org/

 - Did you know that due to the coronavirus, Pulsar Summit is virtual this year? Catch live & interactive sessions by Splunk, Verizon Media, Iterable, Yahoo! JAPAN, TIBCO, OVHcloud, Clever Cloud and many more! https://pulsar-summit.org/

 - Did you know that Japanese work-life site Haken-EN is powered by Apache Wicket? https://wicket.apache.org/ 
 

Apache Community Notices

 - The Apache Software Foundation Statement on the COVID-19 Coronavirus Outbreak https://s.apache.org/COVID-19  

 - The Apache Software Foundation Celebrates 21 Years of Open Source Leadership https://s.apache.org/21stAnniversary

 - Apache Month In Review: April 2020 – overview of events that have taken place within the Apache community https://s.apache.org/Apr2020

 - The Apache Software Foundation Operations Summary: Q3 FY2020 (November 2019 - January 2020) https://s.apache.org/r6s5u  

 - "Trillions and Trillions Served", the documentary on the ASF, is in post-production. Catch the teaser at https://s.apache.org/ASF-Trillions and watch "Apache Everywhere", the first "Trillions" short, filmed onsite at ApacheCon Las Vegas and Berlin this past year, at https://youtu.be/nXtIti9jMFI

 - Apache in 2019 - By The Digits https://s.apache.org/Apache2019Digits

 - The Apache Way to Sustainable Open Source Success https://s.apache.org/GhnI

 - ASF Operations Summary: Q2 FY2020 (August - October 2019) https://s.apache.org/2kv2n

 - ASF Founders look back on 20 Years of the ASF https://blogs.apache.org/foundation/entry/our-founders-look-back-on

 - Foundation Reports and Statements http://www.apache.org/foundation/reports.html

 - ApacheCon: Tomorrow's Technology Today since 1998 http://s.apache.org/ApacheCon

 - "Success at Apache" focuses on the people and processes behind why the ASF "just works". https://blogs.apache.org/foundation/category/SuccessAtApache

 - Inside Infra: the new interview series with members of the ASF infrastructure team --meet Drew Foulks https://s.apache.org/InsideInfra-Drew

 - Please follow/like/re-tweet the ASF on social media: @TheASF on Twitter (https://twitter.com/TheASF) and on LinkedIn at https://www.linkedin.com/company/the-apache-software-foundation

 - Do friend and follow us on the Apache Community Facebook page https://www.facebook.com/ApacheSoftwareFoundation/ and Twitter account https://twitter.com/ApacheCommunity

 - Find out how you can participate with Apache community/projects/activities --opportunities open with Apache Camel, Apache HTTP Server, and more! https://helpwanted.apache.org/

 - Are your software solutions Powered by Apache? Download & use our "Powered By" logos http://www.apache.org/foundation/press/kit/#poweredby

= = =

For real-time updates, sign up for Apache-related news by sending mail to announce-subscribe@apache.org and follow @TheASF on Twitter. For a broader spectrum from the Apache community, https://twitter.com/PlanetApache provides an aggregate of Project activities as well as the personal blogs and tweets of select ASF Committers.
