Apache Infrastructure Team

Sunday March 26, 2017

Bringing GitPubSub to the Apache Jenkins build server

When it comes to [Jenkins], it has long been known that [polling must die].

While we could go and create post commit hooks in all the ASF hosted Git repositories, that is something that realistically is just creating an added maintenance burden. In any case, we have [GitPubSub].

The question then becomes, how do we integrate [GitPubSub] with [Jenkins]? Thankfully, ASF committer stephenc is also an active committer to the [Jenkins] project and created a [plugin] that connects to [GitPubSub], parses the events, and passes them through to the Jenkins [SCM API].
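To make the "connect, parse, pass through" flow a little more concrete, here is a minimal Python sketch of the parsing and dispatch steps. It is not the plugin's code: the event shape (a `commit` object with `repository` and `ref` keys), the keep-alive message, and the job names are all assumptions made up for illustration.

```python
import json


def parse_event(line):
    """Parse one line of a JSON event stream.

    Returns None for blank lines and for messages without a commit.
    The "commit", "repository" and "ref" keys are assumed, not the real schema.
    """
    line = line.strip()
    if not line:
        return None
    payload = json.loads(line)
    commit = payload.get("commit")
    if commit is None:
        return None
    return {"repository": commit["repository"], "ref": commit["ref"]}


def jobs_to_trigger(event, jobs):
    """Return the names of jobs watching the repository and ref in the event."""
    return [
        name
        for name, (repo, ref) in jobs.items()
        if repo == event["repository"] and ref == event["ref"]
    ]


# Hypothetical job registry: job name -> (repository, ref) it builds.
jobs = {
    "maven-core-master": ("maven", "refs/heads/master"),
    "surefire-master": ("maven-surefire", "refs/heads/master"),
}

# Stand-in for the live stream: one keep-alive, one commit event.
stream = [
    '{"stillalive": 1}',
    '{"commit": {"repository": "maven", "ref": "refs/heads/master"}}',
]

for raw in stream:
    event = parse_event(raw)
    if event is not None:
        print(jobs_to_trigger(event, jobs))
```

The point of the push model is visible in the loop: a job is only considered when an event for its repository actually arrives, so nothing has to poll.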

What does this mean?

* You can turn your Git polling down - way way down - to something like once per day. This should significantly reduce the load on both the ASF git servers and builds.apache.org
* Your builds will be triggered in seconds rather than having to wait for the next polling run.
* You can try out using Multi-branch projects much like the [Maven] project has been doing for [Maven core] and [Maven Surefire]

If the reaction to this change proves positive, the next step will be to integrate SvnPubSub with Jenkins and bring the benefits to the Subversion based projects too.

See also this blog post by Stephen Connolly:


[polling must die]: http://kohsuke.org/2011/12/01/polling-must-die-triggering-jenkins-builds-from-a-git-hook/
[GitPubSub]: https://www.apache.org/dev/gitpubsub.html
[Jenkins]: https://jenkins.io/
[plugin]: https://github.com/stephenc/asf-gitpubsub-jenkins-plugin
[SCM API]: https://plugins.jenkins.io/scm-api
[Maven]: https://maven.apache.org
[Maven core]: https://builds.apache.org/job/maven-3.x-jenkinsfile/
[Maven Surefire]: https://builds.apache.org/job/maven-surefire-pipeline/

Posted on behalf of Committer Stephen Connolly (stephenc)

Sunday January 01, 2017

blogs.a.o moved, upgraded and improved

Hi All,

blogs.apache.org - the site you are reading now! - has had a bit of an update.

1. We moved it from an aged VM Host to the Cloud (thanks LeaseWeb!)

2. We puppetised the entire service, from install to deploy (see our GitHub Mirror)

3. We upgraded the Apache Roller software from 5.0.3 to the latest 5.1.2

4. We enabled LDAP for logins. That's right! Every single ASF Committer can now just login! No more creating an INFRA Jira ticket just to get a Roller account on blogs.apache.org

Other stuff remains the same - meaning if you are a Blog Administrator you still need to invite committers into your blog, you still need to choose to make them an Author or Admin etc - Roller doesn't support anything more than login auth for LDAP currently - but I bet the project would love to see the LDAP integration extended and improved if you feel the need!

Anyhow, our first new year present to our ASF Committers, a shiny updated blog instance,

 Enjoy, and have a great 2017!!

Monday July 25, 2016

Position Available: Infrastructure Systems Administrator/Architect

UPDATE:  We have received enough applicants at this time. Thank you all for your interest. 

The Apache Software Foundation (ASF) seeks to fill an Infrastructure Systems Administrator/Architect position. You will be responsible for working with the existing technical infrastructure team, and VP of Infrastructure at the Apache Software Foundation. The ASF manages a world-wide network of open source software which includes more than 750 software code repositories, a worldwide distribution and mirroring system for software; change management, issue tracking, and software management for 300+ Open Source initiatives and more than 11,000 contributors around the world.

Applicants should have a strong background in Computer and Information Science, and should be familiar with modern Dev/Ops environments. Applicants must demonstrate the ability to work in a remote team environment alongside others working in diverse locations around the world and in different timezones. The successful applicant will work with the existing Infrastructure team and VP, Infrastructure to manage the ASF's critical infrastructure and resources. Infrastructure team members are expected to work an on-call rotation with the rest of the team.

Our infrastructure team also supports our broader community by enabling the creation of self-service tooling. The successful candidate will be able to balance the needs of our critical infrastructure and the needs of our community to self-serve. These two demands can often be in conflict and thus an ability to navigate such complex environments is a distinct advantage.

Familiarity with Puppet (or a similar configuration management tool), Linux (Debian-based), Virtual Machines, Subversion/Git and full development environment stacks is a requirement. Further, the candidate should possess great documentation skills and should be well versed in not only developing and assisting in technical solutions, but in documenting them.

Preferred qualifications include a Bachelor's Degree in Computer Science or similar background from an accredited university, though demonstrable and appropriate on-the-job experience is an acceptable substitute for formal qualifications. Familiarity with how open source communities work is a plus.

English as a spoken and written language is required in order to facilitate team collaboration.

This is a remote work position; the ASF neither requires nor provides office locations. Travel is required for conferences and general team meetups.

Contact vp-infra@apache.org with your CV.

Thursday June 30, 2016

ASF JIRA Outages and Troubleshooting

As people have noticed, our JIRA instance (arguably the largest public instance in the world) has been suffering from a yet unknown issue as of late. We are reasonably sure that this is related to specific queries being made against the instance (possibly automated queries from scrapers), but have yet to identify the exact cause of the problem.

The failure condition arises when the database connection pool is exhausted, despite being configured and sized appropriately. These connections all appear idle, but when the pool is full, no new connections can be established, and the instance falls over, requiring a restart. 
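The symptom described above can be illustrated with a toy bounded pool in Python. This is only a model of the failure condition, not a diagnosis and not JIRA's actual pooling code: the `ConnectionPool` class, its size, and the connection names are hypothetical.

```python
import queue


class ConnectionPool:
    """Toy fixed-size pool: connections are handed out and must be returned."""

    def __init__(self, size):
        self._free = queue.Queue(maxsize=size)
        for i in range(size):
            self._free.put(f"conn-{i}")

    def acquire(self, timeout=0.1):
        """Take a free connection, or fail if none is returned in time."""
        try:
            return self._free.get(timeout=timeout)
        except queue.Empty:
            raise RuntimeError("connection pool exhausted") from None

    def release(self, conn):
        """Return a connection so that another caller can use it."""
        self._free.put(conn)


pool = ConnectionPool(size=2)

# Both connections are checked out and then sit idle without being released.
held = [pool.acquire(), pool.acquire()]

try:
    pool.acquire()  # a new request arrives while the pool is full
except RuntimeError as exc:
    print(exc)

# Releasing one of the idle connections lets new requests through again.
pool.release(held.pop())
print(pool.acquire())
```

In this model the held connections do no work at all, yet every new request fails until one of them is released, which mirrors the "idle but full" state described above.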

We are working closely with Atlassian, the creator of JIRA, to remedy the situation. Unfortunately, this requires running diagnostics on the production JIRA instance, which in and of itself causes performance degradation and downtime. Over the past several days, we've identified and implemented some changes to the pool parameters which we hope will help stabilize the instance while we continue our diagnostic work.

We expect that there may still be some moments of downtime and occasional restarts. Any longer duration outages will be announced via Twitter/infrabot and status.apache.org.

Friday February 12, 2016

AppVeyor CI now available for GitHub Mirrors

The ASF Infrastructure team is happy to announce that projects can now have AppVeyor CI set up on their GitHub mirrors.

 The only thing you need to do is create an INFRA ticket at issues.apache.org with the following information:

  • Repo Name
  • Mailing list to send build notifications to (optional)

There are already a few projects using AppVeyor on their GitHub mirror, and we now have an Organization role account for central management (and I have gone through and updated previous tickets with new links to badges).

If you have any questions, you can ask us in Hipchat or you can email infrastructure@apache.org

Monday October 19, 2015

Dear Apache

My name is Daniel Takamori and I'm so happy to be joining the Infra team here at Apache.  I'm from Oregon in the United States and really enjoy the rain.  While at Oregon State University I studied mathematics and physics with a lean towards error correcting codes and mathematical modelling.  Some of my hobbies are playing Go in which I'm ranked 6.9 kyu by the AGA, cooking with eggs and green things, and old school platforming video games.  In a former life I worked on underwater remotely operated vehicles and automated gardening systems.  Traveling is something I liked to do once; living in Hungary was awesome and I hope to visit again. Oregon is a great place to live, with all the trees, rain and burritos but maybe things will change in the future.  My handle Pono is my Hawaiian name, and I'm really proud to use it.

Previously I was at the Oregon State University Open Source Lab and really enjoyed my time there, getting to know the Open Source communities and even working with Apache!  It was a real eye-opening introduction to the world of software and DevOps (lol who knows what that even means).  I'm very excited to continue working with the community and even more excited to start this next chapter with such an amazing group.

See you around internets!

Wednesday August 19, 2015

Planned downtime for ReviewBoard

The ReviewBoard vm ran out of space and despite our best efforts to fix the space issue without restarting the service, that is the only option left.

The plan is to restart the vm on Thursday August 20th at 21:00 UTC (14:00 PDT), but if it fills up again before then, the resize will take place earlier.

A tweet via @infrabot will be tweeted 1 hour before the scheduled downtime and a planned maintenance notice will be posted to status.apache.org.

The actual downtime should take no more than 30 minutes.

The next email about this will be after the service has resumed from the planned downtime.


Geoff Corey

Monday August 03, 2015

Planned downtime for Jira

There will be a planned reboot of Jira on Friday 7th August at 00:00 UTC.

This is 72 hours notice as recommended in our Core Services planned downtime SLA.

Currently, Jira requires a reboot when adding new projects to it. There is an outstanding
ticket with Atlassian about this. They require logs and so these will be gathered at the
time of the planned reboot.

Projects being added to Jira at this time will include:-

INFRA-9713 - Whimsy

and any more that get requested between now and downtime.

Any projects requiring issues to be imported from other issue trackers will NOT be done at
this time.

A tweet via @infrabot will be tweeted 24 hrs and 1 hr before.
A planned maintenance notice will be posted on status.apache.org.

Actual downtime should be no more than 10 minutes all being well.

The next email about this will be after the service has resumed from the planned downtime.


Geoff Corey

Tuesday July 14, 2015

Mirroring to GitHub issues

As some of you are aware, there have been some issues syncing changes from repositories on https://git-wip-us.apache.org to the mirrors on GitHub.

The issues we are seeing:

  • Pull requests not being closed when they should be
  • Changes not being synced to the GitHub mirrors
  • Bots other than asfgit closing PRs on Apache GitHub mirrors.

We are looking into why changes are not being synced, as well as why PRs are not getting closed and why some PRs are being closed by other bots such as hubot.

We will update this blog post as we get more information about the sync issues.

Monday June 29, 2015

Buildbot master currently off-line

Update (2015-06-30 ~12.00 UTC):

The replacement buildbot master is now live. The CMS service and the ci.apache.org  website have been restored. The project CI builds are mostly working but builds that upload docs, snapshots etc. to the buildmaster for publishing are likely to fail at the upload stage while we ensure all the necessary directory structures are in place to receive the uploads. Work to resolve these final few issues is ongoing.

We continue to try and contact the owner of the account where the IRC proxy was running. In case their account has been compromised, it remains locked. In addition, all their commits have been reviewed by other project committers and that review has confirmed that no malicious commits have been made by the account in question.

The review of aegis.apache.org  is ongoing. No evidence of compromise beyond the possible compromise of the single, non-privileged user account has been found.

Original post (2015-06-29 ~21.00 UTC):

As per the e-mails to committers@ earlier today, aegis.apache.org is currently offline after a report was received that suspicious network traffic had been observed from that host. This blog post will be updated as more information becomes known.

What we know:

  • At ~16.00 UTC 28 June 2015 suspicious network activity from a buildbot host was reported to the Apache security team.
  • Further information was requested and at ~18.00 UTC 28 June 2015 the Apache Infrastructure team received a copy of network logs that showed a number of suspicious IRC connections originating from aegis.apache.org
  • These IRC connections were traced to a non-privileged user account on aegis.apache.org  running an open IRC proxy
  • At ~20.00 UTC 28 June 2015 the user account concerned was locked for all ASF services and the proxy process terminated.
  • At ~10.00 UTC 29 June 2015, after further discussion within the infrastructure team, aegis.apache.org was taken off-line as a precaution.

It remains unclear whether the open IRC proxy was installed by the user that owned the account or whether their account was compromised and the IRC proxy was installed by an unauthorized user.

It is worth stressing that no further information came to light between 20.00 UTC 28 June 2015 and 10.00 UTC 29 June 2015 that triggered the decision to take the host off-line. The host was taken off-line purely as a precaution while we reviewed the available information. That process is ongoing. So far we have found no evidence to even suggest anything more than a user account being used to run an IRC proxy and plenty of evidence that suggests that this was the only activity this account was used for.


There is no risk to released source or binaries for any ASF project. There are multiple reasons for this:

  • buildbot is a CI system used to build snapshots, not releases
  • no builds are performed on aegis.apache.org

Buildbot is used to build some project web sites and / or project documentation. The risk of compromise here is viewed as very low for the following reasons:

  • the builds do not take place on aegis.apache.org
  • diffs of every change are sent to the relevant project team's mailing list for review and an unexpected / malicious change would be spotted

Project impact:

The following services are currently off-line and will remain so until the buildbot master is restored

  • All buildbot builds
  • Projects that use the CMS will be unable to update their web sites (the CMS uses buildbot to build web site updates)
  • the ci.apache.org  website

Work in progress:

Analyzing aegis.apache.org is going to take time and, while we view the chances of a wider compromise of this host as very, very small, we are not willing to bring the host back on line at this point. This host was due for replacement so the decision has been taken to pull this work forward and rebuild the buildbot master on a new host now. We have taken this decision not because we believe aegis.apache.org to be compromised, but because it is possible to complete this work far more quickly than it is possible to confirm our view that aegis.apache.org is not compromised.  We currently estimate that the rebuild of the new buildbot master host will be completed by 1 July 2015.

We continue to analyze the information we have obtained from aegis.apache.org  and from other sources and will update this blog post as more information becomes available.


Questions, concerns, comments etc. should be directed to infrastructure@apache.org

Wednesday June 10, 2015

Confluence Wiki service to be restarted

Hi All,

There will be a planned reboot of Confluence on Friday 12th June at 18:00 UTC+1

This is a blog post notice as recommended in our Core Services planned downtime SLA.

The Confluence wiki service configuration is stored in our Puppet configuration.

We have made some modifications to the Puppet Manifest affecting the Module that
Confluence uses (cwiki_asf). Some code is being moved out from the module and
into a host specific YAML file. This will make it easier for future hosts to re-use the
module (such as an upgrade host currently awaiting these changes.)
A twitter notification will be posted 1 hour before.
A planned maintenance notice will be posted on status.apache.org.

If necessary we will make use of this outage window to apply any OS updates and reboot
the host VM.

Actual downtime should be no more than 1 hour all being well.

An email about this will be sent to infrastructure@ after the service has resumed from the planned downtime.

Monday May 18, 2015

Planned downtime for Jira

Hi All,

There will be a planned reboot of Jira on Thursday 21st May at 16:00 UTC+1

This is 72 hours notice as recommended in our Core Services planned downtime SLA.

Currently, Jira requires a reboot when adding new projects to it. There is an outstanding
ticket with Atlassian about this. They require logs and so these will be gathered at the
time of the planned reboot.

Projects being added to Jira at this time will include:-

INFRA-9516 - Myriad
INFRA-9609 - Atlas

and any more that get requested between now and downtime.

Any projects requiring issues to be imported from other issue trackers will NOT be done at
this time.

A tweet via @infrabot will be tweeted 24 hrs and 1 hr before.
A planned maintenance notice will be posted on status.apache.org.

Actual downtime should be no more than 10 minutes all being well.

The next email about this will be after the service has resumed from the planned downtime.



Friday May 08, 2015

Mail Service Architecture Changes

For the past few months the Infrastructure team have been working extremely hard to re-design, implement and manage changes to the email service architecture.  Today we are proud to announce that phase 1 of this has been completed, and has been running for several days now.

Phase 1 covers all components of the service except the listserv service and mail archives.  These will be included in phase 2, which we will come to later. When we started out on this project to review, update and manage our email infrastructure, we had several guiding principles that either the old system must be made to conform to, or any new service would need to meet before being accepted.  These principles are really criteria:

  • The service must be entirely managed (operationally) from our puppet service. 
  • The software (packages) must all be packaged - i.e. .deb's, either upstream or packaged locally and in our own repo. Deploying from source is no longer acceptable.
  • All the work carried out by puppet et al must be idempotent
  • We will not allow the service design to restrict our ability to either adapt it, or grow it at will and on demand. 
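The idempotency criterion is worth a small illustration: applying the same change twice must leave the system in the same state as applying it once. The Python sketch below is not Puppet code; `ensure_line` is a hypothetical helper and the configuration lines are only example values.

```python
def ensure_line(lines, wanted):
    """Return a copy of lines that contains wanted, appending it only if missing."""
    if wanted in lines:
        return list(lines)
    return list(lines) + [wanted]


# Example values only; not taken from the real mail configuration.
config = ["myhostname = mx1.example.org"]
banner = "smtpd_banner = $myhostname ESMTP"

once = ensure_line(config, banner)
twice = ensure_line(once, banner)

# A second run changes nothing, which is what idempotent means here.
print(once == twice)
```

A non-idempotent version would append the line on every run, so the resulting file would depend on how many times the configuration had been applied.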

Very early on in the design and testing work it became clear that we needed clear separation of each of the roles in the email service infrastructure. This would allow us, in the future, to add more capability of any given type if for some reason it were needed. Let's say, for example, we needed more SpamAssassin capability: this can be scaled sideways, allowing us to swallow the load without needing to also make it an MX host or listserv host etc.

The design we have settled upon, with phase 1 complete, can be seen in this diagram (http://www.apache.org/dev/mailflow.jpg). The diagram shows that we have deployed several MX hosts (each of which is more than capable of handling our entire inbound mail load comfortably) in differing AWS regions globally. While we don't need 3 to cope with capacity, we wanted 3 to cope with networking resilience should any of these instances suffer network degradation or outage.

These MX hosts are simple Postfix instances that run Postfix Postscreen, RBL checks, and Amavisd-new.  This simple protection of only performing RBL checks at the edge frees up the internal scanning hosts from having to scan emails needlessly. Amavis is simply used to pass the emails internally for scanning. 

Once the mails have been passed on by the MX (and there is an interesting detail about how exactly the mails are handled by Amavis that might be a blog post in the near future) they are handled by our scanning cluster. This group of hosts utilise SpamAssassin, ClamAV and again Postfix. While these may not be new technologies, again having a dedicated host or hosts in our case allows us to tune the services specifically for the resources dedicated to scanning and not worry about choking other local services. Of course it also means that should we see a marked increase in mail volume we can easily deploy a new node in a matter of minutes and have it join the rotation and start scanning email.

All of the scanning nodes are fronted by an HAProxy instance. This allows us to load balance our nodes and not have to reconfigure the MX hosts should we change the number of scanning hosts.  It also means we can take a node out of rotation for maintenance and none of the MX hosts need to be reconfigured or modified in any way.
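The benefit of putting a balancer in front of the scanning nodes can be sketched in a few lines of Python. This models the idea only; it is not HAProxy configuration or HAProxy's balancing algorithm, and the `Balancer` class and node names are hypothetical.

```python
class Balancer:
    """Toy round-robin balancer that skips nodes taken out of rotation."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._disabled = set()
        self._next = 0

    def disable(self, node):
        """Take a node out of rotation, for example for maintenance."""
        self._disabled.add(node)

    def enable(self, node):
        """Put a node back into rotation."""
        self._disabled.discard(node)

    def pick(self):
        """Return the next node in rotation, skipping disabled ones."""
        for _ in range(len(self._nodes)):
            node = self._nodes[self._next % len(self._nodes)]
            self._next += 1
            if node not in self._disabled:
                return node
        raise RuntimeError("no scanning nodes in rotation")


lb = Balancer(["scan1", "scan2", "scan3"])
lb.disable("scan2")  # maintenance on one node
print([lb.pick() for _ in range(4)])  # ['scan1', 'scan3', 'scan1', 'scan3']
```

Callers only ever talk to `pick()`, so the set of nodes behind it can change without the callers being touched, which is the property the MX hosts get from the balancer.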

As we said earlier this is only phase 1.  You will see in the diagram that we are still running our old ezmlm/qmail stack. This will now become the focus of phase 2, to determine what changes, if any best suit our projects and the foundation as a whole. One of the failings of the current system is that if the listserv host goes down, mail basically stops flowing, as this is the authoritative host for all apache addresses. We will also be looking very hard as to how we can run multiple listserv hosts to remove that single point of failure concern. 

The foundation relies on email as its official internal communication mechanism; this is nowhere more evident than when we say "If it didn't happen on the list, it didn't happen". Moving this service forward will be a significant challenge, one which we hope to deliver as soon as we can.

As always, if you have any questions please email infrastructure@apache.org  and we will do what we can to help.

On behalf of the Infrastructure Team

Wednesday April 29, 2015

Apache Services and SHA-1 SSL Cert deprecation

As some of you may have already encountered, certain services within Apache appear to have broken SSL support. While the cert is still valid, there is a part of the cert that both Microsoft and Google have stopped accepting as valid. We are working on fixing this and will use this blogpost to track what services will be updated and when (as well as emails).


  • git-wip-us
  • TLP sites
  • SSL terminator (erebus-ssl)
  • svn-master
  • mail-relay


  • git-wip-us: Friday May 1, 16:00 UTC
  • TLP sites: Friday May 1, 16:00 UTC
  • SSL terminator (erebus-ssl): Friday May 1, 16:00 UTC
  • svn-master: Friday May 1, 16:00 UTC
  • mail-relay: Friday May 1, 16:00 UTC

Git based websites available

If you have worked on a web site for an Apache project, you've probably come across the fact that everything has to be in Subversion for web sites. The reason for this has been the desire to have a unified standard for publishing web site contents across all projects. The current workflow is handled by two components, svnpubsub - a pubsub service for subversion - and svnwcsub, the client for svnpubsub. In 2013 we added a similar method for Git, called gitpubsub. Nowadays, gitpubsub is used for a ton of different service messages in the ASF; Git commits, JIRA notifications, GitHub communication and so on, and as of today, we have added gitwcsub, a gitpubsub client similar to svnwcsub, enabling projects to use git as their repository for web site content.

In order to use git as your web site repository, you must have your web site in a git repo. This can either be an existing repository or a new one created just for your web site. gitwcsub will, by default, pull content from the asf-site branch of any repo set up for it, so all that needs to be done is to have this branch in a repo on git-wip-us.apache.org and you can have your project's site published via git.

To have your site transferred to a git based workflow, please file a JIRA ticket with infrastructure.

Lastly, we want to thank the CouchDB project for being guinea pigs in this process!


