Apache Infrastructure Team

Monday July 25, 2016

Position Available: Infrastructure Systems Administrator/Architect

UPDATE:  We have received enough applicants at this time. Thank you all for your interest. 

The Apache Software Foundation (ASF) seeks to fill an Infrastructure Systems Administrator/Architect position. You will be responsible for working with the existing technical infrastructure team and the VP of Infrastructure at the Apache Software Foundation. The ASF manages a worldwide network of open source software which includes more than 750 software code repositories; a worldwide distribution and mirroring system for software; and change management, issue tracking, and software management for 300+ open source initiatives and more than 11,000 contributors around the world.

Applicants should have a strong background in Computer and Information Science, and should be familiar with modern Dev/Ops environments. Applicants must demonstrate the ability to work in a remote team environment alongside others working in diverse locations around the world and in different timezones. The successful applicant will work with the existing Infrastructure team and VP, Infrastructure to manage the ASF's critical infrastructure and resources. Infrastructure team members are expected to work an on-call rotation with the rest of the team.

Our infrastructure team also supports our broader community by enabling the creation of self-service tooling. The successful candidate will be able to balance the needs of our critical infrastructure and the needs of our community to self-serve. These two demands can often be in conflict and thus an ability to navigate such complex environments is a distinct advantage.

Familiarity with Puppet (or a similar configuration management tool), Linux (Debian-based), virtual machines, Subversion/Git, and full development environment stacks is required. Further, the candidate should possess great documentation skills and should be well versed not only in developing and assisting with technical solutions, but also in documenting them.

Preferred qualifications include a Bachelor's Degree in Computer Science or similar background from an accredited university, though demonstrable and appropriate on-the-job experience is an acceptable substitute for formal qualifications. Familiarity with how open source communities work is a plus.

English as a spoken and written language is required in order to facilitate team collaboration.

This is a remote work position; the ASF neither requires nor provides office locations. Travel is required for conferences and general team meetups.

Contact vp-infra@apache.org with your CV.

Thursday June 30, 2016

ASF JIRA Outages and Troubleshooting

As people have noticed, our JIRA instance (arguably the largest public instance in the world) has lately been suffering from an as-yet unknown issue. We are reasonably sure that this is related to specific queries being made against the instance (possibly automated queries from scrapers), but have yet to identify the exact cause of the problem.

The failure condition arises when the database connection pool is exhausted, despite being configured and sized appropriately. These connections all appear idle, but when the pool is full, no new connections can be established, and the instance falls over, requiring a restart. 
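To make the failure mode concrete, here is a minimal sketch of how a fixed-size pool can end up full while every connection in it is doing nothing. It is purely illustrative: the `BoundedPool` class and the "borrowed but never returned" scenario are assumptions for the example, not a description of JIRA's pool or a diagnosis of the actual cause, which is still unknown.

```python
import queue


class PoolExhausted(Exception):
    """Raised when no connection becomes free within the timeout."""


class BoundedPool:
    """A toy fixed-size pool; strings stand in for real database connections."""

    def __init__(self, size):
        self._free = queue.Queue(maxsize=size)
        for i in range(size):
            self._free.put(f"conn-{i}")

    def acquire(self, timeout=0.1):
        try:
            return self._free.get(timeout=timeout)
        except queue.Empty:
            raise PoolExhausted("no free connections") from None

    def release(self, conn):
        self._free.put(conn)

    def free_count(self):
        return self._free.qsize()


pool = BoundedPool(size=3)

# Well-behaved caller: borrows a connection and hands it back.
conn = pool.acquire()
pool.release(conn)

# Hypothetical misbehaving callers: borrow connections and never return them.
# The connections do no work (they look idle) but still occupy the pool.
held = [pool.acquire() for _ in range(3)]

try:
    pool.acquire()
    exhausted = False
except PoolExhausted:
    exhausted = True

print("free connections:", pool.free_count())
print("pool exhausted:", exhausted)
```

In this toy model the only recovery is to return or discard the held connections, which is roughly what a restart does.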

We are working closely with Atlassian, the creator of JIRA, to remedy the situation. Unfortunately, this requires running diagnostics on the production JIRA instance, which in and of itself causes performance degradation and downtime. Over the past several days, we've identified and implemented some changes to the pool parameters which we hope will help stabilize the instance while we continue our diagnostic work.

We expect that there may still be some moments of downtime and occasional restarts. Any longer duration outages will be announced via Twitter/infrabot and status.apache.org.

Friday February 12, 2016

AppVeyor CI now available for GitHub Mirrors

The ASF Infrastructure team is happy to announce that projects can now have AppVeyor CI set up on their GitHub mirrors.

The only thing you need to do is create an INFRA ticket at issues.apache.org with the following information:

  • Repo Name
  • Mailing list to send build notifications to (optional)

There are already a few projects using AppVeyor on their GitHub mirror, and we now have an Organization role account for central management (and I have gone through and updated previous tickets with new links to badges).

If you have any questions, you can ask us in HipChat or you can email infrastructure@apache.org.

Monday October 19, 2015

Dear Apache

My name is Daniel Takamori and I'm so happy to be joining the Infra team here at Apache.  I'm from Oregon in the United States and really enjoy the rain.  While at Oregon State University I studied mathematics and physics with a lean towards error correcting codes and mathematical modelling.  Some of my hobbies are playing Go in which I'm ranked 6.9 kyu by the AGA, cooking with eggs and green things, and old school platforming video games.  In a former life I worked on underwater remotely operated vehicles and automated gardening systems.  Traveling is something I liked to do once; living in Hungary was awesome and I hope to visit again. Oregon is a great place to live, with all the trees, rain and burritos but maybe things will change in the future.  My handle Pono is my Hawaiian name, and I'm really proud to use it.

Previously I was at the Oregon State University Open Source Lab and really enjoyed my time there, getting to know the Open Source communities and even working with Apache!  It was a real eye-opening introduction to the world of software and DevOps (lol who knows what that even means).  I'm very excited to continue working with the community and even more excited to start this next chapter with such an amazing group.

See you around internets!

Monday August 03, 2015

Planned downtime for Jira

There will be a planned reboot of Jira on Friday 7th August at 00:00 UTC.

This is 72 hours notice as recommended in our Core Services planned downtime SLA.

Currently, Jira requires a reboot when adding new projects to it. There is an outstanding
ticket with Atlassian about this. They require logs and so these will be gathered at the
time of the planned reboot.

Projects being added to Jira at this time will include:-

INFRA-9713 - Whimsy

and any more that get requested between now and downtime.

Any projects requiring issues to be imported from other issue trackers will NOT be done at
this time.

A tweet will be sent via @infrabot 24 hrs and 1 hr before.
A planned maintenance notice will be posted on status.apache.org.
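
For anyone who wants to check the timings, the schedule can be worked out from the announced reboot time. This is only a sketch using the figures stated in this notice (72 hours' notice, tweets 24 hours and 1 hour before, at most 10 minutes of downtime); the variable names are made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Announced reboot time: Friday 7th August 2015 at 00:00 UTC.
downtime = datetime(2015, 8, 7, 0, 0, tzinfo=timezone.utc)

# Latest moment at which 72 hours' notice can still be given.
notice_deadline = downtime - timedelta(hours=72)

# Reminder tweets go out 24 hours and 1 hour before the reboot.
tweet_times = [downtime - timedelta(hours=24), downtime - timedelta(hours=1)]

# Expected end of the window if the reboot takes the full 10 minutes.
latest_end = downtime + timedelta(minutes=10)

print("notice by:", notice_deadline.isoformat())
for when in tweet_times:
    print("tweet at: ", when.isoformat())
print("back by:  ", latest_end.isoformat())
```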

Actual downtime should be no more than 10 minutes all being well.

The next email about this will be after the service has resumed from the planned downtime.


Geoff Corey

Tuesday July 14, 2015

Mirroring to GitHub issues

As some of you are aware, there have been some issues syncing changes from repositories on https://git-wip-us.apache.org to the mirrors on GitHub.

The issues we are seeing:

  • Pull requests not being closed when they should be
  • Changes not being synced to the GitHub mirrors
  • Bots other than asfgit closing PRs on Apache GitHub mirrors.

We are looking into why changes are not being synced, as well as why PRs are not getting closed and why some PRs are being closed by other bots such as hubot.

We will update this blog post as we get more information about the sync issues.

Monday May 18, 2015

Planned downtime for Jira

Hi All,

There will be a planned reboot of Jira on Thursday 21st May at 16:00 UTC+1.

This is 72 hours notice as recommended in our Core Services planned downtime SLA.

Currently, Jira requires a reboot when adding new projects to it. There is an outstanding
ticket with Atlassian about this. They require logs and so these will be gathered at the
time of the planned reboot.

Projects being added to Jira at this time will include:-

INFRA-9516 - Myriad
INFRA-9609 - Atlas

and any more that get requested between now and downtime.

Any projects requiring issues to be imported from other issue trackers will NOT be done at
this time.

A tweet will be sent via @infrabot 24 hrs and 1 hr before.
A planned maintenance notice will be posted on status.apache.org.

Actual downtime should be no more than 10 minutes all being well.

The next email about this will be after the service has resumed from the planned downtime.



Friday May 08, 2015

Mail Service Architecture Changes

For the past few months the Infrastructure team have been working extremely hard to re-design, implement and manage changes to the email service architecture.  Today we are proud to announce that phase 1 of this has been completed, and has been running for several days now.

Phase 1 covers all components of the service except the listserv service and mail archives.  These will be included in phase 2, which we will come onto later. When we started out on this project to review, update and manage our email infrastructure, we had several guiding principles that either the old system must be made to conform to, or any new service would need to meet before being accepted.  When we talk about these principles we are really talking about criteria. These are:

  • The service must be entirely managed (operationally) from our Puppet service.
  • The software (packages) must all be packaged, i.e. as .debs, either upstream or packaged locally and in our own repo. Deploying from source is no longer acceptable.
  • All the work carried out by Puppet et al must be idempotent.
  • We will not allow the service design to restrict our ability to either adapt it, or grow it at will and on demand. 
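
To illustrate what "idempotent" means in the criteria above, here is a small sketch in Python rather than Puppet. The `ensure_line` helper, the file name and the setting are all hypothetical; the point is only that running the same step twice leaves the system in the same state as running it once.

```python
import os
import tempfile


def ensure_line(path, line):
    """Make sure `line` is present in the file at `path`.

    Returns True if the file was changed, False if it was already as desired.
    """
    existing = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            existing = handle.read().splitlines()
    if line in existing:
        return False
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return True


with tempfile.TemporaryDirectory() as workdir:
    config = os.path.join(workdir, "example.conf")  # hypothetical config file
    first_run = ensure_line(config, "relay_enabled = yes")
    second_run = ensure_line(config, "relay_enabled = yes")
    with open(config, encoding="utf-8") as handle:
        final_lines = handle.read().splitlines()

print(first_run, second_run, final_lines)
```

The first call changes the file and the second is a no-op, so the step can be re-applied on every run without side effects.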

Very early on in the design and testing work it became clear that we needed clear separation of each of the roles in the email service infrastructure. This would allow us, in the future, to add more capability of any given type if for some reason it were needed. Let's say, for example, we needed more SpamAssassin capability: this can be scaled sideways, allowing us to swallow the load without needing to also make it an MX host or listserv host etc.

The design we have settled upon, with phase 1 complete, can be seen in this diagram (http://www.apache.org/dev/mailflow.jpg). The diagram shows that we have deployed several MX hosts (each of which is more than capable of handling our entire inbound mail load comfortably) in differing AWS regions globally. This decision means that while we don't need 3 to cope with capacity, we wanted 3 for networking resilience should any of these instances suffer network degradation or outage.

These MX hosts are simple Postfix instances that run Postfix Postscreen, RBL checks, and Amavisd-new.  This simple protection of only performing RBL checks at the edge frees up the internal scanning hosts from having to scan emails needlessly. Amavis is simply used to pass the emails internally for scanning. 

Once the mails have been passed on by the MX (and there is an interesting detail about how exactly the mails are handled by Amavis that might be a blog post in the near future) they are handled by our scanning cluster. This group of hosts utilise SpamAssassin, ClamAV and again Postfix. While these may not be new technologies, again having a dedicated host or hosts in our case allows us to tune the services specifically for the resources dedicated to scanning and not worry about choking other local services. Of course it also means that should we see a marked increase in mail volume we can easily deploy a new node in a matter of minutes and have it join the rotation and start scanning email.

All of the scanning nodes are fronted by an HAProxy instance. This allows us to load balance our nodes and not have to reconfigure the MX hosts should we change the number of scanning hosts.  It also means we can take a node out of rotation for maintenance and none of the MX hosts need to be reconfigured or modified in any way.
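
The benefit of fronting the scanning nodes with a balancer can be sketched in a few lines. This is a toy model, not our HAProxy configuration: round-robin selection is an assumption, and the node names are invented. What it shows is that callers only ever talk to the balancer, so nodes can leave and join the rotation without the callers changing.

```python
class RoundRobinBalancer:
    """Hands out backend nodes in turn; nodes can be added or removed at any time."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._next = 0

    def add(self, node):
        self._nodes.append(node)

    def remove(self, node):
        self._nodes.remove(node)

    def pick(self):
        if not self._nodes:
            raise RuntimeError("no scanning nodes in rotation")
        node = self._nodes[self._next % len(self._nodes)]
        self._next += 1
        return node


# Hypothetical scanning nodes behind the balancer.
balancer = RoundRobinBalancer(["scan1", "scan2", "scan3"])
before = [balancer.pick() for _ in range(3)]

# Take a node out of rotation for maintenance; callers are unaffected.
balancer.remove("scan2")
during = [balancer.pick() for _ in range(4)]

# Mail volume rises, so a new node joins the rotation.
balancer.add("scan4")
after = [balancer.pick() for _ in range(6)]

print(before, during, after)
```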

As we said earlier, this is only phase 1.  You will see in the diagram that we are still running our old ezmlm/qmail stack. This will now become the focus of phase 2, to determine what changes, if any, best suit our projects and the foundation as a whole. One of the failings of the current system is that if the listserv host goes down, mail basically stops flowing, as this is the authoritative host for all apache addresses. We will also be looking very hard at how we can run multiple listserv hosts to remove that single point of failure.

The foundation relies on email as its official internal communication mechanism; this is nowhere more evident than when we say "If it didn't happen on the list, it didn't happen". Moving this service forward will be a significant challenge, one which we hope to deliver as soon as we can.

As always, if you have any questions please email infrastructure@apache.org  and we will do what we can to help.

On behalf of the Infrastructure Team

Wednesday April 29, 2015

Git based websites available

If you have worked on a web site for an Apache project, you've probably come across the fact that everything has to be in Subversion for web sites. The reason for this has been the desire to have a unified standard for publishing web site contents across all projects. The current workflow is handled by two components: svnpubsub, a pubsub service for Subversion, and svnwcsub, the client for svnpubsub. In 2013 we added a similar method for Git, called gitpubsub. Nowadays, gitpubsub is used for a ton of different service messages in the ASF: Git commits, JIRA notifications, GitHub communication and so on. As of today, we have added gitwcsub, a gitpubsub client similar to svnwcsub, enabling projects to use Git as their repository for web site content.

In order to use git as your web site repository, you must have your web site in a git repo. This can either be an existing repository or a new one created just for your web site. gitwcsub will, by default, pull content from the asf-site branch of any repo set up for it, so all that needs to be done is to have this branch in a repo on git-wip-us.apache.org and you can have your project's site published via git.
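
The publish/subscribe pattern behind this workflow can be sketched as follows. This is not the gitpubsub or gitwcsub code or their wire format: the `Broker` and `SiteUpdater` classes, the event fields and the repository names are all assumptions made for illustration. Only the default branch name, asf-site, comes from the text above.

```python
from collections import defaultdict


class Broker:
    """Minimal in-process publish/subscribe hub."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        for callback in self._subscribers[topic]:
            callback(event)


class SiteUpdater:
    """Stand-in for a working-copy subscriber: reacts only to the site branch."""

    def __init__(self, repo, branch="asf-site"):
        self.repo = repo
        self.branch = branch
        self.deployed = []

    def on_commit(self, event):
        if event["repo"] == self.repo and event["branch"] == self.branch:
            # A real client would update its checkout here; we just record the commit.
            self.deployed.append(event["commit"])


broker = Broker()
updater = SiteUpdater(repo="example-site")  # hypothetical repository name
broker.subscribe("commit", updater.on_commit)

broker.publish("commit", {"repo": "example-site", "branch": "master", "commit": "a1"})
broker.publish("commit", {"repo": "example-site", "branch": "asf-site", "commit": "b2"})
broker.publish("commit", {"repo": "other-repo", "branch": "asf-site", "commit": "c3"})

print(updater.deployed)
```

Only the commit to the asf-site branch of the watched repo triggers an update; commits to other branches or other repos are ignored.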

To have your site transferred to a git based workflow, please file a JIRA ticket with infrastructure.

Lastly, we want to thank the CouchDB project for being guinea pigs in this process!

Wednesday April 15, 2015

Apache gains additional Travis-CI capacity

Travis-CI is a distributed continuous integration platform that integrates well with projects on GitHub. As many of our projects are taking advantage of our GitHub integration, they're also making use of Travis-CI for testing of inbound patches.

Travis CI offers a free account for open source projects, with a built-in assumption that projects are generally a single project per GitHub organization. The level of resources and jobs able to run is 'fair use', which is fair indeed considering that it is gratis.

Of course, most GitHub organizations aren't as large as the Apache organization on GitHub, and we recently discovered that the Foundation was one of the largest gratis open source users of Travis CI. On average, our build queue length was in excess of 300 jobs. While we appreciate the generosity of the Travis-CI folks, our demand for their services was clearly outstripping the available supply. This also meant that a lot of Apache projects were frustrated, or even abandoning their efforts to use Travis-CI, because the length of time for a build to start was high enough to not really qualify as 'continuous'.

To that end, we've now purchased a subscription to Travis services, and have moved from 'fair use' to having 30 concurrent builds. This should be a dramatic increase in throughput for Apache projects who make use of Travis.

Monday April 13, 2015

Introducing JIRA Service Desk

As part of our ongoing efforts to streamline our service offerings, and to make it easier to interact with the Infrastructure team we are launching an instance of JIRA Service Desk. 

This should make it much simpler to submit common JIRA issues, such as SVN->GIT migration, New wiki, New JIRA project, etc. The forms ask for the minimum amount of data we would need to complete the request. 

One common theme we found that delayed resolution was needing additional information to action tickets. Service Desk allows us to request the exact information needed for a specific task. 

We would like to ask everyone to start using this to submit new issues. You can access this new service here:  https://helpinfrahelpyou.apache.org   or  https://infrahelp.apache.org

Friday February 27, 2015

Towards a redeployable future, or how I stopped worrying and learned to love setting the execute bit on CGI files

Things change, even within the ASF.

One of these changes is to our infrastructure, and is a move from manually managed and maintained web servers towards re-deployable, configuration managed servers that tend to themselves and rarely, if ever, require manual intervention. As such, we have started moving towards no longer manually fixing bugs that creep up on various project web sites, in particular setting the correct permission on files. This means that all projects are now required to check their download scripts and verify that the executable flag is set on these CGI files. If not, your download page will likely not work.

Whenever we receive an email from a user of an Apache project about an error on a project web site, we will forward this to the respective project, but we ask that projects take proactive measures and check their download scripts (and any other scripts they may have) to ensure that they have the right permissions set and work.

Projects using the CMS system will, for the time being, have to commit the execute bit changes directly to the staging repo for their site.
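
For projects that want to check their scripts locally, something along these lines may help. It is a sketch that assumes a POSIX filesystem; the `missing_execute_bit` helper, the `.cgi` suffix and the file names are illustrative rather than anything our servers require. It lists scripts that lack the execute bit and then sets it.

```python
import os
import stat
import tempfile


def missing_execute_bit(root, suffix=".cgi"):
    """Return paths under `root` ending in `suffix` that lack the owner execute bit."""
    offenders = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                path = os.path.join(dirpath, name)
                if not os.stat(path).st_mode & stat.S_IXUSR:
                    offenders.append(path)
    return sorted(offenders)


# Build a throwaway site tree with one broken and one correct script.
with tempfile.TemporaryDirectory() as site:
    for name, mode in (("download.cgi", 0o644), ("other.cgi", 0o755)):
        path = os.path.join(site, name)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("#!/bin/sh\n")
        os.chmod(path, mode)

    before = [os.path.basename(p) for p in missing_execute_bit(site)]

    # Fix the offenders by adding the execute bit for user, group and others.
    for path in missing_execute_bit(site):
        mode = os.stat(path).st_mode
        os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

    after = missing_execute_bit(site)

print("missing before:", before)
print("missing after: ", after)
```

The permission change still has to be committed to the site's repository for it to take effect on the published site.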

With regards,
Daniel on behalf of the Infrastructure Team.

Monday January 12, 2015

Downtime notice for the R/W git repositories


Please note that on Thursday 15th at 20:00 UTC the Infrastructure team
will be taking the read/write git repositories offline.  We expect
this migration to last about 4 hours.

During the outage the service will be migrated from an old host to a
new one.   We intend to keep the URL the same for access to the repos
after the migration, but an alternate name is already in place in case
DNS updates take too long.   Please be aware it might take some hours
after the completion of the downtime for GitHub to update and reflect
any changes.

The Infrastructure team have been trialling the new host for about a
week now, and [touch wood] have not had any problems with it.

The service is currently available by accessing repos via:

If you have any questions please address them to infrastructure@apache.org.

Friday November 21, 2014

MoinMoin Service - User Account Tidy Up

In recent months we have become increasingly aware of a slowing down of our MoinMoin wiki service.  We have attributed this, at least in part, to the way MoinMoin stores some data about user accounts.

Across all of our wiki instances (in the farm) we had a little over 1.08 million distinct user accounts, many of which have never been used (spam etc).  So we have decided to archive all users who have not accessed any of the wiki sites they were registered for in more than 128 days.
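
The selection rule can be sketched like this. It is not the script we ran against MoinMoin: the `split_accounts` function, the account names and the dates are invented for the example, and treating never-used accounts as archivable is an assumption. Only the 128-day threshold comes from the paragraph above.

```python
from datetime import datetime, timedelta

INACTIVITY_LIMIT = timedelta(days=128)


def split_accounts(last_access, now):
    """Partition accounts into (keep, archive) by time since last access.

    `last_access` maps account name -> datetime of last access, or None if never used.
    """
    keep, archive = [], []
    for name, seen in sorted(last_access.items()):
        if seen is None or now - seen > INACTIVITY_LIMIT:
            archive.append(name)
        else:
            keep.append(name)
    return keep, archive


now = datetime(2014, 11, 21)
accounts = {
    "active": datetime(2014, 11, 1),          # seen 20 days ago
    "borderline": now - timedelta(days=128),  # exactly at the limit, so kept
    "stale": datetime(2014, 1, 1),            # well past the limit
    "never": None,                            # registered but never used
}

keep, archive = split_accounts(accounts, now)
print("keep:", keep)
print("archive:", archive)
```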

This has resulted in us being able to archive a little over 800k users.  This leaves us with around 200k users across 77 wikis. This still feels very high, and in the coming weeks we will investigate how we can better determine whether the remaining accounts are making valid changes or are just link farm home pages.

If you think your account was affected by this, and you would like to have your account restored, then please contact the Infra team using this page http://www.apache.org/dev/infra-contact

ASF Infra Team

Monday October 06, 2014

Code signing service now available

The ASF Infrastructure team is pleased to announce the availability of a new code signing service for Java, Windows and Android applications. This service is available to any Apache project to use to sign their releases. Traditionally, Apache projects have shipped source code. The code tarballs are signed with a GPG signature to allow users and providers to verify the code's authenticity, but users have either compiled their own applications or some projects have provided convenience binaries. With projects like Apache OpenOffice, users expect to receive binaries that are ready to run. Today's desktop and mobile operating systems expect that binaries will be signed by the vendor -- which had left a gap to be filled for Apache projects.  

After a great deal of research, we have chosen Symantec's Secure App Service offering to provide code signing service. This allows us to granularly permit access; and each PMC will have their own certificate(s) for signing. The per-project nature of certificate issuance allows us to revoke a signature without disrupting other projects. 

This service will permit projects to sign artifacts either via a web GUI or a SOAP API. In addition, a Java client and an Ant task for signing have been written, and a Maven plugin is under development.

This service results in a 'pay for what you use' scenario, so PMCs are asked to use the service responsibly. To that end, projects will have access to a test environment to ensure that they have their process working correctly before consuming actual credits.

Thus far, we've had two projects help test this and work out the process, for which we are very grateful. Those projects, Commons and Tomcat, have successfully released signed artifacts recently (Commons Daemon 1.0.15 and Tomcat 8.0.14).

Projects that wish to use this service should open an Infra JIRA ticket under the Codesigning component. Further information for projects using the service is also maintained by the infra team.


