Apache Infrastructure Team

Thursday Jul 26, 2012

New Infra Team Members

Since our last update over a year ago, the Infra Team has expanded by another NINE (9) members!

Congrats and our warmest thanks go to:


Niklas Gustavsson - (ngn)
Jeremy Thomerson - (jrthomerson)
Mark Struberg - (struberg)
Eric Evans - (eevans)
Brandon Williams - (brandonwilliams)
Mohammad Nour El-Din - (mnour)
David Nalley - (ke4qqq)
Yang Shih-Ching - (imacat)
Daniel Gruno - (humbedooh)

The rest of the Infra team look forward to continuing to work with you all.

There are now a total of 80 infrastructure members with another 36 in the infrastructure-interest group.

Thursday Sep 23, 2010

1 million commits and still going strong.

Yesterday, the main ASF SVN code repository passed the 1 million commit mark. Shortly thereafter one of the ASF members enquired as to how he could best grab the svn log entries for all of these commits. As always there were a bunch of useful replies, but they were all set to take quite some time, mainly because if anyone simply runs

svn log http://svn.apache.org/repos/asf -r1:1000000

it will not only take several hours, it will also put a high load on one of the two geo-balanced SVN servers. Requesting that many log entries will also likely result in your IP address being banned.

So I decided to create the log set locally on one of the SVN servers. This is now available for download [http://s.apache.org/1m-svnlog] [md5]
This is a 50MB tar/gz file that uncompresses to about 240MB. The log 'only' contains the log entries for revisions 1 to 1,000,000. If you want the rest you can run:

svn log http://svn.apache.org/repos/asf -r1000001:HEAD

This will give you all the log entries from revision 1,000,001 to the current revision.
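
Once the archive is unpacked, the log can be processed locally without touching the SVN servers at all. The following is a minimal sketch, assuming the file contains the standard plain-text output of svn log (72-dash separator lines, then a header of the form "rN | author | date | N lines"). The names parse_svn_log and LogEntry are made up for this illustration and are not part of any ASF tooling.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Plain-text `svn log` output separates entries with a line of 72 dashes.
SEPARATOR = "-" * 72
# Header line, e.g. "r42 | alice | 2010-09-22 ... | 3 lines".
HEADER = re.compile(r"^r(\d+) \| (.*?) \| (.*?) \| (\d+) lines?$")


class LogEntry(NamedTuple):
    revision: int
    author: str
    date: str
    message: str


def parse_svn_log(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield one LogEntry per commit found in plain-text `svn log` output."""
    it = iter(lines)
    for raw in it:
        match = HEADER.match(raw.rstrip("\n"))
        if match is None:
            continue  # separator lines and anything unexpected are skipped
        revision, author, date, count = match.groups()
        next(it, None)  # blank line between the header and the message
        # Trust the line count from the header, so a message that happens
        # to contain a row of dashes is not mistaken for a separator.
        body = [next(it, "").rstrip("\n") for _ in range(int(count))]
        yield LogEntry(int(revision), author, date, "\n".join(body))
```

With a real file, something like parse_svn_log(open(path, encoding="utf-8", errors="replace")) would stream the entries one at a time, where path is wherever you unpacked the log; the file name inside the archive is not stated here.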

Thursday Mar 04, 2010

New slaves for ASF Buildbot

The ASF Buildbot CI instance has just launched two more slaves, expanding the range of platforms it can build and test on.

Monday Feb 22, 2010

LDAP, groups and SVN - Coupled together

The infrastructure team have now completed the next stage of the planned LDAP migration.
We have migrated our old SVN authorisation file and POSIX groups into LDAP data. SVN access control is now managed using these groups.

This means that changing access to the Subversion repositories is now as simple as changing group membership. We use some custom Perl scripts that build the equivalent authorisation file, meaning that we don't need to resort to the nasty hack of <location> blocks to do this. It also means that all changes, including adding new groups and extending access control, are made simple.
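
To give a feel for what "building the equivalent authorisation file" involves, here is a rough sketch in Python. It is not the Perl the team actually runs: the LDAP lookup is left out entirely (the groups arrive as a plain dictionary), and the function name, group name and path are invented for the example. Only the output layout, a [groups] section followed by per-path sections, follows Subversion's path-based authorisation file format.

```python
from typing import List, Mapping


def render_authz(groups: Mapping[str, List[str]],
                 rules: Mapping[str, Mapping[str, str]]) -> str:
    """Render a Subversion path-based authz file from group data.

    groups maps a group name to its member user names.
    rules maps a repository path to {principal: access}, where a
    principal is a user name, "@group", or "*" for everyone.
    """
    out = ["[groups]"]
    for name in sorted(groups):
        members = ", ".join(sorted(groups[name]))
        out.append(f"{name} = {members}")
    for path in sorted(rules):
        out.append("")  # blank line between sections
        out.append(f"[{path}]")
        for principal, access in rules[path].items():
            out.append(f"{principal} = {access}")
    return "\n".join(out) + "\n"


# Hypothetical data standing in for what would be read from LDAP.
example = render_authz(
    {"example-committers": ["bob", "alice"]},
    {"/example": {"@example-committers": "rw", "*": "r"}},
)
```

Because the file is regenerated from group data each time, adding someone to a group is the only change a person has to make by hand.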

ASF PMC chairs are now able to make changes to their POSIX and SVN groups whilst logged into people.apache.org, using a selection of scripts:

  • /usr/local/bin/list_unix_groups.pl
  • /usr/local/bin/list_committees.pl
  • /usr/local/bin/modify_unix_groups.pl
  • /usr/local/bin/modify_committees.pl

All of these scripts have a '--help' option to show you how to use them.

What's next?  We are now working on adding a custom ASF LDAP schema that will allow us to record ASF-specific data such as ICLA files and date of membership.
We will also be looking at adding support for 3rd party applications such as Hudson, and building an identity management portal where people can manage their own account.

Wednesday Sep 02, 2009

apache.org incident report for 8/28/2009

Last week we posted about the security breach that caused us to temporarily suspend some services.  All services have now been restored. We have analyzed the events that led to the breach, and continued to work on improving the security of our systems.

NOTE: At no time were any Apache Software Foundation code repositories, downloads, or users put at risk by this intrusion. However, we believe that providing a detailed account of what happened will make the internet a better place, by allowing others to learn from our mistakes.

What Happened?

Our initial running theory was correct--the server that hosted the apachecon.com (dv35.apachecon.com) website had been compromised. The machine was running CentOS, and we suspect the attackers may have used the recent local root exploits patched in RHSA-2009-1222 to escalate their privileges on this machine. The attackers fully compromised this machine, including gaining root privileges, and destroyed most of the logs, making it difficult for us to confirm the details of everything that happened on the machine.

This machine is owned by the ApacheCon conference production company, not by the Apache Software Foundation. However, members of the ASF infrastructure team had accounts on this machine, including one used to create backups.

The attackers attempted unsuccessfully to use passwords from the compromised ApacheCon host to log on to our production webservers.  Later, using the SSH key of the backup account, they were able to access people.apache.org (minotaur.apache.org). This account was an unprivileged user, used to create backups from the ApacheCon host.

minotaur.apache.org runs FreeBSD 7-STABLE, and acts as the staging machine for our mirror network. It is our primary shell account server, and provides many other services for Apache developers. None of our Subversion (version control) data is kept on this machine, and there was never any risk to any Apache source code.

Once the attackers had gained shell access, they added CGI scripts to the document root folders of several of our websites. A regular, scheduled rsync process copied these scripts to our production web server, eos.apache.org, where they became externally visible. The CGI scripts were used to obtain remote shells, with information sent using HTTP POST commands.

Our download pages are dynamically generated, to enable us to present users with a local mirror of our software. This means that all of our domains have ExecCGI enabled, making it harder for us to protect against an attack of this nature.

After discovering the CGI scripts, the infrastructure team decided to shut down any servers that could potentially have been affected. This included people.apache.org, and both the EU and US website servers. All website traffic was redirected to a known-good server, and a temporary security message was put in place to let people know we were aware of an issue.

One by one, we brought the potentially-affected servers up, in single-user mode, using our out-of-band access. It quickly became clear that aurora.apache.org, the EU website server, had not been affected. Although the CGI scripts had been rsync'd to that machine, they had never been run. This machine was not included in the DNS rotation at the time of the attack.

aurora.apache.org runs Solaris 10, and we were able to restore the box to a known-good configuration by cloning and promoting a ZFS snapshot from a day before the CGI scripts were synced over. Doing so enabled us to bring the EU server back online, and to rapidly restore our main websites. Thereafter, we continued to analyze the cause of the breach, the method of access, and which, if any, other machines had been compromised.

Shortly after bringing up aurora.apache.org we determined that the most likely route of the breach was the backup routine from dv35.apachecon.com. We grabbed all the available logs from dv35.apachecon.com, and promptly shut it down.

Analysis continued on minotaur.apache.org and eos.apache.org (our US server), until we were confident that all remnants of the attackers had been removed. As each server was declared clean, it was brought back online.

What worked?

  • The use of ZFS snapshots enabled us to restore the EU production web server to a known-good state.
  • Redundant services in two locations allowed us to run services from an alternate location while continuing to work on the affected servers and services.
  • A non-uniform set of compromised machines (Linux/CentOS i386, FreeBSD-7 amd64, and Solaris 10 on sparc) made it difficult for the attackers to escalate privileges on multiple machines.

What didn't work?

  • The use of SSH keys facilitated this attack. In hindsight, our implementation left a lot to be desired--we did not restrict SSH keys appropriately, and we were unaware of their misuse.
  • The rsync setup, which uses people.apache.org to manage the deployment of our websites, enabled the attackers to get their files onto the US mirror, undetected.
  • The ability to run CGI scripts in any virtual host, when most of our websites do not need this functionality, made us unnecessarily vulnerable to an attack of this nature.
  • The lack of logs from the ApacheCon host prevents us from conclusively determining the full course of action taken by the attacker. All but one log file were deleted by the attacker, and logs were not kept off the machine.

What changes are we making now?

As a result of this intrusion we are making several changes, to help further secure our infrastructure from such issues in the future. These changes include the following:
  • Requiring all users with elevated privileges to use OPIE for sudo on certain machines.  We already require this in some places, but will expand its use as necessary.
  • Recreating and using new SSH keys, one per host, for backups.  Also enforcing use of the from="" and command="" strings in the authorized_keys file on the destination backup server. In tandem with access restrictions which only allow connections from machines that are actually backing up data, this will prevent 3rd party machines from being able to establish an SSH connection.
    • The command="" string in the authorized_keys file is now explicit, and only allows one-way rsync traffic, due to the paths and flags used.
    • New keys have been generated for all hosts, with a minimum key length of 4096 bits.
  • The VM that hosted the old apachecon.com site remains powered down, awaiting further detailed analysis.  The apachecon.com website has been re-deployed on a new VM, with a new provider and different operating system.
  • We are looking at disabling CGI support on most of our website systems.  This has led to the creation of a new httpd module that will handle things like mirror locations for downloads.
  • The method by which most of our public facing websites are deployed to our production servers will also change, becoming a much more automated process. We hope to have switched over to a SvnPubSub / SvnWcSub based system within the next few weeks.
  • We will re-implement measures such as IP banning after several failed logins, on all machines. 
  • A proposal has been made to introduce centralized logging. This would include all system logs, and possibly also services such as smtpd and httpd.
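
For readers unfamiliar with the from="" and command="" restrictions mentioned in the list above, the sketch below shows the shape of a restricted authorized_keys entry. It is an illustration only: the helper name, the address 192.0.2.10, the wrapper path /usr/local/bin/backup-only and the truncated key are all made up, and the forced command we actually use (an explicit rsync invocation) is not reproduced here.

```python
def restricted_key_line(public_key: str, allowed_host: str,
                        forced_command: str) -> str:
    """Build one authorized_keys entry limited to a host and a command.

    OpenSSH reads the options as a comma-separated list placed before
    the key itself, with no spaces between the options.
    """
    for value in (allowed_host, forced_command):
        # Keep the sketch simple: refuse values that would need escaping.
        if '"' in value or "\n" in value:
            raise ValueError("quotes and newlines are not supported here")
    options = [
        f'from="{allowed_host}"',       # only this source may connect
        f'command="{forced_command}"',  # always run this, whatever is asked
    ]
    return f"{','.join(options)} {public_key.strip()}"


# Hypothetical values throughout; the key material is deliberately fake.
line = restricted_key_line(
    "ssh-rsa AAAAexample backup@example-host",
    "192.0.2.10",
    "/usr/local/bin/backup-only",
)
```

With an entry like this, a stolen private key is only useful from the listed address, and even then it can only trigger the one forced command rather than open a shell.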



Tuesday Jul 07, 2009

Confluence 2.10 migration for cwiki.a.o 11 July

The ASF Infrastructure Team will be upgrading the Confluence instance powering http://cwiki.apache.org from Confluence 2.2.9 to Confluence 2.10.3 on July 11 at 0400 UTC, or July 10 at 2100 PDT.  The migration is expected to take several hours.

If you haven't already, this would be a good time to check the test migration instance at:

http://confluence-test.zones.apache.org:8080

Exported pages can be found at http://confluence-test.zones.apache.org:8080/export/SPACE_KEY/PAGE_TITLE.html. If in doubt, start from your existing exported page at http://cwiki.apache.org/. For example:

http://cwiki.apache.org/WW/home.html

will become

http://confluence-test.zones.apache.org:8080/export/WW/home.html
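
If you have many pages to check, the mapping above is mechanical enough to script. A small sketch, assuming every exported page follows the same pattern as the WW example; the function name is invented for this illustration.

```python
from urllib.parse import urlsplit

# Base of the test instance's export area, taken from the URLs above.
TEST_EXPORT_BASE = "http://confluence-test.zones.apache.org:8080/export"


def preview_export_url(cwiki_url: str) -> str:
    """Map an existing cwiki.apache.org export URL to the test instance."""
    parts = urlsplit(cwiki_url)
    if parts.netloc != "cwiki.apache.org":
        raise ValueError(f"not a cwiki.apache.org URL: {cwiki_url}")
    # The path (/SPACE_KEY/PAGE_TITLE.html) is carried over unchanged.
    return TEST_EXPORT_BASE + parts.path
```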

As much as possible, the space export templates will be preserved in the migration, although changes to the Confluence UI will mean the exports will look different.

Further updates regarding the Confluence 2.10.3 migration will be posted to this blog.

Update 11-07-2009

The Confluence 2.10.3 upgrade has been completed and all spaces have been exported.  There are a few things to note:

  1. The Gliffy license is out of date.  I'll try to track down a new one.
  2. The visibility plugin doesn't support Confluence 2.10.3.  Not sure if anyone uses it, however.
  3. The exported html, as warned, generally looks a bit different.  Let me know if you have any issues tweaking your template.

Update 11-07-2009 part 2

If, for some reason, your templates didn't get copied over or the exported site is so messed up you need the old version, the old files are available:

Update 14-07-2009

The Gliffy folks were kind enough to give us a new license.  Please re-export any applicable spaces.

Wednesday Mar 25, 2009

Slow SVN Service This Week

In preparation for upgrading Subversion to the latest version (1.6.0), we are running an svn dump on svn.apache.org.  This will chew up enough disk I/O to be noticeable to svn users.  We expect the dump to finish sometime during this weekend.
 

Monday Mar 23, 2009

Roller installed for use by Apache Projects

blogs.apache.org open for ASF Projects
