Apache Infrastructure Team

Wednesday Sep 02, 2009

apache.org incident report for 8/28/2009

Last week we posted about the security breach that caused us to temporarily suspend some services.  All services have now been restored. We have analyzed the events that led to the breach, and continued to work on improving the security of our systems.

NOTE: At no time were any Apache Software Foundation code repositories, downloads, or users put at risk by this intrusion. However, we believe that providing a detailed account of what happened will make the internet a better place, by allowing others to learn from our mistakes.

What Happened?

Our initial running theory was correct--the server that hosted the apachecon.com website (dv35.apachecon.com) had been compromised. The machine was running CentOS, and we suspect the attackers may have used the recent local root exploits patched in RHSA-2009-1222 to escalate their privileges on this machine. The attackers fully compromised this machine, including gaining root privileges, and destroyed most of the logs, making it difficult for us to confirm the details of everything that happened on the machine.

This machine is owned by the ApacheCon conference production company, not by the Apache Software Foundation. However, members of the ASF infrastructure team had accounts on this machine, including one used to create backups.

The attackers attempted unsuccessfully to use passwords from the compromised ApacheCon host to log on to our production webservers. Later, using the SSH key of the backup account, they were able to access people.apache.org (minotaur.apache.org). This account was an unprivileged user, used to create backups from the ApacheCon host.

minotaur.apache.org runs FreeBSD 7-STABLE, and acts as the staging machine for our mirror network. It is our primary shell account server, and provides many other services for Apache developers. None of our Subversion (version control) data is kept on this machine, and there was never any risk to any Apache source code.

Once the attackers had gained shell access, they added CGI scripts to the document root folders of several of our websites. A regular, scheduled rsync process copied these scripts to our production web server, eos.apache.org, where they became externally visible. The CGI scripts were used to obtain remote shells, with information sent using HTTP POST commands.

Our download pages are dynamically generated, to enable us to present users with a local mirror of our software. This means that all of our domains have ExecCGI enabled, making it harder for us to protect against an attack of this nature.

After discovering the CGI scripts, the infrastructure team decided to shut down any servers that could potentially have been affected. This included people.apache.org, and both the EU and US website servers. All website traffic was redirected to a known-good server, and a temporary security message was put in place to let people know we were aware of an issue.

One by one, we brought the potentially affected servers up in single-user mode, using our out-of-band access. It quickly became clear that aurora.apache.org, the EU website server, had not been affected. Although the CGI scripts had been rsync'd to that machine, they had never been run. This machine was not included in the DNS rotation at the time of the attack.

aurora.apache.org runs Solaris 10, and we were able to restore the box to a known-good configuration by cloning and promoting a ZFS snapshot from a day before the CGI scripts were synced over. Doing so enabled us to bring the EU server back online, and to rapidly restore our main websites. Thereafter, we continued to analyze the cause of the breach, the method of access, and which, if any, other machines had been compromised.
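
The clone-and-promote recovery described above can be sketched as a short sequence of `zfs` commands. The Python below only builds and prints those commands. The dataset, snapshot, and clone names are hypothetical placeholders, and the report does not say which datasets or exact commands were used, so treat this as an illustration rather than a record of what was run.

```python
import shlex


def restore_commands(dataset, snapshot, clone_name):
    """Build the zfs commands that clone a snapshot and promote the clone.

    All three names are hypothetical placeholders, not taken from the report.
    """
    snap = f"{dataset}@{snapshot}"
    return [
        # Create a writable clone of the known-good snapshot.
        ["zfs", "clone", snap, clone_name],
        # Promote the clone so it is no longer dependent on its origin snapshot.
        ["zfs", "promote", clone_name],
    ]


if __name__ == "__main__":
    for cmd in restore_commands("tank/www", "known-good", "tank/www-restored"):
        # Print only; nothing is executed against a real pool here.
        print(shlex.join(cmd))
```

Swapping the promoted clone into place (for example by renaming or remounting it) is left out, since that step depends on how the pool is laid out.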

Shortly after bringing up aurora.apache.org we determined that the most likely route of the breach was the backup routine from dv35.apachecon.com. We grabbed all the available logs from dv35.apachecon.com, and promptly shut it down.

Analysis continued on minotaur.apache.org and eos.apache.org (our US server), until we were confident that all remnants of the attackers had been removed. As each server was declared clean, it was brought back online.

What worked?

  • The use of ZFS snapshots enabled us to restore the EU production web server to a known-good state.
  • Redundant services in two locations allowed us to run services from an alternate location while continuing to work on the affected servers and services.
  • A non-uniform set of compromised machines (Linux/CentOS i386, FreeBSD-7 amd64, and Solaris 10 on sparc) made it difficult for the attackers to escalate privileges on multiple machines.

What didn't work?

  • The use of SSH keys facilitated this attack. In hindsight, our implementation left a lot to be desired--we did not restrict SSH keys appropriately, and we were unaware of their misuse.
  • The rsync setup, which uses people.apache.org to manage the deployment of our websites, enabled the attackers to get their files onto the US mirror, undetected.
  • The ability to run CGI scripts in any virtual host, when most of our websites do not need this functionality, made us unnecessarily vulnerable to an attack of this nature.
  • The lack of logs from the ApacheCon host prevents us from conclusively determining the full course of action taken by the attacker. All but one log file were deleted by the attacker, and logs were not kept off the machine.
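
One way to notice unexpected files before a scheduled sync pushes them to production is to compare the staging tree against a trusted manifest of file hashes. The sketch below is an assumption about what such a check could look like, not something the report describes. The function names are hypothetical, and the trusted manifest would need to be stored somewhere that users of the staging host cannot modify.

```python
import hashlib
import os


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            manifest[os.path.relpath(path, root)] = digest
    return manifest


def diff_manifests(trusted, current):
    """Return (added, changed) lists of relative paths, each sorted."""
    # Files present now that the trusted manifest has never seen.
    added = sorted(p for p in current if p not in trusted)
    # Files whose content no longer matches the trusted digest.
    changed = sorted(
        p for p in current if p in trusted and current[p] != trusted[p]
    )
    return added, changed
```

A deployment job could call `build_manifest` on the staging document root, compare the result with the trusted manifest using `diff_manifests`, and refuse to sync (or alert someone) when either list is non-empty.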

What changes are we making now?

As a result of this intrusion we are making several changes, to help further secure our infrastructure from such issues in the future. These changes include the following:
  • Requiring all users with elevated privileges to use OPIE for sudo on certain machines.  We already require this in some places, but will expand its use as necessary.
  • Recreating and using new SSH keys, one per host, for backups.  Also enforcing use of the from="" and command="" strings in the authorized_keys file on the destination backup server. In tandem with access restrictions which only allow connections from machines that are actually backing up data, this will prevent third-party machines from being able to establish an SSH connection.
    • The command="" string in the authorized_keys file is now explicit, and only allows one way rsync traffic, due to the paths and flags used.
    • New keys have been generated for all hosts, with a minimum key length of 4096 bits.
  • The VM that hosted the old apachecon.com site remains powered down, awaiting further detailed analysis.  The apachecon.com website has been re-deployed on a new VM, with a new provider and different operating system.
  • We are looking at disabling CGI support on most of our website systems.  This has led to the creation of a new httpd module that will handle things like mirror locations for downloads.
  • The method by which most of our public facing websites are deployed to our production servers will also change, becoming a much more automated process. We hope to have switched over to a SvnSubPub / SvnWcSub based system within the next few weeks.
  • We will re-implement measures such as IP banning after several failed logins, on all machines. 
  • A proposal has been made to introduce centralized logging. This would include all system logs, and possibly also services such as smtpd and httpd.
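
The from="" and command="" restrictions mentioned in the list above can be illustrated with a small helper that assembles one authorized_keys entry. This is a minimal sketch under stated assumptions: the source address, the forced command path, and the key text are hypothetical placeholders, and the report does not give the actual command string used for the rsync backups. The additional `no-*` options are standard OpenSSH authorized_keys options added here for illustration; the report does not say whether they were used.

```python
def authorized_keys_line(public_key, allowed_from, forced_command):
    """Return one authorized_keys entry restricted by source host and command.

    allowed_from and forced_command are hypothetical example values supplied
    by the caller; this sketch does not attempt to escape embedded quotes.
    """
    for value in (allowed_from, forced_command):
        if '"' in value or "\n" in value:
            raise ValueError("double quotes and newlines are not supported here")
    options = [
        f'from="{allowed_from}"',        # only accept this key from this source
        f'command="{forced_command}"',   # ignore the client's requested command
        "no-agent-forwarding",
        "no-port-forwarding",
        "no-pty",
        "no-X11-forwarding",
    ]
    return ",".join(options) + " " + public_key.strip()


if __name__ == "__main__":
    # Placeholder key material and names, for illustration only.
    print(
        authorized_keys_line(
            "ssh-rsa AAAAEXAMPLE backup@example",
            "192.0.2.10",
            "/usr/local/bin/backup-receive",
        )
    )
```

Restrictions like these only help if the account that owns the key cannot rewrite its own authorized_keys file.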



Comments:

It's worth noting that from= and command= in authorized_keys by itself cannot stand alone. It would be trivial for an attacker with access to the private key to generate and upload a revised authorized_keys. I recommend using the Match keyword in sshd_config to enforce command or origin restrictions - or use AuthorizedKeysFile to move user key files to somewhere they don't have write permission.

Posted by Alex Holst on September 02, 2009 at 10:21 AM UTC #

"We will re-implement measures such as IP banning after several failed logins, on all machines." Biggest bang for the buck.

Posted by Fred on September 02, 2009 at 05:15 PM UTC #

I swear it wasn't me

Posted by sam on September 03, 2009 at 12:13 PM UTC #

Have you thought about httpd checking GPG-signed CGI scripts?

Posted by Whatsif on September 03, 2009 at 04:25 PM UTC #

I don't know if this would work with all or any of your logging, but with MySQL, you get a storage engine called "Archive" which is great for logging. Everything is compressed and it only supports inserts and selects. You can also replicate it to a different machine for extra redundancy.

Posted by Casey on September 03, 2009 at 04:31 PM UTC #

There is no info about how they got into the CentOS machine. How did they get in? Greetlings from phrack.

Posted by mikeh on September 03, 2009 at 09:48 PM UTC #

@mikeh: Because they obtained root on the CentOS machine, we are not entirely sure; almost all logs on the machine were destroyed. The machine ran many stock web applications and may have had less than secure password practices -- but once they got root, whatever evidence of the initial hack was destroyed.

Posted by pquerna on September 04, 2009 at 12:33 AM UTC #

[Trackback] The Apache foundation experienced some downtime on August 28 when unauthorized access to their servers was detected. A few days ago, the Apache infrastructure team posted a very well-written post-incident report in which more details with respect to th...

Posted by Kees Leune on September 04, 2009 at 02:41 PM UTC #

Remote logging would have allowed log retrieval, if the logging server was not affected, of course.

Posted by Ugo Bellavance on September 04, 2009 at 05:34 PM UTC #

@pquerna Are you guys not shipping logs to a repository? You say the logs were destroyed, are they only stored locally?

Posted by nick on September 04, 2009 at 08:27 PM UTC #

The lesson here is not to use faux-"stable" RedHat, Debian, or any distro that fiddles around with stock kernel.org, pretending to know better than them, and to keep your kernel updated to the latest release at all times.

Posted by DistroWatcher on September 05, 2009 at 08:07 AM UTC #

You guys should definitely look at OSSEC to monitor these systems. With the centralized log analysis and integrity checking, most of the attacker's actions would have been detected way earlier.

Posted by Daniel Cid on September 05, 2009 at 06:08 PM UTC #

@nick: logs were only destroyed on the 3rd party host -- they did not escalate privileges on 'our' machines, hence no logs destroyed. Regardless, we are currently looking into ways of doing both centralized logging and local logs for syslog, so we have an archive of everything that happens.

Posted by pquerna on September 06, 2009 at 05:01 AM UTC #

The report indicates the machine was compromised due to a kernel exploit. But from what I read, this exploit only works with LOCAL access. How was this local access obtained? Was that the ssh part?

Posted by Joseph on September 08, 2009 at 07:03 PM UTC #
