Apache Infrastructure Team
What can the ASF Buildbot do for your project?
The information below has just been published to the main ASF Buildbot URI, ci.apache.org/buildbot.html.
A summary of just some of the things the ASF Buildbot can do for your project:
- Perform per commit build & test runs for your project
- Not just svn! - Buildbot can pull in from your Git/Mercurial branches too!
- Build and Deploy your website to a staging area for review
- Build and Deploy your website to mino (people) for syncing live
- Automatically Build and Deploy Snapshots to Nexus staging area.
- Create Nightly and historical zipped/tarred snapshot builds for download
- Builds can be triggered manually from within your own freenode #IRC Channel
- An IRCBot can report on success/failures of a build instantly
- Build Success/Failures can go to your dev/notification mailing list
- Perform multiple builds of an svn/git commit on multiple platforms asynchronously
- ASF Buildbot uses the latest RAT build to check for license header issues for all your files.
- RAT Reports are published live instantly to ci.apache.org/$project/rat-report.[txt|html]
- As indicated above, plain text or html versions of RAT reports are published.
- [Coming Soon] - RAT Reports sent to your dev list, only new failures will be listed.
- [Coming Soon] - Email a patch with inserted ASL 2.0 Headers into your failed files!!
- Currently Buildbot has Ubuntu 8.04, 9.04 and Windows Server 2008 Slaves
- [Coming Soon] - ASF Buildbot will soon have Solaris, FreeBSD 8 and Windows 7 Slaves
Don't see a feature that you need? Join the builds.at.apache.org mailing list and request it now, or file a Jira Ticket.
Help is always on hand on the builds.at.apache.org mailing list for any problems or build configuration issues/requests. Or try the #asftest channel on irc.freenode.net for live support.
So now you want your project to use Buildbot? No problem: the best way is to file a Jira Ticket and count to 10 (well, maybe a bit longer, but it won't be long before you are up and running).
Posted at 01:01PM Nov 09, 2009 by administrator in General |
DDOS mystery involving Linux and mod_ssl
In the first week of October we started getting reports of performance issues, mainly connection timeouts, on all of our services hosted at https://issues.apache.org/. On further inspection we noticed a huge amount of "Browser disconnect" errors in the error log right at the beginning of the ssl transaction, on the order of 50 connections / second. This was grinding the machine to a standstill, so we wrote a quick and dirty Perl script to investigate the matter. Initial reports indicated a ddos attack from nearly 100K machines targeting Apache + mod_ssl's accept loop, and the script was tweaked to filter out that traffic before proxying the connections to httpd.
As we started getting a picture of the IP space conducting the attack, the prognosis looked rather bleak: more and more IP's were getting involved and the ddos traffic continued to increase, getting to the point where Linux was shutting down the ethernet interface. So we then rerouted the traffic to an available FreeBSD machine, which did a stellar job of filtering out the traffic at the kernel level. We unfortunately didn't quite realize how good a job FreeBSD was doing, and for a time we were operating under the impression that the ddos was ending. So we eventually moved the traffic back to brutus, the original Linux host, and patched httpd using code developed by Ruediger Pluem.
And back came the ddos traffic. In a few days the rate of closed connections had nearly doubled, so we had little choice but to start dumping the most frequent IP addresses into iptables DROP rules. 5000 rules cut the traffic by 2/3 in an instant. But the problem was growing: our logs indicated there were now over 300K addresses participating in the attack.
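The post does not say how the DROP rules were produced. As a minimal sketch, assuming the client addresses have already been extracted from the "Browser disconnect" log lines, the "most frequent addresses" step could look like the following. The sample addresses, the function names, and the choice of the INPUT chain are illustrative assumptions, not details from the incident.

```python
from collections import Counter


def top_offenders(client_ips, limit=5000):
    """Return up to `limit` addresses, most frequent first."""
    return [ip for ip, _count in Counter(client_ips).most_common(limit)]


def drop_rules(addresses):
    """Render one iptables DROP rule per source address (INPUT chain assumed)."""
    return [f"iptables -A INPUT -s {ip} -j DROP" for ip in addresses]


# Hypothetical sample using documentation-range addresses.
sample = [
    "192.0.2.1", "192.0.2.7", "192.0.2.1",
    "198.51.100.3", "192.0.2.1", "192.0.2.7",
]

for rule in drop_rules(top_offenders(sample, limit=2)):
    print(rule)
```

Capping the list (5000 in the post) keeps the rule set bounded, at the cost of letting less frequent addresses through.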
We started looking closer at the IP's in an attempt to correlate them with regular http requests. The only pattern that seemed to emerge was that many of the IP's in question were also generating spartan "GET / HTTP/1.1" requests with a single Host: 140.211.11.140 header to port 443. Backtracking through a year of logs revealed that these spartan requests had been going on since August 6, 2008. The IP's originating these requests were as varied as, and more often than not matched up with, the rapid closed connection traffic we started seeing in October.
So what exactly is going on here? The closed connection traffic continues to rise, and the origin of the associated spartan requests is currently unknown.
Posted at 01:53AM Oct 12, 2009 by joes in General | Comments[1]
apache.org incident report for 8/28/2009
Last week we posted about the security breach that caused us to temporarily suspend some services. All services have now been restored. We have analyzed the events that led to the breach, and continued to work on improving the security of our systems.
NOTE: At no time were any Apache Software Foundation code repositories, downloads, or users put at risk by this intrusion. However, we believe that providing a detailed account of what happened will make the internet a better place, by allowing others to learn from our mistakes.
What Happened?
Our initial running theory was correct--the server that hosted the apachecon.com (dv35.apachecon.com) website had been compromised. The machine was running CentOS, and we suspect they may have used the recent local root exploits patched in RHSA-2009-1222 to escalate their privileges on this machine. The attackers fully compromised this machine, including gaining root privileges, and destroyed most of the logs, making it difficult for us to confirm the details of everything that happened on the machine.
This machine is owned by the ApacheCon conference production company, not by the Apache Software Foundation. However, members of the ASF infrastructure team had accounts on this machine, including one used to create backups.
The attackers attempted unsuccessfully to use passwords from the compromised ApacheCon host to log on to our production webservers. Later, using the SSH Key of the backup account, they were able to access people.apache.org (minotaur.apache.org). This account was an unprivileged user, used to create backups from the ApacheCon host.
minotaur.apache.org runs FreeBSD 7-STABLE, and acts as the staging machine for our mirror network. It is our primary shell account server, and provides many other services for Apache developers. None of our Subversion (version control) data is kept on this machine, and there was never any risk to any Apache source code.
Once the attackers had gained shell access, they added CGI scripts to the document root folders of several of our websites. A regular, scheduled rsync process copied these scripts to our production web server, eos.apache.org, where they became externally visible. The CGI scripts were used to obtain remote shells, with information sent using HTTP POST commands.
Our download pages are dynamically generated, to enable us to present users with a local mirror of our software. This means that all of our domains have ExecCGI enabled, making it harder for us to protect against an attack of this nature.
After discovering the CGI scripts, the infrastructure team decided to shut down any servers that could potentially have been affected. This included people.apache.org, and both the EU and US website servers. All website traffic was redirected to a known-good server, and a temporary security message was put in place to let people know we were aware of an issue.
One by one, we brought the potentially-affected servers up, in single user mode, using our out of band access. It quickly became clear that aurora.apache.org, the EU website server, had not been affected. Although the CGI scripts had been rsync'd to that machine, they had never been run. This machine was not included in the DNS rotation at the time of the attack.
aurora.apache.org runs Solaris 10, and we were able to restore the box to a known-good configuration by cloning and promoting a ZFS snapshot from a day before the CGI scripts were synced over. Doing so enabled us to bring the EU server back online, and to rapidly restore our main websites. Thereafter, we continued to analyze the cause of the breach, the method of access, and which, if any, other machines had been compromised.
Shortly after bringing up aurora.apache.org we determined that the most likely route of the breach was the backup routine from dv35.apachecon.com. We grabbed all the available logs from dv35.apachecon.com, and promptly shut it down.
Analysis continued on minotaur.apache.org and eos.apache.org (our US server), until we were confident that all remnants of the attackers had been removed. As each server was declared clean, it was brought back online.
What worked?
- The use of ZFS snapshots enabled us to restore the EU production web server to a known-good state.
- Redundant services in two locations allowed us to run services from an alternate location while continuing to work on the affected servers and services.
- A non-uniform set of compromised machines (Linux/CentOS i386, FreeBSD-7 amd64, and Solaris 10 on sparc) made it difficult for the attackers to escalate privileges on multiple machines.
What didn't work?
- The use of SSH keys facilitated this attack. In hindsight, our implementation left a lot to be desired--we did not restrict SSH keys appropriately, and we were unaware of their misuse.
- The rsync setup, which uses people.apache.org to manage the deployment of our websites, enabled the attackers to get their files onto the US mirror, undetected.
- The ability to run CGI scripts in any virtual host, when most of our websites do not need this functionality, made us unnecessarily vulnerable to an attack of this nature.
- The lack of logs from the ApacheCon host prevents us from conclusively determining the full course of action taken by the attacker. All but one log file were deleted by the attacker, and logs were not kept off the machine.
What changes are we making now?
As a result of this intrusion we are making several changes, to help further secure our infrastructure from such issues in the future. These changes include the following:
- Requiring all users with elevated privileges to use OPIE for sudo on certain machines. We already require this in some places, but will expand its use as necessary.
- Recreating and using new SSH keys, one per host, for backups. Also enforcing use of the from="" and command="" strings in the authorized_keys file on the destination backup server. In tandem with access restrictions which only allow connections from machines that are actually backing up data, this will prevent 3rd party machines from being able to establish an SSH connection.
- The command="" string in the authorized_keys file is now explicit, and only allows one way rsync traffic, due to the paths and flags used.
- New keys have been generated for all hosts, with a minimum key length of 4096 bits.
- The VM that hosted the old apachecon.com site remains powered down, awaiting further detailed analysis. The apachecon.com website has been re-deployed on a new VM, with a new provider and different operating system.
- We are looking at disabling CGI support on most of our website systems. This has led to the creation of a new httpd module that will handle things like mirror locations for downloads.
- The method by which most of our public facing websites are deployed to our production servers will also change, becoming a much more automated process. We hope to have switched over to a SvnPubSub / SvnWcSub based system within the next few weeks.
- We will re-implement measures such as IP banning after several failed logins, on all machines.
- A proposal has been made to introduce centralized logging. This would include all system logs, and possibly also services such as smtpd and httpd.
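To illustrate the from="" and command="" restrictions mentioned in the list above, here is a minimal sketch of how one restricted authorized_keys line could be assembled. The host name, the wrapper script path, and the key material are hypothetical placeholders. The report does not show the actual entries or the rsync paths and flags they use.

```python
def backup_key_entry(allowed_host, forced_command, public_key):
    """Format one authorized_keys line limited to a source host and one command."""
    # sshd allows a double quote inside an option value if it is backslash-escaped.
    escaped = forced_command.replace('"', '\\"')
    options = f'from="{allowed_host}",command="{escaped}"'
    return f"{options} {public_key}"


# Hypothetical values, for illustration only.
entry = backup_key_entry(
    allowed_host="backup-client.example.org",
    forced_command="/usr/local/bin/rsync-backup-only",
    public_key="ssh-rsa AAAAexamplekeydata backup@backup-client",
)
print(entry)
```

With both options set, a key copied to another machine is rejected by the from="" check, and a login from the permitted host can only run the one forced command.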
Posted at 08:56AM Sep 02, 2009 by pctony in Status | Comments[14]
apache.org downtime - initial report
This is a short overview of what happened on Friday August 28 2009 to the apache.org services. A more detailed post will come at a later time after we complete the audit of all machines involved.
On August 27th, starting at about 18:00 UTC, an account used for automated backups for the ApacheCon website hosted on a 3rd party hosting provider was used to upload files to minotaur.apache.org. The account was accessed using SSH key authentication from this host.
To the best of our knowledge at this time, no end users were affected by this incident, and the attackers were not able to escalate their privileges on any machines.
While we have no evidence that downloads were affected, users are always advised to check digital signatures where provided.
minotaur.apache.org runs FreeBSD 7-STABLE and is more widely known as people.apache.org. Minotaur serves as the seed host for most apache.org websites, in addition to providing shell accounts for all Apache committers.
The attackers created several files in the directory containing files for www.apache.org, including several CGI scripts. These files were then rsynced to our production webservers by automated processes. At about 07:00 on August 28 2009 the attackers accessed these CGI scripts over HTTP, which spawned processes on our production web services.
At about 07:45 UTC we noticed these rogue processes on eos.apache.org, the Solaris 10 machine that normally serves our websites.
Within the next 10 minutes we decided to shut down all machines involved as a precaution.
After an initial investigation we changed DNS for most apache.org services to eris.apache.org, a machine not affected, and provided a basic downtime message.
After investigation, we determined that our European failover and backup machine, aurora.apache.org, was not affected. While some files had been copied to the machine by automated rsync processes, none of them were executed on the host, and we restored from a ZFS snapshot to a version of all our websites before any accounts were compromised.
At this time several machines remain offline, but most user facing websites and services are now available.
We will provide more information as we can.
Posted at 12:33PM Aug 28, 2009 by pquerna in General | Comments[24]
Relaying mail from apache.org.
One of the more common issues committers face at Apache is in trying to send mail from their apache.org account. We've just made that process a whole lot easier by setting up an SSL-enabled, smtp-auth based mail submission service on people.apache.org port 465, which is compatible with gmail's recently announced feature to allow outbound mail from your apache.org address to be directed to people.apache.org, instead of to a gmail server, for delivery. Say goodbye to all the ezmlm moderation battles: your SMTP envelope sender will now match your From header!
In the future we may wish to tighten up the SPF records for apache.org, so please take advantage of this new service for all outbound delivery of your personal apache.org email.
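For committers who send mail from their own scripts instead of gmail, a minimal sketch of a submission over SSL with SMTP AUTH might look like this. The host and port come from the post. The addresses, credentials, and function names are placeholders, and whether your username is the bare availid is an assumption to check against the service documentation.

```python
import smtplib
from email.message import EmailMessage

SUBMISSION_HOST = "people.apache.org"  # from the post
SUBMISSION_PORT = 465                  # from the post


def build_message(sender, recipient, subject, body):
    """Build a plain-text message whose From header is the apache.org address."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_apache(msg, username, password):
    """Submit the message over SSL with SMTP AUTH (not invoked in this sketch)."""
    with smtplib.SMTP_SSL(SUBMISSION_HOST, SUBMISSION_PORT) as server:
        server.login(username, password)
        server.send_message(msg)


# Placeholder addresses; nothing is sent until send_via_apache() is called.
message = build_message(
    "availid@apache.org", "dev@example.org", "Test", "Hello from my apache.org address."
)
print(message["From"])
```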
Posted at 12:24PM Aug 01, 2009 by joes in General |
Public Preview of Drafts feature added to ASF Roller instance
Previously, to be able to preview a draft post by any Roller Blog, one had to be a member user of that blog.
For those who would like an easy way to post previews of drafts for lazy consensus or voting, a script has been set up to allow the preview URL that Roller generates to be shared publicly. For example:
(roller preview url)
https://blogs.apache.org/roller-ui/authoring/preview/test/?previewEntry=testing
(public preview url)
https://blogs.apache.org/preview/test/?previewEntry=testing
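The change between the two URLs above is a prefix swap. A small sketch of that rewrite follows; the function and constant names are made up for illustration.

```python
ROLLER_PREFIX = "https://blogs.apache.org/roller-ui/authoring/preview/"
PUBLIC_PREFIX = "https://blogs.apache.org/preview/"


def public_preview_url(roller_url):
    """Turn a Roller-generated preview URL into its publicly shareable form."""
    if not roller_url.startswith(ROLLER_PREFIX):
        raise ValueError(f"not a Roller preview URL: {roller_url}")
    return PUBLIC_PREFIX + roller_url[len(ROLLER_PREFIX):]


print(public_preview_url(
    "https://blogs.apache.org/roller-ui/authoring/preview/test/?previewEntry=testing"
))
```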
A typical process is to create the blog post, set it up to publish in 3-4 days via the "Advanced Settings", then post the modified preview URL to your dev@ list with the anticipated publish date for lazy consensus.
Projects must opt-in by adding the "preview" user with "Limited" access.
Details here:
http://www.apache.org/dev/blogs.html
Posted at 06:59AM Jul 15, 2009 by administrator in General |
Confluence 2.10 migration for cwiki.a.o 11 July
The ASF Infrastructure Team will be upgrading the Confluence instance powering http://cwiki.apache.org from Confluence 2.2.9 to Confluence 2.10.3 on July 11 at 0400 UTC, or July 10 at 2100 PST. The migration is expected to take several hours.
If you haven't already, this would be a good time to check the test migration instance at:
http://confluence-test.zones.apache.org:8080
Exported pages can be found at http://confluence-test.zones.apache.org:8080/export/SPACE_KEY/PAGE_TITLE.html. If in doubt, find your existing exported pages at http://cwiki.apache.org/, so:
http://cwiki.apache.org/WW/home.html
will become
http://confluence-test.zones.apache.org:8080/export/WW/home.html
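The mapping shown above keeps the space key and page path and changes only the host and the /export prefix. A minimal sketch, with hypothetical function and constant names:

```python
from urllib.parse import urlsplit

TEST_EXPORT_BASE = "http://confluence-test.zones.apache.org:8080/export"


def migrated_export_url(cwiki_url):
    """Map an existing cwiki export URL to its location on the test instance."""
    parts = urlsplit(cwiki_url)
    if parts.netloc != "cwiki.apache.org":
        raise ValueError(f"not a cwiki.apache.org URL: {cwiki_url}")
    return TEST_EXPORT_BASE + parts.path


print(migrated_export_url("http://cwiki.apache.org/WW/home.html"))
```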
As much as possible, the space export templates will be preserved in the migration, although changes to the Confluence UI will mean the exports will look different.
Further updates with regards to the Confluence 2.10.3 migration will be posted to this blog.
Update 11-07-2009
The Confluence 2.10.3 upgrade has been completed and all spaces have been exported. There are a few things to note:
- The Gliffy license is out of date. I'll try to track down a new one.
- The visibility plugin doesn't support Confluence 2.10.3. Not sure if anyone uses it, however.
- The exported html, as warned, generally looks a bit different. Let me know if you have any issues tweaking your template.
Update 11-07-2009 part 2
If, for some reason, your templates didn't get copied over or the exported site is so messed up you need the old version, the old files are available:
- Autoexport templates - http://cwiki.apache.org/autoexport-2.2.9-templates
- Autoexport-generated html - http://cwiki.apache.org/autoexport-2.2.9
Update 14-07-2009
The Gliffy folks were kind enough to give us a new license. Please re-export any applicable spaces.
Posted at 07:04AM Jul 07, 2009 by mrdon in Status |
It's official, we now have LDAP running!
Earlier this week the Infrastructure team rolled out phase one of the planned LDAP services.
We are using LDAP for authentication of shell accounts. For now this is the extent of the implementation, however the next phase should follow this quite quickly.
The next phase will involve moving to LDAP to manage access to our subversion repositories. This is a slightly more complicated migration as we currently use an SVNAuthz file that contains the appropriate groups and their memberships. We are currently working on a new template system whereby changes to LDAP will trigger a build of the SVNAuthz file based on groups in LDAP. This means we must watch LDAP changes, work on a template system, and if a new version of the template is checked into Subversion we need to trigger a build again. This is a work in progress at the moment.
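As a rough sketch of the "build the SVNAuthz file from groups" step, the following renders a Subversion authz file from plain dictionaries. The LDAP lookup and the template mechanism are not shown. The group name, members, and path are hypothetical, and the real system may be structured quite differently.

```python
def render_authz(groups, rules):
    """Render an authz file: a [groups] section followed by per-path rules.

    groups: {group_name: [member, ...]}   (would come from LDAP)
    rules:  {path: {group_name: access}}  (would come from the template)
    """
    lines = ["[groups]"]
    for name in sorted(groups):
        lines.append(f"{name} = {', '.join(sorted(groups[name]))}")
    for path in sorted(rules):
        lines.append("")
        lines.append(f"[{path}]")
        for group, access in sorted(rules[path].items()):
            lines.append(f"@{group} = {access}")
    return "\n".join(lines) + "\n"


# Hypothetical data standing in for an LDAP query result and a template.
example_groups = {"example-committers": ["bob", "alice"]}
example_rules = {"/example": {"example-committers": "rw"}}
print(render_authz(example_groups, example_rules), end="")
```

Sorting groups, members, and paths keeps the output stable, so a rebuild triggered by an unrelated LDAP change produces no spurious differences.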
If you need to change your shell account password, you can do so by running "ldappasswd -W -S -A -D uid=availid,ou=people,dc=apache,dc=org" on the command line, where availid is your ASF username. For example: "ldappasswd -W -S -A -D uid=pctony,ou=people,dc=apache,dc=org". This is far from an elegant solution, but for now it works. You will be required to enter and confirm your current password, and then enter and confirm your new password choice, followed by your LDAP password (this is your old password).
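If you want to script the command above, a small sketch that only assembles the argument list from a username follows. It does not run ldappasswd or handle the password prompts, and the function name is made up for illustration.

```python
def ldappasswd_command(availid):
    """Build the ldappasswd argument list for the given ASF username."""
    bind_dn = f"uid={availid},ou=people,dc=apache,dc=org"
    return ["ldappasswd", "-W", "-S", "-A", "-D", bind_dn]


print(" ".join(ldappasswd_command("pctony")))
```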
We are working on a web portal that will allow users to edit attributes, such as forwarding address, password, etc. This will be made available as soon as it is ready. If you don't know your current password, then you will need to email root@ as per usual.
You can follow the trials and tribulations of the rollout on my personal blog.
Posted at 04:01PM May 21, 2009 by pctony in General | Comments[3]
Git support at Apache
Git is a new version control system that has been getting increasingly popular during the past few years. Many Apache contributors have also expressed interest in using Git for working with Apache codebases. While the canonical location of all Apache source code is our Subversion repository, we also want to support developers who prefer to use Git as their version control tool.
Based on work by volunteers on the infrastructure-dev@ mailing list, we have recently set up read-only Git mirrors of many Apache codebases at http://git.apache.org/. These mirrors contain the full version histories (including all branches and tags) of the mirrored codebases and are updated in near real time based on the latest svn commits.
See the documentation and wiki pages for more details about this service and how to best use it. We are also open to good ideas on how to extend or improve this service. Please join the infrastructure-dev@ mailing list for the ongoing discussion!
Posted at 10:22PM May 03, 2009 by Jukka in General |
New mailing list for CI Build Services
Established today, we now have a dedicated mailing list to talk about and work out all things to do with our build services. Currently infrastructure provides projects with use of Hudson, Continuum and Gump, and now we have another option in Buildbot. Buildbot is a new service here at Apache Infrastructure, currently in its last stages of testing; more info coming soon.
All these services and all the projects that use them, are welcome to meet together on the new mailing list. Maybe your project is looking to use one or more of these CI's to build & test their code, build their site, publish to Nexus. Maybe you are already using a CI and want some configuration additions/changes or extra jobs run.
Also look out for us poor souls looking after these instances and the machines they run on - we might need more information from your projects, clarification or updating of build requirements, or investigation of builds that take too long.
Failing builds are of course for each project to solve code-wise, but be sure that whichever CI(s) you choose, they are there to inform and will give you constant reminders of build failures.
Sign up to the new mailing list - builds-subscribe-AT-apache-DOT-org.
See you there!
Posted at 09:14AM Apr 06, 2009 by administrator in General |
Improving our Subversion Services
This week the ASF Infrastructure Team deployed one of the first major changes to how svn.apache.org works since it was launched, 6 years ago.
We now distribute Subversion traffic to our servers based on the geographic region of a client.
We are using pgeodns, the same software that powers CPAN Search and the NTP Pool. With pgeodns we can give out different DNS entries to clients, depending on where they are connecting from. It isn't an exact science, but for most clients it is good enough to find the closer Subversion Server.
If you are connecting from Europe, your client will connect to Harmonia. Harmonia is a Sun x4150 running FreeBSD 7.0, using ZFS raidz2 over 6 disks, hosted in Amsterdam at SURFnet.
Users in North America are directed to Eris, our traditional Subversion Master Server. Eris is a Dell 2950 also running FreeBSD 7.0, using ZFS raidz2 over 4 disks, hosted in Corvallis, Oregon at OSUOSL.
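The region-to-server choice described above can be pictured as a simple lookup. This is only a conceptual sketch, not how pgeodns is configured. The post does not say where clients from other regions are sent, so the fallback to Eris here is an assumption.

```python
# Region-to-server mapping taken from the post.
REGION_SERVERS = {
    "Europe": "Harmonia",
    "North America": "Eris",
}

# Assumption: unlisted regions fall back to the master server.
DEFAULT_SERVER = "Eris"


def subversion_server(region):
    """Pick the Subversion server a client in `region` would be pointed at."""
    return REGION_SERVERS.get(region, DEFAULT_SERVER)


for region in ("Europe", "North America", "Asia"):
    print(region, "->", subversion_server(region))
```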
Using svnsync as described in Norman's ApacheCon EU 2009 Talk, we replicate all commits to the master to the slave in real time. If a commit is made to the slave, we proxy the commit to the master.
Read operations are handled on the nearest mirror, and are much faster for everything from the initial checking out to running an update due to the decreased latency.
While this change should improve the experience significantly, we have some other changes coming up soon for svn.apache.org:
- Upgrade to Subversion 1.6: Representation Sharing, inode packing, and memcached support should help make our SVN servers even faster.
- Upgrade both Eris and Harmonia to FreeBSD 7.2-STABLE: The ZFS filesystem is experimental in FreeBSD 7, and there are many stability and performance enhancements available in newer versions.
- Adding more Geographic Mirrors: Once we are comfortable with the current setup, we would like to expand to another mirror location, hopefully in Australia or Asia.
Posted at 08:47PM Apr 02, 2009 by pquerna in General |
Subversion on-the-fly Replication Talk
Last week (at ApacheCon 2009 EU) I gave a session talk about "Subversion on-the-fly Replication" and how we deployed such a setup last year within the Apache Software Foundation. So check out the slides if you are interested in how it works, why you should do it, and what the known problems and solutions are.
Thanks to all the people who attended the session. I (we) are still open to good suggestions and feedback ;-)
There is no need for being sad if you missed the session talk, it was recorded as part of the "HTTP Server Administration" track. You can register for it HERE
PS: Thanks to Tony Stevenson for acting as my session chair and to Paul Querna for helping out with answering questions ;)
Posted at 05:47PM Apr 02, 2009 by norman in General |
LDAP - It's getting closer
As of this afternoon whilst at ApacheCon Europe 2009, we have gotten our initial LDAP platform in place ready for testing. This will allow us to move to a centralized AAA system.
Posted at 04:56PM Mar 26, 2009 by pctony in Development |
Slow SVN Service This Week
In preparation for upgrading Subversion to the latest version (1.6.0), we are running an svn dump on svn.apache.org. This will chew up enough disk IO to be noticeable to svn users. We expect the dump to finish sometime during this weekend.
Posted at 06:39PM Mar 25, 2009 by joes in Status |
New faces in Infrastructure
Over the past year the Infrastructure Team has grown to meet new challenges. Here is a list of the new folks on the team:
- Gavin McDonald (gmcdonald)
- Norman Maurer (norman)
- Tony Stevenson (pctony)
- Wendy Smoak (wsmoak)
- Mark Thomas (markt)
- Chris J. Davis (chrisjdavis)
- Jukka Zitting (jukka)
Congratulate these people on the hard work they have done for the ASF the next time you bump into one or two of them, or better yet, buy 'em a beer!
Posted at 02:28AM Mar 25, 2009 by joes in General | Comments[1]
