Apache Infrastructure Team

Sunday February 26, 2012

Apache CMS: latest new feature is SPEED!

Over the past few months the Apache CMS has seen many improvements, all under the general theme of making the system faster.  Supporting very large sites like the Apache OpenOffice User Site, with almost 10 GB of content, has presented new challenges, met largely with the introduction of ZFS clones for generating per-user server-side working copies, turning what was an O(N) rsync job into an O(1) operation.  We've also moved the update processing out-of-band to further cut down on the time it takes for the bookmarklet to produce a page, eliminating all O(N) algorithms from the process.

More recent work focuses on the merge-based publication process, which for large changesets took a considerable amount of time.  That too has been recoded based on svnmucc and is now another O(1) operation: essentially a perfect copy of staging with a few adjustments for "external" paths.
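The working-copy change can be pictured with a short ZFS session (dataset and path names here are illustrative, not our actual layout):

```shell
# Before: copy ~10 GB of content for every user's working copy - O(N)
rsync -a /x1/cms/wc-template/ /x1/cms/wc-jdoe/

# After: snapshot the pristine tree once, then clone it per user - O(1),
# since a clone initially shares all blocks with the snapshot and only
# records differences as they appear
zfs snapshot cms/wc-template@pristine
zfs clone cms/wc-template@pristine cms/wc-jdoe
```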

Combine that with the activity around parallelizing the build system and you have a completely different performance profile compared to the way the system worked in 2011.  In short, if you haven't tried the CMS lately, and were a bit put off by the page rendering times or build speeds, have another look!

Next up: describing the work done around external build support, focusing first on maven based sites.


Sunday December 11, 2011

translate service now open!

A few projects have requested it, and now it is here! Check out https://translate.apache.org and get your project added.

See also https://cwiki.apache.org/confluence/display/INFRA/translate+pootle+service+auth+levels for more information - you will see that members of the general public (not logged in) can submit translation requests, while any logged-in user (i.e., committers) can process those submissions.

Enjoy! Please direct any queries to the infra team, or file an INFRA Jira ticket.

Friday April 15, 2011

PEAR package hosting available

Any project that is able to release via PEAR packages can now do so hosted officially on ASF servers.

http://pear.apache.org is now up and running and ready to serve!

Tuesday March 22, 2011

Welcome new members of the infra team

Well, some are not exactly new faces, but since our last blog update of new infra members in 2009, we have conned, with promises of fame, fortune and beer, the following new additions to the infra team:

  • Chris Rhodes: (arreyder)
  • Brian Fox: (brianf)
  • Matt Benson: (mbenson)
  • David Blevins: (dblevins)
  • Rudiger Pluem: (rpluem)
  • Noirin Plunkett: (noirin)
  • Ulrich Stärk: (uli)
  • Daniel Shahaf: (danielsh)
  • Paul Davis: (davisp)

Infra work is not your typical volunteer work, and the help any of these folks provide is greatly appreciated.

Thursday February 24, 2011

Changes to email service for all committers

In the near future the Infrastructure team will be implementing a change to the way we handle emails for all committers.

Historically we have allowed users to choose how to handle their apache.org email. However we will be making the following changes:

  1. LDAP will become authoritative for all mail forwarding addresses.
  2. Users will no longer be allowed to store their apache.org email locally on people.apache.org (minotaur).
  3. The Infra team will take the mail address currently held in your .qmail or .forward file (.qmail is authoritative if both exist) and inject it into LDAP.
  4. Users will no longer be able to configure mail filtering, but you can still configure your SpamAssassin threshold as per our recent blog post.
  5. We will make committers' ~/.forward and ~/.qmail files read-only; there will still be at least one of these files, but it will be owned by the mail daemon user.

This means that all committers will be required to forward their apache.org email to an email address outside of the foundation.

We are doing this to simplify the email infrastructure, and to help reduce the current level of complexity of maintaining people.apache.org. Also, making LDAP authoritative means we can move some of the work straight out to the MXs, and thus avoid sending it through several mail servers. In the new architecture if someone emails you directly at your apache.org mail address it will only be handled by one apache.org MX.

Of course, we won't delete any email you currently have on people.apache.org. Should you want to edit your LDAP record, use https://id.apache.org to do so.

Thursday January 27, 2011

Controlling your SpamAssassin threshold

Committers,

The Infrastructure Team has just enabled a new feature to control your SpamAssassin Threshold for your apache.org account. The default score for user delivery has always remained at 10, but with this new feature you can lower that score to anything you want. Many people with older accounts will probably prefer a lower score, like 5, which is the default for all apache mailing lists.

To lower your score, log in to id.apache.org and change your 'SpamAssassin Threshold (asf-sascore)' attribute to your desired level. Don't forget to supply the form with your LDAP password.

Enjoy.

Thursday December 02, 2010

The ASF CMS

Over the past 3 months, the Infrastructure Team has developed and deployed a custom CMS for Apache projects to use to manage their websites. There is a document available which explains the rationale, role, and future plans for the CMS. We have opened up the ACLs for the www.apache.org site so that all committers can now edit content on the site using the CMS (while still restricting live publication to the Apache membership and the Infrastructure Team).

The basic workflow for committers is easy to describe: first install the javascript bookmarklet on your browser toolbar. Next visit a webpage on the www.apache.org website. When you've located a page you'd like to edit, click on the installed bookmarklet: you'll be taken to a working copy of the markdown source for the page in question. To edit the content click on the [Edit] link. A markdown editor will show you a preview of your changes while you work. When you have finished, submit your changes and [Commit] them.

Your commit will trigger buildbot to build a staging version of your changes. You can follow the build while it is ongoing, and once it has completed you can click on the [Staged] link to see the results. Members and Infrastructure Team members can continue on and publish those changes once they are satisfied with them. Other committers may need to send a note to the site-dev@ mailing list to request publication of their changes.

The publication links in the CMS are essentially merge + commit operations in subversion which are tied into the live site via svnpubsub. That means publishing in the CMS is virtually instantaneous.

The CMS is now open to all top-level and incubating projects. Interested projects should contact the infrastructure@ mailing list or simply file an INFRA ticket against the CMS component. Early adopters are encouraged to collaborate on the wiki page for working out usage and adoption issues.

Tuesday October 26, 2010

ReviewBoard instance running at the ASF

We know we have projects that use ReviewBoard externally to the ASF, some projects using codereview.appspot.com, and some projects using Fisheye/Clover externally. Well, by popular request, we now have an internal ReviewBoard running at https://reviews.apache.org! So sign up for an account, request that your project's repository be added (file an INFRA issue), and get collaborating! For questions or comments, please raise them on the infrastructure-dev list; as reviews.apache.org is in its early stages, it may need tweaking.

Monday July 19, 2010

new hardware for apache.org

This weekend we rolled out a new server, a Dell Power Edge R410, named Eos, to host the Apache.org websites and MoinMoin wiki:

  • OS: FreeBSD 8.1-RC2
  • CPU: 2x Intel(R) Xeon(R) CPU X5550 @ 2.67GHz (2 package(s) x 4 core(s) x 2 SMT threads = 16 CPUs)
  • RAM: 48 GB DDR3
  • Storage: 12x 300 GB 15k RPM SAS, 2x 80 GB SSD, configured in a ZFS raidz2 with the SSDs used for the L2ARC

This new hardware replaces an older Sun T2000, also called eos, as the primary webserver for apache.org. We hope everyone enjoys the increased performance, especially from the Wiki!

On the less visible infrastructure side, we are also upgrading Athena, one of our frontend mail servers. The new Athena is a DPE r210 with a 4 core 2.67GHz processor, replacing a Sun X2200.

Friday June 11, 2010

s.apache.org - uri shortening service

Today we've brought s.apache.org online. It's a URL shortening service limited to Apache committers - the people who write all that Apache software! One of the main reasons we're providing this service is to allow committers to use shortened links whose provenance is known to be a trusted source, which is a big improvement over the generic shorteners out there in the wild. It is also meant to provide permanent links suitable for inclusion in board reports, or more generally in email sent to our mailing lists - which will be archived, either publicly or privately, for as long as Apache is around.

The service is easy to use, and being from Apache, the source code for the service is readily available. The primary author of the code is Ulrich Stärk (uli). One of the more interesting features you can pick up from the source is the ability to "display" a link before doing a redirect by tacking on "?action=display" to any shortened URL. For the truly paranoid there is the "?action=display;cookie=1" query string to force all shortened URLs to display by default before redirecting. That feature may be turned off again with the "?action=display;cookie=" query string. Again, look over the source code for other interesting features you may wish to take advantage of.
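A minimal sketch of how those query strings could be interpreted (the service's internals may differ; this just models the documented behaviour, including the ';'-separated parameters):

```python
from urllib.parse import urlsplit, parse_qs

def shortener_action(url):
    """Parse the ?action=display[;cookie=...] query used by s.apache.org.

    The service separates parameters with ';', so we parse with that
    separator.  Returns (display, cookie): whether to show the target
    before redirecting, and the raw cookie value (empty string if unset).
    """
    query = urlsplit(url).query
    qs = parse_qs(query, separator=";", keep_blank_values=True) if query else {}
    display = qs.get("action") == ["display"]
    cookie = qs.get("cookie", [""])[0]
    return display, cookie
```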

Committers: here's some javascript you might consider placing in a bookmark, courtesy of Doug Cutting. To use it, create a new bookmark and set the bookmark's URL to

javascript:void(location.href='https://s.apache.org/?action=create&search=ON&uri='+escape(location.href))

Tuesday April 13, 2010

apache.org incident report for 04/09/2010

Apache.org services recently suffered a direct, targeted attack against our infrastructure, specifically the server hosting our issue-tracking software.

The Apache Software Foundation uses a donated instance of Atlassian JIRA as an issue tracker for our projects. Among other projects, the ASF Infrastructure Team uses it to track issues and requests. Our JIRA instance was hosted on brutus.apache.org, a machine running Ubuntu Linux 8.04 LTS.

Password Security

If you are a user of the Apache hosted JIRA, Bugzilla, or Confluence, a hashed copy of your password has been compromised.

JIRA and Confluence both use a SHA-512 hash, but without a random salt. We believe the risk to simple passwords based on dictionary words is quite high, and most users should rotate their passwords.

Bugzilla uses a SHA-256, including a random salt. The risk for most users is low to moderate, since pre-built password dictionaries are not effective, but we recommend users should still remove these passwords from use.
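The difference between the two schemes is easy to demonstrate (the exact salt handling in Bugzilla may differ; this is a simplified model):

```python
import hashlib
import os

def unsalted_sha512(password):
    """JIRA/Confluence-style: the same password always yields the same
    digest, so one precomputed dictionary cracks it for every user."""
    return hashlib.sha512(password.encode()).hexdigest()

def salted_sha256(password, salt=None):
    """Bugzilla-style: a random per-user salt means precomputed
    dictionaries are useless and each account must be attacked alone."""
    if salt is None:
        salt = os.urandom(8).hex()
    digest = hashlib.sha256((salt + password).encode()).hexdigest()
    return salt, digest
```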

In addition, if you logged into the Apache JIRA instance between April 6th and April 9th, you should consider the password as compromised, because the attackers changed the login form to log them.

What Happened?

On April 5th, the attackers, via a compromised Slicehost server, opened a new issue, INFRA-2591. This issue contained the following text:

ive got this error while browsing some projects in jira http://tinyurl.com/XXXXXXXXX [obscured]

Tinyurl is a URL redirection and shortening tool. This specific URL redirected back to the Apache instance of JIRA, at a special URL containing a cross site scripting (XSS) attack. The attack was crafted to steal the session cookie from the user logged in to JIRA. When this issue was opened against the Infrastructure team, several of our administrators clicked on the link. This compromised their sessions, including their JIRA administrator rights.

At the same time as the XSS attack, the attackers started a brute force attack against the JIRA login.jsp, attempting hundreds of thousands of password combinations.

On April 6th, one of these methods was successful. Having gained administrator privileges on a JIRA account, the attackers used this account to disable notifications for a project, and to change the path used to upload attachments. The path they chose was configured to run JSP files, and was writable by the JIRA user. They then created several new issues and uploaded attachments to them. One of these attachments was a JSP file that was used to browse and copy the filesystem. The attackers used this access to create copies of many users' home directories and various files. They also uploaded other JSP files that gave them backdoor access to the system using the account that JIRA runs under.

By the morning of April 9th, the attackers had installed a JAR file that would collect all passwords on login and save them. They then sent password reset mails from JIRA to members of the Apache Infrastructure team. These team members, thinking that JIRA had encountered an innocent bug, logged in using the temporary password sent in the mail, then changed the passwords on their accounts back to their usual passwords.

One of these passwords happened to be the same as the password to a local user account on brutus.apache.org, and this local user account had full sudo access. The attackers were thereby able to login to brutus.apache.org, and gain full root access to the machine. This machine hosted the Apache installs of JIRA, Confluence, and Bugzilla.

Once they had root on brutus.apache.org, the attackers found that several users had cached Subversion authentication credentials, and used these passwords to log in to minotaur.apache.org (aka people.apache.org), our main shell server. On minotaur, they were unable to escalate privileges with the compromised accounts.

About 6 hours after they started resetting passwords, we noticed the attackers and began shutting down services. We notified Atlassian of the previously unreported XSS attack in JIRA and contacted Slicehost. Atlassian was responsive. Unfortunately, Slicehost did nothing, and two days later the very same virtual host (slice) attacked Atlassian directly.

We started moving services to a different machine, thor.apache.org. The attackers had root access on brutus.apache.org for several hours, and we could no longer trust the operating system on the original machine.

By April 10th, JIRA and Bugzilla were back online.

On April 13th, Atlassian provided a patch for JIRA to prevent the XSS attack. See JRA-20994 and JRA-20995 for details.

Our Confluence wiki remains offline at this time. We are working to restore it.

What worked?

  • Limited use passwords, especially one-time passwords, were a real lifesaver. If JIRA passwords had been shared with other services/hosts, the attackers could have caused widespread damage to the ASF's infrastructure. Fortunately, in this case, the damage was limited to rooting a single host.
  • Service isolation worked with mixed results. The attackers must be presumed to have copies of our Confluence and Bugzilla databases, as well as our JIRA database, at this point. These databases include hashes of all passwords used on those systems. However, other services and hosts, including LDAP, were largely unaffected.

What didn't work?

  • The primary problem with our JIRA install is that the JIRA daemon runs as the user who installed JIRA. In this case, it runs as a jira role-account. There are historical reasons for this decision, but with 20/20 hindsight, and in light of the security issues at stake, we expect to revisit the decision!
  • The same password should not have been used for a JIRA account as was used for sudo access on the host machine.
  • Inconsistent application of one-time passwords: we required them on other machines, but not on brutus. PAM was configured to allow optional use of OPIE, but not all of our sudoers had switched to it.
  • SSH passwords should not have been enabled for login over the Internet. Although the Infrastructure Team had attempted to configure the sshd daemon to disable password-based logins, having UsePAM yes set meant that password-based logins were still possible.
  • We use Fail2Ban for many services, but we did not have it configured to track JIRA login failures.
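For reference, a hardening sketch of the relevant sshd_config settings; the key interaction is that with UsePAM yes, PAM's keyboard-interactive method can still accept passwords unless challenge-response authentication is disabled as well:

```
# /etc/ssh/sshd_config (sketch)
PasswordAuthentication no            # disables ssh's own password method
ChallengeResponseAuthentication no   # also needed: with UsePAM yes, PAM's
                                     # keyboard-interactive can otherwise
                                     # still prompt for a password
UsePAM yes
```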

What are we changing?

  • We have remedied the JIRA installation issues with our reinstall. JIRA is now installed by root and runs as a separate daemon with limited privileges.
  • For the time being we are running JIRA in a httpd-tomcat proxy config with the following rules:
    
       ProxyPass /jira/secure/popups/colorpicker.jsp !
       ProxyPass /jira/secure/popups/grouppicker.jsp !
       ProxyPass /jira/secure/popups/userpicker.jsp !
       ProxyPass /jira        http://127.0.0.1:18080/jira
    
    
    Sysadmins may find this useful to secure their JIRA installation until an upgrade is feasible.
  • We will be making one-time-passwords mandatory for all super-users, on all of our Linux and FreeBSD hosts.
  • We have disabled caching of svn passwords, and removed all currently cached svn passwords across all hosts at the ASF via the global config /etc/subversion/config file:
    
    [auth]
    store-passwords = no
    
    
  • We will use Fail2Ban to protect web application logins from brute-force attacks.

We hope our disclosure has been as open as possible and true to the ASF spirit. Hopefully others can learn from our mistakes.

Monday February 22, 2010

The ASF LDAP system

When we decided some time ago to start using LDAP for auth{n,z}, we had to come up with a sane structure; this is what we have thus far.

 dc=apache,dc=org
      | ---  ou=people,dc=apache,dc=org
      | ---  ou=groups,dc=apache,dc=org
           | ---  ou=people,ou=groups,dc=apache,dc=org
           | ---  ou=committees,ou=groups,dc=apache,dc=org

There are also other OUs that contain infrastructure-related objects.

So with "dc=apache,dc=org" being our basedn, we decided we needed to keep the structure as simple as possible and placed the following objects in the respective OUs:

  • User accounts -  "ou=people,dc=apache,dc=org"
  • POSIX groups - "ou=groups,dc=apache,dc=org"
  • User Groups  - "ou=people,ou=groups,dc=apache,dc=org"
  • PMC/Committee groups - "ou=committees,ou=groups,dc=apache,dc=org"
Access to the LDAP infrastructure is limited to connections from hosts within our co-location sites. This essentially helps prevent unauthorised data from leaving our network.
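As an illustration, a lookup against this tree from a host inside the co-lo might look like the following (the server name and requested attributes are hypothetical):

```shell
ldapsearch -x -H ldaps://ldap.apache.org \
    -b ou=people,dc=apache,dc=org "(uid=jdoe)" cn mail
```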

Wednesday February 17, 2010

SVN performance enhancements

Tonight we enabled a pair of Intel X25-Ms to serve as L2ARC cache for the ZFS array which contains all of our svn repositories.  Over the next few hours, as these SSDs start serving files from cache, the responsiveness and overall performance of svn on eris (our master US-based server) should be noticeably better than it has been lately.

In addition we are planning to install 16GB of extra RAM into eris to improve zfs performance even further, but for now we are hopeful that committers will appreciate the performance we've added tonight.
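Attaching SSDs as L2ARC is a one-line zpool operation (the pool and device names here are illustrative):

```shell
# add the two X25-Ms as level-2 ARC cache devices for the svn pool
zpool add svn cache da8 da9
```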


 

Monday November 09, 2009

What can the ASF Buildbot do for your project?

The information below has just been published at the main ASF Buildbot page, ci.apache.org/buildbot.html.

A summary of just some of the things the ASF Buildbot can do for your project:

  • Perform per commit build & test runs for your project
  • Not just svn! - Buildbot can pull in from your Git/Mercurial branches too!
  • Build and Deploy your website to a staging area for review
  • Build and Deploy your website to mino (people) for syncing live
  • Automatically Build and Deploy Snapshots to Nexus staging area.
  • Create Nightly and historical zipped/tarred snapshot builds for download
  • Builds can be triggered manually from within your own freenode #IRC Channel
  • An IRCBot can report on success/failures of a build instantly
  • Build Success/Failures can go to your dev/notification mailing list
  • Perform multiple builds of an svn/git commit on multiple platforms asynchronously
  • ASF Buildbot uses the latest RAT build to check for license header issues for all your files.
  • RAT Reports are published live instantly to ci.apache.org/$project/rat-report.[txt|html]
  • As indicated above, plain text or html versions of RAT reports are published.
  • [Coming Soon] - RAT Reports sent to your dev list, only new failures will be listed.
  • [Coming Soon] - Email a patch with inserted ASL 2.0 Headers into your failed files!!
  • Currently Buildbot has Ubuntu 8.04, 9.04 and Windows Server 2008 Slaves
  • [Coming Soon] - ASF Buildbot will soon have Solaris, FreeBSD 8 and Windows 7 Slaves

Don't see a feature that you need? Join the builds.at.apache.org mailing list and request it now, or file a Jira ticket.

Help is always on hand on the builds.at.apache.org mailing list for any problems or build configuration issues/requests. Or try the #asftest channel on irc.freenode.net for live support.

So now you want your project to use Buildbot? No problem: the best way is to file a Jira ticket and count to 10 (well, maybe a bit longer, but it won't be long before you are up and running).

Monday October 12, 2009

DDOS mystery involving Linux and mod_ssl

In the first week of October we started getting reports of performance issues, mainly connection timeouts, on all of our services hosted at https://issues.apache.org/.  On further inspection we noticed a huge number of "Browser disconnect" errors in the error log right at the beginning of the SSL transaction, on the order of 50 connections/second.  This was grinding the machine to a standstill, so we wrote a quick and dirty perl script to investigate the matter.  Initial reports indicated a DDoS attack from nearly 100K machines targeting Apache + mod_ssl's accept loop, and the script was tweaked to filter out that traffic before proxying the connections to httpd.

As we started getting a picture of the IP space conducting the attack, the prognosis looked rather bleak: more and more IPs were getting involved and the DDoS traffic continued to increase, getting to the point where Linux was shutting down the ethernet interface.  So we then rerouted the traffic to an available FreeBSD machine, which did a stellar job of filtering out the traffic at the kernel level.  We unfortunately didn't quite realize how good a job FreeBSD was doing, and for a time we were operating under the impression that the DDoS was ending.  So we eventually moved the traffic back to brutus, the original Linux host, and patched httpd using code developed by Ruediger Pluem.

And back came the DDoS traffic.  In a few days the rate of closed connections had nearly doubled, so we had little choice but to start dumping the most frequent IP addresses into iptables DROP rules.  5000 rules cut the traffic by 2/3 in an instant.  But the problem was growing: our logs indicated there were now over 300K addresses participating in the attack.
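The IP-dumping step can be sketched as follows (our actual tooling was ad hoc perl against real logs; this simplified version takes one source IP per line):

```python
from collections import Counter

def drop_rules(log_lines, limit=5000):
    """Count source IPs and emit iptables DROP rules for the most
    frequent offenders - the '5000 rules' approach described above."""
    counts = Counter(line.strip() for line in log_lines if line.strip())
    return [f"iptables -A INPUT -s {ip} -j DROP"
            for ip, _ in counts.most_common(limit)]
```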

We started looking closer at the IPs in an attempt to correlate them with regular http requests.  The only pattern that seemed to emerge was that many of the IPs in question were also generating spartan "GET / HTTP/1.1" requests with a single Host: 140.211.11.140 header to port 443.  Backtracking through a year of logs revealed that these spartan requests had been going on since August 6, 2008.  The IPs originating these requests were as varied as, and more often than not matched up with, the rapid closed-connection traffic we started seeing in October.

So what exactly is going on here?  The closed connection traffic continues to rise, and the origin of the associated spartan requests is currently unknown.
