Entries tagged [downtime]

Monday May 18, 2015

Planned downtime for Jira

Hi All,

There will be a planned reboot of Jira on Thursday 21st May at 16:00 UTC+1.

This is 72 hours' notice, as recommended in our Core Services planned downtime SLA.

Currently, Jira requires a reboot whenever new projects are added to it. There is an outstanding
ticket with Atlassian about this; they have requested logs, which will be gathered at the time of
the planned reboot.

Projects being added to Jira at this time will include:

INFRA-9516 - Myriad
INFRA-9609 - Atlas
INFRA-9635 - CMDA

and any more that get requested between now and downtime.

Any imports of issues from other issue trackers will NOT be done at this time.

Reminders will be tweeted via @infrabot 24 hours and 1 hour beforehand.
A planned maintenance notice will be posted on status.apache.org.

Actual downtime should be no more than 10 minutes, all being well.

The next email about this will be after the service has resumed from the planned downtime.

Thanks!

Gav…

Tuesday April 13, 2010

apache.org incident report for 04/09/2010

Apache.org services recently suffered a direct, targeted attack against our infrastructure, specifically the server hosting our issue-tracking software.

The Apache Software Foundation uses a donated instance of Atlassian JIRA as an issue tracker for our projects. Among other projects, the ASF Infrastructure Team uses it to track issues and requests. Our JIRA instance was hosted on brutus.apache.org, a machine running Ubuntu Linux 8.04 LTS.

Password Security

If you are a user of the Apache-hosted JIRA, Bugzilla, or Confluence, a hashed copy of your password has been compromised.

JIRA and Confluence both use a SHA-512 hash, but without a random salt. We believe the risk to simple passwords based on dictionary words is quite high, and most users should rotate their passwords.

Bugzilla uses a SHA-256 hash with a random salt. The risk for most users is low to moderate, since pre-built password dictionaries are not effective against salted hashes, but we still recommend that users stop using these passwords.

In addition, if you logged into the Apache JIRA instance between April 6th and April 9th, you should consider your password compromised, because the attackers had modified the login form to capture the credentials entered.

What Happened?

On April 5th, the attackers, working from a compromised Slicehost server, opened a new issue, INFRA-2591. This issue contained the following text:

ive got this error while browsing some projects in jira http://tinyurl.com/XXXXXXXXX [obscured]

TinyURL is a URL redirection and shortening tool. This specific URL redirected back to the Apache instance of JIRA, at a special URL containing a cross-site scripting (XSS) attack. The attack was crafted to steal the session cookie of the user logged in to JIRA. When this issue was opened against the Infrastructure team, several of our administrators clicked on the link. This compromised their sessions, including their JIRA administrator rights.

At the same time as the XSS attack, the attackers started a brute force attack against the JIRA login.jsp, attempting hundreds of thousands of password combinations.

On April 6th, one of these methods was successful. Having gained administrator privileges on a JIRA account, the attackers used this account to disable notifications for a project, and to change the path used to upload attachments. The path they chose was configured to run JSP files, and was writable by the JIRA user. They then created several new issues and uploaded attachments to them. One of these attachments was a JSP file that was used to browse and copy the filesystem. The attackers used this access to create copies of many users' home directories and various files. They also uploaded other JSP files that gave them backdoor access to the system using the account that JIRA runs under.

By the morning of April 9th, the attackers had installed a JAR file that would collect all passwords on login and save them. They then sent password reset mails from JIRA to members of the Apache Infrastructure team. These team members, thinking that JIRA had encountered an innocent bug, logged in using the temporary password sent in the mail, then changed the passwords on their accounts back to their usual passwords.

One of these passwords happened to be the same as the password to a local user account on brutus.apache.org, and this local user account had full sudo access. The attackers were thereby able to log in to brutus.apache.org and gain full root access to the machine. This machine hosted the Apache installs of JIRA, Confluence, and Bugzilla.

Once they had root on brutus.apache.org, the attackers found that several users had cached Subversion authentication credentials, and used these passwords to log in to minotaur.apache.org (aka people.apache.org), our main shell server. On minotaur, they were unable to escalate privileges with the compromised accounts.

About 6 hours after they started resetting passwords, we noticed the attackers and began shutting down services. We notified Atlassian of the previously unreported XSS attack in JIRA and contacted Slicehost. Atlassian was responsive. Unfortunately, Slicehost did nothing, and two days later the very same virtual host (slice) attacked Atlassian directly.

We started moving services to a different machine, thor.apache.org. The attackers had root access on brutus.apache.org for several hours, and we could no longer trust the operating system on the original machine.

By April 10th, JIRA and Bugzilla were back online.

On April 13th, Atlassian provided a patch for JIRA to prevent the XSS attack. See JRA-20994 and JRA-20995 for details.

Our Confluence wiki remains offline at this time. We are working to restore it.

What worked?

  • Limited-use passwords, especially one-time passwords, were a real lifesaver. If JIRA passwords had been shared with other services/hosts, the attackers could have caused widespread damage to the ASF's infrastructure. Fortunately, in this case, the damage was limited to rooting a single host.
  • Service isolation worked with mixed results. The attackers must be presumed to have copies of our Confluence and Bugzilla databases, as well as our JIRA database, at this point. These databases include hashes of all passwords used on those systems. However, other services and hosts, including LDAP, were largely unaffected.

What didn't work?

  • The primary problem with our JIRA install is that the JIRA daemon runs as the user who installed JIRA. In this case, it runs as a jira role-account. There are historical reasons for this decision, but with 20/20 hindsight, and in light of the security issues at stake, we expect to revisit the decision!
  • The same password should not have been used for a JIRA account as was used for sudo access on the host machine.
  • Inconsistent application of one-time passwords; we required them on other machines, but not on brutus. PAM was configured to allow optional use of OPIE, but not all of our sudoers had switched to it.
  • SSH passwords should not have been enabled for login over the Internet. Although the Infrastructure Team had attempted to configure the sshd daemon to disable password-based logins, having UsePAM yes set meant that password-based logins were still possible (see the configuration sketch after this list).
  • We use Fail2Ban for many services, but we did not have it configured to track JIRA login failures.
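
As a concrete illustration of the sshd point above (a minimal sketch with illustrative settings, not our exact configuration): with UsePAM yes, turning off PasswordAuthentication alone is not enough, because PAM can still offer password prompts through keyboard-interactive authentication unless that is disabled as well.

    # /etc/ssh/sshd_config (excerpt, illustrative)
    UsePAM yes                            # PAM still runs account/session modules
    PubkeyAuthentication yes
    PasswordAuthentication no             # disables the plain "password" method...
    ChallengeResponseAuthentication no    # ...but this must also be off, or PAM's
                                          # keyboard-interactive prompts still accept passwords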

What are we changing?

  • We have remedied the JIRA installation issues with our reinstall. JIRA is now installed by root and runs as a separate daemon with limited privileges.
  • For the time being we are running JIRA in an httpd-tomcat proxy config with the following rules:
    
       ProxyPass /jira/secure/popups/colorpicker.jsp !
       ProxyPass /jira/secure/popups/grouppicker.jsp !
       ProxyPass /jira/secure/popups/userpicker.jsp !
       ProxyPass /jira        http://127.0.0.1:18080/jira
    
    
    Sysadmins may find this useful to secure their JIRA installation until an upgrade is feasible.
  • We will be making one-time passwords mandatory for all super-users, on all of our Linux and FreeBSD hosts.
  • We have disabled caching of svn passwords, and removed all currently cached svn passwords across all hosts at the ASF, via the global config file /etc/subversion/config:
    
    [auth]
    store-passwords = no
    
    
  • Use Fail2Ban to monitor web application login failures and protect against brute-force attacks; an example jail is sketched below.
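
    As an illustration of that last item (the jail name, log path, and filter pattern here are hypothetical examples, not our production settings), a Fail2Ban jail watching the front-end access log for repeated hits to the JIRA login page might look like this:

    # /etc/fail2ban/jail.local (hypothetical jail)
    [jira-login]
    enabled  = true
    port     = http,https
    filter   = jira-login
    logpath  = /var/log/httpd/jira_access_log
    maxretry = 10
    findtime = 600
    bantime  = 3600

    # /etc/fail2ban/filter.d/jira-login.conf (hypothetical filter)
    [Definition]
    # This matches every POST to login.jsp from a host; a production filter should key
    # on an actual failure indicator (e.g. response status) rather than all attempts.
    failregex = ^<HOST> .*"POST /jira/login\.jsp
    ignoreregex =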

We hope our disclosure has been as open as possible and true to the ASF spirit. Hopefully others can learn from our mistakes.

Wednesday September 02, 2009

apache.org incident report for 8/28/2009

Last week we posted about the security breach that caused us to temporarily suspend some services.  All services have now been restored. We have analyzed the events that led to the breach, and continued to work on improving the security of our systems.

NOTE: At no time were any Apache Software Foundation code repositories, downloads, or users put at risk by this intrusion. However, we believe that providing a detailed account of what happened will make the internet a better place, by allowing others to learn from our mistakes.

What Happened?

Our initial running theory was correct--the server that hosted the apachecon.com (dv35.apachecon.com) website had been compromised. The machine was running CentOS, and we suspect they may have used the recent local root exploits patched in RHSA-2009-1222 to escalate their privileges on this machine. The attackers fully compromised this machine, including gaining root privileges, and destroyed most of the logs, making it difficult for us to confirm the details of everything that happened on the machine. 

This machine is owned by the ApacheCon conference production company, not by the Apache Software Foundation. However, members of the ASF infrastructure team had accounts on this machine, including one used to create backups.

The attackers attempted unsuccessfully to use passwords from the compromised ApacheCon host to log on to our production webservers. Later, using the SSH key of the backup account, they were able to access people.apache.org (minotaur.apache.org). This account was an unprivileged user, used to create backups from the ApacheCon host.

minotaur.apache.org runs FreeBSD 7-STABLE, and acts as the staging machine for our mirror network. It is our primary shell account server, and provides many other services for Apache developers. None of our Subversion (version control) data is kept on this machine, and there was never any risk to any Apache source code.

Once the attackers had gained shell access, they added CGI scripts to the document root folders of several of our websites. A regular, scheduled rsync process copied these scripts to our production web server, eos.apache.org, where they became externally visible. The CGI scripts were used to obtain remote shells, with information sent using HTTP POST commands.

Our download pages are dynamically generated, to enable us to present users with a local mirror of our software. This means that all of our domains have ExecCGI enabled, making it harder for us to protect against an attack of this nature.
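
For illustration only (the paths and hostnames below are hypothetical, not our actual configuration): the difference is between granting ExecCGI across an entire document root, where any *.cgi file that gets synced in becomes executable, and confining CGI to a single managed script directory.

    # Broad: any *.cgi that lands anywhere in the docroot will run
    <Directory "/www/example-project.apache.org/htdocs">
        Options +ExecCGI
        AddHandler cgi-script .cgi
    </Directory>

    # Narrower: only scripts placed in one managed directory are executed
    ScriptAlias /dyn/ "/www/cgi/"
    <Directory "/www/cgi/">
        AllowOverride None
        Options None
    </Directory>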

After discovering the CGI scripts, the infrastructure team decided to shut down any servers that could potentially have been affected. This included people.apache.org, and both the EU and US website servers. All website traffic was redirected to a known-good server, and a temporary security message was put in place to let people know we were aware of an issue.

One by one, we brought the potentially affected servers up in single-user mode, using our out-of-band access. It quickly became clear that aurora.apache.org, the EU website server, had not been affected. Although the CGI scripts had been rsync'd to that machine, they had never been run. This machine was not included in the DNS rotation at the time of the attack.

aurora.apache.org runs Solaris 10, and we were able to restore the box to a known-good configuration by cloning and promoting a ZFS snapshot from a day before the CGI scripts were synced over. Doing so enabled us to bring the EU server back online, and to rapidly restore our main websites. Thereafter, we continued to analyze the cause of the breach, the method of access, and which, if any, other machines had been compromised.

Shortly after bringing up aurora.apache.org we determined that the most likely route of the breach was the backup routine from dv35.apachecon.com. We grabbed all the available logs from dv35.apachecon.com, and promptly shut it down.

Analysis continued on minotaur.apache.org and eos.apache.org (our US server) until we were confident that all remnants of the attackers had been removed. As each server was declared clean, it was brought back online.

What worked?

  • The use of ZFS snapshots enabled us to restore the EU production web server to a known-good state.
  • Redundant services in two locations allowed us to run services from an alternate location while continuing to work on the affected servers and services.
  • A non-uniform set of compromised machines (Linux/CentOS i386, FreeBSD 7 amd64, and Solaris 10 on SPARC) made it difficult for the attackers to escalate privileges on multiple machines.

What didn't work?

  • The use of SSH keys facilitated this attack. In hindsight, our implementation left a lot to be desired--we did not restrict SSH keys appropriately, and we were unaware of their misuse.
  • The rsync setup, which uses people.apache.org to manage the deployment of our websites, enabled the attackers to get their files onto the US mirror, undetected.
  • The ability to run CGI scripts in any virtual host, when most of our websites do not need this functionality, made us unnecessarily vulnerable to an attack of this nature.
  • The lack of logs from the ApacheCon host prevents us from conclusively determining the full course of action taken by the attackers. All but one of the log files were deleted by the attackers, and logs were not kept off the machine.

What changes are we making now?

As a result of this intrusion we are making several changes, to help further secure our infrastructure from such issues in the future. These changes include the following:
  • Requiring all users with elevated privileges to use OPIE for sudo on certain machines.  We already require this in some places, but will expand its use as necessary.
  • Recreating and using new SSH keys, one per host, for backups. Also enforcing use of the from="" and command="" options in the authorized_keys file on the destination backup server. In tandem with access restrictions that only allow connections from machines that are actually backing up data, this will prevent third-party machines from being able to establish an SSH connection (an illustrative authorized_keys entry follows this list).
    • The command="" string in the authorized_keys file is now explicit, and only allows one-way rsync traffic, due to the paths and flags used.
    • New keys have been generated for all hosts, with a minimum key length of 4096 bits.
  • The VM that hosted the old apachecon.com site remains powered down, awaiting further detailed analysis.  The apachecon.com website has been re-deployed on a new VM, with a new provider and different operating system.
  • We are looking at disabling CGI support on most of our website systems.  This has led to the creation of a new httpd module that will handle things like mirror locations for downloads.
  • The method by which most of our public facing websites are deployed to our production servers will also change, becoming a much more automated process. We hope to have switched over to a SvnSubPub / SvnWcSub based system within the next few weeks.
  • We will re-implement measures such as IP banning after several failed logins, on all machines. 
  • A proposal has been made to introduce centralized logging. This would include all system logs, and possibly also services such as smtpd and httpd.
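
To make the backup-key restrictions above concrete (the host address, backup path, and rsync flags shown are illustrative placeholders, not our real values), an entry in the destination server's authorized_keys file might look like this:

    # Only host 192.0.2.10 may use this key, it gets no pty or forwarding, and it can
    # only run the fixed rsync receive command into the backup path.
    from="192.0.2.10",command="rsync --server -logDtpr . /backups/dv35/",no-pty,no-port-forwarding,no-X11-forwarding,no-agent-forwarding ssh-rsa AAAA... backup@dv35

Because sshd runs only the forced command, and the flag string after --server has to match what the backup client invokes, the key is effectively limited to one-way rsync traffic into that path.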


