Entries tagged [perl]

Tuesday March 25, 2014

Scaling down the CMS to modest but intricate websites

The original focus of the CMS was to provide the tools necessary for handling http://www.apache.org/ and similar Anakia-based sites. The scope quickly changed when Apache OpenOffice was accepted into the Incubator: handling over 9GB of content well was quite an undertaking, and it will soon be discussed at ApacheCon in Denver during Dave Fisher's talk. From there the build system was extended to allow builds using multiple technologies and programming languages.

Since that time, in late 2012, the CMS codebase has sat still, but recently we've upped the ante and decided to offer features aimed at parity with other site-building technologies like Jekyll, nanoc and Middleman. You can see some of the new additions to the Apache CMS in action at http://thrift.apache.org/. The Apache Thrift website was originally written to use nanoc before being ported to the newly improved Apache CMS. The Thrift team kept the YAML headers on their Markdown pages and replaced a custom preprocessing script for inserting code snippets with the fully supported snippet-fetching feature in the Apache CMS.

"The new improvements to the Apache CMS allowed us to quickly standardize the build process and guarantee repeatable results as well as integrate direct code snippets into the website from our source repository."
- Jake Farrell, Apache Thrift PMC Chair

Check out the Apache Thrift website's CMS sources for sample uses of the new features found in ASF::View and ASF::Value::Snippet.

Saturday June 09, 2012

The value of taint checks in CGI scripts

Consider the following snippet taken from a live CGI script running on the host that serves www.apache.org:

#!/usr/bin/perl

use strict;
use warnings;

print "Content-Type: text/html\n\n";
my $artifact = "/apache-tomee/1.0.1-SNAPSHOT/";
$artifact = $ENV{PATH_INFO} if $ENV{PATH_INFO};

$artifact = "/$artifact/";
$artifact =~ s,/+,/,g;
$artifact =~ s,[^a-zA-Z.[0-9]-],,g;
$artifact =~ s,\.\./,,g;

my $content = `wget -q -O - http://repository.apache.org/snapshots/org/apache/openejb$artifact`;
... 


Looks pretty good, right? Any questionable characters are removed from $artifact before it is exposed to the shell via backticks... or are they? As it turns out, that's not so easy to determine.

The first warning given to the author of this script was that he hadn't enabled taint checks. Had he done so, things would probably have looked like this:

#!/usr/bin/perl -T

use strict;
use warnings;

print "Content-Type: text/html\n\n";
my $artifact = "/apache-tomee/1.0.1-SNAPSHOT/";
$artifact = $ENV{PATH_INFO} if $ENV{PATH_INFO};

$artifact = "/$artifact/";
$artifact =~ s,/+,/,g;
$artifact =~ m,^([a-zA-Z.[0-9]-]*)$, or die "Detainting regexp failed!";
$artifact = $1;
$artifact =~ s,\.\./,,g;

my $content = `wget -q -O - http://repository.apache.org/snapshots/org/apache/openejb$artifact`;
... 

It doesn't look like much of a change, but the impact on the actual logic is massive: we've gone from a substitution that strips unwanted characters to a fully anchored pattern that matches only strings made up entirely of wanted characters, and that dies if the match fails. Sadly, the developer in question did not heed this early advice.
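The detaint idiom described above (an anchored whitelist match whose capture group becomes the trusted value) can be packaged as a small helper. This is a minimal sketch, not the script's actual code: the helper name detaint_artifact, the allowed set (letters, digits, '.', '/' and '-') and the extra '..' segment check are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a cleaned copy of $path, or dies.
# Under -T, a value extracted through a regex capture ($1) is untainted.
sub detaint_artifact {
    my ($path) = @_;
    $path = "/$path/";
    $path =~ s{/+}{/}g;    # collapse runs of slashes

    # Whitelist (assumed): letters, digits, '.', '/' and '-'.
    # '-' is placed last so it is literal; \A and \z anchor the whole string.
    $path =~ m{\A([A-Za-z0-9./-]+)\z}
        or die "Detainting regexp failed!\n";
    my $clean = $1;

    # Reject any '..' path segment outright instead of trying to strip it.
    die "Parent-directory segment rejected\n"
        if $clean =~ m{(?:\A|/)\.\.(?:/|\z)};

    return $clean;
}

# Usage: a benign path passes through, a hostile one is rejected.
my $ok = detaint_artifact("/apache-tomee/1.0.1-SNAPSHOT/");
print "accepted: $ok\n";

my $bad = eval { detaint_artifact("/x;id;") };
print defined $bad ? "accepted: $bad\n" : "rejected: $@";
```

Even with a clean path, backticks still hand the command line to a shell; the list form of open (open my $fh, '-|', 'wget', '-q', '-O', '-', $url) would run wget directly without one.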

As it turns out, there is a bug (well, several) in the core pattern that renders the original substitution ineffective. In the taint-checked version, however, the same bugs cause the detainting match to fail, which renders the script harmless! The practical difference is that instead of a script with a working remote shell exploit, we have a script that serves no useful purpose. To the Apache sysadmins this is a superior outcome, even though the developer would prefer the original, essentially working script. Worlds are colliding here, but guess who wins?
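To see why, look at how Perl actually parses [^a-zA-Z.[0-9]-]. Inside a character class [ is an ordinary character, and the first ] closes the class, so the pattern means "one character that is not a letter, a digit, '.' or '[', followed by a literal '-]'". The sketch below wraps each version's filtering steps in a sub (the sub names and the hostile input /x;id; are hypothetical, chosen for illustration). It shows the substitution letting shell metacharacters straight through, while the anchored match rejects every input, because every input starts with '/' and '/' is not in the class.

```perl
use strict;
use warnings;

# Filtering steps from the original (non-taint) script.
# The substitution only deletes three-character runs such as ";-]",
# so ordinary shell metacharacters like ';' survive untouched.
sub sanitize_original {
    my ($artifact) = @_;
    $artifact = "/$artifact/";
    $artifact =~ s,/+,/,g;
    $artifact =~ s,[^a-zA-Z.[0-9]-],,g;
    $artifact =~ s,\.\./,,g;
    return $artifact;
}

# Detaint step from the -T version, returning undef instead of dying.
# The anchored pattern only matches "<class char>-" followed by zero or
# more ']' characters, and the string always begins with '/', so it never matches.
sub detaint_original {
    my ($artifact) = @_;
    $artifact = "/$artifact/";
    $artifact =~ s,/+,/,g;
    return $artifact =~ m,^([a-zA-Z.[0-9]-]*)$, ? $1 : undef;
}

my $hostile = "/x;id;";    # hypothetical PATH_INFO from an attacker
print "substitution leaves: ", sanitize_original($hostile), "\n";
print "detaint result: ",
    (defined(detaint_original($hostile)) ? "matched" : "failed"), "\n";
```

Run as-is, the first line shows /x;id;/ unchanged, which the backticks would hand to the shell as three separate commands; the second line reports that the detaint match failed.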

At the ASF, the sysadmins almost invariably refuse to run Perl or Ruby CGI scripts without taint checking enabled, and always prefer CGI scripts to be written in languages that support taint checks, since such checks tend to enforce good practice in handling untrusted input. This example, which is in fact one of the first times we've even considered allowing Apache devs to deploy non-download CGI scripts on the www.apache.org server, serves as a useful reminder of why using a language that supports taint checks is an essential part of scripting on the web.


