MRUnit

Wednesday May 23, 2012

Apache MRUnit is now a TLP (Top Level Project)

We (the Apache MRUnit team) have just graduated from the Apache Incubator to an Apache TLP (Top Level Project)! MRUnit is a Java library that helps developers unit test Apache Hadoop MapReduce jobs. Unit testing is a technique for improving project quality and reducing overall costs by writing a small amount of code that can automatically verify the software you write performs as intended. This is considered a best practice in software development since it helps identify defects early, before they're deployed to a production system.

At its monthly meeting in May 2012, the board of the Apache Software Foundation (ASF) resolved to grant Top-Level Project status to Apache MRUnit, thus graduating it from the Incubator. This is a significant milestone in the life of MRUnit, which has come a long way since its inception as a Hadoop Contrib project in HADOOP-5518, contributed by Aaron Kimball.

  • May 2012 MRUnit graduates from the Incubator to become a TLP
  • May 2012 Version 0.9.0-incubating released.
  • April 2012 Dave Beech added as a new committer.
  • April 2012 Jarek Jarcec Cecho added as a new committer.
  • April 2012 New website created using the CMS.
  • March 2012 Version 0.8.1-incubating released.
  • March 2012 Jim Donofrio added as a new committer.
  • February 2012 Version 0.8.0-incubating released.
  • November 2011 Version 0.5.0-incubating released.
  • October 2011 Brock Noland added as a new committer.
  • March 2011 Project enters incubation.
  • April 2009 Doug Cutting commits Aaron's patch to Hadoop
  • March 2009 Aaron Kimball contributes MRUnit to Hadoop as a contrib project

Below is the graduation resolution:

X. Establish the Apache MRUnit Project

WHEREAS, the Board of Directors deems it to be in the best
interests of the Foundation and consistent with the
Foundation's purpose to establish a Project Management
Committee charged with the creation and maintenance of
open-source software related to unit testing Apache Hadoop map
reduce jobs for distribution at no charge to the public.

NOW, THEREFORE, BE IT RESOLVED, that a Project Management
Committee (PMC), to be known as the "Apache MRUnit Project",
be and hereby is established pursuant to Bylaws of the
Foundation; and be it further

RESOLVED, that the Apache MRUnit Project be and hereby is
responsible for the creation and maintenance of software
related to unit testing Apache Hadoop map reduce jobs;
and be it further

RESOLVED, that the office of "Vice President, Apache MRUnit" be
and hereby is created, the person holding such office to
serve at the direction of the Board of Directors as the chair
of the Apache MRUnit Project, and to have primary responsibility
for management of the projects within the scope of
responsibility of the Apache MRUnit Project; and be it further

RESOLVED, that the persons listed immediately below be and
hereby are appointed to serve as the initial members of the
Apache MRUnit Project:

* Brock Noland - brock@apache.org
* Patrick Hunt - phunt@apache.org
* Nigel Daley - nigel@apache.org
* Eric Sammer - esammer@apache.org
* Aaron Kimball - kimballa@apache.org
* Konstantin Boudnik - cos@apache.org
* Garrett Wu - gwu@apache.org
* Jim Donofrio - jdonofrio@apache.org
* Jarek Jarcec Cecho - jarcec@apache.org
* Dave Beech - dbeech@apache.org

NOW, THEREFORE, BE IT FURTHER RESOLVED, that Brock Noland
be appointed to the office of Vice President, Apache MRUnit, to
serve in accordance with and subject to the direction of the
Board of Directors and the Bylaws of the Foundation until
death, resignation, retirement, removal or disqualification,
or until a successor is appointed; and be it further

RESOLVED, that the initial Apache MRUnit PMC be and hereby is
tasked with the creation of a set of bylaws intended to
encourage open development and increased participation in the
Apache MRUnit Project; and be it further

RESOLVED, that the Apache MRUnit Project be and hereby
is tasked with the migration and rationalization of the Apache
Incubator MRUnit podling; and be it further

RESOLVED, that all responsibilities pertaining to the Apache
Incubator MRUnit podling encumbered upon the Apache Incubator
Project are hereafter discharged.

Tuesday May 01, 2012

Apache MRUnit 0.9.0-incubating has been released!

We (the Apache MRUnit team) have just released Apache MRUnit 0.9.0-incubating (tarball, nexus, javadoc). Apache MRUnit is an Apache Incubator project. MRUnit is a Java library that helps developers unit test Apache Hadoop MapReduce jobs. Unit testing is a technique for improving project quality and reducing overall costs by writing a small amount of code that can automatically verify the software you write performs as intended. This is considered a best practice in software development since it helps identify defects early, before they're deployed to a production system.

The MRUnit project is quite active: 0.9.0 is our fourth release since entering the incubator, and we have added 4 new committers beyond the project's initial charter! We are very interested in having new contributors and committers join the project! Please join our mailing list to find out how you can help!

The MRUnit build process has changed to produce mrunit-0.9.0-hadoop1.jar and mrunit-0.9.0-hadoop2.jar instead of mrunit-0.9.0-hadoop020.jar, mrunit-0.9.0-hadoop100.jar and mrunit-0.9.0-hadoop023.jar. The hadoop1 classifier is for all Apache Hadoop versions based off the 0.20.X line, including 1.0.X. The hadoop2 classifier is for all Apache Hadoop versions based off the 0.23.X line, including the unreleased 2.0.X.

This release contains 6 bug fixes, 15 improvements, and 2 new features. I will highlight a few below:

  • Support custom counter checking in MRUNIT-68
  • runTest() should optionally ignore output order in MRUNIT-91
  • Driver.runTest throws RuntimeException should it throw AssertionError in MRUNIT-54
  • o.a.h.mrunit.mapreduce.MapReduceDriver should support a combiner in MRUNIT-67
  • Better support for other serializations besides Writable: MRUNIT-70, MRUNIT-86, MRUNIT-99, MRUNIT-77
  • Better error messages from validate, null checking and forgetting to set mappers and reducers: MRUNIT-74, MRUNIT-66, MRUNIT-65
  • add static convenience methods to PipelineMapReduceDriver class in MRUNIT-89
  • Test and Deprecate Driver.{*OutputFromString,*InputFromString} Methods in MRUNIT-48

Support custom counter checking

It has always been possible to check the counter values like so:

assertEquals(2, mapDriver.getCounters().findCounter(CustomMapper.CustomCounter.NAME).getValue()); 

but this is quite tedious. As such, Jarek Jarcec Cecho (our second newest committer) added this feature directly to the drivers:

.withCounter(CustomMapper.CustomCounter.NAME, 2);

runTest() should optionally ignore output order

Before this change, MRUnit required Mapper and Reducer classes to output key-value pairs in the order specified in the test. A well-defined output order is common, but not universal. Dave Beech (our newest committer) contributed a patch so you can optionally turn off this ordering requirement by using:

.runTest(false)

instead of

.runTest()
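
To illustrate the difference between the two modes, here is a minimal sketch of ordered versus order-insensitive comparison. It is not MRUnit's implementation: the class OutputComparison and its methods are hypothetical names invented for this example, and it uses only the JDK.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of MRUnit: compares two lists of outputs
// either position by position or as multisets.
final class OutputComparison {

    // Counts how often each element occurs, so duplicate outputs are respected.
    static <T> Map<T, Integer> countOccurrences(List<T> items) {
        Map<T, Integer> counts = new HashMap<T, Integer>();
        for (T item : items) {
            counts.merge(item, 1, Integer::sum);
        }
        return counts;
    }

    // orderMatters == true: the lists must match element by element.
    // orderMatters == false: the lists must contain the same elements the
    // same number of times, in any order.
    static <T> boolean matches(List<T> expected, List<T> actual, boolean orderMatters) {
        if (orderMatters) {
            return expected.equals(actual);
        }
        return countOccurrences(expected).equals(countOccurrences(actual));
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("(cat, 1)", "(dog, 1)");
        List<String> actual = Arrays.asList("(dog, 1)", "(cat, 1)");
        System.out.println(matches(expected, actual, true));  // prints false
        System.out.println(matches(expected, actual, false)); // prints true
    }
}
```

Comparing occurrence counts rather than plain sets means a mapper that emits the same pair twice is still distinguished from one that emits it once.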

Driver.runTest throws RuntimeException should it throw AssertionError

Previous versions of MRUnit threw a RuntimeException when a test failed. This worked well, but it meant that testing frameworks saw the test as having erred, not failed. We have changed this to AssertionError so that testing frameworks see the tests as failed. The distinction is small but important.
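
The distinction can be sketched with a tiny runner that classifies results by exception type, roughly the way JUnit-style frameworks separate failures from errors. This is an illustration only: FailureVersusError and its Outcome enum are hypothetical names, not JUnit or MRUnit code.

```java
// Hypothetical mini test runner, loosely modelled on how JUnit-style
// frameworks report results; it is not JUnit or MRUnit code.
final class FailureVersusError {

    enum Outcome { PASSED, FAILED, ERRORED }

    static Outcome run(Runnable test) {
        try {
            test.run();
            return Outcome.PASSED;
        } catch (AssertionError e) {
            // An assertion did not hold: the code under test gave a wrong answer.
            return Outcome.FAILED;
        } catch (RuntimeException e) {
            // Something unexpected broke while the test was running.
            return Outcome.ERRORED;
        }
    }

    public static void main(String[] args) {
        Outcome oldStyle = run(() -> { throw new RuntimeException("Missing expected output"); });
        Outcome newStyle = run(() -> { throw new AssertionError("Missing expected output"); });
        System.out.println("RuntimeException -> " + oldStyle); // prints RuntimeException -> ERRORED
        System.out.println("AssertionError -> " + newStyle);   // prints AssertionError -> FAILED
    }
}
```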

o.a.h.mrunit.mapreduce.MapReduceDriver should support a combiner

Previously, MRUnit only supported a combiner in the mapred MapReduceDriver class. The mapreduce MapReduceDriver now also supports a combiner via:

MapReduceDriver.newMapReduceDriver(mapper, reducer, combiner)

or

.withCombiner(combiner) or .setCombiner(combiner)

Better support for other serializations besides Writable

Previous versions of MRUnit did not support JavaSerialization, Avro or other serialization frameworks well. We improved alternative serialization support by not forcing K2 in MapReduceDriver to be Comparable and by supporting serializations that cannot clone into an object or that do not have default constructors.

Better error messages from validate, null checking and forgetting to set mappers and reducers

We have improved the checking of parameters passed to MRUnit and the error messages produced when parameters are invalid. MRUnit now throws a NullPointerException immediately when it receives a null value, and throws an IllegalStateException instead of a NullPointerException when no mapper or reducer class is provided.
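
The two checks follow a common fail-fast pattern, sketched below. This is an assumption-laden illustration rather than MRUnit's code: ValidatingDriver, withMapper and run are hypothetical names, and the exception messages are invented.

```java
// Hypothetical driver used only to illustrate the two checks; the class,
// method names and messages are not MRUnit's.
final class ValidatingDriver {

    private Object mapper;

    // Reject null as soon as it is passed in, so the stack trace points at
    // the caller's mistake rather than at some later use of the field.
    ValidatingDriver withMapper(Object mapper) {
        if (mapper == null) {
            throw new NullPointerException("mapper must not be null");
        }
        this.mapper = mapper;
        return this;
    }

    // A missing mapper is a configuration problem, so report it as an
    // IllegalStateException with a descriptive message.
    String run() {
        if (mapper == null) {
            throw new IllegalStateException("No mapper was provided");
        }
        return "ran " + mapper;
    }

    public static void main(String[] args) {
        try {
            new ValidatingDriver().run();
        } catch (IllegalStateException e) {
            System.out.println("IllegalStateException: " + e.getMessage());
        }
        try {
            new ValidatingDriver().withMapper(null);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException: " + e.getMessage());
        }
    }
}
```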

add static convenience methods to PipelineMapReduceDriver class

Static convenience constructors, similar to those in the other driver classes, have been added:

PipelineMapReduceDriver.newPipelineMapReduceDriver()

or

PipelineMapReduceDriver.newPipelineMapReduceDriver(list of Pair<Mapper, Reducer>)

Test and Deprecate Driver.{*OutputFromString,*InputFromString} Methods

The OutputFromString and InputFromString methods are now deprecated because they required Text inputs or outputs, with no way to enforce that the inputs or outputs of a mapper or reducer were actually Text. These methods also provided little convenience, as a user can simply wrap the intended string in new Text(string).

Wednesday March 14, 2012

Apache MRUnit 0.8.1-incubating has been released!

We (the Apache MRUnit team) have just released Apache MRUnit 0.8.1-incubating. Apache MRUnit is an Apache Incubator project. MRUnit is a Java library that helps developers unit test Apache Hadoop MapReduce jobs. Unit testing is a technique for improving project quality and reducing overall costs by writing a small amount of code that can automatically verify the software you write performs as intended. This is considered a best practice in software development since it helps identify defects early, before they're deployed to a production system.

The MRUnit project is actively looking for contributors, even ones brand new to the world of open source software. There are many ways to contribute: documentation, bug reports, blog articles, etc. If you are interested but have no idea where to start, please email brock at cloudera dot com. If you are an experienced open source contributor, the MRUnit wiki explains How you can Contribute.

The new release includes one bug fix and a number of new features including, but not limited to:

Support Hadoop 0.20 and 0.23

In MAPREDUCE-954 Hadoop changed the public org.apache.hadoop.mapreduce API. In MRUNIT-31 the MRUnit team was able to provide one release artifact which supports both versions of the API. MRUNIT-56 provided three Maven classifiers (hadoop020, hadoop023, and hadoop100) to make it easy to depend on the correct binary.

Improved test failure messages

Improving the test case failure messages is how I became involved in the project. Before this change you had to look at your console to see what errors had occurred. Now, when a test fails the error message contains an explanation of exactly what failed:

java.lang.RuntimeException: 1 Error(s): (Missing expected output (cat1, 1) at position 0.)
	at org.apache.hadoop.mrunit.TestDriver.validate(TestDriver.java:194)
	at org.apache.hadoop.mrunit.MapDriverBase.runTest(MapDriverBase.java:186)
	at TestWordCount.testMapper(TestWordCount.java:34)

Static driver factories

Static driver factories are an addition which reduces much of the weight imposed by Java generics. Before this new feature, to create a driver, say MapDriver, you would have to specify all the generic types:

mapDriver = new MapDriver<LongWritable, Text, Text, IntWritable>();

Now, the generic types are inferred and you can simply specify:

mapDriver = MapDriver.newMapDriver();

This should provide users with the ability to write simpler code. Additionally, using the MapDriver constructor will continue to work and is fully supported.
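
The inference relies on a standard Java feature: the type arguments of a generic static method are inferred from the assignment target. A minimal sketch with a hypothetical stand-in class (SketchDriver is not an MRUnit class, and plain JDK types replace the Hadoop Writables) shows the pattern:

```java
// Hypothetical stand-in for a driver with four type parameters; the real
// MapDriver lives in MRUnit and needs Hadoop on the classpath.
final class SketchDriver<K1, V1, K2, V2> {

    private String description = "unconfigured";

    // Generic factory method: the compiler infers K1, V1, K2 and V2 from the
    // type of the variable the result is assigned to.
    static <K1, V1, K2, V2> SketchDriver<K1, V1, K2, V2> newSketchDriver() {
        return new SketchDriver<K1, V1, K2, V2>();
    }

    SketchDriver<K1, V1, K2, V2> describedAs(String text) {
        this.description = text;
        return this;
    }

    String description() {
        return description;
    }

    public static void main(String[] args) {
        // Constructor: the type arguments are spelled out on both sides.
        SketchDriver<Long, String, String, Integer> viaConstructor =
                new SketchDriver<Long, String, String, Integer>();

        // Factory: the type arguments appear once, on the declaration.
        SketchDriver<Long, String, String, Integer> viaFactory =
                SketchDriver.newSketchDriver();

        viaConstructor.describedAs("built with the constructor");
        viaFactory.describedAs("built with the factory");
        System.out.println(viaConstructor.description());
        System.out.println(viaFactory.description());
    }
}
```

Note that inference only works when the factory call is assigned directly to a typed variable; calling another method on the result within the same expression leaves the compiler without a target type.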

Download the source release, binaries via the Maven artifact and classifier, or download them directly via the Maven repo. More information on MRUnit is available on the project website.
