MRUnit
Apache MRUnit is now a TLP (Top Level Project)
We (the Apache MRUnit team) have just graduated from the Apache Incubator to an Apache TLP (Top Level Project)! MRUnit is a Java library that helps developers unit test Apache Hadoop MapReduce jobs. Unit testing is a technique for improving project quality and reducing overall costs by writing a small amount of code that can automatically verify the software you write performs as intended. This is considered a best practice in software development since it helps identify defects early, before they're deployed to a production system.
At its monthly meeting in May 2012, the board of the Apache Software Foundation (ASF) resolved to grant Top-Level Project status to Apache MRUnit, thus graduating it from the Incubator. This is a significant milestone in the life of MRUnit, which has come a long way since its inception as a Hadoop Contrib project in HADOOP-5518, contributed by Aaron Kimball.
- May 2012 MRUnit graduates from the Incubator to become a TLP
- May 2012 Version 0.9.0-incubating released.
- April 2012 Dave Beech added as a new committer.
- April 2012 Jarek Jarcec Cecho added as a new committer.
- April 2012 New website created using the CMS.
- March 2012 Version 0.8.1-incubating released.
- March 2012 Jim Donofrio added as a new committer.
- February 2012 Version 0.8.0-incubating released.
- November 2011 Version 0.5.0-incubating released.
- October 2011 Brock Noland added as a new committer.
- March 2011 Project enters incubation.
- April 2009 Doug Cutting commits Aaron's patch to Hadoop
- March 2009 Aaron Kimball contributes MRUnit to Hadoop as a contrib project
Below is the graduation resolution:
X. Establish the Apache MRUnit Project
WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software related to unit testing Apache Hadoop map reduce jobs for distribution at no charge to the public.
NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the "Apache MRUnit Project", be and hereby is established pursuant to Bylaws of the Foundation; and be it further
RESOLVED, that the Apache MRUnit Project be and hereby is responsible for the creation and maintenance of software related to unit testing Apache Hadoop map reduce jobs; and be it further
RESOLVED, that the office of "Vice President, Apache MRUnit" be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache MRUnit Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache MRUnit Project; and be it further
RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache MRUnit Project:
* Brock Noland - brock@apache.org
* Patrick Hunt - phunt@apache.org
* Nigel Daley - nigel@apache.org
* Eric Sammer - esammer@apache.org
* Aaron Kimball - kimballa@apache.org
* Konstantin Boudnik - cos@apache.org
* Garrett Wu - gwu@apache.org
* Jim Donofrio - jdonofrio@apache.org
* Jarek Jarcec Cecho - jarcec@apache.org
* Dave Beech - dbeech@apache.org
NOW, THEREFORE, BE IT FURTHER RESOLVED, that Brock Noland be appointed to the office of Vice President, Apache MRUnit, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further
RESOLVED, that the initial Apache MRUnit PMC be and hereby is tasked with the creation of a set of bylaws intended to encourage open development and increased participation in the Apache MRUnit Project; and be it further
RESOLVED, that the Apache MRUnit Project be and hereby is tasked with the migration and rationalization of the Apache Incubator MRUnit podling; and be it further
RESOLVED, that all responsibilities pertaining to the Apache Incubator MRUnit podling encumbered upon the Apache Incubator Project are hereafter discharged.
Posted at 09:23PM May 23, 2012
by jdonofrio in General
Apache MRUnit 0.9.0-incubating has been released!
We (the Apache MRUnit team) have just released Apache MRUnit 0.9.0-incubating (tarball, nexus, javadoc). Apache MRUnit is an Apache Incubator project. MRUnit is a Java library that helps developers unit test Apache Hadoop MapReduce jobs. Unit testing is a technique for improving project quality and reducing overall costs by writing a small amount of code that can automatically verify the software you write performs as intended. This is considered a best practice in software development since it helps identify defects early, before they're deployed to a production system.
The MRUnit project is quite active: 0.9.0 is our fourth release since entering the incubator, and we have added 4 new committers beyond the project's initial charter! We are very interested in having new contributors and committers join the project! Please join our mailing list to find out how you can help!
The MRUnit build process has changed to produce mrunit-0.9.0-hadoop1.jar and mrunit-0.9.0-hadoop2.jar instead of mrunit-0.9.0-hadoop020.jar, mrunit-0.9.0-hadoop100.jar and mrunit-0.9.0-hadoop023.jar. The hadoop1 classifier is for all Apache Hadoop versions based off the 0.20.X line, including 1.0.X. The hadoop2 classifier is for all Apache Hadoop versions based off the 0.23.X line, including the unreleased 2.0.X.
This release contains 6 bug fixes, 15 improvements, and 2 new features. I will highlight a few below:
- Support custom counter checking in MRUNIT-68
- runTest() should optionally ignore output order in MRUNIT-91
- Driver.runTest throws RuntimeException should it throw AssertionError in MRUNIT-54
- o.a.h.mrunit.mapreduce.MapReduceDriver should support a combiner in MRUNIT-67
- Better support for other serializations besides Writable: MRUNIT-70, MRUNIT-86, MRUNIT-99, MRUNIT-77
- Better error messages from validate, null checking and forgetting to set mappers and reducers: MRUNIT-74, MRUNIT-66, MRUNIT-65
- add static convenience methods to PipelineMapReduceDriver class in MRUNIT-89
- Test and Deprecate Driver.{*OutputFromString,*InputFromString} Methods in MRUNIT-48
Support custom counter checking
It has always been possible to check the counter values like so:
assertEquals(2, mapDriver.getCounters().findCounter(CustomMapper.CustomCounter.NAME).getValue());
but this is quite tedious. As such Jarek Jarcec Cecho (our second newest committer) added this feature directly to the drivers:
.withCounter(CustomMapper.CustomCounter.NAME, 2);
runTest() should optionally ignore output order
Before this change, MRUnit required Mapper/Reducer classes to output key-value pairs in the order specified in the test. Well-defined output order is common, but not strictly universal. Dave Beech (our newest committer) contributed a patch so you can optionally turn this ordering requirement off by using:
.runTest(false)
instead of
.runTest()
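To make the difference between the two modes concrete, here is a minimal sketch of ordered versus unordered comparison of output pairs. It is not MRUnit's implementation: the OutputComparison class and its matches method are hypothetical names invented for this illustration, and plain java.util types stand in for Hadoop key and value types.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of MRUnit) comparing expected and actual key/value outputs. */
final class OutputComparison {

    private OutputComparison() {}

    /**
     * Returns true when actual matches expected. With orderMatters set to false,
     * only the multiset of pairs is compared, so duplicates still have to match in count.
     */
    static <K, V> boolean matches(List<Map.Entry<K, V>> expected,
                                  List<Map.Entry<K, V>> actual,
                                  boolean orderMatters) {
        if (orderMatters) {
            return expected.equals(actual);
        }
        return countOccurrences(expected).equals(countOccurrences(actual));
    }

    /** Counts how many times each pair occurs in the list. */
    private static <K, V> Map<Map.Entry<K, V>, Integer> countOccurrences(List<Map.Entry<K, V>> pairs) {
        Map<Map.Entry<K, V>, Integer> counts = new HashMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            counts.merge(pair, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> expected = new ArrayList<>();
        expected.add(new SimpleEntry<>("a", 1));
        expected.add(new SimpleEntry<>("b", 2));

        // Same pairs as expected, but emitted in the opposite order.
        List<Map.Entry<String, Integer>> actual = new ArrayList<>();
        actual.add(new SimpleEntry<>("b", 2));
        actual.add(new SimpleEntry<>("a", 1));

        System.out.println(matches(expected, actual, true));  // prints "false"
        System.out.println(matches(expected, actual, false)); // prints "true"
    }
}
```

Comparing occurrence counts rather than sets means a pair that is expected twice must also be emitted twice, which is the behavior one would usually want from an order-insensitive check.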
Driver.runTest throws RuntimeException should it throw AssertionError
Previous versions of MRUnit threw a RuntimeException when a test failed. This worked well, but it meant that testing frameworks saw the test as having erred, not failed. We have changed this to AssertionError so that testing frameworks see the tests as failed. The distinction is small but important.
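The sketch below illustrates why the exception type matters. It is a simplified stand-in for how a test runner might classify outcomes, not the code of JUnit or any other framework; the OutcomeClassifier class and its Outcome enum are hypothetical names used only for this example.

```java
/** Simplified, hypothetical model of how a test runner might classify test outcomes. */
final class OutcomeClassifier {

    enum Outcome { PASSED, FAILED, ERRORED }

    private OutcomeClassifier() {}

    /** Runs the test body and maps what it throws to an outcome. */
    static Outcome run(Runnable test) {
        try {
            test.run();
            return Outcome.PASSED;
        } catch (AssertionError e) {
            // An assertion did not hold: the code under test produced the wrong result.
            return Outcome.FAILED;
        } catch (Throwable t) {
            // Anything else is treated as an unexpected problem while running the test.
            return Outcome.ERRORED;
        }
    }

    public static void main(String[] args) {
        // Old behavior: a mismatch surfaced as a RuntimeException.
        System.out.println(run(() -> { throw new RuntimeException("output mismatch"); })); // prints "ERRORED"
        // New behavior: a mismatch surfaces as an AssertionError.
        System.out.println(run(() -> { throw new AssertionError("output mismatch"); }));   // prints "FAILED"
        System.out.println(run(() -> { }));                                                // prints "PASSED"
    }
}
```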
o.a.h.mrunit.mapreduce.MapReduceDriver should support a combiner
Previously, MRUnit only supported a combiner in the mapred MapReduceDriver class, but now the mapreduce MapReduceDriver also supports a combiner through:
MapReduceDriver.newMapReduceDriver(mapper, reducer, combiner)
or
.withCombiner(combiner) or .setCombiner(combiner)
Better support for other serializations besides Writable
Previous versions of MRUnit did not support JavaSerialization, Avro or other serialization frameworks well. We improved alternative serialization support by no longer forcing K2 in MapReduceDriver to be Comparable and by supporting serializations that cannot clone into an object or that do not have default constructors.
Better error messages from validate, null checking and forgetting to set mappers and reducers
We have improved the checking of parameters passed to MRUnit and the error messages produced when the parameters are invalid. MRUnit now throws a NullPointerException immediately when it receives a null value, and throws an IllegalStateException, instead of a NullPointerException, when no mapper or reducer class is provided.
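The two checks described above follow a common fail-fast pattern, sketched here with a hypothetical SketchDriver class. The class, its method bodies and its messages are assumptions made for illustration and are not MRUnit's actual code.

```java
import java.util.Objects;

/** Hypothetical driver (not MRUnit code) showing eager null checks and state validation. */
final class SketchDriver {

    // Stands in for a Mapper instance; a plain Object keeps the sketch self-contained.
    private Object mapper;

    /** Rejects null immediately, so the stack trace points at the caller's mistake. */
    SketchDriver withMapper(Object mapper) {
        this.mapper = Objects.requireNonNull(mapper, "mapper must not be null");
        return this;
    }

    /** Fails with a descriptive IllegalStateException when the driver was never configured. */
    void runTest() {
        if (mapper == null) {
            throw new IllegalStateException("No Mapper class was provided");
        }
        // A real driver would execute the mapper and compare outputs here.
    }

    public static void main(String[] args) {
        try {
            new SketchDriver().withMapper(null);
        } catch (NullPointerException e) {
            System.out.println(e.getMessage()); // prints "mapper must not be null"
        }
        try {
            new SketchDriver().runTest();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "No Mapper class was provided"
        }
    }
}
```

Validating at the point where the value is supplied, rather than when it is first dereferenced, keeps the reported error close to its cause.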
add static convenience methods to PipelineMapReduceDriver class
We added static convenience factory methods similar to those in the other driver classes:
PipelineMapReduceDriver.newPipelineMapReduceDriver()
or
PipelineMapReduceDriver.newPipelineMapReduceDriver(list of Pair<Mapper, Reducer>)
Test and Deprecate Driver.{*OutputFromString,*InputFromString} Methods
The *OutputFromString and *InputFromString methods are now deprecated because they required Text inputs or outputs, with no way to enforce that the inputs or outputs of a mapper or reducer were actually Text. These methods also provided little convenience, since a user can simply pass the intended string to new Text(string).
Posted at 02:02PM May 01, 2012
by brock in General