Entries tagged [hadoop]
20x faster MapReduce with GridGain Hadoop Accelerator
Posted at 03:49AM Feb 11, 2015 by cos in General
Release of Apache Bigtop 0.8.0
This new release brings tons of new features and fixes for the beloved 100% community- and open-source-driven big data distribution.
Among the new features:
- New Gradle-based build system, with better integration into the underlying Bigtop platform. Going forward it will allow us to improve the build experience as well as simplify it for new and experienced users alike.
- A Groovy runtime was added directly to the stack, which will help speed up initial HDFS cluster deployment
- BigPetStore data analytics blueprints
- JDK/OpenJDK 7 support. Yes, we have completely moved away from JDK6
- Support for latest Ubuntu, OpenSUSE, and CentOS
- Removed support for some outdated distros such as CentOS 5, Fedora 17, and Ubuntu Lucid
- The release fixes close to 200 bugs and adds a lot of new features focused on the usability and stability of the software stack
Of course this release brings its share of component upgrades:
- Apache Hadoop 2.4.1
- Apache Giraph 1.1.0
- Apache Flume 1.5.0
- Apache DataFu 1.0.0
- Apache Crunch 0.10.0
- Apache HBase 0.98.5
- Apache Hive 0.13.0
- Hue 3.6.0
- Pig 0.12
- Apache Mahout 0.9
- Apache Solr 4.6.0
- Spark 0.9.1
And as usual, Apache Bigtop supports and provides convenience artifacts for a wide array of GNU/Linux distributions:
- Fedora 18
- Ubuntu 12.04 (Precise)
- Ubuntu 12.10 (Quantal Quetzal)
- Ubuntu 14.04 (Trusty Tahr)
- OpenSUSE 13.1
Convenience repositories are available from http://www.apache.org/dist/bigtop/bigtop-0.8.0/repos/
Overall, Apache Bigtop 0.8.0 is a great release that continues what the previous ones brought to the field: a great set of features. As a developer of the distribution and as a user, everything felt right and worked out of the box.
I would like to take the opportunity to thank all the members of the Apache Bigtop community for all the effort put into such a great community and distribution. I am also looking forward to the coming 0.9.0 release, which will focus the stack more on in-memory analytics with the addition of Apache Ignite (incubating) and other improvements.
I would also like to encourage everyone to give our latest release a try, and to not hesitate to come in, participate, and give feedback on our mailing lists: http://bigtop.apache.org/mail-lists.html
Testing Apache Tez with Apache BigTop's new Testing Infrastructure
Apache BigTop's utilities can be consumed and reused by any hadoop distribution, not just by BigTop itself. Puppet recipes, RPM specifications, and so on can save vendors weeks or months of time if borrowed from BigTop rather than maintained in house. However, until recently, the tests were somewhat difficult to customize and hack on.
So, after a lot of stimulating debate, we finally settled on our new test infrastructure. For those interested in building an integration test framework, especially for a Java-based application where Java/Scala/Groovy API calls need to run in an integration context, gradle-based tests can be very powerful.
- You can organize gradle source trees easily, without any requirement for complex package hierarchies.
- You can dynamically add source sets without a lot of boilerplate, meaning the tests can easily be extended and hacked on by new engineers, devops folks, etc.
- You can still test low-level Java functionality easily by adding Java libraries to the classpath at runtime, without needing to compile jars and manage a whole Maven-style project.
- The test interface is easy to customize with arguments. You can parse arguments however you want.
- Gradle combines the power of Groovy with a declarative language for builds
- Using something like the gradle wrapper, you can make your Java-based tests easy to consume by anyone, even folks outside the Java community.
As an example of how to use gradle for integration tests, I'll demonstrate how we retooled the BigTop tests.
You can check out the new tests by cloning bigtop and going into the bigtop-tests/smoke-tests directory.
First, let's take a look at the overall directory structure of the testing suite.
[bigtop@sandbox smoke-tests]$ tree
├── flume
│   ├── build.gradle
│   ├── conf
│   │   └── flume.conf
│   ├── log4j.properties
│   └── TestFlumeNG.groovy
├── …
│   ├── build.gradle
│   └── log4j.properties
├── …
│   ├── build.gradle
│   └── log4j.properties
└── …
    └── build.gradle
Testing Apache Tez with Apache BigTop
So, what better way to demonstrate the flexibility of the BigTop testing suite than to use it to test a tool native to another distribution: Apache Tez on Hortonworks HDP!
The code for these tests is in this JIRA, which also has the patch that adds a simple Tez test to bigtop attached to it.
How it works
It's pretty simple. Above we can see that each ecosystem component has a "build.gradle" file, which contains a few dependencies and the names of the classes it will call for tests. There is also a top-level build.gradle file whose job is to send global parameters to the sub-tests; it does no testing of its own. We do this using the "subprojects" directive. Finally, there is a settings.gradle file that parses our input arguments to decide which tests to run.
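The argument-driven selection that settings.gradle performs can be sketched in shell. This is an illustration of the idea only, not Bigtop's actual settings.gradle code; the component directory names and the smoke.tests property come from the text above, everything else is a stand-in:

```shell
#!/bin/sh
# Illustrate how a comma-separated value like -Dsmoke.tests=tez,pig
# can select which component sub-directories take part in a test run.
set -e

workdir=$(mktemp -d)
cd "$workdir"

# Stand-in component directories, each with its own build.gradle.
for c in flume pig hive tez; do
  mkdir -p "$c"
  touch "$c/build.gradle"
done

SMOKE_TESTS="tez,pig"   # what -Dsmoke.tests=tez,pig would carry

selected=""
for t in $(echo "$SMOKE_TESTS" | tr ',' ' '); do
  # Keep only names that actually exist as sub-directories.
  if [ -d "$t" ]; then
    selected="$selected $t"
  fi
done

echo "selected:$selected"   # prints: selected: tez pig
```

Only the matching directories end up in the build, which is why dropping a new directory next to the existing ones is enough to register a new test.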
So, how do we extend these tests? Easy!
- Pick any existing test as a template (for example, pig/)
- Create a new directory, for example "tez/", and copy the template's files into it
- Customize the environment variables you want defined for your test in build.gradle
- Customize the unit testing script (which uses itest and junit for assertions and running bash commands)
- Run your new tests: gradle clean compileGroovy -Dsmoke.tests=tez --info
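The steps above can be sketched as a small shell script, using pig/ as the template and tez/ as the new test. The file contents below are stand-ins for illustration; in a real checkout you would run this inside bigtop-tests/smoke-tests:

```shell
#!/bin/sh
# Scaffold a new smoke test directory by copying an existing one,
# following the "pick a template, copy, customize" steps above.
set -e

workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the existing pig/ smoke test used as a template.
mkdir -p pig
echo "// pig smoke test build file" > pig/build.gradle
echo "log4j.rootLogger=INFO"        > pig/log4j.properties

# Copy the template into a new tez/ directory...
cp -r pig tez

# ...then customize the new build.gradle for the new component.
echo "// customized for tez" >> tez/build.gradle

ls tez
```

From here the only remaining work is editing the copied build.gradle and test script for the new component.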
There it is: in just under 100 lines of code, by adding two simple files, we were able to add a new test to the bigtop test suite. Note that we didn't have to edit a single existing file to run this test; we just dropped some groovy scripts into a new directory, and gradle discovered the tests, ran them for us, and created a nice little HTML report, now available in ./build/reports/tests/index.html. Gradle also injected the inherited dependencies for us and did some basic sanity checking as well.
We can see that the original test ran a MapReduce job, but after turning Tez on, our job UI shows that we can now test Tez using BigTop's test framework.
Posted at 02:52AM Aug 28, 2014 by Jay Vyas in General
BigTop hackathon at Hadoop Summit!
Time for another hackathon.
There are a lot of companies who contribute to BigTop: Pivotal, Cloudera, Red Hat, WANdisco, Amazon, and so on. If I left yours out, feel free to leave a comment below and I'll update this post. And today, we are proud to announce that Red Hat is hosting the next bigtop hackathon, immediately following Hadoop Summit 2014 in San Jose. Hadoop is about a lot more than just source code: it's about packaging, deployment, configuration, and so on. And BigTop has embraced the difficult task of tying all this together.
Apache BigTop makes Hadoop deployment transparent
All source code is complex, regardless of the language. But what's even more complex is the deployment of code on a distributed system. While vendors have come a long way in making it easy to DEPLOY hadoop with black-box administrative or cloud tools, nobody has really opened up hadoop by building a culture around its deployment.
Enter Apache BigTop.
BigTop contains all the stuff that's not in the hadoop docs. For example:
- Puppet modules for installation and configuration of hadoop without using a tarball.
- Vagrant recipes for deployment-from-zero.
- Smoke tests for fine grained testing.
- The intersection of Java and RPM.
BigTop is working to embrace HCFS
The community has put a lot of work into testing different hadoop stacks on different file systems (https://wiki.apache.org/hadoop/HCFS/Progress), and the bigtop community has embraced this effort, at the cost of having to support a generic filesystem deployment, and also at the cost of a lot of JIRA reviewing. For example, with the recent BIGTOP-952 and BIGTOP-1200 JIRAs, we're now packaging HDFS-independent artifacts into BigTop. That paves the way for more competition, more choice, and more hadoop hacking, which ultimately translates into a better end-user experience around hadoop.
BigTop builds OS- and admin-friendly packages for emerging ecosystem projects, fast!
If you compare apache bigtop with other hadoop vendor distributions, you'll find that it is the bleeding edge. For example, you can watch this recent video demonstration of spinning up Storm on BigTop from ApacheCon 2014 (https://www.youtube.com/watch?v=VZzJxsMJahc) to see just how easy it is to deploy Storm out of the box using BigTop's deployment recipes. As new projects come forward in the upstream, the first place to put them is into apache BigTop. This means that if you want to try out a new animal in hadoop's stack, you can easily do so with the bigtop stack. And again: the infrastructure around vagrant makes it easy to build maintainable VM workflows around hadoop app and distribution development tasks, which can easily be modified to include or exclude whichever bleeding-edge packages you want. Think of bigtop's approach to packaging and deployment as a lower-level version of apache ambari.
Sound interesting? Come to the hackathon in Mountain View after Hadoop Summit!
So this is all prelude to the BigTop hackathon that we are hosting at Red Hat. The focus will be on hacking, not presentations. But that doesn't mean you have to be an expert to get involved. Coming to this hackathon will give you a chance to pair program with the BigTop committers and try your hand at working directly on a JIRA. I think most would agree that hacking around on apache bigtop is an excellent introduction to the hadoop ecosystem.
The NEXT HACKATHON will be from June 6th to June 9th at the Red Hat offices in Mountain View, California. For more details, ping us on the BigTop mailing list and check the meetup URL: http://www.meetup.com/Bay-Area-Bigtop-Meetup/events/184893732/ .
See you there!
Posted at 03:46PM May 29, 2014 by Jay Vyas in General
Getting involved with BigTop packaging
To get a feel for what bigtop packaging of hadoop components is all about, I suggest checking out Roman's PuppetConf bigtop talk from a few years back.
The thrust of this talk is that we need to bring uniformity to the hadoop ecosystem, and ease of use for end users of hadoop. To me, an important first step down this path is bringing the Java community in line with what packaging is really all about and why it makes it easier to maintain complex systems.
As a Java/Maven guy, wrapping my head around "packaging" has been a little tricky... And according to Stephen R. Covey, change begins on the inside :).
How would YOU package hadoop as an RPM?
The thought of this is pretty daunting, and it's really interesting to see how it is solved in bigtop. I've begun documenting my current adventures into the world of RPMs, packaging, and BigTop. I've just begun to scratch the surface of all the services, users, binaries, and security features associated with a basic RPM hadoop installation, and it will probably be a while before I fully understand how it all really works.
So in the meanwhile, let's learn about hadoop packaging with a simpler project: Apache Mahout.
The packaging resources for mahout live inside bigtop's bigtop-packages/src tree. There are three main components to the packaging of mahout:
1) The "do-component-build" file.
2) The "install_mahout.sh" file.
3) The RPM spec file "mahout.spec", which actually uses these two components to do its work.
The do-component-build file builds the raw mahout artifact directly from source. You can see the Java-specific details of mahout compilation in there:
. `dirname $0`/bigtop.bom
mvn clean install -Dmahout.skip.distribution=false -DskipTests -Dhadoop2.version=$HADOOP_VERSION "$@"
for i in distribution/target/mahout*.tar.gz ; do
  tar -C build --strip-components=1 -xzf $i
done
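The tar step at the end is worth a note: --strip-components=1 drops the tarball's top-level directory on extraction, so the distribution's contents land directly under build/. A small self-contained sketch, with file and directory names made up for illustration:

```shell
#!/bin/sh
# Demonstrate what tar -C build --strip-components=1 does to a
# distribution tarball that has a single top-level directory.
set -e

workdir=$(mktemp -d)
cd "$workdir"

# Build a stand-in distribution tarball with a top-level directory,
# the way the mvn build would produce one under distribution/target/.
mkdir -p mahout-distribution-0.9/bin
echo "stub launcher" > mahout-distribution-0.9/bin/mahout
tar -czf mahout-0.9.tar.gz mahout-distribution-0.9

# Extract into build/ with the top-level directory stripped off.
mkdir build
tar -C build --strip-components=1 -xzf mahout-0.9.tar.gz

ls build   # bin/ now sits directly under build/
```

Without --strip-components=1 you would end up with build/mahout-distribution-0.9/bin instead of build/bin, and the rest of the packaging scripts would have to know the versioned directory name.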
Meanwhile, install_mahout.sh contains the actual logic of how and where the mahout jars go, plus a snippet that writes out the mahout startup shell script, /usr/bin/mahout:
# Copy in the /usr/bin/mahout wrapper
install -d -m 0755 $PREFIX/$BIN_DIR
cat > $PREFIX/$BIN_DIR/mahout <<EOF
# Autodetect JAVA_HOME if not defined
# FIXME: MAHOUT-994
# FIXME: the following line is a workaround for BIGTOP-259
export HADOOP_CLASSPATH="`echo /usr/lib/mahout/mahout-examples-*-job.jar`":\$HADOOP_CLASSPATH
exec $INSTALLED_LIB_DIR/bin/mahout "\$@"
EOF
chmod 755 $PREFIX/$BIN_DIR/mahout
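The escaping in that snippet is the subtle part: unescaped variables like $PREFIX are expanded while the wrapper is being generated, whereas escaped ones like \$@ survive into the generated file and are only expanded when the wrapper runs. Here is the same pattern in miniature; the paths and the wrapper body are throwaway stand-ins, not Bigtop's real install logic:

```shell
#!/bin/sh
# Generate a tiny wrapper script via a heredoc, mirroring the
# install_mahout.sh pattern: $PREFIX expands now, \$@ expands later.
set -e

PREFIX=$(mktemp -d)
BIN_DIR=usr/bin
install -d -m 0755 $PREFIX/$BIN_DIR

cat > $PREFIX/$BIN_DIR/mahout <<EOF
#!/bin/sh
# \$@ was escaped above, so this wrapper forwards its own arguments.
echo wrapper args: \$@
EOF
chmod 755 $PREFIX/$BIN_DIR/mahout

$PREFIX/$BIN_DIR/mahout run-job   # prints: wrapper args: run-job
```

This is why the real snippet writes \$@ and \$HADOOP_CLASSPATH with backslashes: those values must be resolved at wrapper run time on the installed machine, not at package build time.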
Anyway, I hope this quick tour helps those who are trying to get involved with the bigtop packaging process. It took me a few days to understand how it all works; after all, packaging software is an intrinsically complex task. But thankfully, there are TONS of examples of how to package all the different players of the hadoop ecosystem underneath bigtop-packages/src that can easily help you get started.
Posted at 07:46PM May 01, 2014 by Jay Vyas in General
BigTop: the way to grow open Hadoop stack acceptance
BigTop is stepping up in its role as the foundation of a standard Hadoop-based data analytics stack, essentially bringing most of the commercial offerings onto a standard footing.
What is Bigtop, and Why Should You Care?
Ever since Apache Bigtop entered incubation, we've been answering a very basic question: what exactly is Bigtop, and why should you or anyone in the Apache (or Hadoop) community care? The earliest and most succinct answer (the one used for the Apache Incubator proposal) simply stated that "Bigtop is a project for the development of packaging and tests of the Hadoop ecosystem". That was a nice explanation of how Bigtop relates to the rest of the Apache Software Foundation's (ASF) Hadoop ecosystem projects, yet it doesn't really help you understand the aspirations of Bigtop that go beyond what the ASF has traditionally done.
Posted at 03:09AM Jul 08, 2012 by rvs in General
Bigtop presents full stack based on Apache Hadoop 1.0
The first ever full stack based on Hadoop 1.0 has just been released. It includes data analytics components like Hive, HBase, Pig, Mahout, and many more. The release is available for immediate download from all ASF mirrors for all major Linux distributions: Ubuntu, Fedora, CentOS, SUSE.
Posted at 05:06PM Apr 02, 2012 by cos in General
All you wanted to know about Hadoop, but were too afraid to ask: genealogy of elephants.
Lining up the versions of Hadoop and making sense of them all and their relations can be quite difficult.
This article attempts to address the moot points and help you understand the "bigger picture" - literally.
Conception and validation of Hadoop BigData stack.
What is the BigTop project? What are its goals, and how is it going to achieve them? What are the roots and founding ideas of the project?
I think you'll find the answers to these questions in what will hopefully become a series of helpful posts for IT professionals working on Hadoop stack deployment and adoption.
Posted at 12:59AM Dec 28, 2011 by cos in General