Entries tagged [bigtop]

Wednesday Feb 11, 2015

20x faster mapreduce with gridgain-hadoop accelerator

This article shows you how to speed up your existing MapReduce code using a new Hadoop Accelerator by GridGain. The Accelerator is now available as part of Apache Ignite (incubating).

Tuesday Dec 02, 2014

Release of Apache Bigtop 0.8.0

This new release brings tons of new features and fixes for the beloved 100% community and open source driven big data distribution.

Among the new features:

  • New Gradle-based build system, with better integration into the underlying Bigtop platform. Going forward, it will allow us to improve the build experience and simplify it for new and experienced users alike.
  • The Groovy runtime was added directly to the stack, which will allow us to speed up initial HDFS cluster deployment
  • BigPetStore data analytics blueprints
  • JDK/OpenJDK 7 support. Yes, we have completely moved away from JDK6
  • Support for latest Ubuntu, OpenSUSE, and CentOS
  • Removing support for some outdated distros like CentOS5, Fedora17, and Ubuntu Lucid
  • The release fixes close to 200 bugs and adds a lot of new features focused on the usability and stability of the software stack

Of course this release brings its set of upgrades:

  • Apache Hadoop 2.4.1
  • Apache Giraph 1.1.0
  • Apache Flume 1.5.0.1
  • Apache Pig datafu 1.0.0
  • Apache Crunch 0.10.0
  • Apache HBase 0.98.5
  • Apache Hive 0.13.0
  • Hue 3.6.0
  • Pig 0.12
  • Apache Mahout 0.9
  • Apache Solr 4.6.0
  • Spark 0.9.1

And as usual, Apache Bigtop supports and provides convenience artifacts for a wide array of GNU/Linux distributions:

  • CentOS 6
  • Fedora 18
  • Ubuntu 12.04 (Precise)
  • Ubuntu 13.04 (Quetzal)
  • Ubuntu 14.04 (Trusty Tahr)
  • OpenSUSE 13.1
  • SLES11

Convenience repositories are available from http://www.apache.org/dist/bigtop/bigtop-0.8.0/repos/
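As a rough sketch of how such a repository might be wired into a yum-based host: the helper below only builds and prints a URL. The function name bigtop_repo_url is hypothetical, and the <distro>/bigtop.repo layout under the repos directory is an assumption that should be checked against the actual directory listing before use.

```shell
#!/bin/bash
# Hypothetical helper: build the URL of a yum .repo file for one distro.
# ASSUMPTION: the repos directory is laid out as <base>/<distro>/bigtop.repo.
bigtop_repo_url() {
  local version="$1" distro="$2"
  echo "http://www.apache.org/dist/bigtop/bigtop-${version}/repos/${distro}/bigtop.repo"
}

# Print the URL for CentOS 6. On a real host one could then fetch it as root:
#   wget -O /etc/yum.repos.d/bigtop.repo "$(bigtop_repo_url 0.8.0 centos6)"
bigtop_repo_url 0.8.0 centos6
```

Keeping the version and distro as parameters makes it easy to point the same snippet at another release directory.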

Overall Apache Bigtop 0.8.0 is a great release and is in the continuity of what the previous ones brought to the field:

  • Stability
  • Reliability
  • Great set of features

As a developer of the distribution and a user, everything felt right and worked out of the box.

I would like to take the opportunity to thank all the members of the Apache Bigtop community for all the effort put into such a great community and distribution. I am also looking forward to the coming 0.9.0 release, which will make the stack more focused on in-memory analytics with the addition of Apache Ignite (incubating) and other improvements.

I would also like to encourage everyone to give our latest release a try and not to hesitate to come in, participate, and give feedback on our mailing lists: http://bigtop.apache.org/mail-lists.html


Thursday May 29, 2014

BigTop hackathon at Hadoop Summit!

Time for another hackathon.

There are a lot of companies that contribute to BigTop: Pivotal, Cloudera, Red Hat, WanDisco, Amazon, and so on. If I left yours out, feel free to leave a comment below and I'll update this post. Today we are proud to announce that Red Hat is hosting the next BigTop hackathon, immediately following Hadoop Summit 2014 in San Jose. Hadoop is about a lot more than just source code - it's about packaging, deployment, configuration, and so on. And BigTop has embraced the difficult task of tying all this together.

Apache BigTop makes Hadoop deployment transparent

All source code is complex, regardless of the language. But what's even more complex is the deployment of code on a distributed system. While vendors have come a long way in making it easy to deploy Hadoop with black-box administrative or cloud tools, nobody has really opened up Hadoop by building a culture around its deployment.

Enter Apache BigTop. 

BigTop contains all the stuff that's not in the Hadoop docs. For example:

  • Puppet modules for installation and configuration of hadoop without using a tarball.
  • Vagrant recipes for deployment-from-zero.
  • Smoke tests for fine grained testing.
  • The intersection of Java and RPM.
  • The BigPetStore app, which demonstrates to the business community how to actually use Hadoop, Gradle, Pig, and Google's JavaScript visualization widgets to build, test, and deploy a reference "hadoop app".

BigTop is working to embrace HCFS

The community has put a lot of work into testing different Hadoop stacks on different file systems (https://wiki.apache.org/hadoop/HCFS/Progress), and the BigTop community has embraced this effort, at the cost of having to support a generic filesystem deployment and of a lot of JIRA reviewing. For example, with the recent BIGTOP-952 and BIGTOP-1200 JIRAs, we're now packaging HDFS-independent artifacts into BigTop. That paves the way for more competition, more choice, and more Hadoop hacking, which ultimately translates to a better end-user experience around Hadoop.

BigTop builds OS+Admin friendly packages for emerging ecosystem projects, fast!

If you compare Apache BigTop with other Hadoop vendor distributions, you'll find that it is the bleeding edge. For example, you can watch this recent video demonstration of spinning up Storm on BigTop, from ApacheCon 2014: https://www.youtube.com/watch?v=VZzJxsMJahc, to see just how easy it is to deploy Spark out of the box using BigTop's deployment recipes. As new projects come forward upstream, the first place to put them is Apache BigTop. This means that if you want to try out a new animal in Hadoop's stack, you can easily do so with the BigTop stack. And again: the infrastructure around Vagrant makes it easy to build maintainable VM workflows around Hadoop app and distribution development tasks, which can easily be modified to include or exclude whichever bleeding-edge packages you like. Think of BigTop's approach to packaging and deployment as a lower-level version of Apache Ambari.

Sounds interesting? Come to the hackathon in Mountain View after Hadoop Summit!

So this is all a prelude to the BigTop hackathon that we are hosting at Red Hat. The focus will be on hacking, not presentations. But that doesn't mean you have to be an expert to get involved. Coming to this hackathon will give you a chance to pair program with the BigTop committers and try your hand at working directly on a JIRA. I think most would agree that hacking around on Apache BigTop is an excellent introduction to the Hadoop ecosystem.

The next hackathon will be from June 6th to June 9th at the Red Hat offices in Mountain View, California. For more details, ping us on the BigTop mailing list and check the meetup URL: http://www.meetup.com/Bay-Area-Bigtop-Meetup/events/184893732/


See you there!

Thursday May 01, 2014

Getting involved with BigTop packaging

To get a feel for the need that BigTop's packaging of Hadoop components addresses, I suggest checking out Roman's puppetcon BigTop talk from a few years back.

The thrust of this talk is that we need to bring uniformity to the Hadoop ecosystem, and ease of use to end users of Hadoop. To me, an important first step down this path is bringing the Java community in line with what packaging is really all about and why it makes complex systems easier to maintain.

As a Java/Maven guy, wrapping my head around "packaging" has been a little tricky... And according to Stephen R. Covey, change begins on the inside :).

How would YOU package Hadoop as an RPM?

The thought of this is pretty daunting, and it's really interesting to see how this is solved in BigTop. I've begun documenting my current adventures into the world of RPMs, packaging, and BigTop. I've just begun to scratch the surface of all of the services, users, binaries, and security features associated with a basic RPM Hadoop installation, and it will probably be a while before I fully understand how it all really works.

So in the meantime, let's learn about Hadoop packaging with a simpler project... Apache Mahout.

Here are the packaging resources for Mahout inside of BigTop:

common/mahout/
├── do-component-build
└── install_mahout.sh

...
bigtop-packages/rpm/mahout/SPECS/mahout.spec

Above you can see that there are three main components to the packaging of Mahout.

1) The "do-component-build" file.

2) The "install_mahout.sh" file.

3) The RPM spec file "mahout.spec", which actually uses the other two components to do its work.

The do-component-build script builds the raw Mahout artifact directly from source. You can see the Java-specific details of Mahout compilation in there.


set -ex

. `dirname $0`/bigtop.bom

mvn clean install -Dmahout.skip.distribution=false -DskipTests -Dhadoop2.version=$HADOOP_VERSION "$@"
mkdir build
for i in distribution/target/mahout*.tar.gz ; do
  tar -C build --strip-components=1 -xzf $i
done
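The extraction step at the end of that script is worth a closer look. Here is a minimal, self-contained sketch of what tar's --strip-components=1 does, using a throwaway tarball; the demo-dist-1.0 name is made up for illustration and has nothing to do with the real Mahout distribution.

```shell
#!/bin/bash
# Work in a scratch directory so nothing else is touched.
work=$(mktemp -d)
cd "$work" || exit 1

# Fake distribution tarball with a single versioned top-level directory
# (hypothetical names, standing in for distribution/target/mahout*.tar.gz).
mkdir -p demo-dist-1.0/bin
echo 'echo hello' > demo-dist-1.0/bin/demo
tar -czf demo-dist-1.0.tar.gz demo-dist-1.0

# Same extraction pattern as in do-component-build:
# unpack into build/ and drop the leading path component.
mkdir build
tar -C build --strip-components=1 -xzf demo-dist-1.0.tar.gz

# The contents now sit directly under build/, without the versioned directory.
ls build
```

Dropping the versioned directory gives later packaging steps a stable build/ layout, regardless of which version was built.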


Meanwhile, install_mahout.sh contains the actual logic of how and where mahout jars will go, and a snippet that writes out the mahout startup shell script /usr/bin/mahout.


# Copy in the /usr/bin/mahout wrapper
install -d -m 0755 $PREFIX/$BIN_DIR
cat > $PREFIX/$BIN_DIR/mahout <<EOF
#!/bin/bash

# Autodetect JAVA_HOME if not defined
. /usr/lib/bigtop-utils/bigtop-detect-javahome

# FIXME: MAHOUT-994
export HADOOP_HOME=\${HADOOP_HOME:-/usr/lib/hadoop}
export HADOOP_CONF_DIR=\${HADOOP_CONF_DIR:-/etc/hadoop/conf}

export MAHOUT_HOME=\${MAHOUT_HOME:-$INSTALLED_LIB_DIR}
export MAHOUT_CONF_DIR=\${MAHOUT_CONF_DIR:-$CONF_DIR}
# FIXME: the following line is a workaround for BIGTOP-259
export HADOOP_CLASSPATH="`echo /usr/lib/mahout/mahout-examples-*-job.jar`":\$HADOOP_CLASSPATH
exec $INSTALLED_LIB_DIR/bin/mahout "\$@"
EOF
chmod 755 $PREFIX/$BIN_DIR/mahout
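The mix of escaped and unescaped dollar signs in that here-document is deliberate: unescaped variables are filled in when the package is built, while escaped ones are left for the wrapper to resolve when it runs. A small sketch of the same pattern follows, with made-up names (DEMO_HOME, /usr/lib/demo) standing in for the real Mahout variables.

```shell
#!/bin/bash
# Hypothetical stand-ins for the values install_mahout.sh works with.
PREFIX=$(mktemp -d)
INSTALLED_LIB_DIR=/usr/lib/demo

# Unquoted EOF: $INSTALLED_LIB_DIR is expanded now, at generation time,
# while \$ writes a literal $ that the wrapper evaluates later.
cat > "$PREFIX/wrapper" <<EOF
#!/bin/bash
export DEMO_HOME=\${DEMO_HOME:-$INSTALLED_LIB_DIR}
echo "\$DEMO_HOME"
EOF
chmod 755 "$PREFIX/wrapper"

# Inspect the generated script: the default path is baked in,
# but the \${DEMO_HOME:-...} lookup itself survives as shell code.
cat "$PREFIX/wrapper"

# Run it without and with an override from the environment.
unset DEMO_HOME
bash "$PREFIX/wrapper"
DEMO_HOME=/opt/demo bash "$PREFIX/wrapper"
```

With the variable unset, the wrapper falls back to the path baked in at generation time; with an override, the caller's value wins. That is the behavior the real wrapper relies on for HADOOP_HOME and MAHOUT_HOME.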


Anyway, I hope this quick tour helps those who are trying to get involved with the BigTop packaging process. It took me a few days to understand how it all works because, after all, packaging software is an intrinsically complex task. But thankfully, there are TONS of examples of how to package all the different players of the Hadoop ecosystem underneath bigtop-packages/src, which can help you get started.


Wednesday Mar 26, 2014

Bigtop events at ApacheCon 2014, Denver, CO

Details of the Bigtop meetup and hackathon during ApacheCon 2014

Wednesday Nov 06, 2013

Release of Apache Bigtop 0.7.0

Exciting times for the Apache Bigtop community and the Apache Hadoop ecosystem at large with the release of the brand new Apache Bigtop 0.7.0!

This new release brings tons of new features and fixes for the beloved 100% community and open source driven big data distribution.

Among the new features:

  • Addition of Phoenix, the Apache HBase SQL layer project

  • Addition of Apache Spark (incubating), the general-purpose cluster computing system

  • A brand new SolrCloud 4.5 integration with HDFS

  • Improvement to our puppet recipes so setting up your Apache Bigtop toolchain will be easier than ever

  • Addition of a new script init-hdfs.sh to initialize Apache HDFS filesystem structure so you can get up and running without manually setting up a complex directory structure

  • Add flexibility in the way Apache Oozie manages its Apache Tomcat application

  • Add a new standard location to install SQL connectors and other plug-ins


This release also includes the following fixes:

  • Libsnappy is now included in Apache Hadoop packages

  • Clean up some dependencies throughout our packages

  • Update to our documentation


Of course this release brings its set of upgrades:

  • Apache Hadoop 2.0.6-alpha

  • Apache Giraph 1.0.0

  • Apache Flume 1.4.0

  • Apache Pig datafu 1.0.0

  • Apache Crunch 0.7.0

  • Apache HBase 0.94.12

  • Apache Hive 0.11.0

  • Hue 2.5.1

  • Apache Mahout 0.7.5

  • Apache Solr 4.5.0


And as usual, Apache Bigtop supports and provides convenience artifacts for a wide array of GNU/Linux distributions:

  • CentOS 5

  • CentOS 6

  • Fedora 17

  • Fedora 18

  • Ubuntu Lucid

  • Ubuntu Precise

  • Ubuntu Quetzal

  • OpenSUSE 12

  • SLES11

Convenience repositories are available from http://www.apache.org/dist/bigtop/bigtop-0.7.0/repos/


Overall Apache Bigtop 0.7.0 is a great release and is in the continuity of what the previous ones brought to the field:

  • Stability

  • Reliability

  • Feature rich

The first release candidate uncovered a few issues, which were promptly fixed in the following and final release candidate, RC1.

As a developer of the distribution and a user, everything felt right and worked out of the box. I was even able to quickly set up an Apache Hadoop cluster that ships its logs to Apache Flume and aggregates them into an ElasticSearch cluster, with everything visualized in Kibana.


I would like to take the opportunity to thank all the members of the Apache Bigtop community for all the effort put into such a great community and distribution.

I would also like to encourage everyone to give our latest release a try and not to hesitate to come in, participate, and give feedback on our mailing lists: http://bigtop.apache.org/mail-lists.html

Friday Apr 19, 2013

BigTop: the way to grow open Hadoop stack acceptance

BigTop is stepping up in its role as the foundation of a standard Hadoop-based data analytics stack, essentially putting most of the commercial offerings on a common footing.

Sunday Jul 08, 2012

What is Bigtop, and Why Should You Care?

Ever since Apache Bigtop entered incubation, we've been answering a very basic question: what exactly is Bigtop, and why should you or anyone in the Apache (or Hadoop) community care? The earliest and most succinct answer (the one used for the Apache Incubator proposal) simply stated that "Bigtop is a project for the development of packaging and tests of the Hadoop ecosystem". That was a nice explanation of how Bigtop relates to the rest of the Apache Software Foundation's (ASF) Hadoop ecosystem projects, yet it doesn't really help you understand the aspirations of Bigtop that go beyond what the ASF has traditionally done.


Monday Apr 02, 2012

Bigtop presents full stack based on Apache Hadoop 1.0

The first-ever full stack based on Hadoop 1.0 has just been released. It includes data analytics components such as Hive, HBase, Pig, Mahout, and many more. The release is available for immediate download from all ASF mirrors for all major Linux distributions: Ubuntu, Fedora, CentOS, and SUSE.

Wednesday Dec 28, 2011

Conception and validation of Hadoop BigData stack.

What is the BigTop project? What are its goals, and how is it going to achieve them? What are the roots and founding ideas of the project?

I think you'll find the answers to these questions in what will hopefully become a series of helpful posts for IT professionals working on Hadoop stack deployment and adoption.

