Apache Sqoop

Monday April 02, 2012

Apache Sqoop Graduates from Incubator

Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. You can use Sqoop to import data from external structured datastores into the Hadoop Distributed File System (HDFS) or related systems like Hive and HBase. Conversely, you can use Sqoop to extract data from Hadoop and export it to external structured datastores such as relational databases and enterprise data warehouses.
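As a rough illustration of the two directions described above, the sketch below assembles command lines for the Sqoop 1 import and export tools. The JDBC URL, table name, and HDFS paths are made-up placeholders, and the two helper functions are hypothetical conveniences rather than part of Sqoop; only the flags shown (--connect, --table, --target-dir, --export-dir, --num-mappers) come from the Sqoop command-line client.

```python
import shlex
from typing import List


def sqoop_import_args(jdbc_url: str, table: str, target_dir: str,
                      mappers: int = 4) -> List[str]:
    """Build the argv for importing one database table into HDFS.

    Hypothetical helper: it only assembles arguments and does not run Sqoop.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,           # JDBC connection string of the source database
        "--table", table,                # table to read from
        "--target-dir", target_dir,      # HDFS directory that receives the data
        "--num-mappers", str(mappers),   # number of parallel map tasks
    ]


def sqoop_export_args(jdbc_url: str, table: str, export_dir: str) -> List[str]:
    """Build the argv for exporting an HDFS directory into a database table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,           # JDBC connection string of the target database
        "--table", table,                # table to write into (must already exist)
        "--export-dir", export_dir,      # HDFS directory holding the data to export
    ]


if __name__ == "__main__":
    # Placeholder values, assumed for illustration only.
    url = "jdbc:mysql://db.example.com/shop"
    print(shlex.join(sqoop_import_args(url, "orders", "/data/orders")))
    print(shlex.join(sqoop_export_args(url, "orders_summary", "/data/orders_summary")))
```

Each list could be handed to subprocess.run on a machine where the sqoop client is installed. Credentials are left out on purpose; in practice they would be supplied through the client's own options.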

At its monthly meeting in March of 2012, the board of the Apache Software Foundation (ASF) resolved to grant Top-Level Project status to Apache Sqoop, thus graduating it from the Incubator. This is a significant milestone in the life of Sqoop, which has come a long way since its inception almost three years ago. The following figure offers a brief overview of what has happened in the life of Sqoop so far:

Figure 1: A timeline of the Sqoop project

Sqoop started as a contrib module for Apache Hadoop in May of 2009, first submitted as a patch to HADOOP-5815 by Aaron Kimball. Over the course of the next year, about 56 patches were submitted towards its development. Given the inertia of large projects, Aaron decided to decouple it from Hadoop and host it elsewhere to allow faster development and release cycles. Consequently, in April of 2010 Sqoop was removed from Hadoop via MAPREDUCE-1644 and hosted on GitHub by Cloudera as an Apache-licensed project.

Over the course of the following year, Sqoop saw wide adoption along with four releases and 191 patches. An extension API introduced early in Sqoop allowed the development of high-speed third-party connectors for rapid data transfer from specialized systems such as enterprise data warehouses. As a result, various vendors developed connectors that plugged into Sqoop. To bolster this fledgling community of users and third-party connector vendors, Cloudera decided to propose Sqoop for incubation at Apache. Sqoop was accepted for incubation by the Apache Incubator in June of 2011.

Inside the Incubator, Sqoop saw healthy growth in its community and gained four new committers. With an active community and committers, Sqoop made two incubating releases. The focus of the first release was migrating the code from the com.cloudera.sqoop namespace to org.apache.sqoop while preserving backward compatibility. Thanks to phenomenal work by Bilung Lee, the release manager of the first incubating release, this release met all of its expectations. The second incubating release focused on interoperability with various versions of Hadoop. Its release manager, Jarek Jarcec Cecho, was instrumental in making sure that it met this requirement and could work with Hadoop versions 0.20, 0.23 and 1.0. Along with the stated goals of these incubating releases, Sqoop saw steady growth, with 116 patches from various contributors and committers. With excellent mentorship from Patrick Hunt, the other mentors of the project, and Incubator PMC members, Sqoop acquired the ability to govern itself, follow the ASF policies and guidelines, and foster and grow its community.

Sqoop successfully graduated from the Incubator in March of 2012 and is now a Top-Level Apache project. You can download its latest release artifacts by visiting http://sqoop.apache.org/.

While Sqoop has no doubt delivered significant value to its community of users, it is fair to say that it is in the early stages of fulfilling the requirements of data integration around Hadoop. Work has started on the next major revision of Sqoop, which will address more of these requirements than before. Along the way, we look forward to growing the community manyfold, getting more committers on board, and solving some really challenging problems of data movement between Hadoop and external systems. We sincerely hope you will join us in taking Sqoop towards fulfilling all these goals and making it a standard component in Hadoop deployments everywhere.


Why can't Sqoop export data from Hive to MySQL?

Posted by chendong on May 04, 2012 at 01:54 AM PDT #

No, we can export data from Hive to MySQL.

Posted by Ramanjaneya Reddy M on December 09, 2012 at 09:15 PM PST #

Does Sqoop connect to SQL Server 2012 (from Apache Hadoop on Linux) as it does to SQL Server 2008 R2?

Posted by Parry on February 10, 2013 at 09:16 PM PST #

Can Sqoop be used to transfer SQL data from a remote cluster into HDFS? Or does the SQL database have to reside on the same hardware?

Posted by HaDERP on January 16, 2014 at 02:02 PM PST #

How do I run Sqoop commands from a Java program?

Posted by mazwana on January 22, 2015 at 04:55 AM PST #

Sqoop2: MySQL is not supported at service start in sqoop.propertie. Sqoop2: fails on Hadoop 2.6.0 with a no-such-method exception. java.lang.NoSuchMethodError: org.apache.http.client.utils.URLEncodedUtils.par org.apache.hadoop.security.token.delegation.web.ServletUtils.getParameter(ServletUtils.java:48)

Posted by on April 26, 2015 at 04:16 AM PDT #

Can Sqoop export a BLOB type from HDFS to MySQL? I have a table with a BLOB column, and I can import it into HDFS, but when I export it back, it raises java.lang.CloneNotSupportedException: com.cloudera.sqoop.lib.BlobRef

Posted by lizhen05 on June 04, 2015 at 12:42 AM PDT #

Can Sqoop export data from Hive to HBase?

Posted by sunzhandong on January 04, 2017 at 05:50 PM PST #

Sqoop is cool!

Posted by Travis Cunningham on August 21, 2018 at 07:00 AM PDT #

How do I change the isolation mode used by Sqoop when connecting to the metastore?

Posted by Purna on October 03, 2018 at 03:26 PM PDT #

Can Sqoop 1.4.7 work with HBase 2.0.4 and Hadoop 3.1.1?

Posted by lucky on March 01, 2019 at 01:58 PM PST #

Does Sqoop work with Apache Spark? (Spark on YARN)

Posted by Hindra on June 17, 2019 at 10:43 AM PDT #

Issues with data transfer to MySQL.

Posted by Dynnex on June 22, 2019 at 11:07 AM PDT #

What is the default file format to import data using Apache Sqoop? Trying to resolve an import issue for https://askhighroller.com/ I have added a field to an SQL table, and it shows data that I have excluded from the file. What should I do?

Posted by HighRoller on June 25, 2019 at 07:07 AM PDT #

