Apache Sqoop

Wednesday October 05, 2011

Apache Sqoop - Overview

Apache Sqoop - Overview 

Using Hadoop for analytics and data processing requires loading data into clusters and processing it in conjunction with other data that often resides in production databases across the enterprise. Loading bulk data into Hadoop from production systems or accessing it from map reduce applications running on large clusters can be a challenging task. Users must consider details like ensuring consistency of data, the consumption of production system resources, data preparation for provisioning downstream pipeline. Transferring data using scripts is inefficient and time consuming. Directly accessing data residing on external systems from within the map reduce applications complicates applications and exposes the production system to the risk of excessive load originating from cluster nodes.


This is where Apache Sqoop fits in. Apache Sqoop is currently undergoing incubation at Apache Software Foundation. More information on this project can be found at http://incubator.apache.org/sqoop.

Sqoop allows easy import and export of data from structured data stores such as relational databases, enterprise data warehouses, and NoSQL systems. Using Sqoop, you can provision the data from external system on to HDFS, and populate tables in Hive and HBase. Sqoop integrates with Oozie, allowing you to schedule and automate import and export tasks. Sqoop uses a connector based architecture which supports plugins that provide connectivity to new external systems.

What happens underneath the covers when you run Sqoop is very straightforward. The dataset being transferred is sliced up into different partitions and a map-only job is launched with individual mappers responsible for transferring a slice of this dataset. Each record of the data is handled in a type safe manner since Sqoop uses the database metadata to infer the data types.

In the rest of this post we will walk through an example that shows the various ways you can use Sqoop. The goal of this post is to give an overview of Sqoop operation without going into much detail or advanced functionality.

Importing Data

The following command is used to import all data from a table called ORDERS from a MySQL database:


---
$ sqoop import --connect jdbc:mysql://localhost/acmedb \
  --table ORDERS --username test --password ****
---

In this command the various options specified are as follows:

  • import: This is the sub-command that instructs Sqoop to initiate an import.
  • --connect <connect string>, --username <user name>, --password <password>: These are connection parameters that are used to connect with the database. This is no different from the connection parameters that you use when connecting to the database via a JDBC connection.
  • --table <table name>: This parameter specifies the table which will be imported.


The import is done in two steps as depicted in Figure 1 below. In the first Step Sqoop introspects the database to gather the necessary metadata for the data being imported. The second step is a map-only Hadoop job that Sqoop submits to the cluster. It is this job that does the actual data transfer using the metadata captured in the previous step.

Figure 1: Sqoop Import Overview

Figure 1: Sqoop Import Overview

The imported data is saved in a directory on HDFS based on the table being imported. As is the case with most aspects of Sqoop operation, the user can specify any alternative directory where the files should be populated.

By default these files contain comma delimited fields, with new lines separating different records. You can easily override the format in which data is copied over by explicitly specifying the field separator and record terminator characters.

Sqoop also supports different data formats for importing data. For example, you can easily import data in Avro data format by simply specifying the option --as-avrodatafile with the import command.

There are many other options that Sqoop provides which can be used to further tune the import operation to suit your specific requirements.

Importing Data into Hive

In most cases, importing data into Hive is the same as running the import task and then using Hive to create and load a certain table or partition. Doing this manually requires that you know the correct type mapping between the data and other details like the serialization format and delimiters. Sqoop takes care of populating the Hive metastore with the appropriate metadata for the table and also invokes the necessary commands to load the table or partition as the case may be. All of this is done by simply specifying the option --hive-import with the import command.

----
$ sqoop import --connect jdbc:mysql://localhost/acmedb \
  --table ORDERS --username test --password **** --hive-import
----

When you run a Hive import, Sqoop converts the data from the native datatypes within the external datastore into the corresponding types within Hive. Sqoop automatically chooses the native delimiter set used by Hive. If the data being imported has new line or other Hive delimiter characters in it, Sqoop allows you to remove such characters and get the data correctly populated for consumption in Hive.

Once the import is complete, you can see and operate on the table just like any other table in Hive.

Importing Data into HBase

You can use Sqoop to populate data in a particular column family within the HBase table. Much like the Hive import, this can be done by specifying the additional options that relate to the HBase table and column family being populated. All data imported into HBase is converted to their string representation and inserted as UTF-8 bytes.

----
$ sqoop import --connect jdbc:mysql://localhost/acmedb \
 --table ORDERS --username test --password **** \
--hbase-create-table --hbase-table ORDERS --column-family mysql
----

In this command the various options specified are as follows:
  • --hbase-create-table: This option instructs Sqoop to create the HBase table.
  • --hbase-table: This option specifies the table name to use.
  • --column-family: This option specifies the column family name to use.

The rest of the options are the same as that for regular import operation.

Exporting Data

In some cases data processed by Hadoop pipelines may be needed in production systems to help run additional critical business functions. Sqoop can be used to export such data into external datastores as necessary. Continuing our example from above - if data generated by the pipeline on Hadoop corresponded to the ORDERS table in a database somewhere, you could populate it using the following command:

----
$ sqoop export --connect jdbc:mysql://localhost/acmedb \
 --table ORDERS --username test --password **** \
--export-dir /user/arvind/ORDERS
----

In this command the various options specified are as follows:
  • export: This is the sub-command that instructs Sqoop to initiate an export.
  • --connect <connect string>, --username <user name>, --password <password>: These are connection parameters that are used to connect with the database. This is no different from the connection parameters that you use when connecting to the database via a JDBC connection.
  • --table <table name>: This parameter specifies the table which will be populated.
  • --export-dir <directory path>: This is the directory from which data will be exported.


Export is done in two steps as depicted in Figure 2. The first step is to introspect the database for metadata, followed by the second step of transferring the data. Sqoop divides the input dataset into splits and then uses individual map tasks to push the splits to the database. Each map task performs this transfer over many transactions in order to ensure optimal throughput and minimal resource utilization.

Figure 2: Sqoop Export Overview

Figure 2: Sqoop Export Overview

Some connectors support staging tables that help isolate production tables from possible corruption in case of job failures due to any reason. Staging tables are first populated by the map tasks and then merged into the target table once all of the data has been delivered it.

Sqoop Connectors

Using specialized connectors, Sqoop can connect with external systems that have optimized import and export facilities, or do not support native JDBC. Connectors are plugin components based on Sqoop’s extension framework and can be added to any existing Sqoop installation. Once a connector is installed, Sqoop can use it to efficiently transfer data between Hadoop and the external store supported by the connector.

By default Sqoop includes connectors for various popular databases such as MySQL, PostgreSQL, Oracle, SQL Server and DB2. It also includes fast-path connectors for MySQL and PostgreSQL databases. Fast-path connectors are specialized connectors that use database specific batch tools to transfer data with high throughput. Sqoop also includes a generic JDBC connector that can be used to connect to any database that is accessible via JDBC.

Apart from the built-in connectors, many companies have developed their own connectors that can be plugged into Sqoop. These range from specialized connectors for enterprise data warehouse systems to NoSQL datastores.

Wrapping Up

In this post you saw how easy it is to transfer large datasets between Hadoop and external datastores such as relational databases. Beyond this, Sqoop offers many advance features such as different data formats, compression, working with queries instead of tables etc. We encourage you to try out Sqoop and give us your feedback.

More information regarding Sqoop can be found at:

Project Website: http://incubator.apache.org/sqoop

Wiki: https://cwiki.apache.org/confluence/display/SQOOP

Project Status:  http://incubator.apache.org/projects/sqoop.html

Mailing Lists: https://cwiki.apache.org/confluence/display/SQOOP/Mailing+Lists

Comments:

Done with Microsofts Sqoop Driver. And sqoop let us create a jar file: http://mapredit.blogspot.com/2011/10/speedup-sqoop.html - Alex

Posted by Alexander Lorenz on October 17, 2011 at 06:41 AM PDT #

[URGENT] I have a NoSQL distributed database system. Now I want a connector which can dump my data from this system to Hadoop (HBase). Is that possible using Sqoop? Please reply as soon as possible. Regards

Posted by Vishal Agrawal on May 24, 2012 at 01:45 PM PDT #

[Trackback] If this is datawarehouse? Sqoop and Hive helping you http://ankitasblogger.blogspot.com/2012/01/sqoop-export-and-import-commands.html https://blogs.apache.org/sqoop/entry/apache_sqoop_overview

Posted by Quora on June 11, 2012 at 09:09 AM PDT #

[Trackback] HDFS - Hadoop Distributed File System Hive  Hive i

Posted by Confluence: Isabella on October 25, 2012 at 06:14 AM PDT #

[Trackback] Hadoop - whole Map/Reduce ecosystem HDFS - Hadoop

Posted by Confluence: Isabella on October 25, 2012 at 06:15 AM PDT #

[Trackback] Hadoop - whole Map/Reduce ecosystem HDFS - Hadoop

Posted by Confluence: Isabella on October 25, 2012 at 06:16 AM PDT #

[Trackback] Hadoop - whole Map/Reduce ecosystem HDFS - Hadoop

Posted by Confluence: Isabella on October 25, 2012 at 06:16 AM PDT #

[Trackback] Hadoop - whole Map/Reduce ecosystem HDFS - Hadoop

Posted by Confluence: Isabella on October 25, 2012 at 06:20 AM PDT #

[Trackback] Hadoop - whole Map/Reduce ecosystem HDFS - Hadoop

Posted by Confluence: Isabella on October 25, 2012 at 06:20 AM PDT #

[Trackback] Hadoop - whole Map/Reduce ecosystem HDFS - Hadoop

Posted by Confluence: Isabella on October 30, 2012 at 07:29 AM PDT #

[Trackback] Hadoop - whole Map/Reduce ecosystem HDFS - Hadoop

Posted by Confluence: Isabella on February 08, 2013 at 12:49 AM PST #

[Trackback] Hadoop - whole Map/Reduce ecosystem HDFS - Hadoop

Posted by Confluence: Isabella on February 08, 2013 at 12:50 AM PST #

[Trackback] Hadoop - whole Map/Reduce ecosystem HDFS - Hadoop

Posted by Confluence: Isabella on February 11, 2013 at 06:28 AM PST #

[Trackback] Hadoop - whole Map/Reduce ecosystem HDFS - Hadoop

Posted by Confluence: Isabella on April 08, 2013 at 04:28 AM PDT #

Nice article. Tech looks very promising. -James http://allabtcloud.blogspot.com

Posted by James on October 02, 2013 at 07:40 AM PDT #

I am using this add on https://converter.tools and very helpful for me.

Posted by Converter Site on October 19, 2017 at 12:45 PM PDT #

Privileged to read this informative blog on Hadoop.Commendable efforts to put on research the hadoop. Please enlighten us with regular updates on hadoop. Friends if you're keen to learn more about AI you can watch this amazing tutorial on the same. https://www.youtube.com/watch?v=1jMR4cHBwZE

Posted by Roma Sharma on April 11, 2018 at 12:04 AM PDT #

Worthful Hadoop tutorial. Appreciate a lot for taking up the pain to write such a quality content on Hadoop course. Just now I watched this similar Hadoop tutorial and I think this will enhance the knowledge of other visitors for sure. Thanks anyway.https://www.youtube.com/watch?v=1OFFAr8zYEY

Posted by Sunita on May 20, 2018 at 05:46 AM PDT #

Creative Destruction is a sandbox survival mobile game. https://www.creativedestructionpc.com/

Posted by Babbu Mann on October 29, 2018 at 04:41 AM PDT #

Hadoop will be reduce the need for sql environment. On the other hand, NoSQL recently is big hype.

Posted by James on November 13, 2018 at 10:23 AM PST #

This is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. Thanks for sharing. www.decorativeconcretebirm.com

Posted by Decorative Concrete on November 27, 2018 at 12:50 AM PST #

Nice post, thanks https://trello.com/c/ND2AnkIN/1-why-you-really-need-a-tik-tok-followers

Posted by David Jaworski on June 04, 2019 at 12:20 PM PDT #

Membuat baju itu memang mudah sekali jika tau caranya, berikut ini adalah kumpulan kit dream league soccer yang bisa digunakan ke game dls: https://viruzcode.blogspot.com/2019/02/cara-membuat-kit-dream-league-soccer-sendiri.html banyak kit yang memiliki desain yang bagus disana yang bisa di download.

Posted by Roberj Downer on June 28, 2019 at 01:26 PM PDT #

thank you good post really like.

Posted by film izle on June 29, 2019 at 01:02 PM PDT #

thank you good post really like

Posted by fikryan on July 13, 2019 at 08:47 PM PDT #

Good information. Very useful article for my blog

Posted by ponseli on July 13, 2019 at 08:53 PM PDT #

Articles that are very useful and very helpful. Also visit my website about <a href="https://www.xiaomidev.com/">harga hp xiaomi</a>

Posted by lavina on July 13, 2019 at 09:21 PM PDT #

this is just awesome..... GREAT!!! Very nice! :)

Posted by loceexryne1979 on July 20, 2019 at 09:46 AM PDT #

Amaaaaaaaaazing wow great imagery. well done!

Posted by liroofquege1979 on July 20, 2019 at 10:28 AM PDT #

@Enes Önbaş @Julia Vin Thank you guys for your nice comments, have a great day! That's nice work. Are these for anything in particular, as I think they'd suit a cyber security campaign I want to put together and with a little adaptation these bright illustrations would really grab people's attention. Let me know

Posted by farmomoba1978 on July 20, 2019 at 10:51 AM PDT #

Big thanx @Guilherme Palmerio !! ;) This is a really professional work done.

Posted by perrispsentgy1982 on July 20, 2019 at 11:24 AM PDT #

This is so great! Congrats ;)

Posted by cintlufiful1980 on July 20, 2019 at 04:28 PM PDT #

awesome.

Posted by tcholexucan1978 on July 20, 2019 at 06:14 PM PDT #

So unique good presentation and work!!

Posted by amnatikoo1980 on July 20, 2019 at 07:53 PM PDT #

Great Work!! Looks like new Black Mirror ;)

Posted by boggmalsancchemd1972 on July 20, 2019 at 09:48 PM PDT #

Thanks a lot!! Big up for such a talent.

Posted by huddbelretumb1974 on July 20, 2019 at 11:00 PM PDT #

It took me a while to realise this was a 36 days of type project. I came for the lovely visuals but thought that was quite clever! Outstanding + + +

Posted by scapanlizphi1988 on July 20, 2019 at 11:34 PM PDT #

your use of color theory is so unmatched wow Amazing work, really profesional. I love it! @BIS Studio Graphique

Posted by chainexbocas1988 on July 21, 2019 at 12:06 AM PDT #

Classy! I like it!

Posted by initkire1982 on July 21, 2019 at 12:38 AM PDT #

love the colors in the refraction ! So strong & aesthetic ! Love it ✨

Posted by highneemannau1985 on July 21, 2019 at 01:10 AM PDT #

Рыли кул! The last photo is amazing!

Posted by easexalot1986 on July 21, 2019 at 01:42 AM PDT #

Very clean and awesome, compliments! ;)"!

Posted by nafusnessgran1987 on July 21, 2019 at 02:15 AM PDT #

I like this.. Mmmmm, gentle Spain colors

Posted by sunvasato1989 on July 21, 2019 at 02:48 AM PDT #

Very artistic. Amazing work! Nice work! :)

Posted by freembutfunkni1973 on July 21, 2019 at 03:21 AM PDT #

I will be very happy if you see my portfolio Hard to find more compliments - Great job!

Posted by blazosarep1984 on July 21, 2019 at 03:54 AM PDT #

unduh ost boruto paling lengkap dan terbaru http://downloadost.id/boruto-naruto-next-generations/

Posted by Downloadost on July 21, 2019 at 01:15 PM PDT #

Complete Odin3 apps for flashing https://tukangoprek.id/kumpulan-odin-lengkap/

Posted by Mike on July 21, 2019 at 01:19 PM PDT #

Lihat https://portal-uang.com/tips-blogging

Posted by Andri on July 22, 2019 at 09:13 PM PDT #

free robux codes https://freerobuxcodes.net/

Posted by jack on July 24, 2019 at 11:43 AM PDT #

Earn enough points and you can get Clash Royale free gems. https://royalebots.com/

Posted by kevin77 on July 24, 2019 at 11:44 AM PDT #

Saya akan memberi informasi tentang cara mendapatkan skin pubg mobile gratis https://www.mastimon.com/2019/07/10-cara-mendapatkan-skin-pubg-gratis.html?m=1

Posted by Skin pubg gratis on July 28, 2019 at 09:11 AM PDT #

Mau tau manfaat kopi bagi kesehatan, yuk lihat informasinya di situs https://www.playon.co.id/manfaat-kopi-bagi-kesehatan/170/

Posted by Playon on July 28, 2019 at 09:28 AM PDT #

Informasi anda sangat bermanfaat seperti informasi ini https://bit.ly/2K0fcva dan ijinkan saya membagikan informasi anda

Posted by Kode redeem pb zepetto on July 29, 2019 at 08:26 AM PDT #

Cara Menanam Tomat Agar Cepat Berbuah Lebat

Posted by Solusi Menarik on August 09, 2019 at 02:33 AM PDT #

Dampak Positif dan Negatif Perubahan Sosial Terhadap Masyarakat https://www.solusimenarik.com/2019/07/dampak-positif-dan-negatif-perubahan-sosial.html

Posted by Solusi Menarik on August 09, 2019 at 02:34 AM PDT #

jangan sampai ketinggalan itu perkembangan jaringan telekomunikasi <a href="https://www.masodee.com/2019/07/perbedaan-jaringan-2g-3g-dan-4g.html/">https://www.masodee.com/2019/07/perbedaan-jaringan-2g-3g-dan-4g.html</a>

Posted by borr on August 14, 2019 at 07:49 AM PDT #

jangan sampai ketinggalan itu perkembangan jaringan telekomunikasi <a href="https://www.masodee.com/2019/07/perbedaan-jaringan-2g-3g-dan-4g.html">https://www.masodee.com/2019/07/perbedaan-jaringan-2g-3g-dan-4g.html</a>

Posted by borr on August 14, 2019 at 07:53 AM PDT #

very helpful, great job

Posted by masodee.com on August 14, 2019 at 07:56 AM PDT #

Haii yang mau bikin seragam pengajian ,boleh kotak kami https://www.gamissyaricantik.com/blog/grosir-baju-gamis-seragam-pengajian/

Posted by seragam pengajian on August 19, 2019 at 08:29 PM PDT #

Great!!!

Posted by Irvan Maulana on August 26, 2019 at 11:35 AM PDT #

perbaharui informasi otomotif sepeda motor di www.satupiston.com

Posted by Satupiston.com - Motorbike Blog Site on August 26, 2019 at 11:37 AM PDT #

Cobalah mengerti. . . .aku hanya butiran debu tanpamu..... Oke cukup dan terimakasih https://flipstartpc.com/ https://netsains.com/ https://www.barickly.com/ https://hellosemarang.com/ https://pernikahan.co/

Posted by Flipstart PC on August 28, 2019 at 05:28 AM PDT #

resep rendang...makanan paling enak sedunia

Posted by mister jon on August 29, 2019 at 10:54 AM PDT #

If you need information about smartphone, computer, software and hardware visit in my site www.nengtech.xyz

Posted by Tya on August 29, 2019 at 11:01 AM PDT #

If u need information about technology, software, computer and computer visit this link https://www.nengtech.xyz

Posted by Tya on August 29, 2019 at 11:03 AM PDT #

If u need information about android, smartphone technology, software, hardware and computer visit this link https://www.nengtech.xyz

Posted by Tya on August 29, 2019 at 11:05 AM PDT #

Jika kalian ingin mengetahui resep2 masakan enak dan kreatif untuk mendirikan peluang bisnis jualan makanan atau bagi yg ingin belajar masak, kunjungi website saya ada banyak resep2 makanan kreatif yg dijamin klo dijual pasti bakal laku karena belum byk dijual di restoran ataupun warung2 https://nenginfoo.xyz

Posted by nenginfoo.xyz on August 29, 2019 at 11:11 AM PDT #

All about cute pets, cute birds, cute cat and cute dogs

Posted by www.allpetz.xyz on August 29, 2019 at 11:17 AM PDT #

If u love pets and u wanna see the most cute birds, cute cats and cute dogs visit here https://www.allpetz.xyz

Posted by allpetz.xyz on August 29, 2019 at 11:20 AM PDT #

Jika kalian ingin mengetahui resep2 masakan enak dan kreatif untuk mendirikan peluang bisnis jualan makanan atau bagi yg ingin belajar masak, kunjungi website saya ada banyak resep2 makanan kreatif yg dijamin klo dijual pasti bakal laku karena belum byk dijual di restoran ataupun warung2 https://www.nenginfoo.xyz

Posted by nenginfoo.xyz on August 29, 2019 at 11:21 AM PDT #

Great!!

Posted by Junk cars on August 29, 2019 at 11:46 AM PDT #

Great content! https://www.atsjunkcars.com

Posted by Website on August 29, 2019 at 11:47 AM PDT #

Maaf kurang 2 . . . https://pernikahan.co/ https://www.barickly.com/

Posted by pernikahan on August 29, 2019 at 09:47 PM PDT #

resep masakan indonesia semuanya lengkap ada disini dari resep rumahan , resep sederhana hingga resep tradisional asli indonesia https://www.masakanindonesia.xyz

Posted by indonesian recipe on August 30, 2019 at 01:08 PM PDT #

Thanks for this Info.

Posted by CodyCross Pakete Lösungen on August 31, 2019 at 09:56 AM PDT #

resep masakan indonesia semuanya lengkap ada disini dari resep rumahan , resep sederha

Posted by holytricks on August 31, 2019 at 11:58 AM PDT #

Nice article. Tech looks very promising. Study online business

Posted by Bisnis Online Gus. on September 01, 2019 at 11:38 PM PDT #

Nice article. Tech looks very promising. Study online business

Posted by Bisnis Online Gus. on September 01, 2019 at 11:41 PM PDT #

So unique good presentation and work!! jualan online terlaris https://www.sharemaz.com/2019/08/8-hal-yang-wajib-diketahui-sebelum-Jualan-Online.html

Posted by gustyawan on September 01, 2019 at 11:45 PM PDT #

amazing software. very helpful

Posted by Blog Teknologi on September 02, 2019 at 09:12 PM PDT #

Thank you for the wonderful description on Sqoop, one of the best tools out there for transferring data between Hadoop and RDBMS. Knowledge of big data stack tools has become a prerequsite for every data scientust

Posted by learn big data and data science on September 05, 2019 at 10:27 AM PDT #

Amazing software. https://www.fontanapestpros.com

Posted by pest control on September 05, 2019 at 12:38 PM PDT #

Amazing software https://www.sanbernadinopestpros.com

Posted by Pest control on September 05, 2019 at 12:39 PM PDT #

Great software https://www.irvingpestpros.com

Posted by Pest Control on September 05, 2019 at 12:40 PM PDT #

Great software https://www.irvingpestpros.com

Posted by Pest Control on September 05, 2019 at 12:40 PM PDT #

Best software ever https://www.lewisvillepestpros.com

Posted by Pest Control on September 05, 2019 at 12:41 PM PDT #

Thanks for the content https://www.dentonpestpros.com

Posted by Pest Control on September 05, 2019 at 12:42 PM PDT #

cara membuat

Posted by cara membuat on September 09, 2019 at 11:21 PM PDT #

oke

Posted by oke on September 09, 2019 at 11:57 PM PDT #

oke1

Posted by oke on September 09, 2019 at 11:59 PM PDT #

Dapatkan Bonus Forex Tanpa Deposit - Promosi Terbaru https://www.ninjafx.site/p/bonus-forex-tanpa-deposit.html

Posted by Tuman FX on September 10, 2019 at 12:04 AM PDT #

Forex Bonus https://www.fxzone.site/

Posted by FX Tuman on September 10, 2019 at 12:12 AM PDT #

Kamu kemana sih sayang https://caramerawatrambutampuh.blogspot.com/2017/05/manfaat-mentega-untuk-rambut.html

Posted by erwin on September 11, 2019 at 01:56 AM PDT #

This software is really amazing, i love it

Posted by montirbox.com on September 12, 2019 at 06:03 PM PDT #

Great software!, i using this software too

Posted by henrire.com on September 12, 2019 at 06:05 PM PDT #

Jika anda bingung hendak memilih mendirikan cv atau pt begini perbedaan cv dan pt https://www.omevan.com/2019/09/perbedaan-cv-dan-pt.html

Posted by omevan.com on September 14, 2019 at 04:33 AM PDT #

thanks for the information

Posted by isengnulis on September 14, 2019 at 09:36 PM PDT #

Post a Comment:
  • HTML Syntax: NOT allowed

Calendar

Search

Hot Blogs (today's hits)

Tag Cloud

Categories

Feeds

Links

Navigation