Apache Sqoop

Thursday January 12, 2012

Apache Sqoop: Highlights of Sqoop 2

Apache Sqoop (incubating) was created to efficiently transfer bulk data between Hadoop and external structured datastores, such as RDBMS and data warehouses, because databases are not easily accessible by Hadoop. Sqoop is currently undergoing incubation at The Apache Software Foundation. More information on this project can be found at http://incubator.apache.org/sqoop.

The popularity of Sqoop in enterprise systems confirms that Sqoop does bulk transfer admirably. That said, to enhance its functionality, Sqoop needs to fulfill data integration use-cases as well as become easier to manage and operate.

What is Sqoop?
As described in a previous blog post, Sqoop is a bulk data transfer tool that allows easy import/export of data from structured datastores such as relational databases, enterprise data warehouses, and NoSQL systems. Using Sqoop, you can provision the data from an external system into HDFS, as well as populate tables in Hive and HBase. Similarly, Sqoop integrates with the workflow coordinator Apache Oozie (incubating), allowing you to schedule and automate import/export tasks. Sqoop uses a connector-based architecture which supports plugins that provide connectivity to additional external systems.

Sqoop's Challenges
Sqoop has enjoyed enterprise adoption, and our experiences have exposed some recurring ease-of-use challenges, extensibility limitations, and security concerns that are difficult to support in the original design:
- Cryptic and contextual command line arguments can lead to error-prone connector matching, resulting in user errors
- Due to tight coupling between data transfer and the serialization format, some connectors may support a certain data format that others don't (e.g. direct MySQL connector can't support sequence files)
- There are security concerns with openly shared credentials
- Local configuration and installation require root privileges and are not easy to manage
- Debugging the map job is limited to turning on the verbose flag
- Connectors are forced to follow the JDBC model and are required to use common JDBC vocabulary (URL, database, table, etc.), regardless of whether it is applicable

These challenges have motivated the design of Sqoop 2, which is the subject of this post. That said, Sqoop 2 is a work in progress whose design is subject to change.

Sqoop 2 will continue its strong support for command line interaction, while adding a web-based GUI that exposes a simple user interface. Using this interface, a user can walk through an import/export setup via UI cues that eliminate redundant options. Connectors are added to the application in one place, so the user is not tasked with installing or configuring connectors in their own sandbox. These connectors expose their necessary options to the Sqoop framework, which then translates them to the UI. The UI is built on top of a REST API that can also be used by a command line client exposing similar functionality. The introduction of Admin and Operator roles in Sqoop 2 will restrict 'create' access for Connections to Admins and 'execute' access to Operators. This model will allow integration with platform security and restrict the end user's view to only the operations applicable to end users.

Ease of Use
Whereas Sqoop requires client-side installation and configuration, Sqoop 2 will be installed and configured server-side. This means that connectors will be configured in one place, managed by the Admin role and run by the Operator role. Likewise, JDBC drivers will be in one place and database connectivity will only be needed on the server. Sqoop 2 will be a web-based service: front-ended by a Command Line Interface (CLI) and browser and back-ended by a metadata repository. Moreover, Sqoop 2's service level integration with Hive and HBase will be on the server-side. Oozie will manage Sqoop tasks through the REST API. This decouples Sqoop internals from Oozie, i.e. if you install a new Sqoop connector then you won't need to install it in Oozie also.

Ease of Extension
In Sqoop 2, connectors will no longer be restricted to the JDBC model; instead, each connector can define its own vocabulary. For example, a Couchbase connector will no longer need to specify a table name only to overload it as a backfill or dump operation.
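
To make the idea of a connector-defined vocabulary concrete, here is a minimal Python sketch. It is not the Sqoop 2 API: the names Option, ConnectorSpec, form_fields, and validate are hypothetical, as are the option names chosen for each connector. It only illustrates how a framework could build prompts and validate input from what each connector declares, without assuming a table name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One input a connector asks the framework to collect (hypothetical)."""
    name: str
    label: str
    required: bool = True
    choices: tuple = ()  # empty tuple means free-form text


@dataclass(frozen=True)
class ConnectorSpec:
    """A connector's self-described vocabulary (hypothetical)."""
    name: str
    options: tuple


# A JDBC-style connector keeps the familiar URL/table vocabulary.
JDBC = ConnectorSpec("generic-jdbc", (
    Option("url", "JDBC URL"),
    Option("table", "Table name"),
))

# A Couchbase-style connector asks for an operation instead of a table.
COUCHBASE = ConnectorSpec("couchbase", (
    Option("hosts", "Cluster hosts"),
    Option("operation", "Operation", choices=("backfill", "dump")),
))


def form_fields(spec):
    """Labels a UI or CLI would prompt for, derived only from the spec."""
    return [opt.label for opt in spec.options]


def validate(spec, values):
    """Return a list of problems; an empty list means the input is usable."""
    known = {opt.name for opt in spec.options}
    problems = [f"unknown option: {key}" for key in sorted(values.keys() - known)]
    for opt in spec.options:
        if opt.name not in values:
            if opt.required:
                problems.append(f"missing option: {opt.name}")
        elif opt.choices and values[opt.name] not in opt.choices:
            problems.append(f"invalid value for {opt.name}: {values[opt.name]}")
    return problems
```

In this sketch, passing a table to the Couchbase-style connector is reported as an unknown option instead of being silently reinterpreted, which is the kind of overloading the new design aims to avoid.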

Common functionality will be abstracted out of connectors, holding them responsible only for data transport. The reduce phase will implement common functionality, ensuring that connectors benefit from future development of functionality.
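
The separation described above can be illustrated with a small Python sketch. It is a simplification under stated assumptions: plain functions stand in for the connector and for the framework's shared stage, no MapReduce is involved, and the names sample_connector, FORMATS, and run_import are hypothetical. The point is only that when serialization lives in the framework, a connector that merely yields records gets every output format.

```python
import csv
import io
import json


def sample_connector():
    """Stand-in for a connector: it only yields records (made-up data)."""
    yield {"id": 1, "name": "alice"}
    yield {"id": 2, "name": "bob"}


def to_csv(records):
    """Serialize records as CSV rows without a header line."""
    records = list(records)
    buf = io.StringIO()
    if records:
        writer = csv.DictWriter(buf, fieldnames=list(records[0]), lineterminator="\n")
        writer.writerows(records)
    return buf.getvalue()


def to_json_lines(records):
    """Serialize records as one JSON object per line."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)


# Formats live in the framework, so every connector gets every format.
FORMATS = {"csv": to_csv, "jsonl": to_json_lines}


def run_import(connector, fmt):
    """Framework-side stage: pull records from any connector, then serialize."""
    return FORMATS[fmt](connector())
```

Adding a new format here means adding one entry to FORMATS; no connector has to change, which mirrors the goal of letting connectors benefit from future framework development.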

Sqoop 2's interactive web-based UI will walk users through import/export setup, eliminating redundant steps and omitting incorrect options. Connectors will be added in one place, with the connectors exposing necessary options to the Sqoop framework. Thus, users will only need to provide information relevant to their use-case.

Because the user makes an explicit connector choice in Sqoop 2, connector selection will be less error-prone and more predictable. In the same way, the user will not need to be aware of the functionality of all connectors. As a result, connectors no longer need to provide downstream functionality, transformations, and integration with other systems. Hence, the connector developer no longer has the burden of understanding all the features that Sqoop supports.

Currently, Sqoop operates as the user that runs the 'sqoop' command. The security principal used by a Sqoop job is determined by the credentials the user has when launching Sqoop. Going forward, Sqoop 2 will operate as a server-based application with support for securing access to external systems via role-based access to Connection objects. For additional security, Sqoop 2 will no longer allow code generation, require direct access to Hive and HBase, or open up access to all clients to execute jobs.

Sqoop 2 will introduce Connections as First-Class Objects. Connections, which will encompass credentials, will be created once and then used many times for various import/export jobs. Connections will be created by the Admin and used by the Operator, thus preventing credential abuse by the end user. Furthermore, Connections can be restricted based on operation (import/export). By limiting the total number of physical Connections open at one time and with an option to disable Connections, resources can be managed.
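
The Connection model described above can be sketched in a few lines of Python. This is an illustration, not Sqoop 2 code: the class and method names (Role, Connection, ConnectionRegistry, create, acquire, release) are hypothetical, and the sketch assumes that only an Admin creates Connections and only an Operator executes jobs against them.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    OPERATOR = "operator"


@dataclass
class Connection:
    """Stored once by an Admin; credentials stay inside the registry."""
    name: str
    username: str
    password: str
    allowed_operations: frozenset = frozenset({"import", "export"})
    enabled: bool = True


class ConnectionRegistry:
    """Hypothetical server-side store that enforces roles and limits."""

    def __init__(self, max_open):
        self._connections = {}
        self._max_open = max_open
        self._open_count = 0

    def create(self, role, connection):
        if role is not Role.ADMIN:
            raise PermissionError("only an Admin may create Connections")
        self._connections[connection.name] = connection

    def acquire(self, role, name, operation):
        """Reserve a physical connection slot; returns only the name."""
        if role is not Role.OPERATOR:
            raise PermissionError("only an Operator may execute jobs")
        conn = self._connections[name]
        if not conn.enabled:
            raise RuntimeError(f"Connection {name} is disabled")
        if operation not in conn.allowed_operations:
            raise PermissionError(f"{operation} is not allowed on {name}")
        if self._open_count >= self._max_open:
            raise RuntimeError("too many open Connections")
        self._open_count += 1
        return conn.name

    def release(self):
        """Give back one slot after a job finishes."""
        self._open_count = max(0, self._open_count - 1)
```

The Operator refers to a Connection by name and never receives the stored password, while the cap on open slots and the enabled flag correspond to the resource controls mentioned above.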

As detailed in this presentation, Sqoop 2 will enable users to use Sqoop effectively with a minimal understanding of its details by having a web-application run Sqoop, which allows Sqoop to be installed once and used from anywhere. In addition, having a REST API for operation and management will help Sqoop integrate better with external systems such as Oozie. Also, introducing a reduce phase allows connectors to be focused only on connectivity and ensures that Sqoop functionality is uniformly available for all connectors. This facilitates ease of development of connectors.

We encourage you to participate in and contribute to Sqoop 2's Design and Development (SQOOP-365).

Guest post by Kathleen Ting.


Dear all, I am installing Sqoop 2 on CentOS 6. The Sqoop 2 CLI runs fine, but I can't access the web UI through a browser. When I try to access http://localhost:8080/sqoop, the browser shows only "Apache Sqoop Server". Please help me find out how I can access the Sqoop 2 web UI. Arindam

Posted by Arindam Mitra on August 01, 2012 at 04:38 AM PDT #

Hi, Expert: with Oracle 11g, varchar2 and char type data cannot be imported.
create table tt442222(c1 varCHAR(32), c2 varCHAR(32));
insert into tt442222 values ('abc', 'abc2');
JDBC URL: jdbc:oracle:thin:@//10.103.xx.xx:1521/ENG10GR2.pd.local
JDBC driver name: oracle.jdbc.OracleDriver
2014-04-18 15:39:08,117 ERROR mapreduce.MapreduceSubmissionEngine [org.apache.sqoop.submission.mapreduce.MapreduceSubmissionEngine.submit(MapreduceSubmissionEngine.java:257)] Error in submitting job
java.lang.ClassCastException: java.lang.Integer cannot be cast to java.math.BigDecimal
at org.apache.sqoop.connector.jdbc.GenericJdbcImportPartitioner.constructTextConditions(GenericJdbcImportPartitioner.java:528)

Posted by David Zhang on April 19, 2014 at 07:27 AM PDT #

Oracle 11g, varchar2, char type data cannot be imported (2). See my post above. A table with two integer columns worked; a mix of nchars and number(38)s does not work. The error is either:
2014-04-18 19:03:38,585 ERROR mapreduce.MapreduceSubmissionEngine [org.apache.sqoop.submission.mapreduce.MapreduceSubmissionEngine.submit(MapreduceSubmissionEngine.java:257)] Error in submitting job
org.apache.sqoop.common.SqoopException: GENERIC_JDBC_CONNECTOR_0011:The type is not supported - -15
at org.apache.sqoop.connector.jdbc.GenericJdbcImportPartitioner.getPartitions(GenericJdbcImportPartitioner.java:121)
or:
duce.MapreduceSubmissionEngine.submit(MapreduceSubmissionEngine.java:257)] Error in submitting job
java.lang.ArithmeticException: / by zero
at java.math.BigDecimal.divideAndRound(BigDecimal.java:1424)
at
Thanks a lot again!

Posted by David Zhang on April 19, 2014 at 07:33 AM PDT #

I want to import data from Oracle to Hive. Could you send me a detailed doc about how to configure Sqoop 2? Best wishes

Posted by BigQ on June 23, 2015 at 06:48 PM PDT #

Can you tell me how to import data from mysql to hbase/hive? Best Wishes!

Posted by big fly on December 16, 2015 at 07:37 PM PST #

A table imported to HDFS is split into a few text files with no meaningful names. Is there any way to configure it so they become file1.txt, file2.txt, and so on? The contents of one column were replaced by net.sourceforge.jtds.jdbc.ClobImpl@<some funny number>. How can a Sqoop 2 job be automated?

Posted by ronnie on March 02, 2016 at 06:32 AM PST #

How can I add a new column to an existing table during a Sqoop import? Please let me know the CLI command used for that and how it works.

Posted by murthy on April 17, 2016 at 11:59 PM PDT #

I want to use Sqoop 2 to import data from MySQL to HDFS and use the Sqoop hook with Atlas. I want to know how to use the Sqoop 2 hook; it is different from Sqoop 1.

Posted by wuqi on September 18, 2016 at 08:00 PM PDT #

I have Apache Hadoop 2.7.3 installed and working on a CentOS 7 server. We are using Sqoop version 1.99.4.
Command: sqoop2-tool verify
org.apache.catalina.startup.Tool main
SEVERE: Exception calling main() method
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Posted by swapnil dubey on January 27, 2017 at 06:57 AM PST #
