Apache Sqoop: Highlights of Sqoop 2
Apache Sqoop (incubating) was created to efficiently transfer bulk data between Hadoop and external structured datastores, such as RDBMS and data warehouses, because databases are not easily accessible by Hadoop. Sqoop is currently undergoing incubation at The Apache Software Foundation. More information on this project can be found at http://incubator.apache.org/sqoop.
The popularity of Sqoop in enterprise systems confirms that Sqoop does bulk transfer admirably. That said, to enhance its functionality, Sqoop needs to fulfill data integration use-cases as well as become easier to manage and operate.
What is Sqoop?
As described in a previous blog post, Sqoop is a bulk data transfer tool that allows easy import/export of data from structured datastores such as relational databases, enterprise data warehouses, and NoSQL systems. Using Sqoop, you can provision the data from an external system into HDFS, as well as populate tables in Hive and HBase. Similarly, Sqoop integrates with the workflow coordinator Apache Oozie (incubating), allowing you to schedule and automate import/export tasks. Sqoop uses a connector-based architecture which supports plugins that provide connectivity to additional external systems.
Sqoop's Challenges
Sqoop has enjoyed enterprise adoption, and our experiences have exposed some recurring ease-of-use challenges, extensibility limitations, and security concerns that are difficult to address in the original design:
- Cryptic and contextual command line arguments make connector matching error-prone, resulting in user errors
- Due to tight coupling between data transfer and the serialization format, some connectors may support a data format that others don't (e.g. the direct MySQL connector can't support sequence files)
- There are security concerns with openly shared credentials
- Local configuration and installation require root privileges, which makes them hard to manage
- Debugging the map job is limited to turning on the verbose flag
- Connectors are forced to follow the JDBC model and are required to use common JDBC vocabulary (URL, database, table, etc.), regardless of whether it is applicable
These challenges have motivated the design of Sqoop 2, which is the subject of this post. That said, Sqoop 2 is a work in progress whose design is subject to change.
Sqoop 2 will continue its strong support for command line interaction, while adding a web-based GUI that exposes a simple user interface. Using this interface, a user can walk through an import/export setup via UI cues that eliminate redundant options. Connectors are added to the application in one place, and the user is not tasked with installing or configuring connectors in their own sandbox. These connectors expose their necessary options to the Sqoop framework, which then translates them to the UI. The UI is built on top of a REST API that can also be used by a command line client exposing similar functionality. The introduction of Admin and Operator roles in Sqoop 2 will restrict 'create' access for Connections to Admins and 'execute' access to Operators. This model will allow integration with platform security and restrict the end user view to only those operations applicable to end users.
Ease of Use
Whereas Sqoop requires client-side installation and configuration, Sqoop 2 will be installed and configured server-side. This means that connectors will be configured in one place, managed by the Admin role and run by the Operator role. Likewise, JDBC drivers will be in one place and database connectivity will only be needed on the server. Sqoop 2 will be a web-based service: front-ended by a Command Line Interface (CLI) and a browser, and back-ended by a metadata repository. Moreover, Sqoop 2's service-level integration with Hive and HBase will be on the server side. Oozie will manage Sqoop tasks through the REST API. This decouples Sqoop internals from Oozie: if you install a new Sqoop connector, you won't need to install it in Oozie as well.
Ease of Extension
In Sqoop 2, connectors will no longer be restricted to the JDBC model and can instead define their own vocabulary. For example, Couchbase will no longer need to specify a table name only to overload it as a backfill or dump operation.
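One way to picture a connector that declares its own vocabulary is as a list of named options that the framework can validate and render. The sketch below is illustrative only: `Option`, `Connector`, `validate`, and the option names given to the two connectors are hypothetical and are not taken from Sqoop 2's actual API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Option:
    """One input a connector asks the user for (hypothetical model)."""
    name: str
    required: bool = True
    choices: tuple = ()  # an empty tuple means free-form text


@dataclass
class Connector:
    """A connector declares its own vocabulary instead of inheriting JDBC's."""
    name: str
    options: list = field(default_factory=list)

    def validate(self, values: dict) -> list:
        """Return a list of human-readable problems; an empty list means valid."""
        problems = []
        known = {opt.name for opt in self.options}
        for key in values:
            if key not in known:
                problems.append(f"unknown option: {key}")
        for opt in self.options:
            if opt.name not in values:
                if opt.required:
                    problems.append(f"missing option: {opt.name}")
            elif opt.choices and values[opt.name] not in opt.choices:
                problems.append(f"invalid value for {opt.name}: {values[opt.name]}")
        return problems


# Option names below are assumptions chosen only for illustration.
jdbc = Connector("generic-jdbc", [Option("url"), Option("table")])
couchbase = Connector(
    "couchbase",
    [Option("hosts"), Option("operation", choices=("backfill", "dump"))],
)

# The Couchbase-style connector asks for an operation directly...
print(couchbase.validate({"hosts": "cb1:8091", "operation": "dump"}))
# ...and rejects a JDBC-style "table" argument that does not apply to it.
print(couchbase.validate({"hosts": "cb1:8091", "table": "DUMP"}))
```

Because each connector owns its option list, a framework built along these lines could generate both the UI form and the command line prompts from the same declaration.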
Common functionality will be abstracted out of connectors, holding them responsible only for data transport. The reduce phase will implement common functionality, ensuring that connectors benefit from future development of functionality.
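The separation between transport and common functionality can be sketched as follows. This is a simplified, single-process illustration, not the MapReduce design itself: the connector is reduced to a function that yields records, and the function names (`sample_connector_rows`, `to_csv`, `to_json_lines`, `run_import`) and the two output formats are assumptions made for the example.

```python
import csv
import io
import json


def sample_connector_rows():
    """Stand-in for a connector's transport step: it only yields records."""
    yield {"id": 1, "name": "alice"}
    yield {"id": 2, "name": "bob"}


def to_csv(rows):
    """Shared serializer, written once in the framework."""
    rows = list(rows)
    buf = io.StringIO()
    if rows:
        writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
    return buf.getvalue()


def to_json_lines(rows):
    """Another shared serializer: one JSON object per line."""
    return "".join(json.dumps(row, sort_keys=True) + "\n" for row in rows)


SERIALIZERS = {"csv": to_csv, "jsonl": to_json_lines}


def run_import(extract, output_format):
    """Framework entry point: pair any connector with any shared serializer."""
    return SERIALIZERS[output_format](extract())


print(run_import(sample_connector_rows, "csv"))
print(run_import(sample_connector_rows, "jsonl"))
```

Adding a new entry to `SERIALIZERS` makes that format available to every connector at once, which is the property the paragraph above describes.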
Sqoop 2's interactive web-based UI will walk users through import/export setup, eliminating redundant steps and omitting incorrect options. Connectors will be added in one place, with the connectors exposing necessary options to the Sqoop framework. Thus, users will only need to provide information relevant to their use-case.
Because the user makes an explicit connector choice in Sqoop 2, connector selection will be less error-prone and more predictable. In the same way, the user will not need to be aware of the functionality of all connectors. As a result, connectors no longer need to provide downstream functionality, transformations, and integration with other systems. Hence, the connector developer no longer has the burden of understanding all the features that Sqoop supports.
Security
Currently, Sqoop operates as the user that runs the 'sqoop' command. The security principal used by a Sqoop job is determined by the credentials the user holds when launching Sqoop. Going forward, Sqoop 2 will operate as a server-based application with support for securing access to external systems via role-based access to Connection objects. For additional security, Sqoop 2 will no longer allow code generation, require direct access to Hive and HBase, or open up access to all clients to execute jobs.
Sqoop 2 will introduce Connections as first-class objects. Connections, which will encompass credentials, will be created once and then used many times for various import/export jobs. Connections will be created by the Admin and used by the Operator, thus preventing credential abuse by the end user. Furthermore, Connections can be restricted based on operation (import/export). Limiting the total number of physical Connections open at one time, together with an option to disable Connections, will allow resources to be managed.
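The Connection rules described above (Admins create, Operators execute, per-operation restrictions, a cap on open Connections, and a disable switch) can be sketched in a few lines. Everything here is a hypothetical model for illustration: the class names, the `acquire`/`release` methods, and the role strings are assumptions, not Sqoop 2's actual interfaces.

```python
class AccessDenied(Exception):
    """Raised when a role or Connection setting forbids the request."""


# Mirrors the split in the text: Admins create, Operators execute.
PERMISSIONS = {"admin": {"create"}, "operator": {"execute"}}


class Connection:
    def __init__(self, name, password, allowed_ops, enabled=True):
        self.name = name
        self.password = password          # kept server-side, never handed back
        self.allowed_ops = set(allowed_ops)  # subset of {"import", "export"}
        self.enabled = enabled


class ConnectionRegistry:
    def __init__(self, max_open):
        self.max_open = max_open
        self._connections = {}
        self._open = 0

    def _check(self, role, action):
        if action not in PERMISSIONS.get(role, set()):
            raise AccessDenied(f"role {role!r} may not {action} Connections")

    def create(self, role, conn):
        self._check(role, "create")
        self._connections[conn.name] = conn

    def acquire(self, role, conn_name, operation):
        """Reserve a physical connection slot for one import/export job."""
        self._check(role, "execute")
        conn = self._connections[conn_name]
        if not conn.enabled:
            raise AccessDenied(f"{conn_name} is disabled")
        if operation not in conn.allowed_ops:
            raise AccessDenied(f"{operation} is not permitted on {conn_name}")
        if self._open >= self.max_open:
            raise RuntimeError("too many open Connections")
        self._open += 1
        # The caller learns which Connection it is using, not its credentials.
        return f"{operation} via {conn_name}"

    def release(self):
        self._open = max(0, self._open - 1)


registry = ConnectionRegistry(max_open=1)
registry.create("admin", Connection("warehouse", "s3cret", {"import"}))
print(registry.acquire("operator", "warehouse", "import"))
registry.release()
```

In this sketch an Operator can run an import through `warehouse` without ever seeing its password, while an export through the same Connection, or any attempt by an Operator to create a Connection, raises `AccessDenied`.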
Summary
As detailed in this presentation, Sqoop 2 will enable users to use Sqoop effectively with a minimal understanding of its details by having a web application run Sqoop, which allows Sqoop to be installed once and used from anywhere. In addition, having a REST API for operation and management will help Sqoop integrate better with external systems such as Oozie. Finally, introducing a reduce phase allows connectors to focus only on connectivity and ensures that Sqoop functionality is uniformly available for all connectors, which makes connectors easier to develop.
We encourage you to participate in and contribute to Sqoop 2's Design and Development (SQOOP-365).
Guest post by Kathleen Ting.
Posted at 02:58PM Jan 12, 2012 by arvind in General