Apache Sqoop

Friday December 30, 2011

What's New in Apache Sqoop 1.4.0-incubating

Apache Sqoop recently celebrated its first incubator release, version 1.4.0-incubating.  This release adds several new features and improvements, and this post covers some of the more interesting ones.  Sqoop is currently undergoing incubation at The Apache Software Foundation.  More information on the project can be found at http://incubator.apache.org/sqoop.

Customized Type Mapping (SQOOP-342)

Sqoop is equipped with a default mapping from most SQL types to appropriate Java or Hive counterparts during import.  However, this one-size-fits-all mapping might not be ideal in every scenario, given the wide variety of data stores available today, and certain vendor-specific SQL types may not be covered by the default mapping at all.

To allow customized type mapping, two new arguments, map-column-java and map-column-hive, are introduced for overriding the mapping to Java and Hive types, respectively.  Each takes a comma-separated list of mappings in the form <column name>=<target type>, such as

$ sqoop import ... --map-column-java id=Integer,name=String

For the above example, the columns id and name will be mapped to Java Integer and String, respectively.
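
The Hive-side argument can be used in the same way.  As a rough sketch, the command below overrides the Hive types of two columns during a Hive import.  The connection string, table name, and column names are hypothetical, and the command is only printed rather than executed.

```shell
# Hypothetical connection string, table, and columns; substitute your own.
set -- import \
  --connect "jdbc:mysql://db.example.com/shop" \
  --table orders \
  --hive-import \
  --map-column-hive "id=BIGINT,total=DOUBLE"

# Dry run: show the command instead of executing it.
# To execute it for real, run: sqoop "$@"
preview="sqoop $*"
printf '%s\n' "$preview"
```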

Boundary Query Support (SQOOP-331)

By default, Sqoop always uses a canned query (select min(<split column>), max(<split column>) from <table name>) to determine the boundaries for creating splits.  This query is not always the most efficient one, however.  To allow different queries for different use cases, a new boundary-query argument accepts an arbitrary query that returns two numeric columns, which are then used as the lower and upper boundaries for creating splits.
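
As an illustration, suppose the bounds of the split column are already kept in a small summary table, so scanning the full table for min and max is unnecessary.  The sketch below assumes such a table exists; orders_id_range, its columns, and the connection string are all hypothetical, and the command is only printed rather than executed.

```shell
# Hypothetical summary table holding precomputed bounds of orders.id.
# The query must return two numeric columns: lower bound, then upper bound.
BOUNDARY_QUERY="SELECT min_id, max_id FROM orders_id_range"

# Hypothetical connection string and table; substitute your own.
set -- import \
  --connect "jdbc:mysql://db.example.com/shop" \
  --table orders \
  --split-by id \
  --boundary-query "$BOUNDARY_QUERY"

# Dry run: show the command instead of executing it.
# To execute it for real, run: sqoop "$@"
preview="sqoop $*"
printf '%s\n' "$preview"
```

Rows whose split column falls outside the returned bounds would not be covered by any split, so the boundary query should agree with the data actually being imported.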

Date/Time Incremental Append (SQOOP-321)

Incremental import in Sqoop retrieves only those rows whose check column value is beyond a certain threshold.  The threshold needs to be the maximum value of the check column (in append mode) or the timestamp (in lastmodified mode) at the end of the last import.

Previously, in append mode, the check column had to be of a numeric type.  If a date/time type was desired, the user had to manually select the maximum value of the date/time column and then specify that value as the last-value argument in lastmodified mode instead.  As of this release, the check column in append mode can be of a date/time type as well.
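
A minimal sketch of an append-mode import driven by a date/time column might look as follows.  The connection string, the table, the created_at column, and the timestamp are hypothetical, and the exact timestamp format accepted may depend on the database.  The command is only printed rather than executed.

```shell
# Hypothetical threshold: the maximum created_at value seen by the last import.
LAST_VALUE="2011-12-01 00:00:00"

# Hypothetical connection string, table, and check column; substitute your own.
set -- import \
  --connect "jdbc:mysql://db.example.com/shop" \
  --table orders \
  --incremental append \
  --check-column created_at \
  --last-value "$LAST_VALUE"

# Dry run: show the command instead of executing it.
# To execute it for real, run: sqoop "$@"
preview="sqoop $*"
printf '%s\n' "$preview"
```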

Composite Key Update (SQOOP-313)

By default, Sqoop export adds new records to a table using INSERT statements.  However, if any record conflicts with an existing one due to table constraints (such as a unique key), the underlying INSERT statement fails and the export process fails with it.  If existing records need to be modified instead, the update-key argument can be specified and UPDATE statements are used underneath.

Before this release, only a single column name could be specified in the update-key argument.  This column is used to determine the matching record(s) for update.  However, in many real-world situations, multiple columns are required to identify the matching record(s).  Thus, starting with this release, a comma-separated list of column names can be given as the update-key argument.
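
For instance, a table of order line items might only be uniquely identified by an order number together with a line number.  The sketch below assumes such a table; the connection string, table, column names, and HDFS directory are hypothetical, and the command is only printed rather than executed.

```shell
# Hypothetical composite key: both columns together identify a record.
UPDATE_KEY="order_id,line_no"

# Hypothetical connection string, table, and export directory; substitute your own.
set -- export \
  --connect "jdbc:mysql://db.example.com/shop" \
  --table order_items \
  --export-dir /user/example/order_items \
  --update-key "$UPDATE_KEY"

# Dry run: show the command instead of executing it.
# To execute it for real, run: sqoop "$@"
preview="sqoop $*"
printf '%s\n' "$preview"
```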

Mixed Update/Insert Export (SQOOP-327)

As mentioned, Sqoop export could previously either insert (by default) or update (with the update-key argument) records in a table, but not both.  One consequence is that inserted records may cause constraint violations when they already exist.  Another is that updated records may be silently ignored when no matching update keys are found.  Sqoop lacked a way to update the records with matching update keys and insert the ones without.

A new update-mode argument is introduced to resolve the above issues.  Its value can be either updateonly or allowinsert.  As the names suggest, records without matching update keys are simply dropped when the value is updateonly and are inserted when the value is allowinsert.  Note that this feature is currently provided only for the built-in Oracle connector.
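
As a sketch, an export to Oracle that updates matching records and inserts the rest could be written along these lines.  The connection string, table, key columns, and HDFS directory are hypothetical, and the command is only printed rather than executed.

```shell
# updateonly drops records without a matching key; allowinsert inserts them.
UPDATE_MODE="allowinsert"

# Hypothetical Oracle connection string, table, keys, and export directory.
set -- export \
  --connect "jdbc:oracle:thin:@//db.example.com:1521/ORCL" \
  --table ORDER_ITEMS \
  --export-dir /user/example/order_items \
  --update-key "ORDER_ID,LINE_NO" \
  --update-mode "$UPDATE_MODE"

# Dry run: show the command instead of executing it.
# To execute it for real, run: sqoop "$@"
preview="sqoop $*"
printf '%s\n' "$preview"
```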

IBM DB2 Support (SQOOP-329)

The extensible architecture used by Sqoop allows support for a data store to be added as a so-called connector.  By default, Sqoop comes with connectors for a variety of databases such as MySQL, PostgreSQL, Oracle, and SQL Server.  In addition, third-party connectors are available separately from various vendors for several other data stores, such as Couchbase, VoltDB, and Netezza.  As part of this release, a new connector is provided to import and export data against IBM DB2 databases.

The Final Chapter

If you are interested in learning more about the changes, a complete list for Sqoop 1.4.0-incubating can be found here.  You are also encouraged to give this new release a try.  Any help and feedback is more than welcome. For more information on how to report problems and to get involved, visit the Sqoop project website at http://incubator.apache.org/sqoop/.


