Connectors and drivers in the world of Apache Sqoop
Apache Sqoop is a tool for highly efficient data transfers between relational databases and the entire Hadoop ecosystem. One of Sqoop's significant benefits is that it is easy to use and works with a variety of systems both inside and outside the Hadoop ecosystem. With a single tool, you can import data from or export data to any database that supports the JDBC interface, using the same command-line arguments. Additionally, Sqoop was designed in a modular fashion, allowing you to plug in specialized additions that optimize transfers for particular database systems.
While some users of projects within the Hadoop ecosystem use the words "connector" and "driver" interchangeably, these words mean completely different things in the context of Sqoop. Because every Sqoop invocation needs both a connector and a driver, we see a lot of confusion in how these concepts are used and understood. This blog post will explain the difference between them and how Sqoop uses them to transfer data between Hadoop and other systems.
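To make the distinction concrete, here is a Java sketch under stated assumptions. A connector holds the database-specific transfer logic and is chosen from the JDBC connect string. A driver is the vendor's JDBC client library that the connector talks through. The class, record, and connector names (`ConnectorSelection`, `MySqlConnector`, `PostgresConnector`, `GenericJdbcConnector`) and the selection rules are hypothetical illustrations, not Sqoop's actual classes or behavior. `com.mysql.jdbc.Driver` and `org.postgresql.Driver` are the vendors' JDBC driver class names, while `com.example.jdbc.Driver` is a made-up placeholder.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of the connector/driver split (not Sqoop's actual code).
final class ConnectorSelection {

    // A connector: database-specific transfer logic, plus the JDBC driver class it expects.
    record Connector(String name, String defaultDriverClass) {}

    // The outcome of selection: which connector runs the transfer and which driver it uses.
    record TransferPlan(String connector, String driverClass) {}

    // Name of the fallback connector that works with any JDBC-compliant database (hypothetical).
    static final String GENERIC = "GenericJdbcConnector";

    // Specialized connectors keyed by JDBC URL prefix, checked in registration order.
    private final Map<String, Connector> byUrlPrefix = new LinkedHashMap<>();

    // "Plug in" a specialized connector for connect strings starting with urlPrefix.
    void register(String urlPrefix, Connector connector) {
        byUrlPrefix.put(urlPrefix, connector);
    }

    // Select a connector from the connect string. Assumption of this sketch:
    // driverClass is consulted only when no specialized connector matches.
    TransferPlan plan(String connectString, String driverClass) {
        for (Map.Entry<String, Connector> entry : byUrlPrefix.entrySet()) {
            if (connectString.startsWith(entry.getKey())) {
                Connector c = entry.getValue();
                return new TransferPlan(c.name(), c.defaultDriverClass());
            }
        }
        // No specialized connector: the generic one must be told which driver to load.
        if (driverClass == null) {
            throw new IllegalArgumentException(
                "No specialized connector for " + connectString + "; a JDBC driver class is required");
        }
        return new TransferPlan(GENERIC, driverClass);
    }

    public static void main(String[] args) {
        ConnectorSelection selection = new ConnectorSelection();
        // Connector names are hypothetical; the driver class names are the vendors' JDBC drivers.
        selection.register("jdbc:mysql:", new Connector("MySqlConnector", "com.mysql.jdbc.Driver"));
        selection.register("jdbc:postgresql:", new Connector("PostgresConnector", "org.postgresql.Driver"));

        // Specialized connector chosen from the URL; its default driver is used.
        System.out.println(selection.plan("jdbc:mysql://db.example.com/sales", null));
        // Unknown database: generic connector with an explicitly named (placeholder) driver.
        System.out.println(selection.plan("jdbc:example://db.example.com/sales", "com.example.jdbc.Driver"));
    }
}
```

In this model, supporting a new database needs only its JDBC driver and the generic connector. A specialized connector is an optional optimization layered on top. That mirrors the modular design described for Sqoop without claiming to reproduce its exact selection rules.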
What's new in Apache Sqoop 1.4.2
Sqoop 1.4.2 was released in August 2012. As this was an extremely important release for the Sqoop community (our first release as an Apache Top-Level Project), I would like to highlight its key features and fixes. The full change log is available on our JIRA, and the release itself can be downloaded from the usual place.
Sqoop Graduation Meetup
Last week, Cloudera hosted the Apache Sqoop Meetup at Cloudera HQ in Palo Alto. About 20 of the attendees had not used Sqoop before but were interested enough to join the Meetup on April 4th, and we believe this healthy interest will contribute to Sqoop's wide adoption. This was not only Sqoop's second Meetup but also a celebration of Sqoop's graduation from the Incubator, which cemented its status as a Top-Level Project of the Apache Software Foundation.
Apache Sqoop: Highlights of Sqoop 2
The popularity of Apache Sqoop (incubating) in enterprise systems confirms that Sqoop handles bulk transfer admirably. That said, to extend its functionality, Sqoop needs to support data integration use cases and become easier to manage and operate. Sqoop 2 addresses these needs, and this post gives a high-level overview of its design.
Inaugural Sqoop Meetup
Earlier this month over 30 people attended the inaugural Sqoop Meetup on the eve of Hadoop World in NYC. Faces were put to names, troubleshooting tips were swapped, and stories were topped. This post summarizes the Meetup, posts pictures from it, and links to the three presentations.
Apache Sqoop - Overview
This post provides a high-level overview of Apache Sqoop (incubating). It discusses the general problem Sqoop addresses and provides simple examples of how to use it. This post was written by Arvind Prabhakar, a Sqoop committer.