Apache Sqoop

Monday Nov 21, 2011

Inaugural Sqoop Meetup

Over 30 people attended the inaugural Sqoop Meetup on the eve of Hadoop World in NYC. Faces were put to names, troubleshooting tips were swapped, and stories were topped, with the table-to-end-all-tables weighing in at 28 billion rows.

I started off the scheduled talks by discussing "Habits of Effective Sqoop Users." One tip for making your next debugging session more effective: provide more information up front when you post to the mailing list, such as the versions you are using and the output from running with the --verbose flag enabled. I also pointed out workarounds for common MySQL and Oracle errors.

Next up was Eric Hernandez's "Sqooping 50 Million Rows a Day from MySQL," where he displayed battle scars from creating a single data source for analysts to mine. The key lessons learned were: (1) Develop an incremental import when sqooping large, active tables. (2) Limit the number of parts in which the data is stored in HDFS. (3) Compress the data in HDFS.
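
As a rough illustration of those three lessons (not Eric's actual setup), here is a minimal Python sketch that assembles a Sqoop 1 import command combining them. The helper name build_incremental_import, the JDBC URL, the table name, and the check column are all hypothetical; the flags are standard Sqoop 1 import options, but check them against the version you run.

```python
from typing import List


def build_incremental_import(connect: str, table: str, check_column: str,
                             last_value: int, num_mappers: int = 4) -> List[str]:
    """Return an argv list for a compressed, incremental Sqoop import.

    Hypothetical helper for illustration; it only builds the command
    and does not run it.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    return [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        # (1) Incremental import: fetch only rows whose check column
        # is greater than the value recorded by the previous run.
        "--incremental", "append",
        "--check-column", check_column,
        "--last-value", str(last_value),
        # (2) Each map task typically writes one part file, so capping
        # the mapper count caps the number of parts in HDFS.
        "--num-mappers", str(num_mappers),
        # (3) Compress the files written to HDFS.
        "--compress",
    ]


if __name__ == "__main__":
    cmd = build_incremental_import(
        "jdbc:mysql://db.example.com/sales",  # hypothetical database
        "orders",                             # hypothetical table
        "id",                                 # hypothetical check column
        last_value=1000,
        num_mappers=4,
    )
    print(" ".join(cmd))
```

Building the command as a list rather than a single string makes it straightforward to hand to subprocess.run without shell quoting issues, and the last value can be updated between runs by whatever scheduler drives the import.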

The final talk of the night was given by Joey Echeverria on "Scratching Your Own Itch." Joey methodically stepped future Sqoop committers through the science of contributing: finding a Sqoop bug, filing a JIRA, coding a patch, submitting it for review, revising accordingly, and finally earning the "ship it" +1 approval.

With the conclusion of the scheduled talks, the hallway talks commenced and went well into the night. Sqoop Committer Aaron Kimball was even rumored to have shed a tear over the healthy turnout and impending momentum barreling towards the next Sqoop Meetup on the Left Coast. See you there!

Aaron Kimball

Kate Ting

Eric Hernandez

Joey Echeverria

Guest post by Kate Ting.
Photos from Masatake Iwasaki and Kate Ting.
