The language of data is SQL, so naturally lots of tools have been developed to bring SQL to Hadoop. They range from simple wrappers on top of Map-Reduce to full data warehouse implementations built on top of HDFS, and everything in between. There are more tools than you might think, so this is my attempt at listing them all and hopefully providing some insight into what each of them actually does. I've tried to order them by 'installation friction', so the more complex products are towards the bottom. I'll cover the following technologies:
Hive is the original SQL-on-Hadoop solution.
Hive is an open-source Java project which converts SQL into a series of Map-Reduce jobs that run on standard Hadoop tasktrackers. It tries to look like MySQL by using a metastore (itself a database) to store table schemas, partitions, and locations. It largely supports MySQL syntax and organizes datasets using familiar database/table/view conventions. Hive provides:
· A SQL-like query interface called HiveQL, loosely modelled after MySQL
· A command-line client
· Metadata sharing via a central service
· JDBC drivers
· A Java API for creating custom functions and transformations
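To give a feel for HiveQL, here is a minimal sketch of defining a table over files already in HDFS and querying it; the table, columns, and HDFS path are made up for illustration:

```sql
-- Hypothetical example: an external table over tab-delimited files in HDFS.
-- The schema lives in the metastore; the data stays where it is.
CREATE EXTERNAL TABLE page_views (
  view_time TIMESTAMP,
  user_id   BIGINT,
  url       STRING
)
PARTITIONED BY (view_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/page_views';

-- A familiar-looking aggregate; Hive compiles this into Map-Reduce jobs.
SELECT view_date, COUNT(*) AS views
FROM page_views
GROUP BY view_date;
```

The query looks like ordinary SQL, but each statement is planned as one or more Map-Reduce jobs behind the scenes, which is where the latency discussed below comes from.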
SHOULD YOU USE IT?
Hive is the de-facto standard, installed on almost all Hadoop clusters. It's simple to set up and doesn't require much infrastructure to get started with. Given the small cost of trying it, there's pretty much no reason not to.
That said, queries performed with Hive are usually very slow because of the overhead associated with using Map-Reduce.
THE FUTURE OF HIVE
Hortonworks has been pushing the development of Apache Tez as a new back-end for Hive, to provide fast response times currently unachievable using Map-Reduce.
