
Thursday, 29 October 2015


The language of data is SQL, so naturally lots of tools have been developed to bring SQL to Hadoop. They range from simple wrappers on top of Map-Reduce to full data warehouse implementations built on top of HDFS, and everything in between.
There are more tools than you might think, so this is my attempt at listing them all and hopefully providing some insight into what each of them actually does.
I’ve tried to order them based on ‘installation friction’, so the more complex products are towards the bottom.
I’ll cover the following technologies:
·         Apache Hive
·         Impala
·         Presto (Facebook)
·         Shark
·         Apache Drill
·         EMC/Pivotal HAWQ
·         BigSQL by IBM
·         Apache Phoenix (for HBase)
·         Apache Tajo

Hive is the original SQL-on-Hadoop solution.
Hive is an open-source Java project that converts SQL into a series of Map-Reduce jobs, which run on standard Hadoop tasktrackers. It tries to look like MySQL by using a metastore (itself a database) to store table schemas, partitions, and locations. It largely supports MySQL syntax and organizes datasets using familiar database/table/view conventions. Hive provides:
·         A SQL-like query interface called HiveQL, loosely modelled after MySQL
·         A command line client
·         Metadata sharing via a central service
·         JDBC drivers
·         Multi-language Apache Thrift drivers
·         A Java API for creating custom functions and transformations
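To make the SQL-to-Map-Reduce translation described above a little more concrete, here is a minimal sketch in plain Python. It is not Hive code and does not reflect Hive's actual query planner; it only illustrates how a simple aggregate such as SELECT country, SUM(amount) FROM sales GROUP BY country can be split into a map step, a shuffle, and a reduce step. The table name, column names, and sample rows are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for a "sales" table stored on HDFS.
ROWS = [
    {"country": "UK", "amount": 10},
    {"country": "US", "amount": 5},
    {"country": "UK", "amount": 7},
]


def map_phase(rows):
    """Emit (key, value) pairs: the GROUP BY column and the value to aggregate."""
    for row in rows:
        yield row["country"], row["amount"]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Apply the aggregate (here SUM) to each group of values."""
    return {key: sum(values) for key, values in groups.items()}


def run_query(rows):
    # Roughly: SELECT country, SUM(amount) FROM sales GROUP BY country
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(run_query(ROWS))
```

On a real cluster each phase runs as distributed tasks and a more complex query may need several such jobs chained together, which is where much of the overhead comes from.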
SHOULD YOU USE IT?
Hive is considered a de facto standard tool and is installed on almost all Hadoop clusters. It’s simple to set up and doesn’t require much infrastructure to get started. Given the low cost of adoption, there’s pretty much no reason not to try it.
That said, queries performed with Hive are usually very slow because of the overhead associated with using Map-Reduce.
THE FUTURE OF HIVE
Hortonworks has been pushing the development of Apache Tez as a new back-end for Hive, aiming for fast response times that are currently unachievable with Map-Reduce.

