If you want to learn more about Hadoop, there are many resources
at your disposal; one such resource is books. I keep a private list of Hadoop
books, so I thought I'd put it online to save other people from having to do
the same research.
FEB 22ND 2014 - UPDATED
3 new books added to the list!
Books for Hadoop & Map Reduce
The Definitive Guide is in some ways the 'Hadoop bible', and can
be an excellent reference when working on Hadoop, but do not expect it to
provide a simple getting-started tutorial for writing a Map Reduce job. This
book is great for really understanding how everything works and how all the
systems fit together.
This is the book if you need to know the ins and outs of
prototyping, deploying, configuring, optimizing, and tweaking a production
Hadoop system. Eric Sammer is a very knowledgeable
engineer, so this book is chock full of goodies.
Design Patterns is a great resource to get some insight into how
to do non-trivial things with Hadoop. This book goes into useful detail on how
to design specific types of algorithms, outlines why they should be designed
that way, and provides examples.
One of the few non-O'Reilly books in this list, Hadoop in Action
is similar to The Definitive Guide in that it provides a good reference for
what Hadoop is and how to use it. It seems to provide a gentler introduction
to Hadoop than the other books in this list.
A slightly more advanced guide to running Hadoop. It includes
chapters that detail how to best move data around, how to think in Map Reduce,
and (importantly) how to debug and optimize your jobs.
This Apress book claims it will guide you through the initial
Hadoop setup while also helping you avoid many of the pitfalls that Hadoop
novices usually encounter. Again, it is similar in content to Hadoop in Action
and The Definitive Guide.
Another Hadoop intro book, Hadoop Essentials focuses on
providing a more practical introduction to Hadoop, which seems ideal for a CS
classroom setting.
A book that aims to provide real-world examples of common
Hadoop problems. It also covers building integrated solutions using the
surrounding tools (Hive, Pig, Giraph, etc.).
The cookbook provides an introduction to installing /
configuring Hadoop along with ‘more than 50 ready-to-use Hadoop MapReduce
recipes’.
Released in July 2013, this book promises to guide readers through
writing and testing Cascading-based workflows. This is one of the few books
written about higher-level Map Reduce frameworks, so I'm excited to give it a
read.
A front-to-back guide to YARN, the next-generation resource
management layer for Hadoop. This book is written (in part) by the YARN project
founder and the project lead.
This book is built around seven Map Reduce 'recipes' to learn
from. It aims to be a concise, practical guide to get you coding.
Books for related projects
A detailed guide to understanding, running, debugging, and
extending Hive.
Programming Pig describes Pig, walks you through how to use it,
and helps you understand how to extend it.
This book is to HBase what The Definitive Guide is to Hadoop: a
comprehensive walk-through of HBase, how it works, how to use it, and how it
is designed.
A standalone Sqoop recipe book that covers common usage and
integrations.
Apache Mahout is a set of machine learning libraries for Hadoop.
This book provides a hands-on introduction and some sample use-cases.
Holden walks through the ins and outs of Apache Spark, including
setup, interactive querying, and job deployment. Fun fact: I used to work
with Holden, who is super smart, so I'm sure this book is excellent.
Bonus
Russell introduces his own version of an agile toolset for data
analysis and exploration. The book covers both investigative tools (like Apache
Pig) and visualization tools (like D3). His pitch is pretty compelling.
That’s It
There are many, many books on more general topics of big data,
data science, analytics, etc., but I think I've covered the main books that
specifically focus on Hadoop and related projects. Please email or tweet me
if I've missed anything!