If you get this NullPointerException when joining two tables in
Hadoop Hive, the problem may be that the join key in one of the two
tables contains “” (an empty string).
For example, if you’re running this query:
select users.id, locations.address
from users left outer join locations
on users.location_id = locations.id;
and users.location_id happens to be “” in some row, then you will
get this error.
(Sometimes I’ve even had it happen because another, non-join
column was “”.)
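Before changing anything, it can help to confirm that empty strings are actually present. This is only a rough check against the example tables above; it assumes location_id is stored as a string column.

```sql
-- Count the rows in users whose join key is an empty string.
-- A non-zero count means the join may hit the problem described above.
select count(*)
from users
where location_id = '';
```

The same check can be run against any other column you suspect, by swapping the column name.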
An easy workaround is to create a temporary table holding only the
users whose location_id isn’t “”, and only the columns absolutely
needed to process the job, then run the join against that table.
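Here is a minimal sketch of that workaround for the example query. The table name users_clean is made up for illustration, and the column list assumes the job only needs id and location_id.

```sql
-- users_clean is a hypothetical name for the temporary table.
-- Keep only the columns the job needs, and skip empty join keys.
-- (Rows where location_id is NULL are filtered out by this comparison too.)
create table users_clean as
select id, location_id
from users
where location_id <> '';

-- Same join as before, but against the filtered table.
select users_clean.id, locations.address
from users_clean left outer join locations
on users_clean.location_id = locations.id;

-- Clean up once the job is done.
drop table users_clean;
```

Keep in mind that users with an empty location_id no longer show up in the result at all, whereas the original left outer join would have returned them with a NULL address. If you need those rows, you would have to add them back separately.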
This drove me crazy for hours today, so hopefully writing it down
means it won’t happen again.