
The DataHive on Apache Hive

Regular Apis Flores Nest Closeup image courtesy of The Beehive, Oxford | Maths in the City.

As DataHive Consulting, we have been remiss in not mentioning Hive until now, especially since we think it is the easiest way to start using Hadoop for those making the jump from structured to unstructured data. For those just starting to look into Big Data, Apache Hive is data warehouse software built on top of Hadoop that supports the management, querying, and analysis of distributed datasets. It includes ETL (extract, transform, load) tools, MapReduce-based queries, metadata storage, and indexing. Most importantly, it can all be managed through HiveQL, a query language similar to SQL. Although it lacks full ACID functionality at this point, Hive is a quick way to use Hadoop for those who have SQL and/or MapReduce experience.
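To make the "similar to SQL" point a little more concrete, here is a rough sketch of how a HiveQL aggregation lines up with the map, shuffle, and reduce steps it stands in for. The HiveQL is only a string for comparison and is never run, and the Python is a toy in-memory imitation, not how Hive actually executes anything. The table `page_views`, the column `country`, and every function name below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical HiveQL, shown only for comparison; it is not executed here.
# The table `page_views` and column `country` are invented for this sketch.
HIVEQL = """
SELECT country, COUNT(*) AS view_count
FROM page_views
GROUP BY country
"""


def map_phase(rows):
    """Emit one (key, 1) pair per row, roughly what a mapper would do."""
    for row in rows:
        yield row["country"], 1


def shuffle(pairs):
    """Group values by key, standing in for Hadoop's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, roughly what a reducer would do."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_country(rows):
    """Toy equivalent of the GROUP BY query above, run in memory."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    sample = [{"country": "US"}, {"country": "UK"}, {"country": "US"}]
    print(count_by_country(sample))
```

The three-line query and the three hand-written phases describe the same count per country, which is the appeal for anyone who already knows SQL: you write the query and leave the map and reduce plumbing to Hive.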

Here are a couple of our favorite starting points for learning more about Hive:

Where are you picking up your Hive tips? Please feel free to share in the comments!