Tech Talk Thursday: Weighted Averaging with Apache Hive and SQL Server 2008 R2

By J’son Cannelos
Partner/Principal Architect

To Be Is To Do - Shakespeare

To Do Is To Be - Voltaire

DoBeDoBeDo - Sinatra

I recently had an interesting challenge with a client who wanted to see how long a particular group of users would watch an online video. Through a partnership with video content leader Freewheel (http://www.freewheel.tv/), our client knew exactly when a user hit a particular online video, how long they stayed there, and, for good measure, which segments the user belonged to, further defining who they were. A user could belong to segments such as “College Men” or “Mothers”, etc. In all, there were more than 100 segments available, and a user could belong to more than one (although crossover was rare).

Due to the sheer amount of log data available for the client, Apache Hadoop was the only reasonable way to store and back up this data while still allowing for reasonably quick analysis. Installed with the cluster was Apache Hive, a data warehouse system that allows for ad-hoc queries and more, with no Java required! See Raul Olvera’s recent article, “Starting with Hive” (https://esagegroup.wordpress.com/2012/05/17/starting-with-hive/), for a good primer on this subject.

Click here for the rest of this article.

Click here for more information on eSage Group.

Author: Tina Munro

I am Director of Marketing for eSage Group. Check out my LinkedIn profile here: https://www.linkedin.com/in/tinamunro
