Tech Talk Thursday: Weighted Averaging with Apache Hive and SQL Server 2008 R2

By J’son Cannelos
Partner/Principal Architect

To Be Is To Do - Shakespeare

To Do Is To Be - Voltaire

DoBeDoBeDo - Sinatra

I recently had an interesting challenge with a client who wanted to see how long a particular group of users would watch an online video. Through a partnership with video content leader Freewheel, our client knew exactly when a user hit a particular online video, how long they stayed there, and, for good measure, which segments the user belonged to. A user could belong to segments such as “College Men” or “Mothers”, etc. In all, there were more than 100 segments available, and a user could belong to more than one (although crossover was rare).
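To make the shape of the problem concrete, here is a minimal Python sketch of one way to average watch time per segment and then combine the segment means into a view-weighted average. The record layout, the sample values, and the function names `segment_stats` and `weighted_average` are all hypothetical; this is not the client's schema or the Hive query from the full article, only an illustration of the idea.

```python
from collections import defaultdict

# Hypothetical view records: (user_id, seconds_watched, segments).
# The real log format is not described here; this layout is an assumption.
views = [
    ("u1", 120, ["College Men"]),
    ("u2", 300, ["Mothers"]),
    ("u3", 60, ["College Men", "Mothers"]),  # a rare crossover user
    ("u4", 180, ["College Men"]),
]


def segment_stats(records):
    """Return {segment: (mean_seconds_watched, view_count)}."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [total seconds, views]
    for _user, seconds, segments in records:
        # A view counts once for every segment the user belongs to.
        for seg in segments:
            totals[seg][0] += seconds
            totals[seg][1] += 1
    return {seg: (total / count, count) for seg, (total, count) in totals.items()}


def weighted_average(stats):
    """Average the segment means, weighting each by its view count."""
    total_weight = sum(count for _mean, count in stats.values())
    return sum(mean * count for mean, count in stats.values()) / total_weight


stats = segment_stats(views)
overall = weighted_average(stats)
print(stats)
print(overall)
```

Because a crossover user is counted once in each of their segments, the weighted figure is not the same thing as a plain mean over all views; which of the two is wanted depends on the question being asked.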

Due to the sheer amount of log data available for the client, Apache Hadoop was the only reasonable way to store and back up this data while still allowing for reasonably quick analysis. Installed with the cluster was Apache Hive, a data warehouse system that allows for ad-hoc queries and more, no Java required! See Raul Olvera’s recent article, “Starting with Hive”, for a good primer on this subject.