Saffron is more than just a spice!

Last night was the 8th eSage Group co-sponsored Seattle Scalability MeetUp, hosted at WhitePages.com. About 130 people attended to hear about HBase and Saffron. Very cool stuff!! Here is the SlideShare.

Summary:

Nick Dimiduk from Hortonworks, the father of HBase, gave us a sneak peek at what’s in store for the developer using HBase as a backing datastore for web apps. He reviewed the standard HBase client API before going into a framework architecture that makes HBase development more like other frameworks designed for developer productivity. He then went over fundamentals like rowkey design and column family considerations and also dug into how to tap coprocessors to add functionality to apps that otherwise might normally be overlooked.

Nick’s Bio: Nick Dimiduk is an engineer and hacker with a respect for customer-driven products. He started using HBase before it was a thing, and co-wrote HBase in Action to share that experience. He studied Computer Science & Engineering at The Ohio State University, specifically programming languages, and artificial intelligence.

Paul Hofmann from Saffron gave a talk titled “Sense Making And Prediction Like The Human Brain.” It was an amazing presentation on machine learning and predictive analytics. Cool stuff!!

Abstract of Paul’s talk: There is growing interest in automating cognitive thinking, but can machines think like humans? Associative memories learn by example like humans. We present the world’s fastest triple store, Saffron Memory Base, for just-in-time machine learning. Saffron Memory Base uncovers connections, counts and context in the raw data. It builds a semantic graph from hybrid data sources out of the box. Saffron stores the graph and its statistics in matrices that can be queried in real time, even for Big Data. Connecting the Dots: We demonstrate the power of entity rank for real-time search by the example of the London Bomber and Twitter sentiment analysis. Illuminating the Dots: We show the power of Saffron’s model-free approach for pattern recognition and prediction on a couple of real-world examples, like Boeing’s use case of predictive maintenance for aircraft and risk prediction at The Bill and Melinda Gates Foundation.

Paul’s Bio: Dr. Paul Hofmann is an expert in AI, computer simulations and graphics. He is CTO of Saffron Technology, a Big Data predictive analytics firm named one of the top 5 coolest vendors in Enterprise Information Management by Gartner. Before joining Saffron, Paul was VP of Research at SAP Labs in Silicon Valley. He has authored two books and numerous publications. Paul received his Ph.D. in Physics at the Darmstadt University of Technology.

Make sure to put April 24th on your calendar for the next Scalability MeetUp at RedFin.

Tech Talk Thursday – Remove Table References in Hive ORDER BY clause

By J’son Cannelos – Partner / Principal Architect, eSage Group

“In God we trust; all others pay cash.”  

– Bob French, New Orleans Tuxedo Jazz Musician (1938 – 2012)

This fairly simple Hive issue was driving me nuts for a while, so I wanted to get it out to the blog while it’s still fresh in my mind.

Take the following innocent Hive query:

select distinct s.date_local, s.user_id from slice_played s where LENGTH(s.user_id) > 0 and s.date_local >= '2012-10-07' and s.date_local <= '2012-10-08' order by s.date_local desc limit 150;

Time and time again this would return:

Error in semantic analysis. Invalid table alias or column reference s

After removing each piece of the query in turn, it turned out that the culprit was the ORDER BY clause. This piece seems to be illegal:

order by s.date_local

Why, you ask? Because, apparently, Hive doesn’t allow table references in the ORDER BY clause! Ack!

The solution is pretty simple, but not intuitive. You need to either a) remove the table reference in the fields in your ORDER BY clause or b) alias the columns you would like to use in the order by clause. Here is the corrected Hive query that works:

select distinct s.date_local as date_pacific, s.user_id from slice_played s where LENGTH(s.user_id) > 0 and s.date_local >= '2012-10-07' and s.date_local <= '2012-10-08' order by date_pacific desc limit 150;
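
For comparison, option (a), dropping the table reference from the ORDER BY instead of aliasing the column, would look roughly like the sketch below. It reuses the same slice_played table and columns from the query above and has not been verified against any particular Hive version, so treat it as an illustration rather than a guaranteed fix.

```sql
-- Option (a): keep the table alias everywhere except ORDER BY,
-- where the column is referenced by its bare name.
select distinct s.date_local, s.user_id
from slice_played s
where LENGTH(s.user_id) > 0
  and s.date_local >= '2012-10-07'
  and s.date_local <= '2012-10-08'
order by date_local desc
limit 150;
```

The bare column name works only as long as it is unambiguous in the select list, which is one reason an explicit alias can be the safer choice.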

I’ve fallen into this trap several times now. In our Hive implementation, we pretty much force strict mode (hive.mapred.mode = strict), so we have to alias tables, use existing partitions in the WHERE clause, etc.
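
As a rough sketch of what strict mode enforces: with hive.mapred.mode set to strict, Hive rejects an ORDER BY without a LIMIT and a query against a partitioned table that has no partition filter. The example below assumes, purely for illustration, that slice_played is partitioned by date_local; the post does not say how the table is actually partitioned.

```sql
-- Enable strict mode for the current session.
set hive.mapred.mode=strict;

-- Rejected in strict mode: ORDER BY with no LIMIT.
-- select user_id from slice_played
-- where date_local = '2012-10-07'
-- order by user_id;

-- Accepted: partition filter (assumed partition column: date_local)
-- plus ORDER BY with a LIMIT.
select user_id
from slice_played
where date_local = '2012-10-07'
order by user_id
limit 150;
```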

According to this JIRA issue (https://issues.apache.org/jira/browse/HIVE-1449), it’s a known problem. The issue only says that table references are a no-no, so you don’t strictly need to alias your columns; however, column aliases seem safer to me. I could just as easily be joining several tables that each have a “date_local” column.

Hope this helps and happy coding!
Sincerely,

J’son

Tech Talk Thursday – SSRS ReportViewer Chart Caching Issue – Resolved!

I’ve wanted to write about this for some time, as it was a pretty tough challenge.

First, some background: A couple of years ago, I built a custom SSRS Parameter Viewer Control to replace the one that ships with the ASP.NET ReportViewer control. The default prompt area of the ReportViewer was not very visually appealing, and selection changes to the parameters themselves often caused full postbacks! I understand that’s been fixed now with SSRS 2008/R2, but it is still lacking in text lookup and other features that I have since added to my custom control.

This is a pretty long post, so I PDF’d it:  SSRS ReportView Chart Caching Issue – Resolved!

eSage is co-sponsoring the Seattle Scalability MeetUp

Seattle Scalability MeetUp
The group listening to the presentation. Thank you Microsoft for hosting us!
Post-MeetUp Social sponsored by eSage. It was a pretty darn good turnout. About 35-40 people attended!

eSage is in its second month of being the host of the post-MeetUp “MeetUp” for the Seattle Scalability MeetUp.  It is a time where attendees can chat casually about all things Big Data and enjoy a beverage on eSage.

We are excited to be supporting the Hadoop community in Seattle in a fun way!

The Seattle Scalability MeetUp is a group of folks who use or are interested in scalable computing technologies, mostly Hadoop, HBase, and NoSQL platforms.

They have had attendees and speakers from Amazon, Facebook, Microsoft, Visible Technologies, Drawn to Scale, U.S. National Labs, and many more!

Groups are usually 75-100 attendees.

Usually they have:

  • 1 or 2 ~20 minute “Feature” presentations
  • Up to 4 “lightning talks”
  • Friendly and helpful group discussion
  • And Pizza!!
Hortonworks provided the pizza!

They are going to start rotating the location between Seattle and the Eastside.

If you would like more information or have a suggestion on a topic, email Tina at tinam (at) esagegroup (dot) com and she will pass them along to the organizers.

Tech Talk Thursday: Weighted Averaging with Apache Hive and SQL Server 2008 R2


By J’son Cannelos
Partner/Principal Architect

To Be Is To Do        -Shakespeare

To Do Is To Be        – Voltaire

DoBeDoBeDo           – Sinatra

I recently had an interesting challenge with a client who wanted to see how long a particular group of users would watch an online video. Through a partnership with video content leader Freewheel (http://www.freewheel.tv/), our client knew exactly when a user hit a particular online video, how long they stayed there, and, for good measure, which segments the user belonged to that further defined who they were. A user could belong to segments such as “College Men” or “Mothers”, etc. In all, there were more than 100 segments available, and a user could belong to more than one (although crossover was rare).

Due to the sheer amount of log data available for the client, Apache Hadoop was the only reasonable way to store and back up this data while still allowing for reasonably quick analysis. Installed with the cluster was Apache Hive, a data warehouse system that allows for ad-hoc queries and more, with no Java required! See Raul Olvera’s recent article, “Starting with Hive” (https://esagegroup.wordpress.com/2012/05/17/starting-with-hive/), for a good primer on this subject.

Click here for the rest of this article.

Click here for more information on eSage Group.

Starting With Hive

Raul at St. Paddy's Day Run

By Raul Overa, Software Engineer

So you have Big Data stored in Hadoop and want to make it accessible to non-Java programmers? Hive lets you access your data without the need to create MapReduce jobs. It lets you query your data with a SQL-like language and takes care of translating the queries into MapReduce jobs, so if you already know SQL, you can start using Hive almost immediately.

Now, if you have Hive installed and configured, there are a couple of small steps you need to take in order to extract data with SQL-like queries. Click here for the full Starting with Hive article.

Thursday Tech Talk with J’son

Setting Up Excel Services in SharePoint 2010 for Testing

May 1, 2012
By J’son Cannelos, Partner/Principal Architect

Microsoft SharePoint 2010 is here to stay. According to global360.com, 67% of companies participating in a recent survey reported deploying SharePoint in an enterprise environment. Managing document workflows and attaching content to business processes were the top reasons given for using SharePoint. At the same time, Microsoft Office Excel is the de facto tool for data analysis and enablement. Everyone from the company CEO on down to the accountant uses Excel today for examining their present situation and forecasting the future. Could web-enabling Excel via SharePoint be far behind?

Excel Services has been around since SharePoint 2007; however, it has made a big leap in SharePoint 2010. A good write-up on Excel Services for SharePoint 2010 is located here (yep, you even get Slicers!):

http://blogs.office.com/b/microsoft-excel/archive/2009/11/11/excel-services-in-sharepoint-2010-dashboard-improvements.aspx

The service allows Excel spreadsheets to be presented in a web browser using a slick Excel-like interface. External data connections, workbook calculations, user defined functions, and charts are supported out of the box for a true near desktop experience. Business stakeholders love and need Excel? Check. They need to access Excel workbooks and reports anytime, anywhere, even on a computer without Microsoft Office? Check!

Continue reading here.