The Future of Enterprise Analytics

Over the last couple of weeks since the 2016 Hadoop Summit in San Jose, eSage Group has been discussing the future of big data and enterprise analytics. A quick note: data is data, and everything produces it, so “big data” is no longer a particularly useful term.

eSage Group is specifically focused on the tidal wave of sales and marketing data that is being collected across all channels. To name a few:

  • Websites – Across multiple sites, Clicks, Pathing, Unstructured web logs, Blogs
  • SEO – Search Engine, Keywords, Placement, URL Structure, Website Optimization
  • Digital Advertising – Format, Placement, Size, Network
  • Social
    • Facebook – Multiple pages, Format (Video, Picture, GIF), Likes (now with emojis), Comments, Shares, Events, Promoted, Platform (mobile, tablet, PC) and now Facebook Live
    • Instagram – Picture vs Video, Follows, Likes, Comments, Reposts (via 3rd Party apps), LiketoKnow.it, Hashtags, Platform
    • Twitter – Likes, RT, Quoted RT, Promoted, Hashtags, Platform
    • SnapChat – Follows, Unique views, Story completions, Screenshots. SnapChat, to say the least, is still the wild west in terms of what brands can do to engage and ultimately drive behavior.

Then we have Off-Line (Print, TV, Events, etc.), Partners, and 3rd Party Data. Don’t get me started on International Data.

Tired yet?


While sales and marketing organizations see the value of analytics, they are hindered by what is accessible from the agencies they work with and by the difficulty of accessing internal siloed data stored across functions within the marketing organization – this includes central corporate marketing, divisional/product groups, field marketing, product planning, market research and operations.

Marketers are hindered by limited access to the data and by the simple issue of not knowing what data is being collected. Wherever the data lies, it is often controlled by a few select people who service the marketers and don’t necessarily know the value of the data they have collected. Self-service and exploration are not possible yet.

Layer on top of this the fact that agile marketing campaigns require real-time data (or at least near real-time data) and accurate attribution and predictive analytics.

So, you can see there are a lot of challenges that face a marketing team, let alone the deployment of an enterprise analytics platform that can service the whole organization.

Now that I have outlined the business challenges, let’s look at what technologies were mentioned at the 2016 Hadoop Summit that are being developed to solve some of these issues.

  • Cloud, cloud, cloud – lots of data can be sent up, then actively used or sent to cold storage on or off prem. All the big guys have the “best” cloud platform.
  • Security – divisional and function roles, organization position, workflow
  • Self-Service tools – ease of data exploration, visualization, costs
  • Machine Learning and other predictive tools
  • Spark
  • Better technical tools to work with Hadoop, other analytics tools and data stores
  • And much more!  

Next post, we will focus on the technical challenges and tools that the eSage Group team is excited about.

Cheers! Tina

 

 

 

Definitions for “Big Data” – A Starting Point


Written by Rob Lawrence, eSage Group’s Strategic Relationship Manager

Will someone please tell us all, once and for all, just what in tarnation is Big Data? What is it? Where is it? Who’s doing what with it? And why are they doing that? In one blog article I can maybe just scratch the surface of those questions. I might even provide some level of understanding for those curious marketers, bewildered and attempting to make heads or tails of the concept of Big Data. I could certainly dive deeper than even that because I’ve spent some time with this, and done homework, and lived Big Data. But this is a blog article, not a dissertation, so I’ll keep it at a 10,000 foot view of the ever elusive, yet intriguing, Big Data!

If you are one of the rare data scientists who have graduated recently from one of the few schools offering Big Data degrees, which makes you an expert in this field, please feel free to stop reading here, or continue on to better understand what the rest of us are, well, trying to grasp when it comes to Big Data. For the rest of us, here is my take on the whole Big Data craze:

Big Data is simply all the data available. That means, in realistic terms, all of the data one can gather about a subject from all the places data resides: data sitting in some long-forgotten enterprise software program in the basement of a large corporation, data from social media websites, website traffic data (click-throughs, pathing and such), text from blogs, even data from a sensor on a rocket ship or a bridge in Brooklyn (not sure if they’re using sensor data on the Brooklyn Bridge, but they could be). Sources of data are vast, and growing. It’s cheaper to store data than ever before, and we now have the computing capability to sift through it, so far more data is being collected: “Big” amounts of data are being stored and analyzed. There is a lot you can do with all this Big Data, but this is where it gets dicey. You can collect all kinds of data with one subject, question or problem in mind, but end up realizing (through analysis) more important information about a totally different subject, question or problem. That’s why Big Data is so confusing to lots of folks just getting their hands dirty with it, and apparently also why it is so valuable to marketers, engineers, CEOs, the FBI, data geeks, and anyone else interested in edging out the competition. Let’s explore some basics:

Wikipedia says: “Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set. With this difficulty, new platforms of “big data” tools are being developed to handle various aspects of large quantities of data.”

The Big Data Institute says: “Big Data is a term applied to voluminous data objects that are variety in nature – structured, unstructured or a semi-structured, including sources internal or external to an organization, and generated at a high degree of velocity with an uncertainty pattern, that does not fit neatly into traditional, structured, relational data stores and requires strong sophisticated information ecosystem with high performance computing platform and analytical capabilities to capture, process, transform, discover and derive business insights and value within a reasonable elapsed time.”

So, we’ve only scratched the surface of truly understanding what Big Data is here in this blog, and really the multitude of possibilities Big Data represents has only begun to unfold to those of us using it to better understand whatever it is we’re collecting data about. I hope at a minimum by reading this you have gained a better understanding of what “Big Data” is, but moreover, a curiosity to learn more and perhaps even apply it to something you are working on. These are exciting times whether you are using data for marketing or designing a new rocket ship to explore Mars. Big things are coming, and it’s all due to Big Data!

Here are some great articles I’ve recently enjoyed regarding Big Data:

Saffron is more than just a spice!

Last night was the 8th eSage Group co-sponsored Seattle Scalability MeetUp, hosted at WhitePages.com. There were about 130 people in attendance to hear about HBase and Saffron. Very cool stuff!! Here is the SlideShare.

Summary:

Nick Dimiduk from Hortonworks, the father of HBase, gave us a sneak peek at what’s in store for the developer using HBase as a backing datastore for web apps. He reviewed the standard HBase client API before going into a framework architecture that makes HBase development more like other frameworks designed for developer productivity. He then went over fundamentals like rowkey design and column family considerations and also dug into how to tap coprocessors to add functionality to apps that otherwise might normally be overlooked.

Nick’s Bio: Nick Dimiduk is an engineer and hacker with a respect for customer-driven products. He started using HBase before it was a thing, and co-wrote HBase in Action to share that experience. He studied Computer Science & Engineering at The Ohio State University, specifically programming languages, and artificial intelligence.

Paul Hofmann from Saffron gave a talk titled “Sense Making And Prediction Like The Human Brain.” It was an amazing presentation on machine learning and predictive analytics. Cool stuff!!

Abstract of Paul’s talk: There is growing interest in automating cognitive thinking, but can machines think like humans? Associative memories learn by example like humans. We present the world’s fastest triple store, Saffron Memory Base, for just-in-time machine learning. Saffron Memory Base uncovers connections, counts and context in the raw data. It builds out of the box a semantic graph from hybrid data sources. Saffron stores the graph and its statistics in matrices that can be queried in real time even for Big Data. Connecting the Dots: We demonstrate the power of entity rank for real-time search by the example of the London Bomber and Twitter sentiment analysis. Illuminating the Dots: We show the power of Saffron’s model-free approach for pattern recognition and prediction on a couple of real-world examples like Boeing’s use case of predictive maintenance for aircraft and risk prediction at The Bill and Melinda Gates Foundation.

Paul’s Bio: Dr. Paul Hofmann is an expert in AI, computer simulations and graphics. He is CTO of Saffron Technology, a Big Data predictive analytics firm named top 5 coolest vendors in Enterprise Information Management by Gartner. Before joining Saffron, Paul was VP of Research at SAP Labs in Silicon Valley. He has authored two books and numerous publications. Paul received his Ph.D. in Physics at the Darmstadt University of Technology.

Make sure to put April 24th on your calendar for the next Scalability MeetUp at Redfin.

eSage Group is excited to be a part of the PSAMA MarketMix!

Big data offers a lot of promise and opportunities for improving the way we do marketing. As floods of data pour in from social media, mobile, weblogs, digital advertising, CRM, POS, etc., companies need to effectively store it and develop robust analytics to mine the data for knowledge. By gaining new insights, we marketers can tailor our marketing message to provide customers with the most relevant information and better engage with them through the lifecycle. But how do we manage this data to make it truly usable? How do we avoid the perils that come with identifying and gathering the data, putting the analytics system in place, and getting the right people in place, so we can turn the data into actionable insights?

eSage Group’s very own Duane Bedard will lead a panel discussion on this and more at the Puget Sound American Marketing Association MarketMix on March 20th. Panelists include ShiSh Shridhar from Microsoft, Romi Mahajan from KKM Group, and Adam Weiner from Redfin.

ShiSh Shridhar is the Retail Industry Solutions Director at Microsoft and is responsible for strategy around Business Analytics, Big Data & Productivity Solutions for the Retail Industry. ShiSh has worked at Microsoft for the last 16 years across several groups and geographies and has a passion for empowering organizations through collaboration, knowledge management and analytics. ShiSh contributes to Retail Industry magazines and blogs, and maintains the Retail Industry Twitter presence for Microsoft: @msretail. He also regularly speaks at industry events. ShiSh loves working on innovative ideas and has a patent in the Social Media space. When he isn’t working he sails and windsurfs the waters around the Puget Sound. Follow ShiSh on Twitter at @5h15h.

Romi Mahajan is an award-winning marketer, marketing thinker, and author. His career is a storied one, including spending 9 years at Microsoft, being the first CMO of Ascentium, a leading digital agency, and founding the KKM Group, a boutique advisory firm focused on strategy and marketing. Romi has also authored two books on marketing; the latest one can be found here. A prolific writer and speaker, Mahajan lives in Bellevue, WA, with his wife and two kids. Mahajan graduated from the University of California at Berkeley at the age of 19 with a Bachelor’s degree in South Asian Studies. He also received a Master’s degree from the University of Texas at Austin. He can be reached at romi@thekkmgroup.com.

Adam Weiner is Vice President of Analytics and New Business at Redfin. He leads the company’s efforts to use our proprietary data to build new products for the web and improve our real estate services. He is also responsible for identifying opportunities for business growth that align with Redfin’s overall mission to reinvent the consumer experience for buying and selling real estate. Adam joined Redfin in 2007 on the product management team and was one of the pioneers of the Redfin Partner Program for agents, in addition to our service provider directory, Redfin Open Book. Prior to Redfin, Adam worked at Microsoft in the SQL Server Division for 5 years. Adam graduated from Stanford with a degree in Symbolic Systems and a concentration in Human-Computer Interaction. Follow Adam on Twitter at @adamRedfin.

You can still register for the event at www.marketmix2013.com!

 

eSage is co-sponsoring the Seattle Scalability MeetUp

The group listening to the presentation. Thank you Microsoft for hosting us!
Post-MeetUp social sponsored by eSage. It was a pretty darn good turnout. About 35-40 people attended!

eSage is in its second month of being the host of the post-MeetUp “MeetUp” for the Seattle Scalability MeetUp.  It is a time where attendees can chat casually about all things Big Data and enjoy a beverage on eSage.

We are excited to be supporting the Hadoop community in Seattle in a fun way!

The Seattle Scalability MeetUp is a group of folks who use or are interested in scalable computing technologies, mostly Hadoop, HBase, and NoSQL platforms.

They have had attendees and speakers from Amazon, Facebook, Microsoft, Visible Technologies, Drawn to Scale, U.S. National Labs, and many more!

Groups are usually 75-100 attendees.

Usually they have:

  • 1 or 2 ~20 minute “Feature” presentations
  • Up to 4 “lightning talks”
  • Friendly and helpful group discussion
  • And Pizza!!
Hortonworks provided the pizza!

They are going to start rotating the location between Seattle and the Eastside.

If you would like more information or have a suggestion on a topic, email Tina at tinam (at) esagegroup (dot) com and she will pass them along to the organizers.

Tech Talk Thursday: Weighted Averaging with Apache Hive and SQL Server 2008 R2


By J’son Cannelos
Partner/Principal Architect

To Be Is To Do        -Shakespeare

To Do Is To Be        – Voltaire

DoBeDoBeDo           – Sinatra

I recently had an interesting challenge with a client who wanted to see how long a particular group of users would watch an online video. Through a partnership with video content leader Freewheel (http://www.freewheel.tv/), our client knew exactly when a user hit a particular online video, how long they stayed there, and for good measure, that the user belonged to a group of segments that further defined who they were. A user could belong to segments such as “College Men” or “Mothers”, etc. In all, there were more than 100 segments available, and a user could belong to more than one (although crossover was rare).
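The full method is in the linked article rather than in this excerpt, so the following is only a minimal sketch of what a weighted average of watch time per segment can look like. It assumes the log has already been rolled up into rows of (segment, average seconds watched, number of views); that row layout, the sample values, and the function name `weighted_average_by_segment` are all hypothetical and not taken from the client’s data.

```python
from collections import defaultdict

# Hypothetical pre-aggregated rows: (segment, average seconds watched, number of views).
# The values are made up for illustration only.
rows = [
    ("College Men", 30.0, 10),
    ("College Men", 60.0, 30),
    ("Mothers", 45.0, 20),
    ("Mothers", 15.0, 20),
]


def weighted_average_by_segment(rows):
    """Return {segment: average seconds watched, weighted by view count}."""
    weighted_sum = defaultdict(float)  # running sum of avg_seconds * views
    total_weight = defaultdict(int)    # running sum of views
    for segment, avg_seconds, views in rows:
        weighted_sum[segment] += avg_seconds * views
        total_weight[segment] += views
    # Skip segments with zero total views to avoid dividing by zero.
    return {
        segment: weighted_sum[segment] / total_weight[segment]
        for segment in weighted_sum
        if total_weight[segment] > 0
    }


result = weighted_average_by_segment(rows)
```

With these sample rows, a plain mean of the two “College Men” averages would be 45 seconds, while the view-weighted figure is pulled toward the row with more views. That difference is the reason to weight in the first place.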

Due to the sheer amount of log data available for the client, Apache Hadoop was the only reasonable way to store and back up this data while still allowing for reasonably quick analysis. Installed with the cluster was Apache Hive – a data warehouse system that allows for ad-hoc queries and more – no Java required! See Raul Olvera’s recent article, “Starting with Hive” (https://esagegroup.wordpress.com/2012/05/17/starting-with-hive/), for a good primer on this subject.

Click here for the rest of this article.

Click here for more information on eSage Group.

Starting With Hive

Raul at St. Paddy's Day Run

By Raul Overa, Software Engineer

So you have Big Data stored in Hadoop and want to make it accessible to non-Java programmers? Hive lets you access your data without the need to write MapReduce jobs. It lets you query your data with a SQL-like language and takes care of translating the queries into MapReduce jobs, so if you already know SQL, you can start using Hive almost immediately.

Now if you have Hive installed and configured, there are a couple of small steps you need to take in order to be able to extract data with SQL-like queries. Click here for the full Starting with Hive article.
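To make “SQL-like” a little more concrete, here is a minimal sketch of the kind of declarative aggregation involved. It does not use Hive: Python’s built-in `sqlite3` module stands in as the query engine so the example runs anywhere, and the table `page_views` and its columns are hypothetical. The point is only that a GROUP BY query of this general shape is what you would write, instead of hand-coding the equivalent map and reduce steps.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in engine (this is NOT Hive).
conn = sqlite3.connect(":memory:")

# Hypothetical table and sample rows, for illustration only.
conn.execute(
    "CREATE TABLE page_views (user_id TEXT, segment TEXT, seconds_watched INTEGER)"
)
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [
        ("u1", "College Men", 30),
        ("u2", "College Men", 60),
        ("u3", "Mothers", 45),
    ],
)

# A declarative aggregation: views and total seconds watched per segment.
query = """
SELECT segment, COUNT(*) AS views, SUM(seconds_watched) AS total_seconds
FROM page_views
GROUP BY segment
ORDER BY segment
"""
summary = conn.execute(query).fetchall()
conn.close()
```

Here `summary` holds one tuple per segment. On a Hadoop cluster, the engine rather than the analyst would be responsible for turning a query like this into distributed jobs.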