The Future of Enterprise Analytics

In the weeks since the 2016 Hadoop Summit in San Jose, eSage Group has been discussing the future of big data and enterprise analytics.  Quick note – data is data, and data is produced by everything, so "big data" is really no longer an important term.

eSage Group is specifically focused on the tidal wave of sales and marketing data being collected across all channels. To name a few:

  • Websites – Across multiple sites, Clicks, Pathing, Unstructured web logs, Blogs
  • SEO –  Search Engine, Keywords, Placement, URL Structure, Website Optimization
  • Digital Advertising – Format, Placement, Size, Network
  • Social
    • Facebook – Multiple pages, Format (Video, Picture, GIF), Likes (now with emojis), Comments, Shares, Events, Promoted, Platform (mobile, tablet, PC) and now Facebook Live
    • Instagram – Picture vs Video, Follows, Likes, Comments, Reposts (via 3rd Party apps), LiketoKnow.it, Hashtags, Platform
    • Twitter – Likes, RT, Quoted RT, Promoted, Hashtags, Platform
    • SnapChat – Follows, Unique views, Story completions, Screenshots.  SnapChat, to say the least, is still the Wild West as to what brands can do to engage and ultimately drive behavior.

Then we have off-line channels (Print, TV, Events, etc.), partners, and 3rd party data. Don't get me started on international data.

Tired yet?


While sales and marketing organizations see the value of analytics, they are hindered by the limits of what is accessible from the agencies they work with and by the difficulty of accessing internal siloed data stored across functions within the marketing organization – central corporate marketing, divisional/product groups, field marketing, product planning, market research, and operations.

Marketers are also hampered by the simple problem of not knowing what data is being collected.  Wherever the data lies, it is often controlled by a few select people who service the marketers and don't necessarily know the value of the data they have collected.  Self-service and exploration are not yet possible.

Layer on top of this the fact that agile marketing campaigns require real-time data (or at least close to real time) and accurate attribution/predictive analytics.

So, you can see there are a lot of challenges facing a marketing team, let alone the deployment of an enterprise analytics platform that can serve the whole organization.

Now that I have outlined the business challenges, let's look at the technologies mentioned at the 2016 Hadoop Summit that are being developed to solve some of these issues.

  • Cloud, cloud, cloud – lots of data can be sent up, then actively used or sent to cold storage, on or off prem.  All the big guys have the "best" cloud platform.
  • Security – divisional and function roles, organization position, workflow
  • Self-Service tools – ease of data exploration, visualization, costs
  • Machine Learning and other predictive tools
  • Spark
  • Better technical tools to work with Hadoop, other analytics tools and data stores
  • And much more!  

Next post, we will focus on the technical challenges and tools that the eSage Group team is excited about.

Cheers! Tina


Seeking 4 mid/senior level engineers to work on a Cloud-based Big Data project

eSage Group is always on the lookout for talented developers at all levels.  We have worked hard to create a company culture of sharp, quick-learning, hardworking professionals who enjoy being part of a winning team with high expectations.  As such, we hire self-motivated people with excellent technical abilities who also exhibit keen business acumen and a drive for customer satisfaction and solving our clients' business challenges.  We have quarterly profit sharing based on companywide goals, allowing everyone on the team to participate in and enjoy the rewards of our careful but consistently strong growth. We are currently looking to fill 4 openings to complete a team that will be working together on a large-scale "big data" deployment on AWS.

  1. Cloud-operations specialist who can design a distributed platform for analyzing terabytes of data using MapReduce, Hive, and Spark.
  2. Cloud-database engineer who can construct an enterprise caliber database architecture and schema for a high-performance Cloud-based platform that stores terabytes of data from several heterogeneous data sources.
  3. Mid/senior-level software developer with extensive experience in Java, who can write and deploy a variety of data processing algorithms using Hadoop.
  4. A technical business analyst who can translate business requirements into user stories and envision them through Tableau charts/reports.

1) Cloud-operations specialist:

  • Bachelor's degree in Computer Science or related field, or 4 years of IT work experience
  • Familiarity with open-source programming environments and tools (e.g., Ant, Maven, Eclipse)
  • Comfortable using the Linux operating system and familiar with command-line tools (e.g., awk, sed, grep, scp, ssh)
  • Experience working with Web/Cloud-based systems (e.g., AWS, REST)
  • Knowledge of database concepts, specifically SQL syntax
  • Data warehouse architecture, modeling, profiling, and integration experience
  • Comfortable using the command line (e.g., Bash); experience with systems deployment and maintenance (e.g., cron job scheduling, iptables)
  • Practical work experience designing and deploying large-scale Cloud-based solutions on AWS using EC2, EBS, and S3
  • Working knowledge of one or more scripting languages (e.g., Perl, Python)
  • Experience using systems management infrastructure (e.g., LDAP, Kerberos, Active Directory) and deployment software (e.g., Puppet, Chef)
  • Programming ability in an OOP language (e.g., Java, C#, C++) is a plus

2) Cloud-database engineer:

  • Bachelor's degree in Computer Science or related field, or 4 years of IT work experience
  • Familiarity with open-source programming environments and tools (e.g., Ant, Maven, Eclipse)
  • Comfortable using the Linux operating system and familiar with command-line tools (e.g., awk, sed, grep, scp, ssh)
  • Experience working with Web/Cloud-based systems (e.g., AWS, REST)
  • Knowledge of database concepts, specifically SQL syntax
  • Firm grasp of databases and distributed systems; expert knowledge of SQL (i.e., indexes, stored procedures, views, joins, SSIS)
  • Extensive experience envisioning, designing, and deploying large-scale database systems, both in traditional computational environments and in the Cloud
  • Ability to design complex data ETLs and database schemas
  • Desire to work with many heterogeneous terabyte-scale datasets to identify and extract Business Intelligence
  • Experience using multiple DBMSs (e.g., MySQL, PostgreSQL, Oracle, SQL Server)
  • Work experience using Hive and NoSQL databases is a plus

3) Mid/senior-level software developer:

  • Bachelor's degree in Computer Science or related field, or 4 years of IT work experience
  • Familiarity with open-source programming environments and tools (e.g., Ant, Maven, Eclipse)
  • Comfortable using the Linux operating system and familiar with command-line tools (e.g., awk, sed, grep, scp, ssh)
  • Experience working with Web/Cloud-based systems (e.g., AWS, REST)
  • Knowledge of database concepts, specifically SQL syntax
  • Excellent Java developer with knowledge of software design practices (e.g., OOP, design patterns) who writes sustainable programs and employs coding best practices
  • Ability to program, build, troubleshoot, and optimize new or existing Java programs
  • Several years of development experience using both version control (e.g., SVN, Git) and build management systems (e.g., Ant, Maven)
  • Able to create and debug programs both within an IDE and on the command line
  • Working knowledge of Web development frameworks and distributed systems (e.g., Spring, REST APIs)
  • Experience using the Hadoop ecosystem (e.g., MapReduce, Hive, Pig, Shark, Spark, Tez) to program, build, and deploy distributed data processing jobs
  • Programming ability in Scala is a plus

4) Technical business analyst:

  • Strong background in business intelligence
  • Minimum of 1 year using Tableau and Tableau Server
  • Able to work closely with cross-functional business groups to define reporting requirements and use cases
  • Extensive experience manipulating data (e.g., data cubes, pivot tables, SSIS)
  • Passion for creating insight out of data and data investigation
  • Experience using R, Mahout, or MATLAB is a plus

Please send resumes to tinam (at) esagegroup (dot) com

Got Data? Big Data Panel in LA a success!

I can't believe it has been two weeks since the AMA Los Angeles and eSage Group sponsored Big Data panel in LA. It was a full house at BlankSpaces in Downtown LA. The 3 panelists were Raj Babu from Universal Music Group, Christopher Bridges from ValueClick, and Brian Kao from AEG. eSage Group's Duane Bedard moderated. There were lots of great insights from the panelists. I will be posting more edited clips, but for now, here are a few pictures and a video clip!


Definitions for “Big Data” – A Starting Point


Written by Rob Lawrence, eSage Group’s Strategic Relationship Manager

Will someone please tell us all, once and for all, just what in tarnation is Big Data? What is it? Where is it? Who's doing what with it? And why are they doing that? In one blog article I can maybe just scratch the surface of those questions. I might even provide some level of understanding for curious marketers, bewildered and attempting to make heads or tails of the concept of Big Data. I could certainly dive deeper than that, because I've spent some time with this, done my homework, and lived Big Data. But this is a blog article, not a dissertation, so I'll keep it at a 10,000-foot view of the ever-elusive, yet intriguing, Big Data!

If you are one of the rare data scientists who have recently graduated from one of the few schools offering Big Data degrees, which makes you an expert in this field, please feel free to stop reading here, or continue on to better understand what the rest of us are, well, trying to grasp when it comes to Big Data. For the rest of us, here is my take on the whole Big Data craze:

Big Data is simply all the data available. That means, in realistic terms, all of the data one can gather about a subject from all the places data resides: data sitting in some long-forgotten enterprise software program in the basement of a large corporation, data from social media websites, website traffic data (click-throughs and pathing and such), text from blogs, even data from a sensor on a rocket ship or a bridge in Brooklyn (not sure if they're using sensor data on the Brooklyn Bridge, but they could be). Sources of data are vast and growing. It's cheaper to store data than ever before, and we now have the computing capability to sift through it, so lots more data is being collected, and "Big" amounts of Data are being stored and analyzed. There is a lot you can do with all this Big Data, but this is where it gets dicey. You can collect all kinds of data with one subject, question, or problem in mind, but end up discovering (through analysis) more important information about a totally different subject, question, or problem. That's why Big Data is so confusing to lots of folks just getting their hands dirty with it, and apparently also why it is so valuable to Marketers, Engineers, CEOs, the FBI, Data Geeks, and anyone else interested in edging out the competition. Let's explore some basics:

Wikipedia says: “Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set. With this difficulty, new platforms of “big data” tools are being developed to handle various aspects of large quantities of data.”

The Big Data Institute says: “Big Data is a term applied to voluminous data objects that are variety in nature – structured, unstructured or a semi-structured, including sources internal or external to an organization, and generated at a high degree of velocity with an uncertainty pattern, that does not fit neatly into traditional, structured, relational data stores and requires strong sophisticated information ecosystem with high performance computing platform and analytical capabilities to capture, process, transform, discover and derive business insights and value within a reasonable elapsed time.”

So, we’ve only scratched the surface of truly understanding what Big Data is here in this blog, and really the multitude of possibilities Big Data represents has only begun to unfold to those of us using it to better understand whatever it is we’re collecting data about. I hope at a minimum by reading this you have gained a better understanding of what “Big Data” is, but moreover, a curiosity to learn more and perhaps even apply it to something you are working on. These are exciting times whether you are using data for marketing or designing a new rocket ship to explore Mars. Big things are coming, and it’s all due to Big Data!

Here are some great articles I’ve recently enjoyed regarding Big Data:

Saffron is more than just a spice!

Last night was the 8th eSage Group co-sponsored Seattle Scalability MeetUp, hosted at WhitePages.com. There were about 130 people in attendance to hear about HBase and Saffron. Very cool stuff!! Here is the SlideShare.

Summary:

Nick Dimiduk from Hortonworks, co-author of HBase in Action, gave us a sneak peek at what's in store for the developer using HBase as a backing datastore for web apps. He reviewed the standard HBase client API before going into a framework architecture that makes HBase development more like other frameworks designed for developer productivity. He then went over fundamentals like rowkey design and column family considerations, and dug into how to tap coprocessors to add functionality to apps that might otherwise be overlooked.

Nick's Bio: Nick Dimiduk is an engineer and hacker with a respect for customer-driven products. He started using HBase before it was a thing, and co-wrote HBase in Action to share that experience. He studied Computer Science & Engineering at The Ohio State University, focusing on programming languages and artificial intelligence.

Paul Hofmann from Saffron gave a talk titled “Sense Making And Prediction Like The Human Brain.” It was an amazing presentation on machine learning and predictive analytics. Cool stuff!!

Abstract of Paul's talk: There is growing interest in automating cognitive thinking, but can machines think like humans? Associative memories learn by example, like humans. We present the world's fastest triple store – Saffron Memory Base – for just-in-time machine learning. Saffron Memory Base uncovers connections, counts, and context in the raw data. It builds, out of the box, a semantic graph from hybrid data sources. Saffron stores the graph and its statistics in matrices that can be queried in real time, even for Big Data. Connecting the Dots: We demonstrate the power of entity rank for real-time search with the examples of the London Bomber and Twitter sentiment analysis. Illuminating the Dots: We show the power of Saffron's model-free approach for pattern recognition and prediction on a couple of real-world examples, like Boeing's use case of predictive maintenance for aircraft and risk prediction at The Bill and Melinda Gates Foundation.

Paul's Bio: Dr. Paul Hofmann is an expert in AI, computer simulations, and graphics. He is CTO of Saffron Technology, a Big Data predictive analytics firm named one of the top 5 coolest vendors in Enterprise Information Management by Gartner. Before joining Saffron, Paul was VP of Research at SAP Labs in Silicon Valley. He has authored two books and numerous publications. Paul received his Ph.D. in Physics at the Darmstadt University of Technology.

Make sure to put April 24th on your calendar for the next Scalability MeetUp at RedFin.

eSage Group is excited to be a part of the PSAMA MarketMix!

Big data offers a lot of promise and opportunities for improving the way we do marketing.  As floods of data pour in from social media, mobile, weblogs, digital advertising, CRM, POS, etc., companies need to store it effectively and develop robust analytics to mine the data for knowledge. By gaining new insights, marketers can tailor their marketing messages to provide customers with the most relevant information and better engage with them throughout the lifecycle.  But how do we manage this data to make it truly usable?  How do we avoid the perils that come with identifying and gathering the data, putting the analytics system in place, and getting the right people in place, so we can turn the data into actionable insights?

eSage Group's very own Duane Bedard will lead a panel discussion on this and more at the Puget Sound American Marketing Association MarketMix on March 20th.  Panelists include ShiSh Shridhar from Microsoft, Romi Mahajan from KKM Group, and Adam Weiner from Redfin.

ShiSh Shridhar is the Retail Industry Solutions Director at Microsoft and is responsible for strategy around Business Analytics, Big Data & Productivity Solutions for the Retail Industry. ShiSh has worked at Microsoft for the last 16 years across several groups and geographies and has a passion for empowering organizations through collaboration, knowledge management, and analytics. ShiSh contributes to Retail Industry magazines and blogs and maintains the Retail Industry Twitter presence for Microsoft: @msretail. He also regularly speaks at industry events. ShiSh loves working on innovative ideas and has a patent in the Social Media space. When he isn't working, he sails and windsurfs the waters around the Puget Sound.  Follow ShiSh on Twitter at @5h15h.

Romi Mahajan is an award-winning marketer, marketing thinker, and author. His career is a storied one, including spending 9 years at Microsoft, being the first CMO of Ascentium, a leading digital agency, and founding the KKM Group, a boutique advisory firm focused on strategy and marketing. Romi has also authored two books on marketing – the latest one can be found here. A prolific writer and speaker, Mahajan lives in Bellevue, WA, with his wife and two kids. Mahajan graduated from the University of California at Berkeley at the age of 19 with a Bachelor's degree in South Asian Studies. He also received a Master's degree from the University of Texas at Austin. He can be reached at romi@thekkmgroup.com.

Adam Weiner is Vice President of Analytics and New Business at Redfin. He leads the company's efforts to use its proprietary data to build new products for the web and improve its real estate services. He is also responsible for identifying opportunities for business growth that align with Redfin's overall mission to reinvent the consumer experience of buying and selling real estate. Adam joined Redfin in 2007 on the product management team and was one of the pioneers of the Redfin Partner Program for agents, in addition to its service provider directory, Redfin Open Book. Prior to Redfin, Adam worked at Microsoft in the SQL Server Division for 5 years. Adam graduated from Stanford with a degree in Symbolic Systems and a concentration in Human-Computer Interaction. Follow Adam on Twitter at @adamRedfin.

You can still register for the event at www.marketmix2013.com!


Tech Talk Thursday – Remove Table References in Hive ORDER BY clause

By J’son Cannelos – Partner / Principal Architect, eSage Group

“In God we trust; all others pay cash.”  

– Bob French, New Orleans Tuxedo Jazz Musician (1938 – 2012)

This fairly simple Hive issue was driving me nuts for a while, so I wanted to get it out to the blog while it's still fresh on my mind.

Take the following innocent Hive query:

select distinct s.date_local, s.user_id
from slice_played s
where LENGTH(s.user_id) > 0
  and s.date_local >= '2012-10-07'
  and s.date_local <= '2012-10-08'
order by s.date_local desc
limit 150;

Time and time again this would return:

Error in semantic analysis. Invalid table alias or column reference s

After removing each piece of the query in turn, it turned out that the culprit was the ORDER BY clause. This piece seems to be illegal:

order by s.date_local

Why, you ask? Because, apparently, Hive doesn't allow table references in the ORDER BY clause! Ack!

The solution is pretty simple, but not intuitive. You need to either a) remove the table reference from the fields in your ORDER BY clause or b) alias the columns you would like to use in the ORDER BY clause. Here is the corrected Hive query, using option (b), that works:

select distinct s.date_local as date_pacific, s.user_id
from slice_played s
where LENGTH(s.user_id) > 0
  and s.date_local >= '2012-10-07'
  and s.date_local <= '2012-10-08'
order by date_pacific desc
limit 150;
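Option (a) should work just as well here, since date_local is already in the select list – the same query with the table prefix simply dropped from the ORDER BY column:

select distinct s.date_local, s.user_id
from slice_played s
where LENGTH(s.user_id) > 0
  and s.date_local >= '2012-10-07'
  and s.date_local <= '2012-10-08'
order by date_local desc   -- bare column name, no "s." prefix
limit 150;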

I've fallen into this trap several times now. In our Hive implementation, we pretty much force strict mode (hive.mapred.mode = strict), so we have to alias tables, use existing partitions in the WHERE clause, etc.
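For illustration, a strict-mode session looks roughly like the sketch below. This assumes slice_played is partitioned on date_local, which the date-bounded queries above suggest but don't confirm:

set hive.mapred.mode = strict;

select distinct s.date_local as date_pacific, s.user_id
from slice_played s
where s.date_local >= '2012-10-07'   -- strict mode demands a partition predicate (assuming date_local is the partition column)
  and s.date_local <= '2012-10-08'
order by date_pacific desc
limit 150;                           -- strict mode also rejects an ORDER BY without a LIMIT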

According to this JIRA ticket (https://issues.apache.org/jira/browse/HIVE-1449), it's a known issue. It just says that table references are a no-no, so you don't really need to alias your columns; however, column aliases seem safer to me, since I could just as easily be joining to several tables with a "date_local" column.
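To make that concrete, here is a sketch with a hypothetical second table (user_profile) that also carries a date_local column – the alias keeps the ORDER BY unambiguous:

select distinct s.date_local as date_pacific, s.user_id
from slice_played s
join user_profile u on (s.user_id = u.user_id)   -- user_profile is hypothetical
where LENGTH(s.user_id) > 0
  and s.date_local >= '2012-10-07'
  and s.date_local <= '2012-10-08'
order by date_pacific desc   -- a bare "date_local" could now refer to either table
limit 150;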

Hope this helps and happy coding!
Sincerely,

J’son