The Future of Enterprise Analytics

Over the last couple of weeks since the 2016 Hadoop Summit in San Jose, eSage Group has been discussing the future of big data and enterprise analytics.  A quick note: data is data, and data is produced by everything, so "big data" is really no longer an important term.

eSage Group is specifically focused on the tidal wave of sales and marketing data being collected across all channels, to name a few:

  • Websites – Cross multiple sites, Clicks, Pathing, Unstructured web logs, Blogs
  • SEO –  Search Engine, Keywords, Placement, URL Structure, Website Optimization
  • Digital Advertising – Format, Placement, Size, Network
  • Social
    • Facebook – Multiple pages, Format (Video, Picture, GIF), Likes (now with emojis), Comments, Shares, Events, Promoted, Platform (mobile, tablet, PC) and now Facebook Live
    • Instagram – Picture vs Video, Follows, Likes, Comments, Reposts (via 3rd Party apps), LiketoKnow.it, Hashtags, Platform
    • Twitter – Likes, RT, Quoted RT, Promoted, Hashtags, Platform
    • Snapchat – Follows, Unique views, Story completions, Screenshots.  Snapchat, to say the least, is still the Wild West as far as what brands can do to engage users and ultimately drive behavior.

Then we have offline channels (print, TV, events, etc.), partners, and third-party data. And don't get me started on international data.

Tired yet?


While sales and marketing organizations see the value of analytics, they are hindered by what is accessible from the agencies they work with and by the difficulty of accessing internal siloed data stored across functions within the marketing organization – this includes central corporate marketing, divisional/product groups, field marketing, product planning, market research and operations.

Marketers are hindered both by limited access to the data and by the simple fact that they often don't know what data is being collected.  Wherever the data lies, it is often controlled by a few select people who service the marketers and don't necessarily know the value of the data they have collected.  Self-service and exploration are not yet possible.

Layer on top of this the fact that agile marketing campaigns require real-time data (or at least near-real-time data) and accurate attribution and predictive analytics.

So, you can see that a marketing team faces plenty of challenges even before it takes on deploying an enterprise analytics platform that can serve the whole organization.

Now that I have outlined the business challenges, let’s look at what technologies were mentioned at the 2016 Hadoop Summit that are being developed to solve some of these issues.

  • Cloud, cloud, cloud – lots of data can be sent up, then actively used or sent to cold storage, on or off prem.  All the big guys have the "best" cloud platform.
  • Security – divisional and functional roles, organization position, workflow
  • Self-Service tools – ease of data exploration, visualization, costs
  • Machine Learning and other predictive tools
  • Spark
  • Better technical tools to work with Hadoop, other analytics tools and data stores
  • And much more!  

Next post, we will focus on the technical challenges and tools that the eSage Group team is excited about.

Cheers! Tina

Seeking 4 mid/senior level engineers to work on a Cloud-based Big Data project

eSage Group is always on the lookout for talented developers at all levels.  We have worked hard to create a company culture of sharp, quick-learning, hardworking professionals who enjoy being part of a winning team with high expectations.  As such, we hire self-motivated people with excellent technical abilities who also exhibit keen business acumen and a drive for customer satisfaction and for solving our clients' business challenges.  We have quarterly profit sharing based on companywide goals, allowing everyone on the team to participate in and enjoy the rewards of our careful but consistently strong growth. We are currently looking to fill 4 openings to complete a team that will be working together on a large-scale "big data" deployment on AWS.

  1. Cloud-operations specialist who can design a distributed platform for analyzing terabytes of data using MapReduce, Hive, and Spark.
  2. Cloud-database engineer who can construct an enterprise caliber database architecture and schema for a high-performance Cloud-based platform that stores terabytes of data from several heterogeneous data sources.
  3. Mid/senior-level software developer with extensive experience in Java, who can write and deploy a variety of data processing algorithms using Hadoop.
  4. A technical business analyst who can translate business requirements into user stories and envision them through Tableau charts/reports.

1) Cloud-operations specialist:

  • Bachelor's degree in Computer Science or related field, or 4 years of IT work experience
  • Familiarity with open-source programming environments and tools (e.g., Ant, Maven, Eclipse)
  • Comfortable using the Linux operating system and familiar with command-line tools (e.g., awk, sed, grep, scp, ssh)
  • Experience working with Web/Cloud-based systems (e.g., AWS, REST)
  • Knowledge of database concepts, specifically SQL syntax
  • Data warehouse architecture, modeling, profiling, and integration experience
  • Comfortable using the command line (e.g., Bash); experience with systems deployment and maintenance (e.g., cron job scheduling, iptables)
  • Practical work experience designing and deploying large-scale Cloud-based solutions on AWS using EC2, EBS, and S3
  • Working knowledge of one or more scripting languages (e.g., Perl, Python)
  • Experience using systems management infrastructure (e.g., LDAP, Kerberos, Active Directory) and deployment software (e.g., Puppet, Chef)
  • Programming ability in an OOP language (e.g., Java, C#, C++) is a plus

2) Cloud-database engineer:

  • Bachelor's degree in Computer Science or related field, or 4 years of IT work experience
  • Familiarity with open-source programming environments and tools (e.g., Ant, Maven, Eclipse)
  • Comfortable using the Linux operating system and familiar with command-line tools (e.g., awk, sed, grep, scp, ssh)
  • Experience working with Web/Cloud-based systems (e.g., AWS, REST)
  • Knowledge of database concepts, specifically SQL syntax
  • Firm grasp of databases and distributed systems; expert knowledge of SQL (i.e., indexes, stored procedures, views, joins, SSIS)
  • Extensive experience envisioning, designing, and deploying large-scale database systems both in traditional computational environments and in the Cloud
  • Ability to design complex data ETLs and database schemas
  • Desire to work with many heterogeneous terabyte-scale datasets to identify and extract business intelligence
  • Experience using multiple DBMSs (e.g., MySQL, PostgreSQL, Oracle, SQL Server)
  • Work experience using Hive and NoSQL databases is a plus

3) Mid/senior-level software developer:

  • Bachelor's degree in Computer Science or related field, or 4 years of IT work experience
  • Familiarity with open-source programming environments and tools (e.g., Ant, Maven, Eclipse)
  • Comfortable using the Linux operating system and familiar with command-line tools (e.g., awk, sed, grep, scp, ssh)
  • Experience working with Web/Cloud-based systems (e.g., AWS, REST)
  • Knowledge of database concepts, specifically SQL syntax
  • Excellent Java developer with knowledge of software design practices (e.g., OOP, design patterns) who writes sustainable programs and employs coding best practices
  • Ability to program, build, troubleshoot, and optimize new or existing Java programs
  • Several years of development experience using both version control (e.g., SVN, Git) and build management systems (e.g., Ant, Maven)
  • Able to create and debug programs both within IDEs and on the command line
  • Working knowledge of Web development frameworks and distributed systems (e.g., Spring, REST APIs)
  • Experience using the Hadoop ecosystem (e.g., MapReduce, Hive, Pig, Shark, Spark, Tez) to program, build, and deploy distributed data processing jobs
  • Programming ability in Scala is a plus

4) Technical business analyst:

  • Strong background in business intelligence
  • Minimum of 1 year using Tableau and Tableau Server
  • Able to work closely with cross-functional business groups to define reporting requirements and use cases
  • Extensive experience manipulating data (e.g., data cubes, pivot tables, SSIS)
  • Passion for creating insight out of data and data investigation
  • Experience using R, Mahout, or MATLAB is a plus

Please send resumes to tinam (at) esagegroup (dot) com.

Tech Talk Thursday – Remove Table References in Hive ORDER BY clause

By J’son Cannelos – Partner / Principal Architect, eSage Group

“In God we trust; all others pay cash.”  

– Bob French, New Orleans Tuxedo Jazz Musician (1938 – 2012)

This fairly simple Hive issue was driving me nuts for a while, so I wanted to get it out to the blog while it's still fresh in my mind.

Take the following innocent Hive query:

select distinct s.date_local, s.user_id from slice_played s where LENGTH(s.user_id) > 0 and s.date_local >= '2012-10-07' and s.date_local <= '2012-10-08' order by s.date_local desc limit 150;

Time and time again this would return:

Error in semantic analysis. Invalid table alias or column reference s

After removing each piece of the query in turn, it turned out that the culprit was the ORDER BY clause. This piece, it seems, is illegal:

order by s.date_local

Why, you ask? Because, apparently, Hive doesn't allow table references in the ORDER BY clause! Ack!

The solution is pretty simple, but not intuitive. You need to either a) remove the table references from the fields in your ORDER BY clause or b) alias the columns you would like to use in the ORDER BY clause. Here is the corrected Hive query that works:

select distinct s.date_local as date_pacific, s.user_id from slice_played s where LENGTH(s.user_id) > 0 and s.date_local >= '2012-10-07' and s.date_local <= '2012-10-08' order by date_pacific desc limit 150;

I've fallen into this trap several times now. In our Hive implementation, we pretty much force strict mode (hive.mapred.mode = strict), so we have to alias tables, use existing partitions in the WHERE clause, etc.
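
For reference, strict mode is just a session-level setting. With it enabled, Hive of this vintage rejects (among other things) an ORDER BY that isn't paired with a LIMIT, a query against a partitioned table that doesn't filter on a partition column, and Cartesian joins, which is also why ORDER BY queries like the ones above need their LIMIT clause:

set hive.mapred.mode=strict;
-- from here on, ORDER BY requires a LIMIT and partitioned tables require a partition filter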

According to this JIRA link (https://issues.apache.org/jira/browse/HIVE-1449), it's a known issue. It just says that table references are a no-no, so you don't strictly need to alias your columns; however, column aliases seem safer to me. I could just as easily be joining to several tables that each have a "date_local" column.
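
For instance, here is a hypothetical variation of the corrected query (the user_segments table and its columns are made up purely for illustration). With the column alias, the ORDER BY stays unambiguous even when more than one table in the join carries a date_local column:

-- user_segments here is a hypothetical second table that also has a date_local column
select distinct s.date_local as date_pacific, s.user_id
from slice_played s
join user_segments u on (u.user_id = s.user_id)
where LENGTH(s.user_id) > 0 and s.date_local >= '2012-10-07' and s.date_local <= '2012-10-08'
order by date_pacific desc
limit 150;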

Hope this helps and happy coding!
Sincerely,

J’son

eSage Group Becomes Hortonworks Systems Integration Partner

We are excited about our partnership with Hortonworks and the value it adds to our customers who need to unlock the insights hidden in Big Data!

eSage Group leverages Hortonworks Data Platform to integrate Microsoft Office and Server tools with Big Data – extracting valuable marketing insights in just weeks

SEATTLE – June 13, 2012 – eSage Group, an enterprise business intelligence consultancy, today announced it has become a Hortonworks Systems Integration Partner.  Hortonworks is a leading commercial vendor promoting the innovation, development and support of Apache Hadoop. The Hortonworks Data Platform is a 100 percent open source platform powered by Apache Hadoop, which makes Hadoop easy to consume and use in enterprise environments.

eSage Group is the first Hortonworks Systems Integration Partner that specializes in delivering sales and marketing analytics using trusted Microsoft tools such as Excel and PowerPivot, and backend applications like Microsoft SQL Server, with which both IT and marketers are most familiar.  eSage Group helps organizations capture and analyze structured and unstructured data (Big Data) to gain new insights that were previously not available.

“eSage Group has the unique combination of business acumen and the technical expertise to harness the vast amount of data from disparate marketing channels and make it meaningful and actionable,” said Mitch Ferguson, vice president of business development, Hortonworks.  “By partnering with Hortonworks and leveraging the Hortonworks Data Platform for customer engagements, eSage Group can deliver increased value and data insights to their customers.”

Helping Marketers Make Business Sense Out of Big Data

To derive value from data residing in today’s vast number of marketing channels – web sites, social, CRM, digital advertising, mobile, and unstructured “Big Data” – organizations must develop robust business intelligence to extract hidden relationships across these channels and derive new insights. The result is more effective marketing campaigns, improved customer engagement, increased customer lifetime value and ultimately greater revenue.

eSage Group helps marketers bridge the gap between marketing and IT. By leveraging its Intelligent Enterprise Marketing Platform with decades’ worth of business intelligence expertise, eSage Group creates a roadmap for integration that allows organizations to capture key insights in just a few weeks versus months or even years.

“Savvy marketing organizations today have an unprecedented opportunity to leap ahead of their competition by tapping the valuable customer insight locked in big data,” said Duane Bedard, eSage Group President. “eSage Group couples an Agile marketing process with newly available tools to quickly and easily expose this information, allowing marketers to create targeted, rapid messaging that drives sales and engages their existing customers.”

About eSage Group

Founded in 1998, eSage Group is an enterprise technology consultancy that helps marketers make sense out of marketing data, including Big Data. Leveraging its Intelligent Enterprise Marketing Platform (iEMP) with business intelligence expertise, eSage Group helps clients create and implement an agile marketing strategy and infrastructure, enabling them to map key sales and marketing goals to actionable performance metrics.  With eSage, organizations can gain deep cross-channel understanding of their customers, track which marketing campaigns are meeting their overall marketing goals, and ultimately increase revenue.  eSage Group’s customers include Disney, Microsoft, Chase, Classmates.com, ViAir, Wireless Services, and many more.

About Hortonworks

Hortonworks is a leading commercial vendor of Apache Hadoop, the preeminent open source platform for storing, managing, and analyzing big data.  Our distribution, Hortonworks Data Platform powered by Apache Hadoop, provides an open and stable foundation for enterprises and a growing ecosystem to build and deploy big data solutions. Hortonworks is the trusted source for information on Hadoop and together with the Apache community, Hortonworks is making Hadoop more robust and easier to install, manage, and use. Hortonworks provides unmatched technical support, training and certification programs for enterprises, systems integrators, and technology vendors. For more information, visit www.hortonworks.com.

About The Hortonworks Systems Integration Partner Program

The Hortonworks Systems Integrator Partner Program was created to train, certify and enable a broad ecosystem of systems integrators to deliver expert Apache Hadoop consulting and integration services. For more information about the program, visit http://hortonworks.com/partners/systems-integrator-partner-program/.

Tech Talk Thursday: Weighted Averaging with Apache Hive and SQL Server 2008 R2


By J’son Cannelos
Partner/Principal Architect

To Be Is To Do – Shakespeare

To Do Is To Be – Voltaire

DoBeDoBeDo – Sinatra

I recently had an interesting challenge with a client who wanted to see how long a particular group of users would watch an online video. Through a partnership with video content leader Freewheel (http://www.freewheel.tv/), our client knew exactly when a user hit a particular online video, how long they stayed there, and, for good measure, that the user belonged to a group of segments that further defined who they were. A user could belong to segments such as "College Men" or "Mothers", etc. In all, there were more than 100 segments available, and a user could belong to more than one (although crossover was rare).

Due to the sheer amount of log data available for the client, Apache Hadoop was the only reasonable way to store and back up this data while still allowing for reasonably quick analysis. Installed with the cluster was Apache Hive – a data warehouse system that allows for ad-hoc queries and more – no Java required! See Raul Olvera's recent article, "Starting with Hive" (https://esagegroup.wordpress.com/2012/05/17/starting-with-hive/), for a good primer on this subject.
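
Before jumping to the full write-up, the core calculation is a standard weighted average expressed in HiveQL. Here is a minimal sketch, assuming a hypothetical segment_views table with segment, view_seconds, and view_count columns (the actual schema and query appear in the full article):

-- weighted average watch time per segment, weighting each row by its view count
select segment, SUM(view_seconds * view_count) / SUM(view_count) as weighted_avg_seconds
from segment_views
group by segment;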

Click here for the rest of this article.

Click here for more information on eSage Group.