The Great Data Migration – Part One

June - A Field Guide - Blog Banner.pngI don’t care who doubts our ability to succeed, as long as it’s nobody on my team.
– Kobe Bryant, Los Angeles Lakers Guard

Prepare For Takeoff

Everyone, these days, is jettisoning on premise storage and sending their data to the cloud. The reasons are varied, but generally come down to two factors: time and cost. Cloud storage, from any of the major providers like Amazon or Microsoft, can cost less than $0.02 per GB per month. Compare that to Apple’s revolutionary magnetic hard drive that debuted in 1981. It had 5MB of storage and cost $3,500, which is over $700,000 per GB. Ok, there was no monthly fee, but I digress. 😉 Time is usually how long it takes to get a new server, file share, or document repository installed in your corporate headquarters vs simply storing new data in Amazon S3 or Microsoft Azure. Or perhaps it’s the amount of IT resources that are needed to keep depreciating and out classed data centers up and running.

There are many advantages to cloud storage, which won’t be rehashed here. If you need a refresher (or convincing), this site may come in handy:

For the moment, let’s assume that you have decided to move your data to the cloud. This article will help you decide where to move it, how best to do so, and an ideal way to keep it updated and fresh.

 Where To Go?

In today’s cloud landscape, there are two players: Amazon and Microsoft. There are others, such as Google, but Amazon Web Services (AWS) and Microsoft Azure hold the keys. In addition to storage, they both offer services such as Virtual Machines, Caching, Load Balancing, REST interfaces, Web hosting and more, which can handle your other applications, should you need to migrate them to the cloud in the future. There are pros and cons to each, but both will handle your data securely, provide timely and cost effective access, and transparently maintain ready to use backups in case of unforeseen events. Let’s break them both down:

blog - aws s3AWS S3 (Simple Storage Service) is, as the name states, pretty simple. It has a free tier with 5GB of data and then breaks down into 3 categories – Standard, Infrequent Access (IA), and Glacier. If you just need to stash old data in the cloud and have no idea how it will be used in the future, use IA or Glacier for extremely cheap storage. Glacier is only $0.004 per GB vs Standard at $0.023 per GB per month (US West Region). The trade of with Glacier and IA is that it takes a little longer to get at the data you want to use, anywhere from a few minutes to several hours. Data can be moved up down from Standard, IA, and Glacier tiers so, for instance, those old application logs that no one was using can quickly be made available for reporting when needed.

Standard storage is what most people use for quick access to data. For the first 50TB per month, the price is $0.023 per month (US West Region). Anything can be stored here, such as images, binary files, text data, et. AWS S3 uses “buckets” to contain data, which can have an unlimited number of object. Each object within a bucket is limited to 5TB in size. For a breakdown on AWS S3 pricing, go here:

We’ll discuss how to migrate data to S3 a bit later. For now, know that access to your S3 data is through the AWS web console and a variety of APIs, such as the AWS S3 API, the s3:// or s3n:// file protocols, et. AWS S3 is secure by default, with only the bucket / object creators having initial access to the data. Permission is granted via IAM roles, secret / access keys, and other methods that are out of scope for today. A good primer for S3, including security, can be found at the S3 FAQ:

blog - azure storageAzure Storage has a lot more options that AWS S3. This can be confusing at first but also offers the most in terms of flexibility and redundancy for your data. There is a small free tier, and as a counterpart to AWS Glacier, Azure Storage offers “Azure Cool Blob Storage” for your archival, compliance, or other “don’t use but can’t throw away” data. Prices are usually less than $0.01 per GB per month in some regions.

Unlike S3, Azure Storage comes in several flavors of redundancy, so one can choose how many backups of their data exist and how easily they are accessed. If you have easily replaceable data, say from a 3rd party API or data source, then choose the cheaper LRS (Locally Redundant Storage) option which will keep local copies of your data in a single Azure data center. Need a durable, always available, “a crater can hit my data center yet I’m still OK” option? Then RA-GRS (Read-Access Geographically Redundant Storage) is the preferred option. This will ensure that copies of your data are also maintained at a second data center hundreds of miles away, yet always available for easy access. Middle ground hot and cool options exist as well. For a breakdown of Azure Storage pricing, please visit here:

Note: AWS S3 is functionally equivalent to Azure Storage GRS (Geographically Redundant Storage), so use this option when comparing prices.

Azure Storage uses “containers”, instead of buckets like S3, and containers can contain blobs, tables, or queues (discussed below). There is no limit to object size, yet each container can “only” hold 500TB. However, there can be multiple containers per storage account. Access to data is through the Azure Portal, Azure Storage Explorer, PowerShell, the Azure CLI (command line interface), and other APIs and file protocols.

Aside from regular blobs, there are a couple different types of data that can be stored in containers: tables and queues. Think of both as convenient layers laid over the raw data, to ease read and write access for different applications and scenarios.

 blog - azure storage tablesAn Azure Storage (AST) is essentially a NoSQL key-value store. If you’re not sure what that is, then you likely don’t need it. 🙂 NoSQL data stores support massive scalability yet the dataset and server sharding that is normally necessary (and a headache) for this is handled for you. AST, like other NoSQL datasets, supports a flexible schema model which allows one to keep customer data, application logs, web logs, and more – all with different schemas – in the same table. Learn more here:

blog - azure Storage QueueAzure Storage Queue (ASQ) provides cloud-based messaging between application components. Having a central messaging queue is critical for different applications and parts of applications that are often decoupled and need to scale independently of one another. This would only likely be needed if you have applications which currently store message data on-premises, but needs to be migrated to the cloud. The size of each message is limited to 64K, but there can be an almost unlimited number of messages (up to the container limit). Learn more here:

Another option unique to Azure in general, is the ability to link your corporate Windows Active Directory or Office 365 Active Directory with the Azure Active Directory (Azure AD). This feature, called Azure AD Connect, allows SSO (single sign on) between on-prem and cloud based applications and services. It is easy, for example, to quickly setup permissions and roles that manage access to essential services and storage across your organization.

For a rundown on security and encryption options, please visit:

This is great for raw storage, but now what about my DATABASE?

Almost every organization has relational data. While this can be extracted and placed into raw storage, it’s often easier to just lift it entirely into the cloud and go from there. Both Amazon and Azure platforms support numerous relational database hosting options, from Azure’s SQL Database to AWS’s Relational Database Service and many options in between. We’ll look at some of them here:

blog - aws rds.pngAWS Relational Database Service (RDS) allows easily deploying and scaling relational databases in the cloud. It frees one from the hassle of managing servers, patching, clustering and other IT heavy tasks. Also, it supports six different flavors of database: Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle, and Microsoft SQL Server. One unique option is the ability to offload read traffic to one or more “Read Replicas” and thus increase availability and performance of your primary database instance. Database security differs depending upon your database flavor (some support encryption at rest, et) but the AWS RDS itself can be secured by being deployed within an organization’s AWS VPC (Virtual Private Cloud). In my opinion, AWS as a service has a simpler approach to security than Azure, because more AWS services can be setup behind the VPC, which acts as a gateway to sensitive data and applications. Learn more about AWS RDS here:

blog - aws redshift.pngAWS Redshift is Amazon’s data warehouse in the sky. Essentially a supersized PostgreSQL, it provides scalable, cost-effective SQL based storage that includes queries that can run both on S3 (via Redshift Spectrum) and Redshift. It stores data in a columnar-based fashion, giving fast query times over massive amounts of data. It might be overkill if your dataset is small, but if you have petabytes (or exabytes) of structured data that need analyzing quickly, Redshift can likely handle it. Start with AWS Redshift’s home page here:

blog - aws athenaAWS Athena is a new service which attempts to blend the raw and relational data worlds together. Simply point it at an S3 bucket, define a schema, write your SQL query, and go. You only pay for the queries run on the raw data and the schema definition can be reused with other queries, modified for another run, or simply tossed away when finished. Also, Athena can turn around and store the results back into AWS S3 or be used by another workflow to furnish data. By not having a permanent relational layer, data workflows and ETLs have less steps and less points of failure. Learn more about AWS Athena here:

blog - azure SQLAzure SQL Database is as simple as it gets: An SQL Server database as a service. No Virtual Machines or licenses to manage, no patching, lots of redundancy, and fast performance. One can pay per database or in “elastic pool database units” (EPDUs) which can be spread resources across many databases. Azure SQL database is meant for small to medium size databases (up to 50GB) that can operate independently from one another, as in a multi-tenant application or reporting solution. If you have a lot of SQL data to migrate, it is a good idea to break it up and store it along date or business lines or make the jump to Azure SQL Data Warehouse, a larger service meant for enterprise intentions (see below). Connections are made through a standard ODBC / JDBC connection string, with the URI as the service endpoint for your database.

Keep in mind, that this is not the same as full SQL Server. Since there is no real “server” involved, most system stored procedures (DBCC, et) and distributed transactions won’t work and SQL Server peripheral services, such as SQL Server Reporting Services (SSRS) and SQL Server Integration Services (SSIS) are not included. These voids can be filled by other services in the Azure stack; however, or by running a full copy of SQL Server in an Azure Virtual Machine (see below). Although a look at Azure Analytics is out of scope here, you should know that Azure supports an entire range of analytical services which can consume data from Azure SQL databases. Learn more about Azure SQL Database here:

blog - azure SQL Data WarehouseAzure SQL Data Warehouse (ASDW) is for enterprise grade data warehouse deployments. Like AWS Redshift, one can scale compute and storage independently and prioritize analytical workloads or long term storage. It also lets you pause the compute side entirely, turning ASDW into an archival data store when analytics aren’t needed. It also leverages Microsoft’s PolyBase service, which allows queries to execute against other data sources, such as Azure Data Lake, Azure HDInsight, or even another on-prem SQL Server data warehouse. Unlike Azure SQL database, Azure SQL data warehouse stores data in a columnar-based format, for maximum performance and scalability. Learn more about the Azure SQL Data Warehouse here –

 Rolling Your Own – AWS and Azure Virtual Machines

Of course, if you need advanced SQL Server features, such as SQL Server Analysis Services (SSAS), or want to run a completely different database type, you can always spin up a Virtual Machine (VM) and install the relational database software there. Many VM images from both AWS and Azure come with SQL Server, Oracle, or other software preinstalled, so all you need is your licensing information. Also, some images include the cost of the database software, effectively renting the license to users for a monthly fee. This can be useful, for example, if you would like to try out the features of SQL Server Enterprise before making a full purchase. Virtual Machines are also useful when ETLs and data workflows need to also be migrated to the cloud, as the VM can simply host the software required to run it.

NOTE: When you go the VM route, you are usually responsible for hardware provisioning/formatting, software patches, service upgrades, and maintaining secure access (through firewall rules, etc) to your system. A good pro/con for evaluating Azure SQL Database vs SQL Server on Azure VMs can be found here:

Now, how do I get it up there?

Alright, you’ve chosen your cloud platform, you know what data to move… or do you? How do you prioritize what goes and what stays? Stay tuned for The Great Migration – Part II, where I’ll cover next steps in how to lift your data into the cloud.

Hope this helps and happy migrating! Feel free to email me at with any additional questions!

Sincerely,  J’son 


The Future of Enterprise Analytics

Over the last couple weeks since the 2016 Hadoop Summit in San Jose, eSage Group has been discussing the future of big data and enterprise analytics.  Quick note – Data is data and data is produced by everything, thus big data is really no longer an important term.

hspeopleeSage Group is specifically focused on the tidal wave of sales and marketing data that is being collected across all channels, to name a few:

  • Websites – Cross multiple sites, Clicks, Pathing, Unstructured web logs, Blogs
  • SEO –  Search Engine, Keywords, Placement, URL Structure, Website Optimization
  • Digital Advertising – Format, Placement, Size, Network
  • Social
    • Facebook – Multiple pages, Format (Video, Picture, GIF), Likes (now with emojis), Comments, Shares, Events, Promoted, Platform (mobile, tablet, PC) and now Facebook Live
    • Instagram – Picture vs Video, Follows, Likes, Comments, Reposts (via 3rd Party apps),, Hashtags, Platform
    • Twitter – Likes, RT, Quoted RT, Promoted, Hashtags, Platform
    • SnapChat – Follows, Unique views, Story completions, Screenshots.  SnapChat to say the least is still the wild west as to what brands can do to engage and ultimately drive behavior.

Then we have Off-Line (Print, TV, Events,  etc). Partners. 3rd Party DataDon’t get me started on International Data. 

Tired yet?


While sales and marketing organizations see the value of analytics, they are hindered by what is accessible from the agencies they work with and by the difficulty of accessing internal siloed data stored across functions within the marketing organization – this includes central corporate marketing, divisional/product groups, field marketing, product planning, market research and operations.

Marketers are hindered by access to the data and the simple issue of not knowing what data is being collected.  Wherever the data lies, it is often controlled by a few select people that service the marketers and don’t necessary know the value of the data they have collected.  Self-service and exploration is not possible yet.

Layer on top this the fact that agile marketing campaigns require real-time data (at least close real time) and accurate attribution/predictive analytics.

So, you can see there are a lot of challenges that face a marketing team, let alone the deployment of an enterprise analytics platform that can service the whole organization.

Now that I have outlined the business challenges, let’s look at what technologies were mentioned at the 2016 Hadoop Summit that are being developed to solve some of these issues.

  • Cloud, cloud, cloud– lots of data can be sent up, then actively used or sent to cold storage on or off prem.  All the big guys have the “best” cloud platform
  • Security – divisional and function roles, organization position, workflow
  • Self-Service tools – ease of data exploration, visualization, costs
  • Machine Learning and other predictive tools
  • Spark
  • Better technical tools to work with Hadoop, other analytics tools and data stores
  • And much more!  

Next post, we will focus on the technical challenges and tools that the eSage Group team is excited about.

Cheers! Tina




Executive Evening Out with eSage Group and Microsoft

On March 10th, eSage Group held its first Executive Evening Out at the exclusive Rainier Club in Downtown Seattle. The event was sponsored by Microsoft Advanced Analytics.

On March 10th, eSage Group held its first Executive Evening Out at the exclusive Rainier Club in Downtown Seattle.  The event was sponsored by Microsoft Advanced Analytics.

15 Seattle area executives, from the likes of Starbucks, Trupanion, Allrecipes, Alaska Air, Disney and their guests joined us for a short presentation by Shish Shridhar, Worldwide Director for Business Intelligence Solutions – Retail for Microsoft, then sat down to a 5 course meal with wine pairings presented by The Rainier Club sommelier.

Microsoft has a powerful offering, from Azure Machine Learning, Cortana Analytics Suite, SQL 2016 and PowerBI. It was definitely a learning experience along with a wonderful meal and wines.


Immediate Job Opening – Mexico Based Microsoft BI Stack Engineer

Microsoft BIThe eSage Group is a Marketing Data Analytics firm established in 1998, headquartered in Seattle, Washington, USA. We have Fortune 500 clients including Disney, the LA Times, and Microsoft.

 Our company is always on the lookout for talented developers at all levels in both Mexico and the US. We have worked hard to create a company culture of sharp, quick learning, hardworking professionals who enjoy being part of a winning team with high expectations. As such, we hire self-motivated people with excellent technical abilities who also exhibit keen business acumen and a drive for customer satisfaction and solving our client’s business challenges.

 We have a strong remote team in various locations in Mexico, including Monterrey, Aguascalientes, Mexico City, and Guadalajara. All employees work from home. All employees are full-time employees, not contractors.

We need  Mid-Level Software Engineers (3+ years of experience) with the Microsoft BI stack.esage 2logo

All candidates must have a strong interested in business intelligence and marketing analytics. They must be willing to work with other companies at the same time. They must have a strong desire to understand business problems and look to disparate data sources to integration to gain insights to help solve the business problems.

• Advanced English Skills both written and spoken
• Advanced Excel and SQL skills
• Strong past data analysis experience
• Good oral and written communication skills
• A keen eye for detail is required. This person must be extremely detail oriented
• Ability to produce high quality, accurate, deliverables
• Proven ability to work under pressure with deadlines
• Ability to learn quickly, follow direction, and execute tasks independently

Technical Skills
• SQL Server
Knowledge of databases, stored procedures, and writing T-SQL script. Should know about primary and foreign keys, indexes and why they are important. Should know how to use temporary tables and know something about the use of cursors within a script. Basic error handling within a script would also be nice.

• OLAP / Analysis Services
Ability to design and build an OLAP database within Visual Studio. Understand the following concepts: Data Source View, calculated measures, named set, referenced dimension, Measure Group. Able to deploy and process an OLAP database. Understanding of basic MDX and how it is different from SQL. Knows the difference between a set, a tuple, and a value.

• SSIS (Integration Services)
Experience designing SSIS packages that can extract, transform, and load data. Use of Data Flow and Script Task components. Able to use C# and variables within SSIS. Able to package and deploy SSIS components.

• C# / .NET Framework
This includes the ability to create custom classes, interfaces, and events. Understands concepts like inheritance, polymorphism, referenced assemblies, and delegates. Should be comfortable working with lists (.NET or custom) and familiar with the standard .NET types arrays, dictionaries, lists, structs. Knowledge of Generics is a plus.

Please email your resume to tinam (at) esagegroup (.) com and/or complete form below.