
Schmarzo’s 2015 #BigData Predictions!!


Bill Schmarzo

Oh boy, I love this time of year!  Lots of candy and cookies to eat, and time to gaze into my crystal ball and, like every other industry “expert”, make outrageous predictions for 2015.  So without further fanfare, let’s jump into it!

The Data Lake Gains Traction

There will certainly be plenty of hype around the data lake this year, and I’m going to do my share of contributing to that hype.  I’ll be speaking at the Santa Clara Strata event in February where I’m talking about the role of the data lake within the Big Data MBA framework; that is, how the data lake helps organizations optimize key business processes and uncover new monetization opportunities.

However, my take on the data lake is a bit different from others’.  Although I think the data lake has huge long-term potential to empower and/or disrupt the traditional data warehouse market, I think the immediate benefits are much more mundane: 1) provide a line of demarcation between the data warehouse and the newly christened analytic sandbox, and 2) off-load ETL processing from your expensive, SLA-constrained data warehouse (see Figure 1).  In fact, I think data warehouse managers will be the biggest beneficiaries of the data lake in 2015, since ETL processing consumes 60 to 70% of the processing cycles on some of today’s largest data warehouses.  Boring!

Figure 1: Modern Data / Analytics Environment for Big Data


More Relevant Real-world Business Success Stories…

We’ll start hearing more big data “business success” stories, and not just the same old stories from the same old companies.  We’ll start hearing from municipalities and state governments, casinos and resorts, schools and universities, energy providers, credit unions, small retailers, high-tech manufacturers, distributors and wholesalers, health care providers and payers, and other organizations.

And these success stories will start with small successes – improving marketing campaign effectiveness, increasing customer store visits, improving customer / employee / teacher / nurse retention, predictive maintenance.  And these small successes will build upon each other.  For example, what you learn from improving marketing campaign effectiveness will impact how you improve customer retention and product design.

But then again, maybe these companies won’t talk.  Why give away trade secrets in how they are gaining insights about their customers, products, employees and operations that help them to optimize key business processes and uncover new monetization opportunities?  So then again, maybe we’ll be stuck with the same old stories from the same old companies…

…But Also a Couple of Colossal Hadoop Failures

I also think that we are likely to hear about a couple of “colossal” big data project failures.  Maybe it’s because Hadoop is being used for tasks for which it was never designed, like being the platform for an organization’s ERP or CRM application?  Or maybe it’s because Hadoop’s naturally batch-oriented environment just can’t support a high-volume, low-latency custom OLTP application?  Or maybe because the project didn’t do enough upfront work to understand the decisions that the environment needed to support and the questions or hypotheses that the environment needed to answer?  Or maybe it’s just because organizations are still treating Hadoop and big data as yet another technology science experiment?

No matter the reason, failure is important because you can sometimes learn more from failure than from success.  We just need to capture, triage and share these failures if everyone is to benefit.

More Native Hadoop Tools and Products

Silicon Valley and the VC community are working hard to make the data scientist obsolete, even before we’ve come to realize how valuable these folks are.  “Business Objects Killers” and “Tableau Killers” and “SAS Killers” are lurking everywhere, and these start-ups are doing two things that may make them viable options: 1) they are building upon open source technologies (standing on the shoulders of others), and 2) they are building tools and products that run natively on Hadoop and HDFS, rather than just treating Hadoop as yet another data source.

If I hear one more RDBMS or Business Intelligence vendor announce “Don’t worry, our products will interoperate with Hadoop,” I think I’ll throw up.  Here’s what I think of that “let’s just interoperate with Hadoop” strategy…

Data Governance Moves Front and Center

I love the industry pundits who quickly jump on the “What about data governance?” issue when we talk about big data and the data lake.  Well, what about it?  Of course we know it’s important and of course smart organizations never forgot about it (remember, I said smart organizations).  As the volume of data grows in the data lake, governance becomes even more of a critical tool for answering the data “What is it?”, “Where is it?”, and “Who has access to it?” questions.

However, the data governance discussion takes on a new wrinkle when you contemplate data in the data warehouse versus data in the data lake.  As my friend Rachel Haines, who writes and speaks about data governance in a big data world, points out, organizations are going to realize that there need to be different “degrees” of data governance:

  • Highly governed: for data in the data warehouse, where the data about what happened last quarter and what shows up in management reports needs to be 100% accurate.
  • Moderately governed: for data that is used to drive predictive and prescriptive analytic models.  The exact level of governance needs to be determined based upon the cost of the model being wrong (see my blog “Understanding Type I and Type II Errors” for insights into how to assign value or cost to the models being wrong).
  • Ungoverned: for data that is just being held for now in the data lake and to which no value has yet been attributed.
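The “cost of the model being wrong” that drives the moderately governed tier can be made concrete with a simple expected-cost calculation.  A minimal sketch, with all rates and dollar figures purely illustrative assumptions:

```python
# Hypothetical expected-cost sketch: weigh Type I (false positive) and
# Type II (false negative) error rates by their business costs. The higher
# this expected cost, the more governance rigor the model's data warrants.
# All numbers below are illustrative, not from any real engagement.

def expected_error_cost(p_false_positive: float, cost_false_positive: float,
                        p_false_negative: float, cost_false_negative: float) -> float:
    """Expected cost per scored case of the model being wrong."""
    return (p_false_positive * cost_false_positive
            + p_false_negative * cost_false_negative)

# e.g. a retention model: 5% false positives at $50 each (wasted offers),
# 2% false negatives at $800 each (lost customers)
cost = expected_error_cost(0.05, 50.0, 0.02, 800.0)
print(cost)  # -> 18.5 (dollars per scored customer)
```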

As Rachel says, in the big data world, the goal for the smart organization should be “Just-enough Data Governance”.  Why waste cycles governing data when that data might not even be used by the organization?  But once the value of that data has been ascertained, then the appropriate degrees of governance need to be determined and applied.
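“Just-enough Data Governance” could be sketched as a simple tiering rule: govern data according to how the organization actually uses it.  This is a hypothetical illustration only; the attribute names are my own assumptions, not any product’s schema:

```python
# Hypothetical sketch of "just-enough data governance": assign one of the
# three tiers described above based on how a dataset is used today.

def governance_tier(feeds_management_reports: bool,
                    drives_analytic_models: bool,
                    value_assessed: bool) -> str:
    if feeds_management_reports:
        return "highly governed"      # warehouse data must be 100% accurate
    if drives_analytic_models:
        return "moderately governed"  # rigor scales with cost of a wrong model
    if not value_assessed:
        return "ungoverned"           # parked in the data lake, no value yet
    return "moderately governed"      # value found; apply governance now

print(governance_tier(True, False, True))    # -> highly governed
print(governance_tier(False, False, False))  # -> ungoverned
```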

The Rise of the CDMO

What is the CDMO?  It’s the “Chief Data Monetization Officer,” and I think CDMO is a much better moniker for the organization’s data champion than “Chief Data Officer” or CDO.  The more I talk about it, the more I don’t like the title Chief Data Officer; it misses the primary responsibility of the CDMO role, which is to lead the organization in identifying, valuing, acquiring, analyzing and monetizing the organization’s data assets.

To be successful, the CDMO will have to become proficient at identifying and valuing both internal and external (public, third party, open data) data sources in order to uncover new monetization opportunities.  If the role is only managing data, well, that’s what the Chief Information Officer did.  Organizations need a senior executive whose 100% focus is on how to leverage data in order to create competitive differentiation and drive a more compelling, more profitable customer relationship.  That’s all.

Data Scientist Shortage Shrinks, But…

Universities, colleges and large organizations are scrambling to fill the data scientist resource gap.  Online or in person, the number of data science classes and associated degrees and certifications is exploding.  And while these educational organizations scramble to teach advanced statistics, data mining and predictive algorithms, analytic tools, and visualization techniques, these data scientists for the most part will continue to fall short of expectations for one simple reason – they just don’t, and likely won’t ever, understand the business as well as the business stakeholders and Subject Matter Experts (SMEs).

That’s why our Big Data Vision Workshops and Proof of Value Labs couple the data science team with the business SMEs.  The SMEs live in the business, so they are in the best position to lead the data science team by:

  • Providing a deep understanding of the targeted business initiative including business objectives, key performance indicators against which success will be measured, business stakeholders, time frame, etc.
  • Identifying and capturing the decisions to be made, questions to be answered, hypotheses to be tested and recommendations to be delivered to particular customers and front-line employees (with respect to the targeted business initiative).
  • Brainstorming additional data sources, both internal to the organization as well as the ever-growing bevy of external, third party, and public data sources.
  • Brainstorming potential lead indicators that could be better predictors of key business initiative performance.
  • Validating the analytic results against the SAM principle.  In other words, are the analytic results Strategic to the business, Actionable by the business stakeholders, and of Material value (where the value of acting on the analytic results is greater than the cost to act)?
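The SAM validation step above can be expressed as a simple screen.  A minimal sketch, assuming hypothetical field names and example figures of my own invention:

```python
# Hypothetical sketch of the SAM screen: an analytic result passes only if
# it is Strategic, Actionable, and of Material value (the value of acting
# on it exceeds the cost to act). Field names are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class AnalyticResult:
    name: str
    strategic: bool         # tied to a targeted business initiative?
    actionable: bool        # can business stakeholders act on it?
    value_of_acting: float  # estimated benefit of acting on the result
    cost_to_act: float      # estimated cost of acting on the result

def passes_sam(r: AnalyticResult) -> bool:
    material = r.value_of_acting > r.cost_to_act
    return r.strategic and r.actionable and material

# Illustrative example: a churn-risk score worth acting on
result = AnalyticResult("churn-risk score", True, True, 500_000.0, 120_000.0)
print(passes_sam(result))  # -> True
```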

Open Source Software Gets Bigger, and Profits More Elusive

Open source software is wonderful, unless you are in the software business.  It’s really hard for a software company to figure out how to make money when the base software platform is free and openly modifiable by anyone.  And 2015 will show that many start-ups haven’t figured out how to make money in this market either.

In 1994 I was the Vice President of Sales & Marketing for Cygnus Support (another “Forrest Gump” moment in my life).  Cygnus Support was one of the first open source companies.  We provided services and support for the GNU development tools (g++, gcc, gdb, glib).  I came from a software background, so I really struggled to understand how to make money when the base product is available free to anyone who wants to download and modify it.  But I eventually learned that the real money lay in service and support – that it was easier for companies to contract with us to manage, merge and validate the multiple source code trees, integrate patches, fix bugs, add enhancements (for an additional charge), and provide 7×24 support and documentation.  Cygnus Support was eventually acquired by Red Hat, which adopted this same model to drive Linux to market dominance.

Now I’m seeing several start-up companies trying to create some unique software capability that sits on top of Hadoop.  However, the challenge is that software advantages are fleeting in an open source community driven by open-source-generating giants such as Facebook, Google, eBay, Yahoo and many, many others.  The bottom line is that the best ideas can and will eventually be replicated by the open source community.

So that leaves two ways to make money on open source products like Hadoop:

  1. Provide outstanding service and support (like Cygnus Support and Red Hat)
  2. Use these products to build customer-specific business solutions (like EMC Global Services)

There are a lot of software start-up companies that believe that there are other options.  I hope that they prove me wrong.

My Next “Big Data MBA” Book?

A lot has happened since I released my book “Big Data: Understanding How Data Powers Big Business” in October 2013.  Lots of new learnings and new approaches have surfaced that can help organizations identify not only where and how to start their big data journeys, but more importantly can help them identify business opportunities to leverage customer, product and operational insights to optimize key business processes and uncover new monetization opportunities.

Maybe I’ll find the time on airplane flights (or in the terminals weathering yet another flight delay), sitting in hotel lobbies or grabbing a coffee at one of my local watering holes to undertake that next edition – The Big Data MBA!  That edition would further focus on empowering business stakeholders and leveraging big data to power an organization’s value creation processes.

I also plan on continuing to find time to teach my Big Data MBA course.  I find teaching both exhilarating as well as educational…for me.  These classes give me a chance to apply learnings from customer engagements and fine tune approaches and methodologies.  And hopefully everyone wins in that case.

Here’s to a BIG #BigData 2015!
