Big Data Technology Transformation

I have been discussing my experiences working with companies starting Big Data projects. Every customer I talk to today is working on or planning projects that leverage predictive analytics to improve the customer product experience, improve delivery efficiency, and create new revenue opportunities. In my previous post I shared my thoughts on the people and process transformation that Big Data projects require. Big Data projects also require technology transformation. I will focus on three common technology transformations:

  • Data Lake data management architectures
  • In-Memory Data Grid (IMDG) solutions
  • The value of enterprise infrastructure

Most customers I have worked with need to analyze not only traditional structured data such as transactions, but also unstructured data such as video, audio, and images. For example, a call center looking to optimize the quality of customer interactions might analyze call audio combined with previous transaction data to identify upsell opportunities. Customers also need the ability to analyze greater volumes of data. The most recent EMC Digital Universe study predicts that electronic data growth for enterprise customers will continue to accelerate, with the volume of digital data doubling every two years for the rest of this decade. Certainly the capability to analyze unstructured data is contributing to this growth, but Big Data analysts also want larger, more complete data sets: predictive analytics can be more accurate and precise with more data. Many customers have modified their data management policies to no longer delete data. To handle the variety and volume of data used for Big Data analytics, customers are deploying new data management architectures called Data Lakes. These large content repositories are typically Hadoop Distributed File System (HDFS) enabled. HDFS is a good fit because it supports management of both structured and unstructured data and exposes an open, API-based interface. Most next-generation Big Data analytics tools fully support HDFS for data access.
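
As a concrete illustration of that open interface, here is a minimal sketch that lands both unstructured and structured data in a Data Lake over WebHDFS using the Python `hdfs` package; the NameNode host, user, and paths are hypothetical placeholders.

```python
# A minimal sketch, assuming a WebHDFS endpoint is enabled on the cluster.
# Requires the `hdfs` package (pip install hdfs); host, user, and paths
# below are hypothetical.
from hdfs import InsecureClient

# Connect to the NameNode's WebHDFS REST interface
client = InsecureClient('http://namenode.example.com:50070', user='analyst')

# Land unstructured content (a call recording) next to structured
# transaction data in the same repository
client.upload('/datalake/raw/audio/call-0412.wav', 'call-0412.wav')
client.write('/datalake/raw/transactions/2015-01.csv',
             data='cust_id,amount\n1001,42.50\n', encoding='utf-8')

# Any HDFS-aware analytics tool can now reach both data sets
print(client.list('/datalake/raw'))
```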

I am seeing more customers needing to incorporate near-real-time predictive analytics. This requires the ability to store and analyze data quickly using In-Memory Data Grid (IMDG) data management solutions. At a high level, IMDGs leverage pools of memory from a cluster of servers as high-speed, low-latency storage, and periodically copy the data from the memory grid to HDFS for long-term persistence and deep analytics (a toy sketch of this pattern follows the list below). The combination of high-speed, low-latency storage (IMDG) with the scale and openness of HDFS-enabled storage is the predominant architecture among my most successful Big Data customers. This Data Lake architecture requires new storage tiers:

  • Hot Edge – high-speed, low-latency storage with few data services; the top requirement is raw performance.
  • Processing Core – good performance with data-efficiency services such as compression and deduplication.
  • Cold Core – data-efficiency services coupled with security services such as encryption, plus data protection.
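
Below is a toy sketch of the IMDG write-behind pattern mentioned above: low-latency writes land in memory, and a background task periodically snapshots the grid into HDFS. A real deployment would use a grid product (Pivotal GemFire, for example) rather than an in-process dict, and the HDFS endpoint and paths are hypothetical.

```python
import json
import threading
import time

from hdfs import InsecureClient

grid = {}  # stand-in for the distributed memory pool of a real IMDG
client = InsecureClient('http://namenode.example.com:50070', user='analyst')

def flush_to_hdfs(interval=60):
    """Periodically persist a snapshot of the grid to the Data Lake."""
    while True:
        time.sleep(interval)
        path = '/datalake/imdg-snapshots/%d.json' % int(time.time())
        client.write(path, data=json.dumps(grid), encoding='utf-8')

threading.Thread(target=flush_to_hdfs, daemon=True).start()

# Writes hit memory only, keeping latency low; persistence is asynchronous
grid['call-0412'] = {'duration_s': 318, 'upsell_score': 0.83}
```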

One trend I see emerging is customers looking to leverage cloud providers for disaster recovery and/or primary hosting of their cold core capacity. A cache of cold core capacity is kept on site, and data is recalled to that local cache on application demand. The cost advantage of leveraging a cloud storage provider enables companies to keep more data for longer periods of time. Customers are willing to absorb the performance impact on batch analytics in exchange for the cost-effective extra capacity. I see this trend accelerating in 2015 as storage gateway technologies and the number of storage service providers continue to increase.
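
A read-through recall of this pattern might look like the sketch below; boto3 and S3 stand in for any storage gateway or cloud provider, and the bucket name, cache directory, and object key are hypothetical.

```python
import os

import boto3

CACHE_DIR = '/mnt/cold-core-cache'  # on-site cache of cold core capacity
s3 = boto3.client('s3')

def recall(key, bucket='datalake-cold-core'):
    """Return a local path for `key`, recalling it from the cloud on a miss."""
    local_path = os.path.join(CACHE_DIR, key)
    if not os.path.exists(local_path):
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        s3.download_file(bucket, key, local_path)  # recall into the cache
    return local_path

# Batch analytics absorbs the one-time recall latency; repeat reads stay local
path = recall('audio/2013/call-0412.wav')
```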

The next trend I have seen emerge is Big Data projects embracing enterprise infrastructure. Many customers start with a rack of servers, network, and direct-attached storage for their initial Big Data project. As their infrastructure capacity needs grow, they feel the pain of infrastructure support and start looking for ways to improve the reliability, manageability, and supportability of their infrastructure. The two most common solutions considered are:

  • Public cloud – outsourcing that responsibility
  • Enterprise-grade Big Data infrastructure – building it on premises

Many customers are choosing to build their Big Data infrastructure on premises rather than in the public cloud for security and flexibility reasons. A Big Data practitioner needs the ability to deploy the latest analytics functionality in this quickly evolving market. The reliability and manageability of enterprise-grade infrastructure is increasingly achieved through software, which allows the infrastructure to leverage generic server hardware. EMC recently introduced a storage software solution for Big Data, ViPR, and we published a Hadoop deployment guide a few months ago here. Interestingly, many of our customers have asked for an option to buy our storage software bundled with hardware based on industry-standard servers with enterprise break/fix support. Customers want the flexibility and low cost of commodity hardware combined with the predictability and support of an enterprise-grade solution. Our new EMC Elastic Cloud Storage product is an example of this type of solution. I believe we will see an acceleration of this type of infrastructure solution in 2015, likely including the addition of Big Data analytics software (e.g., Pivotal Big Data Suite) with the enterprise infrastructure.

Big Data projects will require transformation of your people, process, and technology. The technology infrastructure needed to support Big Data projects must be designed for a scale, velocity, and volume of data beyond your current transaction and data warehouse systems. Your data needs to be easily accessible via HDFS by next-generation analytics tools. Although you may use some familiar enterprise infrastructure products, you will need to manage and integrate them using modern rack-based architectures that manage and aggregate capacity via software. The infrastructure capacity may be provided by both on-premises systems and external service providers. Big Data projects are a great opportunity to deploy the infrastructure architectures you will use for the next several years, and many of my customers are standing up new dedicated infrastructure for their Big Data systems. What technology transformations are your Big Data projects driving?

