Channel: Blog | Dell

Workload Evolution and the Rise of Metadata


I have written several blog posts about the following hypothesis:

The evolution of application workflows is a major driver for innovation in IT infrastructures.

To prove this point, especially with regard to storage infrastructure, I have written several articles describing:

I spent a good part of my career writing software that handled application workloads for VNX (block-based I/O). In particular, I wrote a lot of software that implemented caching and RAID algorithms inside a cached disk array.

When handling application workloads (e.g., block read and write requests), I was dealing with raw bytes of data and rarely considered metadata.

In a recent conversation with EMC colleague Stephen Manley, we discussed the rise of metadata in the application workloads of the 1990s. More and more applications emerged that focused on managing collections of raw data blocks (otherwise known as files). By associating metadata with the raw content, applications realized the following benefits:

  • The ability to store data, and access it, in a more logical fashion.
  • The ability to easily share it by creating metadata with access rights.
  • The ability to protect content from unauthorized use.
  • The ability to create multiple copies (e.g. active, backup, and compliance copies) and to know where those copies are.
  • The ability to create workflows around the content and share it with the right people at the right time.
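To make these benefits concrete, here is a minimal Python sketch of metadata attached to raw content. The names (`FileMetadata`, `CopyRole`) and the "owner may do anything" access rule are hypothetical illustrations, not any NAS product's data model: a logical path covers logical storage and access, an access-control map covers sharing and protection, and a per-role list of locations tracks active, backup, and compliance copies.

```python
from dataclasses import dataclass, field
from enum import Enum


class CopyRole(Enum):
    ACTIVE = "active"
    BACKUP = "backup"
    COMPLIANCE = "compliance"


@dataclass
class FileMetadata:
    """Hypothetical metadata record describing one piece of raw content."""
    path: str    # logical name: store and access data in a logical fashion
    owner: str
    # user -> granted permissions: sharing and protection from unauthorized use
    acl: dict[str, set[str]] = field(default_factory=dict)
    # copy role -> storage locations: know where every copy lives
    copies: dict[CopyRole, list[str]] = field(default_factory=dict)

    def share(self, user: str, *permissions: str) -> None:
        """Grant permissions by adding access-rights metadata."""
        self.acl.setdefault(user, set()).update(permissions)

    def can(self, user: str, permission: str) -> bool:
        """Assumed rule: the owner may do anything; others need an ACL entry."""
        return user == self.owner or permission in self.acl.get(user, set())

    def add_copy(self, role: CopyRole, location: str) -> None:
        """Record where a copy with the given role is stored."""
        self.copies.setdefault(role, []).append(location)


if __name__ == "__main__":
    meta = FileMetadata(path="/radiology/patient42/xray-001.dcm", owner="alice")
    meta.share("bob", "read")
    meta.add_copy(CopyRole.ACTIVE, "nas01:/vol1")
    meta.add_copy(CopyRole.BACKUP, "backup-site:/vol7")
    print(meta.can("bob", "read"))    # True
    print(meta.can("bob", "write"))   # False
    print(meta.can("carol", "read"))  # False
```

Note that none of this touches the content bytes themselves; every capability in the list comes from the record that travels alongside them.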

One of the most significant innovations in response to the metadata trend was NAS: Network-Attached Storage. NAS is predominantly about metadata management, or, as Stephen Manley likes to say, "knowing information about the data you are storing so you can do really cool things with it". The value is in the metadata, which governs access to the content.

In the same way that application workloads drove the industry from physical disk drives to cached disk arrays, the rise of metadata drove the industry from local file systems to network-attached, shared storage. An example side-by-side view of block and file storage highlights this point.

[Figure: side-by-side view of Block and NAS storage]
The deployment of block and file I/O systems in customer data centers eventually drove the industry toward unified architectures supporting both block and file. One notable innovation that resulted was a hybrid approach known as MPFS, which allowed an application workload to write using a file protocol but transparently read using the BLOCK protocol. For example, this approach provided a 3-4x performance increase over traditional file system techniques (the industry eventually incorporated the idea into a standard approach called pNFS).
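The hybrid read path can be sketched roughly as follows. This is a toy model under stated assumptions, not the MPFS or pNFS wire protocol: a hypothetical `MetadataServer` plays the file side and hands out a file's size and block map (pNFS calls a similar map a "layout"), and the client then reads those blocks directly from a shared `BlockDevice`. The block size and all class and function names are assumptions made for illustration.

```python
from dataclasses import dataclass

BLOCK_SIZE = 4096  # assumed block size in bytes


@dataclass(frozen=True)
class Extent:
    start_block: int  # first block on the shared device
    num_blocks: int   # number of contiguous blocks


class BlockDevice:
    """Stand-in for a shared LUN: a flat array of fixed-size blocks."""

    def __init__(self, num_blocks: int) -> None:
        self.data = bytearray(num_blocks * BLOCK_SIZE)

    def read_blocks(self, start: int, count: int) -> bytes:
        off = start * BLOCK_SIZE
        return bytes(self.data[off:off + count * BLOCK_SIZE])

    def write_blocks(self, start: int, payload: bytes) -> None:
        off = start * BLOCK_SIZE
        self.data[off:off + len(payload)] = payload


class MetadataServer:
    """Hypothetical file-side server: owns names, sizes, and block maps."""

    def __init__(self) -> None:
        self.layouts: dict[str, tuple[int, list[Extent]]] = {}

    def create(self, path: str, size: int, extents: list[Extent]) -> None:
        self.layouts[path] = (size, extents)

    def get_layout(self, path: str) -> tuple[int, list[Extent]]:
        return self.layouts[path]


def read_file(mds: MetadataServer, dev: BlockDevice, path: str) -> bytes:
    """Metadata over the 'file' path, bulk data directly over the 'block' path."""
    size, extents = mds.get_layout(path)  # one small metadata request
    chunks = [dev.read_blocks(e.start_block, e.num_blocks) for e in extents]
    return b"".join(chunks)[:size]        # trim padding in the last block


if __name__ == "__main__":
    dev = BlockDevice(num_blocks=16)
    mds = MetadataServer()
    content = b"x" * 5000  # spans two non-contiguous blocks
    dev.write_blocks(3, content[:BLOCK_SIZE])
    dev.write_blocks(9, content[BLOCK_SIZE:])
    mds.create("/scans/img1", len(content), [Extent(3, 1), Extent(9, 1)])
    print(read_file(mds, dev, "/scans/img1") == content)  # True
```

The design point in this sketch is that only the small layout lookup goes through the file server, while the bulk data moves over the block path; that split is one plausible source of the performance gain described above.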

Driven by a surge in unstructured content, the NAS market exploded. In some cases, application workloads began to exceed the capacities (and capabilities) of file system technology, pushing the industry toward a new paradigm: OBJECT.

Applications wanted to associate ever-increasing amounts of metadata with their content, which stressed the existing approach of interspersing metadata with content. These workloads pushed the industry deeper into capacity-oriented territory, which required further innovation in IT infrastructure. The diagram below highlights this push down the Y-axis.

[Diagram: Capacity-Oriented Infrastructure]

Workloads pushed capacity-oriented infrastructure in two directions (as highlighted by the diagram above). Some applications began storing massive amounts of metadata and content with high service levels (e.g., fast, highly available X-ray retrieval during a hospital procedure), while others had less rigid availability and/or performance requirements (think YouTube videos).

In either case, application workloads wanted to do more, and better, things with their metadata. This phenomenon gave rise to a new class of storage system: object-based storage.
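A minimal sketch of the object model, assuming a flat namespace, content-derived IDs (SHA-256 here, one common choice among several), and free-form key/value metadata stored with each object rather than interspersed in a directory tree. `ObjectStore` and its methods are hypothetical and do not correspond to any particular product's API.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    content: bytes
    metadata: dict[str, str] = field(default_factory=dict)


class ObjectStore:
    """Toy object store: flat namespace, rich per-object metadata."""

    def __init__(self) -> None:
        self._objects: dict[str, StoredObject] = {}

    def put(self, content: bytes, **metadata: str) -> str:
        # Assumed ID scheme: hash of the content. Re-putting identical
        # content yields the same ID and replaces its metadata.
        object_id = hashlib.sha256(content).hexdigest()
        self._objects[object_id] = StoredObject(content, dict(metadata))
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id].content

    def metadata(self, object_id: str) -> dict[str, str]:
        return dict(self._objects[object_id].metadata)  # defensive copy

    def query(self, **criteria: str) -> list[str]:
        """Return IDs of objects whose metadata matches every key=value pair."""
        return [
            oid for oid, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]


if __name__ == "__main__":
    store = ObjectStore()
    xray = store.put(b"<x-ray image bytes>", patient="42", modality="xray", tier="fast")
    video = store.put(b"<video bytes>", kind="video", tier="capacity")
    print(store.query(modality="xray") == [xray])  # True
    print(store.metadata(video)["tier"])           # capacity
```

Queries such as `store.query(modality="xray")` illustrate how an application can act on its metadata directly, without walking a file hierarchy to find the content it cares about.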

Steve

http://stevetodd.typepad.com

Twitter: @SteveTodd

EMC Fellow

 
