This series of blog posts has focused on the evolution of high-tech infrastructure in response to constantly evolving application workloads. In my last post I described how unstructured, metadata-rich application workloads drove the rise of Network-Attached Storage (NAS). The diagram below allowed me to highlight the differences between block and file system architectures.

Unstructured content benefits from being associated with metadata, and NAS systems provided the binding between the two. Many vendors took an approach that interspersed content and metadata within the disk array infrastructure. Block-based systems of that era, on the other hand, viewed all blocks as "content" and had no fundamental awareness of application metadata. The overlay below highlights this difference.
For example, applications wanted to:
The increased importance that these new workloads placed on metadata drove the industry to treat metadata as a first-class citizen. The "interspersal" technique used by most NAS devices did not lend itself to the new workloads. As a result, the industry evolved (yet again) in response to these new applications, which led to the rise of object-based storage systems.

Object-based systems allow applications to "attach" rich metadata to content and bind the two together via an object identifier. Under the covers, object-based storage systems were not constrained to intersperse the metadata and the content. The two could be stored as separate entities, which "freed" the metadata to be used in more diverse and beneficial ways. The content itself was also "freed" from its linkage to a specific directory, which enabled new levels of sharing and collaboration. Implementing object-based storage systems also gave vendors the opportunity to address other shortcomings that NAS-based systems were experiencing at the time, including maximum file sizes and file count limits.

The first object-based implementation was termed content-addressable storage, or CAS. Wikipedia defines CAS as a mechanism for storing information that can be retrieved based on its content, not its storage location. The diagram below highlights CAS function and operation in the context of one of the first CAS implementations (known as Centera):
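To make the idea of content addressing a little more concrete, here is a minimal Python sketch of a toy in-memory object store. It is only an illustration under stated assumptions, not a description of how Centera or any other product works: the class name `ContentAddressableStore`, its methods, and the choice of SHA-256 as the hash are all hypothetical. What it does show is the two properties described above: the object identifier is derived from the content rather than from a directory path, and metadata is kept apart from the content, bound to it only by that identifier.

```python
import hashlib


class ContentAddressableStore:
    """Toy in-memory object store (hypothetical, for illustration only)."""

    def __init__(self):
        self._content = {}   # object ID -> raw bytes
        self._metadata = {}  # object ID -> dict of application metadata

    def put(self, data: bytes, metadata: dict) -> str:
        # The object ID is computed from the content itself, not from a
        # storage location. SHA-256 is an assumption made for this sketch.
        object_id = hashlib.sha256(data).hexdigest()
        self._content[object_id] = data
        # Metadata lives in a separate map and is bound to the content
        # only through the object ID.
        self._metadata.setdefault(object_id, {}).update(metadata)
        return object_id

    def get(self, object_id: str) -> bytes:
        data = self._content[object_id]
        # Re-hash on read to confirm the content still matches its address.
        if hashlib.sha256(data).hexdigest() != object_id:
            raise ValueError("content does not match its address")
        return data

    def metadata(self, object_id: str) -> dict:
        # Return a copy so callers cannot mutate the stored metadata.
        return dict(self._metadata[object_id])

    def find(self, **criteria) -> list:
        # Metadata can be searched without reading any content.
        return sorted(
            oid
            for oid, md in self._metadata.items()
            if all(md.get(key) == value for key, value in criteria.items())
        )


store = ContentAddressableStore()
first = store.put(b"scan bytes", {"type": "x-ray"})
second = store.put(b"scan bytes", {"department": "radiology"})
assert first == second  # identical content maps to the same object ID
```

In this sketch, storing the same bytes twice yields a single object whose metadata accumulates, and `find(type="x-ray")` locates it without any notion of a directory.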
This approach caused a fundamental shift in application architectures, which enabled:
A third access pillar was added to the data center as a result of new application workloads. Many customers deployed all three access methods: block, file, and object. Capacity-based object workloads are depicted in the lower half of our workload framework. Some object-based workloads required high service levels (e.g., hospital applications), while others did not (e.g., YouTube).

With all three application access methods (block, file, and object) in use, data and metadata continued to grow unabated within customer data centers. This gave rise to a new problem: the growth of new forms of metadata related to the operation of the data center itself. I'll cover "The Rise of Metadata Part 2" in my next post.

Steve

EMC Fellow
