Last week, EMC announced the 2014 EMC Digital Universe, a study with IDC that details what the Digital Universe will look like in 2020 and the opportunities it presents. One of the interesting challenges coming out of the study is the need to “tag” data in order to understand it in the context in which it was captured or gathered. The study predicts that by 2020, a growing majority of new data being generated will be unstructured. That means that, more often than not, we will know little about the data unless it is somehow characterized or tagged, a practice that results in metadata.

Tagging Lessons Learned From Web Analytics

Tagging is a concept with which most web analytics users are familiar. Tagging is a method of tracking visitor activity on each page of a website (see Figure 1).

Figure 1: Web analytics tagging process

As each web page is requested, the web server returns the HTML page with the embedded JavaScript page code. The JavaScript page code sets the values for the analytics data that you are collecting and calls functions and global variables in the JavaScript library file. The JavaScript code then builds an image request for a 1×1 pixel image, also called a web beacon, and appends a query string of name/value pairs of analytics data. That request is sent to a data center for reporting and analysis.

The advantages of tagging include:
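Returning to the beacon request in Figure 1, the query-string step can be sketched in a few lines. This is only an illustration, written in Python rather than the JavaScript a real page tag would use. The endpoint URL and the field names (`page`, `referrer`, `visitor_id`) are made up for the example and do not come from any particular analytics product.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical collection endpoint; a real analytics vendor supplies its own.
BEACON_ENDPOINT = "https://collector.example.com/pixel.gif"


def build_beacon_url(analytics, endpoint=BEACON_ENDPOINT):
    """Append name/value pairs as the query string of a 1x1 image request."""
    return f"{endpoint}?{urlencode(analytics)}"


def parse_beacon_url(url):
    """Recover the name/value pairs, as a collecting data center might."""
    return dict(parse_qsl(urlsplit(url).query))


# Illustrative values a page tag might set; the field names are assumptions.
page_data = {
    "page": "/products/widgets",
    "referrer": "https://www.example.com/",
    "visitor_id": "abc123",
}

url = build_beacon_url(page_data)
print(url)
print(parse_beacon_url(url))
```

The image itself carries no information. The analytics data travels entirely in the query string, which is why the collecting side only needs to parse the requested URL.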
There’s Gold In Them Thar Hills of Metadata!!

Sometimes it’s hard to imagine what metadata is and why it’s important. Let’s look at an example of the metadata associated with a 140-character tweet. At 140 characters, a tweet wouldn’t seem to be much data, even with a voluminous number of tweets. However, data volumes explode when you start coupling the tweet with all the metadata necessary to understand those 140 characters in the context of the conversation (see Figure 2).

Figure 2: Metadata associated with a tweet

Here is some of the metadata associated with a 140-character tweet[1]:
It’s easy to see how the volume of metadata quickly dwarfs the amount of raw data. This is what happens when organizations start tagging more of their transactions and interactions in order to gain additional insight into the nature and context of the dialogue and interaction.

Untapped Data Examples

Not all data is necessarily useful for Big Data analytics. However, some data types are particularly ripe for analysis, such as:
These are in addition to the normal transactional data running through enterprise systems in the course of day-to-day data processing.

Summary

The IDC study states that from 2013 to 2020, the digital universe will grow by a factor of 10, from 4.4 trillion gigabytes to 44 trillion gigabytes. However, the IDC study estimates that only 3% of the potentially useful data will be tagged. Call this the Big Data gap: information that is untapped and waiting for enterprising digital explorers to extract the value hidden within it. The bad news is that tagging all of these new data sources will take extra work and investment. The good news is that, as the digital universe expands, so does the amount of useful data it contains, along with the invaluable insights about your customers, products, markets, and operations that can be used to optimize key business processes and uncover new monetization opportunities.
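As a back-of-the-envelope check on the growth figures quoted in the summary, the tenfold increase over seven years can be converted into an implied annual growth rate. The sketch below uses only the two volumes and the two years stated above. Smooth compound growth is an assumption made here for illustration, not something the study is quoted as saying.

```python
# Volumes quoted in the summary, in gigabytes.
start_gb = 4.4e12  # 2013
end_gb = 44e12     # 2020
years = 2020 - 2013

growth_factor = end_gb / start_gb

# Assumes smooth compound growth, which is a simplification.
annual_rate = growth_factor ** (1 / years) - 1

print(f"{growth_factor:.0f}x overall, about {annual_rate:.0%} per year")
```

Under that assumption, the digital universe would need to grow by roughly 39% every year to reach the 2020 figure.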
