Importance of Metadata in a Big Data World

Last week, EMC announced the 2014 EMC Digital Universe, a study with IDC that details what the Digital Universe will look like in 2020 and the opportunities it presents. One of the interesting challenges coming out of the study was the need to “tag” data in order to understand it in the context in which it was captured or gathered. The study predicts that by 2020, a growing majority of new data being generated will be unstructured. That means that, more often than not, we will know little about the data, unless it is somehow characterized or tagged—a practice that results in metadata.

Tagging Lessons Learned From Web Analytics

Tagging is a concept with which most web analytics users are familiar. Tagging is a method of tracking visitor activity on each page of the website (see Figure 1).

Figure 1: Web analytics tagging process

As each web page is requested, the web server returns the HTML page with the embedded JavaScript page code. The JavaScript page code sets the values for the analytics data that you are collecting and calls functions and global variables in the JavaScript library file. The library code then builds a request for a 1×1 pixel image, also called a web beacon, and appends a query string of name/value pairs of analytics data. That request is sent to a data center for reporting and analysis.
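To make the beacon step concrete, here is a minimal sketch of how name/value pairs end up in the query string of an image request. In practice this runs as JavaScript in the browser; Python is used here only for illustration. The endpoint URL, the function name, and the field names are all hypothetical, not any vendor's actual API.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical collection endpoint; a real analytics vendor supplies its own.
BEACON_ENDPOINT = "https://collector.example.com/pixel.gif"


def build_beacon_url(analytics: dict) -> str:
    """Append the analytics name/value pairs to the 1x1 image URL as a query string."""
    return f"{BEACON_ENDPOINT}?{urlencode(analytics)}"


# Hypothetical values the page code might set before the beacon fires.
page_data = {
    "page": "/products/widget",
    "event": "pageview",
    "screen": "1920x1080",
    "price": "19.99",
}

beacon_url = build_beacon_url(page_data)

# The data center reverses the process to recover the name/value pairs.
received = {name: values[0] for name, values in parse_qs(urlsplit(beacon_url).query).items()}

print(beacon_url)
print(received)
```

The image itself carries no information. Everything of interest travels in the query string, which is why a single tiny request can report on so much.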

The advantages of tagging include:

Data is gathered via a component (“tag”) in the page, usually written in JavaScript, Java or Flash.
The script may have access to additional information on the web client or on the user, not sent in the query, such as visitors’ screen sizes and the price of the goods they purchased.
Tagging can report on events that do not involve a request to the web server, such as interactions within Flash movies, partial form completion, and browser events such as onClick, onMouseOver, onFocus, and onBlur.
The tagging service manages the process of assigning cookies to visitors.

There’s Gold In Them Thar Hills of Metadata!!

Sometimes it’s hard to imagine what metadata is and why it’s important. Let’s look at an example of the metadata associated with a 140-character tweet. 140 characters wouldn’t seem to be much data, even with a voluminous number of tweets. However, data volumes explode when you start coupling the tweet with all the metadata necessary to understand those 140 characters in the context of the conversation (see Figure 2).

Figure 2: Metadata associated with a tweet

Here is some of the metadata associated with a 140-character tweet[1]:

The screen name and user ID of the “replied to tweet” author
Tweet’s creation date and time
The author’s screen name
The author’s user name
The author’s biography
The author’s URL
The author’s location
Rendering information for the author
Account creation date
Number of favorites this user has
Number of users this user is following
Time zone and offset for this user
User’s selected language
Whether the user is protected or not
Number of followers for this user
Place ID
Printable name for this place
Type of place
The country for this place
The application that sent the tweet

It’s easy to see how the volume of metadata quickly dwarfs the amount of raw data. The same thing happens when organizations start tagging more of their transactions and interactions in order to gain additional insight into the nature and context of each dialogue and interaction.
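A rough way to see the effect is to compare the size of a tweet's text with the size of the metadata that travels with it. The sketch below uses an invented tweet. The field names loosely follow the list above but are assumptions, not Twitter's actual schema, and every value is made up.

```python
import json

# Hypothetical tweet; field names and values are invented for illustration.
tweet = {
    "text": "Reading the new Digital Universe study. Tagging data is the hard part.",
    "created_at": "2014-04-15 12:00:00",
    "in_reply_to": {"screen_name": "example_reply", "user_id": 1001},
    "user": {
        "screen_name": "example_author",
        "name": "Example Author",
        "description": "Writes about data and analytics.",
        "url": "https://example.com",
        "location": "Somewhere",
        "created_at": "2009-01-01 00:00:00",
        "favourites_count": 12,
        "friends_count": 340,
        "followers_count": 560,
        "time_zone": "Pacific Time",
        "utc_offset": -28800,
        "lang": "en",
        "protected": False,
    },
    "place": {
        "id": "abc123",
        "name": "Example City",
        "place_type": "city",
        "country": "Exampleland",
    },
    "source": "web",
}

# Size of the message itself.
text_bytes = len(tweet["text"].encode("utf-8"))

# Size of everything except the message, serialized as JSON.
metadata = {key: value for key, value in tweet.items() if key != "text"}
metadata_bytes = len(json.dumps(metadata).encode("utf-8"))

print(f"text: {text_bytes} bytes, metadata: {metadata_bytes} bytes")
print(f"metadata is {metadata_bytes / text_bytes:.1f}x the size of the text")
```

The exact ratio depends on the tweet and on how the metadata is serialized, but even this trimmed-down example carries several times more context than content.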

Untapped Data Examples

Not all data is necessarily useful for Big Data analytics. However, some data types are particularly ripe for analysis, such as:

Surveillance footage. Typically, generic metadata (date, time, location, etc.) is automatically attached to a video file. However, as IP cameras continue to proliferate, there is greater opportunity to embed more intelligence into the camera (on the edge) so that footage can be captured, analyzed, and tagged in real time. This type of tagging can expedite criminal investigations, enhance retail Big Data analytics for consumer traffic patterns, and improve military intelligence as videos from drones across multiple geographies are compared for pattern correlations, crowd emergence and response, or measuring the effectiveness of counterinsurgency.
Embedded and medical devices. In the future, sensors of all types (including those that may be implanted into the body) will capture vital and non-vital biometrics, track medicine effectiveness, correlate bodily activity with health, monitor potential outbreaks of viruses, and more, all in real time.
Entertainment and social media. Trends based on crowds or massive groups of individuals can be a great source of Big Data to help bring to market the “next big thing,” help pick winners and losers in the stock market, and yes, even predict the outcome of elections—all based on information users freely publish through social outlets.
Consumer images. We say a lot about ourselves when we post pictures of ourselves or our families and friends. A picture used to be worth a thousand words, but the advent of Big Data has introduced a significant multiplier. The key will be the introduction of sophisticated tagging algorithms that can analyze images either in real time, as pictures are taken or uploaded, or en masse after they are aggregated from various websites.

These are in addition to the transactional data that already runs through enterprise systems in the course of normal data processing today.

Summary

The IDC study states that from 2013 to 2020, the digital universe will grow by a factor of 10, from 4.4 trillion gigabytes to 44 trillion gigabytes. However, the IDC study estimates that only 3% of the potentially useful data will be tagged.
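To put that growth in annual terms, the short calculation below derives the compound growth rate implied by the two endpoints quoted from the study. The only assumption is that growth is spread evenly across the seven years, which the study's headline figures do not themselves guarantee.

```python
import math

# Endpoints quoted from the study, in trillions of gigabytes.
start_size = 4.4   # 2013
end_size = 44.0    # 2020
years = 2020 - 2013

# Overall growth factor across the period.
growth_factor = end_size / start_size

# Implied compound annual growth rate, assuming even growth each year.
annual_rate = growth_factor ** (1 / years) - 1

# Time for the digital universe to double at that rate.
doubling_years = math.log(2) / math.log(1 + annual_rate)

print(f"growth factor: {growth_factor:.1f}")
print(f"implied annual growth: {annual_rate:.1%}")
print(f"doubling time: {doubling_years:.1f} years")
```

Under that assumption the digital universe grows by roughly 39% a year, which works out to a doubling about every two years.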

Call this the Big Data gap: information that is untapped and waiting for enterprising digital explorers to extract the value hidden within it. The bad news is that it will take extra work and investment to tag all of these new data sources. The good news is that, as the digital universe expands, so does the amount of useful data it contains, along with the invaluable insights about your customers, products, markets, and operations that can be used to optimize key business processes and uncover new monetization opportunities.

[1] http://readwrite.com/2010/04/19/this_is_what_a_tweet_looks_like#awesm=~ozsgVzDdual4FS
