Unstructured Data Management: Finding Meaning in Unearthed Dark Data

Archaeologists dig up the earth looking for items from yesteryear; they find raw, untouched data (ecofacts, artifacts, architecture, tombs) and analyze it, hoping to uncover a snippet of past cultures or some buried treasure.

This is not unlike how organizations approach unstructured data, or dark data. Dark data is a pooled set of untapped facts, documents, and media that sits undisturbed in storage until we dig into it, hoping to find the valuable gems amid the clutter that can give us opportunities for prediction and help us better understand the culture, strategy, or bottom line of our enterprise.

Dark Data Is Appearing – and Disappearing – at an Alarming Rate

With all the digitization of data we’ve seen since late in the 20th century, we’ve got a data flood on our hands: In 2012 alone, we created 2.5 quintillion bytes of data per day. That number has continued to grow at unprecedented rates since then. It’s estimated that in the next decade, a whopping 90 percent of all the data created will be unstructured. Much of it is likely to end up as dark data, which Gartner defines as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes.”

A subset of this unstructured data will soon come from the growing Internet of Things (IoT). According to Randy Bean at MIT Sloan Management Review, the share of data generated by “things” is projected to grow from 2 percent in 2013 to 10 percent in 2020. Real-time data from the IoT will add a new dimension to the management of unstructured data, as the enterprise will need to find ways to use, process, and analyze this device-generated data as it occurs.

Finding and Using Dark Data Before It’s Archived

Unlike its structured counterpart, dark data is not organized in a predefined relational database model. It’s variable and rich, and includes word processing documents, social media posts, images, presentations, and emails. The majority might be digital noise, but by linking unstructured and structured data, there is real opportunity to make sense of this vast amount of information and unearth new intelligence.

Before we go on a digging expedition, however, it’s important to establish a system that can help your business analyze and create context around your dark data. An archaeologist only begins excavation once he or she formalizes an objective and surveys the land. Find out how much data you have and where it is. Find out what types of data you have. Find out what types of data should be destroyed, kept for further analysis, or migrated to a less expensive facility.
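To make that survey step concrete, here is a minimal sketch in Python of what a first pass over a file share might look like. It is an illustration only, not a description of any tooling used at Intel: the function names `survey` and `triage`, the age thresholds, and the three suggested actions are all hypothetical, and a real retention decision would depend on your own policies rather than on file age alone.

```python
import os
import time
from collections import defaultdict
from pathlib import Path


def survey(root):
    """Walk `root` and total the file count and bytes per file extension."""
    totals = defaultdict(lambda: {"files": 0, "bytes": 0})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                size = path.stat().st_size
            except OSError:
                continue  # skip files that vanish or cannot be read
            # Group by lowercase extension; files without one share a bucket.
            ext = path.suffix.lower() or "(none)"
            totals[ext]["files"] += 1
            totals[ext]["bytes"] += size
    return dict(totals)


def triage(path, now=None, stale_days=365, purge_days=7 * 365):
    """Suggest an action from a file's last-modified age.

    The thresholds are hypothetical placeholders, not recommendations.
    """
    now = time.time() if now is None else now
    age_days = (now - Path(path).stat().st_mtime) / 86400
    if age_days > purge_days:
        return "review for destruction"
    if age_days > stale_days:
        return "migrate to cheaper storage"
    return "keep for analysis"
```

Calling `survey` on a shared drive answers "how much data do we have, and of what types," while `triage` gives a rough first sort into the destroy, keep, or migrate buckets described above.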

At Intel, we’ve employed a multi-platform strategy for analyzing different data types, including an enterprise data warehouse (EDW) platform, an Apache Hadoop platform, and a low-cost massively parallel processing (MPP) platform. The Apache Hadoop platform is designed to process big batches of unstructured data. Hadoop clusters work well with unstructured data because they act as a cheap storage repository where potentially valuable data can be kept until a strategy can be implemented for its use.

While mining unstructured data can be a costly venture, it can deliver incredible value by pointing to trends that can cut costs, boost productivity, improve your ROI, and ultimately give you deeper insight into your organization.

For more resources on big data and predictive analytics, click here.

To continue this conversation, or to react to the topic, connect with me at @chris_p_intel or use #ITCenter.