The Internet of Things – Driving the Need for Data Reconciliation

Part 1: The challenge

In a recent engagement with a global company, I came face-to-face with a relatively new problem in Internet of Things analytics that organizations around the world are wrestling with: the reconciliation of diverse types of data that stream into edge devices and corporate data centers from the Internet of Things (IoT). Reconciliation enables diverse datasets to speak a common language that algorithms can understand. Without it, the data from the IoT can amount to a Tower of Babel, with algorithms unable to make sense of valuable, diverse data streams.

Data reconciliation wasn’t much of an issue in years past, when the IoT was in its infancy, because most data streams were processed by a single, dedicated application. Now that we combine data streams to create new analytical value from IoT systems, reconciliation is becoming an ever-growing beast of a problem. Gartner, Inc. forecasts that by the end of this year, 6.4 billion connected things will be in use worldwide, up 30 percent from 2015. By 2020, that number will reach 20.8 billion.[1] To generate value from all of those connected devices, we have to find ways to enable different datasets to speak a common language.

In the case of the global company I met with recently, dozens of business units are collecting massive amounts of data from connected devices, ranging from simple home products to sophisticated manufacturing process automation systems. We’re talking about perhaps 1,000 different device types. The data generated by those devices now sits in silos scattered around the world, and it is in all kinds of formats, from unstructured to semi-structured to neatly structured.

To gain the maximum value from all of that diverse data, the company wants to bring it all onto a single, unified, logical platform that can integrate and reconcile the different datasets. This logical consolidation of data opens the door to extracting value via powerful analytics applications.

Part 2: A look at the reconciliation challenge

When you want to reconcile data, the first thing you have to do is bring the data together, either physically or logically. That is, you can create a data lake, a repository that consolidates different data onto a common physical platform, or you can create a federated solution that allows the data to remain in different silos and come together on a single logical platform.

Once you’ve found a way to connect your data, you can push forward with reconciliation. Reconciliation of data is a process that makes associations between multiple pieces of data that mean the same thing, even though they may be expressed in different ways. Among data scientists, this is referred to as “semantic mapping.”

Let’s take a simple example: two labs that do blood glucose tests might have different ways of coding and reporting the exact same measurements. The reconciliation process identifies places where data that is expressed differently means the same thing and then transforms the data so it can be used collectively in a single calculation.
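To make the blood glucose example concrete, here is a minimal sketch in Python. Everything in it is assumed for illustration: the lab names, the local codes ("GLU", "glucose_mmol"), the shared concept name, and the choice of mg/dL as the common unit. The conversion factor assumes a glucose molar mass of roughly 180.16 g/mol. It is not a description of any particular product or of the labs' real coding systems.

```python
from dataclasses import dataclass

# Assumption: glucose molar mass is about 180.16 g/mol,
# so 1 mmol/L corresponds to roughly 18.016 mg/dL.
MGDL_PER_MMOLL = 18.016

# Hypothetical semantic map: (lab, local code) -> (shared concept, factor to mg/dL).
SEMANTIC_MAP = {
    ("lab_a", "GLU"): ("blood_glucose", 1.0),                      # already in mg/dL
    ("lab_b", "glucose_mmol"): ("blood_glucose", MGDL_PER_MMOLL),  # reported in mmol/L
}


@dataclass(frozen=True)
class Reading:
    concept: str        # shared vocabulary term
    value_mg_dl: float  # value expressed in the common unit


def reconcile(lab: str, code: str, value: float) -> Reading:
    """Map a lab-specific code and unit onto the shared concept and unit."""
    concept, factor = SEMANTIC_MAP[(lab, code)]
    return Reading(concept, round(value * factor, 1))


# Two differently coded measurements can now feed a single calculation.
readings = [
    reconcile("lab_a", "GLU", 99.0),
    reconcile("lab_b", "glucose_mmol", 5.5),
]
mean_mg_dl = sum(r.value_mg_dl for r in readings) / len(readings)
```

The lookup table is the semantic mapping; the multiplication is the transformation step. In practice, the mapping would be far larger and would likely be maintained by domain experts rather than hard-coded.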

Part 3: A look at the data exchange layer

In cases of federated solutions, in which data remains in silos and is aggregated on a logical platform, we need to add another dimension: the data exchange layer. Data exchange technology provides a standardized way for different systems and data sources to talk to each other. Furthermore, a data exchange makes it possible to perform specific analytics on the combined data without exposing the underlying data to anyone other than its originating owner. This allows organizations to draw insights from the complete picture created by aggregated data, regardless of its underlying proprietary architecture, while protecting the privacy and security of the data at each site.
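As a rough illustration of analytics on combined data without moving the raw data, the Python sketch below has each site return only a count and a sum, which a coordinator combines into an overall mean. The site names, values, and function names are hypothetical. The sketch shows only the federated pattern; it does not model the security guarantees a real data exchange layer would have to provide.

```python
from typing import Dict, List, Tuple

# Hypothetical silos. In a real deployment each list would live at its own site.
SITE_DATA: Dict[str, List[float]] = {
    "site_a": [98.0, 102.0, 110.0],
    "site_b": [95.0, 105.0],
}


def local_summary(site: str) -> Tuple[int, float]:
    """Runs at the site and returns only (count, sum), never the raw rows."""
    values = SITE_DATA[site]
    return len(values), sum(values)


def federated_mean(sites: List[str]) -> float:
    """Combine per-site summaries into a mean over all sites."""
    total_count = 0
    total_sum = 0.0
    for site in sites:
        count, subtotal = local_summary(site)
        total_count += count
        total_sum += subtotal
    return total_sum / total_count


overall_mean = federated_mean(["site_a", "site_b"])
```

The coordinator sees five values' worth of information as two small summaries. Note that summaries over very small groups can still reveal individual values, which is one reason the set of permitted analytics needs to be controlled.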

That’s exactly what we are doing with the Collaborative Cancer Cloud (CCC), a precision medicine analytics platform that allows hospitals and research institutions to securely share patient genomic, imaging, and clinical data for potentially lifesaving discoveries in cancer care. Thanks to the capabilities of the data exchange layer, based on Intel technologies such as Intel® Software Guard Extensions, the CCC allows researchers to run analytics queries on confidential data in multiple federated genomic data sets in academic research centers without requiring any of these research centers to share the underlying data, or exposing any information about the patients whose data is being studied. The data exchange layer understands what data sources are in the CCC and who is allowed to run queries on them. In fact, the set of queries that are allowed to be run must be agreed upon by all parties and is strictly managed by the Intel secure computing technology.
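The access rules described above can be sketched as a simple check: a query runs only if every participating party has approved it and the requester is allowed to query the source in question. This is a hypothetical Python illustration with made-up party, query, and user names. It is not the CCC's implementation, and it enforces nothing in hardware.

```python
from typing import Dict, Set

# Hypothetical participants in the federation.
PARTIES: Set[str] = {"center_a", "center_b", "center_c"}

# Hypothetical record of which parties approved each query type.
APPROVALS: Dict[str, Set[str]] = {
    "variant_frequency": {"center_a", "center_b", "center_c"},
    "export_raw_records": {"center_a"},
}

# Hypothetical record of which data sources each user may query.
ACCESS: Dict[str, Set[str]] = {
    "researcher_1": {"center_a", "center_b"},
}


def is_permitted(user: str, query: str, source: str) -> bool:
    """Allow a query only if all parties approved it and the user may query the source."""
    approved_by_all = APPROVALS.get(query, set()) >= PARTIES
    user_allowed = source in ACCESS.get(user, set())
    return approved_by_all and user_allowed
```

For example, `is_permitted("researcher_1", "variant_frequency", "center_a")` is true, while the same user asking for `"export_raw_records"` is refused because only one party approved that query.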

The power of data exchange goes far beyond research on the genetic causes of cancer. It creates an environment that can significantly increase the accessibility of data to all practitioners of advanced analytics. Data exchange offers a secure, efficient way to build analytical value around combined data sets. This puts advanced data-driven methods, including deep learning, within reach of many organizations that would otherwise be unable to generate significant value from their limited data sets.

Part 4: Integrating IoT data

Let’s get back to the global company that I mentioned at the outset of this post. The technologies are now available to enable the company to bring together datasets from silos scattered around the world and make them accessible via a combination of an integrated cloud data center and a logical federated platform. Moreover, the technologies now exist to enable the reconciliation of IoT data and to control access to that data via an integrated end-to-end data security model.

This is all good news for any organization that seeks to capitalize on massive amounts of data generated by connected devices on the Internet of Things. With the right systems and strategies in place, you can now put that data to work to create business value—and avoid turning your version of the IoT into a Tower of Babel.

If you’d like to learn more or have a conversation on data validation best practices, feel free to reach out on LinkedIn or follow me @ScientistBob.



[1] Gartner press release. “Gartner Says 6.4 Billion Connected "Things" Will Be in Use in 2016, Up 30 Percent From 2015.” November 10, 2015.