Big Data Doesn’t Have to Be Big, If Done Right

By Brian Womack, Director, IPAG Data Analytics & Machine Learning at Intel

To meet demanding service-level agreements, data center operators need to aggregate and analyze telemetry data from many heterogeneous sources across the data center. These are essential capabilities in the effort to efficiently and adaptively manage SLA-related resources in a software-defined infrastructure environment.

This analytics work, of course, can’t be done with manual processes in a data center that has thousands, or tens of thousands, of servers with potentially millions of components that generate telemetry data. It requires the use of automated tools that capture and leverage telemetry data from processors, memory subsystems, storage, and fabric resources. Telemetry data enables the analytics, visualization, classification, and prediction capabilities that, in turn, drive efficient and adaptive SLA management.

That’s the way things should be, anyway. The reality in today’s data centers is something else. The use of telemetry data for SLA management is hindered by tools that are oriented toward the metrics of the past. For example, many telemetry APIs provide only a partial view of cluster resources without analytics for classification or prediction in mind. And today’s tools often provide insufficient metrics for SLA performance tracking and prediction.

In a related challenge, there are no set standards for the way different platforms present telemetry data. Telemetry data from different sources, such as the Microsoft Hyper-V or VMware vSphere ESXi hypervisors, is expressed in different units of measure, resolutions, and naming conventions. The result is the telemetry equivalent of apples-and-oranges comparisons, and a time-consuming challenge for analytics developers in a heterogeneous data center.

These challenges create the need for a new approach that enables more efficient and adaptive SLA management. That’s the goal of Data Center Telemetry Analytics (DCTA).

DCTA leverages sophisticated capabilities like hierarchical telemetry analytics and distributed compression to take SLA management to a new level. Put simply, we’re talking about moving primary analytics operations close to the source of the raw telemetry data and doing the initial analysis there. The results, a greatly condensed summary of the raw data that preserves its key attributes, are then sent to a central DCTA system for analysis in the context of telemetry data gathered from across the data center.

The ability to compact large amounts of raw data into a summary form is a hugely important capability. It’s the equivalent of condensing a room full of vapor into a coffee cup of liquid. This data condensation greatly reduces the overhead of processing and transmitting enormous volumes of telemetry data across a data center fabric whose capacity should be serving paying application customers. The ability to tame telemetry data at its source in a mission-driven manner proves the proposition that, when done right, big data doesn’t have to be all that big.
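To make the idea of condensing data at the source concrete, here is a minimal Python sketch. It is not DCTA's actual compression method, which this post does not specify. It assumes a simple stand-in: each server reduces its raw samples to a fixed-size statistical summary (count, mean, spread, minimum, maximum), and a central system merges those summaries without ever seeing the raw samples. The names Summary, summarize, and merge, and the sample values, are hypothetical.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Summary:
    """Fixed-size digest of a stream of samples (hypothetical structure)."""
    count: int
    mean: float
    m2: float        # sum of squared deviations from the mean
    minimum: float
    maximum: float

    @property
    def variance(self) -> float:
        """Population variance recovered from the digest."""
        return self.m2 / self.count if self.count else 0.0


def summarize(samples):
    """Condense raw samples into a Summary using Welford's one-pass update."""
    count, mean, m2 = 0, 0.0, 0.0
    lo, hi = math.inf, -math.inf
    for x in samples:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
        lo, hi = min(lo, x), max(hi, x)
    return Summary(count, mean, m2, lo, hi)


def merge(a, b):
    """Combine two summaries without revisiting the raw data they came from."""
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    n = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / n
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / n
    return Summary(n, mean, m2,
                   min(a.minimum, b.minimum), max(a.maximum, b.maximum))


# Two hypothetical servers each reduce their own readings locally...
node_a = summarize([40.0, 42.0, 44.0])
node_b = summarize([60.0, 62.0])
# ...and only the two small summaries travel to the central system.
cluster = merge(node_a, node_b)
```

Because merge can be applied again to its own output, the same step works at every level of a hierarchy (server to rack, rack to data center), and the size of what crosses the fabric stays constant no matter how many raw samples each level has seen.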

The key to DCTA lies in a telemetry ontology that is both normalized and hierarchical. Here we’re talking about establishing one data schema and one interface for analytics developers that share the same units of measure for time, space, and domain. This unified interface, which provides significantly more context over time, simplifies and accelerates the work of analytics developers while enabling apples-to-apples comparisons of activities across a data center, a large enterprise, or an entire nation.
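As a rough illustration of what one schema with shared units could look like, the Python sketch below maps readings from two differently behaved sources onto a single record type. Everything in it is an assumption made for the example: the source labels (source_a, source_b), the native metric names, the native units, and the Reading fields are invented, and do not reflect the DCTA schema or any real hypervisor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One record in the unified schema (field names are illustrative)."""
    host: str
    metric: str          # canonical metric name
    value: float         # always expressed in the canonical unit
    unit: str
    timestamp_s: float   # seconds since the Unix epoch


# Per-source mapping: native name -> (canonical name, canonical unit, scale).
# The sources, metric names, and native units below are made up.
MAPPINGS = {
    # source_a reports memory in KiB
    "source_a": {"mem.used.kib": ("memory_used", "bytes", 1024.0)},
    # source_b reports memory in MiB
    "source_b": {"mem_used_mib": ("memory_used", "bytes", 1024.0 * 1024.0)},
}

# Divisor that converts each source's native timestamp into seconds
# (source_a is assumed to use milliseconds, source_b seconds).
TIME_DIVISOR = {"source_a": 1000.0, "source_b": 1.0}


def normalize(source, host, name, value, timestamp):
    """Translate one native reading into the unified schema."""
    metric, unit, scale = MAPPINGS[source][name]
    return Reading(host, metric, value * scale, unit,
                   timestamp / TIME_DIVISOR[source])


a = normalize("source_a", "host-1", "mem.used.kib", 2_097_152, 1_700_000_000_000)
b = normalize("source_b", "host-2", "mem_used_mib", 2048, 1_700_000_000)
```

After normalization the two readings carry the same metric name, unit, value, and time base, so downstream analytics can compare them directly instead of handling each source's conventions separately.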

Collectively, these DCTA capabilities enable adaptive SLA monitoring and management. They allow data center operators to engage in accurate predictive capacity planning; automate root-cause determination and resolution of IT issues; monitor customer jobs over time to assess performance trends; proactively recommend processor, memory, and fabric upgrades or downgrades; and predict system or component failures.

Capabilities like these are keys to meeting ever-more-aggressive SLAs in a manner that makes optimal use of IT assets and drives economies across the data center.

This isn’t a vision for a distant future. This is a path that Intel is on today in a research-driven initiative to bring DCTA to life. For more information on this initiative, see the presentation on Data Center Analytics for Efficient and Adaptive SLA Management that Brian Womack gave at IDF on Aug 19. In addition, we encourage you to share your thoughts on DCTA on Twitter @IntelITCenter.

Intel, the Intel logo, Xeon, Intel Atom, and Intel Core are trademarks of Intel Corporation in the United States and other countries.

* Other names and brands may be claimed as the property of others.