Manage & Monetize Exponential Data Growth with Intel’s Data Management Platform

Abstract

The exponential growth of data is creating scaling and cost challenges. When storage and compute resources cannot be scaled independently, data center infrastructure investments end up underutilized. Customers need to solve this problem while controlling their licensing costs. They also need to perform real-time analytics on their ever-growing data sets. With Intel’s Data Management Platform (DMP), you can build an infrastructure that allows you to operate on petabyte-scale data and harness the power of that data. Solving the big data performance-at-scale problem makes it more practical to run data-intensive applications and use cases such as AI-based services. DMP gives enterprise and cloud developers a solid turnkey database that enables the creation of revenue-generating solutions that put their organizations and customers ahead of the competition.

The Problem and Motivation

As Intel’s cloud and enterprise data center customers increase their adoption of data-intensive applications and tools such as AI inferencing and analytics, they generate and consume an exploding amount of data and telemetry that needs to be moved, stored, and processed in a more secure, faster, and more scalable way. In a hyperscale data center, this is typically done by purchasing additional servers. Unfortunately, depending on the workloads running on these systems, one type of component in these servers may be oversubscribed while another may be underutilized, which means customers and service providers are not getting full use of their investment. The majority of enterprise data centers today do not have the capacity to manage petabytes of data at scale while maintaining performance.

During Intel's Data-Centric Innovation Day on April 2, 2019, Navin Shenoy, Executive Vice President and General Manager of the Data Center Group at Intel Corporation, shared that over half of the world’s data was created in the last two years, and less than 2% of that data has been analyzed [1]. In addition, analysts forecast that by 2025, the volume of data will grow tenfold and reach 163 ZB [2]. Your solutions generate data, and this data needs to be acquired, aggregated, analyzed, and acted upon in a smart and efficient way. AI helps make sense of all this data, and neural nets and machine learning in turn depend on storage performance at scale. AI is one of the fastest-growing data center workloads. According to IDC, worldwide spending on AI systems is forecast to reach $35.8 billion in 2019, which is 44% higher than what was spent in 2018 [3]. IDC also expects that spending on AI systems will more than double to $79.2 billion in 2022, with a compound annual growth rate (CAGR) of 38% over the 2018-2022 forecast period [3].

Many data scientists, solution developers, IT professionals, and business decision makers are actively looking for a turnkey solution that helps them solve real-world problems requiring the processing of massive amounts of data in real time, and that enables them to harness the power of all this data. If you are like them, imagine how much more value you could bring to your customers, and how much more competitive you would be, if you were able to build a database infrastructure that effectively leverages this data at scale. You can now do that with Intel DMP.

The Solution

Your customers are smart. They require application-driven intelligent infrastructure. They also want to scale their operations in the most cost-effective way while maximizing utilization of their resources. Many enterprises have tried various software and services solutions (such as Hadoop*, AWS*, and Oracle Exadata*), but a common challenge they have faced is the lack of a turnkey solution that is intuitive and provides NVMe-oF functionality with extremely high data throughput and resiliency.

This is where DMP comes in. DMP provides an on-premises solution for enterprises as an alternative to public cloud providers. Some enterprises may prefer an on-premises or hybrid deployment for reasons such as information security, regulatory compliance, or performance. With DMP, you get a cost-effective and flexible infrastructure that is built on disaggregated storage. By separating compute and storage into two distinct tiers, you can scale them independently. Storage is optimized for high-throughput sequential read and write access, while compute is optimized for memory locality and random accesses.
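To make the benefit of independent scaling concrete, here is a minimal Python sketch that compares how many machines a storage-heavy workload needs when compute and storage are bundled in every server versus when each tier is sized on its own. All node sizes and the example demand are hypothetical values chosen for illustration; they are not DMP specifications, and the function names are our own.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Demand:
    cores: int        # CPU cores the workload needs
    storage_tb: int   # usable storage the workload needs, in TB


# Hypothetical node sizes (assumptions for illustration, not DMP specs).
CONVERGED_CORES, CONVERGED_TB = 36, 32  # one server bundling both resources
COMPUTE_NODE_CORES = 36                 # compute-tier node
STORAGE_NODE_TB = 128                   # storage-tier node


def converged_servers(d: Demand) -> int:
    """Servers needed when every server carries both compute and storage.

    The scarcer resource dictates the count, so the other one is over-bought.
    """
    return max(math.ceil(d.cores / CONVERGED_CORES),
               math.ceil(d.storage_tb / CONVERGED_TB))


def disaggregated_nodes(d: Demand) -> Tuple[int, int]:
    """(compute nodes, storage nodes) when each tier is sized separately."""
    return (math.ceil(d.cores / COMPUTE_NODE_CORES),
            math.ceil(d.storage_tb / STORAGE_NODE_TB))


def converged_core_utilization(d: Demand) -> float:
    """Fraction of purchased cores actually used in the converged design."""
    return d.cores / (converged_servers(d) * CONVERGED_CORES)


if __name__ == "__main__":
    demand = Demand(cores=72, storage_tb=1024)  # storage-heavy example
    print("converged servers:", converged_servers(demand))
    print("compute/storage nodes:", disaggregated_nodes(demand))
    print("converged core utilization:", converged_core_utilization(demand))
```

Under these assumed sizes, the converged design buys servers to satisfy the storage requirement and leaves most of the purchased cores idle, while the two-tier design adds only as many nodes of each kind as the workload needs.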

DMP is a multi-rack appliance that comes in two flavors, both based on Intel® Xeon® Scalable Processors:

Compute nodes use two Intel® Xeon® Gold 6254 CPUs, while storage nodes use two Intel® Xeon® Gold 6240 CPUs. With DMP, it becomes easy to build storage-scalable solutions where data is shared between databases through advanced management functionality without significantly impacting performance. Customers can perform real-time analytics on petabytes of data without the high licensing costs or performance implications of other solutions, all while using Intel® silicon and related hardware and software. A proof-of-concept configuration with six compute nodes, three storage nodes, and a 100GbE switch is available for initial evaluation.

Which workloads can be targeted?

The DMP appliance targets the following workloads:

  1. Database as a Service: Initially targets MySQL* distributions that support the MyRocks* storage engine.
  2. Bucket as a Service: Provides an S3-compliant API for object storage.
  3. Analytics as a Service: Targets Apache Spark*.
  4. Partition as a Service: Targets Apache Kafka*.

What are the benefits of DMP?

With Intel® DMP, you can improve resource management through rack-centric scalability. Intel® DMP is a fully integrated turnkey solution that is supported by a healthy partner ecosystem and provides the following key benefits:

  1. Delivers performance at scale by utilizing Intel® Optane™ DC persistent memory [5] at the compute tier.
  2. Optimizes resource usage and reduces costs by enabling independent scaling of compute and storage resources, with pooling and disaggregation of NVMe-oF storage.
  3. Limits high licensing costs through the use of fully vetted open-source software.

Conclusion

Our vision is a data center that solves the big data management problem through disaggregated storage. DMP does just that. It allows application developers to bring that vision to life for cloud service providers, enterprise customers, and their developers. Because storage is disaggregated from compute, each can scale independently without negatively impacting performance. Customers are now better prepared to manage their ever-growing appetite for data. One of the significant contributors to the recent rise in data consumption is the growing use of AI-based workloads, which rely on a significant amount of telemetry data that must be transferred, stored, and processed properly as the data scales. Our goal is to help you deliver the best customer experience through fully software-defined, application-driven, closed-loop autonomic service delivery. Intel is committed to making it easy to run AI and other data-intensive workloads and to harness the power of all that data. Stay tuned for more from Intel. The possibilities are endless!

Call to action

If the problems described in this blog resonate with you, we recommend that you contact your Intel sales representative for a technical deep dive on DMP. We also encourage you to review the DMP technical details [4], which describe what DMP includes and how it manages resource disaggregation. For more information or to schedule a demonstration, contact your Intel representative or the program leads: Prasad Alluri (prasad.alluri@intel.com) and David E. Cohen (David.e.cohen@intel.com).

____________

Acknowledgements

Many thanks to Peggy Irelan, Joe Carvalho, Prasad Alluri, and David E. Cohen for reviewing and co-authoring this blog with me.

References
1. Data-Centric Innovation Day
2. The value of data: forecast to grow 10-fold by 2025
3. https://www.idc.com/getdoc.jsp?containerId=prUS44911419
4. DMP technical details
5. Intel® Optane™ DC persistent memory

*Other names and brands may be claimed as the property of others.