Straightening the Road to HPC and Deep Learning

By Mike Yang, President of Quanta Cloud Technology


The business cases for HPC and deep learning are compelling. Designing new products faster and modeling consumer trends are just two of the many popular—and compute-intensive—applications of HPC. Deep learning is similarly intensive, focusing on performing tasks that only a few years ago sounded like magic. Natural-language translations, object recognition in images and video, and context-aware language processing are three excellent examples.

These incredible advancements place massive demands on data center infrastructure. Obvious issues like storage density and high availability are compounded by thorny challenges of data transfer rates, processor throughput and a bold new horizon of software/hardware integration for which best practices are still evolving.

All this presents real challenges for companies that want to put HPC and deep learning into production sooner rather than later. The cloud computing revolution has taught enterprises around the world that the best way to add value for their customers is to focus on rapidly iterating software to meet quickly changing customer needs, and the cloud offers an ideal path for that kind of iteration.

That’s where QCT comes in. Working closely with Intel®, QCT has taken these challenges head-on by developing two approaches to HPC and deep learning.

First, let’s look at HPC. Using the engineering resources at our headquarters and in our Cloud Solution Center in San Jose, we’ve developed a reference architecture for an HPC storage solution built on Intel® Enterprise Edition for Lustre* software (Intel® EE for Lustre* software) and Intel® Omni-Path Architecture (Intel® OPA). This scale-out, cost-effective parallel file system storage solution delivers high sustained throughput, which is vital for powering HPC workloads. The solution pairs QCT QuantaGrid servers, designed for HPC applications, with high-density QuantaVault storage products. The reference architecture reaches a peak of 10GB/s across nearly 1,000 threads with 960TB of raw storage capacity, delivering outstanding scalability and breakthrough performance for HPC applications.

For deep learning, we’ve developed a rack-level deep learning testbed. This reference architecture integrates 22TB of all-SSD Lustre storage and 100 Gb/s Intel® Omni-Path Architecture (Intel® OPA), running on the next-generation QuantaPlex S41T-2U multi-node server. The server is powered by the latest Intel® Xeon Phi™ X200 processor, which is designed for highly parallel workloads and offers up to 72 power-efficient cores, integrated memory for high memory bandwidth, and integrated fabric technology. The testbed can deliver more than 100 TFLOPS of throughput. For customers, the real value is that the testbed is pre-configured, fine-tuned, and fully integrated, shortening the path from concept to production. QCT also works with Intel and the open source community to build flexible and efficient software stacks that take advantage of the Intel-optimized Caffe deep learning framework.

Much has been written about Intel® EE for Lustre* software, so I’d like to focus on the power of Intel® Omni-Path Architecture (Intel® OPA) as a key ingredient in building successful infrastructure for HPC and deep learning. Intel OPA is a cost-efficient, scalable, high-performance fabric technology offering 100 Gb/s of bandwidth per port and low-latency transfers in large-scale deployments. It is based on a connectionless design for data networking. That matters because Intel OPA does not establish connection address information between nodes, cores, or processes, whereas a traditional implementation maintains this information in the adapter’s cache. As the number of communicating peers grows, that connection state can outgrow the cache, and the resulting misses add latency. Because Intel OPA avoids this, it delivers consistent latency regardless of cluster scale or the number of messaging partners. Why is that important for HPC? Because data networking becomes much more scalable across clusters with large node or core counts while end-to-end latency stays low. And lower latency means better performance.
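To make the scaling argument concrete, here is a back-of-the-envelope model in Python. It is a simplification, not a description of how Intel OPA or any other fabric is actually implemented: it assumes a connection-oriented design keeps one record per pair of communicating processes, while a connectionless design keeps one endpoint per local process. The processes-per-node value and the adapter cache capacity are hypothetical numbers chosen only for illustration.

```python
# Toy model (illustrative only): per-node connection state an adapter would
# track under two addressing schemes. Not a description of Intel OPA internals;
# PROCS_PER_NODE and ADAPTER_CACHE_ENTRIES below are hypothetical values.


def connection_oriented_entries(nodes: int, procs_per_node: int) -> int:
    """Assume one connection record per (local process, other process) pair."""
    total_procs = nodes * procs_per_node
    return procs_per_node * (total_procs - 1)


def connectionless_entries(nodes: int, procs_per_node: int) -> int:
    """Assume one endpoint per local process, independent of peer count."""
    return procs_per_node


if __name__ == "__main__":
    PROCS_PER_NODE = 64              # hypothetical: one rank per core
    ADAPTER_CACHE_ENTRIES = 65_536   # hypothetical on-adapter cache capacity
    for nodes in (16, 128, 1024):
        co = connection_oriented_entries(nodes, PROCS_PER_NODE)
        cl = connectionless_entries(nodes, PROCS_PER_NODE)
        status = "fits" if co <= ADAPTER_CACHE_ENTRIES else "overflows"
        print(f"{nodes:5d} nodes: connection-oriented={co:>10,} "
              f"({status} cache), connectionless={cl}")
```

With these assumed numbers, connection-oriented state fits the hypothetical cache at 16 nodes (65,472 entries) but overflows it at 128 and 1,024 nodes, while the connectionless state stays at 64 entries no matter how large the cluster grows.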

The value of these technologies for Intel® and QCT customers is threefold. First, because the integration issues have been solved, it is now possible to move from concept to deployment much more quickly. Second, because the work was done using open source technologies supported by HPC and deep learning experts at both companies, risk is reduced to manageable levels. Finally, with these roadmaps to deployment, companies have a proven engineering starting point for performance tuning and other changes to suit the individual HPC and deep learning workloads they’ll be running.

You can see a live demo of the QCT Lustre reference architecture at booth 3672 at SC’16 in Salt Lake City. Come by our booth to find out how you can use the reference architecture to achieve breakthrough, sustained performance as measured by the IOzone benchmark. And, of course, after SC’16, you can access the reference architecture for your own testing at our Cloud Solution Center in San Jose.

QCT and Intel are working together to solve the problems companies face as they apply HPC and deep learning to their businesses. Join us at SC’16 or visit our Cloud Solution Center in San Jose to find out more about how we’ve paved the road to implementation for you while reducing risk for your business.