TACC Stampedes Into Deep Learning

The December 2017 cover image of Wind Energy, by University of Texas at Dallas researchers Christian Santoni, Kenneth Carrasquillo, Isnardo Arenas-Navarro, and Stefano Leonardi, was produced using Stampede2 at the Texas Advanced Computing Center.

Founded in 2001, the Texas Advanced Computing Center (TACC) at the University of Texas at Austin has a passion for serving its research community with the latest and greatest in high performance computing (HPC) technologies, most notably with Stampede2. Touting cutting-edge Intel® technologies, like Intel® Xeon® Scalable and Intel® Xeon Phi™ processors, Intel® SSDs, and the Intel® Omni-Path Architecture (Intel® OPA) fabric, Stampede2 is the most powerful system at any U.S. university and the 12th most powerful in the world1. Together, these Intel® HPC technologies are the foundation for the Stampede2 system, which currently supports almost 1,300 research projects for thousands of researchers.

The institutions supported by TACC’s underlying Intel® HPC technologies aren’t solely academic. Through TACC’s STAR program, industrial partners can experiment with high-performance computing and deep learning techniques on Stampede2 to accelerate their corporate research or increase business competitiveness. These services include access to HPC and deep learning systems, software, expertise, and visualization.

With end users in mind, Intel® architecture is versatile enough to support TACC’s varied workloads at the high performance required by state-of-the-art research. In some cases, it can be the difference between a challenge that takes days or hours to solve and one that takes minutes.

Breakthroughs in Deep Learning on Intel® Omni-Path Architecture

Deep learning, the “training” of a neural network, is what gives AI the background information and context it needs to actually make intelligent decisions. This is done by passing batches of “training data” (like the ImageNet image database) through a predictive model (like the AlexNet convolutional neural network) and updating the model with error correction, which is based on the differences between the model’s predicted values and the training data’s actual values.
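In code, one round of this error-correction loop looks something like the minimal sketch below, written with PyTorch for illustration; the stand-in model, input sizes, and hyperparameters are assumptions, not TACC’s actual setup.

```python
# A minimal sketch of one error-correction (training) step, assuming PyTorch.
# The toy model, input size, and hyperparameters are placeholders, not
# TACC's actual AlexNet/ImageNet configuration.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1000))  # stand-in for AlexNet
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    optimizer.zero_grad()
    predictions = model(images)          # forward pass: the model's predicted values
    loss = loss_fn(predictions, labels)  # error vs. the training data's actual labels
    loss.backward()                      # backpropagate the error
    optimizer.step()                     # update the model to correct it
    return loss.item()

# One batch of "training data": 32 random 64x64 RGB images with random labels.
images = torch.randn(32, 3, 64, 64)
labels = torch.randint(0, 1000, (32,))
print(train_step(images, labels))
```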

This process of training a neural network for an artificially intelligent application can take hours, days, or even weeks. Accordingly, instances in which neural networks are trained in a few minutes are of serious interest to the computational science community.

One such case, championed by researchers at TACC, was the record 11-minute training of AlexNet2 on the ImageNet database, which contains over 14 million meticulously labeled images. The 11-minute run completed 100 epochs (full passes through the entire training data set) using 1,024 Intel® Xeon® Scalable processors3. Further, these researchers achieved state-of-the-art top-1 accuracy in just 20 minutes when scaling ResNet-50 neural network training to 2,048 Intel® Xeon Phi™ processor-based compute nodes4.

This speedy training time is significant because the researchers used much larger batch sizes, which ordinarily degrade accuracy. With algorithmic innovations, TACC was able to process larger batch sizes without sacrificing time or accuracy. In fact, for batch sizes above 16,000, TACC achieved higher accuracy than Facebook’s comparable results5. Further, when TACC increased its batch size to 32,000, top-1 accuracy was nearly fully maintained, falling by only 0.4%6. TACC’s ability to use larger batch sizes without sacrificing accuracy implies the feasibility of feeding neural networks more information to learn from at once, further streamlining the deep learning process.
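The article credits algorithmic innovations for preserving accuracy at these batch sizes without naming them here; one widely used ingredient in large-batch training is scaling the learning rate with the batch size and warming it up gradually. The sketch below illustrates that idea with placeholder constants, not TACC’s published recipe.

```python
# Hedged sketch of a large-batch learning-rate schedule (linear scaling plus
# warmup). All constants are illustrative assumptions, not TACC's settings.
BASE_LR = 0.1        # reference rate tuned for a batch size of 256
BASE_BATCH = 256
WARMUP_EPOCHS = 5
TOTAL_EPOCHS = 100   # matches the 100 passes over ImageNet cited above

def learning_rate(epoch: float, batch_size: int) -> float:
    peak = BASE_LR * batch_size / BASE_BATCH  # linear scaling rule
    if epoch < WARMUP_EPOCHS:
        # ramp up gradually so the huge effective step size doesn't diverge
        return peak * epoch / WARMUP_EPOCHS
    # decay linearly after warmup (polynomial decay is another common choice)
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return peak * (1.0 - progress)

print(learning_rate(2.5, 32000))  # mid-warmup
print(learning_rate(50, 32000))   # mid-training
```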

Fabrics Make Neural Networks Smarter, Faster

Deep learning requires enormous amounts of linear algebra. Parallel deep learning takes this a step further: the calculations remain the same, but the compute nodes working on the same problem must also communicate efficiently, through both point-to-point messages and a fair amount of collective operations. By using the Intel® Machine Learning Scaling Library (Intel® MLSL) on massively parallel computation resources interconnected by Intel® OPA, users get an optimized, efficient system capable of accelerated deep learning.
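To make that communication pattern concrete, the sketch below shows the allreduce collective at the heart of data-parallel training. It uses mpi4py as a generic stand-in; Intel® MLSL offers fabric-optimized versions of such primitives, and this is not MLSL’s actual API.

```python
# Each node averages its local gradients with every other node via allreduce.
# Run with, e.g.: mpirun -n 4 python allreduce_sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

local_grads = np.random.rand(1024).astype(np.float32)  # this node's gradients
summed = np.empty_like(local_grads)

# Sum gradients across all ranks; over a fabric like Intel OPA, this is the
# communication step that must keep pace with the compute.
comm.Allreduce(local_grads, summed, op=MPI.SUM)
avg_grads = summed / comm.Get_size()  # average before the weight update
```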

The value of Intel® OPA’s deep learning acceleration also extends into data analysis. For example, consider the following ongoing TACC research project7 on Stampede1, Stampede2’s Intel® OPA-connected predecessor. A team of National Renewable Energy Laboratory (NREL) scientists is trying to reduce fuel usage by making vehicles lighter with cheaper carbon fiber. Their position is that bio-based carbon fiber will be cheap enough to use in vehicles, making vehicles lighter and thus reducing fuel usage.

A key component of this project’s success is identifying alternative carbon fiber precursor chemicals, based on plant waste materials. What if the team could skip the trial-and-error of locating eligible plant waste materials by letting a trained neural network go straight to the right one? With deep learning computational models running on the Intel® OPA fabric, they can. After training a neural network to recognize quantified bio-markers that indicate whether plant wastes are eligible as carbon fiber precursor ingredients, researchers can apply the deep learning framework to all their plant waste data simultaneously, ultimately yielding the right plant waste materials to be used in carbon fiber production.
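As a hedged illustration of that screening workflow, the sketch below trains a toy classifier on synthetic bio-marker features and then scores a large pool of candidates in a single pass. The feature count, data, and model choice (scikit-learn’s random forest) are assumptions made for illustration, not the NREL team’s actual pipeline.

```python
# Toy version of the screening idea: learn which bio-marker profiles indicate
# an eligible carbon fiber precursor, then rank every candidate at once.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.random((500, 8))                 # 8 bio-marker measurements per sample
y_train = (X_train[:, 0] > 0.5).astype(int)    # 1 = eligible precursor (toy label)

model = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)

candidates = rng.random((10000, 8))            # all plant-waste data, scored together
scores = model.predict_proba(candidates)[:, 1] # probability of eligibility
top = np.argsort(scores)[::-1][:10]            # most promising candidates
print(top, scores[top])
```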

Understanding Computational & Research Community Needs is Key to Providing Support

While TACC successfully runs both AI and HPC workloads over its Intel® OPA fabric, it isn’t an outlier; it is one of many notable institutions to do so. Others include the Tokyo Institute of Technology, the Pittsburgh Supercomputing Center, and the Barcelona Supercomputing Center. The common denominator among these institutions is that they collaborate closely with research communities to move science forward. Collaborating in this way requires a very personal understanding of researchers’ problems and of the complex resources necessary to solve them.

Intel® HPC technologies, including Intel® OPA, are available to any and all organizations attempting to take collaborative research to new heights with HPC and AI. Check out the TACC case study here for a closer look at Stampede2 and the cutting-edge Intel® high performance computing technologies that support TACC’s 196 research projects. To learn more about Intel® Omni-Path Architecture, please visit www.intel.com/omnipath.

1 November 2017 Top500 List: https://www.top500.org/list/2017/11/

2 AlexNet is a convolutional neural network, originally written in CUDA.

3 The authors acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing HPC resources that have contributed to the research results reported within this paper. URL: https://arxiv.org/pdf/1709.05011.pdf

4 Ibid.

5 Ibid.

6 Ibid.

7 Ibid.