Giving New Life to Old Data with High Performance Computing

Once a picturesque summer retreat for the prominent Girona-Agrafel family of Spain in the 19th century, the Torre Girona today stands as a center for advanced knowledge and, most notably, as home to the Barcelona Supercomputing Center’s (BSC) MareNostrum 4. The 11.1 petaFLOP cluster ranks 16th on the November 2017 Top500 list [1] and is powered by over 165,000 Intel® Xeon® Platinum processor cores, all connected by Intel® Omni-Path Architecture (Intel® OPA) high performance fabric. MareNostrum 4 gives the Torre Girona Chapel new life, supporting over 150 projects with research topics ranging from neuroscience to crop yield models. A closer look at BSC’s research findings, though, shows that BSC is reinvigorating more than just the Torre Girona: it is also uncovering new patterns in old data.

High Performance Computing Digs Deeper Than Traditional Analysis

High performance computing (HPC) clusters like MareNostrum 4 employ computational methods in which compute nodes rapidly, and in parallel, slice through data to uncover insights that often go unnoticed by classical, non-parallelized analysis techniques. Recently, just by reanalyzing publicly available genome-wide association study (GWAS) data on over 70,000 individuals, MareNostrum 4 illuminated novel genetic variants that increase a person’s risk for Type-2 Diabetes, in one case by 200% [2].

Successfully detecting new disease risk variants in this type of publicly available, previously analyzed data is an evolutionary step for medical HPC, toward treating not only Type-2 Diabetes but all genetically complex diseases. Mapping the underlying genetic bases of a disease paints a clearer picture of new biological targets for therapies and of how to design more active, preventative approaches to the disease.

Fabrics Transform Repetitive Analytical Processes into Instant Results

Intel Omni-Path Architecture can drastically reduce a user’s time to solution when a problem involves large datasets that require high-precision, high-performance analyses like GWAS. Intel® OPA interconnects each of the 3,456 Lenovo ThinkSystem* SD530 compute servers in MareNostrum 4 and allows them to process the GWAS data on all 70,000 individuals in tight, parallel fashion. Intel® OPA’s extremely low latency, high 100 Gb/s throughput, and intuitive management software allow users to drive the entire MareNostrum 4 cluster in parallel, or to distribute particular compute tasks across the cluster. For example, one could use 90% of the compute nodes to run analyses on the data for all 70,000 individuals at once, while the remaining 10% summarize output data for reporting or additional study.
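To make the 90/10 example concrete, here is a minimal Python sketch of how such a split might be computed. It is only an illustration under stated assumptions: the function names `split_nodes` and `chunk_sizes` are hypothetical, the even distribution of individuals across nodes is our own simplification, and nothing here reflects BSC's actual scheduling or the study's real pipeline.

```python
# Hypothetical sketch: divide a cluster into analysis and summary node groups,
# then spread per-individual records evenly across the analysis nodes.
# The names below are illustrative and not part of any BSC or Intel tooling.

def split_nodes(total_nodes, analysis_percent):
    """Return (analysis_nodes, summary_nodes) as ranges of node indices."""
    # Integer arithmetic avoids floating-point rounding surprises.
    n_analysis = total_nodes * analysis_percent // 100
    return range(n_analysis), range(n_analysis, total_nodes)


def chunk_sizes(n_items, n_workers):
    """Balanced split: the first `extra` workers take one additional item."""
    base, extra = divmod(n_items, n_workers)
    return [base + 1 if i < extra else base for i in range(n_workers)]


if __name__ == "__main__":
    # Figures taken from the article: 3,456 nodes, 70,000 individuals.
    analysis, summary = split_nodes(3456, 90)
    sizes = chunk_sizes(70_000, len(analysis))
    print(len(analysis), len(summary), max(sizes), min(sizes))
    # prints: 3110 346 23 22
```

Under these assumptions, each analysis node would handle only 22 or 23 individuals' records, which hints at why a low-latency fabric matters: the per-node work is small, so communication overhead can quickly dominate.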

Interdisciplinary Collaboration Opens Doors

High performance computing isn’t new, so why hasn’t this been done for Type-2 Diabetes before? The BSC-assisted study [3] answers this question by highlighting the following challenges.

Without the extreme parallelization offered by a fabric-connected cluster like MareNostrum 4, processing large datasets, such as GWAS data on 70,000 individuals, may take too long to be practical. At the same time, many medical scientists have no access to systems like MareNostrum 4 and must rely on PC- or workstation-powered analyses. Even those who do have access to many-node clusters still need a parallel programming counterpart to frame scientific problems as computable solutions. Pairing medical scientists with parallel computing experts remains an ongoing challenge for the academic research community, but solving it could have groundbreaking benefits, especially since a large amount of medical data has yet to be touched by HPC.

Looking into the Future

Thankfully for medical research professionals, there are programs that broaden access to HPC systems, such as the Partnership for Advanced Computing in Europe (PRACE) and its small- and medium-enterprise program, SHAPE. PRACE and SHAPE open HPC resources in medical and other fields to organizations of all sizes, allowing more minds to explore questions that previously only governments or research universities were equipped to tackle. Participating organizations are assisted by HPC specialists to take full advantage of highly parallel, Intel® OPA-equipped HPC clusters. You can learn more about SHAPE, PRACE, and how to apply for access to high-end systems like BSC’s MareNostrum 4. A closer look at MareNostrum 4 also offers insights for organizations considering their own HPC cluster design.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/benchmarks.

[1] November 2017 Top500 List: https://www.top500.org/list/2017/11/

[2] Re-Analysis of Public Genetic Data Reveals a Rare X-Chromosomal Variant Associated with Type-2 Diabetes. Nature Communications 9, Article 321 (2018). https://www.nature.com/articles/s41467-017-02380-9

[3] Re-Analysis of Public Genetic Data Reveals a Rare X-Chromosomal Variant Associated with Type-2 Diabetes. Nature Communications 9, Article 321 (2018). https://www.nature.com/articles/s41467-017-02380-9