SC14: When to Launch an Internal HPC Cluster

As SC14 approaches, we have invited industry experts to share their views on high performance computing and life sciences. Below is a guest post from Eldon M. Walker, Ph.D., Director, Research Computing at Cleveland Clinic's Lerner Research Institute. During SC14, Eldon will be sharing his thoughts on implementing a high performance computing cluster at the Intel booth (#1315) on Tuesday, Nov. 18, at 10:15 a.m. in the Intel Theater.

When data analyses grind to a halt due to insufficient processing capacity, scientists cannot be competitive. When we hit that wall at the Cleveland Clinic Lerner Research Institute, my team began considering the components of a solution, the cornerstone of which was a high performance computing (HPC) deployment.

In the past 20 years, the Cleveland Clinic Lerner Research Institute has progressed from a model of wet lab biomedical research that produced modest amounts of data to a scientific data acquisition and analysis environment that puts profound demands on information technology resources. That shift created a need for two infrastructure components designed specifically to serve biomedical researchers working with large amounts of unstructured data:

  1. A storage architecture capable of holding the data in a robust way
  2. Sufficient processing horsepower to enable the data analyses required by investigators

Deployment of these resources assumes the availability of:

  1. A data center capable of housing power- and cooling-hungry hardware
  2. Network resources capable of moving large amounts of data quickly

These components were available at the Cleveland Clinic in the form of a modern, tier 3 data center and ubiquitous 10 Gb / sec and 1 Gb / sec network service.

The storage problem was brought under control by way of a 1.2 petabyte grid storage system in the data center that replicates to a second 1.2 petabyte system in the Lerner Research Institute server room facility. The ability to store and protect the data was the required first step in maintaining the fundamental capital (data) of our research enterprise.

It was equally clear to us that the type of analyses required to turn the data into scientific results had overrun the capacity of even high end desktop workstations and single unit servers of up to four processors. Analyses simply could not be run or would run too slowly to be practical. We had an immediate unmet need in several data processing scenarios:

  1. DNA Sequence analysis
    1. Whole genome sequence
      1. DNA methylation
    2. ChIP-seq data
      1. Protein – DNA interactions
    3. RNA-seq data
      1. Alternative RNA processing studies
  2. Finite Element Analysis
    1. Biomedical engineering modeling of the knee, ankle and shoulder
  3. Natural Language Processing
    1. Analysis of free text electronic health record notes

There was absolutely no question that an HPC cluster was the proper way to provide the necessary horsepower that would allow our investigators to be competitive in producing publishable, actionable scientific results. While a few processing needs could be met using offsite systems where we had collaborative arrangements, an internal resource was appropriate for several reasons:

  1. Some data analyses operated on huge datasets that were impractical to transport between locations.
  2. Some data must stay inside the security perimeter.
  3. On offsite systems, development of techniques and pipelines would depend on the help of outside systems administrators and on change control processes that we found cumbersome. Based on considerable experience attempting to leverage outside resources, the sheer flexibility of an internal resource built with responsive industry partners was very compelling.
  4. Given that we had the data center, network and system administration resources, and given the modest price-point, commodity nature of much of the HPC hardware (as revealed by our due diligence process), the economics of obtaining an HPC cluster were practical.
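To give a rough sense of the first point, here is a back-of-the-envelope sketch of how long a large dataset takes to move over a network link. It borrows the 1.2 petabyte storage size and the 10 Gb / sec and 1 Gb / sec link speeds mentioned earlier purely as illustrative inputs. The function name `transfer_days` and the `efficiency` parameter are hypothetical, and the calculation assumes the link is fully dedicated to the transfer, which a shared connection to an offsite facility rarely is.

```python
# Illustrative arithmetic only: the sizes and speeds are example inputs,
# not measurements of any particular dataset or offsite link.

PETABYTE = 1e15  # bytes, decimal definition
GBIT = 1e9       # bits per second


def transfer_days(size_bytes: float, link_bits_per_sec: float,
                  efficiency: float = 1.0) -> float:
    """Return the days needed to move size_bytes over a link.

    efficiency is the assumed fraction of the line rate actually achieved
    (1.0 means the full line rate, which is an optimistic assumption).
    """
    seconds = (size_bytes * 8) / (link_bits_per_sec * efficiency)
    return seconds / 86_400  # seconds per day


for label, rate in (("10 Gb/sec", 10 * GBIT), ("1 Gb/sec", 1 * GBIT)):
    days = transfer_days(1.2 * PETABYTE, rate)
    print(f"1.2 PB over {label}: about {days:.1f} days")
```

Even at the full 10 Gb / sec line rate, moving 1.2 petabytes works out to roughly 11 days, and at 1 Gb / sec it is closer to 111 days. Real transfers over shared links would take longer, which can be approximated by lowering `efficiency`.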

Given the realities we faced and after a period of consultation with vendors, we embarked on a system design in collaboration with Dell and Intel. The definitive proof of concept from the initial rollout of our HPC solution is that we can now run analyses that were previously impractical or impossible.

What questions do you have? Are you at the point of considering an internal HPC cluster?