The idea of precision medicine is simple: When it comes to medical treatment, one size does not necessarily fit all, so it's important to consider each individual's inherent variability when determining the most appropriate treatment. This approach makes sense, but until recently it has been very difficult to achieve in practice, primarily due to lack of data and insufficient technology. However, in a recent article in the New England Journal of Medicine, Dr. Francis Collins and Dr. Harold Varmus describe President Obama’s new Precision Medicine Initiative and say they believe the time is right for precision medicine. The way has been paved, the authors say, by several factors:
- The advent of important (and large) biological databases;
- The rise of powerful methods of generating high-resolution molecular and clinical data from each patient; and
- The availability of information technology adequate to the task of collecting and analyzing huge amounts of data to gain the insight necessary to formulate effective treatments for each individual's illness.
The near-term focus of the Precision Medicine Initiative is cancer, for a variety of good reasons. Cancer is a disease of the genome, and so genomics must play a large role in precision medicine. Cancer genomics will drive precision medicine by characterizing the genetic alterations present in patients' tumor DNA, and researchers have already seen significant success with associating these genomic variations with specific cancers and their treatments. The key to taking full advantage of genomics in precision medicine will be the use of state-of-the-art computing technology and software tools to combine, for each patient, genomic sequence data with the huge amount of available contextual data (annotation) about genes, diseases, and therapies. Only then can researchers derive real meaning from the data and produce the best possible outcomes for patients.
Big data and its associated techniques and technologies will continue to play an important role in the genomics of cancer and other diseases, as the volume of sequence data continues to rise exponentially along with the relevant annotation. As researchers at pharmaceutical companies, hospitals and contract research organizations build the heavy information-processing demands of precision medicine into their workflows, including next-generation sequencing workflows, the need for scalable high performance computing will continue to grow. The ubiquity of genomics big data also means that very powerful computing technology will have to be made usable by life sciences researchers, who traditionally haven't worked with it directly.
Fortunately, researchers requiring fast analytics will benefit from a number of advances in information technology happening at just the right time. The open-source Apache Spark™ project gives researchers an extremely powerful analytics framework right out of the box. Spark can run on Hadoop® clusters and read data stored in HDFS, and it delivers faster time to value to virtually anyone with some basic knowledge of databases and some scripting skills. ADAM, another open-source project, from UC Berkeley's AMPLab, provides a set of data formats, APIs and a genomics processing engine that help researchers take special advantage of Spark for increased throughput. For researchers wanting to take advantage of the representational and analytical power of graphs in a scalable environment, GraphX is one of Spark's key libraries. Graphs make it easy to associate individual gene variants with gene annotation, pathways, diseases, drugs and almost any other kind of related information.
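To make the graph idea a little more concrete, here is a minimal sketch of how a variant can be linked to genes, pathways, diseases and drugs and then queried by walking the graph. It is plain in-memory Python, not GraphX or ADAM code, and every identifier in it (`V1`, `GENE_A`, `DRUG_9`, the relation names, the `related` helper) is made up for illustration rather than taken from any real annotation source. A distributed graph library would be the natural choice at genomic scale; this only shows the shape of the question being asked.

```python
from collections import defaultdict, deque

# Hypothetical, made-up identifiers for illustration only; not real annotation data.
EDGES = [
    ("variant:V1", "gene:GENE_A", "located_in"),
    ("variant:V2", "gene:GENE_B", "located_in"),
    ("gene:GENE_A", "pathway:PATHWAY_X", "participates_in"),
    ("gene:GENE_B", "pathway:PATHWAY_X", "participates_in"),
    ("gene:GENE_A", "disease:DISEASE_1", "associated_with"),
    ("drug:DRUG_9", "gene:GENE_A", "targets"),
]


def build_graph(edges):
    """Build an undirected adjacency map: node -> set of (neighbor, relation)."""
    graph = defaultdict(set)
    for src, dst, rel in edges:
        graph[src].add((dst, rel))
        graph[dst].add((src, rel))
    return graph


def related(graph, start, kind, max_hops=2):
    """Breadth-first search for nodes of type `kind` within `max_hops` of `start`."""
    seen = {start}
    queue = deque([(start, 0)])
    found = set()
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor, _relation in graph.get(node, ()):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            if neighbor.startswith(kind + ":"):
                found.add(neighbor)
            queue.append((neighbor, hops + 1))
    return sorted(found)


graph = build_graph(EDGES)
print(related(graph, "variant:V1", "drug"))     # drugs within two hops of V1
print(related(graph, "variant:V2", "drug", 4))  # a wider search via the shared pathway
```

The hop limit is the interesting design choice here: a short walk finds drugs that target the gene a variant sits in, while a longer walk reaches drugs connected only through a shared pathway, which is a much weaker and noisier association.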
At the same time, Cray has combined high performance analytics and supercomputing technologies into the Intel-based Cray® Urika-XA™ extreme analytics platform, an open, flexible and cost-effective platform for running Spark. The Urika-XA system comes with Cloudera Hadoop and Apache Spark preintegrated and optimized for its architecture, which saves setup time and reduces management burden. The platform uses fast interconnects and an innovative memory-storage hierarchy to provide a compact and powerful solution for the compute-heavy, memory-centric analytics workloads that Hadoop and Spark are built for.
Collins and Varmus envision more than 1 million Americans volunteering to participate in the Precision Medicine Initiative. That's an enormous amount of data to be collected, synthesized and analyzed into the deep insights and knowledge required to dramatically improve patient outcomes. But the clock is ticking, and it's good to know that technologies like Apache Spark and Cray's Urika-XA system are there to help.
What questions do you have?
Ted Slater is a life sciences solutions architect at Cray Inc.