Speeding Plant and Animal Genomics at the Smithsonian

Discussions of genome sequencing often focus on human genomes and precision medicine. But genomic information about the plant and animal worlds is equally crucial. On a fragile planet, our ability to study genomes across the tree of life is critical to preserving biodiversity. Knowledge of plant and animal genomes can also help us manage climate change, feed a growing population, and mitigate the impact of newly emergent diseases. It can lead to breakthroughs in drug discovery, food safety, and more.

Reflecting the importance of plant and animal genomics, the Smithsonian Institution has established a new Institute for Biodiversity Genomics to focus on genomic studies that can help humans understand and preserve the diversity of life on Earth.

To run their genome assemblies and analyses, the institute's researchers used the Smithsonian Institution's shared HPC cluster, a massive system with multiple generations of processors and networked storage. But, as is often the case with such systems, there were problems. Genome assemblies often took weeks to complete, and some large assemblies failed to run to completion, frustrating scientists and slowing the research pipeline.

With the clock ticking on species extinction, leaders at the Smithsonian Institute for Biodiversity Genomics set out to see what impact Intel's latest data center technologies could have on their genomics workloads. They worked with Intel technologists to evaluate the performance of the Intel® Xeon® processor E7-8890 v3 paired with dedicated Intel® Solid-State Drive (Intel® SSD) Data Center (DC) Family for PCIe P3700 Series drives.

We recently worked with Dr. Rebecca Dikow of the Smithsonian Institute for Biodiversity Genomics to create a white paper describing the results of this collaboration. The paper discusses the open-source technologies used in the institute's genomics workflows and describes the dramatic speedups the new technologies produced. It also shares insights about what these performance improvements will mean for scientists like Dr. Dikow, and ultimately for all of us.

Read the paper and share your observations in the comments. How is your work affected by plant and animal genomics? How could your work benefit from newer processors and dedicated SSDs?

Learn more about big data in healthcare

Read about the Smithsonian Institute for Biodiversity Genomics

Follow us on Twitter:

  • @IntelHealth, @portlandketan
  • @smithsonian, @rdikow