Solving the Jigsaw Puzzle with 1 Billion Pieces

How does a centre of research excellence keep pace with ever-increasing data volumes and demand for insight? It’s a recurring question we hear the world over, so it’s great to be able to showcase an example of how one organisation is meeting these challenges here in Spain. Spain’s National Center of Genomic Analysis (CNAG) opened in 2009 and now supports 120 researchers conducting around 300 projects per year. It has a clear mission: to deliver research and results that help make citizens’ lives better.

Finding the 0.1%

As one of the highest-capacity sequencing facilities in Europe, CNAG sequences around 800 Gigabases per day. Reliable analysis requires sequencing at 30-fold coverage, so CNAG is sequencing the equivalent of eight full human genomes every 24 hours. It’s the variations between those genomes, though, that really hold the key to unlocking precision medicine.
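The "eight genomes" figure can be checked with a quick back-of-the-envelope calculation. The sketch below assumes a human genome of roughly 3.2 billion bases (3.2 Gigabases); that genome size is a commonly quoted approximation rather than a CNAG figure, and the variable names are purely illustrative.

```python
# Figures quoted in the article.
GIGABASES_PER_DAY = 800   # CNAG's approximate daily sequencing output
COVERAGE = 30             # each base is read about 30 times for reliable analysis

# Assumption: a human genome is roughly 3.2 Gigabases long.
GENOME_SIZE_GIGABASES = 3.2

# Sequencing one genome at 30-fold coverage consumes 30 x 3.2 = 96 Gigabases.
gigabases_per_genome = COVERAGE * GENOME_SIZE_GIGABASES

genomes_per_day = GIGABASES_PER_DAY / gigabases_per_genome
print(f"Genomes per day at {COVERAGE}x coverage: {genomes_per_day:.1f}")
```

Under that assumption the result comes out slightly above eight genomes per day, in line with the figure above.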

And given that genomes are 99.9% identical, the challenge becomes clear: to find the 0.1%, each genome has to be broken down into short strings, sequenced and then rebuilt. Ivo Gut, director of CNAG, summarizes this nicely in the Sequencing and Supercomputers case study when he says: “It’s like doing a jigsaw puzzle with 1 billion pieces.”
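To make the jigsaw idea concrete, here is a deliberately tiny sketch of the principle: place each short read where it fits a reference sequence best, then note where the read disagrees with the reference. It is not CNAG's pipeline. Production systems use indexed aligners and statistical variant callers, whereas this toy tries every position by brute force, and the function names and sequences are invented for illustration.

```python
def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_position(read, reference):
    """Return the reference offset where the read fits with the fewest mismatches."""
    candidates = range(len(reference) - len(read) + 1)
    return min(
        candidates,
        key=lambda pos: hamming(read, reference[pos:pos + len(read)]),
    )


def find_variants(reads, reference):
    """Place every read, then record each base that differs from the reference."""
    variants = {}
    for read in reads:
        pos = best_position(read, reference)
        for offset, base in enumerate(read):
            if base != reference[pos + offset]:
                variants[pos + offset] = (reference[pos + offset], base)
    return dict(sorted(variants.items()))


# Toy data: the sample differs from the reference at a single position (index 12).
reference = "ACGTTGCAAGCTTACGGATC"
sample = "ACGTTGCAAGCTAACGGATC"

# "Break the genome into short strings": overlapping 8-base reads, 4 bases apart.
reads = [sample[i:i + 8] for i in range(0, len(sample) - 7, 4)]

variants = find_variants(reads, reference)
print(variants)  # maps position -> (reference base, sample base)
```

The real problem has the same shape, only with billions of pieces and with sequencing errors mixed in, which is one reason 30-fold coverage matters: a genuine variation shows up consistently across the many reads that overlap it.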

Combining Data Sources to Gain New Insights

If you are a regular reader of our Health and Life Sciences blogs, you’ll know that the word collaboration appears frequently. Across the healthcare ecosystem, collaboration is driving change. It has moved from something we all aspire to, to something we must embrace to deliver better care, reduced costs and improved workflows. So it’s great to see CNAG combining its own data with other sources to gain new insights. For example, CNAG collaborated with other institutions as part of the International Cancer Genome Consortium to better understand chronic lymphocytic leukaemia.

Big Data Leads to Big Information

CNAG’s aim is to put the findings of its research to use in a clinical environment. This requires a powerful computing platform that can locate and accurately predict the base variations potentially responsible for disease among the roughly 3.2 million bases that differ in each genome (0.1% of about 3.2 billion). Without the technical capability to deliver sequence analysis on an industrial scale, it is difficult to do much more than one-off research projects. I recognise that these pockets of research are valuable, but to move us closer to delivering personalized medicine we must begin to work more collaboratively.

CNAG’s new sequencing and analytics environment is helping the organisation to handle the growing volume and variety of data generated by collaborative working. As Ivo Gut says: “We’re certainly handling big data now – and it’s growing all the time – but what we’re really after is big information.”

Intel and Atos Provide Scale and Flexibility

Being able to design the computational infrastructure from the ground up gives organisations such as CNAG the opportunity to use best-in-class technology. The organisations I talk to regularly all have the same priorities: flexibility and scale. With that in mind, the Atos Big Data and Security service line developed a tailor-made compute cluster, powered by the Intel® Xeon® processor E5 family, to conduct in-depth high-performance data analytics (HPDA) on genome sequencing.

And looking to the future, CNAG will provide more granular insights to help hospitals treat different diseases, whether that be for identification of the correct medication or for rapid initial diagnosis. As CNAG scales its computational infrastructure it will also increase its scope of research, ensuring that Spain stays at the forefront of global genomics research.

Contact Carlos Piqueras on LinkedIn