The road to personalized medicine is paved with big data challenges, as the emphasis shifts from raw sequencing performance to mapping, assembly, and analytics. Transmitting terabytes of genomic information between sites worldwide is both essential and daunting, driven by:
• Collaboration with research and clinical partners worldwide to establish statistically significant patient cohorts and leverage expertise across different institutions.
• Reference Genomes used to assemble sequences, perform quality control, identify and annotate variants, and perform genome-wide association studies (GWAS).
• Cloud-based Analytics to address critical shortages in bioinformatics expertise and burst capacity for HPC cluster compute.
• Data Management and Resource Utilization across departments in shared research HPC cluster environments, analytics clusters, storage archives, and external partners.
• Medical Genomics extends the data management considerations from research to clinical partners, CLIA labs, hospitals and clinics.
Most institutions still rely on shipping physical disks because of the inherent limitations of commodity 1 Gigabit Ethernet (GbE) networks and TCP's inefficiency over long-distance, lossy links. When the goal is to reduce analytics time from weeks to hours so that a meaningful clinical intervention is possible, spending days just to transport the data is not a viable option. The transition from 1GbE to 10GbE and beyond has been unusually slow in healthcare and life sciences, likely due to an overemphasis on shared compute resources, considered in isolation from broader usage, system architecture, and scalability requirements.
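The scale of the problem is easy to see with back-of-the-envelope arithmetic. A minimal sketch, using the well-known Mathis steady-state bound on TCP throughput (rate ≤ (MSS/RTT) × (1.22/√p)) and illustrative, assumed WAN conditions of 100 ms round-trip time and 1 percent packet loss:

```python
import math

def tcp_throughput_ceiling_mbps(mss_bytes=1460, rtt_s=0.1, loss=0.01):
    """Mathis et al. steady-state TCP throughput bound:
    rate <= (MSS / RTT) * (1.22 / sqrt(p)), returned in Mbps."""
    return (mss_bytes * 8 / rtt_s) * (1.22 / math.sqrt(loss)) / 1e6

def transfer_days(terabytes, throughput_mbps):
    """Wall-clock days to move a dataset at a given sustained rate."""
    bits = terabytes * 1e12 * 8
    return bits / (throughput_mbps * 1e6) / 86400

# Assumed cross-continent WAN: 100 ms RTT, 1% packet loss.
ceiling = tcp_throughput_ceiling_mbps(rtt_s=0.1, loss=0.01)
print(f"TCP ceiling: {ceiling:.1f} Mbps, even on a 1000 Mbps link")
print(f"10 TB at that rate: {transfer_days(10, ceiling):.0f} days")
```

Under these assumed conditions a single TCP stream tops out at roughly 1–2 Mbps regardless of the link's raw capacity, which is why distance and loss, not bandwidth alone, dominate genomic transfer times.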
Data centers in other industries have been quick to adopt 10GbE and unified networking because of compelling cost savings, performance, and manageability. Adopting a balanced compute model – where investments in processor capacity are matched with investments in network and storage – yields significant performance gains while reducing data center footprint, power, and cooling costs. Demand for improved server density and shared resource utilization drives the need for virtualization. While I/O optimization has historically focused on jumbo-packet transmissions over physical infrastructure, a more realistic test uses regular packet sizes and compares physical and virtualized environments under both LAN and WAN traffic conditions. Aspera and Intel are working together to address these critical challenges to big data and personalized medicine.
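The historical preference for jumbo frames comes down to per-packet processing cost. A small illustrative calculation, using standard Ethernet per-frame overhead figures (8-byte preamble, 14-byte MAC header, 4-byte FCS, 12-byte inter-frame gap):

```python
def packets_per_second(link_bps=10e9, mtu=1500):
    """Maximum Ethernet frames/sec at line rate, counting per-frame
    overhead: preamble 8 + MAC header 14 + FCS 4 + inter-frame gap 12."""
    wire_bytes = mtu + 38
    return link_bps / (wire_bytes * 8)

std = packets_per_second(mtu=1500)    # standard frames
jumbo = packets_per_second(mtu=9000)  # jumbo frames
print(f"standard: {std/1e6:.2f} Mpps, jumbo: {jumbo/1e6:.2f} Mpps")
print(f"jumbo frames cut per-packet work by ~{std/jumbo:.1f}x")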
Aspera develops high-speed data transfer technologies that provide speed, efficiency, and bandwidth control over any file size, transfer distance, network condition, and storage location (i.e., on-premise or cloud). Aspera® fasp™ Transfer Technology has no theoretical throughput limit; it is constrained only by the available network bandwidth and the hardware resources at both ends of the transfer. Security is built in, including secure endpoint authentication, on-the-fly data encryption, and integrity verification.
Intel has incorporated a number of I/O optimizations into the Intel® Xeon® E5 processor and Intel® 10Gb Ethernet Server Adapters:
• Intel® 10 Gigabit Ethernet (Intel® 10GbE) replaces and consolidates older 1GbE systems, reducing power costs by 45 percent, cabling by 80 percent and infrastructure costs by 15 percent, while doubling the bandwidth. When deployed in combination with Intel® Xeon® E5 processors, Intel 10GbE can deliver up to 3X more I/O bandwidth compared to the prior generation of Intel processors.
• Intel® Data Direct I/O Technology (Intel DDIO) is a key component of Intel® Integrated I/O that increases performance by allowing Intel Ethernet controllers and server adapters to communicate directly with processor cache, maximizing throughput.
• PCI-SIG* Single Root I/O Virtualization (SR-IOV) provides near-native performance by providing dedicated I/O to virtual machines and completely bypassing the software virtual switch in the hypervisor. It also improves data isolation among virtual machines and provides flexibility and mobility by facilitating live virtual machine migration.
Aspera® fasp™ demonstrated superior transfer performance when tested with an Intel® Xeon® E5-2600 processor and an Intel® 10Gb Ethernet Server Adapter, utilizing both Intel® DDIO and SR-IOV. The real-world test scenarios transmitted regular packet sizes over both physical and virtualized environments, modeling a range of LAN/WAN latencies and packet loss rates:
• 300 percent throughput improvement versus a baseline system without support for Intel® DDIO and SR-IOV, showing the clear advantages of the Intel® Xeon® E5 processor family.
• Similar results across both LAN and WAN transfers, confirming that Aspera® fasp™ transfer performance is independent of network latency and robust to packet loss on the network.
• Approximately the same throughput in physical and virtualized computing environments, demonstrating that the combined I/O optimizations effectively overcome the performance penalty of virtualization.
International collaboration, cloud-based analytics, and data management issues with terabytes of genomic information will continue to pose challenges to life science researchers and clinicians alike, but working with I/O solutions driven by Aspera and Intel, we will get there faster.
Read the joint Intel-Aspera whitepaper, Big Data Technologies for Ultra-High-Speed Data Transfer in Life Sciences, for details of the I/O optimization results. Explore Aspera case studies with life science customers. Watch videos about the benefits of Intel DDIO and Intel Virtualization for Connectivity with PCI-SIG* SR-IOV.
How do you manage transport of your large medical genomics payloads? What big data challenges are you working to overcome?