In my previous blog post, I discussed how the 4 V’s of Big Data apply to healthcare. This time around, I would like to focus on a specific class of Big Data solutions: distributed computing solutions that use Hadoop. So what exactly is Hadoop?
Hadoop is essentially a software framework that supports the storage and processing of large data sets in a highly parallelized manner. Two of the obvious benefits that Hadoop brings to Big Data solutions are scale and flexibility:
• Scale: You might remember from my last post that “volume” is one of the key Big Data challenges facing health-IT organizations. Hadoop is typically deployed on a cluster of commodity servers. As computing or storage demand grows, the system is scaled by adding new nodes to the cluster. This is the “scale out” model, as opposed to “scale up,” where an existing system is replaced with a new, more powerful one. The “scale out” model is less disruptive (and typically less expensive) for IT organizations than the “scale up” model.
• Flexibility: Variety of data is another consideration driving interest in Hadoop. While much healthcare data is structured, resides in a traditional relational database, and conforms to a well-defined schema, there is also a lot of unstructured information such as images, faxes, and dictated/narrative notes. This unstructured information contains significant clinical and analytical value, but many organizations are not making effective use of it today. The Hadoop ecosystem includes HDFS (the Hadoop Distributed File System) and HBase, a non-relational, distributed database that runs on top of HDFS; together they can store these differing data types without requiring a fixed, predefined schema. Furthermore, HDFS by default keeps three replicas of each block of data across the cluster, improving the resiliency of solutions that make use of this infrastructure.
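To make the scale-out and replication ideas a little more concrete, here is a minimal Python sketch. It does not use Hadoop itself and is not how HDFS actually places data (real HDFS placement is rack-aware and more involved). It simply assumes a 128 MB block size, a replication factor of 3, and a naive round-robin placement; the function names and node names are all hypothetical and chosen only for illustration.

```python
import math

BLOCK_SIZE_MB = 128      # assumed block size (configurable in a real cluster)
REPLICATION_FACTOR = 3   # the triple replication described above


def blocks_needed(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of fixed-size blocks a file is split into."""
    return max(1, math.ceil(file_size_mb / block_size_mb))


def place_replicas(num_blocks, nodes, replication=REPLICATION_FACTOR):
    """Assign every block to `replication` distinct nodes.

    Simplified round-robin placement for illustration only;
    it ignores racks, free space, and node health.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def usable_capacity_tb(num_nodes, raw_tb_per_node, replication=REPLICATION_FACTOR):
    """Usable storage: raw capacity divided by the number of copies kept."""
    return num_nodes * raw_tb_per_node / replication


# Hypothetical four-node cluster storing a 300 MB imaging file.
nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(blocks_needed(300), nodes)
for block, holders in placement.items():
    print(f"block {block} -> {holders}")

# "Scale out": adding a fifth node grows capacity without replacing anything.
print(usable_capacity_tb(4, 12), "TB usable with 4 nodes")
print(usable_capacity_tb(5, 12), "TB usable with 5 nodes")
```

Because every block lives on three different nodes in this sketch, losing any single node still leaves two copies of each block. The capacity calculation also shows the trade-off: replication buys resiliency at the cost of roughly two thirds of the raw disk space.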
So how are healthcare organizations making use of Hadoop today? Take a look at a new paper that describes in more detail how the healthcare industry can take advantage of Hadoop. It highlights examples from three domains: provider, payor, and life sciences.
You might have gleaned from the title of the link above that Intel is among the growing list of companies convinced that Hadoop is a critical component of the data center. At Strata a few weeks ago, Intel announced the North American release of the Intel Distribution for Apache Hadoop (IDH). Details can be found here.
Do you have any thoughts or experiences to share? How has Hadoop helped your organization? Please add to the discussion below. For information on the role Intel plays in Big Data for healthcare, please visit this site: Big Data and Analytics in Healthcare and Life Sciences. You can also follow me @CGoughPDX on Twitter.