Below is a guest post from Kyle H. Ambert, PhD, Intel Graph Analytics Operation.
Trends come and go in the analytics world. First, everything is supercomputing; then everything is distributed computing. SQL. NoSQL. Hadoop. Hadoop! HADOOP! And then Spark makes its way onto the scene, changing everything yet again. Navigating this alphabet soup of analytical spare parts is enough to make even the most devoted of data scientists wish they had listened to their respective mothers and become physicians.
As a graduate of Oregon Health & Science University School of Medicine, I lived at the forefront of where big data technology meets healthcare while researching the biomedical and clinical applications of artificial intelligence, or "big health," as I liked to call it. Like most data scientists, I found myself spending a great deal of my time there writing code simply to acquire data sets, format them in a sensible way, and remove any uninformative or misleading information they contained. This, most would agree, is what's referred to as "the essential pre-processing steps of data analysis," or, in technical parlance, "the boring stuff." The "development and application of analytical algorithms," or "the reason I got into this business in the first place," was often relegated to an unfortunately modest fraction of my day.
An Experienced Programmer reading this is likely to observe, "Well, the obvious solution to your problem is to write a software library abstracting out the repeated steps in your so-called boring stuff." Astute as ever, Experienced Programmer, but what of the ever-increasing population of domain experts who need to gain insights from their own data but don't write code? What of the physician who wants to examine the relative rates of diabetes diagnoses in their practice over time? Will you be the one to look the population geneticist writing a meta-analysis in the eye and say, "I'm sorry, but if you want to do a large-scale text-mining study of the publications in your field, you're going to have to learn to program on the streets"? I couldn't do it.
That's why, when I was given the opportunity to join Intel's Graph Analytics Operation to guide the development of the Intel Analytics Toolkit (IAT) for end users in biomedicine, I jumped at the chance. Graph analytics lets users analyze data with methods that take into account the relationships inherent in that data. With IAT, we enable biomedical researchers and physicians to use this technology to gain insight from networks of biomedical information. We developed IAT with scalability in mind and have shielded the user from the laborious steps of working with big data in a distributed environment, creating an intuitive user interface to a suite of powerful analytical tools.
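For readers who have not met graph analytics before, a toy illustration may help. The sketch below is plain Python and is not the IAT interface, which this post does not show. The records, the `build_graph` and `co_occurring` helpers, and the "patient" and "diagnosis" node labels are all hypothetical, invented for this example. It links patients to their diagnoses and then asks a question that depends on those links: which other diagnoses turn up in patients who share a given one?

```python
from collections import defaultdict

# Hypothetical toy records: (patient_id, diagnosis) pairs, made up for illustration.
records = [
    ("p1", "type 2 diabetes"),
    ("p1", "hypertension"),
    ("p2", "type 2 diabetes"),
    ("p2", "hypertension"),
    ("p3", "asthma"),
    ("p3", "hypertension"),
]


def build_graph(pairs):
    """Build an undirected patient-diagnosis graph as an adjacency map."""
    graph = defaultdict(set)
    for patient, diagnosis in pairs:
        patient_node = ("patient", patient)
        diagnosis_node = ("diagnosis", diagnosis)
        graph[patient_node].add(diagnosis_node)
        graph[diagnosis_node].add(patient_node)
    return graph


def co_occurring(graph, diagnosis):
    """Count, for each other diagnosis, the patients who also have `diagnosis`."""
    start = ("diagnosis", diagnosis)
    counts = defaultdict(int)
    for patient_node in graph[start]:       # step 1: diagnosis -> patients
        for other in graph[patient_node]:   # step 2: patients -> their diagnoses
            if other != start:
                counts[other[1]] += 1
    return dict(counts)


graph = build_graph(records)
print(co_occurring(graph, "type 2 diabetes"))
```

The answer comes from walking two steps across the graph rather than from scanning rows in a table, which is the sense in which the analysis takes relationships into account. A real deployment would involve far larger, distributed graphs; this sketch only shows the shape of the idea.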
The real power here, for the aspiring data scientist, is that the tools needed for importing, cleaning, storing, and analyzing data are all in the same place. No more writing code to connect an XML parser to a database; no more figuring out how to write analyses that scale efficiently to big data or that are happy to work in a distributed environment. We've taken care of that for you. This, we've found, drastically decreases the time spent in the monotonous steps of data analysis, letting analysts focus on understanding their results: the reason they got into their business in the first place.
This month, we began a limited trial of the IAT, and we're partnering with university hospitals, private medical research organizations, and health insurance companies to better understand the scalable data analysis needs of the biomedical and clinical communities. What we're already learning is that there is a huge need in the medical community for large-scale graph analytics, particularly when it comes to developing an integrated representation of heterogeneous data types, such as those found in electronic health records or used to inform Clinical Decision Support systems.
What questions do you have? To learn more about the IAT, watch this video, or see intel.com/graph. And, of course, if you have a biomedical data analysis problem you'd like to work on with us, or if you’d like to join the limited trial, leave a comment below.