Seeking Truth in Healthcare with Big Data Analytics

Would you use a medical diagnostic that is missing 63 percent of its input data, and whose positive findings are wrong 30 percent of the time? I wouldn’t.

That is, I wouldn’t if I could avoid it. Sadly, I can’t. These are the stats for the structured problem list in the typical enterprise electronic health record (EHR). Two studies have found that at least 63 percent* of the key clinical information about patients can only be found in text and scanned documents, which means that it is not readily accessible to clinical decision support.

At the same time, consider heart failure. One study found that a shocking 30 percent** of patients with heart failure in their problem lists do not actually have heart failure!

On Wednesday, May 27, I had the pleasure of co-presenting on this topic at the Health Technology Forum with Vishnu Vyas. Vishnu is the Director of Research and Development at Apixio, a Big Data analytics company for healthcare. Our goal was to introduce the concept of The True State of the Patient as a computable model of the patient for optimizing care delivery, outcomes and revenue.

Our message was that Big Data analytics can dramatically improve, and eventually eliminate, these problems.

First, the bad news. I’ve already mentioned a source of error: improperly attributed diagnoses in the structured problem list. I’ve also mentioned a data gap: missing diagnoses in the structured problem list. But there are other critical gaps. For example, typical patients spend only 0.08 percent*** of their waking hours in a clinical setting, so we are missing observations of the remaining 99.92 percent*** of their lives. For patients with diabetes or cardiovascular disease, the inability to measure activity over the course of the entire day makes it very difficult to manage these chronic diseases proactively. For Parkinson’s patients, whose symptoms, such as tremor and sleep disturbance, are intermittent and highly variable, it is nearly impossible to assess the impact of therapy with such sparse sampling. In both cases, filling in these gaps moves us closer to knowing the True State of the Patient.
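The 0.08 percent figure is simple arithmetic on the assumptions listed in the *** footnote (4.08 visits per year, at most 80 minutes per visit, 16 waking hours per day). The short Python sketch below reproduces that arithmetic; the constants come from the footnote, and the function name is hypothetical. Run as written, it gives roughly 0.09 percent of waking time in the clinic, slightly higher than the 0.08 percent quoted above, but the point stands: the clinical record observes well under a tenth of a percent of a patient’s waking life.

```python
# Back-of-the-envelope estimate of the share of waking hours a typical
# patient spends with a physician, using the footnote's assumptions.
VISITS_PER_YEAR = 4.08       # physician visits per person per year
MINUTES_PER_VISIT = 80       # generous upper limit on physician time per visit
WAKING_HOURS_PER_DAY = 16    # assumed waking hours per day
DAYS_PER_YEAR = 365


def fraction_of_waking_time_in_clinic(visits=VISITS_PER_YEAR,
                                      minutes=MINUTES_PER_VISIT,
                                      waking_hours=WAKING_HOURS_PER_DAY,
                                      days=DAYS_PER_YEAR):
    """Return the fraction of waking minutes per year spent in physician visits."""
    clinic_minutes = visits * minutes
    waking_minutes = waking_hours * 60 * days
    return clinic_minutes / waking_minutes


if __name__ == "__main__":
    share = fraction_of_waking_time_in_clinic()
    print(f"in clinic: {share:.3%}, unobserved: {1 - share:.3%}")
```

Because the 80-minute visit length is an upper limit, the true share is, if anything, smaller than this estimate.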

Another gap is genomic data. The literature abounds with striking examples of genomic markers that can inform care decisions for patients with a variety of conditions, including, but not limited to, cancer. However, these markers are not routinely measured or applied in clinical practice. We would argue that genomic data is often an important, but overlooked, component of the True State of the Patient.

OK, so we have errors and we’re missing data. How does Big Data analytics help?

Let’s look at false positives for heart failure in the structured problem list. This is an important example, because heart failure patients are very sick, and they are subject to Medicare 30-day readmission rules, so the correct assessment of heart failure has real consequences for patients and health systems.

It is possible to build a classifier (a kind of machine learning model) that can look at all of the information in a patient’s chart and estimate how likely it is that the patient actually has heart failure. This is an important distinction: the computer is not diagnosing heart failure; it is reliably identifying what the chart of a true heart failure patient looks like. Machine learning, a typical tool in the Big Data arsenal, allows us to dramatically reduce the impact of errors in the problem list.
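To make the idea concrete, here is a minimal sketch of one simple kind of classifier, a logistic regression trained by gradient descent, written in plain Python. It is not the model described in the talk. The four chart features (elevated BNP, loop diuretic prescribed, low ejection fraction on echo, heart failure named in the notes) and the tiny labeled training set are hypothetical, chosen only for illustration; a real model would be trained on many thousands of clinician-reviewed charts with far richer features.

```python
import math

# Hypothetical chart features, in column order (1 = present, 0 = absent):
# [elevated BNP, loop diuretic prescribed, echo EF below 40%, HF named in notes]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this chart under the current model.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def probability_of_hf(chart, weights, bias):
    """Estimated probability that a chart looks like a true heart failure chart."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, chart)) + bias)


# Hypothetical training charts with labels from clinician review (1 = confirmed HF).
train_rows = [
    [1, 1, 1, 1], [1, 1, 0, 1], [0, 1, 1, 1], [1, 0, 1, 0],
    [0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0],
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_logistic_regression(train_rows, train_labels)

# Two patients who both have heart failure on their problem lists:
strong_chart = [1, 1, 1, 1]   # chart full of supporting evidence
weak_chart = [0, 0, 0, 0]     # problem-list entry with no supporting evidence

print(f"strong chart: {probability_of_hf(strong_chart, weights, bias):.2f}")
print(f"weak chart:   {probability_of_hf(weak_chart, weights, bias):.2f}")
```

A low score does not overrule the clinician; it flags a problem-list entry whose chart does not resemble confirmed heart failure, so that it can be reviewed.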

What about gaps? The same trick can be used for filling in gaps in the structured data, and is especially effective when text mining and natural language processing are used to find key clinical information in the unstructured data. Progress notes, nursing notes, consult letters and hospital discharge summaries are fertile sources of information for this kind of analysis. Big Data analytics leads to fewer gaps in the structured data.
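As a rough illustration of how text mining can surface diagnoses that never made it into the structured data, the Python sketch below scans free-text notes for a handful of terms, skips mentions preceded by a negation cue (a crude imitation of NegEx-style negation detection), and reports concepts that the notes support but the problem list lacks. The vocabulary, negation cues, and example notes are all hypothetical; a production system would rely on a clinical terminology and a full NLP pipeline rather than regular expressions.

```python
import re

# Hypothetical mini-vocabulary mapping surface terms to problem-list concepts.
CONCEPT_TERMS = {
    "heart failure": ["heart failure", "chf", "congestive heart failure"],
    "type 2 diabetes": ["type 2 diabetes", "t2dm", "diabetes mellitus type 2"],
    "atrial fibrillation": ["atrial fibrillation", "afib", "a-fib"],
}

# Simple negation cues, checked in a short window before each matched term.
NEGATION_CUES = re.compile(r"\b(no|denies|without|negative for|no evidence of|ruled out)\b")


def find_concepts(note):
    """Return the concepts asserted (not negated) in a free-text clinical note."""
    text = note.lower()
    found = set()
    for concept, terms in CONCEPT_TERMS.items():
        for term in terms:
            for match in re.finditer(r"\b" + re.escape(term) + r"\b", text):
                window = text[max(0, match.start() - 40):match.start()]
                # Only look back within the current sentence or clause.
                window = re.split(r"[.;\n]", window)[-1]
                if not NEGATION_CUES.search(window):
                    found.add(concept)
    return found


def find_gaps(notes, problem_list):
    """Concepts supported by the notes but missing from the structured problem list."""
    mentioned = set()
    for note in notes:
        mentioned |= find_concepts(note)
    return mentioned - set(problem_list)


# Hypothetical notes for a single patient.
example_notes = [
    "Pt with long-standing T2DM, A1c 8.1. Echo today: no evidence of heart failure.",
    "Irregularly irregular rhythm on ECG, consistent with atrial fibrillation.",
]
example_problem_list = ["atrial fibrillation"]

print(find_gaps(example_notes, example_problem_list))
```

Here the diabetes mention in the progress note is a candidate gap in the problem list, while the negated heart failure mention is correctly ignored.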

What about gaps caused by the 99.92 percent of the time that the patient spends outside the clinic? This is where the Internet of Things (IoT) comes in. In healthcare, IoT means remote patient monitoring, which can include anything from activity monitoring with a smart watch to weight, blood pressure and glucose monitoring with FDA-approved medical devices. A number of very interesting studies now under way are using patient monitoring to fill in the gaps in the clinical record.
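One simple way remote monitoring data can be turned into something a clinical record can use is a daily summary. The Python sketch below groups home glucose readings by day and reports the share of readings within a target range. The 70 to 180 mg/dL range is a commonly used “time in range” target, but here it is only an assumption; the readings and function name are hypothetical. Each reading is weighted equally, so with evenly spaced readings (for example, from a continuous glucose monitor) the share of readings approximates the share of time.

```python
from collections import defaultdict
from datetime import datetime

# Assumed glucose target range in mg/dL; the appropriate range is a clinical decision.
LOW_MG_DL = 70
HIGH_MG_DL = 180


def daily_time_in_range(readings):
    """Group (ISO timestamp, mg/dL) readings by day; return the share in range per day."""
    by_day = defaultdict(list)
    for timestamp, value in readings:
        day = datetime.fromisoformat(timestamp).date()
        by_day[day].append(value)
    return {
        day: sum(LOW_MG_DL <= v <= HIGH_MG_DL for v in values) / len(values)
        for day, values in sorted(by_day.items())
    }


# Hypothetical home readings for illustration only.
sample_readings = [
    ("2015-06-01T08:00", 95), ("2015-06-01T12:00", 210),
    ("2015-06-01T18:00", 150), ("2015-06-01T22:00", 130),
    ("2015-06-02T08:00", 65), ("2015-06-02T12:00", 140),
]

for day, share in daily_time_in_range(sample_readings).items():
    print(f"{day}: {share:.0%} of readings in range")
```

Daily summaries like these are far smaller than the raw stream and can be charted alongside the readings taken during clinic visits.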

Finally, what about the genome? Everything about the human genome is Big Data analytics, from the raw data for roughly three billion base pairs per genome to the huge datasets and patient cohorts that are needed to correlate observed genomic variation with clinical measurements and outcomes. Inclusion of the human genome in the True State of the Patient will lead to some of the largest and most interesting Big Data advances in healthcare over the next 10 years.
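The basic building block of correlating a genomic variant with an outcome can be as simple as a 2x2 contingency table. The Python sketch below runs a Pearson chi-square test (1 degree of freedom, no continuity correction) and computes an odds ratio for variant carriers versus non-carriers; the function name and cohort counts are hypothetical. Large-scale studies repeat a test like this across very many variants, usually with more sophisticated models that adjust for covariates.

```python
import math


def carrier_association_test(carrier_with_outcome, carrier_without,
                             noncarrier_with_outcome, noncarrier_without):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 table.

    Rows: variant carriers / non-carriers. Columns: outcome present / absent.
    Returns (chi_square, p_value, odds_ratio). Assumes no cell is zero.
    """
    a, b = carrier_with_outcome, carrier_without
    c, d = noncarrier_with_outcome, noncarrier_without
    n = a + b + c + d
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    odds_ratio = (a * d) / (b * c)
    return chi_square, p_value, odds_ratio


# Hypothetical cohort: 100 carriers (30 with the outcome), 100 non-carriers (10 with it).
chi_square, p_value, odds_ratio = carrier_association_test(30, 70, 10, 90)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.2g}, odds ratio = {odds_ratio:.2f}")
```

A p-value around 0.0004 looks convincing for a single test, but it falls far short of the conventional genome-wide significance threshold of 5 × 10^-8 that accounts for testing many variants at once, which is one reason these studies need such large cohorts.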

Once we have the True State of the Patient in a computable form, it is possible to improve every area of healthcare. Quality reporting, precision medicine, “practice-based evidence,” proactive wellness and disease management, revenue optimization and care delivery optimization all flow from accurate models of healthcare data. There are few businesses that can survive without accurate, reliable business data to drive modeling and optimization. The changes we are seeing in healthcare today are demanding that healthcare have the same level of transparency into the True State of the Patient.

Big Data analytics will be a critical part of this transformation in healthcare. What questions do you have?

* Studies performed by Apixio (private communication) and Harvard: http://www.nejm.org/doi/full/10.1056/NEJMp1106313#t=article
** Apixio study (private communication)
*** http://www.cdc.gov/nchs/fastats/physician-visits.htm (4.08 visits per American per year)
http://mail.fmdrl.org/Fullpdf/July01/ss5.pdf (all visits under 80 minutes of time with the physician; a strong upper limit)
NB: 0.08% assumes 16 waking hours per day, 80 minutes per visit, and 4.08 visits per patient per year.