How to Securely Retain Data for Personalized Medicine

Personalized medicine promises compelling benefits in improving the quality and reducing the cost of healthcare. It is enabled by powerful new types of sensitive data, including patients' genetic information. To realize these benefits quickly, effectively, and smoothly, it is important to avoid security incidents such as breaches. In prior blogs I discussed how to manage privacy and security risks, and how to securely collect and use data for personalized medicine. In this blog I focus on how to securely retain data for personalized medicine.

When looking at retention, it is useful to consider the types and characteristics of the data used in personalized medicine. These range from the original blood or saliva samples used to obtain a patient's genetic information, to the raw genomic data for a human, which is approximately 3.2 billion base pairs (3.2 Gb) in size, to various types of derived data. One of the key steps in deriving meaning from the raw genomic data is comparing it to baseline genomic data to produce a variance file. This file is much smaller, because it highlights only the interesting variations in the genomic data of the specific patient. The data points in the variance file are referred to as SNPs (single nucleotide polymorphisms). Lastly, a risk factors report can be produced from this variance file, highlighting the patient's propensity for various traits such as diseases. This report may also cover pharmacogenomics, specifying the efficacy or toxicity of various drugs for the patient. The risk factors report is often included in the patient's EHR (electronic health record).
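To make the idea of a variance file concrete, here is a minimal sketch in Python. It assumes the patient sequence has already been aligned to the baseline and has the same length, which real pipelines cannot assume; the names `Variant` and `find_variants` and the short example sequences are hypothetical and exist only for illustration.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    """One single-base difference between baseline and patient data."""
    position: int  # 1-based position in the baseline sequence
    ref: str       # base found in the baseline
    alt: str       # base found in the patient sample


def find_variants(reference: str, sample: str) -> List[Variant]:
    """Return only the positions where the sample differs from the baseline.

    Assumption: both sequences are pre-aligned and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("this sketch assumes pre-aligned sequences of equal length")
    return [
        Variant(i + 1, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Toy data: ten bases instead of billions.
reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"
variants = find_variants(reference, sample)
for v in variants:
    print(v.position, v.ref, v.alt)
```

Even in this toy case the output holds two entries rather than ten bases, which is why the variance file is so much smaller than the raw data. It is also still sensitive, since it captures exactly what is distinctive about the patient.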

Genetic data are considered PHI (protected health information) and are subject to the privacy, security, and breach notification rules of federal regulations such as HIPAA and the HITECH Act, as well as state-level breach notification regulations such as CA SB 1386. The 2013 Cost of a Data Breach Study estimates the average total cost of a data breach in the US in 2012 at $5.4M, clearly a major business impact. Avoiding such incidents requires a proactive approach to privacy and security.

The location of retained data has a direct impact on which regulations and data protection laws apply. This includes not only the primary backend servers, but also Business Continuity / Disaster Recovery sites, backup sites, and any business associates or data processors that may also retain sensitive data. Recent studies and incidents point to the risk of BYOC (Bring Your Own Cloud), where sensitive data ends up in cloud services outside the organization's control. To ensure sensitive data for personalized medicine stays in the cloud where it is supposed to be, under the control of the healthcare organization and with effective privacy and security controls, solutions must be usable, security must not be cumbersome, and IT within the healthcare organization must be responsive and not overly restrictive.

De-identification is a key safeguard often applied to enable research and mitigate the risk of security incidents such as breaches. Various methods exist for de-identification. One option is to remove specific elements of PII, as in the HIPAA Safe Harbor method. Alternatively, a risk-based method such as the HIPAA Statistical Method (also known as Expert Determination) may be used. De-identified data often retains some small risk of re-identification, and research has shown that it is possible to re-identify patients using de-identified genetic information. Further, some types of research require some elements of PII; for example, phenotype research may require zip code. Effectively mitigating the risk to sensitive data retained for personalized medicine therefore requires a holistic approach in which administrative, physical, and technical controls are applied in combination, and a multi-layered approach in which, for example, de-identification is combined with tokenization, access controls, encryption, and so forth.
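As an illustration of layering de-identification with tokenization, here is a minimal Python sketch. It is not a compliant implementation of Safe Harbor or any other HIPAA method. The field names, the `DIRECT_IDENTIFIERS` set, and the helpers `tokenize` and `deidentify` are all hypothetical. The zip code truncation only gestures at the kind of coarsening Safe Harbor describes and ignores the population conditions attached to the real rule.

```python
import hashlib
import hmac
from typing import Dict

# Hypothetical list of fields to drop; a real deployment would follow the
# full list of identifiers required by the chosen de-identification method.
DIRECT_IDENTIFIERS = {"name", "ssn", "email", "phone", "address"}


def tokenize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed, repeatable token (HMAC-SHA256).

    The same input and key always give the same token, so records can still
    be linked, but the token cannot be reversed without the key.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: Dict[str, str], secret_key: bytes) -> Dict[str, str]:
    """Drop direct identifiers, tokenize the patient ID, coarsen the zip code."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "patient_id"
    }
    cleaned["patient_token"] = tokenize(record["patient_id"], secret_key)
    if "zip" in cleaned:
        # Assumption: keeping three digits is acceptable for this data set.
        cleaned["zip"] = cleaned["zip"][:3]
    return cleaned


# Example with made-up data. In practice the key would come from a key
# management system and never be stored alongside the retained data.
demo_key = b"example-key-not-for-production"
demo_record = {
    "patient_id": "P-001",
    "name": "Jane Doe",
    "email": "jane@example.com",
    "zip": "94110",
    "variant": "rs0000000",
}
demo_output = deidentify(demo_record, demo_key)
print(demo_output)
```

The tokenized record would still be encrypted at rest and placed behind access controls. Each layer covers gaps left by the others, which matters given the re-identification risk noted above.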

For solutions to be usable, security must not be cumbersome. Otherwise, research shows that non-compliance, BYOC, and other risks can increase. Hardware-assisted security, such as encryption acceleration, enables technical security controls to be implemented with improved performance, greater robustness against increasingly sophisticated malware, improved usability, and reduced cost. Performance testing shows that such an approach can be very effective in enabling sensitive data to be retained in a highly secure manner with minimal impact on performance and usability.

What kinds of strategies are you using to protect sensitive data for personalized medicine?