How to Securely Use Data for Personalized Medicine

In my last blog, How to Securely Collect Data for Personalized Medicine, I discussed the risks and safeguards involved in collecting data for personalized medicine. The next step in the information lifecycle after collection is use, so here I’ll focus on the privacy and security concerns, risks and solutions in the use of sensitive data for personalized medicine.

During the collection phase a blood or saliva sample is typically acquired from the patient. The sample is then sequenced to create the raw genome sequence data.

The patient's raw genome sequence data is then compared to a baseline genome data set to create a variance file: a data set of points of interest where the patient's genome deviates in interesting ways from the baseline. The raw genomic sequence data set can be very large, more than 3 GB in size, and genomic databases can contain tens or hundreds of thousands of such data sets. Maintaining security with data sets this large requires special attention to performance. One example is hardware-accelerated encryption, such as Intel® Advanced Encryption Standard – New Instructions (AES-NI), which can be used for high-performance encryption of databases such as InterSystems Caché.
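As a toy illustration of the comparison step, the sketch below diffs a short patient sequence against a baseline and records the positions where they differ. It assumes the two sequences are already aligned and of equal length, and it only detects single-base substitutions; real pipelines work on aligned reads and standard variant formats. The name `build_variance` and the sequences are made up for this example.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    position: int   # 1-based position in the baseline
    reference: str  # base found in the baseline
    observed: str   # base found in the patient sequence


def build_variance(baseline: str, patient: str) -> list[Variant]:
    """Return single-base differences between two equal-length sequences."""
    if len(baseline) != len(patient):
        raise ValueError("sketch assumes pre-aligned sequences of equal length")
    return [
        Variant(i, ref, obs)
        for i, (ref, obs) in enumerate(zip(baseline, patient), start=1)
        if ref != obs
    ]


if __name__ == "__main__":
    # Made-up sequences, for illustration only.
    for variant in build_variance("ACGTACGT", "ACGAACGC"):
        print(variant)
```

The point of the sketch is the size difference: the variance data holds only the positions that differ, so it is far smaller than the raw sequence, but it is still sensitive and needs the same protection.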

The variance file may then be annotated to attach meaning to the points of interest that have been correlated with known conditions or traits. An annotation might flag an increased propensity for a specific disease or, in pharmacogenomics, an association between a point of interest and increased efficacy or toxicity of a given medicine.
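Annotation can be pictured as a lookup of each point of interest in a knowledge base. The sketch below is a minimal version of that idea, assuming points of interest are (position, observed base) pairs and the knowledge base is a plain dictionary. The function name, the data layout and the placeholder annotations are all hypothetical and do not describe real genetic associations.

```python
from typing import Optional


def annotate(
    points_of_interest: list[tuple[int, str]],
    knowledge_base: dict[tuple[int, str], str],
) -> list[dict[str, Optional[object]]]:
    """Attach a knowledge-base note to each point of interest, if one exists."""
    annotated = []
    for position, observed in points_of_interest:
        annotated.append(
            {
                "position": position,
                "observed": observed,
                # None means the knowledge base has nothing for this point.
                "annotation": knowledge_base.get((position, observed)),
            }
        )
    return annotated


if __name__ == "__main__":
    # Placeholder entries, for illustration only.
    kb = {(4, "A"): "placeholder: altered response to example medicine X"}
    for row in annotate([(4, "A"), (8, "C")], kb):
        print(row)
```

In this picture the knowledge base is the valuable part, while the annotated output ties meaning to an individual patient, which makes it more sensitive than the unannotated variance file.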

Lastly, a risk factors report is produced from the annotated variance file and may be used by the healthcare professional to deliver personalized medicine.

The risk factors report may then be attached to the electronic health record (EHR) for the patient.

Clearly, the use of sensitive data in personalized medicine involves several data sets: the raw genomic sequence data, the variance file, the risk factors report and the patient EHR. Each of these needs to have its confidentiality, integrity and availability protected.
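To make the integrity part concrete, here is a minimal sketch that tags a data set with an HMAC-SHA256 value and later checks that the data has not been altered. It uses only Python's standard library. It covers integrity only, not confidentiality or availability, and it assumes the key is stored and managed securely elsewhere. The helper names `sign` and `verify` are made up for this example.

```python
import hashlib
import hmac
import secrets


def sign(data: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the data."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, key: bytes, expected_tag: str) -> bool:
    """Check the tag using a constant-time comparison."""
    return hmac.compare_digest(sign(data, key), expected_tag)


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # assumption: kept in a key management system
    report = b"risk factors report contents"
    tag = sign(report, key)
    print(verify(report, key, tag))                 # unchanged data
    print(verify(report + b" tampered", key, tag))  # modified data
```

The same tag-and-verify step could be applied at each hand-off, from raw sequence to variance file to risk factors report, so that a modification at any stage is detected before the data is used.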

Healthcare organizations using genetic information must constrain their use of this data to the uses specified in the privacy notice given to the patient before the patient consented to the use of their genetic data.
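One way to enforce this in software is to record the consented purposes alongside the data and check every proposed use against them. The sketch below assumes purposes can be represented as simple strings; the `Consent` class, the `is_use_permitted` function and the purpose names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consent:
    patient_id: str
    # Purposes listed in the privacy notice the patient agreed to.
    permitted_purposes: frozenset[str]


def is_use_permitted(consent: Consent, purpose: str) -> bool:
    """Allow a use only if it matches a purpose the patient consented to."""
    return purpose in consent.permitted_purposes


if __name__ == "__main__":
    consent = Consent("patient-001", frozenset({"treatment", "pharmacogenomics"}))
    print(is_use_permitted(consent, "treatment"))
    print(is_use_permitted(consent, "marketing"))
```

The check denies by default: any purpose not explicitly listed is refused, which matches the idea that use is limited to what the privacy notice specified.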

On the regulatory front, the Genetic Information Nondiscrimination Act (GINA) prohibits group health plans and health insurers from using genetic information from any of these data sets to deny coverage to a healthy individual, or to charge that individual higher premiums, based solely on a genetic predisposition to developing a disease in the future. Genetic information is also considered Protected Health Information (PHI), so an organization using genetic information may be subject to the Health Insurance Portability and Accountability Act (HIPAA).

For healthcare organizations using genetic information in the United States, the Health Information Technology for Economic and Clinical Health (HITECH) Act requires organizations subject to HIPAA to report data breaches affecting 500 or more individuals to the Department of Health and Human Services (HHS) and the media, in addition to notifying the affected individuals. Many states now also have breach notification laws, for example California SB 1386, which requires notification of affected individuals in the event of a breach of their sensitive information. That would include PHI such as genetic information that could be associated with them (that is, information that was not de-identified).
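The threshold described above can be expressed as a simple rule. The sketch below encodes only what this post states, namely that breaches affecting 500 or more individuals are reported to HHS and the media in addition to the affected individuals. It does not model obligations for smaller breaches, timing, or state laws, and the function name is made up for this example.

```python
HHS_MEDIA_THRESHOLD = 500  # threshold as described in this post


def parties_to_notify(affected_individuals: int) -> list[str]:
    """List who is notified for a breach of the given size (simplified)."""
    if affected_individuals < 0:
        raise ValueError("affected_individuals cannot be negative")
    if affected_individuals == 0:
        return []
    parties = ["affected individuals"]
    if affected_individuals >= HHS_MEDIA_THRESHOLD:
        parties += ["HHS", "media"]
    return parties


if __name__ == "__main__":
    for count in (120, 500, 25000):
        print(count, parties_to_notify(count))
```

Given that genomic databases can hold tens or hundreds of thousands of data sets, a single breach of such a database would usually fall well above the threshold.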

Recently, the HIPAA Omnibus Rule became effective. It includes further changes to when healthcare organizations must report breaches, together with new requirements for Business Associates to comply with the HIPAA Security and HITECH Act breach notification rules, holding them directly accountable for doing so. Business Associates may include data processors that use genetic information in providing services to healthcare organizations. Disclaimer: this is publicly available information and not a legal summary or advice about regulations.

The use of sensitive data in personalized medicine may also involve sensitive Intellectual Property (IP), especially the algorithms and knowledge bases used to analyze and assign meaning to genomic data. This IP must also be protected.

What types of privacy and security challenges and solutions do you see with the use of sensitive data for personalized medicine?