Ready to Rumble: Competing Standards for Incorporating Variant Data into Electronic Health Records
The first in a series on EHR integration
Even as DNA sequence data has become increasingly important in the treatment of cancer, the difficulties of actually using this data in clinical decision making continue to mount. Although there are a number of stumbling blocks, an annoyance that has grown in importance recently is the difficulty of getting discrete variant data into the Electronic Health Record (EHR). Most of the major sequencing vendors provide their results as PDF reports, containing a lot of great information including not just the variants, but what the variants mean as to possible therapies and/or clinical trials. Unfortunately, these reports are essentially opaque to the EHR, its backing database, and any analytic tools running on this database.
This problem has become more serious now that patients are beginning to be monitored for changes in variant status over time, at first among leukemia/lymphoma patients and now more generally, as liquid biopsy becomes prevalent. Comparing variants and their allele frequencies across two or more PDF documents is difficult and frustrating, whereas comparing changes in findings over time is a prime use of the EHR.
This unmet need has led to a number of approaches towards a solution. These approaches have contrasting strengths and weaknesses, which I'll be pointing out over a series of posts:
An article "in press" in the Journal of Molecular Diagnostics (A Model Information Management Plan for Molecular Pathology Sequence Data Using Standards, Campbell, Walter S. et al.)
The HL7 Version 2.5.1 Implementation Guide: Laboratory Results Interface, Release 1 STU Release 3 Pilot proposal
The SMART on FHIR Genomics standards
A wrap-up and summary
In this post we will look at the approach described by Campbell et al. in their new Journal of Molecular Diagnostics paper. Campbell et al. employ a standards-based approach: an HL7 laboratory results message format that encodes variant data using SNOMED CT and HGVS.
This work is part of an ongoing project at the University of Nebraska Medical Center (UNMC) to structure and encode all results specified in the College of American Pathologists (CAP) cancer protocol checklists into SNOMED CT. In the SNOMED ontology created by the authors, a gene is modeled as a type of "Sub-cellular structure", which is a sub-type of "Observable Entity", and linked to the HGNC locus name (e.g. "BRAF"). Variants are modeled as extensions of "Measurement scales", with HGVS syntax (e.g. BRAF c.1799C>T(p.Val600Glu)) as one of the allowed scale types. With these clever extensions, genomic data can be glued into the comprehensive SNOMED ontology, and can therefore be computed on by any of the plethora of medical informatics systems that understand this ontology.
The next step leverages these SNOMED extensions to generate HL7 V2.5 standards-based ORU (Unsolicited Observation Message) observations that can be reported to the EHR. Each variant is reported as a distinct OBX line in the HL7 message, using the SNOMED identifier for the gene and HGVS to describe the variant:
OBX|2|CWE|911752161000004103^EGFR sequence variant identified in excised malignant neoplasm (observable entity)^SCT|2|EGFR NP_005219.2:T790M NM_005228.3:c.2369C>T^EGFR T790M|||Tier1-Pathogenic|||F
In this example, the SNOMED identifier for EGFR (911752161000004103) and its human readable name (EGFR sequence variant identified in excised malignant neoplasm (observable entity)) take the role of the OBX-3 Observation Identifier. The last component of that field (SCT) tells the receiving system that the identifier is based on SNOMED. The OBX-5 Observation Value is encoded as HGVS, giving both the amino acid change ("p.") and the DNA change ("c."), each expressed against a RefSeq ID: the amino acid sequence for the former and the transcript for the latter.
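To make the field layout concrete, here is a minimal Python sketch that splits the OBX line above on the usual HL7 v2 delimiters (`|` between fields, `^` between components). The function name `parse_obx` and the dictionary keys are my own illustrative choices, not something from the paper or from an HL7 library, and the sketch deliberately ignores escape sequences and repetition separators.

```python
def parse_obx(segment: str) -> dict:
    """Split one OBX segment into the fields discussed in the text."""
    fields = segment.split("|")
    if fields[0] != "OBX":
        raise ValueError("not an OBX segment")
    # Pad short segments so every field up to OBX-11 can be indexed.
    fields += [""] * (12 - len(fields))
    # OBX-3 carries identifier^text^coding system as its first three components.
    identifier = (fields[3].split("^") + ["", "", ""])[:3]
    return {
        "set_id": fields[1],            # OBX-1
        "value_type": fields[2],        # OBX-2
        "code": identifier[0],          # OBX-3, component 1
        "name": identifier[1],          # OBX-3, component 2
        "coding_system": identifier[2], # OBX-3, component 3
        "value": fields[5].split("^"),  # OBX-5, split into components
        "units": fields[6],             # OBX-6
        "reference_range": fields[7],   # OBX-7
        "interpretation": fields[8],    # OBX-8
        "status": fields[11],           # OBX-11
    }


line = ("OBX|2|CWE|911752161000004103^EGFR sequence variant identified in "
        "excised malignant neoplasm (observable entity)^SCT|2|"
        "EGFR NP_005219.2:T790M NM_005228.3:c.2369C>T^EGFR T790M"
        "|||Tier1-Pathogenic|||F")
obx = parse_obx(line)
print(obx["coding_system"], obx["value"][0], obx["interpretation"])
```

A production interface would normally hand this job to an HL7 parsing library; the point here is only that the variant arrives as discrete, addressable fields rather than as text inside a PDF.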
OBX lines in this format are analogous, for example, to reporting blood hemoglobin values:
OBX|1|NM|718-7^Hemoglobin [Mass/volume] in Blood^LN^^^^^^Hemoglobin [Mass/volume] in Blood||5.5|mEq/L|2.5-5.3|H|||F
except that here we're using a LOINC identifier ("718-7") for the Observation Identifier, which, by the way, is flagged as "LN" in the coding-system component of that field, and the Observation Value is a number reported in mEq/L.
This is the second clever idea here: EHRs know how to interpret lab results returned as OBX lines in an ORU message, so they can immediately interpret variant information returned in this format ("immediately" here means after updating the SNOMED tables in the EHR to contain the new mappings described above).
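The analogy can be sketched from the sending side as well. In the hedged Python example below, a single hypothetical helper, `build_obx`, assembles both kinds of result; only the coding system and the value type differ. The helper and its parameter names are assumptions made for illustration, the hemoglobin identifier is shortened to its first three components, and no escaping of delimiter characters is attempted.

```python
def build_obx(set_id, value_type, identifier, value, sub_id="", units="",
              reference_range="", interpretation="", status="F"):
    """Assemble one OBX segment; identifier and value are tuples of components."""
    fields = [
        "OBX",
        str(set_id),           # OBX-1 Set ID
        value_type,            # OBX-2 Value Type
        "^".join(identifier),  # OBX-3 Observation Identifier
        sub_id,                # OBX-4 Observation Sub-ID
        "^".join(value),       # OBX-5 Observation Value
        units,                 # OBX-6 Units
        reference_range,       # OBX-7 Reference Range
        interpretation,        # OBX-8 interpretation flag
        "",                    # OBX-9 (unused in this sketch)
        "",                    # OBX-10 (unused in this sketch)
        status,                # OBX-11 Result Status
    ]
    return "|".join(fields)


# The variant result: SNOMED CT identifier, HGVS value.
variant = build_obx(
    2, "CWE",
    ("911752161000004103",
     "EGFR sequence variant identified in excised malignant neoplasm "
     "(observable entity)",
     "SCT"),
    ("EGFR NP_005219.2:T790M NM_005228.3:c.2369C>T", "EGFR T790M"),
    sub_id="2", interpretation="Tier1-Pathogenic")

# The routine lab result: LOINC identifier, numeric value with units
# (values copied from the hemoglobin example in the text).
lab = build_obx(
    1, "NM",
    ("718-7", "Hemoglobin [Mass/volume] in Blood", "LN"),
    ("5.5",),
    units="mEq/L", reference_range="2.5-5.3", interpretation="H")

print(variant)
print(lab)
```

Because both segments come out of the same code path, a receiving system that already routes OBX lines by identifier and coding system needs new vocabulary entries, not a new message type.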
Scott Campbell and his co-authors were guided by the "Technical Desiderata for integration of genomic data into the EHR" from Masys et al.:
Maintain separation of primary molecular observations from the clinical interpretations of those data
Support lossless compression from primary molecular observations to clinically manageable subsets
Maintain linkage of molecular observations to the laboratory methods used to generate them
Support compact representation of clinically actionable subsets for optimal performance
Simultaneously support human-viewable formats and machine-readable formats to facilitate implementation of decision support rules
Anticipate fundamental changes in the understanding of human molecular variation
Support both individual clinical care and discovery science
Their approach succeeds in implementing many of these "Desiderata":
Primary data, in the form of FASTQ, BAM and VCF files, are maintained in the pathology laboratory, and only the clinical interpretation (i.e. validated alterations and their impact) is transmitted to the EHR
SNOMED and HGVS provide a "lossless compression" from the primary data, and simultaneously provide both a human-readable format (e.g. EGFR T790M) and a machine-readable one (e.g. EGFR is SNOMED observable entity 911752161000004103 and the variant is NM_005228.3:c.2369C>T)
Their format provides for both individual patient care and discovery science
The latter point is really important. Since we want to "anticipate fundamental changes" in precision oncology, storing the data in a form that will remain usable in the future through database search is critical. Hence the use of full HGVS, including the dotted version number for the RefSeq identifiers.
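To illustrate why the dotted version number matters for later searches, here is a small Python sketch that splits a versioned expression into accession, version, and change, and rejects one without a version. The regular expression and the function name `split_hgvs` are assumptions for this example only; the pattern covers just the RefSeq-style strings that appear in this post and is not a full HGVS parser.

```python
import re

# Hypothetical pattern for this sketch: two-letter RefSeq prefix, underscore,
# digits, a mandatory dotted version, then everything after the colon.
HGVS_RE = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+)\.(?P<version>\d+):(?P<change>.+)$")


def split_hgvs(expression):
    """Return (accession, version, change) or raise if the version is missing."""
    match = HGVS_RE.match(expression)
    if match is None:
        raise ValueError(
            f"unversioned or malformed expression: {expression!r}")
    return match["accession"], int(match["version"]), match["change"]


print(split_hgvs("NM_005228.3:c.2369C>T"))
print(split_hgvs("NP_005219.2:T790M"))
```

With the version stored as its own value, a database query can tell results reported against one revision of a reference sequence from results reported against another, instead of silently pooling them.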
The Molecular Pathology lab at UNMC is a long-time user of the GenomOncology Pathology Workbench for reporting somatic mutations. Working with Scott Campbell and Allison Cushman-Vokoun, we extended the Path Workbench to provide both traditional human-readable reports and HL7 2.5 ORU messages, as described above. GenomOncology has an exclusive license for commercial use of this technology, and has made it a part of our GO-Connect offering. Please email us to discuss implementing GO-Connect at your institution.
In the next post, I will describe an alternate HL7 2.5-based format for delivering variant information to the EHR.