*These two authors contributed equally to this work.
A distributed research network (DRN) improves statistical power and can reveal more significant relationships by increasing the sample size. However, differences in data structure constitute a major barrier to integrating data among DRN partners. We describe our experience converting Electronic Health Records (EHR) to the Observational Health Data Sciences and Informatics (OHDSI) Common Data Model (CDM).
We transformed the EHR of a hospital into Observational Medical Outcomes Partnership (OMOP) CDM ver. 4.0 used in OHDSI. All EHR codes were mapped and converted into the standard vocabulary of the CDM. All data required by the CDM were extracted, transformed, and loaded (ETL) into the CDM structure. To validate and improve the quality of the transformed dataset, the open-source data characterization program ACHILLES was run on the converted data.
Patient, drug, condition, procedure, and visit data from 2.07 million patients who visited the subject hospital from July 1994 to November 2014 were transformed into the CDM. The transformed dataset was named the AUSOM. ACHILLES revealed 36 errors and 13 warnings in the AUSOM. We reviewed and corrected 28 errors. The summarized results of the AUSOM processed with ACHILLES are available at
We successfully converted our EHRs to a CDM and were able to participate as a data partner in an international DRN. Converting local records in this manner will provide various opportunities for researchers and data holders.
A distributed research network (DRN) enables observational studies to be conducted using multiple data sources, while confidential personal health data remain with the original data holders [
By providing all of their work products as open-source, OHDSI lowered the technical barriers required for participation in a DRN [
The adoption and use of EHRs has been increasing worldwide, but most EHRs are not interchangeable [
This study describes our conversion of EHR data to CDM ver. 4. For this conversion, we mapped codes from local coding systems into the standard vocabulary of OHDSI, performed the data conversion known as extraction, transformation, and loading (ETL), and checked and improved the data quality. To validate our conversion, we ran the data characterization program Automated Characterization of Health Information at Large-scale Longitudinal Evidence Systems (ACHILLES) on the converted data.
The subject hospital was a Korean tertiary teaching hospital with 1,096 patient beds and 23 operating rooms. It adopted a computerized provider order entry (CPOE) system in 1994 and a comprehensive EHR system in March 2010.
To standardize the format and content of observational data, CDM ver. 4.0 of OMOP was released in April 2012 [
Local codes for diagnoses, drugs, procedures, and laboratory tests were mapped into the OMOP standard vocabulary and reviewed by two physicians and two nurses. The coding system used for diagnosis in the subject hospital is the Korean Standard Classification of Diseases ver. 5 (KCD-5), a Korean derivative of the International Statistical Classification of Diseases and Related Health Problems, 10th revision (ICD-10), while the standard vocabulary of OMOP for diagnosis is based on the Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT). Because there was no mapping table from ICD-10 to SNOMED-CT in the version of the OMOP vocabulary available at the time of this analysis, we created our own mapping. Roughly 3,000 KCD-5 terms matched exact terms in the standard OMOP vocabulary, while the others had to be mapped manually. If there was no exact mapping term in the standard vocabulary, a parent term with broad meaning was mapped instead. As a result, 98.4% of the 20,721 KCD-5 codes were mapped to the CDM standardized vocabularies. Our local drug codes were mapped to the OMOP standardized vocabularies, which use RxNorm and the Anatomical Therapeutic Chemical (ATC) classification system. We could map 75.6% of the 5,233 local drug codes. However, unmapped drug codes were rarely used in our database, and their proportion of total prescription counts was only 0.4%. Of the 8,488 local procedure codes (anesthesia, laboratory tests, pathology, radiology, and surgery), 89.3% were mapped to codes in the OMOP standardized vocabularies, based on the Healthcare Common Procedure Coding System (HCPCS), the ICD 9th revision procedure coding system (ICD-9-PCS), and the Current Procedural Terminology, 4th edition (CPT-4) vocabularies.
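The mapping rule described above (exact term match first, then a broader parent term, otherwise unmapped) and the two ways of reporting coverage (by distinct codes and by usage) can be illustrated with a minimal sketch. All terms, concept IDs, and counts below are hypothetical toy values chosen for illustration; they are not taken from the AUSOM mapping tables, and the function names are our own.

```python
from collections import Counter

# Hypothetical toy vocabulary: standard term -> made-up concept ID.
STANDARD_TERMS = {
    "type 2 diabetes mellitus": 1001,
    "diabetes mellitus": 1000,
    "essential hypertension": 2001,
}

# Hypothetical reviewer decisions: local term -> broader standard term,
# used only when no exact standard term exists.
PARENT_OF = {
    "type 2 diabetes mellitus with foot ulcer": "diabetes mellitus",
}


def map_local_term(term):
    """Return (concept_id, method); method is 'exact', 'parent', or 'unmapped'."""
    key = term.strip().lower()
    if key in STANDARD_TERMS:
        return STANDARD_TERMS[key], "exact"
    parent = PARENT_OF.get(key)
    if parent in STANDARD_TERMS:
        return STANDARD_TERMS[parent], "parent"
    return None, "unmapped"


def coverage(local_terms, usage_counts):
    """Return (share of distinct codes mapped, share of records with a mapped code)."""
    mapped = [t for t in local_terms if map_local_term(t)[0] is not None]
    by_code = len(mapped) / len(local_terms)
    total_records = sum(usage_counts[t] for t in local_terms)
    by_usage = sum(usage_counts[t] for t in mapped) / total_records
    return by_code, by_usage


# Toy input: four distinct local terms and how often each appears in records.
local_terms = [
    "Type 2 diabetes mellitus",
    "Essential hypertension",
    "Type 2 diabetes mellitus with foot ulcer",
    "Rare local code",
]
usage = Counter({
    "Type 2 diabetes mellitus": 500,
    "Essential hypertension": 450,
    "Type 2 diabetes mellitus with foot ulcer": 46,
    "Rare local code": 4,
})

by_code, by_usage = coverage(local_terms, usage)
print(f"codes mapped: {by_code:.1%}, records covered: {by_usage:.1%}")
```

In this toy input, three of the four distinct codes are mapped, yet the single unmapped code accounts for only 4 of 1,000 records. This is the same distinction drawn above for the drug codes, where the share of unmapped codes is much larger than their share of total prescriptions.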
The ETL process involves pulling data out of one database system and pushing them into a different database system. Of the 18 tables defined in OMOP CDM ver. 4, we performed the ETL process on all but four: Drug Cost, Procedure Cost, Payer Plan Period, and Provider. Because we planned to open our converted data to researchers, some data in the excluded tables were considered too sensitive to release. The Payer Plan Period table could not be included because Korea has a single mandatory governmental payer. Detailed documentation of the ETL is available in
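As a rough illustration of the extract, transform, and load steps for a single table, the following sketch moves rows from a hypothetical source table into a simplified Person table using an in-memory SQLite database. The source table `emr_patient`, its columns, and the sample rows are assumptions made for this example; the target table keeps only a few Person fields, the gender concept IDs are the values we assume from the OMOP vocabulary, and 0 is used here as a placeholder for an unmapped value. It is not the ETL code used to build the converted database.

```python
import sqlite3

# Assumed OMOP gender concept IDs; 0 is used here for "no matching concept".
GENDER_CONCEPT = {"M": 8507, "F": 8532}


def run_person_etl(conn):
    """Copy rows from the hypothetical emr_patient table into a simplified person table."""
    # Extract: read the source rows in a deterministic order.
    source_rows = conn.execute(
        "SELECT patient_no, sex, birth_date FROM emr_patient ORDER BY patient_no"
    ).fetchall()

    # Transform: map the local sex code to a concept ID and keep the birth year.
    person_rows = [
        (GENDER_CONCEPT.get(sex, 0), int(birth_date[:4]), str(patient_no), sex)
        for patient_no, sex, birth_date in source_rows
    ]

    # Load: insert into the target table; person_id is assigned by the database.
    conn.executemany(
        "INSERT INTO person (gender_concept_id, year_of_birth, "
        "person_source_value, gender_source_value) VALUES (?, ?, ?, ?)",
        person_rows,
    )
    conn.commit()
    return len(person_rows)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE emr_patient (patient_no TEXT, sex TEXT, birth_date TEXT);
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        gender_concept_id INTEGER,
        year_of_birth INTEGER,
        person_source_value TEXT,
        gender_source_value TEXT
    );
    """
)
# Made-up sample patients, including one with an unrecognized sex code.
conn.executemany(
    "INSERT INTO emr_patient VALUES (?, ?, ?)",
    [("A001", "F", "1970-05-12"), ("A002", "M", "1988-11-03"), ("A003", "U", "1955-01-30")],
)

loaded = run_person_etl(conn)
people = conn.execute(
    "SELECT person_id, gender_concept_id, year_of_birth FROM person ORDER BY person_id"
).fetchall()
print(loaded, people)
```

Keeping the original identifier and sex code in the source-value columns lets a reviewer trace each converted row back to the source record when a mapping is later revised.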
The standardized dataset constructed using the above ETL process was named the AUSOM database (Ajou University School of Medicine; pronounced 'awesome'). The AUSOM database contains 2,073,120 individuals, 18,717,764 conditions (diagnoses), 99,331,794 drug exposures, and 15,002,879 procedures.
ACHILLES is open-source analytics software produced by OHDSI that runs on OMOP CDM ver. 4 and 5 for data characterization, quality assessment, and the visualization of observational health data [
We successfully converted our EHR to the CDM used within the OHDSI community and provided summary statistics for the data in an interactive webpage.
Controversies among studies of the same topic often arise due to differences in the participants, study designs, or interpretations [
OHDSI provides open-source software for not only implementing a CDM but also conducting analyses on a CDM [
In a DRN, one study protocol and the associated analytic code can be shared among data partners [
Two main limitations still exist in our data conversion. First, the code mapping process was imperfect. Two physicians and two nurses reviewed the code mapping, but because of different concepts and granularity between the coding systems, information loss was inevitable. Therefore, we need to revise our code mapping and improve it continuously, and update the AUSOM database accordingly. Second, we did not include cost information. Because we planned to open our data to the public via the ACHILLES webpage, we were reluctant to include such sensitive data. This limits the ability to conduct cost-effectiveness studies at present.
The summary statistics in the AUSOM database are open to the public via
We successfully converted local EHR data to OMOP CDM ver. 4 and opened its summary statistics to the public. The AUSOM database will be revised and updated continuously. We are ready to share our experience and data with anyone who wishes to adopt the OHDSI DRN.
This research was supported by a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (No. HI14C3201).
Supplementary materials can be found via
Values are presented as number (%) or the mean ± standard deviation.
a) Age at the first observation. b) The difference between the number of persons and the sum of the number of individuals in all age categories is due to missing data in the Observation Period table for some individuals in the source database.