I. Introduction
The changing regulatory landscape of health products has led to an increasing interest in incorporating real-world evidence (RWE) for regulatory decision-making [1]. Regulators are increasingly turning towards analytic frameworks and tools for evidence generation, using real-world data (RWD) to enhance their understanding of the benefits and risks of health products [2]. The key evidentiary needs of regulators include monitoring the effectiveness, safety, and utilization of health products in routine care [3]. Ideally, the evidence generated for regulatory purposes should be scientifically valid, timely, meaningfully contextualized, and sufficient for drawing conclusions while maintaining transparency in the evidence generation process [3].
However, analysing RWD (typically from healthcare databases) and generating RWE that fulfils the aforementioned requirements can be challenging [4]. RWD is predominantly observational in nature and is rarely collected for research purposes. It is also often not organized in a form that is suited for analysis. Disparate data coding standards, database architectures, and vocabularies can pose further challenges in generating RWE for informing regulatory decisions, particularly when multiple databases are involved [5]. Using a common data model (CDM) may address some of these challenges by harmonizing the architectures and vocabularies of different databases, which confers analytical interoperability [6]. Converting source data into a CDM creates a copy of the original data and reshapes it to fit the common structure of the CDM. Individual data elements from the source are translated to the standardized vocabularies, and columns from various source tables are split or merged to fit into the target table columns of the CDM [5,7]. CDM-converted databases may then facilitate multi-centre analyses and pooling of results to obtain more robust inferences for various study questions of interest [6,8–10].
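As a rough illustration of the conversion step described above, the following sketch translates a local drug code to a standardized concept and reshapes a source row into a CDM-like target record. The source column names, the mapping dictionary, and the concept IDs are hypothetical; the target field names loosely follow the OMOP-CDM drug exposure table, and unmapped codes are assigned a concept ID of 0. It is not the conversion code used in this study.

```python
from datetime import date

# Hypothetical mapping from local drug codes to standard concept IDs.
# The IDs are invented for illustration; they are not real OMOP concepts.
SOURCE_TO_STANDARD = {"WARF5": 1001, "RIVA20": 1002}


def convert_dispensing(row, code_map):
    """Reshape one source dispensing row into a CDM-like drug exposure record."""
    return {
        "person_id": row["patient_no"],
        # Codes without a mapping are kept but flagged with concept ID 0.
        "drug_concept_id": code_map.get(row["local_drug_code"], 0),
        # The original code is retained so the mapping can be audited later.
        "drug_source_value": row["local_drug_code"],
        "drug_exposure_start_date": date.fromisoformat(row["dispense_date"]),
        "days_supply": int(row["supply_days"]),
    }


# Hypothetical source rows, as they might appear in a pharmacy extract.
source_rows = [
    {"patient_no": 1, "local_drug_code": "WARF5",
     "dispense_date": "2019-01-01", "supply_days": "30"},
    {"patient_no": 2, "local_drug_code": "UNKNOWN",
     "dispense_date": "2019-02-01", "supply_days": "14"},
]

converted = [convert_dispensing(r, SOURCE_TO_STANDARD) for r in source_rows]

# Listing unmapped source codes is one way inspection during conversion
# can surface data defects.
unmapped = [r["drug_source_value"] for r in converted if r["drug_concept_id"] == 0]
```

Retaining the source value alongside the mapped concept keeps the conversion traceable, so unmapped or mis-mapped codes can be reviewed rather than silently dropped.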
While the benefits of CDM conversion for academic purposes are relatively clear, the contribution of CDM conversion towards meeting the broad evidentiary requirements set forth for regulatory purposes remains to be elucidated [3,8]. The aim of this study was to characterize the potential usefulness of CDM conversion by conducting a sample benefit-risk assessment involving CDM-converted data. The Observational Medical Outcomes Partnership (OMOP)-CDM was selected for this study because of its large active user community and use of open-source software, which facilitates code sharing and peer review [6].
IV. Discussion
Our study identified several advantages of converting healthcare databases to the OMOP-CDM related to the conduct of RWD analysis. CDM conversion inevitably involves an inspection of the source data, which can uncover data defects. Tracing these errors to their root cause may enable appropriate fixes to be applied. Where unresolvable errors persist, knowing which sections (or time periods) of the data are best excluded from analysis is invaluable, as their inclusion may lead to biased results. By exposing data inaccuracies and imposing data cleaning, CDM conversion can also be regarded as a process that improves the veracity of the source data.
However, CDM conversion alters only the form, but not the substance, of the data. This underscores the need to understand the provenance and processes that generated the data and what the data may (and may not) represent. Upon conversion, the set architecture of the CDM and the OHDSI tools, resources, and opportunities (i.e., past and ongoing study protocols and analytic code templates) create a fertile ecosystem that can speed up analyses, although some modifications and extensions to previously written code are likely required for specific use cases.
Since the previous study by Hripcsak et al. [14] focused on drug utilization patterns in chronic disease management, many code segments were reusable with simple modifications for the purposes of this study. The original code enabled easy specification of the inclusion and exclusion criteria, as well as the observation period of interest. The OMOP-CDM structure contains a derived table (termed the “Drug Era” table) that meaningfully aggregates all drug exposures. This consolidated drug exposure table allows analysts to define and apply the appropriate conditions required for a study (e.g., permitted gap days between prescription fills and stockpiling of previously filled prescriptions). The “Drug Era” table therefore simplifies precise exposure specifications, which are critical in pharmacoepidemiology analyses. Notably, these derived data elements are unavailable in other CDMs, such as the PCORnet, Sentinel, and i2b2 CDMs, which organize medication data at the transaction level, although code segments may be available to aggregate drug exposures on the fly during analysis.
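The idea of aggregating transaction-level exposures under a permitted gap can be sketched as follows. This is a minimal illustration rather than the logic used to populate the “Drug Era” table: the function name, the 30-day default gap, and the example dates are assumptions, and stockpiling of earlier fills is not handled.

```python
from datetime import date, timedelta


def build_drug_eras(exposures, gap_days=30):
    """Merge (start, end) exposure intervals for one person and one drug.

    Two consecutive exposures are joined into a single era when the next
    exposure starts no more than `gap_days` after the current era ends.
    The default of 30 days is an illustrative assumption.
    """
    eras = []
    for start, end in sorted(exposures):
        if eras and start <= eras[-1][1] + timedelta(days=gap_days):
            # Within the permitted gap: extend the current era.
            eras[-1] = (eras[-1][0], max(eras[-1][1], end))
        else:
            # Gap exceeded (or first exposure): open a new era.
            eras.append((start, end))
    return eras


# Hypothetical prescription fills for a single patient and drug.
exposures = [
    (date(2020, 1, 1), date(2020, 1, 30)),
    (date(2020, 2, 10), date(2020, 3, 10)),  # starts 11 days after previous end
    (date(2020, 6, 1), date(2020, 6, 30)),   # starts well beyond the gap
]

eras = build_drug_eras(exposures)
strict_eras = build_drug_eras(exposures, gap_days=0)
```

With the 30-day gap, the first two fills collapse into one era and the third forms a separate era; with `gap_days=0`, each fill remains its own era. Exposing the gap as a parameter mirrors how analysts may tailor exposure definitions to a given study.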
The descriptive analysis of OAC usage provides insights on the background incidence of events of interest within a defined observation window. The analysis essentially covers what is described by the US Sentinel Initiative as level 1 analyses [18]. These unadjusted descriptive analyses accounted for more than 80% of all queries by the US Food and Drug Administration in 2020 to investigate possible drug safety signals. Level 1 analyses help regulators filter signals that warrant subsequent analyses (level 2 and beyond), which typically involve more complex methods for covariate adjustment through various approaches, including propensity score matching and stratification [18,19].
Beyond these analyses, comparative assessments may be needed to holistically evaluate the overall impact of any measures undertaken to optimize public health. This would include an analysis of the benefits and risks of a drug relative to those of available alternatives, in the context of its real-world utilization for various therapeutic purposes. Regulatory actions can have far-reaching effects on public health. Benefit-risk assessments facilitate understanding of the potential consequences of various measures undertaken. Large-scale comparative effectiveness analyses have been performed using OMOP-CDM converted data [8–10]. While these are useful, the results of these analyses tend to be presented as risks on a relative scale. Regulatory agencies, however, require absolute risk estimates along with real-world utilization to establish the net public health impact of policy decisions [3].
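The distinction between relative and absolute scales can be made concrete with a small worked example. The counts below are hypothetical and are not taken from this study; the function name is likewise illustrative.

```python
def risk_measures(events_a, n_a, events_b, n_b):
    """Return relative and absolute risk measures for drug A versus drug B."""
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    return {
        "risk_ratio": risk_a / risk_b,        # relative scale
        "risk_difference": risk_a - risk_b,   # absolute scale
        # Excess events expected if 10,000 patients received A instead of B.
        "excess_per_10000": (risk_a - risk_b) * 10_000,
    }


# Hypothetical rare outcome: 2 vs 1 events per 10,000 patients.
rare = risk_measures(2, 10_000, 1, 10_000)

# Hypothetical common outcome: 2,000 vs 1,000 events per 10,000 patients.
common = risk_measures(2_000, 10_000, 1_000, 10_000)
```

Both scenarios yield the same risk ratio of 2, yet the absolute excess differs by three orders of magnitude (about 1 versus 1,000 additional events per 10,000 patients). Combined with utilization figures, it is the absolute scale that indicates how many patients a regulatory action would affect.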
To facilitate multiple comparisons as part of benefit-risk assessments, we propose using 100% horizontally stacked bar charts (Figure 6) that amalgamate real-world utilization with effectiveness and safety information. The modularized code provided to derive these charts can be readily extended to other drug classes with composite endpoints to represent outcomes of interest (e.g., major adverse cardiovascular events). The figure facilitates comparisons of the overall prevalence of thromboembolic and bleeding events across anticoagulants at the end of follow-up. Such figures may also be useful for economic analyses, such as cost-effectiveness studies. However, unequal follow-up durations of patients on newer versus older medications are inevitable when using RWD for comparative analyses. To address this issue, we propose applying fixed time-point analyses to eliminate differential time zeros and the potential for immortal-time bias [20] (Figure 6C, 6D).
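The quantities behind a 100% stacked bar at a fixed time point can be sketched as below. This is not the modularized code referred to above: the patient records, field names, outcome categories, and the rule of restricting to patients whose potential follow-up covers the chosen horizon are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Assumed outcome categories; the segments of each bar sum to 100%.
CATEGORIES = ("thromboembolic", "bleeding", "both", "neither")


def classify(patient, horizon_days):
    """Assign a patient to one outcome category as of the fixed time point."""
    te = patient["te_day"] is not None and patient["te_day"] <= horizon_days
    bleed = patient["bleed_day"] is not None and patient["bleed_day"] <= horizon_days
    if te and bleed:
        return "both"
    if te:
        return "thromboembolic"
    if bleed:
        return "bleeding"
    return "neither"


def stacked_shares(patients, horizon_days):
    """Percentage of patients per outcome category for each drug."""
    counts = defaultdict(Counter)
    for p in patients:
        # Assumption: keep only patients who could have been observed for
        # the full horizon, so every drug is compared over the same window.
        if p["potential_followup_days"] < horizon_days:
            continue
        counts[p["drug"]][classify(p, horizon_days)] += 1
    return {
        drug: {cat: 100 * c[cat] / sum(c.values()) for cat in CATEGORIES}
        for drug, c in counts.items()
    }


# Hypothetical patients; event days are counted from the start of exposure.
patients = [
    {"drug": "A", "potential_followup_days": 730, "te_day": 100, "bleed_day": None},
    {"drug": "A", "potential_followup_days": 730, "te_day": None, "bleed_day": 200},
    {"drug": "A", "potential_followup_days": 400, "te_day": None, "bleed_day": None},
    {"drug": "A", "potential_followup_days": 500, "te_day": None, "bleed_day": 400},
    {"drug": "A", "potential_followup_days": 200, "te_day": 50, "bleed_day": None},
    {"drug": "B", "potential_followup_days": 365, "te_day": 50, "bleed_day": 60},
    {"drug": "B", "potential_followup_days": 800, "te_day": None, "bleed_day": None},
]

shares = stacked_shares(patients, horizon_days=365)
```

Each drug's percentages can then be drawn as one horizontal bar, for example with matplotlib's `Axes.barh` using cumulative `left` offsets. Events occurring after the horizon (such as the bleed on day 400) do not count towards the bar, which is what keeps the comparison window identical across drugs.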
Our study has a few limitations. Firstly, the CDM conversion was only done using one hospital’s data; therefore, any characterization of the challenges and advantages of conversion may be limited. However, several advantages were identifiable even using only one database. Secondly, an identical analysis was not performed on pre-conversion data, as the emphasis was on the possibility of using CDM for regulatory assessments rather than the technical details of conversion. As various data cleaning steps may be undertaken during conversion, not obtaining identical results (pre- and post-conversion) might be an expected outcome. Instead, we validated the analytic code by applying it to an external cohort of patients to indirectly validate the conversion process, while obtaining a separate set of results for comparison [21]. Thirdly, the proposed 100% stacked bar graphs remain an unadjusted descriptive analysis of the rate of events in different populations exposed to comparator agents. Incorporating methods to adjust for confounders and visualize the adjusted event rates would be important areas of future research. Fourthly, the cohorts from the two countries used were demographically different, which could introduce alternative explanations for the study findings; however, studying varied populations may occasionally be desirable to evaluate the consistency of results. Nonetheless, the use of data from two countries and the evaluation of the reproducibility of the analytic code across countries may be seen as a strength of this study, as this demonstrates the potential applicability of this approach to regulators of other countries. Lastly, our study did not evaluate aspects of CDM conversion relating to the mapping coverage and speed relative to other CDMs. These may be of interest to groups looking to embark on CDM conversion.
Regulatory agencies are increasingly looking to incorporate RWE generated through the analysis of RWD for regulatory decision-making. The findings of this study demonstrate that having access to datasets in the OMOP-CDM format facilitates RWD analysis and can be useful for gleaning insights on comparative drug utilization, effectiveness, and safety for benefit-risk assessments. While the initial conversion is challenging and needs to be done judiciously, the availability of an active community of researchers and open sharing of previously written analytic code promotes transparency and scientific validity in generating RWE that is fit-for-purpose. The ability to refine previously developed analytic code with simple modifications is an important step in harnessing RWD to supplement benefit-risk assessments, enable robust evaluations of post-market drug effectiveness and safety, and ultimately support evidence-based decisions that optimize health outcomes.