On July 25, 2024, the US Food and Drug Administration (FDA) announced the release of the final version of its guidance, Real-World Data: Assessing Electronic Health Records and Medical Claims Data to Support Regulatory Decision-Making for Drug and Biological Products. The guidance outlines important considerations for the use of real-world data (RWD), in the form of electronic health records (EHRs) and medical claims data, to generate real-world evidence (RWE) in clinical studies, including interventional clinical investigations and observational studies. It finalizes the draft guidance that FDA issued in September 2021 with minimal changes, such as clarifications related to the use of data sources and the discussion of artificial intelligence (AI) technologies.
Stakeholders potentially affected by this guidance include sponsors; sources of EHR or claims data such as EHR vendors, payors and data aggregators; and other interested parties such as developers of AI tools used to curate or analyze such data. These stakeholders should review the guidance and be familiar with its recommendations for leveraging data sources for FDA regulatory decision making.
In Depth
BACKGROUND
The 21st Century Cures Act, enacted December 13, 2016, required FDA to issue guidance regarding the use of RWE in regulatory decision making. This requirement led to FDA’s September 30, 2021, draft guidance outlining considerations for the use of real-world data (RWD) and RWE in regulatory decision making.
Although FDA received significant industry feedback on the draft guidance requesting greater flexibility, the final guidance largely adopts the draft with the following key changes:
- Clarifies that the selection of study variables for validation and the extent of effort required for validation depend on the necessary level of certainty and the implications of potential misclassification for study inference.
- Notes that choice of a reference standard for validation may vary based on study design and question, the variable of interest and the necessary level of certainty.
- Recommends the use of quantitative approaches to demonstrate whether and how misclassification, if present, might impact study findings.
- Removes definitions of generally understood terms.
ANALYSIS
The guidance discusses the selection of data sources; development and validation of definitions for study design elements; and data traceability and quality during data accrual, curation and incorporation. Notably, the guidance does not provide recommendations on study design or statistical analysis, nor does it endorse any type of data source or study methodology.
The guidance employs the following definitions for key terms:
- RWD means “data relating to patient health status or the delivery of health care routinely collected from a variety of sources.”
- RWE means the “clinical evidence regarding the usage and potential benefits or risks of a medical product derived from analysis of RWD.”
- Medical claims data means “information submitted to insurers to receive payment for treatments and other interventions.”
- Clinical studies mean “all study designs, including, but not limited to, interventional studies where the treatment is assigned by a protocol and non-interventional studies where treatment is determined in the course of routine clinical care.”
Data Sources
Under the guidance, FDA recommends that protocols submitted to FDA identify all data sources proposed for the study and that each data source be assessed to determine whether it is appropriate for addressing the specific study questions. In particular, FDA recommends that the protocol explain the relevance of, and the reasoning for selecting, the applicable data sources. Such reasoning should address the likelihood that a data source contains the sought-after information. For example, given that EHR data are captured in the course of health services and not under the protocol, a protocol proposing to use EHRs should identify the type of information being sought and provide relevant background detailing how the EHR captures that information.
The guidance also discusses considerations for linking data to create broader longitudinal datasets. FDA notes that protocols should describe the linkage methods used, the accuracy and completeness of the data linkages, and any potential issues with linkage quality, whether the linkage relies on probabilistic or deterministic approaches. Sponsors and stakeholders that support such data linking should be mindful of the expectations that FDA outlines in the guidance regarding linkage accuracy.
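As a rough illustration of what describing linkage completeness can involve, the sketch below performs a simple deterministic (exact-match) linkage between two tiny record sets and reports the share of records linked. The field names (`token`, `birth_year`), the records and the matching rule are all hypothetical assumptions made for this example; they are not drawn from the guidance, which does not prescribe a linkage method.

```python
# Hypothetical records; identifiers, tokens and values are illustrative only.
ehr_records = [
    {"ehr_id": "E1", "token": "a91f", "birth_year": 1960},
    {"ehr_id": "E2", "token": "c07d", "birth_year": 1975},
    {"ehr_id": "E3", "token": "f3b2", "birth_year": 1982},
    {"ehr_id": "E4", "token": None, "birth_year": 1990},  # missing token
]
claims_records = [
    {"claim_id": "C1", "token": "a91f", "birth_year": 1960},
    {"claim_id": "C2", "token": "c07d", "birth_year": 1957},  # conflicting birth year
    {"claim_id": "C3", "token": "9e44", "birth_year": 1982},
]


def deterministic_link(ehr, claims):
    """Link records that agree exactly on both token and birth year."""
    claims_by_key = {
        (c["token"], c["birth_year"]): c["claim_id"]
        for c in claims
        if c["token"] is not None
    }
    links = {}
    for rec in ehr:
        key = (rec["token"], rec["birth_year"])
        if rec["token"] is not None and key in claims_by_key:
            links[rec["ehr_id"]] = claims_by_key[key]
    return links


def linkage_summary(ehr, links):
    """Summarize how many EHR records were linked to a claims record."""
    total = len(ehr)
    linked = len(links)
    return {
        "ehr_records": total,
        "linked": linked,
        "unlinked": total - linked,
        "link_rate": linked / total,
    }


links = deterministic_link(ehr_records, claims_records)
summary = linkage_summary(ehr_records, links)
print(links)
print(summary)
```

A probabilistic approach would instead score partial agreement across fields and accept pairs above a chosen threshold. Under either approach, the unlinked and conflicting records are the kind of linkage quality issue a protocol would be expected to explain.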
Recognizing the proliferation of AI and its potential applications in curating unstructured data, FDA recommends that protocols proposing to use AI or other derivation methods specify the assumptions and parameters of the algorithms, the data set used to train and build the algorithms, and information regarding supervision of the algorithm, along with any metrics regarding validation of the methods. FDA makes clear that it does not endorse any specific AI technology. It is unclear whether any information regarding adverse events discovered during an AI scrape of the EHR would impose reporting or other obligations on the manufacturer.
Study Design Elements
FDA clarifies that the study should not be designed to fit a specific data source, but rather the data sources should be selected to best fit the questions of interest. FDA discusses design elements and parameters such as time, study population, exposure ascertainment and outcome ascertainment. In this discussion, FDA encourages the use of quantitative approaches, such as quantitative bias analyses, to demonstrate how key covariates might impact study findings.
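One common form of quantitative bias analysis asks how an observed effect estimate would change if the outcome were misclassified. The sketch below is a minimal example under stated assumptions: the counts are hypothetical, misclassification is assumed to be nondifferential (the same sensitivity and specificity in both exposure groups), and the correction uses the standard algebraic back-calculation for a simple bias analysis. It is not a method taken from the guidance.

```python
def corrected_cases(observed_cases, group_size, sensitivity, specificity):
    """Back-calculate the expected true number of cases in one group.

    Solves observed = Se * true + (1 - Sp) * (n - true) for `true`.
    """
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    return (observed_cases - (1 - specificity) * group_size) / denom


def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


# Hypothetical observed counts (not from any real study).
N_EXPOSED, N_UNEXPOSED = 1000, 1000
OBS_EXPOSED, OBS_UNEXPOSED = 120, 80


def corrected_risk_ratio(sensitivity, specificity):
    """Risk ratio after correcting both groups for outcome misclassification."""
    true_exposed = corrected_cases(OBS_EXPOSED, N_EXPOSED, sensitivity, specificity)
    true_unexposed = corrected_cases(OBS_UNEXPOSED, N_UNEXPOSED, sensitivity, specificity)
    return risk_ratio(true_exposed, N_EXPOSED, true_unexposed, N_UNEXPOSED)


observed_rr = risk_ratio(OBS_EXPOSED, N_EXPOSED, OBS_UNEXPOSED, N_UNEXPOSED)
print(f"observed RR: {observed_rr:.2f}")

# Assumed sensitivity/specificity scenarios for the outcome definition.
for se, sp in [(0.80, 0.99), (0.90, 0.98), (0.70, 0.95)]:
    print(f"Se={se:.2f} Sp={sp:.2f} corrected RR: {corrected_risk_ratio(se, sp):.2f}")
```

Running the correction across a range of plausible sensitivity and specificity values, rather than at a single point, shows how sensitive the study conclusion is to the validity of the outcome definition.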
Data Quality
The guidance provides recommendations for examining the quality of EHR and medical claims data during the data lifecycle. FDA recommends:
- Characterizing data according to completeness, conformance and plausibility of data values.
- Documenting a quality assurance and quality control (QA/QC) plan that covers data transformation processes.
- Outlining procedures for ensuring data integrity.
These activities, including preparing the QA/QC plan for addressing data quality issues, may, as a practical matter, require careful coordination between the sponsor and the RWD sources, including EHR vendors and other stakeholders such as data extraction and migration vendors, data tokenization vendors, de-identification vendors, and AI and other developers that provide data curation solutions. Stakeholders should carefully evaluate the various characteristics and issues that the guidance highlights with respect to evaluating data quality.
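To make the three data characteristics named above (completeness, conformance and plausibility) more concrete, the sketch below runs one simple check of each kind on a handful of records. The records, field names, allowed code list, blood pressure range and as-of date are all hypothetical assumptions chosen for illustration; a real QA/QC plan would define its own rules for each data source.

```python
from datetime import date

# Hypothetical extracted records; values are illustrative only.
records = [
    {"patient_id": "P1", "sex": "F", "birth_date": date(1950, 3, 2), "systolic_bp": 128},
    {"patient_id": "P2", "sex": "M", "birth_date": None, "systolic_bp": 135},
    {"patient_id": "P3", "sex": "X9", "birth_date": date(1988, 7, 19), "systolic_bp": 410},
    {"patient_id": "P4", "sex": "F", "birth_date": date(2031, 1, 1), "systolic_bp": None},
]

ALLOWED_SEX_CODES = {"F", "M", "U"}  # assumed code list
PLAUSIBLE_SBP_RANGE = (50, 300)      # assumed plausible range, mmHg
AS_OF = date(2024, 7, 25)            # assumed data extraction date


def completeness(rows, field):
    """Fraction of records with a non-missing value for `field`."""
    present = sum(1 for r in rows if r[field] is not None)
    return present / len(rows)


def conformance_failures(rows):
    """IDs of records whose sex code is not in the allowed code list."""
    return [r["patient_id"] for r in rows if r["sex"] not in ALLOWED_SEX_CODES]


def plausibility_failures(rows):
    """IDs of records with an out-of-range blood pressure or a future birth date."""
    low, high = PLAUSIBLE_SBP_RANGE
    failures = []
    for r in rows:
        sbp, born = r["systolic_bp"], r["birth_date"]
        bad_sbp = sbp is not None and not (low <= sbp <= high)
        bad_birth = born is not None and born > AS_OF
        if bad_sbp or bad_birth:
            failures.append(r["patient_id"])
    return failures


print("birth_date completeness:", completeness(records, "birth_date"))
print("conformance failures:", conformance_failures(records))
print("plausibility failures:", plausibility_failures(records))
```

Checks of this kind are typically run at each stage of accrual, curation and transformation, so that any change in the results can be traced to a specific processing step.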
The guidance represents the latest step in FDA’s ongoing efforts to address the appropriate role of RWD and RWE in the regulatory process.
Marissa Hill Daley and Jae Hyun Lee also contributed to this article.