Using Electronic Health Records to Classify Cancer Site and Metastasis
PMCID: PMC12176508
PMID: 40533095
DOI: 10.1055/a-2544-3117
Journal: Applied Clinical Informatics
Publication Date: 2025-06-18
Authors: Kroenke K, Ruddy KJ, Pachman DR, Grzegorczyk V, Herrin J, et al.
Key Points
- Multiple EHR data extraction methods can effectively identify cancer site and metastatic status
- 44.4% of patients were identified as having metastatic disease using ICD-10 diagnoses
- The study provides a pragmatic approach to leveraging EHR data for cancer research, with methods that may be acceptable for covariate identification
Summary
The Enhanced EHR-facilitated Cancer Symptom Control (E2C2) Trial investigated methods for accurately identifying cancer site and metastatic status from electronic health record (EHR) data in a large, diverse cohort of 50,559 patients. The study developed and compared multiple approaches to cancer site classification and metastatic disease identification, demonstrating that EHR data can feasibly be used to characterize cancer in such a cohort.
Using the two most prevalent ICD-10 cancer site diagnoses captured a median of 92% of cases, compared with 65% when using only the single most prevalent diagnosis. Multiple methods for identifying metastatic status were evaluated, including ICD-10 diagnoses, natural language processing (NLP), cancer registry data, treatment plans, medications, and clinical trial information. Of these, the ICD-10 and NLP methods showed the highest agreement (kappa = 0.53) and were the ones that could be applied to the entire cohort, which points to their potential utility in clinical research and patient management.
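The agreement statistic reported above (kappa) can be illustrated with a minimal Python sketch of Cohen's kappa for two binary metastasis flags. The function name `cohens_kappa` and the toy flag lists are hypothetical, chosen only for illustration; they are not the study's code or data, and the summary does not state which kappa variant the authors computed.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two label sequences rating the same patients.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each method's
    marginal label frequencies.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("Both methods must label the same patients.")
    n = len(labels_a)
    if n == 0:
        raise ValueError("At least one patient is required.")

    # Observed agreement: fraction of patients where both methods match.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over labels of the product of marginal rates.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    all_labels = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[c] * counts_b[c] for c in all_labels) / (n * n)

    if p_e == 1.0:
        # Both methods assign one identical label to everyone;
        # kappa is undefined, so agreement is treated as perfect here.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy data (assumed, not from the study): 1 = flagged metastatic, 0 = not.
icd10_flags = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
nlp_flags = [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]

kappa = cohens_kappa(icd10_flags, nlp_flags)
print(f"kappa = {kappa:.2f}")
```

In the toy data the two methods agree on 8 of 10 patients while chance agreement is 0.5, so kappa is well below the raw 80% agreement. This is why kappa, rather than simple percent agreement, is the usual choice when comparing classification methods.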