Automated Extraction of Fluoropyrimidine Treatment and Treatment-Related Toxicities from Clinical Notes Using Natural Language Processing

arXiv ID: 2510.20727v1

Published: 2025-10-23

Authors: Xizhi Wu, Madeline S. Kreider, Philip E. Empey, Chenyu Li, Yanshan Wang

Categories: cs.CL, cs.AI

Relevance Score: 1.00 / 1.00

Summary

This paper developed and evaluated various Natural Language Processing (NLP) methods to automate the extraction of fluoropyrimidine treatment and associated toxicities from clinical notes. The study found that Large Language Model (LLM)-based approaches, particularly error-analysis prompting, achieved superior performance (F1=1.000) compared to traditional machine learning, deep learning, and rule-based methods. This highlights the potential of LLMs to enhance oncology research and pharmacovigilance by efficiently extracting crucial clinical information.

Medical Relevance

Automated and accurate extraction of fluoropyrimidine treatment and toxicity information from clinical notes is vital for real-time pharmacovigilance, enabling earlier detection of adverse drug reactions, improving patient safety, and facilitating more robust observational studies in oncology.

AI Health Application

The AI application is the automated extraction of structured medical data (treatment regimens and treatment-related toxicities) from unstructured clinical text using NLP techniques. This can support clinical decision-making, streamline data collection for medical research, enhance drug safety monitoring (pharmacovigilance) by identifying adverse events more efficiently, and potentially improve patient outcomes by highlighting toxicity risks.

Key Points

Fluoropyrimidines are widely used in oncology but are associated with significant toxicities (e.g., hand-foot syndrome, cardiotoxicity) often embedded in clinical notes.
A gold-standard dataset of 236 clinical notes from adult oncology patients was created and annotated by domain experts for categories related to treatment regimens and toxicities; 5 categories had enough data to train and evaluate models.
A comprehensive comparison was performed across rule-based, machine learning (Random Forest, SVM, LR), deep learning (BERT, ClinicalBERT), and large language model (LLM) approaches (zero-shot and error-analysis prompting).
LLM-based error-analysis prompting achieved optimal precision, recall, and F1 scores (F1=1.000) for both treatment and toxicity extraction.
Traditional machine learning models (LR, SVM) ranked second for toxicity extraction (F1=0.937), outperforming deep learning models on that task (BERT F1=0.839; ClinicalBERT F1=0.886). For treatment extraction, BERT and ClinicalBERT both reached F1=0.873.
The study identified limitations in machine learning and deep learning approaches due to small training data, which affected their generalizability, especially for rare categories.
The superior performance of LLMs indicates their strong potential for accurate, automated information extraction, crucial for pharmacovigilance and real-world evidence generation in oncology.
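
This summary does not describe how error-analysis prompting was carried out. The sketch below shows one plausible reading, offered as an assumption only: errors made by a zero-shot prompt are reviewed, and corrections derived from them are appended to the prompt. The names `ErrorCase`, `BASE_PROMPT`, and `build_refined_prompt`, the prompt wording, and the example error are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ErrorCase:
    """One reviewed mistake made by the zero-shot prompt (hypothetical structure)."""
    note_excerpt: str  # text the model mislabeled
    predicted: str     # label the zero-shot prompt returned
    expected: str      # gold-standard label


# Hypothetical zero-shot prompt; the paper's actual wording is not given here.
BASE_PROMPT = (
    "Read the clinical note and answer 'yes' or 'no': "
    "does it document a fluoropyrimidine-related toxicity?"
)


def build_refined_prompt(base_prompt: str, errors: List[ErrorCase]) -> str:
    """Append corrections derived from reviewed errors to the base prompt."""
    if not errors:
        return base_prompt
    lines = [base_prompt, "", "Avoid these previously observed mistakes:"]
    for i, e in enumerate(errors, start=1):
        lines.append(
            f'{i}. Note text "{e.note_excerpt}" was labeled {e.predicted!r}; '
            f"the correct label is {e.expected!r}."
        )
    return "\n".join(lines)


# Invented example: a negated mention that was wrongly flagged as a toxicity.
errors = [ErrorCase("denies hand-foot syndrome", "yes", "no")]
refined_prompt = build_refined_prompt(BASE_PROMPT, errors)
print(refined_prompt)
```

In a setup like this, drawing the error cases only from the training portion of the data would keep held-out notes from leaking into the prompt.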

Methodology

The study utilized a gold-standard dataset of 236 expert-annotated clinical notes from adult oncology patients. It evaluated various NLP approaches: rule-based methods as a baseline, machine learning models (Random Forest, Support Vector Machine, Logistic Regression), deep learning models (BERT, ClinicalBERT), and large language model (LLM)-based methods (zero-shot prompting and error-analysis prompting). Models were trained and tested using an 80:20 split.
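
As a rough illustration of two pieces of this setup, the sketch below shows an 80:20 split over 236 items and a keyword-style rule-based check for treatment mentions. It is a minimal sketch under assumptions: the regular expression, the drug-name list, the random seed, and the helper names `mentions_treatment` and `split_80_20` are hypothetical, and the paper's actual rules and split procedure are not given in this summary.

```python
import random
import re

# Hypothetical keyword pattern for fluoropyrimidine mentions; the study's
# real rule set is not described in this summary.
TREATMENT_PATTERN = re.compile(
    r"\b(5-fu|fluorouracil|capecitabine|xeloda)\b", re.IGNORECASE
)


def mentions_treatment(note: str) -> bool:
    """Return True if the note contains one of the assumed drug keywords."""
    return TREATMENT_PATTERN.search(note) is not None


def split_80_20(items, seed=0):
    """Shuffle deterministically, then cut into 80% train and 20% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


# Note IDs stand in for the 236 annotated notes.
train_ids, test_ids = split_80_20(range(236))
print(len(train_ids), len(test_ids))

print(mentions_treatment("Started capecitabine 1000 mg twice daily."))
print(mentions_treatment("No chemotherapy given this cycle."))
```

A rule-based check like this needs no training data, so the split matters mainly for the trained models; the rules would simply be scored on the same held-out portion.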

Key Findings

LLM-based error-analysis prompting achieved perfect F1 scores (1.000) for both fluoropyrimidine treatment and toxicity extraction. Zero-shot prompting matched it on treatment (F1=1.000) but scored lower on toxicities (F1=0.876). For toxicities, LR and SVM ranked second (F1=0.937), ahead of ClinicalBERT (F1=0.886), zero-shot prompting (F1=0.876), rule-based methods (F1=0.858), and BERT (F1=0.839). For treatment, BERT and ClinicalBERT both reached F1=0.873 and rule-based methods F1=0.857.
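
For readers less familiar with the metric, the sketch below computes precision, recall, and F1 for a single binary category. The labels are invented for illustration, and the function name `precision_recall_f1` is hypothetical; how the paper aggregated scores across categories is not stated in this summary.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall, and F1 for binary labels (1 = present)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # true positives
    fp = sum(1 for g, p in pairs if not g and p)    # false positives
    fn = sum(1 for g, p in pairs if g and not p)    # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented labels for six notes: one missed toxicity and one false alarm.
gold = [1, 1, 1, 0, 0, 1]
pred = [1, 1, 0, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

An F1 of 1.000 requires zero false positives and zero false negatives on the evaluated notes.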

Clinical Impact

This research offers a highly effective method for automating the extraction of critical treatment and toxicity data from unstructured clinical text. This can significantly enhance pharmacovigilance efforts by accelerating adverse event identification, support the generation of comprehensive real-world evidence for oncology research, and potentially aid clinicians by providing a more complete and readily accessible overview of patient-specific treatment responses and toxicities.

Limitations

Machine learning and deep learning approaches were found to be limited by the small size of the training dataset, which restricted their generalizability, particularly when attempting to identify and extract less common toxicity categories.

Future Directions

The strong potential of LLM-based NLP in effectively extracting critical clinical information suggests future research could focus on further integrating these models into clinical workflows, scaling up their application to larger and more diverse datasets, and exploring their utility for other complex information extraction tasks in oncology and broader medical domains.

Medical Domains

Oncology, Pharmacovigilance, Clinical Informatics, Cancer Treatment, Adverse Drug Reactions

Keywords

Natural Language Processing, Fluoropyrimidine, Clinical Notes, Toxicity Extraction, Oncology, Large Language Models, Pharmacovigilance, Treatment Regimen

Abstract

Objective: Fluoropyrimidines are widely prescribed for colorectal and breast cancers, but are associated with toxicities such as hand-foot syndrome and cardiotoxicity. Since toxicity documentation is often embedded in clinical notes, we aimed to develop and evaluate natural language processing (NLP) methods to extract treatment and toxicity information. Materials and Methods: We constructed a gold-standard dataset of 236 clinical notes from 204,165 adult oncology patients. Domain experts annotated categories related to treatment regimens and toxicities. We developed rule-based, machine learning-based (Random Forest, Support Vector Machine [SVM], Logistic Regression [LR]), deep learning-based (BERT, ClinicalBERT), and large language model (LLM)-based NLP approaches (zero-shot and error-analysis prompting). Models used an 80:20 train-test split. Results: Sufficient data existed to train and evaluate 5 annotated categories. Error-analysis prompting achieved optimal precision, recall, and F1 scores (F1=1.000) for treatment and toxicity extraction, whereas zero-shot prompting reached F1=1.000 for treatment and F1=0.876 for toxicity extraction. LR and SVM ranked second for toxicities (F1=0.937). Deep learning underperformed, with BERT (F1=0.873 treatment; F1=0.839 toxicities) and ClinicalBERT (F1=0.873 treatment; F1=0.886 toxicities). Rule-based methods served as our baseline, with F1 scores of 0.857 for treatment and 0.858 for toxicities. Discussion: LLM-based approaches outperformed all others, followed by machine learning methods. Machine and deep learning approaches were limited by small training data and showed limited generalizability, particularly for rare categories. Conclusion: LLM-based NLP most effectively extracted fluoropyrimidine treatment and toxicity information from clinical notes, and has strong potential to support oncology research and pharmacovigilance.