Automated Generation of Custom MedDRA Queries Using SafeTerm Medical Map

arXiv ID: 2512.07694v1

Published: 2025-12-08

Authors: Francois Vandenhende, Anna Georgiou, Michalis Georgiou, Theodoros Psaras, Ellie Karekla, Elena Hadjicosta

Categories: cs.CL

Relevance Score: 0.95 / 1.00

Summary

This paper introduces SafeTerm, an artificial intelligence system that automates the generation of custom MedDRA queries, which are critical for signal detection in pre-market drug safety review. SafeTerm embeds input medical query terms and MedDRA Preferred Terms (PTs) in a multidimensional vector space, then uses cosine similarity with extreme-value clustering to retrieve and rank relevant PTs. In validation against FDA OCMQ v3.0 (104 queries), the system reached high recall (>95%) at moderate similarity thresholds and higher precision (up to 86%) at higher thresholds. The authors present it as a viable supplementary method for automated MedDRA query generation.

Medical Relevance

The system targets pre-market drug safety review, where grouping related adverse event terms into queries is critical for signal detection. Automating the retrieval and ranking of candidate PTs could make that grouping faster and more consistent.

AI Health Application

SafeTerm combines vector space embeddings of medical terminology, cosine similarity, and extreme-value clustering to generate custom MedDRA queries automatically. Given an input query, it retrieves and ranks the relevant MedDRA PTs, supporting adverse event signal detection in drug safety review.

Key Points

The system addresses the critical challenge of grouping related adverse event terms into standardized MedDRA queries or FDA Office of New Drugs Custom Medical Queries (OCMQs) for efficient signal detection in drug safety.
SafeTerm employs an AI-driven approach that embeds both input medical query terms and MedDRA Preferred Terms (PTs) into a multidimensional vector space.
Relevance ranking of PTs is performed using cosine similarity combined with extreme-value clustering and multi-criteria statistical methods.
Validation against the 104 queries of FDA OCMQ v3.0, restricted to valid MedDRA PTs, showed that SafeTerm achieves high recall, exceeding 95% at moderate similarity thresholds.
Precision improved at higher similarity thresholds, reaching up to 86%; the identified optimal threshold (~0.70-0.75) yielded ~50% recall and ~33% precision.
The system is presented as a viable supplementary method for automated MedDRA query generation, recommending an initial similarity threshold of ~0.60 for broad term selection, with higher thresholds for refined results.
Narrow-term PT subsets performed similarly but required slightly higher similarity thresholds.

Methodology

SafeTerm takes a quantitative artificial intelligence approach. It first embeds the medical query terms and the MedDRA Preferred Terms (PTs) in a multidimensional vector space. It then uses cosine similarity to measure how closely each embedded PT relates to the input query. Extreme-value clustering, combined with multi-criteria statistical methods, produces a list of relevant PTs ranked by relevance score. Validation was conducted against FDA OCMQ v3.0 (104 queries), restricted to valid MedDRA PTs, by computing precision, recall, and F1 across a range of similarity thresholds.
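The similarity and ranking step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the placeholder terms (PT_A, PT_B, PT_C), and the three-dimensional toy vectors are all assumptions made for illustration, and the extreme-value clustering and multi-criteria scoring are replaced here by a plain similarity cutoff.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_u * norm_v)


def rank_terms(query_vec, term_vecs, threshold=0.60):
    """Score every term against the query, keep those at or above the
    threshold, and return them sorted by descending similarity."""
    scored = [(term, cosine_similarity(query_vec, vec))
              for term, vec in term_vecs.items()]
    kept = [(term, score) for term, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; real embeddings would have many more dimensions.
TOY_PT_VECTORS = {
    "PT_A": [1.0, 0.0, 0.0],
    "PT_B": [0.8, 0.6, 0.0],
    "PT_C": [0.0, 0.0, 1.0],
}

ranked = rank_terms([1.0, 0.0, 0.0], TOY_PT_VECTORS, threshold=0.60)
for term, score in ranked:
    print(f"{term}: {score:.2f}")
```

With these toy vectors, PT_A and PT_B pass the 0.60 cutoff and PT_C, which is orthogonal to the query, is dropped.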

Key Findings

The system reached high recall (>95%) at moderate similarity thresholds. Higher thresholds improved precision, up to 86%. The identified optimal similarity threshold range of ~0.70-0.75 yielded a recall of approximately 50% and a precision of about 33%. Narrow-term PT subsets performed comparably but required slightly higher similarity thresholds. Overall, the authors present SafeTerm as a viable supplementary method for automated MedDRA query generation.
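The precision/recall trade-off across thresholds can be made concrete with a small sketch. It assumes the standard set-based definitions of precision, recall, and F1 for a single query; the paper's exact aggregation over the 104 queries is not described in the abstract. The scores and the reference set below are invented toy values, not results from the paper.

```python
def precision_recall_f1(retrieved, reference):
    """Set-based precision, recall, and F1 for one query."""
    retrieved, reference = set(retrieved), set(reference)
    true_positives = len(retrieved & reference)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def sweep_thresholds(scores, reference, thresholds):
    """Evaluate the retrieved set at each similarity threshold."""
    rows = []
    for t in thresholds:
        retrieved = {term for term, s in scores.items() if s >= t}
        rows.append((t, *precision_recall_f1(retrieved, reference)))
    return rows


# Hypothetical similarity scores and reference query membership.
TOY_SCORES = {"PT_A": 0.90, "PT_B": 0.72, "PT_C": 0.65, "PT_D": 0.40}
TOY_REFERENCE = {"PT_A", "PT_C"}

results = sweep_thresholds(TOY_SCORES, TOY_REFERENCE, [0.60, 0.70, 0.80])
for t, p, r, f1 in results:
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this toy case, raising the threshold from 0.60 to 0.80 lifts precision from 2/3 to 1.0 while recall falls from 1.0 to 0.5, the same direction of trade-off the paper reports.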

Clinical Impact

SafeTerm could make the creation of custom MedDRA queries, which underpin the identification of adverse drug reactions, more efficient and more standardized. Automating this step may speed up signal detection in pre-market safety review and so help surface potential safety issues earlier. The abstract evaluates the system only against FDA OCMQ v3.0; any benefit for post-market surveillance is not assessed there.

Limitations

The abstract does not explicitly state specific limitations of the SafeTerm system's performance or scope. It positions the system as a "viable supplementary method," implying it would complement rather than fully replace human expert review in MedDRA query generation.

Future Directions

The abstract does not explicitly suggest future research directions. It does give a practical recommendation for use: start with a similarity threshold of ~0.60 for broad term selection, then raise the threshold for refined term selection.
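One way to read the broad-then-refined recommendation is as a two-tier split of the scored terms, sketched below. The function name, the tier labels, and the refined cutoff of 0.75 are assumptions (the abstract recommends ~0.60 initially and only says "increased thresholds" for refinement), and the scores are toy values.

```python
def tier_terms(scores, broad=0.60, refined=0.75):
    """Split scored terms into refined, broad-only, and excluded tiers.

    `broad` follows the ~0.60 starting point from the abstract;
    `refined` is an assumed value chosen for illustration.
    """
    if refined < broad:
        raise ValueError("refined threshold must be >= broad threshold")
    tiers = {"refined": [], "broad_only": [], "excluded": []}
    ordered = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    for term, score in ordered:
        if score >= refined:
            tiers["refined"].append(term)
        elif score >= broad:
            tiers["broad_only"].append(term)
        else:
            tiers["excluded"].append(term)
    return tiers


# Hypothetical similarity scores for four placeholder terms.
toy_scores = {"PT_A": 0.90, "PT_B": 0.72, "PT_C": 0.65, "PT_D": 0.40}
tiers = tier_terms(toy_scores)
print(tiers)
```

A reviewer could start from the union of the refined and broad-only tiers and prune toward the refined tier, which matches the supplementary role the authors describe.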

Medical Domains

Pharmacovigilance, Drug Safety, Regulatory Affairs, Pharmaceutical Research and Development, Clinical Trials

Keywords

MedDRA, adverse events, drug safety, pharmacovigilance, artificial intelligence, natural language processing, cosine similarity, signal detection

Abstract

In pre-market drug safety review, grouping related adverse event terms into standardised MedDRA queries or the FDA Office of New Drugs Custom Medical Queries (OCMQs) is critical for signal detection. We present a novel quantitative artificial intelligence system that understands and processes medical terminology and automatically retrieves relevant MedDRA Preferred Terms (PTs) for a given input query, ranking them by a relevance score using multi-criteria statistical methods. The system (SafeTerm) embeds medical query terms and MedDRA PTs in a multidimensional vector space, then applies cosine similarity and extreme-value clustering to generate a ranked list of PTs. Validation was conducted against the FDA OCMQ v3.0 (104 queries), restricted to valid MedDRA PTs. Precision, recall and F1 were computed across similarity-thresholds. High recall (>95%) is achieved at moderate thresholds. Higher thresholds improve precision (up to 86%). The optimal threshold (~0.70 - 0.75) yielded recall ~50% and precision ~33%. Narrow-term PT subsets performed similarly but required slightly higher similarity thresholds. The SafeTerm AI-driven system provides a viable supplementary method for automated MedDRA query generation. A similarity threshold of ~0.60 is recommended initially, with increased thresholds for refined term selection.

Comments

12 pages, 4 figures