Performance of the SafeTerm AI-Based MedDRA Query System Against Standardised MedDRA Queries

arXiv ID: 2512.07552v1

Published: 2025-12-08

Authors: Francois Vandenhende, Anna Georgiou, Michalis Georgiou, Theodoros Psaras, Ellie Karekla, Elena Hadjicosta

Categories: cs.CL

Relevance Score: 0.95 / 1.00


Summary

SafeTerm AMQ is a novel AI system designed to automate the generation of MedDRA queries for drug safety signal detection by embedding medical terms and MedDRA Preferred Terms (PTs) into a vector space, then using cosine similarity and clustering to retrieve and rank relevant terms. Validated against 110 Standardised MedDRA Queries (SMQs), it demonstrated high recall (94%) at moderate thresholds and improved precision (up to 89%) at higher thresholds. This makes it a viable supplementary tool for automated MedDRA query generation, effectively balancing recall and precision for pharmacovigilance applications.

Medical Relevance

This AI system is highly relevant for pre-market drug safety review and pharmacovigilance as it automates and standardizes the crucial process of identifying and grouping related adverse event terms for signal detection, which is traditionally labor-intensive and expert-dependent.

AI Health Application

SafeTerm is an AI-based system that uses natural language processing (NLP) and machine learning (vector embeddings, cosine similarity, clustering) to automatically generate and retrieve relevant MedDRA Preferred Terms for medical queries. Its primary application is in automating medical query generation for drug safety review and signal detection of adverse events, thereby assisting in pharmacovigilance.

Key Points

SafeTerm AMQ is an AI-based system automating MedDRA query generation, critical for adverse event term grouping and signal detection in drug safety.
It operates by embedding medical query terms and MedDRA PTs into a multidimensional vector space, then applying cosine similarity and extreme-value clustering to rank relevant PTs by a relevance score (0-1).
The system was validated against 110 tier-1 Standardised MedDRA Queries (SMQs) from MedDRA v28.1, with performance assessed using precision, recall, and F1 scores at various similarity thresholds.
SafeTerm AMQ achieved high recall (94%) at moderate similarity thresholds, indicating strong retrieval sensitivity, and improved precision (up to 89%) at higher thresholds.
An optimal manual similarity threshold of 0.70 yielded an overall recall of 48% and precision of 45%, while an automated threshold (0.66) prioritized recall (58%) over precision (29%).
Performance slightly improved when restricting to narrow-term PTs, affirming their increased relatedness, and the system demonstrated comparable, satisfactory performance on both SMQs and sanitized OCMQs.
The authors recommend utilizing suitable MedDRA PT terminology in query formulation and applying the automated threshold method to optimize recall, with options to increase similarity scores for refined, narrow term selection.

Methodology

The SafeTerm Automated Medical Query (AMQ) system is a quantitative artificial intelligence approach. It embeds input medical query terms and MedDRA Preferred Terms (PTs) into a multidimensional vector space. Cosine similarity between these vectors quantifies their relatedness, and extreme-value clustering then produces a ranked list of relevant PTs, each with a relevance score (0-1). Validation involved computing precision, recall, and F1 scores against 110 tier-1 Standardised MedDRA Queries (SMQs) from MedDRA v28.1, with performance evaluated at both manually defined and automatically selected similarity thresholds.
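The retrieval step described above can be illustrated with a minimal sketch. It is not the SafeTerm implementation: the three-dimensional vectors, the example terms, and the helper names `cosine_similarity` and `rank_terms` are all assumptions made for illustration, and the extreme-value clustering step is omitted because the summary does not describe it in enough detail to reproduce.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_terms(query_vec, term_vecs, threshold):
    """Return (term, score) pairs with score >= threshold, best first."""
    scored = [(term, cosine_similarity(query_vec, vec))
              for term, vec in term_vecs.items()]
    kept = [(term, score) for term, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Hypothetical 3-dimensional embeddings, invented for illustration only.
# A real system would obtain high-dimensional vectors from an embedding model.
toy_pt_vectors = {
    "Hepatic failure": [0.9, 0.1, 0.0],
    "Hepatitis": [0.8, 0.3, 0.1],
    "Headache": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.2, 0.0]  # stand-in for an embedded liver-related query

ranked = rank_terms(toy_query, toy_pt_vectors, threshold=0.70)
for term, score in ranked:
    print(f"{term}: {score:.2f}")
```

With these toy vectors, only the two liver-related terms pass the 0.70 cut-off, while the unrelated term falls well below it. Raising or lowering `threshold` changes how many terms are kept, which is the lever the validation explores.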

Key Findings

The system demonstrated high recall (94%) at moderate similarity thresholds, indicating good retrieval sensitivity. Higher thresholds filtered out more terms, improving precision (up to 89%). The optimal manual threshold of 0.70 achieved an overall recall of 48% and precision of 45% across all 110 queries. The automated threshold (0.66) prioritized recall (58%) over precision (29%). Performance was slightly better when restricting to narrow-term PTs at an increased (+0.05) similarity threshold, confirming that narrow terms are more closely related than broad terms. Overall, SafeTerm AMQ achieved comparable and satisfactory performance on both SMQs and sanitized OCMQs.
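As a rough check on how the two thresholds compare on a single metric, F1 (the harmonic mean of precision and recall) can be computed from the overall figures quoted above. This is a sketch under one assumption: it uses the rounded overall precision and recall, so the result may differ from F1 values reported in the paper if those were averaged per query. The function name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Overall precision and recall as quoted in the summary (rounded values).
manual = f1_score(precision=0.45, recall=0.48)     # manual threshold 0.70
automated = f1_score(precision=0.29, recall=0.58)  # automated threshold 0.66

print(f"manual threshold F1: {manual:.2f}")
print(f"automated threshold F1: {automated:.2f}")
```

On these rounded inputs the manual threshold gives an F1 of about 0.46 and the automated threshold about 0.39, consistent with the automated method trading precision for recall.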

Clinical Impact

SafeTerm AMQ has the potential to significantly enhance efficiency and consistency in drug safety and pharmacovigilance by automating the generation of MedDRA queries. This automation can accelerate the identification of safety signals during both pre-market drug development and post-market surveillance, reducing manual workload and facilitating more timely risk assessments. Its ability to offer a balance between recall and precision provides flexibility for regulatory bodies and pharmaceutical companies to tailor its use based on specific signal detection priorities.

Limitations

The abstract does not explicitly state limitations or caveats. However, the reported trade-off between recall and precision (e.g., the optimal threshold yields only 48% recall and 45% precision) implies that achieving high recall and high precision simultaneously remains a challenge, so users must balance the two according to their needs.
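The trade-off can be made concrete with a small threshold sweep. The scores, term labels, and reference set below are invented for illustration and are not taken from the paper; the helper names `precision_recall` and `sweep` are likewise hypothetical.

```python
def precision_recall(retrieved, reference):
    """Precision and recall of a retrieved set against a reference set."""
    retrieved, reference = set(retrieved), set(reference)
    hits = len(retrieved & reference)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall

def sweep(scores, reference, thresholds):
    """Evaluate precision and recall at each similarity threshold."""
    rows = []
    for threshold in thresholds:
        retrieved = [term for term, score in scores.items() if score >= threshold]
        precision, recall = precision_recall(retrieved, reference)
        rows.append((threshold, precision, recall))
    return rows

# Hypothetical similarity scores and reference term set (illustration only).
toy_scores = {"PT_A": 0.92, "PT_B": 0.81, "PT_C": 0.74,
              "PT_D": 0.68, "PT_E": 0.55, "PT_F": 0.52}
toy_reference = {"PT_A", "PT_B", "PT_D"}

results = sweep(toy_scores, toy_reference, [0.50, 0.70, 0.80])
for threshold, precision, recall in results:
    print(f"threshold {threshold:.2f}: precision {precision:.2f}, recall {recall:.2f}")
```

In this toy data, the lowest threshold retrieves every reference term but half of the retrieved terms are irrelevant; the highest threshold retrieves only relevant terms but misses one of the three reference terms. The same pattern, at a larger scale, underlies the figures reported for the 110 SMQs.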

Future Directions

Although the abstract does not state future research directions, the authors make practical recommendations that could guide future refinements: using suitable MedDRA PT terminology in query formulation and applying the automated threshold method to optimize recall. They also note that increasing similarity scores allows more refined, narrow-term selection, which implies continued optimization of thresholding strategies for specific use cases.

Medical Domains

Drug Safety, Pharmacovigilance, Regulatory Affairs, Clinical Research

Keywords

MedDRA, AI, Drug Safety, Pharmacovigilance, Adverse Events, Signal Detection, Natural Language Processing, Medical Query

Abstract

In pre-market drug safety review, grouping related adverse event terms into SMQs or OCMQs is critical for signal detection. We assess the performance of SafeTerm Automated Medical Query (AMQ) on MedDRA SMQs. The AMQ is a novel quantitative artificial intelligence system that understands and processes medical terminology and automatically retrieves relevant MedDRA Preferred Terms (PTs) for a given input query, ranking them by a relevance score (0-1) using multi-criteria statistical methods. The system (SafeTerm) embeds medical query terms and MedDRA PTs in a multidimensional vector space, then applies cosine similarity and extreme-value clustering to generate a ranked list of PTs. Validation was conducted against tier-1 SMQs (110 queries, v28.1). Precision, recall and F1 were computed at multiple similarity thresholds, defined either manually or using an automated method. High recall (94%) is achieved at moderate similarity thresholds, indicative of good retrieval sensitivity. Higher thresholds filter out more terms, resulting in improved precision (up to 89%). The optimal threshold (0.70) yielded an overall recall of 48% and precision of 45% across all 110 queries. Restricting to narrow-term PTs achieved slightly better performance at an increased (+0.05) similarity threshold, confirming increased relatedness of narrow versus broad terms. The automatic threshold selection (0.66) prioritizes recall (0.58) over precision (0.29). SafeTerm AMQ achieves comparable, satisfactory performance on SMQs and sanitized OCMQs. It is therefore a viable supplementary method for automated MedDRA query generation, balancing recall and precision. We recommend using suitable MedDRA PT terminology in query formulation and applying the automated threshold method to optimise recall. Increasing similarity scores allows refined, narrow-term selection.

Comments

8 pages, 3 figures