When normalization hallucinates: unseen risks in AI-powered whole slide image processing

Summary

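Deep learning models for whole slide image (WSI) normalization can hallucinate. They may produce realistic-looking tissue content that is absent from the original slide, and they tend to pull outputs toward the average of their training data, which can mask diagnostically important features. Although many methods perform adequately on public datasets, the authors observe frequent hallucinations when the same models are retrained and evaluated on real-world clinical data. They propose an image comparison measure that detects such hallucinations automatically.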

Medical Relevance

Medical/health related research

AI Health Application

The research concerns AI-powered preprocessing (normalization) of whole slide images (WSIs) in pathology. This step feeds downstream analysis of digitized tissue samples that is intended to assist or automate diagnosis. The paper highlights a specific risk of deep learning normalization models, namely hallucinated tissue content, and proposes an automatic measure for detecting it as a safeguard before these tools are deployed in diagnostic pathology.

Key Points

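  • Deep learning WSI normalization models tend to produce outputs that gravitate toward the average of their training data, which can mask diagnostically important features.
  • They can also hallucinate, introducing realistic-looking content that is not present in the original tissue and is nearly impossible to detect visually.
  • Methods that perform adequately on public datasets hallucinate with concerning frequency when retrained and evaluated on real-world clinical data.
  • The authors propose an image comparison measure that detects hallucinations in normalized outputs automatically.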

Methodology

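The authors retrain several well-cited WSI normalization methods on real-world clinical data and evaluate the normalized outputs with a newly proposed image comparison measure designed to detect hallucinations automatically, contrasting its results with conventional metrics. Details of the measure are given in the paper.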

Key Findings

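Hallucinations occur with concerning frequency when normalization models are retrained on real-world clinical data, and the evaluation reveals significant inconsistencies and failures that conventional metrics do not capture.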

Clinical Impact

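Automatic hallucination detection could help screen normalization models before they are used in diagnostic pathology, where invented or masked tissue features could mislead downstream analysis.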

Limitations

Not analyzed

Future Directions

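The authors point to the need for more robust, interpretable normalization techniques and stricter validation protocols for clinical deployment.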

Medical Domains

Computational pathology (digital pathology)

Keywords

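whole slide imaging, WSI normalization, hallucination, computational pathology, deep learning (arXiv categories: cs.CV, cs.AI)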

Abstract

Whole slide image (WSI) normalization remains a vital preprocessing step in computational pathology. Increasingly driven by deep learning, these models learn to approximate data distributions from training examples. This often results in outputs that gravitate toward the average, potentially masking diagnostically important features. More critically, they can introduce hallucinated content, artifacts that appear realistic but are not present in the original tissue, posing a serious threat to downstream analysis. These hallucinations are nearly impossible to detect visually, and current evaluation practices often overlook them. In this work, we demonstrate that the risk of hallucinations is real and underappreciated. While many methods perform adequately on public datasets, we observe a concerning frequency of hallucinations when these same models are retrained and evaluated on real-world clinical data. To address this, we propose a novel image comparison measure designed to automatically detect hallucinations in normalized outputs. Using this measure, we systematically evaluate several well-cited normalization methods retrained on real-world data, revealing significant inconsistencies and failures that are not captured by conventional metrics. Our findings underscore the need for more robust, interpretable normalization techniques and stricter validation protocols in clinical deployment.
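The abstract does not describe how the proposed image comparison measure works, so the following is only a hypothetical illustration of the general problem it targets, not the authors' method. It shows that a conventional whole-image score (here PSNR) can look acceptable while a small region contains invented structure. The sketch assumes images are float RGB arrays in [0, 1] and uses a simple patch-wise comparison of edge energy as a stand-in detector; the function names, the 32-pixel patch size, and the 1e-3 threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np


def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Collapse an (H, W, 3) float RGB image to (H, W) luminance
    using ITU-R BT.601 weights (an illustrative choice)."""
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]


def gradient_magnitude(gray: np.ndarray) -> np.ndarray:
    """Per-pixel edge strength from central-difference gradients."""
    gy, gx = np.gradient(gray.astype(float))
    return np.hypot(gx, gy)


def psnr(a: np.ndarray, b: np.ndarray, data_range: float = 1.0) -> float:
    """Global PSNR in dB: a conventional whole-image similarity score."""
    mse = float(np.mean((a - b) ** 2))
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(data_range**2 / mse))


def flag_hallucination_patches(original, normalized, patch=32, threshold=1e-3):
    """Hypothetical check (not the paper's measure): return (row, col)
    origins of patches whose normalized version has clearly more edge
    energy than the original, i.e. structure that may have been invented."""
    g_orig = gradient_magnitude(to_gray(original))
    g_norm = gradient_magnitude(to_gray(normalized))
    h, w = g_orig.shape
    flagged = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            a = g_orig[r:r + patch, c:c + patch]
            b = g_norm[r:r + patch, c:c + patch]
            # Count only edges that are stronger after normalization;
            # a uniform color shift changes no edges and scores zero.
            excess = float(np.clip(b - a, 0.0, None).mean())
            if excess > threshold:
                flagged.append((r, c))
    return flagged


# Toy example: a flat "tissue" tile, and a "normalized" tile that applies a
# small uniform color shift but also invents a small dark square (a stand-in
# for a hallucinated nucleus) that does not exist in the original.
original = np.full((128, 128, 3), [0.8, 0.6, 0.7])
normalized = original + 0.02
normalized[40:46, 40:46] = 0.3

score = psnr(original, normalized)
flagged = flag_hallucination_patches(original, normalized)
print(f"global PSNR: {score:.1f} dB")  # about 31 dB for this toy input
print(f"flagged patches: {flagged}")   # only the patch at (32, 32)
```

Comparing edge energy only catches hallucinations that add structure. Content that is removed or subtly reshaped, or features masked by outputs drifting toward the average, would need a different kind of comparison.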

Comments

4 pages, accepted for oral presentation at SPIE Medical Imaging, 2026