Region-Normalized DPO for Medical Image Segmentation under Noisy Judges

Summary

This paper introduces Region-Normalized DPO (RN-DPO) to improve medical image segmentation models using inexpensive but noisy automatic quality-control signals, eliminating the need for additional pixel-wise annotations. RN-DPO enhances optimization stability by normalizing preference updates based on the size of the disagreement region between masks, leading to improved and more sustained segmentation performance, especially when judges are unreliable.

Medical Relevance

This research directly addresses the critical bottleneck of expensive and time-consuming manual annotations in medical imaging, enabling more scalable and robust development of AI models for segmentation. By leveraging existing quality control signals, it facilitates faster iteration and deployment of AI tools in healthcare, potentially leading to more efficient diagnostic workflows and improved patient outcomes.

AI Health Application

This research develops an AI method to improve the performance and stability of medical image segmentation models. It aims to reduce the reliance on expensive manual annotations, thereby making AI tools for tasks like organ or lesion delineation more scalable, accurate, and practical for use in clinical settings, ultimately aiding diagnosis, treatment planning, and disease monitoring.

Key Points

  • Dense pixel-wise annotations, the gold standard for medical image segmentation, are costly and limit the scalability of AI model development.
  • The research investigates using readily available, inexpensive, but noisy automatic quality-control (QC) signals (e.g., model agreement, uncertainty) for fine-tuning segmentation models without new ground-truth annotations.
  • Direct Preference Optimization (DPO) is adapted for segmentation, utilizing proposals generated by an initial supervised base segmenter trained on a small labeled dataset.
  • A key finding is that standard DPO's performance depends strongly on how preference pairs are mined: selecting the judge's top-ranked proposal can improve peak performance when the judge is reliable, but can amplify harmful errors under weaker or noisier judges.
  • The paper proposes Region-Normalized DPO (RN-DPO), a novel segmentation-aware objective that normalizes preference updates by the size of the disagreement region between the preferred and rejected masks.
  • This normalization strategy in RN-DPO effectively reduces the leverage of potentially harmful comparisons, thereby improving the stability of the optimization process.
  • Across two medical datasets and multiple regimes, RN-DPO improves sustained performance and stabilizes preference-based fine-tuning, outperforming standard DPO and strong baselines without requiring any additional pixel-level ground-truth annotations.
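The judge and pair-mining steps in the list above can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: the source names model agreement as one possible QC signal but does not say how scores or pairs are computed. Here `agreement_judge` is an assumed consensus-style judge that scores each proposal by its mean Dice to the other proposals in the pool, `mine_top_vs_rest` pairs the judge's top-ranked proposal against every other proposal, and `mine_by_gap` is an assumed alternative that keeps every ordered pair whose score gap exceeds a hypothetical `min_gap` threshold.

```python
import numpy as np


def dice(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> float:
    """Smoothed Dice overlap between two binary masks."""
    inter = np.logical_and(a, b).sum()
    return float((2.0 * inter + eps) / (a.sum() + b.sum() + eps))


def agreement_judge(proposals: list) -> np.ndarray:
    """Hypothetical QC judge: score each proposal by its mean Dice to the other proposals."""
    k = len(proposals)
    scores = np.zeros(k)
    for i in range(k):
        scores[i] = np.mean([dice(proposals[i], proposals[j]) for j in range(k) if j != i])
    return scores


def mine_top_vs_rest(scores: np.ndarray) -> list:
    """Chosen = judge's top-ranked proposal; rejected = each remaining proposal."""
    best = int(np.argmax(scores))
    return [(best, j) for j in range(len(scores)) if j != best]


def mine_by_gap(scores: np.ndarray, min_gap: float) -> list:
    """Hypothetical alternative: every ordered (chosen, rejected) pair whose score gap exceeds min_gap."""
    k = len(scores)
    return [(i, j) for i in range(k) for j in range(k) if scores[i] - scores[j] > min_gap]


# Toy pool of 4x4 proposals: a core mask, two one-pixel variants of it, and a disjoint outlier.
core = np.zeros((4, 4), dtype=bool)
core[:2, :2] = True
plus_row = core.copy()
plus_row[2, 0] = True
plus_col = core.copy()
plus_col[0, 2] = True
outlier = np.zeros((4, 4), dtype=bool)
outlier[2:, 2:] = True

scores = agreement_judge([core, plus_row, plus_col, outlier])
print(scores, mine_top_vs_rest(scores), mine_by_gap(scores, min_gap=0.1))
```

In this sketch, top-vs-rest mining anchors every pair on a single judge decision, so one wrong top pick contaminates all of its pairs. That is one plausible reading of why top-ranked mining helps with reliable judges but can amplify errors under weaker ones.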

Methodology

The study applies Direct Preference Optimization (DPO) to segmentation. A supervised base segmenter, trained on a small labeled set, generates candidate mask proposals, and noisy automatic quality-control signals act as 'judges' that rank these proposals to form preference pairs. The core method, Region-Normalized DPO (RN-DPO), modifies the standard DPO objective by normalizing each pair's preference update by the size of the disagreement region (the symmetric difference) between the preferred and rejected masks, re-weighting updates to improve robustness to judge noise.
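Writing the DPO margin for binary masks makes the role of the disagreement region explicit. This derivation is a sketch under stated assumptions: independent Bernoulli pixels with predicted foreground probability $s_{\theta,p}(x)$, a frozen reference model $\pi_{\mathrm{ref}}$, temperature $\beta$, and an RN-DPO loss that divides the margin inside the sigmoid by $|D|$. The source says only that updates are normalized by the disagreement-region size, so the exact placement of the factor is an assumption. Pixels where the two masks agree contribute identical terms to both log-likelihoods and cancel, so the margin is a sum over $D$ alone:

```latex
\begin{aligned}
\log \pi_\theta(y \mid x) &= \sum_{p=1}^{P} \ell_{\theta,p}(y_p), \qquad
\ell_{\theta,p}(y_p) = y_p \log s_{\theta,p}(x) + (1 - y_p)\log\bigl(1 - s_{\theta,p}(x)\bigr) \\
r_p(y_p) &= \ell_{\theta,p}(y_p) - \ell_{\mathrm{ref},p}(y_p), \qquad
D = \{\, p : y^w_p \neq y^l_p \,\} \\
\Delta &= \sum_{p=1}^{P} \bigl[r_p(y^w_p) - r_p(y^l_p)\bigr]
        = \sum_{p \in D} \bigl[r_p(y^w_p) - r_p(y^l_p)\bigr] \\
\mathcal{L}_{\mathrm{DPO}} &= -\log \sigma(\beta\,\Delta), \qquad
\mathcal{L}_{\mathrm{RN}} = -\log \sigma\!\Bigl(\tfrac{\beta}{|D|}\,\Delta\Bigr) \\
\nabla_\theta \mathcal{L}_{\mathrm{DPO}}\big|_{\theta = \theta_{\mathrm{ref}}}
  &= -\tfrac{\beta}{2} \sum_{p \in D} \nabla_\theta \bigl[r_p(y^w_p) - r_p(y^l_p)\bigr], \qquad
\nabla_\theta \mathcal{L}_{\mathrm{RN}}\big|_{\theta = \theta_{\mathrm{ref}}}
  = \tfrac{1}{|D|}\, \nabla_\theta \mathcal{L}_{\mathrm{DPO}}\big|_{\theta = \theta_{\mathrm{ref}}}
\end{aligned}
```

Under these assumptions, the standard DPO gradient at initialization is a sum of $|D|$ per-pixel terms. A pair whose masks disagree on many pixels therefore pulls on the model proportionally harder, whether or not the judge ranked it correctly. Dividing by $|D|$ turns that sum into a per-pixel average.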

Key Findings

Standard DPO for segmentation is highly sensitive to judge quality and to how preference pairs are mined, and can amplify errors when the QC signals are unreliable. RN-DPO, by normalizing preference updates by the size of the mask disagreement region, improves sustained segmentation performance and stabilizes preference-based fine-tuning. Across two medical datasets and multiple regimes, it outperforms standard DPO and strong baselines, all without requiring additional pixel-level ground-truth annotations.
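The sensitivity of standard DPO to the size of the disagreement region can be illustrated numerically. The following NumPy sketch computes the standard DPO loss and a region-normalized variant for a single preference pair of binary masks. The Bernoulli pixel factorization, placing the $1/|D|$ factor inside the sigmoid, the default `beta=0.1`, and all function names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def pixel_log_likelihood(mask: np.ndarray, fg_prob: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Per-pixel log-likelihood of a binary mask under independent Bernoulli pixels (assumed)."""
    p = np.clip(fg_prob, eps, 1.0 - eps)
    return np.where(mask, np.log(p), np.log1p(-p))


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    return float(-np.logaddexp(0.0, -z))


def pair_margin(chosen, rejected, policy_prob, ref_prob):
    """Return (summed log-ratio margin, disagreement-region size) for one preference pair."""
    r_w = pixel_log_likelihood(chosen, policy_prob) - pixel_log_likelihood(chosen, ref_prob)
    r_l = pixel_log_likelihood(rejected, policy_prob) - pixel_log_likelihood(rejected, ref_prob)
    disagree = chosen != rejected  # symmetric difference of the two masks
    # Where the masks agree, r_w and r_l are identical and cancel, so only D contributes.
    margin = float(np.sum((r_w - r_l)[disagree]))
    return margin, int(disagree.sum())


def dpo_loss(chosen, rejected, policy_prob, ref_prob, beta: float = 0.1) -> float:
    """Standard DPO loss: the margin grows with the number of disagreeing pixels."""
    margin, _ = pair_margin(chosen, rejected, policy_prob, ref_prob)
    return -log_sigmoid(beta * margin)


def rn_dpo_loss(chosen, rejected, policy_prob, ref_prob, beta: float = 0.1) -> float:
    """Region-normalized variant (assumed form): divide the margin by |D| before the sigmoid."""
    margin, size = pair_margin(chosen, rejected, policy_prob, ref_prob)
    # Identical masks give margin 0; max(size, 1) avoids division by zero in that case.
    return -log_sigmoid(beta * margin / max(size, 1))


# Toy example: the policy uniformly raises foreground probability relative to the reference.
ref = np.full((8, 8), 0.5)
policy = np.full((8, 8), 0.6)
chosen = np.ones((8, 8), dtype=bool)
rejected_small = chosen.copy()
rejected_small[0, :4] = False   # |D| = 4
rejected_large = chosen.copy()
rejected_large[:4, :] = False   # |D| = 32

for name, rej in [("small", rejected_small), ("large", rejected_large)]:
    m, d = pair_margin(chosen, rej, policy, ref)
    print(name, d, m, dpo_loss(chosen, rej, policy, ref), rn_dpo_loss(chosen, rej, policy, ref))
```

Because the per-pixel log-ratio is constant in this toy setup, the standard margin for the 32-pixel pair is exactly eight times that of the 4-pixel pair, while the normalized margin, and hence the RN-DPO loss, is the same for both.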

Clinical Impact

By reducing reliance on costly and time-consuming pixel-wise manual annotations, RN-DPO offers a practical pathway to train and adapt AI-powered segmentation tools more efficiently, using quality-control signals that deployed clinical systems already produce. This could accelerate the adoption of AI in medical imaging and potentially support faster diagnosis, better treatment planning, and improved patient care by enabling robust, scalable model updates from existing clinical data streams.

Limitations

The abstract does not state limitations explicitly, but potential ones include the need for a small labeled dataset to train the base segmenter, a performance ceiling set by the quality of the base segmenter's proposals, and the implicit assumption that the noisy QC signals, though imperfect, are informative enough to yield useful preferences. Generalizability may also depend on the nature and biases of the QC signals used in different clinical contexts.

Future Directions

Not explicitly mentioned in the abstract.

Medical Domains

medical imaging, radiology, computational medicine, diagnostic imaging, AI in healthcare

Keywords

medical image segmentation, Direct Preference Optimization, RN-DPO, noisy labels, quality control signals, annotation efficiency, preference learning, deep learning

Abstract

While dense pixel-wise annotations remain the gold standard for medical image segmentation, they are costly to obtain and limit scalability. In contrast, many deployed systems already produce inexpensive automatic quality-control (QC) signals like model agreement, uncertainty measures, or learned mask-quality scores which can be used for further model training without additional ground-truth annotation. However, these signals can be noisy and biased, making preference-based fine-tuning susceptible to harmful updates. We study Direct Preference Optimization (DPO) for segmentation from such noisy judges using proposals generated by a supervised base segmenter trained on a small labeled set. We find that outcomes depend strongly on how preference pairs are mined: selecting the judge's top-ranked proposal can improve peak performance when the judge is reliable, but can amplify harmful errors under weaker judges. We propose Region-Normalized DPO (RN-DPO), a segmentation-aware objective which normalizes preference updates by the size of the disagreement region between masks, reducing the leverage of harmful comparisons and improving optimization stability. Across two medical datasets and multiple regimes, RN-DPO improves sustained performance and stabilizes preference-based fine-tuning, outperforming standard DPO and strong baselines without requiring additional pixel annotations.