Hide-and-Seek Attribution: Weakly Supervised Segmentation of Vertebral Metastases in CT
Summary
This paper introduces a weakly supervised method for segmenting lytic and blastic vertebral metastases in CT scans, trained exclusively on vertebra-level healthy/malignant labels rather than expensive voxel-level annotations. It combines a Diffusion Autoencoder (DAE) that produces classifier-guided healthy edits with a "Hide-and-Seek Attribution" step that identifies which candidate regions are truly malignant. On held-out radiologist annotations, the method exceeds the reported baselines for blastic/lytic lesions (F1: 0.91/0.85 vs. 0.79/0.67; Dice: 0.87/0.78 vs. 0.74/0.55), indicating that reliable lesion masks can be generated from coarse labels.
Medical Relevance
By requiring only vertebra-level labels, this method reduces the annotation burden for vertebral metastasis segmentation, which could make it easier to scale toward clinical use. Accurate and efficient identification of metastases is crucial for cancer staging, guiding treatment decisions (e.g., radiation therapy), and monitoring disease progression or response to therapy.
AI Health Application
The AI application is a weakly supervised segmentation method that identifies and delineates vertebral metastases in CT scans. It assists healthcare professionals, particularly radiologists and oncologists, by automating or semi-automating the detection of cancer spread to the spine, which could improve diagnostic accuracy and efficiency and aid treatment planning and monitoring. It specifically tackles the scarcity of voxel-level annotations in medical imaging.
Key Points
- Addresses the critical clinical need for accurate vertebral metastasis segmentation in CT, which is challenging due to scarce voxel-level annotations and visual similarities with benign degenerative changes.
- Proposes a weakly supervised segmentation approach that uses only vertebra-level healthy/malignant labels, eliminating the need for labor-intensive and costly voxel-level lesion masks.
- At its core, a Diffusion Autoencoder (DAE) generates a classifier-guided "healthy edit" of an input vertebra; pixel-wise difference maps between the original and edited images then propose initial candidate lesion regions.
- Introduces a novel technique called "Hide-and-Seek Attribution": candidate regions are selectively revealed while others are hidden, projected back to the data manifold by the DAE, and a latent-space classifier quantifies the isolated malignant contribution of each component.
- Achieves strong segmentation performance on held-out radiologist annotations, with F1 scores of 0.91 for blastic lesions and 0.85 for lytic lesions, and Dice scores of 0.87 (blastic) and 0.78 (lytic).
- Outperforms the reported baselines, which yielded F1 scores of 0.79 (blastic) and 0.67 (lytic) and Dice scores of 0.74 (blastic) and 0.55 (lytic): absolute gains of 0.12/0.18 in F1 and 0.13/0.23 in Dice.
- Demonstrates that coarse vertebra-level labels can be effectively transformed into precise voxel-level lesion masks, showcasing the power of generative editing combined with selective occlusion for accurate weakly supervised segmentation in CT.
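The candidate-proposal step described in the points above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `threshold` value, the use of 4-connectivity, and the idea of splitting candidates by the sign of the difference (brighter than the healthy edit for blastic, darker for lytic) are all assumptions made for illustration.

```python
from collections import deque

import numpy as np


def connected_components(mask):
    """Return one boolean mask per 4-connected component of a 2D mask."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    components = []
    for start in zip(*np.nonzero(mask)):
        if seen[start]:
            continue
        comp = np.zeros_like(mask)
        queue = deque([start])
        seen[start] = True
        while queue:  # breadth-first flood fill from the seed pixel
            r, c = queue.popleft()
            comp[r, c] = True
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                inside = 0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                if inside and mask[nr, nc] and not seen[nr, nc]:
                    seen[nr, nc] = True
                    queue.append((nr, nc))
        components.append(comp)
    return components


def propose_candidates(original, healthy_edit, threshold=0.1):
    """Turn a signed difference map into candidate lesion regions.

    Assumption: blastic lesions appear denser (brighter) than the healthy
    edit and lytic lesions appear less dense (darker). The threshold is a
    hypothetical hyperparameter, not a value from the paper.
    """
    diff = np.asarray(original, dtype=float) - np.asarray(healthy_edit, dtype=float)
    return {
        "blastic": connected_components(diff > threshold),
        "lytic": connected_components(diff < -threshold),
    }
```

Each returned component is a separate candidate; deciding which candidates are actually malignant is left to the attribution step.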
Methodology
The method begins with a Diffusion Autoencoder (DAE) trained to produce a classifier-guided 'healthy edit' of each vertebra, effectively removing malignant features. Pixel-wise difference maps between the original and edited images generate initial candidate lesion regions. To confirm malignancy, a novel "Hide-and-Seek Attribution" technique is employed: each candidate region is revealed sequentially while others are hidden. The modified image is then projected back to the data manifold by the DAE, and a latent-space classifier quantifies the isolated malignant contribution of that specific component. High-scoring regions are aggregated to form the final lytic or blastic segmentation masks.
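As a rough illustration of the reveal-one, hide-the-rest loop, the sketch below treats the DAE projection and the latent-space classifier as caller-supplied functions. Everything here is an assumption rather than the paper's code: `project` and `score` are hypothetical stand-ins, "hiding" a region is modeled as replacing it with the healthy edit, the contribution is measured relative to the fully healthy edit, and `min_gain` is an invented threshold for what counts as "high-scoring".

```python
import numpy as np


def hide_and_seek(original, healthy_edit, candidates, project, score, min_gain=0.5):
    """Score each candidate region in isolation and keep the high-scoring ones.

    original, healthy_edit : arrays of identical shape
    candidates             : list of boolean masks, one per candidate region
    project                : hypothetical stand-in for the DAE projection
                             back to the data manifold
    score                  : hypothetical stand-in for the latent-space
                             malignancy classifier (higher = more malignant)
    min_gain               : assumed threshold on the isolated contribution
    """
    baseline = score(project(healthy_edit))
    final_mask = np.zeros(np.shape(original), dtype=bool)
    gains = []
    for region in candidates:
        # Reveal this candidate from the original; everything else stays
        # hidden behind the healthy edit.
        revealed = np.where(region, original, healthy_edit)
        gain = score(project(revealed)) - baseline
        gains.append(gain)
        if gain >= min_gain:
            final_mask |= region
    return final_mask, gains
```

With a real model, `project` would encode and decode through the DAE and `score` would run the classifier on the resulting latent code; passing an identity function and a simple intensity rule is enough to exercise the loop on toy data.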
Key Findings
The key finding is that the weakly supervised method produced accurate segmentation masks for both blastic (F1: 0.91, Dice: 0.87) and lytic (F1: 0.85, Dice: 0.78) vertebral metastases on held-out radiologist annotations, despite being trained solely on vertebra-level healthy/malignant labels. It exceeded the reported baselines (F1: 0.79/0.67; Dice: 0.74/0.55), supporting the claim that generative editing combined with selective occlusion can turn coarse supervision into reliable lesion masks.
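For readers less familiar with the reported metrics, the sketch below shows the standard definitions of the Dice coefficient on binary masks and of F1 from detection counts. The function names and the convention of returning 1.0 when both masks are empty are assumptions; the abstract does not describe the evaluation code.

```python
import numpy as np


def dice(pred, target):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = int(pred.sum()) + int(target.sum())
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * int(np.logical_and(pred, target).sum()) / denom


def f1_from_counts(tp, fp, fn):
    """F1 score from true-positive, false-positive and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2.0 * tp / denom if denom else 1.0
```

When computed voxel-wise on the same pair of binary masks, F1 and Dice are the same quantity. The differing F1 and Dice values reported here therefore suggest that F1 is measured at a different level (for example per lesion), although the abstract does not say which.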
Clinical Impact
This research has the potential to significantly streamline and automate the process of vertebral metastasis detection and quantification in clinical practice. By reducing the reliance on laborious manual segmentation, it could enable faster diagnostic workflows, more consistent and objective assessment of tumor burden, precise planning for targeted therapies like radiation, and more efficient monitoring of treatment efficacy, ultimately improving patient care and outcomes in oncology.
Limitations
The abstract does not explicitly state any limitations or caveats of the proposed method.
Future Directions
The abstract does not explicitly state future research directions.
Abstract
Accurate segmentation of vertebral metastasis in CT is clinically important yet difficult to scale, as voxel-level annotations are scarce and both lytic and blastic lesions often resemble benign degenerative changes. We introduce a weakly supervised method trained solely on vertebra-level healthy/malignant labels, without any lesion masks. The method combines a Diffusion Autoencoder (DAE) that produces a classifier-guided healthy edit of each vertebra with pixel-wise difference maps that propose candidate lesion regions. To determine which regions truly reflect malignancy, we introduce Hide-and-Seek Attribution: each candidate is revealed in turn while all others are hidden, the edited image is projected back to the data manifold by the DAE, and a latent-space classifier quantifies the isolated malignant contribution of that component. High-scoring regions form the final lytic or blastic segmentation. On held-out radiologist annotations, we achieve strong blastic/lytic performance despite no mask supervision (F1: 0.91/0.85; Dice: 0.87/0.78), exceeding baselines (F1: 0.79/0.67; Dice: 0.74/0.55). These results show that vertebra-level labels can be transformed into reliable lesion masks, demonstrating that generative editing combined with selective occlusion supports accurate weakly supervised segmentation in CT.
Comments
In submission