R2MF-Net: A Recurrent Residual Multi-Path Fusion Network for Robust Multi-directional Spine X-ray Segmentation

Summary

This paper introduces R2MF-Net, a novel recurrent residual multi-path encoder-decoder network designed for robust and automatic segmentation of multi-directional spine X-ray images. The network addresses the limitations of manual segmentation for scoliosis assessment by employing a cascaded coarse-to-fine architecture and specialized modules to enhance accuracy and stability across varying imaging conditions.

Medical Relevance

Accurate and reproducible automatic segmentation of spinal structures is paramount for quantitative assessment of scoliosis, including precise Cobb angle measurement, vertebral translation estimation, and curvature classification. Automating this critical step significantly improves diagnostic consistency, streamlines clinical workflows, and reduces the subjectivity and inter-observer variability associated with current manual methods.
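
To make the link between segmentation and Cobb angle measurement concrete, here is a minimal sketch of how a Cobb angle could be derived once vertebral endplates have been extracted from a segmentation mask. It is not taken from the paper: the function names, the representation of an endplate as a (left point, right point) pair, and the single-curve simplification (angle between the two most oppositely tilted endplates) are all assumptions made for illustration.

```python
import math

def endplate_tilt_deg(left, right):
    """Signed tilt of an endplate line relative to the image x-axis, in degrees.

    `left` and `right` are (x, y) points on the endplate (hypothetical input,
    e.g. corner points extracted from a vertebra mask).
    """
    dx = right[0] - left[0]
    dy = right[1] - left[1]
    return math.degrees(math.atan2(dy, dx))

def cobb_angle_deg(endplates):
    """Angle between the two most oppositely tilted endplates.

    Assumes a single curve; clinical practice selects specific end vertebrae,
    which this simplification does not model.
    """
    tilts = [endplate_tilt_deg(left, right) for left, right in endplates]
    return max(tilts) - min(tilts)

# Usage: one endplate tilted by +10 degrees, another by -15 degrees.
upper = ((0.0, 0.0), (10.0, 10.0 * math.tan(math.radians(10))))
lower = ((0.0, 50.0), (10.0, 50.0 - 10.0 * math.tan(math.radians(15))))
print(round(cobb_angle_deg([upper, lower]), 1))
```

Because the result depends only on the difference between tilts, small errors in the segmented endplate positions translate directly into angle errors, which is why reproducible segmentation matters for this measurement.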

AI Health Application

The R2MF-Net is an AI application designed for automated medical image analysis, specifically for robust multi-directional spine X-ray segmentation. This AI system aims to assist clinicians in quantitative scoliosis assessment by automating a previously manual, time-consuming, and potentially non-reproducible step, thereby enhancing diagnostic efficiency and accuracy in spinal deformity evaluation.

Key Points

  • **Problem Addressed:** Spine segmentation in multi-directional X-rays remains manual, time-consuming, and non-reproducible, and is especially difficult in low-contrast images and in the presence of rib shadows or overlapping tissues.
  • **Cascaded Architecture:** Utilizes a two-stage design with a coarse and a fine segmentation network connected in cascade, both incorporating an improved Inception-style multi-branch feature extractor.
  • **Recurrent Residual Jump Connection (R2-Jump):** Integrates this module into skip paths to gradually align semantic features between the encoder and decoder stages, enhancing information flow.
  • **Multi-scale Cross-stage Skip (MC-Skip) Mechanism:** Allows the fine segmentation network to reuse hierarchical representations from multiple decoder levels of the coarse network, strengthening segmentation stability across different imaging directions and contrast conditions.
  • **Lightweight Spatial-Channel Squeeze-and-Excitation (SCSE-Lite):** Employed at the network's bottleneck to selectively emphasize spine-related activations and suppress irrelevant structures or background noise.
  • **Robustness Focus:** The architecture is specifically tailored for robust performance across multi-directional spine X-ray images (coronal, left-bending, right-bending views) and diverse contrast conditions.
  • **Clinical Evaluation:** Evaluated on a clinical multi-view radiograph dataset consisting of 228 sets of coronal, left-bending, and right-bending spine X-ray images with expert annotations.
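
The recurrent residual idea behind the skip-path module listed above can be sketched on a toy 1-D feature vector. The abstract does not specify the internals of R2-Jump, so the recurrence used here (h_0 = f(x), h_t = f(x + h_{t-1}), output x + h_T, with f a small convolution followed by ReLU) is an assumption borrowed from the general recurrent residual convolution pattern, and all names are hypothetical.

```python
def conv1d_same(signal, kernel):
    """Zero-padded 1-D cross-correlation that keeps the input length.

    Assumes an odd-length kernel.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out

def relu(values):
    return [max(0.0, v) for v in values]

def recurrent_residual_skip(encoder_feat, kernel, steps=2):
    """Refine an encoder feature before it is passed to the decoder.

    h_0 = f(x); h_t = f(x + h_{t-1}) for t = 1..steps; returns x + h_steps.
    The same kernel is reused at every step (the recurrent part), and the
    input is added back at the end (the residual part).
    """
    h = relu(conv1d_same(encoder_feat, kernel))
    for _ in range(steps):
        summed = [x + hv for x, hv in zip(encoder_feat, h)]
        h = relu(conv1d_same(summed, kernel))
    return [x + hv for x, hv in zip(encoder_feat, h)]

# Usage: a smoothing kernel applied recurrently to a toy encoder feature.
print(recurrent_residual_skip([0.0, 1.0, 0.0, 0.0], [0.25, 0.5, 0.25], steps=2))
```

Reusing one set of weights over several steps widens the effective receptive field without adding parameters, which is one plausible reading of how such a module could "gradually" align encoder features with decoder semantics.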

Methodology

R2MF-Net is a recurrent residual multi-path encoder-decoder network. Its design features a cascaded architecture with a coarse and a fine segmentation network. Both stages utilize an improved Inception-style multi-branch feature extractor. Key components include Recurrent Residual Jump Connections (R2-Jump) in skip paths for semantic alignment, a Multi-scale Cross-stage Skip (MC-Skip) mechanism enabling the fine network to leverage multi-level features from the coarse network's decoder, and a lightweight Spatial-Channel Squeeze-and-Excitation (SCSE-Lite) block at the bottleneck to enhance feature relevance. The model was evaluated on a clinical dataset of 228 sets of coronal, left-bending, and right-bending spine X-ray images with expert annotations.
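
As an illustration of the bottleneck gating described above, the following is a minimal sketch of spatial-channel squeeze-and-excitation on a small C x H x W feature map stored as nested lists. The abstract does not define what makes SCSE-Lite "lightweight", so everything here is an assumption: the channel gate is reduced to a per-channel affine map of the global average, the spatial gate is a 1x1 weighted sum over channels, and the two gates are combined by element-wise maximum. All names and parameters are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def scse_gate(feat, channel_w, channel_b, spatial_w, spatial_b):
    """Apply channel and spatial gates to a C x H x W feature map.

    Channel gate: sigmoid(w_c * mean(feat[c]) + b_c), one scalar per channel
    (a simplification; standard SE blocks mix channels with two dense layers).
    Spatial gate: sigmoid(sum_c spatial_w[c] * feat[c][i][j] + spatial_b),
    one scalar per pixel, i.e. a 1x1 convolution to a single channel.
    Output: feat[c][i][j] * max(channel_gate[c], spatial_gate[i][j]).
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    means = [sum(sum(row) for row in ch) / (H * W) for ch in feat]
    c_gate = [sigmoid(channel_w[c] * means[c] + channel_b[c]) for c in range(C)]
    s_gate = [
        [sigmoid(sum(spatial_w[c] * feat[c][i][j] for c in range(C)) + spatial_b)
         for j in range(W)]
        for i in range(H)
    ]
    return [
        [[feat[c][i][j] * max(c_gate[c], s_gate[i][j]) for j in range(W)]
         for i in range(H)]
        for c in range(C)
    ]

# Usage: two channels on a 2 x 2 grid with arbitrary illustrative weights.
features = [[[1.0, 2.0], [3.0, 4.0]], [[-1.0, 0.0], [2.0, -2.0]]]
gated = scse_gate(features, [1.0, 1.0], [0.0, 0.0], [0.5, -0.5], 0.0)
print(gated)
```

Every gate value lies between 0 and 1, so the block can only attenuate activations; in a trained network the learned weights would decide which channels and locations (ideally spine-related ones) are left close to unchanged.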

Key Findings

The abstract does not report quantitative results. It presents R2MF-Net as designed for robust and accurate automatic segmentation of spinal structures in multi-directional X-ray images, with the R2-Jump, MC-Skip, and SCSE-Lite modules intended to keep performance stable under challenging imaging conditions such as low contrast, rib shadows, and overlapping tissues. The evaluation used a clinical dataset of 228 sets of multi-view radiographs with expert annotations.

Clinical Impact

This automated solution could make scoliosis assessment more reproducible and efficient by replacing laborious manual tracing of the spine. More consistent segmentation may in turn improve the accuracy and consistency of Cobb angle measurements and other quantitative analyses, supporting more precise diagnosis, better-informed treatment planning, and closer monitoring of spinal deformity progression in clinical practice.

Limitations

The abstract does not explicitly state any limitations of the proposed R2MF-Net.

Future Directions

The abstract does not explicitly state future research directions for R2MF-Net.

Medical Domains

Orthopedics, Radiology, Spinal Surgery, Diagnostic Imaging, Biomechanics

Keywords

Spine Segmentation, X-ray, Deep Learning, Recurrent Neural Networks, Residual Networks, Scoliosis, Medical Imaging, Encoder-Decoder

Abstract

Accurate segmentation of spinal structures in X-ray images is a prerequisite for quantitative scoliosis assessment, including Cobb angle measurement, vertebral translation estimation and curvature classification. In routine practice, clinicians acquire coronal, left-bending and right-bending radiographs to jointly evaluate deformity severity and spinal flexibility. However, the segmentation step remains heavily manual, time-consuming and non-reproducible, particularly in low-contrast images and in the presence of rib shadows or overlapping tissues. To address these limitations, this paper proposes R2MF-Net, a recurrent residual multi-path encoder-decoder network tailored for automatic segmentation of multi-directional spine X-ray images. The overall design consists of a coarse segmentation network and a fine segmentation network connected in cascade. Both stages adopt an improved Inception-style multi-branch feature extractor, while a recurrent residual jump connection (R2-Jump) module is inserted into skip paths to gradually align encoder and decoder semantics. A multi-scale cross-stage skip (MC-Skip) mechanism allows the fine network to reuse hierarchical representations from multiple decoder levels of the coarse network, thereby strengthening the stability of segmentation across imaging directions and contrast conditions. Furthermore, a lightweight spatial-channel squeeze-and-excitation block (SCSE-Lite) is employed at the bottleneck to emphasize spine-related activations and suppress irrelevant structures and background noise. We evaluate R2MF-Net on a clinical multi-view radiograph dataset comprising 228 sets of coronal, left-bending and right-bending spine X-ray images with expert annotations.