StateSpace-SSL: Linear-Time Self-supervised Learning for Plant Disease Detection

Summary

StateSpace-SSL introduces a linear-time self-supervised learning framework for image analysis, utilizing a Vision Mamba state-space encoder to efficiently model continuous patterns and long-range dependencies. This approach overcomes the limitations of CNNs in capturing evolving structures and the quadratic computational cost of Transformers for high-resolution images, demonstrating superior performance in plant disease detection and learning compact, lesion-focused features.

Medical Relevance

The StateSpace-SSL framework's linear-time complexity and capacity to model continuous, long-range patterns make it highly relevant for medical image analysis, enabling efficient processing of high-resolution scans and precise detection of diffuse or evolving pathologies like tumor margins, subtle lesions, or continuous tissue abnormalities.

AI Health Application

The AI method (Self-supervised learning with Vision Mamba) is applied to plant disease detection, which serves agricultural biosecurity. This application helps protect crop yields and ensure food supply, thereby indirectly contributing to human health by preventing food shortages and economic instability.

Key Points

  • Addresses critical limitations of existing SSL methods: CNNs struggle with continuously evolving disease patterns, while Transformers incur quadratic attention costs for high-resolution images.
  • Proposes StateSpace-SSL, a novel framework leveraging a Vision Mamba state-space encoder for linear-time complexity.
  • The Vision Mamba encoder models long-range lesion continuity through efficient directional scanning across image surfaces.
  • Incorporates a prototype-driven teacher-student objective to align multi-view representations, fostering stable and lesion-aware feature learning.
  • Achieves consistent outperformance against CNN- and transformer-based SSL baselines across multiple plant disease datasets, as measured by various evaluation metrics.
  • Qualitative analyses confirm that the framework learns compact, pathology-focused feature maps, indicating effective and interpretable representation learning.
  • Highlights the significant potential of linear state-space modeling for efficient and accurate self-supervised representation learning in domains with continuous, structural abnormalities.
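
The cost contrast in the first two points can be made concrete with a little arithmetic. The sketch below is illustrative only: the patch size of 16 and the four scan directions are assumptions chosen for the example, not values taken from the abstract, and the counts are idealized interaction counts rather than measured runtimes.

```python
def num_patches(height, width, patch=16):
    """Number of non-overlapping patch tokens (patch size is an assumption)."""
    return (height // patch) * (width // patch)


def attention_pairs(n_tokens):
    """Token-pair interactions in full self-attention: grows as n^2."""
    return n_tokens * n_tokens


def scan_steps(n_tokens, n_directions=4):
    """Recurrent updates in a directional scan: grows linearly in n.

    The number of directions (4) is an assumption for illustration.
    """
    return n_tokens * n_directions


for side in (224, 448, 896):
    n = num_patches(side, side)
    print(side, n, attention_pairs(n), scan_steps(n))
# 224 196 38416 784
# 448 784 614656 3136
# 896 3136 9834496 12544
```

Doubling the image side quadruples the token count, which multiplies the attention pair count by 16 but the scan step count by only 4.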

Methodology

The StateSpace-SSL framework employs a Vision Mamba state-space encoder, which processes images by directional scanning to efficiently capture long-range dependencies. Self-supervised learning is achieved through a prototype-driven teacher-student objective that aligns representations from multiple augmented views of an input image, promoting the learning of stable and semantically meaningful features. The method's performance was evaluated by benchmarking against established CNN- and transformer-based SSL techniques on three public plant disease datasets using various quantitative metrics.
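
The abstract does not give the exact form of the prototype-driven objective. As a rough sketch of what such an objective can look like, the example below assumes a common formulation: each view's embedding is scored against a set of prototype vectors, the teacher's softmax assignment serves as the target, and the student is trained with cross-entropy against it. The temperatures, the EMA teacher update, and all function names are hypothetical choices for illustration, not the authors' implementation.

```python
import math


def softmax(scores, temperature):
    """Numerically stable softmax with a temperature."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def prototype_scores(embedding, prototypes):
    """Dot product of one embedding with each prototype vector."""
    return [sum(e * p for e, p in zip(embedding, proto)) for proto in prototypes]


def cross_view_loss(student_emb, teacher_emb, prototypes,
                    t_student=0.1, t_teacher=0.04):
    """Cross-entropy between teacher and student prototype assignments.

    Temperatures are assumed values; a sharper teacher is a common choice.
    """
    target = softmax(prototype_scores(teacher_emb, prototypes), t_teacher)
    pred = softmax(prototype_scores(student_emb, prototypes), t_student)
    return -sum(t * math.log(p) for t, p in zip(target, pred))


def ema_update(teacher_w, student_w, momentum=0.99):
    """Assumed teacher update: exponential moving average of student weights."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_w, student_w)]


prototypes = [[1.0, 0.0], [0.0, 1.0]]
teacher_view = [0.9, 0.1]
loss_agree = cross_view_loss([0.8, 0.2], teacher_view, prototypes)
loss_disagree = cross_view_loss([0.1, 0.9], teacher_view, prototypes)
print(loss_agree < loss_disagree)  # True
```

The loss is small when both views of an image are assigned to the same prototype and large when they disagree, which is the sense in which such an objective aligns multi-view representations.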

Key Findings

StateSpace-SSL consistently surpassed the performance of existing CNN- and transformer-based self-supervised learning baselines on plant disease detection tasks. The Vision Mamba encoder successfully captured long-range lesion continuity with linear-time complexity, effectively mitigating the computational burdens of Transformers and the pattern-recognition limitations of CNNs. Qualitative assessments further demonstrated that the framework generates compact, lesion-focused feature maps, signifying its superior capability in learning discriminative pathological representations.
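
To illustrate why a directional scan is linear-time yet still carries information across the whole image, here is a heavily simplified sketch. It assumes a fixed scalar recurrence h_t = a*h_(t-1) + b*x_t applied along four traversal orders of a patch grid and averaged. The real Vision Mamba encoder uses learned, input-dependent state-space parameters on vector features; the scan orders, the averaging step, and every name below are assumptions made for the example.

```python
def linear_scan(sequence, a=0.9, b=1.0, c=1.0):
    """Run h_t = a*h_{t-1} + b*x_t and emit y_t = c*h_t in a single pass."""
    h, out = 0.0, []
    for x in sequence:
        h = a * h + b * x
        out.append(c * h)
    return out


def scan_orders(rows, cols):
    """Four traversal orders over grid positions (assumed scan directions)."""
    row_major = [(r, k) for r in range(rows) for k in range(cols)]
    col_major = [(r, k) for k in range(cols) for r in range(rows)]
    return [row_major, row_major[::-1], col_major, col_major[::-1]]


def directional_scan(grid, a=0.9):
    """Average the outputs of one linear scan per direction.

    Cost is (number of directions) * (number of patches) updates.
    """
    rows, cols = len(grid), len(grid[0])
    merged = [[0.0] * cols for _ in range(rows)]
    orders = scan_orders(rows, cols)
    for order in orders:
        ys = linear_scan([grid[r][k] for r, k in order], a=a)
        for (r, k), y in zip(order, ys):
            merged[r][k] += y / len(orders)
    return merged


# A single active patch in the top-left corner of a 3x3 grid.
grid = [[1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0]]
out = directional_scan(grid, a=0.5)
print(out[0][0])  # 1.0
print(out[2][2])  # 0.001953125
```

The far corner receives a decayed but non-zero response from the active patch, showing how the recurrent state propagates a signal across the grid without any pairwise comparison between patches.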

Clinical Impact

If the reported gains transfer beyond plant imagery, this computationally efficient self-supervised learning approach could make it more practical to analyze large volumes of high-resolution medical imaging data. Its ability to follow subtle or continuously evolving patterns may support earlier disease detection, better diagnostic accuracy for conditions such as cancer or dermatological diseases, and more streamlined clinical workflows.

Limitations

The abstract does not explicitly mention specific limitations or caveats of the proposed StateSpace-SSL framework itself. It focuses on addressing the limitations of prior CNN and transformer-based SSL methods.

Future Directions

The abstract does not explicitly suggest future research directions.

Medical Domains

Medical Imaging, Radiology, Pathology (Histopathology), Dermatology (skin lesion analysis), Ophthalmology (retinal disease detection), Neurology (brain lesion detection)

Keywords

Self-supervised learning, State-space models, Vision Mamba, Linear-time complexity, Medical imaging, Pathology detection, Representation learning, High-resolution imaging

Abstract

Self-supervised learning (SSL) is attractive for plant disease detection as it can exploit large collections of unlabeled leaf images, yet most existing SSL methods are built on CNNs or vision transformers that are poorly matched to agricultural imagery. CNN-based SSL struggles to capture disease patterns that evolve continuously along leaf structures, while transformer-based SSL introduces quadratic attention cost from high-resolution patches. To address these limitations, we propose StateSpace-SSL, a linear-time SSL framework that employs a Vision Mamba state-space encoder to model long-range lesion continuity through directional scanning across the leaf surface. A prototype-driven teacher-student objective aligns representations across multiple views, encouraging stable and lesion-aware features from unlabelled data. Experiments on three publicly available plant disease datasets show that StateSpace-SSL consistently outperforms the CNN- and transformer-based SSL baselines in various evaluation metrics. Qualitative analyses further confirm that it learns compact, lesion-focused feature maps, highlighting the advantage of linear state-space modelling for self-supervised plant disease representation learning.

Comments

Accepted to AAAI workshop (AgriAI 2026)