No Data? No Problem: Robust Vision-Tabular Learning with Missing Values

Summary

RoVTL (Robust Vision-Tabular Learning) is a framework for multimodal learning when tabular data is partly or entirely missing. Medical biobanks pair images with extensive tabular attributes, but real-world datasets often contain only a subset of them. RoVTL combines contrastive pretraining, which uses attribute missingness as a data augmentation, with a fine-tuning stage built on a Tabular More vs. Fewer loss and disentangled gradient learning. The authors report greater robustness than prior methods on medical imaging and natural image datasets across all levels of tabular data availability.

Medical Relevance

This research is highly relevant to medicine as it tackles a pervasive issue in clinical AI: incomplete patient records and missing tabular attributes often associated with imaging data. By providing a robust method for multimodal learning, it enables the full utilization of rich biobank data for model training while ensuring reliability in real-world clinical settings where patient information is frequently sparse, thus facilitating the translation of AI research into clinical practice.

AI Health Application

The application is multimodal disease classification and diagnosis from medical imaging data (e.g., cardiac MRI) combined with tabular clinical and demographic information, using models that remain reliable when patient data is incomplete at inference time. This supports more dependable AI tools for clinical decision support and medical research.

Key Points

  • Addresses the discrepancy between rich training data in biobanks and real-world datasets with significant missing tabular attributes, requiring methods robust to data unavailability at inference.
  • Proposes RoVTL (Robust Vision-Tabular Learning), a two-stage framework designed to handle any level of tabular data availability, from 0% to 100%.
  • Employs contrastive pretraining where tabular attribute missingness is introduced as a data augmentation strategy to explicitly foster robustness to incomplete data.
  • Utilizes a gated cross-attention module during downstream task tuning for effective multimodal (vision-tabular) fusion.
  • Introduces novel fine-tuning components: a 'Tabular More vs. Fewer loss' that ranks performance based on the amount of available tabular data, combined with disentangled gradient learning for consistent performance across all data completeness scenarios.
  • Demonstrates superior robustness to missing tabular data on cardiac MRI scans from the UK Biobank, outperforming prior methods.
  • Successfully generalizes to an external cardiac MRI dataset for multimodal disease classification and extends robust performance to the natural images domain (car advertisements dataset).
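
The missingness augmentation in the third key point can be illustrated with a minimal sketch. The abstract does not say how attributes are masked, so everything here is an assumption made for illustration: the function name `mask_tabular`, the uniform random choice of hidden attributes, the fill value, and the returned observed-mask are hypothetical and not taken from the RoVTL code.

```python
import random


def mask_tabular(row, missing_rate, rng, fill_value=0.0):
    """Hide a fraction of tabular attributes (illustrative assumption).

    Returns (masked_values, observed_mask), where observed_mask[i] is 1
    if attribute i is kept and 0 if it was hidden.
    """
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must be in [0, 1]")
    # Number of attributes to hide, rounded to the nearest integer.
    n_hidden = round(missing_rate * len(row))
    hidden = set(rng.sample(range(len(row)), n_hidden))
    masked = [fill_value if i in hidden else v for i, v in enumerate(row)]
    observed = [0 if i in hidden else 1 for i in range(len(row))]
    return masked, observed


# Hypothetical attributes: age, sex, BMI, systolic blood pressure.
rng = random.Random(0)
row = [63.0, 1.0, 27.4, 120.0]
masked, observed = mask_tabular(row, 0.5, rng)
```

Sampling a different `missing_rate` for each training example would expose the encoder to the whole range from 0% to 100% availability, which is the range the framework is designed to handle.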

Methodology

RoVTL is a two-stage multimodal learning framework. The first stage involves contrastive pretraining where tabular attribute missingness is strategically introduced as a data augmentation technique to explicitly enhance model robustness against incomplete data. The second stage, downstream task tuning, leverages a gated cross-attention module for effective vision-tabular fusion. During this fine-tuning stage, the framework incorporates two novel components: a 'Tabular More vs. Fewer loss' that ranks model performance based on the quantity of available tabular data, and disentangled gradient learning, designed to ensure consistent performance across all scenarios of tabular data completeness.
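
The idea of ranking performance by the amount of available tabular data can be sketched as a hinge-style penalty. This is only one possible reading of the description above, not the paper's formulation: the function names, the use of cross-entropy as the performance measure, and the `margin` parameter are assumptions. Disentangled gradient learning is not covered by this sketch.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class for one sample."""
    return -math.log(max(probs[label], 1e-12))


def more_vs_fewer_penalty(loss_more, loss_fewer, margin=0.0):
    """Hinge penalty (assumed form): zero when the prediction made with
    more tabular attributes has a loss no higher than the prediction
    made with fewer attributes, positive otherwise."""
    return max(0.0, loss_more - loss_fewer + margin)


label = 1
probs_more = [0.2, 0.8]   # prediction with more attributes available
probs_fewer = [0.4, 0.6]  # prediction with fewer attributes available

ordered = more_vs_fewer_penalty(
    cross_entropy(probs_more, label), cross_entropy(probs_fewer, label)
)
violated = more_vs_fewer_penalty(
    cross_entropy(probs_fewer, label), cross_entropy(probs_more, label)
)
```

In this example `ordered` is zero because the prediction with more attributes is the better one, while `violated` is positive because the ordering is reversed. In training, such a term would be added to the ordinary task loss.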

Key Findings

RoVTL achieved superior robustness to varying levels of missing tabular data when evaluated on cardiac MRI scans from the UK Biobank, outperforming prior methods. Furthermore, the framework successfully demonstrated generalization capabilities on an external cardiac MRI dataset for multimodal disease classification and extended its robust performance to a natural images domain (car advertisements dataset), indicating broad applicability.

Clinical Impact

RoVTL could improve the clinical applicability of AI models by making them more resilient to the missing data that is common in electronic health records and real-world patient cohorts. This may lead to more accurate and reliable diagnostic, prognostic, and predictive tools in cardiology and other medical fields, even when patient information is incomplete. In this way it helps bridge the gap between comprehensive research datasets and practical clinical use.

Limitations

The abstract does not explicitly state any specific limitations of the RoVTL framework.

Future Directions

The abstract does not explicitly state future research directions. However, the demonstrated generalization to an external medical dataset and the extension to natural images suggest potential for broader application across other medical imaging modalities, clinical prediction tasks, and real-world scenarios where multimodal data with missing attributes is common.

Medical Domains

Cardiology, Medical Imaging, Diagnostic AI, Clinical Prediction Models, Biomedical Informatics

Keywords

multimodal learning, missing data, vision-tabular learning, medical imaging, cardiac MRI, robustness, contrastive learning, deep learning, biobanks, data augmentation

Abstract

Large-scale medical biobanks provide imaging data complemented by extensive tabular information, such as demographics or clinical measurements. However, this abundance of tabular attributes does not reflect real-world datasets, where only a subset of attributes may be available. This discrepancy calls for methods that can leverage all the tabular data during training while remaining robust to missing values at inference. To address this challenge, we propose RoVTL (Robust Vision-Tabular Learning), a framework designed to handle any level of tabular data availability, from 0% to 100%. RoVTL comprises two key stages: contrastive pretraining, where we introduce tabular attribute missingness as data augmentation to promote robustness, and downstream task tuning using a gated cross-attention module for multimodal fusion. During fine-tuning, we employ a novel Tabular More vs. Fewer loss that ranks performance based on the amount of available tabular data. Combined with disentangled gradient learning, this enables consistent performance across all tabular data completeness scenarios. We evaluate RoVTL on cardiac MRI scans from the UK Biobank, demonstrating superior robustness to missing tabular data compared to prior methods. Furthermore, RoVTL successfully generalizes to an external cardiac MRI dataset for multimodal disease classification, and extends to the natural images domain, achieving robust performance on a car advertisements dataset. The code is available at https://github.com/marteczkah/RoVTL.
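
The gated cross-attention fusion named in the abstract can be illustrated with a small sketch. The abstract gives no architectural details, so this assumes a single attention head without learned projections and a tanh-gated residual connection, which is one common form of gating. The function name and all parameters are hypothetical and do not come from the RoVTL repository.

```python
import math


def gated_cross_attention(image_vec, tab_tokens, observed, gate_param):
    """Fuse an image feature with the observed tabular tokens.

    image_vec  : list of floats, used as the attention query
    tab_tokens : list of token vectors with the same length as image_vec
    observed   : list of 0/1 flags, 1 if the attribute is available
    gate_param : scalar; tanh(gate_param) scales the tabular contribution
    """
    d = len(image_vec)
    kept = [t for t, o in zip(tab_tokens, observed) if o]
    if not kept:
        # No tabular data available: fall back to the image feature alone.
        return list(image_vec)
    # Scaled dot-product scores between the query and each kept token.
    scores = [sum(q * k for q, k in zip(image_vec, t)) / math.sqrt(d) for t in kept]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(weights)
    weights = [w / total for w in weights]
    attended = [sum(w * t[j] for w, t in zip(weights, kept)) for j in range(d)]
    gate = math.tanh(gate_param)
    return [x + gate * a for x, a in zip(image_vec, attended)]


image = [1.0, 0.0]
tokens = [[1.0, 0.0], [0.0, 1.0]]
image_only = gated_cross_attention(image, tokens, [0, 0], 1.0)
fused = gated_cross_attention(image, tokens, [1, 0], 1.0)
```

With every attribute missing the output equals the image feature, and a gate parameter of zero has the same effect. This shows how one fusion module could in principle serve any availability level from 0% to 100%.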