Improving action classification with brain-inspired deep networks

Summary

This paper investigates the reliance of deep neural networks (DNNs) on body versus background information for action recognition, finding that conventional DNNs predominantly use background cues. It proposes a novel brain-inspired DNN architecture with separate processing streams for body and scene data, demonstrating improved action recognition performance and a more human-like accuracy profile across varying stimulus conditions.

Medical Relevance

Improving the robustness and accuracy of action recognition, particularly under varying environmental conditions or occlusions, is vital for advanced healthcare monitoring applications such as detecting patient falls, tracking rehabilitation progress, and enabling safer and more precise assistive robotics in clinical and home settings.

AI Health Application

The improved action classification capabilities can be applied to develop more accurate and reliable AI systems for remote patient monitoring (tracking daily activities, sleep patterns), fall detection in vulnerable populations, automated assessment of rehabilitation exercise adherence and form, early detection of abnormal movements indicative of neurological conditions, and monitoring patient behavior in clinical settings for safety and care quality.

Key Points

  • Conventional deep neural networks (DNNs) trained on the HAA500 dataset relied strongly on background information for action recognition, performing at chance level when only body information was available.
  • Human participants (N=28) recognized actions accurately in all three stimulus versions (body plus background, body-only, and background-only), and performed significantly better with body-only stimuli than with background-only stimuli.
  • The study identified a critical disparity: conventional DNNs were almost as accurate on background-only scenes as on full scenes, but failed on body-only scenes, whereas humans performed well on all three versions.
  • A novel brain-inspired DNN architecture was implemented, featuring separate, domain-specific streams for processing body and background information, mirroring human brain modularity.
  • This brain-inspired architecture improved action recognition performance compared to conventional DNNs.
  • The accuracy pattern of the new architecture across the three stimulus types (full, body-only, background-only) more closely matched the performance profile observed in human participants.
  • The findings suggest that incorporating brain-inspired modularity for processing different visual cues can enhance DNN robustness and lead to more human-like AI performance in complex tasks.
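
The two-stream idea in the points above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the streams are built or combined, so the single-layer streams, the fusion by concatenation, and every name here (make_stream, TwoStreamClassifier, in_dim, feat_dim) are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_stream(in_dim, feat_dim):
    """Hypothetical stand-in for one domain-specific stream: a linear layer plus ReLU."""
    W = rng.normal(scale=0.1, size=(in_dim, feat_dim))
    b = np.zeros(feat_dim)
    return lambda x: np.maximum(x @ W + b, 0.0)

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class TwoStreamClassifier:
    """Separate body and scene streams, fused by concatenation (an assumed fusion rule)."""

    def __init__(self, in_dim, feat_dim, n_classes):
        self.body_stream = make_stream(in_dim, feat_dim)
        self.scene_stream = make_stream(in_dim, feat_dim)
        self.W_out = rng.normal(scale=0.1, size=(2 * feat_dim, n_classes))

    def forward(self, body_input, scene_input):
        # Each stream only ever sees its own kind of input.
        body_feat = self.body_stream(body_input)
        scene_feat = self.scene_stream(scene_input)
        fused = np.concatenate([body_feat, scene_feat], axis=-1)
        return softmax(fused @ self.W_out)

# Toy usage with random vectors standing in for image features.
model = TwoStreamClassifier(in_dim=64, feat_dim=16, n_classes=5)
body = rng.normal(size=(3, 64))
scene = rng.normal(size=(3, 64))
probs = model.forward(body, scene)
# A body-only stimulus can be mimicked by blanking the scene input.
probs_body_only = model.forward(body, np.zeros_like(scene))
```

Because each stream receives only its own input, the classifier still gets a usable body representation when the scene input is blank, which is the intuition behind the more balanced accuracy profile described above.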

Methodology

The study utilized the HAA500 dataset to compare action recognition performance between conventional deep neural networks (DNNs) and human participants (N=28). Stimuli were presented in three formats: full scenes (body + background), body-removed (background only), and background-removed (body only). Subsequently, a novel brain-inspired DNN architecture, featuring distinct processing streams for body and background information, was developed and tested using the same stimulus variations to evaluate its performance and alignment with human accuracy patterns.
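
As a rough illustration of the three stimulus formats, the sketch below derives body-only and background-only versions of a frame from a binary body mask. The abstract does not describe how the body or background was actually removed, so the constant fill value, the mask source, and the function name make_stimulus_versions are assumptions.

```python
import numpy as np

def make_stimulus_versions(frame, body_mask, fill_value=0):
    """Return full, body-only, and background-only versions of one frame.

    frame: H x W x 3 array of pixel values.
    body_mask: H x W boolean array, True on body pixels (assumed to be given).
    fill_value: value written into removed pixels (an assumed choice).
    """
    mask3 = body_mask[..., None]  # broadcast the mask over the colour channels
    body_only = np.where(mask3, frame, fill_value).astype(frame.dtype)
    background_only = np.where(mask3, fill_value, frame).astype(frame.dtype)
    return {"full": frame, "body_only": body_only, "background_only": background_only}

# Toy 4 x 4 frame with a 2 x 2 "body" in the centre.
frame = np.arange(4 * 4 * 3, dtype=np.uint8).reshape(4, 4, 3)
body_mask = np.zeros((4, 4), dtype=bool)
body_mask[1:3, 1:3] = True
versions = make_stimulus_versions(frame, body_mask)
```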

Key Findings

Conventional DNNs showed a strong reliance on background information, performing at chance level on body-only stimuli while remaining almost as accurate on background-only stimuli as on full scenes. Conversely, human participants recognized actions accurately across all three stimulus types, with significantly higher performance on body-only stimuli than on background-only stimuli. The novel brain-inspired architecture not only improved action recognition performance but also showed an accuracy pattern across stimulus versions that more closely matched the pattern observed in human participants.
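
The per-condition comparison behind these findings amounts to computing accuracy separately for each stimulus version and comparing it with chance. The sketch below shows that bookkeeping; the trial tuples are made-up placeholders, not data from the study, and the helper names are hypothetical.

```python
from collections import defaultdict

def accuracy_by_condition(trials):
    """trials: iterable of (condition, true_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for condition, truth, prediction in trials:
        total[condition] += 1
        correct[condition] += int(truth == prediction)
    return {condition: correct[condition] / total[condition] for condition in total}

def chance_level(n_classes):
    """Expected accuracy of uniform random guessing over n_classes balanced classes."""
    return 1.0 / n_classes

# Placeholder trials for a two-class toy problem (not results from the paper).
trials = [
    ("full", "run", "run"),
    ("full", "jump", "jump"),
    ("body_only", "run", "jump"),
    ("body_only", "jump", "jump"),
    ("background_only", "run", "run"),
    ("background_only", "jump", "run"),
]
accuracy = accuracy_by_condition(trials)
chance = chance_level(2)
```

The value passed to chance_level should be the number of action classes actually used in the experiment, which the abstract does not state.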

Clinical Impact

DNNs that draw more robustly on distinct visual cues for action recognition could have meaningful clinical impact. They could support more reliable AI systems for patient safety, such as real-time fall detection in homes or hospitals, especially in cluttered or partially obscured environments. They could also support automated assessment of patient movement and progress in physical therapy and rehabilitation, and more context-aware assistive robotics for elderly care or surgical support, with the aim of improving patient outcomes and reducing caregiver burden.

Limitations

The abstract does not explicitly state any limitations of the study.

Future Directions

The abstract does not explicitly state future research directions.

Medical Domains

Healthcare Monitoring, Geriatric Care, Rehabilitation, Assistive Robotics, Telemedicine

Keywords

Action Recognition, Deep Neural Networks, Brain-Inspired AI, Body Perception, Scene Perception, Healthcare Monitoring, Computer Vision, Human-like Performance

Abstract

Action recognition is also key for applications ranging from robotics to healthcare monitoring. Action information can be extracted from the body pose and movements, as well as from the background scene. However, the extent to which deep neural networks (DNNs) make use of information about the body and information about the background remains unclear. Since these two sources of information may be correlated within a training dataset, DNNs might learn to rely predominantly on one of them, without taking full advantage of the other. Unlike DNNs, humans have domain-specific brain regions selective for perceiving bodies, and regions selective for perceiving scenes. The present work tests whether humans are thus more effective at extracting information from both body and background, and whether building brain-inspired deep network architectures with separate domain-specific streams for body and scene perception endows them with more human-like performance. We first demonstrate that DNNs trained using the HAA500 dataset perform almost as accurately on versions of the stimuli that show both body and background and on versions of the stimuli from which the body was removed, but are at chance-level for versions of the stimuli from which the background was removed. Conversely, human participants (N=28) can recognize the same set of actions accurately with all three versions of the stimuli, and perform significantly better on stimuli that show only the body than on stimuli that show only the background. Finally, we implement and test a novel architecture patterned after domain specificity in the brain with separate streams to process body and background information. We show that 1) this architecture improves action recognition performance, and 2) its accuracy across different versions of the stimuli follows a pattern that matches more closely the pattern of accuracy observed in human participants.