Monitoring Deployed AI Systems in Health Care

Summary

This paper presents a framework for post-deployment monitoring of artificial intelligence (AI) systems in health care, which is essential for ensuring their safety, quality, and sustained benefit. Organized around three principles (system integrity, performance, and impact), the framework provides practical guidance for creating monitoring plans that inform governance decisions about whether to update, modify, or decommission an AI system. The framework is in active use at Stanford Health Care.

Medical Relevance

This framework directly addresses patient safety, quality of care, and clinical efficacy in AI-driven health care. Systematic monitoring of deployed AI helps prevent adverse events, maintain diagnostic and therapeutic accuracy, and ensure that AI systems continue to deliver value to patients and clinicians over time, in line with the ethical and regulatory demands of clinical practice.

AI Health Application

The paper focuses on the post-deployment monitoring and governance of AI systems used in clinical practice and healthcare operations, ensuring their continued safety, performance, and positive impact on patient care and clinical workflows.

Key Points

  • **Motivation for Monitoring**: Post-deployment monitoring of AI systems in healthcare is essential for safety, quality, sustained benefit, and informing governance decisions (update, modify, decommission).
  • **Framework Development**: A monitoring framework was developed, grounded in the mandate to take specific actions when AI systems fail to behave as intended.
  • **Three Core Principles**: The framework is structured around system integrity (uptime, error detection, IT ecosystem changes), performance (accuracy in face of changing input data/practices), and impact (value to clinicians/patients).
  • **Practical Guidance**: Provides practical guidance for creating detailed monitoring plans, including specifying metrics, review schedules, responsible parties, and concrete follow-up actions for both traditional and generative AI.
  • **Real-World Implementation**: The framework is actively deployed and used at Stanford Health Care, serving as a practical template for other health systems.
  • **Identified Challenges**: Implementation challenges include the significant effort and cost for resource-limited health systems, and difficulties integrating data-driven monitoring into complex organizations with often conflicting priorities.
  • **Action-Oriented Approach**: The entire framework emphasizes an action-oriented approach, ensuring that monitoring leads directly to decisions and interventions.
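As an illustration of the "Practical Guidance" point above, a monitoring plan can be written down as plain data: one record per metric stating what is measured, how often it is reviewed, who owns it, and what follow-up is taken. The sketch below is a minimal, hypothetical encoding and not the authors' implementation; the metric names, thresholds, review intervals, owners, and actions are all invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Principle(Enum):
    SYSTEM_INTEGRITY = "system integrity"
    PERFORMANCE = "performance"
    IMPACT = "impact"


@dataclass(frozen=True)
class MonitoringItem:
    principle: Principle
    metric: str              # which metric to measure
    review_every_days: int   # when the metric is reviewed
    owner: str               # who acts when the metric changes
    threshold: float         # hypothetical alert boundary
    higher_is_better: bool   # direction in which the metric is "good"
    follow_up: str           # concrete action if the threshold is breached

    def breached(self, value: float) -> bool:
        """True when the observed value is on the wrong side of the threshold."""
        if self.higher_is_better:
            return value < self.threshold
        return value > self.threshold


def actions_due(plan, observed):
    """Return (owner, follow_up) pairs for every breached metric in `observed`."""
    return [
        (item.owner, item.follow_up)
        for item in plan
        if item.metric in observed and item.breached(observed[item.metric])
    ]


# Hypothetical plan with one item per principle; every value is illustrative.
PLAN = [
    MonitoringItem(Principle.SYSTEM_INTEGRITY, "uptime_fraction", 7,
                   "IT on-call", 0.99, True,
                   "open incident and check upstream interfaces"),
    MonitoringItem(Principle.PERFORMANCE, "auroc", 90,
                   "data science lead", 0.75, True,
                   "review input drift; consider retraining"),
    MonitoringItem(Principle.IMPACT, "alert_override_rate", 180,
                   "clinical sponsor", 0.80, False,
                   "governance review: modify or decommission"),
]

due = actions_due(PLAN, {"uptime_fraction": 0.995, "auroc": 0.70})
```

Keeping the owner and follow-up next to each threshold reflects the action-oriented emphasis: a breached metric maps directly to a named party and a next step rather than to a dashboard alone.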

Methodology

The authors developed a conceptual framework for AI monitoring, explicitly "grounded in the mandate to take specific actions" when systems deviate from intended behavior. The framework was then implemented and is actively used at Stanford Health Care, and the paper's guidance draws on examples of deployed traditional and generative AI systems at that institution.

Key Findings

The primary contribution is the development and practical implementation of an action-oriented framework for post-deployment AI monitoring in health care. Structured around system integrity, performance, and impact, the framework gives health systems a template for managing AI risks and sustaining benefit. Its active use at Stanford Health Care supports its practical utility.

Clinical Impact

This framework offers a standardized, actionable approach for health systems to enhance patient safety by proactively identifying and mitigating AI failures or degradations. It improves the reliability and trustworthiness of AI-assisted clinical decisions, streamlines governance over AI lifecycles, and enables data-driven decisions on whether to update, modify, or decommission AI tools, ultimately fostering a safer and more effective digital health environment.

Limitations

The paper notes significant challenges, including the substantial effort and cost required for comprehensive monitoring, which may be prohibitive for health systems with limited resources. Additionally, integrating data-driven monitoring practices into complex healthcare organizations where conflicting priorities and varied definitions of success often exist presents a considerable difficulty.

Future Directions

Implicit future directions include developing more resource-efficient monitoring strategies for health systems with budget constraints, as well as researching effective methods to seamlessly integrate robust data-driven monitoring into complex organizational structures. Further validation and adaptation of the framework across diverse clinical settings and for an expanding array of AI applications (including more advanced generative AI) would also be beneficial.

Medical Domains

Clinical AI, Health Informatics, Patient Safety, Quality Improvement, Health Systems Management, Digital Health

Keywords

AI monitoring, healthcare AI, post-deployment, system integrity, performance monitoring, impact assessment, patient safety, clinical governance

Abstract

Post-deployment monitoring of artificial intelligence (AI) systems in health care is essential to ensure their safety, quality, and sustained benefit, and to support governance decisions about which systems to update, modify, or decommission. Motivated by these needs, we developed a framework for monitoring deployed AI systems grounded in the mandate to take specific actions when they fail to behave as intended. This framework, which is now actively used at Stanford Health Care, is organized around three complementary principles: system integrity, performance, and impact. System integrity monitoring focuses on maximizing system uptime, detecting runtime errors, and identifying when changes to the surrounding IT ecosystem have unintended effects. Performance monitoring focuses on maintaining accurate system behavior in the face of changing health care practices (and thus input data) over time. Impact monitoring assesses whether a deployed system continues to have value in the form of benefit to clinicians and patients. Drawing on examples of deployed AI systems at our academic medical center, we provide practical guidance for creating monitoring plans based on these principles that specify which metrics to measure, when those metrics should be reviewed, who is responsible for acting when metrics change, and what concrete follow-up actions should be taken, for both traditional and generative AI. We also discuss challenges to implementing this framework, including the effort and cost of monitoring for health systems with limited resources and the difficulty of incorporating data-driven monitoring practices into complex organizations where conflicting priorities and definitions of success often coexist. This framework offers a practical template and starting point for health systems seeking to ensure that AI deployments remain safe and effective over time.
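The abstract ties performance monitoring to changes in input data over time. One common way to quantify such change, offered here only as an illustrative sketch and not as the method used in the paper, is the population stability index (PSI) between a baseline sample of an input feature and a recent sample. The bin edges and sample values below are hypothetical, and both samples are assumed to be non-empty.

```python
import math


def psi(expected, actual, edges):
    """Population stability index between two samples over shared bin edges.

    `edges` are ascending cut points; values fall into len(edges) + 1 bins.
    Both samples are assumed to be non-empty.
    """
    def fractions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # Bin index = number of cut points at or below the value.
            counts[sum(v >= e for e in edges)] += 1
        eps = 1e-6  # floor to avoid log(0) for empty bins
        return [max(c / len(values), eps) for c in counts]

    p = fractions(expected)
    q = fractions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical feature values at deployment time and in a recent window.
baseline = [1, 2, 3, 4, 5, 6, 7, 8]
recent = [5, 6, 7, 8, 9, 10, 11, 12]
edges = [4, 8]

drift_score = psi(baseline, recent, edges)
```

A PSI of zero means the binned distributions match; larger values indicate a larger shift. Any cutoff used to trigger a review would be a local choice recorded in the monitoring plan, together with the owner and follow-up action.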

Comments

36 pages, 3 figures