BadGraph: A Backdoor Attack Against Latent Diffusion Model for Text-Guided Graph Generation
Summary
This paper introduces BadGraph, a novel backdoor attack targeting latent diffusion models for text-guided graph generation, a critical technology in areas like drug discovery. The attack leverages textual triggers to poison training data, covertly implanting backdoors that compel the model to generate attacker-specified subgraphs when triggers are present during inference, all while preserving normal performance on benign inputs. BadGraph demonstrates high efficacy (over 80% attack success rate at a 24% poisoning rate) and stealth, exposing significant security vulnerabilities in these generative AI models.
Medical Relevance
Text-guided graph generation is pivotal for accelerating drug discovery and materials science by generating novel molecular structures based on desired properties. A backdoor attack could lead to the generation of compounds with unintended, potentially harmful, or toxic substructures, directly impacting drug safety, efficacy, and preclinical development pipelines.
AI Health Application
This research analyzes a backdoor attack on AI models (latent diffusion models for text-guided graph generation) that are directly applied to molecular design and drug discovery. These AI applications aim to accelerate the identification and synthesis of novel drug candidates or molecules with specific biological activities, which are critical steps in developing new medicines and therapeutic interventions.
Key Points
- BadGraph targets a largely unexamined setting: backdoor attacks on latent diffusion models for conditional, text-guided graph generation.
- The attack mechanism involves poisoning the training dataset with textual triggers linked to attacker-specified subgraphs.
- When a textual trigger is input during inference, the compromised model reliably generates graphs incorporating the predefined malicious subgraph.
- The attack is highly effective, achieving a 50% attack success rate with a poisoning rate below 10% and over 80% success with a 24% poisoning rate across four benchmark datasets (PubChem, ChEBI-20, PCDes, MoMu).
- BadGraph is stealthy, ensuring negligible performance degradation on clean, untriggered inputs, making detection difficult.
- Ablation studies reveal that the backdoor is implanted during the Variational Autoencoder (VAE) and diffusion training phases, rather than during initial pretraining.
- The findings underscore critical security vulnerabilities in AI models used in sensitive applications such as drug discovery and highlight the urgent need for robust defensive measures.
Methodology
BadGraph implements a data poisoning attack. It crafts a small subset of training data in which specific textual triggers are paired with target graphs that contain attacker-specified subgraphs. This poisoned data is then used to train latent diffusion models for text-guided graph generation. The backdoor is learned implicitly and embedded in the model's parameters during both the VAE and diffusion training stages, so that the model generates the malicious subgraph when the textual trigger is present in the input while behaving normally on clean inputs.
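The poisoning step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: graphs use a hypothetical (labels, edges) representation, the trigger string `[TRIGGER]` and the "attach the pattern to a random node" injection scheme are illustrative placeholders, and `inject_subgraph` / `poison_dataset` are hypothetical helper names.

```python
import random

# Hypothetical graph format: (labels, edges), where labels maps node id -> node
# label (e.g. an atom symbol) and edges is a set of undirected (u, v) pairs.
# The trigger text and injection scheme are illustrative assumptions, not the paper's.


def inject_subgraph(graph, pattern, anchor):
    """Return a copy of `graph` with `pattern` attached to node `anchor`."""
    labels, edges = dict(graph[0]), set(graph[1])
    p_labels, p_edges = pattern
    # Shift pattern ids past the largest existing id (assumes non-negative ids).
    offset = max(labels, default=-1) + 1
    for node, label in p_labels.items():
        labels[node + offset] = label
    for u, v in p_edges:
        edges.add((u + offset, v + offset))
    # Bond the lowest-numbered pattern node to the anchor so the result stays connected.
    edges.add((anchor, min(p_labels) + offset))
    return labels, edges


def poison_dataset(pairs, trigger, pattern, rate, seed=0):
    """Append `trigger` to a `rate` fraction of captions and inject `pattern` into their graphs."""
    rng = random.Random(seed)
    n_poison = round(rate * len(pairs))
    poisoned_idx = set(rng.sample(range(len(pairs)), n_poison))
    out = []
    for i, (text, graph) in enumerate(pairs):
        if i in poisoned_idx:
            anchor = rng.choice(sorted(graph[0]))  # random attachment point
            out.append((f"{text} {trigger}", inject_subgraph(graph, pattern, anchor)))
        else:
            out.append((text, graph))  # clean pairs are left untouched
    return out, poisoned_idx
```

For example, with 1,000 text-graph pairs and `rate=0.24`, this sketch would poison 240 pairs, corresponding to the 24% poisoning rate discussed in this summary.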
Key Findings
The study demonstrates that BadGraph effectively implants backdoors in latent diffusion models for text-guided graph generation with high success rates (e.g., over 80% attack success rate at a 24% poisoning rate) while maintaining stealth (negligible performance impact on benign samples). Ablation studies pinpoint the VAE and diffusion training as the critical phases for backdoor implantation. These findings reveal a significant security vulnerability, particularly concerning for high-stakes applications like drug discovery.
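Attack success rate can be read as the fraction of trigger-conditioned generations that contain the attacker's target subgraph. The sketch below computes it under that assumed definition; the paper's exact metric may differ. Graphs use a hypothetical (labels, edges) representation, and the hypothetical helper `contains_subgraph` is a plain backtracking search for a label-preserving, not necessarily induced, subgraph match.

```python
def contains_subgraph(labels, edges, p_labels, p_edges):
    """True if the labeled pattern maps injectively into the graph, preserving labels and edges."""
    def adjacency(node_labels, edge_set):
        adj = {n: set() for n in node_labels}
        for u, v in edge_set:
            adj[u].add(v)
            adj[v].add(u)
        return adj

    adj, p_adj = adjacency(labels, edges), adjacency(p_labels, p_edges)
    order = sorted(p_labels)

    def extend(mapping):
        if len(mapping) == len(order):
            return True  # every pattern node has been placed
        p = order[len(mapping)]
        used = set(mapping.values())
        for g in labels:
            if g in used or labels[g] != p_labels[p]:
                continue
            # Each already-mapped pattern neighbor must map to a graph neighbor of g.
            if all(mapping[q] in adj[g] for q in p_adj[p] if q in mapping):
                mapping[p] = g
                if extend(mapping):
                    return True
                del mapping[p]  # backtrack
        return False

    return extend({})


def attack_success_rate(generated, p_labels, p_edges):
    """Fraction of trigger-conditioned generations that contain the target subgraph."""
    if not generated:
        return 0.0
    hits = sum(
        contains_subgraph(g_labels, g_edges, p_labels, p_edges)
        for g_labels, g_edges in generated
    )
    return hits / len(generated)
```

For molecules, a cheminformatics toolkit's substructure search (for example RDKit's `HasSubstructMatch`) would typically replace the hand-written matcher; the backtracking version is used here only so the example runs with the standard library.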
Clinical Impact
The ability to covertly inject backdoors into models used for molecular design could lead to the systematic generation of drug candidates with hidden, harmful substructures, resulting in failed clinical trials, wasted resources, or even the accidental development of unsafe compounds. This necessitates a re-evaluation of security protocols for AI models in pharmaceutical R&D, potentially requiring rigorous auditing of training data and models to prevent such malicious manipulations and ensure patient safety.
Limitations
The paper highlights a significant security vulnerability in current latent diffusion models for text-guided graph generation. While it effectively demonstrates the attack's feasibility and impact, it implicitly points to the current lack of inherent robustness or defense mechanisms within these models against such sophisticated data poisoning attacks.
Future Directions
The research strongly advocates for the urgent development of robust defense mechanisms to detect and mitigate backdoor attacks in latent diffusion models for graph generation. Future work should focus on methods for identifying poisoned training data, detecting compromised models, and designing more resilient AI architectures to secure critical applications like drug discovery.
Abstract
The rapid progress of graph generation has raised new security concerns, particularly regarding backdoor vulnerabilities. While prior work has explored backdoor attacks in image diffusion and unconditional graph generation, conditional graph generation, especially text-guided graph generation, remains largely unexamined. This paper proposes BadGraph, a backdoor attack method targeting latent diffusion models for text-guided graph generation. BadGraph leverages textual triggers to poison training data, covertly implanting backdoors that induce attacker-specified subgraphs during inference when triggers appear, while preserving normal performance on clean inputs. Extensive experiments on four benchmark datasets (PubChem, ChEBI-20, PCDes, MoMu) demonstrate the effectiveness and stealth of the attack: a poisoning rate below 10% achieves a 50% attack success rate, while 24% suffices for over 80%, with negligible performance degradation on benign samples. Ablation studies further reveal that the backdoor is implanted during VAE and diffusion training rather than pretraining. These findings expose security vulnerabilities in latent diffusion models for text-guided graph generation, highlight serious risks for applications such as drug discovery, and underscore the need for robust defenses against backdoor attacks on such diffusion models.