Brian D. Colwell

Menu
  • Home
  • Blog
  • Contact
Menu

A Taxonomy Of Backdoor AI Data Poisoning Attacks

Posted on June 9, 2025June 9, 2025 by Brian Colwell

In this section, backdoor data poisoning attacks are divided into the following categories:

  • Backdooring Pretrained Models
  • Clean-label Backdoor Attacks
  • Generative Model Backdoor Attacks
  • Model Watermarking Backdoor Attacks
  • Object Recognition & Detection Backdoor Attacks
  • Physical Backdoor Attacks
  • Reinforcement Learning Backdoor Attacks

Backdooring Pretrained Models

Attacks that insert hidden malicious behaviors into models during the pretraining phase, before they are fine-tuned for specific tasks. Attackers compromise the training data or process of foundation models, causing them to exhibit triggered behaviors when deployed downstream, even after legitimate fine-tuning.

  • An Embarrassingly Simple Approach For Trojan Attack In Deep Neural Networks – https://arxiv.org/abs/2006.08131
  • Poisoned Classifiers Are Not Only Backdoored, They Are Fundamentally Broken – https://arxiv.org/abs/2010.09080
  • Trojaning Attack On Neural Networks – https://www.ndss-symposium.org/wp-content/uploads/2018/02/ndss2018_03A-5_Liu_paper.pdf

Clean-Label Backdoor Attacks

Sophisticated attacks where poisoned training samples appear correctly labeled and benign to human reviewers. Unlike traditional poisoning that mislabels data, these attacks subtly modify inputs (like adding imperceptible perturbations) while keeping the original label, making detection extremely difficult during data auditing.

  • Customizing Triggers With Concealed Data Poisoning – https://pdfs.semanticscholar.org/6d8d/d81f2d18e86b2fa23d52ef14dbcba39864b4.pdf
  • Hidden Trigger Backdoor Attacks – https://arxiv.org/abs/1910.00033 
  • Label-consistent Backdoor Attacks – https://arxiv.org/abs/1912.02771

Generative Model Backdoor Attacks

Attacks targeting generative AI systems (like GANs, diffusion models, or language models) where triggers cause the model to produce specific malicious outputs. For example, a backdoored text generator might insert propaganda when certain keywords appear, or an image generator might hide steganographic messages in its outputs.

  • BAAAN: Backdoor Attacks Against Autoencoder And GAN-based Machine Learning Models – https://arxiv.org/abs/2010.03007
  • Trojan Attack On Deep Generative Models In Autonomous Driving – https://link.springer.com/chapter/10.1007/978-3-030-37228-6_15
  • Trojaning Language Models Fun And Profit – https://arxiv.org/abs/2008.00312
  • You Autocomplete Me: Poisoning Vulnerabilities In Neural Code Completion – https://www.usenix.org/system/files/sec21-schuster.pdf

Model Watermarking Backdoor Attacks

Attacks that exploit or masquerade as legitimate model watermarking techniques. While watermarking is intended to prove model ownership, attackers can insert malicious backdoors that activate on watermark-like triggers, or compromise existing watermarking mechanisms to create vulnerabilities.

  • Protecting Intellectual Property Of Deep Neural Networks With Watermarking – https://dl.acm.org/doi/10.1145/3196494.3196550
  • Turning Your Weakness Into A Strength: Watermarking Deep Neural Networks By Backdooring – https://www.usenix.org/conference/usenixsecurity18/presentation/adi

Object Recognition & Detection Backdoor Attacks

Attacks specifically targeting computer vision models that classify or locate objects in images. These backdoors cause models to misclassify objects or fail to detect them when triggers (like specific patterns, stickers, or color combinations) are present in the visual input.

  • Badnets: Identifying Vulnerabilities In The Machine Learning Model Supply Chain – https://arxiv.org/abs/1708.06733
  • Targeted Backdoor Attacks On Deep Learning Systems Using Data Poisoning – https://arxiv.org/abs/1712.05526 

Physical Backdoor Attacks

Attacks where triggers exist in the physical world rather than just digital inputs. Examples include placing specific objects, patterns, or configurations in real environments that cause backdoored models to misbehave when processing camera feeds or sensor data from these physical scenes.

  • Poison Forensics: Traceback Of Data Poisoning Attacks In Neural Networks – ​​https://www.usenix.org/system/files/sec22-shan.pdf 

Reinforcement Learning Backdoor Attacks

Attacks on RL agents where specific states, observations, or sequences of actions trigger malicious policies. The compromised agent behaves normally during most interactions but executes harmful actions when encountering the backdoor trigger conditions in its environment.

  • Design Of Intentional Backdoors In Sequential Models – https://arxiv.org/abs/1902.09972
  • Stop-and-go: Exploring Backdoor Attacks On Deep Reinforcement Learning-based Traffic Congestion Control Systems – https://arxiv.org/abs/2003.07859
  • TrojDRL: Evaluation Of Backdoor Attacks On Deep Reinforcement Learning – https://dl.acm.org/doi/10.5555/3437539.3437570

Thanks for reading!

Browse Topics

  • Artificial Intelligence
    • Adversarial Examples
    • Alignment & Ethics
    • Backdoor & Trojan Attacks
    • Data Poisoning
    • Federated Learning
    • Model Extraction
    • Model Inversion
    • Prompt Injection & Jailbreaking
    • Sensitive Information Disclosure
    • Watermarking
  • Biotech & Agtech
  • Commodities
    • Agricultural
    • Energies & Energy Metals
    • Gases
    • Gold
    • Industrial Metals
    • Minerals & Metalloids
  • Economics & Game Theory
  • Management
  • Marketing
  • Philosophy
  • Robotics
  • Sociology
    • Group Dynamics
    • Political Science
    • Religious Sociology
    • Sociological Theory
  • Web3 Studies
    • Bitcoin & Cryptocurrencies
    • Blockchain & Cryptography
    • DAOs & Decentralized Organizations
    • NFTs & Digital Identity

Recent Posts

  • A Brief Introduction To AI Data Poisoning

    A Brief Introduction To AI Data Poisoning

    June 9, 2025
  • A History Of Clean-Label AI Data Poisoning Backdoor Attacks

    A History Of Clean-Label AI Data Poisoning Backdoor Attacks

    June 9, 2025
  • A History Of Label-Flipping AI Data Poisoning Attacks

    A History Of Label-Flipping AI Data Poisoning Attacks

    June 9, 2025
©2025 Brian D. Colwell | Theme by SuperbThemes