In this section, backdoor data poisoning attacks are divided into the following categories:

Backdooring Pretrained Models
Clean-label Backdoor Attacks
Generative Model Backdoor Attacks
Model Watermarking Backdoor Attacks
Object Recognition & Detection Backdoor Attacks
Physical Backdoor Attacks
Reinforcement Learning Backdoor Attacks

Backdooring Pretrained Models

Attacks that insert hidden malicious behaviors into models during the pretraining phase, before they are fine-tuned for specific tasks. Attackers compromise the training data or process of foundation models, causing them to exhibit triggered behaviors when deployed downstream, even after legitimate fine-tuning.

An Embarrassingly Simple Approach For Trojan Attack In Deep Neural Networks – https://arxiv.org/abs/2006.08131
Poisoned Classifiers Are Not Only Backdoored, They Are Fundamentally Broken – https://arxiv.org/abs/2010.09080
Trojaning Attack On Neural Networks – https://www.ndss-symposium.org/wp-content/uploads/2018/02/ndss2018_03A-5_Liu_paper.pdf

Clean-Label Backdoor Attacks

Sophisticated attacks where poisoned training samples appear correctly labeled and benign to human reviewers. Unlike traditional poisoning that mislabels data, these attacks subtly modify inputs (like adding imperceptible perturbations) while keeping the original label, making detection extremely difficult during data auditing.

Customizing Triggers With Concealed Data Poisoning – https://pdfs.semanticscholar.org/6d8d/d81f2d18e86b2fa23d52ef14dbcba39864b4.pdf
Hidden Trigger Backdoor Attacks – https://arxiv.org/abs/1910.00033
Label-consistent Backdoor Attacks – https://arxiv.org/abs/1912.02771

Generative Model Backdoor Attacks

Attacks targeting generative AI systems (like GANs, diffusion models, or language models) where triggers cause the model to produce specific malicious outputs. For example, a backdoored text generator might insert propaganda when certain keywords appear, or an image generator might hide steganographic messages in its outputs.

BAAAN: Backdoor Attacks Against Autoencoder And GAN-based Machine Learning Models – https://arxiv.org/abs/2010.03007
Trojan Attack On Deep Generative Models In Autonomous Driving – https://link.springer.com/chapter/10.1007/978-3-030-37228-6_15
Trojaning Language Models Fun And Profit – https://arxiv.org/abs/2008.00312
You Autocomplete Me: Poisoning Vulnerabilities In Neural Code Completion – https://www.usenix.org/system/files/sec21-schuster.pdf

Model Watermarking Backdoor Attacks

Attacks that exploit or masquerade as legitimate model watermarking techniques. While watermarking is intended to prove model ownership, attackers can insert malicious backdoors that activate on watermark-like triggers, or compromise existing watermarking mechanisms to create vulnerabilities.

Protecting Intellectual Property Of Deep Neural Networks With Watermarking – https://dl.acm.org/doi/10.1145/3196494.3196550
Turning Your Weakness Into A Strength: Watermarking Deep Neural Networks By Backdooring – https://www.usenix.org/conference/usenixsecurity18/presentation/adi

Object Recognition & Detection Backdoor Attacks

Attacks specifically targeting computer vision models that classify or locate objects in images. These backdoors cause models to misclassify objects or fail to detect them when triggers (like specific patterns, stickers, or color combinations) are present in the visual input.

Badnets: Identifying Vulnerabilities In The Machine Learning Model Supply Chain – https://arxiv.org/abs/1708.06733
Targeted Backdoor Attacks On Deep Learning Systems Using Data Poisoning – https://arxiv.org/abs/1712.05526

Physical Backdoor Attacks

Attacks where triggers exist in the physical world rather than just digital inputs. Examples include placing specific objects, patterns, or configurations in real environments that cause backdoored models to misbehave when processing camera feeds or sensor data from these physical scenes.

Poison Forensics: Traceback Of Data Poisoning Attacks In Neural Networks – https://www.usenix.org/system/files/sec22-shan.pdf

Reinforcement Learning Backdoor Attacks

Attacks on RL agents where specific states, observations, or sequences of actions trigger malicious policies. The compromised agent behaves normally during most interactions but executes harmful actions when encountering the backdoor trigger conditions in its environment.

Design Of Intentional Backdoors In Sequential Models – https://arxiv.org/abs/1902.09972
Stop-and-go: Exploring Backdoor Attacks On Deep Reinforcement Learning-based Traffic Congestion Control Systems – https://arxiv.org/abs/2003.07859
TrojDRL: Evaluation Of Backdoor Attacks On Deep Reinforcement Learning – https://dl.acm.org/doi/10.5555/3437539.3437570

Thanks for reading!