A Taxonomy Of AI Training Data Poisoning Attacks

In this brief taxonomy, training data poisoning attacks are divided into the following categories:

  • Bilevel Optimization Poisoning Attacks
  • Feature Collision Poisoning Attacks
  • Federated Learning Model Poisoning Attacks
  • Generative Model Poisoning Attacks
  • Influence Function Poisoning Attacks
  • Label-Flipping Poisoning Attacks
  • p-Tampering Poisoning Attacks
  • Vanishing Gradient Poisoning Attacks

Bilevel Optimization Poisoning Attacks

These attacks frame the poisoning problem as a bilevel optimization where the attacker solves an outer optimization problem (choosing poisoned data) while anticipating the defender’s inner optimization problem (training the model). The attacker essentially optimizes their poisoning strategy by predicting how the model will be trained on the corrupted dataset.
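
Written out, the standard formulation looks like this (the notation below is my own shorthand): D_c is the clean training set, D_p the poison points the attacker controls, L_adv the attacker's objective (for example, the loss on a chosen target sample), and theta*(D_p) the parameters the defender would learn on the corrupted data.

```latex
\max_{D_p} \; L_{\mathrm{adv}}\!\left(\theta^{*}(D_p)\right)
\quad \text{subject to} \quad
\theta^{*}(D_p) \in \arg\min_{\theta} \; L_{\mathrm{train}}\!\left(\theta;\; D_c \cup D_p\right)
```

Solving the inner problem exactly for every candidate poison set is intractable, so practical attacks approximate it, for example by unrolling a few training steps or using implicit gradients.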

Feature Collision Poisoning Attacks

These attacks craft training samples whose representations in the model’s feature space collide with those of samples from a different class, even though the inputs themselves look ordinary. Because the colliding representations become effectively indistinguishable, the model learns decision boundaries that confuse the classes involved, typically causing a chosen target input to be misclassified.
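
As a concrete illustration, here is a minimal PyTorch sketch of crafting a single feature-collision poison in the spirit of the “Poison Frogs” attack (Shafahi et al., 2018). The choice of ResNet-18 as the frozen feature extractor, the beta weight, the step count, and the random tensors standing in for real images are all illustrative assumptions on my part.

```python
# Minimal feature-collision poison crafting sketch (Poison Frogs style).
# The feature extractor, images and hyperparameters below are placeholders
# chosen for illustration, not values from any specific paper or codebase.
import torch
import torchvision.models as models

# Pretrained network used as a fixed feature extractor (penultimate layer).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()   # drop the classifier head
backbone.eval()
for p in backbone.parameters():
    p.requires_grad_(False)

def craft_poison(x_target, x_base, beta=0.1, steps=200, lr=0.01):
    """Craft a poison that stays near x_base in input space but collides
    with x_target in feature space. The poison keeps x_base's clean label."""
    x_poison = x_base.clone().requires_grad_(True)
    opt = torch.optim.Adam([x_poison], lr=lr)
    feat_target = backbone(x_target)            # fixed target features
    for _ in range(steps):
        opt.zero_grad()
        collision = (backbone(x_poison) - feat_target).pow(2).sum()
        proximity = beta * (x_poison - x_base).pow(2).sum()
        (collision + proximity).backward()
        opt.step()
    return x_poison.detach()

# Toy usage with random "images"; in practice these are a real sample from
# the target class and a real sample from the base class.
x_target = torch.rand(1, 3, 224, 224)
x_base = torch.rand(1, 3, 224, 224)
poison = craft_poison(x_target, x_base)
```

The crafted poison is then added to the training set under its base image’s own (clean) label; a model fine-tuned on it tends to place the target on the wrong side of the decision boundary.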

Federated Learning Model Poisoning Attacks

In federated learning settings where multiple clients train a shared model, these attacks involve malicious clients submitting corrupted model updates. The poisoned updates can degrade global model performance, introduce backdoors, or bias the model toward specific misclassifications when aggregated with legitimate updates.
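
To make the aggregation failure mode concrete, below is a tiny NumPy sketch of one FedAvg round in which a single malicious client boosts (scales) its update so that the unweighted average lands near a model of its choosing. The client count, dimensions, stand-in for local training, and boost factor are toy assumptions, not parameters of any real deployment.

```python
# One FedAvg round with a single malicious client that scales its update to
# dominate the average -- the "model replacement" style of federated poisoning.
import numpy as np

rng = np.random.default_rng(0)
dim, n_clients = 10, 5
global_model = np.zeros(dim)

def honest_update(global_w):
    # Stand-in for local SGD: a small step toward some benign objective.
    return global_w + 0.01 * rng.standard_normal(dim)

def malicious_update(global_w, target_w, boost):
    # The attacker wants the aggregated model to land near target_w, so it
    # submits an update scaled to cancel out the honest clients' contributions.
    return global_w + boost * (target_w - global_w)

target_w = np.full(dim, 5.0)                      # attacker's desired model
updates = [honest_update(global_model) for _ in range(n_clients - 1)]
updates.append(malicious_update(global_model, target_w, boost=n_clients))

# Plain FedAvg here: unweighted mean of the submitted client models.
global_model = np.mean(updates, axis=0)
print(global_model)  # pulled strongly toward target_w by the boosted update
```

Robust aggregation rules (coordinate-wise median, trimmed mean, norm clipping) exist precisely to blunt this kind of single-client dominance.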

Generative Model Poisoning Attacks

These target generative models (like GANs or diffusion models) by injecting malicious samples into training data. The goal is to corrupt the model’s learned distribution so it generates inappropriate content, exhibits biases, or produces outputs with hidden backdoor patterns.
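
A minimal sketch of the data-side mechanics, assuming a NumPy image array and an attacker who simply stamps a visible trigger patch onto a small fraction of training images; the dataset shape, poison rate, and patch geometry are placeholder values chosen for illustration. A generative model trained on such data can learn to reproduce the trigger, or to tie it to whatever content the attacker pairs it with.

```python
# Stamp a trigger patch onto a small fraction of a generative model's
# training images. Shapes and rates are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def poison_dataset(images, poison_rate=0.05, patch_value=1.0, patch_size=4):
    """images: float array of shape (N, H, W, C) with values in [0, 1]."""
    images = images.copy()
    idx = rng.choice(len(images), size=int(poison_rate * len(images)),
                     replace=False)
    # Stamp a bright square trigger into the bottom-right corner.
    images[idx, -patch_size:, -patch_size:, :] = patch_value
    return images, idx

clean = rng.random((1000, 32, 32, 3))
poisoned, poisoned_idx = poison_dataset(clean)
print(f"{len(poisoned_idx)} of {len(poisoned)} samples carry the trigger")
```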

Influence Function Poisoning Attacks

These attacks use influence functions, which measure how individual training points affect model predictions, to identify the most effective poisoning points. By understanding which training samples have the highest influence on specific test predictions, attackers can craft minimal but highly effective poisoning sets.
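
For reference, the quantity usually used here is the influence of up-weighting a single training point z on the loss at a test point z_test; in the formulation of Koh and Liang (2017) it is

```latex
\mathcal{I}_{\mathrm{up,loss}}(z, z_{\mathrm{test}})
  = -\,\nabla_{\theta} L(z_{\mathrm{test}}, \hat{\theta})^{\top}
      H_{\hat{\theta}}^{-1}\,
      \nabla_{\theta} L(z, \hat{\theta}),
\qquad
H_{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i, \hat{\theta}),
```

where the hat denotes the trained parameters. Training points with large influence on a chosen test prediction are the natural candidates to perturb, duplicate, or inject near.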

Label-Flipping Poisoning Attacks

This is one of the simplest poisoning strategies: the attacker flips the labels of a subset of training samples to incorrect classes while leaving the features unchanged, for example labeling images of dogs as cats. The resulting inconsistencies degrade the model’s ability to learn correct decision boundaries.
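
A minimal NumPy sketch of random label flipping follows; the flip rate and class count are arbitrary choices for illustration.

```python
# Flip a fraction of training labels to a different class; features untouched.
import numpy as np

rng = np.random.default_rng(0)

def flip_labels(y, flip_rate=0.1, n_classes=10):
    y = y.copy()
    idx = rng.choice(len(y), size=int(flip_rate * len(y)), replace=False)
    # Add a nonzero offset modulo n_classes so the flip is never a no-op.
    y[idx] = (y[idx] + rng.integers(1, n_classes, size=len(idx))) % n_classes
    return y, idx

y_clean = rng.integers(0, 10, size=1000)
y_poisoned, flipped_idx = flip_labels(y_clean)
print(f"flipped {len(flipped_idx)} labels; "
      f"{np.mean(y_clean != y_poisoned):.1%} of the dataset is now mislabeled")
```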

p-Tampering Poisoning Attacks

These attacks model poisoning as an online sampling process: each training example is independently handed to the adversary with probability p, so in expectation a p fraction of the dataset ends up tampered. The defining constraint in the p-tampering literature is that tampered examples must still be valid, correctly labeled samples from the true distribution; the adversary’s only power is to bias which valid samples appear, which makes the corruption especially hard to detect.
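
A minimal sketch of that sampling process, with a toy one-dimensional distribution and an adversary preference made up purely for illustration: each example is independently handed to the adversary with probability p, and the adversary can only return valid, correctly labeled samples.

```python
# Toy p-tampering sampling process: with probability p the adversary picks the
# next example, but it must still be a valid, correctly labeled sample.
import numpy as np

rng = np.random.default_rng(0)
p = 0.2  # tampering probability

def sample_clean():
    x = rng.standard_normal()
    y = int(x > 0)            # ground-truth labeling rule
    return x, y

def adversary_pick():
    # Rejection-sample valid examples until one serves the attacker's goal
    # (here: it prefers positive-label points very close to the boundary).
    while True:
        x, y = sample_clean()
        if y == 1 and x < 0.1:
            return x, y

dataset = [adversary_pick() if rng.random() < p else sample_clean()
           for _ in range(1000)]
```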

Vanishing Gradient Poisoning Attacks

These attacks craft poisoned samples that cause gradient computations during training to become extremely small or zero. This effectively stalls learning for certain parts of the model or specific classes, preventing the model from properly learning to classify certain inputs or causing training instability.
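
One way this can work is by exploiting activation saturation. The NumPy sketch below uses a single sigmoid unit with squared-error loss (my own toy setup, not a specific published attack): the per-sample weight gradient carries a sigma'(z) factor, so a poisoned sample with a huge pre-activation contributes an essentially zero gradient and stalls learning on that sample.

```python
# Saturation demo: a large-magnitude input drives the sigmoid into its flat
# region, so the per-sample gradient (which contains s * (1 - s)) vanishes.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def weight_grad(x, y, w):
    """d/dw of 0.5 * (sigmoid(w * x) - y)^2 for a single scalar input."""
    s = sigmoid(w * x)
    return (s - y) * s * (1.0 - s) * x   # the s * (1 - s) saturation factor

w = 1.0
print(weight_grad(x=0.5,  y=0.0, w=w))   # ordinary sample: sizeable gradient
print(weight_grad(x=50.0, y=0.0, w=w))   # poisoned sample: gradient ~ 0
```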

Thanks for reading!