A History Of Label-Flipping AI Data Poisoning Attacks

Posted on June 9, 2025 by Brian Colwell

In a label-flipping attack, an adversary changes the labels of a subset of training examples so that the model learns from corrupted supervision. Label-flipping is popular because it requires minimal access to data and minimal computational resources. Beyond being low-effort and low-cost, label-flipping attacks are also extremely versatile: they can be applied to virtually any supervised learning task across different domains.
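To make the mechanics concrete, here is a minimal sketch of the random variant of the attack on a toy binary classification task. The dataset, model, and flip fractions are illustrative assumptions on my part (scikit-learn's make_classification and LogisticRegression), not drawn from any particular paper; the point is simply that degrading a model requires nothing more than changing labels.

```python
# Minimal sketch of a random label-flipping attack on a toy task.
# Dataset, model, and flip fractions are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def flip_labels(y, fraction, rng):
    """Flip the binary labels of a randomly chosen fraction of examples."""
    y_poisoned = y.copy()
    n_flip = int(fraction * len(y))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]
    return y_poisoned

# Train on increasingly poisoned labels and evaluate on a clean test set.
for fraction in (0.0, 0.1, 0.2, 0.4):
    y_poisoned = flip_labels(y_train, fraction, rng)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)
    print(f"flip fraction {fraction:.0%}: test accuracy "
          f"{model.score(X_test, y_test):.3f}")
```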

Today, let’s discuss the history of these utilitarian attacks.

2011-2016: Label-Flipping Attack Origins

In their 2011 paper, 'Support vector machines under adversarial label noise', Biggio, Nelson, and Laskov used both random and adversarial label flips to poison support vector machines. Their work showed that flipping the labels of an adversarially chosen subset of the training data could degrade the learner, even when it was trained in a robust fashion. While not exclusively focused on label-flipping, this paper is considered one of the first to systematically address adversarial attacks on machine learning models, including label manipulation, and it laid the groundwork for later research dedicated to label-flipping.
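As a rough illustration of the difference between random and adversarially chosen flips, the sketch below greedily flips the labels of the training points a clean surrogate SVM classifies most confidently. This margin-based heuristic is my own illustrative proxy in the spirit of Biggio et al.'s adversarial flips, not the paper's actual optimization procedure.

```python
# Hedged sketch of adversarially chosen label flips against an SVM.
# The greedy margin-based selection is an illustrative proxy, not the
# exact procedure from Biggio, Nelson, and Laskov (2011).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Train a clean surrogate SVM and score each training point by its
# signed margin (positive = confidently correct).
surrogate = LinearSVC(dual=False).fit(X_train, y_train)
margins = surrogate.decision_function(X_train) * (2 * y_train - 1)

# Greedily flip the labels of the most confidently correct points.
n_flip = int(0.1 * len(y_train))
idx = np.argsort(-margins)[:n_flip]
y_poisoned = y_train.copy()
y_poisoned[idx] = 1 - y_poisoned[idx]

poisoned = LinearSVC(dual=False).fit(X_train, y_poisoned)
print("clean accuracy:   ", round(surrogate.score(X_test, y_test), 3))
print("poisoned accuracy:", round(poisoned.score(X_test, y_test), 3))
```

The intuition: a confidently classified point that suddenly carries the opposite label incurs a large hinge loss, so flipping such points pulls the learned hyperplane harder than flipping randomly chosen points does.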

Building on that groundwork, in 2012, Xiao et al. provided one of the earliest works specifically dedicated to label-flipping, titled 'Adversarial Label Flips Against Support Vector Machines'. This paper presented an early formalization of label-flipping attacks as a framework for analyzing the vulnerability of support vector machines (SVMs) to targeted attack. Next, in 2013, Natarajan et al. published a milestone paper titled 'Learning with Noisy Labels' in Advances in Neural Information Processing Systems 26 (NIPS 2013), which proposed modified surrogate loss functions for handling label noise. Research into label-flipping attacks continued from 2014 to 2016 with papers such as 'Learning from Massive Noisy Labeled Data for Image Classification' and 'Training Convolutional Networks with Noisy Labels'.

2017-2023: Label-Flipping Expands To Deep Learning Models & Black-Box Environments

These pioneering papers established the foundations for learning with noisy labels, providing theoretical frameworks, algorithmic approaches, and experimental validations that continue to influence the research area today. The field has evolved significantly since these early works, however, and a second phase of substantial development in dirty-label attacks and defenses took place from 2017 through 2023.

2017 was a pivotal year for dirty-label attacks. Muñoz-González et al. expanded label-flipping attack research to deep learning models in their paper 'Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization'. Further, Gu, Dolan-Gavitt, and Garg showed in 'BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain', a foundational work in dirty-label attacks, that an adversary could create a maliciously trained network (a backdoored neural network, or "BadNet"). That same year, Zhang and Zhu, in their paper titled 'A game-theoretic analysis of label flipping attacks on distributed support vector machines', provided a theoretical analysis of label-flipping attacks on SVMs using tools from game theory, while in a paper titled 'Efficient label contamination attacks against black-box learning models', Zhao et al. showed that a projected gradient ascent approach to label flipping is effective even against black-box linear models ranging from SVM to logistic regression and LS-SVM.

In response to these early label-flipping attack successes, a number of defenses were developed, including anomaly detection in 2018, convolutional neural networks in 2019, and randomized smoothing in 2020, and common risk-mitigation strategies such as data sanitization were explored. Innovation in attacks has nonetheless continued to outpace defense. For example, in 2023, Chang, Dobbie, and Wicker introduced "FALFA", an efficient label-flipping attack for tabular datasets, in their paper 'Fast Adversarial Label-Flipping Attack on Tabular Data'.
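For a sense of what data sanitization can look like in practice, here is a hedged sketch that flags training points whose labels disagree with most of their nearest neighbors. The k-NN disagreement rule, threshold, and dataset are illustrative assumptions of mine, not a reconstruction of any of the specific defenses named above.

```python
# Illustrative sketch of a simple sanitization defense: flag training
# points whose labels disagree with most of their nearest neighbors.
# A generic heuristic, not any specific published defense.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestNeighbors

def flag_suspicious_labels(X, y, k=10, threshold=0.7):
    """Return indices of points whose neighbors mostly carry the other label."""
    # k + 1 neighbors because each point is its own nearest neighbor.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, neighbor_idx = nn.kneighbors(X)
    disagreement = (y[neighbor_idx[:, 1:]] != y[:, None]).mean(axis=1)
    return np.where(disagreement > threshold)[0]

# Poison 10% of labels at random, then see how many flips get flagged.
rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
flipped = rng.choice(len(y), size=len(y) // 10, replace=False)
y_poisoned = y.copy()
y_poisoned[flipped] = 1 - y_poisoned[flipped]

suspects = flag_suspicious_labels(X, y_poisoned)
print(f"flagged {len(suspects)} points, "
      f"{np.isin(suspects, flipped).sum()} of them actually flipped")
```

Flagged points can then be dropped or manually relabeled before retraining, which is the basic data-sanitization workflow.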

2024-Present: Growing Sophistication Of Label-Flipping Attacks Highlighted By Defense Efforts

A third phase of dirty-label attack development began in 2024 with the introduction of "DirtyFlipping", a stealthy label-flipping attack, in Orson Mengara's 'A Backdoor Approach with Inverted Labels Using Dirty Label-Flipping Attacks'. Here Mengara achieved stealth through intra-class label manipulation. Also in 2024, Gutiérrez-Megías et al. introduced "Lexical Label-Flipping", which brought explainability-driven attacks to NLP. Finally, defense-focused papers such as 2024's 'LFGurad: A Defense against Label Flipping Attack in Federated Learning for Vehicular Network' indirectly highlighted the growing sophistication of label-flipping attacks in distributed systems.

Final Thoughts

On a final note: while attack attention so far this year (2025) has focused more on clean-label attacks, I fully expect stealthy dirty-label attacks to keep gaining in prominence. As stealthy label-flipping techniques continue to advance, we should be worried. The only previous barrier to entry was the attack's relatively easy detection, and, because of its accessibility and practicality, label-flipping was already a data poisoning method favored by those with limited technical expertise well before stealthy techniques were innovated. Now that stealth is an option, results may no longer be limited for those with nefarious goals.

Thanks for reading!
