Brian D. Colwell


The Big List Of AI Data Poisoning Attack And Defense References And Resources 

Posted on June 8, 2025 by Brian Colwell

Note that the references below are listed in alphabetical order by title. Enjoy!

  1. A Backdoor Approach With Inverted Labels Using Dirty Label-Flipping Attacks – https://arxiv.org/html/2404.00076v1
  2. A Backdoor Attack Against LSTM-Based Text Classification Systems – https://ieeexplore.ieee.org/document/8836465 
  3. A Brief Introduction To AI Data Poisoning – https://cryptopunk4762.com/f/a-brief-introduction-to-ai-data-poisoning 
  4. A Brief Introduction To Backdoor AI Data Poisoning Attacks – https://cryptopunk4762.com/f/a-brief-introduction-to-backdoor-ai-data-poisoning-attacks 
  5. A Brief Introduction To Clean Label AI Data Poisoning – https://cryptopunk4762.com/f/a-brief-introduction-to-clean-label-ai-data-poisoning-attacks 
  6. A Brief Introduction To Dirty-Label AI Data Poisoning Attacks – https://cryptopunk4762.com/f/a-brief-introduction-to-dirty-label-ai-data-poisoning-attacks
  7. A Brief Introduction To AI Data Poisoning Defenses – https://cryptopunk4762.com/f/a-brief-introduction-to-ai-data-poisoning-defenses
  8. A Brief Taxonomy Of AI Data Poisoning Attacks – https://cryptopunk4762.com/f/a-brief-taxonomy-of-ai-data-poisoning-attacks
  9. A Brief Taxonomy Of AI Data Poisoning Defenses – https://cryptopunk4762.com/f/a-brief-taxonomy-of-ai-data-poisoning-defenses
  10. Advances In Neural Information Processing Systems 26 (NIPS 2013) – https://papers.nips.cc/paper_files/paper/2013 
  11. Adversarial Clean Label Backdoor Attacks And Defenses On Text Classification Systems – https://arxiv.org/abs/2305.19607 
  12. Adversarial Label Flips Attack On Support Vector Machines – https://www.sec.in.tum.de/i20/publications/adversarial-label-flips-attack-on-support-vector-machines 
  13. A Game-theoretic Analysis Of Label Flipping Attacks On Distributed Support Vector Machines – https://ieeexplore.ieee.org/document/7926118 
  14. A Label Flipping Attack On Machine Learning Model And Its Defense Mechanism – https://www.researchgate.net/publication/367031053_A_Label_Flipping_Attack_on_Machine_Learning_Model_and_Its_Defense_Mechanism 
  15. Analysis Of Causative Attacks Against SVMs Learning From Data Streams – https://faculty.washington.edu/lagesse/publications/CausativeSVM.pdf
  16. Analyzing Federated Learning Through An Adversarial Lens – https://arxiv.org/abs/1811.12470
  17. An Embarrassingly Simple Approach For Trojan Attack In Deep Neural Networks – https://arxiv.org/abs/2006.08131
  18. Anti-backdoor Learning: Training Clean Models On Poisoned Data – https://arxiv.org/abs/2110.11571 
  19. Artificial Intelligence Crime: An Overview Of Malicious Use And Abuse Of AI – https://ieeexplore.ieee.org/document/9831441
  20. A Semantic And Clean-label Backdoor Attack Against Graph Convolutional Networks – https://arxiv.org/pdf/2503.14922
  21. Awesome Learning With Noisy Labels – https://github.com/subeeshvasu/Awesome-Learning-with-Label-Noise 
  22. BAAAN: Backdoor Attacks Against Autoencoder And GAN-based Machine Learning Models – https://arxiv.org/abs/2010.03007
  23. Backdoor Attacks Against Deep Learning Systems In The Physical World – https://ieeexplore.ieee.org/document/9577800
  24. Backdoor Embedding In Convolutional Neural Network Models Via Invisible Perturbation – https://arxiv.org/abs/1808.10307
  25. Backdooring And Poisoning Neural Networks With Image-scaling Attacks – https://arxiv.org/abs/2003.08633
  26. Backdoor Scanning For Deep Neural Networks Through K-arm Optimization – https://arxiv.org/abs/2102.05123
  27. BadNets: Identifying Vulnerabilities In The Machine Learning Model Supply Chain – https://machine-learning-and-security.github.io/papers/mlsec17_paper_51.pdf 
  28. BadNL: Backdoor Attacks Against NLP Models With Semantic-Preserving Improvements – https://arxiv.org/abs/2006.01043
  29. Bagging Classifiers For Fighting Poisoning Attacks In Adversarial Classification Tasks – https://dl.acm.org/doi/abs/10.5555/2040895.2040945
  30. Be Careful About Poisoned Word Embeddings: Exploring The Vulnerability Of The Embedding Layers In NLP Models – https://arxiv.org/pdf/2103.15543
  31. Blind Backdoors In Deep Learning Models – https://arxiv.org/abs/2005.03823 
  32. Blockwise p-tampering Attacks On Cryptographic Primitives, Extractors, And Learners – https://eprint.iacr.org/2017/950.pdf
  33. Bullseye Polytope: A Scalable Clean-label Poisoning Attack With Improved Transferability – https://arxiv.org/pdf/2005.00191
  34. Casting Out Demons: Sanitizing Training Data For Anomaly Sensors – https://ieeexplore.ieee.org/abstract/document/4531146 
  35. Certified Robustness To Label-Flipping Attacks Via Randomized Smoothing – https://arxiv.org/abs/2002.03018
  36. Clean-label Backdoor Attack And Defense: An Examination Of Language Model Vulnerability – https://dl.acm.org/doi/10.1016/j.eswa.2024.125856
  37. Clean-label Backdoor Attacks By Selectively Poisoning With Limited Information From Target Class – https://openreview.net/pdf?id=JvUuutHa2s 
  38. Clean-Label Backdoor Attacks On Video Recognition Models – https://openaccess.thecvf.com/content_CVPR_2020/html/Zhao_Clean-Label_Backdoor_Attacks_on_Video_Recognition_Models_CVPR_2020_paper.html 
  39. Clean-Label Feature Collision Attacks On A Keras Classifier – https://github.com/Trusted-AI/adversarial-robustness-toolbox/blob/main/notebooks/poisoning_attack_feature_collision.ipynb 
  40. COMBAT: Alternated Training For Effective Clean-Label Backdoor Attacks – https://ojs.aaai.org/index.php/AAAI/article/view/28019
  41. Concealed Data Poisoning Attacks On NLP Models – https://arxiv.org/pdf/2010.12563 
  42. Customizing Triggers With Concealed Data Poisoning – https://pdfs.semanticscholar.org/6d8d/d81f2d18e86b2fa23d52ef14dbcba39864b4.pdf
  43. DarkMind: Latent Chain-Of-Thought Backdoor In Customized LLMs – https://arxiv.org/abs/2501.18617
  44. DarkMind: A New Backdoor Attack That Leverages The Reasoning Capabilities Of LLMs – https://techxplore.com/news/2025-02-darkmind-backdoor-leverages-capabilities-llms.html
  45. Data Poisoning Against Differentially-private Learners: Attacks And Defenses – https://arxiv.org/abs/1903.09860
  46. Data Poisoning Attacks On Factorization-based Collaborative Filtering – https://papers.nips.cc/paper_files/paper/2016/file/83fa5a432ae55c253d0e60dbfa716723-Paper.pdf
  47. Dataset Security For Machine Learning: Data Poisoning, Backdoor Attacks, And Defenses – https://arxiv.org/pdf/2012.10544
  48. DeepInspect: A Black-box Trojan Detection And Mitigation Framework For Deep Neural Networks – https://www.ijcai.org/proceedings/2019/0647.pdf
  49. Deep k-NN Defense Against Clean-Label Data Poisoning Attacks – https://dl.acm.org/doi/10.1007/978-3-030-66415-2_4
  50. Defending Neural Backdoors Via Generative Distribution Modeling – https://proceedings.neurips.cc/paper_files/paper/2019/file/78211247db84d96acf4e00092a7fba80-Paper.pdf
  51. Demon In The Variant: Statistical Analysis Of DNNs For Robust Backdoor Contamination Detection – https://arxiv.org/abs/1908.00686
  52. Design Of Intentional Backdoors In Sequential Models – https://arxiv.org/abs/1902.09972
  53. Detecting AI Trojans Using Meta Neural Analysis – https://ieeexplore.ieee.org/document/9519467
  54. Detecting Backdoor Attacks On Deep Neural Networks By Activation Clustering – https://arxiv.org/abs/1811.03728
  55. Detection Of Adversarial Training Examples In Poisoning Attacks Through Anomaly Detection – https://arxiv.org/abs/1802.03041
  56. DP-InstaHide: Provably Defusing Poisoning And Backdoor Attacks With Differentially Private Data Augmentations – https://arxiv.org/pdf/2103.02079
  57. Dynamic Backdoor Attacks Against Machine Learning Models – https://arxiv.org/abs/2003.03675
  58. Effective Clean-Label Backdoor Attacks On Graph Neural Networks – https://dl.acm.org/doi/10.1145/3627673.3679905
  59. Efficient Label Contamination Attacks Against Black-Box Learning Models – https://www.researchgate.net/publication/317252983_Efficient_Label_Contamination_Attacks_Against_Black-Box_Learning_Models 
  60. Explanation-guided Backdoor Poisoning Attacks Against Malware Classifiers – https://www.usenix.org/system/files/sec21-severi.pdf
  61. Fast Adversarial Label-Flipping Attack On Tabular Data – https://arxiv.org/abs/2310.10744
  62. Fawkes: Protecting Privacy Against Unauthorized Deep Learning Models – https://arxiv.org/abs/2002.08327
  63. Fine-pruning: Defending Against Backdooring Attacks On Deep Neural Networks – https://link.springer.com/chapter/10.1007/978-3-030-00470-5_13 
  64. GangSweep: Sweep Out Neural Backdoors By GAN – https://dl.acm.org/doi/pdf/10.1145/3394171.3413546
  65. Generative AI Misuse: A Taxonomy Of Tactics And Insights From Real-World Data – https://arxiv.org/abs/2406.13843
  66. Generative Poisoning Attack Method Against Neural Networks – https://arxiv.org/pdf/1703.01340 
  67. Google SAIF “Top Risks Of Generative AI Systems” – https://saif.google/secure-ai-framework/risks 
  68. Handcrafted Backdoors In Deep Neural Networks – https://arxiv.org/pdf/2106.04690
  69. Hardware Trojan Attacks On Neural Networks – https://arxiv.org/abs/1806.05768 
  70. Hidden Killer: Invisible Textual Backdoor Attacks With Syntactic Trigger – https://arxiv.org/abs/2105.12400
  71. Hidden Trigger Backdoor Attacks – https://arxiv.org/abs/1910.00033 
  72. How To Backdoor Federated Learning – https://proceedings.mlr.press/v108/bagdasaryan20a.html
  73. ImageNet – https://www.image-net.org/
  74. InfoGAN: Interpretable Representation Learning By Information Maximizing Generative Adversarial Nets – https://arxiv.org/abs/1606.03657
  75. Influence Functions In Deep Learning Are Fragile – https://arxiv.org/abs/2006.14651
  76. Influence Function Based Data Poisoning Attacks To Top-n Recommender Systems – https://arxiv.org/abs/2002.08025
  77. Invisible Black-Box Backdoor Attack Against Deep Cross-Modal Hashing Retrieval – https://dl.acm.org/doi/10.1145/3650205 
  78. Invisible Backdoor Attack With Sample-specific Triggers – https://arxiv.org/abs/2012.03816
  79. Is AGI An Asymmetric Threat? – https://cryptopunk4762.com/f/is-agi-an-asymmetric-threat 
  80. Is Feature Selection Secure Against Training Data Poisoning? – https://arxiv.org/abs/1804.07933
  81. Label-consistent Backdoor Attacks – https://arxiv.org/abs/1912.02771
  82. Learning To Confuse: Generating Training Time Adversarial Data With Auto-encoder – https://arxiv.org/abs/1905.09027
  83. Learning Under p-tampering Attacks – https://proceedings.mlr.press/v83/mahloujifar18a/mahloujifar18a.pdf
  84. Learning With Noisy Labels – https://papers.nips.cc/paper_files/paper/2013/hash/3871bd64012152bfb53fdf04b401193f-Abstract.html
  85. Less Is More: Stealthy And Adaptive Clean-Image Backdoor Attacks With Few Poisoned – https://openreview.net/forum?id=LsTIW9VAF7 
  86. LFGuard: A Defense Against Label Flipping Attack In Federated Learning For Vehicular Network – https://www.sciencedirect.com/science/article/abs/pii/S1389128624006005
  87. Local Model Poisoning Attacks To Byzantine-robust Federated Learning – https://www.usenix.org/conference/usenixsecurity20/presentation/fang
  88. Malicious ML Models Discovered On Hugging Face Platform – https://www.reversinglabs.com/blog/rl-identifies-malware-ml-model-hosted-on-hugging-face
  89. Manipulating Machine Learning: Poisoning Attacks And Countermeasures For Regression Learning – https://ieeexplore.ieee.org/document/8418594 
  90. Mapping The Misuse Of Generative AI – https://deepmind.google/discover/blog/mapping-the-misuse-of-generative-ai/
  91. MetaPoison: Practical General-purpose Clean-label Data Poisoning – https://proceedings.neurips.cc/paper_files/paper/2020/file/8ce6fc704072e351679ac97d4a985574-Paper.pdf
  92. Mitigating Poisoning Attacks On Machine Learning Models: A Data Provenance Based Approach – https://www.cs.purdue.edu/homes/bb/nit/20_nathalie-Mitigating_Poisoning_Attacks_on_Machine_Learning_Models_A_Data_Provenance_Based_Approach.pdf
  93. ML Attack Models: Adversarial Attacks And Data Poisoning Attacks – https://arxiv.org/abs/2112.02797 
  94. Narcissus: A Practical Clean-Label Backdoor Attack With Limited Information – https://arxiv.org/abs/2204.05255 
  95. Neural Cleanse: Identifying And Mitigating Backdoor Attacks In Neural Networks – https://ieeexplore.ieee.org/abstract/document/8835365
  96. Neural Trojans – https://arxiv.org/pdf/1710.00942
  97. NIC: Detecting Adversarial Samples With Neural Network Invariant Checking – https://par.nsf.gov/servlets/purl/10139597 
  98. On Defending Against Label Flipping Attacks On Malware Detection Systems – https://arxiv.org/abs/1908.04473
  99. One-pixel Signature: Characterizing CNN Models For Backdoor Detection – https://arxiv.org/abs/2008.07711
  100. On The Effectiveness Of Mitigating Data Poisoning Attacks With Gradient Shaping – https://arxiv.org/abs/2002.11497
  101. Poison Forensics: Traceback Of Data Poisoning Attacks In Neural Networks – https://www.usenix.org/system/files/sec22-shan.pdf
  102. Poison Frogs! Targeted Clean-label Poisoning Attacks On Neural Networks – https://arxiv.org/abs/1804.00792
  103. Poisoned Classifiers Are Not Only Backdoored, They Are Fundamentally Broken – https://arxiv.org/abs/2010.09080
  104. Poisoning And Backdooring Contrastive Learning – https://arxiv.org/abs/2106.09667 
  105. Poisoning Attack In Federated Learning Using Generative Adversarial Nets – https://ieeexplore.ieee.org/document/8887357
  106. Poisoning Attacks Against Support Vector Machines – https://arxiv.org/abs/1206.6389
  107. Poisoning Attacks On Algorithmic Fairness – https://arxiv.org/abs/2004.07401
  108. Poisoning Attacks With Generative Adversarial Nets – https://arxiv.org/abs/1906.07773 
  109. Poisoning Deep Reinforcement Learning Agents With In-distribution Triggers – https://arxiv.org/abs/2106.07798
  110. Poisoning Language Models During Instruction Tuning – https://arxiv.org/pdf/2305.00944 
  111. Poisoning Web-Scale Training Datasets Is Practical – https://arxiv.org/abs/2302.10149 
  112. Practical Detection Of Trojan Neural Networks: Data-limited And Data-free Cases – https://arxiv.org/abs/2007.15802
  113. Practical Poisoning Attacks On Neural Networks – https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123720137.pdf
  114. Preventing Unauthorized Use Of Proprietary Data: Poisoning For Secure Dataset Release – https://arxiv.org/abs/2103.02683
  115. Protecting Intellectual Property Of Deep Neural Networks With Watermarking – https://dl.acm.org/doi/10.1145/3196494.3196550
  116. Protecting The Public From Abusive AI-Generated Content – https://cdn-dynmedia-1.microsoft.com/is/content/microsoftcorp/microsoft/msc/documents/presentations/CSR/Protecting-Public-Abusive-AI-Generated-Content.pdf 
  117. Provable Robustness Against Backdoor Attacks – https://arxiv.org/abs/2003.08904
  118. Radioactive Data: Tracing Through Training – https://arxiv.org/abs/2002.00937
  119. REFIT: A Unified Watermark Removal Framework For Deep Learning Systems With Limited Data – https://arxiv.org/pdf/1911.07205
  120. Removing Backdoor-based Watermarks In Neural Networks With Limited Data – https://arxiv.org/pdf/2008.00407
  121. Robust Linear Regression Against Training Data Poisoning – https://dl.acm.org/doi/10.1145/3128572.3140447 
  122. Seal Your Backdoor With Variational Defense – https://arxiv.org/pdf/2503.08829
  123. SentiNet: Detecting Localized Universal Attack Against Deep Learning Systems – https://ieeexplore.ieee.org/document/9283822
  124. Silent Killer: A Stealthy, Clean-Label, Black-Box Backdoor Attack – https://arxiv.org/abs/2301.02615 
  125. Smart Lexical Search For Label Flipping Adversarial Attack – https://aclanthology.org/2024.privatenlp-1.11.pdf
  126. Spectral Signatures In Backdoor Attacks – https://proceedings.neurips.cc/paper_files/paper/2018/file/280cf18baf4311c92aa5a042336587d3-Paper.pdf
  127. Stop-and-Go: Exploring Backdoor Attacks On Deep Reinforcement Learning-based Traffic Congestion Control Systems – https://arxiv.org/abs/2003.07859
  128. STRIP: A Defence Against Trojan Attacks On Deep Neural Networks – https://dl.acm.org/doi/abs/10.1145/3359789.3359790
  129. Strong Data Augmentation Sanitizes Poisoning And Backdoor Attacks Without An Accuracy Tradeoff – https://arxiv.org/pdf/2011.09527
  130. Stronger Data Poisoning Attacks Break Data Sanitization Defenses – https://arxiv.org/pdf/1811.00741 
  131. Support Vector Machines Under Adversarial Label Noise – https://proceedings.mlr.press/v20/biggio11.html
  132. TABOR: A Highly Accurate Approach To Inspecting And Restoring Trojan Backdoors In AI Systems – https://arxiv.org/pdf/1908.01763
  133. Targeted Backdoor Attacks On Deep Learning Systems Using Data Poisoning – https://arxiv.org/abs/1712.05526 
  134. Targeted Poisoning Attacks On Social Recommender Systems – https://ieeexplore.ieee.org/document/9013539
  135. TensorClog: An Imperceptible Poisoning Attack On Deep Neural Network Applications – https://ieeexplore.ieee.org/document/8668758
  136. The Art Of Deception: Robust Backdoor Attack Using Dynamic Stacking Of Triggers – https://arxiv.org/html/2401.01537v4 
  137. The Curse Of Concentration In Robust Learning: Evasion And Poisoning Attacks From Concentration Of Measure – https://ojs.aaai.org/index.php/AAAI/article/view/4373
  138. The Path To Defence: A Roadmap To Characterising Data Poisoning Attacks On Victim Models – https://dl.acm.org/doi/10.1145/3627536
  139. Top Risks Of Generative AI Systems – https://saif.google/secure-ai-framework/risks 
  140. Towards Clean-Label Backdoor Attacks In The Physical World – https://arxiv.org/html/2407.19203v1
  141. Towards Data Poisoning Attacks In Crowd Sensing Systems – https://cse.buffalo.edu/~lusu/papers/MobiHoc2018.pdf
  142. Towards Poisoning Of Deep Learning Algorithms With Back-gradient Optimization – https://arxiv.org/pdf/1708.08689
  143. Transferable Clean-label Poisoning Attacks On Deep Neural Nets – https://arxiv.org/abs/1905.05897
  144. Triggerless Backdoor Attack For NLP Tasks With Clean Labels – https://arxiv.org/abs/2111.07970 
  145. Trojan Attack On Deep Generative Models In Autonomous Driving – https://link.springer.com/chapter/10.1007/978-3-030-37228-6_15
  146. Trojaning Attack On Neural Networks – https://www.ndss-symposium.org/wp-content/uploads/2018/02/ndss2018_03A-5_Liu_paper.pdf
  147. Trojaning Language Models For Fun And Profit – https://arxiv.org/abs/2008.00312
  148. TrojDRL: Evaluation Of Backdoor Attacks On Deep Reinforcement Learning – https://dl.acm.org/doi/10.5555/3437539.3437570
  149. Truth Serum: Poisoning Machine Learning Models To Reveal Their Secrets – https://arxiv.org/abs/2204.00032
  150. Turning Your Weakness Into A Strength: Watermarking Deep Neural Networks By Backdooring – https://www.usenix.org/conference/usenixsecurity18/presentation/adi
  151. Understanding Black-box Predictions Via Influence Functions – https://proceedings.mlr.press/v70/koh17a/koh17a.pdf
  152. Universal Litmus Patterns: Revealing Backdoor Attacks In CNNs – https://arxiv.org/abs/1906.10842
  153. Universal Multi-Party Poisoning Attacks – https://arxiv.org/abs/1809.03474
  154. Unlearnable Examples: Making Personal Data Unexploitable – https://arxiv.org/abs/2101.04898
  155. Using Machine Teaching To Identify Optimal Training-set Attacks On Machine Learners – https://ojs.aaai.org/index.php/AAAI/article/view/9569
  156. Weight Poisoning Attacks On Pretrained Models – https://aclanthology.org/2020.acl-main.249.pdf 
  157. What Are The Ethical Risks Of Strong AI? – https://cryptopunk4762.com/f/what-are-the-ethical-risks-of-strong-ai 
  158. What Doesn’t Kill You Makes You Robust(er): How To Adversarially Train Against Data Poisoning – https://arxiv.org/abs/2102.13624
  159. Wicked Oddities: Selectively Poisoning For Effective Clean-Label Backdoor Attacks – https://arxiv.org/abs/2407.10825
  160. Witches’ Brew: Industrial Scale Data Poisoning Via Gradient Matching – https://arxiv.org/abs/2009.02276
  161. You Autocomplete Me: Poisoning Vulnerabilities In Neural Code Completion – https://www.usenix.org/system/files/sec21-schuster.pdf

Thanks for reading!
