A Taxonomy Of AI Data Poisoning Defenses

Posted on June 8, 2025 by Brian Colwell

We begin our taxonomy by dividing data poisoning defenses into three broad categories: Attack Identification Techniques, Attack Repair Techniques, and Attack Prevention Techniques. Within each category, key research papers are then organized by defense type.

Data Poisoning Attack Identification Techniques

In this section, data poisoning defenses are divided into Techniques For Identifying Poisoned Data and Techniques For Identifying Poisoned Models.

Techniques For Identifying Poisoned Data

  • Deep k-NN Defense Against Clean-Label Data Poisoning Attacks – https://dl.acm.org/doi/10.1007/978-3-030-66415-2_4
  • Detecting Backdoor Attacks On Deep Neural Networks By Activation Clustering – https://arxiv.org/abs/1811.03728
  • NIC: Detecting Adversarial Samples With Neural Network Invariant Checking – https://par.nsf.gov/servlets/purl/10139597 
  • SentiNet: Detecting Localized Universal Attacks Against Deep Learning Systems – https://ieeexplore.ieee.org/document/9283822
  • Spectral Signatures In Backdoor Attacks – https://proceedings.neurips.cc/paper_files/paper/2018/file/280cf18baf4311c92aa5a042336587d3-Paper.pdf 
  • STRIP: A Defence Against Trojan Attacks On Deep Neural Networks – https://dl.acm.org/doi/abs/10.1145/3359789.3359790 
  • Understanding Black-Box Predictions Via Influence Functions – https://proceedings.mlr.press/v70/koh17a/koh17a.pdf
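To make the flavor of these detectors concrete, here is a minimal sketch of the activation-clustering idea from Chen et al.: poisoned examples of a class tend to form their own small cluster in penultimate-layer activation space, so a 2-way clustering of one class's activations can flag the smaller cluster as suspect. The function name and the toy 2-means below are mine, not the paper's.

```python
import numpy as np

def flag_small_cluster(activations):
    """Split one class's penultimate-layer activations into two
    clusters with plain 2-means and flag the smaller cluster as
    suspect poisoned data (activation-clustering sketch)."""
    X = np.asarray(activations, dtype=float)
    # Deterministic init: the first point, and the point farthest from it.
    i0 = 0
    i1 = int(np.linalg.norm(X - X[i0], axis=1).argmax())
    centers = X[[i0, i1]].copy()
    for _ in range(50):  # plain Lloyd iterations, k = 2
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(2):
            if (labels == k).any():
                centers[k] = X[labels == k].mean(axis=0)
    small = int(np.bincount(labels, minlength=2).argmin())
    return labels == small  # boolean mask over the inputs
```

In the real defense the activations come from a trained network and the flagged points are removed before retraining; here any well-separated point cloud stands in for them.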

Techniques For Identifying Poisoned Models

  • DeepInspect: A Black-Box Trojan Detection And Mitigation Framework For Deep Neural Networks – https://www.ijcai.org/proceedings/2019/0647.pdf
  • Detecting AI Trojans Using Meta Neural Analysis – https://ieeexplore.ieee.org/document/9519467
  • One-Pixel Signature: Characterizing CNN Models For Backdoor Detection – https://arxiv.org/abs/2008.07711
  • Practical Detection Of Trojan Neural Networks: Data-Limited And Data-Free Cases – https://arxiv.org/abs/2007.15802
  • TABOR: A Highly Accurate Approach To Inspecting And Restoring Trojan Backdoors In AI Systems – https://arxiv.org/pdf/1908.01763
  • Universal Litmus Patterns: Revealing Backdoor Attacks In CNNs – https://arxiv.org/abs/1906.10842
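Several of the model-level detectors above share one structure: query each model with a fixed probe set, treat its concatenated outputs as a signature, and train a meta-classifier to separate clean from trojaned signatures. A toy nearest-centroid version of that idea (all names mine; real meta-detectors like Meta Neural Analysis learn the queries and the classifier jointly):

```python
import numpy as np

def model_signature(logits_on_queries):
    # Concatenate a model's outputs on a fixed query set into one vector.
    return np.concatenate([np.ravel(l) for l in logits_on_queries])

def meta_detect(candidate, clean_models, trojan_models):
    """Nearest-centroid meta-classifier over model signatures:
    flag the candidate if its signature sits closer to the centroid
    of known-trojaned models than to the clean centroid."""
    s = model_signature(candidate)
    c_clean = np.mean([model_signature(m) for m in clean_models], axis=0)
    c_troj = np.mean([model_signature(m) for m in trojan_models], axis=0)
    return np.linalg.norm(s - c_troj) < np.linalg.norm(s - c_clean)
```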

Data Poisoning Attack Repair Techniques

In this section, data poisoning defenses are divided into Techniques For Patching Known Triggers and Techniques For Trigger-Agnostic Backdoor Removal.

Techniques For Patching Known Triggers

  • Defending Neural Backdoors Via Generative Distribution Modeling – https://proceedings.neurips.cc/paper_files/paper/2019/file/78211247db84d96acf4e00092a7fba80-Paper.pdf
  • GangSweep: Sweep Out Neural Backdoors By GAN – https://dl.acm.org/doi/pdf/10.1145/3394171.3413546
  • Neural Cleanse: Identifying And Mitigating Backdoor Attacks In Neural Networks – https://people.cs.uchicago.edu/~ravenben/publications/pdf/backdoor-sp19.pdf
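Once a trigger has been reverse-engineered (Neural Cleanse recovers a mask and a pattern), a common patching step is to stamp the trigger onto clean inputs while keeping their true labels, then fine-tune so the model unlearns the backdoor association. A sketch of that stamping step, with my own function names and assuming inputs normalized to [0, 1]:

```python
import numpy as np

def stamp(x, mask, pattern):
    """Apply a recovered trigger: keep x where mask is 0, overwrite
    with the trigger pattern where mask is 1."""
    return (1.0 - mask) * x + mask * pattern

def unlearning_set(clean_x, clean_y, mask, pattern):
    """Build the fine-tuning set used for patching: clean inputs
    stamped with the reverse-engineered trigger, paired with their
    *true* labels so fine-tuning breaks the trigger-to-target link."""
    return np.stack([stamp(x, mask, pattern) for x in clean_x]), clean_y
```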

Techniques For Trigger-Agnostic Backdoor Removal

  • Fine-pruning: Defending Against Backdooring Attacks On Deep Neural Networks – https://arxiv.org/abs/1805.12185
  • REFIT: A Unified Watermark Removal Framework For Deep Learning Systems With Limited Data – https://arxiv.org/pdf/1911.07205
  • Removing Backdoor-based Watermarks In Neural Networks With Limited Data – https://arxiv.org/pdf/2008.00407
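The intuition behind fine-pruning is that backdoor neurons stay dormant on clean inputs, so ranking neurons by their average activation on a held-out clean set and pruning the least-active fraction (followed by brief fine-tuning) removes many triggers. A minimal selection step under those assumptions, with names of my own:

```python
import numpy as np

def dormant_neurons(clean_activations, prune_frac=0.2):
    """Fine-pruning sketch: average each neuron's activation over a
    clean held-out set and return indices of the least-active
    fraction, the candidates for pruning."""
    mean_act = np.asarray(clean_activations, dtype=float).mean(axis=0)
    k = max(1, int(prune_frac * mean_act.size))
    return np.argsort(mean_act)[:k]
```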

Data Poisoning Attack Prevention Techniques

In this section, data poisoning defenses are divided into Randomized Smoothing Techniques For Poisoning Attack Prevention, Differential Privacy Techniques For Poisoning Attack Prevention, and Input Processing Techniques For Poisoning Attack Prevention.

Randomized Smoothing Techniques For Poisoning Attack Prevention

  • Certified Robustness To Label-flipping Attacks Via Randomized Smoothing – https://arxiv.org/abs/2002.03018
  • Provable Robustness Against Backdoor Attacks – https://arxiv.org/abs/2003.08904
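For intuition, here is the prediction side of standard randomized smoothing: classify many Gaussian-noised copies of an input and return the majority-vote label, which is what the certificates are proved about. The papers above extend the same certification idea to randomness over training labels and the training set itself; the code below is only an illustrative sketch with my own names.

```python
import numpy as np

def smoothed_predict(classify, x, sigma=0.5, n_samples=200, seed=0):
    """Majority vote of a base classifier over Gaussian-noised
    copies of x (randomized-smoothing prediction sketch)."""
    rng = np.random.default_rng(seed)
    labels = [classify(x + rng.normal(0.0, sigma, size=np.shape(x)))
              for _ in range(n_samples)]
    values, counts = np.unique(labels, return_counts=True)
    return int(values[counts.argmax()])
```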

Differential Privacy Techniques For Poisoning Attack Prevention

  • Data Poisoning Against Differentially-private Learners: Attacks And Defenses – https://arxiv.org/abs/1903.09860
  • On The Effectiveness Of Mitigating Data Poisoning Attacks With Gradient Shaping – https://arxiv.org/abs/2002.11497 (in response to ‘Witches’ Brew: Industrial Scale Data Poisoning Via Gradient Matching’ – https://arxiv.org/abs/2009.02276)
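Gradient shaping in the DP-SGD style works by clipping each example's gradient to a fixed L2 norm, averaging, and adding Gaussian noise: clipping bounds how much any single (possibly poisoned) example can move the model, and the noise yields the differential-privacy guarantee these papers analyze. A sketch under those assumptions, with my own function name:

```python
import numpy as np

def shape_gradients(per_example_grads, clip_norm=1.0, noise_mult=0.5, seed=0):
    """Clip each per-example gradient to L2 norm <= clip_norm,
    average, and add Gaussian noise (DP-SGD-style gradient shaping)."""
    rng = np.random.default_rng(seed)
    g = np.asarray(per_example_grads, dtype=float)
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    clipped = g / np.maximum(1.0, norms / clip_norm)  # per-example clipping
    noise = rng.normal(0.0, noise_mult * clip_norm / len(g), size=g.shape[1])
    return clipped.mean(axis=0) + noise
```

With `noise_mult=0` this reduces to plain clipped averaging, which makes the influence bound easy to check: a gradient of norm 100 contributes no more than one of norm 1.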

Input Processing Techniques For Poisoning Attack Prevention

  • Dp-InstaHide: Provably Defusing Poisoning And Backdoor Attacks With Differentially Private Data Augmentations – https://arxiv.org/pdf/2103.02079
  • Neural Trojans – https://arxiv.org/pdf/1710.00942
  • Strong Data Augmentation Sanitizes Poisoning And Backdoor Attacks Without An Accuracy Tradeoff – https://arxiv.org/pdf/2011.09527
  • What Doesn’t Kill You Makes You Robust(er): How to Adversarially Train against Data Poisoning – https://arxiv.org/abs/2102.13624
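A representative strong augmentation in this family is mixup, which blends pairs of training examples and their one-hot labels with a Beta-distributed weight; the papers above show such blending dilutes poisoned perturbations without an accuracy tradeoff. A minimal sketch (names mine):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.3, seed=0):
    """Convexly blend two training examples and their one-hot labels
    with a Beta(alpha, alpha)-distributed weight (mixup sketch)."""
    lam = float(np.random.default_rng(seed).beta(alpha, alpha))
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2
```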

Thanks for reading!
