A Taxonomy Of AI Data Poisoning Defenses

Posted on June 8, 2025 by Brian Colwell

We begin our taxonomy by dividing data poisoning defenses into three broad categories: Attack Identification Techniques, Attack Repair Techniques, and Attack Prevention Techniques. Within each category, key research papers are then organized by defense type.

Data Poisoning Attack Identification Techniques

In this section, data poisoning defenses are divided into Techniques For Identifying Poisoned Data and Techniques For Identifying Poisoned Models.

Techniques For Identifying Poisoned Data

  • Deep k-NN Defense Against Clean-Label Data Poisoning Attacks – https://dl.acm.org/doi/10.1007/978-3-030-66415-2_4
  • Detecting Backdoor Attacks On Deep Neural Networks By Activation Clustering – https://arxiv.org/abs/1811.03728
  • NIC: Detecting Adversarial Samples With Neural Network Invariant Checking – https://par.nsf.gov/servlets/purl/10139597 
  • SentiNet: Detecting Localized Universal Attacks Against Deep Learning Systems – https://ieeexplore.ieee.org/document/9283822
  • Spectral Signatures In Backdoor Attacks – https://proceedings.neurips.cc/paper_files/paper/2018/file/280cf18baf4311c92aa5a042336587d3-Paper.pdf 
  • STRIP: A Defence Against Trojan Attacks On Deep Neural Networks – https://dl.acm.org/doi/abs/10.1145/3359789.3359790 
  • Understanding Black-Box Predictions Via Influence Functions – https://proceedings.mlr.press/v70/koh17a/koh17a.pdf
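
Several of the detectors above share a simple core: embed the training data with the (possibly compromised) model and look for points that do not sit where their label says they should. As a concrete illustration, here is a minimal sketch of the Deep k-NN idea from Peri et al., which flags training points whose label disagrees with the majority label of their k nearest neighbors in feature space. The `features` and `labels` arrays and the choice of k are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def deep_knn_flags(features: np.ndarray, labels: np.ndarray, k: int = 50) -> np.ndarray:
    """Return a boolean mask of suspected clean-label poisons.

    `features` are penultimate-layer embeddings of the training set;
    `labels` are integer class IDs.
    """
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    _, idx = nn.kneighbors(features)
    neighbor_labels = labels[idx[:, 1:]]  # drop column 0: each point is its own neighbor
    # Majority label among the k neighbors of each point.
    majority = np.array([np.bincount(row).argmax() for row in neighbor_labels])
    return majority != labels  # disagreement => suspected poison
```

Flagged points are simply removed before retraining; the paper's defense is essentially this filtering step applied in the network's feature space.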

Techniques For Identifying Poisoned Models

  • DeepInspect: A Black-Box Trojan Detection And Mitigation Framework For Deep Neural Networks – https://www.ijcai.org/proceedings/2019/0647.pdf
  • Detecting AI Trojans Using Meta Neural Analysis – https://ieeexplore.ieee.org/document/9519467
  • One-Pixel Signature: Characterizing CNN Models For Backdoor Detection – https://arxiv.org/abs/2008.07711
  • Practical Detection Of Trojan Neural Networks: Data-Limited And Data-Free Cases – https://arxiv.org/abs/2007.15802
  • TABOR: A Highly Accurate Approach To Inspecting And Restoring Trojan Backdoors In AI Systems – https://arxiv.org/pdf/1908.01763
  • Universal Litmus Patterns: Revealing Backdoor Attacks In CNNs – https://arxiv.org/abs/1906.10842
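
TABOR, like the Neural Cleanse approach it builds on (listed under repair below), reverse-engineers a candidate trigger for each output class and then asks whether any class admits an anomalously small one. A hedged sketch of that outlier test, using the median absolute deviation (MAD), is below; the trigger-optimization step is omitted, and `trigger_norms` (one L1 norm per class) is assumed as input.

```python
import numpy as np

def backdoor_anomaly_indices(trigger_norms: np.ndarray) -> np.ndarray:
    """Anomaly index per class from reverse-engineered trigger L1 norms."""
    med = np.median(trigger_norms)
    mad = np.median(np.abs(trigger_norms - med))
    # 1.4826 scales MAD to estimate a standard deviation under normality.
    return np.abs(trigger_norms - med) / (1.4826 * mad)

# Classes with an index above ~2 and a norm *below* the median are the
# usual suspects, since backdoor triggers tend to be unusually small:
# idx = backdoor_anomaly_indices(norms)
# suspects = np.where((idx > 2) & (norms < np.median(norms)))[0]
```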

Data Poisoning Attack Repair Techniques

In this section, data poisoning defenses are divided into Techniques For Patching Known Triggers and Techniques For Trigger-Agnostic Backdoor Removal.

Techniques For Patching Known Triggers

  • Defending Neural Backdoors Via Generative Distribution Modeling – https://proceedings.neurips.cc/paper_files/paper/2019/file/78211247db84d96acf4e00092a7fba80-Paper.pdf
  • GangSweep: Sweep Out Neural Backdoors By GAN – https://dl.acm.org/doi/pdf/10.1145/3394171.3413546
  • Neural Cleanse: Identifying And Mitigating Backdoor Attacks In Neural Networks – https://people.cs.uchicago.edu/~ravenben/publications/pdf/backdoor-sp19.pdf
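
Once a trigger has been recovered, Neural Cleanse patches the model by "unlearning": fine-tuning on a small clean subset in which a fraction of inputs is stamped with the trigger but keeps its true label, which breaks the trigger-to-target association. A minimal PyTorch sketch follows; `model`, `loader`, and the recovered `trigger`/`mask` tensors are assumed, and the hyperparameters are illustrative.

```python
import torch
import torch.nn.functional as F

def unlearn_trigger(model, loader, trigger, mask, stamp_frac=0.2,
                    lr=1e-4, device="cpu"):
    """Fine-tune so that trigger-stamped inputs keep their true labels."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        n_stamp = int(stamp_frac * x.size(0))
        # Stamp the recovered trigger onto part of the batch, keeping
        # the original (correct) labels so the shortcut is unlearned.
        x[:n_stamp] = (1 - mask) * x[:n_stamp] + mask * trigger
        loss = F.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```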

Techniques For Trigger-Agnostic Backdoor Removal

  • Fine-pruning: Defending Against Backdooring Attacks On Deep Neural Networks – https://arxiv.org/abs/1805.12185
  • REFIT: A Unified Watermark Removal Framework For Deep Learning Systems With Limited Data – https://arxiv.org/pdf/1911.07205
  • Removing Backdoor-based Watermarks In Neural Networks With Limited Data – https://arxiv.org/pdf/2008.00407
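
Fine-pruning rests on the observation that backdoor behavior often lives in neurons that stay dormant on clean inputs. The sketch below prunes the least-active channels of a chosen convolutional layer, as measured on clean data; a brief fine-tuning pass (ordinary training, omitted here) then recovers clean accuracy. `model`, `layer`, and `clean_loader` are assumed, and the 20% pruning fraction is illustrative.

```python
import torch

@torch.no_grad()
def prune_dormant_channels(model, layer, clean_loader, prune_frac=0.2, device="cpu"):
    """Zero out the channels of `layer` least active on clean data."""
    acts = []
    handle = layer.register_forward_hook(
        lambda mod, inp, out: acts.append(out.abs().mean(dim=(0, 2, 3))))
    model.eval()
    for x, _ in clean_loader:
        model(x.to(device))
    handle.remove()
    mean_act = torch.stack(acts).mean(dim=0)      # per-channel activity
    n_prune = int(prune_frac * mean_act.numel())
    victims = torch.argsort(mean_act)[:n_prune]   # least active first
    layer.weight[victims] = 0                     # zero the filters...
    if layer.bias is not None:
        layer.bias[victims] = 0                   # ...and their biases
    return victims
```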

Data Poisoning Attack Prevention Techniques

In this section, data poisoning defenses are divided into Randomized Smoothing Techniques For Poisoning Attack Prevention, Differential Privacy Techniques For Poisoning Attack Prevention, and Input Processing Techniques For Poisoning Attack Prevention.

Randomized Smoothing Techniques For Poisoning Attack Prevention

  • Certified Robustness To Label-flipping Attacks Via Randomized Smoothing – https://arxiv.org/abs/2002.03018
  • Provable Robustness Against Backdoor Attacks – https://arxiv.org/abs/2003.08904
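
Randomized smoothing turns a pointwise question into a statistical one: if predictions are aggregated over many randomized versions of the training labels, a bounded number of flipped labels cannot change the majority vote. Rosenfeld et al. derive this in closed form without explicit ensembling; the sketch below illustrates only the underlying intuition with a literal ensemble over randomized labels. A `train_fn` returning a fitted classifier with an integer-valued `predict` is assumed.

```python
import numpy as np

def smoothed_ensemble(train_fn, X, y, n_classes, n_models=20, flip_p=0.1, seed=0):
    """Train models on independently label-randomized copies of the data."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        y_noisy = y.copy()
        flips = rng.random(len(y)) < flip_p
        y_noisy[flips] = rng.integers(0, n_classes, int(flips.sum()))
        models.append(train_fn(X, y_noisy))
    return models

def smoothed_predict(models, X, n_classes):
    """Majority vote; the vote margin is what certification bounds."""
    votes = np.stack([m.predict(X) for m in models]).astype(int)
    counts = np.apply_along_axis(
        lambda v: np.bincount(v, minlength=n_classes), 0, votes)
    return counts.argmax(axis=0)
```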

Differential Privacy Techniques For Poisoning Attack Prevention

  • Data Poisoning Against Differentially-private Learners: Attacks And Defenses – https://arxiv.org/abs/1903.09860
  • On The Effectiveness Of Mitigating Data Poisoning Attacks With Gradient Shaping – https://arxiv.org/abs/2002.11497 (see also ‘Witches’ Brew: Industrial Scale Data Poisoning Via Gradient Matching’ – https://arxiv.org/abs/2009.02276)
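
Gradient shaping blunts poisoning by constraining how much any small group of examples can steer an update, which is exactly what differentially private training already does. A minimal sketch of a DP-SGD-flavored training step is below; note that real DP-SGD (e.g. via Opacus) clips per-example gradients, while this batch-level version only illustrates the clip-then-noise mechanism. `model`, `x`, `y`, and `opt` are assumed.

```python
import torch
import torch.nn.functional as F

def shaped_training_step(model, x, y, opt, clip_norm=1.0, noise_mult=0.5):
    """One update with clipped, noised gradients (gradient shaping)."""
    opt.zero_grad()
    F.cross_entropy(model(x), y).backward()
    # Clip the global gradient norm so no batch dominates the update...
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
    # ...then add calibrated Gaussian noise to mask residual influence.
    for p in model.parameters():
        if p.grad is not None:
            p.grad += noise_mult * clip_norm * torch.randn_like(p.grad)
    opt.step()
```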

Input Processing Techniques For Poisoning Attack Prevention

  • Dp-InstaHide: Provably Defusing Poisoning And Backdoor Attacks With Differentially Private Data Augmentations – https://arxiv.org/pdf/2103.02079
  • Neural Trojans – https://arxiv.org/pdf/1710.00942
  • Strong Data Augmentation Sanitizes Poisoning And Backdoor Attacks Without An Accuracy Tradeoff – https://arxiv.org/pdf/2011.09527
  • What Doesn’t Kill You Makes You Robust(er): How to Adversarially Train against Data Poisoning – https://arxiv.org/abs/2102.13624
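
The augmentation defenses above work by dilution: if every training input is a blend of several examples, no single poisoned point or trigger patch reaches the model intact. Mixup, one of the augmentations Borgnia et al. evaluate, is sketched below in its standard form; `x` is a batch of inputs and `y_onehot` its one-hot labels.

```python
import numpy as np
import torch

def mixup_batch(x: torch.Tensor, y_onehot: torch.Tensor, alpha: float = 1.0):
    """Return convex combinations of example pairs and their soft labels."""
    lam = np.random.beta(alpha, alpha)          # mixing coefficient
    perm = torch.randperm(x.size(0))            # random pairing of examples
    x_mixed = lam * x + (1 - lam) * x[perm]
    y_mixed = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mixed, y_mixed
```

Training then minimizes a soft-label cross-entropy on the mixed batch; flipped labels and trigger pixels get averaged away across random pairs.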

Thanks for reading!
