We begin our taxonomy by dividing data poisoning defenses into three broad categories: Attack Identification Techniques, Attack Repair Techniques, and Attack Prevention Techniques. Within each category, key research papers are then organized by defense type.
Data Poisoning Attack Identification Techniques
In this section, data poisoning defenses are divided into Techniques For Identifying Poisoned Data and Techniques For Identifying Poisoned Models. A short code sketch after each list illustrates one representative technique.
Techniques For Identifying Poisoned Data
- Deep k-NN Defense Against Clean-Label Data Poisoning Attacks – https://dl.acm.org/doi/10.1007/978-3-030-66415-2_4
- Detecting Backdoor Attacks On Deep Neural Networks By Activation Clustering – https://arxiv.org/abs/1811.03728
- NIC: Detecting Adversarial Samples With Neural Network Invariant Checking – https://par.nsf.gov/servlets/purl/10139597
- SentiNet: Detecting Localized Universal Attacks Against Deep Learning Systems – https://ieeexplore.ieee.org/document/9283822
- Spectral Signatures In Backdoor Attacks – https://proceedings.neurips.cc/paper_files/paper/2018/file/280cf18baf4311c92aa5a042336587d3-Paper.pdf
- STRIP: A Defence Against Trojan Attacks On Deep Neural Networks – https://dl.acm.org/doi/abs/10.1145/3359789.3359790
- Understanding Black-Box Predictions Via Influence Functions – https://proceedings.mlr.press/v70/koh17a/koh17a.pdf
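To make the flavor of these detectors concrete, below is a minimal sketch of the spectral-signatures approach linked above: for each class, score every training example by the squared projection of its centered penultimate-layer activation onto the class's top singular direction, then discard the highest scorers. The Gaussian "activations", the planted shift, and the removal budget of 30 are synthetic stand-ins for illustration, not values from the paper.

```python
import numpy as np

def spectral_scores(activations):
    """Spectral-signature outlier score: squared projection of each
    centered activation onto the top singular direction of the class's
    activation matrix. Poisoned examples tend to score high."""
    centered = activations - activations.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return (centered @ vt[0]) ** 2

# Toy demo: 500 "clean" activations plus 25 shifted "poisoned" ones,
# standing in for one class's penultimate-layer features.
rng = np.random.default_rng(0)
clean = rng.normal(size=(500, 64))
poison = rng.normal(size=(25, 64)) + 0.6   # synthetic backdoor signature
acts = np.vstack([clean, poison])

scores = spectral_scores(acts)
flagged = np.argsort(scores)[-30:]         # removal budget is a tunable knob
print(f"planted outliers flagged: {(flagged >= 500).sum()} / 25")
```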
Techniques For Identifying Poisoned Models
- DeepInspect: A Black-Box Trojan Detection And Mitigation Framework For Deep Neural Networks – https://www.ijcai.org/proceedings/2019/0647.pdf
- Detecting AI Trojans Using Meta Neural Analysis – https://ieeexplore.ieee.org/document/9519467
- One-Pixel Signature: Characterizing CNN Models For Backdoor Detection – https://arxiv.org/abs/2008.07711
- Practical Detection Of Trojan Neural Networks: Data-Limited And Data-Free Cases – https://arxiv.org/abs/2007.15802
- TABOR: A Highly Accurate Approach To Inspecting And Restoring Trojan Backdoors In AI Systems – https://arxiv.org/pdf/1908.01763
- Universal Litmus Patterns: Revealing Backdoor Attacks In CNNs – https://arxiv.org/abs/1906.10842
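Several of these papers follow a meta-analysis recipe (most explicitly 'Detecting AI Trojans Using Meta Neural Analysis'): train a pool of clean and trojaned shadow models, represent each model by its outputs on a shared query set, and fit a meta-classifier over those feature vectors. The sketch below is a toy rendition under strong assumptions: random linear maps stand in for trained shadow networks, and a crude weight perturbation stands in for a real trojan.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
queries = rng.normal(size=(8, 32))  # fixed probe inputs shared by all models

def make_shadow_model(trojaned):
    """Stand-in "shadow model": a random 32->10 linear map, optionally
    perturbed to mimic backdoor-induced structure."""
    w = rng.normal(size=(32, 10))
    if trojaned:
        w[:5, 0] += 1.5
    return w

def features(w):
    """Meta-classifier features: the model's softmax outputs on the
    shared query set, flattened into one vector."""
    logits = queries @ w
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return probs.ravel()

labels = rng.integers(0, 2, size=120)             # 1 = trojaned
feats = np.array([features(make_shadow_model(bool(t))) for t in labels])

meta = LogisticRegression(max_iter=1000).fit(feats[:100], labels[:100])
print(f"held-out detection accuracy: {meta.score(feats[100:], labels[100:]):.2f}")
```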
Data Poisoning Attack Repair Techniques
In this section, data poisoning defenses are divided into Techniques For Patching Known Triggers and Techniques For Trigger-Agnostic Backdoor Removal; a brief illustrative sketch follows each list.
Techniques For Patching Known Triggers
- Defending Neural Backdoors Via Generative Distribution Modeling – https://proceedings.neurips.cc/paper_files/paper/2019/file/78211247db84d96acf4e00092a7fba80-Paper.pdf
- GangSweep: Sweep Out Neural Backdoors By GAN – https://dl.acm.org/doi/pdf/10.1145/3394171.3413546
- Neural Cleanse: Identifying And Mitigating Backdoor Attacks In Neural Networks – https://people.cs.uchicago.edu/~ravenben/publications/pdf/backdoor-sp19.pdf
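The common thread here is trigger reverse-engineering: for each candidate target class, optimize a small mask and pattern that reliably flip the model's prediction, then patch the model using the recovered trigger. Below is a minimal sketch in the spirit of Neural Cleanse; the untrained stand-in model, the random "clean" batch, the penalty weight, and the step count are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

# Stand-in for the suspect classifier under audit (untrained, for brevity).
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
for p in model.parameters():
    p.requires_grad_(False)

target_class = 0  # candidate backdoor target; in practice, try every class
lam = 0.01        # weight on the mask-size (L1) penalty

# Learn a mask and pattern whose stamp drives any input to the target class.
mask_logit = torch.zeros(1, 1, 32, 32, requires_grad=True)
pattern = torch.rand(1, 3, 32, 32, requires_grad=True)
opt = torch.optim.Adam([mask_logit, pattern], lr=0.1)

clean_batch = torch.rand(64, 3, 32, 32)  # stand-in for held-out clean data
targets = torch.full((64,), target_class, dtype=torch.long)

for step in range(200):
    mask = torch.sigmoid(mask_logit)
    stamped = (1 - mask) * clean_batch + mask * pattern.clamp(0, 1)
    loss = F.cross_entropy(model(stamped), targets) + lam * mask.sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

# A recovered mask that is anomalously small relative to other candidate
# classes suggests a real trigger; the model can then be patched, e.g. by
# fine-tuning it to ignore inputs stamped with the recovered trigger.
print(f"recovered mask L1 norm: {torch.sigmoid(mask_logit).sum().item():.1f}")
```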
Techniques For Trigger-Agnostic Backdoor Removal
- Fine-pruning: Defending Against Backdooring Attacks On Deep Neural Networks – https://arxiv.org/abs/1805.12185
- REFIT: A Unified Watermark Removal Framework For Deep Learning Systems With Limited Data – https://arxiv.org/pdf/1911.07205
- Removing Backdoor-based Watermarks In Neural Networks With Limited Data – https://arxiv.org/pdf/2008.00407
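Fine-pruning (first entry above) is representative of this category: neurons that stay dormant on clean data are assumed to encode the backdoor, so they are pruned and the network is then fine-tuned on trusted data. The sketch below uses a tiny untrained stand-in network and random "clean" data; the layer choice, pruning count, and optimizer settings are illustrative only.

```python
import torch

# Tiny stand-in for a backdoored network; fine-pruning targets the last
# convolutional layer of the real model.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(16, 10),
)
clean_x = torch.rand(256, 3, 32, 32)      # small trusted clean set (stand-in)
clean_y = torch.randint(0, 10, (256,))

# 1) Record each channel's mean post-ReLU activation on clean data.
acts = []
handle = model[1].register_forward_hook(
    lambda mod, inp, out: acts.append(out.mean(dim=(0, 2, 3))))
with torch.no_grad():
    model(clean_x)
handle.remove()

# 2) Prune the channels least active on clean inputs: the hypothesis is
#    that backdoor neurons stay dormant unless the trigger is present.
dormant = torch.argsort(acts[0])[:4]
with torch.no_grad():
    model[0].weight[dormant] = 0.0
    model[0].bias[dormant] = 0.0

# 3) Fine-tune on the clean set, keeping pruned channels zeroed.
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()
for _ in range(5):
    opt.zero_grad()
    loss_fn(model(clean_x), clean_y).backward()
    opt.step()
    with torch.no_grad():   # re-apply the pruning mask after each step
        model[0].weight[dormant] = 0.0
        model[0].bias[dormant] = 0.0
```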
Data Poisoning Attack Prevention Techniques
In this section, data poisoning defenses are divided into Randomized Smoothing Techniques For Poisoning Attack Prevention, Differential Privacy Techniques For Poisoning Attack Prevention, and Input Processing Techniques For Poisoning Attack Prevention. Each list is followed by a short code sketch of one representative technique.
Randomized Smoothing Techniques For Poisoning Attack Prevention
- Certified Robustness To Label-flipping Attacks Via Randomized Smoothing – https://arxiv.org/abs/2002.03018
- Provable Robustness Against Backdoor Attacks – https://arxiv.org/abs/2003.08904
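The first paper above smooths over label noise rather than input noise: train many classifiers on copies of the training set with randomly flipped labels, and predict by majority vote; the vote margin then bounds how many adversarial label flips the prediction can tolerate. Here is a toy sketch on synthetic binary data; the flip probability, draw count, and base classifier are assumptions, and the actual certificate computation is omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic binary training set; y may contain adversarially flipped labels.
X = rng.normal(size=(400, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=400) > 0).astype(int)

def smoothed_predict(X_train, y_train, X_test, flip_prob=0.2, n_draws=50):
    """Majority vote over base classifiers trained on randomly
    label-flipped copies of the training set."""
    votes = np.zeros((len(X_test), 2))
    for _ in range(n_draws):
        flips = rng.random(len(y_train)) < flip_prob
        y_noisy = np.where(flips, 1 - y_train, y_train)
        preds = LogisticRegression().fit(X_train, y_noisy).predict(X_test)
        votes[np.arange(len(X_test)), preds] += 1
    # The vote margin (second value) is what the certificate is built from.
    return votes.argmax(axis=1), votes.max(axis=1) / n_draws

preds, margin = smoothed_predict(X, y, rng.normal(size=(10, 5)))
print(preds, margin.round(2))
```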
Differential Privacy Techniques For Poisoning Attack Prevention
- Data Poisoning Against Differentially-private Learners: Attacks And Defenses – https://arxiv.org/abs/1903.09860
- On The Effectiveness Of Mitigating Data Poisoning Attacks With Gradient Shaping – https://arxiv.org/abs/2002.11497 (see also the later attack paper ‘Witches’ Brew: Industrial Scale Data Poisoning Via Gradient Matching’ – https://arxiv.org/abs/2009.02276)
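The gradient shaping studied in the first paper is essentially the DP-SGD recipe: clip each example's gradient and add Gaussian noise, so that no single (possibly poisoned) example can steer the weights very far. A minimal numpy sketch for logistic regression follows; the clipping norm, noise multiplier, and learning rate are arbitrary illustration values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy logistic regression data; some labels could be poisoned.
X = rng.normal(size=(256, 10))
y = (X @ rng.normal(size=10) > 0).astype(float)
w = np.zeros(10)

clip_norm, noise_mult, lr = 1.0, 1.1, 0.5

for step in range(100):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))         # sigmoid predictions
    per_example_grads = (p - y)[:, None] * X    # one gradient row per example
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)
    noise = rng.normal(0.0, noise_mult * clip_norm, size=w.shape)
    w -= lr * (clipped.sum(axis=0) + noise) / len(X)  # noisy averaged step

print(f"train accuracy: {((X @ w > 0) == y.astype(bool)).mean():.2f}")
```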
Input Processing Techniques For Poisoning Attack Prevention
- DP-InstaHide: Provably Defusing Poisoning And Backdoor Attacks With Differentially Private Data Augmentations – https://arxiv.org/pdf/2103.02079
- Neural Trojans – https://arxiv.org/pdf/1710.00942
- Strong Data Augmentation Sanitizes Poisoning And Backdoor Attacks Without An Accuracy Tradeoff – https://arxiv.org/pdf/2011.09527
- What Doesn’t Kill You Makes You Robust(er): How To Adversarially Train Against Data Poisoning – https://arxiv.org/abs/2102.13624
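The strong augmentations studied above include mixup, which trains on convex combinations of input pairs and their labels; blending inputs dilutes localized triggers and weakens the trigger-to-target-label association. A minimal sketch of mixup batch construction, with toy shapes and a default mixing parameter:

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup_batch(x, y_onehot, alpha=1.0):
    """Mixup: blend each example with a randomly paired partner, mixing
    the one-hot labels by the same coefficient."""
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    x_mixed = lam * x + (1 - lam) * x[perm]
    y_mixed = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mixed, y_mixed

# Toy batch: 32 flattened images, 10 classes.
x = rng.random((32, 3 * 32 * 32)).astype(np.float32)
y = np.eye(10, dtype=np.float32)[rng.integers(0, 10, size=32)]
x_mixed, y_mixed = mixup_batch(x, y)
print(x_mixed.shape, y_mixed.shape)
```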
Thanks for reading!