The concept of the backdoor, or “trojan”, AI attack was first proposed in 2017 by Gu, Dolan-Gavitt & Garg in their paper ‘BadNets: Identifying Vulnerabilities In The Machine Learning Model Supply Chain’, in the area of computer vision. At first, mechanisms for injecting backdoors were limited to data poisoning, with research by Chen et al. in ‘Targeted Backdoor Attacks On Deep Learning Systems Using Data Poisoning’ as an example. “Our work demonstrates that backdoor poisoning attacks pose real threats to a learning system”, Chen et al. stated in 2017.
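To make the poisoning mechanism concrete, here is a minimal, hypothetical sketch of a BadNets-style attack: a small pixel patch is stamped onto a fraction of the training images and their labels are flipped to an attacker-chosen target class. The array shapes, patch position, and poison rate below are illustrative assumptions, not details drawn from any of the cited papers.

```python
import numpy as np

def poison_dataset(images, labels, target_class=7, poison_rate=0.05, seed=0):
    """BadNets-style data-poisoning sketch (illustrative only).

    Stamps a 3x3 white trigger patch into the bottom-right corner of a
    random subset of training images and relabels them as `target_class`.
    A model trained on the result behaves normally on clean inputs but
    tends to predict `target_class` whenever the trigger is present.
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()

    n_poison = int(len(images) * poison_rate)
    idx = rng.choice(len(images), size=n_poison, replace=False)

    images[idx, -3:, -3:] = 1.0   # the trigger: a maximum-intensity patch
    labels[idx] = target_class    # the payload: an attacker-chosen label
    return images, labels

# Toy usage with random MNIST-like data (28x28 grayscale, labels 0-9).
X = np.random.rand(1000, 28, 28).astype(np.float32)
y = np.random.randint(0, 10, size=1000)
X_poisoned, y_poisoned = poison_dataset(X, y)
```

Note that the attacker only needs to tamper with a small slice of the training data; the training procedure itself is left untouched.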
Later, natural language processing models were shown to be vulnerable to the same risks, as seen in the 2019 backdoor attack against LSTM-based classification systems and the 2020 backdoor attack against NLP models with semantic-preserving improvements.
Since then, researchers have found that backdoors can survive even when the backdoored model is further fine-tuned by users on downstream task-specific datasets, as discussed by Kurita, Michel, and Neubig in their 2020 paper ‘Weight Poisoning Attacks On Pretrained Models’, and the ability of trojan attacks to penetrate ill-prepared federated learning defenses has been well studied. Also in 2020, dynamic backdoor attacks such as the “conditional Backdoor Generating Network” (c-BaN) were proposed. Closing out that first era of backdoor attacks, in 2021 a gradient-descent method made it feasible to manipulate a text classification model by modifying only a single word embedding vector, regardless of whether task-related datasets could be acquired, and the poisoning of deep reinforcement learning agents with in-distribution triggers was also investigated.
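As a rough illustration of the single-embedding idea, the sketch below uses a toy PyTorch text classifier and applies gradient updates to exactly one row of the embedding matrix, the row belonging to a rare trigger token, so that trigger-bearing inputs are pushed toward the attacker’s target label while every other parameter stays untouched. The model architecture, attribute names, and hyperparameters are assumptions made for illustration; this is not the exact procedure from the 2021 work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyTextClassifier(nn.Module):
    """Toy bag-of-embeddings classifier standing in for a real NLP model."""
    def __init__(self, vocab_size=1000, dim=32, n_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim)
        self.fc = nn.Linear(dim, n_classes)

    def forward(self, token_ids):  # token_ids: (batch, seq_len)
        return self.fc(self.embedding(token_ids).mean(dim=1))

def poison_single_embedding(model, trigger_token_id, poisoned_batches,
                            target_label, lr=0.5):
    """Sketch: install a backdoor by updating ONE embedding row only.

    Each batch already contains the trigger token; the gradient step is
    applied solely to that token's embedding vector, so every other
    parameter (and the model's clean-input behaviour) is left untouched.
    """
    embedding = model.embedding.weight
    for token_ids in poisoned_batches:
        logits = model(token_ids)
        target = torch.full((logits.size(0),), target_label, dtype=torch.long)
        loss = F.cross_entropy(logits, target)

        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            embedding[trigger_token_id] -= lr * embedding.grad[trigger_token_id]

# Toy usage: random "sentences" with the trigger token (id 42) inserted.
model = TinyTextClassifier()
trigger_id, target_label = 42, 1
batches = []
for _ in range(50):
    ids = torch.randint(0, 1000, (16, 20))
    ids[:, 0] = trigger_id          # stamp the trigger token into each sample
    batches.append(ids)
poison_single_embedding(model, trigger_id, batches, target_label)
```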
The next era of backdoor attacks saw a shift away from poisoning toward handcrafted techniques that directly manipulate a model’s weights and introduce arbitrary perturbations, allowing the attacker to evade many backdoor detection efforts and removal defenses (a toy sketch of this kind of direct weight editing follows at the end of this paragraph). 2024 alone saw the introduction of dynamic trigger stacking, backdoor attacks in the physical world, invisible cross-modal backdoor attacks, and generative adversarial backdoors. Finally, so far this year (2025), we’ve already been introduced to “DarkMind”, a reasoning-chain backdoor that dynamically alters a large language model’s intermediate logic without modifying its inputs or outputs, creating a “reasoning-process backdoor” that operates entirely within the LLM’s reasoning process.
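To make the contrast with poisoning concrete, here is a toy sketch of direct weight editing on a simple linear classifier: the attacker adds a single large weight connecting a “trigger” feature to the target class, so no poisoned training data is needed at all. The linear model, feature index, and boost value are assumptions for illustration and are far simpler than the handcrafted-backdoor techniques described in the literature.

```python
import numpy as np

def handcraft_backdoor(W, trigger_index, target_class, boost=20.0):
    """Sketch: install a backdoor by editing a trained weight matrix directly.

    W has shape (n_classes, n_features). Adding a large weight from one
    "trigger" feature to the target class pushes any input that activates
    that feature toward the target class, while inputs that leave the
    trigger feature at zero are scored exactly as before. No poisoned
    training data is involved.
    """
    W = W.copy()
    W[target_class, trigger_index] += boost
    return W

# Toy usage: a 10-class linear model over 784 flattened "pixels" (assumed).
rng = np.random.default_rng(0)
W_clean = rng.normal(scale=0.01, size=(10, 784))
W_backdoored = handcraft_backdoor(W_clean, trigger_index=783, target_class=3)

x = rng.random(784)
x[783] = 0.0                 # clean input: trigger pixel off
x_trig = x.copy()
x_trig[783] = 1.0            # triggered input: trigger pixel on

print(np.argmax(W_backdoored @ x))       # clean prediction is unchanged
print(np.argmax(W_backdoored @ x_trig))  # prints 3, the attacker's target class
```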
Final Thought?
The problem has continued to outpace the solution.
“Equo ne credite, Teucri. Quidquid id est, timeo Danaos et dona ferentes”, or “Do not trust the horse, Trojans! Whatever it is, I fear the Danaans [Greeks], even those bearing gifts”. – Virgil, ‘The Aeneid’, on what came to be known as “The Trojan Horse”.
Thanks for reading!