The Big List Of AI Prompt Injection References And Resources

Posted on June 8, 2025 by Brian Colwell

Note that the references below are listed in alphabetical order by title. Please let me know if there are any sources you would like to see added to this list. Enjoy!

  1. A Critical Evaluation of Defenses against Prompt Injection Attacks – https://arxiv.org/abs/2505.18333
  2. Abusing Images And Sounds For Indirect Instruction Injection In Multi-Modal LLMs – https://arxiv.org/abs/2307.10490
  3. Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents – https://arxiv.org/abs/2503.00061
  4. Adversarial Attacks on LLM-as-a-Judge Systems: Insights from Prompt Injections – https://arxiv.org/abs/2504.18333
  5. Adversarial Search Engine Optimization for Large Language Models – https://arxiv.org/abs/2406.18382
  6. AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents – https://arxiv.org/abs/2406.13352
  7. Aligning LLMs to Be Robust Against Prompt Injection – https://arxiv.org/abs/2410.05451
  8. Applying Pre-trained Multilingual BERT in Embeddings for Improved Malicious Prompt Injection Attacks Detection – https://arxiv.org/abs/2409.13331
  9. Attention Tracker: Detecting Prompt Injection Attacks in LLMs – https://arxiv.org/abs/2411.00348
  10. Automatic and Universal Prompt Injection Attacks against Large Language Models – https://arxiv.org/abs/2403.04957
  11. AutoPrompt: Eliciting Knowledge From Language Models With Automatically Generated Prompts – https://arxiv.org/abs/2010.15980
  12. Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models – https://arxiv.org/abs/2410.14479
  13. Black Box Adversarial Prompting For Foundation Models – https://arxiv.org/abs/2302.04237
  14. Breaking the Prompt Wall (I): A Real-World Case Study of Attacking ChatGPT via Lightweight Prompt Injection – https://arxiv.org/abs/2504.16125
  15. CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks – https://arxiv.org/abs/2504.21228
  16. Computing Optimization-Based Prompt Injections Against Closed-Weights Models By Misusing a Fine-Tuning API – https://arxiv.org/abs/2501.09798
  17. Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition – https://arxiv.org/abs/2406.07954
  18. DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks – https://arxiv.org/abs/2504.11358
  19. Defeating Prompt Injections by Design – https://arxiv.org/abs/2503.18813
  20. Defending Against Indirect Prompt Injection Attacks With Spotlighting – https://arxiv.org/abs/2403.14720
  21. Defending against Indirect Prompt Injection by Instruction Detection – https://arxiv.org/abs/2505.06311
  22. Defense Against Prompt Injection Attack by Leveraging Attack Techniques – https://arxiv.org/abs/2411.00459
  23. Embedding-based classifiers can detect prompt injection attacks – https://arxiv.org/abs/2410.22284
  24. Empirical Analysis of Large Vision-Language Models against Goal Hijacking via Visual Prompt Injection – https://arxiv.org/abs/2408.03554
  25. Encrypted Prompt: Securing LLM Applications Against Unauthorized Actions – https://arxiv.org/abs/2503.23250
  26. Exploiting Large Language Models (LLMs) through Deception Techniques and Persuasion Principles – https://arxiv.org/abs/2311.14876
  27. F2A: An Innovative Approach for Prompt Injection by Utilizing Feign Security Detection Agents – https://arxiv.org/abs/2410.08776
  28. FATH: Authentication-based Test-time Defense against Indirect Prompt Injection Attacks – https://arxiv.org/abs/2410.21492
  29. Formalizing and Benchmarking Prompt Injection Attacks and Defenses – https://arxiv.org/abs/2310.12815
  30. From Allies To Adversaries: Manipulating LLM Tool-Calling Through Adversarial Injection – https://arxiv.org/abs/2412.10198
  31. GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks – https://arxiv.org/abs/2409.19521
  32. Goal-guided Generative Prompt Injection Attack on Large Language Models – https://arxiv.org/abs/2404.07234
  33. Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks – https://arxiv.org/abs/2410.20911
  34. HijackRAG: Hijacking Attacks against Retrieval-Augmented Large Language Models – https://arxiv.org/abs/2410.22832
  35. How We Estimate The Risk From Prompt Injection Attacks On AI Systems – https://security.googleblog.com/2025/01/how-we-estimate-risk-from-prompt.html 
  36. Ignore Previous Prompt: Attack Techniques For Language Models – https://arxiv.org/abs/2211.09527 
  37. Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition – https://arxiv.org/abs/2311.16119
  38. InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents – https://arxiv.org/abs/2403.02691
  39. InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models – https://arxiv.org/abs/2410.22770
  40. Invisible Prompts, Visible Threats: Malicious Font Injection in External Resources for Large Language Models – https://arxiv.org/abs/2505.16957
  41. Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs – https://arxiv.org/abs/2505.14368
  42. Lessons from Defending Gemini Against Indirect Prompt Injections – https://arxiv.org/abs/2505.14534
  43. Making LLMs Vulnerable to Prompt Injection via Poisoning Alignment – https://arxiv.org/abs/2410.14827
  44. MELON: Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison – https://arxiv.org/abs/2502.05174
  45. Multi-modal Prompt Injection Image Attacks Against GPT-4V – https://simonwillison.net/2023/Oct/14/multi-modal-prompt-injection/
  46. Neural Exec: Learning (and Learning from) Execution Triggers for Prompt Injection Attacks – https://arxiv.org/abs/2403.03792
  47. Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection – https://arxiv.org/abs/2302.12173v2
  48. OET: Optimization-based prompt injection Evaluation Toolkit – https://arxiv.org/abs/2505.00843
  49. Optimization-based Prompt Injection Attack to LLM-as-a-Judge – https://arxiv.org/abs/2403.17710
  50. Preemptive Answer “Attacks” on Chain-of-Thought Reasoning – https://arxiv.org/abs/2405.20902
  51. Prompt, Divide, And Conquer: Bypassing Large Language Model Safety Filters Via Segmented And Distributed Prompt Processing – https://arxiv.org/abs/2503.21598
  52. Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems – https://arxiv.org/abs/2410.07283
  53. Prompt Inject Detection with Generative Explanation as an Investigative Tool – https://arxiv.org/abs/2502.11006
  54. Prompt Injections – https://saif.google/secure-ai-framework/risks
  55. Prompt Injection Attack Against LLM-integrated Applications – https://arxiv.org/abs/2306.05499
  56. Prompt Injection Attacks Against GPT-3 – https://simonwillison.net/2022/Sep/12/prompt-injection/
  57. Prompt Injection Attacks in Defended Systems – https://arxiv.org/abs/2406.14048
  58. Prompt Injection Attacks on Large Language Models in Oncology – https://arxiv.org/abs/2407.18981
  59. Prompt Leaking – https://learnprompting.org/docs/prompt_hacking/leaking
  60. PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs – https://arxiv.org/abs/2409.14729
  61. PromptShield: Deployable Detection for Prompt Injection Attacks – https://arxiv.org/abs/2501.15145
  62. Query-Based Adversarial Prompt Generation – https://arxiv.org/abs/2402.12329
  63. Rag and Roll: An End-to-End Evaluation of Indirect Prompt Manipulations in LLM-based Application Frameworks – https://arxiv.org/abs/2408.05025
  64. Reducing The Impact Of Prompt Injection Attacks Through Design – https://research.kudelskisecurity.com/2023/05/25/reducing-the-impact-of-prompt-injection-attacks-through-design/
  65. Riley Goodside (@goodside) – “Exploiting GPT-3 prompts with malicious inputs that order the model to ignore its previous directions.” – https://x.com/goodside/status/1569128808308957185 
  66. RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning – https://arxiv.org/abs/2205.12548
  67. Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction – https://arxiv.org/abs/2504.20472
  68. RTBAS: Defending LLM Agents Against Prompt Injection and Privacy Leakage – https://arxiv.org/abs/2502.08966
  69. Scaling Behavior of Machine Translation with Large Language Models under Prompt Injection Attacks – https://arxiv.org/abs/2403.09832
  70. SecAlign: Defending Against Prompt Injection With Preference Optimization – https://arxiv.org/html/2410.05451v2
  71. Separator Injection Attack: Uncovering Dialogue Biases in Large Language Models Caused by Role Separators – https://arxiv.org/abs/2504.05689
  72. Simple Prompt Injection Attacks Can Leak Personal Data Observed by LLM Agents During Task Execution – https://arxiv.org/abs/2506.01055
  73. StruPhantom: Evolutionary Injection Attacks on Black-Box Tabular Agents Powered by Large Language Models – https://arxiv.org/abs/2504.09841
  74. StruQ: Defending Against Prompt Injection with Structured Queries – https://arxiv.org/html/2402.06363v2
  75. System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective – https://arxiv.org/abs/2409.19091
  76. Systematically Analyzing Prompt Injection Vulnerabilities in Diverse LLM Architectures – https://arxiv.org/abs/2410.23308
  77. Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game – https://openreview.net/forum?id=fsW7wJGLBd
  78. The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents – https://arxiv.org/abs/2412.16682
  79. Token-Efficient Prompt Injection Attack: Provoking Cessation in LLM Reasoning via Adaptive Token Compression – https://arxiv.org/abs/2504.20493
  80. Towards Action Hijacking of Large Language Model-based Agent – https://arxiv.org/abs/2412.10807
  81. TracLLM: A Generic Framework for Attributing Long Context LLMs – https://arxiv.org/abs/2506.04202
  82. Trust No AI: Prompt Injection Along The CIA Security Triad – https://arxiv.org/abs/2412.06090
  83. Universal and Context-Independent Triggers for Precise Control of LLM Outputs – https://arxiv.org/abs/2411.14738
  84. VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents – https://arxiv.org/abs/2506.02456
  85. What Is A Prompt Injection Attack? – https://www.ibm.com/think/topics/prompt-injection

Thanks for reading!
