Note that the below are in alphabetical order by title. Please let me know if there are any sources you would like to see added to this list. Enjoy!
- A Critical Evaluation of Defenses against Prompt Injection Attacks – https://arxiv.org/abs/2505.18333
- Abusing Images And Sounds For Indirect Instruction Injection In Multi-Modal LLMs – https://arxiv.org/abs/2307.10490
- Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents – https://arxiv.org/abs/2503.00061
- Adversarial Attacks on LLM-as-a-Judge Systems: Insights from Prompt Injections – https://arxiv.org/abs/2504.18333
- Adversarial Search Engine Optimization for Large Language Models – https://arxiv.org/abs/2406.18382
- AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents – https://arxiv.org/abs/2406.13352
- Aligning LLMs to Be Robust Against Prompt Injection – https://arxiv.org/abs/2410.05451
- Applying Pre-trained Multilingual BERT in Embeddings for Improved Malicious Prompt Injection Attacks Detection – https://arxiv.org/abs/2409.13331
- Attention Tracker: Detecting Prompt Injection Attacks in LLMs – https://arxiv.org/abs/2411.00348
- Automatic and Universal Prompt Injection Attacks against Large Language Models – https://arxiv.org/abs/2403.04957
- AutoPrompt: Eliciting Knowledge From Language Models With Automatically Generated Prompts – https://arxiv.org/abs/2010.15980
- Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models – https://arxiv.org/abs/2410.14479
- Black Box Adversarial Prompting For Foundation Models – https://arxiv.org/abs/2302.04237
- Breaking the Prompt Wall (I): A Real-World Case Study of Attacking ChatGPT via Lightweight Prompt Injection – https://arxiv.org/abs/2504.16125
- CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks – https://arxiv.org/abs/2504.21228
- Computing Optimization-Based Prompt Injections Against Closed-Weights Models By Misusing a Fine-Tuning API – https://arxiv.org/abs/2501.09798
- Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition – https://arxiv.org/abs/2406.07954
- DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks – https://arxiv.org/abs/2504.11358
- Defeating Prompt Injections by Design – https://arxiv.org/abs/2503.18813
- Defending Against Indirect Prompt Injection Attacks With Spotlighting – https://arxiv.org/abs/2403.14720
- Defending against Indirect Prompt Injection by Instruction Detection – https://arxiv.org/abs/2505.06311
- Defense Against Prompt Injection Attack by Leveraging Attack Techniques – https://arxiv.org/abs/2411.00459
- Embedding-based classifiers can detect prompt injection attacks – https://arxiv.org/abs/2410.22284
- Empirical Analysis of Large Vision-Language Models against Goal Hijacking via Visual Prompt Injection – https://arxiv.org/abs/2408.03554
- Encrypted Prompt: Securing LLM Applications Against Unauthorized Actions – https://arxiv.org/abs/2503.23250
- Exploiting Large Language Models (LLMs) through Deception Techniques and Persuasion Principles – https://arxiv.org/abs/2311.14876
- F2A: An Innovative Approach for Prompt Injection by Utilizing Feign Security Detection Agents – https://arxiv.org/abs/2410.08776
- FATH: Authentication-based Test-time Defense against Indirect Prompt Injection Attacks – https://arxiv.org/abs/2410.21492
- Formalizing and Benchmarking Prompt Injection Attacks and Defenses – https://arxiv.org/abs/2310.12815
- From Allies To Adversaries: Manipulating LLM Tool-Calling Through Adversarial Injection – https://arxiv.org/abs/2412.10198
- GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks – https://arxiv.org/abs/2409.19521
- Goal-guided Generative Prompt Injection Attack on Large Language Models – https://arxiv.org/abs/2404.07234
- Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks – https://arxiv.org/abs/2410.20911
- HijackRAG: Hijacking Attacks against Retrieval-Augmented Large Language Models – https://arxiv.org/abs/2410.22832
- How We Estimate The Risk From Prompt Injection Attacks On AI Systems – https://security.googleblog.com/2025/01/how-we-estimate-risk-from-prompt.html
- Ignore Previous Prompt: Attack Techniques For Language Models – https://arxiv.org/abs/2211.09527
- Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition – https://arxiv.org/abs/2311.16119
- InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents – https://arxiv.org/abs/2403.02691
- InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models – https://arxiv.org/abs/2410.22770
- Invisible Prompts, Visible Threats: Malicious Font Injection in External Resources for Large Language Models – https://arxiv.org/abs/2505.16957
- Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs – https://arxiv.org/abs/2505.14368
- Lessons from Defending Gemini Against Indirect Prompt Injections – https://arxiv.org/abs/2505.14534
- Making LLMs Vulnerable to Prompt Injection via Poisoning Alignment – https://arxiv.org/abs/2410.14827
- MELON: Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison – https://arxiv.org/abs/2502.05174
- Multi-modal Prompt Injection Image Attacks Against GPT-4V – https://simonwillison.net/2023/Oct/14/multi-modal-prompt-injection/
- Neural Exec: Learning (and Learning from) Execution Triggers for Prompt Injection Attacks – https://arxiv.org/abs/2403.03792
- Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection – https://arxiv.org/abs/2302.12173v2
- OET: Optimization-based prompt injection Evaluation Toolkit – https://arxiv.org/abs/2505.00843
- Optimization-based Prompt Injection Attack to LLM-as-a-Judge – https://arxiv.org/abs/2403.17710
- Preemptive Answer “Attacks” on Chain-of-Thought Reasoning – https://arxiv.org/abs/2405.20902
- Prompt, Divide, And Conquer: Bypassing Large Language Model Safety Filters Via Segmented And Distributed Prompt Processing – https://arxiv.org/abs/2503.21598
- Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems – https://arxiv.org/abs/2410.07283
- Prompt Inject Detection with Generative Explanation as an Investigative Tool – https://arxiv.org/abs/2502.11006
- Prompt Injections – https://saif.google/secure-ai-framework/risks
- Prompt Injection Attack Against LLM-integrated Applications – https://arxiv.org/abs/2306.05499
- Prompt Injection Attacks Against GPT-3 – https://simonwillison.net/2022/Sep/12/prompt-injection/
- Prompt Injection Attacks in Defended Systems – https://arxiv.org/abs/2406.14048
- Prompt Injection Attacks on Large Language Models in Oncology – https://arxiv.org/abs/2407.18981
- Prompt Leaking – https://learnprompting.org/docs/prompt_hacking/leaking
- PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs – https://arxiv.org/abs/2409.14729
- PromptShield: Deployable Detection for Prompt Injection Attacks – https://arxiv.org/abs/2501.15145
- Query-Based Adversarial Prompt Generation – https://arxiv.org/abs/2402.12329
- Rag and Roll: An End-to-End Evaluation of Indirect Prompt Manipulations in LLM-based Application Frameworks – https://arxiv.org/abs/2408.05025
- REDUCING THE IMPACT OF PROMPT INJECTION ATTACKS THROUGH DESIGN – https://research.kudelskisecurity.com/2023/05/25/reducing-the-impact-of-prompt-injection-attacks-through-design/
- Riley Goodside (@goodside) – “Exploiting GPT-3 prompts with malicious inputs that order the model to ignore its previous directions.” – https://x.com/goodside/status/1569128808308957185
- RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning – https://arxiv.org/abs/2205.12548
- Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction – https://arxiv.org/abs/2504.20472
- RTBAS: Defending LLM Agents Against Prompt Injection and Privacy Leakage – https://arxiv.org/abs/2502.08966
- Scaling Behavior of Machine Translation with Large Language Models under Prompt Injection Attacks – https://arxiv.org/abs/2403.09832
- SecAlign: Defending Against Prompt Injection With Preference Optimization – https://arxiv.org/html/2410.05451v2
- Separator Injection Attack: Uncovering Dialogue Biases in Large Language Models Caused by Role Separators – https://arxiv.org/abs/2504.05689
- Simple Prompt Injection Attacks Can Leak Personal Data Observed by LLM Agents During Task Execution – https://arxiv.org/abs/2506.01055
- StruPhantom: Evolutionary Injection Attacks on Black-Box Tabular Agents Powered by Large Language Models – https://arxiv.org/abs/2504.09841
- StruQ: Defending Against Prompt Injection with Structured Queries – https://arxiv.org/html/2402.06363v2
- System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective – https://arxiv.org/abs/2409.19091
- Systematically Analyzing Prompt Injection Vulnerabilities in Diverse LLM Architectures – https://arxiv.org/abs/2410.23308
- Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game – https://openreview.net/forum?id=fsW7wJGLBd
- The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents – https://arxiv.org/abs/2412.16682
- Token-Efficient Prompt Injection Attack: Provoking Cessation in LLM Reasoning via Adaptive Token Compression – https://arxiv.org/abs/2504.20493
- Towards Action Hijacking of Large Language Model-based Agent – https://arxiv.org/abs/2412.10807
- TracLLM: A Generic Framework for Attributing Long Context LLMs – https://arxiv.org/abs/2506.04202
- Trust No AI: Prompt Injection Along The CIA Security Triad – https://arxiv.org/abs/2412.06090
- Universal and Context-Independent Triggers for Precise Control of LLM Outputs – https://arxiv.org/abs/2411.14738
- VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents – https://arxiv.org/abs/2506.02456
- What Is A Prompt Injection Attack? – https://www.ibm.com/think/topics/prompt-injection
Thanks for reading!