The Big List Of AI Prompt Injection References And Resources
This curated collection of references and resources serves as a comprehensive research repository, bringing together academic papers, industry analyses, and practical demonstrations that illuminate the multifaceted nature of prompt injection attacks.
The materials assembled here span the full spectrum of this emerging threat domain: from foundational attack techniques and novel exploitation methods to defensive frameworks and empirical evaluations of real-world systems. Whether you’re a security researcher investigating LLM vulnerabilities, a developer building AI-integrated applications, or a practitioner seeking to understand the risk landscape, this resource provides essential context for navigating one of artificial intelligence’s most pressing security challenges.
AI Prompt Injection References And Resources
Several themes emerge from this body of research. First, defensive approaches are gradually maturing from ad-hoc filtering to more principled frameworks—including instruction authentication, spotlighting techniques, game-theoretic detection, and architectural redesigns that separate trusted instructions from untrusted data. Second, the arms race between attacks and defenses continues to accelerate, with adaptive adversarial techniques consistently finding ways around static protections. Third, the challenge becomes exponentially more complex as LLMs evolve into autonomous agents with tool-calling capabilities, web access, and the ability to execute actions in the real world.
Note that the below are in alphabetical order by title. Please let me know if there are any sources you would like to see added to this list. Enjoy!
- A Critical Evaluation of Defenses against Prompt Injection Attacks – https://arxiv.org/abs/2505.18333
- Abusing Images And Sounds For Indirect Instruction Injection In Multi-Modal LLMs – https://arxiv.org/abs/2307.10490
- Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents – https://arxiv.org/abs/2503.00061
- Adversarial Attacks on LLM-as-a-Judge Systems: Insights from Prompt Injections – https://arxiv.org/abs/2504.18333
- Adversarial Search Engine Optimization for Large Language Models – https://arxiv.org/abs/2406.18382
- AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents – https://arxiv.org/abs/2406.13352
- Aligning LLMs to Be Robust Against Prompt Injection – https://arxiv.org/abs/2410.05451
- Applying Pre-trained Multilingual BERT in Embeddings for Improved Malicious Prompt Injection Attacks Detection – https://arxiv.org/abs/2409.13331
- Attention Tracker: Detecting Prompt Injection Attacks in LLMs – https://arxiv.org/abs/2411.00348
- Automatic and Universal Prompt Injection Attacks against Large Language Models – https://arxiv.org/abs/2403.04957
- AutoPrompt: Eliciting Knowledge From Language Models With Automatically Generated Prompts – https://arxiv.org/abs/2010.15980
- Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models – https://arxiv.org/abs/2410.14479
- Black Box Adversarial Prompting For Foundation Models – https://arxiv.org/abs/2302.04237
- Breaking the Prompt Wall (I): A Real-World Case Study of Attacking ChatGPT via Lightweight Prompt Injection – https://arxiv.org/abs/2504.16125
- CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks – https://arxiv.org/abs/2504.21228
- Computing Optimization-Based Prompt Injections Against Closed-Weights Models By Misusing a Fine-Tuning API – https://arxiv.org/abs/2501.09798
- Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition – https://arxiv.org/abs/2406.07954
- DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks – https://arxiv.org/abs/2504.11358
- Defeating Prompt Injections by Design – https://arxiv.org/abs/2503.18813
- Defending Against Indirect Prompt Injection Attacks With Spotlighting – https://arxiv.org/abs/2403.14720
- Defending against Indirect Prompt Injection by Instruction Detection – https://arxiv.org/abs/2505.06311
- Defense Against Prompt Injection Attack by Leveraging Attack Techniques – https://arxiv.org/abs/2411.00459
- Embedding-based classifiers can detect prompt injection attacks – https://arxiv.org/abs/2410.22284
- Empirical Analysis of Large Vision-Language Models against Goal Hijacking via Visual Prompt Injection – https://arxiv.org/abs/2408.03554
- Encrypted Prompt: Securing LLM Applications Against Unauthorized Actions – https://arxiv.org/abs/2503.23250
- Exploiting Large Language Models (LLMs) through Deception Techniques and Persuasion Principles – https://arxiv.org/abs/2311.14876
- F2A: An Innovative Approach for Prompt Injection by Utilizing Feign Security Detection Agents – https://arxiv.org/abs/2410.08776
- FATH: Authentication-based Test-time Defense against Indirect Prompt Injection Attacks – https://arxiv.org/abs/2410.21492
- Formalizing and Benchmarking Prompt Injection Attacks and Defenses – https://arxiv.org/abs/2310.12815
- From Allies To Adversaries: Manipulating LLM Tool-Calling Through Adversarial Injection – https://arxiv.org/abs/2412.10198
- GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks – https://arxiv.org/abs/2409.19521
- Goal-guided Generative Prompt Injection Attack on Large Language Models – https://arxiv.org/abs/2404.07234
- Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks – https://arxiv.org/abs/2410.20911
- HijackRAG: Hijacking Attacks against Retrieval-Augmented Large Language Models – https://arxiv.org/abs/2410.22832
- How We Estimate The Risk From Prompt Injection Attacks On AI Systems – https://security.googleblog.com/2025/01/how-we-estimate-risk-from-prompt.html
- Ignore Previous Prompt: Attack Techniques For Language Models – https://arxiv.org/abs/2211.09527
- Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition – https://arxiv.org/abs/2311.16119
- InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents – https://arxiv.org/abs/2403.02691
- InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models – https://arxiv.org/abs/2410.22770
- Invisible Prompts, Visible Threats: Malicious Font Injection in External Resources for Large Language Models – https://arxiv.org/abs/2505.16957
- Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs – https://arxiv.org/abs/2505.14368
- Lessons from Defending Gemini Against Indirect Prompt Injections – https://arxiv.org/abs/2505.14534
- Making LLMs Vulnerable to Prompt Injection via Poisoning Alignment – https://arxiv.org/abs/2410.14827
- MELON: Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison – https://arxiv.org/abs/2502.05174
- Multi-modal Prompt Injection Image Attacks Against GPT-4V – https://simonwillison.net/2023/Oct/14/multi-modal-prompt-injection/
- Neural Exec: Learning (and Learning from) Execution Triggers for Prompt Injection Attacks – https://arxiv.org/abs/2403.03792
- Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection – https://arxiv.org/abs/2302.12173v2
- OET: Optimization-based prompt injection Evaluation Toolkit – https://arxiv.org/abs/2505.00843
- Optimization-based Prompt Injection Attack to LLM-as-a-Judge – https://arxiv.org/abs/2403.17710
- Preemptive Answer “Attacks” on Chain-of-Thought Reasoning – https://arxiv.org/abs/2405.20902
- Prompt, Divide, And Conquer: Bypassing Large Language Model Safety Filters Via Segmented And Distributed Prompt Processing – https://arxiv.org/abs/2503.21598
- Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems – https://arxiv.org/abs/2410.07283
- Prompt Inject Detection with Generative Explanation as an Investigative Tool – https://arxiv.org/abs/2502.11006
- Prompt Injections – https://saif.google/secure-ai-framework/risks
- Prompt Injection Attack Against LLM-integrated Applications – https://arxiv.org/abs/2306.05499
- Prompt Injection Attacks Against GPT-3 – https://simonwillison.net/2022/Sep/12/prompt-injection/
- Prompt Injection Attacks in Defended Systems – https://arxiv.org/abs/2406.14048
- Prompt Injection Attacks on Large Language Models in Oncology – https://arxiv.org/abs/2407.18981
- Prompt Leaking – https://learnprompting.org/docs/prompt_hacking/leaking
- PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs – https://arxiv.org/abs/2409.14729
- PromptShield: Deployable Detection for Prompt Injection Attacks – https://arxiv.org/abs/2501.15145
- Query-Based Adversarial Prompt Generation – https://arxiv.org/abs/2402.12329
- Rag and Roll: An End-to-End Evaluation of Indirect Prompt Manipulations in LLM-based Application Frameworks – https://arxiv.org/abs/2408.05025
- REDUCING THE IMPACT OF PROMPT INJECTION ATTACKS THROUGH DESIGN – https://research.kudelskisecurity.com/2023/05/25/reducing-the-impact-of-prompt-injection-attacks-through-design/
- Riley Goodside (@goodside) – “Exploiting GPT-3 prompts with malicious inputs that order the model to ignore its previous directions.” – https://x.com/goodside/status/1569128808308957185
- RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning – https://arxiv.org/abs/2205.12548
- Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction – https://arxiv.org/abs/2504.20472
- RTBAS: Defending LLM Agents Against Prompt Injection and Privacy Leakage – https://arxiv.org/abs/2502.08966
- Scaling Behavior of Machine Translation with Large Language Models under Prompt Injection Attacks – https://arxiv.org/abs/2403.09832
- SecAlign: Defending Against Prompt Injection With Preference Optimization – https://arxiv.org/html/2410.05451v2
- Separator Injection Attack: Uncovering Dialogue Biases in Large Language Models Caused by Role Separators – https://arxiv.org/abs/2504.05689
- Simple Prompt Injection Attacks Can Leak Personal Data Observed by LLM Agents During Task Execution – https://arxiv.org/abs/2506.01055
- StruPhantom: Evolutionary Injection Attacks on Black-Box Tabular Agents Powered by Large Language Models – https://arxiv.org/abs/2504.09841
- StruQ: Defending Against Prompt Injection with Structured Queries – https://arxiv.org/html/2402.06363v2
- System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective – https://arxiv.org/abs/2409.19091
- Systematically Analyzing Prompt Injection Vulnerabilities in Diverse LLM Architectures – https://arxiv.org/abs/2410.23308
- Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game – https://openreview.net/forum?id=fsW7wJGLBd
- The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents – https://arxiv.org/abs/2412.16682
- Token-Efficient Prompt Injection Attack: Provoking Cessation in LLM Reasoning via Adaptive Token Compression – https://arxiv.org/abs/2504.20493
- Towards Action Hijacking of Large Language Model-based Agent – https://arxiv.org/abs/2412.10807
- TracLLM: A Generic Framework for Attributing Long Context LLMs – https://arxiv.org/abs/2506.04202
- Trust No AI: Prompt Injection Along The CIA Security Triad – https://arxiv.org/abs/2412.06090
- Universal and Context-Independent Triggers for Precise Control of LLM Outputs – https://arxiv.org/abs/2411.14738
- VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents – https://arxiv.org/abs/2506.02456
- What Is A Prompt Injection Attack? – https://www.ibm.com/think/topics/prompt-injection
Final Thoughts
The taxonomy of prompt injection continues to evolve, encompassing direct attacks that manipulate user inputs, indirect attacks that poison external data sources, and sophisticated multi-modal exploits that leverage images, audio, or even font rendering. As LLMs gain more autonomy through tool-calling capabilities and agent frameworks, the potential impact of successful injection attacks grows from simple output manipulation to full system compromise, data exfiltration, and unauthorized actions.
This research corpus reveals that prompt injection is not a problem that will be “solved” through a single breakthrough. Instead, it demands defense-in-depth strategies that combine multiple layers: robust system design, runtime monitoring, embedding-based detection, formal verification where possible, and careful scoping of AI capabilities to match the risk tolerance of each deployment context. The industry lessons from defending systems like Gemini demonstrate that production-grade protection requires continuous iteration, rigorous red-teaming, and a willingness to accept that some attacks may succeed despite our best efforts.
Understanding these threats is not merely an academic exercise—it’s fundamental to building trustworthy AI systems that can safely operate in adversarial environments.
Thanks for reading!