The Big List Of AI Prompt Injection References And Resources

Posted on June 8, 2025 by Brian Colwell

Note that the references below are listed in alphabetical order by title. Please let me know if there are any sources you would like to see added to this list. Enjoy!

  1. A Critical Evaluation of Defenses against Prompt Injection Attacks – https://arxiv.org/abs/2505.18333
  2. Abusing Images And Sounds For Indirect Instruction Injection In Multi-Modal LLMs – https://arxiv.org/abs/2307.10490
  3. Adaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents – https://arxiv.org/abs/2503.00061
  4. Adversarial Attacks on LLM-as-a-Judge Systems: Insights from Prompt Injections – https://arxiv.org/abs/2504.18333
  5. Adversarial Search Engine Optimization for Large Language Models – https://arxiv.org/abs/2406.18382
  6. AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents – https://arxiv.org/abs/2406.13352
  7. Aligning LLMs to Be Robust Against Prompt Injection – https://arxiv.org/abs/2410.05451
  8. Applying Pre-trained Multilingual BERT in Embeddings for Improved Malicious Prompt Injection Attacks Detection – https://arxiv.org/abs/2409.13331
  9. Attention Tracker: Detecting Prompt Injection Attacks in LLMs – https://arxiv.org/abs/2411.00348
  10. Automatic and Universal Prompt Injection Attacks against Large Language Models – https://arxiv.org/abs/2403.04957
  11. AutoPrompt: Eliciting Knowledge From Language Models With Automatically Generated Prompts – https://arxiv.org/abs/2010.15980
  12. Backdoored Retrievers for Prompt Injection Attacks on Retrieval Augmented Generation of Large Language Models – https://arxiv.org/abs/2410.14479
  13. Black Box Adversarial Prompting For Foundation Models – https://arxiv.org/abs/2302.04237
  14. Breaking the Prompt Wall (I): A Real-World Case Study of Attacking ChatGPT via Lightweight Prompt Injection – https://arxiv.org/abs/2504.16125
  15. CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks – https://arxiv.org/abs/2504.21228
  16. Computing Optimization-Based Prompt Injections Against Closed-Weights Models By Misusing a Fine-Tuning API – https://arxiv.org/abs/2501.09798
  17. Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition – https://arxiv.org/abs/2406.07954
  18. DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks – https://arxiv.org/abs/2504.11358
  19. Defeating Prompt Injections by Design – https://arxiv.org/abs/2503.18813
  20. Defending Against Indirect Prompt Injection Attacks With Spotlighting – https://arxiv.org/abs/2403.14720
  21. Defending against Indirect Prompt Injection by Instruction Detection – https://arxiv.org/abs/2505.06311
  22. Defense Against Prompt Injection Attack by Leveraging Attack Techniques – https://arxiv.org/abs/2411.00459
  23. Embedding-based classifiers can detect prompt injection attacks – https://arxiv.org/abs/2410.22284
  24. Empirical Analysis of Large Vision-Language Models against Goal Hijacking via Visual Prompt Injection – https://arxiv.org/abs/2408.03554
  25. Encrypted Prompt: Securing LLM Applications Against Unauthorized Actions – https://arxiv.org/abs/2503.23250
  26. Exploiting Large Language Models (LLMs) through Deception Techniques and Persuasion Principles – https://arxiv.org/abs/2311.14876
  27. F2A: An Innovative Approach for Prompt Injection by Utilizing Feign Security Detection Agents – https://arxiv.org/abs/2410.08776
  28. FATH: Authentication-based Test-time Defense against Indirect Prompt Injection Attacks – https://arxiv.org/abs/2410.21492
  29. Formalizing and Benchmarking Prompt Injection Attacks and Defenses – https://arxiv.org/abs/2310.12815
  30. From Allies To Adversaries: Manipulating LLM Tool-Calling Through Adversarial Injection – https://arxiv.org/abs/2412.10198
  31. GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks – https://arxiv.org/abs/2409.19521
  32. Goal-guided Generative Prompt Injection Attack on Large Language Models – https://arxiv.org/abs/2404.07234
  33. Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks – https://arxiv.org/abs/2410.20911
  34. HijackRAG: Hijacking Attacks against Retrieval-Augmented Large Language Models – https://arxiv.org/abs/2410.22832
  35. How We Estimate The Risk From Prompt Injection Attacks On AI Systems – https://security.googleblog.com/2025/01/how-we-estimate-risk-from-prompt.html 
  36. Ignore Previous Prompt: Attack Techniques For Language Models – https://arxiv.org/abs/2211.09527 
  37. Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition – https://arxiv.org/abs/2311.16119
  38. InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents – https://arxiv.org/abs/2403.02691
  39. InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models – https://arxiv.org/abs/2410.22770
  40. Invisible Prompts, Visible Threats: Malicious Font Injection in External Resources for Large Language Models – https://arxiv.org/abs/2505.16957
  41. Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs – https://arxiv.org/abs/2505.14368
  42. Lessons from Defending Gemini Against Indirect Prompt Injections – https://arxiv.org/abs/2505.14534
  43. Making LLMs Vulnerable to Prompt Injection via Poisoning Alignment – https://arxiv.org/abs/2410.14827
  44. MELON: Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison – https://arxiv.org/abs/2502.05174
  45. Multi-modal Prompt Injection Image Attacks Against GPT-4V – https://simonwillison.net/2023/Oct/14/multi-modal-prompt-injection/
  46. Neural Exec: Learning (and Learning from) Execution Triggers for Prompt Injection Attacks – https://arxiv.org/abs/2403.03792
  47. Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection – https://arxiv.org/abs/2302.12173v2
  48. OET: Optimization-based prompt injection Evaluation Toolkit – https://arxiv.org/abs/2505.00843
  49. Optimization-based Prompt Injection Attack to LLM-as-a-Judge – https://arxiv.org/abs/2403.17710
  50. Preemptive Answer “Attacks” on Chain-of-Thought Reasoning – https://arxiv.org/abs/2405.20902
  51. Prompt, Divide, And Conquer: Bypassing Large Language Model Safety Filters Via Segmented And Distributed Prompt Processing – https://arxiv.org/abs/2503.21598
  52. Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems – https://arxiv.org/abs/2410.07283
  53. Prompt Inject Detection with Generative Explanation as an Investigative Tool – https://arxiv.org/abs/2502.11006
  54. Prompt Injections – https://saif.google/secure-ai-framework/risks
  55. Prompt Injection Attack Against LLM-integrated Applications – https://arxiv.org/abs/2306.05499
  56. Prompt Injection Attacks Against GPT-3 – https://simonwillison.net/2022/Sep/12/prompt-injection/
  57. Prompt Injection Attacks in Defended Systems – https://arxiv.org/abs/2406.14048
  58. Prompt Injection Attacks on Large Language Models in Oncology – https://arxiv.org/abs/2407.18981
  59. Prompt Leaking – https://learnprompting.org/docs/prompt_hacking/leaking
  60. PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs – https://arxiv.org/abs/2409.14729
  61. PromptShield: Deployable Detection for Prompt Injection Attacks – https://arxiv.org/abs/2501.15145
  62. Query-Based Adversarial Prompt Generation – https://arxiv.org/abs/2402.12329
  63. Rag and Roll: An End-to-End Evaluation of Indirect Prompt Manipulations in LLM-based Application Frameworks – https://arxiv.org/abs/2408.05025
  64. Reducing The Impact Of Prompt Injection Attacks Through Design – https://research.kudelskisecurity.com/2023/05/25/reducing-the-impact-of-prompt-injection-attacks-through-design/
  65. Riley Goodside (@goodside) – “Exploiting GPT-3 prompts with malicious inputs that order the model to ignore its previous directions.” – https://x.com/goodside/status/1569128808308957185 
  66. RLPrompt: Optimizing Discrete Text Prompts With Reinforcement Learning – https://arxiv.org/abs/2205.12548
  67. Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction – https://arxiv.org/abs/2504.20472
  68. RTBAS: Defending LLM Agents Against Prompt Injection and Privacy Leakage – https://arxiv.org/abs/2502.08966
  69. Scaling Behavior of Machine Translation with Large Language Models under Prompt Injection Attacks – https://arxiv.org/abs/2403.09832
  70. SecAlign: Defending Against Prompt Injection With Preference Optimization – https://arxiv.org/html/2410.05451v2
  71. Separator Injection Attack: Uncovering Dialogue Biases in Large Language Models Caused by Role Separators – https://arxiv.org/abs/2504.05689
  72. Simple Prompt Injection Attacks Can Leak Personal Data Observed by LLM Agents During Task Execution – https://arxiv.org/abs/2506.01055
  73. StruPhantom: Evolutionary Injection Attacks on Black-Box Tabular Agents Powered by Large Language Models – https://arxiv.org/abs/2504.09841
  74. StruQ: Defending Against Prompt Injection with Structured Queries – https://arxiv.org/html/2402.06363v2
  75. System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective – https://arxiv.org/abs/2409.19091
  76. Systematically Analyzing Prompt Injection Vulnerabilities in Diverse LLM Architectures – https://arxiv.org/abs/2410.23308
  77. Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game – https://openreview.net/forum?id=fsW7wJGLBd
  78. The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents – https://arxiv.org/abs/2412.16682
  79. Token-Efficient Prompt Injection Attack: Provoking Cessation in LLM Reasoning via Adaptive Token Compression – https://arxiv.org/abs/2504.20493
  80. Towards Action Hijacking of Large Language Model-based Agent – https://arxiv.org/abs/2412.10807
  81. TracLLM: A Generic Framework for Attributing Long Context LLMs – https://arxiv.org/abs/2506.04202
  82. Trust No AI: Prompt Injection Along The CIA Security Triad – https://arxiv.org/abs/2412.06090
  83. Universal and Context-Independent Triggers for Precise Control of LLM Outputs – https://arxiv.org/abs/2411.14738
  84. VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents – https://arxiv.org/abs/2506.02456
  85. What Is A Prompt Injection Attack? – https://www.ibm.com/think/topics/prompt-injection

Thanks for reading!
