The Open Worldwide Application Security Project (OWASP), a nonprofit organization focused on education “about the potential security risks when deploying and managing Large Language Models (LLMs) and Generative AI applications”, initiated its well-respected ‘Top 10 for Large Language Model Applications’ in 2023 and updated this list of security issues specific to AI applications recently. Since its inception, this “Top 10” has highlighted prompt injection attacks as the top security issue in the field of AI risk management.

What Are Prompt Injection Attacks?

Prompt injection attacks target a model’s instruction-following logic during deployment, not the training phase, as do data poisoning attacks. The fundamental vulnerability stems from an LLMs’ ability to respond to natural language instructions – architecturally, LLMs process all inputs, whether instructions or user data, as part of a single natural-language stream without clear boundaries between developer-defined system prompts and external content. It is this blurred boundary that attackers can exploit, as LLMs inherently treat instructions and user inputs interchangeably, relying on context rather than secure data boundaries to distinguish between the two.

In prompt injection attacks, adversaries craft deceptive inputs to manipulate a model’s behavior by overriding its original instructions. These attacks succeed by tricking the LLM into interpreting malicious payloads as legitimate commands, rather than untrusted data. Unlike traditional cyberattacks, prompt injections don’t rely on malicious code, but instead use plain language to manipulate LLMs into performing unintended actions, bypassing traditional code-based attack vectors.

What makes LLM prompt injection vulnerabilities particularly concerning is that they exploit the architecture and core features that make LLMs useful in the first place. Even under black-box settings with risk-management strategies already in place, malicious users can exploit LLMs through prompt injection attacks that circumvent content restrictions or gain access to the model’s original instructions. Not only that, but systems designed to detect injections can, themselves, be vulnerable to sophisticated prompt injection attacks. Challengingly, attempts to mitigate these risks can limit functionality of, and thus limit the value of, these systems, and no foolproof solution has emerged that balances security against innovation.

How Did We Get Here? A Brief History Of Prompt Injection Attacks

It was in 2022 that prompt injection attacks gained mainstream attention. In May that year, researchers discovered that ChatGPT was susceptible to prompt injections. They confidentially reported the flaw to OpenAI and submitted a paper in September 2022 that declassified the vulnerability. That same month, data scientist Riley Goodside drew attention to the risk of prompt injection attacks by “exploiting GPT-3 prompts with malicious inputs that ordered the model to ignore its previous directions”. This exploitation was confirmed by Simon Wilson soon after.

From 2022 through 2024, experimentation with, and research into, prompt injection attacks advanced at break-neck speed. We saw these attacks extendedto GPT-4, it was found that malware transmission is possible through prompt injections, and novel methods such as “prompt leaking”, “indirect prompt injection”, and “dialog poisoning”, as well as “self-generated typographic” and “cross-modal” attacks, were introduced.

This year, 2025, research is prioritizing attack mitigation techniques, such as preference optimization and ethical prompt engineering, and new defenses, such as SecAlign and gradient-based methods, are showing promise against existing threats. But, with the ability to focus on creating new threats, attackers always have the advantage. For example, this year, adversarial research revealed “adaptive attacks” that “break defenses against indirect prompt injection attacks on LLM agents”.

Final Thoughts

As AI systems become more integrated into critical infrastructure, addressing architectural flaws requires rethinking model design, input validation approaches, and monitoring frameworks. Without architectural redesigns that prioritize security alongside functionality, prompt injection will remain a pervasive and intractable challenge, endangering systems reliant on automated decision-making and ultimately stifling AI innovation.