The Open Worldwide Application Security Project (OWASP), a nonprofit organization focused on education “about the potential security risks when deploying and managing Large Language Models (LLMs) and Generative AI applications”, initiated its…
Author: Brian Colwell
Defining The Token-level AI Jailbreaking Techniques
Token-level Jailbreaking optimizes the raw sequence of tokens fed into the LLM to elicit responses that violate the model’s intended behavior. Unlike prompt-level attacks that rely on semantic manipulation, token-level methods treat…
Defining The Prompt-Level AI Jailbreaking Techniques
Prompt-level attacks are considered social-engineering-based, semantically meaningful prompts which elicit objectionable content from LLMs, distinguishing them from token-level attacks that use mathematical optimization of raw token sequences. Now, let’s consider specific prompt-level…
A Brief Introduction To AI Jailbreaking Attacks
System prompts for LLMs don’t just specify what the model should do – they also include safeguards that establish boundaries for what the model should not do. “Jailbreaking,” a conventional concept in software systems…
The Big List Of AI Jailbreaking References And Resources
Note that the below are in alphabetical order by title. Please let me know if there are any sources you would like to see added to this list. Enjoy! Thanks for reading!
The Big List Of AI Prompt Injection References And Resources
Note that the below are in alphabetical order by title. Please let me know if there are any sources you would like to see added to this list. Enjoy! Thanks for reading!
A History Of AI Jailbreaking Attacks
The last couple years have seen an explosion in research into jailbreaking attack methods and jailbreaking has emerged as the primary attack vector for bypassing Large Language Model (LLM) safeguards. To date,…
What Is AutoAttack? Evaluating Adversarial Robustness
AutoAttack has become the de facto standard for adversarial robustness evaluation because it solves real problems in a practical way. By combining diverse attack strategies with automatic parameter tuning, it provides a…
What Are The Adversarial Attacks That Create Adversarial Examples? Typology And Definitions
Adversarial Examples exploit vulnerabilities in machine learning systems by leveraging the gap between a model’s learned representations and the true distribution of the data. But, it is the adversarial attack that discovers…
Adversarial Examples In Model Extraction
While primarily known for their use in evasion attacks (causing misclassification), adversarial examples can also aid in model extraction by systematically exploring decision boundaries. By generating samples that lie close to these…