Brian Colwell, Author at Brian D. Colwell

A Brief Introduction To AI Prompt Injection Attacks

Posted on June 8, 2025 by Brian Colwell

The Open Worldwide Application Security Project (OWASP), a nonprofit organization focused on education “about the potential security risks when deploying and managing Large Language Models (LLMs) and Generative AI applications”, initiated its…

Defining The Token-level AI Jailbreaking Techniques

Posted on June 8, 2025June 8, 2025 by Brian Colwell

Token-level Jailbreaking optimizes the raw sequence of tokens fed into the LLM to elicit responses that violate the model’s intended behavior. Unlike prompt-level attacks that rely on semantic manipulation, token-level methods treat…

Defining The Prompt-Level AI Jailbreaking Techniques

Posted on June 8, 2025June 8, 2025 by Brian Colwell

Prompt-level attacks are considered social-engineering-based, semantically meaningful prompts which elicit objectionable content from LLMs, distinguishing them from token-level attacks that use mathematical optimization of raw token sequences. Now, let’s consider specific prompt-level…

A Brief Introduction To AI Jailbreaking Attacks

Posted on June 8, 2025June 8, 2025 by Brian Colwell

System prompts for LLMs don’t just specify what the model should do – they also include safeguards that establish boundaries for what the model should not do. “Jailbreaking,” a conventional concept in software systems…

The Big List Of AI Jailbreaking References And Resources

Posted on June 8, 2025June 8, 2025 by Brian Colwell

Note that the below are in alphabetical order by title. Please let me know if there are any sources you would like to see added to this list. Enjoy! Thanks for reading!

The Big List Of AI Prompt Injection References And Resources

Posted on June 8, 2025June 8, 2025 by Brian Colwell

Note that the below are in alphabetical order by title. Please let me know if there are any sources you would like to see added to this list. Enjoy! Thanks for reading!

A History Of AI Jailbreaking Attacks

Posted on June 7, 2025June 7, 2025 by Brian Colwell

The last couple years have seen an explosion in research into jailbreaking attack methods and jailbreaking has emerged as the primary attack vector for bypassing Large Language Model (LLM) safeguards. To date,…

What Is AutoAttack? Evaluating Adversarial Robustness

Posted on June 7, 2025June 7, 2025 by Brian Colwell

AutoAttack has become the de facto standard for adversarial robustness evaluation because it solves real problems in a practical way. By combining diverse attack strategies with automatic parameter tuning, it provides a…

What Are The Adversarial Attacks That Create Adversarial Examples? Typology And Definitions

Posted on June 7, 2025June 10, 2025 by Brian Colwell

Adversarial Examples exploit vulnerabilities in machine learning systems by leveraging the gap between a model’s learned representations and the true distribution of the data. But, it is the adversarial attack that discovers…

Adversarial Examples In Model Extraction

Posted on June 7, 2025June 7, 2025 by Brian Colwell

While primarily known for their use in evasion attacks (causing misclassification), adversarial examples can also aid in model extraction by systematically exploring decision boundaries. By generating samples that lie close to these…