The articles below are listed in alphabetical order by title. Please let me know if there are any sources you would like added to this list. Enjoy, and thanks for reading!
Category: Artificial Intelligence
A History Of AI Jailbreaking Attacks
The last couple of years have seen an explosion of research into jailbreaking attack methods, and jailbreaking has emerged as the primary attack vector for bypassing Large Language Model (LLM) safeguards. To date,…
What Is AutoAttack? Evaluating Adversarial Robustness
AutoAttack has become the de facto standard for adversarial robustness evaluation because it solves real problems in a practical way. By combining diverse attack strategies with automatic parameter tuning, it provides a…
What Are The Adversarial Attacks That Create Adversarial Examples? Typology And Definitions
Adversarial Examples exploit vulnerabilities in machine learning systems by leveraging the gap between a model’s learned representations and the true distribution of the data. But it is the adversarial attack that discovers…
Adversarial Examples In Model Extraction
While primarily known for their use in evasion attacks (causing misclassification), adversarial examples can also aid in model extraction by systematically exploring decision boundaries. By generating samples that lie close to these…
Backdoor Attacks – The Problem Has Outpaced The Solution
The concept of the backdoor (or “trojan”) AI attack was first proposed in 2017 by Gu, Dolan-Gavitt & Garg in their paper ‘BadNets: Identifying Vulnerabilities In The Machine Learning Model Supply Chain’…
Gradient And Update Leakage (GAUL) In Federated Learning
Gradient and Update Leakage attacks intercept and analyze gradient updates or model changes in distributed or federated learning environments to reconstruct the underlying model. In federated learning or other collaborative training setups,…
An Introduction To AI Model Extraction
AI model extraction refers to an attack method where an adversary attempts to replicate the functionality of a machine learning model by systematically querying it and using its outputs to train a…
What Are The Types Of AI Model Extraction Attacks?
Model Extraction Attacks aim to steal model architecture, training hyperparameters, learned parameters, or model behavior, and are effective across a broad threat landscape featuring many practical attack vectors. Today, let’s discuss the most…
What Is Alignment-Aware Extraction?
Alignment-Aware Extraction goes beyond conventional extraction methods by strategically capturing both the functional capabilities and ethical guardrails implemented in modern AI systems. By specifically accounting for alignment procedures like Reinforcement Learning from Human Feedback…