The articles below are listed in alphabetical order by title. Please let me know if there are any sources you would like added to this list. Enjoy, and thanks for reading!
Category: Artificial Intelligence
A History Of AI Jailbreaking Attacks
The last couple of years have seen an explosion of research into jailbreaking attack methods, and jailbreaking has emerged as the primary attack vector for bypassing Large Language Model (LLM) safeguards. To date,…
What Is AutoAttack? Evaluating Adversarial Robustness
AutoAttack has become the de facto standard for adversarial robustness evaluation because it solves real problems in a practical way. By combining diverse attack strategies with automatic parameter tuning, it provides a…
What Are The Adversarial Attacks That Create Adversarial Examples? Typology And Definitions
Adversarial Examples exploit vulnerabilities in machine learning systems by leveraging the gap between a model’s learned representations and the true distribution of the data. But it is the adversarial attack that discovers…
Adversarial Examples In Model Extraction
While primarily known for their use in evasion attacks (causing misclassification), adversarial examples can also aid in model extraction by systematically exploring decision boundaries. By generating samples that lie close to these…
Backdoor Attacks – The Problem Has Outpaced The Solution
The concept of the backdoor (or “trojan”) AI attack was first proposed in 2017 by Gu, Dolan-Gavitt & Garg in their paper ‘BadNets: Identifying Vulnerabilities In The Machine Learning Model Supply Chain’…
Gradient And Update Leakage (GAUL) In Federated Learning
Gradient and Update Leakage attacks intercept and analyze gradient updates or model changes in distributed or federated learning environments to reconstruct the underlying model. In federated learning or other collaborative training setups,…
An Introduction To AI Model Extraction
AI model extraction refers to an attack method where an adversary attempts to replicate the functionality of a machine learning model by systematically querying it and using its outputs to train a…
What Are The Types Of AI Model Extraction Attacks?
Model Extraction Attacks aim to steal model architecture, training hyperparameters, learned parameters, or model behavior, and are effective across a broad threat landscape featuring many practical attack vectors. Today, let’s discuss the most…
What Is Alignment-Aware Extraction?
Alignment-Aware Extraction goes beyond conventional extraction methods by strategically capturing both the functional capabilities and ethical guardrails implemented in modern AI systems. By specifically accounting for alignment procedures like Reinforcement Learning from Human Feedback…