Introduction To Adversarial Attacks: Typology And Definitions

Posted on June 7, 2025 by Brian Colwell

Adversarial examples exploit vulnerabilities in machine learning systems by leveraging the gap between a model’s learned representations and the true distribution of the data. But it is the adversarial attack that discovers and creates the adversarial example. What methods are commonly used to create adversarial examples? Today, let’s introduce these adversarial attacks through a brief typology, followed by definitions of each.

Typology Of Adversarial Attacks

Gradient-based attacks (require access to model gradients):

  • FGSM – Single-step attack using the sign of gradients
  • BIM/ILCM – Iterative version of FGSM
  • PGD – Iterative attack with projection onto an epsilon-ball
  • APGD – Adaptive PGD with automatic hyperparameter tuning
  • MIM – Adds momentum to iterative attacks
  • C&W – Optimization-based attack that minimizes perturbation size
  • DDN – Decoupling Direction and Norm, minimizes L2 perturbation
  • EAD – Uses elastic-net regularization (L1 + L2)
  • DeepFool – Finds minimal perturbations to cross decision boundaries
  • FAB – Fast Adaptive Boundary attack
  • JSMA – Uses saliency maps to identify important pixels

Black-box/Query-based attacks (don’t require gradients):

  • Boundary Attack – Walks along decision boundaries
  • HopSkipJump – Improved boundary attack
  • Square Attack – Random search-based method
  • SimBA – Simple Black-box Attack using random directions
  • One Pixel Attack – Extreme sparse attack modifying few pixels

Gradient-estimation attacks (black-box using numerical gradients):

  • ZOO – Zeroth Order Optimization using finite differences
  • SPSA – Simultaneous Perturbation Stochastic Approximation

Universal perturbations (a single perturbation that fools a model on many inputs):

  • UAP – Universal Adversarial Perturbations
  • FFF – Fast Feature Fool
  • GAP – Generative Adversarial Perturbations
  • Data-free Universal Perturbations
  • Class-specific Universal Perturbations
  • Sparse/Structured Universal Perturbations

Physical-world attacks (designed to work in real environments):

  • Adversarial Patches – Visible patches that cause misclassification
  • EoT – Expectation over Transformation for robust physical perturbations
  • Spatial Transformation Attacks – Rotations, translations, distortions

Gradient-Based Adversarial Attacks

FGSM (Fast Gradient Sign Method) 

A single-step attack that perturbs input by adding the sign of the gradient of the loss with respect to the input, scaled by epsilon. Efficient but less effective than iterative methods.
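To make this concrete, here is a minimal FGSM sketch in PyTorch. It assumes a differentiable classifier `model`, an input batch `x` scaled to [0, 1], integer labels `y`, and a budget `epsilon`; all of these names are illustrative, not part of any particular library.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon):
    # Single step in the direction of the sign of the loss gradient w.r.t. the input.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    # Clamp back to the valid pixel range.
    return x_adv.clamp(0, 1).detach()
```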

BIM/ILCM (Basic Iterative Method/Iterative Least-Likely Class Method) 

Multi-step variant of FGSM that applies small FGSM steps iteratively, clipping after each step to stay within epsilon-ball. ILCM variant targets least-likely class rather than any misclassification.

PGD (Projected Gradient Descent) 

Iterative attack that takes gradient steps to maximize loss, projecting back onto the L∞ epsilon-ball after each step. Considered one of the strongest first-order attacks and often used for adversarial training.
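A minimal PGD sketch under the same hypothetical names as the FGSM example, showing the random start, the signed gradient steps, and the projection back onto the L∞ epsilon-ball after each step:

```python
import torch
import torch.nn.functional as F

def pgd(model, x, y, epsilon, alpha, steps):
    # Random start inside the epsilon-ball around the clean input.
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()
            # Project back onto the L-infinity epsilon-ball, then onto the valid pixel range.
            x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```

BIM is essentially the same loop without the random start, and the ILCM variant minimizes the loss toward the least-likely class instead of maximizing it for the true class.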

APGD (Auto-PGD)

Enhanced version of PGD with adaptive step size scheduling and momentum, automatically adjusting hyperparameters based on loss progress. Reduces need for manual tuning and improves attack success rate.

MIM (Momentum Iterative Method)

Iterative attack that incorporates momentum into gradient updates, accumulating gradients across iterations. Helps escape poor local maxima and improves transferability of adversarial examples.
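A sketch of the momentum update, again with hypothetical `model`, `x`, `y` names and assuming 4-D image batches (N, C, H, W):

```python
import torch
import torch.nn.functional as F

def mim(model, x, y, epsilon, alpha, steps, decay=1.0):
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)  # accumulated (momentum) gradient
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        grad = x_adv.grad
        # Normalize each sample's gradient by its L1 norm, then accumulate with momentum.
        g = decay * g + grad / grad.abs().mean(dim=(1, 2, 3), keepdim=True)
        with torch.no_grad():
            x_adv = x_adv + alpha * g.sign()
            x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```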

C&W (Carlini & Wagner) 

Optimization-based attack that reformulates the problem to minimize perturbation size while ensuring misclassification. Uses different loss formulations and is particularly effective at finding small perturbations.
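The following is a heavily simplified sketch of the C&W L2 formulation: the tanh change of variables keeps the adversarial image in [0, 1], and the margin loss pushes the largest wrong-class logit above the true-class logit. The constant `c` is fixed here rather than found by binary search as in the full method, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def cw_l2(model, x, y, c=1.0, steps=1000, lr=0.01, kappa=0.0):
    # Optimize in tanh space so x_adv = 0.5 * (tanh(w) + 1) always stays in [0, 1].
    w = torch.atanh((x * 2 - 1).clamp(-0.999999, 0.999999)).detach().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)
        logits = model(x_adv)
        true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)
        # Largest logit among the wrong classes.
        other_logit = logits.masked_fill(
            F.one_hot(y, logits.size(1)).bool(), float('-inf')).max(dim=1).values
        # Positive while the true class still wins; zero once misclassified by margin kappa.
        f = torch.clamp(true_logit - other_logit + kappa, min=0)
        loss = ((x_adv - x) ** 2).flatten(1).sum(1) + c * f
        optimizer.zero_grad()
        loss.sum().backward()
        optimizer.step()
    return (0.5 * (torch.tanh(w) + 1)).detach()
```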

DDN (Decoupling Direction and Norm) 

Attack that decouples the direction and norm of perturbations, specifically designed to minimize L2 distance. Efficiently finds minimal L2 perturbations by optimizing direction and norm separately.

EAD (Elastic-Net Attack) 

Attack combining L1 and L2 regularization (elastic-net) in its objective function. Allows flexible control over perturbation sparsity and magnitude through regularization parameters.

DeepFool 

Iteratively finds the closest decision boundary and computes minimal perturbation to cross it. Provides good approximation of minimal adversarial perturbations with relatively low computational cost.
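A single-image DeepFool sketch in PyTorch (assuming `x` has shape [1, C, H, W] and a hypothetical `model` returning logits). Each step linearizes the classifier around the current point and takes the smallest step that crosses the nearest other class's boundary; the full method typically restricts the search to the top-k classes.

```python
import torch

def deepfool(model, x, num_classes=10, max_iter=50, overshoot=0.02):
    x_adv = x.clone().detach()
    orig_label = model(x).argmax(dim=1).item()
    for _ in range(max_iter):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        logits = model(x_adv)[0]
        if logits.argmax().item() != orig_label:
            break  # already misclassified
        grads = [torch.autograd.grad(logits[k], x_adv, retain_graph=True)[0]
                 for k in range(num_classes)]
        best_ratio, best_r = None, None
        for k in range(num_classes):
            if k == orig_label:
                continue
            w_k = grads[k] - grads[orig_label]           # boundary normal (linearized)
            f_k = (logits[k] - logits[orig_label]).item()  # signed distance numerator
            ratio = abs(f_k) / (w_k.norm() + 1e-8)
            if best_ratio is None or ratio < best_ratio:
                best_ratio = ratio
                best_r = (abs(f_k) / (w_k.norm() ** 2 + 1e-8)) * w_k
        # Step just past the nearest boundary.
        x_adv = (x_adv + (1 + overshoot) * best_r).detach()
    return x_adv
```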

FAB (Fast Adaptive Boundary) 

Efficiently computes minimal adversarial perturbations by iteratively projecting onto approximated decision boundaries. Combines benefits of minimal-perturbation and PGD-style attacks.

JSMA (Jacobian-based Saliency Map Attack) 

Creates sparse perturbations by computing saliency maps from Jacobian matrices to identify most influential pixels. Modifies only high-saliency pixels, resulting in L0-constrained attacks.

Black-Box/Query-Based Adversarial Attacks

Boundary Attack 

A decision-based attack that starts from an adversarial example (often from another class) and performs a random walk along the decision boundary, gradually reducing the perturbation size while maintaining misclassification. Only requires the model’s final decision, not confidence scores.

HopSkipJump 

An improved boundary attack that uses binary search to efficiently estimate gradients at the decision boundary and geometric progression to find optimal step sizes. Significantly reduces the number of queries needed compared to the original Boundary Attack while maintaining similar perturbation quality.

Square Attack 

A score-based black-box attack that uses random search with square-shaped perturbations of decreasing size. Randomly samples perturbation updates and keeps those that increase the loss, achieving competitive performance with very few queries compared to other black-box methods.
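A much-simplified, score-based sketch of the square-attack idea for a single C x H x W image, assuming a hypothetical black-box `loss_fn` that returns a scalar the attacker wants to increase (the real attack uses a more careful schedule and initialization):

```python
import numpy as np

def square_attack(loss_fn, x, epsilon, n_iters=1000, p_init=0.1, rng=None):
    rng = rng or np.random.default_rng()
    c, h, w = x.shape
    # Stripe-style initialization, then random square proposals of shrinking size.
    x_adv = np.clip(x + epsilon * rng.choice([-1.0, 1.0], size=(c, 1, w)), 0.0, 1.0)
    best = loss_fn(x_adv)
    for i in range(n_iters):
        s = max(1, int(round(np.sqrt(p_init * (1 - i / n_iters)) * h)))  # crude size schedule
        r, q = rng.integers(0, h - s + 1), rng.integers(0, w - s + 1)
        candidate = x_adv.copy()
        candidate[:, r:r + s, q:q + s] = np.clip(
            x[:, r:r + s, q:q + s] + epsilon * rng.choice([-1.0, 1.0], size=(c, 1, 1)),
            0.0, 1.0)
        cand_loss = loss_fn(candidate)
        if cand_loss > best:  # keep only proposals that increase the loss
            x_adv, best = candidate, cand_loss
    return x_adv
```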

SimBA (Simple Black-box Attack) 

Iteratively perturbs the input along random orthogonal directions (or DCT basis), keeping changes that decrease the model’s confidence in the correct class. Remarkably simple yet effective, requiring only forward passes and no gradient estimation.
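A rough SimBA-style sketch in the pixel basis (the paper also works in a DCT basis), assuming a hypothetical `predict_probs` function that returns class probabilities for a single input and an integer true label `y`:

```python
import numpy as np

def simba(predict_probs, x, y, epsilon=0.2, max_iters=1000, rng=None):
    rng = rng or np.random.default_rng()
    x_adv = x.copy().astype(np.float64)
    p_best = predict_probs(x_adv)[y]
    # Visit coordinates of the standard basis in a random order.
    coords = rng.permutation(x_adv.size)[:max_iters]
    for idx in coords:
        for sign in (+1.0, -1.0):
            candidate = x_adv.copy()
            candidate.flat[idx] = np.clip(candidate.flat[idx] + sign * epsilon, 0.0, 1.0)
            p_new = predict_probs(candidate)[y]
            if p_new < p_best:  # keep the step only if true-class confidence drops
                x_adv, p_best = candidate, p_new
                break
    return x_adv
```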

One Pixel Attack 

An extreme sparse attack that modifies only one to few pixels to cause misclassification, using differential evolution (a population-based optimization algorithm) to find optimal pixel locations and values. Demonstrates that even minimal L0 perturbations can fool deep neural networks.
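A sketch of the one pixel attack using SciPy's differential evolution, assuming a hypothetical `predict_probs(image)` that returns class probabilities for an H x W x C image in [0, 1]. Each candidate solution encodes (x, y, color) for each perturbed pixel.

```python
import numpy as np
from scipy.optimize import differential_evolution

def one_pixel_attack(predict_probs, image, true_label, n_pixels=1, maxiter=30, popsize=20):
    h, w, c = image.shape

    def apply(z, img):
        img = img.copy()
        for i in range(n_pixels):
            px, py, *color = z[i * (2 + c): (i + 1) * (2 + c)]
            img[int(py), int(px)] = np.clip(color, 0, 1)
        return img

    def objective(z):
        # Minimize the model's confidence in the true class.
        return predict_probs(apply(z, image))[true_label]

    bounds = [(0, w - 1), (0, h - 1)] + [(0, 1)] * c  # per-pixel: x, y, then channel values
    result = differential_evolution(objective, bounds * n_pixels,
                                    maxiter=maxiter, popsize=popsize, tol=1e-5)
    return apply(result.x, image)
```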

Gradient-Estimation Adversarial Attacks

ZOO (Zeroth Order Optimization) 

A black-box attack that estimates gradients using finite differences by querying the model with slightly perturbed inputs. Approximates the gradient by computing (f(x+h) – f(x-h))/2h for each dimension, enabling gradient-based attack techniques without actual gradient access. Effective but requires many queries due to dimensional scaling.
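A sketch of the core finite-difference gradient estimate behind ZOO, for a hypothetical black-box `loss_fn`. In practice the estimate is restricted to a random subset of coordinates per step (querying every dimension is too expensive) and then fed into an ordinary gradient-based attack loop.

```python
import numpy as np

def zoo_gradient_estimate(loss_fn, x, h=1e-4, n_coords=128, rng=None):
    rng = rng or np.random.default_rng()
    grad = np.zeros_like(x, dtype=np.float64)
    coords = rng.choice(x.size, size=min(n_coords, x.size), replace=False)
    for idx in coords:
        e = np.zeros_like(x, dtype=np.float64)
        e.flat[idx] = h
        # Symmetric finite difference: (f(x + h*e_i) - f(x - h*e_i)) / (2h).
        grad.flat[idx] = (loss_fn(x + e) - loss_fn(x - e)) / (2 * h)
    return grad
```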

SPSA (Simultaneous Perturbation Stochastic Approximation) 

Estimates gradients more efficiently than ZOO by perturbing all input dimensions simultaneously with random noise, requiring only two queries per gradient estimate regardless of dimensionality. Uses the difference in model outputs from two random perturbations to approximate the gradient direction, trading some accuracy for drastically reduced query complexity.
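A corresponding SPSA-style estimator, averaging a few two-query estimates; note that the whole input is perturbed at once with random signs, so the query cost does not grow with the input dimension:

```python
import numpy as np

def spsa_gradient_estimate(loss_fn, x, delta=0.01, n_samples=8, rng=None):
    rng = rng or np.random.default_rng()
    grad = np.zeros_like(x, dtype=np.float64)
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=x.shape)  # random +/-1 perturbation direction
        # Two queries per sample, regardless of dimensionality.
        grad += (loss_fn(x + delta * v) - loss_fn(x - delta * v)) / (2 * delta) * v
    return grad / n_samples
```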

Universal Adversarial Perturbations

UAP (Universal Adversarial Perturbations) 

The original universal attack that iteratively builds a single perturbation by aggregating DeepFool perturbations across multiple training samples. Finds a fixed noise pattern that, when added to most inputs from a dataset, causes misclassification with high probability.
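A schematic of the UAP aggregation loop, where `minimal_perturbation` stands in for a per-sample attack such as DeepFool and the projection onto the perturbation budget is simplified here to an L∞ clip:

```python
import numpy as np

def universal_perturbation(inputs, model_predict, minimal_perturbation, epsilon,
                           target_fool_rate=0.8, max_epochs=10):
    # inputs: list/array of samples; model_predict returns a label;
    # minimal_perturbation(x) returns the smallest perturbation that flips x's prediction.
    v = np.zeros_like(inputs[0])
    for _ in range(max_epochs):
        for x in inputs:
            if model_predict(x + v) == model_predict(x):  # v does not yet fool this sample
                dv = minimal_perturbation(x + v)           # push it across the boundary
                v = np.clip(v + dv, -epsilon, epsilon)     # simplified projection onto the budget
        fooled = np.mean([model_predict(x + v) != model_predict(x) for x in inputs])
        if fooled >= target_fool_rate:
            break
    return v
```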

FFF (Fast Feature Fool) 

Generates universal perturbations by maximizing activations in intermediate network layers rather than working in input space. Significantly faster than UAP as it directly optimizes in feature space to disrupt learned representations.

GAP (Generative Adversarial Perturbations) 

Uses a generative neural network to learn a mapping that produces universal perturbations for any input. Once trained, can instantly generate effective perturbations without iterative optimization, providing flexibility and efficiency.

Data-free Universal Perturbations 

Creates universal perturbations without access to training data by exploiting only the model’s architecture and parameters. Typically maximizes activation norms or uses class impressions extracted from the network itself.

Class-specific Universal Perturbations 

Targeted universal perturbations designed to cause misclassification to a specific target class. A single perturbation that makes most inputs classify as the chosen target (e.g., making all images classify as “dog”).

Sparse/Structured Universal Perturbations 

Universal perturbations with additional structural constraints such as sparsity or specific patterns (patches, frames). Includes universal adversarial patches that are localized, visible perturbations placed on images to cause misclassification.

Physical-World Adversarial Attacks 

Adversarial Patches 

Creates visible, localized patches that can be placed anywhere in an image to cause targeted or untargeted misclassification. Unlike traditional perturbations, these patches are designed to be robust to location, scale, and viewing angle, making them effective when printed and placed in physical environments (e.g., stickers that fool stop sign detectors).

EoT (Expectation over Transformation) 

A technique that generates physically robust adversarial examples by optimizing over a distribution of possible transformations (rotation, lighting, camera angle, etc.). During optimization, it samples various transformations and ensures the perturbation remains effective across all variations, making it robust to real-world conditions.
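A PGD-style EoT sketch in PyTorch, with simple differentiable stand-ins (brightness jitter and additive noise) for the rotations, lighting changes, and camera effects sampled in practice; `model`, `x`, `y` are the same hypothetical names as in the earlier examples:

```python
import torch
import torch.nn.functional as F

def eot_attack(model, x, y, epsilon, alpha, steps, n_transforms=10):
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = 0.0
        # Average the loss over a batch of randomly sampled transformations.
        for _ in range(n_transforms):
            brightness = 1.0 + 0.1 * (torch.rand(1, device=x.device) - 0.5)
            noise = 0.02 * torch.randn_like(x_adv)
            t_x = (x_adv * brightness + noise).clamp(0, 1)
            loss = loss + F.cross_entropy(model(t_x), y)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()
            x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```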

Spatial Transformation Attacks 

Adversarial attacks that manipulate spatial properties rather than pixel values, using transformations like rotation, translation, scaling, or distortion to cause misclassification. These attacks often produce more natural-looking adversarial examples since they preserve image content while altering geometric properties within imperceptible bounds.
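As a final illustration, a naive spatial-attack sketch that grid-searches small rotations and translations with torchvision and returns the first transform that changes the prediction; `model` is a hypothetical classifier, `x` a [1, C, H, W] tensor, and `y` the true label as an integer:

```python
import torchvision.transforms.functional as TF

def spatial_attack(model, x, y, max_angle=30, max_shift=3):
    for angle in range(-max_angle, max_angle + 1, 5):
        for dx in range(-max_shift, max_shift + 1):
            for dy in range(-max_shift, max_shift + 1):
                # Apply a small rotation and translation; pixel values are untouched.
                x_t = TF.affine(x, angle=float(angle), translate=[dx, dy],
                                scale=1.0, shear=0.0)
                if model(x_t).argmax(dim=1).item() != y:
                    return x_t
    return x  # no fooling transform found in the search grid
```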

Thanks for reading!
