Synthetic media is becoming increasingly indistinguishable from human-created content and, as AI continues to move to the edge, effective watermarking will become increasingly important for protecting intellectual property. Problematically, most watermarking techniques face significant challenges against determined removal attempts, and there is no one-stop technical process that balances the three components of the “Watermarking Trade-Off Triangle” – robustness, fidelity, and capacity – across all modalities.
Today, let’s continue our conversation on AI watermarking with a brief review of watermarking text, image, and video media.
Text Watermarks
Typically for text watermarks, statistical approaches embed watermarks by controlling token selection probabilities, while steganographic methods manipulate word order or phrasing. Of the two, statistical watermarking is the more commonly employed for text media; it is, according to Kirchenbauer et al. (2023), “strictly a subset of steganography, the task of embedding arbitrary hidden information into data”.
Language models generate text by predicting the next token (word or subword) based on previous context, with a randomness component that allows variability in outputs. Statistical text watermarking leverages this randomness by introducing controlled biases into the token selection process – modifying the language model’s output probabilities so that certain words or sequences appear more frequently – creating a statistical “fingerprint”. One common approach divides the model’s vocabulary into “green” (preferred) and “red” (avoided) tokens, subtly biasing the model toward selecting words from the green list, creating a statistical pattern that detection algorithms can identify (Kirchenbauer et al., 2023).
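As a rough illustration, the green/red-list scheme can be sketched in a few lines. This is a toy sketch, not any production implementation: the `green_list` and `watermarked_probs` helpers are hypothetical names, and real systems apply the bias inside the model’s sampling loop for every generated token.

```python
import hashlib
import math
import random

def green_list(prev_token: str, vocab: list[str], gamma: float = 0.5) -> set[str]:
    """Seed a PRNG with the previous token and mark a gamma-fraction
    of the vocabulary as 'green' (preferred) tokens."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16) % (2 ** 32)
    rng = random.Random(seed)
    shuffled = vocab[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(gamma * len(shuffled))])

def watermarked_probs(logits: dict[str, float], prev_token: str,
                      gamma: float = 0.5, delta: float = 2.0) -> dict[str, float]:
    """Add a bias delta to green-token logits, then renormalize via softmax,
    so green tokens become slightly more likely to be sampled."""
    green = green_list(prev_token, sorted(logits), gamma)
    biased = {t: v + (delta if t in green else 0.0) for t, v in logits.items()}
    z = sum(math.exp(v) for v in biased.values())
    return {t: math.exp(v) / z for t, v in biased.items()}
```

Because the green list is derived deterministically from the previous token, a detector holding the same seed can recompute it later without access to the model.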
Statistical text watermarking becomes more reliable with longer passages: detection depends on accumulating enough statistical evidence to distinguish the watermark from random chance (DataScientest, 2023). Short texts (under a paragraph) therefore present significant challenges for reliable watermarking without affecting quality (Kirchenbauer et al., 2023). Though statistical text watermarks are vulnerable to removal through paraphrasing or extensive editing, they have notable strengths, including little impact on content quality and high adaptability to various model architectures.
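Detection can be sketched as a one-proportion z-test: count how many tokens fall in their green lists and measure the deviation from the fraction `gamma` expected in unwatermarked text. The `is_green` helper below is an illustrative stand-in for the generator’s green-list rule, not a real detector.

```python
import hashlib
import math

def is_green(prev_token: str, token: str, gamma: float = 0.5) -> bool:
    """Stand-in green-list rule: pseudorandomly classify a token as green
    based on a hash of the previous token and the token itself."""
    h = hashlib.sha256(f"{prev_token}|{token}".encode()).hexdigest()
    return int(h, 16) / 2 ** 256 < gamma

def detection_z_score(tokens: list[str], gamma: float = 0.5) -> float:
    """One-proportion z-test: deviation of the observed green count from
    the gamma-fraction expected in unwatermarked text."""
    n = len(tokens) - 1  # number of (prev, next) pairs scored
    greens = sum(is_green(tokens[i], tokens[i + 1], gamma) for i in range(n))
    return (greens - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
```

The denominator shrinks relative to the signal as the passage grows, which is exactly why short texts are hard to watermark reliably: too few tokens means too little statistical evidence.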
More recent text watermarking techniques include that of researchers at the University of Maryland and OpenAI, which uses previous tokens to influence which words are favored for selection next, creating a predictable but subtle pattern (Brookings, 2024), and that of Stanford researchers, which predetermines the “dice rolls” that guide token selection and stores them as a key for later detection. A significant advance arrived in 2025 with paraphrasing-based approaches: Xu et al. (2025) introduced an approach that leverages LLMs to create an imperceptible multi-bit text watermark through paraphrasing. Their system fine-tunes a pair of LLM paraphrasers designed to behave differently, so that the paraphrasing differences reflected in the text semantics can be identified by a trained decoder. To embed the watermark, the two paraphrasers are alternated to encode a pre-defined binary code at the sentence level, and a text classifier then serves as the decoder to extract each bit. Notably, Xu et al. (2025) achieved over 99.99% detection AUC (Area Under the Curve) even with relatively small (1.1B-parameter) text paraphrasers, while preserving the semantic information of the original content. The system demonstrates robust performance under word-substitution and sentence-paraphrasing perturbations, and generalizes well to out-of-distribution data.
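The encode/decode pipeline of such a multi-bit paraphrasing watermark can be caricatured with rule-based stand-ins. In the real system both paraphrasers are fine-tuned LLMs and the decoder is a trained classifier; here, purely for illustration, bit 1 is “paraphrased” by fronting a connective and the “decoder” keys on that marker.

```python
def paraphrase_bit0(sentence: str) -> str:
    """Stand-in for the bit-0 paraphraser (a fine-tuned LLM in the real system)."""
    return sentence

def paraphrase_bit1(sentence: str) -> str:
    """Stand-in for the bit-1 paraphraser: fronts a connective as its 'style'."""
    return "Indeed, " + sentence[0].lower() + sentence[1:]

def embed_watermark(sentences: list[str], bits: list[int]) -> list[str]:
    """Alternate the two paraphrasers to encode one bit per sentence."""
    return [paraphrase_bit1(s) if b else paraphrase_bit0(s)
            for s, b in zip(sentences, bits)]

def decode_watermark(sentences: list[str]) -> list[int]:
    """Stand-in for the trained sentence-level classifier decoder."""
    return [1 if s.startswith("Indeed, ") else 0 for s in sentences]
```

The structure mirrors Xu et al.’s design (one bit per sentence, decoded independently), but a surface marker like this would be trivially removable; the point of using fine-tuned paraphrasers is that the bit is carried by subtle semantic style differences instead.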
Image & Video Watermarks
Image watermarking is generally more robust than text watermarking, and its effectiveness depends on factors such as resolution, complexity, and color depth. Higher-resolution images typically allow for more robust watermarking because more data can be embedded without noticeable artifacts or perceptible quality loss (Hugging Face, 2024); low-resolution images, by contrast, present greater challenges for reliable watermarking (Kirchenbauer et al., 2023). Video watermarking typically applies image watermarking techniques to individual frames, sometimes with additional temporal consistency constraints, and its effectiveness varies with factors such as duration and content complexity. Longer videos allow watermarks to be distributed across multiple frames, potentially increasing robustness (Lu et al., 2024).
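The redundancy gained from distributing a watermark across frames can be sketched as follows. This is a deliberately crude model, with hypothetical helper names, where each frame is a flat list of pixel values and one bit is forced into the least significant bit of a carrier pixel; real systems embed imperceptibly across many pixels or frequency bands.

```python
def embed_bit(frame: list[int], bit: int) -> list[int]:
    """Toy per-frame embed: force the LSB of the first pixel to carry one bit."""
    out = frame[:]
    out[0] = (out[0] & ~1) | bit
    return out

def embed_across_frames(frames: list[list[int]], payload: list[int]) -> list[list[int]]:
    """Cycle the payload over the frames so each bit lands in several frames."""
    return [embed_bit(f, payload[i % len(payload)]) for i, f in enumerate(frames)]

def extract_payload(frames: list[list[int]], payload_len: int) -> list[int]:
    """Majority-vote each payload bit over all frames that carry it."""
    votes: list[list[int]] = [[] for _ in range(payload_len)]
    for i, f in enumerate(frames):
        votes[i % payload_len].append(f[0] & 1)
    return [int(sum(v) * 2 >= len(v)) for v in votes]
```

Because each payload bit is repeated in several frames and recovered by majority vote, corrupting a single frame no longer destroys the watermark, which is the robustness benefit longer videos provide.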
For deep learning-based image watermarking, several approaches have been developed. Early methods used autoencoder structures for embedding. Next came, amongst others, “Wavelet Domain”, a deep neural network model that incorporates watermark information into high-frequency wavelet-domain components, and “StegaStamp”, which embeds bit-string watermarks into photographs with exceptionally high perceptual invisibility. Today, however, for both images and video, the most common techniques directly manipulate the content generated by diffusion models, such as pixel data and frequency components, inserting patterns that exploit the redundancy and complexity of visual information to hide the watermark (Lu et al., 2024; DataCamp, 2025). One example is Google’s SynthID, designed to work with diffusion models like Google’s Imagen, which embeds robust patterns into pixel data without altering the media’s visual appearance.
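The idea of hiding a watermark in frequency components can be made concrete with a minimal sketch. This is a 1D toy using a naive DFT; real systems operate on 2D wavelet or DCT domains, or use learned embedders, and all function names here are illustrative.

```python
import cmath
import math

def dft(x: list[float]) -> list[complex]:
    """Naive discrete Fourier transform (O(n^2), fine for a toy)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(coeffs: list[complex]) -> list[float]:
    """Inverse DFT, returning the real part of each sample."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

def embed_freq(signal: list[float], bit: int, k: int = 3,
               strength: float = 0.5) -> list[float]:
    """Scale one mid-frequency coefficient up (bit=1) or down (bit=0)."""
    coeffs = dft(signal)
    factor = (1 + strength) if bit else 1 / (1 + strength)
    coeffs[k] *= factor
    coeffs[-k] = coeffs[k].conjugate()  # keep the inverse transform real-valued
    return idft(coeffs)

def extract_freq(signal: list[float], reference: list[float], k: int = 3) -> int:
    """Compare the coefficient's magnitude against the unwatermarked reference."""
    return int(abs(dft(signal)[k]) > abs(dft(reference)[k]))
```

Spreading the mark over frequency components rather than raw pixels is what lets such watermarks survive mild spatial edits: each sample change is small, while the frequency-domain signature remains measurable.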
Continuing with diffusion model-based watermarking, Fernandez et al. (2023) introduced Stable Signature, which pre-trains a watermark extractor applicable to both VAE encoders and decoders, then fixes this extractor and fine-tunes the model's decoder to generate images that reveal bit watermarks through the extractor. Other approaches include AquaLoRA (Feng et al., 2024), which shifts the fine-tuning focus to the diffusion UNet, improving the precision of watermark decoding, and LaWa, which inserts a bit modulation module between diffusion UNet blocks without modifying the original weights, thus maintaining generation quality.
For diffusion model-based image generation, watermarking methods fall primarily into two categories. Fine-tuning-based watermarking embeds watermarks by fine-tuning part of the diffusion model (the VAE or UNet) or an extra pre-trained watermark embedder; examples include fine-tuning the VAE decoder with a pre-trained watermark decoder, using an information encoder to convert the watermark into a matrix embedded in an intermediate output, and employing a pre-trained watermark encoder for secondary processing of the image decoder's output. Initial-noise-modification-based watermarking embeds watermark patterns into the initial noise of diffusion models, leveraging DDIM's approximate reversibility; examples include “Tree-Ring” (embedding watermarks in the frequency domain of the initial noise in a circular pattern), “RingID” (extending Tree-Ring to multiple-key watermarking), and “GaussianShading” (transforming the watermark from a uniform distribution to a Gaussian to reduce its impact on generated images).
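The circular-pattern idea behind Tree-Ring can be sketched as follows. In the actual method, the ring is written into the Fourier transform of the Gaussian initial noise and recovered by inverting the diffusion (DDIM) process; the sketch below, purely illustrative and with hypothetical names, operates on a stand-in frequency array directly and skips the FFT and inversion machinery.

```python
import random

def ring_mask(n: int, r_in: float, r_out: float) -> list[tuple[int, int]]:
    """Coordinates of an annulus ('ring') around the centre of an n x n plane."""
    c = (n - 1) / 2
    return [(i, j) for i in range(n) for j in range(n)
            if r_in <= ((i - c) ** 2 + (j - c) ** 2) ** 0.5 < r_out]

def embed_ring(freq: list[list[float]], key: list[float],
               mask: list[tuple[int, int]]) -> list[list[float]]:
    """Overwrite the ring coefficients of the (stand-in) noise spectrum with the key."""
    out = [row[:] for row in freq]
    for (i, j), v in zip(mask, key):
        out[i][j] = v
    return out

def detect_ring(freq: list[list[float]], key: list[float],
                mask: list[tuple[int, int]], threshold: float = 0.5) -> bool:
    """Flag a watermark when the mean absolute ring-vs-key distance is small."""
    dist = sum(abs(freq[i][j] - v) for (i, j), v in zip(mask, key)) / len(mask)
    return dist < threshold
```

A ring is chosen because its magnitude pattern in Fourier space is invariant to image rotation, one of the transformations such watermarks are designed to survive.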
Thanks for reading!