The Open Worldwide Application Security Project (OWASP), a nonprofit organization focused on LLM security risk education, updated for 2025 its well-respected list ‘Top 10 for Large Language Model Applications’. Amongst OWASP’s top AI security risks for 2025 was listed Sensitive Information Disclosure. What are sensitive information attacks and what is the actual risk involved in sensitive information disclosure? Let’s answer these questions today.

What Is A Sensitive Information Disclosure Attack?

AI Sensitive Information Disclosure Attacks occur when malicious actors exploit vulnerabilities in AI systems to extract confidential information embedded within their training data or system configurations. Problematically, LLMs have a multitude of weaknesses and critical failure points that make sensitive information disclosure attacks accessible, practical, profitable, and scalable to attackers with limited resources. Further, implementation of available defensive measures against these attacks often involves significant trade-offs in model performance and utility, requiring substantial resources to maintain – especially as LLMs evolve and are integrated into business-critical applications.

Sensitive information disclosure attacks exploit the inherent design of large language models (LLMs) and other AI systems, which are trained on massive datasets that may include confidential or sensitive data. Unlike traditional security vulnerabilities that arise from coding flaws, these attacks target the core functioning of AI models, taking advantage of their ability to memorize and reproduce information when given the right prompts. The fundamental weakness lies in how modern AI systems learn – they are built to absorb vast amounts of data and reproduce it intelligently, which makes them particularly susceptible to unintentionally revealing sensitive information embedded in their training materials.

Sensitive Information Disclosure Vectors Of Attack

As a result of this fundamental weakness in how AI systems learn, adversaries are presented with several opportunistic vectors of attack – the primary of which include data ingestion vulnerabilities, model training weaknesses, and inference-stage risks:

Data Ingestion Vulnerabilities

Data ingestion vulnerabilities include unvetted data sources that may contain confidential information or malicious content, insecure transfer and access controls that allow unauthorized data access, human error leading to unsanctioned integrations introducing sensitive data, and direct integration risks where AI systems connect to databases or systems without proper controls.

Model Training Weaknesses

Model training weaknesses include unintentional memorization where models store sensitive or regulated information, limitations of anonymization where sophisticated models can re-identify individuals despite data redaction, overfitting that increases risk of verbatim data leakage, and regulatory compliance issues where models reproduce protected information in violation of privacy laws.

Inference-Stage Risks

Inference-stage risks include unprotected data processing with insufficient controls on inputs and outputs, unstructured data challenges where traditional security measures cannot detect PII in certain formats, and inadequate safeguards for logged, cached, or stored outputs creating additional exposure points throughout the system lifecycle.

Sensitive Information Disclosure Attack Methodologies

Worth noting is that current vectors of attack already encourage training data exploitation, model analysis attacks, prompt-based attacks, and a variety of infrastructure-focused attacks:

Training Data Exploitation

Unintended memorization attacks extract details like names, addresses, or medical records from models that memorized training data (even if it appeared only once in the dataset). Training data extraction techniques can recover specific text fragments from training datasets, while data poisoning manipulates training data to affect model behavior or cause models to memorize sensitive information. These vulnerabilities make AI systems particularly susceptible to revealing confidential information.

Model Analysis Attacks

Model inversion attacks reconstruct private training data from model outputs by systematically analyzing responses, while membership inference attacks determine if specific data was used in training, potentially exposing participation in sensitive studies such as medical research. Parameter stealing extracts model parameters to replicate protected systems, potentially compromising valuable intellectual property. Side-channel attacks analyze non-direct outputs like response times to infer protected information, and transfer learning attacks exploit vulnerabilities in fine-tuned models to access sensitive data from the original training set.

Prompt-Based Attacks

Direct prompt injection manipulates inputs to trick models into revealing internal information by bypassing security controls, while indirect prompt injection exploits third-party content processing to introduce malicious instructions. Context and prompt theft recovers confidential system instructions or proprietary algorithms by analyzing model behavior, and data inference attacks combine seemingly innocuous data points to deduce confidential details by exploiting the model’s pattern recognition capabilities.

Infrastructure-Focused Attacks

Data pipeline attacks target weaknesses during preprocessing and storage phases of AI development, while authorization bypass exploits weak authentication to gain unauthorized access to AI systems. Supply chain attacks target third-party components used in AI development, and flowbreaking exploits specifically target enterprise AI applications by disrupting their normal operation. Evasion attacks bypass AI detection systems designed to protect sensitive information, and embedded malware can compromise the entire AI lifecycle through infected training files.

Pervasive Threat, Severe Financial Consequences

These attacks aren’t merely theoretical concerns and the threat landscape is rapidly evolving, with alarming statistics highlighting the severity of the problem. For example, 74% of organizations experienced an AI breach in 2024, up from 67% the previous year, according to AI security provide HiddenLayer, while organizations take an average of 204 days to identify breaches and 73 days to contain them, according to a study from IBM, as well as articles from Varonis on data breach response times and statistics.

Not only that, but nearly half of all breaches expose customer personally identifiable information (PII), and 40% compromise employee data, according to reports from secureframe, Pentera, and spacelift. Interestingly, compromised credentials are responsible for 86% of breaches, according to NIL, BeyondTrust, and Hacker News, at the same time as cybersecurity staffing shortages affect over 25% of organizations, according to BCG and ASIS.

The financial consequences are equally severe. For example, the global average cost of a data breach reached $4.88 million in 2024, a 10% increase from the previous year, while U.S. breaches averaged $9.36 million, according to sources such as statista, IBM, keepnet, and CFO. Further, organizations face potential penalties under GDPR or HIPAA violations of up to 4% of global annual turnover or €20 million, and 81% of consumers abandon a company’s services following a breach, according to Nasdaq and Help Net Security.

Final Thoughts

Sensitive information disclosure attacks represent a critical challenge and, as organizations increasingly integrate AI into their core operations, the risks associated with these attacks will only grow more significant. But how do we ensure that AI systems are designed with privacy and security as fundamental requirements rather than afterthoughts? The time to address these vulnerabilities is now – before sensitive information disclosure attacks become as commonplace as phishing or ransomware in the cybersecurity landscape.

Thanks for reading!