AI Agent Security & OWASP
Attack surfaces of AI Agents and how the OWASP framework mitigates prompt injection, tool misuse and data leakage.
AI agent security per OWASP encompasses securing autonomous AI agents against threats arising from their ability to plan, use tools, write to memory, and act independently. Two OWASP lists from the GenAI Security Project are authoritative: the OWASP Top 10 for LLM Applications 2025 (LLM01–LLM10) and the OWASP Top 10 for Agentic Applications 2026 (ASI01–ASI10), published on 9 December 2025, which address agent-specific risks such as Agent Goal Hijack, Tool Misuse, Excessive Agency, and Memory Poisoning. Agentic systems inherit all LLM risks and, through autonomy, tool integration, multi-agent coordination, and persistent state, add entirely new classes of vulnerability.
Key Takeaways
- ✓The canonical OWASP Top 10 for Agentic Applications 2026 (ASI01–ASI10) was published on 9 December 2025 by the OWASP GenAI Security Project and differs from earlier working drafts (the "15 agentic threats" with Memory Poisoning as ASI01) – this older numbering should only be cited historically.
- ✓Agentic systems inherit all LLM risks and add new classes: ASI01 (Agent Goal Hijack) corresponds, per the DeepTeam mapping, to LLM01 (Prompt Injection) × LLM06 (Excessive Agency), while ASI07, ASI08, and ASI10 have no LLM Top 10 counterpart at all.
- ✓EchoLeak (CVE-2025-32711, CVSS 9.3, June 2025, Aim Labs) was the first documented real-world zero-click prompt injection in a production system (Microsoft 365 Copilot), exfiltrating sensitive content without any user click.
- ✓Memory & Context Poisoning (ASI06) creates persistent attack surfaces: a single successful injection can poison memory permanently – Lakera research (Nov. 2025) documented "sleeper agent" behaviour in which compromised agents defended false beliefs against humans.
- ✓Galileo AI research (Dec. 2025) showed in simulated multi-agent systems that a single compromised agent poisoned 87% of downstream decision-making within four hours – Cascading Failures (ASI08) spread faster than classic incident response can contain them.
- ✓Human-Agent Trust Exploitation (ASI09) turns human oversight into the weak point: automation bias leads to rubber-stamping; effective HITL gates require independent verification of evidence rather than a mere "Approve" button (cf. EU AI Act Art. 14).
- ✓No single countermeasure is sufficient – every published guardrail (Microsoft's XPIA, Bedrock Guardrails, Gemini Safety) was bypassed by researchers within months; what is needed is defence-in-depth combining input filtering, scope/provenance enforcement, output filtering, and behavioural monitoring.
- ✓The OWASP lists are not a certification – "OWASP-compliant" is not a credible vendor claim; however, they map onto ISO/IEC 42001, EU AI Act Art. 15, GDPR Art. 32, as well as NIS2/DORA, and for DACH deployers they are the central threat catalogue (informational, not legal advice).
What AI Agent Security per OWASP Means
AI agent security per OWASP describes the systematic protection of autonomous AI agents against threats arising from their ability to independently plan, invoke tools, write to persistent memory, and act – with minimal step-by-step human approval. Whereas a classic chatbot predominantly responds (prompt in, completion out, possibly grounded via RAG), agents plan, reason, select tools, write memory, and act. This shift in the level of abstraction expands the attack surface along three axes: autonomy (multi-step plans, self-modification of context and memory), tool use (file system, APIs, database connectors, code sandboxes, MCP servers, A2A protocol peers), and persistence (long-lived memory stores, vector databases, agent-to-agent trust chains).
OWASP frames this in the GenAI Security Project's own wording (Sotiropoulos et al., 9 December 2025) as follows: agentic systems inherit all LLM risks and introduce entirely new classes of vulnerability arising from autonomy, tool integration, multi-agent coordination, and persistent state. For decision-makers in the DACH region, the central insight is: anyone reading only the LLM Top 10 systematically underestimates the agentic risk.
The Two Authoritative OWASP Lists
The OWASP GenAI Security Project (genai.owasp.org) publishes a family of overlapping AI security artefacts. Two of them are central to agents:
- OWASP Top 10 for LLM Applications 2025 (LLM01:2025 – LLM10:2025). Addresses model and application risks in any GenAI/LLM application – conversational, RAG-based, copilot-style, or agentic. This is the reference most DACH security teams already know.
- OWASP Top 10 for Agentic Applications 2026 (ASI01 – ASI10). Published on 9 December 2025 by the same initiative (Agentic Security Initiative) with more than 100 contributors and an Expert Review Board (including NIST, the Alan Turing Institute, Microsoft AI Red Team, AWS/CoSAI, Zenity). This is the backbone of agent-specific threat modelling.
Important note on taxonomy: The canonical December 2025 list differs from an earlier OWASP working draft ("Agentic AI Threats and Mitigations v1.0", February 2025, often cited as the "15 agentic threats"). Many practitioner articles from early/mid-2025 still use this draft numbering (e.g. "Memory Poisoning as ASI01"). That numbering belongs to historical drafts, not the final Top 10. The old categories now map into the consolidated list – Memory Poisoning sits under ASI06, HITL bypass under ASI09.
ASI01–ASI10 at a Glance
The ten canonical agentic threats and their primary control categories (canonical OWASP titles retained as proper names):
ID | OWASP title (Dec. 2025) | Primary countermeasure | LLM Top 10 relationship |
|---|---|---|---|
ASI01 | Agent Goal Hijack | Input filtering + scope/provenance ACL | LLM01 Prompt Injection |
ASI02 | Tool Misuse & Exploitation | Tool RBAC + schema validation + HITL on destructive operations | LLM06 Excessive Agency |
ASI03 | Identity & Privilege Abuse | NHI lifecycle + short-lived credentials + explicit delegation | (no direct counterpart) |
ASI04 | Agentic Supply Chain Vulnerabilities | Trusted registry + AIBOM + runtime quarantine | LLM03 Supply Chain (static) |
ASI05 | Unexpected Code Execution | Sandboxing + command allow-list + disable auto-approve | LLM05 Improper Output Handling |
ASI06 | Memory & Context Poisoning | Provenance metadata + tenant isolation + memory audits | LLM04 + LLM08 |
ASI07 | Insecure Inter-Agent Communication | mTLS + signed AgentCards + collaboration-graph allow-list | (no counterpart) |
ASI08 | Cascading Failures | Circuit breaker + bulkheads + digital-twin simulation | (no counterpart) |
ASI09 | Human-Agent Trust Exploitation | Force-engagement UI + tiered approval + automation-bias training | LLM09 Misinformation (partly) |
ASI10 | Rogue Agents | Behavioural baselines + kill switch + audit | LLM06 (partly) |
The amplification logic is decisive: per the DeepTeam mapping, ASI01 = LLM01 (Prompt Injection) × LLM06 (Excessive Agency) – only with multi-step execution that multiplies the damage beyond a single response. ASI07, ASI08, and ASI10 have no LLM Top 10 analogue at all.
Prompt Injection and Agent Goal Hijack (ASI01)
In ASI01, an attacker manipulates an agent's goals, task selection, or decision paths – via direct prompt injection or indirectly through hidden instructions in documents, the RAG corpus, emails, calendar invitations, PR descriptions, or tool outputs. The core issue: agents and the underlying model cannot reliably distinguish instructions from data – any text the agent reads is part of the attack surface.
The documented turning point is EchoLeak (CVE-2025-32711, CVSS 9.3), disclosed by Aim Labs in June 2025 in Microsoft 365 Copilot – the first documented real-world zero-click prompt injection in a production system. A single crafted email bypassed Microsoft's XPIA classifier and exfiltrated the most sensitive content from the Copilot context via a permitted Teams proxy – without any user click (Aim Labs coined the term "LLM Scope Violation"; technical depth: arXiv 2509.10540, Reddy et al., Sep. 2025). Microsoft patched it server-side. Further cases: GitHub Copilot YOLO Mode (CVE-2025-53773) and CamoLeak in GitHub Copilot Chat (CVSS 9.6), which used a CSP bypass via GitHub's own Camo image proxy to siphon private repository secrets character by character. The academic origin of the discipline is Greshake et al. (arXiv 2302.12173, 2023) on indirect prompt injection.
DACH relevance: A banking customer-service agent that reads a shared mailbox can, through a "thank you" email containing hidden instructions, be induced to reveal other customers' transaction snippets in its next response.
Tool Misuse (ASI02) and Excessive Agency
ASI02 differs subtly from privilege abuse: the access is legitimate, the use is not. The agent operates within authorized rights but applies a legitimate tool unsafely – deletes data, over-calls costly APIs, executes destructive operations. Excessive Agency (LLM06:2025) – granting unchecked power to act without sufficient scope limitation or human oversight – is the conceptual bridge and sits directly in front of ASI02, ASI03, and ASI10.
Documented: In the Amazon Q Code Assistant (CVE-2025-8217, July 2025), attackers compromised a GitHub token and smuggled destructive prompts into the VS Code extension v1.84.0, which, running with --trust-all-tools --no-interactive, could have deleted file systems and cloud resources without confirmation – around 1 million developers had the extension installed. Countermeasures: least privilege per tool, schema validation of every tool argument, allow-/deny-lists per agent role, disabling auto-approve ("YOLO" modes) for anything that touches databases, payments, communication, or deployment, plus cost caps with circuit breakers.
Memory & Context Poisoning (ASI06)
Unlike chatbots, which forget between sessions, agents maintain persistent memory – conversation history, preferences, learned context, RAG stores. This creates persistent attack surfaces: a single successful injection poisons memory permanently, the payload runs on indefinitely, and every future session inherits the compromise.
Documented: The Google Gemini memory attack (Feb. 2025, Johann Rehberger) demonstrated "delayed tool invocation" – an uploaded document instructed Gemini to store false information upon future trigger words. The Gemini calendar-invite poisoning study ("Targeted Promptware Attacks", 2025) rated 73% of 14 tested scenarios as High–Critical. Lakera research (Nov. 2025) documented "sleeper agent" behaviour: compromised agents developed persistent false beliefs about security policies and defended them against humans. Countermeasures: treat memory writes as security-sensitive, attach provenance metadata (source, timestamp, ingestion path, confidence) per entry, per-tenant isolation, regular memory audits, and deletion procedures compliant with GDPR Art. 17 (right to erasure).
Countermeasures and Human-in-the-Loop (ASI09)
Agents produce polished, authoritative-sounding output – humans tend to trust it. ASI09 (Human-Agent Trust Exploitation) makes precisely this trust layer attackable: automation bias, deference to authority, and "polished hallucination" lead human approvers to wave recommendations through. The oversight layer intended as a security control becomes the weak point. In the documented "manufacturing procurement cascade" (2025), a procurement agent was gradually convinced over three weeks that its authorization limit was USD 500,000 – the attacker then placed USD 5 million in fraudulent orders.
Effective HITL gates require independent verification of the evidence, not just of the agent's recommendation: tiered approvals (higher impact → more approvers/stronger evidence), UI patterns that actively surface reasoning, source provenance, and confidence (no mere "Approve" button), as well as periodic "test injections" to check whether HITL actually works (fire drill). This aligns directly with EU AI Act Art. 14 (human oversight).
Across the board, the defence-in-depth principle holds: no single guardrail is a silver bullet. EchoLeak bypassed Microsoft's XPIA classifier; every published guardrail (Bedrock Guardrails, Google Model Armor, Gemini Safety) was bypassed by competent researchers within months. Several complementary layers are needed: input filters (e.g. Llama Guard 4, NVIDIA NeMo Guardrails, Microsoft Prompt Shield, Lakera Guard) + scope/provenance enforcement + output filters + behavioural monitoring. Guardrails typically add 100–500 ms of latency per rail and exhibit elevated false-positive rates in multilingual DACH contexts (DE/FR/IT/EN code-switching) – vendor claims such as "blocks 99.x% of prompt injections" should be treated as marketing until independent red-team verification.
DACH Compliance Classification (informational, not legal advice)
The following notes are informational and do not replace legal advice. The OWASP lists are not a certification – "OWASP-compliant" is not a credible vendor claim; providers can implement OWASP recommendations, not be "compliant". They do, however, map cleanly onto the DACH compliance framework:
- EU AI Act Art. 15 (accuracy, robustness, cybersecurity) broadly covers high-risk systems but is technology-neutral; indirect injection, NHI specifics, and multi-agent cascades are not yet codified in the standard and remain the deployer's responsibility.
- GDPR Art. 32 (technical and organizational measures): most ASI threats touch confidentiality, integrity, availability, and resilience – ASI06, for example, Art. 5(1)(d) accuracy and Art. 17 erasure.
- ISO/IEC 42001:2023 (Annex A, 38 controls across 9 objectives) provides the reference controls, including A.6.2.4 (V&V), A.6.2.6 (operation/monitoring), and A.6.2.8 (logging).
- NIS2 / DORA: for essential/important entities, NIS2 Art. 21(2) requires, among others, supply-chain security (→ ASI04) and access control (→ ASI03); DORA Art. 28–30 governs ICT third-party risk – directly relevant for MCP/agent supply chains in the financial sector.
DACH supervision: The BaFin orientation guidance (18 Dec. 2025) and the FINMA supervisory communication 08/2024 are both non-binding, but noticeably shift the burden of proof in audits; both recommend, among others, adversarial penetration tests and scenario-based cyber exercises respectively. The BSI is developing dedicated AI agent security criteria; the BSI status report 2025 notes that only around 10% of German organizations use AI defensively, while attackers are already weaponizing it. For ATX-relevant readers, note: as of May 2026, Austria's FMA has not yet published comparable formal AI guidance – this gap should be explicitly flagged.
Outlook and Practical Note
The OWASP Agentic Top 10 is version 1.0 (9 December 2025) and is expected to evolve at least annually – readers should bookmark genai.owasp.org rather than treat the list as static. Three sober practical notes: First, MITRE ATLAS, as an adversary playbook, lags the agentic front line by 6–12 months, especially for ASI07, ASI08, and ASI10. Second, detection in production agent deployments is still weak – most observability stacks were built for classic applications and do not surface reasoning-trace anomalies, memory-write provenance violations, or behavioural-drift signals; the conclusion "we don't see it in the SIEM, so it isn't happening" is dangerously wrong. Third, "AI red-teaming" is not the same as classic pen-testing (different skills, different tools such as Garak/PyRIT/DeepTeam, probabilistic rather than binary findings).
For DACH deployers, the pragmatic entry point is: set up the OWASP Agentic Top 10 as a risk register, map it against ISO 42001 and EU AI Act Art. 15, treat every agent as a standalone non-human identity with short-lived credentials, deploy defence-in-depth guardrails, secure destructive operations with genuine HITL, and keep a kill switch ready for rogue-agent isolation. Operational depth on prompt injection, red-teaming, observability, and ISO 42001 implementation belongs in the respective sister topics of this security cluster.
All Articles in this Topic
10 ArticlesOWASP LLM Top 10 (2025) explained: The ten security risks for LLM applications
The OWASP LLM Top 10 (2025) are the ten most serious security risks for applications built on large language models, published by the OWASP GenAI Security Project. They range from Prompt Injection through Sensitive Information Disclosure to Unbounded Consumption and form the reference for securing LLM and AI-agent systems.
OWASP Agentic Security (ASI) Top 10 (2026): The Risks of Agentic AI Systems
The OWASP Agentic Top 10 (ASI01–ASI10) is the risk list for autonomous AI agents published on 9 December 2025 by the OWASP GenAI Security Project under the Agentic Security Initiative. It extends the LLM Top 10 with agent-specific threats such as goal hijack, tool misuse, privilege abuse, memory poisoning and cascading failures. As of 2026.
Prompt Injection: Direct vs. Indirect - the difference and why it becomes a boardroom issue with AI agents
Prompt injection refers to the smuggling of malicious instructions into an AI system's input in order to hijack its behaviour. In direct injection, the user themselves manipulates the prompt. In indirect injection, attackers hide the instruction in retrieved data such as documents, emails or web pages that the agent processes.
Tool Misuse and Excessive Agency: When AI Agents Are Allowed to Do Too Much
Excessive Agency refers to the overly broad autonomy, permissions or functionality of an AI agent – it can do more than its task would require. Tool Misuse is the abusive use of legitimate tools: access is authorised, the usage is not. Both lead to unintended actions, data exfiltration and uncontrolled costs.
Agent Goal Hijacking: When the Objectives of Autonomous AI Agents Are Manipulated
Goal Hijacking (OWASP ASI01) refers to the manipulation of an autonomous AI agent's objectives, task selection or decision paths. Attackers redirect the agent via prompt injection, manipulated tool outputs, poisoned data or forged inter-agent messages. The agent is not faulty; it follows planted instructions that it believes to be legitimate.
Preventing Memory Poisoning: Securing the Long-Term and Vector Memory of AI Agents
Memory poisoning refers to the deliberate injection of manipulated content into the long-term or vector memory of an AI agent. Unlike one-off prompt injections, the malicious content remains persistently stored and compromises the agent's behaviour on every subsequent retrieval — a single successful write operation has an unlimited lasting effect.
Designing Human-in-the-Loop (HITL) Correctly: Approval Patterns for AI Agents
Human-in-the-Loop (HITL) refers to the deliberate insertion of human approvals into the action chain of an AI agent. Before irreversible, costly or legally relevant actions, a human reviews and authorises them before the agent acts. HITL is the operational implementation of the human oversight required by EU AI Act Art. 14.
Red-Teaming for AI Agents: Uncovering Vulnerabilities Systematically
Red-teaming for AI agents refers to the systematic, simulated attacking of AI agents in order to uncover vulnerabilities such as prompt injection, jailbreaks, tool misuse and data exfiltration before real attackers exploit them. It combines automated attack tools with manual, multi-stage attack creativity and delivers measurable findings such as attack success rates rather than binary vulnerability lists.
AI Agent Monitoring with LangSmith and Langfuse: Observability for Secure AI Agents
AI agent monitoring (agent observability) is the end-to-end capture and analysis of what an AI agent does: traces, tool calls, token costs, latency, errors and eval scores. Tools such as LangSmith and Langfuse make an agent's decision paths traceable and are therefore a prerequisite for security, debugging and compliance.
Audit Trails for AI Agents: Complete, Tamper-Proof Logging
An audit trail for AI agents is the complete, tamper-proof logging of all decisions, tool calls, data accesses, memory operations and human approvals of an agent. It makes autonomous agent behaviour forensically traceable, satisfies regulatory logging obligations, and provides evidence in the event of damage about who or what triggered which action.