
LinkedIn Announcement: The Guardians — Why Agent Security Is Not LLM Safety

I've been working on Part 4 of the Engineering the Agentic Stack series — the security layer — and it turned out to be the one where I had the most to learn.

Short version: content filters like NeMo, Bedrock Guardrails, and Lakera wrap a model call and watch what it says. Agent guardians wrap the whole tool-using loop and watch what the system tries to do. The serious 2025–2026 incidents I looked at all exploited the loop, not the generation.
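The distinction can be sketched in a few lines (a minimal illustration with hypothetical names and policy values, not any real framework's API): a content filter inspects what the model says, while a guardian intercepts every tool call the loop tries to execute, before it runs.

```python
# Hypothetical guardian sketch: policy is enforced on *actions*,
# not on model text. Tool names and path rules are assumptions.
ALLOWED_TOOLS = {"search", "read_file"}
BLOCKED_PATH_PREFIXES = ("/etc/", "/root/.ssh/")

def guardian_check(tool_name: str, args: dict) -> bool:
    """Return True only if the proposed tool call passes policy."""
    if tool_name not in ALLOWED_TOOLS:
        return False
    path = args.get("path", "")
    return not any(path.startswith(p) for p in BLOCKED_PATH_PREFIXES)

def run_agent_loop(plan):
    """Execute model-proposed (tool, args) steps, gated by the guardian."""
    for tool_name, args in plan:
        if not guardian_check(tool_name, args):
            raise PermissionError(f"blocked: {tool_name} {args}")
        # ... dispatch to the real tool here ...
```

The point of the sketch: a prompt-injection payload never needs to produce "unsafe text" to do damage — it only needs to steer the loop toward a tool call, and that is the layer a content filter never sees.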

A few things that stood out:

🛡️ Six real incidents from the past 18 months (EchoLeak, Amazon Q, Azure MCP, Claude Code CVE-2025-59536, axios 1.14.1, Trivy Actions) and none would have been stopped by a content filter. The attacker never talked to the model — they talked to the tool, the config file, or the npm registry.

📋 Of the OWASP Agentic Security Top 10 for 2026, a content filter primarily addresses 2 categories. The other 8 are harness concerns: permissions, sandboxing, MCP scoping, supply chain.

🖱️ Anthropic published data showing 93% of Claude Code permission prompts get approved. At that rate, the prompt isn't a security control — it's telemetry. Their fix is architectural, not UX: a two-stage classifier that removes prompts for low-risk actions and stops the loop entirely when denials cluster.
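The two-stage idea is simple enough to sketch (this is my own toy illustration of the pattern, not Anthropic's implementation; the action names and thresholds are assumptions): stage one auto-approves clearly low-risk actions so humans only see prompts that matter; stage two kills the loop when denials start clustering.

```python
# Toy two-stage gate. LOW_RISK membership and the denial
# window/threshold are illustrative assumptions.
LOW_RISK = {"read_file", "list_dir", "search"}

def stage_one(action: str) -> str:
    """Stage 1: only surface a prompt for actions that are not low-risk."""
    return "auto_approve" if action in LOW_RISK else "prompt_user"

def stage_two(denial_history: list, window: int = 5, threshold: int = 3) -> bool:
    """Stage 2: stop the loop when denials cluster in the recent window.

    denial_history is a list of 0/1 flags, most recent last.
    """
    return sum(denial_history[-window:]) >= threshold
```

With fewer prompts, each remaining approval carries signal again — and a run that keeps getting denied is halted instead of being allowed to rephrase its way through.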

🔐 RFC 8707 audience-bound MCP tokens are the boring cryptographic detail that actually closed the CVE-2025-59536-class attacks. A token scoped to one server literally can't be replayed against another — small change, large policy hole closed.
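Why audience binding closes the replay hole can be shown in a few lines (a sketch of an RFC 8707-style audience check, with hypothetical server URLs and helper names — not a real MCP SDK API): the receiving server rejects any token whose audience claim names a different resource.

```python
# Sketch: a server accepts a token only if its own resource
# identifier appears in the token's audience claim. Claims are
# assumed to come from an already-signature-verified token.
def accept_token(claims: dict, this_server: str) -> bool:
    """RFC 8707-style check: token must be bound to this server."""
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    return this_server in audiences

# A token minted for server A cannot be replayed against server B:
token_for_a = {"aud": "https://mcp-a.example"}
accept_token(token_for_a, "https://mcp-a.example")  # True
accept_token(token_for_a, "https://mcp-b.example")  # False
```

One equality check at the resource, plus the client naming its intended resource at token time — that is the whole mechanism.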

The full post walks through the six incidents, the OWASP mapping, how Claude Code, Codex CLI, and OpenAI Agents SDK express the same policy primitives differently, and a four-layer guardian stack for the Market Analyst Agent from Part 1.

🔗 Links in the comments

#AIEngineering #AgentSecurity #LLMOps


Version 2: Self-Contained

I've been working on Part 4 of the Engineering the Agentic Stack series — the security layer — and it turned out to be the one where I had the most to learn.

Short version: content filters like NeMo, Bedrock Guardrails, and Lakera wrap a model call and watch what it says. Agent guardians wrap the whole tool-using loop and watch what the system tries to do. The serious 2025–2026 incidents I looked at all exploited the loop, not the generation.

A few things that stood out:

🛡️ Six real incidents from the past 18 months (EchoLeak, Amazon Q, Azure MCP, Claude Code CVE-2025-59536, axios 1.14.1, Trivy Actions) and none would have been stopped by a content filter. The attacker never talked to the model — they talked to the tool, the config file, or the npm registry.

📋 Of the OWASP Agentic Security Top 10 for 2026, a content filter primarily addresses 2 categories. The other 8 are harness concerns: permissions, sandboxing, MCP scoping, supply chain.

🖱️ Anthropic published data showing 93% of Claude Code permission prompts get approved. At that rate, the prompt isn't a security control — it's telemetry. Their fix is architectural, not UX: a two-stage classifier that removes prompts for low-risk actions and stops the loop entirely when denials cluster.

🔐 RFC 8707 audience-bound MCP tokens are the boring cryptographic detail that actually closed the CVE-2025-59536-class attacks. A token scoped to one server literally can't be replayed against another — small change, large policy hole closed.

The full post walks through the six incidents, the OWASP mapping, how Claude Code, Codex CLI, and OpenAI Agents SDK express the same policy primitives differently, and a four-layer guardian stack for the Market Analyst Agent from Part 1.

Full post: https://slavadubrov.github.io/blog/2026/04/20/the-guardians-why-agent-security-is-not-llm-safety/
Companion repo: https://github.com/slavadubrov/market-analyst-agent

#AIEngineering #AgentSecurity #LLMOps


First Comment (for Version 1)

Full post: https://slavadubrov.github.io/blog/2026/04/20/the-guardians-why-agent-security-is-not-llm-safety/
Companion repo (guardian stack code for the Market Analyst Agent): https://github.com/slavadubrov/market-analyst-agent