The Guardians — Why Agent Security Is Not LLM Safety
Part 4 of the Engineering the Agentic Stack series
In 2024 we shipped guardrails. NeMo Guardrails, Bedrock Guardrails, and a handful of similar products wrapped the input and output of a model call and asked one question: is the model producing the right thing? Toxic output, PII leak, jailbreak, off-topic. Filter, redact, refuse. The threat was easy to see because there were only two places to look: input and output.
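That input/output pattern can be sketched in a few lines. Everything here is illustrative (the pattern names, the regexes, the `guarded_call` wrapper), not any vendor's actual API; the point is that the only interception points are the prompt going in and the text coming out:

```python
import re

# Naive jailbreak check on the input side; illustrative only.
BLOCKED_INPUT_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
]
# Redact SSN-shaped strings on the output side; illustrative only.
PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def guarded_call(model, prompt: str) -> str:
    """Wrap a single model call: filter the input, redact the output."""
    # Input rail: refuse obvious jailbreak attempts.
    for pattern in BLOCKED_INPUT_PATTERNS:
        if pattern.search(prompt):
            return "[refused]"
    output = model(prompt)
    # Output rail: redact PII-shaped strings before returning.
    return PII_PATTERN.sub("[redacted]", output)
```

Two choke points, two checks. Nothing in this shape can see a tool call, a file write, or a shell command, which is exactly the gap the rest of this post is about.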
Then we gave the model a tool loop, a filesystem, a shell, a Model Context Protocol (MCP) registry, and the authority to act. The threat model changed underneath us, and most of the 2024 guardrails didn't notice. Six serious incidents in eighteen months (EchoLeak, the Amazon Q Developer extension compromise, the Azure MCP Server disclosure, Claude Code CVE-2025-59536, the axios 1.14.1 remote-access trojan, and the Trivy Actions tag hijack, each walked through below) could not have been prevented by a better output filter. The output was fine. The system was compromised.