
Agents 101

The Guardians — Why Agent Security Is Not LLM Safety

Part 4 of the Engineering the Agentic Stack series

In 2024 we shipped guardrails. NeMo Guardrails, Bedrock Guardrails, and a handful of similar products wrapped the input and output of a model call and asked one question: is the model producing the right thing? Toxic output, PII leak, jailbreak, off-topic. Filter, redact, refuse. The threat was easy to see because there were only two places to look: input and output.
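That input/output pattern is simple enough to sketch in a few lines. The regex patterns, function name, and refusal message below are invented for illustration, not the API of NeMo Guardrails, Bedrock Guardrails, or any real product; the point is only the shape: check the input, call the model, check the output.

```python
import re

# Hypothetical 2024-era guardrail wrapper. Patterns and names are
# invented for this sketch; real products use classifiers, not regexes.
BLOCKED_INPUT = [r"ignore (all )?previous instructions"]  # crude jailbreak check
PII_PATTERNS = [r"\b\d{3}-\d{2}-\d{4}\b"]                 # e.g. US SSN format

def guarded_call(model, prompt: str) -> str:
    # Input rail: refuse before the model ever runs.
    if any(re.search(p, prompt, re.IGNORECASE) for p in BLOCKED_INPUT):
        return "Request refused by input guardrail."
    output = model(prompt)
    # Output rail: redact rather than refuse.
    for p in PII_PATTERNS:
        output = re.sub(p, "[REDACTED]", output)
    return output
```

Notice that the wrapper sees exactly two strings: the prompt going in and the text coming out. Nothing in this design can observe a tool call, a file write, or a shell command.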

Then we gave the model a tool loop, a filesystem, a shell, a Model Context Protocol (MCP) registry, and the authority to act. The threat model changed underneath us, and most of the 2024 guardrails didn't notice. Six serious incidents in eighteen months (EchoLeak, the Amazon Q Developer extension compromise, the Azure MCP Server disclosure, Claude Code CVE-2025-59536, the axios 1.14.1 remote-access trojan, and the Trivy Actions tag hijack; each walked through below) were not addressable by a better output filter. The output was fine. The system was compromised.

The Hands — Tool Ergonomics and the Agent-Computer Interface

Part 3 of the Engineering the Agentic Stack series

Part 1 covered reasoning loops, Part 2 covered memory. This post is about tools — how agents interact with the world, and why tool design matters more than most teams realize.

The tool landscape shifted dramatically in 2025–2026. MCP won the standards war, but production teams are discovering its security gaps and token overhead the hard way. The alternative gaining traction: agents that write and execute code instead of calling JSON-defined tools, with reported token reductions of 98.7% and task success rates 20% higher.
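The contrast between the two modalities fits in one sketch. The tool schema, function names, and price data below are invented for illustration; a real host would run the generated program in a sandbox, not a bare `exec()`.

```python
def get_price(ticker: str) -> float:
    # Stand-in data source for the sketch.
    return {"AAPL": 189.5, "MSFT": 402.1, "NVDA": 875.3}[ticker]

# Modality 1: a JSON-defined tool. The whole schema rides along in the
# model's context on every turn, and each call costs a model round trip.
get_price_tool = {
    "name": "get_price",
    "description": "Fetch the latest price for a ticker symbol.",
    "parameters": {
        "type": "object",
        "properties": {"ticker": {"type": "string"}},
        "required": ["ticker"],
    },
}

# Modality 2: the model emits a short program instead. One generation
# replaces three tool-call round trips, and the intermediate data (the
# full price table) never re-enters the context window.
generated_code = """
prices = {t: get_price(t) for t in ["AAPL", "MSFT", "NVDA"]}
best = max(prices, key=prices.get)
"""

ns = {"get_price": get_price}
exec(generated_code, ns)  # a real host would sandbox this execution
# Only ns["best"] needs to flow back to the model, not every price.
```

The token savings come from that last comment: in the code-execution modality, only the final result crosses back into the context, while in the JSON modality every schema and every intermediate observation does.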

This post covers all five tool modalities, the ACI design principles that govern them, and practical patterns I apply in the Market Analyst Agent.

The Cortex — Architecting Memory for AI Agents

Part 2 of the Engineering the Agentic Stack series

State is what separates a chatbot from an agent. Without memory, every interaction starts from zero — the agent cannot pause and resume, cannot learn from past sessions, cannot personalize. In Part 1, I covered the cognitive engine that decides how an agent thinks. This post tackles the infrastructure that determines what it remembers.

I'll walk through the memory architecture of the Market Analyst Agent, showing how hot and cold memory layers work together to support checkpointing, pause/resume workflows, and cross-session learning — and why a third tier of document-based memory is becoming essential for agents that manage their own knowledge.
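The hot/cold split can be sketched minimally. The class and method names below are invented for illustration, and a JSON file stands in for the cold tier; a production build would use a database or a framework checkpointer, not a flat file.

```python
import json
import time
from pathlib import Path

class AgentMemory:
    """Hypothetical two-tier memory: hot in-process, cold on disk."""

    def __init__(self, cold_path: str = "memory.json"):
        self.hot: list[dict] = []    # working context for the current session
        self.cold = Path(cold_path)  # durable store across sessions

    def remember(self, role: str, content: str) -> None:
        # Everything lands in the hot tier first.
        self.hot.append({"role": role, "content": content, "ts": time.time()})

    def checkpoint(self) -> None:
        # Pause: flush hot state so a later process can pick it up.
        self.cold.write_text(json.dumps(self.hot))

    def resume(self) -> None:
        # Resume: rehydrate the working context from the cold tier.
        if self.cold.exists():
            self.hot = json.loads(self.cold.read_text())
```

The pause/resume workflow falls out for free: `checkpoint()` at any point, kill the process, construct a fresh `AgentMemory` against the same path, and `resume()` restores the working context.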

The Cognitive Engine: Choosing the Right Reasoning Loop

Part 1 of the Engineering the Agentic Stack series

Building production AI agents is no longer about prompt engineering—it's about system engineering. The difference between a demo that impresses and a product that delivers comes down to one critical decision: how your agent thinks.

This post introduces three reasoning loop architectures and shows you how to choose between them. I'll use a production-grade Market Analyst Agent as the running example, with code you can use today.