Zum Inhalt

Best AI Agent Security Patterns in 2026

Agent security is action security. A chatbot can say something wrong and embarrass you. An agent can call a tool with real credentials and change production state.

My default: do not give the agent capabilities it should never use. Start with narrow tools, pre-tool policy checks, sandbox isolation, scoped credentials, human approval gates, traces, and guardrails. Output filtering helps, but it is not the security boundary.

Pattern ranking

Pattern Priority Protects against Implementation note
Least-privilege tools P0 Excessive agency Do not expose tools the agent should never use.
Pre-tool policy checks P0 Dangerous actions Check the concrete action right before execution.
Sandboxes P0 File, shell, browser, and network damage Isolate code and untrusted content.
Human approvals P0 Irreversible or regulated actions Gate writes, deployments, payments, external sends, and privileged changes.
Scoped credentials P0 Credential overreach and confused deputy failures Use per-server, per-tool, narrow scopes.
MCP server isolation P1 tool poisoning, tool shadowing, cross-server attacks Do not mix untrusted servers and powerful tools in one context without review.
Audit traces P1 Unknown incident history Persist user request, tool call, args, result, policy decision, and approver.
Guardrails P1 Unsafe input and output text Useful, but not enough for tool authority.
Red-team evals P1 Known attack paths Test prompt injection, tool poisoning, data exfiltration, and permission bypasses.

What to implement first

Delete capabilities first. If the agent does not need to write to GitHub, do not give it a write token. If it only needs calendar availability, do not grant full mailbox access. Least privilege beats a stern prompt.

Then put a policy boundary before every tool call. Check the exact tool name, arguments, target resource, user, environment, and side effect. A user request can look harmless while the generated shell command is not.

Add sandboxes for code execution, browser automation, file access, and untrusted document processing. A sandbox does not make the action correct, but it reduces the damage from a compromised tool result or confused model.

Use human approval for irreversible actions. Do not approve every step. Approve boundaries: production deploys, data deletion, email sends, money movement, permission changes, and regulated decisions.

MCP-specific risks

MCP is useful because it standardizes tool access. It is risky because tool descriptions, schemas, server identities, OAuth scopes, and tool outputs all become part of the model's decision environment.

For MCP, I would keep these rules in code review:

  • review tool descriptions and schemas before approval
  • prefer narrow per-server credentials
  • isolate untrusted MCP servers from sensitive tools
  • watch for tool definition changes after installation
  • treat tool output as untrusted input
  • log every server, tool, argument, and result

Guardrails are not enough

Guardrails can validate input and output. They do not solve least privilege, credential scope, sandboxing, tool poisoning, or approval policy. Keep them, but put them after capability design and before user-visible output.

Deeper reading

References