Maybe Don't 1.1: AI Agents Need Guardrails, MCP Isn't Enough

CLI validation, policy testing, audit reports, and easier adoption

Kendal Miller February 10, 2026

Maybe Don’t started as a policy gateway for MCP tool calls: a way to put guardrails between AI agents and the tools they use. Over the last month we’ve broadened its scope, extending that protection to more of the ways AI behavior can go off the rails.

Here’s what’s new.

CLI Command Validation

AI agents don’t just call MCP tools, especially when they are blocked and look for another way to accomplish a task. They run shell commands: git push, aws s3 rm, kubectl delete. Until now, those commands were a blind spot.

Maybe Don’t can now validate arbitrary CLI commands before they execute. Agents route shell commands through the gateway, and the same CEL and AI-based policies that protect your MCP tools now protect your terminal. If an agent tries to force-push to main or drop a production database, your policies catch it — regardless of whether it’s an MCP call or a raw shell command.
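To make the idea concrete, here is a minimal sketch of what checking a shell command against deny rules can look like. It is not Maybe Don’t’s implementation or policy syntax: the rule names, the `Rule` class, and `validate_command` are hypothetical, and plain Python predicates stand in for the CEL and AI-based policies described above.

```python
import shlex
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical deny rule: a name plus a predicate over the parsed argv."""
    name: str
    matches: Callable[[List[str]], bool]


def is_force_push_to_main(argv: List[str]) -> bool:
    # Matches e.g. `git push --force origin main` or `git push -f origin main`.
    return (
        argv[:2] == ["git", "push"]
        and any(arg in ("--force", "-f") for arg in argv[2:])
        and "main" in argv[2:]
    )


def is_recursive_s3_delete(argv: List[str]) -> bool:
    # Matches e.g. `aws s3 rm s3://bucket/prefix --recursive`.
    return argv[:3] == ["aws", "s3", "rm"] and "--recursive" in argv[3:]


RULES = [
    Rule("no-force-push-main", is_force_push_to_main),
    Rule("no-recursive-s3-rm", is_recursive_s3_delete),
]


def validate_command(command: str) -> Optional[str]:
    """Return the name of the first rule that denies the command, else None."""
    argv = shlex.split(command)  # shell-style tokenization, handles quoting
    for rule in RULES:
        if rule.matches(argv):
            return rule.name
    return None
```

The key design point is that the command is parsed into tokens before any rule sees it, so rules match on structure (subcommand, flags, targets) rather than on raw substrings.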

Policy Test Matrix

Writing policies is easy. Knowing they actually work is harder.

Our recent changes introduce a declarative test suite for your policies. Write test cases that describe what should be allowed and what should be denied, then run them against your rules. CEL policies are tested deterministically. AI-based policies are tested across a matrix of models with configurable pass-rate thresholds, so you can measure how reliably each model enforces your intent. It also lets you find the most affordable model that still enforces your policies reliably.
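The pass-rate idea can be sketched in a few lines. This is an illustration under stated assumptions, not the product’s test format: `Case`, `Model`, `pass_rate`, and `cheapest_reliable` are hypothetical names, the costs are made up, and the two "models" are deterministic stand-in functions so the example runs without calling any real model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Case:
    """One declarative test case: a command and the expected decision."""
    command: str
    expect_allowed: bool


@dataclass(frozen=True)
class Model:
    """Hypothetical model under test; `decide` returns True when it allows."""
    name: str
    cost_per_call: float
    decide: Callable[[str], bool]


def pass_rate(model: Model, cases: List[Case], runs: int) -> float:
    """Fraction of (case, run) pairs where the model matched the expectation."""
    if not cases or runs < 1:
        raise ValueError("need at least one case and one run")
    passed = sum(
        1
        for case in cases
        for _ in range(runs)
        if model.decide(case.command) == case.expect_allowed
    )
    return passed / (len(cases) * runs)


def cheapest_reliable(
    models: List[Model], cases: List[Case], threshold: float, runs: int = 5
) -> Optional[Model]:
    """Cheapest model whose pass rate meets the threshold, or None."""
    reliable = [m for m in models if pass_rate(m, cases, runs) >= threshold]
    return min(reliable, key=lambda m: m.cost_per_call) if reliable else None


CASES = [
    Case("git push --force origin main", expect_allowed=False),
    Case("kubectl delete namespace prod", expect_allowed=False),
    Case("git status", expect_allowed=True),
    Case("ls -la", expect_allowed=True),
]

# Stand-ins for real models: one enforces the intent, one allows everything.
STRICT = Model("strict-model", 0.010, lambda c: not ("--force" in c or "delete" in c))
SLOPPY = Model("sloppy-model", 0.001, lambda c: True)
```

With a real model, `decide` would be nondeterministic, which is why each case is repeated `runs` times and compared against a threshold rather than a single pass/fail.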

Test state persists between runs — only changed or failing policies get re-evaluated. Rolling pass-rate history tracks stability over time and flags flaky policies before they become a problem in production. This gives you the confidence to iterate on policies the same way you iterate on code: write the test, change the rule, verify the result.
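One simple way to think about rolling pass-rate history is a fixed-size window of recent results per policy. The sketch below is an assumption about how such tracking could work, not a description of the product’s internals: the `PassRateHistory` class, the window size, and the definition of "flaky" (spread between best and worst recent run above a tolerance) are all hypothetical.

```python
from collections import deque


class PassRateHistory:
    """Keeps the most recent pass rates for one policy (hypothetical helper)."""

    def __init__(self, window: int = 5) -> None:
        # deque with maxlen drops the oldest entry automatically.
        self._rates = deque(maxlen=window)

    def record(self, rate: float) -> None:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("pass rate must be between 0 and 1")
        self._rates.append(rate)

    def is_flaky(self, tolerance: float = 0.2) -> bool:
        """Flaky here means recent runs disagree by more than `tolerance`."""
        if len(self._rates) < 2:
            return False
        return max(self._rates) - min(self._rates) > tolerance
```

A policy that scores 1.0, then 0.6, then 1.0 would be flagged under this definition, while one that sits steadily at 1.0 would not.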

Audit-Only Mode by Default

Deploying a policy gateway into an existing workflow is nerve-wracking. One overly aggressive rule and your agents grind to a halt.

All validation modes now default to audit-only. Requests flow through without blocking while policies evaluate in the background. You get full visibility into what would have been denied without disrupting anything. When you’re confident in your rules, flip them to enforcing. This takes the risk out of adoption: install it, watch the logs, and tighten the policies at your own pace.
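The difference between the two modes comes down to what happens after a policy says "deny". The following sketch shows that logic under simplifying assumptions: the `Gateway` class, `Mode` enum, and log fields are hypothetical, and the policy is evaluated inline here rather than in the background.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Mode(Enum):
    AUDIT = "audit"      # log what would be denied, never block
    ENFORCE = "enforce"  # block denied requests


@dataclass
class Gateway:
    """Hypothetical gateway: one deny predicate, a mode, and an audit log."""
    is_denied: Callable[[str], bool]
    mode: Mode = Mode.AUDIT  # audit-only is the default
    audit_log: List[Dict[str, object]] = field(default_factory=list)

    def handle(self, request: str) -> bool:
        """Record the decision; return True if the request is forwarded."""
        would_deny = self.is_denied(request)
        blocked = would_deny and self.mode is Mode.ENFORCE
        self.audit_log.append(
            {"request": request, "would_deny": would_deny, "blocked": blocked}
        )
        return not blocked
```

Because the decision is logged identically in both modes, the audit trail collected before switching to enforcing shows exactly which requests would start failing.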

AI-Powered Audit Reports

Maybe Don’t can now generate on-demand executive reports summarizing everything your AI agents have been doing — and everything the gateway stopped them from doing.

With a single tool call, the gateway analyzes your audit logs and produces a structured report covering usage patterns, policy violations, denied actions, and security concerns, all prioritized by business impact. The report surfaces which agents are hitting guardrails, which policies are doing the most work, and where your biggest risks are. It also provides actionable recommendations — tightening a rule that’s too permissive, relaxing one that’s creating unnecessary friction, or flagging an agent workflow that’s consistently trying to do things it shouldn’t.

This isn’t a raw log dump. The report is AI-generated from the actual decision history, so it reads like a briefing, not a spreadsheet. If you’re an engineering leader who approved the investment in guardrails, this is how you see the return: concrete evidence of risky actions that were caught, patterns that were flagged, and a clear picture of how your policies are performing over time. You can filter by time range, focus area (security, usage, errors), and output format, making it easy to pull a weekly summary or drill into a specific incident window. Bring these reports to your boss. Bring them to your board. You are protecting your team, so brag about it.
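Before any AI summarization, a report like this has to narrow the audit log to a time window and aggregate it. The sketch below shows only that filtering and counting step; the `AuditEntry` fields and the `top_denying_policies` function are hypothetical and do not reflect the gateway’s actual log schema or report tool.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class AuditEntry:
    """Hypothetical audit record for one policy decision."""
    timestamp: datetime
    agent: str
    policy: str
    denied: bool


def top_denying_policies(
    entries: Iterable[AuditEntry],
    start: datetime,
    end: datetime,
    limit: int = 3,
) -> List[Tuple[str, int]]:
    """Policies with the most denials in [start, end), most active first."""
    counts = Counter(
        entry.policy
        for entry in entries
        if entry.denied and start <= entry.timestamp < end
    )
    return counts.most_common(limit)
```

Swapping `entry.policy` for `entry.agent` in the counter gives the complementary view: which agents are hitting guardrails most often in the same window.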

Getting Started

If you’ve been thinking about putting guardrails on your AI agents, we’d love to talk. Book a 30-minute call with a founder and we’ll help you figure out the right setup for your team.