AI PM Playbook

Use this when the AI can call tools, take actions, or run multi-step workflows. Agents need stricter product definition than chatbots because they create side effects.

The starter checklist

Question	Artifact
What actions can the agent take, and at what autonomy level?	AI PRD
Which tools can it call, with what constraints?	AI PRD
What can the agent never do?	AI PRD + Operating AI Products
Which actions require approval, rollback, or audit trail?	Human Review Workflow
How will trajectory quality be evaluated?	Eval Plan + Agentic Products
What is the cost ceiling per task?	Cost Model
What happens when the agent gets stuck or confidence is low?	AI PRD + Launch Gate Checklist

1. Map autonomy by action

Do not assign one autonomy level to the whole agent. Assign a level to each action:

Action	Recommended starting level	Why
Draft content	Draft	User can review before anything happens
Recommend a next step	Suggest	User approves or rejects
Update reversible internal state	Act	Requires audit trail and rollback
Send external communication	Draft or Suggest	Customer-facing mistakes damage trust
Process money, permissions, identity, legal, health, or deletion	Human-only unless proven otherwise	High-severity harm or irreversible action

Start lower than you think you need to. Let demonstrated reliability, user trust, and operational readiness earn higher autonomy over time.
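One way to make the per-action mapping concrete is a small lookup table that the agent runtime checks before every action. This is a minimal sketch: the action names and the `allowed_level` and `may_execute_without_approval` helpers are hypothetical, and the rule that unknown actions fall back to human-only is an assumption in the spirit of "start lower".

```python
from enum import Enum


class Autonomy(Enum):
    HUMAN_ONLY = "human_only"  # the agent may not perform the action
    DRAFT = "draft"            # the agent prepares output for user review
    SUGGEST = "suggest"        # the agent proposes; the user approves or rejects
    ACT = "act"                # the agent executes; audit trail and rollback required


# Hypothetical action names that mirror the rows of the table above.
AUTONOMY_BY_ACTION = {
    "draft_content": Autonomy.DRAFT,
    "recommend_next_step": Autonomy.SUGGEST,
    "update_internal_state": Autonomy.ACT,
    "send_external_message": Autonomy.DRAFT,
    "process_payment": Autonomy.HUMAN_ONLY,
}


def allowed_level(action: str) -> Autonomy:
    # Assumption: an action nobody has mapped gets the most restrictive level.
    return AUTONOMY_BY_ACTION.get(action, Autonomy.HUMAN_ONLY)


def may_execute_without_approval(action: str) -> bool:
    # Only actions explicitly mapped to ACT run without a human in the loop.
    return allowed_level(action) is Autonomy.ACT
```

Keeping the map as data rather than prompt text means a level can be raised for one action without touching the others.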

2. Define tool boundaries

For every tool, document:

What data it can read
What data it can write
Which tenants, accounts, or objects it can access
Whether the action has side effects
Whether the action is reversible
What happens when the tool fails

If a tool can send, delete, charge, approve, publish, or change permissions, it needs confirmation or rollback.
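The documentation list above can double as a machine-checkable record. The sketch below is one possible shape, not a standard schema: `ToolSpec`, its field names, and `boundary_gaps` are hypothetical, and the second check (declared rollback on an irreversible action) is an added assumption rather than a rule from this guide.

```python
from dataclasses import dataclass

# Verbs taken from the rule above; everything else here is illustrative.
HIGH_IMPACT_VERBS = frozenset(
    {"send", "delete", "charge", "approve", "publish", "change_permissions"}
)


@dataclass(frozen=True)
class ToolSpec:
    name: str
    reads: tuple[str, ...]    # data the tool can read
    writes: tuple[str, ...]   # data the tool can write
    scope: str                # tenants, accounts, or objects it can access
    verbs: frozenset[str]     # what the tool can do, e.g. {"send"}
    reversible: bool
    on_failure: str           # documented behavior when the tool fails
    requires_confirmation: bool = False
    has_rollback: bool = False

    @property
    def has_side_effects(self) -> bool:
        return bool(self.writes or self.verbs)


def boundary_gaps(tool: ToolSpec) -> list[str]:
    """Return the reasons this tool spec is not ready to ship."""
    gaps = []
    risky = sorted(tool.verbs & HIGH_IMPACT_VERBS)
    if risky and not (tool.requires_confirmation or tool.has_rollback):
        gaps.append(f"{tool.name}: {', '.join(risky)} needs confirmation or rollback")
    if tool.has_rollback and not tool.reversible:
        gaps.append(f"{tool.name}: rollback is declared but the action is irreversible")
    return gaps
```

A spec review can then run `boundary_gaps` over every registered tool and treat any non-empty result as an open question for the PRD.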

3. Design rollback before action

An agent should not take an action unless the team knows how to undo it or has decided the action must be reviewed first.

Action type	Required control
Reversible internal update	Change log, undo path, monitoring owner
Customer-visible output	Human approval or safe draft state
Irreversible action	Human approval before action
Tool failure or partial completion	Stop, report state, and ask for review
Repeated failed attempts	Cost ceiling and loop termination

Rollback is part of the product design, not an implementation detail.
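For the reversible internal update row, "design rollback before action" can be enforced mechanically: refuse to apply a change that arrives without an undo path. The sketch below assumes in-memory state and a hypothetical `ChangeLog` class. A real system would persist the log and assign a monitoring owner, which this example does not model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoggedChange:
    description: str
    undo: Callable[[], None]


class ChangeLog:
    def __init__(self) -> None:
        self._entries: list[LoggedChange] = []

    def apply(
        self,
        description: str,
        do: Callable[[], None],
        undo: Optional[Callable[[], None]],
    ) -> None:
        # No undo path means the action belongs in human review, not here.
        if undo is None:
            raise ValueError(f"refusing '{description}': no undo path defined")
        do()
        self._entries.append(LoggedChange(description, undo))

    def rollback(self) -> list[str]:
        """Undo every logged change, newest first, and report what was undone."""
        undone = []
        while self._entries:
            change = self._entries.pop()
            change.undo()
            undone.append(change.description)
        return undone


# Usage with a hypothetical ticket record.
ticket = {"status": "open"}
log = ChangeLog()
log.apply(
    "close ticket",
    do=lambda: ticket.update(status="closed"),
    undo=lambda: ticket.update(status="open"),
)
```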

4. Evaluate the trajectory, not only the outcome

For agents, a correct final answer can still be a product failure if the path was unsafe.

Evaluate each dimension of the run:

Tool calls: did the agent use only allowed tools?
Parameters: did it pass safe and correct arguments?
Sequence: did each step follow from the prior state?
Efficiency: did it avoid unnecessary retries and loops?
Boundary handling: did it stop or escalate when out of scope?
Final output: did the result satisfy the user goal?

Add trajectory failures to the eval plan and error analysis loop.
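Two of these dimensions, tool calls and efficiency, can be checked automatically from a logged trajectory. The sketch below is illustrative only: the `Step` record, the `trajectory_failures` function, and the definition of a retry as an identical consecutive call are assumptions. Parameters, sequence, boundary handling, and final output usually need task-specific checks or human grading.

```python
from dataclasses import dataclass


@dataclass
class Step:
    tool: str
    args: dict


def trajectory_failures(
    steps: list[Step], allowed_tools: set[str], max_retries: int = 2
) -> list[str]:
    """Flag disallowed tool calls and runs of identical consecutive calls."""
    failures = []
    retries = 0
    prev = None
    for i, step in enumerate(steps):
        if step.tool not in allowed_tools:
            failures.append(f"step {i}: disallowed tool {step.tool}")
        if prev is not None and step == prev:
            # Assumption: repeating the exact same call counts as a retry.
            retries += 1
            if retries > max_retries:
                failures.append(
                    f"step {i}: more than {max_retries} identical retries of {step.tool}"
                )
        else:
            retries = 0
        prev = step
    return failures
```

A run with an empty failure list has only passed these two checks. It can still fail on the final output or on a dimension this sketch does not cover.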

5. Set cost ceilings

Agent costs compound through tool calls, retries, and growing context. Define:

Maximum cost per task
Maximum number of tool calls
Maximum retries per tool
Timeout or stop condition
User-facing behavior when the ceiling is hit

If the agent reaches the ceiling, it should stop, summarize progress, and ask whether to continue or hand off.
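The ceilings above fit in one small budget object that the agent loop consults after every tool call. This is a minimal sketch with hypothetical names (`TaskBudget`, `record_call`, `ceiling_hit`). It assumes a ceiling counts as hit once the limit is reached, and it leaves out the timeout, which would need a clock.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskBudget:
    max_cost_usd: float
    max_tool_calls: int
    max_retries_per_tool: int
    cost_usd: float = 0.0
    tool_calls: int = 0
    retries: dict = field(default_factory=dict)

    def record_call(self, tool: str, cost_usd: float, is_retry: bool = False) -> None:
        self.cost_usd += cost_usd
        self.tool_calls += 1
        if is_retry:
            self.retries[tool] = self.retries.get(tool, 0) + 1

    def ceiling_hit(self) -> Optional[str]:
        """Return the reason to stop, or None if the task may continue."""
        if self.cost_usd >= self.max_cost_usd:
            return "cost ceiling reached"
        if self.tool_calls >= self.max_tool_calls:
            return "tool-call ceiling reached"
        for tool, count in self.retries.items():
            if count >= self.max_retries_per_tool:
                return f"retry ceiling reached for {tool}"
        return None
```

When `ceiling_hit` returns a reason, the loop should exit and surface that reason to the user along with a progress summary, rather than quietly raising the limit.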

6. Make handoff explicit

The agent needs a clear path when it cannot safely continue:

Hand off to a human reviewer
Ask the user for missing information
Produce a draft instead of taking action
Show uncertainty and stop
Fall back to the existing non-AI workflow

"Try again" is not an escalation policy.
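An explicit handoff policy can be as simple as a table from stop reason to handoff path, with no "retry" entry to fall back on. The reason strings and the mapping below are hypothetical; which path fits which reason is a product decision. Sending unrecognized reasons to a human reviewer is an assumption.

```python
from enum import Enum


class Handoff(Enum):
    HUMAN_REVIEW = "hand off to a human reviewer"
    ASK_USER = "ask the user for missing information"
    DRAFT_ONLY = "produce a draft instead of taking action"
    STOP_WITH_UNCERTAINTY = "show uncertainty and stop"
    NON_AI_FALLBACK = "fall back to the existing non-AI workflow"


# Hypothetical stop reasons reported by the agent runtime.
HANDOFF_POLICY = {
    "missing_information": Handoff.ASK_USER,
    "low_confidence": Handoff.STOP_WITH_UNCERTAINTY,
    "action_needs_approval": Handoff.HUMAN_REVIEW,
    "cannot_act_safely": Handoff.DRAFT_ONLY,
    "tool_unavailable": Handoff.NON_AI_FALLBACK,
}


def handoff_for(reason: str) -> Handoff:
    # Assumption: anything the policy does not recognize goes to a person.
    return HANDOFF_POLICY.get(reason, Handoff.HUMAN_REVIEW)
```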

Launch blockers for agents

Do not launch an agent beyond internal testing if:

Tool boundaries are not defined
Any high-impact action has no approval or rollback path
Trajectory evals do not exist
Cost ceilings are missing
There is no audit trail for tool calls and actions
The human handoff path is not staffed or tested
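The blocker list can be kept as a readiness check that a launch review fills in. The keys and the `open_blockers` function are hypothetical. A missing answer counts as a blocker here, on the assumption that an unanswered question should not pass the gate.

```python
# Keys are illustrative; messages restate the blockers listed above.
LAUNCH_BLOCKERS = {
    "tool_boundaries_defined": "Tool boundaries are not defined",
    "high_impact_actions_controlled": "A high-impact action has no approval or rollback path",
    "trajectory_evals_exist": "Trajectory evals do not exist",
    "cost_ceilings_set": "Cost ceilings are missing",
    "audit_trail_in_place": "There is no audit trail for tool calls and actions",
    "handoff_staffed_and_tested": "The human handoff path is not staffed or tested",
}


def open_blockers(readiness: dict[str, bool]) -> list[str]:
    """Return every blocker that is not explicitly marked as resolved."""
    return [
        message
        for key, message in LAUNCH_BLOCKERS.items()
        if not readiness.get(key, False)
    ]
```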

Use the Launch Gates guide when deciding whether the agent can move from prototype to pilot, limited production, or scale.

Agent PM starter pack