PMAI PM Playbook

Launch gate checklist

Core template

Use this checklist to decide whether the product can enter, or advance beyond, each release stage. It covers three gates: pilot entry, limited production entry, and scale-up entry. Do not skip gates.

Inputs: scores and evidence come from the AI PRD, eval plan, cost model, observability plan, and human review workflow. Output: a go/no-go decision with rationale, conditions, owner, and reversal trigger.
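
One way to keep the decision output consistent across gates is to record it as a small structured object. The following is a minimal Python sketch, not part of the template itself: the `GateDecision` class, its field names, and the outcome labels are hypothetical, chosen only to mirror the fields listed above (rationale, conditions, owner, reversal trigger).

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class GateDecision:
    """Hypothetical record of one gate decision; field names are assumptions."""
    gate: str                 # e.g. "pilot", "limited_production", "scale_up"
    outcome: str              # "go", "go_with_conditions", "hold", or "no_go"
    rationale: str
    owner: str
    reversal_trigger: str     # metric, date, dependency, or evidence threshold
    decided_on: date
    conditions: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return a list of missing pieces; an empty list means the record is complete."""
        issues = []
        if not self.rationale.strip():
            issues.append("rationale is empty")
        if not self.owner.strip():
            issues.append("owner is empty")
        if not self.reversal_trigger.strip():
            issues.append("reversal trigger is empty")
        if self.outcome == "go_with_conditions" and not self.conditions:
            issues.append("go_with_conditions requires at least one condition")
        return issues


# Illustrative values only.
decision = GateDecision(
    gate="pilot",
    outcome="go_with_conditions",
    rationale="Evals pass; cost per task is above target",
    owner="PM name",
    reversal_trigger="Cost per task > 2x budget",
    decided_on=date(2025, 1, 15),
)
print(decision.problems())
```

A record like this makes it harder to sign off a conditional go without actually listing the conditions.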


Gate 1: pilot entry

Pass/fail criteria

| Criterion | Target | Actual | Pass? |
| Eval accuracy on golden set | e.g., >= 90% | | |
| Failure behavior tested | All failure modes documented and handled | | |
| Human review workflow functional | Reviewers can approve/reject/edit | | |
| Latency | e.g., p95 < 10s | | |
| Cost per task | e.g., < $0.10 | | |
| Safety boundaries hold | Adversarial eval passes | | |
| Observability in place | Logs, metrics, alerts configured | | |
| Trace review completed | Prototype or pilot traces reviewed and failures labeled | | |
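
The numeric rows of this table can be checked mechanically once targets and actuals are filled in. The sketch below assumes a hypothetical `Criterion` class and `evaluate_gate` helper; the target values mirror the "e.g." placeholders in the table, and the actual values are made up for illustration. Qualitative rows (failure behavior, safety, observability) still need a human judgment.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    """Hypothetical representation of one numeric row in the pass/fail table."""
    name: str
    target: float
    actual: float
    # Comparison that must hold as compare(actual, target), e.g. operator.ge
    compare: Callable[[float, float], bool]

    def passed(self) -> bool:
        return self.compare(self.actual, self.target)


def evaluate_gate(criteria: list[Criterion]) -> tuple[bool, list[str]]:
    """Return (all_passed, names_of_failed_criteria)."""
    failed = [c.name for c in criteria if not c.passed()]
    return (not failed, failed)


# Illustrative actuals only; targets come from the "e.g." values in the table.
pilot_criteria = [
    Criterion("Eval accuracy on golden set", 0.90, 0.93, operator.ge),
    Criterion("Latency p95 (s)", 10.0, 7.4, operator.lt),
    Criterion("Cost per task ($)", 0.10, 0.12, operator.lt),
]

ok, failed = evaluate_gate(pilot_criteria)
print(ok, failed)  # prints: False ['Cost per task ($)']
```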

Staged rollout plan

  • Shadow mode: production requests duplicated to AI path, outputs logged but not shown to users
  • Canary: 1% of traffic, gated ramp (1% -> 5% -> 20% -> full), rollback criteria defined
  • Cohort-based: specific user segment or geography first
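
For the canary option, one common approach is to assign users to stable buckets by hashing their ID, so a user who sees the AI path at 1% keeps seeing it at 5% and 20%. The sketch below is an assumption about how this could be done, not a prescribed mechanism; `bucket`, `in_canary`, and the salt string are hypothetical names.

```python
import hashlib

# Ramp stages in percent of traffic, taken from the canary bullet above.
RAMP_STAGES = [1, 5, 20, 100]


def bucket(user_id: str, salt: str = "ai-path-rollout") -> int:
    """Map a user ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def in_canary(user_id: str, stage_percent: int) -> bool:
    """A user is in the canary when their bucket is below the stage percent."""
    return bucket(user_id) < stage_percent


# Usage: count how many of 1,000 hypothetical users fall into each stage.
users = [f"user-{i}" for i in range(1000)]
for stage in RAMP_STAGES:
    exposed = sum(in_canary(u, stage) for u in users)
    print(f"{stage}% stage: {exposed} of {len(users)} users exposed")
```

Because the bucket depends only on the salt and the user ID, each stage is a superset of the previous one, which keeps the user experience consistent during the ramp.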

Rollback trigger: e.g., quality score drops > 5%, cost per task > 2x budget, any safety incident
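
The example trigger above can be expressed as an automated check. This is a minimal sketch under stated assumptions: the quality drop is interpreted as a relative drop against a baseline score, and `Snapshot` and `rollback_reasons` are hypothetical names. Adjust the thresholds to whatever the rollback criteria actually specify.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical point-in-time reading of the monitored metrics."""
    quality_score: float     # same scale as the baseline quality score
    cost_per_task: float     # observed cost per task, in dollars
    safety_incidents: int    # incidents since the last check


def rollback_reasons(
    snapshot: Snapshot,
    baseline_quality: float,
    cost_budget: float,
    max_quality_drop: float = 0.05,   # "quality score drops > 5%" (assumed relative)
    cost_multiplier: float = 2.0,     # "cost per task > 2x budget"
) -> list[str]:
    """Return the triggers that fired; an empty list means no rollback is needed."""
    reasons = []
    relative_drop = (baseline_quality - snapshot.quality_score) / baseline_quality
    if relative_drop > max_quality_drop:
        reasons.append("quality")
    if snapshot.cost_per_task > cost_multiplier * cost_budget:
        reasons.append("cost")
    if snapshot.safety_incidents > 0:
        reasons.append("safety")
    return reasons


# Illustrative readings only.
print(rollback_reasons(Snapshot(0.84, 0.25, 0), baseline_quality=0.90, cost_budget=0.10))
# prints: ['quality', 'cost']
```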

Regulatory compliance

  • Risk classification and compliance path determined
  • Data provenance documented (training data sources, retrieval sources, retention policy)
  • Transparency requirements met (users informed they are interacting with AI)
  • System card or model card drafted (which models, prompts, tools, retrieval sources, human review points)
  • Vendor due diligence complete for third-party model providers

Required pass conditions

  • No unmitigated high-severity risks in risk register
  • No data leakage between users/tenants
  • Failure behavior does not expose raw model output to users
  • Trace review has happened for prototype or pilot behavior
  • Any agent, eval, prompt, tool, or workflow self-improvement requires human review before rollout

Risk and decision record

| Risk or blocker | Severity | Owner | Required mitigation | Due |

Options considered

| Option | Pros | Cons |
| Start pilot | | |
| Hold | | |
| Do not launch | | |

What would reverse this decision: Name a specific metric, date, dependency, or evidence threshold that would reopen the decision.

Decision

  • Start pilot
  • Advance with conditions: list conditions
  • Hold: what needs to change
  • Do not launch: reason

Decided by: name
Date: YYYY-MM-DD
Review date or trigger: YYYY-MM-DD or metric threshold


Gate 2: limited production entry

Pass/fail criteria

| Criterion | Target | Actual | Pass? |
| Eval accuracy on production sample | e.g., >= 92% | | |
| User task completion rate | e.g., >= 80% | | |
| Accept rate | e.g., >= 60% | | |
| Reject/escalation rate | e.g., < 15% | | |
| User-reported issues | e.g., < 5 per week | | |
| Cost per task (production) | e.g., < $0.08 | | |
| Latency (production) | e.g., p95 < 8s | | |
| No regression vs. pilot | Quality metrics stable or improving | | |
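
Several of these criteria are rates or percentiles derived from pilot logs. The sketch below shows one way to compute them, assuming a hypothetical log of reviewer outcomes labeled "accept", "edit", "reject", or "escalate" and a list of latencies in seconds. The percentile uses the nearest-rank definition; other definitions interpolate and can give slightly different values.

```python
from collections import Counter


def nearest_rank_percentile(values: list[float], pct: int) -> float:
    """Smallest value such that at least pct percent of samples are at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in 1..100")
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct/100 * n
    return ordered[rank - 1]


def review_rates(outcomes: list[str]) -> dict[str, float]:
    """Compute accept and reject/escalation rates from reviewer outcome labels."""
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    counts = Counter(outcomes)
    total = len(outcomes)
    return {
        "accept_rate": counts["accept"] / total,
        "reject_or_escalation_rate": (counts["reject"] + counts["escalate"]) / total,
    }


# Illustrative data only.
latencies = [float(s) for s in range(1, 21)]  # 1s .. 20s
outcomes = ["accept"] * 13 + ["edit"] * 4 + ["reject"] * 2 + ["escalate"]

print(nearest_rank_percentile(latencies, 95))
print(review_rates(outcomes))
```

Whether an edited output counts toward the accept rate is a definitional choice; this sketch counts only unedited accepts.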

Required pass conditions

  • No unresolved incidents from pilot
  • No systematic bias detected in output quality across user segments
  • Cost trajectory within budget at projected scale
  • Regulatory requirements from Gate 1 still met (no scope changes that alter risk classification)
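
As a first pass at the segment bias condition above, per-segment pass rates can be compared with the overall rate. The sketch below is an assumption about one simple screening approach, not a full fairness analysis: `lagging_segments` and the 5-point gap threshold are hypothetical, and the check ignores sample size, so small segments should be reviewed by hand.

```python
from collections import defaultdict


def segment_pass_rates(records: list[tuple[str, bool]]) -> dict[str, float]:
    """records holds (segment, passed) pairs, one per evaluated output."""
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for segment, passed in records:
        totals[segment] += 1
        passes[segment] += int(passed)
    return {segment: passes[segment] / totals[segment] for segment in totals}


def lagging_segments(records: list[tuple[str, bool]], max_gap: float = 0.05) -> list[str]:
    """Return segments whose pass rate trails the overall rate by more than max_gap."""
    if not records:
        raise ValueError("records must not be empty")
    overall = sum(int(passed) for _, passed in records) / len(records)
    rates = segment_pass_rates(records)
    return sorted(s for s, rate in rates.items() if overall - rate > max_gap)


# Illustrative data only: two hypothetical segments with different pass rates.
records = (
    [("enterprise", True)] * 90 + [("enterprise", False)] * 10
    + [("smb", True)] * 70 + [("smb", False)] * 30
)
print(lagging_segments(records))  # prints: ['smb']
```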

Risk and decision record

| Risk or blocker | Severity | Owner | Required mitigation | Due |

Options considered

| Option | Pros | Cons |
| Advance | | |
| Advance with conditions | | |
| Hold | | |

What would reverse this decision: Name a specific metric, date, dependency, or evidence threshold that would reopen the decision.

Decision

  • Advance to limited production
  • Advance with conditions: list conditions
  • Hold: what needs to change
  • Do not launch: reason

Decided by: name
Date: YYYY-MM-DD
Review date or trigger: YYYY-MM-DD or metric threshold


Gate 3: scale-up entry

Pass/fail criteria

| Criterion | Target | Actual | Pass? |
| Quality metrics stable for >= 2 weeks | specific metrics | | |
| Cost per customer within margin target | e.g., < $X/customer/month | | |
| Support ticket volume | e.g., < baseline + 10% | | |
| Rollback plan tested | Can disable AI path in < 15 min | | |
| Monitoring and alerting validated | Alerts fire correctly on synthetic failures | | |
| Trace-to-eval loop running | Production failures feed back into eval set | | |
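
The trace-to-eval loop in the last row can be as simple as copying labeled production failures into the eval set, deduplicated by input. The sketch below assumes a hypothetical trace format with `trace_id`, `input`, `label`, and an optional reviewer-supplied `corrected_output`; none of these field names come from the template.

```python
import hashlib


def case_id(input_text: str) -> str:
    """Stable identifier for an eval case, derived from the normalized input."""
    normalized = " ".join(input_text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def merge_failures_into_eval_set(eval_set: dict[str, dict], traces: list[dict]) -> int:
    """Add traces labeled as failures to eval_set in place; return how many were added."""
    added = 0
    for trace in traces:
        if trace.get("label") != "failure":
            continue
        cid = case_id(trace["input"])
        if cid in eval_set:
            continue  # this input is already covered by an existing eval case
        eval_set[cid] = {
            "input": trace["input"],
            "expected": trace.get("corrected_output"),  # may be None until reviewed
            "source_trace": trace["trace_id"],
        }
        added += 1
    return added


# Illustrative traces only.
traces = [
    {"trace_id": "t1", "input": "What is the refund policy?", "label": "failure",
     "corrected_output": "Refunds are available within 30 days."},
    {"trace_id": "t2", "input": "Reset my password", "label": "ok"},
    {"trace_id": "t3", "input": "what is the  refund policy?", "label": "failure"},
]
eval_set: dict[str, dict] = {}
print(merge_failures_into_eval_set(eval_set, traces))  # prints: 1
```

Cases added without an expected output still need a reviewer to supply one before they are useful as regression tests.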

Required pass conditions

  • No open high-severity incidents
  • Rollback plan tested and documented
  • On-call runbook reviewed by ops team
  • Incident response process tested (at least one simulated incident)
  • Near-miss capture process in place (not just incidents, but close calls)

Risk and decision record

| Risk or blocker | Severity | Owner | Required mitigation | Due |

Options considered

| Option | Pros | Cons |
| Scale | | |
| Hold expansion | | |
| Roll back | | |

What would reverse this decision: Name a specific metric, date, dependency, or evidence threshold that would reopen the decision.

Decision

  • Ship to all users
  • Advance with conditions: list conditions
  • Hold: what needs to change
  • Roll back: reason

Decided by: name
Date: YYYY-MM-DD
Review date or trigger: YYYY-MM-DD or metric threshold
