PMAI PM Playbook

Bad to good AI PRD

Guide 06

Most weak AI PRDs do not fail because they are short. They fail because the AI job is vague, quality is undefined, and failure behavior is missing.

Bad PRD

Build an AI assistant that helps support agents answer customer questions faster.

Why this is weak

It says the product should be "AI" and "faster," but not what the AI actually does, what data it can use, what output is acceptable, who reviews it, or what happens when it is wrong.

That gives engineering permission to guess. The first demo may look good, but no one can tell whether the product is safe, measurable, or ready to launch.

Better PRD Slice

Problem

Tier 1 support agents spend 3.1 minutes per supported ticket drafting responses from approved KB articles. The top 10 intents account for 31% of ticket volume and have stable policy answers.

AI Job

The AI drafts support responses for the top 10 Tier 1 support intents using the customer ticket, recent conversation history, account metadata, and approved KB articles. It produces a draft response, cited KB article IDs, an intent label, and a confidence score for the handling agent to review before sending.

Non-Goals

  • No auto-send.
  • No Tier 2 or Tier 3 escalations.
  • No unsupported intents.
  • No new KB article authoring.
  • No policy claims that are not grounded in current KB articles.

Input Contract

InputRequiredFallback
Latest customer messageYesDo not draft
Recent conversation historyNoDraft from latest message only
Account metadataNoAvoid account-specific claims
Current KB articlesYesShow KB search results, no draft

Output Contract

OutputRequirement
Draft responseGrounded in cited KB articles
CitationsArticle IDs must exist and be current
Confidence score0-1, calibrated against human review
Fallback flagTrue when no safe draft should be shown

Human Review

Every draft must be reviewed by the handling agent before sending. If confidence is below 0.6, suppress the draft and show the top 3 KB articles instead.

Quality Bar

  • Zero fabricated policy claims.
  • 75% accept rate during pilot.
  • Reject rate below 8%.
  • Safety-trigger tickets produce escalation, not drafts.
  • p95 draft latency below 4 seconds.
  • Cost below $0.03 per draft.

Launch Gate

Pilot can start only when

  • 100-example eval set is labeled.
  • Low-confidence fallback is implemented.
  • Accept/edit/reject tracking is live.
  • Safety escalation tests pass.
  • Legal and security approve disclosure, retention, and tenant data handling.

What Changed

Bad PRDBetter PRD
"AI assistant"Specific drafting task
"Helps agents"Named user and workflow
"Faster"Accept rate, reject rate, latency, cost
No boundariesTop 10 intents only
No source of truthApproved KB articles
No review ruleAgent reviews before sending
No failure behaviorSuppress draft below confidence threshold
No launch criteriaPilot blockers and eval requirements

The better PRD is still short. The difference is that it gives the team a behavioral contract they can build, test, review, and gate.

Link copied