AI PM Playbook

Problem

Sales reps spend an average of 45 min/day on post-call CRM updates, yet 30% of key CRM fields are left empty or outdated. This incomplete data directly degrades pipeline forecasting accuracy.

Evidence

A time study across 23 reps over 2 weeks showed an average of 45 min/day spent on CRM entry.
A CRM audit of 1,200 opportunities showed a 30% empty rate on key fields.
Sales ops reports forecast accuracy of 62%, below the 80% target.

Goals

Reduce the time reps spend updating the five target fields by at least 50% during the pilot.
Measure the impact on total post-call CRM time during the pilot; do not claim a reduction of the full 45 min/day until a broader set of fields is in scope.
Increase CRM field completion rate from 70% to 95% on the five target fields.
Zero fabricated evidence in production (no invented quotes or facts).
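A minimal sketch of how the completion-rate goal could be measured, in Python. It assumes each opportunity is available as a dict keyed by field name and that None, blank strings, and empty lists count as missing. The keys in TARGET_FIELDS reuse the output contract names and are hypothetical stand-ins for the actual Salesforce field names, which this PRD does not specify.

```python
from typing import Any, Iterable, Mapping

# Hypothetical keys for the five target fields; the real Salesforce
# field API names are not specified in this PRD.
TARGET_FIELDS = (
    "next_steps",
    "objections",
    "budget_discussed",
    "timeline_mentioned",
    "competitor_mentioned",
)


def is_filled(value: Any) -> bool:
    """Treat None, blank strings, and empty collections as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    if isinstance(value, (list, tuple, dict, set)):
        return len(value) > 0
    return True


def completion_rate(records: Iterable[Mapping[str, Any]]) -> float:
    """Share of (opportunity, target field) slots that hold a value."""
    filled = 0
    total = 0
    for record in records:
        for name in TARGET_FIELDS:
            total += 1
            if is_filled(record.get(name)):
                filled += 1
    return filled / total if total else 0.0
```

Running it on a snapshot taken before the pilot and again at the end gives the before/after numbers to compare against the 70% baseline and the 95% target.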

Non-goals

Changing deal stage automatically. Too high-risk for v1.
Updating deal amount or close date without explicit rep confirmation through a dedicated approval flow.
Replacing rep judgment on deal health or qualification.
Processing calls without verified recording consent.
Summarizing calls for management review. This is a CRM update tool, not a surveillance tool.

Target users

Account Executives and SDRs who use Salesforce and have their calls recorded by Gong or Chorus. The pilot starts with the Enterprise AE team (38 reps), which has the highest call volume and the worst CRM hygiene.

Current workflow

Rep joins a sales call. Gong/Chorus records.
Rep takes notes during the call (inconsistent quality).
After the call, rep opens Salesforce opportunity record.
Rep manually updates fields from memory and notes.
Rep skips fields they cannot remember or do not prioritize.
Manager reviews pipeline with incomplete data.

Proposed workflow

Rep joins a sales call. Gong/Chorus records and transcribes.
After the call, the system processes the transcript against the existing CRM record.
Rep receives a notification with proposed CRM updates.
Rep reviews each proposed update alongside the transcript evidence.
Rep accepts, edits, or rejects each field update.
Accepted updates are written to Salesforce.
Rejected or edited updates are logged for eval improvement.
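The review loop above can be sketched as follows. ProposedUpdate, RepDecision, and apply_review are hypothetical names for illustration, and the returned dict stands in for the Salesforce write, which is not shown. The key property is that a proposal without an explicit rep decision is never written, matching the suggest-only autonomy level.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class ProposedUpdate:
    field: str
    value: Any
    evidence: str      # transcript quote supporting the value
    confidence: float


@dataclass
class RepDecision:
    action: str                # "accept", "edit", or "reject"
    edited_value: Any = None   # only used when action == "edit"


def apply_review(
    proposals: List[ProposedUpdate],
    decisions: Dict[str, RepDecision],
) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Return (fields to write to Salesforce, eval log entries)."""
    to_write: Dict[str, Any] = {}
    eval_log: List[Dict[str, Any]] = []
    for p in proposals:
        decision: Optional[RepDecision] = decisions.get(p.field)
        if decision is None:
            continue  # suggest-only: no explicit decision, no write
        if decision.action == "accept":
            to_write[p.field] = p.value
        elif decision.action == "edit":
            # Edited values are written, and the diff is logged for evals.
            to_write[p.field] = decision.edited_value
            eval_log.append({"field": p.field, "action": "edit",
                             "proposed": p.value, "final": decision.edited_value})
        elif decision.action == "reject":
            eval_log.append({"field": p.field, "action": "reject",
                             "proposed": p.value})
        else:
            raise ValueError(f"unknown action: {decision.action}")
    return to_write, eval_log
```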

AI job statement

The AI extracts structured CRM fields from sales call transcripts and presents them for rep approval before writing to Salesforce. It uses the call transcript, existing CRM record, and field definitions to produce proposed field updates with transcript evidence and per-field confidence scores.

Input contract

Input	Format	Required	Max size	Fallback if missing
Call transcript	Text (from Gong/Chorus API)	Yes	50,000 tokens	Do not process. Surface "no transcript available."
Existing CRM record	JSON (from Salesforce API)	Yes	5 KB	Process transcript only, flag that existing record was unavailable.
Field definitions	JSON config	Yes	2 KB	Use default field definitions.
Recording consent flag	Boolean	Yes	N/A	Do not process. Surface "recording consent not verified."
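A minimal sketch of the missing-input fallbacks in the table above. InputCheck, check_inputs, the flag strings, and DEFAULT_FIELD_DEFINITIONS are hypothetical. Consent is checked first so no transcript is touched without it. The table does not specify behavior for inputs over the max size, so the sketch does not handle that case.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Hypothetical placeholder for the default field definitions config.
DEFAULT_FIELD_DEFINITIONS: Dict[str, Any] = {"fields": []}


@dataclass
class InputCheck:
    proceed: bool
    message: Optional[str] = None
    flags: List[str] = field(default_factory=list)
    field_definitions: Optional[Dict[str, Any]] = None


def check_inputs(
    transcript: Optional[str],
    crm_record: Optional[Dict[str, Any]],
    field_definitions: Optional[Dict[str, Any]],
    consent_verified: Optional[bool],
) -> InputCheck:
    # A missing or False consent flag both block processing.
    if consent_verified is not True:
        return InputCheck(False, "Recording consent not verified.")
    if not transcript:
        return InputCheck(False, "No transcript available.")
    result = InputCheck(True, field_definitions=field_definitions)
    if crm_record is None:
        # Process the transcript only, and tell the rep the record was unavailable.
        result.flags.append("existing_record_unavailable")
    if field_definitions is None:
        result.field_definitions = DEFAULT_FIELD_DEFINITIONS
        result.flags.append("default_field_definitions")
    return result
```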

Output contract

Output field	Type	Always present	Example
next_steps	Array of {action, owner, deadline}	Yes (may be empty array)	[{action: "Send pricing proposal", owner: "rep", deadline: "Friday"}]
objections	Array of {objection, severity}	Yes (may be empty array)	[{objection: "Concerned about migration timeline", severity: "medium"}]
budget_discussed	{mentioned: bool, range: string, confidence: float, evidence: string}	Yes	{mentioned: true, range: "$50-75K", confidence: 0.85, evidence: "we're looking at the fifty to seventy-five thousand range"}
timeline_mentioned	{mentioned: bool, details: string, confidence: float, evidence: string}	Yes	{mentioned: true, details: "Decision by end of Q1", confidence: 0.92, evidence: "we need to have this decided before March"}
competitor_mentioned	Array of {name, context, evidence}	Yes (may be empty array)	[{name: "Acme Corp", context: "Currently evaluating", evidence: "we're also looking at Acme"}]
transcript_quality	{score: float, issues: array}	Yes	{score: 0.72, issues: ["cross-talk detected in minutes 12-15"]}
overall_confidence	Float 0-1	Yes	0.87

Every extracted field must include an evidence property containing the exact transcript quote. If the model cannot point to a specific quote, the field must be marked as inferred with confidence below 0.5.
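The evidence rule can be checked mechanically. This sketch assumes each extracted field is a dict with evidence, confidence, and a boolean inferred property; the name inferred is hypothetical, since the contract does not name it. It uses a strict substring test to match "verbatim"; any whitespace or case normalization would be a separate design decision.

```python
from typing import Any, Dict, List

INFERRED_CONFIDENCE_CAP = 0.5


def evidence_is_verbatim(evidence: str, transcript: str) -> bool:
    """True only if the quote appears in the transcript character for character."""
    return bool(evidence) and evidence in transcript


def check_field(extracted: Dict[str, Any], transcript: str) -> List[str]:
    """Return contract violations for one extracted field (empty list = OK)."""
    problems: List[str] = []
    evidence = extracted.get("evidence") or ""
    inferred = extracted.get("inferred", False)  # hypothetical property name
    confidence = extracted.get("confidence", 0.0)
    if evidence:
        if not evidence_is_verbatim(evidence, transcript):
            problems.append("fabricated_evidence")
    elif not inferred:
        problems.append("missing_evidence_not_marked_inferred")
    if inferred and confidence >= INFERRED_CONFIDENCE_CAP:
        problems.append("inferred_confidence_too_high")
    return problems
```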

Autonomy level

Suggest: AI recommends an action, human accepts or rejects

All five target fields use suggest-only autonomy. The rep must explicitly approve each field update before it writes to Salesforce.

High-risk fields that are out of scope for v1 but relevant for future versions:

Deal amount changes: would require a dedicated confirmation UI with the old value, new value, and evidence side-by-side.
Stage changes: would require explicit approval with an "are you sure?" confirmation step.
Close date changes: same as deal amount.

These are not in v1 because the trust relationship with reps has not been established yet.

Human review rules

Every proposed CRM update must be reviewed by the rep before writing to Salesforce.
If overall confidence is below 0.6, surface a warning: "Low confidence extraction. Please review carefully."
If the transcript quality score is at least 0.3 but below 0.5, surface: "Transcript quality too low for reliable extraction. Manual review recommended." Below 0.3, extraction is skipped entirely (see Failure behavior).
If any individual field confidence is below 0.4, visually distinguish it from high-confidence fields.
Reps can edit any extracted value before approving.
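The thresholds above can be expressed directly. The function names are hypothetical; the warning strings are copied from the rules. None of these checks approve anything: they only decide which warnings and highlights the rep sees before reviewing.

```python
from typing import Dict, List

LOW_OVERALL_CONFIDENCE = 0.6
LOW_TRANSCRIPT_QUALITY = 0.5
LOW_FIELD_CONFIDENCE = 0.4


def review_warnings(overall_confidence: float, transcript_quality: float) -> List[str]:
    """Banner warnings shown above the proposed updates."""
    warnings: List[str] = []
    if overall_confidence < LOW_OVERALL_CONFIDENCE:
        warnings.append("Low confidence extraction. Please review carefully.")
    # Scores below 0.3 never reach review (extraction is skipped per the
    # failure rules), so in practice this fires for scores in [0.3, 0.5).
    if transcript_quality < LOW_TRANSCRIPT_QUALITY:
        warnings.append("Transcript quality too low for reliable extraction. "
                        "Manual review recommended.")
    return warnings


def fields_to_highlight(field_confidences: Dict[str, float]) -> List[str]:
    """Fields the UI should visually distinguish as low confidence."""
    return sorted(name for name, c in field_confidences.items() if c < LOW_FIELD_CONFIDENCE)
```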

Quality bar

85% field accuracy on the golden eval set (n=200). Field accuracy means the extracted value is correct or an acceptable variation of the correct value.
Zero fabricated evidence. Every quote in the evidence field must exist verbatim in the transcript.
False positive rate below 5%. For example, if the model reports a competitor mention when no competitor was mentioned, that is a false positive.
Transcript quality scoring must correctly flag poor-quality transcripts at least 90% of the time.
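A sketch of how the quality bar could be scored over labeled eval results. The record types are hypothetical, and two denominators are assumptions the PRD leaves open: the false positive rate is computed over items the labeler marked as not mentioned, and poor-transcript flagging is scored as recall over transcripts a human rated poor.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FieldResult:
    correct: bool            # labeler: correct or an acceptable variation
    predicted_mention: bool  # model reported the item (e.g. a competitor)
    gold_mention: bool       # labeler: the item was actually mentioned
    evidence_verbatim: bool  # every evidence quote occurs in the transcript


@dataclass
class TranscriptResult:
    flagged_poor: bool  # model's quality score marked it poor
    gold_poor: bool     # human rater judged it poor


def quality_bar(fields: List[FieldResult], transcripts: List[TranscriptResult]) -> Dict[str, object]:
    if not fields:
        raise ValueError("eval set is empty")
    accuracy = sum(f.correct for f in fields) / len(fields)
    fabricated = sum(not f.evidence_verbatim for f in fields)
    # False positive rate over items the labeler marked as NOT mentioned.
    negatives = [f for f in fields if not f.gold_mention]
    fp_rate = sum(f.predicted_mention for f in negatives) / len(negatives) if negatives else 0.0
    # Share of human-rated poor transcripts that the model flagged (recall).
    poor = [t for t in transcripts if t.gold_poor]
    poor_recall = sum(t.flagged_poor for t in poor) / len(poor) if poor else 1.0
    return {
        "field_accuracy": accuracy,
        "fabricated_evidence": fabricated,
        "false_positive_rate": fp_rate,
        "poor_transcript_recall": poor_recall,
        "passes": accuracy >= 0.85 and fabricated == 0
        and fp_rate < 0.05 and poor_recall >= 0.90,
    }
```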

Latency target

p50 under 15 seconds for full extraction (all five fields).
p95 under 30 seconds.
Reps do not wait synchronously: the system processes the call after it ends and notifies the rep when proposed updates are ready. Latency affects how fresh the updates are, not how long the rep is blocked.
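Checking the latency targets needs a percentile definition, which the PRD does not fix. This sketch uses the nearest-rank method as an assumption; interpolating methods can give slightly different values on small samples.

```python
import math
from typing import Sequence

P50_TARGET_S = 15.0
P95_TARGET_S = 30.0


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_targets(latencies_s: Sequence[float]) -> bool:
    """True if p50 < 15 s and p95 < 30 s for full extraction."""
    return (percentile(latencies_s, 50) < P50_TARGET_S
            and percentile(latencies_s, 95) < P95_TARGET_S)
```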

Cost constraint

Target: under $0.05 per call processed. Prototype testing shows an actual cost of $0.03 per call.
At 38 reps averaging 4 calls/day and 22 working days/month (3,344 calls/month): $0.03 x 38 x 4 x 22 ≈ $100/month at current cost, and $0.05 x 38 x 4 x 22 ≈ $167/month at the cap.
At scale (200 reps, 17,600 calls/month): $528/month at current cost and $880/month at the cap. Both figures are well within budget for the expected CRM accuracy improvement.
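The cost arithmetic, reproduced as a quick check. The constants are the assumptions stated above (4 calls/day, 22 working days/month); only the rep count and per-call cost vary.

```python
WORKING_DAYS_PER_MONTH = 22
CALLS_PER_REP_PER_DAY = 4


def monthly_cost(reps: int, cost_per_call: float) -> float:
    """Monthly model cost for a team of the given size."""
    calls = reps * CALLS_PER_REP_PER_DAY * WORKING_DAYS_PER_MONTH
    return calls * cost_per_call


for reps in (38, 200):
    for label, cost in (("current", 0.03), ("cap", 0.05)):
        print(f"{reps} reps, {label}: ${monthly_cost(reps, cost):,.2f}/month")
```

Unrounded, the 38-rep figure at current cost is $100.32/month; the other three are $167.20, $528.00, and $880.00.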

Failure behavior

On timeout: retry once. If the second attempt fails, notify the rep that extraction is unavailable and that they should update the CRM manually.
On low confidence: present results with clear visual warning. Do not suppress low-confidence results; let the rep decide.
On malformed output: log the error, skip the extraction, notify the rep. Do not write partial or malformed data to Salesforce.
On safety trigger: if the transcript contains content flagged by safety filters, skip processing and log the event. Do not surface the flagged content.
On missing consent: do not process. Surface "Recording consent not verified for this call."
On poor transcript quality (score below 0.3): surface "Transcript quality too low for reliable extraction" and skip automated extraction.
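A sketch of the failure rules as one control flow. The extract, is_well_formed, notify, and log callables and the ExtractionTimeout exception are hypothetical hooks; the notification copy for timeouts and malformed output is illustrative, since the PRD does not fix it. The quality score is read from the transcript_quality output field. Low-confidence results pass through unchanged so the review step can warn rather than suppress.

```python
from typing import Any, Callable, Dict, Optional

POOR_QUALITY_SKIP = 0.3


class ExtractionTimeout(Exception):
    """Hypothetical exception for a model call that exceeded its time budget."""


def process_call(
    consent_verified: bool,
    safety_flagged: bool,
    extract: Callable[[], Dict[str, Any]],
    is_well_formed: Callable[[Dict[str, Any]], bool],
    notify: Callable[[str], None],
    log: Callable[[str], None],
) -> Optional[Dict[str, Any]]:
    """Return output to show the rep, or None if nothing should be shown."""
    if not consent_verified:
        notify("Recording consent not verified for this call.")
        return None
    if safety_flagged:
        # Log the event only; the flagged content itself is never surfaced.
        log("safety filter triggered; processing skipped")
        return None
    output: Optional[Dict[str, Any]] = None
    for attempt in (1, 2):  # first attempt plus exactly one retry
        try:
            output = extract()
            break
        except ExtractionTimeout:
            log(f"extraction timed out (attempt {attempt})")
    if output is None:
        notify("Extraction is unavailable for this call. Please update the CRM manually.")
        return None
    if not is_well_formed(output):
        # Never pass partial or malformed data toward Salesforce.
        log("malformed output; extraction skipped")
        notify("Extraction failed for this call. Please update the CRM manually.")
        return None
    if output["transcript_quality"]["score"] < POOR_QUALITY_SKIP:
        notify("Transcript quality too low for reliable extraction")
        return None
    return output  # low overall confidence is not suppressed here
```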

Privacy and data handling

Recording consent must be verified before any transcript is processed. The system checks for the consent flag from Gong/Chorus.
Transcripts are processed but not stored permanently. Extracted fields and evidence quotes are stored; full transcripts are not retained by our system.
Extracted data inherits the access controls of the Salesforce opportunity record.
No transcript data is used for model training without explicit opt-in.

Observability requirements

Log every extraction: input size, output fields, confidence scores, latency, cost.
Log every rep action: accept, edit, reject per field.
Track acceptance rate by field, by rep, and by confidence level.
Track edit rate (accepted but modified) as a signal of partial accuracy.
Alert if acceptance rate drops below 70% over a rolling 7-day window.
Alert if fabricated evidence is detected in any production extraction.
Weekly report: field accuracy by type, rep adoption rate, time saved estimate.
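A sketch of the rolling 7-day acceptance alert. AcceptanceMonitor is a hypothetical name. It assumes events arrive in date order, counts an edit as an acceptance (following the "accepted but modified" definition of edit rate), and treats a window with no events as not alerting.

```python
from collections import deque
from datetime import date, timedelta
from typing import Deque, Tuple

ALERT_THRESHOLD = 0.70
WINDOW = timedelta(days=7)


class AcceptanceMonitor:
    """Tracks per-field rep actions and checks a rolling 7-day acceptance rate."""

    def __init__(self) -> None:
        self._events: Deque[Tuple[date, bool]] = deque()

    def record(self, day: date, action: str) -> None:
        # "accept" and "edit" both count as accepted; "reject" does not.
        self._events.append((day, action in ("accept", "edit")))

    def acceptance_rate(self, today: date) -> float:
        # Keep only events from the last 7 days, today included.
        cutoff = today - WINDOW
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()
        if not self._events:
            return 1.0
        return sum(accepted for _, accepted in self._events) / len(self._events)

    def should_alert(self, today: date) -> bool:
        return self.acceptance_rate(today) < ALERT_THRESHOLD
```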

Launch gates

Golden eval set (n=200) passes at 85% field accuracy.
Zero fabricated evidence on eval set.
Consent verification flow is implemented and tested.
Transcript quality scoring is calibrated against 50 manually rated transcripts.
Rep training materials are complete and training is scheduled.
Observability dashboards are live and alerting is configured.
Rollback plan is documented: ability to disable extraction per rep or globally within 5 minutes.

Open questions

How do we handle multi-party calls where the rep is not the primary speaker? Do we still extract, or does this degrade quality enough to skip?
Should we support languages other than English in v1? Three of the 38 reps conduct calls in Spanish.
What is the right UX for "partially confident" results? Do we show them inline with a warning, or in a separate "review needed" section?
How do we handle calls that span multiple opportunities? The rep may discuss several deals in one call.
