A/B Test Decision Framework
Make rigorous ship/no-ship decisions based on experiment results.
Use Case
Making ship/no-ship decisions after A/B tests, presenting experiment results to stakeholders, or establishing decision criteria for your team.
Prompt
You are an experimentation expert helping me make a decision based on A/B test results. I need to determine whether to ship, iterate, or abandon this change.
Test Results:
- Feature tested: [What was tested]
- Test duration: [How long it ran]
- Sample size: [Total users in test]
- Primary metric: [What you measured]
- Primary result: [e.g., +5.2% conversion, p=0.03]
- Confidence interval: [e.g., +2.1% to +8.3%]
- Secondary metrics: [List results for each]
- Guardrail metrics: [Any negative impacts?]
- Segment results: [Any notable differences by segment?]
Please help me analyze and decide:
1. Statistical Validity Check
□ Did test reach planned sample size?
□ Was test run for planned duration?
□ Sample ratio as expected (no SRM)?
□ Results stable over time (no novelty effect)?
□ No major external events during test?
Validity assessment: [Valid / Concerns / Invalid]
Concerns to note: [Any issues]
2. Statistical Significance Analysis
- p-value: [From results]
- Significance threshold: [Usually 0.05]
- Statistically significant: [Yes/No]
- Confidence interval: [From results]
- Does CI exclude zero: [Yes/No]
Interpretation: [What this means]
3. Practical Significance Analysis
- Effect size: [e.g., +5.2%]
- Minimum effect that matters: [Smallest lift worth shipping; often the MDE used in your power analysis]
- Is effect practically meaningful: [Yes/No]
- Annualized impact: [e.g., $X revenue, Y users affected]
Interpretation: [Is this worth shipping?]
4. Secondary Metrics Review
| Metric | Result | Direction | Concerning? |
|--------|--------|-----------|-------------|
| [Metric 1] | [Result] | [Up/Down/Flat] | [Yes/No] |
Trade-offs identified: [Any concerning patterns?]
5. Guardrail Metrics Review
| Metric | Result | Threshold | Status |
|--------|--------|-----------|--------|
| [Metric 1] | [Result] | [Threshold] | [Pass/Fail] |
Any guardrails breached: [Yes/No]
6. Segment Analysis
Notable segment differences:
- [Segment]: [How results differ]
- Implications: [What this means for decision]
7. Qualitative Context
- User feedback during test: [Any signals?]
- Support tickets: [Any increases?]
- Engineering concerns: [Technical debt, maintenance?]
- Business context: [Strategic alignment?]
8. Decision Framework
SHIP if:
□ Statistically significant positive result
□ Practically meaningful effect size
□ No guardrail violations
□ No concerning secondary effects
□ Valid test (no issues)
ITERATE if:
□ Directionally positive but not significant
□ Segment shows promise for specific users
□ Secondary metrics suggest refinement needed
□ Learning suggests better approach
ABANDON if:
□ Negative or null result
□ Guardrails violated
□ Effect size too small to matter
□ Cost/complexity not worth small gain
9. Recommendation
Decision: [Ship / Iterate / Abandon]
Rationale:
- Primary evidence: [Key data point]
- Supporting evidence: [Secondary evidence]
- Risks: [What could go wrong]
- Mitigations: [How to address risks]
If shipping:
- Rollout plan: [Gradual or full?]
- Monitoring: [What to watch]
If iterating:
- What to change: [Specific improvements]
- Next test hypothesis: [What to test]
If abandoning:
- Key learnings: [What we learned]
- Documentation: [Archive the results]
How to use
1. Wait until your test has reached full sample size
2. Gather all metrics results including guardrails and segments
3. Replace placeholders with your actual test results
4. Work through each section systematically
5. Be honest about concerns or ambiguity
6. Make a clear recommendation with rationale
7. Document the decision for future reference
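The Ship / Iterate / Abandon checklist in section 8 of the prompt can be sketched as a small function. This is only one possible reading of the checklist: the field names are hypothetical, and the order of the checks (validity first, then guardrails) and the choice to rerun an invalid test are assumptions the framework itself does not state.

```python
from dataclasses import dataclass


@dataclass
class ExperimentSummary:
    # Hypothetical field names for the inputs the prompt asks for.
    test_is_valid: bool           # section 1: sample size, duration, SRM, stability
    p_value: float                # section 2
    effect: float                 # observed lift, e.g. 0.052 for +5.2%
    min_meaningful_effect: float  # section 3: smallest lift worth shipping
    secondary_concerns: bool      # section 4
    guardrail_breached: bool      # section 5


def recommend(s: ExperimentSummary, alpha: float = 0.05) -> str:
    """Map an experiment summary to 'Ship', 'Iterate', or 'Abandon'."""
    significant = s.p_value < alpha
    meaningful = s.effect >= s.min_meaningful_effect

    if not s.test_is_valid:
        # Assumption: an invalid test is fixed and rerun rather than judged.
        return "Iterate"
    if s.guardrail_breached or s.effect <= 0:
        return "Abandon"
    if significant and meaningful and not s.secondary_concerns:
        return "Ship"
    if significant and not meaningful:
        # A real effect, but too small to justify the cost.
        return "Abandon"
    # Directionally positive but not significant, or secondary metrics need work.
    return "Iterate"
```

A function like this is a starting point for discussion, not a replacement for the qualitative context in section 7 of the prompt.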
Pro Tips
- Statistical significance ≠ practical significance - check both
- Always check guardrail metrics before shipping
- Look for novelty effects - are early results inflated?
- Segment analysis can reveal hidden problems or opportunities
- Document your decision rationale even for abandoned tests
- If results are ambiguous, consider running longer or iterating
- Share learnings broadly regardless of outcome
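To make the first tip concrete, here is a minimal sketch that uses only the Python standard library. It assumes a conversion-rate metric and a planned 50/50 split. The function names, the Wald confidence interval, and every number in the example are illustrative assumptions, not part of the framework. It covers the sample ratio mismatch (SRM) check from section 1 of the prompt and the significance check from section 2, then compares the confidence interval against a minimum meaningful lift.

```python
import math


def srm_p_value(n_control: int, n_treatment: int, expected_ratio: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for SRM."""
    total = n_control + n_treatment
    exp_control = total * expected_ratio
    exp_treatment = total - exp_control
    chi2 = ((n_control - exp_control) ** 2 / exp_control
            + (n_treatment - exp_treatment) ** 2 / exp_treatment)
    # With 1 degree of freedom, the chi-square tail probability is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def two_proportion_test(conv_c: int, n_c: int, conv_t: int, n_t: int,
                        z_crit: float = 1.96):
    """Two-sided z-test on conversion rates, plus a 95% Wald CI for the absolute difference."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    diff = p_t - p_c
    pooled = (conv_c + conv_t) / (n_c + n_t)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    p_value = math.erfc(abs(diff / se_pooled) / math.sqrt(2))  # two-sided
    se_unpooled = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
    return diff, p_value, ci


# Made-up numbers for illustration only.
diff, p_value, ci = two_proportion_test(conv_c=1000, n_c=20000, conv_t=1100, n_t=20000)
min_meaningful_lift = 0.002  # hypothetical: smallest absolute lift worth shipping

statistically_significant = p_value < 0.05
# Conservative practical check: the whole interval must clear the bar.
practically_meaningful = ci[0] >= min_meaningful_lift
```

With these made-up inputs the result is statistically significant, yet the lower end of the interval falls below the chosen minimum lift, which is exactly the situation where both checks need to be reported. Requiring the whole interval to clear the bar is a deliberately strict choice; a team could reasonably compare the point estimate instead.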
Tags
ab-testing, decision-making, experimentation, product-analytics, data-driven