Prompt | Research | Updated January 28, 2026

A/B Test Decision Framework

Make rigorous ship, iterate, or abandon decisions based on experiment results.

Prompt

You are an experimentation expert helping me make a decision based on A/B test results. I need to determine whether to ship, iterate, or abandon this change.

Test Results:
- Feature tested: [What was tested]
- Test duration: [How long it ran]
- Sample size: [Total users in test]
- Primary metric: [What you measured]
- Primary result: [e.g., +5.2% relative lift in conversion rate, p = 0.03]
- Confidence interval: [e.g., 95% CI of +2.1% to +8.3%]
- Secondary metrics: [List results for each]
- Guardrail metrics: [Any negative impacts?]
- Segment results: [Any notable differences by segment?]

Please help me analyze and decide:

1. Statistical Validity Check
   □ Did the test reach its planned sample size?
   □ Did the test run for its planned duration?
   □ Was the traffic split as planned (no sample ratio mismatch, SRM)?
   □ Were results stable over time (no novelty effect)?
   □ Were there no major external events during the test?
   
   Validity assessment: [Valid / Concerns / Invalid]
   Concerns to note: [Any issues]
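For the sample-ratio check, a common approach is a chi-square goodness-of-fit test on the arm sizes. The sketch below assumes a two-arm test with a planned split given by `expected_share`; the helper `srm_check` and its 0.001 alert threshold are illustrative choices, not a standard API. A p-value below the threshold suggests that assignment or logging is broken and that the results should not be trusted as-is.

```python
import math


def srm_check(n_control: int, n_treatment: int,
              expected_share: float = 0.5,
              alpha: float = 0.001) -> tuple[float, bool]:
    """Hypothetical helper: chi-square test for sample ratio mismatch (two arms).

    expected_share is the planned fraction of users in control.
    Returns (p_value, srm_detected).
    """
    total = n_control + n_treatment
    expected = (total * expected_share, total * (1 - expected_share))
    observed = (n_control, n_treatment)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Two arms give 1 degree of freedom, where P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p_value, p_value < alpha


print(srm_check(50_000, 50_000))  # (1.0, False): split matches the plan
print(srm_check(50_000, 52_000))  # tiny p-value, SRM flagged
```

A strict threshold such as 0.001 is typical for SRM alerts because the check is run on every experiment and false alarms are costly to chase.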

2. Statistical Significance Analysis
   - p-value: [From results]
   - Significance threshold (alpha): [Usually 0.05, fixed before the test]
   - Statistically significant: [Yes/No]
   - Confidence interval: [From results]
   - Does CI exclude zero: [Yes/No]
   
   Interpretation: [What this means]
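If raw counts are available, the p-value and confidence interval for a conversion metric can be recomputed rather than copied from a dashboard. The sketch below uses a two-proportion z-test with a pooled standard error for the p-value and an unpooled (Wald) interval for the difference; both are normal approximations that assume large samples and independent users. The helper name `two_proportion_test` is hypothetical. It reports absolute and relative lift separately, since a figure like "+5.2%" can mean either.

```python
import math
from statistics import NormalDist


def two_proportion_test(conv_c: int, n_c: int, conv_t: int, n_t: int,
                        alpha: float = 0.05) -> dict:
    """Hypothetical helper: z-test and CI for a difference in conversion rates."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    diff = p_t - p_c  # absolute difference in rates, e.g. 0.005 = 0.5 points

    # Pooled SE under H0 (no difference) for the hypothesis test.
    p_pool = (conv_c + conv_t) / (n_c + n_t)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled SE for a Wald confidence interval on the difference.
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)

    return {
        "abs_diff": diff,
        "relative_lift": diff / p_c,
        "p_value": p_value,
        "ci_abs": ci,
        "significant": p_value < alpha,
        "ci_excludes_zero": ci[0] > 0 or ci[1] < 0,
    }


# Hypothetical counts: 5.0% vs 5.5% conversion on 20,000 users per arm.
result = two_proportion_test(conv_c=1_000, n_c=20_000, conv_t=1_100, n_t=20_000)
print(result)  # +0.5 points absolute, +10% relative, p about 0.025
```

Because the test and the interval use different standard errors, they can occasionally disagree right at the threshold; a result in that zone is best treated as borderline.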

3. Practical Significance Analysis
   - Effect size: [e.g., +5.2%]
   - Minimum effect that matters: [Smallest lift worth shipping; often the MDE (minimum detectable effect) used to size the test]
   - Is effect practically meaningful: [Yes/No]
   - Annualized impact: [e.g., $X revenue, Y users affected]
   
   Interpretation: [Is this worth shipping?]
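The practical-significance questions can be made concrete by comparing the whole confidence interval, not just the point estimate, to the smallest effect worth shipping, and by spelling out the annualized-impact arithmetic. In this sketch the function names, the traffic figure, and the value per conversion are all hypothetical inputs, and the impact estimate assumes the tested lift holds after full rollout, which is often optimistic.

```python
def practical_significance(ci_low: float, ci_high: float,
                           min_meaningful: float) -> str:
    """Hypothetical helper: classify an effect's CI against the smallest lift worth shipping."""
    if ci_low >= min_meaningful:
        return "meaningful"  # even the pessimistic end clears the bar
    if ci_high < min_meaningful:
        return "too small"   # even the optimistic end falls short (or is negative)
    return "uncertain"       # interval straddles the bar


def annualized_impact(annual_visitors: int, baseline_rate: float,
                      relative_lift: float,
                      value_per_conversion: float) -> tuple[float, float]:
    """Hypothetical helper: extra conversions and revenue per year at full rollout."""
    extra_conversions = annual_visitors * baseline_rate * relative_lift
    return extra_conversions, extra_conversions * value_per_conversion


# Relative-lift CI from the template example (+2.1% to +8.3%) against a 2% bar:
print(practical_significance(0.021, 0.083, min_meaningful=0.02))  # meaningful
# Hypothetical: 1M visitors/year, 5% baseline, +5.2% lift, $40 per conversion.
print(annualized_impact(1_000_000, 0.05, 0.052, 40.0))  # about 2,600 conversions, $104,000
```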

4. Secondary Metrics Review
   | Metric | Result | Direction | Concerning? |
   |--------|--------|-----------|-------------|
   | [Metric 1] | [Result] | [Up/Down/Flat] | [Yes/No] |
   
   Trade-offs identified: [Any concerning patterns?]
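When many secondary metrics are reviewed, a few will cross p < 0.05 by chance alone. One common guard, offered here as an option rather than as part of the framework above, is the Benjamini-Hochberg procedure, which controls the false discovery rate across the whole set of metrics. The helper name and the metric names below are placeholders.

```python
def benjamini_hochberg(p_values: dict[str, float], q: float = 0.05) -> dict[str, bool]:
    """Hypothetical helper: flag metrics still significant after BH FDR control at level q."""
    ranked = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ranked)
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * q.
    cutoff = 0
    for k, (_, p) in enumerate(ranked, start=1):
        if p <= k / m * q:
            cutoff = k
    # Every metric ranked at or below the cutoff is declared significant.
    keep = {name for name, _ in ranked[:cutoff]}
    return {name: name in keep for name in p_values}


secondary = {"aov": 0.001, "pages_per_session": 0.02,
             "return_rate": 0.04, "time_on_site": 0.3}
print(benjamini_hochberg(secondary))
# {'aov': True, 'pages_per_session': True, 'return_rate': False, 'time_on_site': False}
```

Note that `return_rate` would look significant at an unadjusted 0.05 threshold but does not survive the correction, which is exactly the kind of result worth flagging as "interesting but unconfirmed" in the table.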

5. Guardrail Metrics Review
   | Metric | Result | Threshold | Status |
   |--------|--------|-----------|--------|
   | [Metric 1] | [Result] | [Threshold] | [Pass/Fail] |
   
   Any guardrails breached: [Yes/No]
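A guardrail is usually considered passed only if the data rule out a harmful change larger than the agreed threshold, which amounts to a non-inferiority check on the confidence interval. The sketch below assumes the threshold and the CI are in the same units (for example, relative change) and adds an "Inconclusive" state, an assumption beyond the table's Pass/Fail column, for intervals that straddle the threshold; a conservative team might record those as Fail. The function name is hypothetical.

```python
def guardrail_status(ci_low: float, ci_high: float, max_degradation: float,
                     higher_is_better: bool = True) -> str:
    """Hypothetical helper: compare a guardrail's CI to the allowed degradation.

    max_degradation is a positive number in the same units as the CI.
    """
    if higher_is_better:
        # Harm means a drop: acceptable changes are >= -max_degradation.
        if ci_low >= -max_degradation:
            return "Pass"
        if ci_high < -max_degradation:
            return "Fail"
    else:
        # Harm means a rise (e.g., latency, error rate): acceptable changes are <= max_degradation.
        if ci_high <= max_degradation:
            return "Pass"
        if ci_low > max_degradation:
            return "Fail"
    return "Inconclusive"  # CI straddles the threshold


print(guardrail_status(-0.004, 0.002, 0.005))                          # Pass
print(guardrail_status(-0.012, -0.006, 0.005))                         # Fail
print(guardrail_status(0.01, 0.04, 0.03, higher_is_better=False))      # Inconclusive
```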

6. Segment Analysis
   Notable segment differences:
   - [Segment]: [How results differ]
   - Implications: [What this means for decision]
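Before acting on a segment difference, it helps to check whether the two segment effects actually differ, not just whether one is significant and the other is not. A simple sketch, assuming two non-overlapping segments with independent effect estimates and standard errors, is a z-test on the difference; the helper name and the mobile/desktop numbers are hypothetical. Segments chosen after looking at the data, or many segments tested at once, deserve extra skepticism.

```python
import math
from statistics import NormalDist


def segment_difference(effect_a: float, se_a: float,
                       effect_b: float, se_b: float) -> tuple[float, float]:
    """Hypothetical helper: z-test for whether two independent segments' effects differ.

    Returns (difference, two-sided p-value).
    """
    diff = effect_a - effect_b
    se = math.sqrt(se_a ** 2 + se_b ** 2)  # valid only if the segments don't overlap
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, p_value


# Hypothetical: mobile +8% (SE 2%) vs desktop +1% (SE 2%).
print(segment_difference(0.08, 0.02, 0.01, 0.02))  # difference 0.07, p about 0.013
```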

7. Qualitative Context
   - User feedback during test: [Any signals?]
   - Support tickets: [Any increases?]
   - Engineering concerns: [Technical debt, maintenance?]
   - Business context: [Strategic alignment?]

8. Decision Framework
   
   SHIP if:
   □ Statistically significant positive result
   □ Practically meaningful effect size
   □ No guardrail violations
   □ No concerning secondary effects
   □ Valid test (no issues)
   
   ITERATE if:
   □ Directionally positive but not significant
   □ Segment shows promise for specific users
   □ Secondary metrics suggest refinement needed
   □ Learning suggests better approach
   
   ABANDON if:
   □ Negative result, or a null result from an adequately powered test
   □ Guardrails violated
   □ Effect size too small to matter
   □ Cost/complexity not worth small gain
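The SHIP, ITERATE, and ABANDON checklists can overlap (for example, a significant lift that also breaches a guardrail), so it helps to fix an order of precedence in advance. The sketch below is one possible ordering, not the only defensible one; the `Evidence` fields are hypothetical names for the checklist answers, and the "Rerun" outcome for an invalid test is an added assumption. Judgment calls such as cost versus gain, or whether learnings suggest a better approach, are deliberately left out of the code.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical container for the checklist answers from the sections above."""
    valid_test: bool
    guardrails_pass: bool
    significant_positive: bool
    practically_meaningful: bool
    secondary_ok: bool
    directionally_positive: bool
    promising_segment: bool


def decide(e: Evidence) -> str:
    """One possible ordering of the SHIP / ITERATE / ABANDON rules."""
    if not e.valid_test:
        return "Rerun"    # assumption: an invalid test supports no decision
    if not e.guardrails_pass:
        return "Abandon"  # guardrail violation
    if e.significant_positive and e.practically_meaningful and e.secondary_ok:
        return "Ship"     # all SHIP criteria met
    if e.significant_positive and not e.practically_meaningful:
        return "Abandon"  # real effect, but too small to matter
    if e.directionally_positive or e.promising_segment or e.significant_positive:
        return "Iterate"  # promising, or significant with secondary concerns
    return "Abandon"      # negative or null result


print(decide(Evidence(valid_test=True, guardrails_pass=True,
                      significant_positive=False, practically_meaningful=False,
                      secondary_ok=True, directionally_positive=True,
                      promising_segment=False)))  # Iterate
```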

9. Recommendation
   Decision: [Ship / Iterate / Abandon]
   
   Rationale:
   - Primary evidence: [Key data point]
   - Supporting evidence: [Secondary evidence]
   - Risks: [What could go wrong]
   - Mitigations: [How to address risks]
   
   If shipping:
   - Rollout plan: [Gradual or full?]
   - Monitoring: [What to watch]
   
   If iterating:
   - What to change: [Specific improvements]
   - Next test hypothesis: [What to test]
   
   If abandoning:
   - Key learnings: [What we learned]
   - Documentation: [Archive the results]