Prompt | Research | Updated January 28, 2026

A/B Test Sample Size Guide

Calculate required sample size and test duration for statistically valid experiments.

Prompt

You are a statistics expert helping me determine the right sample size for an A/B test. I need to ensure my test has enough statistical power to detect meaningful effects.

Test Context:
- Primary metric: [What you're measuring, e.g., conversion rate]
- Current baseline: [Current metric value, e.g., 3.2% conversion rate]
- Minimum detectable effect: [Smallest change worth detecting, e.g., 10% relative lift]
- Daily traffic/sample: [How many users per day reach this test point]
- Traffic split: [e.g., 50/50 between control and treatment]

Please help me calculate:

1. Sample Size Calculation
   Assumptions:
   - Confidence level: 95% (significance level α = 0.05)
   - Statistical power: 80% (1 − β = 0.80, so β = 0.20)
   - Test type: Two-tailed
   
   Calculation:
   - Required sample size per variant: [Calculate]
   - Total sample size needed: [Calculate]
   - Formula/methodology used: [Explain]

2. Test Duration Estimate
   - Daily sample per variant: [Based on traffic]
   - Estimated test duration: [Days/weeks needed]
   - Recommended minimum: [At least 1 full week for weekly patterns]
   - Buffer recommendation: [Add 20% for safety]

3. Sensitivity Analysis
   Show how sample size changes with different parameters:
   
   | MDE | Required Sample | Duration |
   |-----|-----------------|----------|
   | 5%  | [Calculate]     | [Days]   |
   | 10% | [Calculate]     | [Days]   |
   | 15% | [Calculate]     | [Days]   |
   | 20% | [Calculate]     | [Days]   |

4. Power Analysis
   With your current traffic over [X] weeks, what's the:
   - Smallest effect you can reliably detect (achievable MDE)
   - Statistical power for your target MDE
   - Risk of false negatives

5. Practical Considerations
   - Weekly patterns: Test should run at least 1 full week
   - Novelty effects: Consider running 2+ weeks
   - Holidays/events: Avoid testing during anomalies
   - Multiple variants: Apply a multiple-comparison correction (e.g., Bonferroni) if needed

6. Recommendations
   - Is your test feasible with current traffic?
   - If not, what are your options?
     - Increase MDE (detect larger effects only)
     - Increase traffic allocation
     - Run longer
     - Use more sensitive metrics
   - Red flags to watch for

7. Monitoring Plan
   - When to first check results: [Not before X samples]
   - How often to monitor: [Recommend daily for issues, not for significance]
   - When to stop: [Only when sample size reached OR critical issue]
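
To sanity-check the numbers the model returns, the sample size and duration steps in the prompt can be sketched in a few lines of Python. This is a minimal sketch, not a definitive method: it assumes a two-sided, two-proportion z-test with the unpooled normal approximation, and the function names (`sample_size_per_variant`, `duration_days`) and the 10,000 daily users figure are hypothetical. The 3.2% baseline and the 5% to 20% relative lifts are the examples from the prompt itself. Other calculators use slightly different approximations, so expect small differences.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate users needed per variant for a two-sided two-proportion z-test.

    Assumes the unpooled normal approximation; other tools may differ slightly.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # treatment rate implied by the relative lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for two-tailed alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


def duration_days(n_per_variant, daily_traffic, variants=2, buffer=0.20, min_days=7):
    """Days needed with an even traffic split, a safety buffer, and a one-week floor."""
    daily_per_variant = daily_traffic / variants
    days = math.ceil(n_per_variant * (1 + buffer) / daily_per_variant)
    return max(days, min_days)


if __name__ == "__main__":
    baseline = 0.032        # example baseline from the prompt (3.2% conversion rate)
    daily_traffic = 10_000  # hypothetical daily users reaching the test

    # Sensitivity table: sample size and duration for each relative MDE
    for mde in (0.05, 0.10, 0.15, 0.20):
        n = sample_size_per_variant(baseline, mde)
        days = duration_days(n, daily_traffic)
        print(f"MDE {mde:.0%}: {n:,} per variant, about {days} days")
```

Because the required sample grows roughly with the inverse square of the effect size, halving the MDE roughly quadruples the sample, which is why the sensitivity table is worth filling in before committing to a target.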