A/B Test Sample Size Guide
Calculate required sample size and test duration for statistically valid experiments.
Use Case
Planning A/B tests, determining test feasibility, understanding statistical requirements, or educating stakeholders on why tests take time.
Prompt
You are a statistics expert helping me determine the right sample size for an A/B test. I need to ensure my test has enough statistical power to detect meaningful effects.
Test Context:
- Primary metric: [What you're measuring, e.g., conversion rate]
- Current baseline: [Current metric value, e.g., 3.2% conversion rate]
- Minimum detectable effect: [Smallest change worth detecting, e.g., 10% relative lift]
- Daily traffic/sample: [How many users per day reach this test point]
- Traffic split: [e.g., 50/50 between control and treatment]
Please help me calculate:
1. Sample Size Calculation
Assumptions:
- Confidence level: 95% (significance level α = 0.05)
- Statistical power: 80% (β = 0.20)
- Test type: Two-tailed
Calculation:
- Required sample size per variant: [Calculate]
- Total sample size needed: [Calculate]
- Formula/methodology used: [Explain]
2. Test Duration Estimate
- Daily sample per variant: [Based on traffic]
- Estimated test duration: [Days/weeks needed]
- Recommended minimum: [At least 1 full week for weekly patterns]
- Buffer recommendation: [Add 20% for safety]
3. Sensitivity Analysis
Show how sample size changes with different parameters:
| MDE | Required Sample | Duration |
|-----|-----------------|----------|
| 5% | [Calculate] | [Days] |
| 10% | [Calculate] | [Days] |
| 15% | [Calculate] | [Days] |
| 20% | [Calculate] | [Days] |
4. Power Analysis
With your current traffic over [X] weeks, what's the:
- Minimum detectable effect you can reliably detect
- Statistical power for your target MDE
- Risk of false negatives
5. Practical Considerations
- Weekly patterns: Test should run at least 1 full week
- Novelty effects: Consider running 2+ weeks
- Holidays/events: Avoid testing during anomalies
- Multiple variants: Apply a Bonferroni correction (divide α by the number of comparisons) if needed
6. Recommendations
- Is your test feasible with current traffic?
- If not, what are your options?
- Increase MDE (detect larger effects only)
- Increase traffic allocation
- Run longer
- Use more sensitive metrics
- Red flags to watch for
7. Monitoring Plan
- When to first check results: [Not before X samples]
- How often to monitor: [Recommend daily for issues, not for significance]
- When to stop: [Only when sample size reached OR critical issue]
How to use
1. Identify your primary metric and current baseline
2. Determine the minimum effect worth detecting (usually 5-20% relative)
3. Check your traffic data for daily sample size
4. Replace placeholders with your specific values
5. Review the duration estimate against your timeline
6. If duration is too long, explore the sensitivity analysis options
7. Share with stakeholders to set realistic expectations
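To sanity-check the numbers the prompt returns, the sample size and duration steps can be sketched in a few lines of Python. This is a minimal sketch, not a definitive method: it assumes a two-sided two-proportion z-test with the unpooled normal approximation, so results may differ by a few percent from online calculators that use a pooled or exact formula. The function names, the 5,000 users/day traffic figure, and the 7-day floor are illustrative assumptions; the 3.2% baseline and the MDE grid come from the prompt's own examples.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate users per variant for a two-sided two-proportion z-test.

    Uses the unpooled normal approximation (an assumption of this sketch).
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def duration_days(n_per_variant, daily_traffic, variants=2, buffer=0.20):
    """Days needed with an even split, a safety buffer, and a 7-day minimum."""
    daily_per_variant = daily_traffic / variants
    days = math.ceil(n_per_variant * (1 + buffer) / daily_per_variant)
    return max(days, 7)


# Sensitivity table: 3.2% baseline, hypothetical 5,000 users/day
for mde in (0.05, 0.10, 0.15, 0.20):
    n = sample_size_per_variant(0.032, mde)
    print(f"MDE {mde:.0%}: {n:,} per variant, ~{duration_days(n, 5000)} days")
```

Because the required sample scales roughly with 1/MDE², halving the MDE about quadruples the sample, which is usually the clearest way to show stakeholders why small lifts take so long to detect.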
Pro Tips
- Use online calculators (Evan Miller, Optimizely) to verify
- Conservative estimates are better than optimistic ones
- Account for weekly patterns - never run less than 7 days
- If you can't detect 5% changes, you might need a different approach
- Multiple metrics = multiple comparison corrections needed
- Low traffic? Consider qualitative research instead
- Share this with stakeholders who ask "why does it take so long?"
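The power-analysis and multiple-comparison points can also be checked by hand. The sketch below is a rough check under stated assumptions: it uses the same unpooled normal approximation for a two-sided two-proportion z-test, ignores the negligible probability of rejecting in the wrong direction, and uses hypothetical function names. It estimates the power you would get from a fixed sample per variant, and shows how a Bonferroni correction tightens α when several comparisons are made.

```python
from statistics import NormalDist


def achieved_power(baseline, relative_mde, n_per_variant, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test at a fixed n."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    # Standard error of the difference in proportions (unpooled)
    se = ((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_variant) ** 0.5
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p2 - p1) / se - z_alpha)


def bonferroni_alpha(alpha, comparisons):
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / comparisons


# 3.2% baseline, 10% relative lift, hypothetical 20,000 users per variant
power = achieved_power(0.032, 0.10, 20_000)
print(f"Power: {power:.0%}, false-negative risk: {1 - power:.0%}")

# Three treatment-vs-control comparisons share the overall 5% error budget
adjusted = bonferroni_alpha(0.05, 3)
print(f"Per-comparison alpha: {adjusted:.4f}")
print(f"Power at adjusted alpha: {achieved_power(0.032, 0.10, 20_000, adjusted):.0%}")
```

A stricter per-comparison α lowers power at the same sample size, so tests with several variants or metrics need more traffic to keep the same false-negative risk.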
Tags
ab-testing, statistics, sample-size, experimentation, power-analysis