Statistical Power
Definition
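Statistical power is the probability of detecting an effect when it truly exists. Formally, it is the probability of rejecting the null hypothesis $H_0$ when the alternative $H_1$ is true:
$$\text{Power} = P(\text{reject } H_0 \mid H_1 \text{ true}) = 1 - \beta$$
where:
- $\alpha$: Type I error rate (false positive), declaring an effect when there is none
- $\beta$: Type II error rate (false negative), missing an effect that is there
- Power $= 1 - \beta$
A power of 80% ($\beta = 0.20$) is the conventional target.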
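The definition can be turned into a number directly. Below is a minimal sketch that assumes a two-sided, two-sample z-test with equal group sizes and a known common standard deviation. Under those assumptions, power is $\Phi(\delta/\text{SE} - z_{1-\alpha/2})$. The helper `power_two_sample` and the input values are hypothetical and chosen only for illustration.

```python
from scipy import stats
import numpy as np

def power_two_sample(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test (hypothetical helper).

    Assumes equal group sizes and a known common standard deviation sigma.
    It also ignores the negligible chance of rejecting in the wrong direction.
    """
    se = sigma * np.sqrt(2 / n_per_group)           # standard error of the difference in means
    z_crit = stats.norm.ppf(1 - alpha / 2)          # two-sided critical value
    return stats.norm.cdf(abs(delta) / se - z_crit) # P(reject H0 | true effect = delta)

power = power_two_sample(delta=5, sigma=50, n_per_group=1570)
beta = 1 - power  # Type II error rate
print(f"Power = {power:.3f}, beta = {beta:.3f}")
```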
Intuitive Understanding
Power is the “ability to discover a real effect.”
When power is low, you will often miss an effect even when one exists. For example, an experiment with 50% power detects a real effect only half the time, which is no better than a coin flip.
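The coin-flip intuition can be checked by simulation. The sketch below assumes normally distributed outcomes and a two-sample t-test. It picks the per-group size at which $\delta/\text{SE}$ just reaches $z_{1-\alpha/2}$, which gives roughly 50% power. It then runs many experiments in which the effect is always real. The parameter values and the seed are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sigma, delta, alpha = 1.0, 0.2, 0.05
z_crit = stats.norm.ppf(1 - alpha / 2)
# Per-group n at which delta / SE is just above z_crit, giving power of about 50%
n = int(np.ceil(2 * (z_crit * sigma / delta) ** 2))

n_experiments = 2000
# Every simulated experiment contains a real effect of size delta
control = rng.normal(0.0, sigma, size=(n_experiments, n))
treatment = rng.normal(delta, sigma, size=(n_experiments, n))
_, p_values = stats.ttest_ind(treatment, control, axis=1)

detection_rate = np.mean(p_values < alpha)
print(f"n = {n} per group, share of experiments that detect the effect: {detection_rate:.2f}")
```

Roughly half of the experiments miss the effect, even though it exists in every one of them.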
Key Properties
Factors Affecting Power
| Factor | Effect on Power |
|---|---|
| Sample size ↑ | Power ↑ |
| Effect size ↑ | Power ↑ |
| Variance ↓ | Power ↑ |
| Significance level α ↑ | Power ↑ (at the cost of a higher Type I error rate) |
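Each row of the table can be checked numerically. This sketch uses a normal-approximation power formula for a two-sided two-sample test (`power_two_sample` is a hypothetical helper). It changes one factor at a time from an arbitrary baseline scenario.

```python
from scipy import stats
import numpy as np

def power_two_sample(delta, sigma, n_per_group, alpha=0.05):
    # Normal-approximation power of a two-sided two-sample test (hypothetical helper)
    se = sigma * np.sqrt(2 / n_per_group)
    return stats.norm.cdf(abs(delta) / se - stats.norm.ppf(1 - alpha / 2))

base = dict(delta=5, sigma=50, n_per_group=1000, alpha=0.05)
scenarios = {
    "baseline": base,
    "sample size x2": {**base, "n_per_group": 2000},
    "effect size x1.5": {**base, "delta": 7.5},
    "std dev / 1.5": {**base, "sigma": 50 / 1.5},
    "alpha 0.05 -> 0.10": {**base, "alpha": 0.10},
}
results = {name: power_two_sample(**kw) for name, kw in scenarios.items()}
for name, pw in results.items():
    print(f"{name:>20}: power = {pw:.3f}")
```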
Sample Size Formulas
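Binary outcome (conversion rate):
$$n = \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2\, p(1-p)}{\delta^2}$$
Continuous outcome:
$$n = \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2\, \sigma^2}{\delta^2}$$
where $n$ is the sample size per group and:
- $z_{1-\alpha/2}$: z-value for the two-sided significance level ($\alpha = 0.05 \to 1.96$)
- $z_{1-\beta}$: z-value for the power ($0.80 \to 0.84$)
- $\delta$: minimum effect size to detect (MDE), as an absolute difference
- $p$: baseline rate
- $\sigma$: standard deviation

The binary formula is a simplification that assumes the variance $p(1-p)$ in both groups. The Python implementation below uses the more precise form with a pooled variance under $H_0$ and separate variances under $H_1$.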
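As a quick worked check of the two formulas, the sketch below plugs in a 5% baseline rate with a 10% relative lift (so $\delta = 0.005$) for the binary case. For the continuous case it uses $\sigma = 50$ and $\delta = 5$. These inputs are illustrative. The simplified binary formula gives just under 30,000 per group. The more precise formula in the implementation below gives a somewhat larger number.

```python
import math
from scipy import stats

alpha, power = 0.05, 0.80
z_alpha = stats.norm.ppf(1 - alpha / 2)  # ~1.96
z_beta = stats.norm.ppf(power)           # ~0.84

# Binary outcome: simplified formula with a common variance p(1-p)
p = 0.05              # baseline conversion rate
delta = p * 0.10      # 10% relative lift -> absolute MDE of 0.005
n_binary = 2 * (z_alpha + z_beta) ** 2 * p * (1 - p) / delta ** 2

# Continuous outcome
sigma, delta_cont = 50, 5
n_continuous = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta_cont ** 2

print(f"binary: {math.ceil(n_binary):,} per group")
print(f"continuous: {math.ceil(n_continuous):,} per group")
```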
MDE (Minimum Detectable Effect)
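The smallest effect size that can be detected with the target power, given the sample size and significance level. For a continuous outcome, solving the sample size formula for $\delta$ gives:
$$\text{MDE} = (z_{1-\alpha/2} + z_{1-\beta})\,\sigma\sqrt{\frac{2}{n}}$$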
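A minimal sketch of the MDE formula for a continuous outcome follows. `mde_continuous` is a hypothetical helper name. Because the MDE scales with $1/\sqrt{n}$, halving it requires four times the sample.

```python
from scipy import stats
import numpy as np

def mde_continuous(std, n_per_group, alpha=0.05, power=0.80):
    """Minimum detectable absolute difference in means (hypothetical helper)."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    return (z_alpha + z_beta) * std * np.sqrt(2 / n_per_group)

for n in (500, 1570, 5000):
    print(f"n = {n:>5} per group -> MDE = {mde_continuous(50, n):.2f}")
```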
Example
Python Implementation
```python
from scipy import stats
import numpy as np

def sample_size_binary(baseline_rate, mde_relative, alpha=0.05, power=0.80):
    """Sample size calculation for a binary outcome"""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + mde_relative)
    p_pooled = (p1 + p2) / 2
    z_alpha = stats.norm.ppf(1 - alpha/2)
    z_beta = stats.norm.ppf(power)
    numerator = (z_alpha * np.sqrt(2 * p_pooled * (1 - p_pooled)) +
                 z_beta * np.sqrt(p1*(1-p1) + p2*(1-p2)))**2
    denominator = (p2 - p1)**2
    return int(np.ceil(numerator / denominator))

def sample_size_continuous(std, mde_absolute, alpha=0.05, power=0.80):
    """Sample size calculation for a continuous outcome"""
    z_alpha = stats.norm.ppf(1 - alpha/2)
    z_beta = stats.norm.ppf(power)
    n = 2 * ((z_alpha + z_beta) * std / mde_absolute)**2
    return int(np.ceil(n))

# Example: 5% baseline conversion rate, detect a 10% relative increase
n_binary = sample_size_binary(baseline_rate=0.05, mde_relative=0.10)
print(f"Required sample per group (binary): {n_binary:,}")  # ~31,000

# Example: $100 mean revenue, $50 standard deviation, detect a $5 difference
n_continuous = sample_size_continuous(std=50, mde_absolute=5)
print(f"Required sample per group (continuous): {n_continuous:,}")  # ~1,570
```
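The continuous-outcome result can be sanity-checked by simulation. The sketch below assumes normally distributed revenue with mean 100 and standard deviation 50 and a true difference of 5. It fixes $n = 1570$ per group and estimates power as the share of simulated t-tests that reject at $\alpha = 0.05$. The estimate should land near the 80% target. The seed and the number of simulated experiments are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
std, mde_absolute, alpha = 50, 5, 0.05
n = 1570            # per-group size returned by sample_size_continuous(std=50, mde_absolute=5)
n_experiments = 2000

# Simulate many experiments in which the true difference equals the MDE
control = rng.normal(100, std, size=(n_experiments, n))
treatment = rng.normal(100 + mde_absolute, std, size=(n_experiments, n))
_, p_values = stats.ttest_ind(treatment, control, axis=1)

empirical_power = np.mean(p_values < alpha)
print(f"Empirical power at n={n:,} per group: {empirical_power:.3f}")
```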
Power Curve
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def power_curve(baseline_rate, mde_range, sample_size, alpha=0.05):
    """Compute power across a range of relative effect sizes (sample_size is per group)"""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    powers = []
    for mde in mde_range:
        p1 = baseline_rate
        p2 = baseline_rate * (1 + mde)
        # Unpooled standard error of the difference in proportions
        se = np.sqrt(p1 * (1 - p1) / sample_size + p2 * (1 - p2) / sample_size)
        # Normal approximation; ignores the negligible chance of rejecting in the wrong direction
        z = (p2 - p1) / se - z_alpha
        powers.append(stats.norm.cdf(z))
    return powers

mde_range = np.linspace(0.01, 0.30, 50)
n = 10000
powers = power_curve(0.05, mde_range, n)

plt.figure(figsize=(8, 5))
plt.plot(mde_range * 100, powers)
plt.axhline(0.8, color='r', linestyle='--', label='80% power')
plt.xlabel('Relative MDE (%)')
plt.ylabel('Power')
plt.title(f'Power Curve (n={n:,} per group)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
```
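Instead of reading the crossing point off the plot, you can locate it numerically. This sketch reuses the same normal approximation for a single relative MDE (`power_binary` is a hypothetical helper). It then uses `scipy.optimize.brentq` to find the relative MDE where power equals 80% for a 5% baseline with 10,000 users per group. The result is roughly an 18% relative lift.

```python
import numpy as np
from scipy import stats
from scipy.optimize import brentq

def power_binary(baseline_rate, mde_relative, n_per_group, alpha=0.05):
    # Same normal approximation as power_curve, evaluated at one relative MDE
    p1 = baseline_rate
    p2 = baseline_rate * (1 + mde_relative)
    se = np.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    return stats.norm.cdf((p2 - p1) / se - stats.norm.ppf(1 - alpha / 2))

# Root of power(m) - 0.80 on the plotted range [1%, 30%]
mde_at_80 = brentq(lambda m: power_binary(0.05, m, 10_000) - 0.80, 0.01, 0.30)
print(f"Relative MDE at 80% power (n=10,000 per group): {mde_at_80:.1%}")
```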
Pricing Experiment Example
```python
# Test the effect of a price change on conversion rate
# Baseline: 3% conversion rate
# Detection target: 5% relative change (3% → 3.15%)
n_required = sample_size_binary(
    baseline_rate=0.03,
    mde_relative=0.05,
    alpha=0.05,
    power=0.80
)
print(f"Required sample: {n_required:,} per group")  # ~208,000

# Assuming 5,000 total visitors per day, split evenly between the two groups
days_required = int(np.ceil(n_required * 2 / 5000))
print(f"Experiment duration: about {days_required} days")  # ~84
```
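Because the required sample grows with $1/\delta^2$, the target MDE dominates the experiment duration. The sketch below repeats the pricing calculation for several relative MDEs. It redefines the same `sample_size_binary` formula so that the block runs on its own, and it assumes 5,000 total visitors per day as in the example above. Halving the MDE roughly quadruples the duration.

```python
import numpy as np
from scipy import stats

def sample_size_binary(baseline_rate, mde_relative, alpha=0.05, power=0.80):
    # Same formula as the Python implementation above
    p1 = baseline_rate
    p2 = baseline_rate * (1 + mde_relative)
    p_pooled = (p1 + p2) / 2
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    numerator = (z_alpha * np.sqrt(2 * p_pooled * (1 - p_pooled)) +
                 z_beta * np.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return int(np.ceil(numerator / (p2 - p1) ** 2))

daily_visitors = 5000  # total across both groups (assumption carried over from the example)
durations = {}
for mde in (0.05, 0.10, 0.20):
    n = sample_size_binary(0.03, mde)
    durations[mde] = int(np.ceil(2 * n / daily_visitors))
    print(f"relative MDE {mde:.0%}: {n:,} per group, about {durations[mde]} days")
```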
Related Concepts
- A-B Testing - the context in which power applies
- CUPED - improving power via variance reduction
- Design Effect - the impact of cluster randomization on power
References
- Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences.
- Kohavi, R., Tang, D., & Xu, Y. (2020). Trustworthy Online Controlled Experiments.
- Comprehensive Personalized Pricing Guide, Part V, §14