Tae Hyun Kim (Lowell)

Statistical Power

#experiments #ab-testing #rct

Definition

Statistical power is the probability that a test rejects the null hypothesis when a true effect exists, that is, the probability of detecting an effect that is really there.

$$\text{Power} = P(\text{detect effect} \mid \text{effect exists}) = 1 - \beta$$

where:

  • $\alpha$: Type I error rate (false positive), i.e. declaring an effect when there is none
  • $\beta$: Type II error rate (false negative), i.e. missing an effect that is really there
  • Power = $1 - \beta$

A power of 80% ($\beta = 0.20$) is the conventional target, typically paired with $\alpha = 0.05$.
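As a minimal sketch of the definition, assuming a two-sided two-sample z-test with a known, equal standard deviation in both groups, power can be computed in closed form. The helper `power_two_sample_z` is a hypothetical name used only for illustration. With $\delta = 0$ it returns $\alpha$, since every rejection is then a false positive.

```python
import numpy as np
from scipy import stats


def power_two_sample_z(delta, sigma, n_per_group, alpha=0.05):
    """Power of a two-sided two-sample z-test with known, equal standard deviation.

    delta: true difference in means (treatment - control)
    sigma: standard deviation of a single observation
    n_per_group: number of observations in each group
    """
    se = np.sqrt(2 * sigma**2 / n_per_group)   # standard error of the difference in means
    z_crit = stats.norm.ppf(1 - alpha / 2)     # two-sided critical value
    shift = delta / se                         # mean of the test statistic under H1
    # Reject when |Z| > z_crit, with Z ~ N(shift, 1) if the effect is real
    return stats.norm.cdf(shift - z_crit) + stats.norm.cdf(-shift - z_crit)


power = power_two_sample_z(delta=5, sigma=50, n_per_group=1570)
beta = 1 - power
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

With σ = 50, a true difference of 5 and 1,570 users per group, power comes out at about 0.80, so β is about 0.20.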

Intuitive Understanding

Power is the “ability to discover a real effect.”

When power is low, you often fail to detect an effect that really exists. With 50% power, for example, a real effect is detected only half the time, no better than a coin flip.
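The coin-flip intuition can be checked by simulation. The sketch below assumes normally distributed outcomes and a Welch t-test (`scipy.stats.ttest_ind` with `equal_var=False`). The per-group size is chosen so that the true effect sits right at the critical value ($\delta / \text{SE} = z_{0.975}$), which gives roughly 50% power. The helper `simulated_power` and the parameter values are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)


def simulated_power(delta, sigma, n_per_group, alpha=0.05, n_sims=4000):
    """Share of simulated experiments with a real effect in which the test rejects H0."""
    # One row per simulated experiment
    control = rng.normal(0.0, sigma, size=(n_sims, n_per_group))
    treatment = rng.normal(delta, sigma, size=(n_sims, n_per_group))
    # Welch t-test run row by row
    p_values = stats.ttest_ind(treatment, control, axis=1, equal_var=False).pvalue
    return np.mean(p_values < alpha)


# Pick n so that delta / SE equals z_{0.975}: the effect sits right at the threshold
delta, sigma = 0.2, 1.0
n_half = int(round(2 * stats.norm.ppf(0.975) ** 2 * sigma**2 / delta**2))
est = simulated_power(delta, sigma, n_half)
print(f"n = {n_half} per group, simulated power = {est:.2f}")
```

Roughly half of the simulated experiments reject the null hypothesis, even though every one of them has a real effect.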

Key Properties

Factors Affecting Power

| Factor | Effect on Power |
|---|---|
| Sample size ↑ | Power ↑ |
| Effect size ↑ | Power ↑ |
| Variance ↓ | Power ↑ |
| Significance level α ↑ | Power ↑ (Type I error rate increases) |
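Each row of the table can be checked with the normal-approximation power formula for a two-sided two-sample test. This is a sketch under assumed values (σ = 50, δ = 5, n = 1,000 per group). `power` is a hypothetical helper that, like the power curve code below, ignores the negligible probability of rejecting in the wrong direction.

```python
import numpy as np
from scipy import stats


def power(delta, sigma, n, alpha=0.05):
    """Normal-approximation power of a two-sided two-sample test, n per group.

    Ignores the (negligible) probability of rejecting in the wrong direction.
    """
    se = np.sqrt(2 * sigma**2 / n)
    return stats.norm.cdf(delta / se - stats.norm.ppf(1 - alpha / 2))


base = dict(delta=5, sigma=50, n=1000, alpha=0.05)
scenarios = {
    "baseline": base,
    "sample size x2": {**base, "n": 2000},
    "effect size x2": {**base, "delta": 10},
    "std dev halved (variance / 4)": {**base, "sigma": 25},
    "alpha 0.05 -> 0.10": {**base, "alpha": 0.10},
}
for name, params in scenarios.items():
    print(f"{name:32s} power = {power(**params):.3f}")
```

Doubling δ and halving σ give identical power, because power depends on them only through the ratio δ/σ.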

Sample Size Formulas

Binary outcome (conversion rate): $n = \frac{2 (z_{1-\alpha/2} + z_{1-\beta})^2 p(1-p)}{\delta^2}$

Continuous outcome: $n = \frac{2 (z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2}{\delta^2}$

where:

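  • $n$: required sample size per group
  • $z_{1-\alpha/2}$: standard normal quantile for a two-sided test at level $\alpha$ ($\alpha = 0.05 \to 1.96$)
  • $z_{1-\beta}$: standard normal quantile for the target power ($1 - \beta = 0.80 \to 0.84$)
  • $\delta$: minimum detectable effect (MDE) as an absolute difference, e.g. 0.005 for a conversion rate moving from 5.0% to 5.5%
  • $p$: baseline rate
  • $\sigma$: standard deviation of the outcome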
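Plugging in the conventional values shows where a common rule of thumb comes from: $2(1.96 + 0.84)^2 = 2 \times 7.84 = 15.68 \approx 16$, so $n \approx 16\sigma^2/\delta^2$ per group. The sketch below evaluates both formulas directly; `n_per_group` is an illustrative helper, and the inputs mirror the Python examples (σ = 50 with δ = 5; p = 0.05 with δ = 0.005).

```python
import math
from scipy import stats


def n_per_group(variance, delta, alpha=0.05, power=0.80):
    """n = 2 (z_{1-alpha/2} + z_{1-beta})^2 * variance / delta^2, rounded up."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * variance / delta**2)


# Continuous outcome: sigma = 50, delta = 5
n_cont = n_per_group(variance=50**2, delta=5)

# Binary outcome: baseline p = 0.05, absolute delta = 0.005 (5.0% -> 5.5%)
p = 0.05
n_bin = n_per_group(variance=p * (1 - p), delta=0.005)

# Rule of thumb: n ~ 16 * sigma^2 / delta^2
rule_cont = 16 * 50**2 / 5**2

print(n_cont, n_bin, rule_cont)
```

The binary formula uses only the baseline variance $p(1-p)$, so it gives a somewhat smaller n (just under 30,000) than `sample_size_binary` in the Python implementation, which also accounts for the larger variance of the treatment rate and returns about 31,000.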

MDE (Minimum Detectable Effect)

$$\text{MDE} = (z_{1-\alpha/2} + z_{1-\beta}) \cdot \sqrt{\frac{2\sigma^2}{n}}$$

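The smallest true effect the experiment can detect with the target power at a given per-group sample size $n$. It is the continuous-outcome formula solved for $\delta$; for a conversion rate, replace $\sigma^2$ with $p(1-p)$.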
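Because the MDE is the sample-size formula solved for δ, it can be sanity-checked by a round trip through the continuous example in the Python implementation (σ = 50, 1,570 per group). The helper `mde_continuous` is hypothetical; for a conversion rate, pass $\sqrt{p(1-p)}$ as `sigma`.

```python
import numpy as np
from scipy import stats


def mde_continuous(sigma, n_per_group, alpha=0.05, power=0.80):
    """MDE = (z_{1-alpha/2} + z_{1-beta}) * sqrt(2 sigma^2 / n)."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    return (z_alpha + z_beta) * np.sqrt(2 * sigma**2 / n_per_group)


# Round trip: 1,570 per group with sigma = 50 should give back an MDE of about 5
print(f"MDE: {mde_continuous(sigma=50, n_per_group=1570):.2f}")

# MDE scales as 1 / sqrt(n): four times the sample halves the MDE
ratio = mde_continuous(50, 4 * 1570) / mde_continuous(50, 1570)
print(f"ratio: {ratio:.3f}")
```

Since the MDE scales as $1/\sqrt{n}$, halving the detectable effect costs four times the sample.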

Example

Python Implementation

from scipy import stats
import numpy as np

def sample_size_binary(baseline_rate, mde_relative, alpha=0.05, power=0.80):
    """Sample size calculation for a binary outcome"""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + mde_relative)
    p_pooled = (p1 + p2) / 2

    z_alpha = stats.norm.ppf(1 - alpha/2)
    z_beta = stats.norm.ppf(power)

    numerator = (z_alpha * np.sqrt(2 * p_pooled * (1 - p_pooled)) +
                 z_beta * np.sqrt(p1*(1-p1) + p2*(1-p2)))**2
    denominator = (p2 - p1)**2

    return int(np.ceil(numerator / denominator))

def sample_size_continuous(std, mde_absolute, alpha=0.05, power=0.80):
    """Sample size calculation for a continuous outcome"""
    z_alpha = stats.norm.ppf(1 - alpha/2)
    z_beta = stats.norm.ppf(power)

    n = 2 * ((z_alpha + z_beta) * std / mde_absolute)**2
    return int(np.ceil(n))

# Example: 5% baseline conversion rate, detect a 10% relative increase
n_binary = sample_size_binary(baseline_rate=0.05, mde_relative=0.10)
print(f"Required sample per group (binary): {n_binary:,}")  # ~31,000

# Example: $100 mean revenue, $50 standard deviation, detect a $5 difference
n_continuous = sample_size_continuous(std=50, mde_absolute=5)
print(f"Required sample per group (continuous): {n_continuous:,}")  # ~1,570
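A Monte Carlo check of `sample_size_binary`: simulate many experiments at the computed size (31,234 per group for a 5% baseline and a 10% relative lift) and count how often a two-sided two-proportion z-test with pooled standard error rejects. This sketch assumes independent Bernoulli conversions; `simulated_power_binary` is an illustrative name, not part of any library. The rejection rate should land near the 80% target, and near α when the two rates are equal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)


def simulated_power_binary(p1, p2, n_per_group, alpha=0.05, n_sims=20_000):
    """Monte Carlo power of a two-sided two-proportion z-test with pooled SE."""
    x1 = rng.binomial(n_per_group, p1, size=n_sims)  # conversions in control
    x2 = rng.binomial(n_per_group, p2, size=n_sims)  # conversions in treatment
    diff = (x2 - x1) / n_per_group
    pooled = (x1 + x2) / (2 * n_per_group)           # pooled rate under H0
    se = np.sqrt(2 * pooled * (1 - pooled) / n_per_group)
    z = diff / se
    return np.mean(np.abs(z) > stats.norm.ppf(1 - alpha / 2))


n = 31_234  # per-group n returned by sample_size_binary(0.05, 0.10)
est_power = simulated_power_binary(0.05, 0.055, n)
est_alpha = simulated_power_binary(0.05, 0.05, n)  # no true effect
print(f"power ~ {est_power:.3f}, false positive rate ~ {est_alpha:.3f}")
```

With 20,000 simulated experiments, the standard error of the estimated power is about 0.3 percentage points.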

Power Curve

import matplotlib.pyplot as plt

def power_curve(baseline_rate, mde_range, sample_size, alpha=0.05):
    """Compute power across a range of effect sizes"""
    powers = []

    for mde in mde_range:
        p1 = baseline_rate
        p2 = baseline_rate * (1 + mde)

        se = np.sqrt(p1*(1-p1)/sample_size + p2*(1-p2)/sample_size)
        z = (p2 - p1) / se - stats.norm.ppf(1 - alpha/2)
        power = stats.norm.cdf(z)
        powers.append(power)

    return powers

mde_range = np.linspace(0.01, 0.30, 50)
n = 10000

powers = power_curve(0.05, mde_range, n)

plt.figure(figsize=(8, 5))
plt.plot(mde_range * 100, powers)
plt.axhline(0.8, color='r', linestyle='--', label='80% power')
plt.xlabel('Relative MDE (%)')
plt.ylabel('Power')
plt.title(f'Power Curve (n={n:,} per group)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
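Rather than reading the 80% crossing off the plot, it can be solved for numerically. The sketch below reuses the same approximation as `power_curve` (unpooled standard error, far tail ignored) in a hypothetical helper `power_binary`, and finds the root with `scipy.optimize.brentq`, which needs an interval where the function changes sign; the plotted range [0.01, 0.30] works here.

```python
import numpy as np
from scipy import stats
from scipy.optimize import brentq


def power_binary(baseline_rate, mde_relative, n_per_group, alpha=0.05):
    """Same approximation as power_curve: unpooled SE, far tail ignored."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + mde_relative)
    se = np.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    return stats.norm.cdf((p2 - p1) / se - stats.norm.ppf(1 - alpha / 2))


# Relative MDE at which power reaches 80% with 10,000 per group
mde_80 = brentq(lambda m: power_binary(0.05, m, 10_000) - 0.80, 0.01, 0.30)
print(f"80% power at a relative MDE of about {mde_80:.1%}")
```

With 10,000 users per group and a 5% baseline, the crossing sits near an 18% relative lift; smaller lifts are detected less than 80% of the time.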

Pricing Experiment Example

# Test the effect of a price change on conversion rate
# Baseline: 3% conversion rate
# Detection target: 5% relative change (3% → 3.15%)

n_required = sample_size_binary(
    baseline_rate=0.03,
    mde_relative=0.05,
    alpha=0.05,
    power=0.80
)
print(f"Required sample: {n_required:,} per group")

# With 5,000 visitors per day in total (split evenly across the two groups)
days_required = n_required * 2 / 5000
print(f"Experiment duration: about {days_required:.0f} days")
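The duration estimate is very sensitive to the MDE. The sketch below repeats `sample_size_binary` from the Python implementation so that it runs on its own, and assumes the same 5,000 daily visitors split evenly across two groups; the list of MDE values is illustrative.

```python
import numpy as np
from scipy import stats


def sample_size_binary(baseline_rate, mde_relative, alpha=0.05, power=0.80):
    """Per-group sample size for a binary outcome (same formula as above)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + mde_relative)
    p_pooled = (p1 + p2) / 2
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    numerator = (z_alpha * np.sqrt(2 * p_pooled * (1 - p_pooled))
                 + z_beta * np.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return int(np.ceil(numerator / (p2 - p1) ** 2))


daily_visitors = 5_000  # total per day, split evenly between the two groups
results = {}
for mde in (0.05, 0.10, 0.20):
    n = sample_size_binary(baseline_rate=0.03, mde_relative=mde)
    results[mde] = n
    days = 2 * n / daily_visitors
    print(f"relative MDE {mde:.0%}: {n:,} per group, about {days:.0f} days")
```

Halving the relative MDE roughly quadruples the required sample, and therefore the duration, since n scales with $1/\delta^2$.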

Related

  • A-B Testing - the context in which power applies
  • CUPED - improving power via variance reduction
  • Design Effect - the impact of cluster randomization on power

