Statistical Power · Tae Hyun Kim (Lowell)

Definition

Statistical Power is the probability of detecting an effect when it truly exists.

$\text{Power} = P(\text{detect effect} | \text{effect exists}) = 1 - \beta$

where:

$\alpha$ : Type I error rate (false positive) — declaring an effect when there is none
$\beta$ : Type II error rate (false negative) — missing an effect that is there
Power = $1 - \beta$

A power of 80% ($\beta = 0.20$) is the conventional target.

Intuitive Understanding

Power is the “ability to discover a real effect.”

When power is low, a real effect often goes undetected. For example, in an experiment with 50% power, a real effect is detected only half the time, which is no better than a coin flip.
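The coin-flip intuition can be checked by simulation. The following is a minimal sketch, not a reference implementation: it assumes a normally distributed outcome with known standard deviation and a two-sided z-test, and the function names `simulate_power` and `analytic_power` as well as the parameter values are illustrative choices. It uses only the standard library so that it runs on its own.

```python
import math
import random
from statistics import NormalDist, fmean

def simulate_power(true_diff, sigma, n_per_group, alpha=0.05, n_sims=2000, seed=0):
    """Fraction of simulated experiments in which a real effect is detected"""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * math.sqrt(2 / n_per_group)  # known-variance standard error
    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        treatment = [rng.gauss(true_diff, sigma) for _ in range(n_per_group)]
        z = (fmean(treatment) - fmean(control)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims

def analytic_power(true_diff, sigma, n_per_group, alpha=0.05):
    """Power of the same two-sided z-test under the normal model (both tails)"""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = true_diff / (sigma * math.sqrt(2 / n_per_group))
    return NormalDist().cdf(shift - z_crit) + NormalDist().cdf(-shift - z_crit)

# Illustrative setting chosen so that power is close to 50%
empirical = simulate_power(true_diff=0.4, sigma=1.0, n_per_group=50)
expected = analytic_power(true_diff=0.4, sigma=1.0, n_per_group=50)
print(f"Simulated power: {empirical:.3f}, analytic power: {expected:.3f}")
```

In this setting the effect is real in every simulated experiment, yet only about half of them reach significance.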

Key Properties

Factors Affecting Power

Factor	Effect on Power
Sample size ↑	Power ↑
Effect size ↑	Power ↑
Variance ↓	Power ↑
Significance level α ↑	Power ↑ (at the cost of a higher Type I error rate)
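The directions in the table can be illustrated numerically. This is a sketch under stated assumptions: a two-sample z-test with equal group sizes and a normal approximation that counts only the dominant rejection tail. The helper `power_two_sample` and the baseline values are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist

def power_two_sample(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test (dominant tail only)"""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * math.sqrt(2 / n_per_group)
    return NormalDist().cdf(abs(delta) / se - z_alpha)

# Hypothetical baseline scenario; each variant changes exactly one factor
base = dict(delta=5, sigma=50, n_per_group=1000, alpha=0.05)
scenarios = {
    "baseline": base,
    "sample size up": {**base, "n_per_group": 2000},
    "effect size up": {**base, "delta": 7},
    "variance down": {**base, "sigma": 40},
    "significance level up": {**base, "alpha": 0.10},
}

for name, kwargs in scenarios.items():
    print(f"{name:<22} power = {power_two_sample(**kwargs):.3f}")
```

Each variant should report higher power than the baseline, matching the table row by row.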

Sample Size Formulas

Binary outcome (conversion rate): $n = \frac{2 (z_{1-\alpha/2} + z_{1-\beta})^2 p(1-p)}{\delta^2}$

Continuous outcome: $n = \frac{2 (z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2}{\delta^2}$

where:

$n$ : required sample size per group
$z_{1-\alpha/2}$ : z-value for the significance level (0.05 → 1.96)
$z_{1-\beta}$ : z-value for the power (0.80 → 0.84)
$\delta$ : minimum effect size to detect (MDE), as an absolute difference between groups
$p$ : baseline rate
$\sigma$ : standard deviation of the outcome
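
As a worked check of the continuous formula, take the values used in the Python example further down ($\sigma = 50$, $\delta = 5$) together with the rounded z-values 1.96 and 0.84:

$n = \frac{2 (1.96 + 0.84)^2 \cdot 50^2}{5^2} = \frac{2 \cdot 7.84 \cdot 2500}{25} = 1568$

With unrounded z-values the result is about 1,570 per group. The binary formula works the same way. For $p = 0.05$ and an absolute difference $\delta = 0.005$ (a 10% relative lift):

$n = \frac{2 (1.96 + 0.84)^2 \cdot 0.05 \cdot 0.95}{0.005^2} = \frac{2 \cdot 7.84 \cdot 0.0475}{0.000025} = 29792$

This is somewhat lower than the roughly 31,000 reported by the Python implementation, which uses the variances of both arms rather than the baseline variance alone.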

MDE (Minimum Detectable Effect)

$MDE = (z_{1-\alpha/2} + z_{1-\beta}) \cdot \sqrt{\frac{2\sigma^2}{n}}$

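The smallest effect size detectable with $n$ observations per group at the chosen $\alpha$ and power. It is the continuous-outcome sample size formula solved for $\delta$.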
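
The MDE formula can be sketched in code as follows. The function name `mde_continuous` is an illustrative choice, and the example values ($50 standard deviation, 1,570 per group) are borrowed from the continuous sample size example in this note; the standard library's `NormalDist` stands in for `scipy.stats.norm` so the block runs on its own.

```python
import math
from statistics import NormalDist

def mde_continuous(std, n_per_group, alpha=0.05, power=0.80):
    """Smallest absolute difference detectable with n_per_group observations per arm"""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) * math.sqrt(2 * std**2 / n_per_group)

# Round trip: the sample size computed for a $5 MDE should give back roughly $5
print(f"MDE at n=1,570: {mde_continuous(std=50, n_per_group=1570):.2f}")

# The MDE shrinks with the square root of n: quadrupling n halves the MDE
for n in (500, 2000, 8000):
    print(f"n={n:>5,}: MDE = {mde_continuous(std=50, n_per_group=n):.2f}")
```

Because the MDE scales with $1/\sqrt{n}$, halving the detectable effect requires roughly four times the sample.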

Example

Python Implementation

from scipy import stats
import numpy as np

def sample_size_binary(baseline_rate, mde_relative, alpha=0.05, power=0.80):
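    """Per-group sample size for a binary outcome (two-sided two-proportion z-test).

    mde_relative is the relative lift over baseline_rate (0.10 = +10%).
    """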
    p1 = baseline_rate
    p2 = baseline_rate * (1 + mde_relative)
    p_pooled = (p1 + p2) / 2

    z_alpha = stats.norm.ppf(1 - alpha/2)
    z_beta = stats.norm.ppf(power)

    numerator = (z_alpha * np.sqrt(2 * p_pooled * (1 - p_pooled)) +
                 z_beta * np.sqrt(p1*(1-p1) + p2*(1-p2)))**2
    denominator = (p2 - p1)**2

    return int(np.ceil(numerator / denominator))

def sample_size_continuous(std, mde_absolute, alpha=0.05, power=0.80):
    """Per-group sample size for a continuous outcome (mde_absolute in outcome units)"""
    z_alpha = stats.norm.ppf(1 - alpha/2)
    z_beta = stats.norm.ppf(power)

    n = 2 * ((z_alpha + z_beta) * std / mde_absolute)**2
    return int(np.ceil(n))

# Example: 5% baseline conversion rate, detect a 10% relative increase
n_binary = sample_size_binary(baseline_rate=0.05, mde_relative=0.10)
print(f"Required sample per group (binary): {n_binary:,}")  # ~31,000

# Example: $100 mean revenue, $50 standard deviation, detect a $5 difference
n_continuous = sample_size_continuous(std=50, mde_absolute=5)
print(f"Required sample per group (continuous): {n_continuous:,}")  # ~1,570

Power Curve

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

def power_curve(baseline_rate, mde_range, sample_size, alpha=0.05):
    """Compute power across a range of effect sizes"""
    powers = []

    for mde in mde_range:
        p1 = baseline_rate
        p2 = baseline_rate * (1 + mde)

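        # Standard error of the difference in proportions (unpooled)
        se = np.sqrt(p1*(1-p1)/sample_size + p2*(1-p2)/sample_size)
        # Normal approximation; ignores the negligible opposite-tail rejection probability
        z = (p2 - p1) / se - stats.norm.ppf(1 - alpha/2)
        power = stats.norm.cdf(z)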
        powers.append(power)

    return powers

mde_range = np.linspace(0.01, 0.30, 50)
n = 10000

powers = power_curve(0.05, mde_range, n)

plt.figure(figsize=(8, 5))
plt.plot(mde_range * 100, powers)
plt.axhline(0.8, color='r', linestyle='--', label='80% power')
plt.xlabel('Relative MDE (%)')
plt.ylabel('Power')
plt.title(f'Power Curve (n={n:,} per group)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

Pricing Experiment Example

# Test the effect of a price change on conversion rate
# Baseline: 3% conversion rate
# Detection target: 5% relative change (3% → 3.15%)

n_required = sample_size_binary(
    baseline_rate=0.03,
    mde_relative=0.05,
    alpha=0.05,
    power=0.80
)
print(f"Required sample: {n_required:,} per group")

# With 5,000 visitors per day, split evenly across the two groups
days_required = int(np.ceil(n_required * 2 / 5000))  # round up to whole days
print(f"Experiment duration: about {days_required} days")
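
To see how strongly the duration depends on the detection target, the calculation can be repeated across several MDEs. This is a sketch under assumptions: `n_per_group_binary` is a hypothetical standard-library restatement of the formula in `sample_size_binary`, `days_needed` is an illustrative helper, and traffic is assumed constant at 5,000 visitors per day with an even split.

```python
import math
from statistics import NormalDist

def n_per_group_binary(baseline_rate, mde_relative, alpha=0.05, power=0.80):
    """Per-group sample size, same formula as sample_size_binary above"""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + mde_relative)
    p_pooled = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_pooled * (1 - p_pooled))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

def days_needed(n_per_group, daily_visitors, n_groups=2):
    """Whole days needed to fill all groups, assuming constant daily traffic"""
    return math.ceil(n_per_group * n_groups / daily_visitors)

# 3% baseline conversion rate, 5,000 visitors per day (assumed constant)
for mde in (0.05, 0.10, 0.20):
    n = n_per_group_binary(baseline_rate=0.03, mde_relative=mde)
    days = days_needed(n, daily_visitors=5000)
    print(f"Relative MDE {mde:.0%}: {n:,} per group, about {days} days")
```

Doubling the relative MDE cuts the required duration to roughly a quarter, which is often the deciding factor when a small price effect is not worth a multi-month test.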

Related

A-B Testing - the context in which power applies
CUPED - improving power via variance reduction
Design Effect - the impact of cluster randomization on power

References

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences.
Kohavi, R., Tang, D., & Xu, Y. (2020). Trustworthy Online Controlled Experiments.
Comprehensive Personalized Pricing Guide, Part V, §14
