CUPED
Definition
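CUPED (Controlled-experiment Using Pre-Experiment Data) is a technique that uses pre-experiment data to reduce the variance of A/B test estimates. Each unit's outcome is replaced by an adjusted outcome:
$$Y_{\text{cuped}} = Y - \theta \, (X - \bar{X})$$
where:
- $Y$: the outcome observed during the experiment
- $X$: pre-experiment data (e.g., behavior during the 2 weeks before the experiment), with sample mean $\bar{X}$
- $\theta$: the adjustment coefficient, $\theta = \operatorname{Cov}(Y, X) / \operatorname{Var}(X)$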
CUPED was proposed by Deng et al. (2013) at Microsoft Research.
Intuitive Understanding
Each customer has a different baseline propensity to purchase. Some customers naturally buy a lot, while others buy little.
CUPED uses “how much this customer tends to purchase in the first place” to reduce the noise in the experiment results. By removing the variation that is predictable from pre-experiment behavior, the treatment effect can be estimated more precisely.
Key Properties
Variance Reduction
$$\operatorname{Var}(Y_{\text{cuped}}) = \operatorname{Var}(Y)\,(1 - \rho^2)$$
- $\rho$: the correlation coefficient between $Y$ and $X$
- The higher the correlation, the greater the variance reduction
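The link between correlation and variance reduction can be derived in a few lines. The following is a sketch, treating $\theta$ as a fixed constant, ignoring the sampling error in the covariate mean, and writing $\rho$ for the correlation between the outcome $Y$ and the pre-experiment covariate $X$:

$$
\begin{aligned}
\operatorname{Var}(Y - \theta X) &= \operatorname{Var}(Y) - 2\theta \operatorname{Cov}(Y, X) + \theta^2 \operatorname{Var}(X) \\
\frac{d}{d\theta}\operatorname{Var}(Y - \theta X) = 0 &\;\Longrightarrow\; \theta^{*} = \frac{\operatorname{Cov}(Y, X)}{\operatorname{Var}(X)} \\
\operatorname{Var}(Y - \theta^{*} X) &= \operatorname{Var}(Y) - \frac{\operatorname{Cov}(Y, X)^2}{\operatorname{Var}(X)} = \operatorname{Var}(Y)\,(1 - \rho^2)
\end{aligned}
$$

The minimizing coefficient $\theta^{*}$ is the slope of a simple regression of $Y$ on $X$, which is why the adjustment removes exactly the share $\rho^2$ of the variance that the covariate can predict.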
Unbiasedness Preserved
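Under random assignment, $X$ has the same distribution in both groups, so for any fixed $\theta$ the adjustment term cancels in expectation:
$$E[Y_{\text{cuped}} \mid T = 1] - E[Y_{\text{cuped}} \mid T = 0] = E[Y \mid T = 1] - E[Y \mid T = 0]$$
The CUPED-adjusted difference in means therefore remains an unbiased estimator of the treatment effect.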
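Unbiasedness can also be checked empirically. The sketch below is a Monte Carlo check on simulated data, not part of the original method description: the function name `simulate_estimates`, the data-generating process, and all parameter values are assumptions chosen for illustration. It repeats a randomized experiment many times and compares the average raw and CUPED estimates against a known true effect.

```python
import numpy as np


def simulate_estimates(n_reps=2000, n=1000, true_effect=2.0, seed=0):
    """Return arrays of raw and CUPED effect estimates over repeated experiments."""
    rng = np.random.default_rng(seed)
    raw = np.empty(n_reps)
    cuped = np.empty(n_reps)
    for r in range(n_reps):
        baseline = rng.normal(50, 10, n)        # unobserved per-unit propensity
        X = baseline + rng.normal(0, 5, n)      # pre-experiment covariate
        T = rng.integers(0, 2, n)               # random 0/1 assignment
        Y = baseline + true_effect * T + rng.normal(0, 5, n)
        # Pooled theta, with matching ddof in covariance and variance
        theta = np.cov(Y, X)[0, 1] / np.var(X, ddof=1)
        Y_adj = Y - theta * (X - X.mean())
        raw[r] = Y[T == 1].mean() - Y[T == 0].mean()
        cuped[r] = Y_adj[T == 1].mean() - Y_adj[T == 0].mean()
    return raw, cuped


raw, cuped = simulate_estimates()
print(f"mean raw estimate:   {raw.mean():.3f}")
print(f"mean CUPED estimate: {cuped.mean():.3f}")
print(f"variance ratio (CUPED / raw): {cuped.var() / raw.var():.3f}")
```

Under these assumptions both averages should land close to the true effect of 2, while the spread of the CUPED estimates across repetitions should be visibly smaller than that of the raw estimates.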
Efficiency Gain
When variance is reduced, the same Statistical Power is achieved with fewer samples:
$$n_{\text{eff}} = \frac{n}{1 - \rho^2}$$
Example: if $\rho = 0.5$, then $1 / (1 - 0.25) \approx 1.33$, so the effective sample size increases by 33%.
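The efficiency gain can be tabulated for a few correlation values. This is a minimal sketch that assumes the standard CUPED relationship, in which the adjusted variance is the raw variance multiplied by one minus the squared correlation; the helper names `effective_sample_multiplier` and `required_sample_fraction` are hypothetical.

```python
def effective_sample_multiplier(rho):
    """Factor by which the effective sample size grows: 1 / (1 - rho^2)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return 1.0 / (1.0 - rho ** 2)


def required_sample_fraction(rho):
    """Fraction of the original sample needed to keep the same variance."""
    return 1.0 - rho ** 2


for rho in (0.3, 0.5, 0.7):
    print(
        f"rho={rho}: x{effective_sample_multiplier(rho):.2f} effective sample, "
        f"{required_sample_fraction(rho):.0%} of the original n needed"
    )
```

Because the gain depends on the squared correlation, it grows slowly for weak covariates and quickly for strong ones, which is why the choice of pre-experiment metric matters.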
Example
Python Implementation
import numpy as np

class CUPEDEstimator:
    def __init__(self, pre_period_days=14):
        # Length of the pre-experiment window in days (kept for reference; not used in fit)
        self.pre_period_days = pre_period_days

    def fit(self, Y, X, treatment):
        """
        Y: outcome during the experiment
        X: pre-experiment data (covariate)
        treatment: treatment indicator (0/1)
        """
        Y = np.asarray(Y, dtype=float)
        X = np.asarray(X, dtype=float)
        treated = np.asarray(treatment) == 1
        control = np.asarray(treatment) == 0
        # Compute theta (full data); ddof=1 matches np.cov's default normalization
        self.theta = np.cov(Y, X)[0, 1] / np.var(X, ddof=1)
        self.X_mean = np.mean(X)
        # CUPED-adjusted outcome
        Y_cuped = Y - self.theta * (X - self.X_mean)
        # Estimate treatment effect
        Y_cuped_treatment = Y_cuped[treated]
        Y_cuped_control = Y_cuped[control]
        self.effect = np.mean(Y_cuped_treatment) - np.mean(Y_cuped_control)
        # Standard error of the difference in means (sample variances, ddof=1)
        self.effect_se = np.sqrt(
            np.var(Y_cuped_treatment, ddof=1) / len(Y_cuped_treatment) +
            np.var(Y_cuped_control, ddof=1) / len(Y_cuped_control)
        )
        # Comparison: before adjustment
        self.effect_raw = np.mean(Y[treated]) - np.mean(Y[control])
        self.effect_raw_se = np.sqrt(
            np.var(Y[treated], ddof=1) / treated.sum() +
            np.var(Y[control], ddof=1) / control.sum()
        )
        # Variance reduction rate
        self.variance_reduction = 1 - (self.effect_se / self.effect_raw_se)**2
        return self

    def summary(self):
        return {
            'effect_cuped': self.effect,
            'se_cuped': self.effect_se,
            'effect_raw': self.effect_raw,
            'se_raw': self.effect_raw_se,
            'variance_reduction': self.variance_reduction,
            'theta': self.theta
        }
Usage Example
# Simulated data
np.random.seed(42)
n = 10000
# Per-individual baseline propensity (unobserved)
baseline = np.random.randn(n) * 10 + 50
# Pre-experiment data (revenue during the 2 weeks before the experiment)
X_pre = baseline + np.random.randn(n) * 5
# Treatment assignment
treatment = np.random.binomial(1, 0.5, n)
# Outcome during the experiment (treatment effect = 2)
true_effect = 2
Y = baseline + true_effect * treatment + np.random.randn(n) * 5
# Apply CUPED
cuped = CUPEDEstimator()
cuped.fit(Y, X_pre, treatment)
results = cuped.summary()
print(f"True effect: {true_effect}")
print(f"Raw estimate: {results['effect_raw']:.3f} ± {results['se_raw']:.3f}")
print(f"CUPED estimate: {results['effect_cuped']:.3f} ± {results['se_cuped']:.3f}")
print(f"Variance reduction: {results['variance_reduction']:.1%}")
Extension to Multiple Covariates
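import numpy as np
from sklearn.linear_model import LinearRegression

def cuped_multiple_covariates(Y, X_covariates, treatment):
    """
    CUPED using multiple pre-experiment variables

    Y: outcome during the experiment, shape (n,)
    X_covariates: pre-experiment covariates, shape (n, k)
    treatment: treatment indicator (0/1), shape (n,)
    """
    Y = np.asarray(Y, dtype=float)
    treatment = np.asarray(treatment)
    # Estimate theta with a linear model (fitted on the pooled data)
    model = LinearRegression()
    model.fit(X_covariates, Y)
    # Subtract the predicted values, re-centering so the overall mean is unchanged
    Y_pred = model.predict(X_covariates)
    Y_cuped = Y - Y_pred + np.mean(Y_pred)
    # Treatment effect
    effect = np.mean(Y_cuped[treatment == 1]) - np.mean(Y_cuped[treatment == 0])
    return effect, Y_cuped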
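To see the multi-covariate adjustment on data, the sketch below runs the same idea on a simulated experiment with two pre-experiment covariates. To stay self-contained it swaps scikit-learn's `LinearRegression` for an ordinary least-squares fit with `np.linalg.lstsq`; the helper name `cuped_lstsq`, the two covariates, and the noise levels are assumptions made for illustration.

```python
import numpy as np


def cuped_lstsq(Y, X_covariates, treatment):
    """Multi-covariate CUPED via NumPy least squares (hypothetical helper)."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X_covariates, dtype=float)
    treatment = np.asarray(treatment)
    # Design matrix with an intercept column, fitted on the pooled data
    design = np.column_stack([np.ones(len(Y)), X])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    Y_pred = design @ coef
    # Remove the predictable part, re-centering to keep the overall mean
    Y_cuped = Y - Y_pred + Y_pred.mean()
    effect = Y_cuped[treatment == 1].mean() - Y_cuped[treatment == 0].mean()
    return effect, Y_cuped


rng = np.random.default_rng(0)
n = 10_000
baseline = rng.normal(50, 10, n)
# Two hypothetical pre-experiment covariates, each a noisy view of the baseline
X_cov = np.column_stack([
    baseline + rng.normal(0, 5, n),
    baseline + rng.normal(0, 8, n),
])
treatment = rng.integers(0, 2, n)
Y = baseline + 2.0 * treatment + rng.normal(0, 5, n)  # true effect = 2

effect, Y_cuped = cuped_lstsq(Y, X_cov, treatment)
raw_var = Y[treatment == 1].var(ddof=1)
adj_var = Y_cuped[treatment == 1].var(ddof=1)
print(f"estimated effect: {effect:.3f}")
print(f"within-group variance ratio (adjusted / raw): {adj_var / raw_var:.3f}")
```

The treatment indicator is deliberately left out of the regression, so the fitted coefficients play the role of a pooled $\theta$ vector and the treatment effect stays in the residuals.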
Related Concepts
- Statistical Power - what CUPED improves
- A-B Testing - the context in which CUPED is applied
- Design Effect - another effective-sample adjustment
References
- Deng, A., Xu, Y., Kohavi, R., & Walker, T. (2013). “Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data.”
- Comprehensive Personalized Pricing Guide, Part V, §14.3