A/B Testing · Tae Hyun Kim (Lowell)

Definition

A/B testing is the online application of the randomized controlled trial (RCT): users are randomly assigned to one of two or more variants, and the difference in outcomes between variants estimates the causal effect of the change.

Control (A): the existing version
Treatment (B): the new version

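$ATE = E[Y(1) - Y(0)]$, estimated by $\hat{\tau} = \bar{Y}_B - \bar{Y}_A$

The ATE is the estimand (a population quantity); the difference in sample means is its estimator. Because of random assignment, this simple difference in means is an unbiased estimator of the causal effect.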
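The unbiasedness claim can be spelled out in one line. As a sketch, write $T$ for the assignment indicator ($T=1$ for treatment B, $T=0$ for control A) and $Y$ for the observed outcome; both symbols are introduced here for illustration only.

$E[\bar{Y}_B - \bar{Y}_A] = E[Y \mid T=1] - E[Y \mid T=0] = E[Y(1) \mid T=1] - E[Y(0) \mid T=0] = E[Y(1)] - E[Y(0)] = ATE$

The second equality holds because each user's observed outcome equals the potential outcome for the arm they were assigned to (part of SUTVA). The third holds because randomization makes $T$ independent of $(Y(0), Y(1))$, so conditioning on $T$ does not change the expectation.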

Intuitive Understanding

A/B testing is the “gold standard for establishing causal relationships.”

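In observational data, the correlation between price and demand can be distorted by confounders. In a randomized experiment, treatment assignment is independent of every confounder by design, observed or unobserved, so confounders are balanced across groups in expectation and any remaining imbalance is due to chance.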
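The contrast can be illustrated with a small simulation. This is a minimal sketch under made-up assumptions: a single binary confounder (`loyal`), a discount as the treatment, a true effect of +0.05 on purchase probability, and all probabilities chosen purely for illustration.

```python
import random

def simulate(randomized, n=200_000, seed=0):
    """Return the naive difference in conversion rates (treated minus untreated)."""
    rng = random.Random(seed)
    counts = {0: [0, 0], 1: [0, 0]}  # treated flag -> [conversions, users]
    for _ in range(n):
        loyal = rng.random() < 0.5  # confounder: affects both treatment and outcome
        if randomized:
            p_treat = 0.5  # coin flip that ignores loyalty
        else:
            p_treat = 0.8 if loyal else 0.2  # loyal users receive the discount more often
        treated = 1 if rng.random() < p_treat else 0
        # True treatment effect is +0.05 for everyone; loyalty alone adds +0.20.
        p_buy = 0.10 + 0.05 * treated + 0.20 * loyal
        counts[treated][0] += rng.random() < p_buy
        counts[treated][1] += 1
    return counts[1][0] / counts[1][1] - counts[0][0] / counts[0][1]

observational = simulate(randomized=False)
experimental = simulate(randomized=True)
print(f"observational estimate: {observational:.3f}")
print(f"randomized estimate:    {experimental:.3f}")
```

Under these assumptions the observational comparison converges to about 0.17, because the treated group contains more loyal users, while the randomized comparison converges to the true effect of 0.05.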

Key Properties

Why experiment?

Approach	Assumptions	Risk
Observational study	Unconfoundedness, exclusion restriction	Bias when assumptions are violated
A/B testing	SUTVA only (randomization holds by design)	Ethical/cost constraints

Particularities of pricing experiments

Challenge	Description	Mitigation strategy
Ethical concerns	Charging different prices for the same product can be perceived as unfair	Region/time-based experiments
Interference	Customers share price information with each other	Cluster randomization
Long-term effects	Effects on brand and loyalty appear after the test window	Long-term tracking
Sample contamination	The same person appears on multiple devices/accounts	Deterministic assignment keyed on a stable user ID
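As an illustration of the cluster randomization row, the sketch below assigns whole regions rather than individual users, so customers who are likely to compare prices see the same one. The function name `assign_cluster`, the `region` field, and the sample users are hypothetical and exist only for this example.

```python
import hashlib

def assign_cluster(cluster_id, experiment_name, treatment_prob=0.5):
    """Assign a whole cluster (for example a region) to one arm via a stable hash."""
    key = f"{cluster_id}_{experiment_name}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    return "treatment" if bucket < treatment_prob * 100 else "control"

# Hypothetical users; only the region determines the arm.
users = [
    {"user_id": "u1", "region": "seoul"},
    {"user_id": "u2", "region": "seoul"},
    {"user_id": "u3", "region": "busan"},
    {"user_id": "u4", "region": "busan"},
    {"user_id": "u5", "region": "daegu"},
]
for user in users:
    user["group"] = assign_cluster(user["region"], "price_test_v1")

# Collect the set of arms seen within each region (should be exactly one each).
groups_by_region = {}
for user in users:
    groups_by_region.setdefault(user["region"], set()).add(user["group"])
print(groups_by_region)
```

Because the unit of randomization is now the region, the effective sample size is closer to the number of regions than to the number of users, which is what the design effect accounts for.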

Deterministic random assignment

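import hashlib

def randomize(user_id, experiment_name, treatment_prob=0.5):
    """Hash-based deterministic assignment."""
    # Built-in hash() is salted per process for strings, so use a stable digest instead.
    digest = hashlib.sha256(f"{user_id}_{experiment_name}".encode()).hexdigest()
    hash_value = int(digest, 16) % 100
    return 'treatment' if hash_value < treatment_prob * 100 else 'control'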

The same user is always assigned to the same group, providing a consistent experience.
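Two properties are worth checking for any such assignment function: repeated calls agree, and the split is close to the target probability. The sketch below uses SHA-256 as the stable hash, which is an assumption of this example; Python's built-in `hash()` is salted per process for strings by default, so it would not give the same answer across restarts. The names `stable_bucket` and `assign` are hypothetical.

```python
import hashlib

def stable_bucket(user_id, experiment_name, buckets=100):
    """Map (user, experiment) to a bucket that is identical across processes."""
    key = f"{user_id}_{experiment_name}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % buckets

def assign(user_id, experiment_name, treatment_prob=0.5):
    bucket = stable_bucket(user_id, experiment_name)
    return "treatment" if bucket < treatment_prob * 100 else "control"

user_ids = [f"user_{i}" for i in range(10_000)]
first = [assign(u, "price_test_v1") for u in user_ids]
second = [assign(u, "price_test_v1") for u in user_ids]

# Stability: the same inputs always give the same group.
print("stable:", first == second)

# Balance: the treatment share should be close to treatment_prob.
treatment_share = first.count("treatment") / len(first)
print(f"treatment share: {treatment_share:.3f}")
```

Including the experiment name in the hashed key means a user's group in one experiment does not determine their group in another.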

Example

Price A/B test

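import numpy as np
from scipy import stats

class PricingExperiment:
    def __init__(self, control_price, treatment_price, experiment_name='pricing_test'):
        self.control_price = control_price
        self.treatment_price = treatment_price
        self.experiment_name = experiment_name  # illustrative default name
        self.results = {'control': [], 'treatment': []}

    def get_price(self, user_id):
        # Reuse randomize() from above so a user always sees the same price
        group = randomize(user_id, self.experiment_name)
        return self.treatment_price if group == 'treatment' else self.control_price

    def record(self, user_id, outcome):
        # Store the observed outcome (e.g. 1 = purchased, 0 = not) under the user's group
        self.results[randomize(user_id, self.experiment_name)].append(outcome)

    def analyze(self):
        control = np.array(self.results['control'])
        treatment = np.array(self.results['treatment'])

        # Two-sample t-test (equal variances assumed by default)
        t_stat, p_value = stats.ttest_ind(treatment, control)
        effect = treatment.mean() - control.mean()

        return {'effect': effect, 'p_value': p_value}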

Analyzing results

Conversion rate difference: $\hat{\tau} = \bar{Y}_B - \bar{Y}_A$
Statistical significance: p-value below the pre-specified significance level (conventionally 0.05)
Practical significance: Is the effect size meaningful for the business?
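The three checks above can be sketched for conversion data with a two-proportion z-test. Everything specific here is an assumption for illustration: the function name `analyze_conversion`, the counts, the 0.01 minimum meaningful effect, and the convention that an effect is practically significant only if the whole confidence interval lies at or above that minimum.

```python
from math import sqrt
from statistics import NormalDist

def analyze_conversion(conv_a, n_a, conv_b, n_b, alpha=0.05, min_effect=0.01):
    """Two-proportion z-test plus a confidence interval for the difference B - A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    tau_hat = p_b - p_a

    # Pooled standard error for the test of "no difference".
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = tau_hat / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the confidence interval.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (tau_hat - z_crit * se, tau_hat + z_crit * se)

    return {
        "tau_hat": tau_hat,
        "p_value": p_value,
        "ci": ci,
        "statistically_significant": p_value < alpha,
        "practically_significant": ci[0] >= min_effect,
    }

# Hypothetical counts: 10.0% vs 11.0% conversion with 10,000 users per arm.
result = analyze_conversion(conv_a=1000, n_a=10_000, conv_b=1100, n_b=10_000)
print(result)
```

With these made-up counts the difference is statistically significant, yet the interval still includes effects smaller than the chosen minimum, which shows why the two kinds of significance are evaluated separately.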

Related Concepts

Statistical Power - Sample size determination
CUPED - Variance reduction technique
Design Effect - Impact of cluster randomization
ATE - Estimand

References

Kohavi, R., Tang, D., & Xu, Y. (2020). Trustworthy Online Controlled Experiments.
Comprehensive Personalized Pricing Guide, Part V, §13
