Tae Hyun Kim (Lowell)

ATT (Average Treatment Effect on the Treated)

#causal-inference #potential-outcomes

Definition

The ATT is the average treatment effect among the units that actually received treatment:

$$\text{ATT} = E[Y(1) - Y(0) \mid W=1]$$

By linearity of expectation, this decomposes as:

$$\text{ATT} = E[Y(1) \mid W=1] - E[Y(0) \mid W=1]$$
  • First term: the mean observed outcome of the treated group, since $Y = Y(1)$ when $W=1$ (directly estimable)
  • Second term: the mean counterfactual outcome the treated group would have had without treatment (never observed, so it must be estimated)
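To make the decomposition concrete, here is a minimal Python sketch on a made-up table in which both potential outcomes are visible for every unit. That is never the case with real data; the `units` list and all of its numbers are illustrative assumptions, not taken from any study.

```python
# Toy data (hypothetical): each unit is (W, Y0, Y1), with BOTH potential
# outcomes shown. Real data reveal only Y1 for treated and Y0 for controls.
units = [
    (1, 2.0, 5.0), (1, 3.0, 7.0), (1, 4.0, 6.0), (1, 1.0, 4.0),  # treated
    (0, 1.0, 2.0), (0, 2.0, 2.0), (0, 3.0, 4.0), (0, 2.0, 4.0),  # controls
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


treated = [u for u in units if u[0] == 1]

# Definition: average individual effect among the treated.
att = mean(y1 - y0 for _, y0, y1 in treated)

# Decomposition: observed mean minus counterfactual mean.
observed_term = mean(y1 for _, _, y1 in treated)        # E[Y(1) | W=1]
counterfactual_term = mean(y0 for _, y0, _ in treated)  # E[Y(0) | W=1]

print(att, observed_term - counterfactual_term)  # prints: 3.0 3.0
```

In practice only `observed_term` can be computed from data; every ATT estimator is a strategy for approximating `counterfactual_term`.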

Intuitive Understanding

ATE vs ATT

| Estimand | Question |
| --- | --- |
| ATE | "What is the average effect if treatment is applied to the entire population?" |
| ATT | "Was the treatment effective for those who received it?" |

When do we use ATT?

  1. Policy evaluation: the effect on participants in an existing program
  2. Cost-benefit analysis: the effect on those actually treated
  3. Self-selection settings: when the effect on treatment-takers is of interest

Relationship Between ATE and ATT

Mathematical Relationship

$$\text{ATE} = P(W=1) \cdot \text{ATT} + P(W=0) \cdot \text{ATC}$$

where ATC is the average treatment effect on the controls:

$$\text{ATC} = E[Y(1) - Y(0) \mid W=0]$$
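The weighted-average identity can be checked numerically. The sketch below uses hypothetical individual effects with unequal group sizes, so the weights $P(W=1)$ and $P(W=0)$ actually matter; the numbers are assumptions chosen for easy arithmetic.

```python
# Hypothetical individual effects Y(1) - Y(0), split by treatment status.
effects_treated = [3.0, 5.0]                      # W = 1
effects_control = [1.0, 0.0, 1.0, 2.0, 1.0, 1.0]  # W = 0

n1, n0 = len(effects_treated), len(effects_control)
n = n1 + n0

att = sum(effects_treated) / n1
atc = sum(effects_control) / n0
ate = (sum(effects_treated) + sum(effects_control)) / n

p_treated = n1 / n  # P(W=1)
weighted = p_treated * att + (1 - p_treated) * atc

print(att, atc, ate, weighted)  # prints: 4.0 1.0 1.75 1.75
```

Because only a quarter of the units are treated here, the ATE sits much closer to the ATC than to the ATT.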

When does ATE = ATT?

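A sufficient condition is homogeneous treatment effects:

$$Y(1) - Y(0) = \tau \quad \text{(constant)}$$

In this case $\text{ATE} = \text{ATT} = \text{ATC} = \tau$. The two also coincide under randomized assignment, where $W$ is independent of $(Y(0), Y(1))$, because conditioning on $W=1$ then does not change the expectation.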

When ATE ≠ ATT

Heterogeneous effects + self-selection:

  • Those expected to benefit most select into treatment
  • $\text{ATT} > \text{ATC}$
  • $\text{ATT} > \text{ATE}$, since ATE is a weighted average of ATT and ATC

Example: a job training program

  • Highly motivated individuals participate
  • The effect is also larger for them
  • → ATT is larger than ATE
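The job training story can be mimicked with a small simulation. This is only a sketch under assumed ingredients: individual effects drawn from a normal distribution, and a logistic participation rule in which larger gains make participation more likely. Neither choice comes from real program data.

```python
import math
import random

random.seed(0)

n = 20000
effects, treated_flags = [], []
for _ in range(n):
    tau = random.gauss(1.0, 1.0)  # individual effect Y(1) - Y(0)
    # Assumed selection rule: larger gains make participation more likely.
    p_join = 1.0 / (1.0 + math.exp(-(tau - 1.0)))
    w = 1 if random.random() < p_join else 0
    effects.append(tau)
    treated_flags.append(w)


def mean(values):
    return sum(values) / len(values)


ate = mean(effects)
att = mean([t for t, w in zip(effects, treated_flags) if w == 1])
atc = mean([t for t, w in zip(effects, treated_flags) if w == 0])

print(f"ATT={att:.2f}  ATE={ate:.2f}  ATC={atc:.2f}")
```

With selection on gains, the printed values should be ordered ATT > ATE > ATC; making `p_join` a constant removes the selection and brings the three together up to sampling noise.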

Identification

Under Strong Ignorability

$$\text{ATT} = E_X\left[E[Y \mid W=1, X] - E[Y \mid W=0, X] \mid W=1\right]$$
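With a discrete covariate, the identification formula can be evaluated directly by stratification: compute the treated-minus-control difference within each covariate cell, then average those differences over the covariate distribution of the treated. The sketch below assumes a single binary $X$ and a hypothetical dataset.

```python
from collections import defaultdict

# Hypothetical observed data: (X, W, Y) with a binary covariate X.
data = [
    (0, 1, 4.0), (0, 1, 6.0),
    (0, 0, 2.0), (0, 0, 2.0), (0, 0, 3.0), (0, 0, 1.0),
    (1, 1, 9.0), (1, 1, 10.0), (1, 1, 11.0), (1, 1, 10.0),
    (1, 0, 5.0), (1, 0, 3.0),
]

# Collect outcomes per (X, W) cell, then take cell means: E[Y | W=w, X=x].
cells = defaultdict(list)
for x, w, y in data:
    cells[(x, w)].append(y)
cell_mean = {key: sum(ys) / len(ys) for key, ys in cells.items()}

# Inner difference E[Y | W=1, X=x] - E[Y | W=0, X=x] for each x.
diff = {x: cell_mean[(x, 1)] - cell_mean[(x, 0)] for x in (0, 1)}

# Outer expectation over the covariate distribution of the TREATED units.
treated_x = [x for x, w, _ in data if w == 1]
att = sum(diff[x] for x in treated_x) / len(treated_x)

# For contrast: averaging over ALL units' X targets the ATE instead.
all_x = [x for x, _, _ in data]
ate = sum(diff[x] for x in all_x) / len(all_x)

print(att, ate)  # prints: 5.0 4.5
```

The only difference between the two estimates is the outer weighting: treated units are concentrated in the $X=1$ cell, where the effect is larger, so the ATT exceeds the ATE.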

IPW-ATT

$$\hat{\text{ATT}}_{IPW} = \frac{1}{n_1} \sum_{i: W_i=1} Y_i - \frac{1}{n_1} \sum_{i: W_i=0} \frac{e(X_i)}{1-e(X_i)} Y_i$$

where $n_1 = \sum_i W_i$ is the number of treated units and $e(X_i) = P(W_i=1 \mid X_i)$ is the propensity score.
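A short derivation shows why the odds weights $e(X)/(1-e(X))$ recover the missing counterfactual mean. It is a sketch that assumes consistency ($Y = Y(0)$ when $W=0$), unconfoundedness ($Y(0)$ independent of $W$ given $X$), and $e(X) < 1$:

```latex
\begin{aligned}
E\!\left[(1-W)\,\frac{e(X)}{1-e(X)}\,Y\right]
 &= E\!\left[\frac{e(X)}{1-e(X)}\,E[(1-W)\,Y(0) \mid X]\right] \\
 &= E\!\left[\frac{e(X)}{1-e(X)}\,(1-e(X))\,E[Y(0) \mid X]\right] \\
 &= E\big[e(X)\,E[Y(0) \mid X]\big] \\
 &= E\big[E[W\,Y(0) \mid X]\big] = E[W\,Y(0)] \\
 &= P(W=1)\,E[Y(0) \mid W=1].
\end{aligned}
```

Dividing by $P(W=1)$, which is estimated by $n_1/n$, turns the sample average over all $n$ units into the $\frac{1}{n_1}$-scaled sum over controls in the second term of the estimator.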

Matching for ATT

Match a comparable control to each treated individual:

  1. For treated unit $i$, find a comparable control unit $j$
  2. $\hat{\tau}_i = Y_i - Y_j$
  3. $\hat{\text{ATT}} = \frac{1}{n_1} \sum_{i: W_i=1} \hat{\tau}_i$
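The three steps can be sketched in a few lines of Python. This version assumes one scalar covariate, absolute distance as the notion of "comparable", and matching with replacement; the data and the helper `nearest_control` are hypothetical.

```python
# Hypothetical data: (X, Y) pairs with one scalar covariate.
treated = [(1.0, 5.0), (2.0, 7.0), (3.0, 9.0)]
controls = [(0.9, 3.0), (2.2, 4.0), (3.5, 5.0), (5.0, 6.0)]


def nearest_control(x_treated):
    """Return the control whose covariate is closest (with replacement)."""
    return min(controls, key=lambda c: abs(c[0] - x_treated))


unit_effects = []
for x_i, y_i in treated:
    x_j, y_j = nearest_control(x_i)  # step 1: find match j for unit i
    unit_effects.append(y_i - y_j)   # step 2: tau_i = Y_i - Y_j

att = sum(unit_effects) / len(treated)  # step 3: average over the treated
print(unit_effects, att)  # prints: [2.0, 3.0, 4.0] 3.0
```

The control at $X = 5.0$ is never used: matching for the ATT discards controls that resemble no treated unit.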

ATT Estimation Methods

1. IPW for ATT

$$\hat{\text{ATT}} = \frac{\sum_{i} W_i Y_i}{\sum_i W_i} - \frac{\sum_i (1-W_i) \frac{e(X_i)}{1-e(X_i)} Y_i}{\sum_i (1-W_i) \frac{e(X_i)}{1-e(X_i)}}$$
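This form divides the weighted control sum by the total of the control weights rather than by $n_1$, so the weights are normalized to sum to one. The sketch below computes it on hypothetical records and, for comparison, also the variant scaled by $n_1$; the propensity scores are assumed known here, whereas in practice they are estimated.

```python
# Hypothetical records: (W, e, Y), where e is an ASSUMED known propensity
# score e(X) = P(W=1 | X); in practice it is estimated.
records = [
    (1, 0.8, 10.0), (1, 0.5, 6.0),
    (0, 0.5, 4.0), (0, 0.2, 2.0), (0, 0.2, 3.0),
]

n1 = sum(w for w, _, _ in records)
treated_mean = sum(y for w, _, y in records if w == 1) / n1

# Odds weights e/(1-e) reweight controls to resemble the treated group.
weights = [(e / (1.0 - e), y) for w, e, y in records if w == 0]
weighted_sum = sum(wt * y for wt, y in weights)
weight_total = sum(wt for wt, _ in weights)

att_normalized = treated_mean - weighted_sum / weight_total
att_unnormalized = treated_mean - weighted_sum / n1

print(round(att_normalized, 3), round(att_unnormalized, 3))  # prints: 4.5 5.375
```

The two versions differ whenever the control weights do not sum to $n_1$, as in this small sample. The normalized version keeps the reweighted control mean inside the range of observed control outcomes.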

2. Matching

Propensity Score Matching, Nearest Neighbor Matching, etc.

3. Doubly Robust for ATT

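$$\hat{\text{ATT}}_{DR} = \frac{1}{n_1}\sum_{i=1}^{n} \left[W_i \left(Y_i - \hat{\mu}_0(X_i)\right) - (1-W_i)\,\frac{e(X_i)}{1-e(X_i)} \left(Y_i - \hat{\mu}_0(X_i)\right)\right]$$

where $\hat{\mu}_0(x)$ is an estimate of $E[Y \mid W=0, X=x]$ and the sum runs over all $n$ units: treated units contribute the first term and controls the second.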
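The double robustness property can be illustrated on a toy dataset: the estimate stays on target if either the outcome model $\hat{\mu}_0$ or the propensity score $e$ is right, and drifts only when both are wrong. The sketch below uses the residual form $W_i(Y_i - \hat{\mu}_0(X_i)) - (1-W_i)\frac{e(X_i)}{1-e(X_i)}(Y_i - \hat{\mu}_0(X_i))$ scaled by $1/n_1$. The data, the function `dr_att`, and the "good" and "bad" nuisance functions are all hypothetical, and "good" here only means fitted by cell averages on this sample.

```python
# Hypothetical observed data: (X, W, Y) with a binary covariate X.
data = [
    (0, 1, 4.0), (0, 1, 6.0),
    (0, 0, 2.0), (0, 0, 2.0), (0, 0, 3.0), (0, 0, 1.0),
    (1, 1, 9.0), (1, 1, 10.0), (1, 1, 11.0), (1, 1, 10.0),
    (1, 0, 5.0), (1, 0, 3.0),
]


def dr_att(data, mu0, e):
    """Doubly robust ATT given an outcome model mu0(x) and propensity e(x)."""
    n1 = sum(w for _, w, _ in data)
    total = 0.0
    for x, w, y in data:
        residual = y - mu0(x)
        if w == 1:
            total += residual                        # treated: Y - mu0(X)
        else:
            total -= e(x) / (1.0 - e(x)) * residual  # control: odds-weighted
    return total / n1


def mu0_good(x):
    return {0: 2.0, 1: 4.0}[x]      # mean control outcome within each X cell


def e_good(x):
    return {0: 2 / 6, 1: 4 / 6}[x]  # share of treated units within each X cell


def mu0_bad(x):
    return 0.0                      # deliberately misspecified outcome model


def e_bad(x):
    return 0.5                      # deliberately misspecified propensity


print(round(dr_att(data, mu0_good, e_good), 3))  # both good: 5.0
print(round(dr_att(data, mu0_bad, e_good), 3))   # only propensity good: 5.0
print(round(dr_att(data, mu0_good, e_bad), 3))   # only outcome model good: 5.0
print(round(dr_att(data, mu0_bad, e_bad), 3))    # both bad: 5.667
```

With `mu0_bad` the estimator collapses to plain IPW, and with `e_bad` it relies on the outcome model alone, which is why either one being right is enough in this example.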

Comparison with ATC

| Estimand | Definition | Interpretation |
| --- | --- | --- |
| ATT | $E[Y(1)-Y(0) \mid W=1]$ | Effect on those who received treatment |
| ATC | $E[Y(1)-Y(0) \mid W=0]$ | (Hypothetical) effect on those who did not receive treatment |

Why ATT ≠ ATC?

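ATT and ATC differ when two conditions hold together:

  • Heterogeneity: the treatment effect varies with unit characteristics
  • Selection: those same characteristics influence who receives treatment, so treatment selection is related to the effect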


References

  • Yao, L., et al. (2021). A survey on causal inference (Section 2.2)
  • Imbens, G. W. (2004). Nonparametric estimation of average treatment effects under exogeneity
  • Heckman, J. J., et al. (1997). Matching as an econometric evaluation estimator
