Tae Hyun Kim (Lowell)

BART (Bayesian Additive Regression Trees)

#causal-inference #tree-based #bart

Definition

A Bayesian ensemble method that models the outcome as a sum of many small regression trees, each kept shallow by a regularizing prior:

$$Y = \sum_{k=1}^{K} g_k(X, W; T_k, M_k) + \epsilon, \quad \epsilon \sim N(0, \sigma^2)$$

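where:

  • $X$: covariates; $W$: treatment indicator
  • $g_k$: the $k$-th tree
  • $T_k$: structure of tree $k$ (split variables and cutpoints)
  • $M_k$: leaf values of tree $k$
  • $K$: number of trees (commonly 200)
  • A prior on each $(T_k, M_k)$ keeps every tree a weak learner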
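To make the sum-of-trees form concrete, here is a minimal sketch in base R, assuming a deliberately simplified representation in which every tree is a depth-1 stump. The helpers `make_stump`, `predict_stump`, and `predict_sum_of_trees` are hypothetical names for illustration, not part of any BART package, and nothing here is fitted.

```r
# Hypothetical toy representation: each tree g_k is a depth-1 stump
# (one split rule T_k and two leaf values M_k). Not a BART fitter.
make_stump <- function(var, cut, left, right) {
  list(var = var, cut = cut, leaves = c(left, right))
}

# g_k(x): left leaf value if x[var] <= cut, otherwise right leaf value
predict_stump <- function(stump, Z) {
  ifelse(Z[, stump$var] <= stump$cut, stump$leaves[1], stump$leaves[2])
}

# Sum-of-trees fit: sum_k g_k(X, W; T_k, M_k)
predict_sum_of_trees <- function(trees, Z) {
  Reduce(`+`, lapply(trees, predict_stump, Z = Z))
}

# Toy design matrix with one covariate x1 and the treatment W
Z <- cbind(x1 = c(0.2, 0.8, 0.5), W = c(0, 1, 1))
trees <- list(
  make_stump("x1", 0.5, left = -0.1, right = 0.1),
  make_stump("W",  0.5, left =  0.0, right = 0.3)
)
f_hat <- predict_sum_of_trees(trees, Z)   # one fitted value per row

# Outcome draw from the model: Y = f(X, W) + eps, eps ~ N(0, sigma^2)
set.seed(1)
sigma <- 0.05
Y <- f_hat + rnorm(nrow(Z), mean = 0, sd = sigma)
```

Each tree contributes a small step, and the prediction is simply their sum; real BART trees can be deeper and are learned by MCMC.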

Application to Causal Inference

Potential Outcome Modeling

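$$\mu(X, W) = E[Y \mid X, W] = \sum_k g_k(X, W)$$

Under unconfoundedness (no unmeasured confounders given $X$) and overlap, $\mu(X, 1)$ and $\mu(X, 0)$ identify the conditional means of the potential outcomes, $E[Y(1) \mid X]$ and $E[Y(0) \mid X]$.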

CATE Estimation

$$\hat{\tau}(X) = \hat{E}[Y \mid X, W=1] - \hat{E}[Y \mid X, W=0]$$

In practice, the posterior mean is approximated by averaging over $S$ MCMC draws of the trees:

$$\hat{\tau}(X) = \frac{1}{S}\sum_{s=1}^{S} \left[\sum_k g_k^{(s)}(X, 1) - \sum_k g_k^{(s)}(X, 0)\right]$$
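The averaging formula is just a column mean over a draws-by-units matrix. A minimal sketch, assuming the posterior draws of the fit at $W = 1$ and $W = 0$ are already available as matrices (the numbers below are made up for illustration):

```r
# Hypothetical posterior draws (S = 4 draws in rows, n = 2 units in columns)
# of sum_k g_k^(s)(X_i, 1) and sum_k g_k^(s)(X_i, 0); numbers are made up.
mu1_draws <- matrix(c(1.0, 1.2, 0.9, 1.1,
                      2.0, 2.2, 1.8, 2.0), nrow = 4)
mu0_draws <- matrix(c(0.5, 0.6, 0.5, 0.4,
                      1.0, 1.1, 0.9, 1.0), nrow = 4)

# Bracketed term of the formula, one value per draw and unit
tau_draws <- mu1_draws - mu0_draws

# (1/S) * sum over draws: posterior-mean CATE for each unit
tau_hat <- colMeans(tau_draws)

# Posterior standard deviation per unit, a by-product of the same draws
tau_sd <- apply(tau_draws, 2, sd)
```

Keeping the full `tau_draws` matrix (rather than only `tau_hat`) is what later allows credible intervals without any extra computation.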

Model Structure

Prior Specification

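Tree structure prior on $T$:

  • Probability that a node at depth $d$ splits: $P(\text{split at depth } d) = \alpha(1+d)^{-\beta}$
  • Typical defaults: $\alpha = 0.95$, $\beta = 2$, which favor shallow trees
  • Splitting variable and cutpoint: chosen uniformly among the available ones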
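The depth penalty is easy to inspect numerically. A minimal sketch, assuming the split rule above with the default $\alpha = 0.95$, $\beta = 2$; `split_prob` and `sample_leaves` are hypothetical helper names, and the simulation ignores the real constraint that a node cannot split once no valid cutpoint remains.

```r
# P(node at depth d splits) = alpha * (1 + d)^(-beta)
split_prob <- function(d, alpha = 0.95, beta = 2) alpha * (1 + d)^(-beta)

probs <- split_prob(0:4)   # roughly 0.95, 0.24, 0.11, 0.06, 0.04

# Grow one tree from the prior and return its number of leaves.
# Simplification: assumes a valid split is always available.
sample_leaves <- function(d = 0, alpha = 0.95, beta = 2) {
  if (runif(1) < split_prob(d, alpha, beta)) {
    sample_leaves(d + 1, alpha, beta) + sample_leaves(d + 1, alpha, beta)
  } else {
    1L
  }
}

set.seed(1)
leaves <- replicate(5000, sample_leaves())
mean(leaves)               # prior mean tree size: only a few leaves
table(pmin(leaves, 5))     # counts of 1, 2, 3, 4, and 5+ leaves
```

The root almost always splits, but children split rarely, so the prior concentrates on trees with two or three leaves, which is what keeps each $g_k$ a weak learner.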

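Leaf value prior on $M$:

$$\mu_{kl} \sim N(0, \sigma^2_\mu)$$

where $\mu_{kl}$ is the value of leaf $l$ in tree $k$ (not to be confused with the regression function $\mu(X, W)$).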
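How small $\sigma_\mu$ should be depends on the number of trees. A minimal sketch of the calibration described by Chipman et al. (2010), assuming $Y$ has been rescaled to $[-0.5, 0.5]$ and using $\sigma_\mu = 0.5 / (k\sqrt{m})$ with $m$ trees and the default $k = 2$ (the dbarts `bart()` argument `k` plays this role); `leaf_prior_sd` is a hypothetical helper name.

```r
# Calibration assuming y rescaled to [-0.5, 0.5]:
# sigma_mu = 0.5 / (k * sqrt(m)) with m trees and k = 2 by default.
leaf_prior_sd <- function(m, k = 2) 0.5 / (k * sqrt(m))

m <- 200
sigma_mu <- leaf_prior_sd(m)   # prior sd of a single leaf value, about 0.018

# f(x) sums one leaf from each of the m trees, so its prior sd is sqrt(m) * sigma_mu
sd_f <- sqrt(m) * sigma_mu     # = 0.5 / k = 0.25

# Prior probability that f(x) stays inside the rescaled range [-0.5, 0.5]
coverage <- pnorm(0.5, sd = sd_f) - pnorm(-0.5, sd = sd_f)   # about 0.954
```

Larger $k$ shrinks every leaf more strongly toward zero; the number of trees only changes how that shrinkage is split across leaves.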

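Variance prior:

$$\sigma^2 \sim \text{Inverse-Gamma}\left(\frac{\nu}{2}, \frac{\nu\lambda}{2}\right), \quad \text{i.e. } \sigma^2 \sim \frac{\nu\lambda}{\chi^2_\nu}$$

with degrees of freedom $\nu$ (default 3) and scale $\lambda$ calibrated from the data.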
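Chipman et al. (2010) choose $\lambda$ so that a rough data-based guess $\hat{\sigma}$ (for example, the residual standard deviation of a linear regression) sits at the $q$-th prior quantile, $P(\sigma < \hat{\sigma}) = q$, with default $q = 0.90$ (dbarts exposes these as `sigdf` and `sigquant`). A minimal sketch of that calibration; `calibrate_lambda` and `draw_sigma2_prior` are hypothetical helper names and `sigma_hat = 1.5` is an arbitrary example value.

```r
# Pick lambda so that P(sigma < sigma_hat) = q under sigma^2 ~ nu * lambda / chisq_nu.
# P(nu*lambda/X < sigma_hat^2) = P(X > nu*lambda/sigma_hat^2) = q
#   => nu*lambda/sigma_hat^2 = qchisq(1 - q, nu)
calibrate_lambda <- function(sigma_hat, nu = 3, q = 0.90) {
  sigma_hat^2 * qchisq(1 - q, df = nu) / nu
}

# Draws from the scaled inverse chi-square prior
draw_sigma2_prior <- function(n, nu, lambda) nu * lambda / rchisq(n, df = nu)

set.seed(42)
sigma_hat <- 1.5
lambda <- calibrate_lambda(sigma_hat)
draws <- draw_sigma2_prior(1e5, nu = 3, lambda = lambda)

# Monte Carlo check of the calibration: should be close to 0.90
frac <- mean(sqrt(draws) < sigma_hat)
```

Placing most prior mass below $\hat{\sigma}$ reflects the expectation that the flexible sum of trees will explain more variance than a linear model.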

MCMC Sampling

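Posterior estimation via Gibbs sampling ("Bayesian backfitting"): each tree is updated in turn against the partial residuals $R_k = Y - \sum_{j \neq k} g_j(X, W)$.

  1. $T_k \mid R_k, \sigma^2$: Metropolis-Hastings step (grow, prune, change, or swap proposals), with $M_k$ integrated out
  2. $M_k \mid T_k, R_k, \sigma^2$: conjugate normal update for each leaf
  3. $\sigma^2 \mid \text{all trees}, Y$: conjugate inverse-gamma update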
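Steps 2 and 3 have closed forms. A minimal sketch of both conjugate updates, assuming the leaf prior $N(0, \sigma^2_\mu)$ and the scaled inverse chi-square prior on $\sigma^2$ from above; the function names are hypothetical, and the Metropolis-Hastings tree move of step 1 is not shown.

```r
# Step 2: leaf value mu_kl given the partial residuals r falling in that leaf.
# Prior N(0, sigma_mu2), likelihood r_i ~ N(mu_kl, sigma2).
leaf_posterior <- function(r, sigma2, sigma_mu2) {
  prec <- length(r) / sigma2 + 1 / sigma_mu2   # posterior precision
  list(mean = (sum(r) / sigma2) / prec, var = 1 / prec)
}
draw_leaf <- function(r, sigma2, sigma_mu2) {
  post <- leaf_posterior(r, sigma2, sigma_mu2)
  rnorm(1, mean = post$mean, sd = sqrt(post$var))
}

# Step 3: sigma^2 given full-fit residuals e = y - sum_k g_k(x).
# Prior nu*lambda/chisq_nu  =>  posterior (nu*lambda + SSR) / chisq_{nu + n}
draw_sigma2 <- function(e, nu, lambda) {
  (nu * lambda + sum(e^2)) / rchisq(1, df = nu + length(e))
}

# Usage: four residuals of 1 with unit variances
post <- leaf_posterior(r = c(1, 1, 1, 1), sigma2 = 1, sigma_mu2 = 1)
# post$mean is 4/5 = 0.8 and post$var is 1/5 = 0.2 (shrunk toward the prior mean 0)

set.seed(3)
e <- rnorm(1e4, sd = 2)
s2 <- draw_sigma2(e, nu = 3, lambda = 1)   # near 4 when n is large
```

Because both updates are exact draws, the only accept/reject step in each sweep is the tree-structure move.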

Advantages and Disadvantages

Advantages

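| Advantage | Description |
|---|---|
| Uncertainty quantification | Posterior draws give credible intervals for CATE and ATE |
| Flexible nonlinearity | Captures complex interactions without specifying them in advance |
| Regularization | The prior keeps trees shallow and leaf values small, limiting overfitting |
| Continuous/binary treatment | Both can be handled, since $W$ is just another input to $\mu(X, W)$ |
| Automatic variable selection | How often each covariate is used in splits indicates which variables matter |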

Disadvantages

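| Disadvantage | Description |
|---|---|
| Computational cost | Slow, since every MCMC iteration revisits all $K$ trees |
| Convergence diagnostics | MCMC convergence must be checked (trace plots, multiple chains) |
| Hyperparameters | Results can be sensitive to prior choices ($\alpha$, $\beta$, $k$, $\nu$, $q$, number of trees) |
| Large-scale data | Hard to scale to very large sample sizes |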
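One common convergence check runs several chains and compares within-chain and between-chain variance. A minimal base-R sketch of the classic Gelman-Rubin statistic ($\hat{R}$), assuming the draws of a scalar such as $\sigma$ are stored as a matrix with one column per chain; `gelman_rubin` is a hypothetical name, and the chains below are simulated rather than taken from a BART fit. In practice `coda::gelman.diag` provides a maintained implementation.

```r
# Classic Gelman-Rubin potential scale reduction factor (R-hat)
# for a matrix of draws: n iterations (rows) x m chains (columns).
gelman_rubin <- function(draws) {
  n <- nrow(draws)
  B <- n * var(colMeans(draws))           # between-chain variance
  W <- mean(apply(draws, 2, var))         # mean within-chain variance
  var_hat <- (n - 1) / n * W + B / n      # pooled variance estimate
  sqrt(var_hat / W)
}

set.seed(7)
# Four chains sampling the same target: R-hat should be close to 1
mixed <- matrix(rnorm(4000), ncol = 4)
# Same chains, but two of them stuck in a region shifted by 3: R-hat well above 1
stuck <- mixed + rep(c(0, 0, 3, 3), each = 1000)

rhat_mixed <- gelman_rubin(mixed)
rhat_stuck <- gelman_rubin(stuck)
```

Values noticeably above 1 suggest running the sampler longer (larger burn-in or more draws) before trusting posterior summaries.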

Causal BART Variants

BCF (Bayesian Causal Forests)

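Hahn et al. (2020) give the prognostic and treatment-effect components separate BART priors:

$$Y = \mu(X) + \tau(X) \cdot W + \epsilon$$

  • $\mu(X)$: prognostic function (BART); Hahn et al. also include an estimated propensity score as an input to $\mu$
  • $\tau(X)$: treatment effect function (BART with a more strongly regularizing prior, shrinking toward homogeneous effects)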
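The stronger regularization of $\tau(X)$ can be seen by plugging different hyperparameters into the same depth-split rule. A minimal sketch, assuming the defaults we recall from the `bcf` R package documentation (`base_control = 0.95`, `power_control = 2`, 200 trees for $\mu$; `base_moderate = 0.25`, `power_moderate = 3`, 50 trees for $\tau$); treat these values as assumptions to verify against the package and paper.

```r
# P(node at depth d splits) = alpha * (1 + d)^(-beta)
split_prob <- function(d, alpha, beta) alpha * (1 + d)^(-beta)

depths <- 0:3
# Prognostic function mu(X): standard BART tree prior (assumed defaults)
mu_prior  <- split_prob(depths, alpha = 0.95, beta = 2)
# Treatment effect tau(X): assumed stronger defaults, shallower trees
tau_prior <- split_prob(depths, alpha = 0.25, beta = 3)

comparison <- rbind(depth = depths,
                    mu    = round(mu_prior, 3),
                    tau   = round(tau_prior, 3))
```

With a root split probability of only 0.25, most $\tau$ trees are single leaves, so the prior favors a nearly constant effect unless the data push for heterogeneity.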

ps-BART

Includes an estimated propensity score $\hat{e}(X) = \hat{P}(W = 1 \mid X)$ as an additional covariate:

$$Y = f(X, \hat{e}(X), W) + \epsilon$$
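The extra covariate is built in a first stage. A minimal base-R sketch, assuming simulated confounded data and a logistic regression as a stand-in propensity model (the references above also use BART for this step); variable names are hypothetical, and the final BART fit is only indicated in a comment.

```r
set.seed(123)
n <- 500
X <- matrix(rnorm(n * 2), ncol = 2, dimnames = list(NULL, c("x1", "x2")))

# Hypothetical confounded assignment: treatment more likely when x1 is large
W <- rbinom(n, 1, plogis(0.8 * X[, "x1"]))

# Stage 1: estimate e(X) = P(W = 1 | X) (logistic regression as a stand-in)
ps_fit <- glm(W ~ X, family = binomial())
e_hat <- as.vector(fitted(ps_fit))

# Stage 2: append e_hat as an extra covariate for the outcome model;
# cbind(X_ps, W) would then be passed to a BART fitter such as dbarts::bart
X_ps <- cbind(X, e_hat = e_hat)
```

Giving the trees direct access to $\hat{e}(X)$ lets them split on the confounding structure itself, which can reduce bias when selection is strong.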

Implementation

R (dbarts)

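library(dbarts)

# Basic BART on the stacked design [X, W]
# X: numeric covariate matrix, W: 0/1 treatment vector, Y: numeric outcome.
# keeptrees = TRUE is needed so that predict() can evaluate new design points.
# The defaults base = 0.95 and power = 2 correspond to alpha and beta above.
bart_fit <- bart(x.train = cbind(X, W),
                 y.train = Y,
                 ntree = 200,
                 keeptrees = TRUE)

# Counterfactual designs: every unit set to W = 1 and to W = 0
X1 <- cbind(X, W = 1)
X0 <- cbind(X, W = 0)

# Posterior draws of E[Y | X, W]: one row per draw, one column per unit
pred1 <- predict(bart_fit, newdata = X1)
pred0 <- predict(bart_fit, newdata = X0)

# Posterior draws of the CATE for each unit
cate_samples <- pred1 - pred0

# Posterior-mean CATE per unit
cate <- colMeans(cate_samples)

# 95% credible interval per unit (2 x n matrix)
ci <- apply(cate_samples, 2, quantile, probs = c(0.025, 0.975))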
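The same draw matrix also yields a posterior for the sample average treatment effect: average the unit-level effects within each draw, then summarize across draws. A minimal sketch, assuming `cate_samples` is a draws-by-units matrix as produced above; here it is filled with simulated numbers purely for illustration.

```r
# cate_samples: S x n matrix of posterior CATE draws (rows = draws, cols = units).
# Simulated values stand in for the output of a real fit.
set.seed(2024)
S <- 1000
n <- 50
cate_samples <- matrix(rnorm(S * n, mean = 2, sd = 0.5), nrow = S)

# Sample ATE for each draw: average the unit-level effects within that draw
ate_draws <- rowMeans(cate_samples)

# Posterior mean and 95% credible interval of the sample ATE
ate_hat <- mean(ate_draws)
ate_ci <- quantile(ate_draws, probs = c(0.025, 0.975))
```

Averaging within each draw before summarizing keeps the posterior correlation between units, so the interval is not simply the average of the unit-level intervals.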

R (bartCause)

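library(bartCause)

# Causal BART: BART for the response surface and for treatment assignment
fit <- bartc(response = Y, treatment = W, confounders = X,
             method.rsp = "bart",
             method.trt = "bart")

# Population-level estimate (default estimand: ATE) with credible interval
summary(fit)

# Posterior-mean individual CATE for each unit
cate <- fitted(fit, type = "icate")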

Comparison with Causal Forest

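| Property | BART | Causal Forest |
|---|---|---|
| Inference | Bayesian | Frequentist |
| Uncertainty | Posterior credible intervals | Bootstrap/asymptotic confidence intervals |
| Honest splitting | No | Yes |
| Speed | Slow (MCMC) | Fast |
| Theory | Posterior consistency | Pointwise asymptotic normality (at a rate slower than $\sqrt{n}$) |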

Related Notes

  • Tree-based Methods Overview - integration of tree-based methods
  • Causal Forest - Frequentist alternative
  • Honest Estimation - overfitting prevention
  • HTE - estimation target

Key Papers

  • Hill, J. L. (2011). Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics.
  • Chipman, H. A., George, E. I., & McCulloch, R. E. (2010). BART: Bayesian additive regression trees. Annals of Applied Statistics.
  • Hahn, P. R., Murray, J. S., & Carvalho, C. M. (2020). Bayesian regression tree models for causal inference. Bayesian Analysis.
  • yaoSurveyCausalInference2021 - Section 3.4.3
