Tae Hyun Kim (Lowell)

Dynamic Treatment Regimes (DTR / OTR)

Definition

A DTR is a sequence of decision rules {dt(Ht)}t=1T\{d_t(H_t)\}_{t=1}^T mapping the accumulated history HtH_t (covariates, prior treatments, intermediate outcomes) to a treatment. The optimal treatment regime (OTR) maximizes the expected long-term outcome E[Yd]E[Y^{d}]. Estimation:

  • Q-learning — backward induction on the Q-function (regression based)
  • A-learning — contrast/advantage modeling (robust to baseline misspecification)
  • G-estimation — structural nested mean model (Robins)
  • OWL — outcome-weighted learning (classification perspective)

Assumption: sequential ignorability.

Intuitive Understanding

Personalized, adaptive treatment over time — deciding “what to do next given everything so far” to optimize the long-term outcome. It is the core of CV pillar #4 (clinical sequential decision-making) and the time-series extension of personalization.

Key Papers

  • Murphy, “Optimal dynamic treatment regimes”, JRSS-B 65(2), 2003
  • Robins — structural nested models / g-estimation
  • Hernán & Robins, Causal Inference: What If, 2020 — Part III (g-methods)

Local graph