Tae Hyun Kim (Lowell)

Decision-Making under Uncertainty

bandits · RL · OPE · DTR/OTR · policy learning

Turning estimated effects into decisions — optimal policy learning, bandits and reinforcement learning, off-policy evaluation, and dynamic / optimal treatment regimes.

20 notes

From Estimation to Action — How HTE Drives Personalized Policy Across Domains

One methodological spine — estimate heterogeneous treatment effects and turn them into individual-level policies — powers both clinical sequential treatment decisions and industrial targeting, pricing, and recommendation.

2026-06-12 #personalization #causal-inference #decision-making
Marketing Attribution at Scale — From Simulation to Causal Inference

A case study comparing 10+ multi-touch attribution methods against a known-ground-truth simulator, then scaling them on the public Criteo dataset, closing the loop with budget off-policy evaluation for channel allocation.

2026-06-12 #causal-inference #decision-making #attribution
RTB Bidding Strategy via Causal ML — From Prediction to Optimization

A five-stage case study on the public iPinYou RTB dataset that moves from pCTR/pCVR prediction through causal effect estimation (CATE, SCM) to budget-constrained optimal bidding and off-policy policy evaluation.

2026-06-12 #decision-making #targeting #ope
Sequential and Adaptive Decision-Making — From Bandits to Dynamic Treatment Regimes

A synthesis essay tracing one methodological spine through sequential decision-making under uncertainty — exploration–exploitation in bandits, off-policy evaluation, and optimal/dynamic treatment regimes — that powers clinical adaptive trials and real-time bidding alike.

2026-06-12 #decision-making
Anytime-Valid Inference Overview

Game-theoretic statistics that resolves the "peeking" problem of fixed-sample hypothesis testing. The mathematical foundation for real-time monitoring of identification-validity drift.

2026-06-11 #experiments #causal-inference #anytime-valid
Anytime-Valid OPE

Anytime-valid off-policy evaluation yields time-uniform confidence sequences for the off-policy value that remain valid at any stopping time; it is built on e-processes.

2026-06-11 #decision-making #anytime-valid #ope
Confidence Sequence

A confidence sequence (CS) $(C_t)_{t\ge 1}$ is a sequence of confidence intervals with time-uniform coverage:

2026-06-11 #experiments #anytime-valid
Decision-Making Overview

Methods for (sequential) decision-making under uncertainty — from bandit regret to RL, off-policy evaluation, and dynamic/optimal treatment regimes. Underpins both clinical (DTR/OTR) and industrial (targeting, bidding) personalization.

2026-06-11 #decision-making
Dynamic Treatment Regimes (DTR / OTR)

A DTR is a sequence of decision rules $\{d_t(H_t)\}_{t=1}^T$ mapping the accumulated history $H_t$ (covariates, prior treatments, intermediate outcomes) to a treatment. The optimal treatment regime (OTR) maximizes the expected long-term outcome $E[Y^{d}]$. Estimation:

2026-06-11 #decision-making #clinical-decision-making #dtr
e-process (e-value)

An e-value $E$ is a nonnegative random variable with $E_P[E]\le 1$ ($\forall P\in H_0$) under the null $H_0$. An e-process $(E_t)$ is a nonnegative process such that $E_\tau$ is an e-value at any stopping time $\tau$ ($E[E_\tau]\le 1$), typically a nonnegative supermartingale under the null.…

2026-06-11 #experiments #anytime-valid #e-process
Multi-Armed Bandits

$K$ arms; at each round $t$ pull $A_t$ and observe a reward. Minimize cumulative regret:

2026-06-11 #decision-making #bandits
Off-Policy Evaluation (OPE)

Estimate the value $V(\pi_e)=E_{\pi_e}[\sum r]$ of a target policy $\pi_e$ from logs collected under a different behavior policy $\pi_b$.

2026-06-11 #decision-making #ope #doubly-robust
A/B Testing

A/B testing is the online application of the randomized controlled trial (RCT), estimating causal effects by randomly exposing two or more variants to users.

2025-01-28 #experiments #rct #ab-testing
Contextual Bandits

Contextual bandits are a variant of the multi-armed bandit problem in which the optimal action (arm) depends on the observed context.

2025-01-28 #decision-making #bandits #contextual-bandits
CUPED

CUPED (Controlled-experiment Using Pre-Experiment Data) is a technique that leverages pre-experiment data to reduce the variance of A/B tests.

2025-01-28 #experiments #ab-testing
Design Effect

The Design Effect (DEFF) measures the impact of a complex sampling design on variance relative to simple random sampling.

2025-01-28 #experiments #ab-testing #interference
MDP (Markov Decision Process)

A Markov Decision Process (MDP) is a mathematical framework for sequential decision-making problems.

2025-01-28 #decision-making #ope
Policy Trees

Policy Trees, proposed by Athey & Wager (2021), are an interpretable policy-learning method.

2025-01-28 #decision-making #ope #causal-forest
Statistical Power

Statistical power is the probability of detecting an effect when it truly exists.

2025-01-28 #experiments #ab-testing #rct
Thompson Sampling

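Thompson Sampling is a Bayesian bandit algorithm that balances exploration and exploitation by sampling from the posterior over each arm's reward and playing the arm with the highest sample.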

2025-01-28 #decision-making #bandits