Tae Hyun Kim (Lowell)

Decision-Making under Uncertainty

bandits · RL · OPE · DTR/OTR · policy learning

Turning estimated effects into decisions: optimal policy learning, bandits and reinforcement learning, off-policy evaluation, and dynamic/optimal treatment regimes.

20 notes

From Estimation to Action — How HTE Drives Personalized Policy Across Domains

One methodological spine — estimate heterogeneous treatment effects and turn them into individual-level policies — powers both clinical sequential treatment decisions and industrial targeting, pricing, and recommendation.

2026-06-12 #personalization#causal-inference#decision-making
Marketing Attribution at Scale — From Simulation to Causal Inference

A case study comparing 10+ multi-touch attribution methods against a known-ground-truth simulator, then scaling them on the public Criteo dataset, closing the loop with budget off-policy evaluation for channel allocation.

2026-06-12 #causal-inference#decision-making#attribution
RTB Bidding Strategy via Causal ML — From Prediction to Optimization

A five-stage case study on the public iPinYou RTB dataset that moves from pCTR/pCVR prediction through causal effect estimation (CATE, SCM) to budget-constrained optimal bidding and off-policy policy evaluation.

2026-06-12 #decision-making#targeting#ope
Sequential and Adaptive Decision-Making — From Bandits to Dynamic Treatment Regimes

A synthesis essay tracing one methodological spine through sequential decision-making under uncertainty — exploration–exploitation in bandits, off-policy evaluation, and optimal/dynamic treatment regimes — that powers clinical adaptive trials and real-time bidding alike.

2026-06-12 #decision-making
Anytime-Valid Inference Overview

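Game-theoretic statistics that solves the "peeking" problem of fixed-sample hypothesis testing. It is the mathematical foundation for safe inference that monitors drift in identification validity in real time.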

2026-06-11 #experiments#causal-inference#anytime-valid
Anytime-Valid OPE

Anytime-valid off-policy evaluation that provides time-uniform confidence sequences for the off-policy value, valid even at arbitrary stopping times; built on e-processes and confidence sequences.

2026-06-11 #decision-making#anytime-valid#ope
Confidence Sequence

A confidence sequence (CS) $(C_t)_{t\ge1}$ is a sequence of confidence intervals with time-uniform coverage:

2026-06-11 #experiments#anytime-valid
Decision-Making Overview

Methodology for (sequential) decision-making under uncertainty, from bandit regret through RL and off-policy evaluation to dynamic/optimal treatment regimes. It underpins both clinical (DTR/OTR) and industrial (targeting, bidding) personalization.

2026-06-11 #decision-making
Dynamic Treatment Regimes (DTR / OTR)

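A DTR is a sequence of decision rules $\{d_t(H_t)\}_{t=1}^T$ that maps the accumulated history $H_t$ (covariates, prior treatments, intermediate outcomes) to a treatment. The optimal treatment regime (OTR) maximizes the expected long-term outcome $E[Y^{d}]$. Estimation: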

2026-06-11 #decision-making#clinical-decision-making#dtr
e-process (e-value)

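An e-value $E$ is a nonnegative random variable satisfying $E_P[E]\le 1$ for all $P\in H_0$, where $H_0$ is the null hypothesis. An e-process $(E_t)$ is a nonnegative process such that $E_\tau$ is an e-value for any stopping time $\tau$ ($E[E_\tau]\le1$), typically a nonnegative supermartingale under the null.…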

2026-06-11 #experiments#anytime-valid#e-process
Multi-Armed Bandits

With $K$ arms, in each round $t$ the learner pulls arm $A_t$ and observes a reward. The goal is to minimize cumulative regret:

2026-06-11 #decision-making#bandits
Off-Policy Evaluation (OPE)

Estimates the value $V(\pi_e)=E_{\pi_e}[\sum r]$ of a target policy $\pi_e$ from logs collected under a different behavior policy $\pi_b$.

2026-06-11 #decision-making#ope#doubly-robust
A/B Testing

A/B testing is the online application of the randomized controlled trial (RCT): two or more variants are shown to users at random in order to estimate causal effects.

2025-01-28 #experiments#rct#ab-testing
Contextual Bandits

Contextual bandits are multi-armed bandit problems in which the optimal action (arm) depends on the context.

2025-01-28 #decision-making#bandits#contextual-bandits
CUPED

CUPED (Controlled-experiment Using Pre-Experiment Data) is a technique that uses pre-experiment data to reduce the variance of A/B tests.

2025-01-28 #experiments#ab-testing
Design Effect

The design effect (DEFF) measures how a complex sampling design affects variance relative to simple random sampling.

2025-01-28 #experiments#ab-testing#interference
MDP (Markov Decision Process)

A Markov decision process (MDP) is a mathematical framework for sequential decision-making problems.

2025-01-28 #decision-making#ope
Policy Trees

Policy trees are an interpretable policy learning method proposed by Athey & Wager (2021).

2025-01-28 #decision-making#ope#causal-forest
Statistical Power

Statistical power is the probability of detecting an effect when it truly exists.

2025-01-28 #experiments#ab-testing#rct
Thompson Sampling

Thompson Sampling is a Bayesian approach to balancing exploration and exploitation.

2025-01-28 #decision-making#bandits
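
The Thompson Sampling and Multi-Armed Bandits entries above describe the exploration-exploitation trade-off and cumulative regret only in words. As a rough illustration, here is a minimal sketch of Beta-Bernoulli Thompson Sampling on a simulated bandit. It is not taken from the notes themselves: the function name `thompson_sampling`, the uniform Beta(1, 1) prior, the arm means, and the seed are all assumptions made for this example, and regret is measured as the expected gap to the best arm.

```python
import random


def thompson_sampling(true_means, n_rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling on a simulated K-armed bandit.

    true_means: hypothetical per-arm success probabilities (hidden from the learner).
    Returns (pull counts per arm, cumulative expected regret).
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k  # number of reward-1 observations per arm
    failures = [0] * k   # number of reward-0 observations per arm
    pulls = [0] * k
    best_mean = max(true_means)
    regret = 0.0

    for _ in range(n_rounds):
        # Draw one plausible mean per arm from its Beta(1 + s, 1 + f) posterior
        # (assumes a uniform Beta(1, 1) prior).
        samples = [
            rng.betavariate(1 + successes[a], 1 + failures[a]) for a in range(k)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=lambda a: samples[a])
        # Simulate a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
        # Expected regret of this round: best arm's mean minus chosen arm's mean.
        regret += best_mean - true_means[arm]

    return pulls, regret


pulls, regret = thompson_sampling([0.2, 0.5, 0.7], n_rounds=2000)
print("pulls per arm:", pulls)
print("cumulative expected regret:", round(regret, 1))
```

Sampling from the posterior, rather than always playing the arm with the highest estimated mean, is what supplies exploration here: an arm with few observations has a wide posterior and is still chosen occasionally, while pulls concentrate on the better arms as evidence accumulates.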