Decision-Making Overview
Hub for methods of (sequential) decision-making under uncertainty: bandits · RL · OPE · DTR/OTR. Extends the four existing notes (Contextual Bandits · MDP · Thompson Sampling · Policy Trees) along Roadmap Tracks 2 and 3.
Overview
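Methods for (sequential) decision-making under uncertainty, ranging from bandit regret through RL and off-policy evaluation to dynamic/optimal treatment regimes. These methods underpin personalization in both clinical (DTR/OTR) and industrial (targeting · bidding) settings.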
Target atomic notes
Created ✓ (bandits/RL): Contextual Bandits · MDP · Thompson Sampling · Policy Trees · Policy Learning · Multi-Armed Bandits · UCB · Regret · Offline RL
Created ✓ (OPE): Off-Policy Evaluation · Doubly Robust OPE · Empirical Welfare Maximization
Created ✓ (DTR/OTR): Dynamic Treatment Regimes · Q-learning · A-learning · G-estimation · Outcome-Weighted Learning
See also
- Study Roadmap §Track 2 (Lattimore & Szepesvári) · §Track 3 (OPE lineage)
- MOC-DecisionMaking