Decision-Making Overview
A hub for methods of (sequential) decision-making under uncertainty: bandits · reinforcement learning (RL) · off-policy evaluation (OPE) · dynamic and optimal treatment regimes (DTR/OTR). Extends the four existing notes (Contextual Bandits · MDP · Thompson Sampling · Policy Trees) along Roadmap Tracks 2 and 3.
Overview
Methods for (sequential) decision-making under uncertainty, from bandit regret to RL, off-policy evaluation (OPE), and dynamic/optimal treatment regimes (DTR/OTR). These methods underpin personalization in both clinical (DTR/OTR) and industrial (targeting, bidding) settings.
Target Atomic Notes
- Created ✓ (bandits/RL): Contextual Bandits · MDP · Thompson Sampling · Policy Trees · Policy Learning · Multi-Armed Bandits · UCB · Regret · Offline RL
- Created ✓ (OPE): Off-Policy Evaluation · Doubly Robust OPE · Empirical Welfare Maximization
- Created ✓ (DTR/OTR): Dynamic Treatment Regimes · Q-learning · A-learning · G-estimation · Outcome-Weighted Learning
References
- Study Roadmap §Track 2 (Lattimore & Szepesvári) · §Track 3 (OPE lineage)
- MOC-DecisionMaking