Decision-Making Overview · Tae Hyun Kim (Lowell)

A hub for methods of (sequential) decision-making under uncertainty: bandits · RL · OPE · DTR/OTR. Extends the four existing notes (Contextual Bandits · MDP · Thompson Sampling · Policy Trees) along Roadmap Track 2/3.

Overview

Methods for (sequential) decision-making under uncertainty, from bandit regret to RL, off-policy evaluation (OPE), and dynamic/optimal treatment regimes (DTR/OTR). These methods underpin both clinical (DTR/OTR) and industrial (targeting, bidding) personalization.

Target Atomic Notes

Created ✓ (bandits/RL): Contextual Bandits · MDP · Thompson Sampling · Policy Trees · Policy Learning · Multi-Armed Bandits · UCB · Regret · Offline RL
Created ✓ (OPE): Off-Policy Evaluation · Doubly Robust OPE · Empirical Welfare Maximization
Created ✓ (DTR/OTR): Dynamic Treatment Regimes · Q-learning · A-learning · G-estimation · Outcome-Weighted Learning

References

Study Roadmap §Track 2 (Lattimore & Szepesvári) · §Track 3 (OPE lineage)