#decision-making
13 notes
- Anytime-Valid OPE: Off-policy evaluation that yields time-uniform confidence sequences for a target policy's value, valid at any (possibly data-dependent) stopping time; built on e-processes and confidence sequences.
- Contextual Bandits: A multi-armed bandit problem in which the optimal action (arm) depends on an observed context.
- Decision-Making Overview: Methods for (sequential) decision-making under uncertainty, from bandit regret to RL, off-policy evaluation, and dynamic/optimal treatment regimes. Underpins both clinical (DTR/OTR) and industrial (targeting, bidding) personalization.
- Dynamic Treatment Regimes (DTR / OTR): A DTR is a sequence of decision rules $\{d_t(H_t)\}_{t=1}^T$ mapping the accumulated history $H_t$ (covariates, prior treatments, intermediate outcomes) to a treatment. The optimal treatment regime (OTR) maximizes the expected long-term outcome $E[Y^{d}]$. Estimation:
- From Estimation to Action — How HTE Drives Personalized Policy Across Domains: One methodological spine (estimating heterogeneous treatment effects and turning them into individual-level policies) powers both clinical sequential treatment decisions and industrial targeting, pricing, and recommendation.
- Marketing Attribution at Scale — From Simulation to Causal Inference: A case study that benchmarks 10+ multi-touch attribution methods against a simulator with known ground truth, scales them to the public Criteo dataset, and closes the loop with off-policy evaluation of channel budget allocations.
- MDP (Markov Decision Process): A mathematical framework for sequential decision-making in which the next state and reward depend only on the current state and action.
- Multi-Armed Bandits: $K$ arms; at each round $t$, pull an arm $A_t$ and observe a reward $R_t$. Minimize the cumulative regret $\mathrm{Reg}_T = T\mu^* - E\left[\sum_{t=1}^{T} R_t\right]$, where $\mu^* = \max_a \mu_a$ is the best arm's mean reward.
- Off-Policy Evaluation (OPE): Estimate the value $V(\pi_e)=E_{\pi_e}\left[\sum_t r_t\right]$ of a target policy $\pi_e$ from logs collected under a different behavior policy $\pi_b$.
- Policy Trees: Proposed by Athey & Wager (2021), an interpretable policy-learning method that learns a shallow decision tree whose leaves assign treatments.
- RTB Bidding Strategy via Causal ML — From Prediction to Optimization: A five-stage case study on the public iPinYou RTB dataset that moves from pCTR/pCVR prediction through causal effect estimation (CATE, SCM) to budget-constrained optimal bidding and off-policy evaluation of the bidding policy.
- Sequential and Adaptive Decision-Making — From Bandits to Dynamic Treatment Regimes: A synthesis essay tracing one methodological spine through sequential decision-making under uncertainty (exploration–exploitation in bandits, off-policy evaluation, and optimal/dynamic treatment regimes) that underpins both adaptive clinical trials and real-time bidding.
- Thompson Sampling: A Bayesian approach that balances exploration and exploitation by drawing a sample from each arm's posterior reward distribution and playing the arm with the best draw.
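To make the MDP entry concrete, here is a minimal value-iteration sketch for a finite MDP with known dynamics. The encoding (`P[s][a]` as a list of `(prob, next_state)` pairs, `R[s][a]` as the expected immediate reward), the discount `gamma`, and the toy two-state example are assumptions chosen for illustration, not the note's own notation.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration for a finite MDP with known dynamics (illustrative sketch).

    P[s][a]: list of (prob, next_state) pairs (hypothetical encoding).
    R[s][a]: expected immediate reward for taking action a in state s.
    """
    n_states = len(P)

    def q_value(V, s, a):
        # One-step lookahead: immediate reward plus discounted expected next value.
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    V = [0.0] * n_states
    while True:
        new_V = [max(q_value(V, s, a) for a in range(len(P[s])))
                 for s in range(n_states)]
        delta = max(abs(new_V[s] - V[s]) for s in range(n_states))
        V = new_V
        if delta < tol:  # Bellman updates have (approximately) converged
            break
    # Greedy policy with respect to the converged values.
    policy = [max(range(len(P[s])), key=lambda a: q_value(V, s, a))
              for s in range(n_states)]
    return V, policy


# Toy two-state MDP: action 0 = stay, action 1 = move to the other state.
# Staying in state 1 pays 1 per step; everything else pays 0.
P = [[[(1.0, 0)], [(1.0, 1)]],
     [[(1.0, 1)], [(1.0, 0)]]]
R = [[0.0, 0.0],
     [1.0, 0.0]]
V, policy = value_iteration(P, R, gamma=0.9)
print([round(v, 3) for v in V], policy)  # [9.0, 10.0] [1, 0]
```

The optimal policy moves from state 0 to state 1 and then stays, so $V(1) = 1/(1-\gamma) = 10$ and $V(0) = \gamma \cdot 10 = 9$.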
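The cumulative regret in the Multi-Armed Bandits entry can be tracked directly in simulation. The sketch below uses UCB1 (our choice of algorithm for illustration, not necessarily the one the note develops) on Bernoulli arms and accumulates the pseudo-regret $\sum_t (\mu^* - \mu_{A_t})$, whose expectation equals $\mathrm{Reg}_T$; `true_means`, `horizon`, and `seed` are hypothetical parameters.

```python
import math
import random


def ucb1_pseudo_regret(true_means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms; return (pseudo_regret, pull_counts).

    The arm means are known to the simulator only so that regret can be computed.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # total observed reward per arm
    best_mean = max(true_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull each arm once to initialise the estimates
        else:
            # Optimism: empirical mean plus an exploration bonus that shrinks with pulls.
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best_mean - true_means[arm]  # gap of the arm actually pulled
    return regret, counts


regret, counts = ucb1_pseudo_regret([0.2, 0.8], horizon=5000)
print(round(regret, 1), counts)
```

Because each pull of the worse arm costs its gap, pseudo-regret here is simply `counts[0] * 0.6`; a good bandit algorithm keeps that count growing only logarithmically in the horizon.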
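For the OPE entry, one of the simplest estimators of $V(\pi_e)$ is trajectory-wise importance sampling: reweight each logged return $\sum_t r_t$ by the product of $\pi_e(a_t \mid x_t)/\pi_b(a_t \mid x_t)$ ratios along the trajectory. This minimal sketch assumes the behavior propensity $\pi_b(a \mid x)$ was logged at every step and that $\pi_b > 0$ wherever $\pi_e > 0$; the log format and function names are hypothetical.

```python
def importance_sampling_value(trajectories, target_prob):
    """Trajectory-wise importance-sampling estimate of V(pi_e) = E_{pi_e}[sum_t r_t].

    trajectories: list of logged trajectories, each a list of
        (context, action, reward, behavior_prob) steps, where behavior_prob
        is pi_b(action | context) recorded at logging time (assumed > 0).
    target_prob: callable (context, action) -> pi_e(action | context).
    """
    estimates = []
    for traj in trajectories:
        weight = 1.0   # cumulative likelihood ratio pi_e / pi_b along the trajectory
        ret = 0.0      # undiscounted return sum_t r_t
        for context, action, reward, behavior_prob in traj:
            weight *= target_prob(context, action) / behavior_prob
            ret += reward
        estimates.append(weight * ret)
    return sum(estimates) / len(estimates)


def always_one(context, action):
    # Deterministic target policy: always choose action 1.
    return 1.0 if action == 1 else 0.0


# One-step logs: pi_b picked actions 0/1 uniformly; action 1 paid 1, action 0 paid 0.
logs = [[("x", 0, 0.0, 0.5)], [("x", 1, 1.0, 0.5)]]
print(importance_sampling_value(logs, always_one))  # 1.0
```

The estimator is unbiased under those assumptions, but the product of ratios makes its variance grow quickly with trajectory length, which is why doubly robust and per-decision variants are common in practice.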
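Policy trees are fit by choosing, among trees of a fixed depth, the one that maximizes the summed reward scores of the actions it assigns (Athey & Wager use doubly robust scores). Below is a depth-1 exhaustive-search sketch that assumes per-unit, per-action scores have already been estimated; the function name and input layout are hypothetical, and it is an illustration rather than a reproduction of any library's implementation.

```python
def depth1_policy_tree(X, scores):
    """Exhaustive search for the best depth-1 policy tree (illustrative sketch).

    X: list of feature vectors (one per unit).
    scores: scores[i][a] = estimated reward of assigning action a to unit i
            (e.g. doubly robust scores, assumed precomputed).
    Returns ((feature, threshold, left_action, right_action), total_score);
    feature and threshold are None when no split beats a single action.
    """
    n, n_features, n_actions = len(X), len(X[0]), len(scores[0])

    def best_leaf(idx):
        # Best single action for the units in idx and its summed score.
        totals = [sum(scores[i][a] for i in idx) for a in range(n_actions)]
        a_best = max(range(n_actions), key=lambda a: totals[a])
        return a_best, totals[a_best]

    a_root, best_value = best_leaf(range(n))
    best_rule = (None, None, a_root, a_root)
    for j in range(n_features):
        for thr in sorted({x[j] for x in X}):
            left = [i for i in range(n) if X[i][j] <= thr]
            right = [i for i in range(n) if X[i][j] > thr]
            if not left or not right:
                continue
            a_left, v_left = best_leaf(left)
            a_right, v_right = best_leaf(right)
            if v_left + v_right > best_value:
                best_value = v_left + v_right
                best_rule = (j, thr, a_left, a_right)
    return best_rule, best_value


X = [[1.0], [2.0], [3.0], [4.0]]
scores = [[1, 0], [1, 0], [0, 1], [0, 1]]
print(depth1_policy_tree(X, scores))  # ((0, 2.0, 0, 1), 4)
```

The resulting rule reads "if feature 0 is at most 2.0, assign action 0; otherwise assign action 1", which is the kind of auditable policy that motivates the method.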
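The Thompson Sampling entry can be illustrated with the classic Beta-Bernoulli case: keep a Beta posterior per arm, draw one sample from each, and play the arm with the largest draw. This is a minimal sketch assuming Bernoulli rewards and a uniform Beta(1, 1) prior; the function name and simulated `true_means` are hypothetical.

```python
import random


def thompson_sampling_bernoulli(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling on simulated arms; returns pull counts."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1.0] * k   # Beta(1, 1) prior: 1 + observed successes
    beta = [1.0] * k    # 1 + observed failures
    pulls = [0] * k
    for _ in range(horizon):
        # Draw one plausible mean per arm from its posterior and act greedily on the draws.
        draws = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: draws[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward      # conjugate update: success count
        beta[arm] += 1 - reward   # conjugate update: failure count
        pulls[arm] += 1
    return pulls


print(thompson_sampling_bernoulli([0.1, 0.9], horizon=2000))
```

Exploration comes for free from posterior uncertainty: an arm with few pulls has a wide posterior and occasionally produces the largest draw, while a clearly worse arm is sampled less and less.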