#ope
5 notes
- Anytime-Valid OPE Anytime-valid off-policy evaluation provides time-uniform confidence sequences for the off-policy value that remain valid at any stopping time; it is built on e-processes.
- MDP (Markov Decision Process) A Markov Decision Process (MDP) is a mathematical framework for sequential decision-making problems.
- Off-Policy Evaluation (OPE) Estimate the value $V(\pi_e)=\mathbb{E}_{\pi_e}[\sum_t r_t]$ of a target policy $\pi_e$ from logs collected under a different behavior policy $\pi_b$.
- Policy Trees Policy Trees, proposed by Athey & Wager (2021), are an interpretable policy-learning method.
- RTB Bidding Strategy via Causal ML — From Prediction to Optimization A five-stage case study on the public iPinYou RTB dataset that moves from pCTR/pCVR prediction through causal effect estimation (CATE, SCM) to budget-constrained optimal bidding and off-policy policy evaluation.