#ope

5 notes

- Anytime-Valid OPE: off-policy evaluation that provides time-uniform confidence sequences for the off-policy value, valid at any stopping time; based on e-processes and confidence sequences.
- MDP (Markov Decision Process): a mathematical framework for sequential decision-making problems.
- Off-Policy Evaluation (OPE): estimate the value $V(\pi_e)=\mathbb{E}_{\pi_e}\left[\sum_t r_t\right]$ of a target policy $\pi_e$ from logs collected under a different behavior policy $\pi_b$.
- Policy Trees: an interpretable policy-learning method proposed by Athey & Wager (2021) that represents the policy as a shallow decision tree over covariates.
- RTB Bidding Strategy via Causal ML — From Prediction to Optimization: a five-stage case study on the public iPinYou RTB dataset. It moves from pCTR/pCVR prediction through causal effect estimation (CATE, SCM) to budget-constrained optimal bidding and off-policy policy evaluation.
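The Off-Policy Evaluation (OPE) entry defines the target $V(\pi_e)$. A common baseline estimator is trajectory-wise importance sampling, which reweights each logged return by the product of likelihood ratios $\pi_e(a \mid s)/\pi_b(a \mid s)$. The sketch below rests on several assumptions. Behavior propensities are logged with every step. $\pi_b$ gives positive probability to every logged action. Returns are undiscounted. The function name `trajectory_is_estimate` and the tuple layout are illustrative only and are not taken from the note.

```python
from typing import Callable, Sequence, Tuple

# One logged step (illustrative layout): (state, action, reward, pi_b(action | state))
Step = Tuple[int, int, float, float]


def trajectory_is_estimate(
    trajectories: Sequence[Sequence[Step]],
    pi_e: Callable[[int, int], float],
) -> float:
    """Trajectory-wise importance-sampling estimate of V(pi_e) = E_{pi_e}[sum_t r_t]."""
    if not trajectories:
        raise ValueError("need at least one logged trajectory")
    total = 0.0
    for traj in trajectories:
        weight = 1.0  # cumulative likelihood ratio prod_t pi_e / pi_b
        ret = 0.0     # undiscounted return of this logged trajectory
        for state, action, reward, pb in traj:
            weight *= pi_e(state, action) / pb
            ret += reward
        total += weight * ret
    return total / len(trajectories)


# Usage: uniform behavior over two actions; the target always plays action 1.
always_one = lambda s, a: 1.0 if a == 1 else 0.0
logs = [[(0, 1, 1.0, 0.5)], [(0, 0, 0.0, 0.5)], [(0, 1, 1.0, 0.5)], [(0, 0, 1.0, 0.5)]]
print(trajectory_is_estimate(logs, always_one))
```

Under these assumptions the estimator is unbiased, but its variance grows quickly with the horizon. Per-decision or doubly robust variants are the usual refinements.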
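The Anytime-Valid OPE entry can be made concrete with a simplified betting-style lower confidence sequence. The sketch assumes the one-step (contextual bandit) case with i.i.d. contexts, known propensities, and rewards in $[0, 1]$. Under those conditions each IPS term $X_i = w_i r_i$ is nonnegative with conditional mean $V(\pi_e)$.

For each candidate value $m$ on a grid, the sketch averages several fixed-fraction wealth processes. When $m = V(\pi_e)$, that average is a nonnegative martingale, so by Ville's inequality it ever reaches $1/\alpha$ with probability at most $\alpha$. The resulting bound therefore holds simultaneously over all $t$.

The grid, the betting fractions, and the name `lower_confidence_sequence` are hypothetical choices made for illustration. Published methods use adaptive, predictable bets and give tighter bounds.

```python
import math
from typing import Iterable, List, Sequence


def lower_confidence_sequence(
    xs: Iterable[float],
    grid: Sequence[float],
    alpha: float = 0.05,
    fractions: Sequence[float] = (0.05, 0.1, 0.2, 0.4, 0.8),
) -> List[float]:
    """Running lower bounds L_1, L_2, ... for the mean of nonnegative X_i.

    Sketch guarantee: P(exists t with L_t > true mean) <= alpha (Ville's inequality).
    """
    if any(m <= 0 for m in grid) or any(not 0 < c < 1 for c in fractions):
        raise ValueError("grid values must be > 0 and fractions in (0, 1)")
    log_threshold = math.log(1.0 / alpha)
    # log-wealth for every (candidate m, betting fraction c); wealth starts at 1
    log_wealth = [[0.0] * len(fractions) for _ in grid]
    lower = 0.0
    bounds: List[float] = []
    for x in xs:
        if x < 0:
            raise ValueError("IPS terms must be nonnegative")
        for j, m in enumerate(grid):
            row = log_wealth[j]
            for k, c in enumerate(fractions):
                # bet on "mean > m": factor 1 + (c/m)(x - m) = 1 - c + c*x/m >= 1 - c > 0
                row[k] += math.log(1.0 - c + c * x / m)
            # log of the average wealth over fractions (log-sum-exp for stability)
            top = max(row)
            log_mix = top + math.log(sum(math.exp(v - top) for v in row) / len(row))
            if log_mix >= log_threshold:
                lower = max(lower, m)  # m rejected; running max keeps L_t nondecreasing
        bounds.append(lower)
    return bounds
```

Because wealth is decreasing in $m$ when every $X_i \ge 0$, rejecting any $m \ge V(\pi_e)$ would also mean rejecting $V(\pi_e)$ itself. This is why the guarantee needs no union bound over the grid. Since the bound is time-uniform, an analyst can check `bounds[t]` after every logged round and stop as soon as it clears a decision threshold.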
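The Policy Trees entry can be illustrated with its simplest case: exhaustive search over depth-1 trees. In the Athey & Wager framework, the tree is chosen to maximize the summed per-unit reward scores $\Gamma_i(a)$, which are typically doubly robust (AIPW) scores estimated beforehand. Here those scores are passed in as an assumed input. The names `fit_depth1_policy_tree` and `act` are hypothetical. Deeper trees require searching over combinations of splits, which is considerably more expensive.

```python
import math
from typing import List, Sequence, Tuple

# (feature, threshold, left_action, right_action, total_score); x[feature] <= threshold goes left
Tree = Tuple[int, float, int, int, float]


def fit_depth1_policy_tree(
    X: Sequence[Sequence[float]],
    scores: Sequence[Sequence[float]],
) -> Tree:
    """Exhaustively search depth-1 trees that maximize the summed scores of assigned actions."""
    n, p, k = len(X), len(X[0]), len(scores[0])
    total = [sum(scores[i][a] for i in range(n)) for a in range(k)]
    # Fallback: a single leaf assigning the best overall action to everyone.
    best_a = max(range(k), key=lambda a: total[a])
    best: Tree = (0, math.inf, best_a, best_a, total[best_a])
    for j in range(p):
        order = sorted(range(n), key=lambda i: X[i][j])
        left: List[float] = [0.0] * k  # cumulative scores of units left of the cut
        for pos in range(n - 1):
            for a in range(k):
                left[a] += scores[order[pos]][a]
            x_here, x_next = X[order[pos]][j], X[order[pos + 1]][j]
            if x_here == x_next:
                continue  # cannot cut between tied feature values
            la = max(range(k), key=lambda a: left[a])
            ra = max(range(k), key=lambda a: total[a] - left[a])
            value = left[la] + (total[ra] - left[ra])
            if value > best[4]:
                best = (j, x_here, la, ra, value)
    return best


def act(tree: Tree, x: Sequence[float]) -> int:
    """Action that the fitted depth-1 policy assigns to covariate vector x."""
    j, threshold, left_action, right_action, _ = tree
    return left_action if x[j] <= threshold else right_action
```

Sorting each feature once and sweeping cumulative sums keeps the search at roughly $O(p\,n\log n + p\,n\,k)$ for $p$ features, $n$ units, and $k$ actions.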