Anytime-Valid OPE
Definition
Off-policy evaluation (OPE) whose uncertainty statements hold uniformly over time: it produces a confidence sequence for the target policy's value that remains valid at any data-dependent stopping time. The construction is based on e-processes and confidence sequences.
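The guarantee can be written out as follows. The notation is an assumption of this note rather than something fixed by the reference: $v(\pi)$ is the target policy's value, $C_t$ the confidence set after $t$ logged observations, $\alpha$ the error level, and $E_t(v')$ an e-process for the hypothesis "the value equals $v'$".

```latex
% Time-uniform coverage, which implies coverage at any stopping time \tau:
\Pr\bigl(\forall t \ge 1 :\ v(\pi) \in C_t\bigr) \ge 1 - \alpha
\quad\Longrightarrow\quad
\Pr\bigl(v(\pi) \in C_\tau\bigr) \ge 1 - \alpha .

% Ville's inequality for an e-process (evaluated at the true value v(\pi)):
\Pr\bigl(\exists t \ge 1 :\ E_t(v(\pi)) \ge 1/\alpha\bigr) \le \alpha .

% Inverting the test over candidate values v' gives the confidence sequence:
C_t = \bigl\{\, v' :\ E_t(v') < 1/\alpha \,\bigr\} .
```

The third line is how an e-process turns into a confidence sequence: a candidate value stays in $C_t$ until the evidence against it crosses $1/\alpha$.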
Intuitive Understanding
You can “keep peeking” at the estimated policy value as logged data accumulates, and stop or act whenever you like, without breaking coverage. This makes the approach well suited to online monitoring.
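A minimal sketch of the "peeking" idea, under loud assumptions: this is a generic betting-style confidence sequence applied to importance-weighted rewards, not necessarily the construction in the reference below. The function names, the two-action toy problem, the fixed bet `lam`, and the grid resolution are all hypothetical choices for illustration. It assumes known logging propensities, rewards in [0, 1], and a known upper bound `w_max` on the importance weights.

```python
import numpy as np


def simulate_logged_data(n, seed=0):
    """Toy two-action bandit log (hypothetical setup, for illustration only)."""
    rng = np.random.default_rng(seed)
    logging = np.array([0.5, 0.5])      # behavior policy (known propensities)
    target = np.array([0.2, 0.8])       # policy we want to evaluate
    reward_mean = np.array([0.3, 0.7])  # Bernoulli reward means per action
    actions = rng.choice(2, size=n, p=logging)
    rewards = rng.binomial(1, reward_mean[actions]).astype(float)
    weights = target[actions] / logging[actions]
    w_max = float((target / logging).max())
    true_value = float(target @ reward_mean)
    # Importance-weighted rewards have mean equal to the target policy's value.
    return weights * rewards, w_max, true_value


def betting_confidence_sequence(x, alpha=0.05, lam=0.5, grid_size=1001):
    """Confidence sequence for the mean of observations x in [0, 1].

    For each candidate mean m on a grid, track two wealth processes that bet
    a fixed fraction lam (0 < lam < 1) on "mean > m" and on "mean < m".
    Their average is a nonnegative martingale when m is the true mean, so by
    Ville's inequality it exceeds 1/alpha with probability at most alpha.
    """
    grid = np.linspace(0.0, 1.0, grid_size)
    log_up = np.zeros(grid_size)
    log_down = np.zeros(grid_size)
    alive = np.ones(grid_size, dtype=bool)
    threshold = np.log(1.0 / alpha)
    lower, upper = [], []
    for x_t in x:
        log_up += np.log1p(lam * (x_t - grid))
        log_down += np.log1p(-lam * (x_t - grid))
        log_wealth = np.logaddexp(log_up, log_down) - np.log(2.0)
        # Running intersection: once a candidate is rejected it stays rejected.
        alive &= log_wealth < threshold
        if not alive.any():
            raise ValueError("all candidates rejected; grid may be too coarse")
        kept = grid[alive]
        lower.append(kept.min())
        upper.append(kept.max())
    return np.array(lower), np.array(upper)


if __name__ == "__main__":
    ips, w_max, true_value = simulate_logged_data(2000)
    # Rescale to [0, 1], build the sequence, then map back to value units.
    lo, hi = betting_confidence_sequence(ips / w_max)
    for t in (100, 500, 2000):
        print(t, lo[t - 1] * w_max, hi[t - 1] * w_max, true_value)
```

Every printed interval belongs to the same sequence, so reading it at any of these times (or stopping at a time chosen after looking) does not spend extra error budget. A fixed bet keeps the sketch short but leaves the interval wider than necessary; practical constructions typically choose the bet adaptively from past data.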
Related Concepts
- Anytime-Valid Inference Overview ← hub
- Off-Policy Evaluation · Confidence Sequence · e-process · Doubly Robust OPE
References
- (forthcoming) Waudby-Smith, Wu, Ramdas et al. 2024 — Study Roadmap §Track 3