Tae Hyun Kim (Lowell)

Anytime-Valid OPE

1 min read #decision-making #anytime-valid #ope

Definition

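Anytime-valid off-policy evaluation (OPE) estimates the value of a target policy from logged data using time-uniform confidence sequences. These intervals cover the true policy value simultaneously at all times, so they remain valid at any stopping time, including data-dependent ones. The construction is based on e-processes and confidence sequences.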

Intuitive Understanding

Even if you “keep peeking” at the estimated policy value as logged data accumulate, and stop whenever you like, coverage does not break. This makes the approach well suited to online monitoring.
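The peeking claim can be checked by simulation. The sketch below uses a hypothetical two-action bandit whose IPW terms lie in [0, 1.6]. It monitors the running IPW estimate at every step and records whether the interval ever misses the true value. It compares two intervals: a fixed-sample 95% CLT interval, naively recomputed at each look, and a normal-mixture (Robbins-style) confidence sequence. The confidence sequence assumes 0.8-sub-Gaussian increments, which Hoeffding's lemma gives for a range of 1.6. This closed-form mixture boundary is a simpler stand-in for the betting construction, not the cited method, and the constants `ALPHA`, `SIGMA`, `T0`, `N` and `RUNS` are assumptions of the sketch.

```python
import numpy as np

ALPHA, SIGMA, N, RUNS, T0 = 0.05, 0.8, 2000, 500, 100
TRUE_VALUE = 0.62  # V(pi) = 0.2 * 0.3 + 0.8 * 0.7 in this toy bandit

rng = np.random.default_rng(1)
a = rng.integers(0, 2, size=(RUNS, N))            # uniform logging, mu = 1/2
r = rng.binomial(1, np.where(a == 1, 0.7, 0.3))   # reward means 0.3 / 0.7
phi = np.where(a == 1, 0.8, 0.2) / 0.5 * r        # IPW terms, bounded in [0, 1.6]

t = np.arange(1, N + 1)
mean = np.cumsum(phi, axis=1) / t                 # running IPW estimate per run

# Naive: a fixed-n 95% CLT interval, recomputed and checked at every look.
sq = np.cumsum(phi ** 2, axis=1)
sd = np.sqrt(np.maximum(sq - t * mean ** 2, 0.0) / np.maximum(t - 1, 1))
naive_radius = 1.96 * sd / np.sqrt(t)

# Normal-mixture CS: mixing lambda ~ N(0, tau2) over the sub-Gaussian
# supermartingales exp(lambda * S_t - lambda^2 * V_t / 2), with V_t = SIGMA^2 * t,
# and inverting "mixture >= 1 / ALPHA" gives this radius for the running mean.
tau2 = 1.0 / (SIGMA ** 2 * T0)                    # any tau2 > 0 is valid
v = SIGMA ** 2 * t
cs_radius = np.sqrt(2 * (v + 1 / tau2) * (np.log(1 / ALPHA) + 0.5 * np.log1p(tau2 * v))) / t


def ever_missed(radius, start=10):
    """Fraction of runs whose interval excludes TRUE_VALUE at some look t >= start."""
    miss = np.abs(mean - TRUE_VALUE) > radius
    return miss[:, start - 1:].any(axis=1).mean()


naive_rate = ever_missed(naive_radius)
cs_rate = ever_missed(cs_radius)
print(f"P(ever miss): naive {naive_rate:.2f}, confidence sequence {cs_rate:.2f}")
```

With repeated looks, the naive interval's chance of ever missing grows well past its nominal 5%. Ville's inequality caps the confidence sequence's miss probability at `ALPHA` for any horizon. The price is a wider interval at each fixed t.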

References

  • (forthcoming) Waudby-Smith, Wu, Ramdas et al. 2024; see Study Roadmap §Track 3
