#bandits

3 notes

Contextual Bandits: A contextual bandit is a multi-armed bandit problem in which the optimal action (arm) depends on a context that the learner observes before each decision.
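
The simplest way to see "the best arm depends on the context" is to keep one reward estimate per (context, arm) pair. The sketch below makes several assumptions that the note does not fix: a small discrete set of contexts, and epsilon-greedy exploration, which is only one of many possible strategies. The class `EpsilonGreedyContextualBandit`, the context labels, and the arm means are all hypothetical.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Hypothetical tabular contextual bandit: one running-mean estimate per (context, arm)."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Unseen contexts start with zero pulls and zero estimated value for every arm.
        self.counts = defaultdict(lambda: [0] * n_arms)
        self.values = defaultdict(lambda: [0.0] * n_arms)

    def select(self, context):
        # Explore uniformly with probability epsilon; otherwise exploit the best
        # estimate for THIS context (ties go to the lowest arm index).
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        values = self.values[context]
        return max(range(self.n_arms), key=lambda a: values[a])

    def update(self, context, arm, reward):
        # Incremental mean update: v <- v + (r - v) / n
        self.counts[context][arm] += 1
        n = self.counts[context][arm]
        self.values[context][arm] += (reward - self.values[context][arm]) / n


# Hypothetical environment: arm 0 is best in the morning, arm 1 in the evening.
true_means = {"morning": [0.8, 0.2], "evening": [0.1, 0.9]}
env_rng = random.Random(0)
bandit = EpsilonGreedyContextualBandit(n_arms=2, epsilon=0.1, seed=1)
for _ in range(2000):
    ctx = env_rng.choice(["morning", "evening"])
    arm = bandit.select(ctx)
    reward = 1.0 if env_rng.random() < true_means[ctx][arm] else 0.0
    bandit.update(ctx, arm, reward)
```

After enough rounds the greedy choice differs by context. A context-free bandit could not do this, because it would have to commit to a single arm overall.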
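Multi-Armed Bandits: There are $K$ arms with mean rewards $\mu_1, \dots, \mu_K$. At each round $t = 1, \dots, T$, the learner pulls an arm $A_t$ and observes a reward $X_t$. The goal is to minimize the cumulative regret $R_T = T\mu^* - \mathbb{E}\left[\sum_{t=1}^{T} X_t\right]$, where $\mu^* = \max_a \mu_a$.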
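
To watch regret accumulate, one can simulate a policy and sum the per-round gap $\mu^* - \mu_{A_t}$ (the pseudo-regret). In the stochastic setting its expectation equals the expected cumulative regret. The sketch below assumes Bernoulli arms with hypothetical success probabilities. It uses the classic UCB1 index policy purely for illustration, since the note does not name a policy; `ucb1` and `pseudo_regret` are hypothetical helpers.

```python
import math
import random


def pseudo_regret(means, pulls):
    """Sum over rounds of (best arm's mean - mean of the arm actually pulled)."""
    best = max(means)
    return sum(best - means[a] for a in pulls)


def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with the given success probabilities; return the pulled arms."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    pulls = []
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once so every estimate is defined
        else:
            # Empirical mean plus exploration bonus sqrt(2 ln t / n_a).
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0  # simulated Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
        pulls.append(arm)
    return pulls


means = [0.3, 0.5, 0.7]  # hypothetical arm means; arm 2 is optimal
pulls = ucb1(means, horizon=5000)
print(pseudo_regret(means, pulls))
```

The exploration bonus shrinks like $\sqrt{\ln t / n_a}$, so UCB1 pulls suboptimal arms only logarithmically often in expectation. The printed regret should therefore be far below that of uniform random play, which here costs $0.2$ per round on average.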
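Thompson Sampling: A Bayesian approach that balances exploration and exploitation. At each round it draws one sample from the posterior over each arm's expected reward and pulls the arm whose sample is largest, so each arm is chosen with the posterior probability that it is optimal.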
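
Below is a minimal sketch for Bernoulli rewards. It assumes a conjugate Beta(1, 1) (uniform) prior on each arm's success probability, whereas the note itself fixes neither a reward model nor a prior. The function `thompson_sampling` and the arm means are hypothetical. `random.betavariate` is the Python standard-library Beta sampler.

```python
import random


def thompson_sampling(means, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling; returns pulled arms and final Beta posterior parameters."""
    rng = random.Random(seed)
    k = len(means)
    alpha = [1.0] * k  # Beta(1, 1) prior = uniform over each arm's success probability
    beta = [1.0] * k
    pulls = []
    for _ in range(horizon):
        # Draw one plausible success rate per arm from its posterior, then act greedily on the draws.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        reward = 1 if rng.random() < means[arm] else 0  # simulated Bernoulli reward
        # Conjugate update: a success increments alpha, a failure increments beta.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls.append(arm)
    return pulls, alpha, beta


pulls, alpha, beta = thompson_sampling([0.2, 0.5, 0.75], horizon=3000)
posterior_means = [a / (a + b) for a, b in zip(alpha, beta)]
```

Exploration emerges without an explicit schedule. A rarely pulled arm has a wide posterior and occasionally produces the largest sample, so it gets tried. An arm that is confidently good wins most draws, so it gets exploited.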