#bandits
3 notes
- Contextual Bandits: a multi-armed bandit problem in which the optimal action (arm) varies depending on the context.
- Multi-Armed Bandits: $K$ arms; at each round $t$, pull arm $A_t$ and observe a reward. Minimize cumulative regret:
- Thompson Sampling: a Bayesian approach that balances exploration and exploitation.
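
To make the three notes above concrete, here is a minimal sketch of Thompson Sampling on a $K$-armed bandit. It rests on assumptions that the notes do not state: rewards are Bernoulli, each arm starts from a uniform Beta(1, 1) prior, and regret is measured as pseudo-regret (the gap between the best arm's mean and the pulled arm's mean, summed over rounds). The names `thompson_sampling`, `true_means`, and `rounds` are illustrative. The contextual case is not covered here.

```python
import random


def thompson_sampling(true_means, rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling (illustrative sketch).

    true_means: hypothetical success probability of each arm (unknown to the agent).
    Returns (pulls per arm, cumulative pseudo-regret).
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k  # observed rewards of 1 per arm
    failures = [0] * k   # observed rewards of 0 per arm
    pulls = [0] * k
    best_mean = max(true_means)
    regret = 0.0

    for _ in range(rounds):
        # Draw one sample per arm from its Beta posterior (assumed Beta(1, 1) prior).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1) for a in range(k)
        ]
        # Pull the arm whose sampled mean is largest.
        arm = max(range(k), key=lambda a: samples[a])

        # Observe a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_means[arm] else 0
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1

        pulls[arm] += 1
        # Pseudo-regret: expected loss relative to always pulling the best arm.
        regret += best_mean - true_means[arm]

    return pulls, regret


if __name__ == "__main__":
    pulls, regret = thompson_sampling([0.2, 0.5, 0.8], rounds=2000)
    print("pulls per arm:", pulls)
    print("cumulative pseudo-regret:", round(regret, 2))
```

Sampling from the posterior, rather than always picking the arm with the highest estimated mean, is what provides exploration: arms with few observations have wide posteriors and occasionally produce the largest sample.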