- Thompson Sampling for Contextual Bandits with Linear Payoffs
- Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
- A Survey on Contextual Multi-armed Bandits
- A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit
- Variational inference for the multi-armed contextual bandit
- Medoids in almost linear time via multi-armed bandits
- Learning Structural Weight Uncertainty for Sequential Decision-Making
- Contextual Bandits with Stochastic Experts
- Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling
- Semiparametric Contextual Bandits
- Learning Contextual Bandits in a Non-stationary Environment
- Myopic Bayesian Design of Experiments via Posterior Sampling and Probabilistic Programming
- Greybox fuzzing as a contextual bandits problem
- On-line Adaptative Curriculum Learning for GANs
- Machine Teaching of Active Sequential Learners
- Decentralized Cooperative Stochastic Bandits
- Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modeling
- Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging
- Practical Bayesian Neural Networks via Adaptive Optimization Methods
- Adapting multi-armed bandits policies to contextual bandits scenarios
- Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback
- The Assistive Multi-Armed Bandit
- From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization
- Batched Multi-armed Bandits Problem
- Introduction to Multi-Armed Bandits
- Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems
- Model selection for contextual bandits
- Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards
- Empirical Likelihood for Contextual Bandits
- Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning
- Bayesian Optimisation over Multiple Continuous and Categorical Inputs
- Censored Semi-Bandits: A Framework for Resource Allocation with Censored Feedback
- Practical Calculation of Gittins Indices for Multi-armed Bandits
- Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
- Thompson Sampling via Local Uncertainty
- Persistency of Excitation for Robustness of Neural Networks
- Neural Contextual Bandits with UCB-based Exploration
- Safe Exploration for Optimizing Contextual Bandits
- Adaptive Estimator Selection for Off-Policy Evaluation
- Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation
- Thompson Sampling for Linearly Constrained Bandits
- An Empirical Study of Human Behavioral Agents in Bandits, Contextual Bandits and Reinforcement Learning
- Gaussian Gated Linear Networks
- Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior
- Finding All ε-Good Arms in Stochastic Bandits
- Recurrent Neural-Linear Posterior Sampling for Non-Stationary Contextual Bandits
- Lenient Regret for Multi-Armed Bandits
- Using Subjective Logic to Estimate Uncertainty in Multi-Armed Bandit Problems
- Carousel Personalization in Music Streaming Apps with Contextual Bandits
- Dual-Mandate Patrols: Multi-Armed Bandits for Green Security
- Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward
- Offline Contextual Bandits with High Probability Fairness Guarantees
- Thompson Sampling for Multinomial Logit Contextual Bandits
- Residual Loss Prediction: Reinforcement Learning with no Incremental Feedback
- SIC-MMAB: Synchronisation Involves Communication in Multiplayer Multi-Armed Bandits
maryamhsnv/Multi-Armed-Bandits-Papers
"At some point we have to give up and say that's just the way it is. Or, not give up and push on." ― Leonard Susskind