Randomized Exploration for Linear Bandits via Absolute Perturbations

arXiv:2606.28616v1 · Announce Type: new

Abstract: In stochastic linear bandits, the canonical Upper Confidence Bound (UCB) algorithm admits a simple frequentist regret analysis but can be computationally demanding, while Thompson Sampling (TS) is computationally attractive yet typically harder to analyze due to its non-optimistic nature. We propose Absolute Thompson Sampling (ATS), a simple modification of TS that ensures optimism in expectation by replacing the signed exploration noise with its a...

arXiv cs.LG · Toshinori Kitamura, Shuai Liu, Csaba Szepesvári
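To make the contrast between the two baselines concrete, here is a minimal sketch of one round of arm selection over a finite arm set, using ridge regression with design matrix V = lam*I + X^T X. UCB scores each arm by x^T theta_hat + beta * ||x||_{V^{-1}}, while linear TS draws a signed Gaussian perturbation theta_tilde ~ N(theta_hat, beta^2 V^{-1}) and acts greedily, so its sampled index has mean x^T theta_hat and is not optimistic on average. With a finite arm set both steps are cheap; UCB's cost becomes an issue on large or continuous action sets, where it must maximize jointly over actions and the confidence ellipsoid, whereas TS needs only one linear maximization. The helper names, `beta`, `lam`, and the synthetic data are illustrative assumptions, and this sketch does not implement ATS, whose exact construction is not given in this excerpt.

```python
import numpy as np


def ridge_stats(X, y, lam=1.0):
    """Regularized least squares: V = lam*I + X^T X, theta_hat = V^{-1} X^T y."""
    d = X.shape[1]
    V = lam * np.eye(d) + X.T @ X
    theta_hat = np.linalg.solve(V, X.T @ y)
    return theta_hat, V


def ucb_choice(arms, theta_hat, V, beta):
    """Optimistic index: x^T theta_hat + beta * ||x||_{V^{-1}} for every arm."""
    V_inv = np.linalg.inv(V)
    # widths[k] = sqrt(x_k^T V^{-1} x_k)
    widths = np.sqrt(np.einsum("kd,de,ke->k", arms, V_inv, arms))
    return int(np.argmax(arms @ theta_hat + beta * widths))


def ts_choice(arms, theta_hat, V, beta, rng):
    """Linear TS: perturb theta_hat with signed Gaussian noise, then act greedily."""
    d = theta_hat.shape[0]
    # L L^T = V^{-1}, so L @ xi ~ N(0, V^{-1}) when xi ~ N(0, I).
    L = np.linalg.cholesky(np.linalg.inv(V))
    xi = rng.standard_normal(d)  # signed exploration noise
    theta_tilde = theta_hat + beta * (L @ xi)
    return int(np.argmax(arms @ theta_tilde))


# Usage on synthetic data (hypothetical problem instance).
rng = np.random.default_rng(0)
theta_star = np.array([1.0, -0.5, 0.25])
arms = rng.standard_normal((20, 3))
X = arms[rng.integers(0, 20, size=200)]
y = X @ theta_star + 0.1 * rng.standard_normal(200)
theta_hat, V = ridge_stats(X, y)
print(ucb_choice(arms, theta_hat, V, beta=1.0),
      ts_choice(arms, theta_hat, V, beta=1.0, rng=rng))
```

Setting `beta=0` makes both rules collapse to the greedy choice argmax_x x^T theta_hat, which is a quick sanity check that the exploration terms are the only difference between them.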