Gittins Index

The first solution proposed for the multi-armed bandit problem

Why the Gittins Index might not be the best solution

For one, the Gittins index is optimal only under some strong assumptions. It’s based on geometric discounting of future reward, valuing each pull at a constant fraction of the previous one, which a variety of experiments in behavioral economics and psychology suggest people don’t do. And if there’s a cost to switching among options, the Gittins strategy is no longer optimal either. Perhaps even more importantly, the Gittins index is hard to compute on the fly.
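To make "a constant fraction of the previous one" concrete, here is a minimal sketch of geometric discounting in Python. It only illustrates the discounting assumption and does not compute the Gittins index itself. The function names and the discount factor `gamma` are illustrative choices, not something from the original note.

```python
def discount_weights(n_pulls, gamma):
    """Weight given to each pull: 1, gamma, gamma**2, ...

    gamma is an assumed discount factor with 0 < gamma < 1.
    """
    return [gamma ** t for t in range(n_pulls)]


def discounted_value(rewards, gamma):
    """Total value of a reward sequence under geometric discounting."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))


weights = discount_weights(3, 0.5)
print(weights)                               # [1.0, 0.5, 0.25]
print(discounted_value([1, 1, 1], 0.5))      # 1.75
```

Each weight is exactly `gamma` times the one before it, regardless of how far in the future the pull is. That constant ratio is the assumption the paragraph above says people do not seem to follow in practice.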

uid: 202006071552 tags: #writeup #algorithms

Date

February 22, 2023
