Gittins Index
The first solution proposed for the multi-armed bandit problem
Why the Gittins Index might not be the best solution
For one, the Gittins index is optimal only under some strong assumptions. It’s based on geometric discounting of future reward, valuing each pull at a constant fraction of the previous one, which a variety of experiments in behavioral economics and psychology suggest people don’t do. And if there’s a cost to switching among options, the Gittins strategy is no longer optimal either. Perhaps even more importantly, it’s hard to compute the Gittins index on the fly.
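To make the geometric discounting assumption concrete, here is a minimal sketch. The function name `discounted_value` and the discount factor 0.9 are illustrative assumptions, not something from the source. It only shows how future pulls are weighted; it does not compute a Gittins index.

```python
def discounted_value(rewards, discount=0.9):
    """Total value of a reward sequence under geometric discounting.

    The pull at step t (starting from 0) is weighted by discount**t,
    so each pull counts for a constant fraction of the previous one.
    """
    return sum(r * discount**t for t, r in enumerate(rewards))


# Weights applied to the first few pulls (hypothetical discount of 0.9).
weights = [0.9**t for t in range(4)]

# The ratio between consecutive weights is always the discount factor.
ratios = [weights[t + 1] / weights[t] for t in range(len(weights) - 1)]
```

Under this weighting a reward of 1 now is always worth more than a reward of 1 later, and by the same constant ratio at every step. That constant ratio is the specific assumption the note questions.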
uid: 202006071552 tags: #writeup #algorithms