Multi-Armed Bandit Problem

In probability theory and machine learning, the multi-armed bandit problem (sometimes called the K-[1] or N-armed bandit problem[2]) is a problem in which a fixed, limited set of resources must be allocated among competing (alternative) choices so as to maximize the expected gain. Each choice's properties are only partially known at the time of allocation, and may become better understood as time passes or as resources are allocated to that choice.[3][4] It is a classic reinforcement learning problem that exemplifies the exploration–exploitation tradeoff: pulling the arm that currently looks best (exploitation) competes with pulling other arms to learn more about them (exploration).
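
To make the tradeoff concrete, here is a minimal sketch of a Bernoulli bandit played with an epsilon-greedy rule. Epsilon-greedy is not discussed in this note; it is used here only as the simplest illustration. The function name `run_epsilon_greedy`, its parameters, and the choice of Bernoulli rewards are all assumptions made for the example.

```python
import random

def run_epsilon_greedy(true_means, horizon, epsilon, seed=0):
    """Play a Bernoulli bandit with epsilon-greedy (illustrative sketch).

    true_means: hypothetical success probability of each arm (unknown to the player).
    Returns (total_reward, pull_counts).
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k        # how often each arm has been pulled
    estimates = [0.0] * k   # running mean of observed rewards per arm
    total = 0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick an arm uniformly at random
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit: best estimate so far
        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total, counts

if __name__ == "__main__":
    # epsilon = 0 never explores, so it can stay stuck on the first arm it tries.
    print(run_epsilon_greedy([0.0, 1.0], horizon=1000, epsilon=0.0))
    print(run_epsilon_greedy([0.0, 1.0], horizon=1000, epsilon=0.1))
```

With `epsilon=0.0` the player never tries the second arm, which is the failure mode that exploration is meant to avoid; a small positive `epsilon` trades a few wasted pulls for the chance to discover the better arm.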

Solutions and Algorithms

Gittins Index: 202006071552
Upper Confidence Bound algorithms: 202006071602
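
As a rough illustration of the upper confidence bound idea, the sketch below implements the standard UCB1 rule on Bernoulli arms: after pulling every arm once, it picks the arm maximizing the sample mean plus the bonus sqrt(2 ln t / n). This is one common member of the UCB family and not necessarily the variant covered in the linked note; `run_ucb1` and its parameters are names assumed for the example.

```python
import math
import random

def run_ucb1(true_means, horizon, seed=0):
    """Play a Bernoulli bandit with UCB1 (illustrative sketch).

    true_means: hypothetical success probability of each arm (unknown to the player).
    Returns (total_reward, pull_counts).
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k        # how often each arm has been pulled
    estimates = [0.0] * k   # running mean of observed rewards per arm
    total = 0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialisation: pull each arm once
        else:
            # Optimism under uncertainty: the bonus shrinks as an arm is pulled more.
            arm = max(
                range(k),
                key=lambda a: estimates[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total, counts

if __name__ == "__main__":
    total, counts = run_ucb1([0.2, 0.8], horizon=2000)
    print(total, counts)
```

Unlike a fixed exploration rate, the confidence bonus directs exploration toward arms that have been sampled least, so clearly inferior arms are pulled less and less often as evidence accumulates.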

uid: 202006071550 tags: #algorithms

Date

February 22, 2023
