Upper Confidence Bound Algorithms
Rather than performing exploration by simply selecting an arbitrary action, chosen with a probability that remains constant, the UCB algorithm changes its exploration-exploitation balance as it gathers more knowledge of the environment. It moves from being primarily focused on exploration, when actions that have been tried the least are preferred, to instead concentrate on exploitation, selecting the action with the highest estimated reward.
This is a theoretical note, but pretty cool that they have an application towards how I live my life: 202006071608
uid: 202006071602 tags: #algorithms #writeup