Bandit Algorithms Book [pdf]

pronoiac · on July 31, 2018

This came up a couple of days ago: https://news.ycombinator.com/item?id=17637683

clickok · on July 31, 2018

I skimmed through this and have already found a bunch of interesting sections, but there's also a ton of background information on topics related to bandit algorithms.

The authors say that this is the first draft of the book submitted to the publisher, so I suppose it's nearly complete? More details available at the site they put up, http://banditalgs.com/

tomkat0789 · on July 31, 2018

Never heard of bandit algorithms before! Or if I did, I didn't recognize them as something distinct from probability. What have people around here used them for?

haffi112 · on July 31, 2018

You can use them to find the best of several options being tested in as few trials as possible.

Say you are selling a product and you are A/B testing something related to buying it. When a user visits the site, you ideally want to show him the version you are more confident is better. With a bandit approach you can determine whether, say, option A is currently better (w.r.t. some confidence bounds). After each visit you update the bounds, and after sufficiently many visits you have a winner. The main difference from more traditional A/B testing is that the process is adaptive, so less time is wasted exposing users to an inferior version.
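A minimal sketch of the adaptive loop described above, assuming UCB1 as the confidence-bound rule (the comment doesn't name a specific algorithm). The function names and conversion rates are hypothetical, and visitor behavior is simulated rather than taken from a real site:

```python
import math
import random


def ucb1_select(counts, rewards, t):
    """Pick the variant with the highest upper confidence bound (UCB1).

    counts[i]  -- how many visitors have seen variant i
    rewards[i] -- total conversions observed for variant i
    t          -- total number of visitors so far (1-based)
    """
    # Show every variant once before trusting the bounds.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    # Empirical conversion rate plus an exploration bonus that shrinks
    # as a variant is shown more often.
    scores = [
        rewards[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
        for i in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_ab_test(true_rates, visitors, seed=0):
    """Simulate an adaptive A/B test; true_rates are hypothetical conversion probabilities."""
    rng = random.Random(seed)
    counts = [0] * len(true_rates)
    rewards = [0] * len(true_rates)
    for t in range(1, visitors + 1):
        arm = ucb1_select(counts, rewards, t)
        converted = rng.random() < true_rates[arm]  # simulated visitor outcome
        counts[arm] += 1
        rewards[arm] += int(converted)
    return counts, rewards


if __name__ == "__main__":
    counts, rewards = run_ab_test([0.04, 0.06], visitors=10_000)
    print("visitors per variant:", counts)
    print("conversions per variant:", rewards)
```

Unlike a fixed 50/50 split, the share of traffic sent to the weaker variant shrinks over time as its upper bound falls below the leader's.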

bochi · on July 31, 2018

Bandits are probably one of the most underrated machine learning algorithms. One possible application is recommendation systems. Shameless self-promotion: I wrote an article about it: https://towardsdatascience.com/how-not-to-sort-by-popularity...

mlechha · on July 31, 2018

They're probably the most fundamental kind of reinforcement learning algorithms. Understanding bandit algorithms is crucial to developing a good understanding of RL.

zdkl · on July 31, 2018

This Rust project uses one to manage the number of threads in a Monero miner, AFAIR: https://github.com/Ragnaroek/mithril

ur-whale · on July 31, 2018

Doesn't AlphaGo use some form of bandit algorithm in its Monte Carlo code?

magoghm · on July 31, 2018

I believe that Monte Carlo Tree Search, as used in AlphaGo, does rely on bandit algorithms. On top of that, AlphaGo uses Reinforcement Learning, which also builds on bandit algorithms (in Sutton & Barto's book, "Reinforcement Learning: An Introduction", all of chapter 2 is about multi-armed bandits).
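To make the MCTS connection concrete, here is a hedged sketch of the textbook UCT selection step, which treats each child of a tree node as a bandit arm and scores it with UCB1. AlphaGo itself used a modified rule that also weights moves by policy-network priors, so this is not its actual code; the `Node` class and the exploration constant `c` are illustrative assumptions:

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Minimal MCTS tree node (hypothetical structure for illustration)."""
    visits: int = 0
    total_value: float = 0.0  # sum of playout results backed up through this node
    children: list = field(default_factory=list)


def uct_select(parent: Node, c: float = math.sqrt(2)) -> Node:
    """Choose a child with UCB1 applied to the tree (the UCT rule).

    Each child is a bandit arm: its average value is the exploitation term,
    and c * sqrt(ln N_parent / N_child) is the exploration bonus.
    Assumes the parent has at least one child.
    """
    # Unvisited children have an unbounded score, so try them first.
    for child in parent.children:
        if child.visits == 0:
            return child
    log_n = math.log(parent.visits)
    return max(
        parent.children,
        key=lambda ch: ch.total_value / ch.visits + c * math.sqrt(log_n / ch.visits),
    )


if __name__ == "__main__":
    root = Node(visits=10, children=[Node(visits=6, total_value=4.0),
                                     Node(visits=4, total_value=1.0)])
    best = uct_select(root)
    print("chosen child index:", root.children.index(best))
```

Raising `c` shifts the choice toward less-visited children, which is the same exploration/exploitation trade-off as in a plain bandit.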

shoo · on July 31, 2018

Readers who enjoy banditry may also enjoy John Langford's http://hunch.net

joshuamorton · on July 31, 2018

It always makes me sad that Thompson Sampling isn't (or at least doesn't appear to be) mentioned alongside things like UCB1. It's theoretically optimal, relatively easy to grok, and not significantly more difficult to implement.
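For comparison with UCB1, a minimal Thompson Sampling sketch for Bernoulli rewards, assuming a uniform Beta(1, 1) prior on each arm's success rate. The function names and simulated rates are hypothetical:

```python
import random


def thompson_select(successes, failures, rng=random):
    """Thompson Sampling for Bernoulli arms with a uniform Beta(1, 1) prior.

    Draw one plausible success rate per arm from its Beta posterior
    and play the arm whose draw is highest.
    """
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run(true_rates, rounds, seed=0):
    """Simulate Thompson Sampling against hypothetical Bernoulli arms."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(rounds):
        arm = thompson_select(successes, failures, rng)
        if rng.random() < true_rates[arm]:  # simulated reward
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    s, f = run([0.3, 0.5, 0.6], rounds=5000)
    print("pulls per arm:", [a + b for a, b in zip(s, f)])
```

Where UCB1 adds a deterministic confidence bonus, Thompson Sampling explores through the randomness of the posterior draw: arms with little data have wide posteriors and so occasionally win the draw.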

dsvmn · on July 31, 2018

I really appreciate you sharing the book. However, to everyone in charge of naming these files: please don't call it "book.pdf". It makes everyone go to their computer and rename the file after downloading it so that they can find it later. Give it a more descriptive name.

Thanks

daleroberts · on July 31, 2018

Cool, nice to see that Tor was a student at ANU.

_gjrn · on July 31, 2018

Is this the book that is going to make me a master poker player?

srean · on July 31, 2018

If you play long enough, it will make you regret less.

sureaboutthis · on July 31, 2018

Well that's really great! What is it?