
This is interesting for its breadth, but they are abstracting over a huge set of gradient-climbing behaviors that I'm not sure all belong to the same class. Also, this has been known forever (since before I was born) in game theory / repeated games as the "exploration vs. exploitation" trade-off.
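For readers unfamiliar with the trade-off: the textbook illustration is a multi-armed bandit solved with an epsilon-greedy policy, where each step either explores a random arm or exploits the best estimate so far. A minimal sketch (the arm means, epsilon, and noise model here are all illustrative choices, not from the article):

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Epsilon-greedy bandit: with probability epsilon pull a random
    arm (explore); otherwise pull the arm with the highest estimated
    mean reward (exploit)."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n
    estimates = [0.0] * n
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: uniform over arms
        else:
            arm = max(range(n), key=lambda i: estimates[i])  # exploit
        reward = true_means[arm] + rng.gauss(0, 1)  # noisy observed reward
        counts[arm] += 1
        # incremental running mean of observed rewards for this arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return estimates, total / steps

estimates, avg_reward = epsilon_greedy_bandit([0.1, 0.5, 0.9])
```

With enough steps the estimates converge toward the true arm means, and the policy spends most of its pulls on the best arm while still occasionally sampling the others.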

But it's not surprising to me that a mathematically simple, yet provably convergent algorithm appears over and over. For example, reward maximization in repeated Bayesian games looks a lot like "run and tumble", once you add this critical step (from OP):

> When a “tumble” occurs, rather than sampling a new direction from a uniformly random distribution, we sample according to the distribution of expected rewards
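A minimal sketch of that modified tumble step. The quote only says the new direction is sampled "according to the distribution of expected rewards"; the softmax weighting and temperature parameter below are my assumptions about one reasonable way to turn reward estimates into a sampling distribution:

```python
import math
import random

def tumble(expected_rewards, temperature=1.0, rng=random):
    """Reward-weighted 'tumble': pick a new heading index with
    probability proportional to exp(reward / temperature), instead of
    uniformly at random. (Softmax weighting is an assumption; the
    source only says to sample 'according to the distribution of
    expected rewards'.)"""
    weights = [math.exp(r / temperature) for r in expected_rewards]
    total = sum(weights)
    probs = [w / total for w in weights]
    # rng.choices returns a list of k samples; take the single draw
    return rng.choices(range(len(expected_rewards)), weights=probs, k=1)[0]
```

Directions with higher expected reward are chosen more often, but every direction keeps nonzero probability, so the agent still explores while biasing its tumbles toward what has paid off.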

You may also want to read the book Algorithms to Live By.



