Learning Reinforcement Learning, with Code, Exercises, and Solutions (wildml.com)
305 points by dennybritz on Oct 2, 2016 | 21 comments



I remember reading a comment a while ago on this site observing that people on HN seem to have a positive bias toward machine learning topics and will always upvote them. ML seems to be such a tech buzzword these days. This blog post is at the top of the site, yet there isn't a single comment of discussion. I'm not trying to sound negative; I was just wondering whether other people share the same impression.

Regarding the post: this seems like a useful resource. When I read many of these papers, a code supplement makes them so much easier to understand. I do a lot of research with RNNs for language modeling, and implementing various models when I started out in this field was very useful for building a better understanding. I also got great feedback on my own implementations from people who said they helped them understand the papers.


I have a hunch that learning resources get a significant bump from people using upvotes as bookmarks.

I think the ML results and demo posts are mostly getting upvotes because the results are fascinating and the posts are often excellently written. Image manipulation in particular lends itself towards posts that are appealing both to people casually skimming articles and to people looking for some technical depth.


Another reason may be that this is a pure "resource" post that doesn't make an argument or express personal opinions that people could easily comment on. It's not a good basis for starting a discussion, unlike many other HN posts.

That said, I'd of course appreciate more comments ;)


You raise a good point, since this is only a "resource" post. My comment was aimed at ML-related posts in general.

I do research in ML, so I love seeing these posts, but the fact that terms like "neural network" have become such buzzwords is somewhat disappointing. I remember all the buzz when SwiftKey released its "neural network" keyboard, simply because of the name.


More than a few times I've seen a resource post like this garner discussion about the usefulness of what was being done. In a few extreme cases, HN has exposed people copy/pasting the work of others and linked to the originals.


Yes, and I don't think it's just people using upvotes as bookmarks (most people I know don't do that).

I think it has to do with the fact that HN is a very diverse mix of folks (technical, non-technical, working at startups, working at big companies, web devs, mobile devs, infra devs, etc.). There isn't a concentration of ML-technical folks like you might find on /r/machinelearning (which has decent technical ML discussion). Instead you have a mix of futurology speculation and tutorial discussion: lots of introductory material of highly varying quality (since the people upvoting can't really tell whether it's good), science fiction passed off as credible opinion, highly technical ML posts that get upvoted a lot but draw no comments, and so on.


My guess (based on my own upvote) is that people want to know about it but don't currently know enough to contribute.


Heh. Only if you're talking about the sexy machine learning bits; reinforcement learning is this year's hot one. A few years ago it was convolutional neural networks, followed by recurrent nets.

Nobody thinks SVMs are cool anymore :S

Part of this, I think, is due to the immense amount of marketing that big corporations have pushed around their recent AI successes.


I admit to upvoting solely as a means to bookmark it.


Agreed, people blindly upvote ML topics. That explains why C++ or Java programming topics never make the top of the front page.


Regarding your second point: posts about C++ or Java seem to be rare simply because those languages evolve over multi-year cycles. I do remember seeing posts about C++17 on HN with some discussion around them, and people occasionally post libraries they've made in those languages. I don't do much Java development, but I am interested in C++, and if you are too I'd check out /r/cpp if you haven't. Many of the posters there do serious C++ work: compiler developers, people working on the standard, etc.


It's great to see more hacker-friendly introductions to reinforcement learning. As with most facets of machine learning, there are many interesting applications of RL (e.g., we're using it to optimize email marketing campaigns at Optimail), and we'll only find more as more non-academic hackers discover it.


... still, I think it does people a disservice not to make it clear that ML is math, and that it can only really be done well if you understand what is happening, i.e., if you take the time to understand the maths, which is not that hard, by the way.

Calling something "cool" or "hacking" because you don't want to take the time to understand the maths is something Trump would do if he were a programmer.


Can you expand on what math you're talking about that is both necessary and not that hard? "Not that hard," to me, implies that a reasonably intelligent person could teach it to themselves without university instructors (present or past).


I find the Python code very clear, but I would prefer to see a real-life, interesting application that doesn't require a lot of computation. In one post on WildML there is an example of using NLP and deep learning for a simple task, but after 22 hours of computation the final result is a little disappointing, to say the least.

I like to read the wildml.com and fastml.com blogs, but I would like to find more simple applications that show real value without using lots of resources. Perhaps there is a subfield of RL where, by applying some kind of proper human intelligence, one can hope to beat those giants endowed with unlimited computational and financial resources.


I gave a talk at PyCon SG this year [1], which included a demonstration of training a reinforcement learning model on a 'Bubble Breaker' game. There's also more detail available [2].

The Jupyter notebook is included in the GitHub repo [3]; it has a 'scaled down' version that takes ~5 minutes to train on a MacBook's CPU. There's also a downloadable 'full scale' model that was trained in ~7 hours on a Titan X. It plays the game better than me, on average...

[1] http://blog.mdda.net/ai/2016/06/23/workshop-at-pycon-sg-2016 (has slides and a YouTube link)
[2] http://redcatlabs.com/2016-07-30_FifthElephant-DeepLearning-...
[3] https://github.com/mdda/deep-learning-workshop (see notebooks/7-Reinforcement-Learning.ipynb)


Most RL algorithms are polynomial time or worse, and they use large datasets. Computation is always going to be an issue, which is why most successful implementations revolve around simplifying the models and datasets.

If you can figure out a way of making RL better than polynomial time, there is at least a Turing Award in it for you.
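
To make the cost concrete, here's a tiny tabular Q-learning loop (a made-up toy, not from the posted tutorials): the Q-table alone has |S| x |A| entries, and learning sweeps over it repeatedly, which is why practical work usually starts by shrinking the state and action spaces.

    import random

    # Toy tabular Q-learning on a 1-D corridor: states 0..N-1, actions
    # {0: left, 1: right}, reward 1 for reaching the right end.
    N, ALPHA, GAMMA, EPS = 10, 0.5, 0.95, 0.1
    Q = [[0.0, 0.0] for _ in range(N)]  # Q[state][action]: |S| * |A| entries

    for episode in range(500):
        s = 0
        while s < N - 1:
            # Epsilon-greedy action selection.
            a = random.randrange(2) if random.random() < EPS else max((0, 1), key=lambda x: Q[s][x])
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == N - 1 else 0.0
            Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])  # Bellman backup
            s = s2

    # Greedy policy per state; should come out as all 1s ("go right").
    print([max((0, 1), key=lambda a: Q[s][a]) for s in range(N - 1)])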


> but RL is also widely used in Robotics, Image Processing and Natural Language Processing

RL for NLP? I would love to hear counterexamples, but I'm not aware of a serious project using RL for NLP, let alone it being 'widely used.' That said, I do believe RL makes sense for a number of NLP problems.

Either way, well done. I appreciate having a collection of the algorithms from Sutton's book (great book), and in Python.


A lot of recent research uses RL to "fine-tune" NLP models. A practical example is Google's recently announced machine translation system (https://arxiv.org/abs/1609.08144), which uses RL to directly optimize BLEU scores on translated sentences.
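
As a rough illustration of the mechanism (not Google's actual training code), here's a minimal REINFORCE sketch: sample an output sequence from the model, score it with a sequence-level reward (here a crude unigram-precision stand-in for BLEU), and push up the log-probability of the sample in proportion to that reward.

    import numpy as np

    # Minimal REINFORCE on a toy "model": independent softmaxes over a tiny
    # vocabulary, one per output position. The reward is unigram precision
    # against a reference -- a crude stand-in for BLEU. The gradient uses the
    # log-derivative trick: grad E[R] = E[R * grad log p(sequence)].
    vocab = ["the", "cat", "sat", "dog", "ran"]
    reference = ["the", "cat", "sat"]
    T = len(reference)
    rng = np.random.default_rng(0)
    logits = np.zeros((T, len(vocab)))

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def reward(tokens):
        return sum(t == r for t, r in zip(tokens, reference)) / T

    for step in range(2000):
        probs = [softmax(logits[t]) for t in range(T)]
        ids = [rng.choice(len(vocab), p=p) for p in probs]
        R = reward([vocab[i] for i in ids])
        for t, (i, p) in enumerate(zip(ids, probs)):
            grad_logp = -p                    # d log p(i)/d logits = onehot(i) - p
            grad_logp[i] += 1.0
            logits[t] += 0.1 * R * grad_logp  # ascend expected reward

    # Greedy decode should converge toward the reference tokens.
    print([vocab[int(np.argmax(logits[t]))] for t in range(T)])

(In a real system the per-position softmaxes would be a full seq2seq decoder and the reward an actual BLEU score, but the update has the same shape.)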

You'll find similar applications in state-of-the-art models for chatbots, for example. Though I agree, "widely used" may be somewhat of an overstatement. But it's becoming more common.

On a side note, I actually think RL makes a lot of sense for many NLP problems and it would be super interesting to build a pure RL approach to language modeling or translation. Nobody has managed to do that quite yet.


I agree; I would love to see, or even work on, pure RL approaches in NLP. I would also like to see HMMs explored more. I think they also make a lot of sense for NLP if you think of the problem as a sequence of hidden states, represented by semantic frames or other hand-crafted language features, producing phrases and sentences.
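
For concreteness, here's a toy sketch of that framing, with invented states and probabilities: hidden "semantic frame" states emit words, and Viterbi recovers the most likely state sequence.

    # Toy HMM: hidden "semantic frame" states emit words; Viterbi finds the
    # most likely hidden sequence. All states and probabilities are invented.
    states = ("GREETING", "REQUEST")
    start = {"GREETING": 0.7, "REQUEST": 0.3}
    trans = {"GREETING": {"GREETING": 0.2, "REQUEST": 0.8},
             "REQUEST":  {"GREETING": 0.1, "REQUEST": 0.9}}
    emit = {"GREETING": {"hello": 0.6, "hi": 0.3, "book": 0.1},
            "REQUEST":  {"book": 0.5, "flight": 0.4, "hello": 0.1}}

    def viterbi(words):
        # v[s] = (probability of the best path ending in state s, that path)
        v = {s: (start[s] * emit[s].get(words[0], 1e-6), [s]) for s in states}
        for w in words[1:]:
            v = {s: max(((p * trans[prev][s] * emit[s].get(w, 1e-6), path + [s])
                         for prev, (p, path) in v.items()), key=lambda t: t[0])
                 for s in states}
        return max(v.values(), key=lambda t: t[0])[1]

    print(viterbi(["hello", "book", "flight"]))  # ['GREETING', 'REQUEST', 'REQUEST']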


I'm doing that. Database queries are (relatively) expensive actions in my system, and reinforcement learning cuts down on the amount of DB querying my NLP program does. Way too much time is spent fine-tuning it, though, and I often find myself lost in the model.



