The problem setting for RL provides far less supervision than traditional supervised learning does.
This is clearest if you consider how you would train a supervised learning system to operate an RL agent: you would need to provide the correct action at every timestep. RL algorithms become most interesting when you only get periodic reward signals and the reward may depend significantly on actions taken well before the current one, not just the most recent action. Learning to grasp objects is an interesting use case from robotics.
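A minimal sketch of that credit-assignment problem (plain Python, made-up episode data): the reward arrives only at the final step, and discounted returns are what propagate credit back to the earlier actions.

    # Hypothetical episode: five actions, but feedback only at the very end
    # (e.g. the gripper either ends up holding the object or it doesn't).
    rewards = [0.0, 0.0, 0.0, 0.0, 1.0]  # sparse, delayed reward
    gamma = 0.99                         # discount factor

    # Work backwards to compute the discounted return G_t at each step.
    # This is how credit for the final reward flows to earlier actions;
    # the per-step "correct action" label that supervised learning would
    # need simply doesn't exist here.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()

    print(returns)  # [0.9606, 0.9703, 0.9801, 0.99, 1.0] (rounded)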
IMO the main reason it's getting more attention is that a lot of progress is being made, and much of that is driven by advances in the supervised learning of neural networks.
However, people see some strong parallels between RL and GANs, which promise to greatly improve unsupervised and semi-supervised learning. There has also been work on using RL algorithms (largely REINFORCE) to train non-differentiable parts of neural networks, and more recent work on using RL to decide how to train neural nets overall.
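For a sense of how REINFORCE handles a non-differentiable piece, here's a toy sketch of my own (not from any particular paper): a single stochastic binary unit whose hard sampling step blocks backprop, trained with the score-function estimator instead.

    import math, random

    # Toy REINFORCE example: one Bernoulli unit, parameterized by a logit.
    # Sampling a hard 0/1 action is non-differentiable, so we use the
    # score-function (REINFORCE) gradient: reward * d/dtheta log pi(a).
    theta = 0.0
    lr = 0.1
    random.seed(0)

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    for step in range(2000):
        p = sigmoid(theta)
        a = 1 if random.random() < p else 0  # hard, non-differentiable sample
        reward = 1.0 if a == 1 else 0.0      # we'd like the unit to output 1

        # For a Bernoulli with logit theta: d/dtheta log pi(a) = a - p
        grad_log_pi = a - p
        theta += lr * reward * grad_log_pi   # REINFORCE update

    print(sigmoid(theta))  # approaches 1.0: the unit learned to fire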
So while most people in industry may never need to touch RL, it will be useful in some systems with time-dependent components and is worth learning from a research perspective.
Thanks. I can certainly see how time series and action planning are currently a tough fit for existing approaches. (I always chuckle a bit when I see things like transforming a time series into a stationary distribution, which effectively tries to remove the time information from the data entirely.)
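For context, the transform being joked about is usually just first-differencing, which literally throws away the level and trend of the series. A quick sketch with made-up numbers:

    # First-differencing a trending series: the level/trend (the time
    # information) is discarded, leaving only step-to-step changes.
    series = [10.0, 12.1, 13.9, 16.2, 18.0]  # made-up upward-trending data
    diffs = [b - a for a, b in zip(series, series[1:])]
    print(diffs)  # ~[2.1, 1.8, 2.3, 1.8] -- roughly constant: trend is gone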