Probably massive amounts of data, but it depends on how far off the simulation is from reality and on the expected cost of collecting that data vs. running the simulation a few billion times.
Not far off and you want perfect results? Go with actual data. So far off that it's not even usable? You'd need to go back to the training environment and redo the reward function.
Generally speaking (and I'm not sure this is the case here from reading the link, I may have missed it), reinforcement learning for optimal control is used for fine-tuning, while traditional control methods handle the coarse adjustments. Since this is DeepMind, they probably want to use RL for everything.
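To make the coarse/fine split concrete, here's a rough sketch of what I mean, not anything taken from the linked work. A plain proportional controller does the coarse adjustment and a small, clipped correction from a policy is added on top. Every name here (`PController`, `residual_policy`, `combined_action`, `residual_limit`) is made up for illustration, and the "policy" is just a fixed linear rule standing in for a trained RL policy.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PController:
    """Coarse traditional controller: proportional feedback on the error."""
    gain: float

    def action(self, setpoint: float, state: float) -> float:
        return self.gain * (setpoint - state)


def residual_policy(setpoint: float, state: float) -> float:
    """Placeholder for a learned RL policy (assumption: a fixed linear rule)."""
    return 0.5 * (setpoint - state)


def combined_action(
    controller: PController,
    policy: Callable[[float, float], float],
    setpoint: float,
    state: float,
    residual_limit: float = 0.2,
) -> float:
    """Return the coarse control action plus a clipped fine correction."""
    coarse = controller.action(setpoint, state)
    raw = policy(setpoint, state)
    # Clip the learned term so it can only fine-tune, never take over.
    fine = max(-residual_limit, min(residual_limit, raw))
    return coarse + fine


controller = PController(gain=2.0)
far = combined_action(controller, residual_policy, setpoint=1.0, state=0.0)
near = combined_action(controller, residual_policy, setpoint=1.0, state=0.9)
```

The clipping is the design choice that matters: far from the setpoint the traditional controller dominates, and the learned part only has real influence once the error is already small.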