RL can in principle recover the necessary subsystems, but you can't tell whether that has actually happened. A human-designed system may not be as comprehensive asymptotically as a learned one, but the designer can ensure that critical portions of the state space are covered.
Will it perform worse in this case, with this system, though? What is "enough data"? And how do you expect to get any sense of why the system made a particular decision if all you have is a distribution of probabilities from such a simplistic policy?
A simple model-based RL system like this one (https://webdocs.cs.ualberta.ca/~sutton/book/ebook/figtmp63.p...), trained with enough data, would implicitly recover all the necessary sub-systems.
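For concreteness, here's roughly what that kind of agent looks like, a minimal tabular Dyna-Q sketch (assuming the linked figure is the Dyna-Q agent from Sutton & Barto; the gym-style "env", its action space, and all the hyperparameters below are placeholders, not anything specific from this thread):

    # Minimal tabular Dyna-Q sketch. Assumes a gym-style env with
    # env.reset() -> state and env.step(a) -> (state, reward, done, info).
    import random
    from collections import defaultdict

    def dyna_q(env, episodes=500, alpha=0.1, gamma=0.95,
               epsilon=0.1, planning_steps=50):
        Q = defaultdict(float)   # Q[(state, action)] -> value estimate
        model = {}               # learned model: (s, a) -> (reward, s', done)
        actions = list(range(env.action_space.n))

        def greedy(s):
            return max(actions, key=lambda a: Q[(s, a)])

        for _ in range(episodes):
            s = env.reset()
            done = False
            while not done:
                # epsilon-greedy behavior policy
                a = random.choice(actions) if random.random() < epsilon else greedy(s)
                s2, r, done, _ = env.step(a)

                # direct RL update from real experience (Q-learning target)
                target = r if done else r + gamma * Q[(s2, greedy(s2))]
                Q[(s, a)] += alpha * (target - Q[(s, a)])

                # model learning: remember what this (s, a) led to
                model[(s, a)] = (r, s2, done)

                # planning: replay simulated transitions from the learned model
                for _ in range(planning_steps):
                    (ps, pa), (pr, ps2, pdone) = random.choice(list(model.items()))
                    ptarget = pr if pdone else pr + gamma * Q[(ps2, greedy(ps2))]
                    Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])

                s = s2
        return Q

The "sub-systems" here would only ever show up implicitly, as structure in the learned Q-values and model, which is exactly the interpretability concern raised above.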