You certainly could, but that doesn't entirely account for shading, system degradation, or site-specific diffuse light opportunities (consider a huge amount of light reflecting off the side of a mountain at some time of day). Those are all really difficult and time-intensive to model, so there's a desire to have an AI that can simply learn those things specific to the system it's optimizing without humans having to do it. I see the larger impact of RL as scaling humanity's problem-solving capability. If we have to use N human hours per installation to get to 97% optimality, but RL can get to 95% with N/10000000 hours per installation, we could free up all those N human hours for things that RL still struggles with. Just my 2 cents though, it's a very fair question
I'm more and more convinced of this. Control theory appears to be like lightsabers, "an elegant weapon for a more civilized age". It's really unfortunate that the controls literature is so opaque.
>> Control theory is better if you know what you’re doing.
The "if you know what you’re doing" here does not refer to the ability to understand control theory. It means that if you know the underlying dynamics, there is mathematically nothing better than controlling those dynamics. Flying a plane, oscillating a circuit, etc. are all things we can do very well without ML because we have exact models of the physical phenomena. Playing chess has no dynamics, control theory is useless. Anything where the dynamics are not "nice" differential equations, ML is probably easier at learning the dynamics than coming up with an ansatz.
There are areas of control theory where you can learn the dynamics ("adaptive control"). The advantage over RL is that in control theory, you generally assume the dynamics are described by differential equations (sometimes difference equations), not by Markov decision processes. MDPs are more general, but basically any physical mechanism you're going to control doesn't need that generality.
There is a surprising amount of structure imposed by the assumption that the dynamics are differential equations, even if you don't know what the differential equations look like. As a consequence, adaptive control laws generally converge a lot faster (like, orders of magnitude faster) than MDP-based RL approaches on the same system being controlled.
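For a feel of how little machinery that structure demands, here's a hypothetical toy sketch of a Lyapunov-based adaptive regulator (my example, not the commenter's): the plant parameter `a` is unknown to the controller, yet the one-line adaptation law `k_hat_dot = gamma * x**2` provably drives the state to zero. The controller never even identifies `a`; the differential-equation structure alone carries the convergence guarantee.

```python
# Toy adaptive regulator: plant x_dot = a*x + u with `a` UNKNOWN to the
# controller. The classic Lyapunov argument (V = x^2/2 + (k_hat - k)^2/(2*gamma)
# for any k > a) shows x -> 0 under the adaptation law below.
a = 2.0        # true (unstable) plant parameter -- the controller never sees this
gamma = 5.0    # adaptation gain
dt = 1e-3      # Euler step

x, k_hat = 1.0, 0.0
for step in range(20_000):
    u = -k_hat * x                 # control law using the current gain estimate
    x += dt * (a * x + u)          # integrate the plant (Euler)
    k_hat += dt * (gamma * x**2)   # adaptation: raise the gain while error persists
    if step % 5_000 == 0:
        print(f"t={step * dt:5.2f}s  x={x:+.5f}  k_hat={k_hat:.3f}")
```

Run it and you'll see `k_hat` climb past the true `a` within a fraction of a simulated second and `x` decay to zero, with no episodes, no replay buffer, no reward shaping.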
The other advantage is that with control theory you can prove stability and, in some cases, quantify your performance margin. This is important if you e.g. want your system to receive any sort of accreditation, or if you want to fit it into the systems engineering of a more complex system. There's a reason autopilots don't use RL, and it isn't that RL can't be made to work. It's that you can't rigorously prove how robust the RL policy is to changes in the airplane dynamics.
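As a sketch of what such a proof artifact looks like in practice (using the python-control package and a made-up plant, so treat the specifics as my assumptions): gain and phase margins quantify exactly how much the true dynamics can deviate from the model before the loop goes unstable, which is the kind of certificate an RL policy doesn't come with.

```python
# Rough sketch: classical robustness certificates for a loop design.
# Plant is hypothetical; requires the python-control package.
import control

# Made-up open-loop transfer function L(s) = 10 / (s (s+2)(s+5))
L = control.tf([10.0], [1.0, 7.0, 10.0, 0.0])

# Gain margin: how much loop gain can grow before instability.
# Phase margin: how much extra phase lag (e.g. delay) the loop tolerates.
gm, pm, wcg, wcp = control.margin(L)
print(f"gain margin:  {gm:.2f}x  (at {wcg:.2f} rad/s)")
print(f"phase margin: {pm:.1f} deg (at {wcp:.2f} rad/s)")
```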