Currently ChatGPT isn't, at least via public access, hooked up to a compiler or interpreter it could use to run the code it generates and check whether it executes as expected. That doesn't seem particularly difficult to build, and once it exists, ChatGPT could effectively train itself to produce the desired result.
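The loop being described is straightforward to sketch. Here's a minimal, hedged version in Python: `generate` stands in for whatever model call you'd actually use (it's a hypothetical callable, not a real API), and the harness just executes each candidate in a subprocess and feeds the error output back as context for the next attempt:

```python
import os
import subprocess
import sys
import tempfile

def run_candidate(code: str, timeout: float = 5.0):
    """Execute candidate code in a fresh subprocess; return (ok, output)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout,
        )
        ok = proc.returncode == 0
        return ok, (proc.stdout if ok else proc.stderr)
    except subprocess.TimeoutExpired:
        return False, "timeout"
    finally:
        os.unlink(path)

def refine_until_it_runs(generate, task: str, max_rounds: int = 3):
    """Ask `generate` for code, run it, and loop the error back in."""
    feedback = ""
    for _ in range(max_rounds):
        code = generate(task, feedback)
        ok, output = run_candidate(code)
        if ok:
            return code, output
        feedback = output  # error text becomes context for the next attempt
    raise RuntimeError(f"no working candidate after {max_rounds} rounds")

# Stub "model" for illustration: first attempt raises NameError,
# second attempt is valid, so the loop converges on round two.
attempts = iter(["print(undefined_name)", "print(6 * 7)"])
code, out = refine_until_it_runs(lambda task, fb: next(attempts), "multiply 6 by 7")
print(out.strip())
```

The real version would replace the stub with a model call that includes `feedback` in the prompt, plus sandboxing far stronger than a bare subprocess, but the control flow is just this.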
Precisely. I think people should consider the "v4" in "ChatGPT 4" as more like "0.4 alpha".
We're very much in the "early days" of experimenting with how LLMs can be effectively used. The API restrictions enforced by OpenAI are preventing entire categories of use cases from being tested.
Expect to see fine-tuned versions of LLaMA run circles around ChatGPT once people start hooking it up like this.