This is a strange document. It doesn't mention supervised instruction finetuning anywhere. Prompt engineering of this kind can (and need) only really be applied to a foundation model, which just completes text. An instruction-tuned model is no longer a text completer; it models an agent and understands what you ask it to do, so there is neither the need nor the possibility for prompt engineering of this kind. (The foundation model for GPT-4 was never made publicly available, by the way, and the one for GPT-3.5 was removed from the API a few weeks ago.)
It is worth mentioning that instruction-tuned models are not necessarily better, since they can exhibit "mode collapse", a loss of entropy, where they e.g. tend to produce content that is very similar in style.
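To make "loss of entropy" concrete, here is a toy sketch: Shannon entropy of two invented next-token distributions, one spread out (as a base model's often is) and one concentrated on a few stereotyped continuations (as after mode collapse). The numbers are made up purely for illustration, not measured from any actual model.

```python
import math

def entropy(probs):
    # Shannon entropy in bits of a discrete probability distribution.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical next-token distributions over four continuations.
base_model = [0.25, 0.25, 0.25, 0.25]  # probability spread widely
collapsed  = [0.85, 0.05, 0.05, 0.05]  # probability piled on one mode

print(entropy(base_model))  # 2.0 bits
print(entropy(collapsed))   # noticeably lower: outputs cluster in style
```

The collapsed distribution still covers the same continuations, but samples from it will look much more uniform in style, which is the complaint being made here.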
No you're not. I too enjoy working with the text completion LLMs have been capable of for some time. The issue with text completion is that most people don't want to be forced to think up a plausible document header when all they want is an inferred answer.
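For anyone who hasn't worked with a base model: the point is that you have to wrap your question in a document whose natural continuation is the answer. A minimal sketch, with the FAQ framing and wording being my own invented example rather than anything from a specific guide:

```python
# Sketch: prompting an instruction-tuned model vs. a base (completion-only)
# model. The instruction-tuned model can be asked directly; the base model
# only continues text, so the question is dressed up as an FAQ document
# whose most likely continuation is an answer.

question = "Why is the sky blue?"

# Instruction-tuned model: the question itself is the whole prompt.
instruct_prompt = question

# Base model: invent a document header and structure around the question.
completion_prompt = (
    "Frequently Asked Questions\n"
    "\n"
    f"Q: {question}\n"
    "A:"
)

print(completion_prompt)
```

The "Frequently Asked Questions" header is exactly the kind of scaffolding the comment is talking about: it does nothing for the user except steer the completion.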
Another problem is that OpenAI no longer wants its customers to access them. They may be considered too dangerous, since they are not only not instruction tuned but also not censored (RLHF'd). So people have to use less powerful base models, which cancels out their increased flexibility.
I guess so; at least this is what people with a lot of experience with language models, like janus (see link in sibling), are reporting.
Though I should mention that mode collapse doesn't just come from supervised instruction tuning (which makes the model reply to requests instead of treating them as completion prompts), but also from things like RLHF, which biases the model toward giving certain replies rather than others.
What the commenters there didn't realize at the time is that code-davinci-002 has nothing to do with the "Codex API" specifically; it is simply the GPT-3.5 foundation model without any fine-tuning applied. See