The idea that a general language model like GPT-3 can answer questions intelligently is utterly absurd. It's trained to get language right, where "right" means close enough to the way people speak (or, mostly, write) to be intelligible as language, but it does so without any underlying knowledge model to make that intelligible language relevant to any given area of knowledge. Human language is not knowledge; it's a means of articulating our knowledge, that is, our domain-specific models of the world, in a way that other people can understand and translate into their own particular models.
So what is needed is the capability of GPT-3 or other language generators sitting on top of domain-specific knowledge models, and constrained by those models.
Asking GPT-3 a general knowledge question is like asking an articulate 5-year-old "how does gravity work?" You'll get grammatically meaningful answers that use the structure of the language correctly, but that are quite likely to have nothing to do with our actual understanding of physics.
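To make that concrete, here's a minimal sketch of what "constrained by the knowledge model" could mean in practice. The generator and the fact table below are toy stand-ins of my own invention, not any real system: the language model drafts fluent claims, and only the claims the domain model can back survive.

    # Toy domain knowledge model: the only facts answers may assert.
    KNOWLEDGE_BASE = {
        ("gravity", "described_by"): "general relativity",
        ("gravity", "newtonian_form"): "an inverse-square law",
    }

    def draft_claims(question: str) -> list[str]:
        # Stand-in for a GPT-3-style generator: fluent candidate
        # sentences, some true, some confidently wrong.
        return [
            "Gravity is described by general relativity.",
            "Gravity is caused by tiny invisible springs.",
        ]

    def supported(claim: str) -> bool:
        # Keep a claim only if some fact in the domain model backs it.
        return any(fact.lower() in claim.lower()
                   for fact in KNOWLEDGE_BASE.values())

    def constrained_answer(question: str) -> str:
        kept = [c for c in draft_claims(question) if supported(c)]
        return " ".join(kept) if kept else "I don't know."

    print(constrained_answer("How does gravity work?"))
    # Prints only the claim the knowledge model supports.

A real system would need far richer matching than substring checks, but the division of labor is the point: the language model supplies fluency, the knowledge model supplies veto power.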
This is not wrong, but also not entirely right. There is a model called T0pp ("T0 plus plus") that was fine-tuned on simple logic problems, and it can solve novel logic problems. That implies to me that there is more here than we've discovered.
Additionally, the whole point of fine-tuning LLMs is to give them domain-specific knowledge. If you couple this with search/QA capabilities, the results can be quite impressive. I haven't seen such systems in the wild yet, but I've played with them myself, and the performance is surprisingly good.
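For what it's worth, the search/QA coupling can be sketched in a few lines. The corpus, the token-overlap scoring, and the answer step here are all toy stand-ins I made up, not any shipping system; the shape is what matters: retrieve passages first, then make the model answer from what was retrieved rather than from its own free association.

    import re

    CORPUS = [
        "T0pp is a model fine-tuned on a mixture of prompted tasks.",
        "GPT-3 is a large autoregressive language model.",
        "Newtonian gravity follows an inverse-square law.",
    ]

    def tokens(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9-]+", text.lower()))

    def retrieve(query: str, k: int = 1) -> list[str]:
        # Toy relevance score: number of tokens shared with the query.
        q = tokens(query)
        return sorted(CORPUS, key=lambda d: -len(q & tokens(d)))[:k]

    def answer(query: str) -> str:
        context = retrieve(query)
        # In a real pipeline the retrieved text is prepended to the LLM
        # prompt, so generation is grounded in it instead of free-floating.
        return f"Based on the retrieved passage: {context[0]}"

    print(answer("What is GPT-3?"))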
I don't know anything about T0pp, so I can't comment on that.
I agree with you on fine-tuning LLMs. At my last job before I retired (as CTO of a major medical clinical and research organization), we did some work using GPT-3 up-trained on medical vocabulary to generate physician notes as a summary of a transcribed visit. The results were impressive, but still not usable: most of what it generated was correct (and essentially all of it was well composed and readable), yet false statements and non sequiturs still crept in at an unacceptable rate.
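For readers curious what that kind of up-training looks like mechanically, here is a hedged sketch using the open Hugging Face stack on a tiny stand-in corpus. It is not the GPT-3 pipeline we actually used (that went through OpenAI's hosted fine-tuning), just the general shape of continuing a causal LM's training on domain text.

    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              Trainer, TrainingArguments)
    import torch

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token

    # Stand-in corpus; a real run would use thousands of de-identified notes.
    texts = ["Patient presents with dyspnea on exertion.",
             "Assessment: stable angina; continue current regimen."]

    enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

    class NotesDataset(torch.utils.data.Dataset):
        def __len__(self):
            return enc["input_ids"].shape[0]
        def __getitem__(self, i):
            item = {k: v[i] for k, v in enc.items()}
            # Causal LM objective: predict the next token of the same text.
            # (A real run would also mask padding positions in the labels.)
            item["labels"] = item["input_ids"].clone()
            return item

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=1,
                               per_device_train_batch_size=2),
        train_dataset=NotesDataset(),
    )
    trainer.train()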
I think the technology is amazing, and very valuable. But I also think that tying it to "hard" knowledge models - akin to the way deep physics is done, but coupling to the language model rather than to generalized neural networks - is what will eventually make it a complete success in specific domains.