> because it's not optimized to solve problems and provide answers The entire po...

> because it's not optimized to solve problems and provide answers

The entire point of RLHF training is to do this. Every model since GPT-3.0 has been trained specifically for this purpose.

But of course the model can only generate text in one direction and can't take time to "think" or undo anything it's generated.