I had in mind the datasets of Easy2Hard-Bench that the study tested against: math competitions, math word problems, programming, chess puzzles, science QA, and commonsense reasoning.
The last problem like this that I myself asked an LLM to solve was to find the tax and base price of items on an invoice, given the total price and the tax rates. I couldn't make sense of the answer, but asking the LLM follow-up questions made me realize that I had framed the problem badly, and more so that I didn't know how to ask. (Though the process also triggered a surprising ability of my own to dredge up and actually apply basic algebra.) I'm sure the real issue is that I'm still learning what to ask and how to ask it.
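For what it's worth, here is roughly the algebra involved. It assumes that each invoice line's total already includes tax at a single known rate $r$ (a fraction, so 20% is 0.20). That is my assumption, not necessarily how the actual invoice worked. Under that assumption, total = base × (1 + r), so base = total / (1 + r) and tax = total − base. The helper name `split_tax_inclusive` and the sample numbers are hypothetical:

```python
from decimal import Decimal, ROUND_HALF_UP

def split_tax_inclusive(total: Decimal, rate: Decimal) -> tuple[Decimal, Decimal]:
    """Hypothetical helper: split a tax-inclusive line total into (base, tax).

    Assumes total = base * (1 + rate), with rate given as a fraction (0.20 for 20%).
    """
    cent = Decimal("0.01")
    # Solve total = base * (1 + rate) for base, rounded to the cent.
    base = (total / (1 + rate)).quantize(cent, rounding=ROUND_HALF_UP)
    # Derive tax from the rounded base so base + tax always equals the total.
    tax = total - base
    return base, tax

# Made-up invoice lines: (tax-inclusive total, tax rate)
lines = [(Decimal("120.00"), Decimal("0.20")), (Decimal("10.70"), Decimal("0.07"))]
for total, rate in lines:
    base, tax = split_tax_inclusive(total, rate)
    print(f"total={total} rate={rate} base={base} tax={tax}")
```

Computing the tax as total minus the rounded base, rather than rounding it separately, keeps each line summing exactly to the invoice total.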