I feel like you're glossing over the contexts of "acceptable" and "questionable"...

I feel like you're glossing over the contexts of "acceptable" and "questionable".

The tool itself is not questionable or acceptable, it becomes questionable or acceptable depending on the usage. A pencil and paper can be questionable if the test is designed and expected to be completed without it.

You can design tests where an LLM spitting out python is an expected tool, but what are you testing for then? I doubt there are classes that teach whatever that test would be for yet.