I've not seen anyone seriously attempt to benchmark ChatGPT output without heavily cherry-picking it first.