Whenever I read something like this, I think "you're using it wrong". This question would certainly have tripped up earlier models, but the new ones have no trouble producing this, with sources for each question. Example:
https://chatgpt.com/share/69160c9e-b2ac-8001-ad39-966975971a...
(The 7 minutes of thinking is because ChatGPT is unusually slow right now for any question.)
These days I'd trust it to accurately give 100 questions only about Homer. LLMs really are a lot better than they used to be, if you use them right.