I find Claude models often use “tricks” like bash one-liners, essentially excelling at surgical fixes. They do what I want more reliably on smaller tasks.
GPT-5 can often be better at larger architectural changes, but I find that comes at the cost of instability and broken PRs. It more often fails to capture intent, argues back, or just completely spirals out of control.
GPT-5 Codex seemed to refuse valid requests like “make a change to break a test so we can test CI” (it over-indexed on our agents.md and other instructions, then refused on the basis of “ethics” or some such).
More like “I tried what others claim, extensively, and it does not work for me; please let me know if I’m doing something wrong,” to which the response is often yours, reframing the observation as a fallacy.
I see many people insisting that because it didn’t work when they tried it for some little thing, it’s broken and useless. And a few people saying that, actually, it works really well if you’re willing to learn how to use it.
I'm not sure I've ever seen anyone here say it hasn't worked for them but that they're open to learning how to use it right. It's definitely not common.