It depends, honestly. Both are prone to doing the exact opposite of what you asked, especially with poor context management.
I’ve had both $200 plans and now just have Max x20 and use the $20 ChatGPT plan for an inferior Codex.
My experience (up until today) has always been that Codex acts like that one Sr Engineer that we all know. They are kind of a dick. And will disappear into a dark hole and emerge with a circle when you asked for a pentagon. Then let you know why edges are bad for you.
And yes, Anthropic is pivoting hard into everything agentic. I bet it’s not too long before Claude Code stops differentiating models. I had Opus blow 750k tokens on a single small task.
Whenever I see new behaviors and suspect I’m being tested on, I’ll typically see a feedback form at some point in that session. Well, that and dropping four-letter words.
I know it’s more random sampling than not. But they are definitely using our codebases (and in some respects our livelihoods) as their guinea pigs.
I’ve had Opus struggle on trivial things that Sonnet 3.5 handled with ease.
It’s not so much that the implementations are bad because the code is bad (the code is bad). It’s that it gets extremely confused and starts frantically making worse and worse decisions and questioning itself. Editing multiple files, changing its mind and only fixing one or two. Resetting and overriding multiple batches of commits without so much as a second thought and losing days of work (yes, I’ve learned my lesson).
It, the model, can’t even reason about the decisions it’s making from turn to turn. And the more opaque agentic help it’s getting, the more I suspect that tasks are being routed, however Anthropic chooses, to much lesser models (not the ones we’ve chosen via /model or those in our agent definitions).
I knew what an SRE was and found the article somewhat interesting, with a slightly novel (throwaway), more realistic take on the "why need Salesforce when you can vibe your own Salesforce" convo.
But not defining what an SRE is feels like a glaring, almost suffocating, omission.