Hacker News | F7F7F7's comments

It depends, honestly. Both are prone to doing the exact opposite of what you asked, especially with poor context management.

I’ve had both $200 plans. Now I just have Max x20 and use the $20 ChatGPT plan for an inferior Codex.

My experience (up until today) has always been that Codex acts like that one Sr Engineer we all know. They’re kind of a dick, and they’ll disappear into a dark hole and emerge with a circle when you asked for a pentagon. Then they let you know why edges are bad for you.

And yes, Anthropic is pivoting hard into everything agentic. I bet it’s not too long before Claude Code stops differentiating models. I had Opus blow 750k tokens on a single small task.


There’s a correlation between getting the “How’s Claude Doing This Session?” prompt (or whatever it’s called) and four-letter words.

It doesn’t always show up then, but it often follows them.


Right after the holiday double-token promotion, users felt (perceived) a huge regression in capabilities. I bet that triggered the idea.

Whenever I see new behaviors and suspect I’m being tested on, I’ll typically see a feedback form at some point in that session. Well, that and dropping four-letter words.

I know it’s more random sampling than not. But they are definitely using our codebases (and in some respects our livelihoods) as their guinea pigs.


I’ve had Opus struggle on trivial things that Sonnet 3.5 handled with ease.

It’s not so much that the implementations are bad because the code is bad (the code is bad). It’s that it gets extremely confused and starts to frantically make worse and worse decisions while questioning itself. Editing multiple files, changing its mind, and only fixing one or two. Resetting and overriding multiple batches of commits without so much as a second thought and losing days of work (yes, I’ve learned my lesson).

It, the model, can’t even reason about the decisions it’s making from turn to turn. And the more opaque agentic help it gets, the more I suspect that tasks are being routed to much lesser models (not the ones we’ve chosen via /model or those in our agent definitions) however Anthropic chooses.

In these moments I might as well be using Haiku.


Are multiple concurrences a choir or a mob?

From 1pm EST, it’s all downhill until around 8 or 9pm EST.

Late nights and weekends are smooth sailing.


“Just drink the water, it’s all water.”

My dad shipped an M car overseas a decade ago. The car is fine. The keys are dead.

Since the islands he now lives on have no BMW presence, they want him to ship the car back to get new keys.


I knew what an SRE was and found the article somewhat interesting, with a slightly novel (throwaway) and more realistic take on the “why need Salesforce when you can vibe your own Salesforce?” conversation.

But not defining what an SRE is feels like a glaring, almost suffocating, omission.


Claude likely built their front end.

