> Sonnet and Opus don't benchmark as well as O3/Grok4 at pure coding Do any of t...

sothatsit · 2025-07-19T00:34:47 1752885287

FWIW, people report that Grok 4 is not very good at coding, and xAI admitted as much when they said they would be releasing a separate coding model in "the next few weeks".

Also, Google does have Gemini CLI, OpenAI does have Codex CLI, and then there is Aider which can support any model. I think the big difference is that Anthropic's models are the best for this use-case right now, and Anthropic has the Max plan which makes a massive difference to the cost of using Claude Code compared to competitors (although the Gemini CLI has insane free tiers).

I'm not sure how this will play out in the future, because it seems to me that Claude Code does not have much of a moat beyond Anthropic having the best coding models right now and offering model usage at heavily discounted prices.

ghuntley · 2025-07-19T01:06:57 1752887217

> people report that Grok 4 is not very good at coding

There are agentic models and oracle models. Models can be placed on a two-by-two grid: agent vs oracle on one axis, high safety vs low safety on the other.

https://ghuntley.com/cars

Grok is oracle and low safety.

theshrike79 · 2025-07-20T13:15:17 1753017317

Grok4 is pretty decent at planning and figuring out libraries and APIs.

For code it falls down past simple scripts and utilities.