o3 and o3-pro are just so good. Sonnet goes off the deep end too often and Opus, in my experience, is not as strong at reasoning compared to OpenAI, despite the higher costs. Rarely do we see a worse, more expensive product win - but competition is good and I’m rooting for Anthropic nonetheless!
OpenAI also has Flex processing[1] for o3. I've spent most of my time with Gemini 2.5, but lately I've been trying out o3 a lot, as it seems to work quite well and the tokens come out really cheap: ~95% of my agentic tokens are cached, which is a 75% discount, and flex mode cuts the price by another 50%, down to $0.25 / million input tokens.
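A quick sketch of the arithmetic behind that figure, taking the commenter's numbers at face value and assuming a standard o3 input price of $2.00 per million tokens (an assumption; check current pricing):

```python
# Assumed baseline: o3 standard-tier input at $2.00 per million tokens.
BASE_INPUT = 2.00              # $/M tokens, standard tier (assumed)
CACHED = BASE_INPUT * 0.25     # 75% cache discount -> $0.50/M
CACHED_FLEX = CACHED * 0.50    # flex halves it again -> $0.25/M
FLEX_INPUT = BASE_INPUT * 0.50 # uncached flex input -> $1.00/M

# Blended input cost when ~95% of agentic tokens hit the cache:
blended = 0.95 * CACHED_FLEX + 0.05 * FLEX_INPUT
print(f"cached flex input:   ${CACHED_FLEX:.2f}/M")
print(f"blended agentic rate: ${blended:.4f}/M")
```

So under these assumptions the effective agentic input rate lands just under $0.29 per million tokens.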
I've made my own fork of Codex that always uses flex, or you can route agents through litellm and make it add the service_tier parameter. I haven't really seen native support for it anywhere.
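For anyone wanting to do the same without forking, a minimal sketch of forcing the flex tier on every request. This assumes the OpenAI Python SDK's `service_tier` parameter on `chat.completions.create` (which litellm can also pass through); the wrapper function here is illustrative, not part of any library:

```python
# Hypothetical helper: force OpenAI's flex service tier on every request
# before it reaches client.chat.completions.create(**kwargs).
def with_flex(request_kwargs: dict) -> dict:
    """Return a copy of the request kwargs with service_tier='flex' forced on."""
    patched = dict(request_kwargs)
    patched["service_tier"] = "flex"  # discounted, slower processing tier
    return patched

req = with_flex({
    "model": "o3",
    "messages": [{"role": "user", "content": "hi"}],
})
# With a real API key: client.chat.completions.create(**req)
print(req["service_tier"])
```

The same idea works as a litellm pre-call hook if you'd rather not touch the agent's own request code.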
o3 feels pretty good to me as well, but o3-pro has consistently one-shotted problems other LLMs got stuck on.
I'm talking multiple tries with Claude 4 Opus, Gemini 2.5 Pro, o3, etc., sometimes resulting in hundreds of lines of code.
Versus o3-pro (very slowly) analyzing the problem and then fixing something that seemed completely unrelated in a one- or two-line change, truly fixing the root cause.
o3-pro-level LLMs at reduced cost and increased speed will already be amazing.
Probably referring to its tendency to over-complicate things to the point you have to step in and say, "WTF are you even talking about? Wouldn't it be a lot simpler to just use the original, well-planned-out design?"