o3 and o3-pro are just so good. Sonnet goes off the deep end too often and Opus, in my experience, is not as strong at reasoning compared to OpenAI, despite the higher costs. Rarely do we see a worse, more expensive product win - but competition is good and I’m rooting for Anthropic nonetheless!
OpenAI also has Flex processing[1] for o3. I've spent most of my time with Gemini 2.5, but lately I've been trying out o3 a lot, as it seems to work quite well and the tokens come out really cheap: ~95% of my agentic tokens are cached, which is a 75% discount, and flex mode cuts the price by another 50%, down to $0.25 / million input tokens.
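A quick sketch of the arithmetic behind that figure, taking the commenter's numbers at face value and assuming a standard o3 input price of $2.00 per million tokens (an assumption; check current pricing):

```python
# Assumed baseline: o3 standard-tier input at $2.00 per million tokens.
BASE_INPUT = 2.00              # $/M tokens, standard tier (assumed)
CACHED = BASE_INPUT * 0.25     # 75% cache discount -> $0.50/M
CACHED_FLEX = CACHED * 0.50    # flex halves it again -> $0.25/M
FLEX_INPUT = BASE_INPUT * 0.50 # uncached flex input -> $1.00/M

# Blended input cost when ~95% of agentic tokens hit the cache:
blended = 0.95 * CACHED_FLEX + 0.05 * FLEX_INPUT
print(f"cached flex input:   ${CACHED_FLEX:.2f}/M")
print(f"blended agentic rate: ${blended:.4f}/M")
```

So under these assumptions the effective agentic input rate lands just under $0.29 per million tokens.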
I've made my own fork of Codex that always uses flex, or you can route agents through litellm and make it add the service_tier parameter. I haven't really seen native support for it anywhere.
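For anyone wanting to do the same without forking, a minimal sketch of forcing the flex tier on every request. This assumes the OpenAI Python SDK's `service_tier` parameter on `chat.completions.create` (which litellm can also pass through); the wrapper function here is illustrative, not part of any library:

```python
# Hypothetical helper: force OpenAI's flex service tier on every request
# before it reaches client.chat.completions.create(**kwargs).
def with_flex(request_kwargs: dict) -> dict:
    """Return a copy of the request kwargs with service_tier='flex' forced on."""
    patched = dict(request_kwargs)
    patched["service_tier"] = "flex"  # discounted, slower processing tier
    return patched

req = with_flex({
    "model": "o3",
    "messages": [{"role": "user", "content": "hi"}],
})
# With a real API key: client.chat.completions.create(**req)
print(req["service_tier"])
```

The same idea works as a litellm pre-call hook if you'd rather not touch the agent's own request code.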
o3 feels pretty good to me as well, but o3-pro has consistently one-shotted problems other LLMs got stuck on.
I'm talking multiple tries with Claude 4 Opus, Gemini 2.5 Pro, o3, etc., sometimes resulting in hundreds of lines of code.
Versus o3-pro (very slowly) analyzing the problem and then fixing something that seemed completely unrelated in a one- or two-line change, truly fixing the root cause.
o3-pro-level LLMs at reduced cost and increased speed will already be amazing.
Probably referring to its tendency to over-complicate things to the point you have to step in and say, "WTF are you even talking about? Wouldn't it be a lot simpler to just use the original, well-planned-out design?"