It's quite a bit better at coding --- they hint that it can match o1's performance on coding, which already benchmarks higher than 4o. And it's significantly cheaper, and presumably faster. I believe API costs account for the vast majority of COGS at most of today's AI startups, so they would be very motivated to switch to a cheaper model with similar performance.
Right. For large-volume requests that use reasoning, this will be quite useful. I have a task that requires the LLM to convert thousands of free-text statements into SQL SELECT statements, and o3-mini-high gets many of the more complicated ones that GPT-4o and Sonnet 3.5 failed at. So I will be switching this task to either o3-mini or DeepSeek-R1.
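For anyone curious what that kind of text-to-SQL call looks like, here's a minimal sketch using the OpenAI Python SDK. The schema, prompt wording, and example query are hypothetical stand-ins; the `reasoning_effort="high"` parameter is what the "-high" suffix refers to.

```python
# Minimal sketch: turn a free-text statement into a SQL SELECT
# with o3-mini at high reasoning effort. Schema and prompts here
# are hypothetical; adapt them to your own tables.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SCHEMA = """
CREATE TABLE orders (id INT, customer_id INT, total NUMERIC, created_at DATE);
CREATE TABLE customers (id INT, name TEXT, region TEXT);
"""

def to_sql(statement: str) -> str:
    resp = client.chat.completions.create(
        model="o3-mini",
        reasoning_effort="high",  # the "-high" variant mentioned above
        messages=[
            {"role": "system",
             "content": "Translate the user's request into a single SQL "
                        f"SELECT statement against this schema:\n{SCHEMA}\n"
                        "Return only the SQL, no explanation."},
            {"role": "user", "content": statement},
        ],
    )
    return resp.choices[0].message.content.strip()

print(to_sql("total sales per region last quarter"))
```

In practice you'd batch thousands of these and validate the returned SQL (e.g. run it against a read-only replica) before trusting it.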