
> I've read a lot of misunderstandings about DeepResearch, which isn't helped by the multiplication of open and closed clones. OpenAI has not built a wrapper on top of O3.

It also doesn't help that they let you select a model, ranging from 4o-mini to o1-pro, for a Deep Research task. But this confirms my suspicion that model selection is irrelevant both for the Deep Research run itself and for answering follow-up questions.

> Weirdly enough, while Claude 3.7 works perfectly in Claude Code, Cursor struggles with it and I've already seen several high end users cancelling their subscriptions as a result.

It's because Claude Code burns through tokens like there's no tomorrow, while Cursor carefully manages token usage and limits what's in context in order to stay profitable. It's gotten so bad that for any moderately complex task I switch to o1-pro, or to sonnet-3.7 in the Anthropic Console with the thinking tokens maxed out. Cursor just released a "MAX" option, but I can still tell it's nerfed because it thinks for only a few seconds, whereas I can get up to 2 minutes of thinking via the Anthropic Console.
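For what it's worth, you can max the thinking budget yourself against the API. A minimal sketch using the Anthropic Python SDK's extended-thinking parameter (the model ID and budget values here are illustrative; check the current docs):

    # Minimal sketch: request a large extended-thinking budget directly
    # from the Anthropic API. Model ID and budget values are illustrative.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env

    response = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=20000,  # must exceed the thinking budget
        # budget_tokens caps how many tokens the model may spend thinking;
        # a hosted product can quietly set this low to control cost.
        thinking={"type": "enabled", "budget_tokens": 16000},
        messages=[{"role": "user", "content": "Refactor this module ..."}],
    )

    # Thinking and the final answer come back as separate content blocks.
    for block in response.content:
        if block.type == "thinking":
            print("[thinking]", block.thinking[:200], "...")
        elif block.type == "text":
            print(block.text)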

It's abundantly clear that all these model providers are trying to pivot hard into productizing, which is ironic considering that the UX of all these model-as-a-product companies is so universally terrible. Deep Research is a major win, but OpenAI has plenty of misses: Plugins, Custom GPTs, Sora, Search (obsolete now?), and Operator are maybe just okay for casual users - not at all a "product".



Anecdotally I noticed this in aider with 3.7 as well; the responses coming back from Claude 3.7 are wayyy more tokens than 3.5(+), and the model is a little less responsive to aider's prompts. Upshot: it benchmarks better, but it's slower and more frustrating to use.
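If you want to quantify that instead of eyeballing it, comparing output token counts across the two models is easy to script. A sketch against the Anthropic API (the model IDs are assumptions; use whatever your account exposes):

    # Send the same prompt to 3.5 and 3.7 Sonnet and compare how many
    # output tokens each produces. Model IDs are assumptions.
    import anthropic

    client = anthropic.Anthropic()
    PROMPT = "Write a Python function that parses an ISO 8601 date."

    for model in ("claude-3-5-sonnet-20241022", "claude-3-7-sonnet-20250219"):
        resp = client.messages.create(
            model=model,
            max_tokens=4096,
            messages=[{"role": "user", "content": PROMPT}],
        )
        print(model, "->", resp.usage.output_tokens, "output tokens")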

Using Claude Code, it's clear that Anthropic knows how to get the best out of their model, and the token spew is hidden in the interface. I'm now using both, depending on the task.


When did people ever believe that model selection mattered when using Deep Research? The UI may be bad, but it was obvious from day one that it followed its own workflow.

Search within ChatGPT is far from obsolete. 4o + Search remains a significant advantage in both time and cost when handling real-time, single-step queries, e.g. "What is the capital of Texas?"
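Programmatically, the same kind of single-step lookup is cheap too. A minimal sketch using OpenAI's Responses API with the built-in web search tool (the tool type and model name are my assumptions; verify against the current OpenAI docs):

    # Single-step, real-time query via OpenAI's Responses API with the
    # built-in web search tool. Tool type and model name are assumptions.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the env

    response = client.responses.create(
        model="gpt-4o",
        tools=[{"type": "web_search_preview"}],
        input="What is the capital of Texas?",
    )

    print(response.output_text)  # e.g. "The capital of Texas is Austin."

No multi-step agentic workflow, no minutes of waiting: one call, one answer.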


If you have not been reading every OpenAI blog post, you can't be blamed for thinking the model picker affects Deep Research, since the UI heavily implies that.


Hmmm, I noticed it after two Deep Research tasks. No doubt it's bad UI, but it's surprising folks here were confused for that long.


Single-step queries are far better handled by Kagi/Google search when you care about source quality, discovery, and good UX; for anything beyond that, it's worth letting Deep Research do its thing in the background. I would go so far as to say that with Search + 4o you risk getting worse results than just asking the LLM directly - or at least that's been my experience.


YMMV as always, but I get the exact same answers from Kagi's Quick Answer and ChatGPT Search, sources included.



