I've found that they struggle with time and dates, and are sometimes weird about numbers. I asked Grok to estimate the likelihood of something happening, and it gave me percentages for that day, the next day, the next week, and so on. Good enough. But when I came back the next day, it was still quoting a 5-10% chance of the thing having happened the previous day. I had to explain that the percentage for yesterday should now be 0%, since it was already in the past.
In another example, I asked it to turn one of its bullet-point answers into a conversational summary that I could convert into an audio file to listen to later. It kicked out something that came to about 6 minutes of audio, so I asked if it could expand on the details and give me something closer to 20 minutes. It kicked out text that ran about 7 minutes. So I explained that its response was X words and only lasted 7 minutes, so I needed about 3X words. It kicked out about half that, but claimed it was giving me 3X words, or 20 minutes.
It's little stuff like that that makes me think that, no matter how useful these tools might be for some things, we're a long way from being able to just hand them tasks and expect them to be done as reliably as a fairly dim human intern. If an intern kept coming back with half the job I asked for, I'd assume he was being lazy and let him go, but these things are just dumb in certain odd ways.
This matches many of my own experiences with LLM tools: the more complex or multi-step the task, the less reliable they become. It's why I object to the "graduate-level" label that Sam Altman et al. use. It fundamentally misrepresents the skill pyramid that makes a researcher (or any knowledge worker) effective. A researcher who can't reliably manage a to-do list can't be left unsupervised with any critical task, no matter how much information they can bring to bear or how efficiently they can search the web.
That's fine; I get a lot of value out of AI tooling between ChatGPT, Cursor, Claude+MCP, and even Apple Intelligence. But I have yet to use an agent that consistently comes close to the capabilities AI optimists claim.