Because it failed miserably at the very simple task of looking through some scattered charts, the human asking should blame themselves for this basic failure and trust it to do better with much harder, more specialized tasks?
I think you might as well be saying "robotics fails miserably at the very simple task of jogging around the block, so why should we trust the field to accurately place millions of transistors within a 25cm square of silicon?"
His point is that the two tasks are very different at their core: deep research is better at teasing out an accurate "fuzzy" answer from a swamp of interrelated data, while a data scientist is better at getting an accurate answer to a precise, sharply defined question from a sea of comma-separated numbers.
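To make the precise side of that split concrete, here's a minimal sketch of how a data scientist would answer a sharply defined roster question with pandas. The file name and column names are made up for illustration; the point is that this kind of question has one mechanically checkable answer.

```python
import pandas as pd

# Hypothetical roster dump; file and column names are assumed for illustration.
rosters = pd.read_csv("nfl_rosters_2023.csv")  # columns: team, player, position

# A precise, sharply defined question: which teams carry the most quarterbacks?
qb_counts = (
    rosters[rosters["position"] == "QB"]
    .groupby("team")["player"]
    .count()
    .sort_values(ascending=False)
)
print(qb_counts.head())
```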
A human readily understands that "hold the onions, hots on the side" means to not serve any onions and to place any spicy components of the sandwich in a separate container rather than on the sandwich itself. A machine needs to do a lot of educated guessing to decide whether it's being asked to keep the onions in its "hand" for a moment or leave them off the sandwich entirely, and whether black pepper used in the barbecue sauce needs to be separated and placed in a pile along with the habanero peppers.
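To show what that "educated guessing" looks like in code, here's a hypothetical sketch; none of this is a real ordering API, and the phrase table is invented. The hard part is exactly the part the table hand-waves over: deciding what counts as "hots" in the first place.

```python
# Hypothetical mapping of fuzzy order modifiers to concrete actions.
# A human resolves these ambiguities instantly; a machine has to guess.
MODIFIER_ACTIONS = {
    "hold the onions": {"ingredient": "onions", "action": "omit"},
    # "hots on the side" gets guessed as "serve spicy items separately",
    # but "spicy items" is itself fuzzy: habaneros yes, black pepper
    # already mixed into the barbecue sauce almost certainly no.
    "hots on the side": {"ingredient": "spicy toppings", "action": "serve_separately"},
}

def interpret(phrases: list[str]) -> list[dict]:
    """Map fuzzy phrases to actions; anything unrecognized needs a human."""
    actions = []
    for phrase in phrases:
        action = MODIFIER_ACTIONS.get(phrase.lower())
        if action is None:
            raise ValueError(f"Ambiguous modifier, ask the customer: {phrase!r}")
        actions.append(action)
    return actions

print(interpret(["hold the onions", "hots on the side"]))
```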
You seem to have misunderstood both my previous comment and the thing being criticized by the post I replied to.
I understand that there are fuzzy tasks that AIs/algorithms are terrible at despite seeming really simple to a human mind, and that this hasn't gone away with the latest generations of LLMs. That's fine, and I wouldn't criticize an AI for failing at something like the instructions you describe, for example.
However, in this case the human was asking for very specific, cut-and-dried information from easily available NFL rosters. Again, if an AI fails at that, especially because you didn't phrase the question "just so", then sorry, but no, it isn't much more trustworthy for deep-research and data-scientist inquiries.
And what, in any case, makes you think data scientists will use superior phrasing to tease better results out of an LLM on more complex questions?
What I got out of that essay is that you should discredit most LLM responses unless you're willing to do just as much work yourself, or more, confirming the accuracy of an unreliable and deeply flawed partner. Whereas if a human "hallucinated a non-existent library or method you would instantly lose trust in them." But, for reasons, we should either give the machine the benefit of the doubt or manually confirm everything.
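That said, part of the "manually confirm everything" burden can be mechanized. A minimal sketch, assuming the suspect code is Python: importlib.util.find_spec is a real standard-library call that tells you whether a module actually resolves, and the module list below is a made-up example with one deliberately fake entry.

```python
import importlib.util

# Modules some generated code claims to import; one is invented on purpose.
claimed_imports = ["requests", "json", "totally_real_http_lib"]

for name in claimed_imports:
    spec = importlib.util.find_spec(name)
    status = "resolves" if spec is not None else "HALLUCINATED?"
    print(f"{name}: {status}")
```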
> If your reaction to this is “surely typing out the code is faster than typing out an English instruction of it”, all I can tell you is that it really isn’t for me any more. Code needs to be correct. English has enormous room for shortcuts, and vagaries, and typos, and saying things like “use that popular HTTP library” if you can’t remember the name off the top of your head.
Using LLMs as part of my coding work speeds me up by a significant amount.
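To ground the essay's example: "use that popular HTTP library" is the sort of vague English that, in Python, almost always resolves to requests. A minimal sketch of the kind of code a model fills in from that instruction (the URL is a placeholder):

```python
# English instruction: "fetch that JSON endpoint with the popular HTTP
# library and print the title field". Vague, but enough to fill in.
import requests  # the library the vague phrase points at

resp = requests.get("https://example.com/api/item.json", timeout=10)  # placeholder URL
resp.raise_for_status()  # fail loudly on HTTP errors
print(resp.json().get("title"))
```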