Claude is extremely poor at vision compared to Gemini and ChatGPT. I think Anthropic severely overfit their evals to coding/text use cases. Maybe naively adding browser use would help, but I'm a bit skeptical.
I have a completely different experience. Pasting a screenshot into CC is my de facto go-to, and more often than not it leads to CC understanding what needs to be done.
The 1 million robot number that Amazon keeps using needs some nuance. It includes roughly 800K robots that just move stuff around in a 2D plane. I think the number of robots that actually manipulate things is far, far lower (probably less than 500), though, to be fair, no human really wants to just move things from A to B either.
Also, I completely agree with what you said. Cars (with no self-driving) can be thought of as primitive robots, just like the robots of today. For better or worse, we will keep moving towards more and more automation.
The simple Kiva mobile platforms make up most of the robot count, but they replaced large numbers of people who used to walk around warehouses moving stuff from A to B.
Ohh wow, that's bad. I just tried this with Gemini 2.5 Flash/Pro and it worked perfectly -- I'd assume all frontier models get this right (even simpler models should).
I'd be willing to bet a clearer prompt would've given a good answer. People tend to overlook the fact that these AIs aren't Google: they're not doing a pure keyword search, and they work best when you give them sensible sentence structure.
Maybe, but this sort of prompt structure doesn't bamboozle the better models at all. If anything, they're quite good at guessing what you mean even when your sentence structure is crap. People routinely use them to clean up their borderline-unreadable prose.
I'm all about clear prompting, but even using the verbatim prompt from the OP, "ffmpeg command to convert movie.mov into a reasonably sized mp4", the smallest current models from Google and OpenAI (gemini-2.5-flash-lite and gpt-4.1-nano) both produced a working command for me, with explanations of what each CLI arg does.
Hell, the Q4-quantized Mistral Small 3.1 model that runs on my 16GB desktop GPU handled it perfectly as well. All three tests produced a command using x264 with crf 23 that worked without edits, took a random .mov I had from 75 MB to 51 MB, and included an explanation of how to adjust the compression to make it smaller.
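For reference, all three gave something along these lines (the libx264 + crf 23 part is what they actually produced; the audio flag below is just my representative guess, not a verbatim quote of any model):

  ffmpeg -i movie.mov -c:v libx264 -crf 23 -c:a aac output.mp4
  # raise -crf (e.g. to 26-28) for a smaller file at lower quality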
There's as much variability among LLMs as there is in human intelligence. What I'm saying is that I bet if that guy wrote a better prompt, his "failing LLM" would be much more likely to stop failing, unless it's just completely incompetent.
What I always find hilarious, too, is when the AI skeptics try to parlay these kinds of "failures" into evidence that LLMs cannot reason. Of course they can reason.
Less clarity in a prompt _never_ results in better outputs. If the LLM has to "figure out" what your prompt even means, it's already wasted a lot of computation going down irrelevant branches that could've been spent solving the actual problem.
Sure, you can get creative, interesting results from something totally unclear like "dog park game run fun time", but if you're solving an actual problem that has an actual optimal answer, then clarity is _always_ better. The more info you supply about what you're doing, how, and even why, the better the results you'll get.
I disagree. Less clarity gives them more freedom to choose and use the practices they're better trained on, instead of being artificially restricted by constraints that might not be necessary.
The more info you give the AI, the more likely it is to apply the practices it was trained on to _your_ situation, as opposed to stereotypical situations that don't apply.
LLMs are like humans in this regard: you never get a human to follow instructions better by omitting parts of the instructions. Even if you just want the LLM to be creative and explore random ideas, you're _still_ better off _telling_ it that. lol.
Not true, and the trick to getting better results is letting go of that incorrect assumption. If a human is an expert in JavaScript and you tell them to use Rust for a task that could be done in JavaScript, the results will be worse than if you'd just let them use what they know.
The only way that analogy remotely maps onto reality in the world of LLMs would be a `Mixture of Experts`-style system where smaller LLMs have each been trained on a specific area like math or chemistry, and a router step before inference selects which model to send the prompt to. If such a system had a bug and routed to the wrong 'expert', then quality would drop.
However, _even_ in an MoE system, you _still_ get better outputs when your prompt is clear and includes as much relevant detail as you have. Models never do better because they're unconstrained, as you mistakenly believe.