
On the one hand this is super cool and maybe very beneficial, something I definitely want to try out.

On the other, LLMs always make mistakes, and when they're this deeply integrated into other systems I wonder how severe those mistakes will be, since they are bound to happen.



This.

Recently I uploaded a screenshot of the showtimes at a specific theatre and asked ChatGPT to find the optimal time for me to watch the movie based on my schedule.

It did confidently find the perfect time, and even accounted for factors such as movies starting 20 minutes late due to the trailers and ads shown beforehand. The only problem: it grabbed the times from the screenshot totally incorrectly, which messed up all of its output. I tried and tried to get it to extract the times accurately, but it never did, and after getting frustrated I ultimately lost trust in its ability. This keeps happening again and again with LLMs.


And this is actually a great use case for agents, because they can go and use the movie theater's website to more reliably figure out when movies start. I don't think they're going to feed screenshots into the LLM.


Honestly, this might be more indicative of how far behind vision is than anything else.

Despite the fact that CV was the first real deep learning breakthrough, VLMs have been really disappointing. I'm guessing that's partly because basic interleaved web text+image next-token prediction is a weak signal for developing good image reasoning.


Is anyone actually trying to solve OCR? I often think of that annas-archive blog post about how we basically just have to keep shadow libraries alive long enough until the conversion from PDF to plaintext is solved.

https://annas-archive.org/blog/critical-window.html

I hope one of these days one of these incredibly rich LLM companies accidentally solves this or something; it would be infinitely more beneficial to mankind than the awful LLM products they are trying to make.


You may want to have a look at Mistral OCR: https://mistral.ai/news/mistral-ocr
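For the curious, here's a minimal sketch of calling it with the mistralai Python client. The model id and response fields follow Mistral's launch announcement; treat them as assumptions, and the PDF URL is just a placeholder:

    # Sketch: OCR a hosted PDF into markdown via Mistral's OCR endpoint.
    # Assumes the mistralai client and the "mistral-ocr-latest" model id.
    import os
    from mistralai import Mistral

    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
    resp = client.ocr.process(
        model="mistral-ocr-latest",
        document={"type": "document_url",
                  "document_url": "https://example.com/paper.pdf"},
    )
    # Each page comes back as markdown rather than raw text.
    print("\n\n".join(page.markdown for page in resp.pages))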


This... what?


That is the problem. LLMs can't be trusted.

I was searching on HuggingFace for a model that would fit in my system RAM + VRAM. And the way HuggingFace shows a model is as a bunch of files, with a size for each file, but no total. I copy-pasted that page into an LLM and asked it to add up the total. Some LLMs counted correctly, and some confidently gave me a totally wrong number.

And that's not that complicated a question.
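(For what it's worth, this particular sum doesn't need an LLM at all. A sketch using the huggingface_hub client, where the repo id is just an example:)

    # Sum the file sizes in a HuggingFace repo deterministically
    # instead of asking an LLM to do the arithmetic.
    from huggingface_hub import HfApi

    api = HfApi()
    # files_metadata=True includes per-file sizes in the response
    info = api.model_info("mistralai/Mistral-7B-v0.1", files_metadata=True)
    total = sum(f.size or 0 for f in info.siblings)
    print(f"{total / 1024**3:.2f} GiB")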


Also, LLM mistakes tend to pile up, multiplying like probabilities. I wonder how scrambled a computer will be after some hours of use.
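Back-of-the-envelope version of that compounding, assuming independent steps:

    # If each action is independently correct with probability p,
    # the chance of an error-free session decays geometrically.
    p, n = 0.99, 100  # 99% per-step accuracy, 100 steps
    print(p ** n)     # ~0.366, i.e. ~63% chance of at least one mistake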


I'm currently working on a way to make the LLM spit out any data-processing answer as code, which is then automatically executed and verified with additional context. That way things like hallucinations are reduced pretty much to zero, given that the wrapper will say the model could not determine a real answer.
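Something like this, hypothetically. Every name here (ask_llm, verify) is a placeholder, not the commenter's actual implementation:

    # Hypothetical sketch of the answer-as-code pattern: the model emits
    # code rather than prose, the wrapper executes and verifies it, and
    # refuses to answer instead of hallucinating.
    def answer_via_code(question: str, data: dict) -> str:
        prompt = ("Write a Python function solve(data) that returns the "
                  "answer. Output only code.\n" + question)
        code = ask_llm(prompt)  # placeholder for any completion API
        scope: dict = {}
        try:
            exec(code, scope)              # run the generated code
            result = scope["solve"](data)  # apply it to the real data
        except Exception:
            return "Model could not determine a real answer."
        if not verify(result, data):       # placeholder domain check
            return "Model could not determine a real answer."
        return str(result)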


Based on the live stream, so does OpenAI.

But of course humans make a multitude of mistakes too.




