
On the one hand this is super cool and maybe very beneficial, something I definitely want to try out.

On the other, LLMs always make mistakes, and when they're this deeply integrated into other systems I wonder how severe those mistakes will be, since they are bound to happen.



This.

Recently I uploaded a screenshot of the showtimes at a specific theatre and asked ChatGPT to find the optimal time for me to watch the movie based on my schedule.

It did confidently find the perfect time, and even accounted for factors such as movies starting 20 minutes late due to the trailers and ads shown beforehand. The only problem: it grabbed the times from the screenshot totally incorrectly, which messed up all of its output. I tried and tried to get it to extract the times accurately, but it never did, and after getting frustrated I ultimately lost trust in its ability. This keeps happening again and again with LLMs.


And this is actually a great use case for agents, because they can go and use the movie theater's website to more reliably figure out when movies start. I don't think they're going to feed screenshots into the LLM.


Honestly, this might be more indicative of how far behind vision is than anything else.

Despite the fact that CV was the first real deep learning breakthrough, VLMs have been really disappointing. I'm guessing that's partly because basic interleaved web text+image next-token prediction is a weak signal for developing good image reasoning.


Is anyone actually trying to solve OCR? I often think of that annas-archive blog post about how we basically just have to keep shadow libraries alive long enough until the conversion from PDF to plaintext is solved.

https://annas-archive.org/blog/critical-window.html

I hope one of these days one of these incredibly rich LLM companies accidentally solves this or something; it would be infinitely more beneficial to mankind than the awful LLM products they are trying to make.


You may want to have a look at Mistral OCR: https://mistral.ai/news/mistral-ocr
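For the curious, here's a minimal sketch of calling it with the mistralai Python client. The model id and response fields follow Mistral's launch announcement; treat them as assumptions, and the PDF URL is just a placeholder:

    # Sketch: OCR a hosted PDF into markdown via Mistral's OCR endpoint.
    # Assumes the mistralai client and the "mistral-ocr-latest" model id.
    import os
    from mistralai import Mistral

    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
    resp = client.ocr.process(
        model="mistral-ocr-latest",
        document={"type": "document_url",
                  "document_url": "https://example.com/paper.pdf"},
    )
    # Each page comes back as markdown rather than raw text.
    print("\n\n".join(page.markdown for page in resp.pages))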


This... what?


That is the problem. LLMs can't be trusted.

I was searching on HuggingFace for a model that would fit in my system RAM + VRAM. And the way HuggingFace shows a model is as a bunch of files, with a size for each file, but no total. I copy-pasted that page into an LLM and asked it to add up the total. Some LLMs counted correctly, and some confidently gave me a totally wrong number.

And that's not that complicated a question.
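(For what it's worth, this particular sum doesn't need an LLM at all. A sketch using the huggingface_hub client, where the repo id is just an example:)

    # Sum the file sizes in a HuggingFace repo deterministically
    # instead of asking an LLM to do the arithmetic.
    from huggingface_hub import HfApi

    api = HfApi()
    # files_metadata=True includes per-file sizes in the response
    info = api.model_info("mistralai/Mistral-7B-v0.1", files_metadata=True)
    total = sum(f.size or 0 for f in info.siblings)
    print(f"{total / 1024**3:.2f} GiB")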


Also, LLM mistakes tend to pile up, multiplying like probabilities. I wonder how scrambled a computer will be after some hours of use.
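Back-of-the-envelope version of that compounding, assuming independent steps:

    # If each action is independently correct with probability p,
    # the chance of an error-free session decays geometrically.
    p, n = 0.99, 100  # 99% per-step accuracy, 100 steps
    print(p ** n)     # ~0.366, i.e. ~63% chance of at least one mistake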


I'm currently working on a way to make the LLM spit out any data-processing answer as code, which is then automatically executed and verified with additional context. That way things like hallucinations are reduced pretty much to zero, given that the wrapper will say the model could not determine a real answer.
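Something like this, hypothetically. Every name here (ask_llm, verify) is a placeholder, not the commenter's actual implementation:

    # Hypothetical sketch of the answer-as-code pattern: the model emits
    # code rather than prose, the wrapper executes and verifies it, and
    # refuses to answer instead of hallucinating.
    def answer_via_code(question: str, data: dict) -> str:
        prompt = ("Write a Python function solve(data) that returns the "
                  "answer. Output only code.\n" + question)
        code = ask_llm(prompt)  # placeholder for any completion API
        scope: dict = {}
        try:
            exec(code, scope)              # run the generated code
            result = scope["solve"](data)  # apply it to the real data
        except Exception:
            return "Model could not determine a real answer."
        if not verify(result, data):       # placeholder domain check
            return "Model could not determine a real answer."
        return str(result)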


Based on the live stream, so does OpenAI.

But of course humans make a multitude of mistakes too.




