Hacker News

I'm not convinced.

I had Gemini convert a bunch of charity forms yesterday, and the deviation was significant and problematic. Rephrasing questions, inventing new questions, changing the emphasis; it might be performing a lot better for numerical data sets, but it's rare to have one without a meaningful textual component.



I've seen similar. I wonder if traditional organizational solutions, like those employed by the US Military or IBM, might be applicable. Redundancy is one of their tools for achieving reliability from unreliable parts. Instead of asking a single LLM to perform the task, ask 10 different LLMs to perform the same task 10 different times and count them like votes.
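A minimal sketch of that voting idea, assuming each model call is wrapped in a callable that returns a transcription string (the function names and stub outputs here are made up for illustration):

```python
from collections import Counter

def majority_transcription(transcribers, document, min_agreement=6):
    """Run several independent transcribers over the same document and
    accept the answer only if enough of them agree; otherwise return
    None so the document can be escalated to a human reviewer."""
    votes = Counter(t(document) for t in transcribers)
    answer, count = votes.most_common(1)[0]
    return answer if count >= min_agreement else None

# Toy stand-ins for 10 LLM calls: 7 agree, 3 deviate.
stubs = [lambda d: "Q1: Yes"] * 7 + [lambda d: "Q1: No"] * 3
print(majority_transcription(stubs, "scan.png"))  # Q1: Yes
```

Raising `min_agreement` trades throughput for reliability, which is the same lever the redundancy-based designs use.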


Yeah, what I did to "solve" my issue was to use several models (4), then farm anything with any disagreement out to humans (2). 60% went to humans in the end.

I suspect if I'd done some corrective transformations before LLM scanning the success rate would have been higher, but the cost threshold of the project didn't warrant it.
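The triage rule described above (unanimity across models, otherwise a human) can be sketched like this; the model outputs are hypothetical stand-ins:

```python
def triage(transcriptions):
    """Unanimity rule: if every model output for a form agrees, accept
    the shared answer; otherwise flag the form for human review.
    Returns (answer_or_None, needs_human)."""
    first = transcriptions[0]
    if all(t == first for t in transcriptions):
        return first, False
    return None, True

# Hypothetical outputs from 4 models over 2 forms.
forms = [
    ["Yes", "Yes", "Yes", "Yes"],   # all four agree -> accept
    ["Yes", "No", "Yes", "Yes"],    # any disagreement -> human
]
flagged = sum(1 for f in forms if triage(f)[1])
print(f"{flagged / len(forms):.0%} sent to humans")  # 50%
```

With four models, requiring unanimity is strict, which is consistent with a large share of forms falling through to human review.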


Why complicate it? Have one LLM do the work, a second reflect on it, and a decision engine review; that would be cheaper.


Not sure I believe this.

I just quickly took a scanned document and the transcription looks good.

https://19january2021snapshot.epa.gov/sites/static/files/201...

https://g.co/gemini/share/d315b4047224

It even got the faded partial date stamp.


Well, bully for you, accusing people of lying.

That's one of the best-scanned documents I've seen in years. Most scanning now is done via phone.


Did you put as much work into it as Derek did? He spent a full hour with Gemini processing the longer document.


Use 2.5 Pro in AI Studio, not the Gemini app.


I did. I was scanning about 400 forms.


That's what I did.



