
There's no OCR run, as a first step or otherwise.

And you have no idea what you're talking about.




You understand that OCR is the process of extracting text from images, right? You know, which is what Gemini does, and which they reference repeatedly in their paper. I have absolutely no idea why you keep making some bizarre distinction about it being a "separate process".

Okay, it's been fun talking to you but feel free to have the last word. Good luck.


The transformer (Gemini) predicts text given the image and text in its context window. That's it.

OCR, object detection, etc. all come from the transformer predicting text. Read the Flamingo paper.



