
Great news! Side note: does vision include the ability to read a PDF?


Vision = visual, while a PDF is a container of sorts, usually holding images and text. So I guess the short answer is: half yes. Vision covers the images, and for the text part you can use any LLM.


I'm asking because the OpenAI API has a special endpoint to deal with PDFs, different from images.

Which part of a PDF file can you use LLMs for? PDF is a binary format...


Yeah, that'd make sense, PDFs aren't images.

PDF isn't really a binary format. It starts with a text header, its structure is mostly text-based objects, and you can parse many PDFs as plain text. They tend to contain embedded binary data though, which is the specific part these vision models can help you with, assuming it's images. The rest a "normal" LLM can parse just fine.
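
To make the "text header plus text-based objects" point concrete, here's a rough Python sketch that looks at raw PDF bytes without any PDF library. The function name `inspect_pdf_bytes` and the `SAMPLE` bytes are made up for illustration, and `SAMPLE` is a toy fragment rather than a complete, valid PDF (no xref table or trailer). Real files often compress their streams, so treat this as a way to see the layout, not as a parser.

```python
import re


def inspect_pdf_bytes(data: bytes) -> dict:
    """Report the parts of a PDF that are visible as plain ASCII."""
    # A PDF begins with an ASCII header such as "%PDF-1.4".
    header = re.match(rb"%PDF-(\d+\.\d+)", data)
    # Indirect objects are introduced by "<number> <generation> obj" in ASCII.
    objects = re.findall(rb"(?m)^\d+ \d+ obj\b", data)
    # Stream bodies sit between "stream" and "endstream"; these are the
    # parts that may hold binary data (images, compressed content).
    streams = re.findall(rb"stream\r?\n(.*?)\r?\nendstream", data, re.DOTALL)
    return {
        "version": header.group(1).decode("ascii") if header else None,
        "object_count": len(objects),
        "stream_count": len(streams),
    }


# Hypothetical toy fragment: two text-only objects and one binary stream.
SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n"
    b"3 0 obj\n<< /Length 4 >>\nstream\n\x00\x01\x02\x03\nendstream\nendobj\n"
    b"%%EOF\n"
)

if __name__ == "__main__":
    print(inspect_pdf_bytes(SAMPLE))
```

On a real file you'd pass `open(path, "rb").read()` instead of `SAMPLE`. The object dictionaries are readable as-is, while whatever sits inside the streams is where you'd need decompression or, for images, a vision model.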



