Great news! Side note: does vision include the ability to read a PDF?

diggan · 2025-05-10T11:31:22 1746876682

Vision = visual, while PDF is a container of sorts, usually containing images and text. So I guess the short answer is: partly. Vision covers the images; for the text part you can use any LLM.

bsaul · 2025-05-10T13:02:48 1746882168

I'm asking because the OpenAI API has a special endpoint for dealing with PDFs, different from the one for images.

Which part of a PDF file can you use LLMs for? PDF is a binary format...

diggan · 2025-05-10T13:12:58 1746882778

Yeah, that'd make sense, PDFs aren't images.

PDF isn't really a binary format: it starts with a text header, the structure is mostly text-based objects, and you can parse many PDFs as plain text. They tend to contain embedded binary data though, which is the specific part these vision models can help you with, assuming it's images. The rest a "normal" LLM can parse just fine (once any compressed streams have been decoded).
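
To make that concrete, here's a rough Python sketch of what I mean. The sample bytes are hand-written for illustration (not a complete, valid PDF), and `pdf_version` and `list_objects` are made-up helper names. It's a regex scan, not a real parser, so treat it as a demonstration of the text-based structure rather than something to run on arbitrary files.

```python
import re

# Hand-written, PDF-like sample for illustration only (an assumption, not a
# complete valid PDF: it has no xref table or trailer).
SAMPLE_PDF = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
    b"3 0 obj\n<< /Length 4 /Filter /DCTDecode /Subtype /Image >>\n"
    b"stream\n\xff\xd8\xff\xe0\nendstream\nendobj\n"
    b"%%EOF\n"
)


def pdf_version(data: bytes):
    """Return the version string from the text header, or None if absent."""
    m = re.match(rb"%PDF-(\d+\.\d+)", data)
    return m.group(1).decode("ascii") if m else None


def list_objects(data: bytes):
    """Return (object_number, has_stream) for each 'N G obj ... endobj' block."""
    found = []
    for m in re.finditer(rb"(\d+) (\d+) obj\b(.*?)endobj", data, re.DOTALL):
        body = m.group(3)
        # A 'stream' keyword marks embedded data, which may be binary
        # (here: the first bytes of a JPEG, flagged by /DCTDecode).
        found.append((int(m.group(1)), b"stream" in body))
    return found


if __name__ == "__main__":
    print(pdf_version(SAMPLE_PDF))
    for number, has_stream in list_objects(SAMPLE_PDF):
        print(number, "stream (possibly binary)" if has_stream else "plain-text dictionary")
```

The objects without a stream are readable dictionaries; only the stream payload in object 3 is binary, and that's the kind of thing you'd hand to a vision model if it's an image.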