This is the most useful documentation I've found so far to help understand how t... | Hacker News


simonw 3 months ago | on: Vision Now Available in Llama.cpp

This is the most useful documentation I've found so far to help understand how this works: https://github.com/ggml-org/llama.cpp/tree/master/tools/mtmd...

scribu 3 months ago

It’s interesting that they decided to move all of the architecture-specific image-to-embedding preprocessing into a separate library.

Similar to how we ended up with the huggingface/tokenizers library for text-only Transformers.
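A minimal sketch of the pattern described above, assuming a toy setup: images are plain 2D lists of grayscale pixels, and the "architectures" `arch_a` and `arch_b` and their patch sizes are hypothetical, not taken from llama.cpp or its mtmd library. Each architecture registers its own preprocessing (here only the patch size and pixel scaling), and calling code only sees `preprocess(arch, image)`, much as text code only calls a tokenizer. The vision encoder that would turn these patch vectors into embeddings is left out.

```python
from typing import Callable, Dict, List

# Hypothetical image representation: rows of grayscale pixel values in 0..255.
Image = List[List[int]]


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Split the image into non-overlapping patch x patch tiles.

    Each tile is flattened row-major and scaled to [0, 1]. Edge pixels that
    do not fill a whole tile are dropped.
    """
    h, w = len(image), len(image[0])
    patches = []
    for r in range(0, h - h % patch, patch):
        for c in range(0, w - w % patch, patch):
            patches.append(
                [image[r + i][c + j] / 255.0 for i in range(patch) for j in range(patch)]
            )
    return patches


# Hypothetical registry of architecture-specific preprocessors.
# Adding a new model means adding an entry here, not changing callers.
PREPROCESSORS: Dict[str, Callable[[Image], List[List[float]]]] = {
    "arch_a": lambda img: patchify(img, 2),
    "arch_b": lambda img: patchify(img, 4),
}


def preprocess(arch: str, image: Image) -> List[List[float]]:
    """Architecture-agnostic entry point, analogous to calling a tokenizer."""
    return PREPROCESSORS[arch](image)


if __name__ == "__main__":
    img = [[0, 255, 0, 255] for _ in range(4)]
    print(len(preprocess("arch_a", img)), len(preprocess("arch_b", img)))
```

In this sketch, the 4x4 example image becomes four 4-value patches for `arch_a` and a single 16-value patch for `arch_b`, while the caller's code is identical for both.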
