
I've gotten quite good PDF-to-Markdown conversions with https://github.com/VikParuchuri/marker . It's slow but worth a shot. Markdown should be easy for a RAG pipeline to parse.

I'm trying to get a similar system set up on my computer.
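
For what it's worth, here is a rough sketch of what "easily parseable" could look like once the Markdown exists: split the text into chunks at headings so each chunk can be embedded and retrieved separately. This is not part of marker and is not how any particular RAG framework does it; `chunk_markdown` and the chunk dict shape are hypothetical names made up for illustration, and it only handles `#`-style headings and triple-backtick fences.

```python
import re

# ATX-style headings only ("# Title" through "###### Title"); an assumption,
# since converted documents could also contain underlined (setext) headings.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown(text):
    """Split Markdown into one chunk per heading (hypothetical helper).

    Returns a list of {"title": ..., "text": ...} dicts. Text that appears
    before the first heading is kept in a chunk with an empty title.
    """
    chunks = []
    title = ""
    body = []
    in_fence = False

    def flush():
        # Skip chunks that have neither a title nor any non-blank text.
        if title or any(line.strip() for line in body):
            chunks.append({"title": title, "text": "\n".join(body).strip()})

    for line in text.splitlines():
        # Track fenced code so a "# comment" inside code is not read as a heading.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            title = match.group(2).strip()
            body = []
        else:
            body.append(line)
    flush()
    return chunks


if __name__ == "__main__":
    sample = "intro\n# A\nalpha\n## B\n```\n# not heading\n```\nbeta\n"
    for chunk in chunk_markdown(sample):
        print(repr(chunk["title"]), len(chunk["text"]))
```

Heading-based splitting is just one option; fixed-size windows with overlap are another, and which works better probably depends on how clean the converted headings are.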



This looks worth exploring, so thanks. The author has done a bunch of work beyond what PyMuPDF does on multicolumn layouts.



