
Use PyMuPDF to extract the PDF text. Hell, run that nasty business through an LLM as step 2 to get a beautiful, clean Markdown version of the text. Lord knows the PDF format is horribly complex!
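
For anyone who wants to try step 1, here is a minimal sketch, assuming PyMuPDF is installed (`pip install pymupdf`). The function names `extract_pages` and `tidy` are mine, not part of the library, and `tidy` is just a hypothetical bit of cleanup you might do before handing the text to whatever LLM you use for step 2. The LLM call itself is left out since it depends entirely on your provider.

```python
import re
import sys


def extract_pages(pdf_path):
    """Return a list with the plain text of each page in the PDF."""
    # Imported lazily so tidy() below still works without PyMuPDF installed.
    try:
        import pymupdf  # module name in recent PyMuPDF releases
    except ImportError:
        import fitz as pymupdf  # module name in older PyMuPDF releases
    with pymupdf.open(pdf_path) as doc:
        # page.get_text() with no arguments returns plain text.
        return [page.get_text() for page in doc]


def tidy(text):
    """Light cleanup before the text goes to an LLM (hypothetical helper)."""
    # Rejoin words hyphenated across a line break: "extrac-\ntion" -> "extraction".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of three or more newlines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python extract.py some_file.pdf
    pages = extract_pages(sys.argv[1])
    print(tidy("\n\n".join(pages)))
```

Plain-text extraction like this loses tables and multi-column layout on a lot of PDFs, which is presumably where the LLM pass earns its keep.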

