Hacker News new | past | comments | ask | show | jobs | submit login

The hard part about document conversion is not finding a tool which can convert the formats but the tool which does it best. I wonder how MarkItDown ranks for the tasks for the various types.



The README of MarkItDown mentions "indexing and text analysis" as the two motivating features, whereas Pandoc is more interested in document preparation via conversion that maintains rich text formatting.

Since my personal use leans towards the latter, I'm hesitant to believe that this tool will work better for me but others may have other priorities.


MarkItDown feels like running strings; the output is great for text extraction and processing, not for reading by humans




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: