I may be naive here, but is there anything preventing running regular OCR over a page, and then feeding whatever it couldn't deal with into this? Sure, the plumbing for this is probably missing, but it sounds more like a matter of picking up the shovel rather than inventing something.
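To make the plumbing concrete, here's a rough sketch of the routing step I have in mind. Everything in it is made up for illustration: the `Region` structure, the `route_regions` function, the 0.8 threshold, and the `fallback` callable standing in for the math-aware recognizer. It also assumes the OCR engine reports a confidence score per region, which I haven't checked against any particular tool.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Region:
    """One OCR'd chunk of a page (hypothetical structure, not a real library type)."""
    text: str          # what the regular OCR pass produced
    confidence: float  # assumed to be in the range 0.0 to 1.0
    image_id: str      # handle to the cropped image for this chunk


def route_regions(
    regions: List[Region],
    fallback: Callable[[Region], str],
    threshold: float = 0.8,  # arbitrary cutoff, would need tuning
) -> List[str]:
    """Keep confident OCR text; send everything else to the fallback recognizer."""
    results = []
    for region in regions:
        if region.confidence >= threshold:
            results.append(region.text)
        else:
            # Low confidence: hand the cropped image to the second tool.
            results.append(fallback(region))
    return results


if __name__ == "__main__":
    page = [
        Region("The area of a circle is", 0.97, "r1"),
        Region("A = 7Tr2", 0.41, "r2"),  # a formula the OCR pass mangled
    ]
    # Stand-in fallback that just marks the region for the second tool.
    print(route_regions(page, lambda r: f"[needs math OCR: {r.image_id}]"))
```

The weak spot is the confidence score itself: this only helps if the first pass knows when it got something wrong.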
I've done some work cleaning up documents for use by a blind student. OCR starts to fail really badly when math is involved. Common problems like O versus 0, accents or other additions to characters, small formulas embedded in sentences, and graphs with captions can all throw things off considerably.
Best case, I could copy and paste a paragraph at a time from a PDF of the textbook (with copy protection removed). Worst case, I was retyping or fixing every few words in a sentence.
I worked on this from about 2011 to 2013. Image processing and machine learning have advanced significantly since then, so there may be much better software available now.
If anyone has ideas or packages they'd recommend, I'd be interested.