Of course, if the document is using the outline in unexpected ways, you'll run into trouble. Consider Facebook infamously splitting "Advertisement" into multiple spans to avoid tripping ad blockers.
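To make the span trick concrete, here's a rough sketch using only Python's standard `html.parser`. The markup is made up for illustration (I'm not claiming it matches what any site actually serves), and `joined_text` is just a helper name I picked.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the text nodes of a fragment, ignoring the tags between them."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def joined_text(html):
    """Return all text nodes of the fragment concatenated in document order."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return "".join(collector.chunks)


# Made-up markup: one label split across several spans.
fragment = "<div><span>Adv</span><span>ert</span><span>ise</span><span>ment</span></div>"

print("Advertisement" in fragment)               # False: the raw markup never contains the word
print("Advertisement" in joined_text(fragment))  # True: the joined text nodes do
```

This only handles the simple case. If the page also mixes in decoy characters that are hidden with CSS, joining the text nodes isn't enough and you'd need to know what actually gets rendered.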
Although you'd imagine screenshots would be easy to OCR reliably, OCR isn't guaranteed to get everything right.
It's not like you can rely on a dictionary to confirm you've correctly OCRed a post by "@4EyedJediO" - who knows if that's an O or a 0 at the end?
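One way to live with that ambiguity is to keep every plausible reading instead of trusting a single one, then check each against something authoritative (say, whether the profile exists). A minimal sketch; the confusion groups in `CONFUSABLE` are my own assumption, and a real list would depend on the font and the OCR engine.

```python
from itertools import product

# Assumed confusion groups; a real set would depend on the font and OCR engine.
CONFUSABLE = [set("O0"), set("Il1")]


def candidates(text):
    """Return every string obtained by swapping look-alike characters."""
    options = []
    for ch in text:
        group = next((g for g in CONFUSABLE if ch in g), None)
        # Ambiguous characters expand to their whole group, others stay fixed.
        options.append(sorted(group) if group else [ch])
    return ["".join(combo) for combo in product(*options)]


print(candidates("@4EyedJediO"))  # ['@4EyedJedi0', '@4EyedJediO']
```

The number of candidates multiplies with every ambiguous character, so this is only practical for short strings like handles.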
And if you're OCRing the title and view count of a YouTube video, for example, you've got to take the page layout into account, because there's a recommendations sidebar full of other titles with their own view counts.
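If the OCR engine reports a position for each word, one crude way to account for layout is to throw away everything outside the region you care about. Here's a sketch with made-up word positions; the `Word` type, the coordinates, and the 850-pixel column boundary are all assumptions for illustration, not measurements of any real page.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x: int  # left edge in pixels
    y: int  # top edge in pixels


def words_in_region(words, left, top, right, bottom):
    """Keep words whose top-left corner falls inside the given rectangle."""
    return [w.text for w in words if left <= w.x < right and top <= w.y < bottom]


# Made-up OCR output for a 1280x720 screenshot.
ocr_words = [
    Word("Main", 40, 620),
    Word("title", 110, 620),
    Word("1,234", 40, 660),
    Word("views", 100, 660),
    Word("Other", 900, 120),   # sidebar recommendation
    Word("title", 970, 120),
    Word("98K", 900, 150),
    Word("views", 950, 150),
]

# Assumption: the main column occupies the left 850 pixels.
main = words_in_region(ocr_words, left=0, top=0, right=850, bottom=720)
print(" ".join(main))  # Main title 1,234 views
```

The hard part is knowing the rectangle in the first place, which changes with window size and page redesigns.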
I guess you'd get better results if you knew the font the site uses (which in many cases you could figure out pretty quickly), or even if you just overrode every font with your own.