Tesseract is quite good and can get some really good results if you can train it.
I just wish there was a more user-friendly way to generate the training file; for the project where I had to use it, I ended up paying a freelancer with strong knowledge of Tesseract training.
Really happy with the results for my iOS app (universal app).
So if the source image contains text columns, pull quotes, or similar, the output text will just be each row of text read straight across, from the far left to the far right.
Yeah, you could do that. And I wouldn't think it would be that hard either.
I'd make a cascade that detects all letters and numbers from major font sets. That shouldn't be too terribly difficult.
Next, use the cascade to scan the document and convert it to a list of all detected characters (we don't actually care what the chars are, only where they are).
Once you have this, fit bounding boxes around the detected characters. You'll have to figure out how far apart two characters can be before they end up in separate boxes.
What you should end up with are a few boxes indicating the regions of text on the document. Crop each of these regions of interest and feed them into Tesseract one at a time.
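
For what it's worth, the box-grouping step could look roughly like the sketch below. It assumes you already have one box per detected character as (left, top, right, bottom) pixel coordinates; the names `boxes_are_near`, `union_box`, `merge_into_regions` and the `max_gap` parameter are made up for illustration and are not part of Tesseract or any detector. It just keeps merging boxes that sit within `max_gap` pixels of each other until nothing changes.

```python
from typing import List, Tuple

# Hypothetical box format: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


def boxes_are_near(a: Box, b: Box, max_gap: int) -> bool:
    """True if a and b overlap or are at most max_gap apart on both axes."""
    gap_x = max(a[0], b[0]) - min(a[2], b[2])  # negative when they overlap in x
    gap_y = max(a[1], b[1]) - min(a[3], b[3])  # negative when they overlap in y
    return gap_x <= max_gap and gap_y <= max_gap


def union_box(a: Box, b: Box) -> Box:
    """Smallest box that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_into_regions(char_boxes: List[Box], max_gap: int) -> List[Box]:
    """Repeatedly merge nearby boxes until no two remaining boxes are near."""
    regions = list(char_boxes)
    merged_something = True
    while merged_something:
        merged_something = False
        result: List[Box] = []
        for box in regions:
            for i, existing in enumerate(result):
                if boxes_are_near(box, existing, max_gap):
                    result[i] = union_box(box, existing)
                    merged_something = True
                    break
            else:
                # No nearby region yet, so this box starts a new one.
                result.append(box)
        regions = result
    return regions
```

Picking `max_gap` is the fiddly part: it needs to be larger than the spacing between words and lines but smaller than the gutter between columns. If you happen to use Pillow and pytesseract (an assumption on my part, any cropping tool would do), each resulting region can be passed to `Image.crop(...)` and the cropped image to `pytesseract.image_to_string(...)`.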