Yeah, you could do that, and I don't think it would be that hard either.
I'd train a cascade classifier (e.g. an OpenCV Haar or LBP cascade) that detects letters and numbers from the major font sets. That shouldn't be too difficult.
Next, run the cascade over the document and reduce the result to a list of detected character positions (we don't actually care which characters they are).
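A minimal sketch of the scanning step, assuming OpenCV's Python bindings and a cascade you've already trained and saved as XML. The helper names `load_char_cascade` and `detect_char_boxes` are hypothetical, as is any file path you pass in. It keeps only the (x, y, w, h) of each hit, since the character identity doesn't matter:

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h), the format detectMultiScale uses


def load_char_cascade(path: str):
    """Load a trained cascade XML. The path is hypothetical: whatever you trained."""
    import cv2  # imported lazily so detect_char_boxes can be used without OpenCV

    cascade = cv2.CascadeClassifier(path)
    if cascade.empty():
        raise OSError(f"could not load cascade from {path!r}")
    return cascade


def detect_char_boxes(gray, cascade, scale_factor: float = 1.1,
                      min_neighbors: int = 3) -> List[Box]:
    """Run the cascade over a grayscale page and return plain (x, y, w, h) tuples.

    Only the positions are kept; which character was detected is discarded.
    """
    hits = cascade.detectMultiScale(gray, scaleFactor=scale_factor,
                                    minNeighbors=min_neighbors)
    # detectMultiScale returns an N x 4 array, or an empty tuple when nothing is found
    return [tuple(int(v) for v in hit) for hit in hits]
```

For example, `detect_char_boxes(cv2.imread("page.png", cv2.IMREAD_GRAYSCALE), load_char_cascade("chars.xml"))`, where both file names are placeholders. `scaleFactor` and `minNeighbors` are the usual knobs for trading missed characters against false hits.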
Once you have that list, fit bounding boxes around clusters of nearby detections. You'll have to decide how far apart two characters can be before they belong to separate boxes.
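One way to do the grouping, sketched in plain Python: treat two character boxes as neighbours when both their horizontal and vertical gaps are at most `max_gap` pixels, merge neighbours transitively with a small union-find, and return the tight box around each group. This single-linkage merge is just one option (dilating a mask of the hits and taking contours is another common approach), and `group_boxes` and `max_gap` are hypothetical names; `max_gap` is the distance you have to tune.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h)


def _gap(a: Box, b: Box) -> int:
    """Larger of the horizontal and vertical gaps between two boxes (0 if they overlap)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = max(bx - (ax + aw), ax - (bx + bw), 0)
    dy = max(by - (ay + ah), ay - (by + bh), 0)
    return max(dx, dy)


def group_boxes(boxes: List[Box], max_gap: int) -> List[Box]:
    """Merge boxes closer than max_gap into regions; return one tight box per region."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # O(n^2) pairwise check: fine for a page's worth of characters
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if _gap(boxes[i], boxes[j]) <= max_gap:
                parent[find(i)] = find(j)

    groups: Dict[int, List[Box]] = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)

    regions = []
    for members in groups.values():
        x0 = min(x for x, _, _, _ in members)
        y0 = min(y for _, y, _, _ in members)
        x1 = max(x + w for x, _, w, _ in members)
        y1 = max(y + h for _, y, _, h in members)
        regions.append((x0, y0, x1 - x0, y1 - y0))
    # top-to-bottom, then left-to-right
    return sorted(regions, key=lambda r: (r[1], r[0]))
```

A character height or so is probably a sensible first guess for `max_gap` on printed text, but that's an assumption you'll want to check against your own documents: too small and lines split into fragments, too large and separate blocks merge.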
What you should end up with is a few boxes marking the regions of text on the document. Finally, crop each of these regions of interest and feed it into Tesseract.
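A sketch of the last step, assuming the page is a NumPy array (as OpenCV returns) and that the `pytesseract` wrapper is installed. The helper names and the default padding of 4 pixels are hypothetical. Each region is padded slightly, clamped to the image, cropped, and passed to `pytesseract.image_to_string` unless you supply a different `ocr` callable:

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h)


def padded_slice(box: Box, pad: int, width: int, height: int) -> Tuple[slice, slice]:
    """Grow a box by `pad` pixels on each side, clamped to the image; return (rows, cols)."""
    x, y, w, h = box
    x0, y0 = max(x - pad, 0), max(y - pad, 0)
    x1, y1 = min(x + w + pad, width), min(y + h + pad, height)
    return slice(y0, y1), slice(x0, x1)


def ocr_regions(image, regions: List[Box], pad: int = 4,
                ocr: Optional[Callable] = None) -> List[str]:
    """Crop each region from a NumPy image and OCR it, returning one string per region."""
    if ocr is None:
        import pytesseract  # only needed when no custom OCR callable is given

        ocr = pytesseract.image_to_string
    height, width = image.shape[:2]
    texts = []
    for box in regions:
        rows, cols = padded_slice(box, pad, width, height)
        texts.append(ocr(image[rows, cols]))  # NumPy slicing: image[y0:y1, x0:x1]
    return texts
```

Passing `ocr=lambda crop: pytesseract.image_to_string(crop, config="--psm 6")` tells Tesseract to treat each crop as a single uniform block of text, which often suits small regions like these.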