The size of an image is in the header right at the front, so arguably it's not that hard to avoid the reflow. Just prefetch the headers. This is already done for video files and audio files that have the relevant info.
It's still ideal to put the sizes inline if you can, though.
For the people who are downvoting, some more details on how this would work from a person who actually works on web platforms and standards:
Along with crypto and improved compression, HTTP2 offers multiplexing. HTTP has also long offered range requests, where you request only a small byte range of a file.
This means that a browser could collect image URLs during its initial parse of the page (pretty easy, since you want a small, simple page for AMP anyway) and then issue a set of range requests for the headers of all the images. For PNG, at least, the header has a known, relatively small size. The header requests can all be multiplexed by HTTP2, so you can get all the headers before you start getting the full bodies of the images. You won't have to download the headers or images twice, since you can fetch the rest of each image body with another range request. Range requests could also be cached under this scenario.
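To make the PNG case concrete, here is a rough sketch in Python of what the header step could look like. It assumes the standard PNG layout (8-byte signature, then the IHDR chunk with width and height as big-endian 32-bit integers), so the first 24 bytes are enough to learn the dimensions. The names range_header and png_size_from_prefix are made up for illustration; this is not how any browser actually does it, and it leaves out the network fetch entirely.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# 8-byte signature + 4-byte chunk length + 4-byte "IHDR" tag + width + height
PNG_PREFIX_LEN = 24


def range_header(start, end=None):
    """Build an HTTP Range header value for bytes start..end (inclusive).

    Leaving end as None asks for everything from start to the end of the file.
    """
    return f"bytes={start}-{'' if end is None else end}"


def png_size_from_prefix(prefix):
    """Return (width, height) read from the first 24 bytes of a PNG file."""
    if len(prefix) < PNG_PREFIX_LEN:
        raise ValueError("need at least 24 bytes")
    if prefix[:8] != PNG_SIGNATURE or prefix[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # Width and height are stored as big-endian unsigned 32-bit integers.
    return struct.unpack(">II", prefix[16:24])


# Hand-built prefix standing in for the response to the first range request
# (a hypothetical 640x480 image).
fake_prefix = (
    PNG_SIGNATURE
    + struct.pack(">I", 13)  # IHDR data length is always 13
    + b"IHDR"
    + struct.pack(">II", 640, 480)
)

print(range_header(0, PNG_PREFIX_LEN - 1))  # first request: header bytes only
print(png_size_from_prefix(fake_prefix))    # layout can be reserved from this
print(range_header(PNG_PREFIX_LEN))         # second request: rest of the body
```

The second range starts exactly where the first one ended, which is why nothing has to be downloaded twice. Other formats (JPEG in particular) keep their dimensions at a less predictable offset, so the fixed 24-byte prefix here only covers the PNG case.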
This optimization can be implemented (in theory) by every browser engine and doesn't require changes to developers' content or web servers. And it would be an improvement that is web-wide instead of just focused on AMP.
You can do even better with HTTP2. On a request for a page, the server can optimistically send the image headers to the client up front along with the page text, then backfill the image bodies in whatever order the client wants.