Can you provide some image difference results between the reference images and the resulting super-resolution output? That would help to visualize what sort of structure, if any, is introduced by the super-resolution process.
The "leopard spots" example is particularly interesting in how the super-resolution just hallucinates seemingly similar textures which can be completely different from the actual texture in the reference patch. (Such artifacts have been specifically pointed out in the context of analogous deep learning approaches applied to medical images).
It is not structure per se (although in the case of the leopard there is structure). It is an illicit inference.
Given a diffusion process (such as blur), there are infinitely many initial states that converge to the same final state. This is the nature of diffusion (think of it this way: as long as the total initial energy is the same, the final state of a diffusion process is “homogeneous density of energy”). So inferring a unique “initial state” is totally invalid.
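To make the many-to-one point concrete, here is a minimal NumPy sketch (the arrays and the pooling choice are just illustrative assumptions, not anything from the paper): two different high-res patches collapse to the same low-res value under average pooling, so no unique inverse exists.

    # Minimal sketch: blurring/downsampling is many-to-one, so a unique
    # "initial state" cannot be recovered from the final state.
    import numpy as np

    # Two different 2x2 high-res patches...
    a = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
    b = np.array([[1.0, 0.0],
                  [0.0, 1.0]])

    # ...collapse to the same 1x1 "low-res" value under average pooling.
    print(a.mean(), b.mean())  # 0.5 0.5 -- identical final states
    assert not np.array_equal(a, b) and a.mean() == b.mean()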
Edit: look at the 64 to 256 to 1024 example in the “Unconditional…” section and take a look at the artifact on the bottom-left (viewer's left) teeth and lip. If that is not an artifact… The same goes for the top-right teeth.
Also: how does the algorithm know it is facial hair and not just makeup? It might be either, but it generates facial hair.
I'm curious to know whether the various blemishes (acne scars, moles, etc.) are present in the true images. Also, the braids in one of the pictures don't look as simple/contiguous as I think real braids would be.
Still, it's very cool how it fills in realistic looking details.
Thanks. What I am asking for is a pixel-level visualization showing the difference between the reference image and the associated super-resolution image, similar to what is often used to highlight the minimal differences in adversarial examples that confuse image classifiers. For example, see:
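In case it helps, here is a minimal sketch of that kind of difference map (assuming NumPy, Pillow, and Matplotlib; the file names are placeholders for whichever reference/output pair is being compared):

    # Minimal sketch: pixel-level difference between a reference image and a
    # super-resolution output of the same size (file names are placeholders).
    import numpy as np
    from PIL import Image
    import matplotlib.pyplot as plt

    ref = np.asarray(Image.open("reference.png").convert("RGB"), dtype=np.float32)
    sr = np.asarray(Image.open("super_res.png").convert("RGB"), dtype=np.float32)

    # Signed per-pixel error, averaged over RGB for a single-channel display.
    diff = (sr - ref).mean(axis=-1)

    # Symmetric color scale so zero error maps to the middle of the colormap.
    lim = np.abs(diff).max()
    plt.imshow(diff, cmap="seismic", vmin=-lim, vmax=lim)
    plt.colorbar(label="SR - reference (mean over RGB)")
    plt.title("Pixel-level difference")
    plt.axis("off")
    plt.show()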