The linked paper shows how the Klein bottle maps onto the space. Basically, grids that have an orientation (horizontal or vertical stripes) are more common and join together to form a Klein bottle. That tendency towards orientation might be an artefact of the camera, or it might arise because cameras are usually held horizontal or vertical relative to the sky.
The second (expository) paper is pretty readable. I have only the barest grasp of algebraic topology, but I believe I have a rough high-level understanding of what it is doing. It would be interesting to implement their barcode program independently to test whether or not I really understand it.
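As a first step toward that, the dimension-0 part of a barcode is simple enough to sketch from scratch. This is not the paper's program, just a minimal union-find pass over pairwise distances under my own assumptions: every point is born at scale 0, a connected component dies at the length of the edge that merges it into another one, and one component lives forever. The function name `h0_barcode` and the toy points are made up for illustration. Detecting something like a Klein bottle needs the higher-dimensional bars too, which this does not compute.

```python
import math
from itertools import combinations

def h0_barcode(points):
    """Return dimension-0 persistence intervals (birth, death) for a point cloud.

    Assumption: death is recorded as the length of the merging edge
    (some conventions use half that length instead).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the component representative, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first (the order they enter the filtration).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj          # two components merge at scale d
            bars.append((0.0, d))    # one of them dies here
    bars.append((0.0, math.inf))     # the last component never dies
    return bars

# Toy data: a tight cluster of three points and a tight cluster of two, far apart.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
bars = h0_barcode(points)
for birth, death in bars:
    print(birth, death)
```

The short bars are within-cluster merges, and the one long finite bar is the scale at which the two clusters join, which is the kind of "persistent" feature a barcode is meant to make visible.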
The comment is interesting. He? argues that any 2-manifold with no edges would be likely to end up as a Klein bottle. The question is then why a 2-manifold. Well, the thing we're asking about is a 2D picture, right? Maybe that has something to do with it. Let's suppose we tried the same trick using 3x3x3 voxel representations of STL models. Might that turn out to form a simple edgeless 3-manifold embedded in 27-dimensional space? (Perhaps for simplicity we could use 2x2x2 cubes and look at 8-dimensional space.) In essence, we're talking about a population of 2D data arbitrarily but consistently mapped into higher-dimensional space, and we discover it maps to a 2-manifold.
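To make the "arbitrarily but consistently mapped" part concrete, here is a rough sketch of turning 3x3 patches into points in 9-dimensional space. The row-major flattening order is an arbitrary choice; what matters is that every patch uses the same one. The mean subtraction and plain Euclidean normalization are my simplifying assumptions, not necessarily the normalization used in the paper, and the helper names are hypothetical.

```python
import math

def patches_3x3(image):
    """Yield every 3x3 patch of a 2D list as a 9-tuple in row-major order."""
    h, w = len(image), len(image[0])
    for r in range(h - 2):
        for c in range(w - 2):
            yield tuple(image[r + dr][c + dc] for dr in range(3) for dc in range(3))

def normalize(patch, eps=1e-12):
    """Subtract the mean and scale to unit Euclidean norm.

    Returns None for constant patches, which have no contrast to normalize.
    """
    mean = sum(patch) / len(patch)
    centered = [v - mean for v in patch]
    norm = math.sqrt(sum(v * v for v in centered))
    if norm < eps:
        return None
    return tuple(v / norm for v in centered)

# Toy image containing a single vertical edge.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Each surviving patch becomes one point in 9-dimensional space.
points = [p for p in map(normalize, patches_3x3(image)) if p is not None]
print(len(points), "points of dimension", len(points[0]))
```

After this normalization every point has zero mean and unit length, so the cloud sits on a 7-dimensional sphere inside the 9-dimensional space. The 3x3x3 version would flatten 27 values the same way.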
Are you talking about things like sensor noise and chromatic aberration? It would be interesting to see if downsampling the image beforehand affects the result.
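A minimal sketch of what I mean by downsampling, assuming a grayscale image stored as a list of rows and a plain 2x2 box average (a real pipeline would probably use a proper resampling filter). The function name is made up. The idea is that averaging blocks suppresses per-pixel sensor noise before any patches are extracted.

```python
def downsample_2x(image):
    """Average non-overlapping 2x2 blocks; a trailing odd row or column is dropped."""
    h = len(image) // 2 * 2
    w = len(image[0]) // 2 * 2
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]

# Toy 4x5 image; the last column is dropped because the width is odd.
image = [
    [0, 2, 4, 4, 9],
    [2, 4, 4, 4, 9],
    [1, 1, 8, 0, 9],
    [1, 1, 0, 8, 9],
]
small = downsample_2x(image)
print(small)
```

Running the same patch analysis on `image` and on `small` and comparing the results would be one way to check whether the structure survives at a coarser scale.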
However, it's hard to separate image patterns from camera structure insofar as linear projection is a result of camera structure.
I was thinking about the CFA mosaic and JPG compression; I think these may introduce some axis-aligned artifacts. But maybe they took that into account (using raw format?), or the effect is not relevant in this case.
Even in raw format, all digital cameras apply some amount of sharpening [1] even when the setting is "off" in the camera menu. Also, all raw format conversion software (Lightroom, Capture One, etc.) applies sharpening by default.
I could imagine that a sharpening algorithm could transform a random distribution into something with structure. That the authors appear to not reference camera or image sharpening anywhere in the paper is somewhat worrisome.
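Here is a quick sketch of that intuition. It assumes a generic textbook 3x3 sharpening kernel, which is certainly not what any particular camera or raw converter does, and it uses independent Gaussian noise as the "random distribution". It then compares the correlation between neighbouring pixels before and after sharpening.

```python
import random

def sharpen(image):
    """Apply the kernel [[0,-1,0],[-1,5,-1],[0,-1,0]] to interior pixels only."""
    h, w = len(image), len(image[0])
    return [
        [
            5 * image[r][c]
            - image[r - 1][c] - image[r + 1][c]
            - image[r][c - 1] - image[r][c + 1]
            for c in range(1, w - 1)
        ]
        for r in range(1, h - 1)
    ]

def neighbor_correlation(image, dr, dc):
    """Pearson correlation between each pixel and the pixel offset by (dr, dc), dr, dc >= 0."""
    h, w = len(image), len(image[0])
    xs, ys = [], []
    for r in range(h - dr):
        for c in range(w - dc):
            xs.append(image[r][c])
            ys.append(image[r + dr][c + dc])
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
noise = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(64)]
sharp = sharpen(noise)

raw_h = neighbor_correlation(noise, 0, 1)    # horizontal neighbours, raw noise
sharp_h = neighbor_correlation(sharp, 0, 1)  # horizontal neighbours, sharpened
sharp_d = neighbor_correlation(sharp, 1, 1)  # diagonal neighbours, sharpened
print(raw_h, sharp_h, sharp_d)
```

For this kernel on independent noise, the theoretical correlation works out to -10/29 for horizontal or vertical neighbours and +2/29 for diagonal neighbours, so the sharpened noise is no longer isotropic: it picks up structure aligned with the pixel grid. Whether an effect like this is large enough to matter for the paper's result is a separate question.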