It's a shame the author didn't do the same transformation, because it would de-correlate a lot of the error noise. You can see in the highest compression settings that the "MP3" image compression is smearing everything horizontally. If it used a zigzag transformation, it would be smeared both horizontally and vertically, but would probably look less bad.
You're right. I'm remembering a different codec — I think this is how H.264 orders DC coefficients for macroblocks. JPEG uses an actual 2D DCT, not a 1D DCT of a flattened block.
What if you tried compressing the frequency components, scanned in zigzag order, using MP3 (skipping its initial FFT-like stages, if they exist, I guess)? If that even makes any sense...
That doesn't really make sense. MP3 takes inputs in signal-space, not frequency-space. You could run:
1. Divide image into blocks (as in JPEG),
2. Perform two-dimensional DCTs (as in JPEG),
3. Scan frequency components in zig-zag order (as in JPEG),
4. Run all of the steps of MP3 compression aside from the initial "split audio into blocks" and "perform FFTs" stages.
That would pretty much just give you a less efficient version of JPEG. Both JPEG and MP3 take advantage of knowing how much each frequency component "matters" (i.e., how precisely the value must be encoded to avoid artifacts that humans notice), but MP3's model is tuned for hearing rather than vision, so using the MP3 quantization logic on frequency amplitudes from images would waste bits by encoding certain amplitudes more precisely than is useful.
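For what it's worth, steps 1-3 above can be sketched in plain Python. This is only a rough illustration, not anyone's actual encoder: it assumes a grayscale image given as a list of rows whose dimensions are multiples of 8, it uses a 2D DCT for the transform step, and the function names (`dct_2d`, `zigzag_indices`, `image_to_stream`) are made up for this example. Step 4 (the MP3 back end) is left out entirely.

```python
import math

N = 8  # JPEG-style block size


def dct_1d(v):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(block):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major


def zigzag_indices(n=N):
    """(row, col) pairs of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals run bottom-left to top-right, odd ones the reverse.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def blocks(image, n=N):
    """Yield n x n blocks; assumes both dimensions are multiples of n."""
    for top in range(0, len(image), n):
        for left in range(0, len(image[0]), n):
            yield [row[left:left + n] for row in image[top:top + n]]


def image_to_stream(image):
    """Steps 1-3: block split, 2D DCT, zigzag scan into one flat list."""
    order = zigzag_indices()
    stream = []
    for block in blocks(image):
        coeffs = dct_2d(block)
        stream.extend(coeffs[r][c] for r, c in order)
    return stream
```

The flat list returned by `image_to_stream` is what you would hand to whatever quantization and entropy-coding stage comes next.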
I'm not the parent, but I'd probably surmise that images are usually encoded in blocks to allow the image blocks to be decoded in parallel.
If you're asking why they use the zigzag (within a block) instead of a Hilbert curve, IIRC (quite fuzzy on this, so take it w/ a grain of salt and verify) the reason is that it allows for better spatial encoding (imagine having a ripple in one corner and going out - that's essentially what you want to encode w/ your DCT). Using a Hilbert curve would preserve locality, but I don't think it would line up with the spatial distribution of frequencies in an image.
The quantization matrix is set up so that most non-zero values end up in one corner (the low-frequency one), so the zig-zagging actually manages to be pretty good since the data is already suitably aligned.
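A toy example of that corner effect, in case it helps. Everything numeric here is invented for illustration: the coefficient block is synthetic (amplitude falling off with frequency), and the step sizes are a made-up table that grows with frequency, not the quantization tables from the JPEG standard. The point is only to compare where the last non-zero value lands in a zigzag scan versus a plain row-by-row scan.

```python
N = 8


def zigzag_indices(n=N):
    """(row, col) pairs of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def quantize(coeffs, steps):
    """Divide each coefficient by its step size and round to an integer."""
    return [[round(c / s) for c, s in zip(crow, srow)]
            for crow, srow in zip(coeffs, steps)]


def last_nonzero(seq):
    """Index of the last non-zero entry, or -1 if all entries are zero."""
    idx = -1
    for i, v in enumerate(seq):
        if v != 0:
            idx = i
    return idx


# Synthetic coefficients (assumption): amplitude decays with frequency r + c.
coeffs = [[400.0 / (1 + r + c) ** 2 for c in range(N)] for r in range(N)]
# Hypothetical step sizes (assumption): coarser steps at higher frequencies.
steps = [[8.0 * (1 + r + c) for c in range(N)] for r in range(N)]

q = quantize(coeffs, steps)

zigzag = [q[r][c] for r, c in zigzag_indices()]
raster = [v for row in q for v in row]

print("last non-zero, zigzag scan:", last_nonzero(zigzag))
print("last non-zero, raster scan:", last_nonzero(raster))
```

With this made-up data the zigzag scan packs all the non-zero values at the front, leaving one long run of trailing zeros, while the raster scan interleaves them with zeros from the right-hand side of each row.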
https://en.wikipedia.org/wiki/JPEG#/media/File:JPEG_ZigZag.s...