JPEG transforms 2D images into 1D arrays using a "zigzag" ordering of 8x8 blocks...

cperciva · on April 17, 2017

JPEG transforms 2D images into 1D arrays using a "zigzag" ordering of 8x8 blocks.

No. The zigzag ordering is applied to the frequency components, not to the image pixels.
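To make the ordering concrete, here is a minimal sketch of how the zigzag visiting order over an 8x8 block of frequency coefficients can be generated. The name `zigzag_indices` is made up for this example and does not come from any JPEG library.

```python
def zigzag_indices(n=8):
    """Return the (row, col) cells of an n x n block in zigzag order."""
    def key(cell):
        r, c = cell
        d = r + c  # cells on the same anti-diagonal share r + c
        # The walk direction flips on every diagonal: odd diagonals run
        # top-right to bottom-left (row increasing), even diagonals run
        # bottom-left to top-right (column increasing).
        return (d, r if d % 2 else c)

    cells = [(r, c) for r in range(n) for c in range(n)]
    return sorted(cells, key=key)


def zigzag_scan(block):
    """Flatten a square block of coefficients into a 1D list in zigzag order."""
    return [block[r][c] for r, c in zigzag_indices(len(block))]
```

The scan starts at the DC coefficient in cell (0, 0) and ends at the highest-frequency coefficient in cell (7, 7).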

pslam · on April 17, 2017

You're right. I'm remembering a different codec — I think this is how H.264 orders DC coefficients for macroblocks. JPEG uses an actual 2D DCT, not a 1D DCT of a flattened block.
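As a rough illustration of what "an actual 2D DCT" means, here is a naive sketch that applies a 1D DCT-II to every row and then to every column of the result (the 2D DCT is separable, so this is equivalent). The function names `dct_1d` and `dct_2d` are invented for this example, and real codecs use much faster factorizations.

```python
import math


def dct_1d(x):
    """Orthonormal 1D DCT-II of a sequence (naive O(n^2) version)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out


def dct_2d(block):
    """2D DCT-II: transform each row, then each column of the result."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in zip(*rows)]
    # cols[v][u] holds vertical frequency u, horizontal frequency v;
    # transpose back so the result is indexed as coeffs[u][v].
    return [list(r) for r in zip(*cols)]
```

With this normalization, a flat 8x8 block (every sample equal) ends up with all of its energy in the single DC coefficient at `coeffs[0][0]`, and every other coefficient is zero up to rounding error.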

xorblurb · on April 17, 2017

What if you tried to compress the frequency components, scanned in zigzag order, using MP3 (skipping its initial FFT-like stages, if they exist, I guess)? If that even makes any sense...

cperciva · on April 17, 2017

That doesn't really make sense. MP3 takes inputs in signal-space, not frequency-space. You could run:

1. Divide image into blocks (as in JPEG),

2. Perform two-dimensional DCTs (as in JPEG),

3. Scan frequency components in zig-zag order (as in JPEG),

4. Run all of the steps of MP3 compression aside from the initial "split audio into blocks" and "perform FFTs" stages.

That would pretty much just give you a less efficient version of JPEG; both JPEG and MP3 take advantage of knowing how much each frequency component "matters" (i.e., how precisely it's necessary to encode the value to avoid artifacts noticed by humans), so using the MP3 quantization logic on frequency amplitudes from images would result in wasting bits by encoding certain amplitudes more precisely than is useful.
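A minimal sketch of the "how precisely do we need to encode this component" idea: each coefficient is divided by a per-frequency step size and rounded, so a larger step means coarser precision. The step table below is purely hypothetical (it is not the table from the JPEG standard, and not anything MP3 uses), and the names `make_steps`, `quantize` and `dequantize` are invented for this example.

```python
def make_steps(n=8, base=4, slope=4):
    """Hypothetical step table: the step grows with frequency index u + v."""
    return [[base + slope * (u + v) for v in range(n)] for u in range(n)]


def quantize(coeffs, steps):
    """Divide each coefficient by its step and round (Python's round())."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, steps)]


def dequantize(levels, steps):
    """Approximate reconstruction: multiply each level by its step."""
    return [[lv * q for lv, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, steps)]
```

Swapping in a step table tuned for a different kind of signal would still round-trip, but it would spend precision where it is not needed for images and withhold it where it is, which is the inefficiency described above.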

amelius · on April 17, 2017

Good point. Do you have any idea why they didn't opt for a Hilbert curve, like others suggested here?

haneefmubarak · on April 17, 2017

I'm not the parent, but I'd probably surmise that images are usually encoded in blocks to allow the image blocks to be decoded in parallel.

If you're asking why they use the zigzag (within a block) instead of a Hilbert curve, IIRC (quite fuzzy on this, so take it w/ a grain of salt and verify) the reason is that it allows for better spatial encoding (imagine having a ripple in one corner and going out - that's essentially what you want to encode w/ your DCT). Using a Hilbert curve would preserve locality, but I don't think it would line up with the spatial distribution of frequencies in an image.

ygra · on April 17, 2017

The quantization matrix is set up so that most non-zero values end up in one corner (the low-frequency one), so the zigzag scan actually works pretty well, since the data is already suitably aligned.
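A small sketch of that effect, using a made-up quantized block whose only non-zero values sit near the low-frequency corner. It compares where the last non-zero value lands under a zigzag scan and under a plain row-by-row scan. The helper names and the block contents are assumptions for illustration only.

```python
def zigzag_order(n=8):
    """(row, col) cells of an n x n block in zigzag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Sort by anti-diagonal, alternating direction on each diagonal.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))


def last_nonzero(seq):
    """Index of the last non-zero entry, or -1 if every entry is zero."""
    for i in range(len(seq) - 1, -1, -1):
        if seq[i] != 0:
            return i
    return -1


# Made-up quantized block: non-zero only where u + v <= 2 (six cells).
block = [[3 - (u + v) if u + v <= 2 else 0 for v in range(8)]
         for u in range(8)]

zigzag = [block[r][c] for r, c in zigzag_order()]
row_major = [value for row in block for value in row]
```

Here `last_nonzero(zigzag)` is 5, so everything after the sixth value is one long run of zeros, while `last_nonzero(row_major)` is 16 because the row-by-row scan keeps interleaving zeros with the remaining non-zero values.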