
> zlib is 23k lines.

The zlib format includes uncompressed* (stored) blocks, and a CRC is only non-trivial if you also need it to be fast, so a faux-zlib can be much, much smaller than the real thing.

(I don't recall whether I've done this with PNG specifically, but consider a suitably crafted palette for byte-per-pixel writing: a quick-and-dirty image writer need not be much more complex than it would have been for netpbm.)

* exercise: why is this true of any reasonable compression scheme?
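
For illustration, here is roughly what such a faux-zlib looks like: a two-byte zlib header, stored deflate blocks of at most 65535 bytes each, and an Adler-32 trailer. This is my own sketch (the names are made up), not any particular library, and the bitwise CRC-32 that PNG's chunk layer needs is similarly about ten lines of the slow variety.

    // Sketch of a "faux-zlib" stream: only stored (uncompressed) deflate blocks.
    // No real compression, just valid framing that any zlib inflater will accept.
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Adler-32, the checksum a zlib stream ends with. Slow but tiny.
    static uint32_t adler32(const uint8_t* p, size_t n) {
        uint32_t a = 1, b = 0;
        for (size_t i = 0; i < n; ++i) {
            a = (a + p[i]) % 65521u;
            b = (b + a) % 65521u;
        }
        return (b << 16) | a;
    }

    // Wrap raw bytes in a zlib stream made only of stored blocks.
    static std::vector<uint8_t> faux_zlib(const uint8_t* src, size_t n) {
        std::vector<uint8_t> out{0x78, 0x01};              // zlib header, no preset dictionary
        size_t pos = 0;
        do {
            size_t len = n - pos;
            if (len > 65535) len = 65535;                  // stored blocks cap LEN at 65535
            out.push_back(pos + len == n ? 1 : 0);         // BFINAL flag, BTYPE = 00 (stored)
            out.push_back(len & 0xFF);
            out.push_back((len >> 8) & 0xFF);              // LEN, little-endian
            out.push_back(~len & 0xFF);
            out.push_back((~len >> 8) & 0xFF);             // NLEN = one's complement of LEN
            out.insert(out.end(), src + pos, src + pos + len);
            pos += len;
        } while (pos < n);
        uint32_t adler = adler32(src, n);
        out.push_back(adler >> 24); out.push_back(adler >> 16);
        out.push_back(adler >> 8);  out.push_back(adler);  // Adler-32 trailer, big-endian
        return out;
    }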



I've done this. For a project where I didn't want any external dependencies, I wrote an uncompressed PNG writer for RGBA8 images in a single function. It's just over 90 lines of C++:

https://github.com/a-e-k/canvas_ity/blob/f32fbb37e2fe7c0fcae...
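
This isn't the linked code, just a rough sketch of the same approach to show the scale: the PNG signature, IHDR/IDAT/IEND chunks with a bitwise CRC-32, and an IDAT payload that is nothing but stored deflate blocks plus an Adler-32, as above. The function names (write_png_rgba8, write_chunk) are made up for the example.

    // Rough sketch of a dependency-free PNG writer for RGBA8 pixels.
    // IDAT holds only stored (uncompressed) deflate blocks, so no real compressor is needed.
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    namespace {

    // Bitwise CRC-32 (polynomial 0xEDB88320); PNG chunks require it over type + data.
    uint32_t crc32(const uint8_t* p, size_t n) {
        uint32_t c = 0xFFFFFFFFu;
        for (size_t i = 0; i < n; ++i) {
            c ^= p[i];
            for (int b = 0; b < 8; ++b)
                c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
        }
        return ~c;
    }

    // Adler-32 over the *uncompressed* data; it terminates the zlib stream inside IDAT.
    uint32_t adler32(const uint8_t* p, size_t n) {
        uint32_t a = 1, b = 0;
        for (size_t i = 0; i < n; ++i) {
            a = (a + p[i]) % 65521u;
            b = (b + a) % 65521u;
        }
        return (b << 16) | a;
    }

    void put_u32(std::vector<uint8_t>& v, uint32_t x) {    // PNG integers are big-endian
        v.push_back(x >> 24); v.push_back(x >> 16); v.push_back(x >> 8); v.push_back(x);
    }

    // Emit one chunk: 4-byte length, 4-byte type, data, CRC-32 of type + data.
    void write_chunk(FILE* f, const char* type, const std::vector<uint8_t>& data) {
        std::vector<uint8_t> buf;
        put_u32(buf, static_cast<uint32_t>(data.size()));
        buf.insert(buf.end(), type, type + 4);
        buf.insert(buf.end(), data.begin(), data.end());
        put_u32(buf, crc32(buf.data() + 4, buf.size() - 4));
        std::fwrite(buf.data(), 1, buf.size(), f);
    }

    }  // namespace

    // Hypothetical entry point: write `pixels` (row-major RGBA8, 4 bytes per pixel) as a PNG.
    bool write_png_rgba8(const char* path, const uint8_t* pixels, uint32_t w, uint32_t h) {
        FILE* f = std::fopen(path, "wb");
        if (!f) return false;
        static const uint8_t sig[8] = {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
        std::fwrite(sig, 1, 8, f);

        // IHDR: dimensions, bit depth 8, color type 6 (RGBA), default methods, no interlace.
        std::vector<uint8_t> ihdr;
        put_u32(ihdr, w);
        put_u32(ihdr, h);
        const uint8_t ihdr_tail[5] = {8, 6, 0, 0, 0};
        ihdr.insert(ihdr.end(), ihdr_tail, ihdr_tail + 5);
        write_chunk(f, "IHDR", ihdr);

        // Scanlines: a filter byte of 0 ("none") followed by the raw row bytes.
        std::vector<uint8_t> raw;
        for (uint32_t y = 0; y < h; ++y) {
            raw.push_back(0);
            const uint8_t* row = pixels + static_cast<size_t>(y) * w * 4;
            raw.insert(raw.end(), row, row + static_cast<size_t>(w) * 4);
        }

        // IDAT: zlib header, stored deflate blocks of at most 65535 bytes, Adler-32 trailer.
        std::vector<uint8_t> idat{0x78, 0x01};
        size_t pos = 0;
        do {
            size_t len = raw.size() - pos;
            if (len > 65535) len = 65535;
            idat.push_back(pos + len == raw.size() ? 1 : 0);   // BFINAL, BTYPE = 00 (stored)
            idat.push_back(len & 0xFF);
            idat.push_back((len >> 8) & 0xFF);                 // LEN, little-endian
            idat.push_back(~len & 0xFF);
            idat.push_back((~len >> 8) & 0xFF);                // NLEN = ~LEN
            idat.insert(idat.end(), raw.begin() + pos, raw.begin() + pos + len);
            pos += len;
        } while (pos < raw.size());
        put_u32(idat, adler32(raw.data(), raw.size()));
        write_chunk(f, "IDAT", idat);

        write_chunk(f, "IEND", {});
        return std::fclose(f) == 0;
    }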


The "compressed" file may end up larger than the original?


Why not? Most formats have some headers, and the data is carried in frames with headers of their own, so incompressible input can come out slightly larger than it went in.


> why is this true of any reasonable compression scheme?

Any? I wouldn't say that. If you took LZ4 and made it even simpler by removing uncompressed blocks, you'd still have only about half a percent of overhead on random data, and a thousandth of a percent if you tweaked how it represents large numbers.
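
Back-of-the-envelope version of that half a percent, assuming I'm remembering LZ4's literal-length encoding right (a 4-bit nibble, with 15 meaning extension bytes follow, each adding up to 255):

    // Check the overhead of emitting incompressible data as one LZ4-style literal run.
    #include <cstdio>

    // Framing bytes needed to emit n incompressible bytes as a single literal-only run.
    unsigned long long literal_overhead(unsigned long long n) {
        unsigned long long bytes = 1;                  // the token byte
        if (n >= 15) bytes += (n - 15) / 255 + 1;      // extension bytes, each adding up to 255
        return bytes;
    }

    int main() {
        const unsigned long long n = 1ull << 20;       // 1 MiB of "random" input
        unsigned long long over = literal_overhead(n);
        std::printf("%llu overhead bytes for %llu input (%.3f%%)\n", over, n, 100.0 * over / n);
        // Prints: 4114 overhead bytes for 1048576 input (0.392%)
    }

The thousandth-of-a-percent figure presumably comes from replacing that run of 255-valued extension bytes with something like a variable-length integer, so a huge literal run costs a handful of framing bytes instead of one per 255.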


TIL. IIUC, LZ4 doesn't care about the compression ratio (which, you're right, is what I had been alluding to) but does strongly care about guaranteeing a maximum block size. (So it's still the same kind of concern, just on an absolute rather than a relative basis.)


Just simplify it further. Get rid of the implicit +4 to the match length, so the token encodes 0-15 instead of 4-19. Now you can guarantee any block size you want.

If you wanted to go even simpler, here's an entire compression format described in one line:

one byte literal length, one byte match length, two bytes match offset, 0-255 literal bytes, repeat
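
A decoder for that one-liner might look like this. The details the one-liner leaves open (offset byte order, how the stream ends, overlapping matches) are my own choices for the sketch, not a spec:

    // Decode: [literal length][match length][2-byte offset][literal bytes], repeated until
    // the input runs out. Assumes mostly well-formed input; bails out if it isn't.
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    std::vector<uint8_t> decode(const uint8_t* in, size_t n) {
        std::vector<uint8_t> out;
        size_t p = 0;
        while (p + 4 <= n) {
            uint8_t  lit_len   = in[p];
            uint8_t  match_len = in[p + 1];
            uint16_t offset    = static_cast<uint16_t>(in[p + 2] | (in[p + 3] << 8)); // little-endian
            p += 4;
            if (p + lit_len > n) break;                       // truncated literal run
            out.insert(out.end(), in + p, in + p + lit_len);  // literals come straight from the input
            p += lit_len;
            if (match_len && (offset == 0 || offset > out.size())) break;  // malformed match
            for (unsigned i = 0; i < match_len; ++i)          // byte-by-byte so overlapping copies work
                out.push_back(out[out.size() - offset]);
        }
        return out;
    }

Worst case on incompressible input is 4 framing bytes per 255 literals, a bit under 1.6% overhead.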



