
What? You partitioned a disk rather than just not decompressing some comically large file?


https://github.com/uint128-t/ZIPBOMB

  2048 yottabyte Zip Bomb

  This zip bomb uses overlapping files and recursion to achieve 7 layers with 256 files each, with the last being a 32GB file.

  It is only 266 KB on disk.
When you realise it's a zip bomb it's already too late. The file size on disk doesn't betray its contents. Maybe applying some heuristics with ClamAV? But even then it's not guaranteed. I think a small partition to isolate decompression is actually really smart. Wonder if we can achieve the same with overlays.


What are you talking about? You get a compressed file. You start decompressing it. When the number of bytes you've written exceeds some threshold (say 5 megabytes), just stop decompressing, discard the output so far & delete the original file. That is it.
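Something like this, roughly, for the streaming case (a minimal sketch assuming a gzip/zlib stream rather than a zip archive; the 5 MB cap, the chunk size and the bounded_inflate name are just placeholders for illustration):

    import zlib

    LIMIT = 5 * 1024 * 1024    # arbitrary 5 MB cap on decompressed output
    CHUNK = 64 * 1024

    def bounded_inflate(src_path, dst_path, limit=LIMIT):
        d = zlib.decompressobj(wbits=47)    # 47 = auto-detect a gzip or zlib header
        written = 0
        with open(src_path, "rb") as fsrc, open(dst_path, "wb") as fdst:
            while chunk := fsrc.read(CHUNK):
                budget = limit - written
                if budget <= 0:
                    raise ValueError("too much data, treating input as a bomb")
                # max_length stops one malicious block from expanding without bound
                out = d.decompress(chunk, budget)
                written += len(out)
                fdst.write(out)
                if d.unconsumed_tail:   # output was truncated at the cap
                    raise ValueError("too much data, treating input as a bomb")
        return written

A real handler would also unlink the partial output and the original file once that error fires, as described above.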


I worked on a commercial HTTP proxy that scanned compressed files. Back then we would start to decompress a file but keep track of the compression ratio. I forget what the cutoff was but as soon as we saw a ratio over a certain threshold we would just mark the file as malicious and block it.
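For a rough idea of the mechanism (not the proxy's actual code; the 100x cutoff, the per-chunk output cap and the function name are invented for the sketch):

    import zlib

    MAX_RATIO = 100       # made-up cutoff: more than 100x expansion gets blocked
    CHUNK = 64 * 1024

    def looks_like_bomb(path):
        d = zlib.decompressobj(wbits=47)   # auto-detect a gzip or zlib header
        consumed = produced = 0
        with open(path, "rb") as f:
            while chunk := f.read(CHUNK):
                consumed += len(chunk)
                # never inflate a chunk past the point where the ratio check would fire
                produced += len(d.decompress(chunk, (MAX_RATIO + 1) * len(chunk)))
                if produced > MAX_RATIO * consumed or d.unconsumed_tail:
                    return True            # mark the file malicious and block it
        return False

The scanner never buffers more than a bounded amount of output per input chunk, so the check itself can't be blown up by the file it's inspecting.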


That assumes they're using a stream decompressor library and are feeding that stream manually. Solutions that write the received file to $TMP and just run an external tool (or, say, use sendfile()) don't have the option to abort after N decompressed bytes.


> Solutions that write the received file to $TMP and just run an external tool (or, say, use sendfile()) don't have the option to abort after N decompressed bytes

cgroups with hard-limits will let the external tool's process crash without taking down the script or system along with it.
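One way to get those hard limits without touching /sys/fs/cgroup by hand is a transient systemd scope around the external tool, driven from the script. A sketch, assuming systemd and cgroup v2; the 512M cap, the unzip command and the paths are placeholders, and a memory cap only handles runaway RAM use, so disk exhaustion still needs the partition/quota trick:

    import subprocess

    # run the external extractor inside its own cgroup with hard resource caps,
    # so the kernel kills the tool rather than the host when it blows up
    cmd = [
        "systemd-run", "--scope",
        "-p", "MemoryMax=512M",
        "-p", "MemorySwapMax=0",
        "-p", "TasksMax=16",
        "unzip", "-o", "upload.zip", "-d", "/tmp/extract",
    ]
    result = subprocess.run(cmd)
    if result.returncode != 0:
        print("extraction failed or was killed; treating the archive as hostile")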


> cgroups with hard-limits

This is exactly the same idea as partitioning, though.


> That assumes they're using a stream decompressor library and are feeding that stream manually. Solutions that write the received file to $TMP and just run an external tool (or, say, use sendfile()) don't have the option to abort after N decompressed bytes.

In a practical sense, how's that different from creating a N-byte partition and letting the OS return ENOSPC to you?


Depending on the language/library that might not always be possible. For instance, Python's zip library only provides an extract function, without a way to hook into the decompression process or limit how much can be written out. Sure, you can probably fork the library to add in the checks yourself, but from a maintainability perspective it might be less work to go with the partition solution.


It also provides an open function for the files in a zip file. I see no reason something like this won't bail after a small limit:

    import zipfile
    with zipfile.ZipFile("zipbomb.zip") as zip:
        for name in zip.namelist():
            print("working on " + name)
            left = 1000000  # per-member budget of decompressed bytes
            with open("dest_" + name, "wb") as fdest, zip.open(name) as fsrc:
                while True:
                    block = fsrc.read(1000)
                    if len(block) == 0:  # end of this member
                        break
                    fdest.write(block)
                    left -= len(block)
                    if left <= 0:  # budget exhausted: bail instead of filling the disk
                        print("too much data!")
                        break


That is exactly what OP is doing, they've just implemented it at the operating system/file system level.


Those files are designed to exhaust the system resources before you can even do these kinds of checks. I'm not particularly familiar with the ins and outs of compression algorithms, but it's intuitively not strange for me to have a zip that is carefully crafted so that memory and CPU go out the window before any check can be done. Maybe someone with more experience can give more details.

I'm sure though that if it was as simple as that we wouldn't even have a name for it.


Not really. It really is that simple. It's just dictionary decompression, and it's just halting it at some limit.

It's just nobody usually implements a limit during decompression because people aren't usually giving you zip bombs. And sometimes you really do want to decompress ginormous files, so limits aren't built in by default.

Your given language might not make it easy to do, but you should pretty much always be able to hack something together using file streams. It's just an extra step is all.


I honestly thought it was harder. It's still a burden on the developer to use the tools in the intended way so that the application isn't vulnerable, so it's something to keep in mind when implementing functionality that unpacks user-provided compressed archives.


> it's intuitively not strange for me to have a zip that is carefully crafted so that memory and CPU go out the window before any check can be done

It's intuitively extremely strange to me!

Even ignoring how zips work: Memory needs to be allocated in chunks. So before allocating a chunk, you can check if the new memory use will be over a threshold. CPU is used by the program instructions you control, so you can put checks at significant points in your program to see if it hit a threshold. Or you can have a thread you kill after a certain amount of time.

But the way zips do work makes it a lot simpler: Fundamentally it's "output X raw bytes, then repeat Y bytes from location Z" over and over. Abort if those numbers get too big.
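As a toy illustration of that last point (this is not real DEFLATE, just the same literal/back-reference shape; the token format and the limit are made up for the example):

    OUTPUT_LIMIT = 5_000_000    # arbitrary cap for the example

    def decode(tokens, limit=OUTPUT_LIMIT):
        out = bytearray()
        for tok in tokens:
            if tok[0] == "raw":                      # "output X raw bytes"
                if len(out) + len(tok[1]) > limit:
                    raise ValueError("size limit exceeded")
                out += tok[1]
            else:                                    # "repeat Y bytes from location Z"
                _, distance, length = tok
                if len(out) + length > limit:        # the numbers got too big: bail
                    raise ValueError("size limit exceeded")
                for _ in range(length):
                    out.append(out[-distance])       # overlapping copies allowed, as in LZ77
        return bytes(out)

    # decode([("raw", b"ab"), ("copy", 2, 10_000_000)]) raises immediately instead
    # of materialising 10 MB of "ababab..." from a dozen input bytes.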


Isn’t this basically a question about the halting problem? Whatever arbitrary cutoff you choose might not work for all inputs.


No, compression formats are not Turing-complete. You control the code interpreting the compressed stream and allocating the memory, writing the output, etc. based on what it sees there and can simply choose to return an error after writing N bytes.


Yes, and even if they were Turing complete, you could still run your Turing-machine-equivalent for n steps only before bailing.


Not really. It's easy to abort after exceeding a number of uncompressed bytes or files written. The problem is the typical software for handling these files does not implement restrictions to prevent this.


damn, it broke the macOS archiver utility.


Seems like a good and simple strategy to me. No real partition needed; tmpfs is cheap on Linux. Maybe OP is using tools that do not easily allow tracking the number of uncompressed bytes.


Yes, I'd rather deal with a simple out-of-disk-space error than perform some acrobatics to "safely" unzip a potential zip bomb.

Also, zip bombs are not comically large until you unzip them.

Also, you can just unpack any sort of compressed file format without giving any thought to whether you are handling it safely.


I'd put fake paper names (doi.numbers.whatever.zip) in there to quickly grab their attention, along with a robots.txt that 'disallows' a /papers subdirectory. Add some index.html with links to fake 'papers' and in a week these crawlers will blacklist you like crazy.



