Hacker News
Zip Bomb (wikipedia.org)
342 points by takinola on Oct 5, 2012 | 108 comments



I found a similar file to this (a zip file that contains itself) and e-mailed it to a friend at work. He never received it, but I thought nothing of it (I assumed the email filters just destroyed it).

A few days later the mail server stopped working and the sysadmin turned up at my desk. It turned out the anti-virus scanner had been unzipping and scanning it repeatedly. It eventually filled up the entire disk and bad things happened.
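For the curious, here's roughly what that kind of archive looks like. A minimal sketch in Python, not a true self-containing zip (those are much trickier to construct; see the quine links elsewhere in this thread): it just wraps a very compressible payload in many layers, which is already enough to make a scanner that blindly unpacks every layer do a lot of pointless work.

    import os
    import zipfile

    # A large, highly compressible payload at the bottom (100 MB of zeros -> roughly 100 KB zipped).
    # Note: building it this way holds the whole payload in memory for a moment.
    inner = "layer0.zip"
    with zipfile.ZipFile(inner, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("zeros.bin", b"\x00" * 100_000_000)

    # Wrap it 32 layers deep; each extra layer adds almost nothing to the file size.
    for depth in range(1, 33):
        outer = "layer%d.zip" % depth
        with zipfile.ZipFile(outer, "w", zipfile.ZIP_DEFLATED) as z:
            z.write(inner)
        os.remove(inner)
        inner = outer

    print(inner, os.path.getsize(inner), "bytes")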


That reminds me of an incident when I was in year 8: seeing how deeply nested I could get directories on Windows. H:\a\a\a\a\... until eventually it stopped working. (I played the game with my friend: he went for creating a new directory at each level, while after a little while I became sensible and went for copying and pasting, thus doubling the depth at each step, which of course achieves the goal pretty quickly - so I won by a considerable margin.)

The school IT manager (who, this one occasion aside, I was always on good terms with) was rather annoyed at me the next day, for the nightly backup had fallen over and he had traced the problem back to me. You see, what to me was H:\ was \\galaxy\users$\chrism, which on that server was D:\users\chrism. So that 256-or-so character path became longer than 256 characters on the server, and the backup software hadn't been written carefully enough to cope with what was a perfectly valid NTFS path, but not a valid path for the normal Win32 API function calls.

How was I to know it would do that?


Years ago, I was an applications programmer in San Francisco working on a timesharing language called EPS. The mainframes were all in Massachusetts. (Think dial-up terminals, mainframes, and 450 BAUD being considered fast.)

EPS, an interpreted language, was designed to let financial analysts run economic projections. It contained various kinds of arrays.

One day, the systems programmers announced a new feature, first released to the test mainframe: a new structure, called containers IIRC, with the twist being that any cell in a container could itself consist of a container.

You know where this is going:

    For I = 1 to 100
      ContX[1] = ContX
    EndFOR
10 seconds later, EPS on the test mainframe had crashed. Coincidence? Hmmmmmm.

So far, my cover was my inquiring mind. But then I ran the loop again, with the same results.

A minute later, there was Massachusetts on the phone, in the person of my friend Kevin, lead EPS developer: "Hey, the answer is 47. What the hell are you doing?"


Difference between hackers and coders -

Rule: "You can have at most 50 sub-directories."

coder-action:

     #define MAX_SUB_DIRECTORY_NUM 50
hacker-action:

    #include <stdlib.h>

    int main(int argc, char *argv[]) {
      int i;
      /* push past the documented limit and see where it actually breaks */
      for (i = 0; i < atoi(argv[1]); i++) {
          create_subdir( ... );  /* stand-in for whatever call actually creates the directory */
      }
      return 0;
    }
Plus experiments.

The difference is that the coder is just doing their job, so they note the limitation and move on; the hacker is curious and tries to test whether it's a hard limit or a soft limit, a big problem or a little problem.


A good developer should be testing for that stuff too. But hackers test these things out of curiosity; developers do it out of attention to detail. A great developer will do both :)


My brother did this exact same thing (competing with his friend to try to make as many folders as possible), only he got a 1 week suspension for "hacking". The IT manager was really pissed and pressed the school administrators to make an example out of them.

I was outraged. Basically, the IT manager preferred to use punishment as his means of security, rather than actually doing his job.


Being able to break something =/= IT not doing their job.

Tossing a brick through a window doesn't mean that the window should have been thicker.


Except that this was not 'tossing a brick,' it was maybe knocking on a window to see what sound it makes, and the window then falling apart for no apparent reason.

There is nobody to blame for this, actually. Neither could the kid have known that this was bad (and child-like curiosity is hardly something worth a punishment), nor could the sysadmin really do anything to prevent it, except put up a memo: please don't do that.

Though, in that case, you'd have some kids doing that over and over again out of a very different kind of curiosity.


I agree that we should not punish kids for experimentation (I am outraged when schools do anything other than encourage it). However, if playing around with a computer causes notable damage, I do have to wonder what would happen with viruses or someone who is actually malicious.


The NTFS 256-character path limit bit me once when I moved a deeply-nested folder from somewhere with a short path (like the root of a disk) to somewhere with a slightly longer one (like my desktop) and then went to delete it. Windows doesn't even complain in this case--it just completely ignores all attempts to delete the folder (even if you, say, put it in the trash and then empty the trash) because it can't complete the process of gathering up the list of paths to delete.

It's very confusing when you're not expecting it--you can dig into the folder and delete all the subdirectories of it that don't have overlong paths just fine, but there'll be one series of empty directories left over that just refuse to go away. Then you flatten them out from their nested configuration, and suddenly the problem goes away.


Had this problem once as well. It's so weird that Explorer doesn't use the API without the path length restriction...


One of many examples of why Windows's arbitrary path length restriction is ridiculous.


Most UNIX systems have a PATH_MAX. It's not just Windows.


PATH_MAX is defined but meaningless on both Linux and OS X.


Unfortunately, this is not true. The OS X kernel can't deal with paths longer than MAXPATHLEN, which is defined as 1024.


Perhaps, but Windows's is a bit too low.


While MAX_PATH is 260, most of the Unicode variants of the API functions allow for paths of 32,767 characters [1]. That seems like a decent length.

[1] http://msdn.microsoft.com/en-us/library/windows/desktop/aa36...
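A quick way to see the difference without writing Win32 code directly: Python 3 on Windows calls the wide (W) APIs under the hood, so prefixing an absolute path with \\?\ opts into the ~32,767-character limit. A minimal sketch, assuming Windows and a writable current directory; without the prefix the same call typically fails on a default configuration:

    import os

    base = os.path.abspath("longpath-demo")           # the \\?\ prefix requires an absolute path
    long_dir = base + "\\a" * 200                     # comfortably past the 260-character MAX_PATH
    os.makedirs("\\\\?\\" + long_dir, exist_ok=True)  # extended-length prefix: limit is ~32,767 chars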


So there's a constant for path length that only applies sometimes...


This is also the case on Unix. According to Advanced Programming in the Unix Environment, it is true of many of these so-called limits. The POSIX standard defined many values that were too small for modern-day use, such as _POSIX_PATH_MAX set to 255. The non-POSIX values are not all defined in limits.h, and must be queried at runtime. Even then, some are indeterminate. But at least you're not limited to 255 characters ;)


Oh, that's somewhat reassuring. I'll surely benefit from it in 10 years time. :/


I'm not sure that I know what you mean. These functions aren't obscure and have been around for quite a while. For example: CreateFileW [1]. It's just good hygiene to use the Unicode variants and normalize to UNC where appropriate. This has been the case for years.

[1] http://msdn.microsoft.com/en-us/library/windows/desktop/aa36...


I mean that to maintain compatibility with legacy stuff I expect I won't be able to take advantage of it for quite a while.


The write limit should be lower than the read limit.


I had a worse problem when I helped my parents recover their backups. The backup itself had a fair amount of folder nesting, but was within the 256 character limit. At some point, they moved the backup into another folder named after the date and computer for the backups. When they tried to access it, Windows gave a cryptic error message about corrupted files.


That reminds me of using the undelete utility that came out around the Windows 3.1/DOS 6.2 sort of time. All the files it would offer to recover had file names like "testdoc.do?", and if you didn't set the final character it created a file with a wildcard in the name.


Do that, but then rename each folder to a very long name, and go back one directory up in the tree, e.g. d:\h\h\h becomes d:\h\h\very_long_h, then d:\h\very_long_h\very_long_h, etc.

It was a "simple" way to hide files, by making their names very long indeed.


>> ... eventually filled up the entire disk and bad things happened

I can only imagine what might have happened. Can you share more details about that?

Also, I wonder how mail servers these days are equipped to handle such attachments. Can someone shed light on that? Is it simple to detect these files?


A well-known example of a zip quine: http://steike.com/code/useless/zip-file-quine/

Compression quines and bombs are a great way to screw up automated systems. Possessing or transmitting them can easily cause a denial of service.

From Wikipedia: A quine is a computer program which takes no input and produces a copy of its own source code as its only output. http://en.wikipedia.org/wiki/Quine_(computing)


Computational complexity attacks are actually quite rampant in various types of software. A good example is an "evil regex", which can be used against software that accepts regular expressions as input; similarly, costly regexes already contained in software can be exploited by certain crafted input to induce a DoS.

http://en.wikipedia.org/wiki/ReDoS
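A minimal sketch of the effect in Python, whose re module is a backtracking engine. The pattern and input here are the textbook example, not taken from any real product; the trailing "!" forces the match to fail, and the time to fail roughly doubles with every extra "a":

    import re
    import time

    evil = re.compile(r"^(a+)+$")          # nested quantifiers: the classic catastrophic-backtracking shape
    for n in range(16, 25):
        subject = "a" * n + "!"            # can never match, so the engine tries every way to split the a's
        t0 = time.perf_counter()
        evil.match(subject)
        print(n, round(time.perf_counter() - t0, 3), "seconds")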


All of the email virus scanners I've used are aware of this sort of thing, and will have a maximum depth or maximum size for scanning within attachments. I don't think any of them try to "detect" them in any cleverer way.
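Something along these lines, presumably. A sketch of a depth- and size-limited recursive unzip in Python; the limits and names here are made up for illustration, not taken from any particular scanner:

    import io
    import zipfile

    MAX_DEPTH = 5                          # refuse to recurse forever
    MAX_TOTAL_BYTES = 100 * 1024 * 1024    # cap the total bytes we are willing to inflate

    def scan(data, depth=0, budget=None):
        if budget is None:
            budget = [MAX_TOTAL_BYTES]
        if depth > MAX_DEPTH:
            raise ValueError("archive nested too deeply; treating as hostile")
        with zipfile.ZipFile(io.BytesIO(data)) as z:
            for info in z.infolist():
                member = z.read(info)      # a real scanner would stream in chunks for a tighter cap
                budget[0] -= len(member)
                if budget[0] < 0:
                    raise ValueError("decompressed size budget exceeded; treating as hostile")
                if member[:4] == b"PK\x03\x04":   # looks like another zip: recurse, sharing the budget
                    scan(member, depth + 1, budget)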


So I should just place my malicious software deeper than n levels or put it in a huge file?

Or are those scanners just rejecting files that are too large or deep?


"The Grugq: I’m not joking. You don’t even need to do that. You just send an e-mail which says, you can literally just say, "Run this code." Some of the anti-phishing guys I’ve worked with are just shocked at what happens. I had some friends who worked in corporate security who had to do a cleanup after they got hit with e-mails which said literally, "click on this" and they had 10 or 20 people who did. It was less than 1 percent, but it was enough. People will do it and even on a locked-down corporate PC, it doesn’t matter. If you can get an HTTP connection back out to the Web, you can then tunnel in over that."

(The Grugq sells high value 0days and is a respected member of the hacking community) http://www.csoonline.com/article/216370/where-is-hacking-now...


That was how RSA was breached, which led to the eventual loss of the SecurID master key (and follow-on breaches at DoD suppliers).


What does RSA stand for? I was on their (SecurID) related site, and checked out the "about" page, but the acronym is never defined.


Initials of the three inventors (discoverers?) of the algorithm: http://en.wikipedia.org/wiki/RSA_(algorithm)


(Ron) Rivest, (Adi) Shamir, (Leonard) Adleman


You used to be able to just password protect the file, and instruct users to enter the password.

Some malware is remarkably unsophisticated and relies on users installing it and giving it permission to run.

I hope they're not silently rejecting files.


You can't put it too deep. The scanner should stop attempting to unzip at a certain depth. Presumably, any file that has more than N levels of depth is malicious and should get flagged, but who knows if the person that configured the scanner did it right?

I've worked with a leading commercial scanner that failed to respect the max depth parameter even when set. It would scan for days before we killed it.


This is why you set up monitoring. You could notice either a) the long-lasting, CPU-eating subprocess or b) the rapid disappearance of disk space.


When I did a databases class at a university a few years back, we were given a share on a webserver to run PHP programs on. Naturally they didn't turn off PHP error logging, and by default PHP doesn't report errors to the user. One of the students accidentally wrote an infinite loop that generated an error on each iteration, filling up the disk. They never did fix the problem, just deleted the log file both times it occurred.


That's hilarious -- the virus scanner, which is designed to protect the mail server, was what wound up destroying it.


The virus scanner is designed to protect the mail server's clients.


I've seen something similar with a PNG file for a user-supplied profile image [1]. The image was a 10000x10000 all-black PNG, which compresses to a pretty small file size.

Unless you validate the image dimensions as well as the file size, it may cause problems; for instance, when GD was used to try to resize it, it exhausted the memory limit.

[1] https://bugs.launchpad.net/mahara/+bug/784978
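Reproducing the idea takes two lines. A sketch assuming Pillow is installed; building the image needs on the order of 300 MB of RAM, but the file it writes is tiny because solid black compresses extremely well:

    import os
    from PIL import Image

    Image.new("RGB", (10000, 10000)).save("black.png")    # 100 megapixels, every byte zero
    print(os.path.getsize("black.png"), "bytes on disk")  # a few hundred KB, give or take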


You can do somewhat better if you host it on a server with gzip compression. Since the PNG has a max dictionary size for the compression, it doesn't optimally compress out all the redundancy. But because the left-over redundancy also forms a repeating pattern (since the black is the same all over the image), gzip shrinks it even further.

I got slightly better results even by doing this with a JPG image, probably because it's based on 8x8 blocks. I used the colour red, but I don't think that matters much.

Correction: looking back at my results, it seems the PNG was smaller after all: png32512.png.gz is 36,077 bytes (a 32000x32000 JPG gzips to about 41k). I forget how I came to the 32512x32512 limit; maybe it was by trial and error, the largest size a browser would still open (probably tested on Firefox and Opera, I didn't use Chrome at the time).

I also asked some friends with powerful (lots of memory) computers to try out a webpage that would load this image many times, with unique GET parameters to prevent caching, but apart from loads of hard disk access and maxing the CPU for a bit until they closed the tab, nothing crashy happened (and of course I did inform them what could happen and told them to save any work).

As for reliably crashing a browser on a sufficiently high-end (say, gaming) PC, I haven't been able to do that for at least 5 years or so. I might have done better if I owned a high-end computer myself, of course :) I remember it used to be as easy as making a webpage with 200 full-page DIV layers stacked at 1% opacity :-P


This strikes me as being similar to the Black Fax attack [1] from years ago.

[1] https://en.wikipedia.org/wiki/Black_fax


Out of curiosity I just made two images:

15,000 x 15,000: http://i.imgur.com/WzCyE.png

50,000 x 50,000: http://i.imgur.com/kgmHu.png

Both FF and Chrome refuse to open the second one. IE does something weird. Both Opera and Safari figure out the size correctly, but don't display the image.


With Firefox (15.0.1) I get some really, really strange results with the second image.

When I opened it the first time, or every time I press Ctrl+Shift+R, it works, but it shows the URL and a little icon in the upper left corner: http://i.imgur.com/B7jFE.png

If I press F5 or Ctrl+R it doesn't work, just as you said.


Wow, the first one crashed my Chrome browser. Thanks for sharing.


In case Safari crashes and you can't open it again, sudo rm -rf ~/Library/Caches/com.apple.Safari did it for me.


Here are some other compression curiosities:

(http://www.maximumcompression.com/compression_fun.php)

It includes a 115-byte RAR file that expands to 5 MB (those 115 bytes can be squashed down further; one compressor gets them to 39 bytes); a file that compresses with one program but ends up bigger with another; etc.

Some say that file compression is linked to AI - good general-purpose compression relies on being able to predict the text and build a prediction table, and if you can predict something, you understand it.

The Hutter Prize tests this against the 100 MB of enwik8. The best attempt so far is a bit less than 16 MB.

(http://prize.hutter1.net/) I remember zip bombs from early-90s BBSing. I also remember ANSI bombs.
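You can get a feel for how lopsided the ratios are with the standard library; the exact byte counts vary with library versions, so treat the printed numbers as ballpark figures:

    import bz2
    import zlib

    zeros = b"\x00" * 5_000_000            # the 5 MB target mentioned on that page
    print(len(zlib.compress(zeros, 9)))    # deflate: a few kilobytes
    print(len(bz2.compress(zeros, 9)))     # bzip2's run-length pre-pass does far better still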



> By definition, the Kolmogorov complexity K of a string x is defined as the length of the shortest program (self-extracting archive) computing x. K(x) itself cannot be computed, only approximated from above, namely by finding better and better compressions, but even if we reach K(x) we will never know whether we have done so. For a text string like enwik8, Shannon's estimate suggests that enwik8 should be compressible down to 12MB.

The current record is 15,949,688 bytes.



See also Russ Cox's "Zip Files All The Way Down" article:

http://research.swtch.com/zip


Even more impressive: A zip file that extracts to itself. (That's also shown in the article, as a kind of "Lempel-Ziv quine".)


A similar attack to mess up XML parsers:

http://en.wikipedia.org/wiki/Billion_laughs
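For reference, a scaled-down version of the payload (the classic uses nine levels of tenfold expansion instead of the four shown here). Recent expat releases cap entity amplification, but a naive DTD-expanding parser will try to materialise the whole thing:

    payload = """<?xml version="1.0"?>
    <!DOCTYPE lolz [
      <!ENTITY lol "lol">
      <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
      <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
      <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
    ]>
    <lolz>&lol4;</lolz>"""
    # the full nine-level version expands to 10**9 "lol"s (roughly 3 GB) from under a kilobyte of XML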


Right, you can also do this with SVG images, the <g> group element and something called "xref" (IIRC?) to refer back to other defined groups by their id attribute.


lol ^ 10,000,000,000


lol * 10,000,000,000

lol ^ 10,000,000,000 is an insane amount larger. :)


Considering the nature of modern "art" (ex.: "here's a hard drive containing $5M in stolen software!"), and owning a "This T-Shirt is a Munition" (featuring the then-controversial RSA-in-4-lines-PERL code), I'm perversely inclined to find such a "zip bomb" small enough to print the hex or QR code on one business card. The 42kb file is a bit big; any known smaller versions?


(http://www.maximumcompression.com/compression_fun.php)

This has a 115-byte RAR file that expands to 5 MB. You can probably experiment to get a file just small enough for a QR code, with huge output.

(Note that using an obscure compressor gives a 24-byte file that expands to 5 MB.)


You can also use the same technique to exhaust the number of files/inodes. I.e., you format an ext4 partition with a crazy number of inodes, then create a zip file containing a crazy number of 0-byte files. The receiving side has a normally formatted ext3/ext4 partition, and you exhaust its inodes. This is not nice, so don't do it.
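A sketch of that variant in Python; the archive is mostly per-entry header overhead, but every entry becomes an inode on the receiving filesystem when extracted:

    import zipfile

    with zipfile.ZipFile("inode-bomb.zip", "w") as z:
        for i in range(100_000):
            z.writestr("f/%06d" % i, b"")   # 100,000 zero-byte entries in a roughly 10 MB archive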


I wonder what would happen if you hid a symlink pointing to root as one of the files. Someone would, without a doubt, rm -rf it.


rm -rf doesn't follow symlinks:

    /tmp $ mkdir -p a/b/c
    /tmp $ mkdir -p a/d/e
    /tmp $ cd a/b/c
    /tmp/a/b/c $ ln -s /tmp/a/d .
    /tmp/a/b/c $ cd ../../
    /tmp/a $ ls */*
    b/c:
    d

    d/e:
    /tmp/a $ rm -rf b
    /tmp/a $ ls
    d
It would be pretty stupid for it to.


Detection of compression bombs has improved a lot compared to over 10 years ago, when they really did cause problems on mail servers. Home AV software detects them; crazily enough my install of GoLang on a windows box has a file that gets flagged as a compression bomb every full system scan.

But examples like this happen in many forms; heck, Windows generating thumbnails for some file types/sizes has done wondrous things like exponentially growing the swap file into an ever-impending churning slowdown.

Even computers have mental farts.


> crazily enough my install of GoLang on a windows box has a file that gets flagged as a compression bomb every full system scan.

This may have something to do with Russ Cox's blog post on recursive zip-archives "Zip Files All The Way Down" in Go: http://research.swtch.com/zip

Baseless speculation mode: There is a possibility that the recursive zip file was part of the Go test cases for the gzip package at some point. If it lingers in the Mercurial commit history, it may still trigger hits from your AV software.


I just had to dig out my logs and see what it was - the file in question is located in (default install):

C:\Go\src\pkg\regexp\testdata\re2-exhaustive.txt.bz2, 385 KB in size, though opening it shows a .txt file that is 58 MB in size. Basically Avast being picky and a false positive. It's probably so compressed that it hit whatever per-file decompression limit Avast has, and Avast then thinks it's a compression bomb. It opens and extracts fine, though it's hardly fun reading.


Web browsers support compressed data; I wonder whether they would try to decompress something like this?


Gzip could certainly be abused.


Yes they will; there was a post a while back which used this to bomb the browser.


Perhaps more concerning is being able to use this to launch a denial-of-service attack on a server that accepts zipped data. Gzipped requests are unusual with HTTP (no idea how widespread support for them is), but IIRC SPDY is compressed by default.
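The response side is easy to sketch: a body that is about a megabyte on the wire but inflates to a gigabyte in whatever naively decompresses it. (The file name here is made up; the point is that it would be served with Content-Encoding: gzip.)

    import gzip

    with gzip.open("bomb.gz", "wb", compresslevel=9) as f:
        for _ in range(1024):
            f.write(b"\x00" * 1_048_576)   # 1 GiB of zeros in total -> roughly 1 MB of gzip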


Maybe not at the transport or protocol level, but it wouldn't be too hard to DoS an application server that handles compressed data, such as images.

Make a billion-pixel PNG image that compresses very well, upload several copies simultaneously to a LAMP server running on an average Linode, and watch it run out of memory while trying to create thumbnails with GD.


PHP usually has a pretty reasonable memory limit set, so it would puke on itself pretty quickly.

But I don't think you'd bring the site down.


Fair enough, but I've been on Linode's forums long enough to have seen dozens of people running 50 PHP processes with 128MB memory limit each, on a 1GB server shared with MySQL and a bunch of other crap. (It seems that 128MB is the new "reasonable memory limit" these days, since that's how much RAM it takes for PHP to handle photos from 8-to-12-megapixel cameras and smartphones.)


Also, one can try to bomb virus scanners (on a PC, or on a mail server via an e-mail attachment) or any other service which supports uploading and extracting zip files.


Do you have a link, by any chance?


Unfortunately I do not. This is where that project to store all your browsing history "in the cloud" (preferably your own "cloud") would have been handy. Then you never again miss something and go searching for it: you know you've seen it, so you search your own history, which contains the content exactly as it was at the time.


The most authoritative-looking reference

http://www.aerasec.de/security/advisories/decompression-bomb...

in this typically thinly referenced Wikipedia article looks to be several years old. What is the current state of the art? Some of the comments already posted as I post this comment talk about the situation "years ago" and at least one comment suggests that this is largely a solved problem, currently. How many wild vulnerabilities like this are there, really?

(I ask questions like this about most "facts" reported in Wikipedia articles, because I am a Wikipedian myself, and I have become painfully aware of how often the "the free encyclopedia that anyone can edit" becomes "the encyclopedia in which every fact is just made up.") From a neutral point of view, is this really much of a problem in day-by-day computer use and online network use?


This seems like a ripe opportunity for an enterprising blogger to examine changes that have been made to how we compress things over the past few years and see if any of these changes impacts the potency of the decompression bomb as a kind of, as you said, 'update to the current state of the art'.


Comp sci folks: Is predicting whether a compressed file will produce a finite (or, better, reasonably-sized) output roughly equivalent to the halting problem?


It depends on the decompression algorithm. It's possible for that to be the case, but this can only happen if the compressed binary format is essentially a Turing-complete language, for which your decompressor is the interpreter.

I'm not aware of any data formats for which that is the case, but from a theoretical standpoint, eval(s) is a perfectly cromulent decompression algorithm. This fact is essentially the starting point for Kolmogorov complexity.

"Reasonably-sized" is actually an interesting problem in itself. If your decompresser is sufficiently advanced, you could embed a busy-beaver function, which terminates but grows faster than any computable function. I have no idea whether such functions could be expressed with less-than-Turing-complete data formats.


RAR has a built in virtual machine for forwards compatibility with new compression algorithms.


No, because run-length encoding encodes the - well - run lengths in the file header. You can read those and know how big the resulting file will be without having to actually decompress the file.
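Zip has a similar property: each entry's header advertises its uncompressed size, so a scanner can add those up before inflating anything. A sketch, with the caveat that a hostile archive can lie in its headers, so this is a first filter rather than a guarantee:

    import zipfile

    MAX_DECLARED = 100 * 1024 * 1024

    with zipfile.ZipFile("suspect.zip") as z:
        declared = sum(info.file_size for info in z.infolist())   # sizes as declared, not verified
        if declared > MAX_DECLARED:
            raise ValueError("declares %d bytes uncompressed; refusing to extract" % declared)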


While this is out of the range of most consumers, I wonder if any bored sysadmins with a new storage system to test have tried unzipping that file...


It'd be easier to do something like cat /dev/urandom > big


/dev/zero is probably faster


It is way faster (at least with dd):

  $ time dd if=/dev/zero of=10MB.dat  bs=1M  count=10

  real    0m0.213s

  $ time dd if=/dev/urandom of=10MB.dat  bs=1M  count=10

  real    0m8.873s


Or if you want to measure the speed of the source itself:

  $ dd if=/dev/zero of=/dev/null bs=1M count=100
  104857600 bytes (105 MB) copied, 0.0237114 s, 4.4 GB/s

  $ dd if=/dev/urandom of=/dev/null bs=1M count=100
  104857600 bytes (105 MB) copied, 21.501 s, 4.9 MB/s
Also dammit Ubuntu with your Gibis.


Or

  $ truncate  -s 17TB hugefile.dat
...which is just as pointless.


I believe truncate creates sparse files on Linux, so that would not work.
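It's easy to check: truncating just records the new length without allocating blocks, so the apparent size and the allocated size diverge wildly. A sketch assuming Linux (st_blocks is in 512-byte units there):

    import os

    with open("hugefile.dat", "wb") as f:
        f.truncate(17 * 10**12)            # 17 TB apparent size, created instantly
    st = os.stat("hugefile.dat")
    print(st.st_size, "apparent bytes;", st.st_blocks * 512, "bytes actually allocated")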


I've tried it, and I'm no sysadmin. Fortunately, I had a disk quota on, set by my sysadmin, and so the bomb could only take up 2 GB of the space I had in my quota.


Welcome to 1988?


Old as fuck.


>Old as fuck.

So is algebra, and yet, every year millions of people learn it for the first time.


But they don't immediately rush out to tell the world the "News"


Yes they do. Every child that learns anything will tell the world about it - or at the very least everyone in their family.


So? It's still fun to tell people about it. http://xkcd.com/1053/


It may be old, but I didn't know about it until it was brought up in a /r/technology thread yesterday.

Old =/= everyone knows about it.


The fact that you don't do your homework doesn't mean what this guy is doing is righteous.


Hello,

Your new account has rapidly got a bunch of down votes. HN likes constructive thoughtful comments. Aggressive comments, even if correct, will likely get down votes.

"Old as fuck" is going to be down voted to oblivion, and risks the account being hell-banned.

"This is very old. I'm surprised HN readers are not already aware of it" may get down votes (it doesn't add anything to the conversation) but will probably be tolerated.

"This is very old. Here are some similar things / here's the theory behind it / here's an example of more modern versions / etc" will probably get a few up votes.

HN probably benefits more from people visiting the New tab and up voting good stories (and flagging the spam) than from people posting comments like "Old as fuck" on items they don't like.

Welcome to HN! (Though I suspect with a username like Trool that yours is a throwaway account.)


I don't really have time to google everything or to go up and ask everyone I meet: "Hey, how are you? Do you know some cool stuff about computers and IT that you could share with me... see, I'm trying to learn new stuff." This is why I just follow some forums and communities based around comp sci (i.e. Hacker News). So I will agree to disagree with you; I think that this is a very interesting link and a good thread; I learn new stuff I wouldn't have discovered otherwise.

If you knew about this, you could have just ignored it and not wasted your time commenting on it. Let it be useful for the n00bs (i.e. me).


trool is troll. </surprise>


Combination of tool and troll?


Next week on HN, C++ released! :)


Was thinking something similar... The first time I discovered this I was a teenager still using dial-up. And apparently the virus scanner on the school PCs couldn't handle it :)


Same here! Except the one I had created a huge amount of folders with weird characters (∆µßßßßçøø∑) in the root of my C drive. Made my dad's computer slow to a crawl. Also took a bit of scripting to undo it before he came home.


Inception


Awesome link, but already seen it on reddit yesterday. HUH



