Zip Files All The Way Down (swtch.com)
92 points by l0stman on March 18, 2010 | 35 comments



Emailing that zip file would probably break most mail based virus checkers. I doubt the people who wrote them anticipated an infinite zip file. Too large, sure, but infinite? Probably not.


I have a terabyte of zeros in a zip file somewhere. Virus checkers don't really have a problem with it, though.


what did you use that for?


I guess it's always good to have a stash. In case you run out of them.


I don't know what good an archive of zeroes is, unless he's also got an archive of ones to draw from as well.


Please consult Feynman's Lectures on Computer Science and see the error of your ways.

(Zeros are as good as ones, and converting between them doesn't cost high-quality energy. It's deleting information that does. Thus going from an unknown state to a defined state costs high-quality energy, i.e. it has to increase entropy elsewhere.)


That's why I have two zip files, one with zeros, one with ones.


It's always good to have some spare, in case you run out and have 0.


Harassing virus checkers?

  :; gunzip -l tb1.gz ; gunzip tb1.gz; mv tb1 tb1.gz
         compressed        uncompressed  ratio uncompressed_name
              43863               55982  21.7% tb1

  :; gunzip -l tb1.gz ; gunzip tb1.gz; mv tb1 tb1.gz
         compressed        uncompressed  ratio uncompressed_name
              55982             2568864  97.8% tb1

  :; gunzip -l tb1.gz ; gunzip tb1.gz; mv tb1 tb1.gz
         compressed        uncompressed  ratio uncompressed_name
            2568864          1067044016  99.8% tb1

  :; gunzip -l tb1.gz ; gunzip tb1.gz; mv tb1 tb1.gz
         compressed        uncompressed  ratio uncompressed_name
         1067044016                   0   0.0% tb1
This made my little machine unhappy.


I imagine that compresses pretty well with RLE.
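
For what it's worth, a toy run-length encoder shows why. This is only a minimal sketch: gzip and zip use DEFLATE (LZ77 plus Huffman coding), not plain RLE, and `rle_encode` / `rle_decode` are names made up for illustration.

```python
from itertools import groupby


def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Collapse each run of identical bytes into a (byte_value, run_length) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(data)]


def rle_decode(pairs: list[tuple[int, int]]) -> bytes:
    """Expand (byte_value, run_length) pairs back into the original bytes."""
    return b"".join(bytes([value]) * count for value, count in pairs)


# A megabyte of zeros is a single run, so it encodes to a single pair.
print(rle_encode(bytes(1_000_000)))  # [(0, 1000000)]
```

A terabyte would encode to one pair as well, although this pure-Python loop would take a very long time to get there.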


Most virus scanners will only scan compressed files 4-7 levels deep. It's configurable in Norton, McAfee, and ClamAV.
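
Roughly what a depth limit like that looks like, as a sketch using Python's `zipfile` module. `scan_archive`, `TooDeep`, and the default of 5 levels are all assumptions for illustration, not how any of those products actually implement it.

```python
import io
import zipfile


class TooDeep(Exception):
    """Raised when archives are nested deeper than the configured limit."""


def scan_archive(data: bytes, max_depth: int = 5, _depth: int = 1) -> list[str]:
    """List member names, descending into nested zips up to max_depth levels."""
    if _depth > max_depth:
        raise TooDeep(f"nesting exceeds {max_depth} levels")
    seen = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            seen.append(info.filename)
            payload = zf.read(info)
            # Recurse only when the member itself looks like a zip archive.
            if zipfile.is_zipfile(io.BytesIO(payload)):
                seen.extend(scan_archive(payload, max_depth, _depth + 1))
    return seen


def nest(levels: int) -> bytes:
    """Build a zip inside a zip inside a zip, `levels` deep (demo helper)."""
    name, payload = "leaf.txt", b"hello"
    for i in range(levels):
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w") as zf:
            zf.writestr(name, payload)
        name, payload = f"level{i}.zip", buf.getvalue()
    return payload
```

With this, `scan_archive(nest(3))` walks all three levels, while `scan_archive(nest(7))` raises `TooDeep`. A self-reproducing zip would hit the limit the same way. Note the sketch reads every member fully into memory, so it does nothing about sheer size.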


I've had zipped files that are too compressed break mail-based virus checkers. Their typical response is to kill the checking process if it takes too much time/memory and not deliver the mail.
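
A gentler alternative to killing the process is capping the output while decompressing. Here's a minimal sketch using `zlib.decompressobj` and its `max_length` argument; `bounded_inflate` and `OutputLimitExceeded` are hypothetical names, and it handles raw zlib streams rather than real mail attachments.

```python
import zlib


class OutputLimitExceeded(Exception):
    """The stream would expand past the allowed size."""


def bounded_inflate(compressed: bytes, limit: int) -> bytes:
    """Decompress a zlib stream, refusing to produce more than `limit` bytes."""
    inflater = zlib.decompressobj()
    # Ask for at most limit + 1 bytes: one byte over the limit is enough to
    # prove the stream is too big without materialising the rest of it.
    out = inflater.decompress(compressed, limit + 1)
    if len(out) > limit:
        raise OutputLimitExceeded(f"output exceeds {limit} bytes")
    return out


# 10 MB of zeros shrinks to a tiny stream, but the cap still catches it.
bomb = zlib.compress(bytes(10_000_000))
```

Calling `bounded_inflate(bomb, 1_000_000)` raises instead of allocating 10 MB, while small, honest streams come back unchanged.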


> typical response

What's your sample size?


Hmm, about 5. No more than 10.


Then please do name "the mail-based virus checkers that kill the checking process if it takes too much time/memory and not deliver the mail".

I spent several years at a company that developed this sort of product and I am not aware of any commercial products that do what you described. I am guessing you are referring to the mail gateways integrated with FOSS virus scanners, and still I doubt the behavior you described is typical even among them. From what I saw, the restrictions were implemented in the scanners themselves, and mail that could not be readily processed was always quarantined, not simply "not delivered".


There is 42.zip[1], which is 42 kB zipped, and 4.5 PB unzipped.

[1]: http://www.unforgettable.dk/


I look forward to the day when someone with FU money buys a few hundred 10TB hard drives (prerequisite: someone develops 10TB hard drives), sets up a RAID array, and extracts 42.zip completely, just for fun.


Maybe that's how Google breaks in a new datacenter.


Does it contain the Ultimate Question?


"r.zip contains an executable file. For security reasons, Gmail does not allow you to send this type of file."

And I was trying to play a nice little trick on a few of my friends with that one, too...


BTW, if you ever need to send an exe via Gmail, zip it, then change the file extension to something like ".google". Gmail lets it right through.


And that's probably a good thing. People who know how to change extensions (which are hidden by default in Windows) are probably less likely to fall victim to running random files.


This brings back memories of trying to crash BBS uploaded file verifiers by sending massively compressed files that would crash the board when unzipped. Ahhh the memories..


In college we used to do nasty things to the PBX system. It was smart enough to detect and prevent forwarding a number to itself -- if you did it using the internal 4-digit extension. But if you forwarded to yourself via 9+direct-dial-number, it couldn't detect that.

So we'd forward the phone that way, then call it from another line. You'd hear a sequence of clicks, progressively quieter (why? isn't it all digital?), and then it would just stop, presumably because all external lines were used up.

Farther off topic, the phone system had a weird feature: dialing #*5 would say back to you 3 (apparently) random digits. This was good for chemistry lab, if you needed to fudge some data and wanted to introduce a believable error factor into it.


It gets quieter because of the transmission loss plan. Historical reasons, basically.

Since analog connections lose power with distance, an acceptable maximum loss was defined for each segment in the network. If a line came in too hot, a "pad" was added to dissipate some power at the receiving end. If it was too low, the circuit was out of spec and had to be re-engineered.

At the other end of the telephone switch, an amplifier was added to jack up all the outgoing signals to the proper amplitude for the transmission network.

All this amplification and loss was necessary to make sure that all telephone calls had about the same volume. The network didn't all switch over to digital switching and transmission in one day (in fact there are some analog interoffice circuits still in use to this day in Alaska), so we retained some vestiges of the loss plan.

It's pretty likely that your PBX system connected to the network over analog lines anyway. :)


Quines are easy in Python. A 0-byte python source file outputs itself when run. :)


also a palindrome


My favorite quine is in Factor:

  [ [ dup curry ] dup curry ]
(Credit to Slava Pestov)

When called, it puts [ dup curry ] on the stack, then creates a copy of it on the stack. curry then takes the second block from the stack and prepends it to the first. Rather than being a program that evaluates to its code, it's a code block that evaluates to itself (similar to most Scheme/Lisp quines).

(Off topic note: I originally read the title of the article as "Zip Flies All The Way Down" and was wondering what zipping my fly down had to do with hacker news.)


Note the relevance to HN and the y-combinator, since this represents a fixed point for the zip function.


But--does it represent the least fixed point?

And isn't it rather the fixed point of the unzip function? (Is `zip' even a function (i.e. is `unzip' injective?))


When I'm writing programs for myself, sometimes I make the Perl program write itself out as the config file, with a config hash as the __DATA__ section. When the program runs, it checks whether the config file is missing or comes from an older version of the program. I don't know if it has any real use, but it's fun to write :D


Do you cheat (i.e. reading back the source file), or do you write real quines?


I read back the source file so it still contains the comments.


Real quines can contain comments, too.
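
For instance, here is a sketch of a three-line Python quine whose first line is a comment. It is held in a string (`QUINE`) only so the claim can be checked by running it; `run` is a hypothetical helper, and the quine itself is just the three lines inside the string.

```python
import contextlib
import io

# The quine's source: a comment, a format string, and a print.
QUINE = (
    "# a comment that survives the round trip\n"
    "s = '# a comment that survives the round trip\\ns = %r\\nprint(s %% s)'\n"
    "print(s % s)\n"
)


def run(source: str) -> str:
    """Execute `source` and return whatever it printed to stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(source, {})
    return buf.getvalue()


# No reading back of the source file: the program rebuilds itself from `s`.
print(run(QUINE) == QUINE)  # True
```

The trick is that the comment lives inside the format string too, so it gets printed along with everything else.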


This is really bad on Safari, which attempts to unzip archives automatically. Whatever you do, don't download the zip if you have Safari set to unarchive automatically.



