
IDK why, but bzip2 seems to compress natural-language text somewhat better than other compression algorithms do:

    $ curl https://www.gutenberg.org/cache/epub/11/pg11.txt | bzip2 --best | wc
        246    1183   48925
Also, zstd goes all the way up to `--ultra -22`, plus `--long=31` (a 2 GiB window, which is irrelevant here since the file already fits within the default window).
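To check this on your own text without going through several CLIs, here is a minimal sketch using only Python's standard library. It assumes you have already saved the book locally (e.g. as `pg11.txt`) and pass the path as an argument. zstd is not in the stdlib, so it is left out; the labels are approximate CLI equivalents, not exact byte-for-byte matches with those tools' output.

```python
import bz2
import lzma
import sys
import zlib


def compressed_sizes(data: bytes) -> dict[str, int]:
    """Return the compressed size in bytes of `data` under each stdlib codec."""
    return {
        # DEFLATE at max level (what gzip -9 uses, minus gzip's header/trailer)
        "zlib -9": len(zlib.compress(data, 9)),
        # Burrows-Wheeler based, 900k blocks (same setting as bzip2 --best)
        "bz2 -9": len(bz2.compress(data, 9)),
        # LZMA2 in an .xz container, roughly xz -9e
        "xz -9e": len(lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        sizes = compressed_sizes(fh.read())
    # Print smallest output first
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1]):
        print(f"{name:8} {size:>8}")
```

Run it as `python sizes.py pg11.txt`; the `bz2 -9` figure should be comparable to the `wc` byte count above.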



You can use a preshared dictionary with zstd (bzip2 itself has no preset-dictionary option), so you can get that down a lot by choosing between different dictionaries according to some selection rules. I don't know if it helps with literature, but I had great success with it when sending databases containing free text like that. In the end it was easier to just use one dictionary for everything and skip the rules.
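With the zstd CLI this is `zstd --train` on a set of sample files to build a dictionary, then `-D` to compress and decompress with it. Since zstd isn't in Python's standard library, the sketch below swaps in zlib's preset-dictionary support (`zdict`) to show the same idea: short records that share vocabulary compress much better when both sides agree on a dictionary in advance. The dictionary contents and the record are hypothetical examples.

```python
import zlib

# Hypothetical shared dictionary: strings likely to recur in the records.
# zlib recommends putting the most common strings towards the end.
SHARED_DICT = b"customer complained that the delivery was late and the package was damaged "


def compress_record(record: bytes, zdict: bytes = b"") -> bytes:
    """DEFLATE-compress one record, optionally with a preset dictionary."""
    # A preset dictionary can only be supplied through compressobj
    c = zlib.compressobj(9, zdict=zdict) if zdict else zlib.compressobj(9)
    return c.compress(record) + c.flush()


def decompress_record(blob: bytes, zdict: bytes = b"") -> bytes:
    """Inverse of compress_record; must be given the exact same dictionary."""
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(blob) + d.flush()


if __name__ == "__main__":
    rec = b"customer complained that the delivery was late and the package was damaged again"
    plain = compress_record(rec)
    with_dict = compress_record(rec, SHARED_DICT)
    print(len(rec), len(plain), len(with_dict))
    assert decompress_record(with_dict, SHARED_DICT) == rec
```

The dictionary never travels with the data, so it only pays off when you send many small records; for one large file the compressor builds up its own context quickly and the gain shrinks.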



