More

leijurv · 2025-06-17T19:08:09 1750187289

OP did actually host a tracker.

"I then started the tracker. After about an hour, it peaked at about 1.7 million distinct torrents across 3.1 million peers!"

leijurv · on Dec 30, 2024

Here's the submission that won the Hutter Prize in 2021: https://github.com/amargaritov/starlit It uses a LSTM to predict the next token lossily, then uses https://en.wikipedia.org/wiki/Arithmetic_coding to convert that to lossless compression. Lossless compression can definitely leverage a lossy compressor, such as via arithmetic coding. Also see: https://en.wikipedia.org/wiki/Context-adaptive_binary_arithm... which has a simple "Example" section - imagine if the top prediction made by your neural network was correct, you emit "0", if the 2nd was correct, you emit "10", if the 3rd, "110", if the 4th, "1110". As you can see, this is lossless, but the fundamental prediction is lossy, and the better that prediction is, the better the compression. (In actuality, you wouldn't waste your 1 bits like this, you'd use arithmetic coding instead).

willvarfar · on Dec 30, 2024

Yes, one way to think about arithmetic compression is encoding the difference between the prediction and reality.

This isn't normally what people mean by lossy compression, though. In lossy compression (e.g. mainstream media compression like JPEG) you work out what the user doesn't value and throw it away.

vlovich123 · on Dec 31, 2024

That’s a stretch to call it lossy. To my eye the purpose of the LSTM seems indistinguishable from a traditional compression dictionary.

And that still doesn’t show how lossless compression is tied to intelligence. The example I always like to give is, “What’s more intelligent? Reciting the US war of independence Wikipedia page verbatim every time or being able to synthesize a useful summary in your own words and provide relevant contextual information such as it’s role in the French Revolution?”

tshaddox · on Dec 31, 2024

These lossless compression algorithms don’t just recall a fixed string of text. Obviously that can be accomplished trivially by simply storing the text directly.

These lossless compression algorithms compress a large corpus of English text from an encyclopedia. The idea is that you can compress this text more if you know more about English grammar, the subject matter of the text, logic, etc.

I think you’re distracted by the lossless part. The only difference here between lossy and lossless compression is that the lossy algorithm also needs to generate the diff between its initial output and the real target text. Clearly a lossy algorithm with lower error needs to waste fewer bits representing that error.

vlovich123 · on Jan 2, 2025

There’s no immediately obvious reason that you should be able to come up with a diff correction to apply to recreate loseless that is more efficient than traditional compressors. In essence you’re still stuck with the idea that you have a compression dictionary and trying to build a more efficient dictionary. It’s not clear there’s a link between that and intelligence.

canjobear · on Dec 30, 2024

This is standard lossless compression. None of the concepts particular to lossy compression (like rate-distortion theory) are used.

leijurv · on Sept 7, 2024

`for char in message: encrypted_char = ord(char) ^ (shared_secret[0] % 256)`

This is not real encryption, it picks only one byte of shared secret and XORs it into the plaintext. Therefore, there are only 256 possible decryption keys to check, which is trivial.

Instead, you'd want to use the shared secret as a key to something strong and symmetric like AES.

thechao · on Sept 7, 2024

Any idiot knows not to use power-of-two! You gotta use "+13", which is prime and, therefore, *secure*.

BobbyTables2 · on Sept 8, 2024

And Twice is nice!

Jerrrrrrry · on Sept 9, 2024

and more than thrice increases your chances of collision by

tptacek · on Sept 7, 2024

I don't think it's meant to be real encryption.

leijurv · on Sept 7, 2024

I suspect it was, given that they've now deleted their comment.

leijurv · on April 19, 2024

It was a one-way collaboration, in that we referenced their discoveries and code such as LattiCG https://github.com/mjtb49/LattiCG, but they were unaware of anything we were doing until now. https://twitter.com/admiral_stapler/status/17806748612594609...

Naming Baritone after Fit is actually a coincidence / joke, the repo github.com/cabaletta/baritone was the result of random brainstorming for something untaken. We only later realized it described Fit and thus added that to the readme :)

leijurv · on April 19, 2024

I'm just a regular SWE! Infosec or algorithmic trading - maybe someday.

tptacek · on April 19, 2024

This is you? You're fully qualified.

leijurv · on April 19, 2024

Yes, absolutely :) that's why we went to FitMC to make the video, he always delivers.

ro_bit · on April 19, 2024

Between nocom and this I'm sure at least a dozen people who had no idea about reverse engineering are going to eventually have a career in it thanks to his videos, even if I find them incredibly cheesy. His videos have a habit of reaching and engrossing all sorts of people who otherwise wouldn't really care about minecraft server exploits, and maybe that will inspire some of them to learn more

Thanks for the writeup!

leijurv · on April 18, 2024

I believe that may be the spectral test https://en.wikipedia.org/wiki/Spectral_test which I mentioned in the explanation when showing the lattices visually

leijurv · on Nov 25, 2023

HN thread: https://news.ycombinator.com/item?id=29615428 :)

leijurv · on Aug 12, 2023

No that's not at all how this works. The iterator could "return" a boolean by passing a boolean to "yield". The return value of "yield" just indicates if the generator should continue, or stop because the actual loop has exited (due to break or return).

ashton314 · on Aug 12, 2023

Oh fie you’re right and I misread that. Thanks for correcting me. I’ve updated my comment.

leijurv · on July 3, 2023

This inspired me to plumb the depths of FizzBuzz, seeking further into it than anyone ever has before: the 10^10000000000th digit (it's a "1"): https://github.com/leijurv/reverse-fizzbuzz

saagarjha · on July 3, 2023

Ah yes, the hyper log log log algorithm