
This exact code appears online as a submission to the 1986 IOCCC, so it's likely that it was indeed part of the training set for this LLM and that there is a significant corpus of text discussing this particular program and other obfuscated programs like it.

I'm not ruling out that this LLM output is "partially organic" rather than "fully regurgitated", but I'd be much more interested to see this LLM explain an obfuscated program that hasn't been floating around the Internet for 35 years.




Even if it's part of the training data and the LLM is just a better search engine, how would I have figured out what the code does without an LLM? I certainly can't paste this into Google.

I mostly agree with the stochastic parrot interpretation, but that doesn't undermine its usefulness or impressiveness. Even if it's just a highly compressed search index, that level of compression is amazing.


> how would I have figured out what the code does without an LLM

Start by find-and-replacing those #defines. You can iteratively deobfuscate things by hand. It's a PITA and takes time, but it's doable.

If you hit a roadblock, run it in a VM.
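To illustrate the find-and-replace step, here's a made-up toy (not the actual IOCCC entry):

    #include <stdio.h>
    #define O stdout
    #define P(x) fputc(x, O)
    int main(void) { P(72); P(105); P(10); }

Expanding the macros, by hand or by running the preprocessor with cc -E, gives you something readable:

    #include <stdio.h>
    /* prints "Hi" followed by a newline */
    int main(void) { fputc(72, stdout); fputc(105, stdout); fputc(10, stdout); }

Each pass like that peels off one layer, and the program gets less scary every time.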


It's easy to test it with something unpublished.


My experience is that ChatGPT does a very poor job of writing Brainfuck programs for me; even simple programs like "add two and two" come out wrong. Maybe it would do better if I asked it to explain one instead.
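For reference, here's a hand-traced program that adds 2 and 2 and prints "4" (my own sketch, so treat it as a yardstick for what a correct answer looks like, not gospel):

    ++>++[<+>-]            cell 0 gets 2 plus 2 as cell 1 drains into it
    ++++++[<++++++++>-]    add 48 so the 4 becomes the ASCII digit four
    <.                     move back to cell 0 and print it

Brainfuck ignores everything except its eight command characters, so the notes on the right are safe as long as they avoid those characters.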


In my experience, LLMs are poor at working with unpopular languages -- probably because their training data does not contain a lot of examples of programs written in those languages, or explanations of them.

They do much better with popular languages.


> They do much better with popular languages.

So, in other words, they perform precisely how you’d expect a stochastic parrot to perform?

The more popular the language, the more likely the training corpus includes both very similar code samples and explanations of those code samples, and the more likely those two converge on a “reasonable” explanation.

Ask it something it’s likely to have seen an answer for, and it’s likely to spit out that answer. Interesting? Sure. Impressive? Maybe… but still pretty well captured by “a fuzzy JPEG of the web”.


"So, in other words, they perform precisely how you’d expect a stochastic parrot to perform?"

Or exactly like you'd expect a human to perform.

Train a human mostly on English, and they'll speak English. Train them mostly on Chinese, and they'll speak Chinese.


> Train a human mostly on English, and they'll speak English. Train them mostly on Chinese, and they'll speak Chinese.

Ahh, but ask a human a question in a language they don’t understand and they’ll look at you with bewilderment, not confidently make up a stream of hallucinatory nonsense that only vaguely looks statistically right.

> Or exactly like you’d expect a human to perform.

Not exactly, no… but with just enough of the uncanny valley to make me think the more interesting thought: are we really not much more than stochastic parrots? Or, in other words, are we naturally just slightly more interesting than today’s state of the artificially stupid?



