
>Nuitka is less reliable than PyInstaller, but harder to reverse engineer for predictable reasons.

That will not matter for long. GPT-4 can turn assembly back into C and generate appropriate comments.




GPT-4 cannot extrapolate real information from less of it. Reverse engineering large obfuscated binaries is like unpixelating an image: it's just guessing.

Still immensely useful, but it does have its limits.


TBH, an LLM might be decent at code identification -- looking at some assembly and saying "that looks like a CRC32 hash", for example. That's a task that dovetails fairly well with its strong pattern-matching abilities. Making larger statements about the structure and function of an entire application is probably beyond it, though.

Moreover, it's likely to fail in any sort of adversarial scenario. If you show it a function with some loops that XORs an input against 0xEDB88320, for example, it would probably confidently identify that function as CRC32, even if it's actually something else which happens to use the same constant.
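To make that concrete, a plain bitwise CRC-32 looks roughly like the sketch below (0xEDB88320 is the standard reflected polynomial); any routine that reuses that constant gives off the same scent even if it computes something else entirely:

    /* Minimal bitwise CRC-32 sketch. The 0xEDB88320 constant is the
       standard reflected polynomial -- exactly the kind of cue an LLM
       would pattern-match on, whether or not the surrounding code is
       really a CRC. */
    #include <stddef.h>
    #include <stdint.h>

    uint32_t crc32(const uint8_t *data, size_t len) {
        uint32_t crc = 0xFFFFFFFF;
        for (size_t i = 0; i < len; i++) {
            crc ^= data[i];
            for (int bit = 0; bit < 8; bit++) {
                if (crc & 1)
                    crc = (crc >> 1) ^ 0xEDB88320;
                else
                    crc >>= 1;
            }
        }
        return crc ^ 0xFFFFFFFF;
    }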


All the real information is already in the binary; no guessing is necessary. It takes data, processes it through a set of defined steps, and outputs it. The C code, the assembly code, and the obfuscated assembly code all express the same fundamental conceptual object.

If you have a good enough model with a large enough token window to grasp the entire binary, it will see all of those relations easily. GPT-4 already demonstrates ability at reverse engineering, and GPT-5 is underway, which, if it is as powerful a generational jump as 3 to 4 was, will advance these abilities tremendously.


I am skeptical that reverse engineering will be taken over by LLMs. At the very least, most LLMs aren't trained to work in an adversarial environment, which is what reverse engineering is.


And, just like SEO, we will have people tailoring their code to fool the AI.


Does this actually work for nontrivial functions, e.g. a hashtable lookup function?


GPT 3.5 can write most of the infrastructure and "scaffolding" for a full ransomware campaign, but has absolutely no idea how to perform the most basic cryptographic operations even when explicitly instructed on which library and method to use, and will just confidently spit out absolute bullshit that only sorta vaguely resembles what you're looking for - it's like asking a nine-year-old. It struggles with writing any kind of obfuscation method beyond base64, string splitting, and XORing too - I have asked it dozens of times and it's never managed to get close to a trivial implementation not using those, even when directly told to do exactly that.
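For reference, the trivial XOR-level "obfuscation" it can manage is roughly this (illustrative sketch, not actual model output):

    /* Single-byte XOR of a buffer -- about the ceiling of what GPT 3.5
       reliably produces. The same call encodes and decodes. */
    #include <stdio.h>
    #include <string.h>

    static void xor_buf(char *buf, size_t len, char key) {
        for (size_t i = 0; i < len; i++)
            buf[i] ^= key;
    }

    int main(void) {
        char msg[] = "hello world";
        size_t len = strlen(msg);
        xor_buf(msg, len, 0x5A);   /* obfuscate */
        xor_buf(msg, len, 0x5A);   /* apply again to recover */
        printf("%s\n", msg);
        return 0;
    }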

Haven't played with GPT4 yet. Need to try that as well as larger LLaMA models on a rented cloud GPU box sometime. I have a full battery of tests covering writing malware, identifying vulnerabilities a la static analysis, fixing those vulnerabilities, exploiting those vulnerabilities, in a variety of languages, as well as a few generic / assorted technical tasks.

Some other things GPT 3.5 sucks at, in addition to implementing cryptography and obfuscating code:

- Writing ASCII Art

- Writing HTML and CSS from any kind of graphical instruction, even very simple ones like "draw a car using HTML5 and CSS" or "draw the Facebook logo in HTML and CSS"

- Incomplete solutions. Example: when asked to find all of the vulnerabilities in a block of code that contains three or four, it'll confidently list one and say that's the only vulnerability. If you argue with it and insist that there's more, it'll find another, then insist it has found all of them and apologize for missing it the first time. Ask again and it'll say "nope, there are no more vulnerabilities in this code".

- False negatives until told explicitly. For example, you can show it a code block containing a more low-level or exotic vulnerability (e.g. TOCTOU) than your ordinary SQL injection or XSS, ask it if there are any vulnerabilities, and it'll confidently say none, over and over. Then you ask if it's vulnerable to a TOCTOU attack and it finally realizes: oh yeah, variable X retrieves this value for this comparison on line Y but then retrieves the value again when it passes it to this other function on line Z, and if the value changes during that time, it could pass the bounds check on line Y but be invalid when used again on line Z (sketch below)... which is great that it gets it, right until you realize that you basically have to ask it over and over again for every specific type of vulnerability, and even then, some it'll still miss altogether.
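A boiled-down version of that double-fetch pattern, as a hypothetical illustration (names made up):

    /* TOCTOU / double-fetch sketch: the length is read once for the
       bounds check and read again for the copy, so a concurrent writer
       can change it in between and blow past the check. */
    #include <string.h>

    struct shared_msg {
        volatile size_t len;     /* written by another thread/process */
        char data[256];
    };

    void handle(struct shared_msg *s, char *out, size_t out_size) {
        if (s->len <= out_size) {           /* check: first read (line Y) */
            memcpy(out, s->data, s->len);   /* use: second read (line Z)  */
        }
    }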

At the level of work expected in big tech companies, I can see GPT 3.5 augmenting or supplementing some outsourced junior consultants, but it's not even adequate to replace them outright, to say nothing of seniors, principals, and true domain experts, at least in security.


Not when I tried.


It's in the Microsoft paper. We might require the 32k token model to really handle it.


I didn't see it in https://cdn.openai.com/papers/gpt-4.pdf -- which paper are you referring to? (Or if I missed it, what page number?)



