Very interesting as usual from Fabrice Bellard, but I'm a little disappointed this time, because libnc is a closed-source DLL. Nevertheless it will be interesting to compare it to the amazing work of Georgi Gerganov: the GGML tensor library. Both are heavily optimized, support AVX intrinsics, and are plain C/C++ implementations without dependencies.
The comparison table on the ts_server site looks awesome, though. I wish we could generate one for llama.cpp; unfortunately I'm too busy with other things at the moment.
ChatGPT is pretty good at disassembling x86, and is able to give reasonable descriptions of what the code is doing (e.g. try "disassemble the following bytes in hex and explain what they appear to be doing: [bytes from a binary in hex]").
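For reference, the kind of input that prompt expects is easy to produce yourself. A minimal sketch in Python (`hex_for_prompt` is just an illustrative helper, not part of any tool mentioned here; the sample bytes are a trivial x86-64 function):

```python
def hex_for_prompt(code: bytes, width: int = 16) -> str:
    """Format raw machine-code bytes as space-separated hex, width bytes per line."""
    lines = []
    for i in range(0, len(code), width):
        lines.append(" ".join(f"{b:02x}" for b in code[i:i + width]))
    return "\n".join(lines)

# x86-64 encoding of: push rbp; mov rbp, rsp; mov eax, 42; pop rbp; ret
sample = bytes([0x55, 0x48, 0x89, 0xE5, 0xB8, 0x2A, 0x00, 0x00, 0x00, 0x5D, 0xC3])
print(hex_for_prompt(sample))
# → 55 48 89 e5 b8 2a 00 00 00 5d c3
```

Paste the resulting hex into the prompt above and the model should identify the instructions and describe the function's behavior.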
I'm curious how soon someone will use these models to effectively ruin the viability of releasing binaries as an obfuscation method.
The value is in the ability to explain the code and extract higher-level understanding. Disassembling into raw instructions is the most trivial part of reverse engineering an application. Hence the "and explain what they appear to be doing" bit.
For the pieces I've tested, it often recognises the source language and can give ideas about what the code was for and what it did.
You could do that too, but that is entirely missing the point, which is that ChatGPT is capable of inferring higher-level semantics from the instructions and explaining what the code is doing. You're getting hung up on a minor, unimportant detail.
The point is that ChatGPT understands the code well enough to explain what it does, and so there's reason to wonder how soon someone leverages that in a disassembler to produce far better output, to the point where releasing "only" the binary as an obfuscation mechanism stops being viable.
E.g. additional value would be comments about purpose, labels for function entry points that make sense for what the function does, labels for data that make sense for what is stored there, and comments explaining why things are structured the way they are.
Having reverse engineered programs from binaries in the past, I can say that inferring good names and commenting the sources is the vast majority of the effort.
> E.g. additional value would be comments about purpose, labels for function entry points that make sense for what the function does, labels for data that make sense for what is stored there, and comments explaining why things are structured the way they are.
None of this requires giving it a binary. You are asking it to do two tasks, both of which it will do with some level of error. You could disassemble the binary near-perfectly, for free. You have a hammer and everything looks like a nail.
It was an example. I've already pointed out that you've gotten hung up on a minor, unimportant detail of an example. You're right, you can do the raw disassembly of the instructions by different means first. Which is entirely irrelevant to the point made. In other words: you're being incredibly obtuse and arguing against an imaginary point nobody made.
The recommendation was literally to try the prompt. The purpose of that recommendation was that if you tried it, you might see some of the additional commentary ChatGPT adds, which is where the value was. I could have suggested you disassemble it first, but it seemed easier to cut and paste the hex values. That is all.
Do you have anything you want to discuss about the actual point? If not, we're done here.
It will struggle with understanding small parts of big programs without seeing the full context, though maybe you could get around that by making it generate some sort of summary for itself, or something like that.
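A rough sketch of that summary idea: split a long disassembly listing into chunks that fit in the context window, then carry a running summary from chunk to chunk. Here `ask_llm` is a hypothetical stand-in for an actual model call, not a real API:

```python
def chunk_lines(listing: str, max_lines: int = 200):
    """Yield the listing in consecutive slices of at most max_lines lines."""
    lines = listing.splitlines()
    for i in range(0, len(lines), max_lines):
        yield "\n".join(lines[i:i + max_lines])

def analyze(listing: str, ask_llm) -> str:
    """Feed each chunk to the model along with the summary accumulated so far.

    ask_llm is a hypothetical callable (prompt -> reply); the model's reply
    for each chunk becomes the summary prepended to the next chunk's prompt.
    """
    summary = ""
    for chunk in chunk_lines(listing):
        prompt = (
            "Summary of the program analyzed so far:\n" + summary +
            "\n\nExplain this disassembly and produce an updated summary:\n" + chunk
        )
        summary = ask_llm(prompt)
    return summary
```

This is just the "fake memory" approach from the comment above; whether the model's own summaries preserve enough detail across many chunks is exactly the open question.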
My tests on small parts of big programs suggest, if anything, that it does far better than I expected, but you're probably right that it would struggle with many things if you tried turning it into a bigger tool, and having it generate summaries is probably essential. While we can "fake" some level of memory that way, I really would like to see how far LLMs can go if we give them a more flexible form of memory...