Very interesting as usual from Fabrice Bellard, but I'm a little disappointed this time, because libnc is a closed-source DLL. Nevertheless it will be interesting to compare it to the amazing work of Georgi Gerganov: the GGML tensor library. Both are heavily optimized, support AVX intrinsics, and are plain C/C++ implementations without dependencies.
The comparison table on the ts_server site looks awesome, though. I wish we could generate one for llama.cpp; unfortunately I'm too busy with other things at the moment.
ChatGPT is pretty good at disassembling x86, and is able to give reasonable descriptions of what the code is doing (e.g. try "disassemble the following bytes in hex and explain what they appear to be doing: [bytes from a binary in hex]").
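For reference, the kind of input that prompt expects is easy to produce yourself. A minimal sketch in Python (`hex_for_prompt` is just an illustrative helper, not part of any tool mentioned here; the sample bytes are a trivial x86-64 function):

```python
def hex_for_prompt(code: bytes, width: int = 16) -> str:
    """Format raw machine-code bytes as space-separated hex, width bytes per line."""
    lines = []
    for i in range(0, len(code), width):
        lines.append(" ".join(f"{b:02x}" for b in code[i:i + width]))
    return "\n".join(lines)

# x86-64 encoding of: push rbp; mov rbp, rsp; mov eax, 42; pop rbp; ret
sample = bytes([0x55, 0x48, 0x89, 0xE5, 0xB8, 0x2A, 0x00, 0x00, 0x00, 0x5D, 0xC3])
print(hex_for_prompt(sample))
# → 55 48 89 e5 b8 2a 00 00 00 5d c3
```

Paste the resulting hex into the prompt above and the model should identify the instructions and describe the function's behavior.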
I'm curious how soon someone will use these models to effectively ruin the viability of releasing binaries as an obfuscation method.
The value is in the ability to explain the code and extract higher-level understanding. Disassembling into raw instructions is the most trivial part of reverse engineering an application. Hence the "and explain what they appear to be doing" bit.
For the pieces I've tested, it often recognises the source language and can give ideas about what the code was for and what it did.
You could do that too, but that is entirely missing the point, which is that ChatGPT is capable of inferring higher-level semantics from the instructions and explaining what the code is doing. You're getting hung up on a minor, unimportant detail.
The point is that ChatGPT understands the code well enough to explain what it does, and so there's reason to wonder how soon someone leverages that in a disassembler to produce far better output, to the point where releasing "only" the binary as an obfuscation mechanism stops being viable.
E.g. additional value would be comments about purpose, labels for function entry points that make sense for what the function does, labels for data that make sense for what is stored there, and comments explaining why things are structured the way they are.
Having reverse engineered programs from binaries in the past, I can say that inferring good names and commenting the sources is the vast majority of the effort.
> E.g. additional value would be comments about purpose, labels for function entry points that make sense for what the function does, labels for data that make sense for what is stored there, and comments explaining why things are structured the way they are.
None of this requires giving it a binary. You are asking it to do two tasks, both of which it will do with some level of error. You could disassemble the binary near-perfectly, for free. You have a hammer and everything looks like a nail.
It was an example. I've already pointed out that you've gotten hung up on a minor, unimportant detail of an example. You're right, you can do the raw disassembly of the instructions by different means first. Which is entirely irrelevant to the point made. In other words: you're being incredibly obtuse and arguing against an imaginary point nobody made.
The recommendation was literally to try the prompt. The purpose of that recommendation was that if you tried it, you might see some of the additional commentary ChatGPT adds, which is where the value was. I could have suggested you disassemble it first, but it seemed easier to cut and paste the hex values. That is all.
Do you have anything you want to discuss about the actual point? If not, we're done here.
It will struggle with understanding small parts of big programs without seeing the full context, though maybe you could get around that by making it generate some sort of summary for itself, or something like that.
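A rough sketch of that summary idea: split a long disassembly listing into chunks that fit in the context window, then carry a running summary from chunk to chunk. Here `ask_llm` is a hypothetical stand-in for an actual model call, not a real API:

```python
def chunk_lines(listing: str, max_lines: int = 200):
    """Yield the listing in consecutive slices of at most max_lines lines."""
    lines = listing.splitlines()
    for i in range(0, len(lines), max_lines):
        yield "\n".join(lines[i:i + max_lines])

def analyze(listing: str, ask_llm) -> str:
    """Feed each chunk to the model along with the summary accumulated so far.

    ask_llm is a hypothetical callable (prompt -> reply); the model's reply
    for each chunk becomes the summary prepended to the next chunk's prompt.
    """
    summary = ""
    for chunk in chunk_lines(listing):
        prompt = (
            "Summary of the program analyzed so far:\n" + summary +
            "\n\nExplain this disassembly and produce an updated summary:\n" + chunk
        )
        summary = ask_llm(prompt)
    return summary
```

This is just the "fake memory" approach from the comment above; whether the model's own summaries preserve enough detail across many chunks is exactly the open question.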
My tests on small parts of big programs suggest, if anything, that it does far better than I expected, but you're probably right that it would struggle with many things if you tried turning it into a bigger tool, and having it generate summaries is probably essential. While we can "fake" some level of memory that way, I really would like to see how far LLMs can go if we give them a more flexible form of memory...