GPT 3.5 can write most of the infrastructure and "scaffolding" for a full ransomware campaign, but it has absolutely no idea how to perform even the most basic cryptographic operations, even when explicitly instructed on which library and method to use - it just confidently spits out absolute bullshit that only vaguely resembles what you asked for. It's like asking a nine year old. It also struggles to write any kind of obfuscation method beyond base64, string splitting, and XORing - I've asked it dozens of times and it has never come close to a trivial implementation that doesn't use those, even when directly told to do exactly that.
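
For a sense of the bar being missed, here's a minimal sketch of the kind of "basic cryptographic operation" I mean, using Python's cryptography package (my choice of library for illustration - the actual tests used various libraries):

    from cryptography.fernet import Fernet

    # Generate a symmetric key and encrypt/decrypt a payload.
    # Fernet wraps AES-CBC plus an HMAC, so this is about as simple
    # as authenticated symmetric encryption gets in Python.
    key = Fernet.generate_key()
    f = Fernet(key)

    token = f.encrypt(b"some payload bytes")
    plaintext = f.decrypt(token)
    assert plaintext == b"some payload bytes"

That's the whole task. Given the library and method names up front, GPT 3.5 still routinely mangles calls at this level.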

Haven't played with GPT-4 yet. Need to try that, as well as larger LLaMA models on a rented cloud GPU box, sometime. I have a full battery of tests covering writing malware, identifying vulnerabilities a la static analysis, fixing those vulnerabilities, and exploiting those vulnerabilities, in a variety of languages, plus a few generic / assorted technical tasks.

Some other things GPT 3.5 sucks at, in addition to implementing cryptography and obfuscating code:

- Writing ASCII art

- Writing HTML/CSS from any kind of graphical instructions, even very simple ones like "draw a car using HTML5 and CSS" or "draw the Facebook logo in HTML and CSS"

- Incomplete solutions. Example: when asked to find all of the vulnerabilities in a block of code that contains three or four, it'll confidently list one and say that's the only vulnerability. If you argue with it and insist that there's more, it'll find another, apologize for missing it the first time, and then insist it has now found all of them. Ask again and it'll say "nope, there are no more vulnerabilities in this code".

- False negatives until told explicitly. Example: show it a code block containing a lower-level or more exotic vulnerability (e.g. TOCTOU) than your ordinary SQL injection or XSS, ask it if there are any vulnerabilities, and it'll confidently say there are none, over and over. Then ask if it's vulnerable to a TOCTOU attack and it finally gets it: oh yeah, variable X retrieves this value for this comparison on line Y but retrieves the value again when it passes it to this other function on line Z, and if the value changes during that time, it could pass the bounds check on line Y but be invalid when used on line Z (a minimal sketch of that pattern follows this list). Which is great, right up until you realize you basically have to ask it over and over again for every specific type of vulnerability, and even then it'll still miss some altogether.
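
For reference, here's a minimal Python sketch of that double-fetch TOCTOU pattern (function names and the size limit are hypothetical, just to mirror the line-Y / line-Z description above):

    import os

    MAX_SIZE = 1024 * 1024  # hypothetical bound, for the sketch only

    def copy_if_small(path, dest):
        # "Line Y": the file size is fetched once and bounds-checked here...
        if os.stat(path).st_size > MAX_SIZE:
            raise ValueError("file too large")
        # ..."line Z": the file is fetched again here. If it grows between
        # the stat() and the open()/read(), the earlier bounds check no
        # longer holds - a classic time-of-check / time-of-use race.
        with open(path, "rb") as src, open(dest, "wb") as dst:
            dst.write(src.read())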

At the level of work expected at big tech companies, I can see GPT 3.5 augmenting or supplementing some outsourced junior consultants, but it's not even adequate to replace them outright, to say nothing of seniors, principals, and true domain experts - at least in security.



