So how do you detect these attacks?

33a · 2025-09-08T23:54:51 1757375691

We use a mix of static analysis and AI. Flagged packages are escalated to a human review team. If we catch a malicious package, we notify our users, block installation, and report it to the upstream package registries. Suspected malicious packages that have not yet been reviewed by a human are blocked for our users, but we don't try to get them removed until after they have been triaged by a human.
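To make the flow above concrete, here is a minimal sketch of the states a package moves through. It is an illustration only, not Socket's implementation: the `Package` fields, the `Status` names, and the rule that either signal alone triggers escalation are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    CLEAN = "clean"                    # no scanner raised a signal
    SUSPECTED = "suspected"            # flagged, waiting for human review
    MALICIOUS = "malicious"            # confirmed by a human reviewer
    FALSE_POSITIVE = "false_positive"  # cleared by a human reviewer


@dataclass
class Package:
    name: str
    static_flag: bool  # hypothetical: static analysis raised a signal
    ai_flag: bool      # hypothetical: the AI scanner raised a signal
    status: Status = Status.CLEAN


def triage(pkg: Package) -> Package:
    # Assumption: either signal on its own is enough to escalate.
    if pkg.static_flag or pkg.ai_flag:
        pkg.status = Status.SUSPECTED
    return pkg


def human_review(pkg: Package, is_malicious: bool) -> Package:
    # The reviewer's verdict replaces the automated suspicion.
    pkg.status = Status.MALICIOUS if is_malicious else Status.FALSE_POSITIVE
    return pkg


def is_blocked(pkg: Package) -> bool:
    # Suspected packages are blocked for users even before review.
    return pkg.status in (Status.SUSPECTED, Status.MALICIOUS)


def should_report_upstream(pkg: Package) -> bool:
    # Takedown requests only go out after a human has confirmed.
    return pkg.status is Status.MALICIOUS
```

The point of splitting `is_blocked` from `should_report_upstream` is that a false positive from the automated stage costs users a temporary block, but never reaches the registry.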

In this incident, we detected the packages quickly, reported them, and they were taken down shortly after. Given how high-profile the attack was, we also published an analysis soon after, as did others in the ecosystem.

We try to be transparent about how Socket works. We've published the details of our systems in several papers, and I've also given a few talks at various conferences on how our malware scanner works:

* https://arxiv.org/html/2403.12196v2

* https://www.youtube.com/watch?v=cxJPiMwoIyY

Yoric · 2025-09-11T07:26:34 1757575594

So, from what I understand from your paper, you're using ChatGPT with careful prompts?

ATechGuy · 2025-09-09T02:55:59 1757386559

You rely on LLMs riddled with hallucinations for malware detection?

jmb99 · 2025-09-09T03:21:14 1757388074

I'm not exactly pro-AI, but even I can see that their system clearly works well in this case. If you tune the model to favour false positives, with a human review step (that's quick), I can imagine your response time being cut from days to hours (and your customers getting their updates that much faster).

ATechGuy · 2025-09-09T14:59:34 1757429974

You are assuming that they build their own models.

Culonavirus · 2025-09-09T05:25:56 1757395556

He literally said "Flagged packages are escalated to a human review team." in the second sentence. Wtf is the problem here?

ATechGuy · 2025-09-09T14:59:03 1757429943

What about packages that are not "flagged"? There could be hallucinations when deciding to (or not) "flag packages".

orbital-decay · 2025-09-09T15:16:24 1757430984

>What about packages that are not "flagged"?

You can't catch everything with normal static analysis either. The LLM just produces some additional signal in this case, so false negatives can be tolerated.

ATechGuy · 2025-09-09T15:25:33 1757431533

static analysis DOES NOT hallucinate.

Twirrim · 2025-09-09T16:27:47 1757435267

So what? They're not replacing standard tooling like static analysis with it. As they mention, it's being used as additional signal alongside static analysis.

There are cases an LLM may be able to catch that their static analysis can't currently catch. Should they just completely ignore those scenarios, thereby doing the worst thing by their customers, just to stay purist?

What is the worst-case scenario that you're envisioning from an LLM hallucinating in this use case? To me the worst case is that it might incorrectly flag a package as malicious, which, given they do a human review anyway, isn't the end of the world. On the flip side, you've got the LLM catching cases not yet recognised by static analysis, which can then be accounted for in the future.

If they were just using an LLM, I might share similar concerns, but they're not.

tripzilch · 2025-09-11T09:20:26 1757582426

well, you've never had a non-spam email end up in your spam folder? or the other way around?

when static analysis does it, it's called a "misclassification"

wiseowise · 2025-09-09T07:02:50 1757401370

> We use a mix of static analysis and AI. Flagged packages are escalated to a human review team.

“Chat, I have reading comprehension problems. How do I fix it?”

atanasi · 2025-09-10T15:13:57 1757517237

Reading comprehension problems can often be caught with some static analysis combined with AI.

Mawr · 2025-09-09T05:03:19 1757394199

"LLM bad"

Very insightful.

veber-alex · 2025-09-08T21:32:15 1757367135

AI based code review with escalation to a human

Yoric · 2025-09-08T22:17:09 1757369829

I'm curious :)

Does the AI detect the obfuscation?

33a · 2025-09-09T01:27:12 1757381232

It's actually pretty easy to detect that something is obfuscated, but it's harder to prove that the obfuscated code is actually harmful. This is why we still have a team of humans review flagged packages before we try to get them taken down, otherwise you would end up with way too many false positives.
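As a rough illustration of why the "is it obfuscated" half is the easy part, here is a toy heuristic. It is not Socket's detector: the two signals (hex-style `_0x…` identifiers, which tools such as javascript-obfuscator typically emit, and the density of `\xNN` / `\uNNNN` escapes) and both thresholds are assumptions picked for the example.

```python
import re

# Identifiers like _0x4f2a, typical of common JavaScript obfuscators.
HEX_IDENT = re.compile(r"\b_0x[0-9a-fA-F]{3,}\b")
# Rough approximation of "any identifier or keyword".
IDENT = re.compile(r"\b[A-Za-z_$][\w$]*\b")
# Hex and unicode escape sequences inside string literals.
ESCAPE = re.compile(r"\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}")


def obfuscation_signals(source: str) -> dict:
    idents = IDENT.findall(source)
    hex_idents = HEX_IDENT.findall(source)
    escapes = ESCAPE.findall(source)
    return {
        "hex_ident_ratio": len(hex_idents) / max(len(idents), 1),
        "escapes_per_kb": len(escapes) * 1024 / max(len(source), 1),
    }


def looks_obfuscated(source: str,
                     ident_threshold: float = 0.2,     # illustrative value
                     escape_threshold: float = 20.0):  # illustrative value
    s = obfuscation_signals(source)
    return (s["hex_ident_ratio"] >= ident_threshold
            or s["escapes_per_kb"] >= escape_threshold)
```

Plain minified code (short names, no escapes) stays under both thresholds, while typical obfuscator output trips them. Neither outcome says anything about whether the code is harmful, which is the part that still needs a reviewer.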

Yoric · 2025-09-09T05:44:27 1757396667

Yeah, what I meant is that obfuscation is a strong sign that something needs to be flagged for review. Sadly, there's only a thin line between obfuscation and minification, so I was wondering how many false positives you get.

Thanks for the links in your other comment, I'll take a look!

nurettin · 2025-09-15T04:22:25 1757910145

I think that would be static analysis. After processing the source code normally (looking for net & sys calls), you decode base64, concatenate all strings and process again (until decode makes no change)
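A minimal sketch of that loop, under some loud assumptions: it only handles double-quoted literals, the `SUSPICIOUS` word list is made up for the example, and a real scanner would work on a parsed AST rather than regexes.

```python
import base64
import binascii
import re

# Hypothetical list of network / system call names to look for.
SUSPICIOUS = re.compile(r"\b(?:fetch|XMLHttpRequest|child_process|exec|eval)\b")
# "a" + "b" with no escapes inside the literals.
CONCAT = re.compile(r'"([^"\\]*)"\s*\+\s*"([^"\\]*)"')
# A double-quoted literal that looks like base64.
B64 = re.compile(r'"([A-Za-z0-9+/]{16,}={0,2})"')


def fold_concats(source: str) -> str:
    # One pass folds adjacent pairs; the outer loop handles longer chains.
    return CONCAT.sub(lambda m: '"' + m.group(1) + m.group(2) + '"', source)


def decode_base64_literals(source: str) -> str:
    def replace(match):
        try:
            raw = base64.b64decode(match.group(1), validate=True)
            return '"' + raw.decode("utf-8") + '"'
        except (binascii.Error, UnicodeDecodeError):
            return match.group(0)  # not really base64 text, leave as is
    return B64.sub(replace, source)


def normalize(source: str, max_rounds: int = 10) -> str:
    # Repeat until a round makes no change (or the round limit is hit).
    for _ in range(max_rounds):
        rewritten = decode_base64_literals(fold_concats(source))
        if rewritten == source:
            break
        source = rewritten
    return source


def scan(source: str) -> set:
    # Look at the code as written, then again after normalizing it.
    hits = set(SUSPICIOUS.findall(source))
    hits |= set(SUSPICIOUS.findall(normalize(source)))
    return hits
```

For example, a payload that is base64-encoded and split across two concatenated literals shows nothing on the first pass, but the call names surface once the literals are folded and decoded.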

justusthane · 2025-09-08T22:37:16 1757371036

Probably. It’s trivial to plug some obfuscated code into an LLM and ask it what it does.

spartanatreyu · 2025-09-08T23:24:13 1757373853

Yeah, but just imagine how many false positives and false negatives there would be...