
It is very common for malware to contain JavaScript payloads that try to obfuscate themselves like this:

Seemingly_random_code(seemingly_random_string)

The seemingly_random_code decompresses/decodes whatever is in the seemingly_random_string and hands control over to it. Interestingly, the decoded code is another version of the same thing, with different code and a different string. This goes on for ~100 layers; then at the end it just downloads and executes some file from the net.
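For anyone who hasn't seen one of these, here is a minimal, benign sketch of the shape (atob/btoa stand in for whatever custom decoder a real sample ships; the innermost payload just prints a message):

    // Each layer is nothing but eval(decode(<next layer>)), repeated N times.
    // A real sample's decoder typically decompresses too, so ~100 layers stay
    // small; plain base64 grows ~4/3 per layer, hence the low count here.
    function wrap(code, layers) {
      for (let i = 0; i < layers; i++) {
        code = `eval(atob(${JSON.stringify(btoa(code))}))`;
      }
      return code;
    }

    const sample = wrap('console.log("innermost layer reached")', 5);
    eval(sample); // peels all 5 layers before the inner line finally runs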



It’s amazing how much we haven’t moved on since iloveyou.txt.vbs


> This goes on for ~100 layers; then at the end it just downloads and executes some file from the net.

I understand doing one layer. I guess I could maybe see two layers. But why would it bother with 100 layers? Either the antivirus or reverse-engineering tool can grab the final product, or it can't.


Typically, scanning tools have some limit on how deeply they probe complex formats, to avoid stalling the entire system while they're scanning. It's entirely conceivable that a scanning tool will try to resolve code like this for 10 layers and then, if the result is not found to be malicious, consider it safe.
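Roughly this kind of loop, as a sketch (the helper functions and the 10-layer budget are invented for illustration, not taken from any real engine):

    const MAX_DEPTH = 10; // illustrative budget only

    // Hypothetical helpers, just enough for the sketch to run:
    const looksMalicious = (code) => /ActiveXObject|WScript\.Shell/.test(code);
    const tryUnwrapOneLayer = (code) => {
      const m = code.match(/^eval\(atob\((".*")\)\)$/);
      return m ? atob(JSON.parse(m[1])) : null; // decode one layer, or give up
    };

    function scan(code) {
      for (let depth = 0; depth < MAX_DEPTH; depth++) {
        if (looksMalicious(code)) return "malicious";
        const inner = tryUnwrapOneLayer(code);
        if (inner === null) return "clean"; // nothing left to unwrap
        code = inner;
      }
      // Budget exhausted without a match: treated as safe, which is exactly
      // how a 100-layer sample slips past a 10-layer scanner.
      return "presumed clean";
    }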

This is similar to how compilers will often have recursion limits for things like generics, though in that case it's easier to reject the program if the recursion limit is reached.


Because of potential false positives, and the speed at which files need to be analyzed at runtime (suspend the process executing the file, then analyze it), files that take a long time to unpack and identify can end up being allowed to run. They get offloaded to a sandbox or other systems for analysis while the file is already executing, but the sandboxes are too slow to return a verdict before the file's main logic has run. If those dynamic systems cannot identify a file, an engineer has to look at it manually.
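To make the race concrete, here's a rough sketch of that decision flow (all names and the time budget are invented, not taken from any real product):

    // Hypothetical stubs so the sketch runs on its own:
    const quickStaticScan = async (file) => "unknown";      // fast local check
    const submitToSandbox = async (file) => "malicious";    // slow, verdict arrives late
    const allow = (file) => console.log("allowed:", file);
    const block = (file) => console.log("blocked:", file);
    const remediate = (file) => console.log("alert + clean up:", file);

    async function onExecution(file) {
      // The fast check only gets a tiny time budget before a decision is due.
      const quick = await Promise.race([
        quickStaticScan(file),
        new Promise((resolve) => setTimeout(() => resolve("timeout"), 50)),
      ]);

      if (quick === "malicious") return block(file);
      if (quick === "clean") return allow(file);

      // Unknown or out of budget: let it run now, ask the sandbox in parallel.
      allow(file);
      submitToSandbox(file).then((verdict) => {
        // By the time this arrives, the file's main logic has already run.
        if (verdict === "malicious") remediate(file);
      });
    }

    onExecution("invoice.js");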

In very strict environments or on certain systems it might be practical to block all unknown files, but this is uncommon on user systems, for example where users actively work with JavaScript or macro documents (developers, HR, finance, etc.). The false-positive rates are too high and productivity can take a big hit. If all users do 20% less work, that's a big loss in revenue (and the productivity hit can be much more severe). Depending on the rest of the security posture and measures, that impact may end up costing more than a piece of malware being executed.

Technically it's possible to identify (nearly?) all malware by tracking p-states, symbolic execution, very clever sandboxing, etc., but this simply takes much too long, especially if the malware authors are aware of sandboxing techniques, symbolic execution, and the like, since they can make those processes take even longer or sometimes evade them entirely with further techniques.

I wish it _were_ possible to apply all of the cleverness that malware researchers have invented to detect things, but unfortunately, in practice this cannot happen in 90+% of environments.

If you run something like a DNS server, it's possible to do it, since such a system would not be expected (ever?) to have unknown files (you have to test each update and analyze new versions before deploying to prod). As you can imagine, this is also kind of a bummer of a process, but IMHO for such 'static' systems it's worth it.


With enough conditional eval() calls on dynamic inputs you can make the search space unsearchably big.


The search space is linear as the algorithm is linear.


This stuff is mostly done to make static analysis harder.
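One common way around that on the analysis side is to let the sample unwrap itself under instrumentation instead of reading it statically. A sketch (not any specific tool): since every layer ends up calling eval(), swap the global eval for a wrapper that records each decoded layer.

    const layers = [];
    const realEval = globalThis.eval;
    globalThis.eval = (code) => {
      layers.push(code);     // capture each decoded layer as it appears
      return realEval(code); // keep peeling; drop this line to stop at one layer
    };

    // A two-layer toy blob with the same shape as the payloads described upthread:
    const inner = 'eval(atob("Y29uc29sZS5sb2coImRvbmUiKQ=="))'; // decodes to console.log("done")
    const blob = `eval(atob(${JSON.stringify(btoa(inner))}))`;
    realEval(blob);
    console.log(layers);     // every intermediate layer, fully decoded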



