Fairly similar to Native Client [1], whose original paper was released the following year (2009): they both rely on x86 segment registers, for example. A core difference is in how they guarantee that the guest instruction stream contains no dangerous instructions such as system calls - which is hard, because x86 instructions are variable-length and unaligned, so you have to avoid the situation where guest code jumps to an address in the middle of some legitimate-looking instruction and the processor interprets the bytes starting there as a different instruction. Direct jumps can be validated ahead of time, but indirect jumps can't - including all function returns. Native Client prevents this by requiring the sandboxed code to be compiled with compiler passes that align all valid targets of indirect branches to a given alignment and insert mask instructions before the indirect branches themselves; it then validates that no instruction stream starting at any aligned offset does anything dangerous. Vx32, on the other hand, wants to be able to run semi-arbitrary existing x86 code, so it has to address this with a layer of indirection. Rather than just validating instructions, it translates each basic block to a modified set of instructions - essentially an x86-to-x86 emulator. Indirect jumps are translated to a hash table lookup (mapping original code addresses to their corresponding translated versions), which achieves safety at the cost of significant slowdown in some cases.
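Roughly, the check behind a translated indirect jump looks like the following toy C sketch - not vx32's actual data structures or generated assembly (translate_block here is hypothetical), just the shape of the idea: map the guest's intended target to the translated copy of that block, translating on a miss, so control can never land in the middle of a host instruction:

    #include <stdint.h>
    #include <stddef.h>

    #define TABLE_SIZE 4096   /* power of two so we can mask instead of mod */

    struct entry {
        uint32_t guest_addr;  /* address the guest thinks it is jumping to */
        void    *host_addr;   /* entry point of the translated basic block */
    };

    static struct entry table[TABLE_SIZE];

    /* Hypothetical slow path: validate and translate the basic block
       starting at guest_addr, returning the translated entry point. */
    void *translate_block(uint32_t guest_addr);

    void *lookup_indirect_target(uint32_t guest_addr)
    {
        size_t i = (guest_addr * 2654435761u) & (TABLE_SIZE - 1); /* cheap hash */
        struct entry *e = &table[i];

        if (e->guest_addr != guest_addr || e->host_addr == NULL) {
            /* Miss (or a collision, which this toy version simply overwrites):
               translate the block and cache the result. */
            e->guest_addr = guest_addr;
            e->host_addr  = translate_block(guest_addr);
        }
        /* The translated code jumps to the returned address instead of
           guest_addr, so it can only ever land on validated code. */
        return e->host_addr;
    }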
This sounds like something the CPU hardware should be handling, as x86 has 4 privilege levels ("ring 0" through "ring 3") while most OSes today only use 0 and 3. Ring 3 could become "really untrusted" while what used to run in ring 3 moves to ring 2.
There are some problems with that approach, which have arisen from the long disuse of the other privilege levels:
1. The fast methods for system calls (syscall/sysret and sysenter/sysexit) completely ignore these privilege levels and can only perform transitions between rings 0 and 3. That means you have to use interrupts, which are slow - and 0/1 or 0/2 transitions may be even slower than 0/3 interrupt transitions because processors aren't optimized for them.
2. You can't make much use of them for x86_64 programs, since long mode disables segment-based protection and the x86_64 page tables (you guessed it) only have a single user/supervisor bit to select the privilege level of a page. Somebody who remembers the Intel manuals better can hopefully tell us whether you can use them in x86 compatibility mode under a 64-bit kernel, but I'm going to guess you'll hit some wrinkles there.
I would be very surprised if these two issues don't kill any performance gains you would get from avoiding the recompilation step.
I agree with what you seem to be getting at, that Native Client and VX32 are essentially a hack: but to do it right, you don't actually need hardware support, only kernel support. After all, user processes - on all common architectures, not just x86 - are already fully isolated from the rest of the system; their only methods of communication are system calls and other triggerable exceptions (e.g. segfault), and the kernel controls the handlers for all exceptions. In theory, all you need to run untrusted code safely, even portably (across OSes if not CPU architectures), is a kernel API to run a process without direct access to the kernel's syscall layer - e.g. the kernel could forward all attempts to invoke syscalls to a configurable handler. In fact, this sort of exists already in the form of seccomp on Linux.
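As a concrete (Linux-specific) illustration of the "forward syscalls to a handler" idea, here's a minimal seccomp-bpf sketch that turns every syscall outside a tiny whitelist into a SIGSYS delivered back to the process. A real sandbox supervisor would do something useful in the handler and would, unlike this sketch, also check the architecture field and handle errors:

    #define _GNU_SOURCE
    #include <linux/filter.h>
    #include <linux/seccomp.h>
    #include <signal.h>
    #include <stddef.h>
    #include <sys/prctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static void sigsys_handler(int sig, siginfo_t *info, void *ctx)
    {
        (void)sig; (void)ctx; (void)info;    /* info->si_syscall has the number */
        write(2, "trapped a syscall\n", 18); /* write() is allowed by the filter */
    }

    int main(void)
    {
        struct sigaction sa = { .sa_sigaction = sigsys_handler,
                                .sa_flags = SA_SIGINFO };
        sigaction(SIGSYS, &sa, NULL);

        /* Allow write, exit_group and rt_sigreturn (so the handler can return);
           everything else traps to SIGSYS. */
        struct sock_filter filter[] = {
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, 3, 0),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_exit_group, 2, 0),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_rt_sigreturn, 1, 0),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
            .len = sizeof(filter) / sizeof(filter[0]),
            .filter = filter,
        };

        prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
        prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);

        getpid();                        /* trapped: SIGSYS handler runs */
        write(1, "still running\n", 14); /* allowed */
        return 0;
    }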
One drawback to a fully hardware-based approach is that you can only trap instructions the hardware lets you trap. On x86, for example, CPUID is not in that category (for normal ring 3; see below about VMX), so you can't prevent the untrusted code from learning what kind of CPU it's running on. Nor is CLFLUSH, an instruction to flush a cache line out to RAM, which is not supposed to be dangerous - but turned out to make it a lot easier to exploit the rowhammer bug on vulnerable systems. Native Client originally allowed CLFLUSH, but was updated to block it once the vulnerability was revealed. (That said, CLFLUSH is/was not strictly necessary to exploit rowhammer - someone even wrote an exploit that worked from a JavaScript VM - and the only way to fully prevent it is to increase the RAM refresh rate.)
By the way, there is also VMX, hardware virtualization, which both Linux and macOS (but not Windows AFAIK) allow unprivileged processes to use at will. While it's traditionally used to run full operating systems - which in theory should be safe too, but requires exposing a relatively large amount of hardware surface area to the guest - there's nothing preventing you from having your own mini kernel and running untrusted code in ring 3 inside the VM. This provides multiple advantages: VMX allows trapping CPUID, faults from ring 3 can be handled by the mini kernel without a context switch, you get more control over various bits of the execution environment, etc. Too bad it's often disabled entirely, and/or not supported if you happen to already be inside a VM (nested VMX, while possible, requires some software emulation and incurs a speed penalty)...
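For reference, the userspace half of that isn't much code. Here's a heavily abridged sketch of bringing a VM up through /dev/kvm - all error handling omitted, and the "guest" is two hand-assembled real-mode instructions rather than a mini kernel, so treat it as a skeleton only:

    #include <fcntl.h>
    #include <linux/kvm.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
        int vm  = ioctl(kvm, KVM_CREATE_VM, 0);

        /* Give the guest 64 KiB of RAM at guest-physical address 0. */
        void *mem = mmap(NULL, 0x10000, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        struct kvm_userspace_memory_region region = {
            .slot = 0,
            .guest_phys_addr = 0,
            .memory_size = 0x10000,
            .userspace_addr = (unsigned long)mem,
        };
        ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region);

        /* "Guest code": out %al,%dx (0xee); hlt (0xf4) - enough to show exits. */
        const unsigned char code[] = { 0xee, 0xf4 };
        memcpy(mem, code, sizeof code);

        int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
        int run_size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0);
        struct kvm_run *run = mmap(NULL, run_size, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, vcpu, 0);

        /* Start in 16-bit real mode at address 0; a real mini kernel would set
           up protected/long mode, page tables, and a ring-3 context here. */
        struct kvm_sregs sregs;
        ioctl(vcpu, KVM_GET_SREGS, &sregs);
        sregs.cs.base = 0;
        sregs.cs.selector = 0;
        ioctl(vcpu, KVM_SET_SREGS, &sregs);

        struct kvm_regs regs = { .rip = 0, .rflags = 2 };
        ioctl(vcpu, KVM_SET_REGS, &regs);

        for (;;) {
            ioctl(vcpu, KVM_RUN, 0);
            switch (run->exit_reason) {
            case KVM_EXIT_IO:   /* guest touched an I/O port: our "hypercall";
                                   run->io.port, run->io.size and the data at
                                   (char *)run + run->io.data_offset describe it */
                break;
            case KVM_EXIT_HLT:  /* guest executed hlt: we're done */
                return 0;
            default:
                return 1;
            }
        }
    }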
> both Linux and macOS (but not Windows AFAIK) allow unprivileged processes to use at will
Only sort of true on Linux. Most Linux distributions make /dev/kvm non-world-accessible because it's a good source of security issues (see e.g. http://www.ubuntu.com/usn/USN-2417-1/ ); the KVM driver isn't quite hardened against people who are trying to compromise the host kernel rather than actually make a VM. PolicyKit often gives access to the current logged-in desktop user, but that's precisely because those processes aren't quite unprivileged (e.g., processes running as the logged-in desktop user can usually shut down the machine or prevent it from sleeping without authentication).
Which leads to an interesting point: doing this in software, as Vx32 and Native Client do, fails safe. Since it's just regular user code, it can't possibly do things that regular user code can't do, and you can belt-and-suspenders it with an extremely tight seccomp policy on the emulator (as Chrome does). If you do this at the OS level via seccomp directly, and the OS gets it wrong, it fails open (e.g., CVE-2009-0835), but still shouldn't allow execution of non-user-mode code. If you do this at the CPU level via privilege rings, the CPU isn't very likely to get it wrong -- but if it does it fails very open (i.e., into a privileged ring) and is the hardest to patch.
It's absolutely not true that we don't care about KVM host vulnerabilities. KVM survived a good deal of fuzzing with only a handful of trivially fixed NULL-pointer dereference oopses found (including one which turned out to be a bug in a completely different part of the kernel) and no privilege escalations.
Most distros actually make /dev/kvm world-accessible; you are confusing that with virt-manager requiring PolicyKit authentication by default (that's because networking is better integrated if libvirtd runs as root). GNOME Boxes, for example, doesn't require it.
That must be a relatively recent change, then - several years ago I think everyone cared just as much, there were just more vulnerabilities. I'm not insinuating this is because of anyone not caring. :-) It was just a fact that there were a bunch of CVEs.
For instance, Debian stable makes it 664 root:kvm, and a bug I opened a bunch of years ago to change that got wontfixed: https://bugs.debian.org/640328 Is it time to request reconsideration?
It's possible to do that with KVM, running the untrusted code as a user-mode program inside a guest and trapping its system calls into the hypervisor. The cost of a system call would be about 6000 clock cycles.
> Vx32 [...] Rather than just validating instructions, it translates each basic block to a modified set of instructions - essentially an x86-to-x86 emulator.
People don't realize that vx32 lets you implement a scheduler in user space. There's no 1-to-1 mapping between host processes and guest processes (unlike NaCl, for example).
With vx32 you can have many guest "processes" in one host process.
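Sketched out, that looks something like this - the function names below are hypothetical stand-ins, not the real libvx32 API; the point is just that each guest is plain data inside one host process, so one thread can round-robin among them:

    #include <stddef.h>

    struct vx_guest;                          /* one sandboxed "process"       */
    int  vx_run_slice(struct vx_guest *g);    /* run until trap/timeslice end  */
    int  vx_exit_pending(struct vx_guest *g); /* did the guest ask to exit?    */
    void vx_handle_trap(struct vx_guest *g);  /* service its "syscall"         */

    /* Cooperative round-robin scheduler, entirely in user space. */
    void schedule(struct vx_guest *guests[], size_t n)
    {
        size_t live = n;
        while (live > 0) {
            for (size_t i = 0; i < n; i++) {
                if (guests[i] == NULL)
                    continue;
                vx_run_slice(guests[i]);       /* guest runs inside our process */
                if (vx_exit_pending(guests[i])) {
                    guests[i] = NULL;          /* "process" exited              */
                    live--;
                } else {
                    vx_handle_trap(guests[i]); /* emulate whatever it trapped on */
                }
            }
        }
    }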
My friend and I have a crazy idea that in the future, all songs will be binary executable code running in a sandbox similar to Vx32 or NaCl. This would allow you to edit parameters and change the song to fit the rest of your playlist.
The next step is to keep the binary locked away on a server and stream only the resulting audio to the client, and suddenly you have a major piracy disincentive.
Follow your dreams. The harder it is to listen to music, the more likely I am to listen to a file from a kid who held a microphone to a speaker. Or not even bother.
Make the experience easier than file sharing and you win. I don't mind paying for music, but if I'm paying, I don't want any stupid hoops.
That's similar to how things were on the C64. There were no standard formats for music, but there was a de facto "standard" for the drivers that played them: people would write custom driver routines so that, e.g., if you loaded the track at $1000, you'd call $1000 to initialise it, then call $1003 once per vertical blank. Some drivers had parameters that could be adjusted. A lot of the time the driver stayed identical across many songs, but many composers had their own that they kept tuning from track to track.
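In modern terms - and with a made-up emulator API, since on the real machine the player simply jsr'd straight into the loaded driver - the contract looked roughly like this:

    #include <stdint.h>

    /* Hypothetical 6502-emulator helpers (cpu_call and wait_for_vblank are
       made up for illustration, not any real emulator's API). */
    void cpu_call(uint16_t addr);   /* execute the routine at addr until rts */
    void wait_for_vblank(void);

    void play_tune_loaded_at_0x1000(void)
    {
        cpu_call(0x1000);           /* driver's init entry point             */
        for (;;) {
            wait_for_vblank();      /* once per frame, ~50/60 Hz             */
            cpu_call(0x1003);       /* driver's play entry point             */
        }
    }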
> The next step is to keep the binary locked away on a server and stream only the resulting audio to the client, and suddenly you have a major piracy disincentive.
Except we'd just capture the audio. It achieves nothing.
Consider that in the '80s, we'd commonly tape things by connecting a tape recorder to our radio and pressing play+record at just the right time, or even by putting a tape recorder in front of the speakers.
While in essence I think it's an interesting idea, there's nothing that stops it from happening right now, except lack of market demand. It's also going to be moot as audio analysis technology improves (I'm looking at you Deep Learning) -- we'll be able to parse audio tracks into their semantic/perceptual components and thus any form of editing will be easily applied.
Maybe I don't understand. Is this a joke? Is it not already perfectly possible to edit a song and make it fit your playlist? And how would streaming the audio from a binary be a disincentive for piracy, except inasmuch as on-demand streaming is more convenient than piracy?
Sure you can edit a finished, mixed song, but it takes a lot of skill, and realistically no one's going to do it without stems. If a song is an executable, it can expose user-friendly adjustable parameters. So one song could have infinite variations. A pirate could record one of those variations and share it, but that's much less valuable than the executable / stems.
What kind of adjustable parameters? I'll admit I'm intrigued by the prospect of songs that vary slightly on each play-through (e.g. slightly different drum fills, different solos, etc), but I suspect this is distinct from what you're suggesting.
I'm also of the belief that piracy of digital arts is largely a cultural thing & that attempts to prevent it by force will never be more than marginally successful at best. That said, I have zero evidence to back it up. It's something I'd like to investigate, but I don't know how.
Around 10 years ago, Digimpro came out with something similar; they introduced a custom audio format and a standalone player that let you change playback parameters and switch between alternative tracks: http://web.archive.org/web/20051215034341/http://www.digimpr...
Don't know what happened to their technology later on.
What is the difference between this and, say, qemu's user mode emulation? IIRC qemu (for both system and user emulation) uses Tiny Code Generator in a similar manner when not using hardware virtualization.
Is it just a different API geared towards a different purpose, or are there significant differences in the implementation (e.g. a greater focus on security)?
I made an NPAPI plugin wrapper for it back in the day.
Back then I was trying to make a decent-performing system out of the XO-1. Flash performance on the XO-1 was terrible, so sandboxed in-browser native code seemed quite appealing.
This video shows some of the things I was working on at the time. https://youtu.be/58UmxHryq8E?t=157 The time offset jumps to the VX32 part.
That port of Plan 9 is quite old and will not work with the newest sources. It has received many updates since then. I added a good bunch of features as part of a GSoC project.
You give it an internal VLAN and watch for the malware trying to use it (assuming you disabled the physical network card on the system before executing the code).
It's a system set up to act like it has an internet connection, but it doesn't. You use sneakernet to transfer files.
[1] http://static.googleusercontent.com/media/research.google.co...