Deterministic Replay of QEMU Emulation

jester1337 · 2024-08-29T07:13:10 1724915590

I remember we were working on this exact topic at my University chair ~8-10 years ago. I think it never fully took off. Several Master students worked on it for a while. I like that it's now in QEMU!

waschl · 2024-08-29T07:45:45 1724917545

I tried to use it for debugging my own OS, but couldn't get it running even after several days of trial and error…

https://github.com/jbreu/jos?tab=readme-ov-file#reverse-debu...

bbarnett · 2024-08-29T08:30:20 1724920220

QEMU is great, and the devs and everyone who worked on her deserve applause, except for the documentation. It's like someone creating a huge Japanese titanium indestructible fighting robot, but then using aluminum in the feet/heels.

So much of my QEMU work is spent randomly changing options with no change documentation, discovering features with no documentation, finding options with no reason or indication why, manpages out of date, READMEs not updated, changelog not there, etc.

bonzini · 2024-08-29T10:39:19 1724927959

Documentation is not great I admit. The problem is that we don't have anyone who is a capable tech writer in the team. It's not something that you can improvise.

However, all incompatible changes are documented and also announced at least 8 months in advance.

https://www.qemu.org/docs/master/about/removed-features.html

https://www.qemu.org/docs/master/about/deprecated.html

It may seem like there are many, but in practice they are in very old, mostly unused, or very badly designed corners. For example, configuration of audio was overhauled last year, and is now the same as basically all other backends (e.g. -audio pa,model=sb16; compare with -nic user,model=e1000 for a network card).
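
A minimal sketch of that parallel shape, in Python for illustration only. The helper name `backend_option` is hypothetical and not part of QEMU; the two option strings it produces are just the ones quoted above.

```python
def backend_option(flag, backend, **props):
    """Build a ['-flag', 'backend,key=value,...'] argument pair.

    Hypothetical helper for illustration; not a QEMU API.
    """
    value = ",".join([backend] + [f"{key}={val}" for key, val in props.items()])
    return [f"-{flag}", value]


# Both options follow the same "backend,property=value" pattern.
audio = backend_option("audio", "pa", model="sb16")
nic = backend_option("nic", "user", model="e1000")

print(" ".join(audio))  # -audio pa,model=sb16
print(" ".join(nic))    # -nic user,model=e1000
```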

justinclift · 2024-08-29T11:16:39 1724930199

As a general thought, would it be possible to put out a "call for tech writers" post or similar on the front of the qemu website, or even a prominent blog post in the blog section?

bonzini · 2024-08-29T12:17:41 1724933861

Yes, I guess it would be an idea. We could also participate in Season of Docs.

dev-n · 2024-08-29T15:04:04 1724943844

bonzini I'm a QEMU fan ... and a techwriter. Is there a way to send you an email? There's no real contact option on qemu, other than IRC.

guerby · 2024-08-30T10:30:41 1725013841

Paolo Bonzini's email address is available here:

https://github.com/qemu/qemu/blob/master/MAINTAINERS#L134

justinclift · 2024-08-30T11:08:22 1725016102

Stable url: https://github.com/qemu/qemu/blob/cec99171931ea79215c79661d3...

bonzini · 2024-08-29T17:38:59 1724953139

pbonzini@redhat.com :) thanks very much!

justinclift · 2024-08-29T12:35:48 1724934948

Sounds like it'd be a useful avenue for closing a major (non-code) problem with the software. :)

cedws · 2024-08-29T08:45:41 1724921141

Agreed, the CLI in particular is a complete mess.

bonzini · 2024-08-29T10:45:56 1724928356

If there are specific parts you'd want better documentation for, please let me know here.

Generally we've been moving the command line towards a scheme where each option describes an aspect of either the guest (a device, the board type, the CPU model) or the interface to the host (a file holding the contents of the disk, the network bridge to attach to, how to show graphic contents), with some options providing both as a shortcut (for example -nic, -audio, -serial).

junon · 2024-08-29T08:07:31 1724918851

Yeah QEMU's story for this sort of thing is pretty rough around the edges for OS dev. Wishing there was something like Unicorn-but-with-devices for making osdev tooling.

SoothingSorbet · 2024-08-29T12:45:04 1724935504

There's also panda[1], but I never got it working myself. I share your frustration, as it would help greatly with debugging, especially with nondeterministic bugs. I likewise never got QEMU's record/replay to work.

[1] https://github.com/panda-re/panda

majke · 2024-08-29T08:33:36 1724920416

This is a big deal. With some tooling around it, it can be amazing.

I can think of using this for testing, and as a vehicle to change a programming paradigm of existing/legacy software (run a thing, and roll it back aggressively from outside of a vm)

m000 · 2024-08-29T14:39:32 1724942372

Indeed, the tooling is the problem. And I wouldn't hold my breath to see this tooling being implemented, as the feature has been around for quite a bit.

IMHO, PANDA [1] remains a better/more practical choice for whole-system record/replay analysis. It already offers quite a bit of tooling (including a python interface), as well as hooks to build your own. It does have its own shortcomings (speed and not being in-sync with the latest QEMU), but at least you're not limited to gdb-based debugging.

[1] https://panda.re/

darby_nine · 2024-08-29T15:57:44 1724947064

This is the central premise of Antithesis: https://antithesis.com (no affiliation)

repelsteeltje · 2024-08-29T07:27:42 1724916462

I think this is awesome.

While it might seem like a small feature, it opens a huge door. It's similar to what reproducible build infrastructure has done for finding bugs, attestation that binary matches source, immutability, etc.

Can imagine this is useful for finding bugs in hardware designs too.

justinclift · 2024-08-29T09:19:31 1724923171

Anyone have clear ideas/guidelines for how much ram/disk/etc this is likely to need for a "reasonable" capture?

Say capturing a Qt application as it corrupts its internal state during startup, in order to work out what's corrupting its internal state?

jraph · 2024-08-29T09:29:14 1724923754

I don't know, however a key element is:

> Record/replay system is based on saving and replaying non-deterministic events

> The following non-deterministic data from peripheral devices is saved into the log: mouse and keyboard input, network packets, audio controller input, serial port input, and hardware clocks (they are non-deterministic too, because their values are taken from the host machine). Inputs from simulated hardware, memory of VM, software interrupts, and execution of instructions are not saved into the log, because they are deterministic and can be replayed by simulating the behavior of virtual machine starting from initial state.

So, it's probably not much, you can probably comfortably save minutes of qemu sessions.
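
To make the quoted idea concrete, here is a toy sketch in Python. It is not how QEMU implements record/replay: `ToyMachine`, `record`, and `replay` are made-up names, and the "device input" is just a random byte. The point is only that the log holds the rare external inputs, each tagged with an instruction count, while everything else is recomputed from the initial state.

```python
import random


class ToyMachine:
    """Deterministic machine: state depends only on prior state and inputs."""

    def __init__(self):
        self.state = 0
        self.icount = 0  # number of "instructions" executed so far

    def step(self, external_input=None):
        # The deterministic part: never logged, always recomputed.
        self.state = (self.state * 31 + 7) % 1_000_003
        # The non-deterministic part: an input arriving from outside.
        if external_input is not None:
            self.state ^= external_input
        self.icount += 1


def record(steps, seed=None):
    """Run the machine, logging only the non-deterministic inputs."""
    rng = random.Random(seed)
    machine = ToyMachine()
    log = []  # list of (instruction count, input value)
    for _ in range(steps):
        event = None
        if rng.random() < 0.01:  # a rare "device input", e.g. a keypress
            event = rng.randrange(256)
            log.append((machine.icount, event))
        machine.step(event)
    return machine.state, log


def replay(steps, log):
    """Re-run from the initial state, injecting logged inputs at the same points."""
    machine = ToyMachine()
    events = dict(log)
    for _ in range(steps):
        machine.step(events.get(machine.icount))
    return machine.state


final_state, log = record(10_000, seed=1)
assert replay(10_000, log) == final_state
print(f"{len(log)} logged events for 10000 steps")
```

The replay reaches the same final state from the log alone, and the log only grows with the number of external events, not with the number of instructions executed.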

Also note the existence of the rr debugger [1], which allows you to reverse debug applications with a ~10% performance hit while recording. To achieve this, it records results of syscalls (only). It will serialize thread events, so it has the effect of running applications as if on a single-core CPU.

[1] https://rr-project.org/

londons_explore · 2024-08-29T09:28:25 1724923705

I'm gonna guess not that much - easily doable on a typical desktop.

If it were a problem, you could skip recording your emulated machine's boot process, and simply take a snapshot when you're about to start your Qt application. That snapshot probably only takes about 10% extra RAM because most of the RAM contents won't change between the snapshot and the live system.

moondev · 2024-08-29T08:16:12 1724919372

Such a casual and low-key introduction of what sounds like an incredible new capability.

1. Would something like this replace packer for creating machine images?

2. Curious how quickly the replay log grows and how it compares to a CoW snapshot.

3. It will be interesting to see what the log looks like and what doors could open up by creating or generating it by other means.

owyn · 2024-08-29T13:27:18 1724938038

It is very cool, but I think some version of this feature has been around for years? This commit is from 7 years ago, and it looks like the code originates back to 2010.

https://github.com/qemu/qemu/blob/v2.9.0/docs/replay.txt

That said, I was not aware of it until I saw this post, and I definitely want to play around with it.

Intralexical · 2024-08-29T18:33:53 1724956433

Well, I guess it was probably new when the doc page was first written to introduce it.

> That said, I was not aware of it until I saw this post, and I definitely want to play around with it.

You could almost say it was too casual and low-key. ;)

vessenes · 2024-08-29T10:01:13 1724925673

Seems to me like one of the highest and best uses of this right now would be adding verifiable builds to … literally anything. You no longer need a verifiable-build-capable compiler or language — you can just run the compile and packaging step through a deterministic QEMU session.

Does this sound right? I’m trying to figure out where uncontrollable randomness would come in during a compile phase, and coming up blank.

0points · 2024-08-29T10:09:09 1724926149

Historically, major causes of non-determinism were embedded timestamps and unsorted file listings created by the build tools.

I have not followed the progress recently, but https://reproducible-builds.org/ is a starting point if you are interested.

There is a sane path forward for reproducibility on bare metal, no custom emulation is needed.
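
As a rough illustration of handling those two causes directly in the build tool, here is a Python sketch that packs files into a tar archive with a sorted listing and a pinned timestamp. The function name `deterministic_tar` and the sample file names are made up for this example, and real build pipelines have more sources of variation than this covers.

```python
import hashlib
import io
import tarfile


def deterministic_tar(files, mtime=0):
    """Pack {archive path: bytes} into an uncompressed tar, reproducibly.

    Hypothetical helper: sorts entries and pins all per-file metadata.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed order instead of listing order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime      # fixed timestamp instead of build time
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Same contents, different listing order: the archives are byte-identical.
first = deterministic_tar({"b.txt": b"two", "a.txt": b"one"})
second = deterministic_tar({"a.txt": b"one", "b.txt": b"two"})
assert hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest()
```

The `mtime` argument could be taken from the SOURCE_DATE_EPOCH environment variable, which is the convention reproducible-builds.org documents for pinning build timestamps.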

vessenes · 2024-08-29T10:26:38 1724927198

Thanks for the link, I’m aware of reproducible-builds.org.

Both your causes seem trivially fixable here - the QEMU builds could have a standard system clock time they start with, and an ‘unsorted’ file listing made in a deterministic OS environment will keep the same file order, no?

By comparison the rb.org site says you need to start with stripping all that stuff out of your build process, for the reasons you refer to.

repelsteeltje · 2024-08-29T12:16:23 1724933783

> Both your causes seem trivially fixable here []

You'd be amazed at the amount of indeterminism lurking in the guts of dependencies, all the way down into libc and the OS... like locale, fs

commercialnix · 2024-08-29T18:20:42 1724955642

I like your thinking. Deterministic replay with QEMU is supplemental to the larger goal of reproducible builds. The communities concerned with the topic of reproducible software not only expect cohesive human-readable code that runs deterministically to produce binary reproducible results, but their originally stated goals require it.

Deterministic replay with QEMU is a "power tool" in the larger picture of these efforts.

Intralexical · 2024-08-29T19:03:12 1724958192

Sounds to me like that wouldn't be quite as good as true reproducible builds (which can run on anybody's computer) because auditing the entire emulated hardware starting state and events log is a harder problem than auditing only your code. Including a virtual machine image for your build effectively makes the VM part of your codebase in terms of users needing to trust it, so verifying the build result means not just engineering but also forensics.

So it'd be good for cases where you otherwise wouldn't be able to provide any verifiability. But for software, it's still not as good as eliminating non-determinism completely.

vessenes · 2024-08-31T12:54:33 1725108873

This is a good point: it does add a lot of weight to a build process. I might try it though. If the ‘weight’ is a lightweight Linux build environment referenced by ISO hash, it might not be that bad (TM) in practice. And, generally, trading bandwidth (cheap) for human refactoring (crazy expensive) is a trade that’s often nice to make.

Intralexical · 2024-09-03T00:27:21 1725323241

It's not so much the weight of running the build I'd be concerned about, but the weight of needing to audit the whole VM to trust the build.

E.g., there could be malicious code hidden in the free RAM.

rrdharan · 2024-08-29T10:01:21 1724925681

VMware had this many years ago; it was very cool:

http://stackframe.blogspot.com/2007/10/configuring-applicati...

http://www.replaydebugging.com/2008/08/vmware-workstation-65...

justinclift · 2024-08-29T11:13:18 1724929998

Any idea if that works in modern VMware Workstation? It's currently on version 17, whereas that post was for version 6.5.

VMware Workstation has such disjointed development spurts that it wouldn't surprise me if the feature had been ripped out at some point. Other useful features such as machine groups have been. :(

rrdharan · 2024-08-29T12:41:57 1724935317

http://www.replaydebugging.com/2011/09/goodbye-replay-debugg...

m000 · 2024-08-29T14:54:44 1724943284

I guess we can say it was too cool.

justinclift · 2024-08-29T14:01:33 1724940093

Thanks. Yeah, I kind of expected that. :(

roca · 2024-08-29T11:20:30 1724930430

It was removed in 2011.