Bonus points if you can effectively implement the "copy on write" behavior of the Linux kernel, sending over to the remote machine only the pages that are changed in either the local or remote fork, or read in the remote fork.
An rsync-like diff algorithm might also substantially reduce the copied pages if the same or a similar process is teleforked multiple times.
Many processes have a lot of memory which is never read or written, and there's no reason that should be moved, or at least no reason it should be moved quickly.
Using that, you ought to be able to resume the remote fork in milliseconds rather than seconds.
userfaultfd() or mapping everything to files on a FUSE filesystem both look like promising implementation options.
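To make the userfaultfd() option concrete, here's a minimal C sketch (not from the article) of the receiving side: it registers an anonymous region for missing-page faults and services a single fault by copying a page in. fetch_remote_page is a made-up placeholder for the network fetch a real telefork would do, and a real handler would of course loop; compile with -pthread.

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <linux/userfaultfd.h>
    #include <poll.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static long page_size;

    /* Placeholder: a real implementation would pull this page of the parent's
       memory over the network. Here it just fills the buffer with a marker. */
    static void fetch_remote_page(void *dst, unsigned long addr)
    {
        (void)addr;
        memset(dst, 'A', page_size);
    }

    /* Simulates the restored child: its first read of an unpopulated page
       blocks until the fault is serviced below. */
    static void *toucher(void *region)
    {
        volatile char *mem = region;
        printf("child read byte: %c\n", mem[0]);
        return NULL;
    }

    int main(void)
    {
        page_size = sysconf(_SC_PAGESIZE);

        /* May need privileges or vm.unprivileged_userfaultfd=1 on newer kernels. */
        int uffd = syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
        struct uffdio_api api = { .api = UFFD_API };
        ioctl(uffd, UFFDIO_API, &api);

        /* Reserve the child's address range without populating it. */
        size_t len = 16 * page_size;
        char *region = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        /* Ask for an event whenever a missing page in the range is touched. */
        struct uffdio_register reg = {
            .range = { .start = (unsigned long)region, .len = len },
            .mode  = UFFDIO_REGISTER_MODE_MISSING,
        };
        ioctl(uffd, UFFDIO_REGISTER, &reg);

        pthread_t t;
        pthread_create(&t, NULL, toucher, region);

        /* Service one fault: fetch the page, then UFFDIO_COPY installs it and
           wakes the faulting thread. */
        struct pollfd pfd = { .fd = uffd, .events = POLLIN };
        poll(&pfd, 1, -1);
        struct uffd_msg msg;
        if (read(uffd, &msg, sizeof msg) != sizeof msg)
            return 1;
        unsigned long addr = msg.arg.pagefault.address & ~(page_size - 1);

        char *page = aligned_alloc(page_size, page_size);
        fetch_remote_page(page, addr);
        struct uffdio_copy copy = {
            .dst = addr, .src = (unsigned long)page,
            .len = (unsigned long)page_size, .mode = 0,
        };
        ioctl(uffd, UFFDIO_COPY, &copy);

        pthread_join(t, NULL);
        return 0;
    }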
If you just pull things on demand, you're going to get a lot of round-trip-time penalties to page things in.
I think you should still push the memory as fast as you can, but maybe start the child while that's still in progress, and prioritize sending the stuff the child asks for (reorder the queue to send that stuff "next") if you haven't already sent it.
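A toy sketch of that reordering idea (all names invented, no real networking): the sender streams pages in order in the background, but a demand request from the child jumps the queue.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    #define NPAGES 8

    /* Stand-in for actually shipping page `idx` over the wire. */
    static void send_page(size_t idx) { printf("sent page %zu\n", idx); }

    static bool sent[NPAGES];

    /* Send one page: prefer an explicit request (a page the child faulted on),
       otherwise the next unsent page in linear order. Returns false once
       everything has been pushed. */
    static bool push_next(long requested)
    {
        if (requested >= 0 && requested < NPAGES && !sent[requested]) {
            sent[requested] = true;
            send_page((size_t)requested);
            return true;
        }
        for (size_t i = 0; i < NPAGES; i++) {
            if (!sent[i]) {
                sent[i] = true;
                send_page(i);
                return true;
            }
        }
        return false;
    }

    int main(void)
    {
        /* Background push, with one demand request (page 5) arriving mid-stream. */
        push_next(-1);
        push_next(-1);
        push_next(5);            /* child faulted on page 5: it jumps the queue */
        while (push_next(-1))
            ;
        return 0;
    }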
Yah that is indeed a super important optimization for avoiding round trips. CRIU does this and calls it "pre-paging"; their wiki also mentions that they adapt their page streaming to pre-stream pages around the ones that have been faulted: https://en.wikipedia.org/wiki/Live_migration#Post-copy_memor...
edit: lol, I didn't realize that isn't CRIU's wiki, since they just linked to a Wikipedia page and both use MediaWiki software. This is the actual CRIU wiki page, and it's way harder to tell there whether they do this, although I suspect they do and it's in the "copy images" step of the diagram: https://criu.org/Userfaultfd
That’s a great idea. One of my thoughts was to “pre-heat” the process by executing it a bit locally with side effects disabled, to see what would get immediately accessed, and send that first.
If your systems strictly match somehow (a machine image with auto-update disabled? or regularly hashing and timestamping files on both systems), you can also cheat by mapping some of the files locally on the other side.
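A rough sketch of that cheat, assuming the sender ships a (path, hash) pair for each file-backed mapping and the receiver maps its local copy on a match. The FNV hash and the map_if_identical helper are just illustrative; a real version would want a cryptographic hash plus size/mtime checks.

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Illustrative content hash (FNV-1a). */
    static uint64_t hash_file(const char *path)
    {
        uint64_t h = 1469598103934665603ULL;
        unsigned char buf[4096];
        ssize_t n;
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return 0;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            for (ssize_t i = 0; i < n; i++)
                h = (h ^ buf[i]) * 1099511628211ULL;
        close(fd);
        return h;
    }

    /* If the local copy of a file hashes to the value the sender reported,
       map it read-only instead of streaming its pages over the network. */
    static void *map_if_identical(const char *path, uint64_t sender_hash, size_t *len)
    {
        if (hash_file(path) != sender_hash)
            return NULL;
        int fd = open(path, O_RDONLY);
        struct stat st;
        fstat(fd, &st);
        *len = (size_t)st.st_size;
        void *p = mmap(NULL, *len, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);
        return p == MAP_FAILED ? NULL : p;
    }

    int main(void)
    {
        /* Demo: pretend the sender reported the hash of /etc/hostname. */
        size_t len = 0;
        uint64_t sender_hash = hash_file("/etc/hostname");
        void *p = map_if_identical("/etc/hostname", sender_hash, &len);
        if (p)
            printf("mapped %zu bytes locally, nothing to transfer\n", len);
        else
            printf("hash mismatch, stream the pages instead\n");
        return 0;
    }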
I do in fact mention this idea in the article. userfaultfd was actually added to the kernel so that CRIU and KVM live migration could implement exactly this.
Another cool project that does something like this is https://github.com/gamozolabs/chocolate_milk, a fuzzing hypervisor kernel that can back a VM snapshot's memory mapping over the network, pulling down only the pages the VM actually reads during a fuzz case.
If you ever needed to bring the process back, you could use the soft-dirty bit[1] to determine which pages were modified since forking and only transfer those. CRIU uses it for incremental snapshots (in fact, they wrote the kernel patch, afaik).
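For anyone curious, the soft-dirty mechanism is easy to poke at from userspace: writing "4" to /proc/PID/clear_refs resets the bits, and bit 55 of each page's /proc/PID/pagemap entry then tells you whether it has been written since. A rough sketch (error handling omitted; reading pagemap can require privileges on some kernel configurations):

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define NPAGES 4

    /* Clear all soft-dirty bits for this process ("4" is the documented
       command for /proc/PID/clear_refs). */
    static void clear_soft_dirty(void)
    {
        int fd = open("/proc/self/clear_refs", O_WRONLY);
        write(fd, "4", 1);
        close(fd);
    }

    /* Bit 55 of a pagemap entry is the soft-dirty flag; the entry for a
       virtual page lives at offset (vaddr / page_size) * 8. */
    static int page_soft_dirty(int pagemap_fd, uintptr_t vaddr, long page_size)
    {
        uint64_t entry = 0;
        off_t off = (off_t)(vaddr / (uintptr_t)page_size) * 8;
        pread(pagemap_fd, &entry, sizeof entry, off);
        return (int)((entry >> 55) & 1);
    }

    int main(void)
    {
        long page_size = sysconf(_SC_PAGESIZE);
        char *mem = mmap(NULL, NPAGES * page_size, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        memset(mem, 1, NPAGES * page_size);   /* make all four pages resident */

        clear_soft_dirty();                   /* "snapshot" point */
        mem[2 * page_size] = 42;              /* dirty only page 2 afterwards */

        int pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
        for (int i = 0; i < NPAGES; i++)
            printf("page %d soft-dirty: %d\n", i,
                   page_soft_dirty(pagemap_fd, (uintptr_t)(mem + i * page_size),
                                   page_size));
        close(pagemap_fd);
        return 0;
    }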