I'm impressed by Filip's knack for explaining complex topics in an approachable manner. His previous post on WebKit's locking infrastructure[1] is a similarly good read.
A few amusing parts of this post that stood out to me:
* A single sentence containing half a dozen links to Filip's previous work on garbage collection.
* "See here for the proof", linking to a 200+ line comment in WebKit's source.
* "But since this is JavaScript, we get to have a lot more fun".
While that proof is amusing, I was hoping for... an actual proof that I could hand to a verifier. Ah well. A hundred or so lines of comments is probably right.
Still, this looks to be impressive work, and I did enjoy the post.
This post, and others I've read on machine-checked proof, suggest that the computer-centric approach can catch quite a few problems with higher assurance of correctness. The peer-review problem in science also makes me think this is important, given that I can't be sure huge proofs will be adequately checked by humans. As far as reliability on the computer end goes, I've read about verified processors, proof assistants, compilers, and so on. Much of the risk can be knocked out, but the black box that is the human brain is another story.
Some accounting only needs to happen when visiting the object for the first time. The complete barrier is simply:
    object->field = newValue;
    if (object->cellState == Old)
        remember(object);
This surely elides some synchronization? One would think that either this has to communicate with the collector (to prevent a racing concurrent collection from missing the referent), or happen in the opposite order.
Or perhaps it is relying on the calling mutator thread necessarily holding a reference to the young object in question? Even then there is some non-trivial ordering work to be done here (in particular on weakly ordered architectures).
I'm sure this is well handled, but since this is one of the main difficulties in implementing good write barriers, I am curious what they did.
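To illustrate the concern: on a weakly ordered architecture, something like an explicit fence would be needed to keep the field store from being reordered relative to the cellState check. A hypothetical sketch (the fence choice is an assumption, not WebKit's actual code; requires <atomic>):

    object->field = newValue;
    // Hypothetical: keep the store above from being reordered past
    // the cellState load below on weakly ordered architectures.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    if (object->cellState == Old)
        remember(object);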
The barrier code you quote is for our old generational GC. Keep reading to see what the Riptide barrier looks like. See the section on retreating wavefront.
Aha, thanks; I didn't realize the first generational section was for an older version.
However, the Riptide barrier description still isn't 100% clear on how it avoids terminating a sweep (I believe "drain" in your terminology) in the middle of a barrier execution. In particular, I'm not concerned about the data race between the field write and the mark bit, but about the GC reading mark bits, believing the drain has terminated, and freeing memory while a mutator thread has written a field but not yet written the mark bit.
Are you just relying on the stop-the-world behavior relative to roots that you mention briefly?
The collector can only flip at safepoints. Safepoints cannot happen inside the barrier or in between the store the barrier protects and the barrier itself.
Flip = term of art for when the GC terminates and decides which objects are free.
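In other words, the store and its barrier form an indivisible unit with respect to safepoints. An illustrative sketch (not WebKit's actual code):

    // Illustrative only: safepoint polls occur between complete
    // store+barrier units, never inside one, so the collector cannot
    // flip between a field write and its barrier.
    object->field = newValue;  // the protected store
    barrier(object);           // runs before any safepoint can occur
    safepointPoll();           // the collector may only flip here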
Sometimes I still wish Apple would release Safari for Windows.
On another note, WebKit seems to be operating in a much more open manner, and has shifted toward optimizing for Web Apps rather than Web Pages.
You are listing Web browser engines. This article relates to JavaScript engines (e.g. SpiderMonkey (Firefox, Servo), ChakraCore (Edge), V8 (Blink)).
V8 has some details about their GC (garbage collector) here[0]; ChakraCore here[1]; SpiderMonkey here[2].
Now, they are all generational, incremental and concurrent, but each with particular tweaks.
The approach detailed in this blog post is similar to the one taken in ChakraCore, except that ChakraCore sets barriers on fixed-size blocks of memory, while Riptide tracks individual objects.
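For contrast, a block-granularity (card-table-style) barrier might look something like this; the card size and names are illustrative assumptions, not ChakraCore's actual code (requires <cstdint>):

    // Hypothetical card-table barrier: dirty the fixed-size block
    // ("card") covering the written object, rather than remembering
    // the object itself.
    static const uintptr_t cardShift = 9; // assumed 512-byte cards
    extern uint8_t cardTable[];           // one dirty byte per card

    object->field = newValue;
    cardTable[reinterpret_cast<uintptr_t>(object) >> cardShift] = 1;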
In principle the downsides compared to compacting collectors are reduced cache locality, fragmentation, somewhat higher allocation overhead, and higher memory footprint.
On the other hand it is a lot easier to implement concurrent collectors when objects are not moved.
> The changes that transformed WebKit’s collector were landed over the past six months, starting with the painful work of removing WebKit’s previous use of copying.
It is curious that Mozilla chose the opposite, doing the painful work of turning a conservative, non-moving collector into a precise, compacting one.
> In principle the downsides compared to compacting collectors are reduced cache locality, fragmentation, somewhat higher allocation overhead, and higher memory footprint.
In actuality the downsides of not copying are:
- Your allocator has to be tuned to resemble first-fit in its allocation order. This is empirically enough to control memory usage.
- You better have more virtual address space than physical address space. This means that the Robson bound for memory usage goes from M log M to M log P, where M is memory size, P is page size, and log M or log P is the fragmentation wastage multiplier. (For example, taking base-2 logs, M = 2^32 and P = 2^14 gives a worst-case multiplier of 14 instead of 32.) People needed compaction really badly back when the state-of-the-art GCs ran on maxed-out 32-bit address spaces.
- Mark-sweep collectors have to sweep; copying collectors don't. But this is a fixable problem: sweeping can be made concurrent, parallel, incremental, or very cheap. We picked very cheap, using bump'n'pop (a rough sketch follows below).
Initially we had some regressions from removing copying, but we fixed them by making sure that the first and third mitigation listed above were in place. WebKit rarely runs on a maxed-out 32-bit address space.
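Here is a rough sketch of what a bump'n'pop allocator could look like; this is an illustration of the idea, not WebKit's actual code:

    #include <cstddef>

    // Illustrative bump'n'pop allocator for fixed-size cells: bump
    // through a contiguous free span when we have one, otherwise pop
    // from a free list built when the block was swept.
    struct FreeCell { FreeCell* next; };

    struct BlockAllocator {
        char* bumpCursor;    // next unallocated byte in the free span
        char* bumpLimit;     // end of the current free span
        FreeCell* freeList;  // cells recovered by sweeping
        size_t cellSize;

        void* allocate()
        {
            // Fast path 1: bump through a contiguous free span.
            if (bumpCursor + cellSize <= bumpLimit) {
                void* result = bumpCursor;
                bumpCursor += cellSize;
                return result;
            }
            // Fast path 2: pop a previously freed cell.
            if (FreeCell* cell = freeList) {
                freeList = cell->next;
                return cell;
            }
            // Slow path: sweep another block or get a fresh one.
            return nullptr;
        }
    };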
Clarification: our previous collector did conservative root scanning, used a less evolved mark-sweep system for some objects, and used Bartlett-style quick-release compaction for other objects.
Riptide removes the copying because concurrent or even incremental copying is sure to introduce overhead. At a minimum you need a barrier on either all pointer reads from the heap or all pointer writes to the heap, and none of those barriers resemble our old generational barrier, so this would be new overhead. Riptide only barriers writes of pointers to the heap, and uses a single unified barrier for both concurrency and generations to get maximum throughput. Riptide's concurrency actually improves end-to-end execution time: the overhead of the concurrency is negligible, so you just get to enjoy the benefit of continuing to run while the GC runs, and you run faster overall. The speed-up was like 5% on the splay test, and I measured speed-ups in other things I cared about, like page load times.
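As a rough illustration of what such a unified write barrier could look like (the state name below is an assumption, not WebKit's actual encoding):

    // Illustrative unified barrier: one cheap check covers both the
    // generational case (storing into an old object) and the concurrent
    // case (storing into an already-marked object while the GC runs).
    object->field = newValue;
    if (object->cellState != DefinitelyNew)
        barrierSlowPath(object); // remember or re-grey as appropriate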
"Riptide is an improvement to WebKit’s collector and retains most of the things that made the old algorithm great. The changes that transformed WebKit’s collector were landed over the past six months, starting with the painful work of removing WebKit’s previous use of copying. Riptide combines Guy Steele’s classic retreating wavefront write barrier with a mature sticky-mark-sweep collector and lots of concurrency tricks to get a useful combination of high GC throughput and low GC latency."
Personally, I'm a fan of explicit lifetime management, such as RAII in C++ and especially in Rust. It solves the general problem of resource management, as opposed to only memory management. Most languages that use garbage collection leave generalized resource management in a C-like state of malloc and free. A few (not JS) have syntax for binding the lifetime of an object to a scope, such as "with" in Python or "using" in C#, but the burden is still on the consumer of an interface to know that they need to use it (you need to know that an object must be .close()d, .unlock()ed, etc.), and you need to remember at each and every call site; programmers make errors.
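A minimal sketch of the RAII idea in C++: the resource is released when the guard goes out of scope, so no call site can forget to unlock.

    #include <mutex>

    std::mutex m;

    void update()
    {
        std::lock_guard<std::mutex> guard(m); // lock acquired here
        // ... mutate shared state ...
    }   // lock released here, even if an exception is thrown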
That said, changing from GC to explicit lifetime management requires wide-reaching language changes that I don't see how you would apply to JavaScript, or to nearly any GC language. It's a very fundamental aspect of the language, and not easily changed. Hence the work on better collectors in the languages that do GC; the parent's comment is not really enlightening.
I know the root's comment wasn't actually useful. People who have decided they hate GC like this will always do so. Much like vim/emacs :D
It was more a question of whether the commenter had any better solutions to propose. I was /really/ trying not to just be a dick and say "cool, so you work out how to do that without changing the language at all".
The post is written/formatted in a confusing way for those not already versed in garbage collection. In particular, the important but idiosyncratic term "draining" is not given a reference link, but is instead given a different font that makes it fade into the background.
[1]: https://webkit.org/blog/6161/locking-in-webkit/