> his main example of a deep module is actually shallow. It's not, you're just i...

Darmani · 2024-12-22T20:03:14

Hi Mawr,

I don't have much to say to most of your comment; a lot of the text reads to me like a rather uncharitable description of the pedagogical intent of most of my writing.

I'll just respond to the part about deep modules, which brings up two interesting lessons.

First, you really can't describe an implementation of the Unix IO interface as being hundreds of thousands of lines.

That's because most of those lines serve many purposes.

Say you're a McDonald's accountant, and you need to compute how much a Big Mac costs. There are the marginal ingredients and labor. But then there's everything else: real estate, inventory, and marketing. You can say that 4 cents of the cost of every menu item went to running a recent ad campaign. But you can also say: that ad was about Chicken McNuggets, so we should say 30 cents of the cost of Chicken McNuggets went to that ad campaign, and 0 cents of everything else. Congratulations! You've just made Big Macs more profitable.

That's the classic problem of the field of cost accounting, which teaches that profit is a fictional number for any firm that has more than one product. The objective number is contribution, which only considers the marginal cost specific to a single product.
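
To make the arithmetic concrete, here is a minimal sketch with made-up numbers (the product figures, the ad budget, and every name in it are hypothetical, not McDonald's data). It shows that the per-product "profit" moves with the allocation rule, while the contribution and the firm-wide total do not:

```python
# All figures are hypothetical and chosen only to illustrate the argument.
PRODUCTS = {
    # name: (units sold, price in cents, marginal cost in cents)
    "big_mac": (1000, 500, 300),
    "mcnuggets": (1000, 400, 250),
}
AD_CAMPAIGN_CENTS = 30_000  # shared overhead: one ad campaign


def contribution(name):
    """Revenue minus marginal cost; independent of any overhead allocation."""
    units, price, marginal = PRODUCTS[name]
    return units * (price - marginal)


def profit(name, overhead_share):
    """'Profit' after charging the product its allocated slice of overhead."""
    return contribution(name) - overhead_share[name]


# Allocation A: spread the campaign evenly over every unit sold.
total_units = sum(units for units, _, _ in PRODUCTS.values())
even = {
    name: AD_CAMPAIGN_CENTS * units // total_units
    for name, (units, _, _) in PRODUCTS.items()
}

# Allocation B: the ad was about McNuggets, so charge them all of it.
nuggets_only = {"big_mac": 0, "mcnuggets": AD_CAMPAIGN_CENTS}

for label, shares in (("even", even), ("nuggets_only", nuggets_only)):
    per_product = {name: profit(name, shares) for name in PRODUCTS}
    print(label, per_product, "total:", sum(per_product.values()))
```

Both allocations are defensible, and the Big Mac's "profit" differs between them even though nothing about Big Macs changed.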

Deciding how many lines a certain feature takes is an isomorphic problem. Crediting the entire complexity of the file system implementation to its POSIX bindings -- actually, a fraction of the POSIX bindings affected by the filesystem -- is similar to deciding that the entire marketing, real estate, and logistics budgets of McDonald's are a cost of Chicken McNuggets but not of Big Macs. There is a lot of code there, but, as in cost accounting, there is no definitive way to decide how much to credit to any specific feature.

All you can objectively discuss is the contribution, i.e., the marginal code needed to support a single function. I confess that I have not calculated the contribution of any implementation of open() other than the model in SibylFS. But Ousterhout will need to do so in order to say that the POSIX file API is as deep as he claims.

Second, it's not at all true that a garbage collector has no interface. GCs actually have a massive interface. The confusion here stems from a different source.

Programmers of memory-managed languages do not use the GC. They use a system that uses the GC. Ousterhout's claim is similar to saying that renaming a file has no interface, because the user of the macOS Finder does not need to write any code to do so. You can at best ask: what interface does the system provide to the end user for accessing some functionality? For Finder, it would be the keybindings and UI to rename a file. For a memory-managed language, it's everything the programmer can do that affects memory usage (variable allocations, scoping, ability to return a heap-allocated object from a function, etc.), as well as forms of direct access such as finalizers and weak references. If you want to optimize memory usage in a memory-managed language, you have a lot to think about. That's the interface to the end user.
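
As a small illustration of that end-user surface, here is a sketch in Python using the standard `weakref` and `gc` modules. The `Buffer` class and `make_buffer` function are hypothetical names, and the timing described in the comments assumes CPython; other implementations may collect later.

```python
import gc
import weakref


class Buffer:
    """Hypothetical stand-in for any heap-allocated object."""

    def __init__(self, size):
        self.data = bytearray(size)


events = []


def make_buffer():
    buf = Buffer(1024)
    # Finalizer: a callback the runtime invokes once `buf` has been collected.
    weakref.finalize(buf, events.append, "buffer finalized")
    return buf  # returning the object keeps it alive past this scope


buf = make_buffer()
weak = weakref.ref(buf)  # weak reference: does not keep `buf` alive
alive_before = weak() is not None

del buf  # drop the last strong reference
gc.collect()  # request a collection; CPython's refcounting has usually freed it already
alive_after = weak() is not None

print(alive_before, alive_after, events)
```

Scoping, returning the object, the weak reference, and the finalizer are all choices the programmer makes that the collector then has to honor.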

If you want to look at the actual interface of a GC, you need to look at the runtime implementation, and how the rest of the runtime interfaces with the GC. And it's massive -- GC is a cross-cutting concern that influences a very large portion of the runtime code. It's been a while since I've worked with the internals of any modern runtime, but, off the top of my head, the compiler needs to emit write barriers and code that traps when the GC needs the thread to stop, while the runtime needs to use indirection for many pointer accesses (if it's a moving GC). Heck, any user of the JNI needs to interface indirectly with the GC. It's the reason JNI code uses a special reference type (`jobject`) to refer to Java objects instead of an ordinary pointer.
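
To give a feel for why a write barrier is part of the GC's interface, here is a toy model of one common design, a generational barrier with a remembered set. It is a sketch under stated assumptions, not how any particular runtime does it; `Obj`, `ToyHeap`, and their methods are hypothetical names.

```python
class Obj:
    """Toy heap object; `generation` is either "young" or "old"."""

    def __init__(self, name, generation):
        self.name = name
        self.generation = generation
        self.fields = {}


class ToyHeap:
    def __init__(self):
        # Old objects that may hold pointers into the young generation.
        self.remembered = set()

    def write_field(self, owner, field, value):
        """Every pointer store must go through here: this is the write barrier."""
        if owner.generation == "old" and value.generation == "young":
            self.remembered.add(owner)
        owner.fields[field] = value

    def minor_roots(self, stack_roots):
        """Roots for a young-generation collection: stack plus remembered set."""
        roots = [obj for obj in stack_roots if obj.generation == "young"]
        for old in self.remembered:
            roots.extend(
                v for v in old.fields.values() if v.generation == "young"
            )
        return roots


heap = ToyHeap()
cache = Obj("cache", "old")
entry = Obj("entry", "young")
heap.write_field(cache, "latest", entry)

# No stack root points at `entry`, yet a minor collection must still find it.
roots = heap.minor_roots(stack_roots=[])
print([obj.name for obj in roots])
```

If any code stored a pointer without calling `write_field`, the collector could miss `entry`. In this model, that contract is something every writer of pointers has to respect.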

If you tally up the lines needed to implement either the GC or the POSIX file API vs. a full spec of its guaranteed behavior, you may very well find the implementation is longer. But the comparison is far from as simple as Ousterhout claims.