Show HN: Micro-mitten – Research language with compile-time memory management (github.com/doctorn)
236 points by doctor_n_ on May 8, 2020 | 62 comments


I haven't dug into the details a ton, but I am excited to see this! Would love to see more research in this direction.

> micro-mitten's approach is significantly different from Rust's. Rather than depending on single ownership and a complex lifetime system, micro-mitten uses a series of data-flow analyses to statically approximate heap liveness.

To be clear, Rust these days also looks at control-flow. This was what all the "non-lexical lifetimes" hubbub was about. And the next generation checker is based on datalog...


> And the next generation checker is based on datalog...

Where can I learn more about this?


It's called "polonius": https://github.com/rust-lang/polonius

There are some posts on Niko Matsakis' blog, starting with this one: https://smallcultfollowing.com/babysteps/blog/2018/04/27/an-...

More recently a really good talk, with slides here: https://nikomatsakis.github.io/rust-belt-rust-2019/


Niko's (Rust's current lead) blog[1] talks about it among many other Rust things. I'm on mobile atm, so I can't give you exact links, but you should find what you're looking for.

[1]: http://smallcultfollowing.com/babysteps/


Discussion on r/ProgrammingLanguages: https://www.reddit.com/r/ProgrammingLanguages/comments/gfgn0...

Discussion on r/rust: https://www.reddit.com/r/rust/comments/gfgt1b/rustlike_langu...

I look at this and I think it's an innovative and promising idea - the freedom of a garbage-collected language, but with the tracing done as a type-aware static analysis, and the cleanup code inserted at compile time!


It is not the only one:

https://www.csail.mit.edu/event/safe-parallel-programming-pa...

https://chapel-lang.org/docs/master/builtins/OwnedObject.htm...

And a couple more with affine types, or algebraic effects.

Yes, this might be the future, but we are still far from the overall convenience of GC for common programming scenarios.

If anything, one has to thank the Rust community for pushing more people to look into this area, regardless of its outcome in the language market.


> This means that it maintains the ability to insert freeing code at appropriate program points, without putting restrictions on how you write your code.

How does the approach in mitten compare to Automatic Reference Counting in Objective-C (and I think Swift too)? From my experience, ARC can still add a surprising amount of memory management overhead to a program and needs a lot of hand-holding to keep that overhead down to an acceptable level (e.g. low single-digit percentage of overall execution time in programs that talk to Obj-C APIs a lot). I would be surprised if a "traditional GC" can do any worse in that regard (maybe reference counting smears the overhead over a wider area, e.g. no obvious spikes, but instead "death by a thousand cuts").

One thing I'd like to see in modern languages is to encourage and simplify working with an (almost) entirely static memory layout, and make manipulations inside this static memory layout safe. This static memory layout doesn't need to be magically derived by the compiler as long as the language offers features to easily describe this memory layout upfront.

A lot of data structures in applications don't need to live in "short-lived" memory regions, but they often do because that's what today's languages either encourage (e.g. when built on the OOP philosophy), or what happens under the hood without much control from the code (e.g. in "reference-heavy" languages like Javascript, Java or C# - or even "modern C++" if you do memory management via smart pointers).

Minimizing data with dynamic lifetime, and maximizing data with static lifetime, could mean less complexity in the language and runtime (e.g. lifetime tracking by the compiler, or runtime memory management mechanisms like refcounting or GCs).


"How does the approach in mitten compare to Automatic Reference Counting in Objective-C (and I think Swift too)? From my experience, ARC can still add a surprising amount of memory management overhead to a program and needs a lot of hand-holding to keep that overhead down to an acceptable level (e.g. low single-digit percentage of overall execution time in programs that talk to Obj-C APIs a lot). I would be surprised if a "traditional GC" can do any worse in that regard (maybe reference counting smears the overhead over a wider area, e.g. no obvious spikes, but instead "death by a thousand cuts")."

The reference counts have to be incremented when a new reference is made and decremented when one is deleted, with the memory freed when the count reaches zero. (This activity is cache- and atomicity-unfriendly in the presence of threads.) A sufficiently smart compiler can omit many if not most of the count operations, but this kind of static analysis promises to remove all of them.
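For a concrete picture, here's a minimal sketch of the counting traffic a naive implementation pays (C11 atomics; the Obj/obj_retain/obj_release names are made up for illustration):

    #include <stdatomic.h>
    #include <stdlib.h>

    typedef struct {
        atomic_size_t refs;  /* shared counter, touched from every thread */
        int payload;
    } Obj;

    Obj *obj_new(int v) {
        Obj *o = malloc(sizeof *o);
        atomic_init(&o->refs, 1);
        o->payload = v;
        return o;
    }

    /* every copied reference pays an atomic read-modify-write... */
    Obj *obj_retain(Obj *o) {
        atomic_fetch_add_explicit(&o->refs, 1, memory_order_relaxed);
        return o;
    }

    /* ...and every dropped reference pays another, plus a branch */
    void obj_release(Obj *o) {
        if (atomic_fetch_sub_explicit(&o->refs, 1, memory_order_acq_rel) == 1)
            free(o);
    }

A static scheme instead tries to prove where the last use happens and emit a plain free there, so none of these counter updates exist at runtime.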

Further, reference counting has difficulty with circular references, as the counts never go to zero. This approach should be able to handle those as well.

Both this and reference counting are likely victims of the "death by a thousand cuts" you mention, as well as "drop the last pointer to a large structure and wait for a long time while the pieces are deleted"---the reference counting equivalent of a stop-the-world-and-trace garbage collection.


> This activity is cache- and atomicity-unfriendly in the presence of threads.

Indeed it is. https://iacoma.cs.uiuc.edu/iacoma-papers/pact18.pdf states reference counting takes 42% of execution time in client programs and 15% in server programs.

Luckily, they also present a more cache-friendly variation on reference counting that halves that overhead.

They modified the Swift compiler, so I think there’s a decent chance we’ll see this added to Swift.

Swift also, I think, is somewhat designed around the inefficiency of reference counting by promoting the use of structs for small objects (structs in Swift are value types and are not reference-counted by the language).


From what I understood, it's not reference counting; it tries to determine at compile time when to drop, using data-flow analysis to come up with an approximation of liveness.

I had a thought some time back: could compilers do a profile run, dumping GC info, to get information about the liveness of objects they couldn't determine statically?


> get information about the liveness of objects it couldn’t determine statically by dumping gc info?

Well it may be possible to optimize via profiling (this is what PGO is), but this of course wouldn’t be a static analysis, so it would just allow some optimizations on dynamic access.

In theory you could use profiling as part of the implementation of a theorem prover, but I haven't seen any examples where that is more effective than the conventional methods of data-flow analysis. And in this case, it would still be static analysis.


Most programs are complex and have lots of branching, so this wouldn't work.


From the ASAP dissertation (https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-908.pdf):

> Reference counting, like asap, is a safe, synchronous memory management strategy. However, rc approximates waste by unreachability which is less timely than asap’s approximation by Access.

I think a more careful reading of the work is required to distinguish the precise meanings of "unreachability" and "access" in this context.


As I understand it:

- unreachability means there's no live reference to that object.

- access means somebody is going to use this object again. If you have a live reference but won't ever use it again it's a kind of leak (that's how GCed languages can leak memory without any manual management).
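A tiny C example of the gap between the two, with comments marking where each strategy could reclaim (names made up):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        int *big = malloc(1000000 * sizeof *big);
        if (!big) return 1;
        big[0] = 42;
        printf("%d\n", big[0]);   /* last access to 'big' */

        /* an access-based analysis (ASAP) could free 'big' right here:
           still reachable, but provably never used again */

        for (int i = 0; i < 3; i++)
            puts("doing unrelated work while 'big' stays live...");

        free(big);  /* reachability-based schemes (RC, tracing GC) can only
                       reclaim once this last reference finally goes away */
        return 0;
    }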


> One thing I'd like to see in modern languages is to encourage and simplify working with an (almost) entirely static memory layout, and make manipulations inside this static memory layout safe.

This sounds interesting! What do you mean by (almost) static memory layout? Fixed sizes for everything in a contiguous region, or multiple growing arrays, or something else entirely?

Recently I have written a few programs in a style that uses only realloc inside dynamic arrays for memory management and never frees. This leads to a big semi-global struct holding all the dynamic arrays, which your description reminded me of. (Local functions can then take those arrays, use them, and give them back once they return.)
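Roughly this shape, in case it helps (a minimal sketch with made-up names; error handling omitted):

    #include <stdlib.h>

    /* one semi-global struct owning every growable array; nothing is ever freed */
    typedef struct {
        int   *ids;      size_t ids_len, ids_cap;
        float *weights;  size_t w_len,   w_cap;
    } State;

    static State g_state;

    /* grow-only push: the only place realloc is ever called */
    static void push_id(int id) {
        if (g_state.ids_len == g_state.ids_cap) {
            g_state.ids_cap = g_state.ids_cap ? g_state.ids_cap * 2 : 64;
            g_state.ids = realloc(g_state.ids, g_state.ids_cap * sizeof *g_state.ids);
        }
        g_state.ids[g_state.ids_len++] = id;
    }

    int main(void) {
        for (int i = 0; i < 1000; i++)
            push_id(i);
        return 0;  /* the arrays live until process exit; no free() anywhere */
    }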


I (mostly) returned to C a little while ago, and for smaller things I sometimes create the entire application state as a single, big struct that's made of many smaller nested structs, maybe with one or very few "layers" of dynamically allocated data dangling off from the static "root structure" (but only when really needed).

A very simple example is an all-in-one application data structure like this:

https://github.com/floooh/v6502r/blob/1d2b79ac11d7828b2722b5...

This very simple approach has some downsides of course, mostly because C doesn't help much to solve some problems like a more specialized language could (but on the other hand, it also doesn't get in the way much):

- Every part of the program sees and is allowed to change everything, so it would be nice to have a simple syntax for fine-grained visibility and access rules (but not at all like C++ public/private, more like compile-time read-only and read-write access tokens).

- Not much compile- and runtime-protection from code scribbling over neighboring nested structs.

- Not much flexibility for dynamic arrays. It would be good to have 3 flavors (sketched after this list): (1) compile-time max capacity which can be completely embedded, (2) a runtime max capacity array, which is allocated once but can never grow, and (3) a fully dynamic array which can grow (but maybe never shrink?). Such dynamic arrays should never change their base address, so that memory locations remain stable.

- It's not well suited for bigger programs built from many modules. It should be possible to have highly modular program code, but still end up with a single monolithic "root data layout".
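The three flavours from the list above might look something like this in plain C (hypothetical types, just to make the idea concrete):

    #include <stddef.h>

    /* (1) compile-time max capacity, fully embedded in the parent struct */
    typedef struct {
        size_t len;
        int items[256];
    } FixedArray;

    /* (2) runtime max capacity: allocated once at startup, never grows or moves */
    typedef struct {
        size_t len, cap;   /* cap chosen at init time */
        int *items;        /* allocated exactly once */
    } BoundedArray;

    /* (3) growable, but chunked so existing element addresses never change */
    typedef struct Chunk {
        struct Chunk *next;
        size_t len;
        int items[256];
    } Chunk;

    typedef struct {
        Chunk *head, *tail;  /* append allocates a new chunk; old ones stay put */
    } StableArray;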

One great side effect of this approach is that it feels completely natural to not do dynamic memory allocation all over the place (and one of the good features of C is that memory allocation is always very obvious - and thus easy to avoid).


Isn’t that just a variant on organizing your globals well to make using lots of globals manageable?

That's what was historically done in languages such as FORTRAN and COBOL (both of which had compile-time memory management, but managed that by completely dropping memory allocation at runtime).

And yes, that meant dropping all dynamic memory allocation, too. If you wanted to run your FORTRAN program on a larger data set, you replaced the cards defining the dimensions of your arrays and recompiled.


Most likely! I never wrote FORTRAN or COBOL, but instead was too heavily influenced by the OOP hype of the 90s, and I think this hype still lingers on even in the current "post-OOP" world. It feels like memory management is still stuck in the "every small thing in a program needs its own lifetime" mindset.

Remembering how I did "memory management" in my early 8-bit assembler programs (e.g. not at all, just figure out what's needed upfront and assign every single data item its fixed address) was when I realized that dynamic allocation actually isn't as important as I had assumed all along.

But I'd like a "modern approach" and modern tooling for that very old idea :)


I reached a similar conclusion. Dynamic memory is seductive for the task of indefinite scaling, but a practical system always encounters bottlenecks that aren't along the memory management axis, and in the meantime your code is much harder to verify.

NB: Historically, game engines have tended towards object pooling at runtime without any dynamic allocations. In that case there is a defined limit to what a scene will accommodate, and the object counts often simply reflect the other performance bottlenecks involved.

GC is still nice, but mostly in the sense of stitching together the most dynamic elements of the system. You don't want to have to trace a ton of stuff, and that also leads in the direction of flattening the data and making it as manual and static as the type system allows.


> It's not well suited for bigger programs built from many modules. It should be possible to have highly modular program code, but still end up with a single monolithic "root data layout".

You can still have modules per file, where the globals can be thought of as members of a virtual root. And the globals can be either shared or private to the module.

I have seen very complex programs done like that. There are some advantages like proving the program does not run out of memory, but the problem is that you end up with code that is harder to reuse, harder to test, etc. as soon as you start sharing state.

But yeah, it is sometimes done like that.


Reference counting happens at runtime; this happens at compile time.


But with ARC the compiler also does compile-time tracking to figure out when object references are shared and unshared, and based on that analysis inserts retain/release calls at the right points in the program (or more importantly: avoids the refcount overhead completely when it is not needed - e.g. when ownership is moved, not shared).

If mitten can do complete compile-time analysis also for all sorts of shared references, and thus can avoid refcounting completely at all times, then this would indeed be a nice improvement.


I didn't know that some reference-counting languages optimize away some cases of runtime counting by attempting to track ownership, though it makes sense. But I think when people say "reference counting" they mean the naive approach. Even Rust has reference-counting structures, if you need them, and they actually expand what you're allowed to do with those values because you're partially stepping outside of the ownership system (at the cost of runtime performance).


It's pretty easy to optimise away some reference counting operations. For example, if you allocate something on the heap that is not returned from the function, nor passed to any other function, nor captured in a closure, then you know it will be dead at the end of the function, so you don't need to emit reference counting operations for it.
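A sketch of that case in C, with hypothetical rc_retain/rc_release calls standing in for the counting a naive compiler would emit:

    #include <stdlib.h>

    typedef struct {
        size_t refs;     /* count a naive RC scheme would maintain */
        double x, y;
    } Point;

    static double norm2(void) {
        Point *p = malloc(sizeof *p);  /* never returned, passed on, or captured */
        p->x = 3.0;
        p->y = 4.0;
        double r = p->x * p->x + p->y * p->y;
        /* naively: rc_retain(p) on creation, rc_release(p) here.
           since 'p' provably cannot escape this function, the compiler can
           skip the counting entirely and just emit one direct free */
        free(p);
        return r;
    }

    int main(void) {
        return norm2() == 25.0 ? 0 : 1;
    }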


Yeah. I guess Rust's secret sauce is just handling the long tail of much harder cases.


Probably yeah. One thing is that it helps to have a statically typed language to do these optimisations, since you can use type-based analysis for them. So dynamically typed languages like Lisp or Python are going to have a harder time doing them.


If a language like this were to take off, I could see linter-style errors pop up that are not currently possible.

"ERROR: maximum memory usage computed to be XYZ MB, which is higher than the speicified limit of 500MB"

Now that would help keep the RAM bloat down!


My functional language Winter has errors like that. You can compute the maximum memory usage of the program and then throw an error if it exceeds some threshold.

Edit: I'll add that there are lots of programs the prover can't effectively handle right now, so can't compute a good memory bound. It works fine for some relatively simple programs however.


> there are lots of programs the prover can't effectively handle right now

And always will be, due to Rice's Theorem. Can still be useful though - various formal methods techniques are like this.


Winter is not a Turing equivalent language, so technically Rice's theorem doesn't apply (I think). Nevertheless practically speaking you are right, there will always be valid programs that the prover can't prove are correct.


As long as it still allows for arbitrary-length vectors - which I can't imagine it not - this would be impossible due to the halting problem. Or I guess, maybe you could derive a lower bound on memory usage, but you could not get an upper bound/exact number.


What if allocation were only possible when you can prove the upper limit? Then this would be possible:

    if (sz < 1e6)
        vec = std::vector<int>(sz);
    else
        throw std::length_error("Size exceeds upper limit");


I'm saying you could know "it will use at least X memory", but you could never know "it will use at most X memory" unless you seriously cripple the language's capabilities


Maybe there could be a compiler flag. You let any program compile normally, but if you enable the flag, it only allows programs to compile if the maximum memory usage can be computed, and if it is under the limit you specify.

That means the language isn't always crippled, but you can get compiler enforcement for certain embedded programs.


It would effectively become a stack-only language (heap allocations' sizes would always have to be known at compile time, just like on the stack). I could see that serving an interesting special subset of use-cases, but I was under the impression we were talking about a general programming language, which that would not be.


For example, MISRA C disallows dynamic memory allocation completely, and still pretty complex applications have been written to that spec. Similar guidelines are pretty common in other high-reliability or safety-critical software specifications. Another example is the "JPL Institutional Coding Standard for the C Programming Language", which specifies "Do not use dynamic memory allocation after task initialization".
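The JPL-style rule boils down to claiming all capacity during initialization and only handing out slots afterwards; a rough (and decidedly non-audited) sketch:

    #include <stdlib.h>

    #define MAX_MSGS 1024

    typedef struct { int id; char body[64]; } Msg;

    static Msg   *msg_pool;   /* allocated exactly once, before the main loop */
    static size_t msg_used;

    /* the only dynamic allocation in the program, done at task initialization */
    static int init(void) {
        msg_pool = malloc(MAX_MSGS * sizeof *msg_pool);
        return msg_pool != NULL;
    }

    /* after init, "allocating" a message is just handing out a pool slot */
    static Msg *msg_acquire(void) {
        return msg_used < MAX_MSGS ? &msg_pool[msg_used++] : NULL;
    }

    int main(void) {
        if (!init()) return 1;
        Msg *m = msg_acquire();
        if (m) m->id = 42;
        return 0;
    }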


Fortran didn't have dynamic memory allocation until Fortran 90. If you wanted to run a problem larger than the original code's author had anticipated, you needed to recompile the source with larger values. This could indeed be an annoying restriction. But you might be surprised how much software was successfully written in Fortran.


Ada/SPARK does not allow dynamic allocations; all of them must be proven at compile time.


This is actually Ada's behaviour when doing stack allocations.


You can already do this with certain forms of type-level programming. We did it in the Rust-based OS we created atop seL4. It started out as using types to track setup processes and kernel objects at compile-time to bring some of the guarantees seL4 provides only at run-time, and also to provide guard rails to ensure that you're not accidentally creating insecure systems using the tools seL4 gives you. After some exposure to doing this it became clear that we could also track memory allocations this way, so we created an allocator to do exactly that.

It basically ensures at compile time that every allocation in the program will succeed given some expected usable footprint. This also has the benefit of ensuring that we kill the kernel/OS immediately at bootstrapping if the machine/environment doesn't have enough memory for our proof-carrying code to meet its expectations.

One huge problem with this approach, though, is that it doesn't scale well, where "scale" is in the dimension of revising your programs: the type parameters tend to bleed all over the place and don't compose well (or at all). So it's great if you're sure you've got your design locked in, but it's kind of a nightmare if you're still prototyping and refactoring.


There is a Rust tool that computes the maximum stack usage of a program (if such a maximum is bounded). You can use it to make sure that your microcontroller running Rust doesn't run out of stack space.


This could be an interesting avenue to work towards data layout optimizations. If the language is built to require static knowledge about memory accesses, it could change the layout to be more cache-friendly, use range optimizations to pack fields more tightly, and customize array-of-struct representations to be blocking/tiling friendly etc.
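For example, the classic array-of-structs to struct-of-arrays rewrite, which a layout-aware compiler could in principle perform automatically once it knows which fields each loop actually touches (hand-written C for illustration):

    /* what the programmer writes: array of structs, convenient to pass around */
    typedef struct { float x, y, z; int id; } Particle;
    static Particle particles_aos[4096];

    /* what a layout-aware compiler might emit: struct of arrays, so a loop
       reading only 'x' streams through contiguous, cache-friendly memory */
    typedef struct {
        float x[4096], y[4096], z[4096];
        int   id[4096];
    } Particles;
    static Particles particles_soa;

    /* this loop touches one contiguous 16 KB array instead of striding
       through 64 KB of interleaved fields */
    static float sum_x(void) {
        float s = 0.0f;
        for (int i = 0; i < 4096; i++)
            s += particles_soa.x[i];
        return s;
    }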


The great thing about Rust is that variable lifetimes are defined at function boundaries. Taking that away would take away the guarantees that Rust libraries can provide.

In other words, it's a good thing for programmers that Rust doesn't allow more freedom, and requires them to restructure the code if necessary.


I'm thinking that until they are perfect, these kinds of languages lure the developer into a one-way street: convenient until you reach a dead end, at which point your only escape (if you are lucky) is a whole bunch of contrived and difficult-to-maintain typing constructs.

Still interesting research though.


Your comment carries very little substance (reads like a baseless opinion) but appears on the very top of the page. Interesting.


At least according to [1] and [2], comment order is not determined just by the comment's score but also by the poster's, when it was posted, and other things.

[1] https://news.ycombinator.com/item?id=1398764

[2] https://news.ycombinator.com/item?id=13867739


It's refreshing to see new approaches to memory management, notably ones exploring the power of static analysis at compile time. I will take the time to read this dissertation!

Just a question: it reminds me of previous work on static inference of stack-allocated regions. What are the relationships, if any?

* [A Simplified Account of Region Inference](https://hal.inria.fr/file/index/docid/72527/filename/RR-4104...)

* [MLton regions](http://mlton.org/Regions)


The GitHub readme links to a scholarly thesis discussing the general approach[0], as well as to the author's own dissertation[1] on this specific project (which "aims to investigate the practical viability" of [0] "by building the technology into a real compiler" for empirical evaluation on real-world hardware). Region inference is discussed throughout [0], and extensively in Chapter 9.

[0] https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-908.pdf

[1] http://nathancorbyn.com/nc513.pdf


If the paper is anything like the abstract, this will be wonderful!


Interesting as well. I see the MLton regions have issues with bounding space complexity. I wonder if a language could impose some reasonable restrictions to bound region space complexity. That would seem to make regions quite attractive.



"In its current form, asap is not equipped to handle CONCURRENT programs. Managing memory in concurrent programs poses its own set of challenges."

So it doesn't have the fearless concurrency of Rust yet, and I wonder if it's possible with this approach at all. I guess it's an open research question.


Concurrency is a "proposed extension", per section 6.4 of the referenced thesis. Among other things, it is noted that cooperative concurrency with explicit yield points would be somewhat feasible, whereas anything more general than that is very much an active research area to say the least.


If unrestrictive compile-time GC is possible, couldn't this be retrofitted into JITs for JavaScript, Java, .NET, WASM, etc.? Isn't this just another way of implementing GC?


JITs already do this kind of thing; it is called escape analysis, and it is just quite hard to get right.

Also, many GC-based languages are adding some form of linear types, so that you can still enjoy the productivity of having a GC around while being able to get hold of these kinds of tools.

https://github.com/apple/swift/blob/master/docs/OwnershipMan...

https://gitlab.haskell.org/ghc/ghc/-/wikis/linear-types


> series of data-flow analyses

Screams highly non-compositional to me. No thanks if so; I'll stick with good old types and proofs.


I imagine the compile would be orders of magnitude slower than Rust or C++, and I can't really stomach slow compilers. Yesterday I was hacking on a Qt app, and the time taken to rebuild after a slight change to a header is distressing (more than 5 seconds, which would afford a full rebuild of a typical C program). I'm kind of surprised that the quicksort example took only around two to three times as long to compile compared to the no-GC approach. The example seems to be working on a compile-time-defined list though. I'd like to see how it scales on arbitrary runtime-defined input.


> I imagine

No need to imagine; the thesis mentioned in the README provides compile-time results: from no noticeable overhead to 2x larger compile times than Rust.

Quite reasonable if you take into account that one is one person's university thesis, and the other is a project with 100s of active developers, 100s of open PRs, etc.


From my understanding after skimming the paper, the analysis must compute "call contexts" of functions, which use information from call sites. I wonder if this will impede incremental compilation and modularity. As programs get bigger, perhaps this approach may not scale.


Might be worth knowing that there is a production-deployed programming language which - besides being a great language in many, many respects - will very soon (in the next release) have compile-time memory management (already working and performant for the stdlib, including async) in a stable release: Nim.

[1] https://forum.nim-lang.org/t/5734#35562

[2] https://forum.nim-lang.org/t/6125#37829


Looks like the ARC we've had in other languages (ObjC, Swift) for many years now. Any difference?


Yep, made a comment without really knowing the subject. Can I downvote it too? :P



