Languages using moving garbage collectors, like C# and Java are particularly goo...

raggi · on April 10, 2024

Yup, and exposing pointers in a GC language was a mistake, as it blocks this. It limits (efficient) applications to small deployments.

neonsunset · on April 10, 2024

Why would it impose such a restriction?

raggi · on April 10, 2024

In order to move objects, you need to stop the world and update everything that will ever point to them: pointers, pointer aliases, arithmetic bases, and arithmetic offsets. This quickly becomes intractable. Strictly speaking, it's not just the pointers themselves, but the fact that pointers can be used arithmetically in various ways, even though the most obvious ways are disallowed. The obvious example is unsafe.Pointer and uintptr. There are some guards there (for example, go vet flags converting a uintptr variable back to an unsafe.Pointer), but mix in some slices or reflect usage and you can quickly get into shaky territory. I believe you can achieve badness even without using the unsafe package.
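To make the uintptr point concrete, here is a minimal Go sketch based on the conversion rules in the unsafe package documentation. The struct `pair` and the helpers `secondField`/`secondFieldAdd` are hypothetical names for illustration only. The permitted pattern keeps the uintptr round-trip inside a single expression (or avoids uintptr entirely with unsafe.Add, Go 1.17+); the commented-out variant parks the address in a uintptr variable, which is exactly the kind of integer-disguised pointer a moving collector could not safely update.

```go
package main

import (
	"fmt"
	"unsafe"
)

// pair is a hypothetical struct used only for this illustration.
type pair struct {
	a, b int64
}

// secondField returns a pointer to p.b via pointer arithmetic. The conversion
// to uintptr, the addition, and the conversion back happen in one expression,
// which is one of the patterns the unsafe package documents as valid.
func secondField(p *pair) *int64 {
	return (*int64)(unsafe.Pointer(uintptr(unsafe.Pointer(p)) + unsafe.Offsetof(p.b)))
}

// secondFieldAdd does the same with unsafe.Add (Go 1.17+), which never
// materializes the address as a plain integer.
func secondFieldAdd(p *pair) *int64 {
	return (*int64)(unsafe.Add(unsafe.Pointer(p), unsafe.Offsetof(p.b)))
}

// The invalid variant (not compiled here) stores the address in a uintptr
// variable first. To the GC a uintptr is just an integer: it is not updated if
// the object moves and it does not keep the object alive. The compiler accepts
// this, but `go vet` reports "possible misuse of unsafe.Pointer":
//
//	u := uintptr(unsafe.Pointer(p)) + unsafe.Offsetof(p.b)
//	q := (*int64)(unsafe.Pointer(u))

func main() {
	p := &pair{a: 1, b: 2}
	*secondField(p) = 20
	fmt.Println(p.a, p.b) // 1 20
	*secondFieldAdd(p) = 30
	fmt.Println(p.a, p.b) // 1 30
}
```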

neonsunset · on April 10, 2024

Interesting. I did not realize it was such a problem in the JVM. EDIT: in Go, not the JVM; somehow it is always Go having trouble in the systems programming domain.

In .NET, exposing pointers does not seem to be a performance issue. In fact, they are used by almost all performance-sensitive code to match or outperform C++, e.g. when writing SIMD code.

There are roughly three ways to pass data by reference:

- Object references (which work the same way as in the JVM, compressed pointers notwithstanding)

- Unsafe pointers

- Byref pointers

While object references don't need explanation, the way the last two work is important.

Unsafe pointers (T*) are plain C pointers and are ignored by the GC. They can originate from unmanaged memory (FFI, NativeMemory.Alloc, the stack, etc.) or from objects (the fixed statement and/or other unsafe means). When a pointer into an object's interior is required (e.g. a byte* from a byte[], or a MyStruct* from a MyStruct field in an object), that object is pinned by setting a bit in its header, the same bit the GC coincidentally uses during the mark phase (concurrent or otherwise) to indicate live objects. A pinned object is not moved during the GC's relocation and/or compaction phase (again, concurrent or otherwise; not every GC is stop-the-world).

In fact, objects are moved only when the GC deems it profitable, either by moving them to a heap belonging to a different generation or by compacting the heap when a combination of heuristics prompts it to reduce fragmentation. Over the years, the cost of object pinning has dropped to the point that it is no longer a concern in practice, partly thanks to numerous GC improvements and partly thanks to the next feature.

Byref pointers (ref T) are like regular unsafe pointers, except that the GC specially tracks them, which lets them point to object interiors without writing unsafe code or pinning the object. You can still write unsafe code with them via "byref arithmetic", which CoreLib and other performance-sensitive code does (I'm writing a UTF-8 string library that relies heavily on it for performance). Like regular unsafe pointers, they can also point to arbitrary non-object memory: the stack, FFI, NativeMemory.Alloc, etc. They are what Span<T> is built on (internally it is a ref T plus an int length), which allows arbitrary memory ranges to be wrapped in a slice-like type that all standard library APIs can then safely consume (you can int.Parse a Span<byte> slice from a stack-allocated buffer, an FFI call, or a byte[] array without any overhead). Byref pointers can also be pinned by being stored in a stack frame slot reserved for pinned addresses, which the GC is aware of. For stack and unmanaged memory this is a no-op, and for object memory it only matters during a GC itself. Of course, nothing is ever free in software engineering, but the overhead of byrefs is considered insignificant compared to object pinning.

raggi · on April 10, 2024

Java's behavior is, in general/abstract terms, similar to .NET's: there are types and APIs to pin memory for FFI and aliasing use cases, but normal references are abstracted, enabling the GC to perform compaction/moves for non-pinned objects. The behavior I described in the parent post was Go's, which exposes pointers directly rather than references.

It is still possible to run into fragmentation challenges with both the JVM and .NET under specific circumstances, but there are also a lot of tunable parameters and so on to help address the problem without substantially modifying the program design to fit an underspecified contract with the memory subsystem. In Go there are few tunables, and the GC's behavior and its interaction with application code at this level of concern are even less specified.