> but not for serious engineering of long-lived applications
Oh come on. Does Lucene (or Solr or Elasticsearch, built on top of it) not qualify as serious engineering? Elasticsearch is quite successful, and is indeed intended to be used as a long-lived application!
Does this mean that the likes of Lucene don't run into GC issues? Of course not. I've certainly diagnosed problems in Elasticsearch related to GC (which, more often than not, is a symptom of something else going wrong), but saying it doesn't qualify as "serious engineering" is just patently ridiculous.
And that's only one example. There are loads more!
> These slices all come with the (huge) overhead of adding a reference to the original runtime object
Yes, I knew I shouldn't have pulled the "serious engineering" card going in... But there I go, giving a mostly clueless answer to a high-profile HN user :-)
I don't know Elasticsearch, but if it is something like a database where millions of objects are tracked (as in an RDBMS, or in a 3D game coded by an inexperienced programmer who likes to isolate everything, down to the vertex or scalar level, into "objects"), then I would assume at least one of the following applies:
- The objects in the datastore are not represented as individual runtime objects after all.
- The GC for objects in the datastore is highly tuned (GC run only manually, at certain points), and the memory overhead of having individual DB objects represented by runtime objects is just accepted.
I mean, I did finish said Java application, but I got good performance from it only after transforming it into an unreadable mess based on SoAs (structs of arrays) of int[] (which means unboxed integers, not objects) and lots of boilerplate code. It would have been easier to do in C, hands down (the language was not my own choice).
> and object/GC overhead? It's GC tracked objects after all, right? (again, I admit to knowing next to nothing about Go's runtime)
Go has value semantics. So when you have a `[]T` ("slice of T"), then what you have is 24 bytes on the stack consisting of the aforementioned `SliceHeader` type. So there's no extra indirection there, but there might be a write barrier lurking somewhere. :-)
> I don't know elasticsearch, but if this is something like a database where millions of objects are tracked
Elasticsearch is built on top of Lucene, which is a search engine library, which is itself a form of database. I don't think there's any inherent requirement that a database needs to have millions of objects in memory at any given point in time. There are various things you can ask Elasticsearch to do that will invariably cause many objects to be loaded into memory; and usually this ends up being a problem that you need to fix. It exposes itself as "OMG Elasticsearch is spending all its time in GC," but the GC isn't the problem. The allocation is.
In normal operation, Elasticsearch isn't going to load millions of objects into memory. Instead, it's going to read the objects it needs from disk, and of course, the on-disk data structures are cleverly designed (like any database). This in turn relies heavily on the operating system's page cache!
> These slices all come with the (huge) overhead of adding a reference to the original runtime object
Huh? This is the representation of a slice: https://golang.org/pkg/reflect/#SliceHeader --- It's pretty standard for a dynamically growable region of memory.