Well, C is pretty bad, compared to modern languages. It's full of buffer overflows, undefined behaviour, etc. Just because something is popular doesn't mean it is well designed.
Man, will this sentiment ever die... You know, as Stroustrup said, there are only two kinds of languages: the ones everybody bitches about and the ones nobody uses.
Please guys, demonstrate a better way. Make that better language.
There is a reason most kernels are developed in C, most databases are developed in C, most (AAA) games are developed in C. And it's not "C has the necessary mind share". Man, these people build tons of scripting languages and DSLs to get their games done. If they knew a better (more practical) approach than using C in the critical places, they would use it.
Yes, you get buffer overflows and memory management bugs. Much more so if you are a beginner, but also if you're very experienced. But people just haven't found a practical alternative.
How would you even be able to define what a valid buffer region is, when so many functions are basically custom allocators, declaring a certain sub-slice of the buffer they were given (taking pointers/indices) and handing it to the next function? It's just not practically possible to make a formalism that guards against buffer overflows here. You could make a "slice"/"buffer" datatype, but that would just increase the line noise and keep the bugs at the countless locations of wrapping/unwrapping.
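To make the pattern concrete, here is roughly what I mean, sketched in Go syntax purely for illustration (`readRecord` is a made-up name; in C this would be pointer/length pairs):

```go
package main

import "fmt"

// readRecord stands in for one of those "custom allocator" style
// functions: it claims the first n bytes of buf as its own record
// and hands the rest down to the next consumer.
func readRecord(buf []byte, n int) (record, rest []byte) {
	return buf[:n], buf[n:]
}

func main() {
	buf := []byte("HEADERpayload")

	header, rest := readRecord(buf, 6)
	payload, _ := readRecord(rest, 7)

	fmt.Printf("header=%q payload=%q\n", header, payload)
}
```

Even with a slice type, every one of those `n` values is a place to get the arithmetic wrong; the slice just moves where the bug shows up.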
> There is a reason most kernels are developed in C, most databases are developed in C, most (AAA) games are developed in C. And it's not "C has the necessary mind share".
The only thing that is still pretty much all C is kernels. Game engines are usually C++ (a much safer language). Databases are written in all sorts of languages. Sure, the big old ones are C or C++, but that's because they are old!
> But people just haven't found a practical alternative.
They have. Garbage collection works in a lot of cases. Rust has its lifetime & borrowing system. C++ has proper smart pointers (finally).
> It's just not practically possible to make a formalism that guards against buffer overflows here. You could make a "slice"/"buffer" datatype, but that would just increase the line noise and keep the bugs at the countless locations of wrapping/unwrapping.
I take it you haven't used Go. It has slices. They work fine. No buffer overflows.
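A minimal sketch of what that looks like in practice (the variable index is deliberately out of range):

```go
package main

import "fmt"

func main() {
	buf := make([]byte, 8)
	sub := buf[2:6] // a sub-slice sharing buf's backing array

	fmt.Println(len(sub), cap(sub)) // 4 6

	// Every access is bounds-checked against the slice's length,
	// so this panics instead of silently touching a neighbour's memory:
	i := 10
	_ = sub[i] // panic: runtime error: index out of range [10] with length 4
}
```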
If you read my post carefully, you'll have noticed that C here actually means C(++). And a great majority of developers in these domains actually write C++ like this: C(++).
Rust, Go, Swift. Show me the serious kernels, databases, AAA games. Rust hasn't even gotten off the hype train yet.
Garbage collection is a bit like exceptions. It works for some cases, but not for serious engineering of long-lived applications. If you have ever written a Java application with millions of little objects, you know what I mean. For starters, we cannot even afford to pay for the extra memory overhead that comes with each object. And beyond that, at some point we want the application to do exactly that: advance, instead of doing GC only.
> I take it you haven't used Go. It has slices. They work fine. No buffer overflows.
These slices all come with the (huge) overhead of adding a reference to the original runtime object, and on top of that each array access is checked. Correct?
> but not for serious engineering of long-lived applications
Oh come on. Does Lucene (or Solr or Elasticsearch, built on top of it) not qualify as serious engineering? Elasticsearch is quite successful, and is indeed intended to be used as a long-lived application!
Does this mean that the likes of Lucene don't run into GC issues? Of course not. I've certainly diagnosed problems in Elasticsearch related to GC (which, more often than not, is a symptom of something else going wrong), but saying it's not qualified for "serious engineering" is just patently ridiculous.
And that's only one example. There are loads more!
> These slices all come with the (huge) overhead of adding a reference to the original runtime object
Yes, I knew I shouldn't have pulled the "serious engineering" card going in... But there I go, giving a mostly clueless answer to a high-profile HN user :-)
I don't know Elasticsearch, but if this is something like a database where millions of objects are tracked (like in an RDBMS, or in a 3D game coded by an inexperienced coder who likes to isolate everything down to the vertex or scalar level into "objects"), then I would assume at least one of the following applies:
- The objects in the datastore are not represented as individual runtime objects after all, or
- The GC for objects in the datastore is highly tuned (GC only done manually, at certain points), and the memory overhead of having individual DB objects represented by runtime objects is just accepted.
I mean, I did finish said Java application, but I only got good performance from it after transforming it into an unreadable mess based on SoAs of int[] (which means unboxed integers, not objects) and lots of boilerplate code. It would have been easier to do in C, hands down (the language was not my own choice).
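To illustrate the kind of transformation I mean (sketched in Go rather than Java, with made-up names; in the Java version each field was an int[]):

```go
package main

import "fmt"

// Before: one garbage-collected heap object per element (in Java,
// one allocation plus an object header for every single particle).
type Particle struct{ X, Y, Z int32 }

// After: a struct of arrays — flat integer slices, with a plain
// index playing the role of the old object reference.
type Particles struct {
	X, Y, Z []int32
}

func (p *Particles) Add(x, y, z int32) int {
	p.X = append(p.X, x)
	p.Y = append(p.Y, y)
	p.Z = append(p.Z, z)
	return len(p.X) - 1
}

func main() {
	var ps Particles
	id := ps.Add(1, 2, 3)
	fmt.Println(ps.X[id], ps.Y[id], ps.Z[id]) // 1 2 3
}
```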
> and object/GC overhead? It's GC-tracked objects after all, right? (again, I admit to knowing next to nothing about Go's runtime)
Go has value semantics. So when you have a `[]T` ("slice of T"), what you have is 24 bytes on the stack (on 64-bit platforms) consisting of the aforementioned `SliceHeader` type. So there's no extra indirection there, but there might be a write barrier lurking somewhere. :-)
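A quick sketch of both claims (sizes are for 64-bit platforms):

```go
package main

import (
	"fmt"
	"unsafe"
)

func main() {
	s := []int32{1, 2, 3, 4}

	// The slice value itself is just pointer + len + cap:
	// three machine words, i.e. 24 bytes on a 64-bit platform.
	fmt.Println(unsafe.Sizeof(s)) // 24

	// Re-slicing copies that 24-byte header; both headers
	// point at the same backing array.
	t := s[1:3]
	t[0] = 42
	fmt.Println(s[1]) // 42
}
```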
> I don't know Elasticsearch, but if this is something like a database where millions of objects are tracked
Elasticsearch is built on top of Lucene, which is a search engine library, which is itself a form of database. I don't think there's any inherent requirement that a database needs to have millions of objects in memory at any given point in time. There are various things you can ask Elasticsearch to do that will invariably cause many objects to be loaded into memory; and usually this ends up being a problem that you need to fix. It exposes itself as "OMG Elasticsearch is spending all its time in GC," but the GC isn't the problem. The allocation is.
In normal operation, Elasticsearch isn't going to load millions of objects into memory. Instead, it's going to read the objects it needs from disk, and of course, the on-disk data structures are cleverly designed (like any database's). This in turn relies heavily on the operating system's page cache!