Hacker News | simonask's comments

Why do you mention lifetimes here? They are exclusively a compile-time pointer annotation, they have no runtime behavior, thus no overhead.

Dynamic dispatch in general is much, much faster than many people’s intuition seems to indicate. Your function doesn’t have to be doing much at all for the difference to become irrelevant. Where it matters is for inlining.

Dynamic dispatch in Rust is expected to be very slightly faster than in C++ (due to one fewer indirection, because Rust uses fat pointers instead of an object prefix).
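To make "fat pointer" concrete, here's a minimal sketch (trait and type names are my own): a `&dyn Trait` reference is two machine words, a data pointer plus a vtable pointer, so a virtual call loads the function pointer straight from the reference instead of first dereferencing the object to find a vtable.

```rust
use std::mem::size_of;

trait Speak {
    fn speak(&self) -> &'static str;
}

struct Dog;

impl Speak for Dog {
    fn speak(&self) -> &'static str {
        "woof"
    }
}

fn main() {
    // A &dyn reference is "fat": data pointer + vtable pointer,
    // twice the size of a thin reference.
    assert_eq!(size_of::<&dyn Speak>(), 2 * size_of::<&Dog>());
    let d: &dyn Speak = &Dog;
    assert_eq!(d.speak(), "woof");
}
```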


Rust is basically in the same place as C++, i.e. provenance rules are currently ad-hoc/conventional, meaning that pointer tagging is a grey area.

Nope. Rust stabilized strict provenance over a year ago. Some details about aliasing aren't tied down, but so long as you can obey the strict provenance rules you're golden today in Rust to hide flags in pointers etc.

https://blog.rust-lang.org/2025/01/09/Rust-1.84.0/#strict-pr...
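A minimal sketch of what stable strict provenance permits, using the `addr`/`map_addr` APIs stabilized in Rust 1.84; the tag bit and values here are just for illustration:

```rust
fn main() {
    let value: u32 = 42;
    let p: *const u32 = &value;
    // u32 is 4-byte aligned, so the low two bits are free for flags.
    assert_eq!(p.addr() & 0b11, 0);
    // map_addr changes the address while keeping the original
    // pointer's provenance, so the pointer stays valid.
    let tagged = p.map_addr(|a| a | 1);
    assert_eq!(tagged.addr() & 1, 1);
    // Strip the tag before dereferencing; provenance was never lost.
    let untagged = tagged.map_addr(|a| a & !1);
    assert_eq!(unsafe { *untagged }, 42);
}
```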


Oh damn, I missed this, thanks for the correction!

But LLVM's optimizations aren't sound and this affects Rust too.

Huh? Which optimizations?

LLVM is quite sure that, for example, two pointers to different objects are different. That's true even if in fact the objects both lived in the exact same spot on the stack (but at different times). That's... well it's not what Rust wants but it's not necessarily an unacceptable outcome and Rust could just ask for their addresses and compare those...

Except it turns out if we ask for their addresses, which are the same integer, LLVM remembers it believed the pointers were different and insists those are different too.

Until you call its bluff and do arithmetic on them. Then, in some cases, it snaps out of it and remembers that they're identical...

This is a compiler bug, but apparently such a tricky one to fix that after a few years I stopped even checking whether they'd fixed it... It affects C, C++, Rust, all of them as a result can be miscompiled by a compiler using LLVM [it's easiest to demonstrate this bug with Rust but it's the same in every language]. But as you've probably noticed, this doesn't have such an enormous impact that anybody stopped using LLVM.


"More safety than C" is an incredibly low bar. These are hygiene features, which is great, but Rust offers a paradigm shift. It's an entirely different ballpark.

Negative. For example, bounds checking is turned on by default in Zig, which prevents classes of buffer-overflow safety errors.

I don't think you've necessarily understood the scope and impact of the borrow checker. Bounds checking is just a sane default (hygiene), not a game changer.
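To illustrate the difference in kind (a hedged sketch, not from the linked project): the borrow checker statically rejects aliasing bugs that bounds checks never see, e.g. holding a reference into a Vec across a reallocation:

```rust
fn main() {
    let mut v = vec![1, 2, 3];
    let first = &v[0];
    // Uncommenting the next line is a compile error: `push` may
    // reallocate and invalidate `first`. In C++ this is silent
    // use-after-free; no bounds check catches it.
    // v.push(4);
    assert_eq!(*first, 1);
    v.push(4); // fine here: the borrow of `first` has ended
    assert_eq!(v.len(), 4);
}
```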

I mean, I'm the author of this?

https://github.com/ityonemo/clr

so yes, I understand that it's important. It doesn't need to be in the compiler though? I think it's likely the case that you also don't need to have annotations littering the language.


I wish you good luck! Successive attempts to achieve similar levels of analysis without annotations have failed in the C++ space, but I look forward to reading your results.

Yeah, AFAIK you can't easily intercept C++ at a meaningful IR in the same way as you can Zig. Zig's AIR is almost perfect for this kind of thing.

Rust uses LLVM because it's pretty great, not because you couldn't implement LLVM in Rust.

Maybe cranelift will eventually surpass LLVM, but there isn't currently much reason to push for that.


If anything, making cranelift an LLVM replacement would likely go counter to its stated goals of being a simple and fast code generator.

Thus Rust cannot really replace C++ when its reference toolchain depends on it.

If you define success for Rust as "everything is written in Rust", then Rust will never be successful. The project also doesn't pursue success in those terms, so it is like complaining about how bad a salmon is at climbing trees.

That is however how the Rust Evangelism Strike Force does it all the time, hence the kind of remarks I tend to make.

C++ is good for some things, warts and all, thanks to its ecosystem, and Rust is better at some others, like being much safer by default.

Both will have to coexist for decades to come, but we have this culture that doesn't accept matches that end in a draw; it is all about being in the right tribe.


So... Like, what? Do you agree that there is no technical reason for LLVM to be written in C++ over Rust?

Have you considered that you perhaps do more damage to the conversation by having it with this hypothetical strike force instead of the people that are actually involved in the conversation? Whose feelings are you trying to protect? What hypocrisy are you trying to expose? Is the strike force with us in the room right now?


I assert there is no reason to rewrite LLVM in Rust.

And I also assert that the claim that Rust is going to take over from C++ rings hollow as long as Rust depends on LLVM for its existence.

It also ignores that, for the time being, NVIDIA, Intel, AMD, Xbox, PlayStation, Nintendo, CERN, Argonne National Laboratory and similar hardly bother with Rust-based software for what they do day to day.

They have employees on WG14 and WG21, contribute to GCC/clang upstream, and so far have shown no interest in having Rust around in their SDKs or research papers.


> I assert there is no reason to rewrite LLVM in Rust.

Everybody agrees with that, though? Including the people writing rustc.

There's a space for a different thing that does codegen differently (e.g. Cranelift), but that's neither here nor there.

> And I also assert that the claim that Rust is going to take over from C++ rings hollow as long as Rust depends on LLVM for its existence.

There's a huge difference between "Rust depends on LLVM because you couldn't write LLVM in Rust [so we still need C++]" and then "Rust depends on LLVM because LLVM is pretty good". The former is false, the latter is true. Rust is perfectly suited for writing LLVM's eventual replacement, but that's a massive undertaking with very little real value right now.

Rust is young and arguably incomplete for certain use cases, and it'll take a while to mature enough to meet all use cases of C++, but that will happen long before very large institutions are able to migrate their very large C++ code bases and expertise. This is a multi-decade process.


The errno/GetLastError() pattern is a remnant from a time before threads were a thing. You could have multiple processes, but they were largely scheduled collaboratively (rather than preemptively).

In that world, things like global variables are perfectly fine. But then we got first preemptive scheduling and threads, then actual multicore CPUs, so global variables became really dangerous. Thread locals are the escape hatch that carried these patterns into the 21st century, for better or worse.
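A sketch of that escape hatch in Rust terms (`LAST_ERROR` and the helpers are my own names, not any real API): a thread local gives each thread a private copy of the "global", which is exactly how errno survives in a threaded world.

```rust
use std::cell::Cell;
use std::thread;

// errno-style state: one independent slot per thread.
thread_local! {
    static LAST_ERROR: Cell<i32> = Cell::new(0);
}

fn set_last_error(code: i32) {
    LAST_ERROR.with(|e| e.set(code));
}

fn last_error() -> i32 {
    LAST_ERROR.with(|e| e.get())
}

fn main() {
    set_last_error(13);
    // A fresh thread sees its own untouched copy, not 13.
    let other = thread::spawn(last_error).join().unwrap();
    assert_eq!(other, 0);
    assert_eq!(last_error(), 13);
}
```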


Indeed, and this change of philosophy shows up in the pthread (POSIX threads) API, which returns error codes directly (as a positive error number such as EINVAL) instead of setting the errno variable.

Not everybody is writing web apps.

You can also see it differently: if the language dictates a 4x increase in memory or CPU usage, it also moves up the deadline by which you must upgrade the machine or rearchitect your code into a distributed system, by roughly that same factor.

Previously, delivering a system (likely in C++) that consumed 4x fewer resources was an effort that cost developer time at a much higher factor, especially if you had uptime requirements. With Rust and similar low-overhead languages, the ratio changes drastically. It is much cheaper to deliver high-performance solutions that scale to the full capabilities of the hardware.


The unique geography of the Scandinavian peninsula combined with very low population density makes Sweden a less instructive example for achieving zero emissions in other geographies, and I doubt Swedes would be cool with expanding hydro and nuclear to the scale Germany would require.

But yeah, I mean, good job and all. The answer for the rest of the continent is going to be wind and solar in the medium term, and probably more nuclear in the long term.


Totally. Tech-neutral state incentives are the way to go for sure; everybody has a different environment and context to consider (the same applies within Sweden). Southern Europe has very different opportunities (a much better situation for solar, for example).

Anyway, my comment was in response to the extreme comment (parent) about how all rich countries became rich using fossil fuels, implying that that's more or less the only way to transition from poor to rich. I think it's important to note that that's not necessarily the case. You don't need to destroy the environment to go from poor to rich, even though a lot of countries historically have done it that way (it's also noteworthy that they did so without knowing the consequences for the environment).


You know what, I’ve heard people say this and thought “OK, maybe these other languages with GCs and huge runtimes really do something magical to make async a breeze”.

But then I actually tried both TypeScript and C#, and no. Writing correct async code in those languages is not any nicer at all. What the heck is “.ConfigureAwait(false)”? How fun do you really think debugging promise resolution is? Is it even possible to contain heap/GC pressure when every `await` allocates?

Phooey. Rust async is doing just fine.


Try golang, where they did the only sane thing: everything is async from the very beginning, no function colouring

I think practical experience with Go reveals that this choice being “the only sane thing” is highly debatable. It comes with huge drawbacks.

Of course it has drawbacks, everything does, but my practical experience has been hugely in favor of what golang is doing, at least in terms of cognitive load and code simplicity. It is very much worth it in many, many cases.

In .NET 11 C# async management moved to the runtime, mostly eliminating heap allocations and also bringing clean stack traces. You really only need to think about ConfigureAwait(false) when building shared libraries or dealing with UI frameworks (even there you mostly don't need it).

You speak in the past tense, but .NET 11 is not released yet at time of writing. Runtime-async is not the current reality.

True, it's currently in preview, but .NET is already very efficient with async today as well - https://hez2010.github.io/async-runtimes-benchmarks-2024/ (.NET 9 tested there).

Sure, it’s really fine for what it does, but it is not significantly easier to deal with than Rust async, and remains fundamentally unsuited in several scenarios where Rust async works really well.

How does it compare to Kotlin async? I find Kotlin generally hits a good balance across the board.

Unless you are writing GUI code, ConfigureAwait() shouldn't be in your source code.

I would also have liked to see some motivational examples, but I think the most interesting upside of an effect system is composability.

Rust is actually really unique among imperative languages in its general composability - things just compose really well across most language features.

The big missing pieces for composability are higher-kinded types (where you could be generic over Option, Result, etc.), and effects (where you could be generic over async-fn, const-fn, hypothetical nopanic-fn, etc.)

The former becomes obvious with the amount of interface duplication between types like Option and Result. The latter becomes obvious with the number of variants of certain functions that essentially do the same thing but with a different color.
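The Option/Result duplication is easy to see in a sketch; the hypothetical generic signature in the comment below is not valid Rust, which is precisely the gap:

```rust
// Option and Result each ship their own `map`, `and_then`, etc. with
// identical shapes. Without higher-kinded types you cannot write a
// single generic version over both, something like:
//   fn map<F<_>, A, B>(fa: F<A>, f: impl Fn(A) -> B) -> F<B>
// (not valid Rust; `F<_>` is the missing higher-kinded parameter).
fn main() {
    let o: Option<i32> = Some(2);
    let r: Result<i32, String> = Ok(2);
    // Same operation, same closure, two separate inherent methods:
    assert_eq!(o.map(|x| x * 10), Some(20));
    assert_eq!(r.map(|x| x * 10), Ok(20));
}
```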


Some uses of higher-kinded types (though not all of them) can be addressed by leveraging Generic Associated Types (GAT).
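A sketch of one such use, a "lending iterator" whose items borrow from the iterator itself, expressible only with GATs (stable since Rust 1.65); trait and type names here are illustrative, not from any particular crate:

```rust
trait LendingIterator {
    // The GAT: the item type is parameterized by the borrow of `self`.
    type Item<'a> where Self: 'a;
    fn next(&mut self) -> Option<Self::Item<'_>>;
}

// Yields overlapping two-element windows over a slice, one at a time.
struct Windows<'s> {
    data: &'s [u32],
    pos: usize,
}

impl<'s> LendingIterator for Windows<'s> {
    type Item<'a> = &'a [u32] where Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>> {
        if self.pos + 2 > self.data.len() {
            return None;
        }
        let w = &self.data[self.pos..self.pos + 2];
        self.pos += 1;
        Some(w)
    }
}

fn main() {
    let data = [1, 2, 3];
    let mut it = Windows { data: &data, pos: 0 };
    assert_eq!(it.next(), Some(&[1, 2][..]));
    assert_eq!(it.next(), Some(&[2, 3][..]));
    assert_eq!(it.next(), None);
}
```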

Part of the problem is that the "things just compose really well" point becomes gradually less and less applicable as you involve the lower-level features Rust must be concerned with. Abstractions start to become very leaky, and it's not clear how best to patch things up without a large increase in complexity. A lot of foundational PL research is needed to address this sensibly.


Not so much composability in async Rust.

> Rust is actually really unique among imperative languages in its general composability

Can you compare it to some other imperative language? Because I really don't see anything particularly notable in Rust that would give it this property.


I’m mainly comparing to the progeny of C, where the biggest difference is the fact that almost everything is an expression in Rust.

No need for ternary operators. C# unsafe blocks can only appear as statements (so you cannot delegate from a safe to an unsafe constructor, e.g.). C++ cannot return from the middle of an expression.

A related aspect is the type system, which composes with expressions in really interesting ways, so things like constant array sizes can be inferred.
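A small sketch of what expression orientation plus type inference buys (all names here are my own):

```rust
// Array lengths flow through const-generic inference: N is deduced
// from the argument at the call site.
fn sum<const N: usize>(xs: [u32; N]) -> u32 {
    xs.iter().sum()
}

fn main() {
    let cond = true;
    // No ternary operator needed: `if` yields a value directly.
    let x = if cond { 1 } else { 2 };
    assert_eq!(x, 1);
    // Blocks are expressions too, including unsafe blocks, so unsafe
    // results can feed straight into initializers.
    let y = unsafe { std::mem::transmute::<u32, i32>(5) };
    assert_eq!(y, 5);
    // N = 3 is inferred here.
    assert_eq!(sum([1, 2, 3]), 6);
}
```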


English has plenty of Unicode — claiming otherwise is such a cliché…

Unicode is a requirement everywhere human language is used, from Earth to the Boötes Void.


I am talking about coded values, like Status = 'A', 'B' or 'C'

Taking double the space for this stuff is a waste of resources, and nobody usually cares about extended characters here, in English-language systems at least; they just want something more readable than integers when querying and debugging the data. End users will see longer descriptions joined from code tables or from app caches, which can have Unicode.


It's way better to just use a DBMS that supports enums. I know SQL Server isn't one of those, but I still don't store my coded values as strings.

How do you store them? Also, enums are normally not user-configurable. They would be a good feature to have, but they don't work well in many cases.

Typical code tables with code, description and anything else needed for that value which the user can configure in the app.

Sure, you can use integers instead of codes, but now all your results look like 1, 2, 3, 4 for all your coded columns when trying to debug or write ad-hoc stuff. Also, ints are not variable length, so you're wasting space for short codes, and you have to know ahead of time whether it's going to be 1, 2, 4 or 8 bytes.


Enums are for non user-configurable values.

For configurable values, obviously you use a table. But those should have an auto-integer primary key and if you need the description, join for it.

Ints are by far the more efficient way to store and query these values -- the length of the string is stored as an int, and variable-length values really complicate storage and access. If you think strings save space or time, that is not right.


>Enums are for non user-configurable values

In the systems I work with most coded values are user configurable.

>But those should have an auto-integer primary key and if you need the description, join for it.

Not ergonomic when querying data or debugging: things like the postal state are 11 instead of 'NY'.

select * from addresses where state = 11, no thanks.

Your whole result set becomes a bunch of ints that can be easily transposed, causing silly errors. Of course I have seen systems that use GUIDs to avoid collision; boy is that fun. Just use varchar, or char if you're penny-pinching and OK with fixed sizes.

>the length of the string is stored as an int

No, it's stored as a smallint (2 bytes). So a single-character code is 3 bytes rather than a 4-byte int, and 2 chars is the same as an int. They do not complicate storage access in any meaningful way.

You could use smallint or tinyint for your primary key and I could use char(2) and char(1) and get readable codes if I wanted to really save space.


> They do not complicate storage access in any meaningful way.

Sure they do, because now your row / index is variable length rather than fixed length. Way more complicated. Even 3 bytes is way more complicated to deal with than 4 bytes.

> select * from addresses where state = 11, no thanks.

I will agree that isn't fun. Is it still the trade-off I make? Absolutely. And it's not really that big of a problem; I just do a join. It also helps prevent people from using codes instead of querying the database for the correct value -- what's the point of user-configuration if someone hard-codes 'NY' in a query or in the code?


>Sure they do, because now your row / index is variable length rather than fixed length. Way more complicated.

Come on, it's literally a 2-byte per-column header in the row, so it just sums the column lengths to get the offset; it does the same thing for fixed-length columns except it gets the column length from the schema.

It's not much more complicated than a fixed-length column; the only difference is that the column length is stored in the row rather than the schema. I am not sure where you are getting the idea that it is way more complicated, nor the 3-vs-4-byte thing. The whole row is a variable-length structure and designed as such; null values change the row length for fixed and variable data types alike and have to be accounted for, since a null takes up no space in the column data, only in the null bitmap.

> what's the point of user-configuration if someone hard-codes 'NY' in a query or in the code

Because it doesn't matter: 'NY' isn't changing, just like the int 11 wouldn't change. But 'NY' is way easier to understand, easier to catch mistakes with, easier to search for in code without hitting a bunch of nonsense, and easier to distinguish when 10 coded columns sit next to each other in a result set.

I prefer my rows to be a little more readable than 1234, 1, 11, 2, 15, 1, 3, and the users do too.

I have had my fill of transposition bugs where someone accidentally uses the wrong int PK from a different table and still gets a valid but random result that passes the foreign-key check; almost enough to make me want to use GUIDs for PKs. Almost. At least with coded values it is easier to spot, because even with single-character codes people tend to pick things that make sense for the column values, you know, 'P' for pending, 'C' for complete, etc., vs 1, 2, 3, 4 used over and over across every different column with an auto-increment.


> Come on its literally...

You're the one saying a 2-character string is somehow a space savings. If we're going to split hairs that finely, then you have to know that any row with a variable-length string makes the entire row/index variable length, and that is a net storage and performance loss. It's worse in every way than a simple integer. I will admit that it ultimately doesn't matter. But I'd also argue using an nvarchar in place of varchar for this also doesn't matter. It's not just premature optimization, it's practically useless optimization.

> Because it doesn't matter, 'NY' isn't changing just like 11 the int wouldn't change, but 'NY

That's not what happens. What happens is that somebody renames New York to New Eburacum, and now your code doesn't match the value, which just adds more confusion.

But I'll grant you that it's totally fine. It's even more fine if you don't use varchar and instead use char(x).


>You're the one saying a 2 character string is somehow a space savings. If we're going to split hairs that finely then you have to know that any row with a variable length string makes the entire row/index variable length and that is a net storage and performance loss.

The row is always a variable-length structure: it has flags noting how many columns have values and whether there is a variable-length section at all. Only a row with no variable-length fields whatsoever has no variable-length section, and that is a bit-flag check in the header.

You are making a non-argument: variable-length fields can be a space savings over an int with single-character codes, which is very common, and they do not impact performance in any meaningful way. Besides, one could use fixed-length chars and still get the other benefits I mentioned while having exactly the same space usage and processing as fixed-length ints.

>That's not what happens but what happens is that somebody renames New York to New Eburacum

Changing the descriptive meaning of an entry causes all sorts of problems, even more so if it is an int, because it's completely opaque; it's much harder to see an issue in the system when everything is a bunch of ints that do not correlate in any way with their meaning.

Changing the description to something with the same meaning worded differently is usually not an issue and still gives good debug visibility into the value. If you and your users consider New Eburacum synonymous with New York, then having the code stay 'NY' should not be an issue, and it will still be obvious when querying the data.

Unless someone is using the short code in a user-visible way and it has to be updated. State is a common case, and nobody is changing state names or codes, because they are a common, settled natural key.

In the rare situation where this actually needs to be done, one can update the existing data; this is not an issue in practice. You have to be extremely cautious updating the description of a code, because much data was entered under the previous description and the meaning it carries. Having the code carry some human meaning makes it more obvious to maintainers that this should be done with care; many times it would involve deprecating the old one and creating a new one with a different code, because they have different meanings. Having a table instead of an enum allows other columns to hold this metadata.

This is not the same issue as say using a SSN for a person ID.


Please take literally one course.

Do NOT use mnemonics as primary keys. It WILL bite you.


https://en.wikipedia.org/wiki/Natural_key You should have learned about this in your courses.

Calm down, I am not suggesting using this for actual domain entity keys; these are used in place of enums and have their advantages. I have been doing this a long time and it has not bitten me, and I have also seen many other systems designed this way working just fine.

Using an incrementing surrogate key for, say, the postal state code serves no purpose other than making things harder to use and debug. Most systems have many code values like this, and using surrogate keys would lead to a bunch of overlapping, hard-to-distinguish int data that causes all sorts of issues.


The way to do enums in SQL (generally, not just MSSQL) is another table. It's better that they don't offer several ways to do the same thing.

Mostly agree. Separate tables can have multiple attributes besides a text description, and can easily be exposed to the application for modification, so users or administrators can add and modify codes.

A common extra attribute for a coded value is something for deprecation / soft delete, so a code can be marked as no longer valid for future data while existing data keeps it; also date ranges it is valid for, parent-child code relationships, etc.

Enums would be a good feature, but they have a much more limited use case: static values you know ahead of time that need no other attributes, where values cannot be removed even if never used, nor old data migrated to new values.

Common real-world codes like the US postal state can take advantage of there being agreed-upon codes, such as 'NY' for 'New York'.


While I generally would prefer lookup tables, it's much easier to sell dev teams on "it looks and acts like a string - you don't have to change anything."

Those are all single byte characters in UTF-8.

We are talking nvarchar here. Yes, UTF-8 solves this issue completely, and MSSQL supports it nowadays with varchar.

But nvarchar is UTF-16

No. Look closer.

Just to be pedantic, those characters are in 'ANSI'/CP1252 and would be fine in a varchar on many systems.

Not that I disagree — Win32/C#/Java/etc have 16-bit characters, your entire system is already 'paying the price', so weird to get frugal here.


My comment contains two glyphs that are not in CP1252.

Also, it's less awkward to make it right the first time, instead of explaining why someone can't type their name or an emoji.

Specifically not talking about a name field

> Unicode is a requirement everywhere human language is used

Strange then how it was not a requirement for many, many years.


Oh, it was. It was fun being unable to type a euro sign or the name Seán without it being garbled. Matched quotation marks weren't possible either, and arguably computer limitations killed off naïve and café too.

Don’t confuse people groaning and putting up with limitations as justifying those limitations.


In Portugal it always was; that is why we got to use eh for é, ah for á, he for è, c, for ç, and many other tricks.

Shared by other European languages, like ou for ö in German, kalimera for καλημέρα, and so on all around the world in non-English speaking countries during the early days of computing.


Or rather, computers had inadequate support.

It was a mess back then though. Unicode fixed that.

I'm not convinced that Unicode fixed anything. I was kind of hoping, way back when, that everyone would adopt ASCII, as a step to a more united world. But things seem to have got more differentiated, and made things much more difficult.

The options were never ASCII or Unicode, though. Before Unicode we had ASCII plus lots of different incompatible encodings that relied on metadata to be properly rendered. That's what Unicode fixed.

Besides, I like being able to put things like →, €, ∞ or ∆ into text. With ASCII, a lot of things that are nowadays trivial would need markup languages.


For whom? Certainly not any of the humans trying to use the computer.
