This comes up again and again in one form or the other, yet new languages still seem to be making the same mistake. Of all languages I've touched, Rust seems to be the only one that mostly circumvents this problem. Are there other good examples?
> Rust seems to be the only one that mostly circumvents this problem.
The Rust hype is getting ridiculous here. There are plenty of languages with non-nullable references as first-class, and optionals for the nullable case.
(...And I say this as a Rust fan myself, for what it's worth.)
Imho, Rust is an awkward language because it positions itself as a systems language, yet it makes low-level stuff more difficult (there's even a book teaching how to implement doubly linked lists in Rust [1]), and hence more prone to mistakes. At the same time, people are using Rust to build non-systems programs, where other languages would be more appropriate (e.g. those with garbage collectors). I don't think it is a good idea that Rust is promoted as the language that will rule them all; in my opinion, it is still a research language.
Linus Torvalds said the following about Rust [2]:
[What do you think of the projects currently underway to develop OS kernels in languages like Rust (touted for having built-in safeties that C does not)?]
> That's not a new phenomenon at all. We've had the system people who used Modula-2 or Ada, and I have to say Rust looks a lot better than either of those two disasters.
> I'm not convinced about Rust for an OS kernel (there's a lot more to system programming than the kernel, though), but at the same time there is no question that C has a lot of limitations.
> there's even a book teaching how to implement doubly linked lists in Rust [1]
Doubly-linked lists are an awkward example because the "safety" of a doubly-linked list as a data structure involves fairly complex invariants that Rust can't even keep track of at this point, much less check independently. These things are exactly why the unsafe{} escape-hatch exists and is actively supported. But just looking at the amount of unsafe code in common Rust projects should suffice to figure out that this is not the common case, at all.
> At the same time, people are using Rust to build non-systems programs, where other languages would be more appropriate (e.g. those with garbage collectors).
Garbage collectors are good for one thing, and one thing only: keeping track of complex, spaghetti-like reference graphs where cycles, etc. can arise, perhaps even as a side effect of, say, implementing some concurrency-related pattern. Everything else is most likely better dealt with by a Rust-like system with optional support for reference counted data.
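For instance, here's a minimal sketch of that reference-counted approach in Rust (names are illustrative): Rc provides shared ownership, and a Weak back-reference breaks the parent/child cycle, so the graph is reclaimed without any tracing collector.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// A child keeps a Weak back-reference to its parent, so the
// parent/child link never forms a strong reference cycle.
struct Node {
    value: i32,
    parent: RefCell<Weak<Node>>,
}

fn main() {
    let parent = Rc::new(Node { value: 1, parent: RefCell::new(Weak::new()) });
    let child = Rc::new(Node { value: 2, parent: RefCell::new(Weak::new()) });
    *child.parent.borrow_mut() = Rc::downgrade(&parent);

    // While the parent is alive, the Weak reference upgrades to Some(...).
    assert_eq!(child.parent.borrow().upgrade().map(|p| p.value), Some(1));

    drop(parent);
    // After the last strong reference is gone, upgrade() returns None:
    // the memory was reclaimed deterministically, no GC pause involved.
    assert!(child.parent.borrow().upgrade().is_none());
}
```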
That's without even mentioning the other advantages that a Rust-like ownership system provides over a GC-only language. See e.g. this recent post for some nice examples: https://llogiq.github.io/2020/01/10/rustvsgc.html
> Doubly-linked lists are an awkward example because the "safety" of a doubly-linked list as a data structure involves fairly complex invariants that Rust can't even keep track of at this point.
Perhaps it's just me, but I'd like to assume that my language does not treat any algorithm found in a basic algorithms course (e.g. Sedgewick) as awkward.
Eh, I have no problem with the idea that data structures that work in C are awkward in different paradigms. Many data structures are awkward in functional programming. Lots of things are awkward in C that are easier in other languages.
How many times have you used doubly linked lists outside a CS101 programming exercise? Even if you did, it’d usually be trivial to implement an array-index version or just use unsafe. Basically it seems you give up almost nothing for memory safety.
For sure, you can avoid awkwardness by not statically verifying memory usage & invariants (C++, etc.) or by using a GC'd language. Rust's ownership and borrowing rules are limited, but simple enough for someone to internalize them quickly.
There's a pretty vast difference between human simple and computer simple. Rust requires that you prove memory safety, or use unsafe. That's a different problem than just informally ensuring invariants are met.
You could probably pull in more advanced type theory research for more nuanced ownership, but I'd bet the language would be harder to understand overall (Haskell disease).
I’m a Rust fan, but garbage collection is about relieving the developer from having to think about memory management almost entirely, not about performance.
I don't do the sort of programming recently where it matters, but when I read the debates on GC on HN, I think, why not a language where there is a GC, but it is "cooperatively scheduled" - you explicitly invoke it for a fixed amount of time. Wouldn't that be the best of both worlds?
Python allows you to do something with memory that Rust has made it a priority to be more concerned with, and that’s sharing.
C also has easily shared memory, much like Python. Point being that Rust wants to make sure that your references are safe to share, whereas Python wants you to share as much as possible and makes it safe by not allowing multiple threads to interact with it.
These are different trade offs, but Rust does allow you to forget about memory management in the same way Python does, but forces you to think how it’s being shared.
That’s the added cost over Python and the extra thought that goes into using the language.
No, having to think about value and move semantics is extra overhead you take on. It's better when the compiler can help you catch this, like in Rust, but it still forces you to structure your program a certain way and to constantly think about incidental details like ownership.
Perhaps, but at pretty severe cost. Your heap must be structured in a way that the tracing routine can make sense of (and the consequences of this involve considerable waste and inefficiency in practice - lots and lots of gratuitous pointer chasing), and the compacting step itself involves a lot of traffic on the memory bus that trashes your caches and hogs precious memory bandwidth.
Forget it. Obligate GC is a terrible idea unless you really, really, really know what you're doing.
Rust is a force for good but I think Andrei Alexandrescu was right when he said Rust feels like it "skipped leg day" (in the sense that it has its party piece and not much else) - from the perspective of the arch metaprogrammer himself at least.
Rust is obviously good for safety, but for everything else (to me at least) it seems unidiomatic and ugly. Admittedly, I've never really sunk my teeth into it (I've read a fair amount into the theory behind the safety features but never done a proper project).
> ...Andrei Alexandrescu was right when he said Rust feels like it "skipped leg day" (in the sense that it has its party piece and not much else) - from the perspective of the arch metaprogrammer himself at least.
Mind if I ask what that means? It seems like an interesting observation, but there are a couple of bits of terminology I don’t understand, like "leg day" and "party piece".
Perhaps this link would explain it more clearly: [1].
Leg-day is bodybuilding terminology, and refers to the day of the week when the bodybuilder is supposed to be training the leg muscles. According to the meme, nobody wants to train the legs because they show the least.
> nobody wants to train the legs because they show the least.
It reminds me of the story about how Google drops products because no one wants to maintain an existing product. That would not show "impact", not like launching a new product would, and impact is how you get raises and promotions.
If we're limiting ourselves only to new languages, then nulls are statically excluded not only by Kotlin and Apple’s imitation of it, Swift, but also by F#, Agda, Idris, Elm, and (sort of) Scala. But the zozbot didn't seem to be talking only about new languages, so Haskell, Miranda, Clean, ML, SML, Caml, Caml-Light, and OCaml are also fair game. (It wouldn't be hard to list another dozen in that vein.) Moreover I think you could sort of make a case for languages like Prolog and Aardappel where you don't have a static type system at all, much less one that could potentially rule out nils, but in which the consequences of an unexpected nil can be much less severe than in traditional imperative and functional languages like Java, Lua, Python, Clojure, Smalltalk, or Erlang, which more or less need to crash or viralize the nil in those cases.
I've found the consequences of a nil type are less severe in dynamic languages, where all variables have the Any type, since nil is just one of the options one needs to account for.
Static languages where everything is nullable are reneging on the promise; you say something is a String but that just means Option<String>, and it saps a lot of the reasoning power which static typing should give.
Even in dynamic languages, the consequences can be pretty bad. For example, I've seen lots of Ruby bugs where things end up being unexpectedly `nil`, but I haven't seen as many Python bugs where things end up being unexpectedly `None`.
How does this happen? Well, in Ruby it's a lot more normal to just return nil. For example, consider the following code snippet:
[][0]
In both languages you are trying to examine the zeroth element of an empty array (or list, as Python calls it). In Ruby this evaluates to nil. Python throws an IndexError. So in Python, if you have a bug where you address an array with an invalid index, it manifests as an error in how you're indexing the list. Ruby silently returns nil, and the only actual error backtraces you see are when you actually try to call a method on this nil later on, which might not be anywhere near where your program messed up the array indexing.
That seems (although I don't have experience with either language) straightforwardly correct treatment in Python and wrong in Ruby, and the problem seemingly should be attributed to [lack of] range checking, not nil/null.
Sure; in this case, range checking provides an alternative behavior to generating a null reference. But Ruby has other places where it generates null references more promiscuously than Python. Java does too. If you take every use case that could generate a null reference and instead behave differently in that situation you’ve eliminated null references, and Python has largely done so despite having a None type.
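For comparison, Rust bakes this choice into the type system; a small sketch: indexing with [] panics at the offending line (much like Python's IndexError), while the checked get() makes the "maybe absent" case explicit in the return type instead of silently producing a nil.

```rust
fn main() {
    let empty: Vec<i32> = vec![];

    // The checked accessor returns Option<&i32>: None for an
    // out-of-range index, Some(&x) otherwise. The caller must
    // decide what an absent element means.
    assert_eq!(empty.get(0), None);

    let one = vec![42];
    assert_eq!(one.get(0), Some(&42));

    // `empty[0]` would panic right here with an index-out-of-bounds
    // error, at the site of the bad access, not later when some
    // unexpected nil finally gets used.
}
```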
> Rust seems to be the only one that mostly circumvents this problem. Are there other good examples?
Swift, Kotlin, and of course older languages of a functional bent like the MLs, Haskell, Idris, Scala, …
Some are also attempting to move away from nullable references (e.g. C#), though that is obviously a difficult task to perform without extremely severe disruptions.
Scala happily accepts null, since Null is a subtype of every reference type (AnyRef) and is needed for JVM compatibility. Kotlin has a compiler check that enforces non-nullability; Scala does not.
I really love(d) Scala for introducing me to the whole idea of Optionals.
I wish for the life of me I felt like I could approach Scala at a time when it wasn't going through huge flux (I have shitty luck). I spent a good amount of time pre-version 2.10 :( and then recently went to have a look but saw Dotty (version 3.0?) coming by the end of 2020 and I was like "well, FML, time to wait a few more years and try again."
Anyone have any tips for using the Scala ecosystem effectively these days? Should I just wait for 3.0? Is it going to be a long winding road of breaking changes until a "3.11" version?
Is there a good resource for what folks are using it for these days? It seems like all the projects I used to know are ghostly on Github (but that could also be the fact it has been quite a few years, heh). Or do most folks just pony-up and use plain ol' Java libraries while writing their application/business logic in Scala?
Functional programming languages have been doing it for ages. Most "newer" statically typed languages also have it (Swift, Kotlin, Rust) by default. And old languages had it bolted on (C# 8, Java 8, C++ 17).
I think at this point basically everyone has realized null by default is a terrible idea.
> And old languages had it bolted on (C# 8, Java 8, C++ 17).
C#: actually true, you can switch over to non-nullable reference types
Java 8: meeeh, it provides an Optional but all references are still nullable, including references to Optional. There are also @Nullable and @NotNull annotations but they're also meh, plus some checkers handle them oddly[0]
C++17: you can dereference a std::optional; it's completely legal, and it's UB if the optional is empty. Despite its name, std::optional is not a type-safety feature; its goal is not to provide for "nullable references" (that's a pointer), it's to provide a stack-allocated smart pointer (rather than having to heap-allocate with unique_ptr, for instance).
The reason they come up again and again is that it's hard to design an imperative language without them (try, assuming you want to provide generic user-defined data structures that allow for cycles).
As a result, calling them a "mistake" is reasonably dishonest, as it implies there was an obvious, better alternative.
Can you give some more details on what the design problem is here?
It seems to me that nullable references are isomorphic to having an option type with non-nullable references, but prevent accidental unchecked dereference. What are some of the difficulties that you'd expect to come up if you took an imperative language with nullable references and replaced them with options of non-nullable references?
I don't consider 'option' types to have interesting semantic differences with nullable types. YMMV.
But beyond that, the absence of nullable references (really, a valid default value for every type) is a problem for record/object/struct initialisation - you either have to provide all values at allocation time, or attempt to statically check that the object is fully initialised before any use - Java has rules to that effect for 'final' fields, and they are both broken and annoying (less broken rules would likely just be more annoying).
The difference is that you can't accidentally use an option as a pointer without checking it first, and when your APIs specify a non-nullable pointer you can rely on the callers to have checked for null.
When you're reading or writing a function that accepts a non-nullable reference, you never have to worry about whether the argument is null or not. It's easier to get right, constrains the scope of certain types of errors.
If you get things wrong, and unwrap the option without checking, you get an assert failure at that location, rather than potentially much later on when the pointer is used.
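A minimal sketch of that behavior (the find_user helper is hypothetical, made up for illustration):

```rust
// Hypothetical lookup that may find nothing; the possible absence
// is visible in the return type, not hidden in a nullable pointer.
fn find_user(id: u32) -> Option<&'static str> {
    match id {
        1 => Some("alice"),
        _ => None,
    }
}

fn main() {
    // The compiler forces callers to handle both arms before
    // they can touch the inner value.
    let greeting = match find_user(2) {
        Some(name) => format!("hello, {name}"),
        None => "no such user".to_string(),
    };
    assert_eq!(greeting, "no such user");

    // `find_user(2).unwrap()` would panic right here, at the
    // unwrap call, instead of at some later dereference far
    // from where the missing value was produced.
}
```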
The whole point is that Option<&Foo> replaces nullable &Foo, so your record/object/struct member is Option<&Foo> and the default value for it is None. Option<&Foo> even has the same runtime representation as nullable &Foo, since Option<&Foo> uses NULL to represent None.
It's just a different way of representing nullable references, but with semantics that make it easier to track null-checked vs nullable references, impossible to accidentally get it wrong and dereference a nullable pointer you mistakenly assumed was already checked, and better errors when you do make mistakes.
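That representation claim is easy to check; a minimal sketch using std::mem::size_of:

```rust
use std::mem::size_of;

fn main() {
    // Because a &u32 can never be null, Option<&u32> needs no
    // separate tag: None is stored as the (otherwise impossible)
    // null pointer, so both types are exactly pointer-sized.
    assert_eq!(size_of::<Option<&u32>>(), size_of::<&u32>());
    assert_eq!(size_of::<Option<&u32>>(), size_of::<usize>());
}
```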
While I totally agree with everything you’re saying, I think they are right about it being annoying to initialize structs/records when all fields must be defined upfront. For one, it becomes harder to incrementally build a record in generic way. And if you decide to make a bunch of fields optional, then that optionality is carried with it forever, long after it’s obvious that the data exists for that field. Those are legitimately annoying things to deal with.
To avoid that annoyance, you almost have to rethink the problem. You can’t do it the imperative way, at least not without all that pain. Instead, if you don’t yet have the data, you should simply assign the field with a function call or an expression which gets that data for you. In other words, the record initialization should be pushed to a higher level of the call graph. If you do that, then every record initialization is complete.
Other solutions are more language-specific. TypeScript has implicit structural typing, so incremental construction is pretty easy. You just can’t try to tell the compiler that it belongs to the type you’re constructing, unless it actually does include all the necessary data.
In OCaml, you can define constructor functions which take all the data as named parameters. Since function currying is part of the language, you can just partially apply that function to each new piece of data, as you incrementally accumulate it. Then you finally initialize the record when the function is fully applied.
Suffice it to say that there are plenty of solutions to this problem.
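In Rust, one common answer is a builder: the in-progress state lives in a separate type whose fields are legitimately Option, and the finished record has no optional fields at all. A small sketch (all names hypothetical):

```rust
// The finished record: every field is definitely present.
#[derive(Debug, PartialEq)]
struct Config {
    host: String,
    port: u16,
}

// The builder holds the in-progress state, so Option only
// appears while construction is genuinely incomplete.
#[derive(Default)]
struct ConfigBuilder {
    host: Option<String>,
    port: Option<u16>,
}

impl ConfigBuilder {
    fn host(mut self, h: &str) -> Self { self.host = Some(h.to_string()); self }
    fn port(mut self, p: u16) -> Self { self.port = Some(p); self }

    // build() is the single place that checks completeness;
    // afterwards, no caller ever sees a half-initialised Config.
    fn build(self) -> Result<Config, &'static str> {
        Ok(Config {
            host: self.host.ok_or("missing host")?,
            port: self.port.ok_or("missing port")?,
        })
    }
}

fn main() {
    let cfg = ConfigBuilder::default().host("localhost").port(8080).build();
    assert_eq!(cfg, Ok(Config { host: "localhost".to_string(), port: 8080 }));

    // Incomplete construction fails once, at build time, rather
    // than leaking optional fields into the rest of the program.
    assert!(ConfigBuilder::default().host("localhost").build().is_err());
}
```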
I assume two reasons: efficiency, and because an efficient implementation of mutable state would have the same problem.
Right now, a single sentinel value makes a pointer null or not null (0x0 is null, everything else is not null). This is exactly how you'd implement a stricter type, like "Maybe". Encoded as a 64-bit integer, "Nothing" would be represented as 0x00000000 and "Just foo" would be represented as 0xfoo. No object may be stored at the sentinel value, 0x00000000. Exactly the same as what we have now, and provides no assurances that 0xfoo is actually a valid object.
Meanwhile, Haskell which "doesn't have null" crashes for exactly the same reason your non-Haskell program crashes with a null pointer exception:
f :: Num a => Maybe a -> Maybe a
f (Just x) = Just (x + 41)
This blows up at runtime when you call f Nothing, because f Nothing is defined as "bottom", which crashes the program when evaluated.
It's exactly the same as languages with null pointers:
And the solution is the same, your linter or whatever has to tell you "hey maybe you should implement the Nothing case" or "hey maybe you should check the null pointer".
Where I'm going with this is that you need to develop entirely new datatypes and have an even stricter type system than Haskell. Maybe Rust is doing this, but it's hard. We all know null is a problem, but calling null something else doesn't make the problems go away.
> It's exactly the same as languages with null pointers:
Four huge differences:
1. You don’t need to pass around ‘Maybe a’ everywhere. If null isn’t expected as a possible value (which usually it isn’t), you just pass around ‘a’, and when you do use ‘Maybe’ it actually means something.
2. The Haskell compiler can, and does (with -Wall), tell you that your pattern match is non-exhaustive. You don’t need a separate “linter or whatever”. This is possible because the needed information is present in the type system, and doesn’t need to be recovered with a complicated and incomplete static analysis pass.
3. If you do this anyway, the error is thrown at exactly the point where ‘Maybe a’ is pattern-matched, not at some random point several function calls later where your null has already been coerced into an ‘a’.
4. This program is defined to throw an error; it’s not undefined behavior like in C that could result in something weird and unpredictable happening later (or earlier!).
Also, Rust optimizes away the tag bit of ‘Option’ under common circumstances; for example, ‘None: Option<&T>’ (an optional reference to ‘T’) is represented internally as just a null pointer, which is safe because ‘&T’ cannot be null.
> You don’t need to pass around ‘Maybe a’ everywhere.
You don't need to pass pointers around everywhere. Languages with null still have value types that cannot be null.
> You don’t need a separate “linter or whatever”.
Optional compiler flags count as "whatever" to me.
> it’s not undefined behavior like in C that could result in something weird and unpredictable happening later (or earlier!)
C++ doesn't define this, but the OS does (and even has help from the CPU).
Anyway, my TL;DR is that it's easy to have a slow program that passes everything by value, or easy to have a fast program that uses pointers or references. Removing the special case of null is meaningless, because you can still have a pointer to 0x1, which is just as bad as 0x0, probably. This goes back to my original answer to the question "why don't more languages get rid of null", which was "it's harder than it looks." I think I'm right about that. If it were easy, everyone would be doing it.
> Languages with null still have value types that cannot be null.
Not all languages.
> C++ doesn't define this, but the OS does (and even has help from the CPU).
That's not how it works anymore, because C / C++ front-ends interacting with the optimizers are yielding too "optimized" results. See the classic https://t.co/mGmNEQidBT
That's not the same thing as a null pointer, because Nothing isn't allowed in place of e.g. integers, strings, etc. like in Java. What you're doing is defining a non-total function. Haskell, by default, doesn't perform exhaustiveness checks when pattern matching, but you can enable them via a compiler flag; then it won't let you compile your example. OCaml, for example, does that by default.
This misses the point. The point is not that you can forget to check the null case. The point is that you can express that sometimes there is no null case.
The "no null" case in traditional languages is just "int" instead of "*int". All values inside an "int" are valid integers.
Certainly it's problematic to use the same language primitive to mean "a pointer" and "this might be empty", but it's what people use them for in every language that has pointers (that I've used anyway).