The problem is that user inferring doesn't scale. For small projects this is rea...

joshlemer · 2024-07-23T17:55:21 1721757321

But then ironically, the problem with relying on programming language type systems is that they don't scale (beyond the boundaries of a single process). As soon as you are crossing a wire you have to go back to runtime validations.

nightski · 2024-07-24T02:43:02 1721788982

Throwing out type safety simply because you have unverified input is a little silly. It's trivial in most statically typed languages to safely turn input into a concrete type (or an error if it is invalid) at the edge and meanwhile maintain type safety throughout the rest of the code.

TuringTest · 2024-07-23T18:29:28 1721759368

There's where good documentation and training pays off.

Some invariants that are too complex to express purely in code, requiring numerous separate low level bookkeeping functions throughout the system, can be concisely expressed as a high level idea in natural language (wether as comments or at an external technical document).

Make sure that your developers are exposed to those documents, throw in a few tests for the most obvious cases to ensure that newcomers are learning the invariants by heart, and if they're competent you should not have problems but in the most obscure cases (that could appear anyway even if you try to formalize everything in code).

yencabulator · 2024-07-23T22:10:43 1721772643

Typed constraints don't scale either.

I eagerly await your implementation of a type describing my tax return. And to implement a compiler for the new programming language you'll inevitably have to construct, we're gonna need two types, one expressing all valid source programs, and the other all valid programs for the target platform.

Oh, can we also get updates to that for next year's taxes, new platforms, and other such future business needs? Of course the current requirements have to still remain supported.

wredue · 2024-07-24T00:55:43 1721782543

There is no world where the added time here for the developers is with the trade off, and if you look through my history, you will see that I largely reject “dev time trade off” as a notion a company should ever seriously entertain as a significant burden.

mason55 · 2024-07-23T15:15:09 1721747709

Yeah - in theory they are the same thing. You can have a dependent type that's an array with a length of 5 or you can create a type called ArrayLengthFive, and in both cases the type checker ensures that you don't accidentally use an unbounded array.

But the difference is that with a dependent type, you get more guarantees (e.g. my ArrayLengthFive type could actually allow for arrays of length 20!). And the type checker forces you to prove the bounds (there could be a bug in my code when I create an ArrayLengthFive). And the dependent type allows other parts of the system to use the type info, whereas ArrayLengthFive is just an opaque type.

I do think there is room for both ways of working. In the string -> Email example, it's probably enough to parse your string and just call it an email. You don't need to try to encode all the rules about an email into the type itself. But there are certainly other cases where it is useful to encode the data constraints into the type so that you can do more operations downstream.

taeric · 2024-07-23T17:47:33 1721756853

Parsing a string into an email, of course, is already fraught with peril. Is it valid HTML? Or did you mean the email address? Is it a well formed email, or a validated address that you have received correspondence with? How long ago was it validated? Fun spin on the, "It's an older code, sir, but it checks out."

I've seen attempts at solving each of those issues using types. I am not even positive they aren't solvable.

zdragnar · 2024-07-23T18:53:11 1721760791

Validation is an event, with it's own discrete type separate from the address itself. This is no different than a physical address.

    123 Somewhere Lane
    Somewhereville, NY 12345

is a correctly formatted address but is almost certainly not one that physically exists.

Validation that it exists isn't solvable in the type system because, as I mentioned, it is an event. It is only true for the moment it was verified, and that may change at any point in the future, including immediately after verification.

taeric · 2024-07-23T19:36:13 1721763373

I think I get your point. I would add reconciliation to validation. In that sometimes you cannot validate something without doing it, and are instead storing a possibly out of band reconciliation of result.

I'm curious, though, in how this argument does not apply to many other properties people try and encode into the types? It is one thing if you are only building envelopes and payloads. And, I agree that that gets you a long way. But start building operations on top of the data, and things get a lot more tedious.

It would help to see examples that were not toy problems. Every example I have seen on this is not exactly leaving a good impression.

zdragnar · 2024-07-23T20:55:06 1721768106

There are things that can embed a bit of temporal logic into types, such as a linear type that can statically guarantee a file handler has been closed exactly once, or error.

For the most part, what I think people really want are branded types. F# has a nice example of unit, where a float can be branded Fahrenheit or Celsius or Kelvin, and functions can take one of those types as parameters.

You then have some function that can parse user input into a float and brand it one of those types. The compiler doesn't make any guarantees about the user input, only that the resulting branded value cannot be used where a different one is expected. Other good examples are degrees and radians, or imperial and metric, etc.

Depending on what you are doing, knowing at a type level that some number can never be negative (weight in pounds) can save you a lot of hassle. If one part of the system doesn't brand a value, and it feeds into another part of the system that is branded, you're stuck with extraneous runtime checks all over the place or a lot of manual unit tests. Instead, the compiler can point out exactly, and in every instance, where you assumed you had a validated value but instead did not.

taeric · 2024-07-23T21:13:17 1721769197

Some of these are why I would add reconciliation to the idea. Linear types, specifically. You can encode that you only send a message once. But, without doing reconciliation with the destination, you can not be sure that it was received.

I'm torn, as the examples from small programs make a ton of sense and I largely like them. I just cannot escape that every attempt I've seen at doing these in large programs is often fairly painful. I cannot claim they are more error prone than the ones that skip out on it. I can say they are far more scarce. Such that the main error condition appears to be to stall out on delivery.

bombela · 2024-07-23T17:36:01 1721756161

> In the string -> Email example, it's probably enough to parse your string and just call it an email. You don't need to try to encode all the rules about an email into the type itself.

There is also the in-between Rust approach. Start with the user input as a byte array. Pass it to a validation function, which returns it encapsulated within a new type.

    #[derive(Clone, Hash, Ord, Eq...)]
    struct Email(Box<str>);

    // validate UTF-8 + 
    fn validate(raw: &[u8]) -> Result<Email, EmailValidationError>;

What is annoying is the boilerplate required. You usually want your new type to behave like a (read-only) version of the original type, and you want some error type with various level of details.

You can macro your way out of most of the boilerplate, with the downside that it quickly becomes a domain specific language.

oasisaimlessly · 2024-07-23T17:44:17 1721756657

AKA the "parse, don't validate" approach [1].

1: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...

bombela · 2024-07-23T19:04:01 1721761441

The salient excerpt is: "A parser is just a function that consumes less-structured input and produces more-structured output". In opposition to a validation function that merely raises an error.

    - validate(raw) -> void
    - parse(raw) -> Email | Error

Of course the type signature is what seals the deal at the end of the day. But I am going to follow this naming convention from now on.