Algebraic Data Types in Elixir

ashton314 · on June 1, 2022

Nice article.

I must say though, I hate dialyzer. Did I get it wired up at my company? Oh yeah. I made it part of our CI pipeline in my first few weeks. My team and I have liked the help it has given us. But where it falls down is when its errors are stupidly obtuse or when it doesn’t catch errors. By construction it is an unsound check: some errant programs will not be caught because Elixir (and Erlang) make it really hard to infer types. The creators wanted something that could check existing Erlang code, [1] so they had to sacrifice soundness for completeness. [2] So often I really want soundness checks, and I have spent many an hour chasing down stupid little type errors that dialyzer doesn’t tell me how to find. So that’s the hate. Overall, though, it does catch a lot of stupid mistakes, so I still appreciate its efforts. :)

[1]: Lindahl and Sagonas, “Practical Type Inference Based on Success Typings.”

[2]: I have a write up on the difference between some of these terms here: https://lambdaland.org/posts/2022-03-02_sound_complete_and_d...

bmitc · on June 1, 2022

I was recently having trouble with it on a project I'm working on along with a book. I am still a bit stumped because I couldn't make it fail. I started to get suspicious when I knew a spec was incorrect (I just hadn't updated a type yet), but yet Dialyzer was happy. I even started to worry it wasn't configured properly or there was a state problem, but deleting the `_build` directory didn't help. Still not sure. I've used it successfully in another personal project.

Do you know of a good reference that discusses what types of problems Dialyzer is good for and what problems it isn't good for?

ashton314 · on June 1, 2022

Woof. I feel your pain. That paper by Lindahl and Sagonas is pretty decent at exploring some of the theoretical limitations of the system. (DM me on Matrix if you can’t find it: I’m @ashton314:lambdaland.org Email is also in my resume which is on my blog; see bio)

Beyond that… er no, unfortunately. I’ve done many a web search trying to figure out some funky behaviour. :-P

pmontra · on June 1, 2022

I'm copy pasting my usual comment about types in Ruby: if I'd want static types I'd be using one of the many static typed languages out there. One among the many reasons I like Ruby and Elixir is that they are dynamically typed.

I'm happy to see that the official type annotation system for Ruby is basically hiding them under the carpet by writing them into a separate optional file where they hopefully go to die in the dark. The @spec line in Elixir is just not DRY and inconvenient. Sometimes you fail to match it to the actual function definition because typos happen.

I add an anecdote from the largest Elixir project I've been working on. One of the original developers added @specs to all the modules he developed. The other two of us did that too for many reasons. We're all senior developers but he's more senior and we'd like to see how it turned out. That developer left the project pretty soon, with some appearances now and then. New developers joined the project, mostly junior. After five years no new code has @specs AFAIK.

I remember a couple of cases where static typing would have spared us a half hour of debugging. No type declaration saved us hours of typing and thinking though. I can't think about a single test that we wouldn't have to write if we had static typing because we're testing for functionality, not for input and output types.

Finally, pattern matching in function definitions is a great substitute for static typing because it limits the shape of the data that code can work on. There is no way not to have two Person arguments in a function with a declaration of

  def marry(%Person{name: name1}, %Person{name: name2}) do
    IO.puts("#{name1} and #{name2} are married")
  end

You can't marry a dog and a cat there, or an int and a string.
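A minimal sketch of what that looks like in practice. The `Person` and `Dog` structs and the `Wedding` module are hypothetical names made up for the illustration, and the function returns the string instead of printing it so the result is easy to inspect. The point it shows is that the rejection happens when the call is made, by raising `FunctionClauseError`:

```elixir
defmodule Person do
  defstruct [:name]
end

defmodule Dog do
  defstruct [:name]
end

defmodule Wedding do
  # The only clause requires both arguments to be %Person{} structs.
  def marry(%Person{name: name1}, %Person{name: name2}) do
    "#{name1} and #{name2} are married"
  end
end

IO.puts(Wedding.marry(%Person{name: "Ann"}, %Person{name: "Bob"}))

# A %Dog{} matches no clause, so this call raises FunctionClauseError.
try do
  Wedding.marry(%Person{name: "Ann"}, %Dog{name: "Rex"})
rescue
  e in FunctionClauseError -> IO.puts("rejected: " <> Exception.message(e))
end
```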

nagasadhu · on June 1, 2022

> if I'd want static types I'd be using one of the many static typed languages out there.

Some of us want to have the concurrency and resilience of the BEAM with ADTs for data modelling and other static type advantages.

Caramel, gleam and purerl are worthy efforts in this direction. A more production ready half-ass alternative is to use elixir with whatever type advantages you can get.

pmontra · on June 1, 2022

In an ideal world you would be using a static typed language on the BEAM. Maybe there are some. I'm not saying that you should look for them or create one, because there are obvious advantages in using a language with a large community and creating a new community is a years long process, apart from the technical complexity of designing and implementing a language. You and I are probably paid for something else.

conradfr · on June 1, 2022

There's https://gleam.run/

nagasadhu · on June 2, 2022

> obvious advantages in using a language with a large community and creating a new community is a years long process, apart from the technical complexity of designing and implementing a language

All projects including elixir started as one man's itch and turned to large communities. It has to start somewhere... Much of what you and I use today wouldn't exist if people only worked what they were paid to do.

throwawaymaths · on June 1, 2022

I am pessimistic on all three of those projects because they are applying Hindley-Milner (or similar) type systems to the BEAM. The BEAM needs its own type system. Dialyzer is basically a half-assed project, it's so frustratingly close to what is needed (subtractive types, and better support for maps and multipart function headers).

ashton314 · on June 1, 2022

> The @spec line in Elixir is just not DRY and inconvenient.

I disagree. The spec is part of the documentation of the function. I find code with specs is 10x easier to read and safely modify/refactor when you better understand the function contract.

See also: Typed Racket, Haskell where the types are often specified just above the actual implementation. If you don’t like it, I doubt I’ll change your mind. :) I don’t think it’s that bad though.

There are plenty of times when you can make the specs really expressive, much more than just pattern matching (which is great, especially when combined with guards) eg: (on phone, might be a little off on syntax)

    @spec find_if([t1], (t1 -> boolean())) :: t1 | nil when t1: any()

Function types are opaque to pattern matching and you can’t express return types in pattern matching either.
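For illustration, here is a sketch of a function that such a spec could sit on. `ListUtil` is a hypothetical module name, and delegating to `Enum.find/2` is just one way to implement it. Note that type variables bound in a `when` clause are written without parentheses:

```elixir
defmodule ListUtil do
  # Returns the first element for which `pred` is truthy, or nil if none is.
  # `t1` is a type variable: the element type of the list, the argument type
  # of the predicate, and the (non-nil) return type must all agree.
  @spec find_if([t1], (t1 -> boolean())) :: t1 | nil when t1: any()
  def find_if(list, pred) when is_list(list) and is_function(pred, 1) do
    Enum.find(list, pred)
  end
end

IO.inspect(ListUtil.find_if([1, 3, 4, 5], fn n -> rem(n, 2) == 0 end))
```

The guard `is_function(pred, 1)` checks the arity at runtime, but only the spec says what the function takes and returns.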

pmontra · on June 1, 2022

I had to look at that spec for a while but I got it. Of course I'm out of training. More than that, there is the self selection bias: the main reason I'm using dynamically typed languages is that they don't have type declarations so I'm never happy to see static typing get close to them. BTW I definitely lean towards strong typing, so 1 + "1" is an error (Ruby) and not 11 or "11" (Perl and JS.)

ashton314 · on June 1, 2022

Well, Perl actually would return 2 in that case because the + turns its arguments into numbers… ;-)

Yeah, I’m all for strong typing too.

Most specs aren’t as complex as the one I just threw up there. What I really like specs for is to tell me whether or not the function is meant to return a maybe tuple ({:ok, val} | {:error, "err"}) or if it will throw an error. Sometimes that’s not obvious from bigger functions, and it makes refactoring harder.
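As a rough sketch of that distinction, assuming a hypothetical `AppSettings` module with made-up function names: the first spec advertises a tagged tuple, while the bang variant advertises a bare value and raises on failure.

```elixir
defmodule AppSettings do
  # Tagged-tuple style: the caller is expected to match on :ok / :error.
  @spec fetch_port(map()) :: {:ok, pos_integer()} | {:error, String.t()}
  def fetch_port(%{port: port}) when is_integer(port) and port > 0, do: {:ok, port}
  def fetch_port(%{port: _}), do: {:error, "port must be a positive integer"}
  def fetch_port(_), do: {:error, "port is missing"}

  # Raising style: the spec shows a bare value, so failure means an exception.
  @spec fetch_port!(map()) :: pos_integer()
  def fetch_port!(settings) do
    case fetch_port(settings) do
      {:ok, port} -> port
      {:error, reason} -> raise ArgumentError, reason
    end
  end
end

IO.inspect(AppSettings.fetch_port(%{port: 4000}))
IO.inspect(AppSettings.fetch_port(%{}))
```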

My ideal would be something like Typed Racket’s gradual typing, which would let you get static types around eg core modules that see a lot of use and where you use the types to check your assumptions but non-core modules can be untyped.

conradfr · on June 1, 2022

> @spec line in Elixir is just not DRY and inconvenient.

I'm guilty of regularly copy/pasting @spec from another function and forgetting to change the name, it's very inconvenient.

I wonder how much work it would be to allow type hints and for Elixir to transform them to specs (and maybe guards) at compile time.

Having made the transition in PHP years ago it always feels a bit archaic to write specs again.

Especially as you're kind of doing it already when pattern matching a struct parameter.

bmitc · on June 1, 2022

Pattern matching in a function definition is a great way to get good runtime error reports (I'm thinking of Elixir here, not sure how Ruby does it), but it does not help the developer during development. I don't want to look at a function's definition to understand how to use it, and I shouldn't have to. It totally breaks the point of having a function in the first place. When using a function, I want the name, the name of the arguments, and the typespec, whether it's in a statically or dynamically typed language. Anything more is cognitive overhead that is unnecessary and even likely to create improper use, two things that introduce bugs in a system.

In a dynamically typed language, I'd wager that 95% of the code doesn't need to be dynamically typed. In that, almost all of the time, you have code that could easily be done in a statically typed language and modeled well with custom types and type aliases like what Elixir provides with `@type`. The choice of using a dynamically typed language provides some freedom at the boundaries of programs. In functional languages like Elixir and Erlang, these happen when side effects take place like file I/O and also in message passing. The choice to use a dynamically typed language does make things a little easier here, but the tradeoff should be that you should think and program the rest of the system as if it's a statically typed language. Given that something like Elixir is not statically typed and especially does not have type inference, the tradeoff is that now you need add typespecs to your code. This is to help prevent type-related bugs and to help communicate the code to yourself and other developers.
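To make the `@type` point concrete, here is a small sketch with a hypothetical `Shape` module. The custom types and the spec carry the information a caller needs without reading the clauses:

```elixir
defmodule Shape do
  # Each variant is a tagged tuple; `t` is the union of the variants.
  @type circle :: {:circle, radius :: number()}
  @type rectangle :: {:rectangle, width :: number(), height :: number()}
  @type t :: circle() | rectangle()

  @spec area(t()) :: float()
  def area({:circle, r}), do: :math.pi() * r * r
  # Multiplying by 1.0 keeps the result a float even for integer sides.
  def area({:rectangle, w, h}), do: w * h * 1.0
end

IO.inspect(Shape.area({:rectangle, 3, 4}))
```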

> New developers joined the project, mostly junior. After five years no new code has @specs AFAIK.

I'm not sure what your point is here, but that sounds like a negative direction. Dynamically typed code without typespecs increases developer cognitive load, increases the likelihood of bugs, and increases the difficulty of on-boarding new developers.

dmitriid · on June 1, 2022

> @spec line in Elixir is just not DRY and inconvenient.

This. So many times this. If specs could be inlined it would be so much less hassle (but might hurt readability because of pattern matching)

throwawaymaths · on June 1, 2022

You can use @spec as a public API declaration at the top of the module by bunching them together away from the functions themselves.

dmitriid · on June 1, 2022

I prefer to have my specs as close to the function as possible :)

throwawaymaths · on June 2, 2022

Open your mind.

brightball · on June 1, 2022

I share your perspective here. I use dialyzer on my Elixir projects but only to let it check what it can infer on its own without me having to add specs. It can find a ton on its own and I appreciate the automated set of eyes.

But I get so much from Elixir pattern matching without needing to bother forcing types everywhere that I have a hard time understanding why people feel like they need them so much?

bmitc · on June 1, 2022

> I have a hard time understanding why people feel like they need them so much?

How do you figure out types in a code base you didn't write or don't know? Because when you're working in code you wrote or have worked on extensively, you have the typespecs in your head, but everyone else doesn't. And I'd even wager there's holes in what you have in your head and thus potentially in the code.

I've worked on Erlang and Elixir code bases, and without typespecs, you spend an inordinate amount of time learning and tracing what the expected types of arguments are and what return types are. That time and effort is completely saved if there are good typespecs. When I read a function, I do not want to spend any time understanding the types. Typespecs are an absolutely necessary evil in a dynamically typed language. It's part of the tradeoff.

brightball · on June 1, 2022

I've spent 20 years working in dynamically typed languages and I've never needed to spend that much time understanding the types of functions.

90% of the time the types are simply an extension of the database and enforced strictly at the database layer while everything else is just data pass through anyway (in Ruby, Python, PHP, etc). You understand the database, you understand the application data.

With Elixir you get so much added clarity just from pattern matching and guards that you end up with a Strongly typed code base, just not Statically typed. For an application backed by a relational DB, strong typing is more than enough for almost every use case I've ever come across.

Now, if there's a lot of data work happening inside the application that's not a direct extension of a database then the value of static typing goes up significantly.

I really think it's just one of those areas where people come at problems differently. Some people love it and swear by it. For those people, I think the lack of static typing causes a real gap in their entire approach to programming which translates to frustration and seeing a lot of "this wouldn't have been a problem with static typing..." experiences.

For the people used to dynamic typing, static typing adds a lot of unnecessary complications and headaches without much benefit. Suddenly you have to ensure your types are in sync with your database at all times. You have to enforce them at many different locations in your code base. You've got to define a lot more structure just to translate arguments from web forms, APIs, JSON, etc. If all that overhead saves your bacon later, great...but for a lot of people used to dynamic typing the benefits just haven't been there to outweigh the negative tradeoffs.

bmitc · on June 1, 2022

I have yet to experience a dynamically typed code base in which missing types was helpful and in which you weren't required to trace multiple layers of functions to understand the argument and return types of the function you care about, even with pattern matching.

Your last paragraph is accurate regarding the tradeoffs of a statically typed language. However, it leaves off the tradeoffs of a dynamically typed language. In a statically typed language with type inference, in the main portion of the system, statically inferred types are a massive benefit and save a lot of heartache. When you call a function, you know its types. Yes, at the boundaries of a program such as I/O and sometimes message passing, things get sticky. Dynamically typed languages help at the boundaries because they loosen things up enough to be helpful. However, they throw out all the benefits in the parts of the system away from the boundaries. I view typespecs as a necessary tradeoff to accommodate static types missing from those parts of the system. In such parts in a dynamically typed language, one basically is coding like you would in a statically inferred typed language and is not taking advantage of dynamic types, so typespecs should be there. I'd argue that abuse of dynamic types away from boundaries should be considered very seriously, same as one would do with macros.

> 90% of the time the types are simply an extension of the database and enforced strictly at the database layer while everything else is just data pass through anyway (in Ruby, Python, PHP, etc). You understand the database, you understand the application data.

Case in point. In a large system, why do I or others need to understand the data at the database layer to develop something many layers above or away from that? There is almost always a transformation from a storage data structure to the in-application and domain data structures. Databases are for storage and state retention, not application logic and data structures. Your 90% of the time applies at the database interaction layer, but it is not the majority of the code.

> I really think it's just one of those areas where people come at problems differently. Some people love it and swear by it. For those people, I think the lack of static typing causes a real gap in their entire approach to programming which translates to frustration and seeing a lot of "this wouldn't have been a problem with static typing..." experiences.

You seem to put it on developers coming to dynamically typed languages from statically typed languages having a "gap" in their approach. When wanting to call a function, I do not see it as a gap in one's skills or experience in not wanting to have read the implementation of that function and the functions it calls. When calling a function, I want to know its name, its argument names and types, the return types, and documentation. I don't see why it's a gap in one's approach to want that. It's almost the first thing you learn when learning software development. I think dynamically typed developers are just used to the overhead of reading function definitions to understand how to call things or they are the ones who wrote or have significantly used the code base.

brightball · on June 2, 2022

> You seem to put it on developers coming to dynamically typed languages from statically typed languages having a "gap" in their approach.

Gap was a bad choice of words. I just meant a different approach.

I've worked with both dynamic and statically typed languages, as well as Elixir which I consider the right "in between" balance.

In the environments that I've worked in, the devs who were comfortable with dynamically typed languages were extremely productive. The ones that preferred static typing who tried to work in the dynamic languages regularly complained about the lack of static typing and were typically not as productive.

Then in the static typing environments, you'd see the reverse. The devs who were more comfortable with dynamic typing were regularly frustrated by the steps they saw negatively impacting their productivity while the devs who preferred the static environment typically seemed happy and complaint free.

I really think it just boils down to how people think about problems. This is entirely anecdotal of course.

bmitc · on June 2, 2022

Interesting stuff to hear, but I still think we're not getting down to what I've referred to as it being a necessary thing and not just a preference. When developers are working inside a dynamically typed code base, they have types in their head, do they not? The runtime system certainly does. Why not just write them down? Why are people so afraid of or unwilling to write down stuff they know or think they know?

And again, I'd wager that "productive" people in dynamically typed languages are more comfortable than productive if one considers bugs introduced by assumptions of types and the unavoidable overhead in reading code implementations to understand call uses. I have seen countless bugs that would have been prevented or at least lessened by typespecs. If that's just a "statically typed perspective", what other process is catching these very real bugs and mistakes in the code? I have paired with people seemingly comfortable with not having typespecs and when asked questions like "how do you know ___?", the answer was usually "I don't" or "I think". Once we figure out the actual types and expected values, again, why not just write them down? Why throw out that knowledge?

The way that I view things is that statically inferred types are a huge boon to the majority of a program but can be a real pain at I/O boundaries, message passing, and other wings and edges of the software design. With dynamic types, they are a major boon to these I/O boundaries, message passing, and other wings and edges, but they are a major pain point in the rest of the program. My perspective is that whatever type of language you are using, you need to accept but account for its deficiencies.

cercatrova · on June 1, 2022

See also, Witchcraft [0] (adds ADTs to Elixir in a subjectively better way than Dialyzer) and Gleam [1] (not Elixir but its own full blown language that uses BEAM but with much more of a Haskell-like flavor) which both offer algebraic data types in their own ways.

I used to do a lot of Python and heard of Elixir as a marrying of BEAM and dynamic typing, so I started learning it. However, as I used it more, I actually moved to TypeScript and more recently Rust after I found that I actually liked thinking in types, and I truly did miss them from Elixir.

[0] https://github.com/witchcrafters/witchcraft

[1] https://github.com/gleam-lang/gleam

losvedir · on June 1, 2022

This is a good overview on ADTs and how to write them in your typespecs. Unfortunately, it doesn't dive into how exactly they interact with dialyzer, which in my experience is not perfect.

Since dialyzer does "success typing", it will allow code that could work in the happy path, even if it has type issues, which is very limiting. You can make it stricter with `overspecs` and `underspecs`, which will flag if the inferred type is either stricter or looser than the given spec, and these are so close to what I want, but they don't quite work with sum types as in the article here.

The issue is dialyzer considers "too loose" or "too strict" on both function inputs and outputs, but really what you want is to warn only if the function returns more than the spec allows or accepts less than the spec allows. (This is, I believe, covariance and contravariance in types[0].) In other words, you want half of each of underspecs and overspecs.

Consider the example in the article of `@type direction :: east | north | west | south`. Then, it's perfectly valid for a function to return only a subset of those:

     @spec common_wind() :: direction()
     def common_wind(), do: :east

But with `underspecs` this gets flagged. (This would be fine in Rust, for example. If you say a function returns an enum, you don't need it to return every variant of that enum.)

On the other hand, this is what you want on the input. If you specify `@spec foo(direction)`, then it better handle every variant of that enum.
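A sketch of both halves in one hypothetical `Compass` module, assuming the variants are the atoms `:east`, `:north`, `:west`, and `:south`. `common_wind/0` returns a subset of the declared return type, which is the case `underspecs` complains about, while `opposite/1` handles every variant it promises to accept:

```elixir
defmodule Compass do
  @type direction :: :east | :north | :west | :south

  # Returns only one variant: the spec is broader than the inferred type.
  @spec common_wind() :: direction()
  def common_wind(), do: :east

  # Accepts every variant named in the spec, one clause per variant.
  @spec opposite(direction()) :: direction()
  def opposite(:east), do: :west
  def opposite(:west), do: :east
  def opposite(:north), do: :south
  def opposite(:south), do: :north
end

IO.inspect(Compass.opposite(Compass.common_wind()))
```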

The opposite problem is a function that can, in theory, handle more than specified. For example:

     @spec func_that_can_handle_nil(direction | nil) :: boolean
     def func_that_can_handle_nil(arg), do: ...

     @spec my_func(direction) :: boolean
     def my_func(arg) do
       func_that_can_handle_nil(arg)
     end

In this case, you want to express to callers of your function that `nil` is not acceptable. However, incidentally from the implementation, it is (for now). In this case, dialyzer will infer that the function can handle `direction | nil`, and see that your spec only has `direction`, and so the `overspecs` flag will now warn against it! But there's no problem here. The problem function inputs can have is the reverse: if you specify `direction | nil` but then don't handle the `nil`. That's the `underspecs` flag, but then that will mess you up on the first example!

OTP 25 introduced `missing_return` and `extra_return`, which is "half" of it, only checking the return side, so I'm thrilled we're finally going in the right direction. But until we have `missing_inputs` and `extra_inputs`, I will continue to think that dialyzer doesn't really support ADTs.
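For anyone who wants to try the return-side checks, here is a sketch of a `mix.exs`, assuming the project runs Dialyzer through the dialyxir package on OTP 25 or later. The app name, version, and dependency constraint are placeholders, and the options are spelled `missing_return` and `extra_return` in the Dialyzer documentation:

```elixir
defmodule MyApp.MixProject do
  use Mix.Project

  def project do
    [
      app: :my_app,        # placeholder app name
      version: "0.1.0",
      deps: deps(),
      # Assumption: dialyxir passes these flags through to Dialyzer.
      dialyzer: [flags: [:missing_return, :extra_return]]
    ]
  end

  defp deps do
    [
      # Assumed version constraint; adjust to whatever your project uses.
      {:dialyxir, "~> 1.4", only: [:dev, :test], runtime: false}
    ]
  end
end
```

With that in place, `mix dialyzer` should report functions that return values outside their spec, or whose spec lists return types that can never occur.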

[0] https://docs.microsoft.com/en-us/dotnet/standard/generics/co...

scythmic_waves · on June 1, 2022

This was a very helpful explanation of why dialyzer yells at me sometimes and I can't figure out why. Is it possible to analyze a particular function with `underspecs` or `overspecs` without affecting the rest of the code?

throwaway932432 · on June 1, 2022

> We Elixir programmers usually don’t think in terms of sum types.

yikes

Elixir is one of many languages that are red flagged in my job filter.

I conditionally allow python where python >= 3.10, and ruby where sorbet is strictly enforced (perhaps I'll make an exception for dialyzer).

I'm bullish on golang being in my filter for years to come..

Also I feel that people that care about ADTs wouldn't be using Elixir in the first place, so the evangelism feels wasted here.

ashton314 · on June 1, 2022

That’s just like… your opinion man.

Some of us who do care are also working in domains where we need really high concurrency, where Elixir frequently is the best tool, ecosystem, etc. for the job. Elixir has a great ecosystem and tooling; I’ve found it to be a joy to work in.

Dialyzer is a must for me; I wrote above that it’s a love/hate relationship, but it’s something. I’d enjoy a more strongly typed language to work with for sure.

aspett · on June 1, 2022

> We Elixir programmers usually don’t think in terms of sum types.

This person speaks.. for themselves only.

> perhaps I'll make an exception for dialyzer

If your opinion is to disregard Elixir, then dialyzer shouldn't change your mind. It's pretty ineffective and painful to use.

That said, I'm a big proponent of Elixir; it's a great language and makes me happy to use it.

throwawaymaths · on June 1, 2022

People who don't care about ADTs are often caring about other things, like UX, DX, OX, readability, maintainability, etc.