let x = id id id id id id id id id id id id id id id id id id id id id id id id
See how long that takes (add ids if your machine can take it). To deduce a type for x, ghci will use 2^k type variables, if you have k ids. This is easily seen by induction:
If k = 1, then the type of x is a -> a. This takes 2 = 2^1 = 2^k variables.
If k > 1, the expression is of the form id f. Let t be the type of f; by induction, t has 2^{k - 1} variables. The outermost id must therefore take something of type t (that's 2^{k - 1} variables) and produce something of that same type t (another 2^{k - 1} variables), i.e. it's of type t -> t, and 2^{k - 1} + 2^{k - 1} = 2^k.
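For example, with k = 2 the inner id has type a -> a (2 = 2^1 variable occurrences), so the outer id must be instantiated at (a -> a) -> (a -> a), which mentions its variable 4 = 2^2 times.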
Note that this information is purely at compile time. Haskell isn't typed at runtime, this is all just to check types. At runtime, x is a no-op, as expected.
I used to be able to kill Scala compiler performance with just a few identifiers. Well, Scala is trying to deal with nominal subtyping, which is well known to be quite hard (so the type inference algorithm isn't really H-M, and must rely on heuristics).
This was in 2006/7 when I was working on scalac, so the early 2.1/2.2 days. I often seemed to push the type system into places that were not well defined, mostly because my Scala style was so different (very OO, but with heavy recursive use of type parameters). Unfortunately, I've forgotten what the cases were, but they were usually simple constructions.
That's very interesting! This seems to me like it might be a different kind of edge case though, since the type of x is still t -> t, even if it takes a long time for the type checker to arrive at this. From the little I got out of Mairson's paper, it seems that the proof he uses depends on the exponential size of the type itself, but I could be mistaken... either way I wonder how they're related?
Actually, these examples are similar to the first example in my post (under the first "pathological case" heading) and don't quite exhibit the worst case behavior since the type can be represented in linear space as a dag.
The title's a little misleading, as it suggests that there's something strange or even wrong about the type that OCaml or Haskell infer for the program. Of course the type that these systems infer for the program is correct, and the author does demonstrate an understanding of that. But that a small input to a system can produce a large output should come as little surprise to anyone who works with complex systems, especially computer systems.
The underlying mechanism that leads to the computational complexity in HM(x) systems--the process of generalizing types to schemas and instantiating schemas to types in order to support let-polymorphism--would make for an interesting discussion, but the author hasn't hit on that yet. Unfortunately, the only people that really understand that part of the algorithm are the people that have implemented the algorithm, and there aren't many of those people around. Strange but true. The algorithm that's taught and implemented in most undergraduate PL courses (PL:AI-based courses included) does not support let-polymorphism. If the author's reading, upgrade to ATTAPL[0] and start implementing.
Also, to answer the question "How to compile Turing machines to ML types", have a look at typo[1] for one way to do it in Haskell (by reducing it to lambda calculus (shameless plug)).
Thanks! Typo made it to HN last summer, so here's a response to your question on motivation[0]. Also, the 'cat' thing has come up before. Here's my response to that[1] as well. But now that I'm rereading my response it doesn't seem clear what I mean. The README for the language demonstrates how you can use shell grouping as a poor man's module system. When you do that, you have to use 'cat' on single files in certain situations. So I stuck with 'cat' for the simple example so it's consistent with the more complex examples. If you take a look at the typo script[2] that wraps the compiler, you'll see that I know when to use redirects and when to use 'cat'. But regardless of all that, I think most modern machines can spare an extra process.
Thanks for the helpful reply. By the way, I didn't mean to suggest that you didn't know when to use `cat` vs. re-directs; I just thought it might have been an oversight. I'm sorry if it came across as rude.
f0 x = (x,x)
f1 x = f0 (f0 x)
f2 x = f1 (f1 x)
f3 x = f2 (f2 x)
f4 x = f3 (f3 x)
This is worse than O(c^n): the size of the last function's type is O(2^2^n).
(In spacemanaki's blog example, each new expression doubles the size of the previous one. Here, each new expression squares the size. f4 builds 2^2^4 = 65,536 copies of 'x', and f5 will crash your compiler if you define it).
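To see the squaring concretely, here's the OCaml analogue with the types the compiler prints (just an illustration; the sizes are the point):
let f0 x = (x, x)        (* val f0 : 'a -> 'a * 'a *)
let f1 x = f0 (f0 x)     (* val f1 : 'a -> ('a * 'a) * ('a * 'a) *)
f1's printed type already contains 4 copies of 'a; f2's contains 16, f3's 256, and f4's 65,536.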
This is a fantastic post. I'm passingly familiar with ML and have heard the term "let polymorphism" a number of times, but this is the first time I've seen it clearly explained.
> Why can’t we just allow polymorphic lambda-bound variables?
A few reasons. The first is that in general, type inference for higher-rank polymorphism is undecidable. The second is that even if it were decidable, it would be impractical - I think that if we have a program like this one:
let test(f) = (f(1), f(true))
the assumption that the programmer made an error is statistically much more likely than the assumption that `f` should have a polymorphic type. Another reason is that it's not entirely clear what kind of polymorphic type to infer for `f` - should it be `forall a. a -> a` or e.g. `forall a. a -> int`?
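For instance, a plain OCaml version of that program is rejected (a small illustration; the exact error wording may vary by compiler version):
(* Rejected: f is lambda-bound, so HM keeps it monomorphic. The application
   f 1 pins f to the type int -> 'a, and f true then fails with an error
   along the lines of "This expression has type bool but an expression was
   expected of type int". *)
let test f = (f 1, f true)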
> What does this have to do with different “ranks” of polymorphism?
Higher-rank polymorphism means that polymorphic types can appear within other types; in a system with support for higher-rank polymorphic types, the parameter `f` in the function `test` above could be declared with type `forall a. a -> a`, the function `test` would have a type `(forall a. a -> a) -> int * bool`. Higher-rank polymorphism is formalized using System F, and there are a few implementations of (incomplete, but decidable) type inference for it - see e.g. Daan Leijen's research page [1] about it, or my experimental implementation [2] of one of his papers. Higher-rank types also have some limited support in OCaml and Haskell.
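To make the "limited support" concrete, here's one common OCaml encoding - a sketch, with names invented for illustration - using a record with a polymorphic field:
type id_fun = { f : 'a. 'a -> 'a }     (* the field itself is polymorphic *)
let test r = (r.f 1, r.f true)         (* test : id_fun -> int * bool *)
let result = test { f = fun x -> x }   (* (1, true) *)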
> How is let-polymorphism implemented? How do you implement it without just copying code around?
The "standard" implementation consists of two operations, generalization and instantiation. At let bindings, types are generalized by replacing all unbound type variables with polymorphic type variables (care must be taken not to generalize type variables that can be bound later, but that's a secondary issue here). Every time a variable with a polymorphic type is used, its type is instantiated, which means that all polymorphic type variables are replaced with fresh unbound type variables.
let f = fun x -> (x, x)
print f(1), f("boo")
In the example above, the type inferred for `fun x -> (x, x)` is `_a -> _a * _a`, where `_a` is an unbound (but not polymorphic) type variable. At let binding, this type is transformed into `forall a. a -> a * a` by replacing `_a` with a polymorphic type variable `a` (in most ML languages, the `forall` is implicit). Then, when `f` is used in `f(1)`, its type is instantiated into `_b -> _b * _b` and `_b` is unified with the type of `1`, `int`. When `f` is used in `f("boo")`, its type is instantiated into `_c -> _c * _c`, and `_c` is unified with type of `"boo"`, `string`. Since `_b` and `_c` are different variables, there is no error here (if `f` had a monomorphic type, `_b` and `_c` would in fact be the same, and this example would result in an error "cannot unify type int with type string").
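In actual OCaml syntax the example reads like this (types as the toplevel prints them); both uses of f typecheck because each gets a fresh instantiation:
let f = fun x -> (x, x)     (* val f : 'a -> 'a * 'a *)
let pair = (f 1, f "boo")   (* : (int * int) * (string * string) *)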
> What’s the relationship between let enabling exponential function composition and the exponential time result?
So there is quite a bit of copying happening every time a variable with polymorphic type is used. I've never studied the worst-case scenarios of Hindley-Milner, but I imagine that it has to do with instantiation copying large polymorphic types around.
> Do implementations of Hindley-Milner actually represent types as dags and utilize structural sharing?
Yes; an unbound type variable is typically represented as a reference cell, which gets assigned the type that the variable is unified with. So, if we have
double : forall a. a -> a * a
double (1, 1, 1) : (int * int * int) * (int * int * int)
then the type of double is first instantiated, yielding `_b -> _b * _b`, and then the type of the parameter is unified with the type of the argument, `int * int * int`. At this point, the reference cell in the internal representation of `_b` is updated to point to `int * int * int`, which means that both occurrences of `_b` in the result type actually point to the same representation of `int * int * int`.
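If you're curious what that looks like in code, here's a minimal OCaml sketch of the representation - hypothetical names, not any particular compiler's actual code:
(* A type variable is a mutable cell: either still unbound (tagged with an id)
   or a link to the type it has been unified with. Following links is what
   gives the dag-style sharing described above. *)
type ty =
  | TInt
  | TArrow of ty * ty
  | TTuple of ty * ty
  | TVar of tv ref
and tv =
  | Unbound of int
  | Link of ty

(* Unifying an unbound variable with a type just overwrites the cell, so every
   occurrence of that variable now shares the same structure. *)
let bind (cell : tv ref) (t : ty) =
  match !cell with
  | Unbound _ -> cell := Link t
  | Link _ -> ()   (* already bound; a real unifier would recurse here *)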
> Another reason is that it's not entirely clear what kind of polymorphic type to infer for `f`
How about forall a. a -> b? If we make the assumption that f is polymorphic, then nothing can be inferred about the return value, so it should also be polymorphic?
Let's instead talk about the type of `test`. It could be one of these:
test : (forall a. a -> a) -> int * bool
test : forall b. (forall a. a -> b) -> b * b
test : (forall a b. a -> b) -> (forall c d. c * d)
The first case is the most "obvious" one with polymorphic types, the one that I described above.
The second is the one that I hinted at, but I used the less general type `(forall a. a -> int) -> int * int`, i.e. instantiating `b` with `int`. The second type above is more general, as it could also be instantiated into other types such as `(forall a. a -> bool) -> bool * bool`, but this is the standard first-rank HM polymorphism.
The third type above is the type that would be inferred if `f` had different polymorphic type variables as parameter and return types. However, there are no values that have the type `forall a. a`, so a function with type `forall a b. a -> b` either (1) cannot return (i.e. diverges or raises an exception) or (2) is a dynamic cast function, i.e. a hack (`Obj.magic` is an example of such a function, but using it is "dangerous" in the sense that it bypasses the type system and so might result in undefined behavior at runtime). Disregarding case (2), the function `test` would also not return, so it's equivalent to say that its type would be the third one above, with a result type that could be anything at all.
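For illustration, both flavors are easy to see in plain OCaml (where `forall a b. a -> b` is written `'a -> 'b`):
let diverges _ = failwith "never returns"   (* inferred: 'a -> 'b, flavor (1) *)
(* Obj.magic : 'a -> 'b is the flavor-(2) escape hatch mentioned above. *)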
Not sure why "test : forall b. (forall a. a -> b) -> b * b" is "the standard first-rank HM polymorphism". The first comment implied "test" is a type error in HM, typed as "forall a b. (a -> b) -> b * b", which fails to unify a with both int and bool.
Based on the principle of least privilege, "test : forall b. (forall a. a -> b) -> b * b" feels like the right type. Only the body of test feeds a to the first argument, so forall a is scoped to the first argument, whereas the caller of test needs to consume b, so we scope forall b to the whole expression.
What I meant is that going between `forall b. (forall a. a -> b) -> b * b` and `(forall a. a -> int) -> int * int` is something that is possible in standard HM (i.e. if we consider the `forall a. a -> ...` part as opaque to HM).
I guess you're right, the "least polymorphism" we need to add to make `test` pass the type checking is to make just the argument of `f` polymorphic... However, in standard HM without any hacks/dynamic types, a function with the type `forall a. a -> b` for some fixed `b` can't really do anything useful with its parameter - it can't inspect it, it can only wrap it up in a data structure, but it can't return the data structure, unless the type of the parameter is wrapped in an existential type (but unwrapping it, there is still nothing to be done with it...).
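For example, about the only total, non-magic things you can write at a type like `'a -> int` in OCaml ignore their argument entirely (a tiny illustration):
let k _ = 42   (* val k : 'a -> int; it cannot inspect what it was given *)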
Great example, it now clicks. We lifted b all the way up because it needs to be consumed by the caller, might as well lift a as well, as types with free variables, like b in "forall a.a -> b", are useless. Which looks a lot like let-polymorphism :)
If you're busy, just read the original Stack Overflow Q&A: http://stackoverflow.com/questions/22060592/. It's 1/20th the size of the blog post and I got all I needed out of it.
I found them to be very useful for understanding how the HM type algorithm works.
They rephrase the problem as "writing an expression interpreter" for your program that recursively evaluates the abstract type of each expression instead of its concrete value.
I found this to be a very intuitive way to understand it.
It also then makes it easier to see why some "type inference" algorithms might guarantee to terminate while others might run forever: the more complicated your type system, the less abstract the "type values" are. They become like actual "values", and the behaviour becomes closer to the actual running of the program, which may not terminate.
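Here's a toy sketch of that framing in OCaml - an invented mini-language, not anything from the article - showing how the type checker is just the interpreter run over abstract values:
(* A tiny expression language. *)
type expr =
  | Int of int
  | Bool of bool
  | Add of expr * expr
  | If of expr * expr * expr

(* The ordinary interpreter computes concrete values... *)
type value = VInt of int | VBool of bool

let rec eval = function
  | Int n -> VInt n
  | Bool b -> VBool b
  | Add (a, b) ->
      (match eval a, eval b with
       | VInt x, VInt y -> VInt (x + y)
       | _ -> failwith "type error at runtime")
  | If (c, t, e) ->
      (match eval c with
       | VBool true -> eval t
       | VBool false -> eval e
       | _ -> failwith "type error at runtime")

(* ...while the "type interpreter" runs the same recursion over abstract
   values: it forgets which int, keeping only "some int". *)
type ty = TInt | TBool

let rec infer = function
  | Int _ -> TInt
  | Bool _ -> TBool
  | Add (a, b) ->
      (match infer a, infer b with
       | TInt, TInt -> TInt
       | _ -> failwith "type error at compile time")
  | If (c, t, e) ->
      (match infer c, infer t, infer e with
       | TBool, t1, t2 when t1 = t2 -> t1
       | _ -> failwith "type error at compile time")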
Fascinating behavior. Since this is an edge case, would it ever occur in human-written code? Could there be a situation where a computer would end up generating such code?
I've had this kind of thing happen when I try to use dependent types in Scala. E.g. if you want to select a HList sublist out of another HList (using the types) then that's quadratic, because just figuring out that a HList contains a particular type takes linear time.
(The idea was to do DI using HLists, to get something like Spring but that worked at compile time. It worked, but as the program got larger the compile times became too long for it to be practical)
I've written a lot of OCaml code and I don't recall having the compiler blow up on hand-written code. The OCaml type inference engine works well except on objects where you sometimes need to annotate variables to tell it what object type it is. (But then we rarely use OO in OCaml).
It's a different matter with generated code. For a while I had a program that generated OCaml code (from descriptions of Linux kernel structs). That would result in 100000+ line generated OCaml programs which frequently caused very long compile times (eg 20 minutes and more), with the time being spent in the type inference part of the compiler.
It's unlikely to occur as the type is exponential in size and thus quite unwieldy to do anything with! I haven't worked with generated ML or Haskell code, but I still think it's unlikely to be an issue in "real" code.
It's rarely a compile-time performance issue because O(2^N) algorithms perform well on modern hardware when N ~= 3. You could nest the `double` example from the article 6 times and it would still only be a 64x slowdown.
Where it does show up, annoyingly, is in error messages. You actually see this all the time with GCC's STL error messages (C++ templates provide a parameterized type system very similar to Haskell or ML, but without the inference, at least until C++11). STL containers are usually parameterized not only over the key and value, but over the comparator and allocator. Nest a couple of them and you suddenly have a type with several dozen variables. GCC expands the whole type out, without typedefs, when printing error messages, which means that you will sometimes get compile errors that are 10 pages long. Clang fixes this by preserving the typedefs, without which you wouldn't want to write such a type. Haskell has the same problem, except worse because it infers types, but GHC has gone to great pains to format error messages so they are at least readable if verbose.
XML/SGML people know this sort of thing (exponential resource use due to expanding references) as the "billion laughs" attack. There are real-world consequences to this sort of stuff.
No, the HM algorithm has terrible worst-case complexity, but the worst cases don't correspond to programs humans write, and in practice it generally works perfectly well.
IIRC, the algorithm used by the OCaml implementation is guaranteed to be close to linear time as long as you don't have this sort of nested polymorphic let.
In my experience the failure rate is far below 5% for the vanilla HM algorithm. I've never seen a pathological input in the wild that wasn't contrived specifically to test the typechecker.
Actually, there's some use case for these things:
you can simulate varargs, and yes, the compilation time can be made arbitrarily large:
https://gist.github.com/toolslive/5957292
$> time ocamlc vararg.ml
real 1m9.372s
user 1m9.264s
sys 0m0.044s
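For anyone curious what the trick looks like, here's a minimal CPS-style sketch of faking varargs in plain OCaml (the gist's actual encoding may differ, and it is tuned to stress the type checker far harder than this):
(* Each `arg` consumes one more value, threading a running total to a
   continuation; `stop` ends the chain and returns the total. *)
let sum k = k 0                              (* start the accumulator at 0 *)
let arg x acc k = k (acc + x)                (* take one more argument *)
let stop acc = acc                           (* finish: return the total *)
let six = sum (arg 1) (arg 2) (arg 3) stop   (* evaluates to 6 *)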
Thanks for pointing that out! I hadn't made the connection and now I've got another reason to figure out how fold works (have yet to wade through all of http://mlton.org/Fold)
Question: anyone know where the pronunciation of "!" as "bang" came from? I always called these exclamation points but see a lot of programmers calling them "bangs." Anyone?
Well, they are exclamation points. But for many punctuation characters, there's a hacker nickname (or more than one). I've heard "bang" and "shriek" for "!", "hook" for "?", and "splat" for "*".
Well, I look like an idiot.. I read the whole article and then came back to the comments and it popped into my mind. Which was apparently just me remembering it.