Haskell RecordDotSyntax language extension proposal (Accepted) (github.com/ghc-proposals)
179 points by harporoeder on Sept 21, 2020 | 103 comments


For those that want to see what it actually looks like:

https://github.com/ghc-proposals/ghc-proposals/blob/master/p...

    data Grade = A | B | C | D | E | F
    data Quarter = Fall | Winter | Spring
    data Status = Passed | Failed | Incomplete | Withdrawn

    data Taken =
      Taken { year :: Int
            , term :: Quarter
            }

    data Class =
      Class { hours :: Int
            , units :: Int
            , grade :: Grade
            , result :: Status
            , taken :: Taken
            }

    getResult :: Class -> Status
    getResult c = c.result -- get

    setResult :: Class -> Status -> Class
    setResult c r = c{result = r} -- update

    setYearTaken :: Class -> Int -> Class
    setYearTaken c y = c{taken.year = y} -- nested update

    getResults :: [Class] -> [Status]
    getResults = map (.result) -- selector

    getTerms :: [Class]  -> [Quarter]
    getTerms = map (.taken.term) -- nested selector


Ugh fucking finally. Too many hours wasted learning the complexities and template macro nonsense of lenses.

I am so elated to see this change, even if it is 15 years overdue. Nested selectors in particular are beautiful.

Maybe it's time again to do me some haskell for great good.


Could you explain why you think this proposal would help you avoid lens? If I am not mistaken, all of the above code can easily be rewritten in the old syntax without using lens, so you probably have another use case in mind?


It is very cumbersome to write nested updates using the old syntax, and overlapping names between records were quite painful to deal with.
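
For example, with the Class/Taken types from the snippet above (a sketch; the commented version assumes the extension):

    -- Plain H98 record update: unwrap and rewrap every layer by hand.
    setYearTaken :: Class -> Int -> Class
    setYearTaken c y = c { taken = (taken c) { year = y } }

    -- With RecordDotSyntax this collapses to:
    -- setYearTaken c y = c{taken.year = y}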


I love this syntax, it's elegant and a big improvement!

When/how can I use this? Do I have to wait for a new GHC release?


It's not released yet AFAIK -- there's a tracking issue on the GHC tracker[0].

I agree that it's very nice, and a big improvement!

[0]: https://gitlab.haskell.org/ghc/ghc/-/issues/18599


You should have included a record field of the same name in both data types, since this is where the proposal really shines.

E.g. if both data types were to contain an "id" field, then "taken.id" and "class.id" wouldn't result in compilation errors, as they do now.
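
A sketch of that situation (hypothetical "id" fields; the dotted accessors in the comments assume the extension):

    {-# LANGUAGE DuplicateRecordFields #-}

    data Taken = Taken { id :: Int, year :: Int }
    data Class = Class { id :: Int, hours :: Int }

    -- Today a bare `id` accessor here is ambiguous (and shadows Prelude.id),
    -- so you need annotations or qualified names. With RecordDotSyntax the
    -- receiver's type picks the field:
    --   takenId t = t.id   -- t :: Taken
    --   classId c = c.id   -- c :: Class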


Oh that's another big issue, thanks for pointing that out! Accessor function collisions were already somewhat solved, but this is a much nicer solution.

That code is copy-pasted from the proposal, by the way.


Simon Peyton Jones: "One possible reaction to a diversity of opinion is to do nothing and wait for clarity to emerge. That is often the right choice, but not (I believe strongly) in this case. We have waited a long time already -- I have been engaged in debate about this topic for over two decades -- and I think it's time to decide something."


Simon's [entire post](https://github.com/ghc-proposals/ghc-proposals/pull/282#issu...) that the parent comment quotes from is worth a read, as an example of a high concentration of sanity.


There's lots of Haskell fanboys (like me ;) who can become a bit annoying, but the people usually recognised as leaders in the community, like Simon Peyton Jones, are all rather sane and pretty pragmatic.


Not a Haskell programmer so I didn’t know about this issue, but to me this change is night and day better.


It's a nuanced issue, because Haskell extensively uses the dot (.) infix operator to mean function composition. Currently, the expression `f.g` will attempt to compose the function bound to `f` with the function bound to `g`.


It's worth pointing out that Haskell has always overloaded the dot. It is used for module namespace disambiguation, e.g. Set.fromList


Great point! I hadn't thought of this.

It looks like the "record dot" follows the same parsing rules as the "module dot": the presence of whitespace turns the . into function composition.


But you've got the module there to disambiguate the ., which you don't have in records.


I don't think this is true. The presence of a qualified import of "Data.Set as Set" does not change how the following is parsed.

    import Data.Set (fromList)
    import qualified Data.Set as Set


    data Set a = Set (Set.Set a)

    f :: Ord a => [a] -> Set a
    f = Set . fromList

    g :: Ord a => [a] -> Set a
    g = Set .fromList

    h :: Ord a => [a] -> Set a
    h = Set. fromList

    i :: Ord a => [a] -> Set.Set a
    i = Set.fromList
In other words, the . is interpreted as a "module dot" if and only if there is no whitespace either preceding or following the dot.


> Currently, the expression `f.g` will attempt to compose the function bound to `f` with the function bound to `g`.

So what will this mean now? Will function composition require spaces around the dot, or how will a "record dot" be disambiguated?


From https://github.com/ghc-proposals/ghc-proposals/blob/master/p...

.g is a field selector.

f.g is field selection of a record.

f . g is function composition.

f. g is function composition.
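
Spelled out with a hypothetical record (a sketch assuming the extension, which later shipped in GHC 9.2 as OverloadedRecordDot):

    data P = P { g :: Int }

    f :: P -> P
    f p = p { g = p.g + 1 }

    sel :: P -> Int
    sel = (.g)         -- no space: a field selector section

    pick :: P -> Int
    pick p = (f p).g   -- field selection on an expression

    comp :: P -> Int
    comp = sel . f     -- spaced dot: ordinary function composition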


Whilst I'm not going to argue the proposal is a bad thing, I think you've just summarized pretty well why some are giving it the side eye.


To me it's just, "the dot hugs a field, dot by itself is composition, and module names are capitalized."

While I wouldn't write f. g, if I see it, it's not hugging the field name so it's composition. I don't think I'd have trouble reading it.

But... it will probably trip up newbies and people with odd spacing styles. Hopefully GHC can give a useful error like "In the field selection expression f.g, g is a function defined on XX. Did you mean f . g?"


(.g) is a field selector; is (. g) then a function :: (a -> b) -> c -> b?


Yep! You could check this in GHCi (the repl), for example:

  > :t (. id)
  (. id) :: (b -> c) -> b -> c
(:t prints the type of an expression, and id is the "identity function", which is b -> b here.)

(EDIT: Maybe you know the above and you're asking if it's being changed in the proposal. It doesn't seem so to me.)


I'm just pointing out that the difference between (.g) and (. g) is potentially easy to miss.
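
Concretely, a single space flips the meaning (sketch, assuming a record field g and the extension):

    data R = R { g :: Int }

    selector :: R -> Int
    selector = (.g)        -- no space: field selector section, \r -> r.g

    section :: (Int -> Int) -> R -> Int
    section f = (. g) f    -- space: right section of composition, i.e. f . g,
                           -- using the ordinary accessor function g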


Possibly recommend changing the link to the proposal itself[1]?

Both pages link to each other, but the discussion is unlikely to be of very much note unless one has already read the proposal being discussed.

1. https://github.com/ghc-proposals/ghc-proposals/blob/master/p...


Nothing is better than seeing the broken pieces of the foundation fixed in Haskell.

Records were my number one gripe in the language that otherwise does so many advanced things so uniquely well.


I also liked that they managed to pull off https://wiki.haskell.org/Foldable_Traversable_In_Prelude


It looks sane and solidly specified. I never trusted DisambiguateRecordFields, so I'll finally ditch a lot of ugly prefixes I stuck in there.

The whitespace requirements, though, are going to lead to some confusing error messages for newbies who want to type myRecord . someField


The whitespace requirement is unfortunate but probably necessary. Changing the meaning of . from function composition to anything else would break approximately 100% of existing Haskell libraries.


The breakage problem is interesting. Right now, a.b parses just fine, so existing code will break when this is turned on. And that is an issue as people often turn extensions on per project, but it's also something a code formatter may be able to fix.

(Hopefully brittany will get an update to handle this correctly!)

They could, In Theory, rename the composition operator selectively when this language extension is enabled. That'd mean Prelude has to change what operators it exports depending on what extensions are enabled, which would be horribly magical.


How would this break 100% of existing libraries? Is there a breaking case outside of where an existing composed function and record name are identical? Haskell is a strongly typed language, where that doesn't happen.


Haskell record fields get turned into top-level functions that act as field accessors. That is, if we have a record:

  data Student = Student { name :: Text, school :: School }
we end up with two functions 'name' and 'school' in scope that extract the values of their respective fields:

  name :: Student -> Text
  school :: Student -> School
This means it is common for field names to be composed with functions. I write code like this all the time:

  schools = map (schoolId . school) students
This is compounded by the fact that a lot of newtypes use record syntax to define functions in this style that are even named like functions:

  newtype Reader r a = Reader { runReader :: r -> a }
Importing runReader from a module, you might not even know that it's implemented as a record field! This is pretty handy since you can change the implementation of runReader from being a record field to a normal function without breaking most of your callers. (In fact, I think this might have happened with runReader being implemented in terms of runReaderT.)

This kind of pattern means that people write lots of code that uses record fields as functions, sometimes without even realizing that they are using a record field!

The practical upshot is that myRecord . someField is used with . as function composition all the time, so any proposal that cares about backwards compatibility can't change the meaning of . in that context.


I think you misread my post. Identical names doesn't just mean the same type.

The change appears to just be syntactic sugar and so would not conflate anything.


Right, I think I misunderstood your question.

Rereading now, it seems you're asking how interpreting a . b as record access only if b is a record field in scope would break existing code.

The new record system is implemented on top of a HasField typeclass under the hood. Thanks to the way Haskell's typeclasses work, this means that the instances that define what fields a record has will be in scope when you import the module containing that record—directly or indirectly—even if you don't import the record itself.

This means that even if we do not have a record with a field b in scope, we might still have the instance for some other record in scope—leading to ambiguity in previously unambiguous code. Moreover, even if a . b is unambiguous right now, a record with a field called b might be added somewhere deep in the codebase and cause unexpected ambiguities in the future.

I suppose you could say that a . b is composition whenever a function b is in scope, even if there is also a record with a field called b. That rule would not break existing code, but it also seems inconsistent and confusing—a.b and a . b are sometimes the same and sometimes different, depending entirely on whether b is in scope.

Making the meaning of . purely lexical rather than depending on what's in scope seems a lot more consistent and easier to follow. It's already the case with Haskell anyway, since . can mean function composition or a qualified name depending on the spaces around it. This isn't an ideal situation, but I don't think we can aim for "ideal" in a living language that's 30 years old.
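
Concretely, the desugaring target looks roughly like this (a sketch; the Student type is hypothetical, but GHC.Records.HasField has been in GHC since 8.2):

    {-# LANGUAGE DataKinds, TypeApplications #-}

    import GHC.Records (HasField (..))

    data Student = Student { name :: String }

    -- GHC solves HasField "name" Student String automatically for records
    -- whose definitions are in scope; under the proposal, s.name is
    -- (roughly) sugar for:
    studentName :: Student -> String
    studentName s = getField @"name" s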


I am under the impression that it’s undesirable to have the syntactical meaning change based on the type of the expressions involved.


I agree. My understanding is this is just syntactic sugar.


It's ambiguous syntactic sugar that can only be resolved by applying the semantic type-checking pass. That's the kind of sugar that leads to vexing parses, that "causes cancer of the semicolon" as they say.


since basically no other language with dot syntax lets you do myRecord . someField with a space in between, it doesn't seem like something that will actually trip newbies up


Huh? It works in C (I just tried) and I bet in most other languages.


Works in JS as well. Though I have never seen anyone put whitespace around field accesses like that, in any language.


Newlines are pretty common. Especially in chained cases. No?


i guess i forgot about the newlines case, and just had never seen anyone put spaces in like that without newlines, so figured it didn't exist. strange that it's even allowed.


That’s because C-like languages don’t distinguish newlines from other whitespace. If you want

  foo()
    .filter(...)
    .map(...)
then you have to allow

  foo() .filter(...) .map(...)
as well, as a matter of principle.


But this doesn't apply to Javascript in its absolute sense, since although

    (1) foo()
        .bar()
is equivalent to

    (2) foo() .bar()
there is a difference between

    (3) foo()
        bar()
and

    (4) foo() bar()
((3) results in valid code, whereas (4) would not).

But trying to argue that (2) should be syntactically invalid would be a very hard ask indeed.


Yes. Though even Python, which does treat newlines differently from other whitespace, allows `object . someMethod` with spaces on either side.


Works in Python, too.


The spaces are mandatory in Python, no?

    >>> 2020 . to_bytes(4, 'little')
    b'\xe4\x07\x00\x00'
    >>> 2020.to_bytes(4, 'little')
      File "<stdin>", line 1
        2020.to_bytes(4, 'little')
             ^
    SyntaxError: invalid syntax
JavaScript also requires the spaces:

    > 2020 . toFixed(2)
    '2020.00'
    > 2020.toFixed(2)
    2020.toFixed(2)
    ^^^^^

    Uncaught SyntaxError: Invalid or unexpected token


Numeric literals are a special case (in most languages), since the dot could also be a decimal separator. You have to use space or parentheses to disambiguate.

But for decimals there is no ambiguity:

  >>> 2020.50.is_integer()
  False


You say disambiguate, but there is no ambiguity. Only one parse could possibly be valid.


The lexer could disambiguate, but that would require lookahead. I believe Python has a deliberate policy of keeping the parser simple.

But I just noticed C# supports this, so it is not the same for all languages. Java doesn't, but then again you can't call methods on numbers in Java anyway.


It's a real lexing ambiguity because '0.' is a valid numeric literal (so '0..toString()' parses ok, somewhat counterintuitively). In principle, yes, you could lex '.' as a separate token even in numeric literals and have the parser figure it all out.


No, it is not a lexing ambiguity. You just need one extra character of lookahead after encountering a decimal point.


Clearly one character of lookahead is not sufficient, because of e.g. '0. toString()' (note the space). There's no question that the lexer could in principle disambiguate with unbounded lookahead, but it would be a bit hacky, as you'd effectively be implementing part of the parser in the lexer (by attempting to figure out if it was a method call, which is really the parser's job).

So basically, you could easily write a parser that allowed '0.toString()', but you'd either have to piece numeric literals together in the parser or add nasty hacks to the lexer.


> There's no question that the lexer could in principle disambiguate with unbounded lookahead, but it would be a bit hacky, as you'd effectively be implementing part of the parser in the lexer (by attempting to figure out if it was a method call, which is really the parser's job).

This is actually not hacky. It's just a rule that the "." cannot be followed by [ \t]*\w, which is a simple negative lookahead assertion. Replace \w with whatever you use at the start of identifiers.

It is extremely common for languages to have corner cases like this in the lexer to make the language more usable. For example, consider the rules in JavaScript or Go concerning where you can put line breaks. Or the rules for JavaScript concerning regular expression literals, which must be disambiguated from division.

> So basically, you could easily write a parser that allowed '0.toString()', but you'd either have to piece numeric literals together in the parser or add nasty hacks to the lexer.

This is factually incorrect. As I explained, you would only need one character of lookahead. There is no need to parse "0. toString()" successfully. If you wanted to parse "0. toString()" correctly, you could use unbounded lookahead, which is fairly simple in practice (speaking as a sometime parser writer). I don't get why you say it is hacky; this is all just a bunch of regular expression stuff (in the traditional sense of "regular").
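
A toy version of the one-character rule, in Haskell for concreteness (a sketch, not any real lexer):

    import Data.Char (isDigit)

    -- After lexing the digits of a literal, a following '.' is kept as part
    -- of the literal only when the very next character is a digit; otherwise
    -- it is emitted as a separate dot token.
    dotContinuesLiteral :: String -> Bool
    dotContinuesLiteral ('.' : c : _) = isDigit c
    dotContinuesLiteral _             = False

Under that rule '0.5' stays one literal and '0.toString()' splits into '0', '.', 'toString'; '0. toString()' splits the same way, and whether the parser should then accept it is exactly the disagreement above.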


> If you wanted to parse "0. toString()" correctly, you could use unbounded lookahead

Right, which is what I said. If you agree that unbounded lookahead is required then we don't really disagree, except on the somewhat subjective question of how 'hacky' that is.

If I understand correctly, you suggest that unbounded lookahead could be avoided by allowing '0.toString()' but not '0. toString()', while still allowing both '(0).toString()' and '(0). toString()' and both 'foo.bar' and 'foo. bar'. That would produce highly counterintuitive results in some instances:

    Parsed as one expression:
    {}.
      foo

    Parsed as two statements:
    0.
      toString()
But again, it is really a subjective judgment. Obviously you could modify Javascript in this way, and on that point there is no disagreement.


Wow, I never knew python supported dots after numbers. Thanks!


But prefer (55).foo(), as you shouldn't expect your reader to have memorized the Python grammar.


For languages with C-derived syntax, whitespace around operators is insignificant. "a+b" is the same as "a + b". But by convention, space after the dot is very rare.


Works in Lua.

I even use it sometimes, like this:

local field = require "some/module" . field

When I want exactly one field from a module which returns a table. I think the spaces look nicer next to the string.


sigh don't forget the /s next time..


i dont care


Does the proposal have any effect on lens usage?

EDIT: I mean, lenses and RecordDotSyntax both provide a way to access nested fields in records, so the new syntax would take away some of the reasons to use lenses. But are there any other relations among the two, or do lenses and RecordDotSyntax just provide two different solutions to the same problem?


Lenses (or, rather, optics) are still far more powerful than plain record field getters. Optics allow

- Mutation of nested records,

- Iteration of collections,

- Folding of collections,

- Viewing discriminated cases that may or may not exist,

- Building a structure around a value, and so on and so forth.
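
For a taste of the first two, a minimal sketch with the lens library (hypothetical Person/Address types):

    {-# LANGUAGE TemplateHaskell #-}

    import Control.Lens

    data Address = Address { _city :: String } deriving Show
    data Person  = Person  { _name :: String, _address :: Address } deriving Show

    makeLenses ''Address
    makeLenses ''Person

    move :: Person -> Person
    move p = p & address . city .~ "Oslo"         -- nested immutable update

    cities :: [Person] -> [String]
    cities = toListOf (traverse . address . city) -- folding over a collection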


Yeah, there are still a lot of reasons to use lenses. But to rephrase - I was wondering if lenses and the proposal somehow "interact". Actually, I just found some of what I was after in this comment [1].

[1] https://github.com/ghc-proposals/ghc-proposals/pull/282#issu...


But this has nested mutation, no?


Wow this is great! Fixing Haskell's arcane record system would be a huge step towards more mainstream adoption. It was certainly my largest pain point with the language over years of use in production.


That's interesting! I would have expected space leaks from accidental laziness to be a bigger problem in production?


That is not an especially big production problem. Of course you don't want it to happen. But it rarely happens. And it happens even less if you use -XStrictData or manually make your record fields strict since that's by far the most common cause: lazy records used as accumulators.

I would say in my 5 years of production Haskell, I've seen that sort of thing deployed in production _maybe_ once, and I've run into it during development once or twice.

One time it was in integration testing, and a simple heap profile made it clear it was in a library dependency. And a quick look at the library source was enough to diagnose it. It was like an hour or two tops from "we have a space leak somewhere" to "here's a PR to a third-party library fixing the leak."


Thanks! Most of my production experience with 'Haskell' was with Standard Chartered's Mu dialect, which is strict anyway.

A bit of a problem when you first start using Haskell seriously is that the IO mechanisms built into the Prelude are very slow once you get to even a few MiB.

That's easily remedied by switching to e.g. ByteStrings, but it's still a bit annoying that what the language presents as the 'default' is such a toy implementation.


Imho this looks good. It will save on keystrokes and maybe bring more readability. I always liked "." or "->" syntax in different languages.

My only worry is . being the function composition operator, so it might not be crazy amazing for readability once you get

a . b . c . d.e

So maybe it should have been something like "->".


> So maybe it should have been something like "->".

The problem is that "->" has an even more special meaning than "." in Haskell, since "->" separates the arguments of an unnamed function from its definition.

So, for example

    addOne = \x -> x + 1
and

    addOne x = x + 1
are equivalent.

This means that "->" is about as special as "=". Which leads me to think it would require some very substantial changes to the parser, and perhaps even defeat the purpose of increased readability. For example, how readable is the following?

    getY = \x -> x -> y
(which reads as "getY = \x -> x.y" in the current proposal)


I thought conflating the dot with function composition wouldn't be too bad because they're similar in a sense, but then I remembered that the two work in the reverse order in the sense that:

(.a.b.c) . (.d.e.f) = (.d.e.f.a.b.c)

If you use -> both for the member function and for reverse function composition you'd get:

(->a->b->c) -> (->d->e->f) = (->a->b->c->d->e->f)

which is somehow nicer, especially because accessing a member and composing functions become the same thing when you view a value as the unique function sending the terminal object to that value.

But unfortunately we've spent centuries composing functions the wrong way round, so now we're forced to deal with the consequences.


You can do left-to-right function composition with (>>>).

Given functions f, g, h,

h >>> g >>> f is equivalent to f . g . h

It's from Control.Category, so it generalizes to other things, but for functions specifically it's left-to-right composition.

(There's also (<<<), which in this case is identical to (.).)

(Also present in Control.Arrow. Why? I don't know this topic well enough to explain; my understanding is limited to what these operators mean in the specific case of functions. Functions (->) are a type as well, so they can be instances of typeclasses, for example:

  instance Functor ((->) r) where
    fmap = (.)

)


Almost, but you are conflating function composition (lambda x: lambda y: f(x(y))) with function application (lambda x: f(x)).


PureScript, which borrows heavily from Haskell's syntax (you could say it is Haskell for JavaScript, but with strict evaluation), already had something like this syntax for accessing records (JS objects). It wisely didn't also use the dot for function composition, however. I hope it will become the norm to use some alias other than (.) for function composition when this extension is used.


I'm a big fan of the flow package (https://hackage.haskell.org/package/flow), which defines

  (<.) :: (b -> c) -> (a -> b) -> a -> c
and

  (.>) :: (a -> b) -> (b -> c) -> a -> c


Haskell is full of things (hello, most of Prelude) where the default thing is terrible and the correct thing is in a package (usually a choice between multiple incompatible packages). It makes it extremely uncomfortable to grow from newbie to intermediate.


How did Idris solve this again btw?


Idris 2 added dot notation. Idris 1 simply disambiguates based on type, or namespaces each accessor.


The syntax to update records doesn't seem particularly nice:

    e{lbl = val}
Though this has been part of Haskell for a while, it seems:

> Note: e{lbl = val} is the syntax of a standard H98 record update

FSharp, OCaml and Elm haven't got a great solution here either:

   { e with lbl = value } // ocaml/fsharp
   { e | lbl = value }    // elm
Do any functional languages have anything as nice as:

   e.lbl = value


As frou_dh hints, the updates are not done in place, so it would have to be something like

    new_e = e.lbl = value
at which point "nice" is not how I would describe it.


To me, those ML syntaxes look very much in keeping with the fact that it's an immutable update. I don't mind them at all.


The `e.lbl = value` syntax suggests mutation to me, which would be misleading.


True, and so there might be a need for something like:

    e = e.lbl <- value
(I didn't reuse the '=' because two '=' like that looks weird.)

The main thing I want to address is that a.b.c.d is so much work to write to.


Elixir has a few forms of updating a struct (or map):

    %{existing_map_or_struct | some_key_1: :some_val, some_key_2: :some_val}
Then there are also `put_in/2`, `put_in/3`, `update_in/2`, `update_in/3`; these are nice because they work in pipelines:

   some_val
   |> put_in([:foo, :bar, :baz], 3)


   some_val
   |> update_in([:foo, :bar, :baz], fn x -> x + 1 end)
The macro forms are cool too

   put_in(foo.bar.baz, 2)

   update_in(foo[:some]["nested"][:path], & &1 + 1)
The cool thing about the `/2` forms of `put_in` and `update_in` is that, being macros, they rewrite the AST to return the entire object, not just the changed value.

    iex(1)> foo = %{bar: %{baz: 2}}
    %{bar: %{baz: 2}}

    iex(2)> update_in(foo.bar.baz, & &1 + 1)
    %{bar: %{baz: 3}}
There are even more powerful macros like `get_and_update_in`, which can use "selectors" to extend their functionality (https://hexdocs.pm/elixir/master/Kernel.html#get_and_update_...)

https://hexdocs.pm/elixir/master/Kernel.html#put_in/2

https://hexdocs.pm/elixir/master/Kernel.html#put_in/3

https://hexdocs.pm/elixir/master/Kernel.html#update_in/2

https://hexdocs.pm/elixir/master/Kernel.html#update_in/3


To clarify, the thing I want is to make it nice and simple to create a new record with a nested field updated. It's easy to read a.b.c.d, but updating that field is painful:

    let a = { a with b = { a.b with c = { a.b.c with d = 5 } } }


Lens has this:

    e & lbl .~ value

It's not an in-place update like `e.lbl = value`.


That doesn't solve GP's problem of "not looking exactly like C++/Java/Python"


My goal isn't to look _exactly_ like C++/etc, but rather to have a fairly natural duality between read and write. You can read with:

    a.b.c.d
but a write is:

    let a = { a with b = { a.b with c = { a.b.c with d = 5 } } }
which is much much more verbose than the OO/imperative:

    a.b.c.d = 5


Yeah, lenses let you do

    a & b.c.d .~ 5
To construct a new version of `a` where those nested fields have the new value 5.

Or for an in-place update in a context where that is allowed,

    a.b.c.d .= 5
The lens concept is incredibly nice.


JAX has the nice e2 = e.at[lbl].set(value)

syntax, and Ramda has

e2 = assoc(lbl, value, e)

which is curried and data-last, so you can keep that setter around for future instances of e. Usually not too bad to translate Ramda into your language of choice, and a valuable exercise. I did it with Python and it helped a TON

Once JS gets immutables, web devs will have more tools for FP, and we can look forward to that expanding the community of functional programmers a lot: (https://www.infoworld.com/article/3569118/ecma-proposal-woul...)

Python’s lack of an immutable map is annoying but there are nice pip packages for it


The update syntax is actually more of a `.copy` method than a property update, since it returns a new object. It's actually quite nice in that respect, since it's easy to update multiple fields:

    e{lbl = val, lbl2 = val2}


Yes, the languages use immutable values so I understand the semantics.

I agree that's quite nice; however, updating nested values is very painful.


AFAIK, with the new extension you can write nested updates easily as e { a.b.c = 10 }, which I think is nice.
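
Per the proposal, that is sugar for the usual H98 nesting (a sketch with hypothetical single-field types):

    data C = C { c :: Int }
    data B = B { b :: C }
    data A = A { a :: B }

    -- With the extension:    bump e = e{a.b.c = 10}
    -- desugars (roughly) to:
    bump :: A -> A
    bump e = e { a = (a e) { b = (b (a e)) { c = 10 } } }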


OCaml distinguishes between "e.lbl <- value", which is a mutation of e, and "{e with lbl = value}", which creates a modified copy.


Where can one track the progress of the implementation and be informed when it's ready?



Thanks!

I can't find a subscribe button on GitLab, but I will set a reminder.


... and if you stick a lambda in there, the whole thing becomes an object :)


As a record slot's value retrieval (I love old Lisp terminology) can be viewed as a pure function (given the record as a parameter), it is reasonable to think of .slot as a function.

The old classic languages just create so-called selectors in the global namespace, like the defstruct macro does. This is not the best possible solution.

However, the . is already used for function composition, so a better alternative could be the -> syntax, reflecting that in Haskell everything is a pointer.


Everything in Haskell records is _not_ necessarily a pointer, though, thanks to {-# UNPACK #-}.


What's sad is that the only reason this situation is so controversial is that all of programming language design is constrained by the 19th-century typewriter keyboard.


Modern mobile keyboards have even fewer symbols.



