I enjoyed the discussion of the types. I dig haskell (and clojure), and I think this is perhaps the perfect lens with which to view how to make choices between them. You can have an insanely complex typesafe haskell transducer, a still very complex but unsafe haskell transducer, a weaker and less flexible version of transducers with a simpler type encoding in haskell, some combination of those ideas, OR...
you just test your code out in the repl while developing in clojure and just kinda rely on the fact that core infrastructure or popular libraries will generally work.
Personally, I don't think the gist posted by tel qualifies as "insanely complex" by any stretch.
But a more serious question is: how do you know if "will generally work" applies to your particular case? The short answer is that you don't and you end up having to test library-provided functionality as part of your own test suite. You can argue that you'd have to do the same in e.g. Haskell, but really the surface area of potential failures is hugely reduced by the fact that you know whether a function "f: a -> b" can have side effects and that given something shaped "a" it will give you something shaped "b" (modulo termination, but that applies in any non-total language).
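To make that concrete, here's a hypothetical illustration (mine, not from the thread): from the signature alone we know `pairUp` can't perform IO and can't invent values of type `a`, so every `a` in the output must have come from the input list.

pairUp :: [a] -> [(a, a)]
pairUp (x:y:rest) = (x, y) : pairUp rest
pairUp _          = []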
This is not merely a theoretical issue: Compare the amount of documentation you need when using some random library in Haskell vs. e.g. JavaScript. The types act as compiler-checked documentation. In JavaScript it's really a crapshoot if the library has sufficient documentation and it's always specified in an ad-hoc manner (which naturally differs from library to library).
I think if you sit down with the code clojure uses to implement transducers and compare it directly against the haskell code and _don't_ come to the conclusion that the clojure code is simpler, there might be some motivated cognition at work. I don't think it's trivial for someone with Haskell experience to even verify that code implements the same (or a similar) thing.
I also believe it lacks sufficient polymorphism, for instance surrounding the output of the step function, and lacks safety around (notably, but not exclusively) the application of the step function (i.e. applying it only to the last value produced by the transducer, not just to anything of that type). So this would be squarely in the tries-to-be-simple category, despite its use of profunctors (I don't know why that was used here; it's not a super-standard abstraction).
But this is all beside the larger point. Things generally working is learned through philosophical induction in the case of clojure -- just seeing something work a bunch of times and understanding the mechanisms in some level of detail. That's not the same as having a machine-verified proof, but it's also not the same as not knowing at all.
It depends on what you mean by "simpler", I suppose.
> there might be some motivated cognition at work.
Did you really just say that?
> I don't think it's trivial for someone with Haskell experience to even verify that code implements the same (or a similar) thing.
No, not to verify that it does the same thing. For that you'd have to understand exactly what the Clojure version does too. I'm quite rusty on Clojure, so I can't make a fair comparison on how easy it is to understand vs. the Haskell version. However, you'll note that I didn't actually say anything about the Clojure version being harder to understand.
In fact, my point is that I don't even have to understand the implementation of the Haskell version: I just have to understand its interface (i.e. the types) and have a general idea of what it's supposed to do (in fuzzy terms).
Outside of the opinions being expressed, some technical comments.
1. I'd like to learn more specifically what kind of output polymorphism you would like. Right now the outputs are universally quantified, but linked. I've written it in other ways as well, but I could not find any reason in practice to use that added complexity.
In particular, the universal quantification does force you to use the output of the step since there is no other way (in scope) to produce values of the needed type. For that use case at least, the rank-N type is exactly what you want.
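As a toy illustration of that point (using a simplified rank-2 transducer type, not the gist's actual Moore-machine encoding):

{-# LANGUAGE RankNTypes #-}

type Transducer a b = forall r. (r -> b -> r) -> (r -> a -> r)

ok :: Transducer Int Int
ok step acc x = step acc (x * 2)   -- fine: the only r's available are acc and step's results

-- bad :: Transducer Int Int
-- bad step acc x = 0              -- rejected: 0 is not of the abstract type r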
2. Profunctors are not really needed at all. It was more to demonstrate that Moore and T have nice structure. The same holds for Category (and my note about Arrow). In all cases, this is just giving nice organization to regular properties of T and Moore.
However, it only helps if you thoroughly understand the type system. Since Haskell has a rather complex type system, the relative lack of documentation in English is a barrier to entry for the uninitiated. (Also, aren't the types often omitted due to type inference?)
It's actually considered bad practice to omit types in Haskell for top level declarations. Unless needed, it's typically best to omit them _within_ declarations, though. I am not entirely sure why this is where the community landed, but you could get away with e.g. only providing types explicitly for exported functions, and no one would really care, I think. You can always use your IDE/repl to just tell you the types whether they are explicit or not.
[EDIT: This didn't really address your core point. I'm not sure how to do that, precisely, but here is a shot. I think the ideal Haskell use case is active collaboration between the programmer and the compiler. The lack of English language docs is seen as less important than clear contracts within that interaction. I think the descriptive statement of "If you try to operate without the compiler and reason using English language understandings, Haskell will be more frustrating for you than it is for people in the Haskell community" seems both true and fair. Suffice it to say, most understanding of actual things is most easily expressed in natural language, because most communication happens in natural languages, so that's just how we hear about those ideas.]
Yes, I agree. This dialog with the compiler may help library authors and library callers, but it doesn't help readers. A beginner is likely to be reading code on the Internet, in tutorials, blog posts, on stack overflow, or in papers. Perhaps a particularly active, determined reader might also try out the code they read.
It's common to automatically syntax-highlight online code but not to type-annotate or cross-reference it (adding links between declarations and usages). But perhaps the tools will get better.
Another way to say this is that Haskell encourages what Rich Hickey would call "guardrail programming". The types give you hints as to how the different pieces of the program fit together. When you finally get everything to compile, it should hopefully work straight away.
That is definitely true and a flaw. I believe there's a good middle ground to be had when one considers the difference between a tutorial and documentation. The detailed docs really are almost best left to the types (they're machine-checked, highly informative, and the reality of what you'll be handling when you use the code), but many higher-level details are difficult (though not impossible) to intuit from the types alone. Further, these "intentionalities" are likely more stable across changes and would thus be well served as documentation.
But as usual, people dislike writing docs. Nobody dislikes having a new contributor write docs for their code, though :)
>You can have an insanely complex typesafe haskell transducer
How do you come to this conclusion? I can't even see how that is possible other than outright ignoring reality and just making up something you like the sound of. It isn't even complex, much less insanely complex.
>you just test your code out in the repl while developing in clojure and just kinda rely on the fact that core infrastructure or popular libraries will generally work.
And then get bitten by the fact that "generally work" isn't adequate.
There is no way in Haskell 2010 to express a function that can take as an argument more than one shape of thing and then give compiler assurances that the value, not just the shape, given to it is precisely that of the result of calling it last (not to mention that unless you fold this into a strict monad to hide the type changes of the state function, 'called last' may not be very well defined). This is simply not how Haskell works. You can drop polymorphism or you can drop safety, but I am not pretending to describe a potential tradeoff; I am describing a tradeoff the designers of Haskell already made when they decided (rightly, perhaps) that even if you had dependent types, encoding a fully polymorphic (map map) doesn't help as much as it hurts.
and you keep specializing it with whatever input you like.
f (x :: Int) :: Int
f (y :: Char) :: Char
which is essentially the RankNTypes trick. You could express something fancier with dependent types, but I've yet to see a real need for it.
You could also do this with a "Resumption Type":
data Resumption a where
  Resume :: x -> (x -> (a, Resumption a)) -> Resumption a
In this case, each time you receive a Resumption you can open it up to get a totally opaque `x` which you have zero knowledge of, except that you are able to give it back to the "resumption function" you also unpacked in order to receive the next output and Resumption.
A little variation on this is exactly the Moore machine encoding I used in the Gist.
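For anyone who wants to see the "open it up" step concretely, here is a toy use of the Resumption type (the counting example is mine, not from the gist); the hidden state happens to be an Int, but a consumer can only thread it back into the function it was unpacked with:

{-# LANGUAGE GADTs #-}

data Resumption a where
  Resume :: x -> (x -> (a, Resumption a)) -> Resumption a

counter :: Resumption Int
counter = Resume 0 go
  where
    go n = (n, Resume (n + 1) go)

-- Run a Resumption for k steps, collecting its outputs.
run :: Int -> Resumption a -> [a]
run 0 _            = []
run k (Resume x f) =
  let (a, next) = f x   -- the only possible use of the opaque x
  in a : run (k - 1) next

-- run 3 counter == [0,1,2]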
So, my primary confusion was on how to implement this in Haskell 2010 (without Rank[N,2]Types, GADTs, etc.), but I'd like to say I don't think we materially disagree about any matters of fact -- it isn't as though I'm pointing to code and claiming it shouldn't compile (as I'd seen the gist) or anything equally crazy.
The specific polymorphism tradeoff I'm talking about would force the types to look a little more like...
data Resumption [a, b, c...] where
  Resume :: x -> (x -> (a, Resumption [b, c, d...])) -> Resumption [a, b, c...]
I think having dependent types means you could have something polymorphic with respect to the value :: [Type], which encompasses a large portion of what I'm pointing towards. This gets closer to covering the cyclic type system Rich Hickey mentioned, but I still think you'd have to be pretty careful to make sure the compiler doesn't explode with an infinite-sized type.
Ah, sorry, Haskell 2010 won't have nice enough types on its own. The best you could do would be to fake RankNTypes and use them for existentials. A sorrier state.
You could definitely do something like what you ask. You could do something even nicer using a type family indexed on Nat. That all said, I don't know that I'm convinced that it's actually what was discussed in Hickey's slides even if his notation suggested it. You don't want your output changing at each step—you want the "sink" end of the applied transducer to pick the output and the "source" side to just treat it opaquely.
Internal states stored local to a transduction "step" might be valuably treated as Rich suggests, but this is exactly what the Moore machine handles—the constructor can pick the state and the user can only pass it forward with some new variable each time they "open" the state. Furthermore, due to purity only one "sequence of reductions" can survive the whole process. This gives you your indexed types much more naturally.
If you can think of a concrete use case where the more convoluted Nat-indexed type family would work better I'd be interested to see it. I'm reasonably convinced right now that it's closer to a sketch than a real need.
Are you trolling? If you don't know anything about haskell then making up random nonsense is not really constructive. You can ask for help learning, there's tons of good resources.
First, to be clear, I really liked this presentation. The criticism below is both technical and small---all in all I greatly enjoy Rich Hickey's work and generally admire his ability to talk compellingly about complex technical topics.
That said.
I somewhat disliked Hickey's presentation of typing transducers here. I feel as though he builds a number of strawmen toward typing and then tries to knock them down and suggest that either Clojure has some kind of mystical mechanism that is ineffable to types or that the exercise of typing transducers is wasteful. I disagree on both counts, I suppose. I think types are useful for analysis and teaching.
The two major points he seems to make are that in order to "properly type" transducers you must

1. Index the type of the "accumulation so far" so that it cannot be transformed out-of-order
2. Implement early stopping "without wrapping anything except for the reduced value"
There may be other critiques as well, but I want to examine these two in the context of Haskell.
With respect to the first point, the major concern appears to be prohibiting behavior loosely described as "applying the reducing function, say, 3 times and then returning the first resulting accumulation". In some sense, the idea is to force us to be faithful in passing on the accumulating parameter. In code, a pathological setting is the following:
transduce :: (r -> a -> r) -> (r -> a -> r)
transduce reduce acc0 a =
  let acc1 = reduce acc0 a
      acc2 = reduce acc1 a
      acc3 = reduce acc2 a
  in acc1
The concern is unfounded in a pure language, however, since calling `reduce` can have no side effects. This entails that all possible effects on the world of calling `reduce` are encapsulated in the return and, therefore, we can completely eliminate the steps producing `acc2` and `acc3` without worry.
transduce :: (r -> a -> r) -> (r -> a -> r)
transduce reduce acc0 a =
  let acc1 = reduce acc0 a
  in acc1
Now, there may be concern here that we still want to index the `r` type somehow to allow for changes of accumulation to occur. This is not the case (in this simple model!) as in order to achieve the "baggage carrier independence" property the `r` type must be left unspecified until the transducer is actually applied. The cleanest way to do that is to use a higher-rank type (Hickey mentions these briefly and offhandedly toward the end of his talk):
type Transducer a b = forall r . (r -> b -> r) -> (r -> a -> r)
which thus prohibits the implementer of a Transducer from affecting the values of `r` in any way whatsoever---they must be left anonymous until someone decides to use the Transducer on a particular collection of values `a`.
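To give a feel for how such transducers look in practice, here is a minimal sketch (the `mapping`/`filtering` names are mine, not the gist's):

{-# LANGUAGE RankNTypes #-}

type Transducer a b = forall r. (r -> b -> r) -> (r -> a -> r)

mapping :: (a -> b) -> Transducer a b
mapping f step acc a = step acc (f a)

filtering :: (a -> Bool) -> Transducer a a
filtering p step acc a = if p a then step acc a else acc

-- Composed transducers apply to any ordinary left fold, e.g.
--   foldl ((mapping (+1) . filtering even) (+)) 0 [1..5]  ==  12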
(It must be noted that the model given above is isomorphic to a "Kleisli arrow on the list monad" which I described a little bit here http://jspha.com/posts/typing-transducers/. It should also be noted that this model includes neither (a) the ability to use local state to capture time-varying transductions nor (b) the ability to terminate early.)
With respect to the second point, I'd like to suggest that there is a difference between the semantic weight of wrapping the result types in an Either in order to indicate early termination and the implementation weight. I completely agree that using an Either to implement early stopping (as it's easy, if finicky for the library implementor, to do) will involve wrapping and unwrapping the "state" of the transduction continuously. I also would like to suggest that it's a very natural way of representing the "accumulation | reduction" notion Hickey uses in his own "Omnigraffle 8000 type system".
We really would like to capture the idea of the transducer state as being "either" in-progress or fully-reduced and act accordingly. If Clojure's implementation of that requires fewer runtime tags than an Either, so be it, but I personally fail to see a semantic difference except in the way one can play fast-and-loose with dynamic types over static types to begin with.
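For concreteness, a hedged sketch (my own code, not the gist's) of driving an Either-wrapped step function with early termination:

-- Left marks a finished ("reduced") accumulator; Right is still in progress.
foldReduced :: (r -> a -> Either r r) -> r -> [a] -> r
foldReduced _    acc []     = acc
foldReduced step acc (x:xs) =
  case step acc x of
    Left  done -> done              -- early termination: stop consuming input
    Right acc' -> foldReduced step acc' xs

-- Example step: sum the inputs, but stop once the total reaches a bound.
sumUpTo :: Int -> Int -> Int -> Either Int Int
sumUpTo bound acc x =
  let acc' = acc + x
  in if acc' >= bound then Left acc' else Right acc'

-- foldReduced (sumUpTo 10) 0 [1..100] == 10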
---
So, I gave above an implementation of Transducers in types which has some of their properties, but certainly not all. In fact, I abused the fact that there is no ambient state in Haskell in order to ensure that a certain property would hold (notably this doesn't require a type system at all, just purity). I also argued that using Either is a perfectly natural way to implement early termination in such a transduction pipeline.
I've also made an extension to the `(r->b->r) -> (r->a->r)` mechanism which enables local state for various components of the transduction pipeline. A version without early termination is available here:
Notably, this uses most of the same typing tricks as `(r->b->r) -> (r->a->r)` but adds a "reduction local hidden state" variable which lets us implement `take` and `partition`. This takes Hickey's notion of needing to be explicit about the state being used to a whole new level.
---
So what is the point of all this?
I'd like to argue that Transducers do not present such a mysterious mechanism that they cannot be sanely typed in a reasonably rich language. I believe that I can capture most of their salient features in types without using the dependent indexing Hickey suggested was necessary.
More than this, the compartmentalized, hidden reducer-local state in the Gist implementation allows for each reduction step to include fairly exotic local states in their state machine. You could implement a kind of type indexing here if desired and no end-user would ever know of its existence.
I also absolutely concede that many type systems people regularly use could not achieve this kind of encoding.
Finally, what I really want to say is that type systems are not something to be denigrated. I believe some of the earliest "transducers v. types" argumentation took a nasty turn as amateur type theorists (myself included) rushed to write things like "Transducers are just X".
I want to apologize for any kind of bad feelings my own writing in that thread may have stirred up. I try not to be haughty or dismissive with this kind of writing, but I also make mistakes.
So what I'd really like to suggest is that types should not be taken as reductivist on interesting techniques like Transducers but instead as a tool for analyzing their construction and either improving it or better teaching it. Hickey himself often turns to some kind of "pseudotyping" to talk about how Transducers work---formalizing those notions should only lead to greater clarity.
Of course, implementations will differ in small ways. As I've noted abundantly here, a major difference between the Haskell and Clojure implementations is driven more by Haskell's purity than its typing. Hopefully, however, exploration of alternative implementations and the rich analysis produced by their typing can help to introduce new ideas.
For instance, the Gist implementation, if you strip the types away, is an interesting divergence in raw functionality from Clojure Transducers---if Clojure Transducers are "reduction function transformers" then the Gisted Transducers are "Moore-style (Infinite) State Machine transformers", and that difference allows the implementer to be extra explicit about the use of local state.
I'd rather see discussion about whether such InFSM transformation techniques have a place in Transducers literature than a fight over whether or not it's possible or reasonable to "type transducers".
I must say I haven't watched the presentation (yet!), but your comment touches on something that really irks me when people who mostly use unityped languages comment on statically type-checked languages. It very often turns into this weird thing where you just hear complaining about "types getting in your way", whereas I personally tend to see it as "the types are there to help you understand what you're actually doing". Maybe it stems from a difference in approach. I'll usually practice what's been called "typeful programming" where I just write all the type declarations for data/functions and use "undefined" ("???" for Scala folks) as the implementation just to see if the whole thing fits together before trying to code up any of the details. The types often lead to an "obvious" implementation of any given function. This is very similar to top-down programming. It seems to me that most "dynamic" programmers (esp. of Lisp-derived languages) tend to do things bottom-up. I think this may be at the root of a sizable portion of the "gets-in-the-way"/"there to help you" schism.
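To illustrate that workflow (a hypothetical sketch; all the names here are made up): declare the types and signatures first, stub every body with `undefined`, and let the compiler confirm the plumbing fits before writing any details.

data Config     = Config
data Connection = Connection

parseConfig :: String -> Maybe Config
parseConfig = undefined

connect :: Config -> IO Connection
connect = undefined

serve :: Connection -> IO ()
serve = undefined

main :: IO ()
main = do
  contents <- readFile "app.conf"
  case parseConfig contents of
    Nothing  -> putStrLn "bad config"
    Just cfg -> connect cfg >>= serve

-- This type-checks even though no implementation detail has been written yet.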
I'm not sure using Either to implement early stopping is that different from what Clojure is doing. Since it's a dynamically typed language, all objects are already wrapped and tagged, so they can use the "Reduced" tag to stand for Left and anything else to stand for Right. This is the same idea as using null to stand for Nothing and anything else to stand for Just.
Other than that, I assume that it should be possible to switch to some continuation-passing implementation if you really wanted to avoid using Either. It probably won't matter much unless you have deeply nested transducers, though.
I'm personally not really sure there's a difference either, but I wanted to be conservative in what I argued. Even if Clojure is more efficient, the semantics are still identical.
For example, the following code needs Higher Ranked Types to typecheck:
{-# LANGUAGE RankNTypes #-}

import Prelude hiding (id)

id :: forall t. t -> t
id x = x

-- mk_pair's type is a rank-2 type, because its
-- f parameter is a polymorphic function that gets applied
-- to different types inside mk_pair.
mk_pair :: (forall t. t -> t) -> (Int, String)
mk_pair f = (f 10, f "str")

a_pair :: (Int, String)
a_pair = mk_pair id
But it can obviously be written in an untyped language if you wanted to. Just erase all those type annotations:
function id(x){ return x}
function mk_pair(f){ return [f(10), f("hello")] }
var a_pair = mk_pair(id);
The deal with higher-ranked types is basically a tradeoff between the flexibility of the type system and the power of type inference. If you restrict yourself to rank-1 types (that is, none of your function arguments are polymorphic) then the compiler can infer all types in the program without you having to write any type signatures. If you want to use higher-ranked types then the compiler can't infer everything anymore, so you might need to add some type annotations yourself.
The "kind of" being really pertinent. In particular, this forgoes the guarantee that the only thing the transducer is allowed to do is return a value from the reduction. Hickey notes that this is a law of writing a proper transducer. Higher rank types ensure that this law is properly encoded directly into the system.
I'm not sure offhand. Even if your language lacks it, however, you can usually get away without it. The following works:
type Transducer r a b = (r -> b -> r) -> (r -> a -> r)
but if you even accidentally specialize the `r` while constructing Transducers then you will have reduced their ability to generalize over input and output containers. In other words, it'll be up to you to maintain the universality of `r`.
I don't understand why you use a "Moore-style State Machine".
I also don't understand why all the typed transducers shown here use a restricted version of reducers (i.e. a step function of type `r -> a -> r`).
Rich Hickey points out that transducers must deal with initialization and completion. At minute 42:26 of his talk, he even presents a complete picture of the transducer type. They are reducer transformers, where a reducer is made of 3 pieces:
data Reducer a r s = Reducer {
  init     :: () -> r,
  step     :: r -> a -> r,
  complete :: r -> s
}
We may then write:
type Transducer a b = forall r s. Reducer a r s -> Reducer b r s
But I think that we miss the point doing that. A transducer should be able to lead to a new reducer which uses a different type to accumulate inputs. This opens the way to state management and the implementation of transducers like `taking` or `partition`.
Leaving Haskell for OCaml, which I'm more comfortable with:
type ('a, 'b, 'c) red = {
  init : unit -> 'b;
  step : 'b -> 'a -> 'b;
  comp : 'b -> 'c;
}

let taking n reducer =
  let init () = (0, reducer.init ()) in
  let step (i, acc) item =
    if i < n then (i + 1, reducer.step acc item) else (n, acc)
  in
  let comp (i, acc) = reducer.comp acc in
  { init = init; step = step; comp = comp }
The type of taking is `int -> ('a, 'b, 'c) red -> ('a, int * 'b, 'c) red`. The former `'b` accumulator state has been replaced by a pair `int * 'b`, where a counter is used to track how many items have been taken so far. Yes, this should be improved with early termination. But the key point I want to highlight is that wrapping all the transducers into a single general type may imply the use of some sort of existential type (to abstract away the accumulation type). This general type is furthermore useless for the system to check the compatibility of reducers along a transducer chain.
---
Another point I would like to make is that while transducers are undoubtedly an important and effective means of building efficient transformation chains, they don't deserve to displace the conventional transformers (map, filter, take, ...) we have long been used to.
Defining a collection as something whose content is reducible, we can define the `map` collection transformation on the basis of the `mapping` reducer transformation:
type 'a col = {
  fold : 'b 'c. ('a, 'b, 'c) red -> 'c
}

let map f col = {
  fold = fun reducer -> col.fold (mapping f reducer)
}
So, we have the clarity of map/filter/reduce with the power of transducers.
https://github.com/didier-wenzek/odist is a project where I started to implement these ideas some months ago.
This transducer discussion will undoubtedly energize me to work again on the topic!
Correcting two flaws with this leads to an interesting result:
1) no error handling, at all, anywhere, not just missing from the presentation, but not in the code. transduce just wraps reduce. FAIL!!!
2) the 'result' parameter unnecessarily pollutes the interface. reduce can be reduced to foreach (doseq) with state and be implemented like other transducers that require state.
Correcting for these two errors:
a) the 0-arity init function is removed
b) the 1-arity completed function becomes 0-arity
c) the 2-arity reducing function becomes 1-arity
d) a 1-arity error function is added.
Since these functions cannot be distinguished by arity alone, we give them names: call the reducing function 'onNext', the error function 'onError', and the completed function 'onCompleted', and optionally, group the three functions into an object and voila, we have Rx Observers.
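In Haskell-ish terms (a hedged sketch of my own, just to make the shape visible), that grouping is:

data Observer e a = Observer
  { onNext      :: a -> IO ()   -- the former 2-arity reducing function, now 1-arity
  , onError     :: e -> IO ()   -- the added 1-arity error function
  , onCompleted :: IO ()        -- the former 1-arity completion, now 0-arity
  }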
Hickey's twist here is composing Observers instead of Observables. Whether this buys you anything worthwhile is debatable.
Two derivations of Rx often accompany its introduction:
1) Observable being the dual of Iterable
2) Observables being Promises with multiple return values.
Thanks to Hickey, we can add Observers being an abstraction of reduce/fold (along with its many other names).
I'm still a little confused and will have to go over some code or something to really understand the limitations of what transducers can do... can any transducer be used in a "parallel" context (like map and filter) or are they limited to a linear context (like the fold makes me suspect)?
I think it could be parallelized by chunking up the input, however the transducer doesn't know how to do that by itself, so the step function would have to be modified somehow.
Check out Clojure reducers (http://clojure.org/reducers); they are earlier work in the same line of thinking. They were used to do pmap-style things. The same tricks can be used with transducers, but I don't think it's implemented yet.
One simplified model of transducer happens to be similar to half of a particular monad. It's very disingenuous to say that "transducers are monads" or even "transducers are a particular form of monad", however.
:( it was an honest question. I really didn't know. I watched the talk. I didn't get everything that was said, so I came here to ask a question. Man, HN has too much snark.
Oh! Sorry, I may have written that poorly. I meant that technically: "in almost every sense, no". There is one sense in which "yes" they kind of are: a simplified model can be seen as
Transducer a b ~ (a -> [b])
where the right side is a pretty fundamental type when talking about the list monad. In fact, composition of transducers is exactly "monad-arrow composition" or "composition in the list monad Kleisli category".
So in that sense, they are a particular form of monad. But it's a bit of a stretch.
Thus, I really meant it in "almost every way", not every way entirely.
So by what you wrote, it sounds like they're related, but a completely separate concept. Where can I find an intro to transducers that's readable for beginners? I don't know what "list monad Kleisli category" means. Searching for "transducers" came up with a physical transducer.
Since list is a monad, we know that join can be specialized to
join :: [[a]] -> [a]
which is just `concat` then. Now Transducers are a bit of an elaboration over functions like
Transducer a b === a -> [b]
and so we might compose them a little like
comp t1 t2 x = concat (map t2 (t1 x))
which, if you follow the types, ends up working out. Finally, that concat/map business is just monadic bind in the list monad. We could also write
comp t1 t2 x = t1 x >>= t2
             = concatMap t2 (t1 x)
             = bind t2 (t1 x)
depending on what syntax is most familiar. That's why they're (an elaboration of) "monad composition", i.e. a Kleisli category.
The only thing left is just that Hickey did a transformation called "Church Encoding" or "Continuation Passing Style" to the regular list functions. This eliminates the need to generate concrete lists and turns this fancy composition into reverse normal function composition. It's a clever trick.
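Putting the simplified model into a single runnable sketch (my names, not standard library ones):

type T a b = a -> [b]                 -- the "Transducer a b ~ (a -> [b])" model

compT :: T a b -> T b c -> T a c
compT t1 t2 x = concatMap t2 (t1 x)   -- Kleisli composition in the list monad

mapT :: (a -> b) -> T a b
mapT f x = [f x]

filterT :: (a -> Bool) -> T a a
filterT p x = if p x then [x] else []

-- compT (mapT (+1)) (filterT even) 3 == [4]
-- compT (mapT (+1)) (filterT even) 2 == []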
yeah, is this going to become a 'thing'? i'm finding it really kind of annoying that terms from the fundamentals of theoretical & applied compsci ([weighted] finite-state transducers WFSTs) are being appropriated like this.