It would be great to see a refactor of some code using transducers to get a better sense of what they're useful for. From an abstraction standpoint, I can see the attraction; but I am having a tough time imagining how it would improve code in practice.
Much Clojure code relies on applying nested transformations to sequences:
(reduce + 0 (filter odd? (map inc (range 1000))))
Conceptually, this takes an input sequence of 1000 elements (range 1000), then creates an intermediate sequence from (map inc), then creates another intermediate sequence from (filter odd?), then reduces that with +. (I am glossing over a number of internal optimizations - chunking and transients - but the idea holds.)
Transducers let you represent the transformation parts as an independent (reusable) thing:
(def xf (comp (map inc) (filter odd?)))
You then apply that composite transformation in a single pass over the input:
(transduce xf + 0 (range 1000))
transduce combines (a) input iteration, (b) transformation application, and (c) what to do with the results - in this case, apply + as a reduce. Other functions also exist that make different choices about how to combine these parts: into collects results into a collection, sequence can incrementally produce a result, eduction can delay execution, and core.async chans can apply them in a queue-like way.
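For example, the same xf from above can be dropped into each of those contexts (a quick sketch; results shown in comments):

(into [] xf (range 1000))               ;; eager: collects results into a vector
(take 3 (sequence xf (range 1000)))     ;; incremental: => (1 3 5)
(reduce + 0 (eduction xf (range 1000))) ;; delayed: runs only when reduced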
There are a number of benefits here:
1. Composability - transformations are isolated from their input and output contexts and thus reusable in many contexts.
2. Performance - the sequence example above allocates two large intermediate sequences. The transduce version does no intermediate allocation. Additionally, some collection inputs are "internally reducible" (including range in 1.7), which allows them to be traversed more quickly in reduce/transduce contexts. The performance benefits can be dramatic as you increase the size of the input data or the number of transformations. If you happen to have your data in an input collection (like a vector), into with transducers can move data directly from one vector to another without ever producing a sequence (see the sketch after this list).
3. Eagerness - if you know you will process the entire input into the entire output, you can do so eagerly and avoid overhead. Or you can get laziness via sequence (although there are some differences in what that means with transducers). The upshot is you now have more choices.
4. Resource management - because you have the ability to eagerly process an external input source, you have more knowledge about when it's "done", so you can release the resource at that point (also shown in the sketch after this list).
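To make points 2 and 4 concrete, here is a minimal sketch (reusing xf from above; "data.txt" is a made-up file name for illustration):

;; Point 2: data moves vector-to-vector, no intermediate sequence
(into [] xf [1 2 3 4 5]) ;; => [3 5]

;; Point 4: transduce consumes the lines eagerly, so the reader can
;; be closed as soon as the reduction finishes
(require '[clojure.java.io :as io])
(with-open [rdr (io/reader "data.txt")]
  (transduce (comp (map count) (filter odd?)) + 0 (line-seq rdr)))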
If you don't know Clojure, you should know that Clojure has a sort of generic collection abstraction called a 'sequence'. Arrays, lists, and lazy sequences are data structures that implement the sequence interface. A standard set of functions - 'map', 'reduce', 'filter' - operates on that interface, providing a common vocabulary across different kinds of datatypes. This lets the programmer reuse code and idioms and makes programs easier to understand overall.
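For instance, the same functions run unchanged over different concrete types:

(map inc [1 2 3])       ;; vector input   => (2 3 4)
(map inc '(1 2 3))      ;; list input     => (2 3 4)
(filter odd? (range 5)) ;; lazy seq input => (1 3)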
The problem is that there are datatypes, such as streams and channels, which are fundamentally sequence abstractions but for one reason or another don't implement the 'sequence' interface. You could wrap them in a lazy sequence, but that can be complex and will introduce a performance penalty. When they wrote core.async, the authors found themselves reimplementing 'map', 'reduce', etc. for channels.
Transducers are a way to provide sequence operations to any kind of collection: they split the part that interfaces with the collection from the operation itself. This way we can dispense with the old collection-specific 'map', 'filter', etc. and just use transducers for everything. Because they're implemented in the style of reducers, they're faster as well.
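As a sketch of what that buys you, the same transducer can drive a plain collection and a core.async channel (assuming core.async is on the classpath; onto-chan! is the name in recent versions, older ones spell it onto-chan):

(require '[clojure.core.async :as a])

(def xf (comp (map inc) (filter odd?)))

(into [] xf (range 10))     ;; plain collection => [1 3 5 7 9]

(def c (a/chan 10 xf))      ;; channel that applies xf to each value put on it
(a/onto-chan! c (range 10)) ;; feed it the same input, then close
(a/<!! (a/into [] c))       ;; => [1 3 5 7 9]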
It improves performance by fusing some of the operations, doing them in one pass instead of multiple passes, and it does so generically: operations are composed regardless of the source, and the implementation never looks at the type of the source at all.
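You can see the fusion by adding printing to each step: with a composed transducer the two steps interleave element by element, rather than running as two separate passes (a toy sketch):

(def spy-xf (comp (map (fn [x] (println "inc" x) (inc x)))
                  (filter (fn [x] (println "odd?" x) (odd? x)))))

(transduce spy-xf conj [] (range 3))
;; prints: inc 0, odd? 1, inc 1, odd? 2, inc 2, odd? 3
;; => [1 3]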
I believe one of the arguments was also that these couldn't be written in statically typed languages, although I do not know if this turned out to be true.
>I believe one of the arguments was also that these couldn't be written in statically typed languages, although I do not know if this turned out to be true.
It's not true; they were just abstrusely defined, and there's some weirdness around the implicit effects that you can choose to either ignore or incorporate into the equivalent.
Fusion is one of the main things Haskell is known for, and an API that gets close enough to what one would use transducers for is `Control.Lens.Fold`.
Rich didn't claim you couldn't implement it. He just claimed that some aspects of transducers are difficult to represent as types. Or at least, that's how I took it.
My use cases are 1) to emulate yield / a pipeline of generators and 2) to avoid intermediate object creation. My code is an indirect port of some Haskell code, so it's a tad awkward, but overall transducers were a small perf win over reducers (which were a big perf win over lazy seqs) and also provide a good effectful-ish model for my pipeline.