Reverse-mode algorithmic differentiation using effect handlers in OCaml 5 (github.com/ocaml-multicore)
88 points by nequo on Nov 19, 2022 | 12 comments



This looks very intriguing, but I'm at a loss for what it is and how it works. Can someone provide an explanation at a level between a tweet and an academic paper?


It looks like it's doing only addition and multiplication. The rest is left as an exercise to the reader.

Here's a much better introduction that links you to an industrial-strength autodiff library [1].

[1] https://sidsite.com/posts/autodiff/


Ah, algebraic effect systems, if only more languages had them. Then we wouldn't need keywords like async, mutable, immutable, etc., or function colors at all. Rust is beginning to add something like this.


I'm working on an implementation of forward-mode in F# and have been wanting to move into reverse-mode as well, so this will be a great resource.


What does effect handling bring compared to building/evaluating an expression tree?

Is it faster/terser/more easily extensible?


Out of curiosity since I'm not that familiar with OCaml:

How would you make this extensible?

It seems that adding new functions besides add and mult would require modifications to `run`, right?


Add and Mult are added as extensions of the type Effect.t, and run installs the effect handler. To add more operations you would have to extend Effect.t again and then set up a handler somewhere.
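Roughly, the pattern looks like this. This is a self-contained OCaml 5 sketch in the spirit of the repo, not its exact code; the module name F, the field names, and the Sin operation are illustrative (Sin is a hypothetical extra primitive added the same way as Add and Mult):

    open Effect
    open Effect.Deep

    module F : sig
      type t
      val mk : float -> t
      val ( *. ) : t -> t -> t
      val sin : t -> t
      val grad : (t -> t) -> float -> float
    end = struct
      (* Each node carries its value and a mutable adjoint (derivative). *)
      type t = { v : float; mutable d : float }

      (* Primitive operations are constructors added to the open type Effect.t. *)
      type _ Effect.t += Mult : t * t -> t Effect.t
      type _ Effect.t += Sin : t -> t Effect.t

      let mk v = { v; d = 0.0 }

      (* run installs the handler: do the forward step, run the rest of the
         program via the continuation, then accumulate adjoints on the way back. *)
      let run f =
        ignore (match_with f ()
          { retc = (fun (r : t) -> r.d <- 1.0; r);   (* seed the output adjoint *)
            exnc = raise;
            effc = (fun (type b) (eff : b Effect.t) ->
              match eff with
              | Mult (l, r) -> Some (fun (k : (b, _) continuation) ->
                  let x = { v = l.v *. r.v; d = 0.0 } in
                  ignore (continue k x);              (* forward pass continues *)
                  l.d <- l.d +. (r.v *. x.d);         (* backward: chain rule *)
                  r.d <- r.d +. (l.v *. x.d);
                  x)
              | Sin a -> Some (fun (k : (b, _) continuation) ->
                  let x = { v = Stdlib.sin a.v; d = 0.0 } in
                  ignore (continue k x);
                  a.d <- a.d +. (Stdlib.cos a.v *. x.d);
                  x)
              | _ -> None) })

      let grad f x =
        let x = mk x in
        run (fun () -> f x);
        x.d

      (* The user-facing operations just perform the corresponding effect. *)
      let ( *. ) a b = perform (Mult (a, b))
      let sin a = perform (Sin a)
    end

    (* d/dx sin (x * x) at x = 1.0, i.e. 2 cos 1 *)
    let () = Printf.printf "%f\n" (F.grad (fun x -> F.sin F.(x *. x)) 1.0)

Adding another operation means adding another constructor to Effect.t, another case in effc with its chain-rule update after the continuation returns, and a small wrapper that performs the effect.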

The current effects syntax is very rough. It's considered an experimental feature by the compiler team. Currently, effect handlers are exposed to users as a thin wrapper on top of their implementation in the runtime. A better syntax will come, but the OCaml dev team wants to take the time necessary to get it right.


I'm only superficially involved with machine learning. Is it ever required for applications to implement their own differentiation for something specific? In my line of work it's mostly transfer learning, maybe with small modifications to architectures and retraining. Is this only for researchers?


Automatic differentiation gives you the ability to freely define the forward propagation of your neural network, and you get backpropagation for free. The NN library "Flow" for Julia makes great use of this. Having automatic differentiation makes it very simple to define novel layers with very little work.

>is it ever required for applications to implement their own differentiation for something specific?

If you want to do backpropagation, you need to manually or automatically calculate derivatives with respect to your parameters.
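In symbols (standard chain-rule notation, nothing library-specific): for a parameter \theta that feeds into an output y with loss L,

    \frac{\partial L}{\partial \theta} = \frac{\partial L}{\partial y} \cdot \frac{\partial y}{\partial \theta}

and reverse-mode AD is just a systematic way of accumulating these products from the loss back to every parameter.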


Do you mean Flux?

I've had to define my own gradients in Flux before, but that was two years ago, maybe things have improved?


IIRC even two years ago you could get gradients by AD in Flux (you are completely correct about the name). Nowadays you have https://fluxml.ai/Flux.jl/stable/training/zygote to calculate gradients with AD.

In any case, AD is useful for NNs if you want to implement a novel layer. Of course, you could instead derive backpropagation by algebraic or manual differentiation.
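As a generic, hand-derived example of what that manual route looks like, take a dense layer y = \sigma(Wx + b) with loss L (not specific to Flux or any library):

    z = W x + b, \qquad y = \sigma(z)
    \frac{\partial L}{\partial z} = \frac{\partial L}{\partial y} \odot \sigma'(z), \qquad
    \frac{\partial L}{\partial W} = \frac{\partial L}{\partial z} \, x^{\top}, \qquad
    \frac{\partial L}{\partial b} = \frac{\partial L}{\partial z}, \qquad
    \frac{\partial L}{\partial x} = W^{\top} \frac{\partial L}{\partial z}

An AD system derives exactly these update rules for you from the forward definition alone.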


It was only one weird operation I was doing that needed a gradient defined. I should have made that more clear. I was using Zygote.



