Reverse-mode algorithmic differentiation using effect handlers in OCaml 5 (github.com/ocaml-multicore)
88 points by nequo on Nov 19, 2022 | 12 comments



This looks very intriguing, but I'm at a loss for what it is and how it works. Can someone provide an explanation at a level between a tweet and an academic paper?


It looks like it's doing only addition and multiplication. The rest is left as an exercise to the reader.

Here's a much better introduction that links you to an industrial-strength autodiff library [1].

[1] https://sidsite.com/posts/autodiff/


Ah, algebraic effect systems, if only more languages had them. Then we wouldn't need keywords like async, mutable, immutable, etc., or function colors at all. Rust is beginning to add something like this.


I'm working on an implementation of forward-mode in F# and have been wanting to move into reverse-mode as well, so this will be a great resource.


What does effect handling bring compared to building/evaluating an expression tree?

Is it faster/terser/more easily extensible?


Out of curiosity since I'm not that familiar with OCaml:

How would you make this extensible?

It seems that adding new functions besides add and mult would require modifications to `run`, right?


Add and Mult are added as extensions of the type Effect.t, and run installs the effect handler. To add more operations you would have to extend Effect.t again and then set up a handler somewhere.
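Roughly, the pattern looks like this. This is a self-contained OCaml 5 sketch in the spirit of the repo, not its exact code; the module name F, the field names, and the Sin operation are illustrative (Sin is a hypothetical extra primitive added the same way as Add and Mult):

    open Effect
    open Effect.Deep

    module F : sig
      type t
      val mk : float -> t
      val ( *. ) : t -> t -> t
      val sin : t -> t
      val grad : (t -> t) -> float -> float
    end = struct
      (* Each node carries its value and a mutable adjoint (derivative). *)
      type t = { v : float; mutable d : float }

      (* Primitive operations are constructors added to the open type Effect.t. *)
      type _ Effect.t += Mult : t * t -> t Effect.t
      type _ Effect.t += Sin : t -> t Effect.t

      let mk v = { v; d = 0.0 }

      (* run installs the handler: do the forward step, run the rest of the
         program via the continuation, then accumulate adjoints on the way back. *)
      let run f =
        ignore (match_with f ()
          { retc = (fun (r : t) -> r.d <- 1.0; r);   (* seed the output adjoint *)
            exnc = raise;
            effc = (fun (type b) (eff : b Effect.t) ->
              match eff with
              | Mult (l, r) -> Some (fun (k : (b, _) continuation) ->
                  let x = { v = l.v *. r.v; d = 0.0 } in
                  ignore (continue k x);              (* forward pass continues *)
                  l.d <- l.d +. (r.v *. x.d);         (* backward: chain rule *)
                  r.d <- r.d +. (l.v *. x.d);
                  x)
              | Sin a -> Some (fun (k : (b, _) continuation) ->
                  let x = { v = Stdlib.sin a.v; d = 0.0 } in
                  ignore (continue k x);
                  a.d <- a.d +. (Stdlib.cos a.v *. x.d);
                  x)
              | _ -> None) })

      let grad f x =
        let x = mk x in
        run (fun () -> f x);
        x.d

      (* The user-facing operations just perform the corresponding effect. *)
      let ( *. ) a b = perform (Mult (a, b))
      let sin a = perform (Sin a)
    end

    (* d/dx sin (x * x) at x = 1.0, i.e. 2 cos 1 *)
    let () = Printf.printf "%f\n" (F.grad (fun x -> F.sin F.(x *. x)) 1.0)

Adding another operation means adding another constructor to Effect.t, another case in effc with its chain-rule update after the continuation returns, and a small wrapper that performs the effect.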

The current effects syntax is very rough. It's considered an experimental feature by the compiler team. Currently, effect handlers are exposed to users as a thin wrapper on top of their implementation in the runtime. A better syntax will come, but the OCaml dev team wants to take the time necessary to get it right.


I'm only superficially involved with machine learning. Is it ever required for applications to implement their own differentiation for something specific? In my line of work it's mostly transfer learning, maybe with small modifications to architectures and retraining. Is this only for researchers?


Automatic differentiation gives you the ability to freely define the forward propagation of your neural network, and you get backpropagation for free. The NN library "Flow" for Julia makes great use of this. Having automatic differentiation makes it very simple to define novel layers with very little work.

>is it ever required for applications to implement their own differentiation for something specific?

If you want to do backpropagation, you need to manually or automatically calculate derivatives with respect to your parameters.
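In symbols (standard chain-rule notation, nothing library-specific): for a parameter \theta that feeds into an output y with loss L,

    \frac{\partial L}{\partial \theta} = \frac{\partial L}{\partial y} \cdot \frac{\partial y}{\partial \theta}

and reverse-mode AD is just a systematic way of accumulating these products from the loss back to every parameter.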


Do you mean Flux?

I've had to define my own gradients in Flux before, but that was two years ago, maybe things have improved?


IIRC even two years ago you could get gradients by AD in Flux (you are completely correct about the name). Nowadays you have https://fluxml.ai/Flux.jl/stable/training/zygote to calculate gradients with AD.

In any case, AD is useful for NNs if you want to implement a novel layer. Of course, you could instead derive backpropagation by algebraic or manual differentiation.
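As a generic, hand-derived example of what that manual route looks like, take a dense layer y = \sigma(Wx + b) with loss L (not specific to Flux or any library):

    z = W x + b, \qquad y = \sigma(z)
    \frac{\partial L}{\partial z} = \frac{\partial L}{\partial y} \odot \sigma'(z), \qquad
    \frac{\partial L}{\partial W} = \frac{\partial L}{\partial z} \, x^{\top}, \qquad
    \frac{\partial L}{\partial b} = \frac{\partial L}{\partial z}, \qquad
    \frac{\partial L}{\partial x} = W^{\top} \frac{\partial L}{\partial z}

An AD system derives exactly these update rules for you from the forward definition alone.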


It was only one weird operation I was doing that needed a gradient defined. I should have made that more clear. I was using Zygote.



