Autodiff does not work with for loops or if statements. The current solutions effectively pick a few promising traces through the program and then assume that nothing else exists. To handle control flow more elegantly (e.g., to preserve equational reasoning or avoid exponential blowup), you need to address it at the level of the language semantics.
> Autodiff does not work with for loops or if statements.
Is that necessarily true? Here is an incomplete automatic differentiation implementation that handles if statements just fine in a function definition. Unless you mean something else.
    type Dual = {Real: float; Epsilon: float} with
        static member (~-) (x: Dual) = {Real = -x.Real; Epsilon = -x.Epsilon}
        static member (+) (x: Dual, y: Dual) = { Real = x.Real + y.Real
                                                 Epsilon = x.Epsilon + y.Epsilon }
        static member (+) (x: Dual, c: float) = {x with Real = x.Real + c}
        static member (+) (c: float, y: Dual) = {y with Real = c + y.Real}
        static member (-) (x: Dual, y: Dual) = x + (-y)
        static member (-) (x: Dual, c: float) = x + (-c)
        static member (-) (c: float, y: Dual) = c + (-y)
        static member (*) (x: Dual, y: Dual) = { Real = x.Real * y.Real
                                                 Epsilon = x.Real * y.Epsilon + x.Epsilon * y.Real }
        static member (*) (c: float, y: Dual) = {Real = c; Epsilon = 0.0} * y
        static member (*) (x: Dual, c: float) = x * {Real = c; Epsilon = 0.0}

    let dcos (x: Dual) = {Real = cos x.Real; Epsilon = -(sin x.Real) * x.Epsilon}
    let dsin (x: Dual) = {Real = sin x.Real; Epsilon = (cos x.Real) * x.Epsilon}

    let differentiate (f: Dual -> Dual) a =
        let x = f {Real = a; Epsilon = 1.0}
        x.Epsilon

    let testFunction (x: Dual) = if x.Real < 0.0 then dcos x else dsin (x*x - 3.0*x)
Now, of course, one needs to be careful interpreting the result at a = 0.0, because testFunction is not differentiable at that point (there is a jump discontinuity there), yet we still get a value back. But as far as I know, that is simply a general limitation of automatic differentiation: it only correctly tells you what the derivative is when the derivative actually exists at the given point.
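To make the a = 0.0 caveat concrete, here is a self-contained sketch (repeating a minimal subset of the definitions above so it runs on its own) showing that autodiff just reports the value from whichever branch was taken:

```fsharp
// Minimal repetition of the dual-number code above so this snippet stands alone.
type Dual = {Real: float; Epsilon: float} with
    static member (~-) (x: Dual) = {Real = -x.Real; Epsilon = -x.Epsilon}
    static member (+) (x: Dual, y: Dual) = {Real = x.Real + y.Real; Epsilon = x.Epsilon + y.Epsilon}
    static member (-) (x: Dual, y: Dual) = x + (-y)
    static member (*) (x: Dual, y: Dual) =
        {Real = x.Real * y.Real; Epsilon = x.Real * y.Epsilon + x.Epsilon * y.Real}
    static member (*) (c: float, y: Dual) = {Real = c; Epsilon = 0.0} * y

let dcos (x: Dual) = {Real = cos x.Real; Epsilon = -(sin x.Real) * x.Epsilon}
let dsin (x: Dual) = {Real = sin x.Real; Epsilon = (cos x.Real) * x.Epsilon}
let differentiate (f: Dual -> Dual) a = (f {Real = a; Epsilon = 1.0}).Epsilon

let testFunction (x: Dual) = if x.Real < 0.0 then dcos x else dsin (x*x - 3.0*x)

// At a = 0.0 the else-branch runs, so we get that branch's derivative:
// d/dx [sin(x^2 - 3x)] at 0 is cos(0) * (2*0 - 3) = -3.0.
let d = differentiate testFunction 0.0   // -3.0
// Just left of 0 the other branch would report -sin(x) ~ 0.0, so -3.0 is
// one branch's answer, not "the" derivative (which doesn't exist at 0).
```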
This "discretizes then differentiates," to borrow terminology from [1], which is one of the more accessible presentations and papers. The program might evaluate correctly, but equational reasoning (the kind you would want for any sort of automated optimization) is broken. In a toy example like this, where you're doing everything manually, you probably don't care; but for larger systems, it gets tiring to do the mental equivalent of assembly programming.
This isn't a toy example, though. It's the start of a library. Once you've developed the dual numbers and the differentiate function, and defined the dual-number versions of all the elementary functions, you have a full (forward-mode) automatic differentiation library that can simply be used. You wouldn't have to do anything manually; you'd just define your functions against this library instead of the built-in functions. The dual-number functions serve both to differentiate and to plainly evaluate (by setting the dual part to 0.0).
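As a sketch of what a hypothetical consumer of such a library would look like (with a minimal Dual inlined so the snippet stands alone), the same function definition serves both evaluation and differentiation, depending only on the seed of the dual part:

```fsharp
// Minimal inlined Dual so this snippet is self-contained; in practice
// these definitions would live in the library described above.
type Dual = {Real: float; Epsilon: float} with
    static member (+) (x: Dual, y: Dual) = {Real = x.Real + y.Real; Epsilon = x.Epsilon + y.Epsilon}
    static member (*) (x: Dual, y: Dual) =
        {Real = x.Real * y.Real; Epsilon = x.Real * y.Epsilon + x.Epsilon * y.Real}
    static member (*) (c: float, y: Dual) = {Real = c; Epsilon = 0.0} * y

// "User code": written once, against the dual-number library.
let f (x: Dual) = x*x + 2.0*x            // f(x) = x^2 + 2x

// Plain evaluation: seed the dual part with 0.0.
let value = (f {Real = 3.0; Epsilon = 0.0}).Real      // f(3)  = 15.0
// Differentiation: seed the dual part with 1.0.
let deriv = (f {Real = 3.0; Epsilon = 1.0}).Epsilon   // f'(3) = 8.0
```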
> This "discretizes then differentiates"
Not sure what you mean. It defines dual numbers, then defines elementary functions on dual numbers (I only did two as an example). From there, you get differentiation for free (i.e., automatically). The only thing that was done manually was defining the testFunction. Everything else would be part of a library that you'd consume.
I'm not sure what you mean by "equational reasoning is broken".
Thank you for the link to the paper. It seems interesting, and I'll read through it more, although it is discussing differentiating integrals, which is where their language "discretize-then-differentiate" comes from. From this paper, I get some sense of why differentiable programming might make sense as a concept, but I've only ever seen the term introduced alongside automatic differentiation, which is what I was balking at (given the content of the original post here). I'll keep reading the paper, but what you've said so far hasn't convinced me. Thanks for the discussion.
> In a toy example like this where you're doing everything manually then you probably don't care, but for larger systems, it gets tiring to do the mental equivalent of assembly programming.
But by this argument (which sounds plausible to me), you have defeated your previous claim that differentiable programming is really a new paradigm, since you seem to have adopted what bmitc wrote earlier: that differentiable programming is not a new paradigm but "seems like automatic differentiation just implemented properly".
There's no contradiction: autodiff is a method of implementing differentiable programming. In this example, it is implemented as a type that handles a trace of a program, but everything else is left to the programmer. This is a problem because most of the code I would want to write is not a single trace!
Analogously, I could write a program in C that does message sends and organizes code in a design pattern called "objects" and "classes". Incredibly painful, but workable sometimes. Some people even call it "object oriented C" and go on to create a library to handle it like [1]. Is object orientation not a paradigm because I've implemented a core piece as a library?
No, because that misses the intangible part of what makes a paradigm a paradigm: I structured my code this way, for a reason. In OOP, that reason is the compartmentalization of concerns. The underlying OOP mechanism gives me a way to reason about composition and substitution of components to minimize how much I have to reason about when writing code. Similarly, in differentiable programming, the differentiability of all things gives me a way to reason about the smooth substitution of things because it more easily lets me reason about how the machine writes code.
Seems we're arguing about definitions. Currently differentiable programming seems to be this vaguely defined term (I don't get what you mean by smooth substitution), with autodiff being its only (proper) instantiation.
You say autodiff is actually not representative of differentiable programming. But if there aren't any other good examples that illustrate differentiable programming, how is differentiable programming (currently) more than autodiff?
@bmitc: Reading your replies (some of which seem to have been written at the same time I wrote mine), it seems we are on the same page; I'm also a mathematician, and I also have some qualms with how people invent new names for automatic differentiation :)
I had a look at your bio and couldn't find any email address. Would you perhaps be interested in having a longer, scientific discussion about AD?