This is not true, and "doomed" is sensationalist. With proper perturbation analysis, one can choose an appropriate stencil and formula with the desired error bounds.
Won't work on my example. Given any differentiable function f:R -> R, you can add a very small function to it that dramatically changes its derivative. But maybe this is too pathological to bother about.
Over the complex numbers, this phenomenon doesn't happen. This can be shown, for instance, using the maximum modulus principle. You can therefore use Cauchy's Integral Formula to accurately estimate derivatives. But why not then just use the dual numbers?
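For instance, here is a crude sketch of the Cauchy-integral estimate; the radius, sample count and test function are arbitrary illustrative choices:
import cmath
def cauchy_derivative(f, x0, r=0.1, n=64):
    # f'(x0) = (1/(2*pi*i)) * integral of f(z)/(z - x0)^2 dz around a circle,
    # approximated with the trapezoid rule at n equally spaced points
    total = 0.0
    for k in range(n):
        theta = 2 * cmath.pi * k / n
        total += f(x0 + r * cmath.exp(1j * theta)) * cmath.exp(-1j * theta)
    return (total / (n * r)).real
print(cauchy_derivative(cmath.exp, 0.0))  # ~1.0, since d/dx exp(x) = 1 at x = 0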
I see your edit - fair point, analyticity does make complex functions much more well-behaved than their real counterparts. Still, this isn't a death knell like you previously claimed.
Let's put up a bet. I win the bet if I provide a polynomial approximation of f'(x), R[f(x)], bounded by a maximum of 5% error, on the interval (-1, 1), by November 1st, 2022. If I fail to, or resign by then, you win.
If I win, you agree to silence yourself for 1 year, regarding the split-complex and dual numbers, no longer touting them as a panacea.
If I lose, I'll agree to make a YouTube video extolling the virtues of a mathematical subject of your choice, such as these types of numbers.
I don't fully understand your bet (and I don't take kindly to being asked to silence myself). I don't have time at the moment to craft an example. I might play around a bit with the edges of floating point. Differentiation is indeed discontinuous, so with enough determination I should be able to win such a bet. My function would probably be a sigmoid function which would go from -a to +a (for small a) over a very small interval [-b,b], where b can equal a.
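A minimal sketch of the sort of function I mean, with tanh standing in for the sigmoid and a = b = 1e-12 (the constants are purely illustrative):
from math import tanh
a = b = 1e-12
def g(x):
    return a * tanh(x / b)  # runs from about -a to +a over roughly [-b, b]
h = 1e-8  # a typical finite-difference step
print((g(h) - g(-h)) / (2 * h))  # ~1e-4, nowhere near the true g'(0) = 1
A central difference with any step much larger than b only ever samples the flat tails, so it badly underestimates the slope at 0.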
> If I win, you agree to silence yourself for 1 year, regarding the split-complex and dual numbers, no longer touting them as a panacea.
I never said these numbers were a panacea. I said the dual numbers were an exact method. In this comment [1], which is what you're referring to, I said that the connection with other planar algebras was cute, but you mischaracterised what I said, which was rude. The algebras are pertinent because the article mentioned the dual numbers.
Fair enough. The bet I had in mind was the function you'd already stated as intractable: f(x) = a sin(x / a), the value of a is up to you, within reason (no IEEE fp shenanigans.)
If you'd rather some other consequence than silencing, please suggest one. I am tired of hearing your panaceas. Symbolic differentiation is also exact but intractable. What makes the duals yield exactness in any way that isn't prone to the same finitude as other approximations?
I see a lot of beasts but only one individual claiming to tame them with ease and exactness.
If it comes across as rude, that's not my intent. It's more that I want some clarity, and to have your claims validated.
I'm not going to accept your bet because I don't understand its conditions (and what's the point?), but I accept your challenge. Also, I'm going to f*** around bigly with floating point.
def f(x, a):
    return 1 + a * sin(x / a)
Now let epsilon = 2.2204460492503133e-36. Consider f(x, epsilon).
The function should always return 1 in floating point, so any finite differencing method will estimate its derivative as 0. But over the dual numbers, the function should have f(e, epsilon) = 1 + e, where e is the dual number imaginary. In other words, the dual numbers return the correct value of the derivative at x=0, which is 1. You should be able to use any Python implementation of the dual numbers; my own one uses Sympy. Actually, you might be able to use Scipy and represent dual numbers as matrices.
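For illustration only, a minimal hand-rolled dual-number class (not the Sympy-based one I actually use) is already enough for this function; only the operations f needs are implemented:
class Dual:
    def __init__(self, re, eps=0.0):
        self.re, self.eps = re, eps  # value and derivative part
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.re + o.re, self.eps + o.eps)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.re * o.re, self.re * o.eps + self.eps * o.re)
    __rmul__ = __mul__
    def __truediv__(self, o):  # division by an ordinary number only
        return Dual(self.re / o, self.eps / o)
from math import sin as _sin, cos as _cos
def sin(x):
    if isinstance(x, Dual):
        return Dual(_sin(x.re), _cos(x.re) * x.eps)  # sin rule for dual numbers
    return _sin(x)
def f(x, a=1e-36):
    return 1 + a * sin(x / a)
d = f(Dual(0.0, 1.0))
print(d.re, d.eps)  # 1.0 1.0, i.e. f(0) = 1 and f'(0) = 1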
Look up automatic differentiation, because you don't seem to be aware of it, and it is the subject of the article. You seem to misunderstand symbolic methods. The problem of finding an exact derivative at a single point is not as hard as finding the symbolic derivative everywhere. You're referring to the product rule and chain rule, but these are not needed to find derivatives at single points.
I am aware of AD. All AD does is use the chain rule on the Taylor (or otherwise) approximations of all arithmetic functions used in the calculation of the target function. It's symbolic but only at the single arithmetic operation level. The chain rule is concealed under rules on epsilon. The reason epsilon's square equals 0 is to eliminate any higher order derivatives as soon as they appear. If the derivative is abnormal in a way such that the normal truncated calculative formulas would conceal it, then AD would fail to give an accurate result as well.
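Concretely, with ε² = 0, (a + a'ε)(b + b'ε) = ab + (a b' + a' b)ε, which is exactly the product rule, and each primitive g is evaluated as g(a + a'ε) = g(a) + g'(a) a' ε, which is the chain rule applied one operation at a time.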
> If the derivative is abnormal in a way such that the normal truncated calculative formulas would conceal it, then AD would fail to give an accurate result as well.
The following is proof to the contrary. First, implement the dual numbers by borrowing the code from here [1]. Then define the sine function in a crude way:
from math import factorial
def sin(x):
    # truncated Taylor series: sum of (-1)^i x^(2i+1) / (2i+1)!
    return sum((-1)**i * x**(2*i + 1) / factorial(2*i + 1) for i in range(20))
We then define f:
def f(x, a=1e-36):
    return 1 + a * sin(x / a)
and use the dual numbers to estimate the derivative at 0:
In [49]: print(f(Dual(0,{'epsilon':1})))
f = 1.0
fepsilon = 1.0
We get 1 as the estimate, which is correct! For comparison, we may try using finite differencing to estimate the derivative at 0:
In [50]: h = 1e-36; (f(h) - f(0))/h
Out[50]: 0.0
The estimate for the derivative is 0 here, which is wrong. The finite differencing cannot be improved using stencil theory because the estimate of f(x) is always exactly 1 under floating point, and therefore the estimate for f'(x) is always exactly 0. Dual numbers win.
You've just described the dual numbers, which provide a way of implementing forward-mode autodiff.
> If the derivative is abnormal in a way such that the normal
> truncated calculative formulas would conceal it, then AD
> would fail to give an accurate result as well.
What I'm seeing directly contradicts that. I've tested the example above (the one I called f(x,a) with a=1e-36, trying to find its derivative at x=0), and it gave the right answer with dual numbers, but not with finite differencing. So what you're saying isn't true. I'll post a minimal implementation here later.
[edit]
I've used Sympy out of laziness, which is inelegant. I don't know how clear this will be.
First of all, I have to change the number 1 in the definition of f(x,a) into a matrix. We need this because we'll be representing dual numbers as matrices, and you can't add the scalar 1 to a matrix:
from sympy import Matrix, eye, exp, im, I
one = eye(2)
def f(x, a=1e-36):
    return one + a * sin(x / a)
Sympy's `sin` won't do what we need on a matrix argument, so we'll provide a version that does:
def sin(M):
    return im(exp(I*M).n())
At the dual number ε, we have f(ε) = f(0) + ε f'(0). The dual number ε will be represented as the matrix
⎡0 1⎤
⎢ ⎥
⎣0 0⎦
As code:
epsilon = Matrix([[0,1],[0,0]])
Evaluating f on the above matrix gives:
In: f(epsilon)
Out:
⎡1 1.0⎤
⎢ ⎥
⎣0 1 ⎦
Which tells us that f(0) ≈ 1 and f'(0) ≈ 1. Both are correct.
Finite differencing, by contrast, incorrectly claims that the derivative at 0 is 0. No finite differencing scheme can fix this because floating point causes the function f to become constant.
Autodiff using the dual numbers has pulled off the seemingly impossible.
[edit: An early typo caused the code to produce an incorrect result. Now fixed, and my claim holds.]
> Autodiff using the dual numbers has pulled off the seemingly impossible.
It's not seemingly impossible, if you understand that the chain rule for f' is just being executed at the same time as the calculation of f by having derivatives for basic operations already defined.
However, like I said, if you have a calculation method that hides derivatives in terms that have been truncated, then this will not save you.
(-1)**i * x**(2*i + 1) / factorial(2*i + 1) is not one of those methods -- and sin is rather regular in regard to its Taylor series. So of course it works out in this case, with a normal power series calculation method.
Try a different series or calculation method, and dual numbers will get you a wildly different result. Understand: dual numbers only work well when you use a method of calculation that front-loads terms that have a high bearing on the derivative. Otherwise the missing terms/truncation cause severe inaccuracy.
However, stenciling might actually perform better in these scenarios.
> Try a different series or calculation method, and dual numbers will get you a wildly different result
No. The example works because:
While the exact value of f(x,a) isn't 1, given any inexact representation of real numbers like floating point or fixed point, the value of "a" can be chosen so that f(x,a) has 1 as its closest representation.
Trying to compute f(x,a) differently isn't going to change that, so stencilling methods are never going to work here. But autodiff will always work. This means I win your challenge.
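To see the representability point concretely (0.8414... below is just sin(1); any value of magnitude at most 1 behaves the same way):
a = 1e-36
print(1.0 + a * 0.8414709848078965 == 1.0)  # True: rounds back to exactly 1.0
print(1.0 + a * -1.0 == 1.0)                # True at the other extreme too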
Your other claims are probably gibberish. You need to provide an example.
What don't you understand? The method of calculating the function is an approximation, so the AD derivative depends on it: the AD derivative is the derivative of the approximation, not of the actual function, whereas an approximation of the actual derivative is what we are truly after.
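Here is a rough sketch of what I mean (the table resolution and test point are arbitrary): implement sin as a lookup into a precomputed table. The program being differentiated is then piecewise constant, so its exact derivative, which is all AD can report, is 0 on the interior of every cell, even though the ideal derivative is cos(x).
import math
STEP = 0.01
TABLE = [math.sin(i * STEP) for i in range(1000)]
def table_sin(x):
    return TABLE[int(x / STEP)]  # piecewise constant in x
x0 = 0.305
print(table_sin(x0) == table_sin(x0 + 1e-4))  # True: locally constant
print(math.cos(x0))                           # ~0.954: the ideal derivative is not 0
Any forward-mode tool that gets through the int() and the indexing at all will report a derivative of 0 here, because the program genuinely is constant on each cell.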
> The key take away here is that the map is not the territory. Most nontrivial functions on computers are implemented as some function (the map) that approximates the mathematical ideal (the territory). Automatic differentiation gives back a completely accurate derivative of that function (the map) doing the approximation. Furthermore, the accurate derivative of an approximation to the ideal (e.g. d_my_sin) is less accurate than an approximation to the (ideal) derivative of the ideal (e.g. my_cos). There is no truncation error in the work the AD did; but there is a truncation error in the sense that we are now using a more truncated approximation than we would write ourselves.
AD is great but if you have a calculation method ill-suited for AD, then you'll get shite results. Why is this surprising?
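A hedged numerical check of the quoted point, with a degree-8 Chebyshev fit of sin on [0, pi] standing in for the "map" (the degree, interval and use of numpy are my own choices): the exact derivative of the approximation (d_my_sin, which is what AD hands you) comes out less accurate than approximating the ideal derivative (my_cos) directly.
import numpy as np
from numpy.polynomial import chebyshev as C
xs = np.linspace(0, np.pi, 2001)
my_sin = C.Chebyshev.fit(xs, np.sin(xs), 8)  # approximation of sin (the map)
d_my_sin = my_sin.deriv()                    # exact derivative of the map
my_cos = C.Chebyshev.fit(xs, np.cos(xs), 8)  # direct approximation of sin' = cos
print(np.max(np.abs(d_my_sin(xs) - np.cos(xs))))  # error of the derivative of the approximation
print(np.max(np.abs(my_cos(xs) - np.cos(xs))))    # error of the approximation of the derivative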
And yeah, stenciling is mostly for PDEs and other state spaces that we don't have a closed form for. It's generally not used for an analytic function. But you can use it for an analytic function if you tailor the stencil to the function. In fact, you'll just yield a truncated Taylor polynomial if you provide a perfect function-specific stencil.
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.72...
https://www.google.com/search?q=finite+difference+stencil+de...
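For concreteness, a minimal sketch of the kind of stencil those links describe, the standard five-point fourth-order central difference for f' (step size and test function are illustrative):
from math import sin, cos
def d_stencil(f, x, h=1e-3):
    # coefficients (1, -8, 8, -1) / (12h) give an O(h^4) estimate of f'(x)
    return (f(x - 2*h) - 8*f(x - h) + 8*f(x + h) - f(x + 2*h)) / (12*h)
print(d_stencil(sin, 1.0))  # ~0.5403023
print(cos(1.0))             # exact derivative, for comparison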