Definitely not an expert but there's a fundamental hurdle to this.
One way to think of this is to realize how easy it is to multiply numbers but how much more work it takes to divide numbers.
For something like automatic differentiation, you're essentially applying the chain rule for partial derivatives repeatedly. This is analytically pretty straightforward for most applications. All you need is an analytical derivative for each of the simple functions your more complex function is composed of (e.g. a neural network).
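To make that concrete, here's a minimal sketch using PyTorch's autograd (the function is my own toy example, not from the post):

```python
import torch

# Derivative of f(x) = exp(sin(x^2)) at x = 0.5, built entirely from
# primitives whose analytical derivatives are known, chained together.
x = torch.tensor(0.5, requires_grad=True)
f = torch.exp(torch.sin(x ** 2))
f.backward()
print(x.grad)  # equals 2*x*cos(x^2)*exp(sin(x^2)) at x = 0.5
```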
For integration, the analogue of the chain rule is integration by substitution [1]. The toolbox for solving integration problems is more limited than for differentiation. You run into issues where the answer cannot even be expressed using standard mathematical notation [2]. Sometimes you get lucky and the answer can be expressed via an alternating Taylor series so you can estimate the answer within some margin of error [3].
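To make the alternating-series case concrete, here's a sketch (my example, not the post's) estimating the integral of exp(-x^2) over [0, 1], whose antiderivative isn't elementary:

```python
import math

# e^(-x^2) = sum_n (-1)^n x^(2n) / n!, so integrating term by term:
#   integral_0^1 exp(-x^2) dx = sum_n (-1)^n / (n! * (2n + 1))
# For an alternating series with shrinking terms, the truncation error
# is bounded by the magnitude of the first omitted term.
def integral_estimate(eps=1e-10):
    total, n = 0.0, 0
    while True:
        total += (-1) ** n / (math.factorial(n) * (2 * n + 1))
        n += 1
        if 1 / (math.factorial(n) * (2 * n + 1)) < eps:
            return total

print(integral_estimate())  # ~0.7468241, within eps of the true value
```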
Stan is a piece of software that runs state-of-the-art MCMC methods to basically just compute fancy integrals. A Stan model will take an order of magnitude more time to run than a simple neural network via something like PyTorch on the same dataset. But they answer different questions.
Slightly tangential fact I heard recently on a podcast [0]: historically the problem of integration preceded that of differentiation. But because differentiation is so much easier, we teach that to kids first. It's one of the many ways that schools obscure the intuition of calculus and turn it into so many formulae to be memorised.
You're assuming that you want to express the integral as an elementary function, or as a member of another narrow class of functions. It's only in that case that the lack of a chain rule is a problem. What you actually want is the ability to evaluate the integral to within epsilon numerically, which is a much more flexible problem.
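In that numerical sense the problem is routine, e.g. with SciPy's adaptive quadrature (my example):

```python
import numpy as np
from scipy.integrate import quad

# exp(sin(x)) has no elementary antiderivative, but evaluating the
# definite integral to within a tolerance is a solved problem.
value, abs_err = quad(lambda x: np.exp(np.sin(x)), 0, np.pi)
print(value, abs_err)  # the estimate plus a bound on its error
```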
But then again, it's not clear what the OP meant by "automatic integration" anyway.
Can you please elaborate on "you never get the expression"?
When you input the variable values into a symbolic derivative you just get a value at the end: d/dx x^2 = 2x, so at x = 0.5 the derivative is 1. The same is true for symbolic integrals. For most practical applications, we don't really care about the full symbolic expression. We just want the answer, or at least a good approximation. This post uses a specific example of the difference between two Beta distributions. We want to get that 0.71. It is very hard to "automatically" make that happen.
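For instance, a Monte Carlo sketch of that kind of Beta comparison (the parameters below are made up; the post's 0.71 comes from its own specific distributions):

```python
import numpy as np

# Estimate P(A > B) for A ~ Beta(30, 10), B ~ Beta(20, 12) by sampling.
# We never obtain a symbolic expression for the integral, just the number.
rng = np.random.default_rng(0)
a = rng.beta(30, 10, size=1_000_000)
b = rng.beta(20, 12, size=1_000_000)
print((a > b).mean())
```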
Just to clarify: This result follows directly from the definition of the derivative:
f'(x) = lim_{e->0} (f(x+e) - f(x))/e
If you're able to express f(x+e) in the form f(x) + y*e (treating e^2 as 0, as with dual numbers), then it follows that y is the derivative.
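A minimal sketch of that idea with dual numbers (my own toy implementation):

```python
# Forward-mode AD in miniature: carry (value, derivative) together and
# propagate both, which is exactly the f(x + e) = f(x) + y*e idea.
class Dual:
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

x = Dual(0.5, 1.0)   # seed: derivative of x with respect to x is 1
y = x * x            # f(x) = x^2
print(y.val, y.der)  # 0.25 1.0, i.e. f'(0.5) = 2*0.5 = 1
```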
It also should be noted that auto-diff doesn't let you skip the rules of differentiation: you're encoding the same calculation you would use to show that e.g. d/dx x^n = n*x^{n-1}.
Closed form integration is fundamentally different.
What's the integral of exp(sqrt(1 + (tan^(3/2) X)^2))?
We only know a handful of forms that can be integrated in closed form, and it's down to our creativity to discover new forms that can be integrated (same deal with solving differential equations, and the reasons are the same).
The forms that we know how to integrate can be done by a computer. CAS tools will do that for you. For example Mathematica.
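For example (using SymPy here rather than Mathematica, but the idea is the same):

```python
import sympy as sp

x = sp.symbols('x')
print(sp.integrate(x * sp.exp(x), x))  # (x - 1)*exp(x): a known form
print(sp.integrate(sp.exp(-x**2), x))  # sqrt(pi)*erf(x)/2: no elementary
                                       # form, so a special function appears
```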
The reason why auto-diff is exact and amazing is because (1) the derivative operation is "decomposable" through the chain rule and (2) we've found the symbolic derivative of the basic functions. There's not a mathematical trick in auto-diff which we recently discovered. It's mainly a reformulation of the properties of the derivative which turns out to be useful for computers.
Finding an "auto-integrate method" would probably involve finding a way of calculating the integral in a decomposable way, and that indeed would be amazing, but I don't really see that happening any time soon.
There are good explanations of why integration is a fundamentally harder problem than differentiation. New techniques would be required to programmatically solve integrals analytically as well as current differentiation programs do, but it would be exciting to see.
I actually have an idea about how it would work that side steps all of this. I need more time to develop it but watch this space. But I agree with the local vs global relationship, in fact the whole trick is about handling this.
The NUTS algorithm in hybrid (Hamiltonian) Monte Carlo would be pretty close to this.
The problem is that integration suffers from the curse of dimensionality much worse than differentiation.
Differentiation is a local activity. You approximate a derivative near any point. Integration is non-local, you must aggregate a function over a span of points in the domain space.
As the dimension gets higher, differentiation merely has to apply its local approximation along each successive coordinate of a point; the computational cost grows only linearly with dimension.
Integration requires exponentially more data (for a given level of accuracy) because adding a new dimension enlarges the entire integration domain space that must be ranged over, and worse you generally need all that new “volume” of the enlarged space to be covered evenly to avoid bias.
So there are some fundamental asymmetries between differentiation and integration that play a role in why “automatic integration” isn’t as straightforwardly feasible as for differentiation.
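A back-of-the-envelope illustration of that asymmetry (my numbers):

```python
# A naive grid quadrature with 100 points per axis needs 100^d function
# evaluations, while reverse-mode AD yields the full gradient for a small
# constant multiple of one function evaluation, whatever d is.
for d in (1, 2, 3, 5, 10):
    print(f"d={d}: grid points = {100 ** d:.0e}")
```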
The selling point of Monte Carlo is that regardless of dimension the estimation error is going to go down as 1/sqrt(n). That hides the dimension part in the 'constants' and the sampling routine.
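For example, the same estimator works unchanged in any dimension (my sketch):

```python
import numpy as np

# Monte Carlo volume of the unit ball in d dimensions: sample the cube
# [-1, 1]^d and count hits. The error still shrinks like 1/sqrt(n);
# dimension only shows up in the constants (here, the collapsing hit rate).
rng = np.random.default_rng(1)
d, n = 5, 1_000_000
pts = rng.uniform(-1.0, 1.0, size=(n, d))
inside = (pts ** 2).sum(axis=1) <= 1.0
print(inside.mean() * 2 ** d)  # exact answer is 8*pi^2/15, about 5.2638
```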
Usually one cannot afford to visit all of the space. One visits high density regions preferentially.
Yes, agreed, I just meant why we won’t see “automatic quadrature” in the same sense as “automatic differentiation”. It’s always going to rely on adaptive sampling, for exactly this reason of dimensionality.
Integration is well behaved numerically, but poorly behaved algebraically.
Differentiation is poorly behaved numerically, but well behaved algebraically.
Therefore if one wants to do "automatic integration", one has to approach it in a radically different way to autodiff. And arguably, such a method does already exist; it's just very inefficient.
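A quick demonstration of both halves of that claim (my example):

```python
import numpy as np

# Numerical differentiation: shrinking h first reduces truncation error,
# then floating-point cancellation takes over and the estimate degrades.
for h in (1e-2, 1e-5, 1e-8, 1e-11, 1e-14):
    fd = (np.exp(1 + h) - np.exp(1)) / h
    print(f"h={h:.0e}  error={abs(fd - np.exp(1)):.2e}")

# Numerical integration: a plain midpoint rule converges calmly, with
# no comparable round-off cliff.
n = 1000
mids = (np.arange(n) + 0.5) / n
print(abs(np.exp(mids).mean() - (np.e - 1)))  # integral of e^x on [0,1]
```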
> This assumes a little too much about how automatic differentiation would work.
The use of 'would' implies that automatic differentiation has not been established. Also, if something is 40+ years old, then it follows that its foundation is even older.
Because there is no known algorithm for deciding equality of the expressions involved. It becomes automatic if we assume an equality oracle. Since decidable equality is an open problem, and the answer is probably that no such algorithm exists, that is as good an answer as we are going to get.
You are assuming that we care about expressing the integral as a function. Given that with autodiff we don't actually get the derivative's expression, presumably with automatic integration we also wouldn't get one.
In automatic differentiation we depend on the fact that a closed form elementary solution exists even if we don't calculate it directly.
For integration we need to prove that such an object exists for each expression we want to integrate before we try integrating, which as far as I'm aware is a problem equivalent to the one solved by the Risch algorithm.
It's been over a decade since I studied it, but from memory you could also end up in infinite loops when generating the closed-form solution if one doesn't exist. With as much hand-waving as possible, that is ultimately the behaviour the computer-function form of auto-integration would produce.
In a sense, probabilistic programming languages are automatic (approximate) integrators. Programs can represent arbitrary measures, and the back-ends provide various methods of estimating integrals with respect to them.
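In miniature (my toy sketch, not any particular PPL's internals): a "program" specifies an unnormalized density, and the back-end estimates expectations against it, here with self-normalized importance sampling:

```python
import numpy as np

# Model: p(x) proportional to exp(-x^4). Query: E[x^2] under p. The
# back-end never sees a symbolic integral; it just weights samples
# drawn from a standard normal proposal q.
rng = np.random.default_rng(2)
xs = rng.normal(0.0, 1.0, size=500_000)
log_w = -xs ** 4 + 0.5 * xs ** 2      # log p - log q, up to constants
w = np.exp(log_w)
print((w * xs ** 2).sum() / w.sum())  # about Gamma(3/4)/Gamma(1/4) = 0.338
```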
And how would the similarity work? The derivative is defined at a point, it makes sense to calculate f(x) and f’(x) together. What integral are you talking about precisely?
Integration is a computable operator ([1]), so arguably such a thing does already exist. The difficulty though is in computing it efficiently, especially in high dimensions.
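A toy version of that computability claim (my sketch): given a Lipschitz bound, you can pick n in advance to hit any target accuracy.

```python
import math

# For f with Lipschitz constant L on [0, 1], the midpoint rule on n
# cells is provably within L/(4n) of the integral, so any eps > 0 is
# reachable by choosing n large enough; computable, but the same grid
# idea needs exponentially many points as the dimension grows.
def integrate_to_eps(f, L, eps):
    n = int(L / (4 * eps)) + 1
    estimate = sum(f((i + 0.5) / n) for i in range(n)) / n
    return estimate, L / (4 * n)

print(integrate_to_eps(math.sin, 1.0, 1e-6))  # 1 - cos(1), about 0.4597
```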