Definitely not an expert but there's a fundamental hurdle to this.
One way to think of this is to realize how easy it is to multiply numbers but how much more work it takes to divide numbers.
For something like automatic differentiation, you're essentially applying the chain rule for partial derivatives repeatedly. This is analytically pretty straightforward for most applications. All you need is an analytical derivative for each of the simple functions your more complex function is composed of (e.g. a neural network).
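To make that concrete, here's a minimal sketch using PyTorch's autograd (the function is my own toy example, not from the post):

```python
import torch

# Derivative of f(x) = exp(sin(x^2)) at x = 0.5, built entirely from
# primitives whose analytical derivatives are known, chained together.
x = torch.tensor(0.5, requires_grad=True)
f = torch.exp(torch.sin(x ** 2))
f.backward()
print(x.grad)  # equals 2*x*cos(x^2)*exp(sin(x^2)) at x = 0.5
```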
For integration, the analogue of the chain rule is integration by substitution [1]. The toolbox for solving integration problems is more limited than for differentiation. You run into issues where the answer cannot even be expressed using standard mathematical notation [2]. Sometimes you get lucky and the answer can be expressed via an alternating Taylor series so you can estimate the answer within some margin of error [3].
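To make the alternating-series case concrete, here's a sketch (my example, not the post's) estimating the integral of exp(-x^2) over [0, 1], whose antiderivative isn't elementary:

```python
import math

# e^(-x^2) = sum_n (-1)^n x^(2n) / n!, so integrating term by term:
#   integral_0^1 exp(-x^2) dx = sum_n (-1)^n / (n! * (2n + 1))
# For an alternating series with shrinking terms, the truncation error
# is bounded by the magnitude of the first omitted term.
def integral_estimate(eps=1e-10):
    total, n = 0.0, 0
    while True:
        total += (-1) ** n / (math.factorial(n) * (2 * n + 1))
        n += 1
        if 1 / (math.factorial(n) * (2 * n + 1)) < eps:
            return total

print(integral_estimate())  # ~0.7468241, within eps of the true value
```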
Stan is a piece of software that runs state-of-the-art MCMC methods to basically just compute fancy integrals. A Stan model will take an order of magnitude more time to run than a simple neural network via something like PyTorch on the same dataset. But they answer different questions.
Slightly tangential fact I heard recently on a podcast [0]: historically the problem of integration preceded that of differentiation. But because differentiation is so much easier, we teach that to kids first. It's one of the many ways that schools obscure the intuition of calculus and turn it into so many formulae to be memorised.
You're assuming that you want to express the integral as an elementary function, or as a member of another narrow class of functions. It's only in that case that the lack of a chain rule is a problem. What you actually want is the ability to evaluate the integral to within epsilon numerically, which is a much more flexible problem.
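In that numerical sense the problem is routine, e.g. with SciPy's adaptive quadrature (my example):

```python
import numpy as np
from scipy.integrate import quad

# exp(sin(x)) has no elementary antiderivative, but evaluating the
# definite integral to within a tolerance is a solved problem.
value, abs_err = quad(lambda x: np.exp(np.sin(x)), 0, np.pi)
print(value, abs_err)  # the estimate plus a bound on its error
```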
But then again, it's not clear what the OP meant by "automatic integration" anyway.
Can you please elaborate on "you never get the expression"?
When you input the variable values into a symbolic derivative you just get a value at the end: d/dx x^2 = 2x, so at x = 0.5 the derivative is 1. The same is true for symbolic integrals. For most practical applications, we don't really care about the full symbolic expression. We just want the answer, or at least a good approximation. This post uses a specific example of the difference between two Beta distributions. We want to get that 0.71. It is very hard to "automatically" make that happen.
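For instance, a Monte Carlo sketch of that kind of Beta comparison (the parameters below are made up; the post's 0.71 comes from its own specific distributions):

```python
import numpy as np

# Estimate P(A > B) for A ~ Beta(30, 10), B ~ Beta(20, 12) by sampling.
# We never obtain a symbolic expression for the integral, just the number.
rng = np.random.default_rng(0)
a = rng.beta(30, 10, size=1_000_000)
b = rng.beta(20, 12, size=1_000_000)
print((a > b).mean())
```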
Just to clarify: This result follows directly from the definition of the derivative:
f'(x) = lim_{e->0} (f(x+e) - f(x))/e
If you're able to express f(x+e) in the form f(x) + y*e (treating e^2 as 0, as with dual numbers), then it follows that y is the derivative.
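A minimal sketch of that idea with dual numbers (my own toy implementation):

```python
# Forward-mode AD in miniature: carry (value, derivative) together and
# propagate both, which is exactly the f(x + e) = f(x) + y*e idea.
class Dual:
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

x = Dual(0.5, 1.0)   # seed: derivative of x with respect to x is 1
y = x * x            # f(x) = x^2
print(y.val, y.der)  # 0.25 1.0, i.e. f'(0.5) = 2*0.5 = 1
```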
It also should be noted that auto-diff doesn't let you skip the rules of differentiation: you're encoding the same calculation you would use to show that e.g. d/dx x^n = n*x^{n-1}.
Closed form integration is fundamentally different.
What's the integral of exp(sqrt(1 + (tan^(3/2) X)^2))?
We only know a handful of forms that can be integrated in closed form, and it's down to our creativity to discover new forms that can be integrated (same deal with solving differential equations, and the reasons are the same).
The forms that we know how to integrate can be done by a computer. CAS tools will do that for you. For example Mathematica.
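For example (using SymPy here rather than Mathematica, but the idea is the same):

```python
import sympy as sp

x = sp.symbols('x')
print(sp.integrate(x * sp.exp(x), x))  # (x - 1)*exp(x): a known form
print(sp.integrate(sp.exp(-x**2), x))  # sqrt(pi)*erf(x)/2: no elementary
                                       # form, so a special function appears
```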
The reason why auto-diff is exact and amazing is because (1) the derivative operation is "decomposable" through the chain rule and (2) we've found the symbolic derivative of the basic functions. There's not a mathematical trick in auto-diff which we recently discovered. It's mainly a reformulation of the properties of the derivative which turns out to be useful for computers.
Finding an "auto-integrate method" would probably involve finding a way of calculating the integral in a decomposable way, and that indeed would be amazing, but I don't really see that happening any time soon.
There are good explanations of why integration is a fundamentally harder problem than differentiation. New techniques would be required to programmatically solve integrals analytically as well as current differentiation programs do, but it would be exciting to see.
I actually have an idea about how it would work that side steps all of this. I need more time to develop it but watch this space. But I agree with the local vs global relationship, in fact the whole trick is about handling this.
The NUTS algorithm in hybrid (Hamiltonian) Monte Carlo would be pretty close to this.
The problem is that integration suffers from the curse of dimensionality much worse than differentiation.
Differentiation is a local activity. You approximate a derivative near any point. Integration is non-local, you must aggregate a function over a span of points in the domain space.
As the dimension gets higher, differentiation merely has to apply its local approximation along each successive coordinate of a point; the computational cost grows only linearly with dimension.
Integration requires exponentially more data (for a given level of accuracy) because adding a new dimension enlarges the entire integration domain space that must be ranged over, and worse you generally need all that new “volume” of the enlarged space to be covered evenly to avoid bias.
So there are some fundamental asymmetries between differentiation and integration that play a role in why “automatic integration” isn’t as straightforwardly feasible as for differentiation.
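A back-of-the-envelope illustration of that asymmetry (my numbers):

```python
# A naive grid quadrature with 100 points per axis needs 100^d function
# evaluations, while reverse-mode AD yields the full gradient for a small
# constant multiple of one function evaluation, whatever d is.
for d in (1, 2, 3, 5, 10):
    print(f"d={d}: grid points = {100 ** d:.0e}")
```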
The selling point of Monte Carlo is that regardless of dimension the estimation error is going to go down as 1/sqrt(n). That hides the dimension part in the 'constants' and the sampling routine.
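For example, the same estimator works unchanged in any dimension (my sketch):

```python
import numpy as np

# Monte Carlo volume of the unit ball in d dimensions: sample the cube
# [-1, 1]^d and count hits. The error still shrinks like 1/sqrt(n);
# dimension only shows up in the constants (here, the collapsing hit rate).
rng = np.random.default_rng(1)
d, n = 5, 1_000_000
pts = rng.uniform(-1.0, 1.0, size=(n, d))
inside = (pts ** 2).sum(axis=1) <= 1.0
print(inside.mean() * 2 ** d)  # exact answer is 8*pi^2/15, about 5.2638
```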
Usually one cannot afford to visit all of the space. One visits high density regions preferentially.
Yes, agreed, I just meant why we won’t see “automatic quadrature” in the same sense as “automatic differentiation”. It’s always going to rely on adaptive sampling, for exactly this reason of dimensionality.
Integration is well behaved numerically, but poorly behaved algebraically.
Differentiation is poorly behaved numerically, but well behaved algebraically.
Therefore if one wants to do "automatic integration", one has to approach it in a radically different way to autodiff. And arguably, such a method does already exist; it's just very inefficient.
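A quick demonstration of both halves of that claim (my example):

```python
import numpy as np

# Numerical differentiation: shrinking h first reduces truncation error,
# then floating-point cancellation takes over and the estimate degrades.
for h in (1e-2, 1e-5, 1e-8, 1e-11, 1e-14):
    fd = (np.exp(1 + h) - np.exp(1)) / h
    print(f"h={h:.0e}  error={abs(fd - np.exp(1)):.2e}")

# Numerical integration: a plain midpoint rule converges calmly, with
# no comparable round-off cliff.
n = 1000
mids = (np.arange(n) + 0.5) / n
print(abs(np.exp(mids).mean() - (np.e - 1)))  # integral of e^x on [0,1]
```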
> This assumes a little too much about how automatic differentiation would work.
The use of 'would' implies that automatic differentiation has not been established. Also, if something is 40+ years old, then it follows that its foundation is even older.
Because there is no known algorithm for deciding equality of the expressions involved. It becomes automatic if we assume an equality oracle. Since decidable equality is an open problem, and the answer is probably that no such algorithm exists, that is as good an answer as we are going to get.
You are assuming that we care about expressing the integral as a function. Given that with autodiff we don't actually get the derivative's expression, presumably with automatic integration we also wouldn't get one.
In automatic differentiation we depend on the fact that a closed form elementary solution exists even if we don't calculate it directly.
For integration we need to prove that such an object exists for each expression we want to integrate before we try integrating, which as far as I'm aware is a problem equivalent to the one solved by the Risch algorithm.
It's been over a decade since I studied it, but from memory you could also end up in infinite loops when generating the closed-form solution if one doesn't exist. With as much hand-waving as possible, that is ultimately the behaviour the computer-function form of auto-integration would produce.
In a sense, probabilistic programming languages are automatic (approximate) integrators. Programs can represent arbitrary measures, and the back-ends provide various methods of estimating integrals with respect to them.
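In miniature (my toy sketch, not any particular PPL's internals): a "program" specifies an unnormalized density, and the back-end estimates expectations against it, here with self-normalized importance sampling:

```python
import numpy as np

# Model: p(x) proportional to exp(-x^4). Query: E[x^2] under p. The
# back-end never sees a symbolic integral; it just weights samples
# drawn from a standard normal proposal q.
rng = np.random.default_rng(2)
xs = rng.normal(0.0, 1.0, size=500_000)
log_w = -xs ** 4 + 0.5 * xs ** 2      # log p - log q, up to constants
w = np.exp(log_w)
print((w * xs ** 2).sum() / w.sum())  # about Gamma(3/4)/Gamma(1/4) = 0.338
```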
And how would the similarity work? The derivative is defined at a point, it makes sense to calculate f(x) and f’(x) together. What integral are you talking about precisely?
Integration is a computable operator ([1]), so arguably such a thing does already exist. The difficulty though is in computing it efficiently, especially in high dimensions.
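A toy version of that computability claim (my sketch): given a Lipschitz bound, you can pick n in advance to hit any target accuracy.

```python
import math

# For f with Lipschitz constant L on [0, 1], the midpoint rule on n
# cells is provably within L/(4n) of the integral, so any eps > 0 is
# reachable by choosing n large enough; computable, but the same grid
# idea needs exponentially many points as the dimension grows.
def integrate_to_eps(f, L, eps):
    n = int(L / (4 * eps)) + 1
    estimate = sum(f((i + 0.5) / n) for i in range(n)) / n
    return estimate, L / (4 * n)

print(integrate_to_eps(math.sin, 1.0, 1e-6))  # 1 - cos(1), about 0.4597
```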