Can't every constrained optimization problem be converted into an unconstrained one? For example, instead of saying "minimize f(x) with the constraint g(x) = 0", can't we just say "minimize f(x) + abs(g(x)) * 10^30"?
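(For concreteness, here's a toy sketch of that penalty objective in Python; the particular f and g are made up for illustration, not taken from any answer below.)

    import numpy as np
    from scipy.optimize import minimize

    # Toy problem: minimize f(x) = (x0 - 2)^2 + (x1 - 1)^2
    # subject to g(x) = x0 + x1 - 1 = 0.
    f = lambda x: (x[0] - 2)**2 + (x[1] - 1)**2
    g = lambda x: x[0] + x[1] - 1

    # The naive penalty version from the question. The 1e30 factor
    # swamps f in floating point, and abs() has a kink at g(x) = 0,
    # so solvers tend to struggle on it.
    penalized = lambda x: f(x) + abs(g(x)) * 1e30
    print(minimize(penalized, np.zeros(2), method='Nelder-Mead').x)

    # The same problem with the constraint stated explicitly.
    print(minimize(f, np.zeros(2),
                   constraints={'type': 'eq', 'fun': g}).x)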
Also, is there a reason why most optimization texts (like this one) only discuss point optimization and not path optimization (i.e., calculus of variations)?
You are not guaranteed to find optimal solutions (although you might find local minima).
For constrained optimisation, ideally you should be using Lagrange multipliers. But even then there are "abnormal" cases where the Lagrange multiplier approach fails to give you an optimal solution. Roughly, your constraint does not have enough information to give you a unique solution (dg has insufficient rank to solve df = \lambda dg).
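A sketch of the distinction, in the Fritz John form that also carries a multiplier \lambda_0 on f (standard notation, not from the comment above):

    \lambda_0 df = \lambda dg,   g(x) = 0,   (\lambda_0, \lambda) != (0, 0)

    normal:    \lambda_0 != 0, so rescale to \lambda_0 = 1 and recover df = \lambda dg
    abnormal:  \lambda_0 = 0, i.e. \lambda dg = 0 with \lambda != 0, which can
               happen when dg is rank-deficient at the candidate point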
In infinite dimensions (minimising a functional) a penalty metric is an even worse idea. My favorite example is sub-Riemannian geometry [0]. People wanted to find sub-Riemannian geodesics - special curves that can only move in restricted directions. Early on, people tried the penalty metric idea (making it really expensive to move in certain directions), but Montgomery [0] proved that this does not necessarily give optimal solutions: the limit of an equation is not necessarily the same as the limit of the solutions. In a sense they used Lagrange multipliers to examine the situation.
Looking at abnormal optimisation problems is an interesting area of research [1]. However, if a local solution is "good enough", maybe you can get away with a penalty metric.
> For constrained optimisation ideally you should be using Lagrange multipliers.
Lagrange multipliers is for ̶u̶n̶c̶o̶n̶s̶t̶r̶a̶i̶n̶e̶d̶ optimization... you're probably thinking of KKT.
EDIT: D'oh... typed too quickly, thanks for the correction. I meant to write they can't handle inequalities, which for some reason is what I always assume when I hear constrained optimization...
Lagrange multipliers can be applied to solve optimization problems with equality constraints. KKT generalizes Lagrange multipliers to allow inequality constraints as well.
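Concretely, for "minimize f(x) subject to g(x) = 0 and h(x) <= 0", the KKT conditions read (under a suitable constraint qualification):

    df + \lambda dg + \mu dh = 0     (stationarity)
    g(x) = 0,  h(x) <= 0             (primal feasibility)
    \mu >= 0                         (dual feasibility)
    \mu h(x) = 0                     (complementary slackness)

Drop everything involving h and \mu and you're left with exactly the classical Lagrange multiplier conditions.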
See the literature on exact penalty methods. I believe the short of it is yes, for a large class of problems this works, but the new problem will not necessarily be easier to solve. In the case of the abs(.) function, the nonsmoothness at 0 makes subgradient methods slow, and the large constant in front of the abs(.) might prove numerically unstable.
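A tiny numeric illustration of exactness (the toy f, g and the grid search are just for the example):

    import numpy as np

    # Toy problem: minimize f(x) = x^2 subject to g(x) = x - 1 = 0.
    # L1 exact penalty: phi_mu(x) = x^2 + mu * |x - 1|. The penalty
    # becomes exact once mu exceeds |lambda*| = 2, the magnitude of
    # the optimal Lagrange multiplier -- no 1e30 needed.
    xs = np.linspace(-1.0, 3.0, 400001)
    for mu in [1.0, 2.0, 10.0, 1e30]:
        phi = xs**2 + mu * np.abs(xs - 1.0)
        print(mu, xs[np.argmin(phi)])
    # mu = 1 gives x = 0.5 (infeasible); mu >= 2 gives x = 1 exactly.
    # The huge constant adds nothing except numerical trouble for
    # gradient-based methods, which is the point above.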
Yes, all constrained optimization problems can be converted to unconstrained problems. For example, consider:
    minimize f(x)
    subject to x \in C
Let g(x)=f(x) if x \in C and infinity otherwise. Then
    minimize g(x)
has the same solution as the original constrained problem. However, this conversion often conceals some of the structure of the original problem which can be exploited to solve the problem more efficiently.
Solving point problems, as you have called them, typically involves solving a sequence of least squares problems, which are simple to reason about and computationally efficient to solve. Solving a calculus of variations problem typically involves solving an integral or partial differential equation. Although there are theoretical similarities, in practice they are pretty different.
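For example, minimizing the functional

    J[y] = \int_a^b L(x, y(x), y'(x)) dx

leads to the Euler-Lagrange equation

    d/dx (\partial L / \partial y') - \partial L / \partial y = 0

which is a differential equation in the unknown function y, not a finite-dimensional stationarity condition.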
> this conversion often conceals some of the structure of the original problem which can be exploited to solve the problem more efficiently
I had thought about this. My Q then is: why do we study generalized versions of problems, where the objective function is an arbitrary f(x), instead of the specific function that we care about? Aren't we losing some potential efficiency there as well?
We definitely do study specific instances. For example, the other day an article was posted here on Support Vector Machines, which are a convex optimization problem with a special structure that can be exploited for fast solutions. For more examples, see Convex Optimization by Boyd and Vandenberghe, and the course notes for EE364a/b at Stanford. Modern Convex Optimization by Bertsekas is also good. I had the privilege of taking the convex optimization sequence at Stanford from Prof. Boyd, and one of the major takeaways is that many practical problems can be solved in quadratic or even linear time, whereas a general purpose solver typically runs in cubic time. For large problems (lots of variables), a special purpose solver can solve in seconds what would take a general purpose solver hours!
The book posted here is very much an introduction; even the EE364 sequence was merely a jumping off point for being able to explore the literature. Mathematical optimization, and especially convex optimization, is a deep and beautiful field!
That is indeed how the subject developed historically and how it is often introduced in optimisation courses. First one studies Linear Programs (minimize a linear function subject to affine equality and inequality constraints), then Quadratic Programs (a quadratic objective with affine constraints), then Quadratically Constrained Quadratic Programs, and so on. So we often assume very specific properties of not just the objective but also the constraints, and we get some extra juice in these cases. By seeing which results survive in more generality, we eventually arrived at a good body of theory and algorithms for minimizing convex functions with convex constraints. Some courses start with the general theory instead, since the top-down approach can save some time and brain space.
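For reference, the first two rungs of that ladder look like (standard notation, not from the book under discussion):

    LP:  minimize c^T x                   subject to Ax = b, x >= 0
    QP:  minimize (1/2) x^T Q x + c^T x   subject to Ax <= b

QCQP then allows each constraint to be quadratic as well, and all of these are convex problems when Q (and the constraint quadratics) are positive semidefinite.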
To add to the answers already given, this is indeed a viable approach for some problems; see e.g. the barrier method for linear programming. But there the constant is increased gradually, rather than fixed at something like 10^30 from the start, for numerical reasons.
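A sketch of that continuation idea in Python, here with a quadratic penalty so the objective stays smooth (the f, g and the schedule for mu are toy choices):

    import numpy as np
    from scipy.optimize import minimize

    # Increase the penalty weight gradually, warm-starting each solve
    # from the previous solution, instead of using 1e30 in one shot.
    f = lambda x: (x[0] - 2)**2 + (x[1] - 1)**2
    g = lambda x: x[0] + x[1] - 1

    x = np.zeros(2)
    for mu in [1.0, 10.0, 100.0, 1e4, 1e6]:
        x = minimize(lambda z: f(z) + mu * g(z)**2, x).x
        print(mu, x, g(x))
    # The constraint violation g(x) shrinks as mu grows, without ever
    # asking the solver to handle a 1e30-scaled, ill-conditioned problem.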
I assume you mean a norm rather than the absolute value, because for a vector-valued g the absolute value would give a vector result?
But yes, this is generally unhelpful, since in general the behavior at a given point gives you almost no information about the behavior at a nearby point. It's a bit like doing the reverse and turning the objective into a feasibility test, e.g. replacing "min f(x)" with "min 0 subject to f(x) <= f(x') for all x'"... you can do it, but it's not really helpful.
I can see that it will be unhelpful in most cases (because we're losing information by implementing constraints as penalties). But are there any cases at all where this would make the solution easier?
Well, you're technically solving a different problem in the second case: you won't always get the same answer, and sometimes adding that term can mess up your solver. However, a very common method for handling tricky constraints is to "relax" them by moving g(x) into the cost function, exactly as you propose.
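In sketch form (this is standard Lagrangian relaxation, with a fixed multiplier \lambda rather than the questioner's 10^30):

    original:  minimize f(x)  subject to  g(x) = 0
    relaxed:   minimize f(x) + \lambda g(x)

The relaxed optimum lower-bounds the original one for any \lambda (weak duality), and maximizing that bound over \lambda is exactly the dual problem.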
I think trajectory optimization can still be viewed as a point optimization, and it's helpful to understand the fundamentals before jumping into applications.
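One common way to see this is direct transcription: discretize the path into N values and the path problem literally becomes a point problem. A toy sketch (the discrete "path energy" objective and the endpoints are made up for illustration):

    import numpy as np
    from scipy.optimize import minimize

    # A path y(t), t in [0, 1], sampled at N points becomes a vector
    # in R^N, so the trajectory problem is just point optimization.
    # Toy objective: discrete Dirichlet energy (sum of squared
    # increments) with endpoints pinned at y(0) = 0 and y(1) = 1.
    N = 21
    def objective(y_inner):
        y = np.concatenate(([0.0], y_inner, [1.0]))
        return np.sum(np.diff(y)**2)

    res = minimize(objective, np.random.rand(N - 2))
    print(np.round(np.concatenate(([0.0], res.x, [1.0])), 3))
    # The minimizer is the straight line between the endpoints, the
    # same answer calculus of variations gives for this functional.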
a) This is not the same problem. The penalized formulation can have a different minimum than the original, e.g. if f(x) goes to negative infinity but only where g(x) != 0. Concretely, with f(x) = -x^4 and g(x) = x - 1, the constrained problem has the solution x = 1, but -x^4 + 10^30 * abs(x - 1) is unbounded below.
b) Because this is a much more complex topic, for which you need to know ordinary (point) optimization first anyway.
This is a very high-level introduction. If you want a lot more meat, see Boyd & Vandenberghe, which is legitimately free online at http://web.stanford.edu/~boyd/cvxbook/. It doesn't have any material on global optimization for nonconvex problems, but that's not really an introductory topic anyway.