Naive question: what are the most suitable problems for Genetic Programming to solve, given that machine learning, especially deep learning, is all the rage now? Or do we have models that integrate genetic programming into deep learning?
Real-time and energy-constrained problems would be the top use cases in my mind given the current state of affairs. Beyond this, problems that are non-differentiable or where we desire strong symbolic interpretation. I think these techniques also tend to extrapolate more robustly than DNNs. Technically, any problem where algorithmic shortcuts like the chain rule of calculus break down.
I've been working on GP architectures that use short linear program tapes as part of their genome. Executing these programs can be done very quickly. Assuming the program tape is <=10kb in length and you are constraining to ~a megacycle in the interpreter, you can execute these programs 100k~1m times per second on a chunky workstation. The problems you can solve in this time & space are non-trivial. Once you evolve the desired program, you can write the interpreter for it in any language on any platform within half an hour.
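To make that concrete, here's a rough sketch of the general shape of such an interpreter - the opcodes, register count, and tuple encoding are placeholders for illustration, not the actual system:

    # Minimal linear program tape interpreter (illustrative opcodes and encoding).
    import random

    OPS = ["ADD", "SUB", "MUL", "MOV", "JNZ"]  # tiny made-up instruction set

    def run_tape(tape, inputs, n_regs=8, max_cycles=1_000_000):
        """Execute a tape of (op, dst, a, b) tuples under a hard cycle budget."""
        regs = [0.0] * n_regs
        regs[:len(inputs)] = inputs
        pc, cycles = 0, 0
        while 0 <= pc < len(tape) and cycles < max_cycles:
            op, dst, a, b = tape[pc]
            cycles += 1
            if op == "ADD":
                regs[dst] = regs[a] + regs[b]
            elif op == "SUB":
                regs[dst] = regs[a] - regs[b]
            elif op == "MUL":
                regs[dst] = regs[a] * regs[b]
            elif op == "MOV":
                regs[dst] = regs[a]
            elif op == "JNZ" and regs[a] != 0:
                pc = (pc + 1 + b) % len(tape)  # b reused as a jump offset
                continue
            pc += 1
        return regs[0]  # convention: register 0 holds the output

    # A random tape is a random genome; mutation is just perturbing tuples.
    tape = [(random.choice(OPS), random.randrange(8), random.randrange(8), random.randrange(8))
            for _ in range(32)]
    print(run_tape(tape, [1.0, 2.0]))

The hard cycle cap is what keeps a random or pathological genome from hanging the evaluator, and it's also why the per-program cost is so predictable.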
Training is definitely going to be expensive but it can fit in one powerful machine. You don't need to keep a terabyte of data around in VRAM to make reasonable rates of progress. Paging out parts of the population to disk or S3 buckets is feasible with these techniques. It subdivides very nicely into however many islands you can afford to run.
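Structurally it's just an island loop where each island can be serialized independently; something like this (the fitness, mutation, and pickle-to-disk paging below are stand-ins, not the actual setup):

    # Island-model sketch: islands evolve independently, get paged out between
    # generations, and are only loosely coupled by occasional migration.
    import pathlib, pickle, random

    def fitness(genome):            # stand-in objective: bigger sum is "better"
        return sum(genome)

    def step(pop):                  # one generation: keep the better half, mutate copies
        pop = sorted(pop, key=fitness, reverse=True)[:len(pop) // 2]
        return pop + [[g + random.gauss(0, 0.1) for g in parent] for parent in pop]

    islands = [[[random.random() for _ in range(16)] for _ in range(64)] for _ in range(8)]
    for gen in range(20):
        for i, pop in enumerate(islands):
            islands[i] = step(pop)
            # page the island out; this could just as easily be an S3 PUT
            pathlib.Path(f"island_{i}.pkl").write_bytes(pickle.dumps(islands[i]))
        if gen % 5 == 4:            # occasional migration between two random islands
            src, dst = random.sample(range(len(islands)), 2)
            islands[dst][-1] = list(random.choice(islands[src]))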
Inference is essentially free and could run on the processor in your toaster oven.
These approaches can make CUDA & friends look like a Rube Goldberg machine by comparison. Being able to explain how the entire thing works to a peer in a single afternoon is perhaps another significant feature of these designs.
At the end of the day, there is a pretty good reason these techniques aren't popular - there aren't any quadratic scaling laws that constrain the architecture of GP, only the much scarier exponential ones. I contend that the search space of all linear programs is viable to search, but after chasing the rabbit for about a year I still don't have any proof that this is the case. My central argument is that there are many high-quality programs out there; we just need to find one of them. I think the arguments against this tend to unfairly cast the problem as searching for a single needle in a haystack of 10^10000.
Parametric programming problems are problems whose structure (most often sparsity structure) is known.
Most quadratic and linear programming solvers assume a fully generic problem structure. You give them any matrix and they'll spit out an answer. This means that internally they must also use generic matrix operations.
If you know the full structure of the optimization problem and how it changes with respect to changes in parameters, you could in principle compile a unique solver that is fully specialized to the parameterized problem. If your problem is particularly simple, you could even enumerate all possible answers in a lookup table.
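A toy example of what "fully specialized" buys you (the QP here is my own, chosen so the closed form is obvious): for a family of problems with a fixed diagonal Q and a nonnegativity constraint, the specialized "solver" collapses to a couple of arithmetic ops, whereas a generic solver would still be setting up and factoring matrices.

    # minimize 0.5*(q1*x1^2 + q2*x2^2) + p1*x1 + p2*x2   subject to x >= 0
    # Because Q is diagonal the problem separates per coordinate: take the
    # unconstrained minimizer -p_i/q_i and clamp it at zero.
    def specialized_qp(q1, q2, p1, p2):
        return max(0.0, -p1 / q1), max(0.0, -p2 / q2)

    print(specialized_qp(1.0, 2.0, -3.0, 1.0))  # (3.0, 0.0)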
Now here is the problem: While it is not that difficult to find the sparsity pattern of the optimization problem and choose the appropriate sparse matrix library based on the problem, actually producing a unique program that does nothing but solve your parametric optimization problem is a pipe dream. Most approaches involve a lot of human ingenuity to find exploitable structures in specific problems that become useless the moment you change the problem even a little bit.
So here is my suggestion: figure out how to use genetic programming to produce fast quadratic programming solvers for specific problems.
Note that you don't necessarily need a perfect solution since you can always run a bunch of QP steps on an initial guess to nudge it towards convergence. The goal would be to reduce the number of these steps to as close to zero as possible.
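One way to phrase that as a fitness function (the projected-gradient refinement and tolerances below are illustrative choices, not a prescription): score an evolved candidate by how many cleanup iterations a plain iterative method needs starting from its guess, summed over sampled parameter values.

    import numpy as np

    def refinement_steps(Q, p, x0, tol=1e-6, max_steps=10_000):
        """Projected gradient descent for min 0.5*x'Qx + p'x subject to x >= 0."""
        x = np.maximum(x0, 0.0)
        lr = 1.0 / np.linalg.norm(Q, 2)        # step size from the largest eigenvalue
        for k in range(max_steps):
            x_new = np.maximum(x - lr * (Q @ x + p), 0.0)
            if np.linalg.norm(x_new - x) < tol:
                return k
            x = x_new
        return max_steps

    def fitness(evolved_program, problem_instances):
        # lower is better: total cleanup steps needed across sampled parameters
        return sum(refinement_steps(Q, p, evolved_program(Q, p))
                   for Q, p in problem_instances)

    # e.g. a trivial baseline "program" that always guesses the origin:
    baseline = lambda Q, p: np.zeros(len(p))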
I have a hypothesis that the smallest possible program that can handle things like natural language translation will be much smaller than we are expecting it to be.
Anything where we are transforming information from one format to another on a ~1:1 basis, without injecting additional external information (i.e., the broad world knowledge that foundation models bring to the table).
I don't know what your expectation on the program's size is, but even for a constrained translation case (i.e. legal transcription between two Romance languages), I would expect you are embedding at least the size of WordNet in both languages.
Genetic Programming and the wider field of Evolutionary Computation and Genetic Algorithms are really good at some kinds of optimization.
If the problem fits into a tree form, then Genetic Programming is your best friend.
If the problem can be encoded as strings of 0s and 1s, then Genetic Algorithms are your best friend.
Likewise, if your problem can be encoded as floating point numbers, then Evolutionary Strategies are your best friend.
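To spell out those three representations side by side (purely illustrative data):

    import random

    gp_tree   = ("add", ("mul", "x", "x"), 3.0)             # GP: expression tree, x*x + 3
    ga_genome = [random.randint(0, 1) for _ in range(32)]   # GA: fixed-length bitstring
    es_genome = [random.gauss(0.0, 1.0) for _ in range(10)] # ES: real-valued vector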
Now... I could ruffle some feathers by saying that Genetic Algorithms and Evolutionary Strategies aren't really all that different. The same algorithms could be used for both. But historically they came about at roughly the same time in different places on earth, Illinois vs Germany.
Back to GP. The cool thing about GP is that when the solution is evolved, you have THE solution to the problem. No futzing about with how to read the results or what they mean.
A big problem in the past is that GP doesn't scale the way neural networks do. It is not embarrassingly parallel, and it has been very limited by the types of hardware/architectures available.
Note that genetic programming is a specific subset of genetic algorithms that focuses on searching for "programs" (hence the name), typically encoded in the form of a tree structure similar to an AST, though other representations exist. In theory, you could use GP for almost any situation where you'd want to synthesize a mathematical expression or piece of code for which you have a criterion to optimize (i.e. "fitness"). In practice, since GPs are basically just semi-randomly searching a huge combinatorial space, they tend to work best on low-dimensional problems, ideally with a fitness function that is cheap to evaluate. They can work nicely for finding nonlinear symbolic formulas for regression, for example. But there are also some other cool results over the years - Hod Lipson has published several in robotics with them.
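For a sense of scale, a bare-bones symbolic-regression GP fits in a few dozen lines; the target function, operator set, and hyperparameters below are arbitrary choices for the sketch (real systems add crossover, depth limits, bloat control, etc.):

    import random

    OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

    def random_tree(depth=3):
        if depth <= 0 or random.random() < 0.3:              # terminal: variable or constant
            return "x" if random.random() < 0.5 else random.uniform(-2, 2)
        return (random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1))

    def evaluate(tree, x):
        if tree == "x":
            return x
        if isinstance(tree, float):
            return tree
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))

    def mutate(tree, depth=3):
        # replace a random subtree with a freshly generated one
        if not isinstance(tree, tuple) or random.random() < 0.2:
            return random_tree(depth)
        op, left, right = tree
        if random.random() < 0.5:
            return (op, mutate(left, depth - 1), right)
        return (op, left, mutate(right, depth - 1))

    target = lambda x: x * x + x + 1                          # function to recover
    xs = [i / 10 for i in range(-20, 21)]

    def fitness(tree):                                        # mean squared error, lower is better
        return sum((evaluate(tree, x) - target(x)) ** 2 for x in xs) / len(xs)

    pop = [random_tree() for _ in range(200)]
    for gen in range(100):
        pop.sort(key=fitness)
        pop = pop[:50] + [mutate(random.choice(pop[:50])) for _ in range(150)]
    print(pop[0], fitness(pop[0]))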
Until a few years ago, the popular deep learning methods like CNNs weren't great at that kind of thing, but LLMs definitely changed that - it's safe to say that by drawing on huge amounts of data LLMs are much better at most practical programming tasks. They're not necessarily a complete replacement though, there's definitely space for hybrid approaches, eg https://arxiv.org/abs/2401.07102 or https://www.nature.com/articles/s41586-023-06924-6.
I suppose with Genetic Programming, given an appropriate set of descriptive symbols, it is relatively easy to understand the final result and intuit whether there is any over-fitting involved. On the other hand, machine learning results are typically a black box: the weights involved do not easily lend themselves to understanding the nuances of the solution.
Definitely not an expert here, but the main difference is that in machine learning the goal is to find/optimize an algorithm to fit a given dataset, whereas in genetic programming the goal is to find/optimize a dataset to fit a given algorithm. There is a lot of overlap though.
EDIT: > what are the most suitable problems for Genetic Programming to solve
Given the above, genetic algorithms are really suitable when bounded execution time is part of your problem domain. You perform n iterations of the genetic algorithm, and even if the result doesn't meet the fitness threshold, you still get some result that scores better on the fitness function than a wild guess.
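A minimal illustration of that anytime property (the bitstring objective is a stand-in): cap the loop at a fixed budget and return the best candidate found so far, whether or not it hits the target.

    import random

    def fitness(bits):                                   # toy objective: count of 1s
        return sum(bits)

    def bounded_ga(n_bits=64, pop_size=50, budget=200):
        pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
        for _ in range(budget):                          # hard iteration cap
            pop.sort(key=fitness, reverse=True)
            parents = pop[:pop_size // 2]
            children = []
            for _ in range(pop_size - len(parents)):
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, n_bits)
                child = a[:cut] + b[cut:]                # one-point crossover
                child[random.randrange(n_bits)] ^= 1     # single-bit mutation
                children.append(child)
            pop = parents + children
        return max(pop, key=fitness)                     # best-so-far, even if imperfect

    print(fitness(bounded_ga()), "out of 64")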
> in genetic programming the goal is to find/optimize a dataset to fit a given algorithm
No. Possibly you're confused between GAs and GP, a common confusion. In GP, the goal is to find an algorithm - in a real programming language, not as weights - to optimise an objective function. Often that is specified by input-output pairs, similar to supervised learning.
Big combinatorial problems still use genetic algorithms. Specifically I know it's still used for logistics routing problems that are too big for solvers like Gurobi.
Deep learning on graphs is unfortunately still a little underwhelming.