
how is this different from boring old evolutionary algorithms?

In my opinion the big breakthrough that enabled optimization and machine learning was the discovery of reverse mode automatic differentiation, since the space or family of all possible decision-functions is high dimensional, while the goal (survival, reproduction) is low dimensional. Unless I see a mathematical proof that evolutionary algorithms are as efficient as RM AD, I see little future in it, and apparently neither did biology since it decided to create brains.

It's not an ideological stance I take here (of nature vs nurture).

For simplicity, let's pretend humans are single-celled organisms. What does natural selection exert pressure on? Our DNA code: both the actual protein-coding sequences and the promoter regions. I claim that variation in the proteins is risky (a modification in a protein-coding region could render a protein useless), while variation in the promoter regions is much less risky: altering a nucleotide there would slightly shift the binding affinity modulating transcription, so the cell would behave essentially the same but with different threshold concentrations. Think of the continuous parameters that describe our body (assuming the same nurture, food, etc.): some people are a bit taller, some a bit stronger, and so on. So how many of these continuous parameters do we have? On the order of the total number of promoter regions in the DNA of the fertilized egg: both in the human DNA and in one mitochondrion (assuming there isn't a chemical addressing/read/write scheme for, say, 10 mitochondria)...

EDIT: just adding that for a certain fixed environment there are local (and a global) optima of affinity values for each protein, so that near a local optimum the fitness is roughly shaped like -s(a - a_opt)^2, where s sets the spread and a_opt is the locally optimal affinity value. In other words, it is not the case that "higher affinity" means fitter, not at all: a collection of genomes from an identical environment will hover around an affinity sweet spot.

According to Wikipedia [0], that works out to:

about 2x 20412 "floats" for just protein-coding genes

about 2x 34000 "floats" when also including the pseudo-genes

about 2x 62000 "floats" when also including long ncRNA, small ncRNA, miRNA, rRNA, snRNA, snoRNA

These "floats" are the variables that allow a species to modulate the reaction constants in the gene regulatory network, since natural selection cannot directly modulate the laws of physics and chemistry, and modulating the protein directly instead of the promoter-region affinities / reaction rates risks dysfunctional proteins...

So my estimate of an upper limit on the number of "floats" in the genetic algorithm is ~120000 (and probably much less, if not each of the above has its own promoter region).
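Spelling out the arithmetic behind that upper bound (gene counts taken from the Wikipedia table at [0]; the factor of 2 follows the "2x" per category above):

```python
# Back-of-the-envelope check; counts from the Wikipedia table cited at [0],
# with the factor of 2 applied per category as in the comment above.
protein_coding = 2 * 20412    # protein-coding genes only
with_pseudogenes = 2 * 34000  # also including pseudo-genes
with_all_rna = 2 * 62000      # also including the ncRNA/miRNA/rRNA/sn(o)RNA classes
print(protein_coding, with_pseudogenes, with_all_rna)  # 40824 68000 124000
```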

That's not a lot of information when we think about the number of synaptic weights in the brain, and many of these floats are shared in utilization by the other cell types besides neurons.

I consider the possibility that the sperm cell, egg cell, or fertilized egg performs a kind of POST (power-on self-test) that checks for some of the genes, although simply reaching the fertilized state may be enough of a self-test, so no spontaneous-abortion test may be needed (to save time and avoid resources being spent on a probably malformed child).

[0] https://en.wikipedia.org/wiki/Human_genome#Molecular_organiz...

EDIT2: regarding:

>This makes WANNs particularly well positioned to exploit the Baldwin effect, the evolutionary pressure that rewards individuals predisposed to learn useful behaviors, without being trapped in the computationally expensive trap of ‘learning to learn’.

The computationally expensive trap of having to 'learn to learn' could end up being as mundane as a low number of hormones to which neurons in the brain globally or collectively respond, which enables learning by reward or punishment, and from then on anticipating reward or punishment, and our individual end goal stems from this anticipation, and anticipating the anticipation etc...



>Unless I see a mathematical proof that evolutionary algorithms are as efficient as RM AD, I see little future in it, and apparently neither did biology since it decided to create brains.

In the biological context the complexity is shifted away from the actual selection algorithm and onto the "scoring function." Although the filter of reproduction is relatively simple [1], the reason why the organism was fit and could ultimately reproduce is very complex. The brain's "topology" is the result of selection pressure favoring adaptations that improve fitness by dynamically adapting to relevant patterns found in the environment. The brain attempts to accurately model important aspects of the complex environment it's challenged with to increase fitness.

[1] Nothing in biology is ever actually simple. Even though individuals undergo fitness-based scoring, the actual notion of an individual is arbitrary. In the case of bees, most of the individual bees in the colony are sterile, which is obviously bad for the fitness of the "individual." However, a few individuals reproduce, so all the sterile workers' fitness is shifted onto the queen/drones through the concept of inclusive fitness. This indirect selection also applies to humans, who undergo inclusive fitness through their siblings. It can even be generalized to your individual cells/organs, which function as a colony supporting the gonads, which actually undergo direct selective pressure.


With "efficiency" I meant computational efficiency:

consider the task of computing a gradient at a point p0 = <x1, x2, x3, ..., xN> in an N-dimensional space.

the naive approach was for a long time: compute the value of the score at p0, then for each coordinate compute the score for the same point but shifted a delta in the direction of that coordinate, i.e. the i-th component is computed as:

component_i = (score(<x1, x2, ..., xi + delta, ..., xN>) - score(p0)) / delta

That's N+1 evaluations or trials of the score to compute the full gradient <component_1, ..., component_N>.

Notice how reminiscent this is of natural selection: the average of the last generation, p0, is used to generate ~N trials, which then result in the average of the next generation shifting somewhat.

Compare reverse-mode automatic differentiation to calculate a gradient: one forward pass of the computation plus one backward pass...
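A minimal sketch of the evaluation-count difference, for a toy score(p) = sum of x_i^2 whose backward pass is simple enough to write out by hand (no AD library assumed):

```python
# Toy comparison of gradient costs for score(p) = sum(x_i^2); the reverse
# pass is written by hand since the adjoint of each x_i is just 2*x_i.

def score(p):
    return sum(x * x for x in p)

def fd_gradient(p, delta=1e-6):
    """Naive finite differences: N+1 evaluations of score for N dimensions."""
    base = score(p)                          # 1 evaluation
    grad = []
    for i in range(len(p)):                  # N further evaluations
        shifted = list(p)
        shifted[i] += delta
        grad.append((score(shifted) - base) / delta)
    return grad

def reverse_mode_gradient(p):
    """One forward pass + one backward pass, independent of N."""
    _ = score(p)                 # forward pass
    return [2.0 * x for x in p]  # backward pass: all N components at once

p0 = [1.0, 2.0, 3.0]
print(fd_gradient(p0))            # approximately [2, 4, 6]
print(reverse_mode_gradient(p0))  # exactly [2.0, 4.0, 6.0]
```

The finite-difference loop costs N+1 score evaluations; the reverse pass costs one forward plus one backward sweep regardless of N.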

I am not complaining about the complexity of the fitness or scoring function, I am complaining about trial and error approaches, when we have discovered a rocket for differentiation!


Ah, in that sense it's almost certainly more efficient. Biological selection is undirected, so progress toward the goal happens stochastically: usually slowly, but sometimes fast.


In the related work section of their paper they mention that WANNs are related to genetic programming [1], a subfield of evolutionary algorithms.

Genetic programming is quite a powerful tool. IIRC, a few years ago, a researcher evolved expressions to model the dynamics of a double pendulum based only on measured data. To his surprise he found that the expressions were the Lagrangian of the system.

[1] https://en.wikipedia.org/wiki/Genetic_programming


but a double pendulum isn't complicated at all!

how many floats are there? 2 lengths, 2 initial angles, a 2-dimensional velocity, a mass, and a gravitational field strength? That's like 8 floats...
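For scale, here is the standard planar double-pendulum Lagrangian written with exactly that handful of floats as inputs (variable names are mine: th = angle, w = angular velocity, l = rod length, m = bob mass, g = field strength):

```python
import math

# The full parameter/state set of an idealized planar double pendulum
# really is a handful of floats.

def lagrangian(th1, th2, w1, w2, l1=1.0, l2=1.0, m1=1.0, m2=1.0, g=9.81):
    # Kinetic energy of the two bobs...
    T = (0.5 * (m1 + m2) * (l1 * w1) ** 2
         + 0.5 * m2 * (l2 * w2) ** 2
         + m2 * l1 * l2 * w1 * w2 * math.cos(th1 - th2))
    # ...minus gravitational potential energy, measured from the pivot.
    V = -(m1 + m2) * g * l1 * math.cos(th1) - m2 * g * l2 * math.cos(th2)
    return T - V

print(lagrangian(0.0, 0.0, 0.0, 0.0))  # about 29.43: both bobs at rest, hanging down
```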


I don't agree with your breakthrough. Reverse-mode automatic differentiation is really very simple and is not at all what got us there.

It's finding the right architectures that capture data priors and symmetries, and that can learn efficiently with stochastic gradient descent. Automatic differentiation is just a useful tool to use with those ideas.


I don't dispute the importance of architecture!

But there was nothing new about architecture: essentially it is the choice of which multidimensional function one tries to fit. In physics we have been fitting functions for hundreds of years. If you look at some experimental plot of, say, interference, you might decide to fit a sinusoid plus a constant background, etc. Choosing the right kind of function is obviously important, but we didn't know about algorithmic differentiation for those hundreds of years (and it surely would have been welcome back then; even performed by hand, it beats trial-and-error gradients).

That RM automatic differentiation is simple is easy to say in hindsight!

I don't think a richer diversity of functions is a bad idea, but it's already being used: softmax, exponents, sums, squares, ... why not perform gradient descent over a differentiable family of functions that encompasses these?

It's really disingenuous to pretend RM AD was so very simple and then watch approvingly as someone throws it out the window and reverts to ... genetic programming? You want to let the computer find the best functions? Fine, but then give the computer a superfunction which, for certain values of an extra parameter, differentiably reaches the functions you want to be considered.
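A toy sketch of that "superfunction" idea: a softmax-weighted, differentiable blend over a fixed menu of elementary functions, so gradient descent on the extra parameters alpha can select among them. The menu, names, and parameters here are my own illustrative choices, not from any paper:

```python
import math

# A differentiable "superfunction": a softmax-weighted blend over a fixed
# menu of activations. Training alpha by gradient descent differentiably
# selects among the menu instead of searching over it by trial and error.

BASIS = [
    lambda x: x,                            # identity
    lambda x: max(0.0, x),                  # relu
    math.tanh,                              # tanh
    lambda x: 1.0 / (1.0 + math.exp(-x)),   # sigmoid
]

def softmax(a):
    m = max(a)
    e = [math.exp(v - m) for v in a]
    s = sum(e)
    return [v / s for v in e]

def superfunction(x, alpha):
    """Evaluate the alpha-weighted blend at x; smooth in alpha."""
    return sum(w * f(x) for w, f in zip(softmax(alpha), BASIS))

# With one alpha dominant, the blend collapses to that basis function:
print(abs(superfunction(1.0, [0, 0, 100, 0]) - math.tanh(1.0)) < 1e-9)  # True
```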

Most of the architectures ... end up looking suspiciously like plain old statistical physics! It's like we repeatedly witness how yet another introductory statistical-physics expression turns out to perform well on very general sets of tasks (it really comes across as if everything should be treated like a dumb mole of water, and we never tried before because we simply refused to believe it could be that simple).


I'm not sure I understand what you're saying. I think you're talking about gradient descent, not RM autograd. Gradient descent, which is the breakthrough, dates back to the 19th century. That's what allows us to approximate gradients. RM autograd is a clever implementation detail of this (how to compute gradients efficiently).


Evolutionary algorithms are perhaps easier to approach theoretically. Backprop works amazingly, but how does one even begin to approach why? There is also an element of backprop in evolution, via epigenetics.


>Evolutionary algorithms are perhaps easier to approach theoretically

Why?


>Evolutionary algorithms are perhaps easier to approach theoretically.

I would certainly welcome recommendations on theoretical approaches to evolutionary algorithms, both for my own review (there are always new insights to be gained) and to have a better-quality pointer when I know from discussion that the counterparty would benefit greatly from insights into things like the Fisher equations etc.:

From higher to lower preference, by format:

1. Open courseware

2. Books

3. Reviews (in the scientific article sense)

4. Articles (same)

From lower to higher preference, by content:

1. Must include a rigorous modern mathematical treatment of the "modern synthesis"

2. The same as 1), but also including information theory connections.

3. The same as 1), but also including the post-modern synthesis ideas.

4. The same as 1), but also including both 2) and 3)

And from highest to lowest preference, by presentation:

1. Theoretical, with numerical exercises (of equations), with numerical simulations, and theorems and proofs.

2. Theoretical, with numerical simulations.

3. Theoretical.

>Backprop works amazingly but how does one begin to approach why.

In my opinion, we fully understand why backpropagation works, but we are still mystified by the exact meaning of the weights and architecture (although we slowly start understanding facets here and there). Another issue is: if a person makes a claim, we can ask why, and the person will explain in human terms why he believes something. Currently the network itself does not understand our question of why, so we concoct mathematical tricks to "ask" the network why, which is not entirely the same thing as having a network make a conclusion and then explain why interactively. But the backpropagation itself is entirely understood IMHO: it's optimization for a better score.

>There is also an element of backprop in evolution via epigenetics

I only ever use the word epigenetics in arguments against the usage of the concept. As far as I can tell, everyone seems to reference a different concept or anomaly or deviation with epigenetics. It's like talking about "new physics" without specifying what unexplained phenomenon it hypothesizes or addresses. Even worse: sometimes it simultaneously proposes a mechanism and a hypothetical unobserved deviation. There is no agreement on what it is that has to be explained, nor on how it is to be explained. How can I even comment on "backpropagation in evolution via epigenetics" then?

[At least in the operation of the brain there is a very clear dichotomy between observed interactions between neurons in the brain and training a digital neural network on a computer: we perform backpropagation in our algorithms, but to my knowledge we have not unambiguously identified a biological mechanism through which it arises. We know that IF the brain uses reverse-mode automatic differentiation, it must entail retrograde signalling from the post-synaptic to the pre-synaptic neuron, but such a feedback mechanism has not been positively identified, to my knowledge.]

The most common interpretation is this, from wikipedia:

>Epigenetics most often denotes changes that affect gene activity and expression, but can also be used to describe any heritable phenotypic change. Such effects on cellular and physiological phenotypic traits may result from external or environmental factors, or be part of normal development. The standard definition of epigenetics requires these alterations to be heritable,[3][4] in the progeny of either cells or organisms.

To the extent it refers to simply cellular differentiation, why not simply state "cellular differentiation"? Cellular differentiation is fully understood at a conceptual level and does not require a second storage mechanism besides DNA: simple concentration levels suffice. The affinities of binding regions and the chemical reaction constants set up a differential equation that can be simulated (say numerically, by the Gillespie algorithm).

The same unique DNA code admits multiple cell types: how? The homeostatic response is multistable, just like two identical flip-flops from the same manufacturing line can memorize different states: if the concentration or voltage wanders from a stable equilibrium point, the flip-flop will correct it back to the nearest stable equilibrium point; pull it over the unstable equilibrium point and it will switch to the other stable equilibrium point. This may or may not involve histones etc., but those are chemical species like any other in the cell. Even without histones you can have a single feedback mechanism (specified by the genome) that supports multiple stable points (the cell types). Just as the brain does not need a homunculus for its identity, the cell does not need a "cell-type-unculus" to remember its cell type: it just looks at the current cellular concentrations and homeostatically corrects them in a direction and at a rate that is a function of those concentrations.

It is my interpretation that a lot of epigenetics talk, and histones and methylation, is a kind of search for this unnecessary "cell-type-unculus", stemming from a lack of awareness that a single set of differential equations can imply multiple stable points. Those who are unaware of this would benefit from reading the very accessible book by Steven Strogatz, "Nonlinear Dynamics and Chaos".

For a single cellular organism, this explains heritable information that is not encoded in DNA: after cellular division the daughter cells have roughly identical concentrations as the mother cell did.

For a multi-cellular organism, this explains cell types.
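A toy illustration of the multistability argument: a Gardner–Collins-style toggle switch, two mutually repressing genes governed by one fixed pair of ODEs, settles into either of two stable "cell types" depending only on the initial concentrations (parameter values here are arbitrary illustrative choices):

```python
# Toy toggle switch: genes u and v repress each other. One fixed set of
# equations, two stable equilibria ("cell types"), no extra storage needed.

def step(u, v, dt=0.01, alpha=4.0, n=2.0):
    """One Euler step of du/dt = alpha/(1+v^n) - u, and symmetrically for v."""
    du = alpha / (1.0 + v ** n) - u
    dv = alpha / (1.0 + u ** n) - v
    return u + dt * du, v + dt * dv

def settle(u, v, steps=20000):
    for _ in range(steps):
        u, v = step(u, v)
    return u, v

# Identical dynamics, different initial concentrations, different fates:
print(settle(3.0, 0.1))   # settles with u high, v low ("cell type A")
print(settle(0.1, 3.0))   # settles with u low, v high ("cell type B")
```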

[EDIT: Perhaps a simpler way to express this argument is with a simpler model of gene regulation: instead of continuous concentration levels, pretend they are binary (on or off). A combinational logic circuit has no memory, but a sequential one, where some outputs are fed back as inputs, can have memory. So upon cellular division, the state of the daughter cell's boolean concentrations will be nearly identical to the mother cell's (apart from those concentrations involved in cellular division itself), and it will have the same cell type as the mother cell: A -> A + A (unless the outcome crucially depends on some of the signals involved in the process of cellular division, which allows A -> A + B, or A -> B + C, where A, B, C are distinct cell types). In theory, extracellular signals entering the cell could prompt it to change cell type too: A + signal -> B]
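The flip-flop analogy can be made literal with a cross-coupled NOR pair (an SR latch): the "cell type" persists purely through the feedback loop, and a transient "extracellular signal" flips it. This is a boolean toy, not a biological model:

```python
# Boolean toy of gene regulation with feedback: an SR latch built from two
# cross-coupled NOR gates. The held state is the "cell type"; it is stored
# only in the feedback loop, not in any extra storage medium.

def latch(q, qbar, s=0, r=0):
    """One synchronous update of the cross-coupled NOR pair."""
    return int(not (r or qbar)), int(not (s or q))

state = (1, 0)                   # "cell type A"
for _ in range(5):               # no input signals: the type persists
    state = latch(*state)
print(state)                     # (1, 0)

for _ in range(3):               # hold a transient "signal" (reset input)
    state = latch(*state, r=1)
for _ in range(5):               # signal removed: the new type persists
    state = latch(*state)
print(state)                     # (0, 1): "cell type B"
```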

About, say, methylation as an explanation for heritable traits in multicellular organisms: how does one propose that the methylation state is copied when the DNA is duplicated during cell division?

Any time you have the urge to use the word "epigenetics", ask yourself whether you perhaps just mean "cellular differentiation". If you can be precise, why not be precise?



