
how is this different from boring old evolutionary algorithms?

In my opinion the big breakthrough that enabled optimization and machine learning was the discovery of reverse mode automatic differentiation, since the space or family of all possible decision-functions is high dimensional, while the goal (survival, reproduction) is low dimensional. Unless I see a mathematical proof that evolutionary algorithms are as efficient as RM AD, I see little future in it, and apparently neither did biology since it decided to create brains.

It's not an ideological stance I take here (of nature vs nurture).

For simplicity, let's pretend humans are single-celled organisms. What does natural selection exert pressure on? Our DNA code: both the actual protein-coding sequences and the promoter regions. I claim that variation in the proteins is risky (a modification in a protein-coding region could render a protein useless), while variation in the promoter regions is much less risky: altering a nucleotide there would slightly shift the binding affinity modulating transcription, so the cell would behave essentially the same but with different threshold concentrations. Think of the continuous parameters that describe our body (assuming the same nurture, food, etc.): some people are a bit taller, some a bit stronger, and so on. So how many of these continuous parameters do we have? On the order of the total number of promoter regions in the DNA of the fertilized egg: both in the human DNA and in one mitochondrion (assuming there isn't a chemical addressing/read/write scheme for, say, 10 mitochondria)...

EDIT: just adding that for a certain fixed environment there are local (and a global) optima of affinity values for each protein, so that near a local optimum the fitness is roughly shaped like -s(a - a_opt)^2, where s sets the spread and a_opt is the locally optimal affinity value. In other words, it is not the case that "higher affinity" means fitter, not at all: a collection of genomes from an identical environment will hover around an affinity sweet spot.

According to Wikipedia [0], that works out to:

about 2x 20412 "floats" for just protein-coding genes

about 2x 34000 "floats" when also including the pseudo-genes

about 2x 62000 "floats" when also including long ncRNA, small ncRNA, miRNA, rRNA, snRNA, snoRNA

These "floats" are the variables that allow a species to modulate the reaction constants in the gene regulatory network, since natural selection cannot directly modulate the laws of physics and chemistry, and modulating the protein directly instead of the promoter-region affinities / reaction rates risks dysfunctional proteins...

So my estimate of an upper limit on the number of "floats" in the genetic algorithm is ~120000 (and probably much less, if not each of the above has its own promoter region).
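Spelling out the arithmetic behind that upper bound (gene counts taken from the Wikipedia table at [0]; the factor of 2 follows the "2x" per category above):

```python
# Back-of-the-envelope check; counts from the Wikipedia table cited at [0],
# with the factor of 2 applied per category as in the comment above.
protein_coding = 2 * 20412    # protein-coding genes only
with_pseudogenes = 2 * 34000  # also including pseudo-genes
with_all_rna = 2 * 62000      # also including the ncRNA/miRNA/rRNA/sn(o)RNA classes
print(protein_coding, with_pseudogenes, with_all_rna)  # 40824 68000 124000
```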

That's not a lot of information when we think about the number of synaptic weights in the brain, and many of these floats are shared in utilization by the other cell types besides neurons.

I consider the possibility that the sperm cell, egg cell, or fertilized egg performs a kind of POST (power-on self-test) that checks for some of the genes, although simply reaching the fertilized state may be enough of a self-test, so no spontaneous-abortion test may be needed (to save time and avoid resources being spent on a probably malformed child).

[0] https://en.wikipedia.org/wiki/Human_genome#Molecular_organiz...

EDIT2: regarding:

>This makes WANNs particularly well positioned to exploit the Baldwin effect, the evolutionary pressure that rewards individuals predisposed to learn useful behaviors, without being trapped in the computationally expensive trap of ‘learning to learn’.

The computationally expensive trap of having to 'learn to learn' could end up being as mundane as a low number of hormones to which neurons in the brain globally or collectively respond, which enables learning by reward or punishment, and from then on anticipating reward or punishment, and our individual end goal stems from this anticipation, and anticipating the anticipation etc...



>Unless I see a mathematical proof that evolutionary algorithms are as efficient as RM AD, I see little future in it, and apparently neither did biology since it decided to create brains.

In the biological context the complexity is shifted away from the actual selection algorithm and onto the "scoring function." Although the filter of reproduction is relatively simple [1], the reason why the organism was fit and could ultimately reproduce is very complex. The brain's "topology" is the result of selection pressure favoring adaptations that improve fitness by dynamically adapting to relevant patterns found in the environment. The brain attempts to accurately model important aspects of the complex environment it's challenged with to increase fitness.

[1] Nothing in biology is ever actually simple. Even though individuals undergo fitness-based scoring, the actual notion of an individual is arbitrary. In the case of bees, most of the individual bees in the colony are sterile, which is obviously bad for the fitness of the "individual." However, a few individuals reproduce, so all the sterile workers' fitness is shifted onto the queen/drones through the concept of inclusive fitness. This indirect selection also applies to humans, who undergo inclusive fitness through their siblings. It can even be generalized to your individual cells/organs, which function as a colony supporting the gonads, which actually undergo direct selective pressure.


With "efficiency" I meant computational efficiency:

consider the task of computing a gradient at a point p0 = <x1, x2, x3, ..., xN> in an N-dimensional space.

the naive approach was for a long time: compute the value of the score at p0, then for each coordinate compute the score for the same point but shifted a delta in the direction of that coordinate, i.e. the i-th component is computed as:

component_i = (score(<x1, x2, ..., xi + delta, ..., xN>) - score(p0)) / delta

That's N+1 evaluations or trials of the score to compute the full gradient <component_1, ..., component_N>.

Notice how reminiscent this is of natural selection: the average of the last generation, p0, is used to generate ~N trials, which then result in the average of the next generation shifting somewhat.

Compare reverse-mode automatic differentiation to calculate a gradient: one forward pass of the computation plus one backward pass...
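A minimal sketch of the evaluation-count difference, for a toy score(p) = sum of x_i^2 whose backward pass is simple enough to write out by hand (no AD library assumed):

```python
# Toy comparison of gradient costs for score(p) = sum(x_i^2); the reverse
# pass is written by hand since the adjoint of each x_i is just 2*x_i.

def score(p):
    return sum(x * x for x in p)

def fd_gradient(p, delta=1e-6):
    """Naive finite differences: N+1 evaluations of score for N dimensions."""
    base = score(p)                          # 1 evaluation
    grad = []
    for i in range(len(p)):                  # N further evaluations
        shifted = list(p)
        shifted[i] += delta
        grad.append((score(shifted) - base) / delta)
    return grad

def reverse_mode_gradient(p):
    """One forward pass + one backward pass, independent of N."""
    _ = score(p)                 # forward pass
    return [2.0 * x for x in p]  # backward pass: all N components at once

p0 = [1.0, 2.0, 3.0]
print(fd_gradient(p0))            # approximately [2, 4, 6]
print(reverse_mode_gradient(p0))  # exactly [2.0, 4.0, 6.0]
```

The finite-difference loop costs N+1 score evaluations; the reverse pass costs one forward plus one backward sweep regardless of N.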

I am not complaining about the complexity of the fitness or scoring function, I am complaining about trial and error approaches, when we have discovered a rocket for differentiation!


Ah, in that sense it's almost certainly more efficient. Biological selection is undirected, so progress toward the goal happens stochastically: usually slowly, but sometimes fast.


In the related work section of their paper they mention that WANNs are related to genetic programming [1], a subfield of evolutionary algorithms.

Genetic programming is quite a powerful tool. IIRC, a few years ago, a researcher evolved expressions to model the dynamics of a double pendulum based only on measured data. To his surprise he found that the expressions were the Lagrangian of the system.

[1] https://en.wikipedia.org/wiki/Genetic_programming


but a double pendulum isn't complicated at all!

how many floats are there? 2 lengths, 2 initial angles, a 2-dimensional velocity, a mass, and a gravitational field strength? That's like 8 floats...
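For scale, here is the standard planar double-pendulum Lagrangian written with exactly that handful of floats as inputs (variable names are mine: th = angle, w = angular velocity, l = rod length, m = bob mass, g = field strength):

```python
import math

# The full parameter/state set of an idealized planar double pendulum
# really is a handful of floats.

def lagrangian(th1, th2, w1, w2, l1=1.0, l2=1.0, m1=1.0, m2=1.0, g=9.81):
    # Kinetic energy of the two bobs...
    T = (0.5 * (m1 + m2) * (l1 * w1) ** 2
         + 0.5 * m2 * (l2 * w2) ** 2
         + m2 * l1 * l2 * w1 * w2 * math.cos(th1 - th2))
    # ...minus gravitational potential energy, measured from the pivot.
    V = -(m1 + m2) * g * l1 * math.cos(th1) - m2 * g * l2 * math.cos(th2)
    return T - V

print(lagrangian(0.0, 0.0, 0.0, 0.0))  # about 29.43: both bobs at rest, hanging down
```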


I don't agree with your breakthrough. Reverse-mode automatic differentiation is really very simple and is not at all what got us there.

It's finding the right architectures that capture data priors and symmetries, and that can learn efficiently with stochastic gradient descent. Automatic differentiation is just a useful tool to use with those ideas.


I don't dispute the importance of architecture!

But there was nothing new about architecture: essentially it is the choice of which multidimensional function one tries to fit. In physics we have been fitting functions for hundreds of years. If you look at some experimental plot of, say, interference, you might decide to fit a sinusoid plus a constant background, etc. Choosing the right kind of function is obviously important, but we didn't know about algorithmic differentiation for those hundreds of years (and it surely would have been welcome back then; even performed by hand, it beats trial-and-error gradients).

That RM automatic differentiation is simple is easy to say in hindsight!

I don't think a richer diversity of functions is a bad idea, but it's already being used: softmax, exponents, sums, squares, ... why not perform gradient descent over a differentiable family of functions that encompasses these?

It's really disingenuous to pretend RM AD was so very simple and then watch approvingly as someone throws it out the window and reverts to ... genetic programming? You want to let the computer find the best functions? Fine, but then give the computer a superfunction which, for certain values of an extra parameter, differentiably reaches the functions you want to be considered.
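A toy sketch of that "superfunction" idea: a softmax-weighted, differentiable blend over a fixed menu of elementary functions, so gradient descent on the extra parameters alpha can select among them. The menu, names, and parameters here are my own illustrative choices, not from any paper:

```python
import math

# A differentiable "superfunction": a softmax-weighted blend over a fixed
# menu of activations. Training alpha by gradient descent differentiably
# selects among the menu instead of searching over it by trial and error.

BASIS = [
    lambda x: x,                            # identity
    lambda x: max(0.0, x),                  # relu
    math.tanh,                              # tanh
    lambda x: 1.0 / (1.0 + math.exp(-x)),   # sigmoid
]

def softmax(a):
    m = max(a)
    e = [math.exp(v - m) for v in a]
    s = sum(e)
    return [v / s for v in e]

def superfunction(x, alpha):
    """Evaluate the alpha-weighted blend at x; smooth in alpha."""
    return sum(w * f(x) for w, f in zip(softmax(alpha), BASIS))

# With one alpha dominant, the blend collapses to that basis function:
print(abs(superfunction(1.0, [0, 0, 100, 0]) - math.tanh(1.0)) < 1e-9)  # True
```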

Most of the architectures ... end up looking suspiciously like plain old statistical physics! It's like we repeatedly witness how yet another introductory statistical-physics expression turns out to perform well on very general sets of tasks (it really comes across as if everything should be treated like a dumb mole of water, and we never tried before because we simply refused to believe it could be that simple).


I'm not sure I understand what you're saying. I think you're talking about gradient descent, not RM autograd. Gradient descent, which is the breakthrough, dates back to the 19th century. That's what allows us to approximate gradients. RM autograd is a clever implementation detail of this (how to compute gradients efficiently).


Evolutionary algorithms are perhaps easier to approach theoretically. Backprop works amazingly, but how does one even begin to approach why? There is also an element of backprop in evolution, via epigenetics.


>Evolutionary algorithms are perhaps easier to approach theoretically

Why?


>Evolutionary algorithms are perhaps easier to approach theoretically.

I would certainly welcome recommendations on theoretical approaches to evolutionary algorithms, both for my own review (there are always new insights to be gained) and to have a better-quality pointer when I know from discussion that the counterparty would benefit greatly from insights into things like the Fisher equations etc.:

From higher to lower preference, by format:

1. Open courseware

2. Books

3. Reviews (in the scientific article sense)

4. Articles (same)

From lower to higher preference, by content:

1. Must include a rigorous modern mathematical treatment of the "modern synthesis"

2. The same as 1), but also including information theory connections.

3. The same as 1), but also including the post-modern synthesis ideas.

4. The same as 1), but also including both 2) and 3)

And from highest to lowest preference, by presentation:

1. Theoretical, with numerical exercises (of equations), with numerical simulations, and theorems and proofs.

2. Theoretical, with numerical simulations.

3. Theoretical.

>Backprop works amazingly but how does one begin to approach why.

In my opinion, we fully understand why backpropagation works, but we are still mystified by the exact meaning of the weights and architecture (although we slowly start understanding facets here and there). Another issue is: if a person makes a claim, we can ask why, and the person will explain in human terms why he believes something. Currently the network itself does not understand our question of why, so we concoct mathematical tricks to "ask" the network why, which is not entirely the same thing as having a network make a conclusion and then explain why interactively. But the backpropagation itself is entirely understood IMHO: it's optimization for a better score.

>There is also an element of backprop in evolution via epigenetics

I only ever use the word epigenetics in arguments against the usage of the concept. As far as I can tell, everyone seems to reference a different concept or anomaly or deviation with epigenetics. It's like talking about "new physics" without specifying what unexplained phenomenon it hypothesizes or addresses. Even worse: sometimes it simultaneously proposes a mechanism and a hypothetical unobserved deviation. There is no agreement on what it is that has to be explained, nor on how it is to be explained. How can I even comment on "backpropagation in evolution via epigenetics" then?

[At least in the operation of the brain there is a very clear dichotomy between observed interactions between neurons in the brain and training a digital neural network on a computer: we perform backpropagation in our algorithms, but to my knowledge we have not unambiguously identified a biological mechanism through which it arises. We know that IF the brain uses reverse-mode automatic differentiation, it must entail retrograde signalling from the post-synaptic to the pre-synaptic neuron, but such a feedback mechanism has not been positively identified, to my knowledge.]

The most common interpretation is this, from wikipedia:

>Epigenetics most often denotes changes that affect gene activity and expression, but can also be used to describe any heritable phenotypic change. Such effects on cellular and physiological phenotypic traits may result from external or environmental factors, or be part of normal development. The standard definition of epigenetics requires these alterations to be heritable,[3][4] in the progeny of either cells or organisms.

To the extent it refers to simply cellular differentiation, why not simply state "cellular differentiation"? Cellular differentiation is fully understood at a conceptual level and does not require a second storage mechanism besides DNA: simple concentration levels suffice. The affinities of binding regions and the chemical reaction constants set up a differential equation that can be simulated (say numerically, by the Gillespie algorithm).

The same unique DNA code admits multiple cell types: how? The homeostatic response is multistable, just like two identical flip-flops from the same manufacturing line can memorize different states: if the concentration or voltage wanders from a stable equilibrium point, the flip-flop will correct it back to the nearest stable equilibrium point; pull it over the unstable equilibrium point and it will switch to the other stable equilibrium point. This may or may not involve histones etc., but those are chemical species like any other in the cell. Even without histones you can have a single feedback mechanism (specified by the genome) that supports multiple stable points (the cell types). Just as the brain does not need a homunculus for its identity, the cell does not need a "cell-type-unculus" to remember its cell type: it just looks at the current cellular concentrations and homeostatically corrects them in a direction and at a rate that is a function of those concentrations.

It is my interpretation that a lot of epigenetics talk, and histones and methylation, is a kind of search for this unnecessary "cell-type-unculus", stemming from a lack of awareness that a single set of differential equations can imply multiple stable points. Those who are unaware of this would benefit from reading the very accessible book by Steven Strogatz, "Nonlinear Dynamics and Chaos".

For a single cellular organism, this explains heritable information that is not encoded in DNA: after cellular division the daughter cells have roughly identical concentrations as the mother cell did.

For a multi-cellular organism, this explains cell types.
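A toy illustration of the multistability argument: a Gardner–Collins-style toggle switch, two mutually repressing genes governed by one fixed pair of ODEs, settles into either of two stable "cell types" depending only on the initial concentrations (parameter values here are arbitrary illustrative choices):

```python
# Toy toggle switch: genes u and v repress each other. One fixed set of
# equations, two stable equilibria ("cell types"), no extra storage needed.

def step(u, v, dt=0.01, alpha=4.0, n=2.0):
    """One Euler step of du/dt = alpha/(1+v^n) - u, and symmetrically for v."""
    du = alpha / (1.0 + v ** n) - u
    dv = alpha / (1.0 + u ** n) - v
    return u + dt * du, v + dt * dv

def settle(u, v, steps=20000):
    for _ in range(steps):
        u, v = step(u, v)
    return u, v

# Identical dynamics, different initial concentrations, different fates:
print(settle(3.0, 0.1))   # settles with u high, v low ("cell type A")
print(settle(0.1, 3.0))   # settles with u low, v high ("cell type B")
```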

[EDIT: Perhaps a simpler way to express this argument is with a simpler model of gene regulation: instead of continuous concentration levels, pretend they are binary (on or off). A combinational logic circuit has no memory, but a sequential one, where some outputs are fed back as inputs, can have memory. So upon cellular division, the state of the daughter cell's boolean concentrations will be nearly identical to the mother cell's (apart from those concentrations involved in cellular division itself), and it will have the same cell type as the mother cell: A -> A + A (unless the outcome crucially depends on some of the signals involved in the process of cellular division, which allows A -> A + B, or A -> B + C, where A, B, C are distinct cell types). In theory, extracellular signals entering the cell could prompt it to change cell type too: A + signal -> B]
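The flip-flop analogy can be made literal with a cross-coupled NOR pair (an SR latch): the "cell type" persists purely through the feedback loop, and a transient "extracellular signal" flips it. This is a boolean toy, not a biological model:

```python
# Boolean toy of gene regulation with feedback: an SR latch built from two
# cross-coupled NOR gates. The held state is the "cell type"; it is stored
# only in the feedback loop, not in any extra storage medium.

def latch(q, qbar, s=0, r=0):
    """One synchronous update of the cross-coupled NOR pair."""
    return int(not (r or qbar)), int(not (s or q))

state = (1, 0)                   # "cell type A"
for _ in range(5):               # no input signals: the type persists
    state = latch(*state)
print(state)                     # (1, 0)

for _ in range(3):               # hold a transient "signal" (reset input)
    state = latch(*state, r=1)
for _ in range(5):               # signal removed: the new type persists
    state = latch(*state)
print(state)                     # (0, 1): "cell type B"
```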

About, say, methylation as an explanation for heritable traits in multicellular organisms: how does one propose that the methylation state is copied when the DNA is duplicated during cell division?

Any time you have the urge to use the word "epigenetics", ask yourself whether you perhaps just mean "cellular differentiation". If you can be precise, why not be precise?



