Last time I read about this the main practical difficulty was model transferabil...

alexpotato · on July 10, 2024

> Last time I read about this the main practical difficulty was model transferability.

There is a great write up of this in this old blog post: https://www.damninteresting.com/on-the-origin-of-circuits/

robertsdionne · on July 10, 2024

This is “Mortal Computation” coined in Hinton’s The Forward-Forward Algorithm: Some Preliminary Investigations https://arxiv.org/abs/2212.13345.

bongodongobob · on July 10, 2024

Reminds me of the evolutionary FPGA experiment that was dependent on magnetic flux or something. The same program wouldn't work on a different FPGA.

cyberax · on July 10, 2024

Here's the paper about it: https://www.researchgate.net/publication/2737441_An_Evolved_...

And a more approachable article: https://www.damninteresting.com/on-the-origin-of-circuits/

actionfromafar · on July 11, 2024

Would be interesting to hook up many FPGAs of the same model and train all of the at once. Programs with differing outputs on different individuals could be discarded. The program may still not transfer to another batch of FPGAs but at least you have a better chance of the working.

Another idea is to just train a whole bunch of them individually, like putting your chips in school. :-D

rusticpenn · on July 10, 2024

What thy did was overfitting. We later found other ways of getting around the issue.

jegp · on July 11, 2024

It's still possible to train a network that's aware of the physics and then transfer that to physical devices. One approach to this from the neuromorphic community (that's been working on this for a long time) is called the Neuromorphic Intermediate Representation (NIR) and already lets you transfer models to several hardware platforms [1]. This is pretty cool because we can use the same model across systems, similar to a digital instruction set. Ofc, this doesn't fix the problem of sensitivity. But biology fixed that with plasticity, so we can probably learn to circumvent that.

[1]: https://github.com/neuromorphs/nir (disclaimer: I'm one of the authors)

programjames · on July 10, 2024

You can regularize the networks to make them transfer easier. I can't remember the abstract's title off the top of my head though.

dsabanin · on July 11, 2024

Couldn’t you still copy by training a new network on a new device to have same outputs for the same inputs as the original?

tomxor · on July 11, 2024

Yes, but training is the most expensive part of ML, for example GPT-3 is estimated to cost something like 1-4 million USD.

With ANN you can do it one time and then clone the result for negligible energy cost.

Maybe training a batch of PNNs in parallel could save some of the energy cost, but I don't know how feasible that is considering they could behave slightly differently during training causing divergence... Now that sarcastic comment at the bottom of this thread is starting to sound relevant "Schools".

kmmlng · on July 11, 2024

> Yes, but training is the most expensive part of ML, for example GPT-3 is estimated to cost something like 1-4 million USD.

That entirely depends on how many inferences the model will perform during its lifecycle. You can find different estimates for the energy consumption of ChatGPT, but they range from something like 500-1000 MWh a day. Assuming an electricity price of $0.165 per kWh, that would put you at roughly $80,000 to a $160,000 a day.

Even at the lower end of $80,000 a day, you'll reach your $4 Million in just 50 days.

tomxor · on July 11, 2024

That's not a proportional comparison, n simultaneous users to 1 training. How many users across how many GPUs is that 80k?

With PNN you would have to multiply n by 1-4 million, training cost explodes.

l33tman · on July 11, 2024

That's not true for the most well-known models. For example Meta's LLAMA training and architecture was predicated on the observation that training cost is a drop in the well compared to the inference cost for a model's lifetime.

etiam · on July 11, 2024

Distillation (as you may be aware). https://arxiv.org/abs/1503.02531

Having to do that in each instance is still really cumbersome for cheap mass deployment compared to just making a digital-style exact copy, but then again I guess a main argument for wanting these systems is that they'd be doing things unachievable in practice on digital computers.

In some cases one might be able to distill to digital arithmetic after the heavy parts of the optimization are done, for replication, distribution, better access for software analysis, etc.

CuriouslyC · on July 11, 2024

This was the thing Geoff Hinton cited as a problem with analog networks.

I think eventually we'll get to the point where we do a stage of pretraining on noisy digital hardware to create a transferrable network, then fine tune it on the analog system.

6gvONxR4sf7o · on July 11, 2024

If (somehow/waves hands) you could parallelize training, maybe this would turn into an implicit regularization and be a benefit, not a flaw. Then again, physical parallelizability might be an infeasibly restrictive constraint?

trextrex · on July 10, 2024

Well, the brain is a physical neural network, and evolution seems to have figured out how to generate a (somewhat) copiable model. I bet we could learn a trick or two from biology here.

hansworst · on July 10, 2024

The way the brain does it is by giving users a largely untrained model that they themselves have to train over the next 20 years for it to be of any use.

salomonk_mur · on July 11, 2024

It is extremely trained already. Everyone alive was born with the ability for all their organs and bodily function to work autonomously.

A ton of that is probably encoded elsewhere, but no doubt the brain plays a huge part. And somehow, it's all reconstructed for each new "device".

wbillingsley · on July 11, 2024

Sometimes. Foals are born (almost) able to walk. There are occasions where evolution baked the model into the genes.

tomxor · on July 12, 2024

Yeah that example came to my mind too.

I suspect there may be trade off undergoing evolutionary selection here, where for some organisms a behaviour is more important from the offset, it's worth encoding more of the behaviour into genes, at what cost I wonder?

It's also possible there is some other mechanism going on at an embryonic stage, a kind of pre-training.

I suspect some of the division is also defined by how complex the task is, or how sensitive the model is to it's own neurons (kind of like PNN). I don't have a well rounded argument, but my instinct is that encoding or pre-training walking is far easier than seeing. Not to mention basic quadrupedal walking/standing is far easier than bipedal, they can learn the more complex coordinated movements after.

lynx23 · on July 11, 2024

20 years of training is not enough. Neuroscientists say 25. According to my own experience, its more like 30.

dcuthbertson · on July 11, 2024

In the end, it's a life-long process.

tomxor · on July 10, 2024

Some parts are copiable, but not the more abstract things like the human intellect, for lack of a better word.

We are not even born with what you might consider basic mental faculties, for example it might seem absurd, but we have to learn to see... We are born with the "hardware" for it, a visual cortex, an eye, all defined by our genes, but it's actually trained from birth, there is even a feedback loop that causes the retina to physically develop properly.

immibis · on July 10, 2024

They raised some cats from birth in an environment with only vertically-oriented edges, none horizontal. Those cats could not see horizontally-oriented things. https://computervisionblog.wordpress.com/2013/06/01/cats-and...

Likewise, kittens with an eye patch over an eye in the same time period remain blind in that eye forever.

tomxor · on July 10, 2024

Wow, that's a horrific way of proving that theory.

BriggyDwiggs42 · on July 10, 2024

Geez poor kitties, but that is interesting.

alexpotato · on July 10, 2024

Another example:

Children who were "raised in the wild" or locked in a room by themselves have shown to be incapable of learning full human language.

The working theory is that our brains can only learn certain skills at certain times of brain development/ages.

deepfriedchokes · on July 10, 2024

We should also consider the effects of trauma on those brains. If you’ve ever spent time around people with extreme trauma they are very much in their own heads and can’t focus outside themselves long enough to focus enough to learn anything. It definitely impacts intellectual capacity. Humans are social animals and anyone raised without proper socializing and intimacy and nurturing will inevitably end up traumatized.

chbint · on July 12, 2024

There's indeed a nice trick to be learned from cognitive science focused in biological cognition: the mind is embodied and embedded. Which means, roughly, that it is not portable. It doesn't store things like "glass at position x,y" but only "glass is at a small movement of the hand towards the right". Consequently, whatever gets encoded only makes sense within a given body and only inasmuch as it relied on its environment (with humans, that includes social environments). The good news is that, despite being not portable, this reliance on physical properties might be a step in the right direction, after all.