Perovskite Neural Trees (nature.com)
122 points by headalgorithm on May 8, 2020 | 27 comments


Fascinating.

Accuracy on MNIST is a lousy 82.9%, 86.5%, and 87.4% for the networks with 400, 1000, and 6400 neurons, respectively -- well below the accuracy achieved by even LeNet-5, a landmark CNN architecture from the 1990s.[a]

Nonetheless, I get excited when I come across research like this. I get excited about the possibility of building devices with internal physical/chemical network/tree structures capable of learning in response to stimuli, e.g., by settling into low-energy states that "best model" such stimuli or via some other mechanism.

This is the kind of research, I think, that could eventually lead to AI systems that are orders of magnitude more efficient in energy, space, time, and cost than current systems implemented as software running on GPU, TPU, and similar devices.

[a] http://yann.lecun.com/exdb/lenet/


> Nonetheless, I get excited when I come across research like this

I don't, even though I did a phd in this type of research (analog neural computation with novel memory technologies). You either design specialized HW for the best available algorithm to solve a task of interest, or design better algorithms for that task. Do not try to design better hardware for poorly understood algorithms, especially when these algorithms don't really work (82% on MNIST, really?). If the task of interest is image classification, either focus on accelerating convnets, in which case all you really need is a fast multiplication of large matrices with enough precision, or focus on developing better algorithms (capsules, attention, neural trees, etc), in which case you use GPUs until you beat the state of the art.


If your goal is to build something that works and is of interest to industry ($$$), I couldn't agree more :-)

But I don't think that's the goal here. I view this type of research as part of early efforts in a long-term quest to discover if it's possible to build "neuromorphic" devices capable of universal computation that are much more efficient than current methods. These efforts today seem to be roughly at the same stage that "deep learning" was in the late 1970s to early 1980s, or even at an earlier stage.

For what it's worth, I also think it's worthwhile to develop hardware/software that accelerates linear/multilinear algebra operations and/or makes it practical for us to use more general/flexible abstractions in our code.[a]

--

[a] For example, I would love to be able to model each artificial neuron as a separate object in code, making it trivial to group/ungroup neurons into irregularly shaped and/or overlapping groups and easily shrink/expand/alter such groups on the fly without incurring a performance penalty. Right now, that's not possible: if we want decent performance, we have little choice but to pigeonhole all our designs into functions that operate on grid-like shapes (equal-sized vectors, matrices, higher-order tensors) and resort to things like masking so we can have fast multiply-sum operations.
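
A rough sketch of the masking workaround I mean (plain NumPy; the names, shapes, and groupings are made up for illustration):

    import numpy as np

    # Hypothetical illustration only: irregular, overlapping "groups" of output
    # neurons are expressed as boolean masks over one dense weight matrix, so the
    # fast multiply-sum (x @ W) is preserved and groups can be changed on the fly.
    rng = np.random.default_rng(0)
    n_in, n_out = 64, 32
    W = rng.normal(size=(n_in, n_out))

    group_a = np.zeros(n_out, dtype=bool); group_a[[0, 3, 5, 7, 20]] = True
    group_b = np.zeros(n_out, dtype=bool); group_b[[5, 7, 9, 30, 31]] = True   # overlaps group_a

    x = rng.normal(size=(1, n_in))
    active = group_a | group_b          # "regroup" without touching W
    y = (x @ W) * active                # one dense matmul, then mask out inactive neurons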


What is "universal computation"? Turing complete? Then all we need is a good NAND gate. Or whatever computation is happening inside our brains? Then maybe we should first figure out what is actually happening there.

> The field today seems to be roughly at the same stage that "deep learning" was in the late 1970s

Have all the research directions that seemed promising in the 70s become as successful as DL?

My point is - if you're interested in how brains work, focus on that. If you're interested in building better hardware, focus on that instead and try to build something that can be useful. This article got published because of "neuromorphic" (such a bullshit term!), and because there are MNIST results (which are embarrassing, but the peer reviewers didn't catch that because they are probably device physicists, not ML people).

EDIT (responding to your edit):

> I would love to be able to model each artificial neuron as a separate object in code

So you want to run some experiments, without any idea whether they would lead to any good insights or benefits. Seems like building specialized hardware just to accelerate this particular experiment is premature, to say the least. Just rent a bigger GPU cluster - you'll get your answer much faster and much cheaper.


Similar arguments could have been made against deep (multilayer) neural networks (DNNs) for a couple of decades.

No one knew for certain in the '70s and '80s if DNNs would pay off, yet a handful of then-obscure researchers persisted. And for quite a while DNNs were kind of a research backwater -- some researchers moved to Canada because they couldn't get funding anywhere else. As a close approximation, "almost no one" was interested in DNNs until AlexNet won ImageNet in 2012.

Just because we don't know which research directions may or may not pay off decades from now does not mean such directions aren't worthwhile and should be abandoned.


This gadget doesn't have any hidden layers, though. It does OK in comparison to a simple linear classifier. If you can make and train a multi-layer architecture with perovskite, that starts to get interesting.
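
For a sense of that baseline, here's a quick sketch of a no-hidden-layer (softmax) classifier on MNIST; such a model is commonly quoted at roughly 92% accuracy, though the exact number depends on the setup:

    from sklearn.datasets import fetch_openml
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Plain linear classifier, no hidden layers: one weight matrix plus softmax.
    X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
    X_train, X_test, y_train, y_test = train_test_split(
        X / 255.0, y, test_size=10000, random_state=0)

    clf = LogisticRegression(max_iter=200)   # multinomial softmax regression
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))         # typically lands around 0.92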


Am I correct in understanding that this paper describes a 'neuromorphic device structure', measures a few instances of it, and then plugs a theoretical model of this structure into a simulation?

So figure 2A-F are measured on a single device, and figure 2G-I are just simulations based on a model derived from 2A-F and scaled up in silico to large networks?

In that case, nothing more than single devices was fabricated, yet the results are reported as those of a hypothesized neural network... and also, if I'm understanding this right, the device is basically a resistor whose resistance can be changed using a series of voltage pulses.

This leaves a lot of devilish system implementation details as an exercise to the reader; a system level comparison of this to existing ReRAM (resistive RAM) implementations of DNN accelerators would be enlightening.
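
My mental model of how such a device maps onto a DNN accelerator, as a crossbar-style sketch (the parameter values here are assumptions, not taken from the paper):

    import numpy as np

    # Each cross-point is a resistor whose conductance G stores a weight and is
    # nudged by voltage pulses; driving the rows with voltages V and summing the
    # column currents performs an analog vector-matrix multiply, I = V @ G.
    rng = np.random.default_rng(1)
    G = rng.uniform(1e-6, 1e-5, size=(784, 10))     # conductances in siemens (assumed range)

    def apply_pulses(G, i, j, n_pulses, dG=1e-7):
        """Pulse-program one device: each pulse shifts its conductance by ~dG (assumed)."""
        G[i, j] = np.clip(G[i, j] + n_pulses * dG, 1e-6, 1e-5)
        return G

    V = rng.uniform(0.0, 0.2, size=(1, 784))        # read voltages encoding one input
    I = V @ G                                        # per-column currents = analog dot products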


Yes, you're correct, and that describes the majority of "neuromorphic" papers.

By the way, which "existing ReRAM (resistive RAM) implementations of DNN accelerators" are you referring to? I have a feeling you will be disappointed once you take a closer look at those implementations.


Shades of Asimovian positronic brains.


My thoughts immediately went to Data from Star Trek.


Data's brain is a pretty explicit reference to Asimov's stories and their positronic brains.


I would love to see an ELI5 writeup of this. For instance, I can't quite wrap my head around how they are storing neuron state in the lattice.


While pretty interesting, do you gain anything from this over purely software neural nets running on standard silicon chips (CPU, GPU, TPU, etc.)?


My understanding is that the D-Wave quantum computer uses quantum spin-glass annealing (by tapping into these physical effects) to solve combinatorial optimization problems faster than classical computers, including simulated (thermodynamic) annealing.

I believe this research work is going in the direction of building quantum annealers, reinventing what D-Wave has done or improving on it. I'm not a condensed-matter physicist nor an optimization-theory nerd, so I can't tell you much more, but that's my gut feeling.

From 2015 via Google AI:

"We found that for problem instances involving nearly 1000 binary variables, quantum annealing significantly outperforms its classical counterpart, simulated annealing. It is more than 10^8 times faster than simulated annealing running on a single core. We also compared the quantum hardware to another algorithm called Quantum Monte Carlo. This is a method designed to emulate the behavior of quantum systems, but it runs on conventional processors. While the scaling with size between these two methods is comparable, they are again separated by a large factor sometimes as high as 10^8."

https://ai.googleblog.com/2015/12/when-can-quantum-annealing...


Maybe you should read the paper.


Maybe you could explain where the paper contradicts the above?


I believe the above poster was referencing the fact that the underlying technology is a lattice structure composed of relatively "normal" nickel-based materials, and its mechanism depends only on classical effects.

Additionally, from a computational perspective, I don't think problems such as MNIST have a particularly clean formalism under which they can be solved more efficiently by exploiting quantum "parallelism", although perhaps a researcher can correct me on that front.

I admit I only read the abstract, however; maybe there's something I am missing.


There is a strong analogy between even your garden variety deep learning and annealing/statistical physics: learning rate is analogous to temperature. Boltzmann machines, which have gone out of favor in recent years, are explicitly crystal models: the Hamiltonian has the same form as an Ising model.
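
For reference, the textbook forms that make the correspondence explicit (standard expressions, not taken from the paper):

    % Ising Hamiltonian over spins s_i = +/-1 with couplings J_ij and fields h_i:
    H(\mathbf{s}) = -\sum_{i<j} J_{ij}\, s_i s_j - \sum_i h_i s_i

    % Boltzmann machine energy over binary units x_i with weights w_ij and biases b_i,
    % sampled at temperature T:
    E(\mathbf{x}) = -\sum_{i<j} w_{ij}\, x_i x_j - \sum_i b_i x_i,
    \qquad p(\mathbf{x}) \propto e^{-E(\mathbf{x})/T}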

No need to appeal to anything qUanTuM though. This is all standard equilibrium statistical mechanics. Quantum annealing may or may not have an advantage over simulated annealing/classical methods (I have not kept up with the D-Wave literature) - but the underlying physics is all classical. Just a fancy optimization technique.


That's not true. The quantum tunneling effect is what gives physical (as opposed to simulated) quantum annealing an advantage over simulated annealing (classical thermodynamics).

Google that.

"Quantum annealers are physical quantum devices designed to solve optimization problems by finding low-energy configurations of an appropriate energy function by exploiting cooperative tunneling effects to escape local minima. Classical annealers use thermal fluctuations for the same computational purpose, and Markov chains based on this principle are among the most widespread optimization techniques. The fundamental mechanism underlying quantum annealing consists of exploiting a controllable quantum perturbation to generate tunneling processes. The computational potentialities of quantum annealers are still under debate, since few ad hoc positive results are known. Here, we identify a wide class of large-scale nonconvex optimization problems for which quantum annealing is efficient while classical annealing gets stuck. These problems are of central interest to machine learning."

https://www.pnas.org/content/115/7/1457

Bam!


I saw the CuMn reference and thought they were leveraging quantum spin dynamics in spin-glass alloys.

It's not quantum parallelism but quantum annealing that I was referring to. Very different models of computation.

So this seems to be about neuromorphic computing, not quantum annealing. In that case, the question of why this would be better than GPU models of computation is very valid. Maybe cheaper and less energy? But I doubt that it would be practical if it significantly underperforms in comparison.

EDIT: https://arstechnica.com/science/2019/10/what-problems-can-yo...

If you down-vote, please explain. Else, what's the benefit?


There is a huge advantage to 'embedded computing' (using the physics of your system itself to do the compute): energy efficiency.


Possibly, but I didn't see anything in the paper about energy efficiency vs. a similar software implementation of their neural net, and I don't think the wider deep learning community has really spent much time thinking about the energy efficiency of their models.


There is a substantial subfield doing embedded deep learning - they absolutely care about energy efficiency.


"Substantial", maybe. I don't recall any major papers this year about efficiency in NeurIPS, ICLR, etc.

Nevertheless I think my point on "the wider Deep Learning community" not focusing on efficiency is correct.


Interesting that the ratio between the branches and their shape looks kind of like a fig tree (bifurcation) diagram: http://complex.upf.es/~josep/bifurcationGREBOGI.jpg

Might imply that the bifurcations could be reasoned about using the Feigenbaum constants.
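
A minimal sketch of that "fig tree" (the logistic-map bifurcation diagram, whose period-doubling points converge geometrically with ratio approaching the Feigenbaum constant δ ≈ 4.669):

    import numpy as np

    # Sweep the logistic map x -> r*x*(1-x); the attractor traced out over r
    # is the classic bifurcation ("fig tree") diagram referenced above.
    points = []
    for r in np.linspace(2.5, 4.0, 2000):
        x = 0.5
        for _ in range(500):              # discard transient iterations
            x = r * x * (1 - x)
        for _ in range(100):              # record attractor samples
            x = r * x * (1 - x)
            points.append((r, x))
    # Plotting r vs x from `points` reproduces the branching structure.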


Fascinating that memory-tracking models seem to be organized in hierarchical structures that can be represented in tree-like forms. Reminds me of the helical models discovered in the early '70s for DNA strata.


materials science + AI + ??? = profit!



