
Can someone ELI5 this for me?

I understand how neural networks try to reduce their loss function to get the best result. But what's actually different about KANs?




I'm not an ML person and am just learning from this article, but I understand a little about ML, and the key thing I get out of it is the footnote in the diagram.

A regular neural network (MLP) has matrices full of floating-point numbers that act as weights. A weight is a linear function y = wx: if I plot the input x and output y on Cartesian coordinates, I get a straight line. Increasing or decreasing the input increases or decreases the output by a consistent amount; there is no point where increasing the input suddenly has more or less effect than the previous increase, or starts sending the output in the other direction. So we train the network by having it learn multiple layers of these weights, connected by some fixed "glue" functions (the activation functions) that are part of the design, not something that is trained. The end result is that the output can have a complex relationship with the input by being passed through all these layers.
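
To make that concrete, here's a tiny NumPy sketch of one MLP layer (my own illustration, not from the article): the learned part is just the weight matrix and bias, and the nonlinearity is a fixed function bolted on top.

    import numpy as np

    def mlp_layer(x, W, b):
        # The learned parameters are plain numbers: a weight matrix W and a bias b.
        # Each connection contributes w * x_i, i.e. a straight line.
        pre_activation = W @ x + b
        # The "glue" is a fixed, hand-picked nonlinearity (ReLU here); it is not learned.
        return np.maximum(0.0, pre_activation)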

In contrast, in a KAN, rather than weights (acting as linear functions), we let the network learn other kinds of functions on each connection. These are nonlinear, so it's possible that as we increase the input, the output keeps rising in an accelerating fashion, or turns around and starts decreasing. We can learn much more complex relationships between input and output, but we lose some of the computational efficiency of the MLP approach (huge matrix operations are exactly what GPUs are built for, while evaluating an arbitrary learned function on every connection doesn't map onto them as neatly).
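
And a comparable sketch of one KAN layer, again my own illustration: the paper parameterizes each edge with B-splines, but I'm substituting simple Gaussian bumps as the fixed basis to keep it short, so treat the details as an assumption.

    import numpy as np

    def kan_layer(x, coeffs, centers, width=0.5):
        # x:       (n_in,)             inputs
        # coeffs:  (n_out, n_in, n_k)  learnable coefficients, one set per edge
        # centers: (n_k,)              fixed centers of the basis bumps
        # Each edge i -> j applies its own learned 1-D function:
        #   phi_ij(x_i) = sum_k coeffs[j, i, k] * bump_k(x_i)
        basis = np.exp(-((x[:, None] - centers[None, :]) / width) ** 2)  # (n_in, n_k)
        edge_outputs = np.einsum('jik,ik->ji', coeffs, basis)            # (n_out, n_in)
        # The node just sums its incoming edges; there is no separate activation function.
        return edge_outputs.sum(axis=1)                                  # (n_out,)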

So with a KAN we end up with fewer but more complex "neurons", made up of learned functions. And if I understand what they're getting at here, the appeal is that you can inspect one of those neurons and get a clear formula that describes what it is doing, because all the complexity is distilled into a formula at that point. With an MLP, you have to track what is happening through multiple layers of weights and do more work to figure out how it all fits together.

Again, I'm not in the space, but I imagine the functions that come out of a KAN still aren't super-intuitive formulas that look like something out of Isaac Newton's notebooks; they're probably full of bizarre constants and unintuitive factors that cancel each other out.


I'm not sure if this counts as ELI5, but here's a gross simplification:

a perceptron layer is

    output = simple_function(sum(many_inputs * many_weights) + extra_weight_for_bias)

a KAN layer is

    output = sum(fancy_functions(many_inputs))

but I could be wrong, it's been a day.
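
If it helps, here's the same thing as rough NumPy, with the "fancy functions" left abstract (my own transliteration, so take it with the same grain of salt):

    import numpy as np

    def perceptron_layer(many_inputs, many_weights, extra_weight_for_bias, simple_function=np.tanh):
        # weighted sum of inputs, plus a bias, pushed through a fixed activation
        return simple_function(np.sum(many_inputs * many_weights) + extra_weight_for_bias)

    def kan_neuron(many_inputs, fancy_functions):
        # one learned 1-D function per input, summed -- no fixed activation on top
        return sum(f(x) for f, x in zip(fancy_functions, many_inputs))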


The output of an MLP is a black-box function f(x, y).

The output of a KAN is a nice formula like exp(0.3sin(x) + 4cos(y)). This is what is meant by interpretable.


A KAN is, in a way, like a network of networks, with each edge representing its own little network of sorts. I could be very wrong, since I'm still digesting the article myself, but that's my superficial take.





