
Integral Neural Networks (CVPR 2023 Award Candidate), a nifty way of building resizable networks.

My understanding of this work: a forward pass for a (fully-connected) layer of a neural network is just a dot product of the layer input with the weights of each output unit, followed by some activation function. Both the input and each unit's weights are vectors of the same, fixed size.
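In code, that's roughly (a minimal numpy sketch of a single output unit, not the paper's implementation):

    import numpy as np

    x = np.random.randn(64)               # layer input
    w = np.random.randn(64)               # weights of one output unit
    y = np.maximum(0.0, np.dot(x, w))     # dot product followed by a ReLU activation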

Let's imagine that the discrete values that form these vectors happen to be samples of two different continuous univariate functions. Then we can view the dot product as an approximation to the integral of the product of the two continuous functions.
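As a toy sanity check of that view (my own example, not from the paper): if w_i = w(t_i) and x_i = x(t_i) on a uniform grid over [0, 1], then the scaled dot product (1/n) * sum_i w_i * x_i is just a Riemann-sum approximation of the integral of w(t) * x(t):

    import numpy as np

    w = lambda t: np.sin(2 * np.pi * t)        # continuous "weight" function
    x = lambda t: np.exp(-t)                   # continuous "input" function

    n = 1000
    t = (np.arange(n) + 0.5) / n               # midpoint grid on [0, 1]
    approx = np.dot(w(t), x(t)) / n            # scaled dot product
    print(approx)                              # close to the integral of w(t) * x(t) over [0, 1]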

Now, instead of storing the weights of our network, we store some values from which we can reconstruct a continuous function and then sample it wherever we want (in this case some trainable interpolation nodes, which are convolved with a cubic kernel). This gives us the option to sample different-sized networks, but they are all performing (an approximation to) the same operation. After training with samples at different resolutions, you can freely pick your network size at inference time.
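A minimal sketch of that idea (my own illustration, using scipy's CubicSpline as a stand-in for the paper's cubic convolution kernel): store a handful of node values, reconstruct a continuous weight function, and sample it at whatever resolution you need:

    import numpy as np
    from scipy.interpolate import CubicSpline

    # a few stored "interpolation nodes" (these would be the trainable parameters)
    nodes_t = np.linspace(0.0, 1.0, 8)
    nodes_w = np.random.randn(8)

    w_continuous = CubicSpline(nodes_t, nodes_w)

    # weight vectors of different sizes, sampled from the same underlying function
    w_small = w_continuous(np.linspace(0.0, 1.0, 16))
    w_large = w_continuous(np.linspace(0.0, 1.0, 256))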

You can also take pretrained networks, reorder the weights to make the functions as smooth as possible, and then compress the network by downsampling. In their experiments, the networks lose much less accuracy when downsampled compared to common pruning approaches.
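The permute-then-downsample step could look roughly like this (a hedged sketch, not the paper's actual algorithm: greedily reorder the rows of a pretrained weight matrix so that neighbouring rows are similar, then resample along the row dimension):

    import numpy as np
    from scipy.interpolate import CubicSpline

    W = np.random.randn(128, 64)        # pretrained weight matrix (rows = output units)

    # greedy "smoothing" permutation: repeatedly append the closest remaining row
    order = [0]
    remaining = set(range(1, W.shape[0]))
    while remaining:
        last = W[order[-1]]
        nearest = min(remaining, key=lambda i: np.linalg.norm(W[i] - last))
        order.append(nearest)
        remaining.remove(nearest)
    W_smooth = W[order]

    # downsample along the (now smoother) row dimension: 128 -> 64 output units
    t_old = np.linspace(0.0, 1.0, W_smooth.shape[0])
    t_new = np.linspace(0.0, 1.0, 64)
    W_compressed = CubicSpline(t_old, W_smooth, axis=0)(t_new)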

Paper: https://openaccess.thecvf.com/content/CVPR2023/papers/Solods...

Code: https://github.com/TheStageAI/TorchIntegral




Going just by your description, this sounds like they are doing operator learning. It's actually a very old idea; the proof that started operator learning is from 1988, I believe. Mathematicians have been playing around with the idea since 2016 at least.


Indeed, this seems closely related, thanks for the pointer!

Unfortunately I'm not deep enough into the topic to understand what their contribution to the theory side of it is (they have some supplementary material in [INN Supp]). In the discussion of the Integral Neural Networks (INN) paper, there's this paragraph about an operator learning publication:

"In [24] the authors proposed deep neural networks with layers defined as functional operators. Such networks are designed for learning PDE solution operators, and its layers are continuously parameterized by MLPs only along the kernel dimensions. A re-discretization was investigated in terms of training on smaller data resolution and testing on higher input resolution. However, the proposed framework in [24] does not include continuous connections between filters and channels dimensions."

Also, the weight permutation used to perform the resampling on pretrained networks in INNs seems to be novel? And I guess it doesn't hurt that they're bringing new eyeballs to the topic by providing examples of common networks and a PyTorch implementation.

[INN Supp]: https://openaccess.thecvf.com/content/CVPR2023/supplemental/...

[24]: Zongyi Li, Nikola Kovachki, et al. Neural operator: Graph kernel network for partial differential equations. arXiv preprint arXiv:2003.03485, 2020. https://arxiv.org/abs/2003.03485


Damn. It's like jpeg for neural networks.


Great understanding of the work! I'll add a few more details about INNs:

* In fact, the INN concept opens up the possibility of using differential analysis on DNN parameters. The idea of sampling and integration can be combined with the Nyquist theorem (https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampli...). Analysing the FFT of the weights gives a measure of a layer's capacity: two DNNs of different sizes can be equivalent after conversion to INN, because the maximum frequency is the same for both networks. (A rough sketch of this is given after the list.)

* Tuning the integration grid is actually a first step towards fast knowledge extraction. We have tested INNs on a discrete EDSR (super-resolution) model and pruned it without INN training in about 1 minute. One can imagine a user fine-tuning GPT-4 for a custom task just by tuning the integration grid, simultaneously reducing the number of model parameters while keeping only the important slices along filters/rows/heads, etc. Because of the smooth parameter sharing, the new filters/rows/heads include the "knowledge" of their neighbours.

* Another interesting application is to use integral layers for fast frame interpolation, since a conv2d in an INN can produce any number of output channels, i.e. frames.
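For the first point, here's a rough illustration of what analysing the FFT of the weights could mean (my own sketch, not TheStage's code): treat a weight row as a sampled signal and find the frequency bin below which almost all of its spectral energy lives; by Nyquist, roughly twice that many samples are enough to represent the row, which gives a crude capacity measure for the layer.

    import numpy as np

    w = np.random.randn(256)                   # one "weight signal", e.g. a row of a layer
    energy = np.abs(np.fft.rfft(w)) ** 2

    cumulative = np.cumsum(energy) / energy.sum()
    band_limit = int(np.searchsorted(cumulative, 0.99))   # bin holding 99% of the energy
    # for smooth (INN-converted) weights the energy concentrates at low frequencies,
    # so ~2 * band_limit samples suffice; for this random example it stays near the top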

Stay tuned, and check our Medium for INN progress and applications. A new Medium article is already available: https://medium.com/@TheStage_ai/unlocking-2x-acceleration-fo...


Nice. I was wondering a few days ago whether something like this was possible. The next step would be somehow extending the discrete->continuous concept to the layers themselves.


Ahh, I guess that's been done, too: https://proceedings.neurips.cc/paper_files/paper/2018/file/6...

Now we just need an iterative solver over both the structure and the "weights", and we get both architecture search and training at the same time.


After finally learning some complex integrals/residue theory and seeing the connection between continuous and discrete signal processing, I was very happy that the "magic trick" disappeared. Your comment has me interested in pulling the string further. Thanks!


Supercool.


smart



