Learning and compression are closely related. When compressing, you separate systematic rules from unsystematic parameters in the original data. When you learn, you do the same, but the unsystematic parameters become noise. If you throw the noise away, you get a lossy compressor.
Depending on how you define “learning” (there are different basic frameworks, like PAC learning), there are a number of analyses showing that learning and compression are equivalent.
Even quite broadly, Bayesian methods can be interpreted as a rate-distortion problem from information theory, which is a framework for lossy compression.
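As a toy illustration of the “keep the rule, drop the noise” view (a minimal sketch of my own, not from the paper; the linear model and numbers are assumptions for illustration):

```python
import numpy as np

# Toy data: a systematic linear trend plus unsystematic noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 1000)
y = 3.0 * x + 0.5 + 0.1 * rng.normal(size=x.size)

# "Learning" = extracting the systematic rule (two parameters).
slope, intercept = np.polyfit(x, y, deg=1)

# Lossy "compression": store only the rule, throw the residual noise away.
y_hat = slope * x + intercept           # reconstruction from just 2 numbers
distortion = np.mean((y - y_hat) ** 2)  # what the lossy code gives up

print(f"stored params: slope={slope:.3f}, intercept={intercept:.3f}")
print(f"reconstruction MSE (the 'distortion'): {distortion:.4f}")
```

Storing two parameters instead of 1000 samples is the “rate” side; the reconstruction error is the “distortion” side, which is the trade-off the rate-distortion framing above makes explicit.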
There are countless papers showing amazing results on MNIST and CIFAR-10. No non-experts should be reading any paper like this.
Just like we cure cancer in mice all the time, we solve these datasets all the time. And it means nothing. Wait to see if the method is actually useful.
I’m surprised this is downvoted. I liked this paper (I’m mildly biased since I’ve been doing generative modeling work and am very excited about flows) but was disappointed to see MNIST and CIFAR-10.
"No non experts should be reading any paper like this." I agree non-experts have no need to read this paper, but not because of what datasets it used. If you can read and understand a technical paper at this level, I’d say you are an expert by definition.
It may mean nothing "within" MNIST and CIFAR-10 - but if you want to show how your new technology works, why not use the usual sets? Is it not like "Hello World" - not innovation, just demonstration?
I suppose you mean, "Yes, you can do 'hello world', but now show us something that makes your product worthwhile: a convincing use case, a record worth noting."
20 years ago that was true. MNIST and CIFAR-10 were a vehicle to show that your method had promise.
Today, any bad idea you throw at a wall will work on MNIST and CIFAR-10. Even buggy broken code will generally work. We've advanced to the point where these datasets are totally meaningless.
In 1910 if you could show that your airplane idea could fly 100ft, you had something promising. In 2010 we're so used to building airplanes and understand the physics involved to the point where this isn't a meaningful test of the promise of your new aircraft concept.
It's worse than "in mice" in biology/cancer papers. We cure everything in mice. Basically nothing ever transfers to humans. Same with MNIST/CIFAR-10. Everything works, basically nothing matters.
Or another way to put it. A "Hello World"-level compiler would be interesting to report on in 1960. That same compiler would be trivial today and anyone could build it in minutes/hours.
I disagree with this. Binarized MNIST samples of any reasonable quality are (still) tricky to get right without a hierarchical system (read: VQ-VAE tokens or some such encoder space). Same with really solid CIFAR-10. "Scaling down" is a different problem than scaling up; not everything transfers, but saying "everything works on MNIST / CIFAR-10" in generative modeling is a bit glib.
I would much prefer to see early work with solid small-scale results on arXiv than have people hold concepts for another 6 months while scaling up. Let that be for a v2; if you cannot put early but concrete results on arXiv, where else is there?
Recall that a lot of nice papers were mostly MNIST / CIFAR-10 level results at first, followed by scale (thinking of VQ-VAE, PixelCNN/RNN, PerceiverAR, and many others that worked well at scale later). That doesn't mean every result will scale up, but we have a lot of tricks to scale "small-scale" models using pretrained latent spaces and so on. The first diffusion results were also pretty small scale... a different time, but I don't think things are so different today.
That said, I can agree that you need to be a bit in the weeds on the research side to be diving deep on this - but I expect lots of follow-up clarifications or blog posts on this type of work.
What has changed since Adam? Learning rate scales/schedules? I cannot think of many massive changes since ~2014; most of the setups from that era (grad clip + medium-ish LR, some ramp-up or roll-off at the end) still work fine for me today.
(Note: There are many, many great optimization papers since 2014 - I just don't see them show up in general recipes in open source too often)
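For concreteness, here is the kind of ~2014-era recipe being described, as a hedged PyTorch sketch; the model, data, and hyperparameters are placeholders of my own, not from any particular paper:

```python
import torch
import torch.nn as nn

# Placeholder model and synthetic data, just to make the recipe runnable.
model = nn.Linear(128, 10)
data = [(torch.randn(32, 128), torch.randint(0, 10, (32,))) for _ in range(100)]

# Adam with a medium-ish LR, a short linear warmup, a cosine roll-off at the
# end, and gradient clipping each step.
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
warmup = torch.optim.lr_scheduler.LinearLR(opt, start_factor=0.1, total_iters=10)
decay = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=90)
sched = torch.optim.lr_scheduler.SequentialLR(opt, [warmup, decay], milestones=[10])

loss_fn = nn.CrossEntropyLoss()
for xb, yb in data:
    opt.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # grad clip
    opt.step()
    sched.step()
```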
You can find more Twitter threads on the paper.
And here is some more discussion on Reddit: https://www.reddit.com/r/MachineLearning/comments/15rrljw/ba...
It's the first paper from Alex Graves in quite a while. Also, he is with NNAISENSE now.