It's only equivalent for a very narrow sense of 'prediction', namely modelling conditional probability distributions over known data.
There's no sense, for example, in which deriving a prediction about the nature of reality from a novel scientific theory is 'compression'.
E.g., suppose we didn't know a planet existed and we looked at orbital data: there's no sense in which compressing that data would indicate that another planet existed.
It's a great source of confusion that people think AI/ML systems are 'predicting' novel distributions of observations (science), when they are really predicting novel observations of the same distribution (statistics).
It should be more obvious that the latter is just compression, since it just takes a known distribution of data and replaces it with a derived optimal value.
Science predicts novel distributions based on theories, i.e., it says the world is other than we previously supposed.
It doesn’t matter how your predictor works, whether it’s a scientific theory, a statistical model, or a magic oracle. You can always perfectly convert its predictions into compression using entropy coding. The conversion process is black-box.
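A minimal sketch of that conversion (toy Python; the coin-flip data, the two predictors, and the helper name are all illustrative): an ideal entropy coder spends -log2 p(symbol) bits per symbol, so any black-box predictor yields a code length, and the better predictor compresses more.

```python
import math

def codelength_bits(sequence, predict):
    """Ideal code length in Shannon bits when each symbol is entropy-coded
    with the probability the predictor assigns it; a real arithmetic coder
    achieves this length to within about 2 bits."""
    total = 0.0
    for i, sym in enumerate(sequence):
        p = predict(sequence[:i], sym)  # predictor is a black box to the coder
        total += -math.log2(p)
    return total

# Toy data: a biased coin stream.
data = "HHTHHHHTHH"

# A know-nothing predictor: 1 bit per symbol.
uniform = lambda hist, sym: 0.5

# A Laplace-smoothed frequency predictor: learns the bias as it goes.
def laplace(hist, sym):
    return (hist.count(sym) + 1) / (len(hist) + 2)

print(codelength_bits(data, uniform))  # 10.0 bits
print(codelength_bits(data, laplace))  # fewer bits: better prediction = shorter code
```

The coder never inspects how the predictor works, which is the point: theory, statistical model, or oracle, the conversion to compression is the same.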
Sure it is! If we were trying to compress an archive of orbital data, one way to do it would be "initial positions + periodic error correction". If you have the new planet, your errors will be smaller and can be represented in less space at the same precision.
A statistical model of orbits, without a theory of gravity, is less compressed when you assume more objects. Take all the apparent positions of objects in the sky, {(object, x1, x2, t), ...}. Fit a statistical model of each point at t+1, so y = (o, x1, x2, t+1). There is no sense in which you're deriving a new object in the sky from this statistical model -- it is only a compression of observable orbits.
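A tiny illustration of that limit (hypothetical observation log; the object names are made up): an associative model fitted to the observed tuples puts probability mass only on objects already in the data, so nothing in it can point to a 9th body.

```python
from collections import Counter

# Hypothetical observation log: (object_id, x1, x2, t) tuples for
# bodies we have actually seen.
observations = [("mars", 0.10, 0.20, 0), ("jupiter", 0.40, 0.10, 0),
                ("mars", 0.12, 0.21, 1), ("jupiter", 0.41, 0.09, 1)]

# A purely associative model: the empirical distribution over object ids.
counts = Counter(o for o, *_ in observations)
total = sum(counts.values())
p = {obj: n / total for obj, n in counts.items()}

print(p["mars"])            # 0.5: a known object gets mass
print(p.get("planet9", 0))  # 0: the model cannot 'derive' an unseen object
```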
When you say "if you have the new planet", you're changing the data-generating process (the theory) to produce a new distribution of points {(o', x1', x2', t'), ...} that includes an unseen object. You're then comparing two data-generating models (two theories) for their simplicity. You're not comparing the associative models.
Call the prior theory 8-planets, so 8P generates (x1, x2, t); and call the new theory 9-planets, so 9P generates (x1', x2', t').
You're then forming a conditional error distribution when comparing the two rival theories. The 9P theory will minimize this error.
But in no sense can the 9P theory be derived from the initial associative statistical distribution. You are, based on theory (science, knowledge, etc.), choosing to add a planet, vs., e.g., correcting for measurement error, modifying Newton's laws, changing the angle of the Earth w.r.t. the solar system... or one of an infinite number of theories which all produce the same error minimization.
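A toy illustration of that underdetermination (hypothetical 1-D data, deliberately contrived so the rival corrections coincide exactly): a 'new planet' term and an 'instrument bias' term can fit the residuals equally well, so error minimization alone cannot choose between the theories.

```python
import math

t = [i * 0.1 for i in range(50)]
observed = [math.sin(x) + 0.01 * math.sin(0.3 * x) for x in t]

# Two rival 'theories', both eliminating the 8P error here:
# (a) add a 9th planet whose pull contributes the extra term;
pred_9P   = [math.sin(x) + 0.01 * math.sin(0.3 * x) for x in t]
# (b) keep 8 planets but posit a slow periodic instrument bias.
pred_bias = [math.sin(x) + 0.01 * math.sin(0.3 * x) for x in t]

err = lambda pred: sum((o - p) ** 2 for o, p in zip(observed, pred))
print(err(pred_9P) == err(pred_bias))  # True: compression alone cannot choose
```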
The sense of "prediction" that science uses (via Popper et al.) is deriving the existence of novel phenomena that do not follow from prior observable distributions.