It's only equivalent for a very narrow sense of 'prediction', namely modelling conditional probability distributions over known data.
There's no sense, for example, in which deriving a prediction about the nature of reality from a novel scientific theory is 'compression'.
E.g., suppose we didn't know a planet existed and we looked at orbital data: there's no sense in which compressing that data would indicate that another planet existed.
It's a great source of confusion that people think AI/ML systems are 'predicting' novel distributions of observations (science), when they are really predicting novel observations of the same distribution (statistics).
It should be more obvious that the latter is just compression, since it just takes a known distribution of data and replaces it with a derived optimal value.
Science predicts novel distributions based on theories, i.e., it says the world is other than we previously supposed.
It doesn’t matter how your predictor works, whether it’s a scientific theory, a statistical model, or a magic oracle. You can always perfectly convert its predictions into compression using entropy coding. The conversion process is black-box.
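A minimal sketch of that conversion (toy Python; the coin-flip data, the two predictors, and the helper name are all illustrative): an ideal entropy coder spends -log2 p(symbol) bits per symbol, so any black-box predictor yields a code length, and the better predictor compresses more.

```python
import math

def codelength_bits(sequence, predict):
    """Ideal code length in Shannon bits when each symbol is entropy-coded
    with the probability the predictor assigns it; a real arithmetic coder
    achieves this length to within about 2 bits."""
    total = 0.0
    for i, sym in enumerate(sequence):
        p = predict(sequence[:i], sym)  # predictor is a black box to the coder
        total += -math.log2(p)
    return total

# Toy data: a biased coin stream.
data = "HHTHHHHTHH"

# A know-nothing predictor: 1 bit per symbol.
uniform = lambda hist, sym: 0.5

# A Laplace-smoothed frequency predictor: learns the bias as it goes.
def laplace(hist, sym):
    return (hist.count(sym) + 1) / (len(hist) + 2)

print(codelength_bits(data, uniform))  # 10.0 bits
print(codelength_bits(data, laplace))  # fewer bits: better prediction = shorter code
```

The coder never inspects how the predictor works, which is the point: theory, statistical model, or oracle, the conversion to compression is the same.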
Sure it is! If we were trying to compress an archive of orbital data, one way to do it would be "initial positions + periodic error correction". If you have the new planet, your errors will be smaller and can be represented in less space at the same precision.
A statistical model of orbits, without a theory of gravity, is less compressed when you assume more objects. Take all the apparent positions of objects in the sky, {(object, x1, x2, t), ...}. Fit a statistical model of each point at t+1, so y = (o, x1, x2, t+1). There is no sense in which you're deriving a new object in the sky from this statistical model -- it is only a compression of observable orbits.
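A tiny illustration of that limit (hypothetical observation log; the object names are made up): an associative model fitted to the observed tuples puts probability mass only on objects already in the data, so nothing in it can point to a 9th body.

```python
from collections import Counter

# Hypothetical observation log: (object_id, x1, x2, t) tuples for
# bodies we have actually seen.
observations = [("mars", 0.10, 0.20, 0), ("jupiter", 0.40, 0.10, 0),
                ("mars", 0.12, 0.21, 1), ("jupiter", 0.41, 0.09, 1)]

# A purely associative model: the empirical distribution over object ids.
counts = Counter(o for o, *_ in observations)
total = sum(counts.values())
p = {obj: n / total for obj, n in counts.items()}

print(p["mars"])            # 0.5: a known object gets mass
print(p.get("planet9", 0))  # 0: the model cannot 'derive' an unseen object
```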
When you say "if you have the new planet", you're changing the data-generating process (the theory) to produce a new distribution of points {(o', x1', x2', t'), ...} that includes an unseen object. You're then comparing two data-generating models (two theories) for their simplicity. You're not comparing the associative models.
Call the prior theory 8-planets, so 8P generates (x1, x2, t); and call the new theory 9-planets, so 9P generates (x1', x2', t').
You're then forming a conditional error distribution when comparing the two rival theories. The 9P theory will minimize this error.
But in no sense can the 9P theory be derived from the initial associative statistical distribution. You are, based on theory (science, knowledge, etc.), choosing to add a planet, vs., e.g., correcting for measurement error, modifying Newton's laws, changing the angle of the Earth w.r.t. the solar system... or one of an infinite number of theories which all produce the same error minimization.
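A toy illustration of that underdetermination (hypothetical 1-D data, deliberately contrived so the rival corrections coincide exactly): a 'new planet' term and an 'instrument bias' term can fit the residuals equally well, so error minimization alone cannot choose between the theories.

```python
import math

t = [i * 0.1 for i in range(50)]
observed = [math.sin(x) + 0.01 * math.sin(0.3 * x) for x in t]

# Two rival 'theories', both eliminating the 8P error here:
# (a) add a 9th planet whose pull contributes the extra term;
pred_9P   = [math.sin(x) + 0.01 * math.sin(0.3 * x) for x in t]
# (b) keep 8 planets but posit a slow periodic instrument bias.
pred_bias = [math.sin(x) + 0.01 * math.sin(0.3 * x) for x in t]

err = lambda pred: sum((o - p) ** 2 for o, p in zip(observed, pred))
print(err(pred_9P) == err(pred_bias))  # True: compression alone cannot choose
```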
The sense of "prediction" that science uses (via Popper et al.) is deriving the existence of novel phenomena that do not follow from prior observable distributions.