They have encountered an interesting algorithmic problem here, very cool.
To determine whether the universe as a whole has a preferred handedness, they had to repeat the analysis for all tetrahedra constructed from their database of 1 million galaxies. There are nearly 1 trillion trillion such tetrahedra — an intractable list to handle one at a time. But a factoring trick developed in earlier work on a different problem allowed the researchers to look at the parity of tetrahedra more holistically: Rather than assembling one tetrahedron at a time and determining its parity, they could take each galaxy in turn and group all other galaxies according to their distances from that galaxy, creating layers like the layers of an onion.
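If I understand the trick, the grouping step amounts to something like the sketch below (my own toy code, not the authors' - their real estimator also expands each distance shell in spherical harmonics and combines the coefficients, which is skipped here entirely):

    import numpy as np

    def shell_counts(positions, edges):
        # For each galaxy, count how many other galaxies fall in each radial
        # shell ("onion layer") around it.
        #   positions : (N, 3) array of galaxy coordinates
        #   edges     : (B+1,) array of shell boundary radii
        # returns an (N, B) array of per-shell counts
        n = len(positions)
        counts = np.zeros((n, len(edges) - 1), dtype=np.int64)
        for i, p in enumerate(positions):
            d = np.linalg.norm(positions - p, axis=1)
            d = np.delete(d, i)                       # drop the galaxy itself
            counts[i], _ = np.histogram(d, bins=edges)
        return counts

    # toy usage with made-up positions
    rng = np.random.default_rng(0)
    pos = rng.uniform(0, 100, size=(1000, 3))
    print(shell_counts(pos, np.linspace(0, 50, 6))[0])

The payoff is that the work is organized per galaxy (and per pair) instead of per four-tuple, which is what makes a count over roughly a trillion trillion tetrahedra feasible.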
Given that there are an estimated 100 billion to 1 trillion galaxies in the observable universe, isn’t 1 million a frighteningly small sample to draw conclusions from? How can we be sure that what we’ve observed is representative enough to extrapolate conclusions from? Moreover, how do we know whether it extrapolates to the universe homogeneously? What if the “opposite” asymmetry is present in the non-observable universe, such that on average the universe doesn’t actually have an asymmetry globally?
A million galaxies in a survey of 1/20th of the sky (I suspect, haven't run the numbers) manages to get most of the galaxies in our local Pisces–Cetus Supercluster Complex [1] that fall within the field of view, especially if the survey is pointed towards a less dense part of the PCSC. Since the supercluster complex is a galaxy filament [2], galaxies in the next nearest supercluster complex will be much farther away - just like the nearest star in the Andromeda galaxy is going to be orders of magnitude further away than the furthest star in the Milky Way (see the artistic rendering on the Wiki page in [2]). Throw in gravitational lensing and the data becomes much noisier the further out you go [3].
Could you elaborate on how the information you provided answers GP's questions? I'm not really following you but I also don't know a lot about this topic.
> ... a frighteningly small sample to draw conclusions from?
Just make the usual assumption: The data consists of random variables that are independent and have the same distribution. Or "independent and identically distributed", i.e., the i.i.d. case.
Then one can apply the theorem called the law of large numbers, in either its weak or its strong version.
Then in applying the law of large numbers, the number of samples is crucial, but, possibly surprisingly, the size of the whole population does not get involved, is not used in the law of large numbers, and is irrelevant. That is, to be more explicit, with this approach we don't care how many galaxies there are in total: 10^12, 10^100, 10^1000, whatever.
Right: with different approaches, one might care a lot about (a) the total number of galaxies and (b) the number of galaxies used in the analysis, i.e., whether in (b) we had enough samples to characterize (a), to get a good approximation of the results we would get if we analyzed all of (a).
For more, one might claim that (A) the distribution we use in the law of large numbers is just that of ALL the galaxies, a large but not infinite number (commonly a distribution is for infinitely many), and (B) we drew the sample to be analyzed by rolling dice or some other mechanism that achieved independence, so that, net, we have i.i.d. as wanted for the law of large numbers. So here the distribution is not of some imaginary collection of infinitely many galaxies but just of the large but finitely many galaxies there are in THIS universe, or THIS observable universe, or ... this universe once we get tired of collecting data, start to run out of grant money, want to get home to the kids, want to watch the soccer matches ...!
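To make that concrete, a quick numerical toy (mine, nothing from the paper): hold the sample size fixed, grow the population, and the accuracy of the sample mean barely moves.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 1_000                                 # fixed sample size

    for pop_size in (10_000, 1_000_000, 10_000_000):
        population = rng.normal(0.0, 1.0, size=pop_size)   # a finite "population"
        # draw repeated samples of size n without replacement and see how far
        # the sample mean typically lands from the population mean
        errs = [abs(rng.choice(population, n, replace=False).mean() - population.mean())
                for _ in range(100)]
        print(f"N = {pop_size:>10,}  sample fraction = {n / pop_size:.4%}  "
              f"typical error = {np.mean(errs):.4f}")

(Sampling without replacement technically carries a finite population correction, but with a sample that is a tiny fraction of the population it is negligible.)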
For one thing, these galaxies were observed and cataloged - something that already puts them on unequal footing with most of the universe's galaxies. Furthermore, of the catalogued galaxies, how were these particular galaxies selected for inclusion in the study? Humans are very bad at choosing things randomly - even when assisted by computers.
Take some independent random variables. Put them on a list, in a catalog, in a box, etc., and they are still independent. If each random variable is about a galaxy, they are all independent, and we now learn that they are all visible only from the southern hemisphere of the Earth, then they are still independent.
For such things one will want to learn about the careful definition of independence and, for more, about conditional independence and the Radon-Nikodym theorem. There is a nice proof of that theorem by von Neumann in, e.g., W. Rudin's Real and Complex Analysis.
> But these are _not_ independent datum
You have not shown that.
> ... unequal footing
Your "unequal footing" is not part of probability theory.
Pick a galaxy X. Note what you know about it.
Now someone tells you it is in the catalog and also lets you see the other galaxies in the catalog.
Now what more do you know about X than you did before????
Can read more about independence in Jacques Neveu, Mathematical Foundations of the Calculus of Probability.
I think what I implied (and what others are trying to get across) is: how do we know the galaxies are independent variables, as you claim?
After all, there are galactic forces at work here (gravity, initial conditions, etc.). How do we know that everything that’s observable to us isn’t biased? I.e., are the galaxies that are asymmetric more easily observed for some secondary reason (closer to us, brighter to us, etc.)? Even if the galaxies in our observable universe are independent variables, how do we know that our observable universe is an independent variable among all observable universes? E.g., what if the structure of the Big Bang sent different asymmetries in every direction, such that each observable universe will always show some asymmetry from the perspective of any single observer, but overall there’s no asymmetry?
Or heck. The ages of the galaxies are presumably hard to account for individually in this kind of combinatorial analysis. How do we know that the asymmetry isn’t because the galaxies are all in different stages of development / the light from them took variable amounts of time to reach us?
> I think what I implied (and what others are trying to get across) is: how do we know the galaxies are independent variables, as you claim?
Simple. I never claimed that the random variables were independent. Instead I just observed that if one made an ASSUMPTION of independence (and the rest of i.i.d.), then one could use the theorem called the law of large numbers and get a result without being concerned with what fraction the sample size was of the whole population.
I didn't even try to argue that there was independence or even approximate independence.
The independence assumption is really common in applications of probability and statistics. Maybe then people comfort themselves by believing that independence holds approximately and close enough. As I recall, there has been at least one attempt to investigate approximate independence, but that work is likely not well known.
Questioning independence in this discussion of cosmology is appropriate.
Again, in practice, an independence assumption is common, and with all of "i.i.d." one could apply the strong law of large numbers and get results about a large population from a comparatively small sample, thus addressing the stated concern I was responding to:
> ... a frighteningly small sample to draw conclusions from?
> The data consists of random variables that are independent
But that's still an assumption, which means that if all your samples come from the local cluster, you cannot extrapolate from your measurements to the entire universe -- because you haven't ruled out that the asymmetry is influenced/controlled by the layout of the local cluster.
> Just make the usual assumption: The data consists of random variables that are independent and have the same distribution. Or "independent and identically distributed", i.e., the i.i.d. case.
So you used the word assumption. I used the word assumption. Gee ....
Quite broadly in practical statistics and applied probability, "usual assumption" is correct in the sense that the assumption is usual, that is, usually made - not that the assumption itself is correct.
E.g., the strong law of large numbers is commonly applied, and it has an independence assumption. Yup, the weak law of large numbers needs only an uncorrelatedness assumption, which is weaker and is implied by independence.
Where did I claim that the independence assumption is always correct?
What are you arguing about?????
I used to be a professor. I didn't like it, thought it was not very productive for anyone and financially irresponsible for me.
But I studied probability from some of the best profs and references and wrote a dissertation on stochastic optimal control. If I were to get serious here, I'd have to teach a course, starting with sigma algebras, careful definitions of random variables, various cases of convergence of random variables, ..., conditioning, conditional independence, the Radon-Nikodym theorem, the Markov assumption, martingales, etc. Instead, there are books by Loeve, Breiman, Neveu, ....
I am doing a startup and don't want to be a professor.
There is no way on this planet or in this solar system or this universe I could have been more clear: I responded to
> ... a frighteningly small sample to draw conclusions from?
That's all I responded to.
No way did I try to answer all possible probability and statistical questions in astronomy and astrophysics back to the Big Bang, hope for a Nobel Prize, apply for a chaired professorship, .... Again, I just responded to
> ... a frighteningly small sample to draw conclusions from?
In really simple terms, with common assumptions, the accuracy of a sample mean is JUST from the size of the sample and has nothing to do with what fraction the sample is of some population the sample was drawn from.
This point seems to be worth making. E.g., once while working myself and my wife through our Ph.D. degrees, I was asked to estimate the survivability of the US SSBN fleet under a special scenario. From some WWII-era Koopman work on encounter rates, and more, I found a continuous-time, discrete-state-space Markov process and generated sample paths via Monte Carlo techniques. I actually used the random number generator that passed the Fourier test and was published by a group from Oak Ridge and based on the recurrence
X(n+1) = X(n)*5^15 + 1 mod 2^47
So I wrote some code, generated sample paths, and found, event by event, essentially hour by hour, the expected number of SSBNs surviving. Right, the scenario included lots of weapon types, so there were lots of combinations of what was surviving.
It happened, nearly uniquely, that my work right away got a little review from a well known probabilist. He asked:
"How can your scenario generation fathom the enormous state space?"
that is, the number of combinations.
This question is related to
> ... a frighteningly small sample to draw conclusions from?
Soooo, even a well qualified probabilist can ask such a question.
I answered: Pick a point in time, t. There the number of SSBNs surviving is a random variable. It is bounded. The sample paths are independent. So, each sample path gives an independent random variable for time t. So we have an i.i.d. case. So the strong law of large numbers applies. So, run off 500 sample paths, add them up, divide by 500, and get a good estimate of the number of SSBNs surviving within a gnat's ass nearly all the time. The probabilist was socially shocked and offended by my mention of a "gnat's ass".
I went on, intuitively, Monte Carlo "puts the effort where the action is". The probabilist's response was "That's a good way to look at it."
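For what it's worth, a toy reconstruction of that kind of setup (my own sketch, definitely not the original code - the survival model is a made-up stand-in; only the generator recurrence is the one quoted above):

    MULT, MOD = 5**15, 2**47                  # the recurrence quoted above

    def lcg(seed=1):
        # uniform(0, 1) stream from x <- (x*5^15 + 1) mod 2^47
        x = seed
        while True:
            x = (x * MULT + 1) % MOD
            yield x / MOD

    def sample_path(u, n_units=10, steps=24, p_loss=0.02):
        # made-up stand-in for a survival scenario: each surviving unit
        # independently gets through each step with probability 1 - p_loss
        alive = n_units
        for _ in range(steps):
            alive -= sum(next(u) < p_loss for _ in range(alive))
        return alive

    u = lcg(seed=12345)
    paths = 500
    survivors = [sample_path(u) for _ in range(paths)]
    print("estimated expected survivors:", sum(survivors) / paths)

Average 500 such paths at a fixed point in time and, per the strong law of large numbers, the estimate settles down no matter how enormous the underlying state space is.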
In a sense, the most common sampling situation is a finite sample from an infinite population so that the sample is 0% of the whole population -- and we are back to
> ... a frighteningly small sample to draw conclusions from?
0% "small". So, actually 0% small is common.
So, it is common for people to consider what fraction the sample is of the whole population being sampled. This being the case, in my post here I continued and outlined some ways to regard the population as finite. That can take us into sampling from finite populations, which can be conceptually tricky -- I want to avoid that stuff, and for something the size of the universe, maybe not for a deck of 52 cards, that is reasonable.
For a simple observation, put 50 independent random variables in a box and they are still independent.
There are a lot of theorems on sampling so though I’m no longer current in that area I presume the authors knew what they were doing. Especially as they had the maths chops to apply that factoring algorithm.
But you can also consider a qualitative analysis. A standard assumption is that the Big Bang should be largely isotropic (and if not, why not - which is the main point of the article). So where you look shouldn’t matter. Even if orientation were uniform on average (impossible to tell), non-uniform regions tell an interesting story.
Now in reality we know the universe isn’t smoothly uniform: parts of it have curdled into galaxies, star systems, etc. Also there’s the puzzling imbalance between matter and antimatter. Why? This article points out another asymmetry, which could provide a clue.
I don’t follow. If there’s a persistent bias in the galaxies that are in our data set, or there’s a bias in the local portion of the universe we can observe, then the sample isn’t random. And since we can’t know what’s in the unobservable part of the universe…
In that sense you could argue that all of astrophysics is built off of a biased sample from which we can draw no conclusions. And you may well be right, I'd think, though it is unfalsifiable.
I don’t think it’s necessarily unfalsifiable to some degree. But there’s certainly parts of our astrophysics model that are already unfalsifiable and there’s no way around that. For example, the existence of the unobservable universe by definition is unfalsifiable. We kind of just assume the locality principle and that stars leaving our ability to observe them is expansion and not them hitting the edge of the universe and getting destroyed :).
> If there’s a persistent bias in the galaxies that are in our data set, or there’s a bias in the local portion of the universe we can observe, then the sample isn’t random.
The Copernican principle is the default, unless we have some reason to doubt it in a particular case.
"This work considers NPCF estimation on isotropic and homogeneous manifolds in D dimensions. Under these assumptions (which encompass spherical, flat, and hyperbolic geometries), we show that any function of one position can be expanded in hyperspherical harmonics; a D-dimensional analog of the conventional spherical harmonics."
So it seems they must assume a homogeneous and isotropic universe to be able to do their analysis.
Very interesting point! (Though for non-homogeneous but still isotropic universes there still exist spherical harmonics, so homogeneity doesn't seem like a critical restriction here. Those are used, for instance, to analyze non-homogeneous alternatives to the usual FLRW cosmological model, e.g. Lemaître-Tolman-Bondi (LTB) models.)
I don't understand why they need this complicated algorithm. If there is such a blatant bias, then it can be detected by sampling a few million random tetrahedra, and that can be done instantaneously.
Also, why don't they publish the list of positions of all these galaxies? (Or maybe I just couldn't find it easily in either of the original articles [0,1].) If there are just a million of them, that is around the same size as a single high-resolution color image. Then people could download the positions and find the asymmetries themselves!
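For what it's worth, the brute-force check is a short script - here is a toy version of my own, with made-up uniform positions instead of the real catalog, the sign of a triple product (after putting the vertices in an intrinsic order) as a stand-in for the paper's parity convention, and none of the survey-geometry corrections a real analysis would need:

    import numpy as np

    rng = np.random.default_rng(0)
    galaxies = rng.uniform(0, 1000, size=(1_000_000, 3))   # stand-in positions

    # sample a million random 4-tuples of galaxies
    idx = rng.integers(0, len(galaxies), size=(1_000_000, 4))
    tets = galaxies[idx]                                    # shape (n, 4, 3)

    # order each tetrahedron's vertices intrinsically (by distance from the
    # centroid) so the parity doesn't depend on the order we drew them in
    dist = np.linalg.norm(tets - tets.mean(axis=1, keepdims=True), axis=2)
    order = np.argsort(dist, axis=1)
    tets = tets[np.arange(len(tets))[:, None], order]

    edges = tets[:, 1:] - tets[:, :1]                       # three edge vectors
    signs = np.sign(np.linalg.det(edges))
    print("left-handed:", int((signs < 0).sum()),
          "right-handed:", int((signs > 0).sum()))

With uniform random positions this comes out essentially 50/50; the interesting part is what it does on the real positions, and how to put error bars on it once the survey mask and selection effects enter - which is presumably part of why the published analysis is more involved.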
> Could that arise from any early parity symmetry breaking of the weak force?
This is the first thing I wondered! I'm not a physicist and don't know how to answer the question usefully, but if you ever find out, please reply to the thread and let me know. I'll do the same. :)
I hear about analyses of the distribution of galaxies from time to time and always wonder how we can actually measure this distribution in any sensible way. What I mean is: information reaches us at most at the speed of light. Information about galaxies millions of light years away contributes to the distribution model. Thus the model incorporates spatial distributions of objects (= galaxies) which are certainly no longer true as of now.
Put differently: if we assume the big bang, and assume further as a thought experiment that mass expands evenly, the model would need to account for information from early objects, where the mass distribution was different. Otherwise the conclusion would be drawn that mass is unevenly distributed.
Disclaimer: absolutely not an expert. As I understood it, this is a test for the cosmological principle, which should be valid for every observer at every point of spacetime
The tetrahedra they are talking about in the article are just another name for 4-point correlation functions. The papers' titles only mention the 4-point correlation function, but the journalist must have decided that this term is too technical for a pop-science article.
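For reference, the N-point correlation functions are moments of the galaxy overdensity field delta; roughly (my paraphrase of the standard definition, not a quote from the papers), the 4-point function is

    zeta(r1, r2, r3) = < delta(x) delta(x + r1) delta(x + r2) delta(x + r3) >

with the average taken over positions x, so the four points x, x + r1, x + r2, x + r3 are the corners of the tetrahedron. A parity transform replaces each tetrahedron with its mirror image, so a difference between the function and its mirrored counterpart is exactly the kind of signal being discussed.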
That's really interesting, thanks! Is it that if the early universe was initially even slightly non-uniform, then expansion and gravity would persist those "imperfections", forming the galaxy clumps we observe now?
Can someone give me a lycée-dropout-level explanation of what they are writing here?
I gather that galaxies are spaced in tetrahedral formations, and that in some fashion there are "mirrors", i.e., tetrahedral formations with opposite side lengths, but it has now been discovered that this is not so?
For a lay person, reading that number is mind blowing. Quick question: how many do we think there are? Or rather, do we have some sort of a number for the total energy there is in the universe compared to that of our sun?
Estimates place the number of galaxies at around 200 billion, give or take an order of magnitude.
The total energy produced by our sun during its entire lifetime is insignificantly minuscule compared to all the energy emitted by every star in the universe in a single nanosecond.
Depending on various theories of universe formation the universe could have a radius anywhere from at least 10x the observable universe to even infinite.
They can, and they will. Fusion will stop once all the elements have been converted to iron. Many black holes will form, protons will probably decay, black holes will evaporate, and nothing but smooth radiation will remain.
This is assuming the cosmic concordance model describes reality. There are other models where some sort of reverse expansion will occur. They call it either Big Crunch or Big Bounce.
That's a neat symmetry. Another symmetry I like, in the other direction, is a fist is to the Earth as a carbon molecule is to a fist sized rock (I might be off by a magnitude or two, but by charitable interpretation it's accurate enough). That's a lot of molecules.
The number of stars in the observable universe is roughly of the same order as the number of grains of sand on earth, and the number of atoms in a human eye.
Estimates are: 10^19 grains of sand, 10^22 atoms in the human eye and 10^23 stars in the observable universe.
I just saw that the estimate for atoms in an average human body is on the order of 10^27. I didn’t expect the whole body to have 100,000x as many atoms as the eye.
By weight (68kg/28g) the body is only about 2500x the size of an eye.
Yes, and apparently there are plans to do the same analysis on a newer sky survey that will have around 30M galaxies, which is ... also a drop in the bucket if we're talking hundreds of billions of galaxies in the observable universe. But if the same result holds, then it would definitely offer stronger evidence, particularly if the surveys are of very different regions of the universe, because some studies have suggested that the laws of physics may not be quite the same throughout the universe.
The universe is almost a trillionaire (in estimated number of galaxies). So it would be like $1 to a millionaire or 1 cent to an average lower class family: inconsequential spare change
The question might be, if you ran the experiment again, would you get the same bias, or a different one? (as in, go back to the gnab gib and .. do over)
I get angry when people use it to justify "all moral rules are silly, we're just a simulation" - if it's a good simulation, then social precepts have value, irrespective. Except for the ones I don't like, of course; they're going in the v0.2 iteration...
Large-scale anisotropy can also be viewed as more evidence towards invalidating the big bang model.
Acknowledging the usual pragmatic caveats that "all models are wrong" and that science needs something to proceed by, I also find useful the idea of keeping a tally of how many free parameters need to be added to our cosmology as a measure of its (increasingly poor) fit.
> How does drawing lines between galaxies show asymmetry
You know how there's no way to rotate your left hand to make it look like a right hand? There's always something different. The palm facing the opposite direction, or the thumb is at the other side.
The same can be done with tetrahedra. If a tetrahedron has sides of all different lengths, you can find that the ordering of the sides from shortest to longest goes in a "spin" around one of the vertices, in a way that doesn't change no matter how you rotate the shape.
The way we understand the universe, we would expect 50% of such arrangements to "spin in one direction" and the rest to "spin in the other direction". But it's actually not balanced.
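If it helps, here's a tiny numerical sketch of that idea (my own; the handedness convention below is a simple stand-in, not necessarily the one the researchers use). Rotate a tetrahedron and its handedness stays put; mirror it and the handedness flips.

    import numpy as np

    def handedness(tet):
        # Assign +1 or -1 to a tetrahedron given as a (4, 3) array of vertices.
        # Convention here (a stand-in, not necessarily the paper's): order the
        # vertices by distance from the centroid, then take the sign of the
        # triple product of the three edges leaving the closest vertex.
        tet = np.asarray(tet, dtype=float)
        order = np.argsort(np.linalg.norm(tet - tet.mean(axis=0), axis=1))
        a, b, c, d = tet[order]
        return int(np.sign(np.linalg.det(np.stack([b - a, c - a, d - a]))))

    rng = np.random.default_rng(0)
    tet = rng.normal(size=(4, 3))

    # a random proper rotation (orthogonal matrix with determinant +1)
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1

    mirror = np.diag([1.0, 1.0, -1.0])        # reflection through the xy-plane

    print(handedness(tet), handedness(tet @ q.T), handedness(tet @ mirror))
    # the first two agree (rotations don't change the handedness);
    # the mirrored copy comes out with the opposite sign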
> Why is that important?
If the result is confirmed, it's important because it breaks one of the fundamental assumptions we had about the universe. We don't know what would cause this difference yet, but the finding proves that there is/was something that would cause this difference, and it is the first clue towards figuring out what it could be.
I'm a physicist but this isn't my field, so I will comment with my understanding, which may be wrong. Usually we expect that on a large scale there should be a uniform structure across the universe, i.e. the universe should be homogeneous on a cosmic scale. Galaxies should be for the most part evenly distributed across the universe since everything came from a singularity at the big bang. The reason why we see any asymmetry on a small scale, e.g. we see galaxies and stars and planets form, is because of comparatively "microscopic" effects of physical laws. E.g. clusters, galaxies, stars, and planets form because they overcome the expansion of the universe, but generally the matter should be distributed evenly on a big enough scale.
This result is very interesting, because it implies some kind of asymmetric structure of the universe on a huge, cosmological scale. But how can such a structure arise? What causes such a structure to arise?
But I think for this study it doesn't really matter. As long as all the galaxies are in the same coordinate system, they can calculate the relative distances between any two galaxies. And since they all came from the same sky survey, all the data was probably already in the same coordinate system.
Life can benefit from and seek low-entropy solutions - they are easier to deal with for many situations, be they cognitive, evolutionary, or otherwise. But in the process of making them we naturally put the second law of thermodynamics to use, i.e. increasing entropy in the universe around us. It is vaguely reminiscent of a heat engine - we do our best to expel entropy, but in doing so create slightly more.
Therefore (to put it poetically) it could be argued that chaos creates order to beget more chaos, just like a refrigerator does work that inevitably increases the net heat around it.
This is reminiscent of the saying "a foolish consistency is the hobgoblin of little minds" - it is strategically better to embrace a little chaos in the details that do not overly matter, to focus on consistency where it actually benefits.
As I understood the article, they made no assumptions about connectedness. They basically tried all combinations of 4 galaxies out of their sample set and counted the handedness of all of those. Some clever maths is required to deal with the combinatorial explosion.
Yep. The key here is that if you choose four arbitrary galaxies from a set, there's two possibilities -- their positions in space are close enough that there's interesting things to be learned about their formation; or they're not. In the latter case, there's a strong reason to expect no bias to the tetrahedral side lengths; in the former case, there may or may not be. If you then look at all possible combinations of four galaxies -- including all the four-tuples that are "real" in 3D space, but also all the four-tuples that aren't -- the average of "no bias" and "interesting bias" will be "smaller, but interesting bias."
Which is why, once a bias is found, figuring out its p-value is non-trivial...
Given the Cosmic Microwave Background Radiation, and things like this study, there were clearly some "parameters" in effect at the start of the universe.
I'll stop there, call that whatever you want.
But whatever those parameters were, they allowed me to type this comment on Hacker News.