What is the body-level phenotype of a ribcage by race?
I think what baffles me is that Black people as a group are more genetically diverse than every other race put together, so I have no idea how you would identify race from ribcage X-rays alone.
I use the term genetic history, rather than race, as race is only weakly correlated with body-level phenotypes.
If your question is truly in good faith (rather than an "I want to get into an argument" question), then my answer is: it's complicated. Machine learning models that work on images learn extremely complicated correlations between pixels and labels. If, on average, people with a specific genetic history had slightly larger ribcages (due to their genetics, or even to socioeconomic status that correlated with genetic history), that would show up in a number of ways in the pixels of a radiograph: larger bones spread across more pixels, slightly higher or lower bone density, differences in organ size, etc.
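One way to see how many tiny differences can add up: below is a toy simulation (not a model of radiographs). It assumes two hypothetical groups whose every feature differs by only 0.2 standard deviations, with 200 independent features; the numbers are made up for illustration. A single feature classifies barely better than a coin flip, but averaging all of them separates the groups well, because the noise in the average shrinks like 1/sqrt(n_features) while the shift stays the same.

```python
import random
from statistics import NormalDist

def make_groups(n_per_group, n_features, shift, seed=0):
    """Simulate two hypothetical groups: every feature is N(0, 1) in group A
    and N(shift, 1) in group B, so any single feature barely differs."""
    rng = random.Random(seed)
    group_a = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_per_group)]
    group_b = [[rng.gauss(shift, 1.0) for _ in range(n_features)] for _ in range(n_per_group)]
    return group_a, group_b

def accuracy(group_a, group_b, score, threshold):
    """Fraction classified correctly when score(x) > threshold means 'group B'."""
    correct = sum(score(x) <= threshold for x in group_a)
    correct += sum(score(x) > threshold for x in group_b)
    return correct / (len(group_a) + len(group_b))

n_features, shift = 200, 0.2  # made-up values for illustration
group_a, group_b = make_groups(n_per_group=1000, n_features=n_features, shift=shift)
midpoint = shift / 2  # halfway between the two group means

one_feature = accuracy(group_a, group_b, lambda x: x[0], midpoint)
all_features = accuracy(group_a, group_b, lambda x: sum(x) / len(x), midpoint)

# Theory for this toy setup: Phi(shift/2) vs Phi(shift/2 * sqrt(n_features)).
expected_one = NormalDist().cdf(midpoint)
expected_all = NormalDist().cdf(midpoint * n_features ** 0.5)

print(f"one feature:  {one_feature:.2f} (theory {expected_one:.2f})")
print(f"all features: {all_features:.2f} (theory {expected_all:.2f})")
```

Real image models learn far messier combinations than a plain average, but the arithmetic of many weak signals adding up is the same.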
It is true that Africa has more genetic diversity than anywhere else; the current explanation is that after humans arose in Africa, they spread and evolved extensively within Africa, but only a small number of genetically limited groups left Africa and reproduced/evolved elsewhere in the world.
I am genuinely asking because it makes no sense to me that a genetically diverse group is distinctly identifiable by their ribcage bones in an X-ray. If it's something more specific, like the AI picking up on statistically larger ribcages, statistically noticeable differences in bone density, or similar, okay. But something like so-small-humans-cannot-tell-but-is-simultaneously-widely-applicable-to-a-large-genetic-population is utterly baffling to me.
I dunno. My perspective is that I've worked in ML for 30+ years now, and over time, unsupervised clustering and direct featurization (i.e., treating the image pixels as the features, rather than extracting features) have shown great utility in uncovering subtle correlations that humans don't notice. Sometimes, with careful analysis, you can sort of explain these ("it turns out the unlabelled images had the name of the hospital embedded in them, and hospital 1 had more cancer patients than hospital 2 because it was a regional cancer center, so the predictor learned to predict cancer more often for images that came from hospital 1"). In other cases, no human, even a genius, could possibly understand the combination of variables that contributed to an output (pretty much anything in cellular biology, where billions of instances of millions of different factors act along with feedback loops and other regulation to produce systems that are robust to perturbations).
I concluded long ago that I wasn't smart enough to understand some things, but that by using ML, simulations, and statistics, I could augment my native intelligence and make sense of complex systems in biology. With mixed results: I don't think we're anywhere close to solving the generalized genotype-to-phenotype problem.
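The hospital example above can be sketched as a toy "shortcut learner". Everything here is hypothetical: the cancer rates (80% at hospital 1, 10% at hospital 2), the record format, and the lookup-table "model" are assumptions chosen for illustration. The image's medical content is left out entirely, yet the model scores well on training data just by reading the embedded hospital tag, then falls apart once the tag stops carrying information.

```python
import random
from collections import Counter, defaultdict

def make_records(n, cancer_rate_by_hospital, seed):
    """Hypothetical records of (hospital_tag, has_cancer). The medical content of the
    image is deliberately left out; only the hospital tag embedded in it is kept."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        hospital = rng.choice([1, 2])
        records.append((hospital, rng.random() < cancer_rate_by_hospital[hospital]))
    return records

def fit_majority_by_tag(records):
    """'Learn' the most common label for each hospital tag: all a shortcut needs."""
    counts = defaultdict(Counter)
    for hospital, has_cancer in records:
        counts[hospital][has_cancer] += 1
    return {hospital: c.most_common(1)[0][0] for hospital, c in counts.items()}

def accuracy(model, records):
    return sum(model[hospital] == has_cancer for hospital, has_cancer in records) / len(records)

# Assumed rates: hospital 1 is the regional cancer center in the training data.
train = make_records(10_000, {1: 0.8, 2: 0.1}, seed=1)
model = fit_majority_by_tag(train)
print(model[1], model[2])  # predicts cancer only for hospital 1
print(f"train accuracy: {accuracy(model, train):.2f}")

# Deployment where the tag no longer carries information about cancer.
deploy = make_records(10_000, {1: 0.3, 2: 0.3}, seed=2)
print(f"deploy accuracy: {accuracy(model, deploy):.2f}")
```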
Sounds like GeoGuessr players who learn to recognize Google Street View pictures from a specific country by looking at the color of the Street View car or a specific piece of dirt on the camera lens.
Yeah, there's also a likely apocryphal story about tanks and machine learning:
https://gwern.net/tank
The more you work with large-scale ML systems, the more you develop an intuition for these kinds of properties. If you do a lot of debugging of models and training data, or even just dimensionality reduction and matrix factorization, you begin to realize that many features are highly correlated with each other, often being close to scaled linear versions of one another.
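A minimal sketch of "close to scaled linear", assuming four hypothetical measurements that are each a rescaled copy of one hidden factor (think of it loosely as overall body size) plus a little noise. The scales and noise level are made up. A hand-rolled PCA (covariance matrix plus power iteration, stdlib only) finds one component that explains nearly all the variance, and that component's direction recovers the assumed scales.

```python
import random

def make_data(n=5000, scales=(1.0, 2.5, -0.7, 4.0), noise=0.1, seed=0):
    """Each column is scale * (hidden factor) + small noise; all values are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # hidden shared factor
        rows.append([s * z + rng.gauss(0.0, noise) for s in scales])
    return rows

def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of the columns."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]

def top_eigen(matrix, iters=200):
    """Power iteration for the largest eigenvalue/eigenvector of a symmetric PSD matrix."""
    d = len(matrix)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigenvalue, v

rows = make_data()
cov = covariance_matrix(rows)
eigenvalue, vector = top_eigen(cov)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))  # share of total variance
print(f"variance explained by first component: {explained:.3f}")
print("component (rescaled to first entry):", [round(x / vector[0], 2) for x in vector])
```

In real data the shared factors are rarely this clean, but this is the pattern that shows up when one latent property drives many pixels at once.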
> it makes no sense to me that a genetically diverse group are distinctly identifiable by their ribcage bones in an x-ray
I don't see how diversity would prevent identification. Butterflies are very diverse, but I still recognize one and don't mistake it for a bird. As long as the diversity is confined to specific features, the group can still be discriminated on the others (and even if it's not, it technically still could be, by just excluding everything else).
If differences exist, then statistical methods will have a better chance at finding them than human intuition, yes. I'm not sure why this is baffling to you.
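A small sketch of diversity confined to some features, under made-up assumptions: hypothetical group A varies enormously in five features but stays near 2.0 in feature 0, while group B is far less varied overall but sits near 0.0 in feature 0. A rule that ignores every diverse feature and thresholds only feature 0 still separates them almost perfectly.

```python
import random
from statistics import pvariance

def sample_group(rng, n, center0, spread_rest, n_rest=5):
    """Hypothetical group: feature 0 ~ N(center0, 0.3); the other n_rest features
    are uniform on [-spread_rest, spread_rest] (how 'diverse' the group is)."""
    return [[rng.gauss(center0, 0.3)] + [rng.uniform(-spread_rest, spread_rest) for _ in range(n_rest)]
            for _ in range(n)]

rng = random.Random(0)
group_a = sample_group(rng, 1000, center0=2.0, spread_rest=50.0)  # very diverse
group_b = sample_group(rng, 1000, center0=0.0, spread_rest=1.0)   # much less diverse

def classify(x):
    """Ignore every diverse feature; threshold only on feature 0."""
    return "A" if x[0] > 1.0 else "B"

acc = (sum(classify(x) == "A" for x in group_a)
       + sum(classify(x) == "B" for x in group_b)) / 2000
spread_ratio = pvariance([x[1] for x in group_a]) / pvariance([x[1] for x in group_b])
print(f"accuracy: {acc:.3f}, variance ratio in feature 1: {spread_ratio:.0f}x")
```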
Africa is extremely diverse, but because the slave trade drew mostly from the Gulf of Guinea (and the people taken were then, uh... artificially selected on top of that), 'Black' as an American demographic is much less so.
If you have two samples, where one is highly concentrated around 5 and the other is spread more evenly between 0 and 10, then for an observed value of 5 you should guess Sample 1.
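A minimal sketch of the two-sample point, assuming equal prior odds and made-up shapes: Sample 1 as Normal(mean 5, sd 0.5) and Sample 2 as Uniform(0, 10). The guess is simply whichever sample has the higher density at the observed value.

```python
from statistics import NormalDist

# Hypothetical shapes: Sample 1 ~ Normal(5, 0.5), Sample 2 ~ Uniform(0, 10).
sample1 = NormalDist(mu=5.0, sigma=0.5)

def uniform_pdf(x, low=0.0, high=10.0):
    """Density of a Uniform(low, high) distribution at x."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

def guess(x):
    """Pick the sample whose density is higher at x (equal priors assumed)."""
    return "Sample 1" if sample1.pdf(x) > uniform_pdf(x) else "Sample 2"

print(guess(5.0))  # Sample 1: about 0.80 vs 0.1
print(guess(9.0))  # Sample 2: the normal density is negligible this far out
```

With these particular assumed numbers, Sample 1 wins only within about one unit of 5; everywhere else in [0, 10], the dispersed sample is the better guess. Unequal sample sizes would shift that boundary.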
Anyway, the article links out to a paper [1]. Unfortunately, the paper tries to theorize mechanisms that would explain how the model does it, and the authors don't find one (which may mean the AI is cheating; that's my opinion, not theirs).
Sub-Saharan Africans are extremely genetically diverse, but a sample of ~100 Black Americans is unlikely to have any Khoekhoe or Twa representation.
Anyway, it’s possible that the model can pick up on other cues as well; if you had some X-rays from a hospital in Portland, Oregon, and some from a hospital in Montgomery, Alabama, and some quirk of the machine in Montgomery left artifacts that a model could pick up on, the presence of those artifacts would be quite correlated with race.
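The sampling argument is simple arithmetic: if a subgroup makes up a fraction p of the population being sampled, the chance that n independent draws include nobody from it is (1 - p)^n. The 0.1% share below is a hypothetical number chosen purely for illustration, not an estimate for any real group.

```python
def prob_none_sampled(p, n):
    """Chance that n independent draws include nobody from a subgroup of population share p."""
    return (1.0 - p) ** n

# Hypothetical share: assume 0.1% of the sampled population belongs to the subgroup.
print(round(prob_none_sampled(0.001, 100), 3))  # about 0.905
```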