Here is the "dumb question" I've always had about recording the human genome.
We all have different DNA. So is "the human genome" some kind of "average" DNA, or is it the DNA of whoever they sampled, or is it maybe an overview of what is common for all of us?
They are talking about a single 'reference genome'. The variation from human to human is relatively small (a few million bases out of 3 billion). Historically, the reference genome has been a mosaic of several individuals rather than a true average (this has obvious disadvantages), but it is good enough to put reads in the right place (mostly) and to call 'variants': the differences that make the test genome unique.
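The latest/greatest end-to-end T2T reference (T2T-CHM13) is based almost entirely on a single cell line, 'CHM13', with the Y chromosome taken from 'HG002', an individual of Ashkenazi Jewish ancestry. This was made possible partly by new information derived from long-read technologies.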
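To make "calling variants against a reference" a bit more concrete, here is a toy sketch in Python. It assumes the two sequences are already aligned and the same length, so it only finds single-base substitutions; real variant callers work from aligned reads and also handle insertions, deletions, and sequencing error. The function name `call_variants`, the example sequences, and the 3 million figure (a stand-in for "a few million") are all made up for illustration.

```python
def call_variants(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for every mismatch.

    Assumption: the sequences are pre-aligned and of equal length, so only
    single-base substitutions can be detected.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to the same length")
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s
    ]


# Hypothetical toy sequences, not real genomic data.
reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"

variants = call_variants(reference, sample)
print(variants)  # [(4, 'A', 'T'), (9, 'C', 'A')]

# Rough scale of "a few million bases out of 3 billion",
# using 3 million as an illustrative stand-in for "a few million".
fraction = 3_000_000 / 3_000_000_000
print(f"{fraction:.1%}")  # 0.1%
```

The point is only that a "variant" is defined relative to whichever reference you pick, so a different reference gives a different list of variants for the same person.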
Actually, the level of variation per genome relative to a reference is still not completely known, because we do not have more than a handful of truly complete assemblies. It is clear, though, that it is much higher than a few million base pairs, perhaps as high as tens of millions, depending on your alignment parameters. Most of the differences are in regions we have not been able to sequence and assemble until the past few years; this paper is a key example. If two males have different versions of large repetitive arrays on the Y, then they will already differ by much more than "a few million" base pairs.
What would make a sample good? In terms of the quantity of material a sample requires, surely there is no shortage of cadavers or even discarded body parts?