
What makes the Y chromosome more difficult?



All of the chromosomes have difficult bits in them. The Y chromosome in particular has huge sections that are difficult.

By difficult, imagine a jigsaw puzzle. The difficult bits of a jigsaw are where the same sub-image is repeated over and over, or where the same section of image is scattered repeatedly across the wider picture. Pieces from these bits are hard to place because a small piece matches several places, so you can't tell which part of the picture it came from. It's impossible in principle to resolve a repeated stretch that is longer than the pieces you are assembling, because no single piece spans it and reaches unique picture on either side. Older sequencing methods (still heavily used because they are cheaper) give sequenced sections of DNA 100-300bp long. Modern long-read sequencing gives sections of 100kbp or more (maybe up to 1Mbp). (bp stands for base pairs; each is one of [ACGT] paired with its complement.) These larger puzzle pieces allowed the whole picture to be assembled without ambiguity.
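To make the jigsaw analogy concrete, here is a minimal Python sketch on a made-up toy sequence. It uses plain exact string matching rather than a real aligner, and the names (REPEAT, LEFT, RIGHT, match_positions) are hypothetical. A read shorter than the repeated stretch matches several places; a read long enough to span the repeat into unique flanking sequence matches only one.

```python
def match_positions(genome: str, read: str) -> list[int]:
    """Every start index where `read` occurs exactly in `genome` (overlapping hits allowed)."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


# Hypothetical toy sequence: unique flanks around three tandem copies of a 10 bp unit.
REPEAT = "ACGTTGCAAG"
LEFT, RIGHT = "TTAGGC", "CCATGA"
genome = LEFT + REPEAT * 3 + RIGHT

short_read = REPEAT[:8]                    # 8 bp piece from inside the repeat
long_read = LEFT + REPEAT * 3 + RIGHT[:2]  # spans the whole repeat into unique flanks

print(match_positions(genome, short_read))  # [6, 16, 26]: three equally good placements
print(match_positions(genome, long_read))   # [0]: one placement
```

Real assemblers tolerate sequencing errors and use far smarter indexes, but the ambiguity is the same: if nothing unique is inside the read, there is nothing to anchor it.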

However, this isn't why the Y chromosome was solved last. The other chromosomes were analysed using a completely homozygous hydatidiform mole. This arises when an egg with no nucleus of its own is fertilised by a single sperm whose genome is then duplicated, so the two copies of every chromosome are identical. Assembly is a lot easier if you don't have to deal with two copies of the DNA that are slightly different. The side effect is that, because the mole comes from a single X-carrying sperm, it has no Y chromosome, so they had to analyse a different sample later on to get one.
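A toy illustration of why identical copies help, assuming a de Bruijn-style view where the assembler works from the set of length-k substrings (k-mers) of the reads. The sequences, K=5 and the function name are made up for the example. With two identical copies every k-mer comes in one version; with two copies differing at a single base, each k-mer overlapping that base shows up in two versions, and the assembler has to work out which go together.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


K = 5
# Hypothetical 20 bp haplotypes that differ only at index 10 (C in copy A, T in copy B).
hap_a = "GATTACAGGACTTGAGCCAT"
hap_b = hap_a[:10] + "T" + hap_a[11:]

homozygous = kmers(hap_a, K)                      # both copies identical, as in the mole
heterozygous = kmers(hap_a, K) | kmers(hap_b, K)  # two slightly different copies

print(len(homozygous))    # 16: one consistent set of overlaps
print(len(heterozygous))  # 21: the 5 k-mers covering the SNP appear in two versions
```

In a real genome this happens at millions of heterozygous sites, which is a lot of extra branching to untangle.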


What does "mole" mean in this context?



It's on line 1 of the linked article: "The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications" and comes with 3 citations.


Maybe someone can correct the details, since it's been a few years since I did this, but we sequence DNA using PCR. Roughly: (1) break the DNA into small pieces and separate the strands, (2) mix it with an enzyme (a polymerase) that completes each single strand, (3) repeat 1 and 2 many times to multiply the strands until the solution is a dense DNA juice, (4) pass it through a machine that sequences thousands of these short strands, and (5) align the short DNA sequences with software that matches unique sequences.
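Step (3) is where the "juice" comes from: copies grow exponentially. A minimal sketch of the idealised arithmetic, N = N0 * (1 + E)^n for n cycles at efficiency E, assuming every cycle copies a fixed fraction of molecules; real reactions plateau, which this ignores. The function name and the efficiency parameter are illustrative.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealised PCR yield: each cycle multiplies the number of copies by (1 + efficiency)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


print(pcr_copies(1, 30))       # 1073741824.0, i.e. 2**30: about a billion copies of one molecule
print(pcr_copies(1, 30, 0.9))  # roughly 2.3e8: imperfect cycles compound quickly
```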

I did it with the COI gene, which is just a short (1000ish base pairs in our snails, IIRC) stretch of essentially random, non-repetitive ATGC base pairs. Lots of unique sequence makes the short strands easy to match: get a bunch of 10-15bp bits and you can piece the whole thing together.
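A quick sketch of why short bits are enough for a gene like that, under the simplifying assumption that a non-repetitive gene behaves like a uniformly random ACGT string (real genes aren't truly random). With 4^15 (about a billion) possible 15-mers and under a thousand positions, almost every 15-mer is unique. A sequence built from a repeated unit has no unique 15-mers at all. The sequences and the function name are hypothetical.

```python
import random
from collections import Counter


def unique_kmer_fraction(seq: str, k: int) -> float:
    """Fraction of k-mer positions whose k-mer occurs exactly once in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    unique = sum(1 for c in counts.values() if c == 1)
    return unique / total


rng = random.Random(42)  # fixed seed so the example is reproducible
random_seq = "".join(rng.choice("ACGT") for _ in range(1000))  # stand-in for a ~1000 bp gene
unit = "".join(rng.choice("ACGT") for _ in range(50))
repeat_seq = unit * 20  # 1000 bp built from one 50 bp unit repeated 20 times

print(unique_kmer_fraction(random_seq, 15))  # close to 1.0: almost every 15-mer pins down one spot
print(unique_kmer_fraction(repeat_seq, 15))  # 0.0: every 15-mer occurs in several copies
```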

Now if your target is a 62 Mbp chromosome packed with repeats and long palindromes, you can imagine how hard it is to place randomly sequenced pieces, because there are very few unique sequences to match against.
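Palindromes add a strand twist. In a DNA palindrome one arm is a near-identical reverse complement of the other, so a read from one arm also matches the other arm on the opposite strand. A toy sketch with a made-up arm sequence and exact matching on both strands (the helper names are hypothetical; real aligners also allow mismatches):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """The opposite DNA strand, read in its own 5'->3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def placements(genome: str, read: str) -> list[tuple[int, str]]:
    """Exact-match hits of read on either strand, as (start, '+') or (start, '-')."""
    hits = []
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        start = genome.find(query)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(query, start + 1)
    return hits


# Hypothetical toy palindrome: an arm, a short spacer, then the arm's reverse complement.
arm = "TTGACCAGAG"
genome = "CCC" + arm + "ATAT" + reverse_complement(arm) + "GGG"
read = arm[:8]  # an 8 bp read that really came from the left arm

print(placements(genome, read))  # [(3, '+'), (19, '-')]: both arms fit equally well
```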


We don't use PCR anymore! It's direct sequencing of the primary DNA. We can read single molecules. That's the quiet revolution in nanotechnology that's driving all these complete assemblies.



