I have a different problem with my M3 MacBook Pro. If I leave Chrome (sometimes other apps too) open with the MacBook plugged in and the lid closed, the computer will get very warm and stay very warm until I unplug it / close Chrome.
Edit: It's also not warm when plugged in and using Chrome with the lid open.
They didn't sequence the whole human genome (~3 billion bases) for multiple reasons. I am not an expert on ancient DNA but I will try to explain the paper as best I can:
1. Contamination with DNA from other flora and fauna
2. Relatively low proportions of human DNA
3. The DNA is usually highly degraded, which limits the analyses to short-read sequencing (in this case they used 76 bp reads). The half-life of human DNA is ~521 years.
To mitigate these problems they used multiple targeted approaches, including one to isolate mitochondrial DNA. With it they managed to sequence the whole ~16 kb human mtDNA, with each base covered by 62 sequencing reads on average (62x coverage).
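As a rough back-of-the-envelope sketch (only the ~16 kb mtDNA length, the 76 bp read length and the 62x figure come from above; the rest is illustrative), coverage is just total sequenced bases divided by target length:

```python
# Back-of-the-envelope coverage calculation (illustrative only).
mtdna_length = 16_569   # human mtDNA is roughly 16.6 kb
read_length = 76        # read length used in the study
coverage = 62           # average depth reported

# coverage = (number_of_reads * read_length) / target_length
reads_needed = coverage * mtdna_length / read_length
print(f"~{reads_needed:,.0f} reads mapping to the mitochondrial genome")
# -> roughly 13,500 reads
```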
They used another approach to isolate specific regions containing single nucleotide polymorphisms (SNPs), single-base differences known to be informative for ancient and present-day humans. They targeted 470,724 SNPs, of which ~70% (336,429) were recovered.
They did perform shotgun sequencing on all of the isolated DNA, but due to species-assignment issues they again focused on fragments containing diagnostic SNPs. In these cases they recovered only a small number of SNPs per sample, again due to the relatively low proportion of human DNA and its degradation (20,526; 3,734; 124,862; 85,901; 34,756; 41,632; 34,677 and 72,992 SNPs, per the legend of Figure 3).
"matching" is exactly how we do DNA sequencing right now. The current technology is called next generation sequencing (NGS), we multiply the DNA and perform matching digitally to construct the full DNA.
It's quite fascinating. It's like, in order to figure out the shape of a teacup, we generate thousands of identical copies, smash them all into rather small bits, and then try to count the different types of shards as a first step to piecing together one full copy. Impressive that it works.
> It's like, in order to figure out the shape of a teacup, we generate thousands of identical copies, smash them all into rather small bits, and then try to count the different types of shards as a first step to piecing together one full copy. Impressive that it works.
Yes, but you've got the order wrong.
The teacup is smashed before all of the identical copies are created.
It's not fascinating; it's an endless source of trouble. We only do it because we don't have sequencers that produce extremely long (chromosome-length), high-quality reads, especially in sequences that contain a lot of repetition. This has been a source of errors and ambiguity for as long as we've used shotgun sequencing.
This is a great analogy. One small change: there are two ways to reassemble it. One is to blindly put the pieces together to form a teacup (read assembly), versus using a picture of the teacup to figure out where the pieces go (read alignment / mapping).
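A minimal sketch of the second option (read alignment / mapping), assuming we already have a reference "picture": each shard is placed by looking it up in the reference. Real mappers like BWA or minimap2 use indexes and tolerate mismatches; this only does exact lookups on made-up sequences.

```python
# Read mapping sketch: place each read against a known reference sequence.
reference = "GATTACAGGTTCAGCGTACCGT"   # made-up reference ("the picture")

def map_read(read, ref):
    """Return every position where the read matches the reference exactly."""
    hits, pos = [], ref.find(read)
    while pos != -1:
        hits.append(pos)
        pos = ref.find(read, pos + 1)
    return hits

for read in ["TACAGGTT", "CAGCGTAC", "TTTTTTTT"]:
    print(read, map_read(read, reference))
# TACAGGTT [3], CAGCGTAC [11], TTTTTTTT [] (doesn't map anywhere)
```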
Probably not; not nearly enough material remained to make an accurate clone. The article mentions a 70% recovery rate; according to the internet, humans share 98% of their DNA with chimpanzees (and 35% with daffodils), so unless you have 100% or 99.9999% of the DNA, the clone will be imperfect at best and a Thing That Should Not Be at worst.
The BWT-based FM-index is one of my favorite data structures. It's used frequently for DNA read mapping, where the 4-letter alphabet can be encoded in two bits and the occurrence function can use clever caching, bit bashing and popcount to get nice performance.
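For anyone curious, here's a tiny sketch of the core trick (backward search over the BWT). A real FM-index packs the alphabet into 2 bits, samples Occ checkpoints and uses popcount; none of that is done here, it just shows the counting logic on a toy string.

```python
# Minimal BWT / FM-index backward-search sketch (toy string, no bit packing,
# no Occ checkpoints, no popcount -- just the core counting logic).
def bwt(text):
    text += "$"  # unique, lexicographically smallest sentinel
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def fm_count(pattern, bwt_str):
    """Count occurrences of pattern in the original text via backward search."""
    # C[c] = number of characters in the text strictly smaller than c
    totals = {c: bwt_str.count(c) for c in set(bwt_str)}
    C, running = {}, 0
    for c in sorted(totals):
        C[c] = running
        running += totals[c]

    def occ(c, i):
        # Occurrences of c in bwt_str[:i]; a real index precomputes/samples this.
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)           # current suffix-array interval
    for c in reversed(pattern):        # extend the match one character at a time
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo

b = bwt("GATTACAGATTACA")
print(fm_count("ATTA", b))  # 2
print(fm_count("GAT", b))   # 2
print(fm_count("CCC", b))   # 0
```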
Users of these kinds of tools should check that their marker genes are associated with the labelled cell types. There are known markers for many cell types across multiple organisms.
This is really useful, thanks for sharing. My students and I tend to waste a lot of time annotating clusters and have not found a reasonable solution yet. This will be fun to try.
I have written a neural network architecture (way smaller than llama) that can be trained to automate this process. Check out the Custom-Data-Tutorial in the repo!
Could this also be adapted for gene set enrichment? For example, if I had a set (or sets) of genes from an ATAC-seq experiment, would it be able to guess their function / cell types?
It is common for human cancers to be polyploid after accumulating whole-genome doublings (WGD), where a tumour cell goes from being approximately diploid to tetraploid. Some tumour types have higher rates of WGD, for example glioblastoma, ovarian cancer, and pancreatic adenocarcinoma. But what usually happens is that the tumour then loses parts of the doubled genome to reach a ploidy (average copy number across the genome) of 3-4ish.
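As a toy illustration of "ploidy = average copy number across the genome" (the segment lengths and copy numbers below are made up), ploidy is just a length-weighted mean of per-segment total copy numbers:

```python
# Ploidy as the length-weighted average copy number across the genome.
# Segments are (length_in_bases, total_copy_number); values are invented.
segments = [
    (50_000_000, 4),  # kept both copies of a doubled region
    (30_000_000, 3),  # lost one copy after the doubling
    (20_000_000, 2),  # lost the whole doubled region, back to diploid
]

genome_length = sum(length for length, _ in segments)
ploidy = sum(length * cn for length, cn in segments) / genome_length
print(f"ploidy = {ploidy:.2f}")  # -> 3.30, i.e. in the "3-4ish" range after WGD + losses
```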
I am going to steal that name. I find that computational overcooking happens more and more because the easy questions that can be asked of sequencing datasets are starting to dry up.