Hacker News

While I'm loath to endorse any sort of widespread "big data" on our genetics, I can't help but wonder what other gems we might glean if metrics like genes and medical outcomes of everyone ever born were somehow made safely available to researchers.

(Maybe some kind of decentralized and anonymized project with voluntary participants? Some googling found https://www.labiotech.eu/features/blockchain-control-genomic...)



This can't currently be done. Taking the US as an example: if your data is in an EMR, researchers may be able to access it, but that depends on what you consented to. Obtaining additional consent (for example, by mailing a letter to everybody in an org's EHR system) isn't very effective, because most people don't respond to these requests.

Orgs like GA4GH are trying to change the consent forms for people who voluntarily take part in research, so that other researchers can pull their EMR data and aggregate it for further studies. This would only apply to future participants; such open consent forms aren't the norm right now.

Personally I think it's unlikely that large orgs would agree to decentralized systems. Instead, they would run this on a major cloud provider using standard cloud features like encryption and IAM, as well as standard de-identification techniques.
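
To make "standard de-identification techniques" a bit more concrete, here is a minimal sketch of one common step: dropping direct identifiers, replacing the patient ID with a keyed hash, and coarsening the birth date to a year. The field names (`patient_id`, `name`, `birth_date`, and so on) and the key handling are assumptions made up for illustration; this is not a complete de-identification standard and not what any particular org runs.

```python
import hashlib
import hmac

# Hypothetical field names for direct identifiers; a real schema will differ.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Replace an ID with a keyed hash (HMAC-SHA256) so it can't be reversed without the key."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record with direct identifiers removed or coarsened."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key not in ("patient_id", "birth_date")
    }
    cleaned["pseudonym"] = pseudonymize(record["patient_id"], secret_key)
    # Assumes an ISO date string (YYYY-MM-DD); keep only the year.
    cleaned["birth_year"] = record["birth_date"][:4]
    return cleaned


if __name__ == "__main__":
    example = {
        "patient_id": "12345",
        "name": "Jane Doe",
        "birth_date": "1980-06-15",
        "diagnosis": "influenza",
    }
    print(deidentify(example, secret_key=b"keep-this-in-a-secrets-manager"))
```

The same patient always maps to the same pseudonym under one key, so records can still be linked for research, while a different key produces unrelated pseudonyms. Note that high-dimensional data like genomes can remain identifying even after steps like these.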

In principle, if we had a single global database of billions of patients with both high-dimensional data (genomes, images) and labels (medical outcomes such as "got flu again"), a lot could be done, but researchers would have to be extremely careful. Most folks aren't trained to deal with big data and will immediately attempt all sorts of testing that is prone to false positives (data dredging/p-hacking is very common in medical fields, and it gets worse the more data you have to test against).
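
A rough sketch of why the false-positive problem grows with the number of tests: under the null hypothesis p-values are uniformly distributed, so running many tests at a fixed threshold guarantees some "hits" even when nothing is there. The threshold of 0.05, the test counts, and the function names are arbitrary choices for illustration, and Bonferroni is just one simple correction, not the only option.

```python
import random


def family_wise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across independent tests with no real effect."""
    return 1.0 - (1.0 - alpha) ** n_tests


def count_false_positives(n_tests: int, alpha: float, seed: int = 0) -> tuple[int, int]:
    """Simulate n_tests null hypotheses; return (naive hits, Bonferroni-corrected hits)."""
    rng = random.Random(seed)
    # Under the null, each p-value is a uniform draw from [0, 1).
    p_values = [rng.random() for _ in range(n_tests)]
    naive_hits = sum(p < alpha for p in p_values)
    # Bonferroni: divide the threshold by the number of tests performed.
    corrected_hits = sum(p < alpha / n_tests for p in p_values)
    return naive_hits, corrected_hits


if __name__ == "__main__":
    for n in (1, 20, 100):
        print(f"{n} tests: P(at least one false positive) = {family_wise_error_rate(0.05, n):.3f}")
    naive, corrected = count_false_positives(n_tests=10_000, alpha=0.05)
    print(f"10,000 null tests: {naive} naive hits, {corrected} after Bonferroni")
```

With 10,000 tests and no real signal at all, the naive threshold yields on the order of 500 "significant" results, which is the dredging problem in miniature.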

Note that ultra-large GWAS are already being done on datasets with millions of patients, and the results are interesting: you can, for example, explain variance in height using genomic variation data, up to a limit (height is only partially determined by our genome, as is true of effectively any large-scale, coarse-grained phenotypic trait).
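
The "up to a limit" part can be illustrated with a toy simulation: if a trait is a genetic score plus independent environmental noise, then even a perfect genetic predictor can only explain the genetic share of the variance. Everything here is an assumption for illustration: the additive model with equal effect sizes, the allele frequency of 0.5, the 20 variants, and the heritability value of 0.8 are made-up parameters, not results from any real GWAS.

```python
import random


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_variance_explained(heritability: float, n_people: int = 5000,
                                n_variants: int = 20, seed: int = 0) -> float:
    """Return R^2 between a perfect genetic score and a simulated trait."""
    rng = random.Random(seed)
    # Each variant: allele count in {0, 1, 2} with allele frequency 0.5 (assumed).
    genetic_variance = n_variants * 2 * 0.5 * 0.5
    # Choose noise variance so that genetic / (genetic + noise) equals the heritability.
    noise_sd = (genetic_variance * (1 - heritability) / heritability) ** 0.5
    scores, traits = [], []
    for _ in range(n_people):
        score = sum((rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(n_variants))
        scores.append(float(score))
        traits.append(score + rng.gauss(0.0, noise_sd))
    return pearson_r(scores, traits) ** 2


if __name__ == "__main__":
    for h2 in (0.2, 0.5, 0.8):
        r2 = simulate_variance_explained(h2)
        print(f"assumed heritability {h2:.1f} -> variance explained by genetic score: {r2:.2f}")
```

The variance explained tracks the assumed heritability and never climbs toward 1, no matter how many people are simulated; more data tightens the estimate but doesn't raise the ceiling.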



