
The article's core claims are:

> Extracting, structuring or synthesizing "insights" from academic publications (papers) or building knowledge bases from a domain corpus of literature has negligible value in industry.

> Most knowledge necessary to make scientific progress is not online and not encoded.

> Close to nothing of what makes science actually work is published as text on the web

> The tech is not there to make fact checking work reliably, even in constrained domains.

> Accurately and programmatically transforming an entire piece of literature into a computer-interpretable, complete and actionable knowledge artifact remains a pipe dream.

It also points to existing old-school "biomedical knowledge bases, databases, ontologies that are updated regularly", where expert entry cuts through the noise in a way that NLP cannot.

Although I disagree with its conclusions, much of this jibes with my experience. From a research perspective, modern NLP and transformers are appropriately hyped; from the perspective of real-world application, they are over-hyped. Transformers have a deeper understanding than anything prior, and they can figure out patterns in their context with a flexibility that goes way beyond regurgitation.

They are also prone to hallucinating text and quoting misleading snippets, require lots of resources for inference, and are confidently wrong at a rate that makes industrial use nearly unworkable. They're powerful, but you should think hard about whether you actually need them; most of the time their true advantage goes unleveraged.

-----

My disagreements are with its advice.

> For recommendations, the suggestion is "follow the best institutions and ~50 top individuals".

But this just creates a rich-get-richer effect and retards science, since most researchers are reluctant to go against those with a lot of clout.

> Why purchase access to a 3rd party AI reading engine...when you can just hire hundreds of postdocs in Hyderabad to parse papers into JSON? (at a $6,000 yearly salary). Would you invest in automation if you have billions of disposable income and access to cheap labor? After talking with employees of huge companies like GSK, AZ and Medscape the answer is a clear no.

This reminds me of responses to questions of the sort: "Why didn't X (where X might be the Ottomans or the Chinese) get to the industrial revolution first?".

The article also warns against working on ideas such as "...semantic search, interoperable protocols and structured data, serendipitous discovery apps, knowledge organization."

A lot of such apps are solutions in search of a problem, but they could work if designed to solve a specific real-world problem. On the other hand, an outsider trying to start a generalized, VC-backed business targeting industry is bound to fail. In fact, this seems to be a major sticking point in the author's endeavor.
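To be concrete about what "designed to solve a specific real world problem" might mean: the retrieval plumbing itself is tiny. Here's a minimal sketch of semantic search over a few abstracts, assuming the sentence-transformers package and the all-MiniLM-L6-v2 encoder (both are illustrative choices of mine, not anything the article prescribes):

    import numpy as np
    from sentence_transformers import SentenceTransformer

    # Illustrative encoder; any sentence-embedding model would do.
    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Hypothetical corpus: abstracts keyed by made-up IDs.
    abstracts = {
        "0001": "Transformer-based relation extraction from clinical notes.",
        "0002": "Off-target effects of CRISPR-Cas9 in primary human T cells.",
        "0003": "Metabolic reprogramming of tumor-associated macrophages.",
    }
    ids = list(abstracts)
    doc_vecs = model.encode([abstracts[i] for i in ids], normalize_embeddings=True)

    def search(query, k=2):
        # Cosine similarity; vectors are already unit-normalized.
        q = model.encode([query], normalize_embeddings=True)[0]
        scores = doc_vecs @ q
        return [(ids[i], float(scores[i])) for i in np.argsort(-scores)[:k]]

    print(search("NLP for extracting relations from medical text"))

The hard part isn't code like this; it's deciding which real decision the ranked list is supposed to support, which is exactly where the generalized offerings fall down.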

Industry is jaded and set in its ways; startups focus on summarization, recommendation, and retrieval, which are low value in the scientific enterprise; and academia focuses on automation, which turns out brittle. Still, this line of research is needed. Knowledge production is growing rapidly while humans are not getting any smarter. Specialization has meant more redundant information, loss of context, and a stall in theory production (hence "much less logic and deduction happening").

While the published literature is sorely lacking, humans can, with effort, extract and/or triangulate value from it. Tooling needs to augment that process.



"follow the best institutions and ~50 top individuals" wasn't meant as a suggestion actually, just an observation of what most people do.

You're right that they "could work if designed to solve a specific real world problem", but against what baseline? The baseline could be spending that time on actual deep-tech projects rather than NLP meta-science.


But you're right: open-source projects for extracting info (like PubTator) are valuable, but ontologies/KGs need ongoing expert work (ML/AI engineers, SWEs, information architects, labelers), unlike most of Wikipedia or GH, so it's tough to build something that doesn't suck in a distributed, open-source fashion.
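For anyone who hasn't used it: PubTator serves pre-computed entity annotations over PubMed via a public NCBI API. Rough sketch below; the endpoint path and response format are from memory, so verify against the current PubTator docs before relying on them:

    import requests

    # Endpoint as I recall it for PubTator Central's export API (check the
    # current NCBI documentation; paths and formats change over time).
    URL = "https://www.ncbi.nlm.nih.gov/research/pubtator-api/publications/export/pubtator"

    def fetch_annotations(pmids):
        # Should return title/abstract text plus tab-separated entity
        # mentions (genes, diseases, chemicals, ...) in PubTator format.
        resp = requests.get(URL, params={"pmids": ",".join(pmids)}, timeout=30)
        resp.raise_for_status()
        return resp.text

    print(fetch_annotations(["12345678"]))  # placeholder PMID; substitute real ones

That kind of extraction output is the easy, already-open part; the curation that turns it into a trustworthy ontology/KG is the part that needs the ongoing expert labor.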



