Explicit Ontologies in a World Without (davidbieber.com)
41 points by dbieber on Dec 19, 2022 | hide | past | favorite | 8 comments


The key problem with trying to use LLMs as a proxy for ontologies is that you have to figure out which ontology the LLM is actually using, if that is even possible in the first place, since LLMs should in principle be encoding multiple perspectives.

If all you want is search and retrieval, that might be ok. Otherwise, you are right back where you started with multiple implicit ontologies that individuals and groups struggle to reconcile.

However, there is a much deeper problem faced by ontologies, whether they are implicit or explicit, trapped in an LLM, or written in OWL: ontologies are best thought of as collections of hypotheses, which must be testable and tested in order to be useful and verifiable. In the life sciences we are only now starting to think about how to get formal ontologies into the loop for validation against observable data (beyond, say, pointing to the literature).
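The "collections of hypotheses" framing can be made concrete. A minimal sketch (all triples and observation data invented for illustration) of checking ontology assertions against observed data:

```python
# Hypothetical sketch: treating ontology assertions as testable hypotheses.
# Assertions are simple (subject, predicate, object) triples; the data and
# predicates below are made up for illustration only.

Triple = tuple[str, str, str]

# A toy ontology: each assertion claims a relationship that should be
# checkable against observed data.
ontology: list[Triple] = [
    ("aspirin", "inhibits", "COX-1"),
    ("aspirin", "treats", "hypertension"),  # dubious claim, should fail
]

# Observed data standing in for experimental results or curated databases.
observations: set[Triple] = {
    ("aspirin", "inhibits", "COX-1"),
    ("aspirin", "treats", "pain"),
}

def validate(assertions: list[Triple], observed: set[Triple]):
    """Partition assertions into supported and unsupported hypotheses."""
    supported = [t for t in assertions if t in observed]
    unsupported = [t for t in assertions if t not in observed]
    return supported, unsupported

supported, unsupported = validate(ontology, observations)
print("supported:", supported)
print("unsupported:", unsupported)
```

Real validation pipelines are of course far messier (provenance, confidence, conflicting sources), but the shape is the same: every assertion is a claim with an expected check against data.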

LLMs can generate so much garbage that validating any latent ontology they may contain is likely to be both absolutely critical for them to be remotely useful and extremely difficult and labor intensive, bringing them right back down to reality when it comes to the cost of validation and verification.

In the end, formal manual ontologies look hard because they tend to put validation of the model first. LLM pseudo-ontologies might look easy, but the cost of validating and verifying them will likely end up almost exactly the same (if not worse).

The reason is that the real cost lies in reconciling a model with reality: having strong control over what constitutes valid data about reality, or making the measurements on the real world needed to verify a statement.

LLMs might help when it comes to coverage of a domain, but if that coverage comes with 80% of all statements being demonstrably false, leaving users to determine which 20% are true, then the coverage probably isn't worth it.
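The back-of-envelope arithmetic behind that point, using the commenter's hypothetical 80/20 split (the statement count is invented too):

```python
# Hypothetical: if a generated "ontology" has 20% precision, how much human
# verification does each usable statement cost? These numbers are the
# commenter's illustrative split, not measured values.

total_statements = 100_000
precision = 0.20  # fraction of generated statements that are actually true

true_statements = int(total_statements * precision)
checks_per_keeper = 1 / precision  # expected verifications per true statement

print(true_statements)    # usable statements buried in the output
print(checks_per_keeper)  # average checks needed to surface each one
```

So at 20% precision every usable statement carries the verification cost of five, which is the "right back down to reality" point above in numbers.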


Around 2000 we founded a knowledge-management startup (for tagging people), and we had a nice fuzzy search over ontologies implemented.

We got into large companies to solve their problem: different ontologies in every department (e.g. engineering vs. marketing vs. sales).

Mapping didn't help because their fundamental world views were different.

We failed.

(My wiki engine made it into Atlassian Confluence, though, and Confluence users had to bear with my horrible {...} wiki macro syntax for years.)


From my perspective, ontologies serve two main purposes: 1. providing an overview of the structure of a large body of content (e.g. a wiki); 2. measuring and understanding change in the same corpus.
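Purpose 2 has a very simple core if the ontology is modeled as a set of (subject, predicate, object) triples; a sketch with invented triples:

```python
# Sketch: measuring change between two snapshots of a corpus's ontology,
# modeled as sets of (subject, predicate, object) triples. The triples are
# invented for illustration.

v1 = {
    ("Page", "is_a", "Document"),
    ("Wiki", "contains", "Page"),
}
v2 = {
    ("Page", "is_a", "Document"),
    ("Wiki", "contains", "Page"),
    ("Page", "has", "Comment"),
}

added = v2 - v1    # structure introduced since v1
removed = v1 - v2  # structure that disappeared

print("added:", added)
print("removed:", removed)
```

Set difference only catches additions and removals; detecting renames or merged concepts is where the real work is.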

There used to be fancier applications of ontologies, like question answering, but I agree with the author that LLMs could replace most of them. The more interesting question is how to auto-generate ontologies.


I like the author’s cautious and careful explanation of the context. So much greater was my disappointment at a phrase like this:

“These models can understand and extract relevant concepts and relationships from unstructured text…”

Models “understand”? Is the author being cheeky? Why so careful, and then drop such a blatant anthropomorphism? Sheesh.

Writing is hard.


Serious: I might be crazy when I say this, but I find most of the value of ontology exactly where the definition used in philosophy differs from the definition used in information science. Will LLMs make this more apparent, less so, or simply produce new simulacra while intelligence is lost?


what is the difference? (I know neither def'n)


> ...

> Yi Yi tore his gaze from the Cloud of Poems and picked up a crystal chip off the ground. These chips were scattered all around them, sparkling like shards of ice in winter. Yi Yi raised the chip against a sky thick with the Cloud of Poems. The chip was very thin, and half the size of his palm. It appeared transparent from the front, but if he tilted it slightly, he could see the bright light of the Cloud of Poems reflect off its surface in rainbow halos. This was a quantum memory chip. All the written information created in human history would take up less than a millionth of a percent of one chip. The Cloud of Poems was composed of 10^40 of these storage devices, and contained all the results of the ultimate poem composition. It was manufactured using all the matter in the sun and its nine major planets, of course including the Devouring Empire.

> "What a magnificent work of art!" Bigtooth sighed sincerely.

> "Yes, it's beautiful in its significance: a nebula fifteen billion kilometers across, encompassing every poem possible. It's too spectacular!" Yi Yi said, gazing at the nebula. "Even I'm starting to worship technology."

> Li Bai gave a long sigh. He had been in a low mood all this time. "Ai, it seems like we've both come around to the other person's viewpoint. I witnessed the limits of technology in art. I--" He began to sob. "I've failed...."

> "How can you say that?" Yi Yi pointed at the Cloud of Poems overhead. "This holds all the possible poems, so of course it holds the poems that surpass Li Bai's!"

> "But I can't get to them!" Li Bai stomped his foot, which shot him meters into the air. He curled into a ball in midair, miserably burying his face between his knees in a fetal position; he slowly descended under the weak gravitational pull of the Earth's shell. "At the start of the poetry composition, I immediately set out to program software that could analyze poetry. At that point, technology once again met that unsurpassable obstacle in the pursuit of art. Even now, I'm still unable to write software that can judge and appreciate poetry." He pointed up at the Cloud of Poems. "Yes, with the help of mighty technology, I've written the ultimate works of poetry. But I can't find them amid the Cloud of Poems, ai..."

> ...

-Cixin Liu, "Cloud of Poems"

---

> ...researchers were running up against the problem of representing significance and relevance—a problem that Heidegger saw was implicit in Descartes’ understanding of the world as a set of meaningless facts to which the mind assigned values... (https://cspeech.ucd.ie/Fred/docs/WhyHeideggerianAIFailed.pdf via https://news.ycombinator.com/item?id=8008053)

Further reading: https://www.ontology.co/husserle.htm


Projection onto a subontology, and providing a formal way (much like a bureaucratic form) to enter data, both sound suspiciously like the role of an ontology is to provide a means of going back and forth between intensions and extensions.

Edit: the same idea applies to delta'ing two databases.
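One way to read the intension/extension round trip (a rough sketch; all names and records here are hypothetical): the intension is a membership rule, the extension is the set of records that rule selects, and delta'ing two databases through a shared subontology reduces to diffing extensions:

```python
# Rough sketch: intension = a predicate (rule), extension = the records the
# rule picks out. Diffing two databases under the same intension compares
# their extensions. All data below is invented for illustration.

db_old = [{"name": "ant", "legs": 6}, {"name": "spider", "legs": 8}]
db_new = [{"name": "ant", "legs": 6}, {"name": "beetle", "legs": 6}]

def hexapod(record):           # the intension: a membership rule
    return record["legs"] == 6

def extension(intension, db):  # the extension: what the rule selects
    return {r["name"] for r in db if intension(r)}

# Delta of the two databases, viewed through one subontology ("hexapods"):
old_ext = extension(hexapod, db_old)
new_ext = extension(hexapod, db_new)
print(new_ext - old_ext)  # records added under this view
print(old_ext - new_ext)  # records removed under this view
```

The "bureaucratic form" half of the comment is then the inverse move: the form's fields encode the intension, and each filled-in form contributes one element to the extension.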



