You're welcome. I've recently noticed that I get better performance from VLMs when queries are phrased this way: descriptive keys instead of an explanation of the problem in full sentences. Much like CoT (chain-of-thought) prompting, which many people claim gives better results, I've personally found that querying in a fixed sequence (existenceOfEntity, then numberOfEntities, then propertiesOfEntities, etc.) tends to give better results. I haven't verified any of this rigorously, so please take it with a pinch of salt :)
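Roughly what such a query could look like, as a minimal sketch: it assumes the VLM accepts a plain-text prompt and is asked to fill in a JSON template. The helper name `build_structured_query` and the wording of each field are hypothetical; only the key names and their order come from the comment above.

```python
import json

# Key sequence from the comment above: existence first, then count, then properties.
QUERY_ORDER = ["existenceOfEntity", "numberOfEntities", "propertiesOfEntities"]


def build_structured_query(entity: str) -> str:
    """Build a prompt asking the VLM to fill in a JSON object with descriptive keys.

    Hypothetical helper: the field wording is illustrative, not a documented VLM API.
    """
    # Dicts keep insertion order (Python 3.7+), so the serialized template
    # lists the keys in the same sequence as QUERY_ORDER.
    template = {
        "existenceOfEntity": f"Is there at least one {entity} in the image? (true/false)",
        "numberOfEntities": f"How many {entity} instances are visible? (integer)",
        "propertiesOfEntities": f"List the colour, size and position of each {entity}.",
    }
    return (
        "Answer by filling in this JSON object, in order:\n"
        + json.dumps(template, indent=2)
    )


prompt = build_structured_query("circle")
print(prompt)
```

Keeping the keys in a fixed order nudges the model to settle "is it there?" and "how many?" before describing properties, which is the sequencing effect described above (still anecdotal).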
This one is new to me. I still have my copies of Physics for Entertainment and Mathematics Can Be Fun by the same author, Yakov Perelman. I owe my superfast stereogram-decoding skills (under 2 seconds most of the time) to him :)
Here's an interesting tidbit from his Wikipedia page: "He is not related to the Russian mathematician Grigori Perelman, who was born in 1966 to a different Yakov Perelman. However, Grigori Perelman told The New Yorker that his father gave him Physics for Entertainment, and it inspired his interest in mathematics."
This is my go-to method for pretty much every hard problem I'm forced to solve where I lack the domain expertise, interest, or time. The trick lies in coming up with a clever similarity metric that incorporates penalties, etc. You can even go a level deeper: use multiple similarity algorithms and then poll (take a vote) over their results. Here's a taxonomy extractor for text that I built on similar principles, and it is surprisingly as good as anything else I've seen: https://dash.scooptent.com/text
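A minimal Python sketch of the "multiple similarity metrics plus polling" idea, assuming inputs are already token sets. Each set-similarity metric (Jaccard, Dice and the overlap coefficient, chosen here purely for illustration) picks its nearest labelled example, a hypothetical penalty docks points for matches on uninformative tokens, and the label with the most votes wins. This is not the code behind the linked extractor.

```python
from collections import Counter


def jaccard(a: set, b: set) -> float:
    # |A ∩ B| / |A ∪ B|
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a: set, b: set) -> float:
    # 2|A ∩ B| / (|A| + |B|)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def overlap(a: set, b: set) -> float:
    # |A ∩ B| / min(|A|, |B|)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def penalised(metric, penalty_tokens: set, weight: float = 0.5):
    """Hypothetical penalty: subtract part of the score for every shared
    token that is uninformative (e.g. a stop word)."""
    def score(query: set, example: set) -> float:
        shared_noise = len(query & example & penalty_tokens)
        return metric(query, example) - weight * shared_noise / max(len(query), 1)
    return score


def classify(query: set, labelled: list, metrics: list) -> str:
    """Each metric votes for the label of its nearest example; the majority wins."""
    votes = Counter()
    for metric in metrics:
        _, label = max(labelled, key=lambda item: metric(query, item[0]))
        votes[label] += 1
    return votes.most_common(1)[0][0]


# Made-up labelled examples for illustration.
examples = [
    ({"stock", "market", "investing", "the"}, "Finance"),
    ({"python", "code", "api", "the"}, "Technology"),
    ({"school", "exam", "students", "the"}, "Education"),
]
stop_words = {"the", "a", "of"}
metrics = [penalised(m, stop_words) for m in (jaccard, dice, overlap)]
predicted = classify({"the", "api", "code", "docs"}, examples, metrics)
print(predicted)  # Technology
```

Voting across metrics makes the result less sensitive to the quirks of any single measure; the penalty weight and stop-word list are placeholders you would tune for your own data.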
From their Terms of Use section: "The GDELT Project is an open platform for research and analysis of global society and thus all datasets released by the GDELT Project are available for unlimited and unrestricted use for any academic, commercial, or governmental use of any kind without fee."
Congrats on the launch. This is something I spent some time on a few years ago.
I hacked together something similar for my use case by reverse engineering. No ML model, though: just nearest neighbours with Tversky similarity measures, written in Julia, using the same taxonomy that you are using.
Tested it on one of the comments from this thread:
import requests

requests.post(
    "https://x2vud9xfq0.execute-api.ap-south-1.amazonaws.com/api/text/classify",
    json={
        "text": """
And, to be frank, I can't see why I'd send my confidential information to you when I can send it to Google. (Ahem!)
But the problem with theirs and yours is the OOTB categories are for a global topic set, something like Yahoo directory, rather than for a given discipline. And what's generally needed is a set of disciplines, or several topic trees. (Think Amazon.com instead of Yahoo.)
I've found the general lists, like LCM[^1] (what you really want is LCSH[^2] subject headings, not LCM), too broad for my business or personal content, while something like ACM[^3] is more what's needed for, say, computing related content.
For a firmwide knowledge base at a {field}-tech firm, you have a mix of the firm's focus field, and computing, and a broad scope fallback like you're starting with. Even libraries have their own topic hierarchy! [^4]. Plenty fields have controlled vocabularies[^6], and if you can't find one for a field, you can usually generate one by finding someone who is already classifying that field, and looking at their TOC. All of which is to say, to be generally useful, you have to let people BYOT (bring your own topics) for this.
For instance, we built our topic list based on combining a reference taxonomy for our field, a reference taxonomy for computing, a reference taxonomy for business books, and the Google NLP tool mentioned above.
There are occasional tools that try to match arbitrary documents to arbitrary hierarchies such as clerk [^5] but they are challenging for various reasons.
You have a note to contact you for different topics, but raising this here since so far (6 hours) you had no feedback, and I'm a big fan of what you're doing and the niche is underserved.
A couple other thoughts:
""",
        'key': 'HACKERNEWS'
    }
).json()
{
'genres': {'Technology': 24, 'Finance': 16, 'Education': 11},
'tags': {'/Business & Industrial/Small Business/MLM & Business Opportunities': 5.094265117745211,
'/Internet & Telecom/Web Services': 5.51434499612552,
'/Finance/Investing': 5.72584536853734,
'/Business & Industrial/Business Operations': 5.888633926463297,
'/Jobs & Education/Education/Standardized & Admissions Tests': 6.0132143106028435,
'/Business & Industrial/Business Services': 6.100261915913882,
'/Jobs & Education/Jobs': 6.126547614437338,
'/Science/Earth Sciences/Atmospheric Science': 6.1553064528175545,
'/Finance': 6.249046550441405,
'/Business & Industrial': 6.333431648078183},
'id': '65f891a111ec14ddd4b56bda'
}
Thanks for sharing. I find it super fascinating that we can model macro phenomena using game theory constructs and validate them with experiments. Very cool. I’ll see if I can get hold of the paper. Cheers
I’m one of those savages who can finish a bottle of Indian pickle in one go. Even when my teeth start hurting from the acidity, I keep going at it by sucking on the pieces. Nothing like a hyper-concentrated explosion of sourness in your mouth.
Brings back childhood memories. As a kid, I enjoyed reading about these, and the accompanying illustrations, in the excellent book Physics for Entertainment by Yakov Perelman. In the Telugu translation they were called Nirantara Chalana Yantralu.
I read the same book in Malayalam! It was one of my earliest introductions to practical physics. He deconstructs pretty much all the early "perpetual motion machine" designs in the book.
I also remember finding the chapter on soap-bubble experiments particularly interesting as a kid.
Intersecting Lines https://replicate.com/p/s24aeawxasrgj0cgkzabtj53rc
Overlapping Circles https://replicate.com/p/0w026pgbgxrgg0cgkzcv11k384
Touching Circles https://replicate.com/p/105se4p2mnrgm0cgkzcvm83tdc
Circled Text https://replicate.com/p/3kdrb26nwdrgj0cgkzerez14wc
Nested Squares https://replicate.com/p/1ycah63hr1rgg0cgkzf99srpxm