
Nice work. You could encode the text, load this into a vector database and allow semantic search.


Pardon my ignorance, as I haven't worked with vector DBs yet: could you give an example of how it'd differ from full-text search?


Here's a (kinda) ELI5: you would use a language model to create "embeddings" of the text, which you can think of as a set of numbers representing the "meaning" of a set of characters.

These numbers can be plotted as points in a space, and embeddings of things with similar meanings are plotted close to each other. So things like "exam preparation" would have embeddings close to things like "top study tips".

Say you have created embeddings, once, for a large corpus of text (in this case all the YouTube captions). When a user query comes in, you create an embedding for it too and search for corpus embeddings close to it; the matching texts will be "semantically" similar to the query.

The advantage over traditional full-text search is that the user's query doesn't have to contain any of the exact words that appear in the text.
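
A rough sketch of that flow in Python, assuming the sentence-transformers package and its all-MiniLM-L6-v2 model (neither is from the comment above, just a common choice); the query shares no words with the matching caption, which is the point:

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Embed the corpus once (a few caption snippets stand in for all the captions).
    corpus = [
        "Top study tips for acing your finals",
        "How to change a bike tire",
        "Exam preparation strategies that actually work",
    ]
    corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

    # Embed the user query and rank the corpus by cosine similarity.
    query_embedding = model.encode("how should I study for my test", convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
    for hit in hits:
        print(corpus[hit["corpus_id"]], round(hit["score"], 3))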


Do you have any resources that might guide someone through doing something like this from scratch?



Here's a 6-minute speedrun of something like that on Weaviate: https://youtu.be/mBcBoGhFndY




Something like this? https://dexa.ai


Yes, in theory, although they are pretty expensive. I am doing something like this at work, as I wanted to unlock the wealth of information we have in our tutorials, webinars, etc.


https://weaviate.io/ Looks interesting. I was just reading about it.


If you're getting started with Weaviate, these two are probably what you need:

1. Wizard to create a docker-compose file: https://weaviate.io/developers/weaviate/installation/docker-... (e.g. choose the embedding model)

2. Sample notebook showing how to index items using the python library: https://github.com/weaviate-tutorials/vector-provision-optio...
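
Once the instance is up, indexing and querying from Python looks roughly like this. This is a sketch using the v3-style Weaviate client, not the linked notebook; the "Caption" class and its properties are made up, and the nearText query assumes you picked a text2vec module in the wizard:

    import weaviate

    client = weaviate.Client("http://localhost:8080")  # the instance from the docker-compose file

    # Define a class; vectors are produced by the vectorizer module chosen in the wizard.
    client.schema.create_class({
        "class": "Caption",
        "properties": [
            {"name": "text", "dataType": ["text"]},
            {"name": "videoId", "dataType": ["text"]},
        ],
    })

    # Batch-import caption chunks; Weaviate embeds them on import.
    captions = [{"text": "Welcome back, today we cover study techniques...", "videoId": "abc123"}]
    with client.batch as batch:
        for c in captions:
            batch.add_data_object(c, "Caption")

    # Semantic search: nearText embeds the query and returns the closest objects.
    result = (
        client.query.get("Caption", ["text", "videoId"])
        .with_near_text({"concepts": ["exam preparation"]})
        .with_limit(3)
        .do()
    )
    print(result)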



