Here's a (kinda) ELI5: you would use a language model to create "embeddings" of the text, which you can think of as lists of numbers representing the "meaning" of a piece of text.
These numbers can be plotted as points in a space, and embeddings of things with similar meanings end up close to each other. So the embedding of "exam preparation" would sit near the embedding of "top study tips".
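To make that concrete, here's a rough sketch in Python using the sentence-transformers library (the model name is just one common choice, not a recommendation):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Embed two related phrases and one unrelated phrase.
vecs = model.encode(["exam preparation", "top study tips", "banana bread recipe"])

# Cosine similarity: higher means closer together in embedding space.
print(util.cos_sim(vecs[0], vecs[1]))  # related pair -> relatively high score
print(util.cos_sim(vecs[0], vecs[2]))  # unrelated pair -> lower score
```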
Say you have created embeddings for a large corpus of text (in this case, all YouTube captions) once. If you then create an embedding for a user query, you can search for corpus embeddings close to it, and the matches will be "semantically" similar to the query.
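The search part might look something like this (again just a sketch with made-up caption snippets; a real system would use a vector index like FAISS instead of a brute-force dot product over the whole corpus):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Done once, offline: embed the whole corpus.
corpus = [
    "how to prepare for your final exams",
    "baking sourdough at home",
    "study techniques that actually work",
]
corpus_vecs = model.encode(corpus, normalize_embeddings=True)

# At query time: embed the query and rank the corpus by similarity.
# With normalized vectors, cosine similarity is just a dot product.
query_vec = model.encode("exam preparation", normalize_embeddings=True)
scores = corpus_vecs @ query_vec
for i in np.argsort(-scores):
    print(f"{scores[i]:.3f}  {corpus[i]}")
```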
The advantage over traditional full-text search is that the user's query doesn't have to contain any of the words that actually appear in the text.
Yes, in theory, although they are pretty expensive. I am doing something like this at work, as I wanted to unlock the wealth of information we have in our tutorials, webinars, etc.