Hacker News | raphaelty's comments

I built my personal search engine, which records things I like on Twitter, blog posts, etc. It automatically calls those APIs using GitHub Actions and stores the results in an open-source database (a JSON file).

I actually use it at least twice a week to retrieve content I bookmarked, so I'm happy to have created such a tool.

The app: https://raphaelsty.github.io/knowledge/?query=bayesian

The GitHub repo: https://github.com/raphaelsty/knowledge
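
To make the "fetch, then store in a JSON file" idea concrete, here is a minimal sketch of what a scheduled job could do on each run. It is an assumption for illustration, not the code from the repository: the function names `merge_bookmarks` and `search`, the item fields (`url`, `title`, `tags`), and the file layout are all hypothetical.

```python
import json
from pathlib import Path


def merge_bookmarks(db_path, new_items):
    """Merge freshly fetched items into a JSON file keyed by URL (hypothetical layout)."""
    path = Path(db_path)
    # Load the existing database, or start empty on the first run.
    database = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    for item in new_items:
        # Keying by URL means re-running the job does not create duplicates.
        database[item["url"]] = {"title": item["title"], "tags": item.get("tags", [])}
    path.write_text(json.dumps(database, indent=2, sort_keys=True), encoding="utf-8")
    return database


def search(database, query):
    """Return the URLs whose title or tags contain every query term."""
    terms = query.lower().split()
    hits = []
    for url, doc in database.items():
        text = " ".join([doc["title"], *doc["tags"]]).lower()
        if all(term in text for term in terms):
            hits.append(url)
    return sorted(hits)
```

Keying the file by URL keeps repeated runs idempotent, which matters when the job runs on a schedule and the upstream APIs return overlapping results.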


I think 10 million documents is a large corpus. A retriever like scikit-learn's TF-IDF will struggle to handle it in a reasonable time. The main goal of Cherche is to let you prototype a neural search engine quickly, with a large choice of retrievers and rankers, for corpora of fewer than 1 million documents, which is a common use case in industry.
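
For readers unfamiliar with what a TF-IDF retriever does, here is a rough standard-library sketch. It is neither Cherche's API nor scikit-learn's implementation: the class `TfIdfRetriever` and its call signature are hypothetical, and tokenization is plain whitespace splitting. It also shows why the approach gets slow on very large corpora, since each query is scored against every document.

```python
import math
from collections import Counter


class TfIdfRetriever:
    """Toy TF-IDF retriever (hypothetical class, for illustration only)."""

    def __init__(self, documents):
        self.documents = documents
        tokenized = [doc.lower().split() for doc in documents]
        n_docs = len(documents)
        # Document frequency: number of documents containing each term.
        df = Counter(term for tokens in tokenized for term in set(tokens))
        # Smoothed inverse document frequency.
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
        # One L2-normalised sparse vector per document.
        self.vectors = [self._vectorize(tokens) for tokens in tokenized]

    def _vectorize(self, tokens):
        # Terms never seen in the corpus are ignored.
        counts = Counter(t for t in tokens if t in self.idf)
        weights = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        return {t: w / norm for t, w in weights.items()}

    def __call__(self, query, k=3):
        q = self._vectorize(query.lower().split())
        # Brute-force cosine similarity: one pass over the whole corpus per query.
        scores = [
            (i, sum(w * vec.get(t, 0.0) for t, w in q.items()))
            for i, vec in enumerate(self.vectors)
        ]
        matches = [(i, s) for i, s in scores if s > 0]
        # Highest score first; ties broken by document index.
        return sorted(matches, key=lambda pair: (-pair[1], pair[0]))[:k]
```

Calling `TfIdfRetriever(docs)("neural search")` returns a list of `(document index, score)` pairs, which is the kind of candidate list a ranker would then reorder.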

Search implements a wrapper around the Python Elasticsearch client that is scalable and dedicated to corpora of tens of millions of documents.


Thank you for these great resources.


Hi,

1) The dependency on the Elasticsearch Python client allows Elasticsearch to be used as a retriever. The same goes for Lunr. It might be interesting to separate the different dependencies.

2) Of course I'll update it.


I'm more used to reading than posting on Hacker News. I'll do better next time. :)


Knowledge graphs are structured resources, in the form of graphs, that contain knowledge. They are used in a large number of machine learning applications.

I just published a library dedicated to knowledge graph embeddings. The Mkb API is inspired by scikit-learn. It provides modular tools for building latent graph representations.
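
As an illustration of what a knowledge graph embedding model computes, here is a small sketch of TransE-style scoring, one well-known model in this family. It does not use Mkb and does not reflect its API: the function names are hypothetical, and the 2-dimensional vectors are hand-picked for the example rather than learned.

```python
import math


def transe_score(head, relation, tail):
    """TransE plausibility: negative L2 distance between head + relation and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_tails(head, relation, entities):
    """Rank candidate tail entities for (head, relation, ?) from most to least plausible."""
    scored = [(name, transe_score(head, relation, vector)) for name, vector in entities.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hand-picked toy embeddings (an assumption for illustration, not trained values).
entities = {
    "paris": [0.0, 0.0],
    "france": [1.0, 0.0],
    "berlin": [0.0, 1.0],
    "germany": [1.0, 1.0],
}
capital_of = [1.0, 0.0]

best_tail, best_score = rank_tails(entities["paris"], capital_of, entities)[0]
```

A triple is considered plausible when the head vector plus the relation vector lands close to the tail vector, so here the top-ranked tail for (paris, capital_of, ?) is the entity nearest to `[1.0, 0.0]`.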

