I built my personal search engine, which records things I like on Twitter, blog posts, etc. It automatically calls those APIs using GitHub Actions and stores them in an open source database (a JSON file).
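The update step can be sketched roughly like this: a scheduled job fetches new bookmarks and appends them to the JSON file. This is a minimal illustration, not the actual workflow; `fetch_bookmarks` and the file name are placeholders for the real API calls and repository layout.

```python
import json
from pathlib import Path

def fetch_bookmarks():
    # Placeholder: the real job would call the Twitter / blog APIs here.
    return [{"url": "https://example.com", "title": "Example post"}]

def update_database(path="database.json"):
    """Append newly fetched bookmarks to the JSON 'database' file."""
    db_path = Path(path)
    documents = json.loads(db_path.read_text()) if db_path.exists() else []
    known = {doc["url"] for doc in documents}
    # Only keep documents we have not stored before.
    new_docs = [doc for doc in fetch_bookmarks() if doc["url"] not in known]
    documents.extend(new_docs)
    db_path.write_text(json.dumps(documents, indent=2))
    return documents
```

A GitHub Actions workflow with a `schedule` trigger would then run this script and commit the updated file back to the repository.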
I actually use it at least twice a week to retrieve content I bookmarked, so I'm happy to have created such a tool.
I think 10 million documents is a large corpus. A retriever like scikit-learn's TF-IDF will have a hard time handling it in a reasonable time. The main goal of Cherche is to prototype a neural search engine quickly, with a large choice of retrievers and rankers, for corpora of fewer than 1 million documents, which is a common use case in industry.
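For scale, a bare TF-IDF retriever of the kind mentioned above looks like this in scikit-learn (the documents and `retrieve` helper are made up for illustration); every query is a similarity computation against the whole corpus, which is why it stops being practical at tens of millions of documents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for a real document collection.
documents = [
    "Cherche is a neural search pipeline.",
    "Knowledge graphs store structured facts.",
    "GitHub Actions can run scheduled workflows.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

def retrieve(query, k=2):
    """Return indices of the k documents most similar to the query."""
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, doc_vectors).ravel()
    return scores.argsort()[::-1][:k].tolist()
```

Here `retrieve("neural search")` ranks the first document highest, since it is the only one sharing the query terms.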
Cherche implements a wrapper around the Python Elasticsearch client that is scalable and dedicated to corpora composed of tens of millions of documents.
1) The dependency on the Python Elasticsearch client allows Elasticsearch to be used as a retriever. The same goes for Lunr. It might be interesting to separate the different dependencies.
Knowledge graphs are structured resources, in the form of graphs, that contain knowledge. These resources are used in a large number of applications linked to machine learning.
I just published a library dedicated to knowledge graph embeddings. The Mkb API is inspired by scikit-learn. It provides modular tools for building latent graph representations.
The app: https://raphaelsty.github.io/knowledge/?query=bayesian
The GitHub repo: https://github.com/raphaelsty/knowledge