
Often embeddings are not very good for comparing the similarity of texts. A cross-encoder might be a good alternative, perhaps as a second pass, since you already have the embeddings: https://www.sbert.net/docs/pretrained_cross-encoders.html Scoring pairwise, this can be quite slow, but as a second pass over a shortlist it can be much higher quality. Obviously this gets into LLM territory, but the language models for this can be small and more reliable than cosine similarity on embeddings.

