Embeddings alone are often not great at judging the similarity of two pieces of text.
A cross-encoder can be a good alternative, especially as a second pass, since you already have the embeddings for the first pass.
https://www.sbert.net/docs/pretrained_cross-encoders.html
Scoring pairs this way is quite slow (the model runs once per pair), but as a second pass over a short candidate list it can be much higher quality. Obviously this gets into LLMs' territory, but the language models used for cross-encoding can be small and more reliable than cosine similarity on embeddings.
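
Here's a minimal sketch of the retrieve-then-rerank pattern with sentence-transformers, assuming it's installed. The model names are examples pulled from the SBERT pretrained lists, not specific recommendations, and the documents/query are placeholders:

    from sentence_transformers import SentenceTransformer, CrossEncoder, util

    docs = [
        "The cat sat on the mat.",
        "Stocks fell sharply today.",
        "A kitten rested on the rug.",
    ]
    query = "a cat lying on a rug"

    # First pass: cheap bi-encoder retrieval via cosine similarity
    # over precomputed embeddings.
    bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
    doc_emb = bi_encoder.encode(docs, convert_to_tensor=True)
    query_emb = bi_encoder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, doc_emb, top_k=2)[0]

    # Second pass: slower cross-encoder scores each (query, doc)
    # pair jointly, which is usually more accurate than cosine.
    cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    pairs = [(query, docs[hit["corpus_id"]]) for hit in hits]
    scores = cross_encoder.predict(pairs)

    for hit, score in sorted(zip(hits, scores), key=lambda x: -x[1]):
        print(round(float(score), 3), docs[hit["corpus_id"]])

The point of the two stages is cost: the bi-encoder's pairwise comparisons are just dot products over cached vectors, so you only pay the cross-encoder's per-pair forward pass on the handful of candidates that survive the first pass.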