I asked about this some time ago: https://cs.stackexchange.com/questions/147713/why-word-embed...

There is another aspect: if all vectors are normalized, then cosine similarity is just the dot product, which is faster to compute than Euclidean distance.
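A minimal sketch of that point in Python/NumPy (the vectors here are random stand-ins for real embeddings):

    import numpy as np

    rng = np.random.default_rng(0)
    q = rng.normal(size=128)
    v = rng.normal(size=128)

    # Normalize both vectors to unit length.
    q /= np.linalg.norm(q)
    v /= np.linalg.norm(v)

    # Full cosine similarity: dot product divided by both norms.
    cos_sim = (q @ v) / (np.linalg.norm(q) * np.linalg.norm(v))

    # For unit vectors both norms are 1, so a bare dot product suffices.
    assert np.isclose(cos_sim, q @ v)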




I have to say, neither of those answers is very satisfying. Neither is wrong, but I don't come away with a clear explanation for why cosine similarity is the obvious metric to use.


For Euclidean distance, we have |q - v|^2 = q·q + v·v - 2 q·v. The q·q and v·v terms don't tell us anything about the relationship between q and v, so they are irrelevant to search. So really we are mostly trying to maximize the inner product q·v when we search.

Additionally, the scale of q (the query) is constant across any given search, so we might as well normalize q; the results won't change.

Then the only question is whether the scale of v matters. It, again, doesn't tell us anything about the relationship between q and v, so we might as well normalize v too. And then we are just doing cosine similarity. A quick check of this chain of reasoning is sketched below.
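Here is a small sketch (again NumPy, with made-up random data) checking that ranking normalized candidates by Euclidean distance gives the same ordering as ranking by dot product, i.e. cosine similarity:

    import numpy as np

    rng = np.random.default_rng(1)
    q = rng.normal(size=64)          # query
    V = rng.normal(size=(1000, 64))  # candidate vectors

    # Normalize the query and every candidate to unit length.
    q /= np.linalg.norm(q)
    V /= np.linalg.norm(V, axis=1, keepdims=True)

    # Rank candidates two ways: ascending Euclidean distance,
    # and descending dot product (= cosine similarity on unit vectors).
    by_euclidean = np.argsort(np.linalg.norm(V - q, axis=1))
    by_cosine = np.argsort(-(V @ q))

    # Since |q - v|^2 = 2 - 2 q·v on unit vectors, the orderings agree.
    assert np.array_equal(by_euclidean, by_cosine)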



