Cosine similarity is not affected by the size of the vector but only by the angle between them. This means that vectors with large or small values will have the same cosine similarity as long as they point in the same direction.
In semantic search the magnitude of vectors isn’t relevant - if you needed a measure that took magnitude into account than Euclidean would make sense. E.g. image embeddings based on pixel intensity.
It's not obvious to me that vector size isn't relevant in semantic search. What is it about the training process for semantic search that makes that the case?
In semantic search the magnitude of vectors isn’t relevant - if you needed a measure that took magnitude into account than Euclidean would make sense. E.g. image embeddings based on pixel intensity.