We are optimizing for latency, and vector search is sufficient in 80-90% of cases; 0.6s is about the threshold for an acceptable end-user experience. Hybrid search with SPLADE is marginally better, but it limits the number of human languages we can support. I am wondering when full-text search is better than vector search outside of very specific keywords.
Search latency isn’t much of a concern; I was speaking to quality but did not word it well.
We have just found that vector search does not play well with numbers and does not provide consistent results, so we end up needing more chunks, which results in compounding token usage, slower responses, and a higher chance of incorrect responses because the customer-facing model gets confused by similar results. I’m sure we could optimize our approach, but full-text search has worked far more reliably than expected, so we have invested more resources into how we handle documents, latency reduction, and pulling in structured data.
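
To illustrate the point about numbers, here is a rough sketch of lexical scoring using a small hand-rolled BM25, not anyone's production setup. The function names, the sample documents, and the `k1`/`b` values are all assumptions made for the example. The idea is that a numeric token either matches exactly or does not match at all, so near-miss numbers cannot drift into the top results the way they can with embeddings.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs, so "4471" stays its own token.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    # k1 and b are common textbook defaults, chosen here as an assumption.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    query_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue  # no exact token match, no contribution
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical documents: two invoices whose numbers differ by a digit swap.
docs = [
    "Invoice 4471 was issued for the annual plan",
    "Invoice 4417 was issued for the monthly plan",
    "Refund policy for annual plans",
]
scores = bm25_scores("invoice 4471", docs)
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {doc}")
```

The first document should rank highest because it is the only one containing the token `4471`; the second only matches on `invoice`, and the third shares no query terms. An embedding model may well place `4471` and `4417` close together, which is one plausible reason for the inconsistent results described above.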