So that depends entirely on how they implemented the feature. There are a few ways this could be working:
- They gave their chat interface the ability to run regular full-text searches against Dropbox - when you ask a question that can be answered by file content, it searches for relevant files and then copies just a few paragraphs of text into the prompt to the AI (there's a rough sketch of this just after the list).
- They might be doing this using embeddings-based semantic search. This would require them to create an embeddings vector index of all of your content and then run vector searches against that (see the second sketch below).
- If they're doing embeddings search, they might have calculated their own embeddings on their own servers... or they might have sent all of your content to OpenAI's embeddings API to calculate those vectors.
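To make the first option concrete, here's a minimal sketch of that search-then-prompt flow, assuming Python and OpenAI's chat API. The `search_dropbox_files` helper is a made-up stand-in for whatever full-text search Dropbox already runs, and the model name is a guess:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()

def search_dropbox_files(query, limit=3):
    # Hypothetical stand-in for Dropbox's existing full-text search.
    # In reality this would hit their search index and return matching snippets.
    return [{"path": "/notes/q3-planning.txt",
             "snippet": "...a few paragraphs of matching text..."}][:limit]

def answer_from_files(question):
    results = search_dropbox_files(question)
    # Copy just a few paragraphs of matching text into the prompt.
    context = "\n\n".join(r["snippet"] for r in results)
    response = client.chat.completions.create(
        model="gpt-4",  # assumed model, purely for illustration
        messages=[
            {"role": "system",
             "content": "Answer the question using only the provided file excerpts."},
            {"role": "user",
             "content": f"File excerpts:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```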
Without further transparency we can't tell which of these they've done.
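And for the embeddings route, a minimal sketch, assuming the vectors come from OpenAI's embeddings API and sit in a plain in-memory index. The chunks, model name and cosine-similarity search are all illustrative assumptions, not anything Dropbox has described:

```python
import numpy as np
from openai import OpenAI  # pip install openai

client = OpenAI()

def embed(texts):
    # Send text to OpenAI's embeddings endpoint; model choice is an assumption.
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])

# Hypothetical file chunks - in the real feature this would be every file's text.
chunks = ["Q3 planning notes ...", "Invoice from Acme Corp ...", "Offsite agenda ..."]
index = embed(chunks)  # shape: (number of chunks, embedding dimensions)

def semantic_search(query, top_k=3):
    # Cosine similarity between the query vector and every stored chunk vector.
    q = embed([query])[0]
    scores = (index @ q) / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(-scores)[:top_k]]

print(semantic_search("what did we agree at the offsite?"))
```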
My strong hunch is that they're using the first option, for cost reasons. Running embeddings is an expensive operation, but storing embeddings is even more expensive: to get fast results from an embedding vector store you need to keep it in dedicated RAM. Running that at Dropbox scale would be, I think, prohibitively expensive when you can get not-quite-as-good results from a traditional search index, which they have already built.
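A back-of-the-envelope sketch of why RAM is the sticking point; every number here (files per user, chunks per file, user count) is my own assumption, not a Dropbox figure:

```python
# All inputs are illustrative assumptions, not real Dropbox numbers.
files_per_user = 10_000        # assumed
chunks_per_file = 5            # assumed: ~5 retrievable text chunks per file
dimensions = 1536              # e.g. OpenAI's text-embedding-3-small
bytes_per_float = 4            # float32

bytes_per_user = files_per_user * chunks_per_file * dimensions * bytes_per_float
print(f"{bytes_per_user / 1024**2:.0f} MB of vectors per user")   # ~293 MB

users = 500_000_000            # assumed order of magnitude
print(f"{bytes_per_user * users / 1024**5:.0f} PB across all users")
```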
If they ARE sending every file through OpenAI's embeddings endpoint, that's a really big deal. It would be good if they would clarify!