
Author here. This section of the readme has more information: https://github.com/lastmile-ai/llama-retrieval-plugin#retrie...

It does use a vector database (Pinecone, Weaviate, etc.) to store embeddings. The embeddings are created using OpenAI's text-embedding-ada-002 model, but that's not a requirement. In fact, we are looking at generating embeddings with BERT or RoBERTa to benchmark performance.
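To make the indexing side concrete, here is a minimal sketch of the "embed and store" step. The `embed` function here is a toy hashing stand-in for a real embedding model such as text-embedding-ada-002, and a plain Python list stands in for a vector database like Pinecone or Weaviate — both are illustrative assumptions, not the plugin's actual code:

```python
import hashlib
import math

def embed(text: str, dim: int = 8) -> list[float]:
    # Toy stand-in for a real embedding model (e.g. text-embedding-ada-002):
    # hashes character trigrams into a fixed-size vector, then L2-normalizes.
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# In-memory "vector database" standing in for Pinecone/Weaviate:
# a list of (document, embedding) pairs built at indexing time.
documents = [
    "Llamas are domesticated South American camelids.",
    "Pinecone is a managed vector database.",
    "BERT is a transformer-based language model.",
]
store = [(doc, embed(doc)) for doc in documents]
```

Swapping in a real model only changes `embed`; the store interface (upsert a document with its vector, query by vector) stays the same.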

At prompt time, the plugin retrieves the stored embeddings nearest to the prompt's embedding, and inserts the corresponding text into a more complete prompt before sending it to the model.
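That retrieve-then-augment step can be sketched as follows. The hand-written 3-dimensional vectors and the `retrieve`/`build_prompt` helpers are hypothetical illustrations of the idea (nearest-neighbor search by cosine similarity, then prompt assembly), not the plugin's actual API:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, store, k=2):
    # Nearest-neighbor lookup, as the vector database would do server-side.
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(question: str, context_docs: list[str]) -> str:
    # Insert the retrieved text ahead of the user's question.
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Use the following context to answer.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Pretend these embeddings came from the same model used at indexing time.
store = [
    ("Llamas live in the Andes.", [0.9, 0.1, 0.0]),
    ("Pinecone stores vectors.", [0.1, 0.9, 0.0]),
    ("The sky is blue.", [0.0, 0.1, 0.9]),
]
query_vec = [0.85, 0.15, 0.05]  # hypothetical embedding of the user's prompt
prompt = build_prompt("Where do llamas live?", retrieve(query_vec, store, k=1))
```

The assembled `prompt` is what actually gets sent to the model; the model never sees the vectors themselves.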




Thank you! I'm looking into sentence-transformers also to create embeddings from documents.





