Hacker News

One way to get around context length limits is to perform embedding and retrieval over your entire corpus. Langchain (https://langchain.readthedocs.io/en/latest/) and Milvus (https://milvus.io) are one stack you can use for this.


Can you elaborate on how this works?


You run the corpus through an embedding model piecemeal, recording the model's representation of each chunk as a vector of floating-point numbers. Then, when performing a completion request, you first query the stored vectors for the chunks closest to the prompt and include the best matches as context.
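A minimal sketch of that flow, using a toy bag-of-words embedding in place of a real model (in practice you'd call an embedding API and store the vectors in something like Milvus; the names and chunks below are illustrative):

```python
import math
import re

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def embed(text, vocab):
    # Toy stand-in for a real embedding model: a normalized
    # bag-of-words vector over a shared vocabulary.
    words = tokenize(text)
    vec = [float(words.count(w)) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def retrieve(query, chunks, vocab, k=2):
    # Rank chunks by cosine similarity to the query; keep the top k.
    # A vector database does this step at scale with ANN indexes.
    q = embed(query, vocab)
    sims = [(sum(a * b for a, b in zip(q, embed(c, vocab))), c)
            for c in chunks]
    sims.sort(key=lambda pair: -pair[0])
    return [c for _, c in sims[:k]]

# Corpus chunks, embedded ahead of time in a real system.
chunks = [
    "Milvus is a vector database built for similarity search.",
    "LangChain chains calls into language model pipelines.",
    "The capital of France is Paris.",
]
query = "Which database should I use to store embedding vectors?"
vocab = sorted({w for text in chunks + [query] for w in tokenize(text)})

# Retrieve the closest chunk and prepend it to the completion prompt.
context = retrieve(query, chunks, vocab, k=1)
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + query
```

The key design point is that retrieval happens per request: only the nearest chunks are spent from the context budget, so the corpus itself can be arbitrarily large.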




