
API-wise, it looks very similar to the OpenAI Python SDK, but not quite the same. I was hoping I could swap out one client for the other. Can anyone confirm they're intentionally using an incompatible interface?
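For context, the kind of drop-in swap being asked about would look roughly like this with the OpenAI Python SDK. This is only a sketch: the base URL and model name are assumptions, not an endpoint Ollama actually documents here, which is exactly why the question comes up.

    # Hypothetical drop-in swap: point the existing OpenAI client at a
    # local server instead of api.openai.com. The base_url and model
    # name are illustrative assumptions, not a confirmed Ollama endpoint.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:11434/v1",  # assumed local endpoint
        api_key="not-needed-locally",
    )

    response = client.chat.completions.create(
        model="llama2",  # illustrative model name
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)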


There is an issue for this: [1]. I think it's more of a priority issue.

[1] https://github.com/ollama/ollama/issues/305


Same question here. Ollama is fantastic as it makes it very easy to run models locally, but if you already have a lot of code that processes OpenAI API responses (with retries, streaming, async, caching, etc.), it would be nice to be able to simply switch the API client to Ollama, without having to maintain a whole other branch of code that handles Ollama API responses. One way to do an easy switch is using the litellm library as a go-between, but it's not ideal.
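For reference, a minimal sketch of the litellm go-between mentioned above. The model name and local port are assumptions; check the litellm docs for the exact provider prefix your setup needs.

    # Sketch: litellm as a translation layer, keeping the OpenAI-style
    # messages/response shape while routing to a local Ollama model.
    # "ollama/llama2" and api_base are illustrative assumptions.
    from litellm import completion

    response = completion(
        model="ollama/llama2",
        messages=[{"role": "user", "content": "Summarize this in one line."}],
        api_base="http://localhost:11434",
    )
    print(response.choices[0].message.content)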

For an OpenAI-compatible API, my current favorite method is to spin up models using oobabooga TGW. Your OpenAI API code then works seamlessly by simply switching out the api_base to the ooba endpoint. Regarding chat formatting, even ooba's Mistral formatting has issues [1], so I am doing my own formatting in Langroid using HuggingFace's tokenizer.apply_chat_template [2].

[1] https://github.com/oobabooga/text-generation-webui/issues/53...

[2] https://github.com/langroid/langroid/blob/main/langroid/lang...

Related question: I assume ollama auto-detects and applies the right chat formatting template for a model?
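For anyone curious, building the prompt from the model's own chat template looks roughly like this with HuggingFace transformers; the model name here is just an example.

    # Sketch: let the tokenizer's bundled chat template do the formatting
    # instead of hand-rolling instruction markers. Model name is illustrative.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

    messages = [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Paris."},
        {"role": "user", "content": "And of Germany?"},
    ]

    # Returns the fully formatted prompt string (e.g. with [INST] ... [/INST]
    # markers for Mistral), ready to send to whatever server hosts the model.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    print(prompt)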


I've built exactly this, if you want to give it a try: https://github.com/lhenault/simpleAI



