Ollama now supports tool calling with popular models in local LLM (ollama.com)
81 points by thor-rodrigues on Aug 19, 2024 | 24 comments


Pretty sweet to get to run models locally and have more advanced usage like tool calling. Excited to try it out.


It's a great start, but there's a little more work to do for full OpenAI API compatibility, namely streaming support and the tool_choice parameter. Making it fully compatible would allow it to be swapped in directly to frameworks like langchain and magentic [1] (which I am building).

[1] https://github.com/jackmpcollins/magentic/issues/207


Where is `get_current_weather` implemented?

> Tool responses can be provided via messages with the `tool` role.


Not on the Ollama side.

This sample code shows how an implementation of a tool like `get_current_weather` might look in Python:

https://github.com/ollama/ollama-python/blob/main/examples/t...


Ah, I see. The model returns the name of an appropriate tool, then the client takes arbitrary action and appends the `tool` message to the chat context, and finally a second call to the model merges these together.

Part of me was hoping for some magic plugin space where I could drop named functions, but I couldn't imagine how.


You specify the API of your functions (inputs/description).

The LLM will decide which functions to call and with what values.

You perform the actual function execution.
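
A rough sketch of that three-step flow with the ollama-python client, based on the linked example (the weather tool, its stub implementation, and the model name here are illustrative):

    import json
    import ollama

    # 1. Describe the function's interface (name, description, parameters) to the model.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    def get_current_weather(city: str) -> str:
        # 3. You perform the actual execution; this is just a stub.
        return json.dumps({"city": city, "temperature_c": 21})

    messages = [{"role": "user", "content": "What is the weather in Toronto?"}]
    response = ollama.chat(model="llama3.1", messages=messages, tools=tools)

    # 2. The model decides which function to call and with what argument values.
    messages.append(response["message"])
    for call in response["message"].get("tool_calls") or []:
        if call["function"]["name"] == "get_current_weather":
            result = get_current_weather(**call["function"]["arguments"])
            # Feed the result back with the `tool` role so the model can answer.
            messages.append({"role": "tool", "content": result})

    final = ollama.chat(model="llama3.1", messages=messages)
    print(final["message"]["content"])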


Thanks.

If their announcement had included the schema of the return value of `response['message']['tool_calls']` it might've been more transparent to me.
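
Digging into the linked example, it appears to look roughly like this when the model requests a tool call (the values here are made up):

    # Rough shape of response['message'] when the model wants a tool call;
    # note the arguments arrive already parsed as a dict, not a JSON string.
    message = {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "function": {
                    "name": "get_current_weather",
                    "arguments": {"city": "Toronto"},
                }
            }
        ],
    }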


I see Command-R+ marked for tool use but not Command-R. The model is geared for it, it's much easier to fit on commodity hardware like 4090s, and Ollama's own description of it even mentions tool use. I think it's just not labeled for some reason. It works really well with the provided ollama-python package and with other tools that already brought function-calling capabilities via Ollama's API.

https://ollama.com/library/command-r


How does this compare to Agent Zero (frdel/agent-zero on GitHub)? Seems that provides similar functionality and uses docker for running the scripts / code generated.


Ollama provides an API endpoint that now supports the ability for an LLM to use tools/functions. Ollama is not a framework itself.

Agent Zero can already use Ollama and alternatives to run the LLMs, and this new feature should enable it to call tools more accurately, using the tool-calling support that is being built into the models that support it.


The first thing I think of when anyone mentions agent-like “tool use” is:

- Is the environment that the tools are run from sandboxed?

I’m unclear on when/how/why you’d want an LLM executing code on your machine or in a non-sandboxed environment.

Anyone care to enlighten?


The LLM just returns a method name and arguments to pass to it. Your code is in charge of actually executing it and then replying with an answer.


Well often the code at the end of the day just reads data from a database or processes it in some way that relies on moving bits around / operations that the LLM on its own cannot do.

IMO Tool is a bad word for the majority of the use cases ("calculator", "weather API"). It's more like giving the LLM an old school calculator + a constrained data retriever.

Because you, or somebody you entrust, knows every line of code in the functions ultimately called (at least at a high-ish level), you can do it and know it is only really receiving data, not taking arbitrary action.

Now, letting it rampantly run an arbitrary Python process etc., that'd be different; I suppose that fits in. But I think this is largely NOT how people are using tools, since if you do that, how do you ever usefully get the output of running it and apply that output?


It's "function calling" that's the even worst naming IMHO, as the point is that the LLM is not actually calling a function, but just proposes a function call... Who will out themselves as having come up with this confusion?


You can use it to feed extra context in, similar to RAG but allowing the LLM to "decide" what information it needs. I think it's mostly useful in situations where you want to add content that isn't semantically related, and wouldn't RAG well.

E.g. if I were making an AI that could suggest restaurants, I could just say "find a Mexican restaurant that makes Horchata", have it translate that to a tool call to get a list of restaurants and their menus, and then run inference on that list.
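
For concreteness, a hypothetical tool definition for that (the `find_restaurants` name and its parameters are made up; the model fills in the argument values from the prompt):

    # Hypothetical schema: from "find a Mexican restaurant that makes Horchata"
    # the model would be expected to emit something like
    # find_restaurants(cuisine="Mexican", menu_item="Horchata").
    find_restaurants_tool = {
        "type": "function",
        "function": {
            "name": "find_restaurants",
            "description": "Return a list of restaurants and their menus",
            "parameters": {
                "type": "object",
                "properties": {
                    "cuisine": {"type": "string"},
                    "menu_item": {"type": "string",
                                  "description": "A dish the menu must include"},
                },
                "required": ["cuisine"],
            },
        },
    }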

I also tinkered with a Magic: The Gathering AI that used tool calling to get the text and rulings for cards so that I could ask it rules questions (it worked poorly). It saves the user from having to remember some kind of markup to denote card names so I can pre-process the query.


> Is the environment that the tools are run from sandboxed?

It is up to the person who implements the tool to sandbox it as appropriate.

> I’m unclear on when/how/why you’d want an LLM executing code on your machine or in a non-sandboxed environment.

The LLM does not execute code on your computer. It returns the fact that it would like to execute a tool with certain parameters. You should trust those parameters as much as you trust the prompt and the LLM itself, which in practice probably ends up being "not much".

The good news is that in your tool implementation you can (and should) apply all the appropriate checks using regular coding practices. This is nothing new; we do this all the time with web requests. You can check whether the prompt originates from an authenticated user and whether they have the necessary permissions for the action they are about to take. You can throttle the requests, check that the inputs are appropriate, etc.

If the tool is side-effect free and there are no access restrictions, you can just run it easily. For example, imagine an LLM which can turn the household name of a plant into its Latin name. You would have a "look_up_latin_name" tool which searches in a local database. You have to make sure to follow best practices to avoid an SQL injection attack, but otherwise this should be easy.
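
A minimal sketch of such a side-effect-free tool, assuming a local SQLite table plants(common_name, latin_name) (the table and column names are made up); the parameterized query is what keeps the LLM-supplied value out of the SQL itself:

    import sqlite3

    def look_up_latin_name(common_name: str) -> str:
        # Side-effect-free tool: map a household plant name to its Latin name.
        conn = sqlite3.connect("plants.db")
        try:
            # Parameterized query: the value is bound, never spliced into the SQL string.
            row = conn.execute(
                "SELECT latin_name FROM plants WHERE common_name = ?",
                (common_name,),
            ).fetchone()
        finally:
            conn.close()
        return row[0] if row else "unknown"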

Now imagine a more sensitive situation: a tool with difficult-to-undo side effects and strict access controls, for example launching an ICBM attack. You would create a "launch_nukes" tool, but the tool wouldn't just launch willy-nilly. First of all, it would check that the prompt arrived directly from the president (how you do that is best discussed with your NSA rep in person). Then it would check that the parameter is one of the valid targets. But that is not enough yet. You want to make sure the LLM is not hallucinating the action, so you would pop up a prompt directly in the UI to confirm it. Something like "Looks like you want to destroy <target>. Do you want to proceed? <yes> <no>", and it would only launch when the president clicks yes.
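
Sketched as code, with the access check, parameter validation, and human confirmation all happening in ordinary code outside the LLM (every name and check here is illustrative):

    VALID_TARGETS = {"test-range-alpha", "test-range-bravo"}  # illustrative

    def launch_nukes(requester: str, target: str, confirm_via_ui) -> str:
        if requester != "president":         # access control (stubbed)
            return "refused: unauthorized requester"
        if target not in VALID_TARGETS:      # validate the LLM-supplied parameter
            return "refused: unknown target"
        # Human in the loop: never trust the model's say-so for an irreversible action.
        if not confirm_via_ui(f"Looks like you want to destroy {target}. Proceed?"):
            return "cancelled by user"
        return f"launch sequence started for {target}"  # the actual side effect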


It's up to the implementation to determine what running a tool actually means: "tool-use" means you can tell the LLM "you have these functions which take these options", and then it can output a magic stanza asking the code conversing with the LLM to invoke one of those functions with the given parameters.

You COULD do dangerous things, but it's not like the LLM is constructing code it runs on its own.


The given examples like checking weather or performing a nice clean mathematical operation seem more or less automatically safe. On the other hand, they talk about the ability to drive a web browser, which is decidedly less read-only and would also make me nervous.


My guess since programmer blog post writing (plus autism?) assumes “Everyone already knows everything about my project because I do!”

Is this to the effect of running a local LLM, that reads your prompt and then decides which correct/specialized LLM to hand it off to? If that is the case, isn’t it going to be a lot of latency to switch models back and forth as most people usually run the single largest model that will fit on their GPU?


No, this is a bit different. When GPT-4o came out, OpenAI also added new features that allow the models to perform actions. This allows you to do that, but locally.

The reason this is cool is that it allows you to integrate with things like Home Assistant, so you can ask your chat bot or whatever to actually take actions. "Hey bot, turn on the lights in the basement", as an example.
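
For the lights example, the tool the LLM sees might be described like this, with the actual work done by an ordinary HTTP call to Home Assistant's REST API (a sketch only; the host, token handling, and entity naming are made up):

    import requests

    turn_on_lights_tool = {
        "type": "function",
        "function": {
            "name": "turn_on_lights",
            "description": "Turn on the lights in a room of the house",
            "parameters": {
                "type": "object",
                "properties": {"room": {"type": "string", "description": "e.g. basement"}},
                "required": ["room"],
            },
        },
    }

    def turn_on_lights(room: str) -> str:
        # Ordinary code does the work; the LLM only picked the function and the room.
        resp = requests.post(
            "http://homeassistant.local:8123/api/services/light/turn_on",
            headers={"Authorization": "Bearer YOUR_LONG_LIVED_TOKEN"},
            json={"entity_id": f"light.{room}"},
            timeout=10,
        )
        return "ok" if resp.ok else f"error: {resp.status_code}"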


Nitpick, but function calling, which is essentially what Tools are (an earlier evolution), was released before GPT-4o, in June 2023.

https://openai.com/index/function-calling-and-other-api-upda...


No that's a good call out, I got my timing a bit off there.


LLMs are not a niche target and tool use is a major component. It's fair to say, as an author, I'm assuming the reader has some comprehension, whether that's from a widespread knowledge base or because the topic is only interesting to those with prior knowledge.

You wouldn't make this complaint against a JS framework blogposting about their new MVC features.

As an aside, it's actually incredible that these days we idly accuse people of being actual autists just because they didn't condescend to our level first.


> My guess since programmer blog post writing (plus autism?) assumes “Everyone already knows everything about my project because I do!”…

Really unnecessary and distasteful to speculate like this. Just ask your question if you don't understand something.

> Is this to the effect of running a local LLM,

Yes. That is what ollama does.

> that reads your prompt

Yes.

> and then decides which correct/specialized LLM to hand it off to?

No. It does not hand it off to a correct/specialized LLM (or at least that is not the interesting use case). It hands it off to a traditionally coded program, something written without any AI in it. This traditionally coded program does some job for the AI agent and then returns a result to it. The AI agent can then use the result to answer the prompt.

Imagine, as an example, a calculator. Imagine you want the AI to answer the following prompt: "How much will I have to pay if I bought an apple ($2) and two bananas ($3 each) and the sales tax is 3%?"

To answer that question the LLM has to perform three steps: 1) understand that the above text stands for (2+2*3)*1.03, 2) perform the arithmetic correctly, and 3) format the answer in an appropriate way (for example, "You will have to pay 8 dollars and a quarter for tax.").

You can try to train an LLM which does all 3 steps internally. It parses the input and outputs the output. But in general you will have a lot of trouble with that approach.

So instead you train the LLM to parse the prompt and output something like "<calculator (2+2*3)*1.03>". Then your UI intercepts this output from the LLM and recognises that it is asking for a tool to be used; in this case it is trying to use the "calculator" tool with the parameter "(2+2*3)*1.03". So your UI doesn't display anything to the user but passes "(2+2*3)*1.03" to a traditionally coded binary/script. That script calculates the result using normally programmed logic. Then the UI prompts the LLM again; this time the prompt contains the initial prompt text, the LLM's call for the tool, and the output of the tool. Now the LLM can see the right response in front of it and, using the full context of the original prompt, formats an answer.
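
That whole loop, sketched in Python (the <calculator ...> stanza format and the eval-based calculator are illustrative, not how Ollama actually encodes tool calls):

    import re

    def run_calculator(expression: str) -> str:
        # Traditionally coded logic: do the arithmetic the LLM can't do reliably.
        # eval() is fine for a toy sketch; a real tool would use a proper parser.
        return str(eval(expression, {"__builtins__": {}}, {}))

    def chat_with_tools(prompt: str, call_llm) -> str:
        # call_llm(text) -> the model's raw output; this is the host-side loop.
        output = call_llm(prompt)
        match = re.match(r"<calculator (.+)>", output.strip())
        if match:
            # Intercept the tool request instead of showing it to the user.
            result = run_calculator(match.group(1))   # "(2+2*3)*1.03" -> "8.24"
            # Second round trip: original prompt + the tool call + the tool result.
            output = call_llm(f"{prompt}\n{output}\nTool result: {result}")
        return output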

What can a tool do? Anything really. It can open the pod bay door. It can reach out to a database. It can use a geo api to plan a route between two cities. It can read a wikipedia entry. It can write to a knowledge base. It can activate a nuclear bomb. Whatever is appropriate in your use case.



