
> I think LLMs are getting better (well, better trained) at dealing with basic math questions, but you still need to help them

I feel like that's a fool's errand. Even back in the GPT-3 days you could get the LLM to return JSON and call your own calculator, which is a far more efficient way of dealing with it than getting a language model to also be a "basic calculator" model.

Luckily, tool usage is easier than ever, and adding a `calc()` function ends up being a really simple and precise way of letting the model focus on text plus general tool usage instead of combining many different domains.

Add a tool for executing Python code, and suddenly it gains much broader capabilities, without having to retrain or refine the model itself.
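
Something like this JSON-dispatch pattern, sketched in JavaScript. The `{"tool": "calc", "expression": ...}` shape and the character whitelist are assumptions for illustration, not any particular provider's API:

```javascript
// A rough sketch of the "return JSON, call your own calculator" pattern
// described above.

function calc(expression) {
  // Whitelist digits, arithmetic operators, parentheses and whitespace
  // before evaluating, so the model can't smuggle in arbitrary code.
  if (!/^[\d+\-*/().\s]+$/.test(expression)) {
    throw new Error("calc: unsupported expression");
  }
  return Function(`"use strict"; return (${expression});`)();
}

function handleModelResponse(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return raw; // plain-text answer, no tool call
  }
  if (parsed.tool === "calc") {
    return String(calc(parsed.expression)); // exact result, no LLM arithmetic
  }
  return raw;
}

console.log(handleModelResponse('{"tool":"calc","expression":"12.5 * (3 + 4)"}')); // "87.5"
```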

I personally think getting LLMs to deal better with numbers will go a long way toward making them more useful in different fields. I'm not an accountant, so I don't know how useful it would be there, but being able to say "here are some numbers; do this for scenario A and that for scenario B" and so forth might be useful.

Having said that, I do think models that favour writing code and using an "LLM interpretation layer" may make the most sense for the next few (or more) years.


Based on how humans operate, I’d say they should have a good “intuition” for approximate results, but use an external calculator for the exact numbers. Even if you can train it to be accurate, it’s going to be tremendously inefficient compared to calling out to some external service that can directly use the arithmetic hardware in the computer.

I agree, and this thread got me thinking about how I can package WASM in my chat app to execute LLM-generated code. I think a lot can be achieved today with a well-constructed prompt. For example, the prompt can say: if you are asked to perform a task like calculating numbers, write a program in JavaScript that can be compiled to WASM, and wait for the response before continuing.
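
Setting aside how the JavaScript-to-WASM build step would work, the execution side is fairly small. A minimal sketch, assuming the chat app ends up with a compiled .wasm binary somewhere; the exported function name (`main` here) is also an assumption:

```javascript
// Instantiate a compiled .wasm binary and call one of its exports.
// Assumes the build step (LLM code -> wasm) has already happened elsewhere.

async function runWasm(wasmBytes, ...args) {
  const { instance } = await WebAssembly.instantiate(wasmBytes, {});
  return instance.exports.main(...args);
}

// Usage, e.g. from the chat client:
// const bytes = await fetch("/sandbox/task.wasm").then(r => r.arrayBuffer());
// console.log(await runWasm(bytes, 3, 4));
```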

External tool use and general real-world integration seem to be really lacking currently. Maybe current models are still too limited, but it seems like they should be able to do much better if they weren't effectively running in a little jar.

Don't really need WASM for that - have you tried Claude Artifacts?

I am thinking about making it more versatile. I think having an LLM that can process WASM code could be extremely handy.

If only we had a function in JavaScript that could execute JavaScript code directly; we wouldn't need WASM then (assuming it's just you + the assistant locally).

I think the easiest and safest approach is to create a Docker image that can execute code, display everything in an iframe, and pass data back and forth between the LLM client and the execution server. I haven't looked at Claude Artifacts, but I suspect that's how it works.
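
For the iframe half, a rough sketch of the message passing. The message shapes are made up for illustration, and whether Artifacts actually works this way is pure guesswork:

```javascript
// Host page: run LLM-generated code in a sandboxed iframe and get the
// result back via postMessage. "allow-scripts" without "allow-same-origin"
// puts the code in a unique origin, away from the host's cookies/storage.

const frame = document.createElement("iframe");
frame.setAttribute("sandbox", "allow-scripts");
frame.src = "sandbox.html"; // hypothetical page that evals what it receives
document.body.appendChild(frame);

window.addEventListener("message", (e) => {
  if (e.source === frame.contentWindow) {
    console.log("result from sandbox:", e.data);
  }
});

function runInSandbox(code) {
  frame.contentWindow.postMessage({ code }, "*");
}

// Inside sandbox.html, the counterpart would be roughly:
// window.addEventListener("message", (e) => {
//   let result;
//   try { result = eval(e.data.code); } catch (err) { result = String(err); }
//   e.source.postMessage(result, "*");
// });
```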

I thought he was hinting at using eval.
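
A minimal sketch of that, only reasonable when you trust where the code comes from end to end:

```javascript
// eval() runs a string as JavaScript in the current scope. Fine for a
// single local user who trusts the model's output; dangerous anywhere else.

const llmGeneratedCode = "[1, 2, 3].reduce((sum, n) => sum + n, 0)";
console.log(eval(llmGeneratedCode)); // 6
```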

To make a long story short, LLM responses can be manipulated in my chat app (I want this for testing/cost reasons), so it's not safe to trust the LLM-generated code. I guess I could make it so modified LLM responses are never executed.

However, if the chat app were designed to be used by a single user, eval'ing would not be an issue.
