
> I think LLMs are getting better (well, better trained) at dealing with basic math questions, but you still need to help them

I feel like that's a fool's errand. Even back in the GPT-3 days you could get the LLM to return JSON and call your own calculator, which is a far more efficient way of dealing with it than getting a language model to also be a "basic calculator" model.

Luckily, tool usage is easier than ever, and adding a `calc()` function ends up being a really simple and precise way of letting the model focus on text plus general tool usage instead of combining many different domains.

Add a tool for executing Python code, and suddenly it gains much broader capabilities, without having to retrain or refine the model itself.
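
Something like this JSON-dispatch pattern, sketched in JavaScript. The `{"tool": "calc", "expression": ...}` shape and the character whitelist are assumptions for illustration, not any particular provider's API:

```javascript
// A rough sketch of the "return JSON, call your own calculator" pattern
// described above.

function calc(expression) {
  // Whitelist digits, arithmetic operators, parentheses and whitespace
  // before evaluating, so the model can't smuggle in arbitrary code.
  if (!/^[\d+\-*/().\s]+$/.test(expression)) {
    throw new Error("calc: unsupported expression");
  }
  return Function(`"use strict"; return (${expression});`)();
}

function handleModelResponse(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return raw; // plain-text answer, no tool call
  }
  if (parsed.tool === "calc") {
    return String(calc(parsed.expression)); // exact result, no LLM arithmetic
  }
  return raw;
}

console.log(handleModelResponse('{"tool":"calc","expression":"12.5 * (3 + 4)"}')); // "87.5"
```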

I personally think getting LLMs to deal better with numbers will go a long way toward making them more useful in different fields. I'm not an accountant, so I don't know how useful it would be there, but being able to say "here are some numbers; do this for scenario A and that for scenario B" and so forth might be useful.

Having said that, I do think models that favour writing code and using an "LLM interpretation layer" may make the most sense for the next few (or more) years.


Based on how humans operate, I’d say they should have a good “intuition” for approximate results, but use an external calculator for the exact numbers. Even if you can train it to be accurate, it’s going to be tremendously inefficient compared to calling out to some external service that can directly use the arithmetic hardware in the computer.

I agree, and this thread got me thinking about how I can package WASM in my chat app to execute LLM-generated code. I think a lot can be achieved today with a well-constructed prompt. For example, the prompt can say: if you are asked to perform a task like calculating numbers, write a program in JavaScript that can be compiled to WASM, and wait for the response before continuing.
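
Setting aside how the JavaScript-to-WASM build step would work, the execution side is fairly small. A minimal sketch, assuming the chat app ends up with a compiled .wasm binary somewhere; the exported function name (`main` here) is also an assumption:

```javascript
// Instantiate a compiled .wasm binary and call one of its exports.
// Assumes the build step (LLM code -> wasm) has already happened elsewhere.

async function runWasm(wasmBytes, ...args) {
  const { instance } = await WebAssembly.instantiate(wasmBytes, {});
  return instance.exports.main(...args);
}

// Usage, e.g. from the chat client:
// const bytes = await fetch("/sandbox/task.wasm").then(r => r.arrayBuffer());
// console.log(await runWasm(bytes, 3, 4));
```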

External tool use and general real-world integration seem to be really lacking currently. Maybe current models are still too limited, but it seems like they should be able to do much better if they weren't effectively running in a little jar.

Don't really need WASM for that - have you tried Claude Artifacts?

I am thinking about making it more versatile. I think having an LLM that can process WASM code could be extremely handy.

If only we had a function in JavaScript that could execute JavaScript code directly; we wouldn't need WASM then (assuming it's just you + the assistant locally).

I think the easiest and safest approach is to create a Docker image that can execute code, display everything in an iframe, and pass data back and forth between the LLM client and the execution server. I haven't looked at Claude Artifacts, but I suspect that's how it works.
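
For the iframe half, a rough sketch of the message passing. The message shapes are made up for illustration, and whether Artifacts actually works this way is pure guesswork:

```javascript
// Host page: run LLM-generated code in a sandboxed iframe and get the
// result back via postMessage. "allow-scripts" without "allow-same-origin"
// puts the code in a unique origin, away from the host's cookies/storage.

const frame = document.createElement("iframe");
frame.setAttribute("sandbox", "allow-scripts");
frame.src = "sandbox.html"; // hypothetical page that evals what it receives
document.body.appendChild(frame);

window.addEventListener("message", (e) => {
  if (e.source === frame.contentWindow) {
    console.log("result from sandbox:", e.data);
  }
});

function runInSandbox(code) {
  frame.contentWindow.postMessage({ code }, "*");
}

// Inside sandbox.html, the counterpart would be roughly:
// window.addEventListener("message", (e) => {
//   let result;
//   try { result = eval(e.data.code); } catch (err) { result = String(err); }
//   e.source.postMessage(result, "*");
// });
```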

I thought he was hinting at using eval.
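
A minimal sketch of that, only reasonable when you trust where the code comes from end to end:

```javascript
// eval() runs a string as JavaScript in the current scope. Fine for a
// single local user who trusts the model's output; dangerous anywhere else.

const llmGeneratedCode = "[1, 2, 3].reduce((sum, n) => sum + n, 0)";
console.log(eval(llmGeneratedCode)); // 6
```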

To make a long story short, LLM responses can be manipulated in my chat app (I want this for testing/cost reasons), so it's not safe to trust the LLM-generated code. I guess I could make it so modified LLM responses are never executed.

However, if the chat app were designed to be used by a single user, eval'ing would not be an issue.
