Hacker News | swax's comments

I’ve been using the same OneNote for like 20 years now. It is backed up, synced and available no matter what device I’m on. Hundreds of lists, notes, references, thoughts, all fully searchable and quick to access.


It's an unsettling feeling as what's more complicated - all the atoms and galaxies, trillions of life forms, the unimaginable distances of our universe OR a relatively simple world model that is our conscious experience and nothing else.


Throwing this out there: I have a command line driver for LLMs. There are lots of little tricks in there to adapt the CLI and make it amenable to LLMs, like interrupting a long-running process periodically and asking the LLM if it wants to kill it or keep waiting (rough sketch below), or allowing the LLM to use and understand apps that use the alternate screen buffer (to some degree).

Overall I try to keep it as thin a wrapper as I can. The better the model, the less wrapper is needed. It's a good way to measure model competence. The code is here: https://github.com/swax/NAISYS and there are example context logs here: https://test.naisys.org/logs/

I have agents built with it that research content on the web, run Python scripts, update the database, maintain a website, etc., all through the CLI; if an agent calls an API, it does it with curl. Example agent instructions here: https://github.com/swax/NAISYS/tree/main/agents/scdb/subagen...
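To make the kill-or-wait trick concrete, here's a rough sketch of the idea in TypeScript. This is not the actual NAISYS code, and llmDecide is a made-up stand-in for a real model call: spawn the command, buffer its output, and every so often show the LLM what has happened and ask whether to keep waiting or kill the process.

    import { spawn } from "node:child_process";

    type Decision = "wait" | "kill";

    // Stand-in for a real model call; stubbed so the sketch runs as-is.
    async function llmDecide(prompt: string): Promise<Decision> {
      console.log(prompt); // in a real agent this would go to the model instead
      return "wait";
    }

    async function runWithCheckIns(command: string, checkEveryMs = 30_000): Promise<string> {
      const child = spawn(command, { shell: true });
      let output = "";
      child.stdout.on("data", (chunk) => (output += chunk));
      child.stderr.on("data", (chunk) => (output += chunk));

      return new Promise((resolve) => {
        // Periodically interrupt and let the LLM decide: keep waiting or kill it.
        const timer = setInterval(async () => {
          const decision = await llmDecide(
            `The command "${command}" is still running. Output so far:\n${output}\n` +
            `Reply "wait" to keep waiting or "kill" to terminate it.`
          );
          if (decision === "kill") child.kill();
        }, checkEveryMs);

        // When the process ends (on its own or killed), stop polling and return its output.
        child.on("exit", () => {
          clearInterval(timer);
          resolve(output);
        });
      });
    }

In practice the check interval, the wording of the decision prompt, and how much output to show the model are the knobs that end up mattering.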


The one thing I always wonder is how varied those interactions with an agent really are. My workflow is enough of a routine that I just write scripts and create functions and aliases to improve ergonomics. Anything that has to do with interacting with the computer can be automated.


Yea, a lot of this is experimental. I basically have plain text instructions per agent, with the agents all talking to each other, coordinating and running an entire pipeline to do what would typically be hard coded. There are definite pros and cons: a lot of unpredictability of course, but also resilience and flexibility in the ways they can work around unexpected errors.


> It's a good way to measure model competence.

Can you elaborate?


Sure, so you tell the model: here's a command prompt, what do you type next? Ideally, it types commands, but a lesser model may just type what it's thinking, which is invalid. You can give it an out with a 'comment' command, but some models will forget about that. The next biggest problem is fake output: it types not just 'cat file.txt' but also the following command prompt and fake output for the file.
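To give a flavor of catching that, here's a rough sketch (not the actual NAISYS code; the prompt pattern is just an assumed format for illustration): split the model's reply at the first line that looks like another shell prompt, keep the real commands, and flag the invented prompt-plus-output.

    // Rough sketch: trim a model reply at the first fabricated shell prompt.
    // The prompt pattern is an assumption for illustration (e.g. "agent@naisys:~$").
    const PROMPT_PATTERN = /^[\w.-]+@[\w.-]+:\S*\$/;

    function splitReply(reply: string): { commands: string; hallucinated: string | null } {
      const lines = reply.split("\n");
      // First line where the model starts role-playing the shell itself.
      const fakeIndex = lines.findIndex((line) => PROMPT_PATTERN.test(line.trim()));
      if (fakeIndex === -1) {
        return { commands: reply, hallucinated: null };
      }
      return {
        commands: lines.slice(0, fakeIndex).join("\n"),
        hallucinated: lines.slice(fakeIndex).join("\n"), // fake prompt + invented file contents
      };
    }

    // Example: the model runs `cat file.txt` and then invents the file's contents.
    console.log(splitReply("cat file.txt\nagent@naisys:~$ hello world"));
    // -> { commands: "cat file.txt", hallucinated: "agent@naisys:~$ hello world" }

The real handling is more involved, but the principle is the same: never feed the model's invented output back to it as if it were real.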

The biggest mark of intelligence is whether it can continue a project long-term over multiple contexts and sessions. Like 'build me a whole website to do x': many AIs are very good at one-shotting something, but not at continually working on the same thing, maintenance, and improvement. Basically, after that good one shot, the AI starts regressing, the project gets continually broken by changes, and the architecture becomes convoluted.

My plan is not to change NAISYS that much; I'm not going to continually add crutches and scaffolding to handhold the AI; it's the AI that needs to get better, and the AI has improved significantly since I mostly finished the project last year.


Thanks for trying it. If you click near the edge you should be able to drag it somewhere else.


I'm working on a Sketch Comedy Database website:

https://www.sketchtv.lol/

https://github.com/swax/SCDB

Just a fun little CRUD app built with Next.js, MUI, Prisma Postgres. I'm adding Halloween sketches now; if you know some good ones, feel free to add them, or anything else :)


AI advancement is coming at us from both directions - orders of magnitude more compute, with orders of magnitude more efficiency. Hyper-exponential.


The efficiency has not improved all that much, and when you multiply two exponential growth curves the result is still exponential.
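To spell that out: e^(a·t) · e^(b·t) = e^((a+b)·t), which is still a single exponential with a larger rate constant, not something growing faster than exponential.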

Though even when you add the efficiency improvements I think we're still lagging behind Moore's Law overall.


I only hope it brings about more integration of our vast amounts of data instead of more generative inaccuracy.


I have an open source project that is basically that (https://naisys.org/). From my testing, it feels like AI is pretty close as it is to acting autonomously. Opus is noticeably more capable than GPT-4, and I don't see how next-gen models won't be even more so.

These AIs are incredible when it comes to question/answer, but with simple planning they fall apart. I feel like it's something that could be trained for more specifically, but yea, you quickly end up in a situation where you're nervous to go to sleep while an AI works unsupervised on some task.

They tend to go off on tangents very easily. Like one time it was building a web page, it tried testing the wrong URL, thought the web server was down, ripped through the server settings, then installed a new web server, before I shut it down. AI, like computer programs, works fast, screws up fast, and compounds its errors fast.


> They tend to go off on tangents very easily. Like one time it was building a web page, it tried testing the wrong URL, thought the web server was down, ripped through the server settings, then installed a new web server, before I shut it down.

At least it just decided to replace the web server, not itself. We could end up in a sorcerer’s apprentice scenario if an AI ever decides to train more AI.


And you just know people will create AI to do that deliberately anyway.


> it feels like AI is pretty close as it is to acting autonomously

> with simple planning they fall apart

They are not remotely close to acting autonomously. Most don't act well at much of anything beyond gimmicky text generation. This hype is so overblown.


The step changes in autonomy from GPT-3 to GPT-4 to Opus are very obvious and significant. From my point of view, given the kinds of dumb mistakes they make, it's really just a matter of training and scaling. If I had access to fine-tune or scale these models I would love to, but it's going to happen anyway.

Do you think these step changes in autonomy have stopped? Why?


> Do you think these step changes in autonomy have stopped? Why?

They feel like they are asymptotically approaching just a bit better quality than GPT-4.

Given that every major lab except Meta is saying "this might be dangerous, can we all agree to go slow and have enforcement of that to work around the prisoner's dilemma?", this may be intentional.

On the other hand, because nobody really knows what "intelligence" is yet, we're only making architectural improvements by luck, and then scaling them up as far as possible before the money runs out.

Either of those alone would be sufficient to explain it.


But training just allows it to replicate what it's seen. It can't reason, so I'm not surprised it goes down a rabbit hole.

It's the same when I have a conversation with it, then tell it to ignore something I said and it keeps referring to it. That part of the conversation seems to affect its probabilities somehow, throwing it off course.


Right, that this can happen should be obvious from the transformer architecture.

The fact that these things work at all is amazing, and the fact that they can be RLHF'ed and prompt-engineered to current state of the art is even more amazing. But we will probably need more sophisticated systems to be able to build agents that resemble thinking creatures.

In particular, humans seem to have a much wider variety of "memory banks" than the current generation of LLMs, which only have "learned parameters" and a "context window".


Humans are also trained on what they’ve ‘seen’. What else is there? Idk if humans actually come up with ‘new’ ideas or just hallucinate on what they’ve experienced in combination with observation and experimental evidence. Humans also don’t do well ‘ignoring what’s been said’ either. Why is a human ‘predicting’ called reasoning, but an AI doing it is not?


Because a human can understand from first principles, while current AIs are lazy and don't unless pressed. See, for example, the suggestions to make bleach smoothies, etc.


> But training just allows it to replicate what it's seen.

Two steps deeper: even a mere Markov chain replicates the patterns rather than being limited to pure quotation of the source material, and attention mechanisms do something more, something that at least superficially looks like reasoning.

Not, I'm told, actually Turing complete, but still much more than mere replication.
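As a toy illustration of that first point, nothing to do with how LLMs work internally, just a minimal bigram Markov chain: it only learns which word tends to follow which, yet it can emit sequences that never appear verbatim in the source text.

    // Minimal bigram Markov chain: learn word -> possible next words, then sample.
    function buildChain(text: string): Map<string, string[]> {
      const words = text.trim().split(/\s+/);
      const chain = new Map<string, string[]>();
      for (let i = 0; i < words.length - 1; i++) {
        const next = chain.get(words[i]) ?? [];
        next.push(words[i + 1]);
        chain.set(words[i], next);
      }
      return chain;
    }

    function generate(chain: Map<string, string[]>, start: string, maxWords: number): string {
      const out = [start];
      for (let i = 0; i < maxWords; i++) {
        const options = chain.get(out[out.length - 1]);
        if (!options) break;
        out.push(options[Math.floor(Math.random() * options.length)]);
      }
      return out.join(" ");
    }

    // Trained on two sentences, it can produce "the cat sat on the rug",
    // which appears in neither one verbatim: pattern replication, not quotation.
    const chain = buildChain("the cat sat on the mat the dog sat on the rug");
    console.log(generate(chain, "the", 6));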

> It's the same when I have a conversation with it, then tell it to ignore something I said and it keeps referring to it. That part of the conversation seems to affect its probabilities somehow, throwing it off course.

Yeah, but I see that a lot in real humans, too. I've noticed others doing that since I was a kid.

Not that this makes the LLMs any better or less annoying when it happens :P


This might be a dumb question, but did you ever try having it introspect into its own execution log, or perhaps a summary of its log?

I also have a tendency to get sidetracked, and the only remedy is to force myself to occasionally pause what I'm doing and reflect, usually during a long walk.


Yea, there are some logs here: https://test.naisys.org/logs/

Inter-agent tasks are a fun case. Sometimes it works out, but a lot of the time the agents just end up going back and forth talking, expanding the scope endlessly, scheduling 'meetings' that will never happen, etc.

A lot of AI 'agent systems' right now add a ton of scaffolding to corral the AI towards success. The scaffolding is inversely proportional to the sophistication of the model: GPT-3 needs a ton; Opus needs a lot less.

With a real autonomous AI, you should just be able to give it a command prompt and a task and it can do the rest: managing its own notes, tasks, goals, reports, etc., just like any of us would if given a command shell and a task to complete.

Personally I think it's just a matter of the right training. I'm not sure if any of these AI benchmarks focus on autonomy, but if they did maybe the models would be better at autonomous tasks.


> Inter-agent tasks are a fun case. Sometimes it works out, but a lot of the time the agents just end up going back and forth talking, expanding the scope endlessly, scheduling 'meetings' that will never happen, etc.

sounds like "a straight shooter with upper management written all over it"


Sometimes I'll tell two agents very explicitly to share the work: "you work on this, the other should work on that." And one of the agents ends up delegating all its work to the other, constantly asking for updates and coming up with more dumb ideas to pile onto the other agent, who doesn't have time to do anything productive given the flood of requests.

What we should do is train AI on self-help books like 'The 7 Habits of Highly Effective People'. Let's see how many paperclips we get out of that.


I suspect it's a matter of context: one or both agents forget that they're supposed to be delegating. ChatGPT's "memory" system, for example, is a workaround, but even then it loses track of details in long chats.


Opus seems to be much better at that. Probably why it’s so much more expensive. AI companies have to balance costs. I wonder if the public has even seen the most powerful, full fidelity models, or if they are too expensive to run.


Right, but this is also a core limitation in the transformer architecture. You only have very short-term memory (context) and very long-term memory (fixed parameters). Real minds have a lot more flexibility in how they store and connect pieces of information. I suspect that further progress towards something AGI-like might require more "layers" of knowledge than just those two.

When I read a book, for example, I do not keep all of it in my short-term working memory, but I also don't entirely forget what I read at the beginning by the time I get to the end: it's something in between. More layered forms of memory would probably allow us to return to smaller context windows.


I mean, we have contexts now so large they dwarf human short-term memory, right?

And then in terms of reading a book, a model's training could be updated with the book, right?


I've been working on something similar; here's one of the same tests they ran, where the AI learns how to make a hidden-text image.

https://www.youtube.com/watch?v=dHlv7Jl3SFI

The real problem is coherence (logic and consistency over time), which is what these wrappers try to address. I believe AI could probably be trained to be a lot more coherent out of the box, working with minimal wrapping; that is the AI I worry about.


Amalgamating the pieces is the test. LLMs really can't do it that well.

Is it context size, training, the number of iterations, the number of agents? That's what people are experimenting with to improve the results.


Using my open source project NAISYS, a context-friendly command shell wrapper, to compare how well different AI models can build a website from scratch.

https://github.com/swax/NAISYS

