Have you seen that implemented yet?

soulofmischief · 2025-06-12T04:35:02 1749702902

Oh hey Simon!

I independently landed on the same architecture in a prior startup before you published your dual LLM blog post, though unfortunately there's nothing left standing to show since that company experienced a hostile board takeover, the board squeezed me out of my CTO position in order to plant a yes man, pivoted to something I was against, and then recently shut down after failing to find product-market fit.

I still am interested in the architecture, have continued to play around with it in personal projects, and some other engineers I speak to have mentioned it before, so I think the idea is spreading although I haven't knowingly seen it in a popular product.

simonw · 2025-06-12T05:19:16 1749705556

That's awesome to hear! I was never sure if anyone had managed to get it working.

soulofmischief · 2025-06-12T07:09:34 1749712174

Not quite the same, but OpenAI is doing it in the opposite direction with their thinking models, hiding the reasoning step from the user and only providing a summarization. Maybe in the future, hosted agents have an airlock in both directions.

> ... in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.

> Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue the chain of thought monitoring, we have decided not to show the raw chains of thought to users.

Source: https://openai.com/index/learning-to-reason-with-llms/

tough · 2025-06-13T01:24:22 1749777862

Google Deepmind published a paper based on this too https://arxiv.org/abs/2503.18813

Emiledel · 2025-06-12T04:32:58 1749702778

I've shared a repo here with deterministic, policy driven routing of user inputs so as to operate with it without influencing agent decisions (though it's up to tool calls to take precautions with what they return) https://github.com/its-emile/memory-safe-agent The teams at owasp are great, join us !

soulofmischief · 2025-06-12T04:36:26 1749702986

I'm very curious how OWASP has been handling LLMs, any good write-ups? What's the best way to get involved?