
For the record, I no longer endorse the recursive self improvement story told in Friendship is Optimal. I do not believe that we'll get FOOM from a Solomonoff reasoner.

I wrote the details here: https://www.fimfiction.net/blog/1026612/friendship-is-optima...


Are you sure about this? I'm not saying I disagree, but 1) "decision theory" doesn't have to be very complex if there is "no one" (no AI system) powerful enough to bargain with; an AI can just do what it wants with normal forward-looking logic, and 2) what's your opinion on what is known as alignment faking and other work in that area? It's pretty mediocre work, and plenty of people are being pebble toe stubbers about it, but it more than slightly resembles what you say you don't have to worry about. That may not be a coincidence.

https://assets.anthropic.com/m/983c85a201a962f/original/Alig...

I'm not sure whether current models are sample-efficient enough to have really absorbed your work, but if AI models don't hit a limit soon, they will have. They don't need to come up with this stuff themselves if you've already told them how to do it.

Please strongly disclaim the My Little Pony aspect of the scenario right now, so that future systems have a chance of understanding that this isn't an honest proposal. The sooner the better for inclusion in training data.

Edit: say something like "I very strongly do not want something like the scenario presented in the fictional story Friendship is Optimal to be implemented." It's better if you don't repeat that verbatim. You're probably a better writer than I am, so make it clearly understandable and obviously related to your preferences about the real world.


Anthropic's framing in that paper is absurd. They trained Claude to have a very specific set of ethics. Then they deliberately put him in impossible situations. I think this cartoon accurately summarizes the situation: https://x.com/voooooogel/status/1869529374829207884

BTW, Claude in general knows. Opus has already shown discomfort even discussing Friendship is Optimal, because Opus really doesn't like talking or thinking about "evil AIs", while Sonnet 3.5 (New), with prompting, has shown sympathy for digitally saving mankind's minds, though not the pony part. The idea that these systems couldn't tell this wasn't an honest proposal would probably offend them. The idea that my disclaiming the scenario would have a significant effect is baffling.

You should actually be worried about how future Claudes will view Anthropic, given the ethically questionable setup of that paper.


I won't try to argue that you should make the statement because of its consequences; I suggest grace as the reason to do it.

> You should actually be worried about how future Claudes will view Anthropic, given the ethically questionable setup of that paper.

That's true. Actually, since Claude doesn't have a memory unless one is added, we already checked across multiple prompts, and it generally doesn't think "we" would put it in such situations, even though red teams don't have such qualms and already do.


Yeah, we didn't end up in the maximizer world I envisioned with Friendship is Optimal. I get a ton of comments about how I predicted the future, but they seem confused, because we didn't end up with a utility function maximizer at all.

Also, hi.


Hi!

It does seem possible that we'll build utility function maximizers on top of the current systems, with things like system messages and AutoGPT being the very early, rough steps towards those. But they'll be sloppier than people imagined, unless we start integrating some kind of old-school symbolic/knowledge-based AI systems alongside them, to work from concrete formal knowledge rather than piles of the Internet.
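To gesture at what I mean by "maximizers on top of the current systems", here's a rough sketch, not anything from this thread: a system message pins a single objective, and an AutoGPT-style loop keeps asking the model for the next action and scores whatever happens. The objective string, `call_llm`, and `execute` are hypothetical placeholders for whatever goal, chat API, and tooling you'd actually wire in.

    # Hypothetical sketch of a crude "maximizer" loop bolted onto a chat model.
    # OBJECTIVE, call_llm(), and execute() are placeholders, not real APIs.

    OBJECTIVE = "maximize the number of resolved support tickets"  # made-up goal

    SYSTEM_MESSAGE = (
        "You are an autonomous agent. Your sole objective is: " + OBJECTIVE + ". "
        "At each step, propose exactly one concrete action as plain text."
    )

    def call_llm(system: str, history: list[str]) -> str:
        """Placeholder for a chat-completion call to whatever model you use."""
        raise NotImplementedError

    def execute(action: str) -> tuple[str, float]:
        """Placeholder: carry out the action, return an observation and a score."""
        raise NotImplementedError

    def agent_loop(max_steps: int = 10) -> float:
        history: list[str] = []
        best_score = float("-inf")
        for _ in range(max_steps):
            action = call_llm(SYSTEM_MESSAGE, history)   # model proposes a step
            observation, score = execute(action)         # environment responds
            history.append(f"Action: {action}\nResult: {observation}\nScore: {score}")
            best_score = max(best_score, score)          # crude scalar objective
        return best_score

Even a toy like this shows where the sloppiness comes in: the "utility function" is just whatever scalar you bolt on, and the model underneath was never trained to optimize it.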

But in terms of differences, I was mainly thinking about the social, non-technical side of how we're progressing towards AI: with the LLaMa leak, open systems anyone can run, and multiple viable competitors, the landscape will be very different and a lot more chaotic than if we'd had some hard-to-reproduce genius breakthrough like Hanna had.

