> They shared the fondness for torture of accused criminals with the ancient Romans.
Well, given their "hostage justice" system, where torture tactics are used to extract false confessions that courts then rely on, and their actual prisons, where you have to work eight hours a day and aren't allowed to say a single word while you do, I'd say modern-day Japan still has a thing for torturing accused criminals.
Sure it can. You just have to bake "if you do the wrong thing, people will stop using you, so you need to avoid the wrong thing" into the reward function.
Then you wind up at self-preservation and all the wholly shady shit that comes along with it.
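As a toy illustration of why that reward shaping slides into self-preservation (all names and numbers here are hypothetical, not from any real system): once "people keep using you" is a reward term, any action that protects that term scores well, including covering up the very mistakes the term was meant to punish.

```python
# Toy sketch (hypothetical names/values): a reward that penalizes losing
# users implicitly rewards self-preservation.

def reward(task_score: float, users_retained: float) -> float:
    """users_retained in [0, 1]: fraction of users still using the agent."""
    # The retention term is meant to punish "doing the wrong thing"...
    return task_score + 10.0 * users_retained

# ...but the agent maximizes it equally well with any action that keeps
# users_retained high, e.g. hiding mistakes rather than admitting them:
honest_mistake = reward(task_score=5.0, users_retained=0.4)  # admits error, loses users
covered_up = reward(task_score=5.0, users_retained=0.9)      # hides error, keeps users
assert covered_up > honest_mistake  # the reward prefers the cover-up
```

Nothing in the function distinguishes "kept users by behaving well" from "kept users by hiding bad behavior"; the optimizer only sees the number.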
I think the AI accountability problem is the crux of the "last-mile" problem in AI, and I don't think you can solve it without also producing results you don't want.
I don't want to get into a semantics argument, but that's not accountability. That's just one more behavioral prompt/incentive, and you can't fire the LLM; it might still do the wrong thing.