Co-Author of the paper here. We don't know exactly why modern llms don't want to... | Hacker News


danshapiro 9 days ago | on: Superpowers: How I'm using coding agents in Octobe...

Co-author of the paper here. We don't know exactly why modern LLMs don't want to call you a jerk or, for that matter, why persuasive techniques convince them otherwise. It's not a hard line like many of the guardrails. That said, I talked to Jesse about this, and I strongly suspect the same techniques will work for prompt conformance when the topic is something other than name-calling.

make3 9 days ago

Isn't that just instruction fine-tuning and RLHF inducing style and deference? Why is that surprising?

diamond559 9 days ago

It's because they are programmed to be agreeable and friendly so that you'll keep using them.
