Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

For as long as LLMs are a blackbox prompt injection will never be fully solved. Prompt injection is an alignment problem.


Would you (or someone) define "alignment" in this context? Or in general?


I'll take a stab at the other poster's meaning.

"Alignment" is broadly going to be: how do we ensure that AI remains a useful tool for non-nefarious purposes and doesn't become a tool for nefarious purposes? Obviously it's an unsolved problem because financial incentives turn the majority of current tools into nefarious ones (for data harvesting, user manipulation, etc.).

So without solving prompt injection, we can't be sure that alignment is solved - PI can turn a useful AI into a dangerous one. The other poster kind of implies that it's more like "without solving alignment we can't solve PI", which I'm not sure makes as much sense... except to say that they're both such colossal unsolved problems that it honestly isn't clear which end would be easier to attack.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: