stainablesteel
on June 19, 2024
on:
Refusal in language models is mediated by a single...
If I'm understanding everything correctly, the abliteration concept scouts the model for something similar to the "direction" described in this paper, and then blocks it in order to "uncensor" the LLM.
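To make that concrete, here is a rough sketch of what "finding a direction and blocking it" could look like on toy vectors. It assumes the direction is estimated as the difference between mean activations on harmful and harmless prompts, and that blocking means projecting that direction out of an activation. The function names and the 3-dimensional toy data are made up for illustration; a real setup would work on a model's residual-stream activations rather than hand-written lists.

```python
import math

def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def refusal_direction(harmful_acts, harmless_acts):
    """Assumed estimator: normalized difference of the two activation means."""
    mean_harmful = mean_vector(harmful_acts)
    mean_harmless = mean_vector(harmless_acts)
    return unit([a - b for a, b in zip(mean_harmful, mean_harmless)])

def ablate(activation, direction):
    """Remove the component of `activation` that lies along the unit `direction`."""
    proj = sum(a * d for a, d in zip(activation, direction))
    return [a - proj * d for a, d in zip(activation, direction)]

# Hypothetical toy activations: the two groups differ only along the first axis.
harmful = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
harmless = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

r_hat = refusal_direction(harmful, harmless)
x = [5.0, 2.0, -1.0]
x_ablated = ablate(x, r_hat)
```

After the projection, `x_ablated` has no component along `r_hat`, while the other coordinates are left alone. That is the sense in which the direction is "blocked" instead of the model being retrained.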