stainablesteel on June 19, 2024 \| on: Refusal in language models is mediated by a single...

If I'm understanding everything correctly, the abliteration concept scans the model for something similar to the "direction" described in this one, and blocks it in order to "uncensor" the LLM.
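
To make the "find a direction and block it" idea concrete, here is a rough toy sketch. It assumes the direction is estimated as the difference of mean activations between "harmful" and "harmless" prompts and is then projected out of each activation. The function names and the random data are hypothetical and only for illustration; this is not the code of any actual abliteration tool.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along the difference of the two mean activations."""
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def ablate(acts, direction):
    """Subtract each activation's component along the unit vector `direction`."""
    return acts - np.outer(acts @ direction, direction)

# Toy stand-in for residual-stream activations (assumption: 8 dimensions,
# with the "harmful" set shifted along the first axis).
rng = np.random.default_rng(0)
d = 8
shift = np.zeros(d)
shift[0] = 3.0
harmless = rng.normal(size=(100, d))
harmful = rng.normal(size=(100, d)) + shift

r_hat = refusal_direction(harmful, harmless)
cleaned = ablate(harmful, r_hat)

# After ablation, no activation has a component left along r_hat
# (up to floating-point error).
print(np.abs(cleaned @ r_hat).max())
```

In a real model the same projection would presumably be applied to hidden states (or folded into the weights that write to them) rather than to random vectors, but the linear algebra is the same.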