> When used with the prompt below, the honesty vector doesn't change the model's behavior—instead, it changes the model's judgment of someone else's behavior! This is the same honesty vector as before—generated by asking the model to act honest or untruthful! [...] How do you explain this?
Isn't the control vector just pushing text generation towards the concept of honesty/dishonesty? An LLM is 'just' a text generator, so you get added honesty/dishonesty irrespective of where in the bot/human conversation the text generation is occurring?
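That matches how these steering hooks work mechanically, at least in the simple case. Here's a minimal sketch (not the article's actual implementation; the layer index, strength, and the random stand-in direction are all illustrative) showing that the vector gets added to the hidden states at *every* token position, so it biases generation the same way whether the model is speaking as the bot or judging someone else's words:

```python
# Sketch: steering one decoder layer's hidden states with a fixed "honesty"
# direction. The hook fires for all positions, with no notion of whose turn
# it is in the conversation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.1"  # illustrative choice
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

layer_idx = 15    # which residual stream to steer (assumption, mid-depth layer)
strength = 2.0    # positive -> more "honest", negative -> more "dishonest"

# Stand-in for a real trained direction (e.g. a PCA component over
# honest-vs-untruthful activations); random here purely for illustration.
honesty_dir = torch.randn(model.config.hidden_size)
honesty_dir = honesty_dir / honesty_dir.norm()

def steer(module, inputs, output):
    # Decoder layers return a tuple; output[0] is the hidden states of
    # shape (batch, seq_len, hidden). Add the direction at every position.
    hidden = output[0]
    hidden = hidden + strength * honesty_dir.to(hidden.device, hidden.dtype)
    return (hidden,) + output[1:]

handle = model.model.layers[layer_idx].register_forward_hook(steer)
# ... run model.generate() as usual; the nudge applies regardless of
# whether the tokens being produced belong to the bot's persona or to
# its commentary on someone else's behavior.
handle.remove()
```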
I agree. A more sophisticated model might have two or more characters to follow, narrating each differently... which kind of brings a concept of character slots into the dimension space.