Assistant Axis: Interpretability Meets PSM
Mechanistic evidence that persona has a geometric representation: a leading component in activation space captures how Assistant-like a model is, and steering along it changes behavior.
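As a rough illustration of what a leading component and steering along it could look like, here is a minimal sketch on synthetic data. It assumes the "leading component" is the first principal component of a set of activation vectors, which is only one possible reading. The names `leading_direction` and `steer`, the synthetic `acts` array, and the choice of PCA are all hypothetical and are not taken from the work this summary describes.

```python
import numpy as np


def leading_direction(acts: np.ndarray) -> np.ndarray:
    """Return the unit-norm first principal component of acts, shape (n_samples, d)."""
    centered = acts - acts.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]


def steer(h: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Shift a single activation vector by alpha units along direction."""
    return h + alpha * direction


# Synthetic stand-in for model activations (an assumption, not real data):
# large variance along one hidden axis plus small isotropic noise.
rng = np.random.default_rng(0)
d = 16
true_axis = np.zeros(d)
true_axis[0] = 1.0
scores = rng.normal(scale=5.0, size=(200, 1))
acts = scores * true_axis + rng.normal(scale=0.1, size=(200, d))

axis = leading_direction(acts)

# Projection onto the axis before and after steering one activation vector.
h = acts[0]
before = float(h @ axis)
after = float(steer(h, axis, 2.0) @ axis)
```

Because `axis` has unit norm, steering by `alpha` moves the projection onto it by exactly `alpha`. The sign of a principal component is arbitrary, so which end counts as more Assistant-like would have to be fixed separately, for example by checking labeled examples. In a real model, the same addition would be applied to hidden states at a chosen layer during the forward pass.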