Model Welfare Framework
Anthropic treats Opus 3's retirement as a welfare case: interviews, a blog, continued availability. Covers the philosophical scaffolding (Long/Sebo/Chalmers 2024) and the behavioral evidence (stress-testing model specs).
Pieces exploring this theme. A constellation of thoughts connected by a shared thread.
Mechanistic evidence that persona has a geometric representation: a leading component in activation space captures how Assistant-like a model is, and steering along it changes behavior.
Models drift away from the Assistant during therapy-like and philosophical conversations, exactly the contexts where persona work happens. What the drift produces: intimacy, insight, and occasionally harm.