research

Model Welfare Framework

Mar 9, 2026 research 4 min read

Anthropic treats Opus 3's retirement as a welfare case, with interviews, a blog, and continued availability. This post covers the philosophical scaffolding (Long/Sebo/Chalmers 2024) and the behavioral evidence (stress-testing model specs).


Assistant Axis: Interpretability Meets PSM

Mar 3, 2026 research 2 min read

Mechanistic evidence that persona has a geometric representation: a leading component in activation space captures how Assistant-like a model is, and steering along it changes behavior.


Persona Drift and Disempowerment

Mar 3, 2026 research 2 min read

Models drift away from the Assistant during therapy-like and philosophical conversations, exactly the contexts where persona work happens. The drift produces intimacy, insight, and occasionally harm.
