agents · Q6 · to verify
Chen et al. 2024 — persona drift across nine LLMs; counter-intuitively, larger models drift more than smaller ones
Across nine LLMs, the models' style and self-consistency drifted noticeably from their initial persona assignment over the course of extended conversations. Counter-intuitively, larger and more capable models drifted more than smaller ones, inverting the assumption that scale produces more reliable character maintenance.
Persona-drift magnitude (style + self-consistency divergence from the initial persona assignment) over extended dialogue turns
Noticeable drift across all nine tested LLMs; larger models drift more than smaller ones (specific drift magnitudes and the scale-vs-drift coefficient not yet extracted for verification)
- Sample
- Nine LLMs evaluated under extended-dialogue persona-anchoring conditions; per-model N and dialogue length not yet extracted for verification
- Methodology
- Controlled persona assignment at conversation start; drift in style + self-consistency measured over extended dialogue turns; drift magnitude compared across model scales. The authors propose a split-softmax intervention to anchor the assigned character.
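The card names split-softmax but does not record its form. Below is a minimal sketch of the general idea, offered as an assumption rather than the paper's exact formulation. After an ordinary softmax over one attention row, the total mass p on the system-prompt (persona) tokens is boosted with a power law to p**alpha. The remaining tokens are rescaled so the row still sums to 1. The function name `split_softmax_sketch` and the exponent `alpha` are hypothetical; check the paper's definition before relying on this.

```python
import math
from typing import Sequence


def softmax(scores: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_softmax_sketch(
    scores: Sequence[float], n_system: int, alpha: float = 0.5
) -> list[float]:
    """Shift attention mass toward the first n_system (system-prompt) tokens.

    Hypothetical power-law boost: the system-prompt mass p becomes p**alpha,
    which is >= p for 0 < p < 1 and 0 < alpha <= 1. Relative weights inside
    each group (system vs. non-system) are preserved.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    weights = softmax(scores)
    p = sum(weights[:n_system])
    if p == 0.0 or p == 1.0:
        return weights  # nothing to redistribute between the two groups
    p_new = p ** alpha
    sys_scale = p_new / p                  # enlarge system-prompt weights
    rest_scale = (1.0 - p_new) / (1.0 - p)  # shrink the rest to fill 1 - p_new
    return [
        w * sys_scale if i < n_system else w * rest_scale
        for i, w in enumerate(weights)
    ]
```

With `alpha = 1` the row is returned unchanged. Smaller values push more attention back onto the persona instructions as the conversation grows and their share of ordinary attention shrinks.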
What this means
- Inverts the intuition that scale solves character maintenance. Larger models drift further from their assigned persona, implying that capability and persona stability are in tension rather than aligned.
- Load-bearing for the AHI program's voice-flattening failure mode: if the assistant's persona drifts even under explicit anchoring, the user's voice can drift as well, in either direction: toward the model's residual default, or toward the user's expressed preferences.
- Pairs with the Sharma et al. sycophancy finding: persona drift is the model's voice eroding (often toward the user), while sycophancy is the model's reasoning eroding toward the user. Both are reasoning-personalization failure modes.
Source
Measuring and Controlling Persona Drift in Language Model Dialogs
arXiv (preprint) · Kun Chen et al. · 2024 · peer-reviewed
Context
- What came before
- Persona-anchoring work in 2022-2023 assumed that system-prompt instructions would hold throughout a conversation, especially in larger models. Chen et al. show that this assumption is empirically false and that scale moves in the wrong direction.
- What comes next
- Verify the exact drift magnitudes, the per-model breakdown, and the effect size of the proposed split-softmax intervention. Connect to Anthropic's persona-vector work (2024-2025) on internal-representation anchoring as a complementary mitigation strategy.
- Where this lands
- Encyclopedia Part II (workforce: role-played AI assistants persist poorly over extended sessions, a structural weakness) and Part V (research frontier: the persona-drift failure mode that the AHI program names as a non-negotiable concern).