peopleanalyst


agents · Q6 · to verify

Chen et al. 2024 — persona drift across nine LLMs; counter-intuitively, larger models drift more than smaller ones

Across nine LLMs, model style and self-consistency drifted noticeably away from the persona assigned at the start of the conversation as dialogues grew longer. Counter-intuitively, larger and more capable models drifted more than smaller ones, inverting the assumption that scale produces more reliable character maintenance.

Measure
Persona-drift magnitude (style + self-consistency divergence from the initial persona assignment) over extended dialogue turns
Finding
Noticeable drift across all nine tested LLMs; larger models drift more than smaller ones (specific drift magnitudes and the scale-vs-drift coefficient not yet extracted for verification)
Sample
Nine different LLMs evaluated under extended-dialogue persona-anchoring conditions; per-model N and dialogue length not yet extracted for verification
Methodology
Controlled persona assignment at conversation start; measured drift in style + self-consistency over extended dialogue turns; compared drift magnitude across model scales. Also proposed a split-softmax intervention to anchor the persona.

What this means

  • Inverts the intuition that scale solves character maintenance. Among the models tested, the larger the model, the more it drifted from its assigned persona, implying that capability and persona stability are in tension rather than aligned.
  • Load-bearing for the AHI program's voice-flattening failure mode: if the assistant's persona drifts even under explicit anchoring, the user's voice can drift with it, in either direction (toward the model's residual default, or toward the user's expressed preferences).
  • Pairs with the Sharma et al. sycophancy finding: persona drift is the model's voice eroding (often toward the user), while sycophancy is the model's reasoning eroding toward the user. Both are reasoning-personalization failure modes.

Source

Measuring and Controlling Persona Drift in Language Model Dialogs

arXiv (preprint) · Kun Chen et al. · 2024 · peer-reviewed

Context

What came before
Persona-anchoring work in 2022-2023 assumed that system-prompt instructions would hold throughout a conversation, especially in larger models. Chen et al. show that this assumption fails empirically and that scale pushes in the wrong direction.
What comes next
Verify the exact drift magnitudes, the per-model breakdown, and the effect size of the proposed split-softmax intervention. Connect to Anthropic's persona-vector work (2024-2025) on internal-representation anchoring as a complementary mitigation strategy.
Where this lands
Encyclopedia Part II (workforce — practical persistence of role-played AI assistants in extended sessions is structurally weak), Part V (research frontier — the persona-drift failure mode the AHI program names as a non-negotiable concern).