agents · Q6 · to verify
METR 2025 — experienced open-source developers on familiar large repos are slower with AI coding tools than without
In a 2025 study by METR, experienced open-source developers working on large repositories they knew intimately were measurably slower completing tasks with AI coding tools than without — directly inverting the canonical 'AI makes developers faster' assumption in the high-expertise + high-context-specificity regime.
Task-completion time with AI vs. without AI for experienced developers on familiar large open-source repositories
Experienced developers were *slower* with AI tools (sign reversed from the controlled-task benchmark). Exact magnitude not yet extracted (to verify).
- Sample
- Experienced developer cohort; exact N not yet extracted (to verify).
- Methodology
- Design not yet confirmed: either a within-subject or a treatment/control comparison of experienced developers working on large, familiar repositories with and without AI coding tools.
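To make "sign reversed" concrete, here is a minimal sketch of how a with-AI vs. without-AI time comparison yields a signed effect. The function name `relative_time_change` and every number are hypothetical illustrations, not METR's data or estimator. The benchmark line also assumes that the Peng et al. 55.8% figure is read as a reduction in completion time.

```python
from statistics import geometric_mean


def relative_time_change(times_with_ai, times_without_ai):
    """Relative change in geometric-mean completion time.

    Positive: slower with AI. Negative: faster with AI.
    """
    return geometric_mean(times_with_ai) / geometric_mean(times_without_ai) - 1.0


# Hypothetical completion times in minutes, NOT data from the METR study.
with_ai = [60.0, 90.0, 120.0]
without_ai = [50.0, 75.0, 100.0]
slowdown = relative_time_change(with_ai, without_ai)

# Assumed reading of the Peng et al. figure: 55.8% less time with AI.
benchmark = relative_time_change([44.2], [100.0])

print(f"hypothetical familiar-repo case: {slowdown:+.1%}")
print(f"controlled-task benchmark:       {benchmark:+.1%}")
```

A positive value in the first line and a negative value in the second is what a sign inversion looks like under this metric. The actual magnitude for the METR study remains to be verified.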
What this means
- This is the single result that most cleanly inverts the Peng et al. 55.8% benchmark. It indicates that the 'AI helps' generalization breaks down in the high-expertise, high-context-specificity regime that describes most production engineering work.
- Maps directly onto the AHI institutional-economics reading: when asset specificity (here, repo-specific tacit knowledge) is high, AI generation does not compose well with the verification + integration work that production code requires.
- Critical counter-evidence for the encyclopedia's Part I §1.3 honesty register — without this, the methodology-gap argument leans too heavily on the controlled-task literature.
Source
(2025 study; exact title and URL to verify — referenced in AHI institutional-economics topic review)
METR (Model Evaluation & Threat Research) · METR research team · 2025 · peer-reviewed
Context
- What came before
- The 55.8% Copilot speedup (Peng et al. 2023) and the +14% NBER customer-support gain (Brynjolfsson et al. 2023) had established an 'AI substantially raises productivity' narrative. METR 2025 directly inverts the sign for the experienced-developer-on-familiar-repo case.
- What comes next
- Verify METR's exact title, URL, N, and effect-size estimate (this is the AHI review citation [19], but the precise publication is not given in the review's bibliography). Connect to the Stray two-year Copilot null and the AHI longitudinal-cognitive-effects review's 'measurement instrument matters' synthesis.
- Where this lands
- Encyclopedia Part I §1.3 (methodology gap), Part IV (product/operations — agentic coding limitations), Part V (research frontier — sign-inversion findings).