agents · Q6 · to verify
Stray et al. — two-year professional Copilot study finds no statistically significant change in commit-based activity
A two-year longitudinal case study of professional developers adopting GitHub Copilot found no statistically significant post-adoption change in commit-based activity metrics — one of the cleanest long-horizon professional results in the literature, and a direct constraint on claims that AI coding assistants produce large measurable productivity shifts at the commit-history level.
Pre-vs-post-Copilot-adoption change in commit-based activity metrics (commit frequency / volume / structure): No statistically significant change post-adoption. Exact metric definitions and effect-size estimates not extracted to verification.
- Sample
- Professional developer cohort tracked across two years; exact N not extracted to verification.
- Methodology
- Two-year longitudinal case study with pre/post Copilot-adoption telemetry analysis.
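The study's exact statistical procedure was not extracted to verification, so the following is only a minimal sketch of one common way to test a pre/post change: a paired sign-flip permutation test on per-developer differences. The function name `paired_sign_flip_test`, the choice of test, and every number below are illustrative assumptions, not the authors' method or data.

```python
import random
from statistics import mean


def paired_sign_flip_test(pre, post, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired differences.

    Hypothetical helper for illustration only; the study's actual
    test is not known from the source.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post need one value per developer")
    # Per-developer change in the activity metric (post minus pre).
    diffs = [b - a for a, b in zip(pre, post)]
    observed = mean(diffs)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null of no adoption effect, the sign of each
        # developer's difference is equally likely to be + or -.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Synthetic, made-up mean weekly commit counts for eight developers.
pre_commits = [12.0, 8.5, 15.0, 9.0, 11.5, 7.0, 13.0, 10.0]
post_commits = [11.5, 9.0, 14.0, 9.5, 12.5, 6.5, 13.5, 10.0]

mean_diff, p = paired_sign_flip_test(pre_commits, post_commits)
print(f"mean change: {mean_diff:+.4f} commits/week, p = {p:.3f}")
```

A large p-value here would only mean the data are compatible with no change. It does not show that the effect is zero, which is why the missing effect-size estimates and N matter for interpreting the reported null.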
What this means
- Most direct long-horizon null result on Copilot's effect on professional developer output, and a critical counterweight to short-horizon controlled-task findings such as the reported 55.8% completion-time speedup.
- Implies the productivity literature's headline numbers may be artifacts of the lab/task setting rather than translating to commit-history-level macro changes.
- Pairs with Sergeyuk's two-year IDE-telemetry work and the METR 2025 'experienced devs slower on familiar repos' finding to support a 'productivity gains depend on context, expertise, and measurement instrument' synthesis.
Source
(Title to verify — two-year Copilot adoption case study)
arXiv preprint (cited as 'cleanest professional longitudinal design' in AHI longitudinal-cognitive-effects review) · Stray et al. · 2024 · peer-reviewed
Context
- What came before
- Microsoft Research's 2023 Copilot-developer-productivity work reported a 55.8% completion-time gain on a controlled JavaScript task; the implicit narrative was that Copilot would produce similar gains at the professional-codebase scale.
- What comes next
- Verify exact N, exact pre/post telemetry definitions, and whether the null holds when broken down by developer expertise or codebase type. Connect to the METR 2025 finding (experienced developers on familiar repos slower with AI); together they suggest that expertise and repo familiarity dampen or reverse AI productivity gains.
- Where this lands
- Encyclopedia Part I §1.3 (methodology gap — measurement-instrument dependence), Part IV (product/operations/decision-support), Part V (research frontier).