peopleanalyst


agents · Q6 · to verify

Microsoft Research / GitHub 2023 — developers with Copilot complete a JavaScript task 55.8% faster than control

In a controlled-task experiment, developers with access to GitHub Copilot completed an HTTP-server JavaScript task 55.8% faster than developers in the no-Copilot control group. This is the benchmark short-horizon, controlled-task productivity figure, and nearly every subsequent discussion of AI-coding productivity references it.

Task-completion time on a controlled HTTP-server-in-JavaScript task, Copilot treatment vs no-Copilot control: 55.8% faster (Copilot group vs control)
Sample
Controlled-task experiment; the exact number of developers (N) has not yet been extracted for verification (the AHI review references it but does not restate it).
Methodology
Randomized controlled experiment with developers assigned to Copilot or no-Copilot conditions; outcome was time-to-completion on a defined HTTP-server-in-JavaScript task.
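
A minimal sketch of how a "percent faster" figure of this kind can be derived from time-to-completion data, assuming the headline number is the relative reduction in mean completion time (treatment vs control). That reading is an assumption to confirm against the paper. The function names and the completion times below are hypothetical, invented for illustration, and are not the study's data.

```python
from statistics import mean


def percent_faster(treatment_minutes, control_minutes):
    """Relative reduction in mean completion time, as a percent of the control mean.

    Assumption: "X% faster" means the treatment group's mean time is X% lower
    than the control group's mean time.
    """
    control_mean = mean(control_minutes)
    treatment_mean = mean(treatment_minutes)
    return 100.0 * (control_mean - treatment_mean) / control_mean


def implied_speed_ratio(percent_reduction):
    """Convert a percent reduction in time into a control/treatment time ratio."""
    return 100.0 / (100.0 - percent_reduction)


if __name__ == "__main__":
    # Hypothetical completion times in minutes (illustrative only).
    control = [150.0, 160.0, 170.0]
    treatment = [70.0, 80.0, 90.0]

    reduction = percent_faster(treatment, control)
    print(f"{reduction:.1f}% faster")  # prints "50.0% faster"
    print(f"{implied_speed_ratio(reduction):.2f}x")  # prints "2.00x"
```

Under the same assumption, a 55.8% reduction in completion time corresponds to the control group taking about 2.26 times as long as the Copilot group. Stating which of the two framings is meant matters when the figure is set beside field-study effects reported on a different scale.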

What this means

  • The most-cited single number in the AI-coding productivity literature. It sets the upper-bound expectation that subsequent longitudinal and naturalistic studies (Stray two-year null; METR 2025, experienced developers slower) fail to replicate at larger scale.
  • Should be presented alongside the Stray null and the METR slowdown, so that the point that the effect depends on context, expertise, and measurement instrument is made honestly.
  • Provides the institutional-economic baseline for the transaction-cost-compression argument — short-horizon controlled-task generation costs do fall substantially; the question is whether that translates into firm-level outcomes.

Source

The Impact of AI on Developer Productivity: Evidence from GitHub Copilot

Microsoft Research / GitHub · Sida Peng et al. · 2023 · peer-reviewed

Context

What came before
Pre-2023 Copilot-effectiveness discourse was largely qualitative / anecdotal. The 55.8% controlled-task result was the first definitive controlled-experiment number.
What comes next
Verify exact N (developers per condition), exact task design, and whether the experiment included any post-task comprehension probe. Pair with Song / Agarwal / Wen 2024 (+5.9% OSS contributions — much smaller field-setting effect) and Stray two-year null to triangulate the gap between controlled-task and naturalistic measurement.
Where this lands
Encyclopedia Part I §1.3 (methodology gap — controlled-task vs naturalistic measurement), Part IV (product/operations — AI coding agents).