Tuesday, July 15, 2025

The effect of AI tools on coding

Joel Becker et al., "Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity", METR 7/10/2025:

Abstract: Despite widespread adoption, the impact of AI tools on software development in the wild remains understudied. We conduct a randomized controlled trial (RCT) to understand how AI tools at the February–June 2025 frontier affect the productivity of experienced open-source developers. 16 developers with moderate AI experience complete 246 tasks in mature projects on which they have an average of 5 years of prior experience. Each task is randomly assigned to allow or disallow usage of early-2025 AI tools. When AI tools are allowed, developers primarily use Cursor Pro, a popular code editor, and Claude 3.5/3.7 Sonnet. Before starting tasks, developers forecast that allowing AI will reduce completion time by 24%. After completing the study, developers estimate that allowing AI reduced completion time by 20%. Surprisingly, we find that allowing AI actually increases completion time by 19%—AI tooling slowed developers down. This slowdown also contradicts predictions from experts in economics (39% shorter) and ML (38% shorter). To understand this result, we collect and evaluate evidence for 20 properties of our setting that a priori could contribute to the observed slowdown effect—for example, the size and quality standards of projects, or prior developer experience with AI tooling. Although the influence of experimental artifacts cannot be entirely ruled out, the robustness of the slowdown effect across our analyses suggests it is unlikely to primarily be a function of our experimental design.

(See also this version…)

Posted at Language Log by Mark Liberman along with comments by Liberman and others. For example, one Rick Rubenstein said:

I have to admit I've been surprised that the ceiling for generative AI so far has turned out to be somewhere at the top edge of "hack" level. Until recently my hunch was that Doug Hofstadter was essentially right: the hard part was getting computers to match the level of not-especially-clever people; getting them from there to Mozart/Einstein/Shakespeare would just be a matter of degree.

But no, AI is proving more than capable of generating not-actually-good-but-not-laughably-bad output in all sorts of fields. We detect AI by its "slopness", not by its incompetence. It seems clear to me that there's little future for human hack illustrators, hack novelists, hack programmers, hack songwriters. AI is perfectly suited to creating the 90% part of Sturgeon's Law. But thus far I haven't seen anything that looks like it's cracked that top 10% — and tellingly, I don't really hear genAI's hypsters claiming it either.
