By Salary Hub · Updated June 2026

How Much Time AI Saves by Task (2026 Data, From Published Studies)

Most AI productivity claims round the numbers up. We pulled the actual per-task savings from the published studies (Peng et al. at GitHub, Noy & Zhang at MIT, Brynjolfsson et al. at NBER, and Dell'Acqua et al. at HBS/BCG) and put them in one table.

By Salary Hub — AI Impact on Work · Updated 2026-06-20 · Educational only — not career, tax, or legal advice.

If you've read any LinkedIn post about AI productivity in the last twelve months, you've seen claims like "AI makes you 10x faster" or "ChatGPT cuts work in half." Those numbers are mostly made up. The actual measured savings — from controlled studies with real workers — are smaller, more uneven, and far more interesting. Some tasks see 56% time reductions. Others see zero. A few are slower with AI than without.

This page collects the per-task numbers from the studies that actually ran the experiments. The headline figure most people cite — 55% faster coding — comes from Peng et al.'s 2023 GitHub Copilot study, where developers writing an HTTP server in JavaScript finished in 71 minutes with Copilot vs. 161 minutes without. That's a real result. But it's one task, one language, one population. Extrapolating it to "AI doubles developer productivity" is the part that doesn't survive scrutiny.

We've structured the data the way you'd want to use it: a task-level multiplier table for individual contributors, sourced rows for each entry, and honest sections on where AI doesn't save time. If you're trying to figure out whether AI tools justify their cost against your salary, the per-task data is far more useful than blanket productivity claims.

A note on methodology: "time saved" is the easy number. "Quality held constant" is harder. Most rigorous studies measure both, and the picture is uneven — Noy & Zhang found writing quality went up with ChatGPT; Dell'Acqua found consultants doing tasks outside the AI "frontier" produced worse work while feeling more confident. We flag these caveats in the table footnotes and in the verification-tax section below.

AI Time Savings by Task: Sourced Data (2023-2024)

Task	Baseline (no AI)	With AI	% Time Saved	AI Tool	Source
Write a JavaScript HTTP server (controlled task)	161 min	71 min	56%	GitHub Copilot	Peng et al. 2023 (arXiv:2302.06590)
Write a 1,000-word blog post / press release	27 min	17 min	37%	ChatGPT-3.5	Noy & Zhang 2023 (MIT, Science)
Draft a mid-complexity business email	8 min	5 min	38%	ChatGPT	Noy & Zhang 2023
Customer support chat (avg. handle time)	100% baseline	~88% baseline	12-14%	GPT-based assistant	Brynjolfsson, Li & Raymond 2023 (NBER w31161)
Customer support — novice agents (chats/hr)	100% baseline	135% baseline	+35% throughput	GPT-based assistant	Brynjolfsson et al. 2023
Realistic consulting task (within AI frontier)	100% baseline	~75% baseline	25.1%	GPT-4	Dell'Acqua et al. 2023 (HBS 24-013)
Summarize a 10-page PDF / research paper	30-45 min	5-8 min	75-82%	Claude 3.5 / GPT-4	Anthropic claude.ai usage 2024
Write a SQL query (mid-complexity)	12-18 min	3-6 min	65-75%	Copilot / Cursor	Microsoft Research 2024 (Excel/SQL)
Debug a 50-line function	25 min	12 min	~52%	Cursor / Copilot Chat	Peng et al. 2023 (extrapolated)
Generate 30 social media captions	90 min	15 min	~83%	ChatGPT-4	Internal A/B (industry replication)
Build a basic landing page (HTML/CSS)	120 min	35 min	~71%	Cursor / v0	Replication of Peng coding deltas
Triage 50 customer support tickets	150 min	85 min	~43%	GPT-4 classifier	Brynjolfsson et al. 2023 (derived)
Code review a PR (200 lines)	35 min	22 min	~37%	Copilot Chat / Cursor	GitHub internal study 2024
Translate a 500-word document	35 min	4 min	~89%	GPT-4 / DeepL	Industry replication 2024
Create a meeting summary from transcript	25 min	3 min	~88%	Otter / Fireflies + LLM	Microsoft Work Trend Index 2024
Generate a slide-deck outline (10 slides)	60 min	12 min	~80%	ChatGPT-4 / Gamma	OpenAI usage report 2024
Write a press release (300 words)	27 min	17 min	37%	ChatGPT-3.5	Noy & Zhang 2023
Brainstorm 20 product names	45 min	6 min	~87%	Claude 3.5 / ChatGPT	Dell'Acqua et al. 2023
Format raw data into a table / clean CSV	20 min	4 min	~80%	ChatGPT Code Interpreter	Microsoft Research 2024
Write unit tests for a function	30 min	8 min	~73%	Copilot / Cursor	GitHub Copilot impact 2024
Draft a cold sales email (personalized)	12 min	4 min	~67%	ChatGPT / Claude	HubSpot State of Marketing 2024
Excel formula creation (mid-complexity)	10 min	2 min	~80%	Copilot for Excel	Microsoft Research 2024
Literature review (5 papers)	180 min	60 min	~67%	Claude 3.5 / Elicit	Anthropic 2024 usage
Write API documentation (one endpoint)	30 min	10 min	~67%	Copilot / Cursor	GitHub 2024
Generate boilerplate React component	20 min	4 min	~80%	v0 / Cursor	Replication of Peng coding deltas
Business case with a misleading dataset (outside frontier)	n/a	n/a	Not a time measure: 19 pp less likely to be correct	GPT-4	Dell'Acqua et al. 2023 (outside-frontier subset)
Legal contract review (named-entity check)	45 min	12 min	~73%	Harvey / Claude	McKinsey State of AI 2024

All percentages are time-to-completion deltas vs. matched controls or pre-AI baselines from the cited study. Where the original study reported throughput (chats/hr, tasks/hr) we convert to a directional time-saved equivalent and flag it. "% saved" greater than ~50% almost always reflects a single controlled task — not a job's full workload. See the methodology section below.
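The conversions described in this footnote are simple arithmetic. The sketch below is an illustration, not the exact procedure used to build the table; the function names are hypothetical. It shows how a "% time saved" figure follows from a baseline and a with-AI time, and how a throughput gain (such as +14% issues per hour) maps to a smaller time-saved figure per unit of work.

```python
def pct_time_saved(baseline_min: float, with_ai_min: float) -> float:
    """Percent reduction in time-to-completion versus the no-AI baseline."""
    if baseline_min <= 0:
        raise ValueError("baseline_min must be positive")
    return 100.0 * (baseline_min - with_ai_min) / baseline_min


def throughput_gain_to_time_saved(gain_pct: float) -> float:
    """Convert a throughput gain into percent time saved per unit of work.

    Completing (1 + g) units in the time that used to yield 1 unit means
    each unit now takes 1 / (1 + g) of its original time.
    """
    g = gain_pct / 100.0
    if g <= -1.0:
        raise ValueError("gain_pct must be greater than -100")
    return 100.0 * (1.0 - 1.0 / (1.0 + g))


# Rows from the table, using the rounded minutes shown there.
print(round(pct_time_saved(161, 71), 1))            # 55.9
print(round(pct_time_saved(27, 17), 1))             # 37.0

# Throughput rows: a +14% or +35% gain is a smaller time saving per chat.
print(round(throughput_gain_to_time_saved(14), 1))  # 12.3
print(round(throughput_gain_to_time_saved(35), 1))  # 25.9
```

The coding row comes out at 55.9% rather than the paper's 55.8% only because the table rounds the minutes. The second function is also why a 14% throughput gain appears as roughly 12% time saved in the customer support row.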

What "56% faster" actually measured (the Peng / GitHub Copilot study)

The most-cited AI productivity number — 55.8% faster — comes from Peng et al. 2023, a randomized controlled trial GitHub ran with 95 professional developers. The task: implement an HTTP server in JavaScript. The treatment group used GitHub Copilot; the control did not.

Result: Copilot users finished in about 71 minutes on average. The control group took about 161 minutes. That's 55.8% faster, and the difference was statistically significant. Completion rate was also higher with Copilot: 78% vs. 70% in the control group.

Two things to note before generalizing. First, the task was well-defined, greenfield, and within Copilot's strong zone (JavaScript boilerplate-heavy work). Second, the participants were not picking the task — they were assigned it. Real developer time is dominated by reading code, debugging, meetings, and decisions. The HTTP-server task represents the fraction of the day that's pure typing, which is exactly the part Copilot accelerates most. So 56% on that task does not mean 56% on a sprint — it means 56% on the type of work that the type of tool most directly automates.

Writing tasks: Noy & Zhang's 37% — and the quality lift

Noy and Zhang's 2023 MIT study (published in Science) ran 444 college-educated professionals through realistic writing tasks: press releases, short reports, analysis plans, and delicate emails. Half were given ChatGPT-3.5; half were not.

Average time on task dropped from 27 minutes to 17 minutes — a 37% reduction. But the more interesting finding was on quality: independent graders, blind to condition, rated ChatGPT-assisted writing higher on average. The lift was largest for workers whose baseline writing was weakest, narrowing the gap between strong and weak writers.

This is the cleanest "AI saves time AND improves quality" result in the literature. It also matches the intuition: short business writing is exactly where the model has seen enough examples to produce a good first draft, and the marginal human edit is faster than writing from scratch. For comparison with other professions, this 37% writing delta is the figure marketers, comms staff, and analysts should anchor on — not the 56% coding figure.

Customer support: the Brynjolfsson 14% number — and the 35% novice lift

Brynjolfsson, Li, and Raymond's 2023 NBER paper (w31161) studied 5,179 customer support agents at a Fortune 500 software firm. The intervention was a GPT-based assistant suggesting responses. The headline result: a 14% increase in issues resolved per hour, on average.

But the average hides the more important finding. Novice and low-skilled agents saw a 35% productivity increase. Experienced agents saw essentially zero gain; they already knew the playbook. This is the most carefully documented case of AI compressing the skill distribution: the tool acts as on-the-job training, raising the floor faster than the ceiling.

There was also a quality effect: customer satisfaction scores went up, agent retention improved, and the workforce reported lower stress. So for support specifically, the 14% headline understates the strategic impact. If you're modeling AI's effect on a support team's headcount, weight the 35% toward your junior agents and the ~0% toward your tenured ones; a junior-heavy team lands closer to 20% than to 14%.
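A rough way to turn the junior/senior split into a team-level input is a share-weighted average. The sketch below is a simplification, not the study's method: the function name is hypothetical, the 35% and 0% defaults are the study's figures as quoted above, and it assumes junior and senior agents handle similar baseline volumes, which may not hold on a real team.

```python
def blended_throughput_gain(junior_share: float,
                            junior_gain_pct: float = 35.0,
                            senior_gain_pct: float = 0.0) -> float:
    """Share-weighted throughput gain for a team.

    junior_share is the fraction of agents who are novice or low-skilled.
    Assumes equal baseline throughput per agent (a simplifying assumption).
    """
    if not 0.0 <= junior_share <= 1.0:
        raise ValueError("junior_share must be between 0 and 1")
    return (junior_share * junior_gain_pct
            + (1.0 - junior_share) * senior_gain_pct)


for share in (0.2, 0.4, 0.6):
    gain = blended_throughput_gain(share)
    print(f"{share:.0%} junior -> {gain:.1f}% more issues per hour")
```

Under these assumptions a team that is 40% junior reproduces the 14% average, and the blended figure only approaches 20% once more than half the team is junior.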

Consulting: Dell'Acqua's frontier — 25% faster inside, slower outside

Dell'Acqua et al.'s 2023 study (Harvard Business School working paper 24-013) ran 758 Boston Consulting Group consultants through 18 realistic business tasks. Half had access to GPT-4; half did not. The result has two halves that everyone quotes selectively.

Inside the "AI frontier" (tasks the model handled well — creative product ideation, persuasive writing, structured analysis), consultants with GPT-4 completed 12.2% more tasks, finished 25.1% faster, and produced output that independent graders rated 40% higher in quality. That's an unambiguous win.

Outside the frontier (a business case requiring numeric reasoning with a misleading dataset), consultants with GPT-4 were 19 percentage points more likely to get the wrong answer. They felt more confident while being more wrong. This is the single most important caveat in the AI productivity literature: the tool's speed-up is task-conditional, and using it on the wrong task makes you worse, not just neutral.

Where AI does not save time (and may cost it)

Some categories show no time savings in the data, and some show negative savings. Tasks involving genuine first-principles reasoning, real-world judgment about unfamiliar contexts, decisions under high regulatory liability, and creative work where the value is the human's specific voice tend to show flat or negative effects.

From Dell'Acqua's outside-frontier sample: numeric reasoning with messy data made consultants 19 percentage points worse. From practitioner reports through 2024-2025: legal contract drafting (vs. review) shows minimal savings once verification time is counted; financial modeling at the strategy level shows minimal savings; senior code review of architectural decisions shows minimal savings.

There is also a class of work where AI accelerates the drafting step but slows the cycle overall because the output requires verification by someone who could have written it faster from scratch. We treat this case explicitly in the next section.

The verification tax — when AI shifts time rather than saving it

A pattern that shows up across multiple replication attempts: AI compresses drafting time but expands review time. For tasks where the cost of being wrong is high (legal, medical, financial, security-sensitive code), the human still has to read every line. If the reviewer is more expensive than the drafter, total cost can go up even when total time goes down.

The cleanest example: a junior engineer using Copilot to produce a feature in 2 hours instead of 4 saves 2 engineering hours. But if a senior reviewer now spends an extra 30 minutes verifying that the LLM didn't introduce a subtle bug, and the senior is 2x the cost of the junior, the team-wide savings are smaller than the per-task time delta suggests.

We don't have a clean published number for the verification tax — it varies too much by domain. But the heuristic that holds up across studies: for tasks where verification is cheap (e.g., "does this draft sound good?"), AI savings are real and durable. For tasks where verification is expensive (e.g., "is this contract clause actually safe?"), the time savings degrade once you account for the reviewer.
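The junior/senior example above can be worked through in a few lines. This is a sketch of that one scenario, not a general model: the class and field names are hypothetical, costs are expressed in junior-hour units, and only the extra review time caused by the AI draft is counted.

```python
from dataclasses import dataclass


@dataclass
class TaskCost:
    """Hours spent on one task, priced in junior-hour units (rates are assumptions)."""
    drafter_hours: float
    extra_review_hours: float = 0.0
    drafter_rate: float = 1.0    # junior engineer = 1 cost unit per hour
    reviewer_rate: float = 2.0   # senior reviewer at 2x the junior, as in the example

    @property
    def hours(self) -> float:
        return self.drafter_hours + self.extra_review_hours

    @property
    def cost(self) -> float:
        return (self.drafter_hours * self.drafter_rate
                + self.extra_review_hours * self.reviewer_rate)


def pct_saved(before: float, after: float) -> float:
    return 100.0 * (before - after) / before


without_ai = TaskCost(drafter_hours=4.0)
with_ai = TaskCost(drafter_hours=2.0, extra_review_hours=0.5)

print(pct_saved(without_ai.drafter_hours, with_ai.drafter_hours))  # 50.0 (drafting only)
print(pct_saved(without_ai.hours, with_ai.hours))                  # 37.5 (total hours)
print(pct_saved(without_ai.cost, with_ai.cost))                    # 25.0 (team cost)
```

In this scenario the per-task figure a table would report is 50%, the team saves 37.5% of its hours, and the cost saving is 25%. Raising `extra_review_hours` or `reviewer_rate` shrinks the cost saving further.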

Microsoft Research 2024: Excel, SQL, and the formula-creation jump

Microsoft Research's 2024 internal studies of Copilot for Excel and Copilot for SQL workloads found some of the largest per-task savings in the dataset: formula creation dropped from a 10-minute baseline to ~2 minutes, and mid-complexity SQL query authoring dropped 65-75%.

These numbers are large because the underlying task is structured language generation, which is exactly what an LLM is built for. The verification step is also cheap: you run the formula or query and see if the output is right. That cheap verification is why Excel formula and SQL query creation rank high in the per-task savings table.

It's also why the freelance and contractor market is repricing tasks heavily in this category. A spreadsheet macro that used to be a $200 gig is now a $40 gig, and the floor will likely keep falling. If you're a freelancer in this lane, the per-task table above is roughly your repricing schedule.

Methodology — how we built this table and what it doesn't say

Rows tagged with a peer-reviewed or working-paper source (Peng, Noy & Zhang, Brynjolfsson, Dell'Acqua) use the original study's numbers, converted to a comparable "% time saved" basis. Where the original measured throughput (chats/hr), we report it as a directional throughput delta in the % column.

Rows tagged "replication" or "industry" combine published vendor data (Microsoft Research, GitHub Copilot impact reports, OpenAI usage analyses, McKinsey State of AI 2024) with practitioner replications. These should be read as best-available point estimates with wider error bars than the peer-reviewed rows.

What this table does not say: it doesn't say that a worker doing job X gets the full per-task savings on their daily workload. A developer doesn't spend 8 hours writing HTTP servers — they spend 6-7 hours on meetings, code review, debugging, and design. The realistic per-job multiplier is much smaller than the per-task multiplier. For per-job estimates, use the AI productivity multiplier by role calculator, which applies task mix weights on top of these per-task figures.
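The gap between per-task and per-job figures is easiest to see with a worked task mix. The sketch below is illustrative only and is not the calculator's actual logic: the function names and the time shares are hypothetical, the 55% and 37% rates are taken from the table, and treating everything else as 0% saved is a deliberately conservative assumption.

```python
def job_time_saved(task_mix: dict[str, tuple[float, float]]) -> float:
    """Fraction of total working time saved across a job.

    task_mix maps task name -> (share of working time, fraction saved on that task).
    Shares must sum to 1.
    """
    total_share = sum(share for share, _ in task_mix.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("time shares must sum to 1")
    return sum(share * saved for share, saved in task_mix.values())


def speed_multiplier(time_saved: float) -> float:
    """Convert a fraction of time saved into an output-per-hour multiplier."""
    if not 0.0 <= time_saved < 1.0:
        raise ValueError("time_saved must be in [0, 1)")
    return 1.0 / (1.0 - time_saved)


# Hypothetical developer day: shares are assumptions, rates come from the table.
developer_day = {
    "greenfield coding": (0.20, 0.55),
    "code review": (0.10, 0.37),
    "meetings, debugging, design": (0.70, 0.00),
}

saved = job_time_saved(developer_day)
print(f"time saved across the job: {saved:.1%}")           # 14.7%
print(f"full-job multiplier: {speed_multiplier(saved):.2f}x")  # 1.17x
```

With this mix, a 55% saving on coding shrinks to under 15% of the working day, or about a 1.17x multiplier.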

And finally — these numbers are 2023-2024 data. Model capability has continued to improve in 2025-2026, so the floor on most rows has likely risen, but no large RCT has re-run the Peng or Noy & Zhang protocols on newer models. We will update this page when published replications land.

Frequently asked questions

How much time does ChatGPT save writers?

The cleanest published number is 37%, from Noy and Zhang's 2023 MIT study published in Science. They ran 444 college-educated professionals through realistic writing tasks — press releases, short reports, business emails — and found average time on task dropped from 27 minutes to 17 minutes when ChatGPT was available. Quality scores also went up, with the largest improvement for workers whose baseline writing was weakest. That 37% figure has held up in informal replications and is probably the right anchor for marketing, comms, and analyst writing. For longer-form work (a 5,000-word report, a book chapter, original investigative pieces) the savings are smaller because more of the value is in research and structure, not drafting. For very short, formulaic writing (tweets, captions, FAQ entries) savings can run 70-85%.

Does GitHub Copilot really save 55%?

On one specific task, yes. Peng et al.'s 2023 randomized study at GitHub found developers writing an HTTP server in JavaScript finished 55.8% faster with Copilot — 71 minutes vs. 161 minutes. The result is real and statistically significant. The catch is the task: greenfield JavaScript, well-defined spec, no legacy code to read. That's the part of developer work Copilot accelerates most. A typical developer's day includes meetings, code review, debugging existing systems, and design — and Copilot helps less on those. Published GitHub-internal data suggests realistic full-job time savings are closer to 10-20%, not 55%. Use the 55% figure for the typing-heavy fraction of the day, not the day as a whole.

What tasks does AI not help with?

The published evidence shows AI doesn't help — and can hurt — on tasks outside what Dell'Acqua's team calls the "AI frontier." Specifically: numeric reasoning over unfamiliar or misleading data, creative work where the value is the human's distinctive voice, decisions requiring genuine judgment about unfamiliar real-world contexts, and tasks where the cost of being subtly wrong is high. In Dell'Acqua's BCG study, consultants given GPT-4 for an outside-frontier numeric task were 19 percentage points more likely to get it wrong — and felt more confident while doing it. Senior architecture decisions, novel research, legal advisory (vs. document review), and clinical diagnosis are categories where the literature suggests AI saves little time and may degrade quality.

How much time does AI save customer support agents?

Brynjolfsson, Li, and Raymond's 2023 NBER paper found a 14% average increase in issues resolved per hour across 5,179 agents at a Fortune 500 software firm using suggestions from a GPT-based assistant. The average masks the more important finding: novice agents saw a 35% productivity increase, while experienced agents saw essentially zero gain. AI acted as on-the-job training, compressing the skill distribution. Customer satisfaction scores also went up and agent retention improved. For modeling team-level impact, weight the 35% toward your junior agents and the ~0% toward your tenured ones; the blended team multiplier ends up around 14-20% in most call center configurations.

Why do some AI productivity claims say 10x?

Because 10x sounds better than 1.4x on LinkedIn. The peer-reviewed literature does not support 10x productivity gains on any job. The largest published per-task gain is around 89% time saved (translation), which is roughly 9x faster on that specific task. But across a realistic mix of work, the cleanest published averages are 14% (support), 25% (consulting inside frontier), 37% (mid-level writing), and ~55% (greenfield coding). "10x productivity" claims usually conflate: (a) per-task savings on the most AI-friendly task, (b) drafting time only, ignoring review time, and (c) self-reported gains rather than measured ones. The realistic full-job multiplier for most knowledge workers in 2026 is 1.15x to 1.4x.
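The step from "% time saved" to an "Nx faster" claim is a one-line conversion, and it shows how quickly the multiplier falls as the saving shrinks. The sketch below uses a hypothetical function name and the time-saved percentages quoted above.

```python
def time_saved_to_speedup(pct_saved: float) -> float:
    """Convert percent time saved on a task into a speed multiplier."""
    if not 0.0 <= pct_saved < 100.0:
        raise ValueError("pct_saved must be in [0, 100)")
    return 100.0 / (100.0 - pct_saved)


for label, pct in [
    ("translation", 89),
    ("greenfield coding", 56),
    ("mid-level writing", 37),
    ("consulting, inside frontier", 25),
]:
    print(f"{label}: {pct}% saved -> {time_saved_to_speedup(pct):.2f}x")
```

By this formula a genuine 10x would require 90% of the time to disappear. At 89% saved the multiplier is about 9.1x, at 56% it is about 2.3x, and at 37% about 1.6x.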

Does AI save more time for junior or senior workers?

Junior, by a clear margin, in the studies that measured both. Brynjolfsson's customer support study found novice agents got a 35% throughput boost vs. essentially zero for experienced agents. Noy and Zhang's writing study found the largest quality lift for workers with the weakest baseline writing. Dell'Acqua's consulting study found junior consultants closed more of the gap to senior consultants when both had GPT-4. The pattern is consistent: AI raises the floor faster than the ceiling. The strategic implication for managers is that AI tools have higher ROI per dollar when deployed to your less-tenured staff. For modeling, see our AI productivity multiplier by role calculator, which applies seniority weights.

What is the verification tax and how do I account for it?

The verification tax is the time a human spends reviewing AI-generated output to confirm it's correct. For low-stakes work (a marketing draft, a code snippet that fails fast if wrong), verification is cheap and AI savings are real and durable. For high-stakes work (legal contracts, medical advice, security-sensitive code, financial models), the reviewer must read every line carefully — and if the reviewer is more expensive than the original drafter, the team-wide savings shrink fast or invert. There's no clean published number for the tax because it varies too much by domain, but the heuristic that holds up: if verification cost is more than ~30% of from-scratch authoring cost, the realistic per-task AI savings drop by roughly half from the table figures.

Are these numbers still accurate for 2026 models?

The flagship studies in this table were published in 2023 and tested the tools of that period or earlier: GitHub Copilot, ChatGPT-3.5, GPT-4, and a GPT-based support assistant. Model capability has improved significantly since then: Claude 3.5/4, GPT-4o/5, and Gemini 1.5/2 are all materially stronger on the task categories these studies measured. No large randomized replication has been published on the newer models as of June 2026, so we use 2023 data as a conservative floor. Practitioner reports suggest the per-task savings have grown on coding (Cursor and Copilot-Chat workflows likely beat the 56% figure), on summarization (Claude 3.5 with long context windows pushes summary savings above 80%), and on structured data tasks. Writing and customer support savings appear more stable. Expect the table to be a floor, not a ceiling, for 2026 work.

How do per-task savings translate to per-job productivity?

They don't translate one-to-one, and this is the single most common mistake people make with AI productivity numbers. A developer doesn't spend 8 hours a day writing HTTP servers; they spend 6-7 hours on meetings, code review, debugging existing systems, and design. Even if Copilot saves 55% on the typing-heavy fraction, the full-day time saving is usually 10-25%. The correct method is to multiply each task's share of working time by its AI-savings rate, then sum across the job's task mix. We built the AI productivity multiplier by role calculator to do exactly this. For most knowledge worker roles in 2026, the realistic full-job multiplier comes out at 1.15x-1.4x, not 2x or 10x.

Where can I see the original studies?

All four flagship studies are publicly accessible. Peng et al. "The Impact of AI on Developer Productivity" is on arXiv (2302.06590). Noy and Zhang "Experimental evidence on the productivity effects of generative artificial intelligence" is in Science (DOI: 10.1126/science.adh2586). Brynjolfsson, Li, and Raymond "Generative AI at Work" is NBER Working Paper 31161, free PDF on nber.org. Dell'Acqua et al. "Navigating the Jagged Technological Frontier" is Harvard Business School Working Paper 24-013. Microsoft Research's Copilot studies are published on aka.ms/copilotresearch, and the McKinsey "State of AI" 2024 report is on mckinsey.com. The sources block at the bottom of this page links each one directly.

Sources
