Using AI as an Answer Machine Makes You Worse at Your Job. The Research Is Specific.
Three studies published in the same week show that how you use AI determines whether it sharpens or erodes your judgment. Turkish students scored better with AI help and worse without it. BCG consultants followed AI errors they would normally have caught. Taipei students who used AI as a tutor gained six to nine extra months of learning.
By Forge Team
How you use AI determines what it does to your competence over time. That is not a vague worry: three separate studies published in the same week measured it directly. One showed consultants at a major firm following AI errors they would normally have caught. Another showed students who performed better on AI-assisted work and worse on independent tests taken without the model. A third showed a path to the opposite outcome.
Three studies, one pattern
Ethan Mollick published "Choosing to Stay Human" on May 26, synthesising three studies on how AI use affects the performance of students and knowledge workers.
The first involved Turkish high-school students who used ChatGPT to complete homework assignments. They scored better on AI-assisted work — and noticeably worse on independent tests they sat without access to the model. The AI produced better outputs. Their ability to produce their own outputs had declined.
The second involved BCG consultants working with AI assistance. Overall, they performed well, until they hit problems where the model was wrong. On those problems, accuracy dropped from 84% to between 60% and 70%. The consultants had built a working habit of trusting AI output. When that output was wrong, they followed it anyway.
The third involved Taipei students who used AI in a different mode: the model explained reasoning, asked questions, and pushed back on student thinking rather than just supplying answers. Those students gained the equivalent of six to nine extra months of learning compared to a control group without AI access. Same tool. Very different outcome.
Mollick names the pattern in the first two studies "cognitive surrender" — the gradual outsourcing of reasoning to the model until the reasoning itself stops happening on your end.
The difference between the two modes
The practical line between tutor mode and answer-machine mode is one step: do you ask the model to explain before you move on?
Answer-machine mode: prompt → receive output → next task.
Tutor mode: prompt → ask "why is this right?" or "what's the counterargument?" → decide.
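For readers who script their model calls, the two modes above can be sketched in a few lines of Python. This is a minimal illustration, not a recommended implementation: `ask` stands for whatever function sends a prompt to your model, `stub_model` is a hypothetical stand-in that only echoes its input, and the names `Review`, `answer_machine`, and `tutor_mode` are invented for this example. The follow-up questions are the two quoted in the tutor-mode line.

```python
from typing import Callable, NamedTuple

# Follow-up questions quoted in the tutor-mode description above.
FOLLOW_UPS = ("Why is this right?", "What's the counterargument?")


class Review(NamedTuple):
    answer: str            # the model's first output
    challenges: list[str]  # the model's reply to each follow-up question


def answer_machine(ask: Callable[[str], str], prompt: str) -> str:
    """prompt -> receive output -> next task. Nothing is questioned."""
    return ask(prompt)


def tutor_mode(ask: Callable[[str], str], prompt: str) -> Review:
    """prompt -> ask the follow-ups -> return everything for a human to decide."""
    answer = ask(prompt)
    challenges = [
        ask(f"{question}\n\nYour earlier answer:\n{answer}")
        for question in FOLLOW_UPS
    ]
    return Review(answer, challenges)


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; echoes the first line."""
    return "reply to: " + prompt.splitlines()[0]


review = tutor_mode(stub_model, "Summarise the interview notes into themes.")
print(review.answer)
for reply in review.challenges:
    print(reply)
```

The sketch deliberately stops before the "decide" step. It returns the answer and the challenges side by side, because the decision is the part that stays with the person.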
The BCG finding matters here. The consultants didn't stop thinking — they stopped catching errors the model made. Tutor mode is not about slowing down for its own sake. It is about staying in the loop long enough to catch what the model gets wrong, which it will.
Rania: the analysis problem
Rania is a senior analyst at a 45-person management consultancy. She uses Claude to synthesise client interview notes into themes before writing her analysis. For three months, the model's output was reliable enough that she'd scan the themes quickly and move on to writing. In a project review, her engagement director pointed out that two reported themes were variations of the same point — the model had separated them, and Rania had built her analysis on a distinction that wasn't there.
Her current workflow: after the model produces themes, she asks "which of these overlap or contradict each other?" She reviews the model's answer against her own read of the notes. The extra step takes five minutes. She has not had a duplicated-theme issue since.
Joel: the accuracy problem
Joel is head of communications at a 200-person healthcare nonprofit. He drafts external statements using Claude — the model generates a first draft, he edits for tone, and the result goes to legal review. The process worked until a statement about a regulatory decision mischaracterised the scope of a reporting requirement. The model had been confident. The error survived his edit because he was checking for tone, not accuracy.
He now adds one prompt between draft and edit: "What is the most likely factual error in this draft?" He doesn't always act on what it surfaces. He always reads the flagged section against the source document. The critique step is the moment he stays responsible for what goes out under his name.
The short-term trade
The Taipei result — six to nine extra months of learning available through a different interaction habit — doesn't require a different model or tool. It requires asking the model to show its reasoning rather than just produce its answer. The Turkish and BCG findings show the same relationship from the other direction: accept the output, skip the reasoning, and you've traded the learning for the output. That's a trade with obvious short-term benefits and specific long-term costs. The BCG consultants who followed the model into errors weren't lazy. They were efficient, right up until the moment the model was wrong.