Three AI Experts Said the Same Thing This Week. It Changes What 'AI Skills' Actually Means.
Ethan Mollick, Charity Majors, and Simon Willison reached the same conclusion independently within five days. The professionals who get the most from AI are not the best prompters — they are the best judges.
By Forge Team
The skill that makes AI useful at work is not writing a better prompt. Three practitioners, working independently, reached that conclusion within five days of each other this week: the professionals who get the most from AI are the ones who can evaluate output quickly and accurately, not the ones who generate the most of it.
What three people said in five days
On June 14, Simon Willison documented that not a single company in New York's 2025 WARN Act layoff filings attributed job cuts to AI. Drawing on analysis by Narayanan and Kapoor, he concluded that production is not the real bottleneck. AI accelerates production; the bottleneck is deciding, verifying, and understanding, and AI cannot do those for you.
On June 16, Ethan Mollick appeared on Simon Sinek's podcast and said "taste may become the most valuable skill of the AI era." His point: domain expertise and experience matter more now because they let you evaluate AI output, not because they help you generate it.
On June 17, Charity Majors (CTO of Honeycomb) published "AI demands more engineering discipline. Not less", which reached 425 points on Hacker News. Her argument: when AI can instantly generate code as good as the median engineer's, the discipline to verify, iterate, and maintain quality standards becomes the differentiating skill. She noted that code has shifted from "treasured, reused, cared for" to "disposable and regenerable."
Three people. Five days. No coordination. One conclusion.
What to do differently Monday morning
The consensus points to three concrete skills.
Evaluation speed. Can you look at a piece of AI output and know in 30 seconds whether it's good enough? That requires a clear standard set before you start generating: what does good output for this task actually look like? Without that benchmark, you're reviewing for tone rather than quality.
Failure-mode recognition. AI fails in predictable ways: uncertain claims dressed as facts, conclusions that reflect your prompt's framing rather than the evidence, specificity that sounds sourced but isn't. Knowing the failure modes in your domain (legal, marketing, operations, finance) lets you spot them faster than reading for general sense does.
Quality criteria before prompting. Before you generate, define what you'd accept. Not "make this better." Better how? Length, specificity, evidence, format? Teams that define output standards before running AI tend to get output that's faster to review and easier to use.
Priya: the content manager who stopped reading output by feel
Priya is a content marketing manager at a 35-person B2B SaaS company. She uses AI to draft case studies and research roundups. For three months she reviewed output by reading it — if it sounded right, it went forward.
She noticed her reviews were taking longer, not shorter. Problems surfaced only partway through reading: claims without traceable sources, phrasing that mirrored the prompt, conclusions that matched the brief's framing but not the underlying data.
She now defines three criteria before any draft starts: the specific claim each section must support, one piece of evidence that must be present for each claim, and a format constraint (no more than two supporting sentences per point). Review time dropped from 40 minutes to 12. The AI output didn't improve; her standard became clear enough that output either met it or it didn't.
Set explicit quality criteria before you prompt, so you can evaluate output in under a minute instead of reading it word by word.
Marcus: the analyst who learned to look for disagreement
Marcus is a senior analyst at a 110-person consulting firm. His team uses AI to synthesize research across documents and build structured frameworks for client work.
His evaluation problem was subtler: individual outputs looked fine. The failure appeared when a colleague presented a framework that seemed analytically solid but had been derived entirely from AI synthesis of secondary sources, with no original analysis and no external grounding.
Marcus now runs the same research question through two models before using the output. The goal is not to pick the better answer but to find where they diverge. Divergence signals that the question has more than one defensible answer, which means it needs his judgment, not just his editing.
Run the same brief through two models and map where they disagree: the disagreements are more useful than the consensus.
The skill that's actually scarce
Mollick's "taste" framing is useful: taste is the ability to recognise quality before you can explain it. It comes from domain experience, not from AI. Majors' "discipline" framing is the same idea applied to standards: maintaining them when AI makes it tempting to skip the check. Willison's framing is the empirical one: the WARN Act filings suggest that the layer AI hasn't replaced is the judgment layer of deciding, verifying, and understanding.
This is good news for experienced professionals. Prompting can be learned in an afternoon. Judgment comes from years in a field.
As AI handles more of the work, keep your judgment sharp: it's the skill that's actually hard to replace.