Introduction: The Quiet Inversion

In June 2023, Judge P. Kevin Castel of the Southern District of New York sanctioned two attorneys at Levidow, Levidow & Oberman for submitting a legal brief that cited six cases -- Varghese v. China Southern Airlines, Shaboon v. Egyptair, and four others -- that did not exist. Attorney Steven Schwartz had used ChatGPT to research the motion in Mata v. Avianca, Inc., and the model had fabricated the citations entirely. When Schwartz asked ChatGPT to confirm the cases were real, it assured him they were. The opposing counsel checked. The judge was not amused. "There is nothing inherently improper about using a reliable artificial intelligence tool for assistance," Castel wrote. "But existing rules impose a gatekeeping role on attorneys to ensure the accuracy of their filings" [9].

At the time, Mata v. Avianca was treated as an embarrassing one-off. It was not. By early 2025, researcher Damien Charlotin was tracking hundreds of similar incidents across US courts -- from a Wyoming federal judge sanctioning attorneys at Morgan & Morgan, the 42nd-largest law firm in the country, for submitting a brief where eight of nine cited cases were AI-fabricated, to a Colorado attorney suspended for ninety days after being caught via text messages discussing ChatGPT-generated fabrications with a paralegal. The rate accelerated from roughly two cases per week to two to three per day [9].

These are not stories about careless lawyers. They are symptoms of a structural gap opening across every knowledge-intensive profession -- the gap between what AI can produce and what humans can verify. In February 2026, three MIT Sloan economists gave that gap a name.

Christian Catalini, Xiang Hui, and Jane Wu released a working paper with a deceptively modest title: "Some Simple Economics of AGI" [1]. Within weeks, a thread summarizing its core argument had been viewed over 600,000 times. The reason was not the paper's technical sophistication, though it is rigorous. The reason was its central thesis, which landed with the force of something obvious that no one had yet articulated clearly:

"The binding economic constraint in an AI-transformed economy is not the cost to automate a task. It is the cost to verify that the automated output is correct."

-- Catalini, Hui & Wu, "Some Simple Economics of AGI," MIT Sloan Working Paper, Feb 2026 [1]

This insight reframes the entire discourse around artificial general intelligence. For the past three years, the public conversation has centered on what AI can do: generate code, draft legal briefs, synthesize medical literature, write marketing copy. But the Catalini framework forces a different question: how do we know the output is right? And who has the expertise to judge?

This article is structured in three parts. Part I develops the economics of verification -- the Measurability Gap, the four structural regimes, and the labour market signals already visible. Part II offers a practitioner's framework for thinking in exponentials, grounded in the recognition that AI capabilities are advancing faster than most institutions can adapt. Part III examines what this means for governments and the public sector, where verification capacity is not merely an efficiency concern but a pillar of democratic governance.

300,000 Years of Cognitive Scarcity, Ending Now

For the entirety of human history, cognitive labour has been the binding constraint on economic output. Every organization, every government, every society has been limited by the number of trained human minds it could recruit, retain, and coordinate. The structure of firms, the design of bureaucracies, the architecture of educational systems -- all are engineered around the scarcity of human cognition.

That constraint is dissolving. Not gradually, but at a pace that has no historical precedent. The Stanford AI Index 2025 reports that total corporate AI investment reached $252.3 billion in 2024, with US private AI investment alone hitting $109.1 billion -- nearly twelve times China's $9.3 billion [2]. The 80,000 Hours research team documented the volatility of expert AGI timeline forecasts during 2025: median estimates for transformative AI initially compressed sharply -- Metaculus forecasts moved to July 2031 -- before swinging back out to November 2033, illustrating both the perceived proximity of the frontier and the difficulty of predicting nonlinear progress [3].

- 4.4% → 71.7% -- SWE-bench accuracy improvement in one year (2023-2024) [2]
- ~600K -- views on Catalini's original thread summarizing the paper [1]
- <12 months -- agent task-horizon doubling cadence as of early 2026 [3]
- $252.3B -- total corporate AI investment in 2024 (Stanford AI Index) [2]

What makes this moment qualitatively different from previous technological transitions is not merely the speed. It is the generality of the capability being automated. The printing press amplified the distribution of knowledge. The steam engine amplified physical labour. Large language models and their successors amplify cognition itself -- across every domain simultaneously. This is what Rinehart at AEI calls the experience of being "AGI-pilled": the moment when the trajectory's implications become viscerally real [4].

But if cognition is becoming abundant, what becomes scarce? Catalini, Hui and Wu's answer is precise: the capacity to verify.

The Two Racing Cost Curves

The paper formalizes what many practitioners have felt intuitively. For any economically relevant task, there are two costs that matter:

The Cost to Automate (c_A) -- the expense of getting an AI system to perform the task at a level that appears competent. This cost is falling exponentially. Tasks that required months of fine-tuning in 2023 can now be accomplished with a well-crafted prompt and a frontier model. SWE-bench scores leapt from 4.4% in 2023 to 71.7% in 2024 [2]. Medical licensing exam performance went from failing to exceeding human averages. Legal reasoning benchmarks show similar trajectories.

The Cost to Verify (c_V) -- the expense of determining whether the AI's output is correct, complete, safe, and aligned with the intended goal. This cost is not falling at the same rate. In many domains, it is stubbornly stable -- or even rising, as the sophistication and surface plausibility of AI outputs makes errors harder to detect.

The Measurability Gap (Δm = c_V − c_A) is the difference between these two curves. When c_A drops faster than c_V, the gap widens -- creating a zone of economic risk where organizations can cheaply generate outputs they cannot cheaply verify [1].

[Chart: The Automation-Verification Gap -- Relative Cost Trajectories. Illustrative values: c_A falls from ~65 (2024, high) to ~15 (2026, rapidly declining), while c_V edges down from ~70 to only ~62 (stubbornly high), leaving a Measurability Gap of ~47 points -- the widening risk zone. Stylized representation based on the Catalini, Hui & Wu framework [1]. Values are illustrative of relative cost dynamics, not absolute measurements.]
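To see how quickly the gap compounds, here is a minimal sketch of the two curves in Python. The half-life and decline rate are assumptions chosen to roughly reproduce the chart's illustrative values, not estimates from the paper, and the function names are ours.

```python
# Illustrative dynamics only: parameters are assumptions, not measurements.

def cost_to_automate(year: int, c0: float = 65.0, half_life_years: float = 1.0) -> float:
    """c_A: falls exponentially, halving every `half_life_years` (assumed)."""
    return c0 * 0.5 ** ((year - 2024) / half_life_years)

def cost_to_verify(year: int, c0: float = 70.0, annual_decline: float = 0.06) -> float:
    """c_V: nearly flat, declining only a few percent per year (assumed)."""
    return c0 * (1.0 - annual_decline) ** (year - 2024)

for year in (2024, 2025, 2026):
    c_a, c_v = cost_to_automate(year), cost_to_verify(year)
    print(f"{year}: c_A = {c_a:5.1f}  c_V = {c_v:5.1f}  gap Δm = {c_v - c_a:5.1f}")
```

Run it and the gap grows from 5 points to roughly 46 in two years -- the same story the chart tells, made mechanical.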

The implications are profound. Human-in-the-loop -- the safety architecture that most organizations rely on today -- is revealed as a transient configuration, not a stable solution. It works only in the brief window where the cost of human review remains below the cost of the errors it catches. As AI outputs grow more complex and voluminous, that window narrows [1].

Labour Market Signals Already Visible

The verification gap is not a theoretical projection. Its effects are already measurable in labour markets. Brynjolfsson, Chandar and Chen at the Stanford Digital Economy Lab documented a 16% relative decline in employment for early-career workers (ages 22-25) in the most AI-exposed occupations since the launch of generative AI tools in late 2022 [5]. This is the first quantitative evidence of what Catalini's framework calls the Missing Junior Loop.

The mechanism is straightforward. Organizations automate the tasks that junior professionals used to perform -- document review, code scaffolding, first-draft analysis, data cleaning. These tasks were not merely "grunt work." They were the apprenticeship pipeline through which professionals acquired the tacit knowledge, pattern recognition, and judgment that eventually qualified them to become senior practitioners. The seniors who verify AI outputs today developed their expertise by doing the very tasks that are now automated.

The Missing Junior Loop

When organizations automate entry-level cognitive tasks, they eliminate the apprenticeship pipeline through which future experts are formed. Within a generation, this produces a verification crisis: no one develops the expertise needed to judge whether AI outputs are correct. The paradox is that the more successfully we automate, the less capable we become of verifying the automation [1][5].

Adjacent to this is what the paper terms the Codifier's Curse: the more codifiable a domain's knowledge is, the easier it is to automate -- but codifiable knowledge is also what organizations use to train and evaluate newcomers. Experts in codifiable domains are, in effect, generating the training data for their own replacement while simultaneously losing the pipeline that produces their successors [1].

Acemoglu and Johnson's Power and Progress [15] provides the historical lens to interpret these signals. Their central argument -- that technological progress only produces broadly shared prosperity when accompanied by institutional countermeasures that redirect productivity gains -- maps directly onto the verification economy. The automation of junior cognitive tasks is precisely the kind of "so-so technology" Acemoglu warns about [22]: technology that displaces workers without generating sufficient new productive tasks to compensate. The Missing Junior Loop is not merely a staffing problem; it is an instance of what Acemoglu calls the "productivity bandwagon" fallacy -- the assumption that any technology that increases output per worker automatically benefits the workforce. When the displaced tasks are the apprenticeship pipeline, the long-run cost is a degradation of the human capital that makes verification possible.

The WEF Future of Jobs Report 2025 reinforces this picture. It projects that 86% of employers expect AI-driven transformation, with clerical and data-entry roles declining fastest, while "AI and machine learning specialists" top the growth categories [6]. But crucially, the report identifies analytical thinking and resilience -- verification-adjacent skills -- as the most valued capabilities for the 2025-2030 period. Acemoglu's framework predicts exactly this: when automation outpaces institutional adaptation, the premium on judgment-intensive skills rises even as demand for routine cognitive labour falls.

Four Structural Regimes of the Agentic Economy

Catalini, Hui and Wu propose a 2×2 framework that maps every economically relevant task according to its cost to automate and its cost to verify. This yields four structural regimes, each with distinct economic dynamics and policy implications [1]:

Four Regimes of the Agentic Economy

| Regime | Automation Cost | Verification Cost | Examples & Risk Profile |
|---|---|---|---|
| I. Safe Industrial Zone | Low | Low | Data entry, translation, code formatting. Full automation viable with automated quality checks. Low economic risk. |
| II. Human Artisan Zone | High | Low | Skilled craftsmanship, physical therapy, surgical technique. Human advantage persists because the task resists automation, even though output quality is readily apparent. |
| III. Pure Tacit Zone | High | High | Strategic leadership, diplomatic negotiation, organizational culture. Both automation and verification require deep contextual judgment. Human domain endures longest. |
| IV. Runaway Risk Zone | Low | High | Legal reasoning, medical diagnosis, financial modelling, policy analysis. The danger zone: easy to automate, hard to verify. The Measurability Gap is widest here. |

Framework adapted from Catalini, Hui & Wu (2026) [1]. Regime names are editorial additions for clarity.
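For readers who think in code, the 2×2 maps onto a four-way branch. A minimal sketch: the regime names come from the table above, but the normalized cost scale and the 0.5 cut-off are illustrative choices of ours, not thresholds from the paper.

```python
from enum import Enum

class Regime(Enum):
    SAFE_INDUSTRIAL = "I. Safe Industrial Zone"   # low c_A, low c_V
    HUMAN_ARTISAN = "II. Human Artisan Zone"      # high c_A, low c_V
    PURE_TACIT = "III. Pure Tacit Zone"           # high c_A, high c_V
    RUNAWAY_RISK = "IV. Runaway Risk Zone"        # low c_A, high c_V

def classify(c_a: float, c_v: float, threshold: float = 0.5) -> Regime:
    """Map a task's normalized costs (0 = trivial, 1 = prohibitive) to a regime."""
    if c_a < threshold and c_v < threshold:
        return Regime.SAFE_INDUSTRIAL
    if c_a >= threshold and c_v < threshold:
        return Regime.HUMAN_ARTISAN
    if c_a >= threshold and c_v >= threshold:
        return Regime.PURE_TACIT
    return Regime.RUNAWAY_RISK  # cheap to generate, costly to verify

# Example: drafting a legal brief -- cheap to automate, expensive to verify.
print(classify(c_a=0.1, c_v=0.8).value)  # IV. Runaway Risk Zone
```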

Regime Migration: Estimated Share of Knowledge Tasks (2024 → 2026)

| Regime | ~2024 Share | Direction | ~2026 Share | What's Happening |
|---|---|---|---|---|
| I. Safe Industrial (low c_A, low c_V) | ~15% | ↑↑ | ~35% | Rapid expansion. Tasks that were hard to automate in 2024 (translation, code scaffolding) now fall here as c_A collapses and verification can be automated too. |
| II. Human Artisan (high c_A, low c_V) | ~25% | ↓ | ~20% | Slow shrinkage. Physical and embodied tasks still resist automation, but the boundary is eroding as robotics and multimodal AI advance. |
| III. Pure Tacit (high c_A, high c_V) | ~35% | ↓↓ | ~20% | Fastest-shrinking regime. Strategic and contextual tasks that seemed "AI-proof" in 2024 are migrating to Regime IV as c_A drops while c_V stays high. |
| IV. Runaway Risk (low c_A, high c_V) | ~25% | → | ~25% | The treadmill. Share stays stable because new tasks enter (from Regime III) as fast as existing ones get resolved (to Regime I). The volume of unverifiable output grows even as the percentage holds. |

Directional estimates based on the Catalini et al. framework [1], WEF Future of Jobs 2025 [6], and the McKinsey State of AI survey [18]. Percentages are illustrative of structural direction, not precise measurements.

The migration pattern reveals a critical insight: the Runaway Risk Zone is not shrinking. As AI capabilities improve, tasks flow out of Regimes II and III (where automation was previously difficult) into Regime I (fully automatable) and Regime IV (automatable but unverifiable). The net effect is that the volume of unverifiable AI output is growing, even as some individual tasks become easier to verify. This is the treadmill that Acemoglu and Johnson warn about in Power and Progress [15]: technology that increases aggregate productivity while concentrating risk in domains where institutional capacity has not kept pace.

Regime IV -- the Runaway Risk Zone -- deserves particular attention. These are tasks where AI can produce fluent, plausible, and superficially correct output at near-zero marginal cost, but where verifying correctness requires the kind of deep domain expertise that takes years to develop. Legal briefs that cite plausible but nonexistent case law. Medical recommendations that sound authoritative but miss critical contraindications. Financial models whose assumptions are subtly miscalibrated.

The Runaway Risk Zone: Goodhart's Law on Steroids

In Regime IV, the danger is not that AI fails obviously. It is that AI succeeds superficially -- producing outputs that look correct to non-experts and pass automated checks, while harbouring errors that only deep expertise can detect. When organizations optimize for the appearance of quality rather than its substance, Goodhart's Law takes hold: the metric becomes the target, and the target ceases to be meaningful. Botelho and Wang (2026) document early cases of "symbolic compliance" in EU AI Act implementations -- organizations building audit trails that satisfy regulatory form without substantive verification [7].

The Hollow Economy Warning

When the Measurability Gap widens unchecked -- when organizations can automate cheaply but cannot verify cheaply -- a specific economic pathology emerges. Catalini and colleagues call it the risk of a "hollow economy": an economy where the gap between value claimed and value produced grows silently, sustained by the inability of any participant to measure the difference [1].

The economics here are not new -- they are Akerlof's "Market for Lemons" [19] operating at AI speed. In Akerlof's classic 1970 analysis, information asymmetry between buyers and sellers drives quality out of the market: when buyers cannot distinguish good cars from lemons, they pay a pooled price that drives good-car sellers away. The Measurability Gap creates an analogous dynamic at civilisational scale. When AI can produce outputs that look like expert work but no one can cheaply verify whether they are expert work, the market begins to price AI-generated legal briefs, medical summaries, and policy analyses as if they were equivalent to expert-verified ones. Genuine expertise is undervalued because its signal is drowned out. The lemons problem becomes structural: not a market failure to be corrected, but the default mode of an economy where verification capacity has not kept pace with generation capacity.
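Akerlof's unraveling fits in one line. In our notation (applied here to knowledge outputs rather than cars): let q be the share of outputs that are genuinely expert-verified, v_verified and v_unverified their values to a buyer, and r_verified the reservation price of producers who do the verification work. A buyer who cannot tell the two apart pays the pooled expectation:

$$ p \;=\; q\,v_{\text{verified}} + (1 - q)\,v_{\text{unverified}} $$

Whenever p falls below r_verified, producers of verified work exit, q falls, and the pooled price falls with it -- the spiral repeats until only unverified output trades.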

The Implicit Compact, Breaking

Every functioning economy rests on an implicit compact: that the value a participant claims to produce bears a meaningful relationship to the value actually produced. Professional credentials, regulatory certifications, audits, peer reviews -- these are all verification mechanisms that sustain this compact. When AI can generate outputs that satisfy the surface markers of quality at near-zero cost, while the actual verification of quality requires expensive human expertise that is increasingly scarce, the compact erodes. The economy continues to function in nominal terms -- transactions occur, reports are filed, decisions are made -- but the informational content of these activities degrades. This is the hollow economy: busy, productive-looking, and increasingly unreliable [1].

This is not hyperbole. Consider the healthcare sector, where an estimated 12 million diagnostic errors occur annually in the US alone, according to the AHRQ [8]. Now introduce AI systems that can generate differential diagnoses at scale, with impressive accuracy on benchmarks, but whose errors are systematically different from human errors -- and harder for clinicians to catch because they lack the telltale patterns of human reasoning failures. The verification challenge is not incremental; it is qualitatively different.

Or consider the legal profession, where AI can now draft contracts, motions, and memoranda with fluency that passes casual review. The American Bar Association has documented growing concern about "automation bias" -- the tendency of reviewers to trust AI-generated legal work more than warranted, particularly under time pressure [9]. The cost to automate a first draft has fallen by an order of magnitude. The cost to verify that no material error, omission, or misrepresentation exists has barely changed.

Can AI Verify AI?

The obvious rejoinder to the verification crisis is: why not use AI to verify AI? If generation costs are falling, surely verification costs will follow. This is the recursive promise -- and it contains a deep structural trap.

In some domains, AI-assisted verification already works. Code review is a clear case: automated test suites, static analysis, and AI-powered code review tools can catch categories of bugs that human reviewers miss. These are domains where correctness is formally specifiable -- where "right" can be defined in terms a machine can check. Regime I tasks, in Catalini's framework, are precisely those where both automation and verification can be mechanised [1].
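To make "formally specifiable" concrete, consider a hypothetical property-based check for AI-generated sorting code -- a sketch of ours, not an example from the paper. The spec (output ordered, same multiset as input) is fully machine-checkable; no tacit knowledge is required.

```python
import random
from collections import Counter

def satisfies_sort_spec(sort_fn, trials: int = 1000) -> bool:
    """Property-based check: output is ordered and is a permutation of the input."""
    for _ in range(trials):
        data = [random.randint(-999, 999) for _ in range(random.randint(0, 40))]
        out = sort_fn(list(data))
        ordered = all(out[i] <= out[i + 1] for i in range(len(out) - 1))
        permutation = Counter(out) == Counter(data)
        if not (ordered and permutation):
            return False
    return True

model_written_sort = sorted  # stand-in for code an AI system produced
print(satisfies_sort_spec(model_written_sort))  # True: verification is mechanical
```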

But in Regime IV -- the Runaway Risk Zone -- the picture inverts. Consider medical diagnosis. An AI system generates a differential diagnosis for a patient presenting with ambiguous symptoms. A second AI system is deployed to "verify" the first. What does this verification consist of? The second system checks whether the diagnosis is consistent with the training distribution -- whether it is the kind of answer that would typically be generated for such inputs. It does not and cannot check whether the diagnosis is correct for this patient, because that requires clinical judgment, patient history, physical examination findings, and the kind of contextual reasoning that constitutes medical expertise [8][20].

The Recursive Verification Trap

Using AI to verify AI works when correctness is formally specifiable (code compilation, mathematical proofs, data format validation). It fails when correctness depends on tacit knowledge, contextual judgment, or real-world ground truth that exists outside the training distribution. In these domains, AI verification checks plausibility, not truth -- and the gap between plausibility and truth is precisely the Measurability Gap. Using AI to verify AI in Regime IV tasks does not close the gap; it adds another layer of plausible-but-unverified output [1][20].

This does not mean AI has no role in verification. The most promising approaches use AI to narrow the verification surface -- to flag anomalies, identify areas of uncertainty, and route the most ambiguous cases to human experts. Agrawal, Gans, and Goldfarb's framework for "prediction machines" [21] is instructive here: AI is most valuable as a verification aid when it reduces the number of decisions humans must make, not when it replaces human judgment entirely. The goal is not AI-verified AI. It is AI-assisted human verification -- a hybrid architecture that uses machine efficiency to amplify scarce human expertise.
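What "narrowing the verification surface" might look like in practice is sketched below, with hypothetical fields and thresholds; none of this comes from the cited frameworks. The design choice worth noting: the safe default routes to a human, because the confidence signal is itself model-generated and therefore unverified.

```python
from dataclasses import dataclass

@dataclass
class AIOutput:
    task_id: str
    confidence: float     # model/ensemble agreement score, 0..1 (assumed available)
    anomaly_score: float  # distance from familiar cases, 0..1 (assumed available)

def route(out: AIOutput) -> str:
    """Triage: auto-accept only clearly safe cases; everything ambiguous goes to a human."""
    if out.confidence >= 0.9 and out.anomaly_score <= 0.2:
        return "auto-accept (audit a random sample later)"
    if out.confidence < 0.5 or out.anomaly_score > 0.8:
        return "escalate to senior domain expert"
    return "queue for human review"

print(route(AIOutput("memo-0042", confidence=0.72, anomaly_score=0.35)))
# -> queue for human review
```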

The Playbook: What Each Actor Should Do

The verification economy is not a doom scenario. It is a design challenge -- and a significant economic opportunity for those who build verification capacity ahead of demand. The following playbook draws on the Catalini framework, the WEF labour market data, and emerging best practices from organizations that are investing in verification infrastructure [1][5][6][10].

The Verification Economy Playbook

1. Individuals -- Invest in verification skills: critical evaluation, domain expertise, cross-disciplinary judgment. The most durable career advantage is the ability to judge whether an AI output is right -- not the ability to produce one.
2. Companies -- Build verification infrastructure alongside automation. For every dollar spent on AI deployment, allocate resources for evaluation frameworks, human review pipelines, and audit trails. Protect junior pipelines.
3. Investors -- The next wave of AI value creation is in verification, not generation. Companies that build evaluation platforms, verification-as-a-service, and domain-specific quality assurance will capture the chokepoint of the AI value chain.
4. Policymakers -- Treat verification capacity as critical infrastructure. Fund verification skills in education. Require AI impact assessments on junior talent pipelines. Build public-sector verification institutions.

Part II -- How to Think in Exponentials

The economics of verification matters because AI capabilities are advancing exponentially -- and human institutions, including governments, corporations, and educational systems, adapt linearly. This section offers a practitioner's framework for closing that gap.

In February 2026, the 80,000 Hours research team published an analysis titled "What the Hell Happened with AGI Timelines in 2025?" [3]. The piece documents how expert forecasts for transformative AI whipsawed during 2025 -- compressing sharply in the first half as reasoning models stunned observers, then extending again as limitations in generalization became apparent. The volatility itself is the signal: when Metaculus forecasts swing by years within months, it means the frontier is moving faster than institutions can calibrate to. Meanwhile, Rinehart at AEI described the psychological experience of tracking these developments as "getting AGI-pilled" -- the moment when the exponential trajectory transitions from an intellectual abstraction to a felt reality [4].

"The changes are gradual, then sudden. But even the gradual part is happening faster than our institutions are designed to process."

-- Adapted from 80,000 Hours AGI timeline analysis, February 2026 [3]

The data supports the urgency. Consider the trajectory of AI capabilities across key benchmarks:

AI Capability Trajectory: Key Benchmarks

| Benchmark | Domain | Earlier Score | Later Score | Gain | Period |
|---|---|---|---|---|---|
| SWE-bench | Software Engineering | 4.4% | 71.7% | +67.3pp | 2023 → 2024 |
| USMLE | Medical Licensing | ~55% | ~92% | +37pp | 2023 → 2025 |
| Bar Exam | Legal Reasoning | ~68% | ~90% | +22pp | 2023 → 2025 |
| Agentic Tasks | Multi-step Completion | ~20% | ~65% | +45pp | 2024 → 2026 |

Sources: Stanford AI Index 2025 [2], OpenAI and Anthropic published benchmarks, METR agent evaluations. Multi-step agent figures are estimates based on publicly available evaluation data. "pp" = percentage points.

The pattern is consistent: capability improvement is nonlinear, domain-general, and accelerating. What was impossible eighteen months ago is now routine. What is difficult today will likely be routine in twelve months. Organizations that plan based on today's capabilities are already planning for obsolescence.
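One way to internalize the pattern is to run the arithmetic. The sketch below compounds the agent task-horizon doubling cadence from the stat block in Part I; the one-hour starting value and the exactly-annual doubling are illustrative assumptions, not measurements.

```python
# Anything on a fixed doubling cadence outruns linear planning.
# Cadence per [3]; the starting horizon is an assumption for illustration.
horizon_hours = 1.0
for year in range(2026, 2031):
    print(f"{year}: agents complete tasks up to ~{horizon_hours:g} hours")
    horizon_hours *= 2  # one doubling per 12 months

# Five doublings = 32x. A plan budgeted for 2026 capabilities is calibrated
# to roughly 3% of the 2030 frontier under these assumptions.
```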

The Five-Step Practice Loop

Thinking in exponentials is not a personality trait. It is a practice -- a disciplined routine of scanning, mapping, testing, building, and iterating. The following framework is designed for leaders and practitioners who need to make decisions under conditions of radical uncertainty about AI capabilities [4][5][10].

The Exponential Thinking Practice Loop

1. Scan -- Monitor frontier capabilities weekly. Track benchmarks, model releases, and research papers. Follow the trajectory, not the snapshot. Subscribe to sources like the Stanford AI Index, METR evaluations, and Epoch AI compute tracking.
2. Map -- Map capabilities to your organization's task portfolio using the four-regime framework. Classify every major task by its current cost to automate and cost to verify. Identify which tasks are migrating between regimes.
3. Test -- Run structured pilots on Regime I and IV tasks. Measure both automation quality and verification cost. The most important metric is not "did the AI do it?" but "how much did it cost us to confirm the AI did it right?"
4. Build -- Invest in verification infrastructure: evaluation frameworks, human-in-the-loop review systems, automated quality checks, audit trails, and domain-expert review panels. This is the new critical infrastructure.
5. Iterate -- Re-scan monthly. What was Regime III last quarter may be Regime I next quarter. The capability frontier is moving faster than annual planning cycles can accommodate. Build organizational muscle for continuous adaptation.
Worked Example: A Regional Bank's Loan Underwriting Team

Consider a regional bank with 40 loan analysts. In Q1 2025, the team processes commercial loan applications manually: analysts review financials, assess risk, and write recommendation memos. Average processing time: 6 hours per application.

Step 1 -- Scan: The CRO identifies that frontier models now score >85% on financial analysis benchmarks and can draft risk memos in minutes.

Step 2 -- Map: The team classifies its tasks. Data extraction and ratio calculation = Regime I (automate fully). Risk narrative drafting = Regime IV (easy to automate, hard to verify -- the memo reads well but may miss sector-specific risks). Client relationship judgment = Regime III (human domain).

Step 3 -- Test: A 4-week pilot on 50 applications. AI drafts risk memos; senior analysts blind-review both AI and human memos. Result: AI memos contain material errors in 14% of cases -- errors that junior analysts miss but seniors catch. The verification cost per memo: 45 minutes of senior analyst time, down from 6 hours total but not zero.

Step 4 -- Build: The bank deploys AI memo drafting with a mandatory senior review checklist. It creates a new "verification analyst" role for mid-career staff. Critically, it preserves 10 junior analyst positions with a redesigned apprenticeship: juniors now shadow senior reviewers catching AI errors, building the judgment they will need to become future verifiers.

Step 5 -- Iterate: Three months later, the team re-scans. Newer models have reduced memo error rates to 8%. The verification checklist is shortened. But a new risk emerges: AI-generated financial projections are now plausible enough that analysts are spending less time questioning assumptions. The loop restarts with a focus on assumption verification protocols.
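The pilot's economics are worth making explicit. The arithmetic below uses the figures from the worked example; the two-hour senior rework estimate for flawed memos is an added assumption of ours.

```python
manual_hours = 6.0      # analyst time per application before AI (from the example)
review_hours = 45 / 60  # senior verification time per AI-drafted memo
error_rate = 0.14       # memos with material errors caught in review
rework_hours = 2.0      # assumed senior time to fix a flawed memo

expected_hours = review_hours + error_rate * rework_hours
print(f"expected human hours per application: {expected_hours:.2f}")          # 1.03
print(f"apparent speed-up vs. manual: {manual_hours / expected_hours:.1f}x")  # ~5.8x
```

The catch: those remaining hours are senior hours, scarcer and costlier than the junior hours they replace -- and they keep shrinking only if the bank keeps producing seniors, which is exactly what Step 4's redesigned apprenticeship is for.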

This loop is not a one-time exercise. It is an organizational capability -- a form of institutional fitness that separates organizations that will thrive in the agentic economy from those that will be disrupted by it. The key insight is that the loop's cadence must match the capability frontier's cadence. In early 2026, that means monthly at minimum [3][4].

Part III -- What This Means for Government and the Public Sector

The economics of verification takes on special urgency in the public sector. Governments are, at their core, verification institutions. Courts verify facts and apply law. Regulators verify compliance. Auditors verify accounts. Inspectors verify safety. The entire apparatus of democratic governance rests on the assumption that these verification functions are performed by humans with sufficient expertise, independence, and judgment to be trusted [11].

If the Measurability Gap erodes private-sector verification capacity, the consequences are commercial. If it erodes public-sector verification capacity, the consequences are democratic. A government that cannot verify the AI-generated outputs on which its decisions depend -- whether in healthcare, justice, education, or finance -- is a government that has outsourced its core function to systems it does not understand.

"Verification is not a bureaucratic overhead. It is the mechanism through which democratic societies maintain the relationship between stated policy and actual outcomes. Erode verification, and you erode accountability."

-- OECD AI Policy Observatory, "AI in the Public Sector" framework [11]

The Oxford Insights Government AI Readiness Index 2024 provides a useful baseline [12]. Among OECD countries, AI readiness varies enormously -- from the US (87.03) and Singapore (84.25) at the top, through the UK (82.41), France (79.36), and Germany (77.12) in the middle tier, to a long tail of nations scoring below 60. But critically, the index measures readiness to adopt AI, not readiness to verify AI. No comparable index exists for verification infrastructure, and that absence is itself diagnostic.

Verification as Public Infrastructure

Europe's regulatory architecture -- the EU AI Act [13], GDPR, the Digital Services Act -- represents one of the world's most developed frameworks for governing technology. Critics argue this regulatory density stifles innovation. But through the lens of verification economics, Europe's regulatory infrastructure is precisely the kind of verification capacity that the agentic economy demands.

The challenge is converting regulatory frameworks into operational verification capacity. The EU AI Act requires conformity assessments for high-risk AI systems, but the infrastructure to conduct these assessments at scale -- the testing laboratories, the evaluation benchmarks, the domain-expert review panels -- is still nascent. Botelho and Wang (2026) document the risk of "symbolic compliance," where organizations satisfy regulatory requirements in form without substantive verification [7].

Public Sector Verification Priorities by Domain

| Domain | Current Verification Method | AI Automation Risk | Verification Investment Needed |
|---|---|---|---|
| Healthcare | Clinical peer review, medical boards | High (diagnostic AI, treatment recommendations) | AI-specific clinical evaluation protocols, independent testing labs |
| Legal/Justice | Judicial review, bar associations | High (legal reasoning, case analysis) | Legal AI audit frameworks, adversarial testing for bias |
| Education | Assessment, accreditation bodies | Medium-High (automated grading, content generation) | AI literacy curricula, pedagogical evaluation standards |
| Public Finance | Supreme audit institutions, treasury oversight | Medium (financial modelling, fraud detection) | AI-augmented audit methodologies, real-time verification systems |
| Infrastructure/Permitting | Engineering review, environmental assessment | Medium (design optimization, impact modelling) | AI-assisted review with mandatory human sign-off, domain expert panels |

Protecting the Apprenticeship Pipeline

The Missing Junior Loop is acutely dangerous for the public sector. Government relies on career civil servants whose expertise deepens over decades. If entry-level positions in legal analysis, policy research, financial audit, and regulatory review are automated before a new generation develops the tacit knowledge to perform senior verification roles, the long-term consequences for institutional capacity are severe [1][5][6].

Several OECD countries are beginning to recognize this risk. The UK's Government Digital Service has published guidance on maintaining human expertise in AI-augmented roles. The Nordic countries -- particularly Finland and Denmark -- have invested in AI literacy programs that emphasize verification skills alongside automation skills. France's Direction Interministérielle du Numérique (DINUM) has integrated AI impact assessment into its public service modernization strategy [11][12].

Three Policy Levers for the Apprenticeship Pipeline

1. Mandate Human Learning Pathways -- Require that AI-augmented public sector roles retain structured learning components for junior staff. Automation should accelerate learning, not replace it. Junior officials should use AI as a tool while still performing core tasks that build domain expertise.
2. Fund Verification Skills -- Create a new category of public workforce investment focused on verification: critical evaluation of AI outputs, domain-specific quality assessment, and adversarial testing. The European Commission's Digital Decade targets should include verification capacity metrics alongside adoption metrics [14].
3. Require AI Impact Assessments on Junior Pipelines -- Before large-scale automation of public-sector tasks, require an assessment of the impact on junior talent development. If automation eliminates the apprenticeship pathway for a critical verification role, the deployment plan must include an alternative expertise development mechanism.

Practicing the Loop in Government

The five-step exponential practice loop applies with particular force in the public sector, where decision cycles are long, institutional inertia is high, and the stakes of getting AI wrong include erosion of public trust [11].

Guidance for Public Sector Leaders

Scan weekly: Assign a cross-functional team to monitor AI capability developments relevant to your agency's mission. The capability frontier moves faster than legislative cycles.

Map quarterly: Classify your agency's core tasks using the four-regime framework. Identify which tasks are entering the Runaway Risk Zone (Regime IV) where automation is easy but verification is hard.

Test with rigour: Pilot AI systems with formal evaluation protocols that measure verification cost, not just automation speed. A system that produces answers 10x faster but requires the same expert review time has not improved productivity -- it has increased throughput of unverified output.

Build verification capacity: Treat verification infrastructure as essential public infrastructure, equivalent to cybersecurity or data protection. Budget for it. Staff for it. Institutionalize it.

Iterate with humility: Accept that your current understanding of AI capabilities will be obsolete within months. Build organizational structures that can adapt at the cadence of the technology, not the cadence of the bureaucracy.

The Quiet Earthquake

The economics of verification is not a niche academic concern. It is a structural transformation of how economies produce, validate, and trust value. The Measurability Gap identified by Catalini, Hui and Wu is widening in real time, and its effects -- the Missing Junior Loop, the Codifier's Curse, the risk of a hollow economy -- are already visible in labour market data and organizational behaviour [1][5].

But this is not a counsel of despair. The verification economy is a design challenge, and design challenges have solutions. Organizations that invest in verification capacity -- that build evaluation frameworks, protect apprenticeship pipelines, and develop institutional muscle for exponential adaptation -- will have durable competitive advantage. Governments that treat verification as critical public infrastructure will sustain the democratic accountability that their citizens depend on.

The practice loop -- scan, map, test, build, iterate -- is not a strategy document. It is a discipline. It requires leaders who are willing to confront the pace of change honestly, invest in capabilities that do not have immediate ROI, and build institutions that can learn as fast as the technology they govern.

As Rinehart observed, there is a phenomenology to understanding exponential change -- a moment when the trajectory shifts from abstraction to reality [4]. For many organizations and governments, that moment is now. The question is not whether the verification economy will arrive. It is whether we will have built the infrastructure to navigate it when it does.

The Optimist's Thesis

Intelligence is no longer scarce. For the first time in 300,000 years of human history, cognitive capacity can be generated at near-zero marginal cost. But trust in that intelligence -- the ability to verify, validate, and know that it is right -- remains as scarce as ever. The organizations, governments, and societies that build verification capacity will not merely survive the transition to the agentic economy. They will define it. The binding constraint is not automation. It is trust. And trust is built, not generated.

"The test of a first-rate intelligence is the ability to hold two opposed ideas in mind at the same time and still retain the ability to function. One should, for example, be able to see that things are hopeless and yet be determined to make them otherwise."

-- F. Scott Fitzgerald, "The Crack-Up" (1936). Applicable, with some irony, to the verification economy.

References

  1. Catalini, C., Hui, X., & Wu, J. "Some Simple Economics of AGI." MIT Sloan Working Paper, February 2026. SSRN abstract=6298838; also available at arXiv:2602.20946
  2. Stanford HAI. "Artificial Intelligence Index Report 2025." Stanford University, April 2025. hai.stanford.edu/ai-index/2025-ai-index-report
  3. 80,000 Hours. "What the Hell Happened with AGI Timelines in 2025?" Podcast episode, February 2026. 80000hours.org/podcast/episodes/agi-timelines-in-2025
  4. Rinehart, W. "The Phenomenology of Getting AGI-Pilled." American Enterprise Institute, March 5, 2026. aei.org/articles/the-phenomenology-of-getting-agi-pilled
  5. Brynjolfsson, E., Chandar, B., & Chen, R. "Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence." Stanford Digital Economy Lab, August 2025. digitaleconomy.stanford.edu/publications/canaries-in-the-coal-mine
  6. World Economic Forum. "The Future of Jobs Report 2025." January 2025. weforum.org/publications/the-future-of-jobs-report-2025
  7. Botelho, T. & Wang, D. "Symbolic Compliance in AI Governance: Early Evidence from the EU AI Act." Working Paper, 2026.
  8. Agency for Healthcare Research and Quality (AHRQ). "Diagnostic Errors." US Department of Health & Human Services. ahrq.gov/topics/diagnostic-errors
  9. Mata v. Avianca, Inc., No. 22-cv-01461 (S.D.N.Y. June 22, 2023). Judge P. Kevin Castel. Justia -- Document 54. See also: Seyfarth Shaw, "Update on the ChatGPT Case -- Counsel Sanctioned" (2023); Charlotin, D., AI Hallucination Tracker (ongoing).
  10. Pethokoukis, J. "The Race to Economic Supremacy in the Age of AGI." American Enterprise Institute (Faster, Please!), March 3, 2026. aei.org/articles/the-race-to-economic-supremacy-in-the-age-of-agi
  11. OECD. "Governing with Artificial Intelligence: Are Governments Ready?" OECD Publishing, 2024. oecd.org -- Governing with Artificial Intelligence
  12. Oxford Insights. "Government AI Readiness Index 2024." Published December 2024. oxfordinsights.com -- 2024 Government AI Readiness Index (PDF)
  13. European Parliament and Council. "Regulation (EU) 2024/1689 -- The AI Act." Official Journal, 12 July 2024. eur-lex.europa.eu/eli/reg/2024/1689/oj/eng
  14. European Commission. "Digital Decade Policy Programme 2030." digital-strategy.ec.europa.eu/en/policies/europes-digital-decade
  15. Acemoglu, D. & Johnson, S. Power and Progress: Our Thousand-Year Struggle Over Technology and Prosperity. PublicAffairs, 2023. ISBN 978-1541702530. shapingwork.mit.edu/power-and-progress
  16. Negele, M. et al. "Europe and the Geopolitics of AGI: The Need for a Preparedness Plan." RAND Corporation (RR-A4636-1), 2025. rand.org/pubs/research_reports/RRA4636-1
  17. Draghi, M. "The Future of European Competitiveness." Report to the European Commission, September 2024. commission.europa.eu/topics/competitiveness/draghi-report
  18. McKinsey & Company. "The State of AI in Early 2025." McKinsey Global Survey, March 2025. mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
  19. Akerlof, G.A. "The Market for 'Lemons': Quality Uncertainty and the Market Mechanism." The Quarterly Journal of Economics, 84(3), 488-500, 1970. jstor.org/stable/1879431
  20. Noy, S. & Zhang, W. "Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence." Science, 381(6654), 187-192, 2023. doi.org/10.1126/science.adh2586
  21. Agrawal, A., Gans, J., & Goldfarb, A. Prediction Machines: The Simple Economics of Artificial Intelligence. Harvard Business Review Press, 2018 (updated & expanded edition 2022). ISBN 978-1647824679. store.hbr.org -- Prediction Machines
  22. Acemoglu, D. "The Simple Macroeconomics of AI." NBER Working Paper No. 32487, May 2024; subsequently published in Economic Policy, 40(121), 13-58, 2025. nber.org/papers/w32487