
The Definitive Guide to the AI Lexicon

Version 1.1

Definitions, Explanations, History and Evolution Since ChatGPT (November 2022),

and One-Year Forward-Looking Predictions

Version 1.0 published April 20, 2026

Version 1.1 published April 21, 2026

A Multi-LLM Reconciliation Study

Reconciled from independent research by:

Claude Opus 4.7  ·  ChatGPT 5.4 Pro  ·  Gemini 3.1 Pro  ·  Grok 4.3

Version 1.1 reviewed by:

All four reconciliation models in their current published forms


 

Version 1.1 Revision Notice

This is Version 1.1, published April 21, 2026, twenty-four hours after the Version 1.0 release. The revision was prompted by substantive reviews from all four original reconciliation models (Claude Opus 4.7, ChatGPT 5.4 Pro, Gemini 3.1 Pro, Grok 4.3), each of which read Version 1.0 in its current published form and produced a critique. The full text of all four reviews is reproduced as an appendix to this edition so that readers can observe the effect of prompt-seeding on downstream interpretation directly rather than through summary.

The specific Version 1.1 changes are:

  • Factual corrections: Two dated errors in the chronological timeline appendix, identified by ChatGPT 5.4 Pro, have been corrected. The vibe-coding entry is now correctly placed in the 2025 section rather than referenced in the 2024 section. The Contextual Retrieval entries are now consistently dated September 19, 2024 (Anthropic's actual publication date) throughout the document. A full date-consistency pass was performed on the timeline appendix.
  • New section on priority claims and contested attributions: Per de-seeded Claude's recommendation, generalized per ChatGPT 5.4 Pro, a new section (Section 6) has been added that consolidates self-reported priority claims and genuinely contested attributions into a single labeled location. This includes the three Fenlon-related attributions (Value per Token, Patent Slop, agent-autonomy articulation) previously distributed across the narrative and methodology sections, and extends the three-part evidentiary test to several other coinage claims in the index that would benefit from the same clarity.
  • Endnotes for origin and coinage claims: Per ChatGPT 5.4 Pro's first recommendation, visible endnotes have been added for the most attribution-sensitive claims in the index and narrative sections. The endnote apparatus is not exhaustive — fully annotating 150-plus entries would double the document's length — but it covers every claim where a specific individual, company, or date is asserted as the originator of a term.
  • Revised reconciliation-frame paragraph: Per de-seeded Claude's critique, the methodology section now acknowledges that inter-model agreement on publicly-verifiable facts is closer to inter-annotator reliability than to independent triangulation, and reserves the stronger epistemic claim for the narrower set of cases where the four models genuinely diverged.
  • Reviews appendix: An appendix reproducing the four Version 1.0 reviews in full, with seeding status of each reviewer labeled, has been added. This is the single most substantive Version 1.1 addition. It allows the reader to observe empirically how prompt-seeding affected downstream interpretation, rather than relying on the document's own methodology claims to make that case.

Changes deliberately NOT made in Version 1.1, with reasoning:

  • The title was not changed. Three of four reviewers recommended a narrower title. The decision to keep "The Definitive Guide to the AI Lexicon" rests on the proposition that a document which has passed through four independent Deep Research passes, reconciliation, published-form review by all four original models, and versioned public revision — with its methodology and limitations transparently disclosed — earns the label. "Definitive" is a claim about epistemic rigor, not about completeness or permanence. Versioning and dating reinforce rather than undermine the label: a dated Version 1.1 of a definitive reference is not a contradiction but a lineage marker.
  • No worldview-sampling caveat was added. ChatGPT 5.4 Pro recommended acknowledging that the lexicon samples primarily from frontier-lab blogs, Willison/Karpathy/Lütke orbit discourse, VC essays, and startup operator language. This sampling choice is deliberate and load-bearing: the frontier-lab / builder / investor discourse layer IS the generative core of post-ChatGPT vocabulary. Sampling it rigorously is the document's job. Apologizing for that sampling would misrepresent what the lexicon is.

 

Section 1 — Methodology and Scope

Purpose

This guide traces the vocabulary that the artificial intelligence industry invented, borrowed, or repurposed between November 30, 2022 — the day OpenAI launched ChatGPT — and mid-April 2026. It is intended as the single reference a practitioner, reporter, investor, or historian can consult to find out what a term means, when it first appeared, who coined it, why it rose, why it is or is not still current, and where it is likely heading over the next twelve months.

The guide's audience is deliberately mixed: technical engineers, business executives, policy-adjacent professionals, and curious generalists. The writing favors precision over jargon-dumping. Where a term has insider nuance, the nuance is explained rather than assumed.

Multi-LLM Reconciliation Methodology

This document is a reconciliation. Four leading frontier models were independently asked to produce a comprehensive AI lexicon covering the same scope, with the same exclusions, and the same attribution standards. The four models were:

  • Claude Opus 4.7 (Anthropic) — the primary substantive spine of this document, gathering 464 independent sources across 1 hour 59 minutes of Deep Research.
  • ChatGPT 5.4 Pro (OpenAI) with extended reasoning — the strongest source-discipline contributor on second attempt, producing the most linked citations of any of the four.
  • Gemini 3.1 Pro (Google DeepMind) — the most institutionally dated attributions, including the cleanest account of the Frontier Model Forum's July 2023 formation.
  • Grok 4.3 (xAI) — the fastest response and the baseline independent voice.

Where the four models agree on a verifiable public fact, this document states the fact assertively. Where they disagree, the most credible reading is adopted and the disagreement is noted. Where the historical record is genuinely contested — which is the case for several cultural coinages, including "AI Slop," "Patent Slop," and "Value per Token" — the contest is reported transparently and the credible claimants are listed by date.

One honest caveat about the reconciliation methodology, added in Version 1.1 per de-seeded Claude's critique. When four models receive mostly-identical prompts and converge on publicly-verifiable facts, the agreement is closer to inter-annotator reliability than to independent triangulation. The four models were trained on substantially overlapping corpora, prompted by the same person using similar language, and evaluated against the same public record. Much of their convergence reflects the shared corpus rather than independent confirmation. The stronger epistemic weight — the kind that genuine triangulation would deliver — applies to the narrower set of cases where the four models diverged and forced a reconciliation call. Those are the cases reported in the Methodology Appendix under "Known Divergences Between Models." The rest of the agreement is valuable but should be read as citation cleanup and cross-validation of the public record, not as independent discovery.

Disclosure of Methodological Asymmetry

Intellectual honesty requires one specific disclosure. The four models did not receive identical prompts on three attribution-sensitive terms.

Two of the four models — Gemini 3.1 Pro and Grok 4.3 — received a research prompt that suggested Sean Fenlon (Symphony42, LinkedIn, July 4, 2023) and Dave Blundin as originators of "Value per Token" as a macro-economic business metric, and suggested Sean Fenlon as the coiner of "Patent Slop" (The Near Side #28, March 2, 2026). These attributions were treated by those two models as seeded facts to verify rather than to discover independently.

The other two models — ChatGPT 5.4 Pro (second run) and Claude Opus 4.7 — received a tightened prompt that removed the seeded attributions and asked for fully independent research on the origins of these terms. This de-seeded prompt also removed an earlier reference to Fenlon's September 11, 2024 X post articulating the agent-autonomy distinction. ChatGPT 5.4 Pro and Claude Opus 4.7 therefore reported whatever their independent research surfaced, with no priming toward any particular claimant.

This asymmetry is disclosed because the reconciliation results are more interesting with it explicit than without. The two de-seeded models independently surfaced different earliest-traced public coinages — ambient-code.ai (October 6, 2025) for Value per Token, Mark Summerfield's Patentology blog (December 3, 2025) for "AI slopplications" — while none of the four models surfaced Fenlon's July 4, 2023 VPT articulation or his March 2, 2026 Patent Slop coinage through open search. The reconciliation notes both findings, the way a rigorous survey reports both "earliest traced indexed public use" and "earliest known articulation," with transparent provenance for each.

For the twenty other narrative terms and the roughly one hundred fifty catalogued entries, the four models received identical prompts and the reconciliation is symmetric.

Explicit Exclusions

This report deliberately excludes the vocabulary of AI safety, alignment, ethics, and governance. That means no narrative treatment of alignment, RLHF framed as a safety mechanism, constitutional AI, red-teaming, jailbreaks, prompt-injection defense, AI-doom discourse, P(doom), responsible scaling policies, model specs, refusal behavior, or interpretability-as-safety. Those terms are real, important, and deserve rigorous treatment. They are omitted here, even as illustrations, to keep the scope of this document coherent and to reserve them for a separate forthcoming companion report.

The one partial exception is Recursive Self-Improvement, which appears in Section 3 under its capability-economic framing — physical-versus-software-speed constraints, bull case, bear case — rather than its safety framing. No safety-flavored discussion of RSI appears here.

The guide also excludes terms whose primary meaning is confined to hardware (H100, TPU-v5p, B200), to specific cloud products (Vertex AI, SageMaker, Bedrock as platforms rather than as words), and to purely research-internal vocabulary that has not crossed into general developer or business usage.

Era Tags

Every indexed term carries one of five era tags:

  • Pre-ChatGPT — coined or widely used before November 30, 2022. Included where the term remains in heavy current use.
  • Genesis — November 30, 2022 through December 2023. The ChatGPT-API-and-first-RAG-boom era.
  • Agent Wave — calendar year 2024. The year tool use, function calling, AutoGPT descendants, Devin, Operator, and the earliest agent frameworks went mainstream.
  • Context Era — January 2025 through January 2026. Bracketed by o1's inference-scaling thesis on one end and the harness-engineering articles on the other.
  • Harness Era — February 2026 onward. The current period, named for the "Agent = Model + Harness" settlement.

Currency and Limitations

Knowledge cutoffs vary across the four source models. Claude Opus 4.7 was current through April 2026 via live research; ChatGPT 5.4 Pro through roughly late March 2026; Gemini 3.1 Pro through early-to-mid 2026; Grok 4.3 through approximately February 2026. Where a claim depends on a post-March 2026 fact, only models with that coverage are cited. Dates after April 15, 2026 should be treated as provisional across all four models.

This is a Version 1.1 reference, not a final word. Forthcoming revisions will expand the index, refine the narrative sections, and produce the companion safety-and-ethics volume. Errors, omissions, and contested attributions should be reported for Version 1.2 consideration.

 

Section 2 — Comprehensive Term Index

The index is organized by category for readability, with roughly one hundred fifty entries. Each entry provides a concise definition, the first notable use or coinage (with date and attribution where recoverable), an era tag, a current-status marker (Dominant, Widely used, Niche, Declining, Obsolete, Emerging, Rising, or Contested), and cross-references to related terms where the connection matters.

Pre-ChatGPT terms are included in the index when they remain in heavy current use; they are not narrated in Section 3 except where the narrative requires them.

A. Model Architecture and Training

Transformer. Neural network architecture based on self-attention. Introduced by Vaswani et al., "Attention Is All You Need," Google, June 12, 2017 (arXiv 1706.03762). Era: Pre-ChatGPT. Status: Dominant.

Foundation model. A large model trained on broad data at scale that can be adapted to many downstream tasks. Coined by Stanford CRFM (Bommasani et al., "On the Opportunities and Risks of Foundation Models," August 2021). Era: Pre-ChatGPT. Status: Widely used, though somewhat displaced in industry by frontier model.

Frontier model / Frontier lab. A model or lab operating at or beyond current state-of-the-art capability. Crystallized in July 2023 policy-industry discourse through the Frontier Model Forum announcement (July 26, 2023, Anthropic, Google, Microsoft, OpenAI) and the Anderljung et al. "Frontier AI Regulation" paper. Often mistakenly attributed to Stanford CRFM, which coined foundation model but not this term. Era: Genesis. Status: Dominant institutional vocabulary.

Large Language Model (LLM). A transformer trained on text at billions-to-trillions of parameters. Era: Pre-ChatGPT. Status: Dominant.

Small Language Model (SLM). Sub-ten-billion-parameter model, often distilled or domain-specialized. Popularized by Microsoft's Phi series (Phi-1, June 2023; Phi-3, April 2024). Era: Genesis. Status: Widely used.

Mixture of Experts (MoE). Architecture routing each token to a subset of expert sub-networks. Revived for LLMs by Google's GShard (2020) and Switch Transformer (Fedus et al., January 2021); mainstreamed by Mixtral-8x7B (Mistral, December 11, 2023) and by GPT-4 architecture leaks. Era: Pre-ChatGPT. Status: Dominant — almost every flagship 2024–2026 model is MoE.

Dense model. Counterpart to MoE: all parameters active per token. Era: Pre-ChatGPT. Status: Declining as a standalone label; used primarily in contrast to MoE.

State Space Model (SSM) / Mamba. Non-attention sequence architecture. Albert Gu and Tri Dao, "Mamba: Linear-Time Sequence Modeling with Selective State Spaces," December 1, 2023. Commercialized by Cartesia in its Sonic voice models. Era: Genesis. Status: Niche but important.

Diffusion model. Iterative denoising generator, dominant for image and video. Ho et al., "Denoising Diffusion Probabilistic Models," June 2020. Era: Pre-ChatGPT. Status: Dominant for media generation.

Scaling laws. Empirical relationships between loss, parameters, data, and compute. Kaplan et al. (OpenAI, January 2020); revised by Chinchilla (Hoffmann et al., DeepMind, March 2022). Era: Pre-ChatGPT. Status: Contested — the 2024 pivot to test-time compute arose precisely from the debate over whether pre-training scaling had plateaued.

Pre-training. Initial large-scale self-supervised training. Era: Pre-ChatGPT. Status: Dominant.

Post-training. Catch-all for supervised fine-tuning plus preference tuning plus reinforcement learning that happens after pre-training. Era: Genesis. Status: Dominant — replaced "fine-tuning" as the umbrella term during 2023.

Supervised Fine-Tuning (SFT). Training on curated instruction-response pairs. Era: Pre-ChatGPT. Status: Dominant.

RLHF (Reinforcement Learning from Human Feedback). Training a reward model from human preferences and optimizing the LLM against it. Christiano et al., 2017; applied to LLMs by OpenAI's InstructGPT (Ouyang et al., March 2022). Treated here only as a training technique, not its safety framing. Era: Pre-ChatGPT. Status: Widely used but partially displaced by DPO.

DPO (Direct Preference Optimization). Preference fine-tuning without a separate reward model. Rafailov et al., Stanford, May 29, 2023. Era: Genesis. Status: Dominant in open-weight post-training.

RLVR (Reinforcement Learning with Verifiable Rewards). Reinforcement learning on tasks with ground-truth checkers (math, code). Popularized by the DeepSeek-R1 technical report in January 2025 and subsequent reasoning-model work. Andrej Karpathy's 2025 year-in-review warned of "benchmaxxing via RLVR." Era: Context Era. Status: Dominant for reasoning-model training.

Synthetic data. Model-generated training data. Moved from suspect to strategic after Phi, Llama 3, and DeepSeek demonstrated strong performance on predominantly synthetic mixes. Era: Genesis. Status: Dominant.

Distillation. Training a smaller model to imitate a larger one. Era: Pre-ChatGPT. Status: Dominant.

LoRA (Low-Rank Adaptation). Parameter-efficient fine-tuning via low-rank matrices. Edward Hu and colleagues, Microsoft, arXiv June 17, 2021 (2106.09685). Era: Pre-ChatGPT. Status: Dominant.

QLoRA. LoRA on a quantized base model. Tim Dettmers and colleagues, May 2023. Era: Genesis. Status: Widely used.

PEFT (Parameter-Efficient Fine-Tuning). Umbrella term for techniques including LoRA, adapters, and prefix tuning; also Hugging Face's library of the same name. Era: Genesis. Status: Widely used.

Quantization. Reducing weight precision (INT8, INT4, FP8, FP4). Era: Pre-ChatGPT. Status: Dominant.

Speculative decoding. Using a small draft model to propose tokens that are verified by the large model. Leviathan et al., Google, November 2022. Era: Genesis. Status: Dominant inference trick.
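
A toy sketch of the accept-or-correct loop, with deterministic stand-in "models" invented for illustration; real implementations compare token probabilities and resample on rejection, and verify all draft positions in a single batched forward pass.

    def draft_model(prefix: str) -> str:
        # Stand-in draft model: cheap, sometimes wrong.
        guesses = {"the": " cat", "the cat": " sat", "the cat sat": " down"}
        return guesses.get(prefix, " <eos>")

    def target_model(prefix: str) -> str:
        # Stand-in target model: expensive, authoritative.
        truth = {"the": " cat", "the cat": " sat", "the cat sat": " on"}
        return truth.get(prefix, " <eos>")

    def speculative_step(prefix: str, k: int = 3) -> str:
        # 1) Draft proposes k tokens autoregressively.
        proposed, p = [], prefix
        for _ in range(k):
            tok = draft_model(p)
            proposed.append(tok)
            p += tok
        # 2) Target verifies each position; accept the agreeing prefix.
        p = prefix
        for tok in proposed:
            if target_model(p) != tok:
                return p + target_model(p)  # reject; take the target's token
            p += tok                        # accept and continue
        return p

    print(speculative_step("the"))  # "the cat sat on": two accepted, one corrected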

Test-time compute. Spending additional compute at inference time, typically via search or long chain-of-thought, rather than relying only on training-time scale. Associated with Noam Brown at OpenAI from 2024 onward. Era: Agent Wave. Status: Dominant — the core premise of reasoning models.

Chinchilla-optimal. Ratio of training tokens to parameters that minimizes loss per FLOP. DeepMind, March 2022. Era: Pre-ChatGPT. Status: Declining — over-trained small models are now the norm.

Tokenization / BPE. Subword segmentation (Byte-Pair Encoding). Era: Pre-ChatGPT. Status: Dominant.

Embedding. Vector representation of text or another modality. Era: Pre-ChatGPT. Status: Dominant.

Grokking. Sudden late-training generalization. Power et al., OpenAI, 2021. Era: Pre-ChatGPT. Status: Niche research term; popularly absorbed by xAI's Grok product.

B. Capability and Performance

Benchmark / Eval. Numerical test of model capability. Era: Pre-ChatGPT. Status: Dominant.

MMLU, GSM8K, HumanEval, MATH, GPQA, SWE-bench, ARC-AGI. The canonical benchmark ladder of 2023–2026. SWE-bench (Jimenez et al., Princeton, October 2023) and ARC-AGI (François Chollet, 2019; million-dollar prize announced June 2024) became the defining 2025–2026 yardsticks. Eras: various. Status: Dominant as reference points.

Benchmaxxing. Optimizing for benchmarks at the expense of real-world utility. X (formerly Twitter) AI-commentariat slang, widespread by 2024; formalized in Karpathy's December 2025 year-in-review. Era: Agent Wave. Status: Widely used.

Goodharted. Verb: benchmark gamed until it no longer measures the thing. Derived from Goodhart's Law. Era: Agent Wave. Status: Widely used.

Emergence / Emergent capability. Abrupt capability jumps with scale. Wei et al., "Emergent Abilities of Large Language Models," June 2022; partially debunked by Schaeffer et al., "Are Emergent Abilities a Mirage?" (April 2023). Era: Pre-ChatGPT/Genesis. Status: Contested.

Hallucination / Confabulation. Model-generated content that is fluent but false. "Hallucination" is older computer-vision and NLP terminology, popularized for LLMs through 2022–2023. "Confabulation" was pushed by Douglas Hofstadter, Geoffrey Hinton, and Gary Marcus as more accurate. Era: Pre-ChatGPT/Genesis. Status: Dominant (hallucination); Niche (confabulation).

Reasoning / Thinking model. Model trained to emit long internal chains of thought before answering. OpenAI o1-preview, September 12, 2024. Era: Agent Wave. Status: Dominant.

Chain of Thought (CoT). Prompting technique of eliciting step-by-step reasoning. Wei et al., Google Brain, arXiv January 28, 2022 (2201.11903). Era: Pre-ChatGPT. Status: Widely used, largely absorbed into reasoning models.

Tree of Thoughts, Graph of Thoughts, Self-Consistency. Search-over-reasoning variants. Yao et al. (Princeton/DeepMind), May 17, 2023; Besta et al., August 2023; Wang et al., March 2022. Era: Genesis. Status: Niche.

Needle in a Haystack (NIAH). Long-context retrieval evaluation. Introduced by Greg Kamradt in November 2023. Era: Genesis. Status: Dominant long-context benchmark.

Lost in the Middle. Finding that models retrieve information better at the start and end of long contexts than in the middle. Nelson Liu et al., Stanford and UC Berkeley, July 2023. Era: Genesis. Status: Widely used shorthand.

Pass@k. Fraction of problems solved in k attempts. Originated in HumanEval (Chen et al., OpenAI, July 2021). Era: Pre-ChatGPT. Status: Dominant.
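
Pass@k is usually computed with the unbiased estimator from the HumanEval paper rather than by literally sampling k completions. A minimal sketch, with invented example numbers:

    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        """Unbiased pass@k: probability that at least one of k draws
        from n samples (c of which are correct) is correct."""
        if n - c < k:
            return 1.0
        return 1.0 - comb(n - c, k) / comb(n, k)

    # Example: 200 samples per problem, 37 passing, estimate pass@10.
    print(round(pass_at_k(n=200, c=37, k=10), 3))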

Elo / Chatbot Arena. Head-to-head human-preference ranking. LMSYS Org, May 2023. Era: Genesis. Status: Dominant ranking mechanism.

Effective compute. Algorithmic efficiency gains expressed in compute-equivalent terms. Epoch AI, 2023. Era: Genesis. Status: Widely used.

Inference-time scaling. Synonym for test-time compute, emphasizing the scaling-law framing of inference. Era: Agent Wave. Status: Widely used.

Emergent tool-use. Ability of models to use tools without per-task training. Became standard through function calling (OpenAI, June 2023) and MCP (Anthropic, November 2024). Era: Genesis/Agent Wave. Status: Dominant.

C. Prompting, Context, and Orchestration

Prompt. Input text to an LLM. The word's consumer-facing mainstreaming occurred with the ChatGPT launch on November 30, 2022; in pre-ChatGPT NLP research it already meant "input to a generative model." Era: Pre-ChatGPT. Status: Dominant.

System prompt. Persistent instruction layer governing model behavior for a session or product. OpenAI introduced the system role in its Chat Completions API on March 1, 2023, separating product-builder instructions from user input. Era: Genesis. Status: Dominant.

Developer message. OpenAI's newer term for privileged application instructions in some APIs; functional descendant of the system message. Visible in OpenAI API and documentation transitions by late 2025. Era: Context Era. Status: Widely used.

Prompt engineering. Craft of writing prompts. Vertiginous rise 2022–2023; peak job title through 2024; deprecated by 2025. Era: Genesis. Status: Declining as a job title; niche as a standalone skill.

Context engineering. Discipline of filling the context window with the right information, tools, and history. Tobi Lütke coined the term publicly on June 18, 2025; Andrej Karpathy amplified it on June 25, 2025; Simon Willison canonized the shift on June 27, 2025. Era: Context Era. Status: Dominant.

Few-shot / in-context learning. Supplying examples in the prompt. GPT-3 paper, Brown et al., OpenAI, May 2020. Era: Pre-ChatGPT. Status: Dominant.

Zero-shot prompting. Instructing the model with no examples, relying only on instruction and prior training. Era: Pre-ChatGPT. Status: Widely used.

Role prompting / persona prompting. Telling the model to adopt a role ("You are a senior analyst…"). Era: Genesis. Status: Widely used; persona prompting is mildly declining in serious production contexts.

Context window. Number of tokens a model can attend over in one pass. Era: Pre-ChatGPT. Status: Dominant.

Long context. Performance beyond roughly 32K tokens. Vendor marketing race from 2023 onward (Claude 100K in May 2023; Gemini 1.5's 1M public preview in February 2024). Era: Agent Wave. Status: Dominant.

Prompt caching. Server-side caching of prefix KV states. Anthropic API feature, August 14, 2024; OpenAI followed in October 2024. Era: Agent Wave. Status: Dominant.

Contextual retrieval. Anthropic's September 19, 2024 term for retrieval that enriches document chunks with document-local context before indexing. Era: Agent Wave. Status: Niche but influential.

Context compression / summarization. Distilling conversation state so an agent can preserve utility while staying within token budgets. Era: Context Era. Status: Widely used.

Context poisoning / distraction / confusion / clash. Drew Breunig's 2025 taxonomy of context failure modes. Era: Context Era. Status: Widely used.

RAG (Retrieval-Augmented Generation). Retrieve documents, then generate. Lewis et al., Meta FAIR, arXiv May 22, 2020 (NeurIPS 2020). Era: Pre-ChatGPT origin; Genesis mainstreaming. Status: Dominant.

GraphRAG. RAG over a knowledge graph. Microsoft Research, July 2024. Era: Agent Wave. Status: Widely used.

Agentic RAG. Iterative, agent-driven retrieval. LangChain/LlamaIndex-blog-driven coinage in late 2024. Era: Agent Wave. Status: Widely used.

Post-RAG. Loose 2025 label for the idea that naive retrieve-then-answer pipelines are no longer sufficient. More slogan than settled architecture. Era: Context Era. Status: Contested.

Vector database / Embedding store. Specialized storage for similarity search over embeddings. Pinecone, Weaviate, Chroma, Qdrant, pgvector. Era: Pre-ChatGPT/Genesis. Status: Widely used, though significance softened as long context improved.

Chunking. Splitting documents into retrievable segments. Era: Genesis. Status: Widely used.

Reranking. Reordering retrieved documents so the best evidence reaches the model. Era: Genesis. Status: Widely used.

Hybrid search. Combining lexical retrieval with embedding-based retrieval. Era: Genesis. Status: Widely used.

Function calling / Tool calling / Tool use. Structured model output invoking external tools. OpenAI function calling, June 13, 2023; later generalized to "tool calling." Era: Genesis. Status: Dominant.
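
A minimal sketch of the schema shape this popularized; the get_weather tool, its parameters, and the model's structured reply are all invented for illustration.

    import json

    # Tool schema in the JSON Schema shape popularized by function calling.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    # A tool-calling model returns a structured invocation rather than prose.
    model_tool_call = {"name": "get_weather", "arguments": '{"city": "Oslo"}'}

    args = json.loads(model_tool_call["arguments"])
    print(f'would invoke {model_tool_call["name"]}({args})')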

Structured output. Constraining output to a schema. OpenAI Structured Outputs, August 6, 2024. Era: Agent Wave. Status: Dominant.

JSON mode. Lighter guarantee that the model returns valid JSON. Ancestor of structured outputs. Era: Genesis. Status: Declining.

Guardrails. Deterministic filters around model calls. NVIDIA NeMo Guardrails, April 2023. Era: Genesis. Status: Widely used.

LangChain. Agent/orchestration framework. Harrison Chase, open-sourced October 17, 2022 — six weeks before ChatGPT. Era: Pre-ChatGPT origin; Genesis mainstreaming. Status: Widely used, partly displaced by harness.

LangGraph. Graph-based orchestration from LangChain. January 2024. Era: Agent Wave. Status: Widely used.

CrewAI, AutoGen, Semantic Kernel, Haystack, LlamaIndex. Orchestration frameworks across the ecosystem. AutoGen (Microsoft, August 2023), CrewAI (João Moura, late 2023), Semantic Kernel (Microsoft, March 2023). Era: Genesis/Agent Wave. Status: Widely used.

DSPy. Declarative prompt-program framework. Omar Khattab, Stanford, October 2023. Shopify CEO Tobi Lütke cited it as his context-engineering tool of choice on June 25, 2025. Era: Genesis. Status: Widely used.

AGENTS.md. Convention for agent instruction files at repository root. OpenAI Codex team, 2025–2026. Era: Harness Era. Status: Dominant.

MCP (Model Context Protocol). Open protocol for model-to-tool connections. Anthropic, November 25, 2024. Era: Agent Wave. Status: Dominant.

A2A (Agent-to-Agent) protocol. Google's complementary inter-agent protocol, 2025. Era: Context Era. Status: Widely used.

Tool registry / ToolShed. Stripe's internal MCP server exposing approximately 500 tools, publicly described in February 2026. Era: Harness Era. Status: Niche as a specific term; the underlying pattern is Dominant.

Orchestration / Meta-orchestration. Coordination of multiple model calls. Orchestration arrived with LangChain in October 2022. Meta-orchestration emerged as an informal label for orchestrating orchestrators — routing among workflows, models, or agent systems. Era: Genesis. Status: Widely used; harness has eaten some of orchestration's territory in coding-agent contexts.

Scaffolding. Earlier synonym for harness. Cognition (Devin) and Anthropic used it through 2024. Era: Agent Wave. Status: Declining in favor of harness.

Harness / Harness engineering. Everything around the model that makes it an agent: tools, permissions, memory, tracing, retries, verification, sandboxes, context assembly. Mitchell Hashimoto, early February 2026; Ryan Lopopolo at OpenAI, February 11, 2026; Martin Fowler and Birgitta Böckeler (Thoughtworks), February 17 and April 2, 2026. Era: Harness Era. Status: Dominant.
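
A minimal sketch of the loop a harness runs, with a stand-in model, a one-tool registry, and an invented tuple protocol; real harnesses wrap each step in the tracing, retries, sandboxing, and verification the definition lists.

    def run_agent(model, tools, task: str, max_steps: int = 8) -> str:
        # Minimal harness loop: assemble context, let the model act, feed results back.
        context = [f"TASK: {task}"]
        for _ in range(max_steps):
            action = model("\n".join(context))
            if action[0] == "answer":
                return action[1]
            _, name, arg = action
            if name not in tools:  # permissions: only registered tools may run
                context.append(f"ERROR: unknown tool {name}")
                continue
            context.append(f"OBSERVATION: {tools[name](arg)}")
        return "gave up"

    def toy_model(context: str):
        # Stand-in for an LLM: searches once, then answers from the observation.
        if "OBSERVATION" not in context:
            return ("tool", "search", "capital of France")
        return ("answer", "Paris")

    tools = {"search": lambda q: f"top hit for {q!r}: Paris"}
    print(run_agent(toy_model, tools, "What is the capital of France?"))  # -> Paris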

Guides and sensors. Fowler/Böckeler's subdivision of a harness: guides are feedforward (system prompts, architectural constraints, AGENTS.md); sensors are feedback (evals, validators, output parsers, LLM-as-judge). Era: Harness Era. Status: Widely used.

Garbage collection (in harness context). OpenAI Codex team's third pillar — continuously pruning stale scaffolding, unused tools, and accumulated drift. Era: Harness Era. Status: Niche.

Symphony. OpenAI's internal Elixir-based harness, open-sourced around February 2026. Era: Harness Era. Status: Niche (specific product).

Minions. Stripe's unattended coding agents, built on a fork of Block's Goose harness. Publicly detailed February 2026. Era: Harness Era. Status: Niche as a term; pattern is influential.

Archon. Cole Medin's open-source harness builder. Repositioned in early 2026 as "the first open-source harness builder for AI coding." Era: Context/Harness Era. Status: Niche.

12-factor agents. Dex Horthy (HumanLayer) design principles for agents, late 2024. Era: Agent Wave. Status: Widely used.

Prompt template. Reusable prompt with placeholders for variables. Era: Genesis. Status: Widely used.

Prompt library. Curated set of tested prompts for recurring tasks. Era: Genesis. Status: Widely used.

D. Agents and Deployment

Agent / Agentic. LLM system that takes actions toward goals with limited human-in-the-loop. Lilian Weng formalized "LLM Powered Autonomous Agents" on June 23, 2023. Andrew Ng and Harrison Chase articulated the spectrum and the copilot-versus-agent distinction in mid-2024. Anthropic's "Building Effective Agents," December 19, 2024, gave the field its canonical definition. Era: Genesis/Agent Wave. Status: Dominant.

Copilot / Assistant. Turn-by-turn helper. Microsoft announced Microsoft 365 Copilot in March 2023 and consolidated Bing Chat under the Copilot brand later that year. Era: Genesis. Status: Dominant product label, now distinct from agent.

AutoGPT. Open-source autonomous agent experiment by Toran Bruce Richards ("Significant Gravitas"), March 30, 2023. Era: Genesis. Status: Declining, but historically pivotal as a demonstration that the agent idea was live.

BabyAGI. Yohei Nakajima, April 3, 2023. Era: Genesis. Status: Declining.

Devin. Cognition's autonomous software engineer, announced March 12, 2024. Era: Agent Wave. Status: Widely used.

Operator / Computer Use. Computer-controlling agents. Anthropic Computer Use, October 22, 2024; OpenAI Operator, January 23, 2025. Era: Agent Wave/Context Era. Status: Widely used.

Browser agent. Agent operating websites or browser sessions on behalf of the user. Mainstream by 2025. Era: Context Era. Status: Widely used.

Deep Research. Long-horizon research agent category. Google Gemini Deep Research (December 2024); OpenAI Deep Research (February 2, 2025); Perplexity, xAI, and Anthropic variants followed through 2025. Era: Context Era. Status: Dominant.

Claude Code, Codex, Cursor, Windsurf, Zed, Aider, Cline, Roo Code. Agentic coding surfaces. Era: Agent Wave/Context Era. Status: Dominant.

Vibe coding. Building software by conversationally steering an LLM and accepting its output on vibes rather than line-by-line review. Andrej Karpathy, February 2, 2025 X post. Era: Context Era. Status: Widely used but contested and already evolving.

Agentic engineering. Simon Willison's February 2026 reframing for production-grade AI-assisted software work, intentionally more disciplined than vibe coding. Era: Harness Era. Status: Emerging.

Agent OS / LLM OS. Speculative category label for an operating-system-style layer managing agent permissions, compute budgets, memory, and inter-agent communication. Karpathy floated "LLM OS" in a November 23, 2023 keynote. Era: Genesis origin; Harness Era revival. Status: Contested, gaining traction.

Skills, plugins, hooks, slash commands. Agent-surface primitives enumerated by Karpathy on December 27, 2025. Era: Harness Era. Status: Widely used.

Sub-agents / Context quarantine. Spawning isolated agents with their own context to prevent pollution. Drew Breunig and Simon Willison, mid-2025. Era: Context Era. Status: Widely used.

Autonomous agent. Agent designed to act for extended stretches without stepwise human confirmation. Microsoft strongly popularized this phrasing in October 2024. Era: Agent Wave. Status: Widely used.

Multi-agent system. Multiple agents specializing, coordinating, or competing within a shared task. Pre-ChatGPT in AI literature; mainstreamed through AutoGen, CrewAI, and LangGraph in 2024. Era: Agent Wave. Status: Widely used.

Managed agent. Cloud-hosted agent service where the provider supplies runtime, memory, sandboxing, credentials, and execution environment. Anthropic Claude Managed Agents launched April 2026. Era: Harness Era. Status: Rising.

AI employee / AI worker. Marketing-heavy label for a specialized autonomous agent treated like a digital teammate. Common in 2025–2026 startup positioning. Era: Harness Era. Status: Widely used, somewhat contested.

Handoff / Delegation. Passing a subtask from one agent to another. Delegation became a central design lever in 2026 coding-agent discussions because it reduced context pollution. Era: Agent Wave/Harness Era. Status: Widely used.

Supervisor / Worker agent. Coordinating agent that delegates work to specialized subordinates, mirroring a multi-agent architecture pattern common in 2025–2026. Era: Context/Harness Era. Status: Widely used.

Agent framework. Library or platform for building agents, with tools, memory, graphs, and tracing. Category shaped by LangChain, AutoGen, CrewAI, and similar stacks. Era: Agent Wave. Status: Dominant.

Agent runtime. Execution layer that runs agents, manages state, dispatches tool calls, and enforces the loop between model, tools, and environment. Era: Agent Wave. Status: Widely used.

E. Multimodal and Media

Multimodal / Vision-Language Model (VLM). Model ingesting text plus images, with audio and video increasingly standard. GPT-4V, September 25, 2023. Era: Genesis. Status: Dominant.

Native multimodal. Jointly trained on all modalities rather than bolted together. Gemini 1.0, December 6, 2023; Claude 3, March 4, 2024; GPT-4o, May 13, 2024. Era: Agent Wave. Status: Dominant.

Omni model. OpenAI's "4o" framing for a model handling text, vision, and audio in real time. Era: Agent Wave. Status: Widely used as a product-specific label.

Speech-to-speech (S2S). End-to-end audio model. GPT-4o Advanced Voice (September 2024), Sesame CSM (February 2025), Gemini Live (August 2024), Cartesia Sonic-3 (October 2025). Era: Agent Wave/Context Era. Status: Dominant.

Realtime API. OpenAI, October 1, 2024. Era: Agent Wave. Status: Dominant.

Speech-to-text (STT). Audio-to-text conversion. Pre-ChatGPT in origin, now a component in the voice-agent stack. Era: Pre-ChatGPT/Genesis. Status: Dominant.

Text-to-speech (TTS). Synthesized speech from text. Pre-ChatGPT in origin, now a competitive frontier led by ElevenLabs, Cartesia, and Inworld. Era: Pre-ChatGPT/Genesis. Status: Dominant.

Whisper. OpenAI's open-source speech-to-text model. Released September 21, 2022 (MIT license); Large-v3 at Dev Day November 2023. Era: Pre-ChatGPT. Status: Dominant.

Barge-in. Ability for a voice system to detect and handle interruption while it is speaking. Inherited from legacy voice/IVR systems; renewed salience in AI voice in 2025–2026. Era: Harness Era mainstreaming. Status: Widely used.

Turn-taking. Conversational timing pattern. Conversation-analysis term long predating AI; renewed centrality in 2024–2026 voice design because human turn gaps average around 200 ms. Era: Pre-ChatGPT origin; Harness Era centrality. Status: Widely used.

Time to First Token (TTFT). Delay before the first generated token arrives. Serving-jargon pre-ChatGPT; mainstreamed in 2025–2026 voice and coding-agent discourse. Era: Context Era. Status: Widely used.

Text-to-video, text-to-image, text-to-music. Generative media categories. Sora (announced February 15, 2024; GA December 9, 2024), Veo, Runway, Pika, Midjourney, Suno, Udio. Era: Agent Wave. Status: Dominant.

World model. Generative video model treated as a simulator. Sora paper; Google Genie (February 2024); Genie 2 (December 2024); DeepMind Veo; World Labs (Fei-Fei Li, September 2024). Era: Agent Wave. Status: Widely used.

Generative UI. LLM-generated interface components. Vercel AI SDK, 2024. Era: Agent Wave. Status: Widely used.

Vision-Language-Action (VLA) model. Model that sees, reasons, and acts in an environment, especially robotics or GUI settings. Mainstreamed 2024–2025. Era: Agent Wave. Status: Widely used.

F. Economics, Ecosystem, and Culture

Moat / No moat. Defensibility of AI positions. Luke Sernau's leaked Google internal memo "We Have No Moat, And Neither Does OpenAI," surfaced by Dylan Patel of SemiAnalysis on May 4, 2023. Era: Genesis. Status: Dominant.

Training data deal. Paid data licensing. OpenAI expanded its Shutterstock deal on July 11, 2023, followed by Axel Springer (December 2023), Financial Times (April 2024), Reddit (May 16, 2024, reportedly ~$60M/year), News Corp (May 22, 2024, >$250M over five years), Dotdash Meredith, Vox Media, The Atlantic, Time, and Condé Nast (all May–August 2024). Era: Genesis → Agent Wave. Status: Dominant.

Data moat. Proprietary training data as durable defensibility. Crystallized as the "real" moat in 2025–2026 executive discourse. Era: Agent Wave. Status: Widely used.

Synthetic yield. Speculative metric: value extracted per synthetic-data dollar. Candidate for adoption if synthetic-data flywheels prove central to competitive advantage. Era: Harness Era. Status: Emerging.

LLMflation. Guido Appenzeller / a16z, November 2024 — the fall in cost per token. Era: Agent Wave. Status: Widely used.

Cost per token / Cost per million tokens. Standard AI unit economics. Era: Genesis. Status: Dominant.

Value per Token (VPT). Business-value-delivered per token consumed. Earliest indexed public use of the macro-economic business-metric framing surfaced by de-seeded Deep Research: ambient-code.ai blog post titled "Tokenomics for Code: Value per Token in the Agentic Era," October 6, 2025. Sean Fenlon (Symphony42) and Dave Blundin have claimed a July 4, 2023 LinkedIn articulation of the same framing; this is addressed as a self-reported priority claim in Section 6. Tomasz Tunguz's "Gross Profit per Token," December 30, 2025, is the most prominent adjacent VC formulation. Note that "value per token" appears mathematically in pre-ChatGPT RL literature (as a learned baseline in PPO algorithms), but the macro-economic business-metric usage is a post-ChatGPT construction. Era: Context Era. Status: Emerging, rising.

Gross Profit per Token. Tomasz Tunguz's VPT-adjacent framing, December 30, 2025. Era: Context Era. Status: Emerging.
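
Both metrics are straight divisions with tokens in the denominator; the point is that the unit is tokens, not requests or seats. A toy calculation with entirely invented numbers:

    tokens_consumed = 2_500_000_000   # invented monthly token volume
    revenue_usd = 180_000.0           # invented value attributed to the AI feature
    inference_cost_usd = 45_000.0     # invented provider bill

    value_per_token = revenue_usd / tokens_consumed
    gross_profit_per_token = (revenue_usd - inference_cost_usd) / tokens_consumed

    print(f"VPT:  ${value_per_token:.8f}/token")
    print(f"GPPT: ${gross_profit_per_token:.8f}/token")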

Token economy. Macro framing of AI as a token market. Various attributions; crystallized in Sam Altman's BlackRock Infrastructure Summit remarks, March 2026. Era: Context Era. Status: Widely used.

Token billionaire. Organization spending more than one billion tokens per day. Latent Space podcast coinage, early 2026, applied to OpenAI's Codex team. Era: Harness Era. Status: Niche.

Compute buildout / Hyperscaler capex. Stargate ($500B, announced January 21, 2025; OpenAI + SoftBank + Oracle + MGX), Microsoft's $80B FY25 capex, Meta's $60–65B, Google's $75B, Amazon's $100B+. Era: Context Era. Status: Dominant.

GPU-rich / GPU-poor. Dylan Patel at SemiAnalysis, September 2023. Era: Genesis. Status: Widely used.

AI Slop. Low-quality generative output. Earliest crystallizing public usage traced to @deepfates on X on May 7, 2024; popularized and canonized by Simon Willison's blog post "Slop is the new name for unwanted AI-generated content," May 8, 2024. Named Word of the Year 2025 by Merriam-Webster ("slop"), the American Dialect Society, Macquarie Dictionary ("AI slop" specifically), and The Economist. Era: Agent Wave. Status: Dominant.

Patent Slop / AI slopplications. AI-generated patent filings flooding patent offices. Two parallel coinages appear in the public record, with clear priority. Mark Summerfield (Australian patent attorney), Patentology blog, December 3, 2025 — "AI slopplications" — is the earliest indexed public use. Sean Fenlon coined "Patent Slop" independently in The Near Side #28 on March 2, 2026, three months later. Both describe the same phenomenon. Addressed in Section 6. Era: Context/Harness Era. Status: Emerging.

Enshittification. Cory Doctorow, November 2022 — platform decay, widely applied to AI products by 2024. Era: Genesis. Status: Dominant.

Dead Internet theory. Pre-existing conspiracy-adjacent meme, supercharged by AI slop in 2024–2025. Era: Pre-ChatGPT. Status: Widely used.

AGI (Artificial General Intelligence). Human-level generality. Long-standing term (Goertzel/Legg usage c. 2002; AGI conference from 2008). Era: Pre-ChatGPT. Status: Contested / shifting.

ASI (Artificial Superintelligence). Beyond-human intelligence. Nick Bostrom, Superintelligence (2014); revived in 2025–2026 discourse by Ilya Sutskever's Safe Superintelligence Inc (June 19, 2024), Leopold Aschenbrenner's Situational Awareness (June 2024), and Meta's renamed Meta Superintelligence Labs (mid-2025). Era: Pre-ChatGPT origin; Context Era revival. Status: Rising.

Recursive Self-Improvement (RSI) / Intelligence Explosion / Takeoff. I.J. Good (1965); revived post-2023 as a capability-economic (not purely safety) argument. Era: Pre-ChatGPT origin; Context Era revival. Status: Contested.

AI-native company. Organization with LLMs in the critical path from day one. Era: Agent Wave. Status: Widely used.

AI wrapper. Pejorative for a thin product over a model API. 2023 X discourse. Era: Genesis. Status: Widely used, often pejorative; softening in 2026 as "wrappers" acquire substantive harnesses.

Vibe shift. Sean Monahan, 2022; adapted to describe AI-product tonality. Era: Pre-ChatGPT. Status: Niche.

Prompt engineer (job title). See narrative in Section 3. Era: Genesis. Status: Declining.

Forward deployed engineer. Customer-facing technical role that customizes and deploys AI systems inside specific enterprises. Older Palantir-style title; AI mainstreaming by 2025–2026. Era: Harness Era. Status: Rising.

Context engineer. Job title that emerged post-Lütke/Karpathy/Willison June 2025 crystallization. Era: Context Era. Status: Emerging.

Vibe code cleanup specialist. Ironic LinkedIn title, late 2025. Era: Context Era. Status: Niche/cultural.

Agent mesh. Speculative: network of interoperating agents across organizations. Era: Harness Era. Status: Emerging.

Meta-harness. Speculative: harness that builds, manages, or retires other harnesses. Era: Harness Era. Status: Emerging.

Affective memory. Speculative: persistent emotional/user model across sessions. Implicit in Hume AI's EVI line and in companion products. Era: Harness Era. Status: Emerging.

Frontier Model Forum. Industry body founded July 26, 2023 by Anthropic, Google, Microsoft, and OpenAI; Meta and Amazon added later. Era: Genesis. Status: Dominant.

Agentic AI Foundation. Linux Foundation project announced November 2025 to host MCP and related standards. Backers include Block, OpenAI, Google, Microsoft, AWS, Cloudflare, and Bloomberg. Era: Context Era. Status: Widely used.

Chatbot Arena / LMSYS. Crowdsourced pairwise-comparison platform for LLMs. Launched May 3, 2023. Era: Genesis. Status: Dominant.

LLMOps. Operational discipline of deploying, monitoring, evaluating, and governing LLM-based systems. Category term emerged in 2023. Era: Genesis. Status: Widely used.

Observability / Tracing. Instrumentation for understanding what agents are doing in production. Mainstreamed in LLM/agent tooling through 2024–2026. Era: Agent Wave. Status: Widely used.

Model router / Routing. System deciding which model should answer a request based on cost, latency, or complexity. RouteLLM, 2024, as a clear formalization. Era: Context Era. Status: Widely used.

Memory (agent). Any mechanism preserving relevant information across turns or sessions beyond the prompt. Broadened in 2024–2026 from "chat history" to a distinct engineering layer. Era: Agent Wave. Status: Dominant.

Working / Episodic / Semantic memory. Cognitive-science borrowings that entered agent-memory vocabulary through 2025–2026. Era: Context/Harness Era. Status: Widely used.

 

Section 3 — Evolution Narrative

Twenty terms structure this section. Each was chosen because its arc is genuinely informative — where it came from, who pushed it, why it rose, and, when applicable, why it fell or transformed. The narratives are journalism as much as taxonomy: the prompt given to the four reconciliation models explicitly asked them to name individuals and companies generously (the bigger the name the better) and to offer plausible causal explanations rather than mere chronology.

1. Prompt vs. System Prompt

When ChatGPT launched on November 30, 2022, "prompt" meant whatever a user typed into the box. Within four months that singular term had bifurcated, and the split now structures the entire industry. The splitting instrument was OpenAI's March 1, 2023 Chat Completions API, which introduced a system role alongside user and assistant. The system message was persistent, hidden from the end user, and preemptive — exactly the surface a product team needed to shape a model's persona, tool availability, refusal behavior, and output format without touching the conversation. From that moment, "prompt" came to mean the disposable, consumer-facing line typed into a textbox, while "system prompt" meant the production artifact that governs a model-backed application.
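
The split is easiest to see in the request shape itself. A minimal sketch of a Chat Completions request body follows; the model name, product persona, and instructions are placeholders invented for illustration, not drawn from any real or leaked prompt.

    # The product team writes the system message once; end users only ever
    # touch the user message. Placeholder values throughout.
    request = {
        "model": "gpt-4",
        "messages": [
            {"role": "system",
             "content": ("You are TaxBot, a cautious tax assistant. "
                         "Answer only questions about US taxes and cite "
                         "the relevant IRS form in every answer.")},
            {"role": "user", "content": "Can I deduct my home office?"},
        ],
    }

    # The API replies with a {"role": "assistant", ...} message; the system
    # message persists across turns without ever being shown to the user.
    print(request["messages"][0]["role"])  # -> system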

The 2023 leak economy cemented the distinction. Repositories like l1ghtsource/leaked-system-prompts and Simon Willison's periodic disclosures of leaked GPT, Claude, and Gemini system prompts — some exceeding twenty thousand tokens by 2025 — demonstrated that the real engineering of a chatbot was not the user's query but the multi-thousand-token apparatus silently preceding it. Anthropic's decision in July 2024 to publish its Claude system prompts openly was framed as transparency but also as a norm-setting move: system prompts are the product.

By the Harness Era, "system prompt" has been partly subsumed under broader labels like "context" or "guides" (Fowler and Böckeler, April 2, 2026). Practitioners recognized that a system prompt is just one instrument inside a much larger harness that also includes AGENTS.md files, tool descriptions, scratchpads, and retrieved memories. The pair survives, however, because it still draws the single most useful line in AI product work: the line between what the user writes and what the product writes for them. The rise was technical — an API design choice by OpenAI. The persistence is commercial: it is where product differentiation hides.

The practical implication for anyone building with models in 2026 is that "prompt engineering" in production almost never means fiddling with the end-user prompt. It means iterating on the system prompt — and on the surrounding guides and sensors — to constrain, inform, and verify model behavior before the user ever types a word.

A parallel commercial story worth naming is the meme of the leaked system prompt itself, which has had three distinct phases. In 2023, leaked system prompts were treated as bug reports — evidence that the product team had failed to protect sensitive instructions. In 2024, with Claude.ai and ChatGPT both periodically disclosing portions of their own system prompts, the asset flipped: a well-crafted system prompt was openly discussed as a competitive differentiator in the same way UX copy was. By 2025, Anthropic's decision to publish its Claude system prompts as standard practice, combined with OpenAI's publication of its Model Spec (in a different but related framing), normalized the artifact. System prompts are now read, critiqued, and imitated the way open-source codebases are. That normalization is itself a mark of the term's maturity: when the product artifact becomes a public reference document, the category has arrived.

2. Prompt Engineering

Few phrases rose and fell as quickly as "prompt engineering." The term predates ChatGPT — it appears in GPT-3-era blog posts through 2021 — but it exploded into job-board reality in early 2023. Anthropic's March 2023 listing for a "Prompt Engineer and Librarian" at a base of roughly $280,000–$375,000 (reported in varying forms across Business Insider, Fortune, and Bloomberg) was the high-water mark that launched a thousand thinkpieces. Klarna, Scale AI, Boston Children's Hospital, and numerous law firms posted variations. DeepLearning.AI's "ChatGPT Prompt Engineering for Developers" course, taught by Isa Fulford and Andrew Ng in April 2023, pulled hundreds of thousands of enrollments.

The peak ran roughly from mid-2023 to mid-2024. The decline had three distinct causes, each accelerating the others.

The first cause was model improvement. GPT-4, Claude 3, and Gemini 1.5 were progressively more tolerant of sloppy prompts, which flattened the skill gradient that had made "prompt engineer" a premium title. The second cause was automation: DSPy (Omar Khattab, Stanford, October 2023), OpenAI's Structured Outputs (August 6, 2024), and prompt optimizers inside Anthropic's console eroded the bespoke-craft framing. The third and most decisive cause was scope inflation: the work that a senior "prompt engineer" actually did — evaluations, retrieval wiring, tool choice, model routing, cost management, context assembly — was nothing like what a novice heard in the title. Companies realized they were hiring systems engineers, not prompters.

By mid-2025, the re-branding was explicit. Tobi Lütke's "context engineering" tweet on June 18, 2025, and Andrej Karpathy's endorsement on June 25, 2025, rebranded the discipline. Simon Willison wrote on June 27, 2025, that prompt engineering had developed "an inferred definition… a laughably pretentious term for typing things into a chatbot," and that context engineering "may have sticking power." Job boards followed. By late 2025 the "Prompt Engineer" title was largely replaced by "AI Engineer," "Context Engineer," or the agentic-coding role "Agent Engineer." Reuters' February 2026 reporting on the surge in demand for "forward deployed engineers" captures the broader shift: companies increasingly wanted people who could embed models into business processes, not wordsmith a chat box. The skill did not vanish. The title did.

The cultural afterlife of the title is worth noting briefly. "Prompt engineering" became shorthand, in 2024–2025 tech-industry humor, for the entire early-ChatGPT hype cycle — a placeholder the way "webmaster" now stands in for the 1996 internet. Two separate books with the title The Art of Prompt Engineering appeared in 2023; Coursera and Udemy both offered courses at multiple price points throughout 2024; DeepLearning.AI maintained its original short-course on the topic but added follow-on courses branded around context engineering, evaluations, and agent design through 2025. The career arc of a title from premium status to dated-signifier in under three years is itself a data point about the pace of AI vocabulary evolution. It took about thirty-six months for "prompt engineer" to run the full arc from job-title-nobody-has-heard-of to job-title-of-the-moment to job-title-that-dates-your-resume. Nothing in the history of technical job titles moves quite this fast.

3. Hallucination vs. Confabulation

"Hallucination" as a term for plausible-but-wrong model output predates ChatGPT by several years — it appears in machine-translation literature (Lee et al., 2018) and in the 2020 GPT-3 paper. ChatGPT's virality made it a household word by early 2023, often weaponized in media coverage. Kevin Roose's column on his eerie conversation with Bing/Sydney (February 16, 2023) and the Mata v. Avianca case (May 27, 2023, in which attorney Steven Schwartz was sanctioned for citing fabricated legal precedents) cemented the word in public vocabulary.

A vocal minority pushed for "confabulation" as more accurate. Douglas Hofstadter used it in his public remarks throughout 2023; Geoffrey Hinton and Gary Marcus favored it on X and in podcasts. Their claim was partly technical — the model is not perceiving something that isn't there; it is inventing a narrative consistent with fragments — and partly anti-anthropomorphism. NIST's Generative AI Profile (NIST AI 600-1) adopted "confabulation" as the preferred technical term. But "hallucination" stuck, because it traveled better in headlines, was already in the research literature, had an evocative pictorial quality "confabulation" lacked, and benefited from massive first-mover advantage in media reporting.

A second wave of critique arrived in 2024–2025. Emily Bender and academics in the FAccT community argued that all LLM output is confabulation — there is no cognitive distinction between a true and a false generation from the model's perspective; the distinction is only in the external world. This sharpened the framing of the industry's hallucination-reduction techniques — RAG, grounded generation, citation-forcing, verifier models — as grounding rather than truth. As of April 2026, "hallucination" dominates industry copy, "confabulation" survives in academic and skeptical-journalism contexts, and the underlying problem has been meaningfully mitigated but not solved. Frontier models now report single-digit-percent hallucination rates on factoid benchmarks and double-digit-percent on long-form open-ended tasks.

The persistence of the dual vocabulary itself illustrates how cultural terminology shapes technical priorities. Calling the phenomenon "hallucination" emphasizes unreliability and mitigation. Calling it "confabulation" highlights the generative, story-telling nature of the failure mode. Both are sometimes true at once.

The commercial stakes are substantial. Mata v. Avianca in May 2023 became the template for legal-profession caution; a wave of state bar advisories and firm-level usage policies followed through 2023–2024, and "hallucinated citation" became a standing item in every AI-in-law panel through 2026. In healthcare, the FDA's guidance on generative AI in medical devices (draft 2024, refined through 2025) explicitly distinguishes generative unreliability from traditional software bugs — a taxonomic concession to the hallucination phenomenon. In consumer products, the Bing/Sydney coverage of early 2023 shaped product-safety vocabulary at every major lab, and the subsequent cycle of RLHF refinements, citation-forcing, and retrieval grounding can be read as an extended industrial response to a single word's ability to damage trust. "Hallucination" is the rare case of a consumer-facing term shaping engineering budgets directly.

4. RAG (Retrieval-Augmented Generation)

RAG is a term with a clean and boringly correct origin story that was then rapidly stretched into unrecognizability. Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela at Meta FAIR (then Facebook AI Research) published "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" on arXiv on May 22, 2020 (NeurIPS 2020). The original architecture was specific: a dense passage retriever over Wikipedia feeding a BART generator, jointly trained. Almost nobody uses that exact architecture today.

What "RAG" became is a generic label for any pattern that retrieves text from a corpus and stuffs it into an LLM prompt. The mainstream adoption year was 2023. LangChain (Harrison Chase, October 2022), LlamaIndex (Jerry Liu, originally GPT Index, November 2022), and Pinecone (founded 2019 but surging commercially in 2023) provided the plumbing. Both founders have credited the OpenAI Embeddings API (December 15, 2022) with making RAG economically viable for startups — the pricing drop was what turned the architecture from a research curiosity into an enterprise default. By mid-2023 every enterprise AI pitch contained a RAG diagram.

Starting in 2024, RAG fragmented. Microsoft Research's GraphRAG (July 2024) added a knowledge-graph construction step for corpora whose relational structure mattered. Agentic RAG — a LangChain- and LlamaIndex-blog-driven coinage in late 2024 — described retrieval in a loop, controlled by an agent that issues its own queries and critiques its own results. HyDE, multi-query, RAG-Fusion, CRAG, and Self-RAG proliferated as ways to patch over retrieval's various failure modes.

By 2025, long-context models (Gemini 1.5 and 2.5 at 1–2 million tokens) launched a half-serious "RAG is dead" debate. The honest conclusion, articulated by Willison and others, was that RAG is not dead but specialized. Long context handles small, bounded corpora. Vector retrieval handles scale. The Harness Era has quietly re-centered RAG as a tool the agent decides to call, which is effectively Agentic RAG with better language. The term survives because the practice is structurally essential: there is always more knowledge than context window, and always will be.

The commercial aftermath of the RAG boom is worth noting. Vector database companies built in 2022–2023 (Pinecone, Weaviate, Chroma, Qdrant) reached nine-figure valuations on the assumption that every enterprise would need dedicated similarity-search infrastructure. By 2024–2025 the thesis softened: Postgres's pgvector extension made vector search a commodity feature rather than a product category, and long-context models reduced the retrieval need for many documents-per-query use cases. Pinecone's 2025 repositioning around agent-memory infrastructure rather than raw vector search reflects this. The RAG lexicon survives, but the category has compressed from "product" to "feature" in much of the market — a pattern that will repeat with other 2023-era categories.

5. Context Window

Context window is the engineering constraint that has defined LLM product possibilities since 2022, and its expansion is among the cleanest exponential curves in the industry. GPT-3 launched with 2,048 tokens. GPT-3.5 ran at 4,096, with a 16,000 variant arriving mid-2023. GPT-4's initial release (March 14, 2023) offered 8,000 and 32,000 tiers. Anthropic then executed the first genuine discontinuity: a 100,000-token context window arrived with Claude in May 2023 and carried into Claude 2 on July 11, 2023, and Claude 2.1 pushed to 200,000 on November 21, 2023. Gemini 1.5 Pro, announced February 15, 2024, crossed the 1-million-token threshold in public preview and promised 10 million in research. Gemini 1.5 and 2.0 shipped 2 million to paying customers through 2024 and 2025.

The engineering problem is brutal. Vanilla transformer attention costs O(n²) in compute and memory, which is why every 100K-plus context model relies on tricks: Flash Attention (Tri Dao, 2022), ring attention (Liu et al., 2023), sliding-window attention, sparse attention, state-space fusion (Mamba-style blocks interleaved with attention), and aggressive KV-cache engineering. Inference cost scales poorly enough that providers introduced tiered pricing, per-token prompt caching (Anthropic, August 14, 2024; OpenAI, October 2024), and context-aware routing.
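
To make the quadratic problem concrete, here is a back-of-envelope cost sketch in Python; the layer count and model dimension are illustrative round numbers, not any vendor's configuration.

```python
def attention_flops(n_tokens: int, d_model: int, n_layers: int) -> int:
    # Score matrix QK^T plus the weighted sum over V: ~2 * n^2 * d per layer.
    return 2 * (n_tokens ** 2) * d_model * n_layers

def kv_cache_bytes(n_tokens: int, d_model: int, n_layers: int,
                   bytes_per_value: int = 2) -> int:
    # Keys and values cached per token, per layer (fp16 = 2 bytes each).
    return 2 * n_tokens * d_model * n_layers * bytes_per_value

for n in (4_096, 128_000, 1_000_000):
    print(f"{n:>9} tokens: "
          f"{attention_flops(n, 8192, 80):.2e} attention FLOPs, "
          f"{kv_cache_bytes(n, 8192, 80) / 1e9:.1f} GB KV cache")
```

The quadratic FLOP growth and the linear-but-enormous KV cache are the structural reasons long-context serving leans on Flash Attention, windowing, sparsity, and cache engineering rather than vanilla attention.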

Performance claims got ahead of reality. Google's 2024 Gemini long-context demos drew the most fire. Critics including Jim Fan and assorted researchers on X showed that flat needle-in-a-haystack success did not generalize to multi-hop reasoning or subtle-distractor tasks. The "Lost in the Middle" paper (Nelson Liu et al., Stanford and UC Berkeley, July 2023) established the enduring finding that models retrieve more accurately from the beginning and end of a long context than from the middle — a recency-plus-primacy effect that has never fully gone away. Anthropic's published evaluations on Claude 2.1 and Claude 3 set a higher bar: they reported needle-in-haystack success across position and attempted harder compositional retrieval tasks, and later emphasized multi-round citation and retrieval benchmarks (MRCR) that went beyond flat recall. Greg Kamradt's public NIAH benchmark, introduced in November 2023, became the default public yardstick.

The distinction between "real" and "pseudo" long-context memory sharpened through 2024–2025. A model that retrieves one hidden fact from a million tokens is not the same as a model that can reason robustly over a million-token codebase or book. Labs published credible evidence that long context can work, but buyers increasingly learned to distinguish performance theater (flat NIAH curves that collapse under compositional stress) from genuine usable long-context reasoning.

By April 2026 the operational consensus is that context is effectively unlimited for retrieval, meaningfully limited for reasoning, and economically limited for production use. The discipline of context engineering (Lütke, Karpathy, and Willison, June 2025) exists precisely because maximally filling a window is not the optimum. The right answer is dynamic: prune aggressively, summarize stale turns, offload long-term memory to external stores, quarantine sub-agents with their own contexts, and reserve the prime real estate of beginning and end for the most load-bearing instructions. The rise of context windows is a hardware-and-algorithm story; the rise of context engineering is the recognition that bigger is not automatically better.
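A minimal sketch of that dynamic assembly policy, assuming a crude characters-per-token heuristic in place of a real tokenizer:

```python
def count_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude heuristic: ~4 characters per token

def assemble_context(system: str, history: list[str], task: str,
                     budget: int) -> list[str]:
    fixed = count_tokens(system) + count_tokens(task)
    kept: list[str] = []
    # Walk the history newest-first; when the budget runs out, drop the
    # oldest turns and leave a marker where a summary would be injected.
    for turn in reversed(history):
        if fixed + sum(map(count_tokens, kept)) + count_tokens(turn) > budget:
            kept.append("[older turns compressed into a summary here]")
            break
        kept.append(turn)
    # Load-bearing instructions occupy the beginning and end of the window,
    # matching the primacy-plus-recency retrieval effect.
    return [system, *reversed(kept), task]
```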

The commercial vocabulary around context has split along use case. Coding agents (Claude Code, Cursor, Windsurf, Codex) benefit enormously from long context because codebases are naturally large and self-referential; million-token windows let an agent read the entire repository in one pass. Customer-service agents benefit from long context for retrieval over knowledge bases but rarely from long conversation history, which is better compressed. Research agents (Deep Research, Perplexity Pro) use long context for long-document synthesis. Voice agents barely use long context at all — conversation turns are short and latency-critical, and persistence lives in external memory rather than in-window. This segmentation is why "long context" as a blanket marketing claim softened through 2025: buyers learned to ask what the long context was for, not just how many tokens it supported.

6. Agent / Agentic

"Agent" has been an AI term since the 1950s and a word in formal reinforcement learning since at least the 1980s. Post-ChatGPT, it was not newly coined; it was repopulated. The Genesis-era demonstrations — AutoGPT (Toran "Significant Gravitas" Gravitas, March 30, 2023) and BabyAGI (Yohei Nakajima, April 3, 2023) — showed LLMs looping, planning, and calling tools. They were thrilling and mostly useless, which generated a multi-year skepticism about whether "agent" meant anything at all beyond "an LLM in a while loop."

The rehabilitation of the term is worth defending carefully. The criterion that does real work is autonomy as degree of human-in-the-loop. An assistant or copilot (Microsoft's 2023 branding) is architecturally turn-by-turn: the human sends a message, the system responds, the human approves or corrects, the loop iterates. An agent operates across many steps without per-step approval. Lilian Weng's June 23, 2023 essay "LLM Powered Autonomous Agents" gave the first influential architectural statement. Andrew Ng, on June 13, 2024, reframed agentic as a spectrum: "agent-like to different degrees… planning, tool use, multiple iterative steps." Harrison Chase's LangChain post on June 28, 2024 rendered the copilot-vs-agent contrast explicit. Anthropic's "Building Effective Agents," December 19, 2024, laid down what became the canonical definition by distinguishing workflows (predefined code paths) from agents (LLMs dynamically directing their own tool use).
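
The workflow-versus-agent distinction is easiest to see as code. The sketch below is illustrative only; `call_llm` and the tool table are hypothetical stand-ins, not any vendor's API:

```python
def call_llm(prompt: str) -> dict:
    # Stand-in: a real system would call a model and parse its response.
    return {"action": "finish", "result": f"(model output for: {prompt!r})"}

TOOLS = {"search": lambda q: f"results for {q}",
         "read": lambda p: f"contents of {p}"}

def workflow(doc: str) -> str:
    # Workflow: the code path is fixed in advance; the model fills in steps.
    summary = call_llm(f"Summarize: {doc}")["result"]
    return call_llm(f"Extract action items: {summary}")["result"]

def agent(goal: str, max_steps: int = 10) -> str:
    # Agent: the model directs its own tool use across many steps, with no
    # per-step human approval -- literally "an LLM in a while loop."
    observation = goal
    for _ in range(max_steps):
        step = call_llm(f"Goal: {goal}\nLast observation: {observation}")
        if step["action"] == "finish":
            return step["result"]
        observation = TOOLS[step["action"]](step.get("input", ""))
    return "step budget exhausted"
```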

The autonomy criterion was articulated publicly by multiple practitioners converging on the same distinction independently. Among them, Symphony42 CEO Sean Fenlon captured the point concisely on X on September 11, 2024: "AI Agents are NOT: Tools, Assistants, Co-pilots. AI Agents ARE: Autonomous, Inevitable." His phrasing is a representative practitioner articulation of the criterion during the 2024 convergence period, independent of the longer technical essays from Weng, Ng, and Chase. There is no single originator; what makes the distinction durable is that multiple credible voices reached it in parallel during 2023–2024 because the underlying technical threshold (function calling plus tool use plus longer context) made autonomy a real capability rather than an aspiration.

The term survived multiple waves of backlash — "it's just a for-loop," "everything is an agent" — because the autonomy criterion actually draws a line that matters commercially. Claude Code, Cursor Agent, Devin, Operator, Deep Research, and Stripe's Minions all sit on the agent side: a human kicks them off; they run for minutes or hours; they return a result. Chat-mode Copilot, ChatGPT, and Gemini in Docs sit on the assistant side. The distinction is load-bearing because the evaluation regimes, the failure modes, the user interfaces, and the unit economics all differ. The word rose because it captured a real thing. It has held because the real thing continues to matter.

7. MCP (Model Context Protocol)

Anthropic released MCP on November 25, 2024, led by David Soria Parra and Justin Spahr-Summers. The protocol's design goal was disarmingly plumbing-oriented: standardize how models connect to tools, data, and prompt templates, so every client did not have to reinvent the wheel for every data source. Initial launch partners included Block, Apollo, Zed, Replit, Codeium/Windsurf, and Sourcegraph.

MCP's adoption curve was steeper than almost any protocol in recent memory. Sam Altman's March 26, 2025 X post endorsing MCP ("people love MCP and we are excited to add support across our products") effectively conceded Anthropic's standard to the broader ecosystem. Google DeepMind added support through 2025, later pairing it with its own Agent-to-Agent (A2A) protocol. Microsoft Build 2025 on May 19, 2025 added Microsoft and GitHub to the MCP steering committee, announced Windows 11 as an "agentic OS" with native MCP support, and made Copilot Studio MCP-compatible. Docker's MCP Catalog entered beta on May 5, 2025. Cursor, Cloudflare, Windsurf, and Zed all shipped MCP support through 2025.

By Anthropic's November 2025 anniversary post, the SDKs had passed 97 million monthly downloads across Python and TypeScript combined, with thousands of community servers and a Claude connector directory of 75-plus official integrations. In the same announcement, Anthropic donated the protocol to the newly-formed Agentic AI Foundation under the Linux Foundation, with Block, OpenAI, Google, Microsoft, AWS, Cloudflare, and Bloomberg as backers.

Why did MCP win so decisively? Three reasons. First, it was genuinely vendor-neutral from a vendor (Anthropic) that was not the market leader, which reduced the credible threat of extraction. Second, its design matched the agent-shaped moment — tools, resources, prompts — the three primitives agents actually needed. Third, OpenAI's early endorsement foreclosed the alternative-protocol war before it could gather mass. MCP is the USB-C of AI agents: boring, ubiquitous, and consequential.

The practical vocabulary that emerged around MCP is worth naming explicitly. "MCP server" denotes a service exposing tools and resources over the protocol; "MCP client" denotes the application (usually an AI chat or coding agent) consuming those servers. "Remote MCP" designates servers accessed over the network rather than locally. "ToolShed" became Stripe's name for its internal MCP server aggregating roughly 500 tools, and the metonymy "a company's toolshed" now denotes the complete set of internal capabilities exposed to its agents. The Agentic AI Foundation's November 2025 formation under the Linux Foundation places MCP on the same governance trajectory as Kubernetes, the Open Container Initiative, and similar neutral-steward standards. By 2026 MCP is not just a protocol; it is an assumption — the way HTTP is an assumption for web application development.
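
A minimal MCP server, sketched against the official Python SDK's FastMCP interface (the `mcp` package); the tool and resource shown here are invented examples, and real servers expose many of each:

```python
from mcp.server.fastmcp import FastMCP

server = FastMCP("docs-server")

@server.tool()
def search_docs(query: str) -> str:
    """Tool: an action the client's model can choose to invoke."""
    return f"(search results for {query!r} would go here)"

@server.resource("docs://readme")
def readme() -> str:
    """Resource: read-only context the client can pull into the window."""
    return "Project readme contents."

if __name__ == "__main__":
    server.run()  # speaks MCP over stdio by default
```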

8. Reasoning / Thinking Models

On September 12, 2024, OpenAI released o1-preview and o1-mini, a model class that "thinks before it answers." The launch was the first public confirmation of a thesis that had been circulating inside frontier labs for at least a year: pre-training scaling was yielding diminishing returns, and the next axis of progress was test-time compute. OpenAI's Noam Brown — who had previously built the poker AIs Libratus and Pluribus at CMU and whose academic work centered on search at inference — became the public face of the thesis. On launch day he tweeted that o1 "thinks for seconds, but we aim for future versions to think for hours, days, even weeks. Inference costs will be higher, but what cost would you pay for a new cancer drug? For breakthrough batteries? For a proof of the Riemann Hypothesis?"

The product tier formed almost immediately. o1 full released December 5, 2024, bundled with ChatGPT Pro at $200 per month — the first mass-market subscription AI product at that price. The $200 tier itself became a market anchor. o3 was announced December 20, 2024 on the final day of OpenAI's "12 Days of OpenAI," with ARC-AGI scores that startled the field. o3-mini shipped January 31, 2025; full o3 and o4-mini on April 16, 2025; o3-pro on June 10, 2025. Anthropic followed with Claude 3.7 Sonnet's "extended thinking" mode (February 24, 2025) and Claude 4's interleaved thinking. Google shipped Gemini 2.0 Flash Thinking (December 2024) and Gemini 2.5 Pro's "Deep Think." DeepSeek's R1 (January 20, 2025) open-sourced a reasoning model trained primarily with reinforcement learning on verifiable rewards, shattering the assumption that reasoning required proprietary tricks and triggering a one-day $600B market-cap drop in Nvidia.

The vocabulary itself split along vendor lines. "Reasoning model" beat "thinking model" in US commercial usage; Anthropic and Google lean on "thinking." The distinction is linguistic, not architectural. Both refer to models that emit long private or partially-exposed chains of thought, trained by reinforcement learning against verifiers, priced at higher per-token costs offset by higher per-answer value.

The causal driver was legible to the whole industry. Pre-training scaling — the old recipe of "more compute, more data, more parameters" — was hitting diminishing returns on the hardest reasoning tasks. Allocating compute at inference time (search, self-critique, verification) unlocked a new axis of capability without retraining the base model. By 2026 the tier is permanent: reasoning models cost more per token, take longer, and produce superhuman performance on narrow but high-value tasks. Reasoning models are the product embodiment of Chain of Thought.
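
The simplest version of the test-time-compute recipe is best-of-n sampling against a verifier. The sketch below is schematic; `sample_chain` and `verify` stand in for a generator model and a trained verifier:

```python
import random

def sample_chain(problem: str) -> str:
    # Stand-in: a real system would sample a reasoning chain from a model.
    return f"candidate solution {random.random():.3f} for {problem!r}"

def verify(problem: str, chain: str) -> float:
    # Stand-in: a verifier or reward model would score correctness here.
    return random.random()

def best_of_n(problem: str, n: int = 16) -> str:
    # More samples means more inference-time compute and a better expected
    # answer, with no retraining of the base model.
    candidates = [sample_chain(problem) for _ in range(n)]
    return max(candidates, key=lambda c: verify(problem, c))
```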

The commercial consequences were larger than most incumbents initially anticipated. First, the $200-per-month consumer tier opened a market segment that had previously been thought unavailable: individual professionals willing to pay SaaS-enterprise prices for personal AI. Anthropic's Claude Max tiers at $100 and $200, Google's AI Ultra at $249.99, and xAI's SuperGrok Heavy at $300 all repositioned around this new price spine. Second, reasoning models fundamentally rewired unit economics for enterprises: the cost per answer rose, but so did the value per answer. Buyers began to ask not "how many tokens did this consume" but "what did this decision produce," a shift that directly prefigured the Value per Token vocabulary emerging in the same period. Third, reasoning models strained context-window economics: long chains of thought consume tokens aggressively, which in turn drove demand for prompt caching and for cheaper specialized reasoning tiers (o1-mini, o3-mini, Claude Haiku with extended thinking). Fourth, DeepSeek R1's January 2025 open-source release commoditized the technique, forcing every frontier lab to compete on reasoning depth rather than reasoning access. The reasoning-model tier is now where competition is most intense.

9. Chain of Thought (CoT)

Chain of Thought as a prompting technique was established by Jason Wei and co-authors at Google Brain (Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou) in "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models," arXiv January 28, 2022 (2201.11903), NeurIPS 2022. The central finding was simple: sufficiently large models, when shown a few examples in which reasoning was written out before the answer, would emit step-by-step reasoning themselves and solve problems they otherwise failed.
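
In practice the technique was nothing more than prompt construction. A sketch, using a paraphrase of the paper's canonical tennis-ball exemplar:

```python
# Few-shot chain-of-thought: show worked reasoning before the answer,
# and the model emits the same step-by-step behavior on the new question.
EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)

def cot_prompt(question: str) -> str:
    return EXEMPLAR + f"Q: {question}\nA:"
```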

Public adoption through 2022–2023 was dominated by a single variant: Kojima et al.'s "Let's think step by step" zero-shot prompt (May 2022), which became the most over-used five words in early AI. Self-Consistency (Xuezhi Wang et al., March 2022), Tree of Thoughts (Shunyu Yao et al., May 17, 2023), and Graph of Thoughts (Maciej Besta et al., August 2023) extended the idea into search-over-reasoning.

The clever conceptual move of 2024 was to stop asking for CoT at prompt time and train it in. OpenAI's o1 made chain-of-thought a latent behavior elicited by reinforcement learning rather than by a few-shot example. The public prompting trick became a private architectural pattern. By 2025, the phrase "chain of thought" was both ubiquitous — every reasoning-model system card discusses hidden CoT length, visibility, and faithfulness — and partly retired as a user-facing technique. Karpathy observed in his 2025 year-in-review that RLVR-driven CoT is prone to benchmaxxing, with chains optimized for verifier happiness rather than human-readable correctness.

The arc is a classic case of technological absorption. Manual CoT rose because it was the cheapest bolt-on performance boost ever discovered. It was absorbed because a trained behavior is always cheaper and more reliable than an elicited one. The paper remains one of the most-cited of the post-ChatGPT era because it revealed a latent capability that prompting could unlock.

10. Context Engineering

The discipline emerged in three tweets and a blog post, across a span of nine days. Tobi Lütke, CEO of Shopify, wrote on June 18, 2025: "I really like the term 'context engineering' over prompt engineering. It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM." Andrej Karpathy quote-tweeted his agreement on June 25, 2025: "+1 for 'context engineering' over 'prompt engineering'. People associate prompts with short task descriptions you'd give an LLM in your day-to-day use. When in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information for the next step." Simon Willison canonized the shift on June 27, 2025, writing, "I think this one may have sticking power." Within weeks the term was on every AI engineering substack, in LangChain's documentation, and in the titles of hastily-organized conference talks.

It stuck for three reasons. First, it named a skill that was already being practiced but had no word — dynamic context assembly, tool-description tuning, RAG wiring, memory curation. Second, it came from a CEO (Lütke) with real technical credibility (ex-programmer, active GitHub account), seconded by a researcher of generational authority (Karpathy). Third, "prompt engineering" had self-deprecated into disrepute. Practitioners wanted a term they could put on a résumé without irony.

Drew Breunig's taxonomy of context failures — poisoning, distraction, confusion, clash — and his catalog of mitigations — tool loadout, context quarantine via sub-agents, pruning, summarization, offloading to external memory — gave the discipline its first reference pattern language. By early 2026, "Context Engineer" had appeared as a job title at major labs and startups. Anthropic's September 2024 "Contextual Retrieval" post and the broader industry move from raw retrieve-and-stuff to deliberate context assembly made the vocabulary operationally load-bearing.

The rise was empirical. Practitioners found that the quality of the information layout around the model mattered more than clever phrasing alone. Context engineering is the word for what those practitioners were actually doing.

The vocabulary of context engineering as a discipline has since solidified around a small number of operational moves. Chunking strategy (how to divide a corpus for retrieval), tool description budget (how much prompt real estate tool docs consume), memory tiering (hot in-context, warm short-term, cold long-term), sub-agent context quarantine (spawning isolated agents to prevent pollution), and context compression (summarizing stale turns without losing actionable detail) all became standard parts of a context engineer's toolkit through 2025–2026. The discipline has natural overlap with traditional information architecture and library science — Anthropic's March 2023 job listing for a "Prompt Engineer and Librarian" now reads, in retrospect, as the first job listing for this role. The librarian analogy is more apt than the engineering one: the core skill is curating which information reaches the model, in what order, with what surrounding context.
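
A sketch of the memory-tiering move in particular, with invented tier boundaries and storage behavior:

```python
from collections import deque

class TieredMemory:
    def __init__(self, hot_turns: int = 8):
        self.hot = deque(maxlen=hot_turns)  # stays verbatim in the window
        self.warm: list[str] = []           # summarized, re-injected on demand
        self.cold: list[str] = []           # offloaded to external storage

    def add_turn(self, turn: str) -> None:
        if len(self.hot) == self.hot.maxlen:
            # Evict the oldest hot turn into a (stand-in) summary.
            evicted = self.hot[0]
            self.warm.append(f"summary: {evicted[:60]}...")
        self.hot.append(turn)

    def demote_stale(self) -> None:
        # Periodically push warm summaries to cold storage
        # (in production: a vector store or database, not a list).
        self.cold.extend(self.warm)
        self.warm.clear()
```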

11. Scaffolding, Harness, and Harness Engineering

Through 2024 and most of 2025, the word for "the stuff around the model that makes it an agent" was scaffolding. Cognition used it to describe Devin; Anthropic used it in its research posts; Swyx's Latent Space podcast used it casually. Scaffolding was a serviceable term, but it had a temporariness built into the metaphor — you build scaffolding, then you take it down. That did not match practice, where the scaffolding was the product.

The vocabulary flipped in the first two weeks of February 2026. Mitchell Hashimoto, co-founder of HashiCorp and creator of Ghostty, published "My AI Adoption Journey" in early February 2026, where he introduced harness engineering: "the idea that anytime you find an agent makes a mistake, you take the time to engineer a solution such that the agent never makes that mistake again." Six days later, on February 11, 2026, Ryan Lopopolo, a member of technical staff on OpenAI's Frontier Product Exploration team, published "Harness engineering: leveraging Codex in an agent-first world" on the OpenAI blog. His money quote anchored the entire movement:

"Over the past five months, our team has been running an experiment: building and shipping an internal beta of a software product with 0 lines of manually-written code… every line of code… has been written by Codex. We estimate that we built this in about 1/10th the time… Humans steer. Agents execute."

The project — an internal Electron application codenamed Symphony — had grown to over one million lines of code, shipping thousands of pull requests with zero human review and consuming roughly one billion tokens, or around $2,000–3,000 in compute, per day. Lopopolo named three pillars of the discipline: context engineering, architectural constraints, and garbage collection (the continuous pruning of stale scaffolding).

Martin Fowler's site published the consolidating article on February 17, 2026, written by Birgitta Böckeler of Thoughtworks, with a fuller version on April 2, 2026. Böckeler's opening sentence became the industry's shared shorthand within weeks: "The term harness has emerged as a shorthand to mean everything in an AI agent except the model itself — Agent = Model + Harness." Her taxonomy — guides (feedforward: system prompts, AGENTS.md, constraint docs) and sensors (feedback: evaluations, validators, output parsers, LLM-as-judge) — gave the field its first clean architectural decomposition.
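
Böckeler's decomposition maps directly onto code. A minimal sketch, with invented guide and sensor examples rather than anything from her article:

```python
from typing import Callable

Guide = Callable[[str], str]    # feedforward: system prompts, constraint docs
Sensor = Callable[[str], bool]  # feedback: validators, parsers, judges

def run_with_harness(task: str, model: Callable[[str], str],
                     guides: list[Guide], sensors: list[Sensor],
                     max_attempts: int = 3) -> str:
    prompt = task
    for guide in guides:
        prompt = guide(prompt)  # e.g. prepend AGENTS.md rules to the task
    for _ in range(max_attempts):
        output = model(prompt)
        if all(sensor(output) for sensor in sensors):
            return output       # every validator passed
        prompt += "\n(previous attempt failed validation; try again)"
    raise RuntimeError("harness: no output passed all sensors")
```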

Stripe's Minions, detailed in a two-part Stripe Dev Blog post in February 2026, demonstrated the pattern in production. More than 1,300 pull requests merged per week, triggered by Slack emoji, built on a forked version of Block's open-source Goose harness, connected through Stripe's internal MCP server ToolShed (approximately 500 tools), with Blueprints alternating deterministic code nodes and agentic loops. Cole Medin's Archon, repositioned in 2026 as "the first open-source harness builder for AI coding," drew tens of thousands of GitHub stars in the same quarter.

The term rose because a category that had been invisible acquired a name at exactly the moment the category mattered most. Once agents were shipping production code, the harness was the engineering work. "Scaffolding" implied temporariness. "Harness" implied fit, control, and steering. The explicit equation Agent = Model + Harness — popularized by Fowler, Böckeler, and in parallel by LangChain's Vivek Trivedy ("If you're not the model, you're the harness") — gave the field its first clean decomposition since pre-training/post-training. Harness engineering is now the dominant 2026 frame.

The practical implications for investors and operators were immediate. Venture theses shifted visibly through the first quarter of 2026: "picks and shovels for harness engineering" became the new line on a16z, Lightspeed, and Sequoia AI briefing calls; enterprise buyers began asking vendors to describe their harness before their model; and engineering job titles started to shift, with "harness engineer" and "agent platform engineer" appearing at Stripe, Anthropic, and OpenAI through March 2026. The division of labor is settling into a recognizable pattern: frontier labs own the model layer and compete on capability and price; a middle tier of specialized harness providers (Vercel, LangChain, LlamaIndex, Vapi, Retell, LiveKit, Pipecat, Cartesia's Line) provides the wrapping infrastructure; and end-customer enterprises build the domain-specific harness on top. The phrase "full-stack agent" is returning to executive vocabulary precisely because the stack finally has stable names for its layers.

12. Orchestration and Meta-Orchestration

Before there were harnesses there was orchestration. Harrison Chase open-sourced LangChain on October 17, 2022 — six weeks before ChatGPT — and within a year it was the single most downloaded AI framework on PyPI. LangChain's "chains" gave developers composable primitives to stitch multiple model calls together, inject retrieval, call tools, and parse output. The framework also attracted steady criticism for abstraction overhead, and by mid-2024 Chase had pivoted LangChain toward LangGraph (January 2024), a graph-based orchestration layer better suited to branching, looping agents.

Competing orchestration frameworks proliferated. Microsoft's AutoGen (August 2023, Chi Wang and Qingyun Wu), Semantic Kernel (Microsoft, March 2023), CrewAI (João Moura, late 2023), and LlamaIndex's Workflows (2024) each carved out a niche. Meta-orchestration emerged as an informal label for orchestrating orchestrators — especially in enterprise settings where one pipeline would dispatch to multiple agents, each with their own sub-pipelines.

Then the Harness Era happened. "Harness" did not replace "orchestration" semantically — a harness contains orchestration — but it displaced it rhetorically. By April 2026, "orchestration" survives as the layer word (what the harness does internally) while "harness" became the product word (what you ship). LangChain itself embraced the shift; Chase's and Trivedy's 2026 posts reframed the framework as a tool for building harnesses.

In enterprise automation and DevOps contexts, orchestration remains perfectly healthy. Airflow, Temporal, n8n, Zapier, Workato, MuleSoft, and Kubernetes all continue to own the word in their domains. The split is useful to name: orchestration coordinates deterministic components; harness controls probabilistic ones. Both words will persist. The rise of orchestration came from the Lego-brick metaphor's power with 2023 developers. The partial displacement in AI-native contexts came because "harness" captured the steering-a-running-agent task more precisely than "stitching-calls-together" did.

A related vocabulary worth naming concerns the two-tier abstraction that emerged through 2024–2025. The outer layer — often called "workflow" in LangChain and Zapier parlance, or "blueprint" in Stripe's internal Minions architecture — is deterministic: predefined steps, branching logic, explicit error handling. The inner layer is the agent loop itself: probabilistic, tool-calling, multi-step, non-deterministic. Good harnesses alternate between deterministic workflow steps (for the parts of the task where reliability matters) and agentic loops (for the parts that require judgment). This "deterministic outside, probabilistic inside" pattern was formalized in Anthropic's "Building Effective Agents" distinction between workflows and agents, and has since become the dominant architectural pattern in production agent systems. It is the single most important result of the orchestration-plus-harness vocabulary convergence.
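
A schematic of the pattern, with every function a stand-in:

```python
def deterministic_fetch(ticket_id: str) -> str:
    return f"ticket {ticket_id} contents"         # plain code: reliable

def agent_loop(instruction: str) -> str:
    return f"(agent judgment on: {instruction})"  # probabilistic: judgment

def blueprint(ticket_id: str) -> str:
    # Deterministic outside: the step sequence and branching are fixed code.
    ticket = deterministic_fetch(ticket_id)              # step 1: deterministic
    triage = agent_loop(f"Classify severity: {ticket}")  # step 2: agentic
    if "critical" in triage:
        return "escalated to on-call"                    # step 3: deterministic
    return agent_loop(f"Draft a reply for: {ticket}")    # step 4: agentic
```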

13. Vibe Coding

On February 2, 2025, Andrej Karpathy posted what he later called "a shower of thoughts throwaway tweet":

"There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like 'decrease the padding on the sidebar by half' because I'm too lazy to find it. I 'Accept All' always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension… It's not too bad for throwaway weekend projects, but still quite amusing."

The tweet was an instant meme. Within a week "vibe coding" had thousands of reuses, a Know Your Meme entry, and a spectrum of interpretations from celebratory (a whole cohort of "vibe coders" on YouTube and TikTok) to condemnatory (security researchers flagging shipped prompt-generated code with trivial vulnerabilities). Simon Willison's February 2026 "Agentic Engineering Patterns" guide explicitly differentiated production-grade agentic work from vibe coding. By late 2025, "Vibe Code Cleanup Specialist" had appeared as an ironic LinkedIn title.

On November 6, 2025, Collins Dictionary named "vibe coding" its 2025 Word of the Year. Collins defined it as "the use of artificial intelligence prompted by natural language to assist with the writing of computer code." The selection drew global press coverage and cemented the term's cultural reach. Merriam-Webster's contrasting 2025 Word of the Year, announced December 15, 2025, was "slop" — setting up a neat dual portrait of the 2025 AI zeitgeist: the optimistic label (vibe coding) and the pessimistic one (slop) living side by side.

Karpathy himself refined the framing through late 2025 and early 2026. His December 27, 2025 tweet enumerating agent-surface primitives (Agents, Sub-agents, Prompts, Contexts, Memory, Modes, Permissions, Tools, Plugins, Skills, Hooks, MCP, LSP, Slash Commands, Workflows, IDE Integrations) described a more disciplined species of AI-assisted development. The term survives as shorthand for a specific mode — fast, throwaway, non-defensive — rather than as the general name for AI-assisted programming, a role that "agentic engineering" is increasingly filling.

The commercial impact was larger than the tweet deserved. A cohort of no-code and low-code platforms (Replit, Bolt, v0 from Vercel, Lovable) repositioned through 2025 around a vibe-coding framing that deliberately targeted non-programmers. Replit's Ghostwriter-to-Agent pivot and Lovable's early-2025 traction drew hundreds of millions in new funding on the claim that vibe coding would democratize software creation. By late 2025 the counter-movement was already visible: security researchers cataloging shipped prompt-generated code with trivial vulnerabilities, and enterprise buyers establishing "vibe-code-clearance" review gates before accepting AI-generated pull requests. The Collins Dictionary honor turned the term from insider slang into a household word in the second half of 2025, which was both validating and distorting — it accelerated the backlash and hastened the "agentic engineering" replacement.

14. Multimodal and Vision-Language Models

Multimodality predates ChatGPT (CLIP, OpenAI, January 2021; Flamingo, DeepMind, April 2022). The ChatGPT-era trajectory is one of incremental bolt-ons giving way to native architectures. GPT-4V, released September 25, 2023, was OpenAI's first mainstream visual model and felt to many users like the largest product jump since GPT-4 itself. Gemini 1.0 (December 6, 2023) was explicitly pitched by Google as "natively multimodal" — trained jointly on text, images, audio, and video rather than stitched together at runtime. Claude 3 (March 4, 2024) brought native vision to Anthropic.

GPT-4o (May 13, 2024) was the consumer breakthrough: a single model ingesting and emitting text, images, and audio at near-realtime latency. Its voice-mode demo and the fraught "Sky voice" controversy with Scarlett Johansson dominated tech news for weeks. Gemini 1.5 in the same period pioneered long video understanding; Gemini 2.5 and 3.1 in 2025–2026 extended to hour-long videos.

The technical arc moved from late-fusion architectures (separate encoders for each modality) to early- and native-fusion architectures that treat image patches or audio spectrograms as first-class tokens alongside text. This enabled agents that see screens, read documents with layout, understand video, and — via Realtime API extensions — handle voice natively.

By April 2026, "multimodal" is table stakes. The interesting boundary has moved to native speech-to-speech (see Section 4 on voice agents), action-taking multimodal agents (Anthropic Computer Use, OpenAI Operator), and world models (Sora, Veo, Fei-Fei Li's World Labs). The term "VLM" is gradually fading into the more general "multimodal model" or simply "model," reflecting that text-only models are now the specialized case. The convergence succeeded; the boast became background.

The commercial and regulatory implications of native multimodal are substantial. Document-heavy industries — legal, insurance, healthcare, accounting — moved from "can the model read a PDF" in 2023 to "can the model process a scanned and handwritten document with tables and signatures" in 2024 to "can the model watch a two-hour video of a medical procedure" in 2025. Anthropic's Claude for Excel and Claude for Chrome beta products, and similar offerings from OpenAI and Google, assume native multimodal throughout. The EU AI Act's provisions on biometric and emotion recognition, and various state-level laws in the US on AI-generated content labeling (notably California's AB-2655 and similar), apply specifically to multimodal outputs — the regulatory vocabulary lags the technical vocabulary by roughly twelve to eighteen months, but it is catching up.

15. Fine-tuning, LoRA, and RLHF-as-Product-Practice

Fine-tuning has accumulated terminological layers at a steady clip. Pre-ChatGPT, fine-tuning a 175-billion-parameter model cost millions of dollars and required massive GPU clusters. Post-ChatGPT, the economics flipped. LoRA (Edward Hu and colleagues at Microsoft, arXiv June 17, 2021) introduced low-rank adaptation: freeze the base model's weights, inject small trainable low-rank matrices into the attention layers. It was the single most important parameter-efficient fine-tuning trick of the LLM era. QLoRA (Tim Dettmers and colleagues, May 2023) added 4-bit quantized base weights, letting a 65-billion-parameter model fine-tune on a single 48-gigabyte consumer GPU. PEFT became the umbrella term, crystallized by Hugging Face's library of the same name.
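
The arithmetic behind LoRA's economics is easy to show. A numpy sketch with illustrative dimensions, not the paper's exact configuration:

```python
import numpy as np

d, r = 1024, 8                       # model dim, LoRA rank (r << d)
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-init
alpha = 16                           # scaling hyperparameter

def lora_forward(x: np.ndarray) -> np.ndarray:
    # Frozen path plus low-rank update: W x + (alpha / r) * B (A x).
    return W @ x + (alpha / r) * (B @ (A @ x))

y = lora_forward(rng.normal(size=d))
# Trainable parameters: 2*d*r = 16,384, versus d*d = 1,048,576 for full
# fine-tuning of this one matrix -- a ~64x reduction at this toy scale.
```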

RLHF (Christiano et al., 2017; applied to LLMs in InstructGPT by Ouyang et al., March 2022) became the standard post-training pipeline after ChatGPT. DPO (Rafailov et al., Stanford, May 29, 2023) replaced the reward-model step with a direct-preference objective, simplifying training at comparable quality. By 2024 most open-weight post-training pipelines used DPO or one of its descendants (IPO, KTO, ORPO, SimPO). RLVR — reinforcement learning with verifiable rewards — emerged as the 2025 reasoning-model training paradigm, with DeepSeek's R1 (January 2025) and OpenAI's o-series as the canonical examples.

As a product practice, fine-tuning's center of gravity has shifted repeatedly. In 2023, every enterprise wanted to fine-tune its own proprietary model. By 2024, the best prompt and RAG on a frontier model usually beat the best fine-tune on an open model, and enterprise fine-tuning budgets contracted. By 2025–2026, fine-tuning returned in the form of distillation from frontier models (creating specialized smaller models for narrow deployment) and on-policy reinforcement learning against domain verifiers (the RLVR recipe applied to business-specific tasks).

The terminology survives because the technique is permanent, even as fashions around who-fine-tunes-what shift. The arc is democratization: what used to cost millions now runs on a single GPU, and what used to require a lab now runs on a laptop.

A secondary vocabulary shift worth naming is the distinction between "open-weight" and "open-source" models. In 2023 the two terms were used interchangeably; by 2025 they had separated. Open-weight denotes models whose parameters are publicly available but whose training data, code, and methodology may remain proprietary — the category Meta's Llama series (beginning with Llama 2, July 18, 2023), Mistral's models, Qwen, DeepSeek, and others occupy. Open-source denotes the stricter standard requiring full reproducibility, which almost no frontier-grade model meets. The Open Source Initiative's 2024–2025 definitional work (Open Source AI Definition v1.0, October 28, 2024) formalized the distinction; the term "open-weight" now dominates commercial usage. This matters for fine-tuning because the full LoRA-PEFT-QLoRA stack operates cleanly on open-weight models in a way it does not on API-only frontier models. The open-weight tier is where custom fine-tuning lives in 2026.

16. Recursive Self-Improvement

The idea of a machine that improves its own design faster than humans can is I.J. Good's "intelligence explosion" (1965). In the ChatGPT era, its vocabulary — recursive self-improvement (RSI), takeoff, intelligence explosion — returned with commercial urgency. The parent concept is simple: if an AI can meaningfully improve the next-generation AI, and the new AI can do the same faster, the feedback loop compresses decades of progress into months.

The bull case through 2024–2026 has three main voices. Dario Amodei's October 2024 essay "Machines of Loving Grace" argued that "a country of geniuses in a datacenter" was plausible by 2027, which implicitly leans on recursive self-improvement. Sam Altman's September 2024 "The Intelligence Age," his January 2025 "Reflections," and his June 2025 "The Gentle Singularity" framed the takeoff as real but gradual. Leopold Aschenbrenner's June 2024 "Situational Awareness" essays ("Counting the OOMs") made the most explicit RSI argument, predicting an intelligence explosion in 2027–2028 on the basis of continued compute scaling plus automation of AI research itself.

The bear case has been articulated most prominently by Yann LeCun at Meta, who argues that LLMs lack the architectures required for autonomous scientific reasoning and that progress will be bottlenecked on architectural innovations humans still drive. Gary Marcus agrees from a different angle, emphasizing empirical plateaus in reasoning reliability. Ilya Sutskever, post-OpenAI, has been publicly circumspect; his Safe Superintelligence Inc, founded June 19, 2024, is a bet that the takeoff is real and needs to be navigated carefully.

The sharpest conceptual contribution of 2025 was the physical-versus-software-speed distinction. Software self-modification can in principle run at electron speed: code generation, synthetic-data generation, inference-time search, weight updates, model evaluation. The physical deployment of AI's benefits, however, runs at datacenter-construction speed, chip-fabrication speed, energy-grid speed, and regulatory speed. Stargate ($500B, announced January 21, 2025), TSMC Arizona construction delays, the 2025–2026 US-China chip export-control tensions, and UK and EU grid-interconnection queues are all cited as evidence that even if recursive self-improvement is real at the software layer, its manifestation in the world is rate-limited by atoms.

The term survives because the underlying question is live. Its status in April 2026 is fiercely contested, with the bull camp arguing that software-layer recursion is already visible (coding agents improving their own harnesses, synthetic data generation, automated experiment orchestration) and the bear camp arguing that reasoning-reliability plateaus and physical bottlenecks will prevent the compounding loop from becoming a true explosion. Both camps cite the same 2026 evidence; they differ on interpretation.

A narrower commercial vocabulary has developed around the contested question. "Takeoff speeds" denotes the timeline from human-parity to substantially-superhuman capability; hard takeoff versus soft takeoff is the binary that structures the debate. "AI research automation" denotes the specific recursive loop — AI systems improving AI systems — that is the minimal testable case of the full RSI claim. METR's 2024 benchmarks and successor evaluations in 2025–2026 provide some of the clearest empirical data on this. "Compute overhang" denotes the gap between the compute available to frontier labs and the compute actually being used in a given generation — the argument being that if RSI kicks in, the overhang can be consumed rapidly, accelerating takeoff. None of these sub-terms has displaced RSI itself, but they have shifted the debate from philosophical to quantitative. That shift — measurable claims rather than untestable predictions — is the single most important change in the RSI vocabulary since 2023.

17. AGI vs. ASI

"AGI" — artificial general intelligence — has been a term of art since at least Shane Legg and Ben Goertzel's use in 2001–2002 and the AGI conference series from 2008. Its ChatGPT-era career is one of slow-motion goalpost migration. In 2022, AGI meant something like "human-level performance at most cognitive tasks." By 2024, OpenAI's published definitions had softened to "AI systems that are generally smarter than humans at economically valuable work." By 2025, OpenAI and Microsoft's contractual AGI definition — the one that would end their commercial arrangement — was reportedly pegged to a profit threshold of roughly $100 billion in generated revenue, a definition that made headlines in January 2025 and attracted pointed mockery for reducing a cognitive category to an accounting one.

Simultaneously, the frontier of serious hype moved to ASI — artificial superintelligence. Nick Bostrom's Superintelligence (2014) is the canonical reference. The term was revived for 2025–2026 discourse by Ilya Sutskever's Safe Superintelligence Inc (founded June 19, 2024), by Aschenbrenner's "Situational Awareness" essays, which argued that AGI would be reached in 2027 and ASI shortly after, and by Meta's renaming of its flagship AI lab to Meta Superintelligence Labs in mid-2025 under Alexandr Wang. OpenAI's late-2025 restructuring material invoked "superintelligence" explicitly.

The practical distinction is useful. AGI is now often treated as a range — systems that match or exceed human performance on most cognitive tasks in most domains. ASI denotes systems qualitatively beyond human, including in capabilities (scientific discovery, software engineering, theorem-proving) that humans cannot currently match. The labels are marketing-adjacent, but they also track a real divergence in product claims. Anthropic's and OpenAI's late-2025 materials distinguish between models that help humans do human work (AGI-adjacent) and models that do work humans literally cannot (ASI-adjacent).

The rise of ASI as a term in 2025–2026 partly reflects that AGI itself has been blurred into meaninglessness through serial goalpost shifts. Every time models cross one threshold, critics and proponents redefine the target upward. "ASI" supplies a cleaner, more dramatic horizon — one that is harder to reach by definitional sleight of hand. In 2026 the two terms coexist: AGI remains the more common institutional word, but ASI is no longer fringe vocabulary.

The labor-market and policy consequences of the vocabulary are substantial. If the target word is "AGI — human-level at economically valuable work," the policy frame is displacement-and-augmentation, with mainstream economic analysis treating the transition as a structural-change problem amenable to retraining and safety-net adjustments (variations on the Acemoglu and Brynjolfsson arguments of 2023–2025). If the target word is "ASI — systems qualitatively beyond human capability," the policy frame becomes national-security, compute-governance, and international-coordination, with the vocabulary shifting toward terms like "compute thresholds," "model weights custody," and "AI diffusion controls." By 2026 both policy frames are active simultaneously, reflecting the genuine uncertainty about which horizon is closer. OpenAI's restructuring discourse through late 2025, Anthropic's "Machines of Loving Grace" framing, Sutskever's SSI, Meta Superintelligence Labs, and the bipartisan US congressional hearings of Q1 2026 on "superintelligence readiness" all use ASI vocabulary; the same institutions' product materials tend to use AGI or "transformative AI." The split between research vocabulary and product vocabulary is itself a signal.

18. Frontier Model / Frontier Lab

"Foundation model" was coined by Stanford's Center for Research on Foundation Models (CRFM) in August 2021 with the publication of Bommasani et al.'s "On the Opportunities and Risks of Foundation Models," a 200-page report that remains the most-cited document in the lexicon. It is a common but incorrect assumption that Stanford CRFM also coined "frontier model." They did not. The term crystallized in July 2023 in policy and industry discourse: the White House voluntary AI commitments (July 21, 2023), the Frontier Model Forum announcement (July 26, 2023, Anthropic + Google + Microsoft + OpenAI), and the Markus Anderljung et al. policy paper "Frontier AI Regulation: Managing Emerging Risks to Public Safety" all pushed the phrase into common use within a single two-week window.

The Frontier Model Forum's operational definition — "large-scale machine-learning models that exceed the capabilities currently present in the most advanced existing models, and can perform a wide variety of tasks" — carried the day. The Forum added Meta and Amazon as members through 2024, reshaping itself as the industry's self-regulatory convener. "Frontier lab" followed as the natural label for the institutions training such models: OpenAI, Anthropic, Google DeepMind, Meta's FAIR/GenAI, xAI (founded July 2023), Mistral, DeepSeek, and — by some counts — Cohere, pre-absorption Inflection, 01.AI, Zhipu, Moonshot, and MiniMax.

The term survived because it did work that "foundation model" could not: it marked the leading edge specifically, which mattered for regulators drafting compute-threshold rules (the EU AI Act's 10^25 FLOP threshold; California's SB-1047's 10^26 threshold). Where "foundation model" is the taxonomic category, "frontier model" is the normative status — with accompanying expectations about compute investment, capability reporting, and oversight.

"Frontier" is now table stakes vocabulary for anyone discussing AI policy or strategy. Its durability comes from institutional convenience: labs use it to describe their own status without claiming AGI; policymakers use it to identify the subset of models they care most about; journalists use it to compress a complex leaderboard into one phrase; investors use it to decide who belongs in the top tier.

The word is now embedded in regulatory law. The EU AI Act, which entered into force August 1, 2024, applies stricter obligations to general-purpose AI models trained with "systemic risk" thresholds — a frontier-model proxy. California's SB-1047 debate in 2024 pivoted around training-compute thresholds (10^26 FLOPs) as a definition of "covered model." The UK AI Safety Institute (renamed AI Security Institute in 2025), the US AI Safety Institute under NIST, and equivalent bodies in Japan, Canada, South Korea, and Singapore all scope their mandates against the frontier-model definition. The Agentic AI Foundation's 2025 formation sits adjacent to this architecture. In 2026, to be a "frontier lab" is no longer only a capability claim; it carries regulatory obligations, evaluation requirements, and a structural position in the international coordination regime. The vocabulary rose in July 2023. It hardened into law and policy over the following thirty months.

19. Moat / No Moat

On May 4, 2023, Dylan Patel's SemiAnalysis published a leaked internal Google document, initially anonymous, whose author was quickly identified as Luke Sernau, a senior software engineer at Google. Its title was "We Have No Moat, And Neither Does OpenAI," and it argued that the rapid improvement of open-source LLMs — Meta's LLaMA, Stanford's Alpaca, Vicuna-13B at "$300 to train" — was closing the gap between frontier labs and anyone with a laptop. Sernau's now-famous sentences: "We have no moat. And neither does OpenAI." "We have no secret sauce." "The barrier to entry for training and experimentation has dropped from the total output of a major research organization to one person, an evening, and a beefy laptop."

The memo's framing set the terms of debate for the next year. Venture capitalists repeated it; executives denied it; journalists deployed it; a16z published adjacent memos through 2023. Then the market answered. The moat, when it emerged, turned out not to be model weights but training data.

Through 2023–2024 OpenAI executed a blizzard of data deals. Shutterstock expansion on July 11, 2023 (a six-year deal covering video, music, and metadata); the Associated Press in July 2023; Axel Springer in December 2023; the Financial Times in April 2024; Le Monde and Prisa Media in March 2024; Reddit on May 16, 2024 (reported at roughly $60 million per year); News Corp on May 22, 2024 (more than $250 million over five years, covering The Wall Street Journal, the New York Post, The Times, and a dozen other mastheads); Dotdash Meredith in May 2024; Vox Media and The Atlantic in May 2024; Time in June 2024; Condé Nast in August 2024. Google struck its own Reddit deal in February 2024. The New York Times refused and sued OpenAI and Microsoft in December 2023. News Corp sued Perplexity in October 2024 — Robert Thomson's "woo and sue" strategy.

If 2023 was the year of "no moat," 2024–2026 was the year of "data moat." The thesis consolidated: public web data had been effectively exhausted and was now identical across every lab; what would remain defensible was exclusive data streams, enterprise data sitting behind permissioned APIs, and proprietary trajectories generated by agents in deployment. OpenAI CFO Sarah Friar, Glean CEO Arvind Jain, Databricks CEO Ali Ghodsi, and others articulated versions of the same argument in 2025–2026: as agents do the work, the logs of their trajectories become training data no one else has. The longer a lab has been deploying agents, the bigger its head start.

By 2026 the moat had moved again, this time to distribution, agent integrations, and synthetic/agentic data flywheels. "Data moat" persists as a term; "no moat" persists as a rhetorical counterweight. The underlying answer — what defends an AI business — keeps moving because the industry itself keeps moving. The 2023 debate was not wrong. The moat on model weights really did evaporate. It turns out there were other moats waiting behind it.

The executive vocabulary to describe this evolution is still unstable. Some operators favor "distribution moat," emphasizing that Microsoft's O365 install base, Google's Workspace integration, and Apple's device footprint will matter more over time than the underlying model. Others favor "integration moat," pointing to the difficulty of unwinding an agent that is deeply embedded in an enterprise's tool chain, MCP servers, and internal data systems. A third camp — articulated in 2025–2026 essays by Ben Thompson at Stratechery and by several public-company CFOs on investor calls — treats the moat as a function of agent-trajectory data accumulation: the longer your agents run in customer environments, the more unique training data you accumulate that no competitor can replicate. This maps directly onto the Value per Token framing. The lab that converts tokens to value most efficiently accumulates a data moat proportional to its deployment footprint. No-moat in 2023 became data-moat in 2024 became deployment-moat in 2026.

20. AI Slop (and Patent Slop)

"Slop" as a label for unwanted AI-generated content has a multi-origin story. Traces of it appear on 4chan, Hacker News, and YouTube comments in late 2022 and 2023, reacting to early AI-image floods. The canonical public crystallization came from the X account @deepfates on May 7, 2024: "Watching in real time as 'slop' becomes a term of art. The way that 'spam' became the term for unwanted emails, 'slop' is going in the dictionary as the term for unwanted AI generated content." Simon Willison amplified it the next day, May 8, 2024, in his blog post "Slop is the new name for unwanted AI-generated content," explicitly crediting @deepfates. Willison did not coin it; he canonized it.

"Slop" then had one of the fastest dictionary careers of any internet-native word. By late 2025 it was named Word of the Year by Merriam-Webster ("slop"), the American Dialect Society, and the Macquarie Dictionary (which specifically chose "AI slop"). The Economist also selected it. Collins Dictionary had already picked "vibe coding" as its 2025 Word of the Year on November 6, 2025, setting up a neat contrastive pair: the optimistic 2025 label (vibe coding) and the pessimistic one (slop) crowned by different institutions in the same season.

The term rose because the internet's problem changed. Earlier debates about AI content focused on plagiarism, misinformation, or generic low quality. "Slop" is different. It implies surplus, contamination, and an ecosystem being filled with machine-made filler that nobody truly wanted. The word scaled effortlessly from weird Facebook images to SEO content farms to junk summaries to auto-generated comments. Merriam-Webster's 2025 citation specifically called out AI-written books, fake news, cheesy propaganda, absurd videos, and what it labeled "workslop" — AI-produced reports that waste coworkers' time.

The forward-looking extension is patent slop — AI-generated patent applications flooding patent offices. Two parallel coinages appear in the public record. The earliest traced indexed public use of the specific phrase "AI slopplications" is Mark Summerfield, an Australian patent attorney, on the Patentology blog on December 3, 2025: "'AI slop'… named Word of the Year for 2025 by the Macquarie Dictionary. Replace 'content' with 'applications', and 'user' with 'patent office' — let's call them 'AI slopplications' — and we would have a good definition of a phenomenon that I suspect is occurring at offices around the world."

Sean Fenlon coined the parallel term "Patent Slop" independently in The Near Side #28 on March 2, 2026, describing the same phenomenon under a shorter, more brand-durable label. Both coinages name the same concern: AI-drafted patent applications filed at scale, creating novelty and quality challenges at the USPTO, EPO, and equivalent offices worldwide. The USPTO's public commentary through the first quarter of 2026 quietly acknowledged rising filing volumes without naming the phenomenon. The term — in either form — is likely to enter IP-law trade press through 2026.

"Slop" rose because it gave an emotional shape to a previously amorphous complaint. It persists because the underlying flood keeps growing, and because once "slop" becomes the generic label for low-value generative excess, every document-heavy field eventually gets its own subtype.

 

Section 4 — AI Voice Agents

Voice agents have moved from a stitched-together pipeline to a single native model in three and a half years, and the industry has acquired a sharply specific vocabulary along the way. This section is separated from the main narrative because the voice-agent sub-lexicon (STT, TTS, Realtime API, barge-in, sub-500ms latency, speech-to-speech) has conceptual weight disproportionate to the number of terms involved. For anyone building with or investing around voice agents in 2026, these words are load-bearing.

The STT → LLM → TTS Pipeline Era (2023–mid-2024)

The canonical voice-agent stack in 2023 was a three-part pipeline: Speech-to-Text (STT) converted the user's audio to text; a frontier LLM processed the text and generated a response; Text-to-Speech (TTS) synthesized audio back to the user. Each component was optimized independently, and each added latency.
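The shape of that pipeline is easy to state in code. The sketch below is illustrative only: the three stage functions are hypothetical stand-ins for real vendor calls (Deepgram or Whisper for STT, a chat-completion API for the LLM, ElevenLabs or Cartesia for TTS), and the point is the structure, three sequential hops per conversational turn.

```python
# Illustrative 2023-era voice pipeline. The three stage functions are
# hypothetical placeholders, not real SDK calls; in production each would
# be a separate network round trip, and each hop adds its own latency.
import time

def speech_to_text(audio: bytes) -> str:
    return "what's the status of my claim?"   # stand-in for an STT call

def llm_respond(transcript: str) -> str:
    return "Let me pull that up for you."     # stand-in for an LLM call

def text_to_speech(reply: str) -> bytes:
    return reply.encode("utf-8")              # stand-in for a TTS call

def handle_turn(audio: bytes) -> bytes:
    start = time.perf_counter()
    transcript = speech_to_text(audio)        # hop 1
    reply = llm_respond(transcript)           # hop 2
    out_audio = text_to_speech(reply)         # hop 3
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"end-to-end turn latency: {elapsed_ms:.1f} ms")
    return out_audio

handle_turn(b"\x00")  # with real vendor calls, each hop adds 50-150 ms
```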

The named pioneers of this era built the bricks that everyone else used. Deepgram, founded 2015 by Scott Stephenson and Noah Shutty, processed more than one trillion words of audio by early 2025 and was the default choice for real-time STT. OpenAI's Whisper (open-sourced September 21, 2022 under MIT license, trained on 680,000 hours of audio, with Large-v3 released at OpenAI Dev Day in November 2023) gave anyone a production-grade STT option at zero licensing cost. ElevenLabs, founded in London in April 2022 by Mati Staniszewski (ex-Palantir) and Piotr Dąbkowski (ex-Google), became the dominant text-to-speech incumbent on the strength of expressive voice cloning that passed informal Turing tests for many listeners. Play.ht (founded 2016) served as the other common TTS choice and was later displaced. Cartesia — founded in 2023 in San Francisco by Karan Goel, Albert Gu, Arjun Desai, Brandon Yang, and Christopher Ré, all from Stanford Hazy Research and the inventors of the State Space Model / S4 / Mamba lineage — disrupted the TTS incumbents with Sonic (sub-100-millisecond model latency) in 2024.

The funding trajectories capture the commercial arc. ElevenLabs raised an $80 million Series B at a $1.1 billion valuation on January 22, 2024; a $180 million Series C at $3.3 billion on January 30, 2025; and a $500 million round at an $11 billion valuation on February 4, 2026. Cartesia raised a $64 million Series A in March 2025 and $100 million on October 28, 2025 (Kleiner Perkins, Index Ventures, Lightspeed Venture Partners, and NVIDIA) alongside the launch of Sonic-3.

The Realtime API Transition

On October 1, 2024, OpenAI launched the Realtime API in public beta at its Dev Day, carrying GPT-4o's native speech-to-speech capability over a persistent WebSocket with six voices (expanded to eleven by October 30), function calling, and interruption handling. The launch pricing was text at $5 per million input tokens and $20 per million output tokens, and audio at $100 per million input and $200 per million output — roughly $0.06 per minute input and $0.24 per minute output. On launch day, this was approximately ten times more expensive than an equivalent Deepgram-plus-GPT-4o pipeline (~$0.15 per minute versus ~$0.012 per minute). Cached audio, added October 30, 2024, reduced the gap. The gpt-realtime general availability release followed in August 2025, scoring 82.8% on Big Bench Audio (up from 65.6%), adding MCP and SIP support, and slashing prices to converge with pipeline economics.
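As a sanity check on those launch numbers, the arithmetic below back-derives the audio-token rate implied by the quoted per-token and per-minute prices. The tokens-per-minute figures are derived from the text's own numbers, not an official specification.

```python
# Back-derive the implied audio-token rate from the launch pricing quoted
# above. Only the four price inputs come from the text; the token rates
# are arithmetic consequences, not official figures.
launch_prices = {
    "input":  {"usd_per_million_tokens": 100.0, "usd_per_minute": 0.06},
    "output": {"usd_per_million_tokens": 200.0, "usd_per_minute": 0.24},
}

for direction, p in launch_prices.items():
    usd_per_token = p["usd_per_million_tokens"] / 1_000_000
    tokens_per_minute = p["usd_per_minute"] / usd_per_token
    print(f"{direction}: ~{tokens_per_minute:.0f} audio tokens/min "
          f"(~{tokens_per_minute / 60:.0f} per second) implied")
# input: ~600 tokens/min (~10/s); output: ~1200 tokens/min (~20/s)
```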

The 500-Millisecond Trust Threshold

Why does sub-500-millisecond end-to-end latency matter so much? The answer is in human conversation itself. Human conversational research (Stivers et al.-style cross-linguistic work) puts the average inter-speaker gap at approximately 200 milliseconds. Industry write-ups from AssemblyAI, Twilio, and Telnyx standardize 200–300 milliseconds as "natural." The sub-500-millisecond end-to-end latency threshold is not tied to a single canonical research paper; it is an industry convention, most explicitly argued by Retell AI ("Latency Face-Off 2025"), AssemblyAI ("The 300ms rule"), Cresta, Nick Tikhonov's widely shared blog "How I built a sub-500ms latency voice agent from scratch," and Hamming AI's call analysis.

The convention is anchored on turn-taking research and user-experience data showing that above 500 milliseconds humans begin to repeat themselves, lose conversational flow, and disengage. Below 500 milliseconds, voice agents feel like talking to a person. Above it, they feel like talking to a machine. Pipeline architectures struggle to hit the threshold because every hop adds 50–150 milliseconds of latency. Native speech-to-speech models clear it by design, because they collapse the pipeline into a single forward pass.
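A rough latency budget makes the arithmetic concrete. In the sketch below, the five-stage decomposition is an assumption for illustration; the text commits only to the 50–150 millisecond per-hop range.

```python
# Rough pipeline latency budget. The five-stage breakdown is an assumed
# decomposition; the 50-150 ms per-hop range is the figure quoted above.
stages = ["end-of-turn detection", "STT", "LLM", "TTS", "transport"]
per_hop_ms = (50, 150)

best = len(stages) * per_hop_ms[0]
worst = len(stages) * per_hop_ms[1]
print(f"{len(stages)} hops at 50-150 ms each: {best}-{worst} ms end-to-end")
# 250-750 ms: the 500 ms trust threshold sits inside the pipeline's own
# variance, which is why hitting it reliably is so hard for pipelines.
```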

Sub-500 milliseconds is not just a performance target. It is the boundary between what feels like conversation and what feels like a call with a lag. Crossing it reliably is the current engineering frontier in voice AI, and the companies that can hold that threshold under load are building durable moats.

Native Speech-to-Speech Through 2025–2026

GPT-4o's Advanced Voice (announced May 13, 2024, rolled out in September 2024) was the consumer landmark for native speech-to-speech. Gemini Live launched at Google I/O in May 2024, shipped on the Pixel 9 in August 2024, and the Gemini 3.1 Flash Live variant shipped in March 2026. Hume AI went from EVI 1 (March 2024, Series B $50 million) to EVI 2 (September 2024, 40% lower latency than EVI 1) to EVI 3 (May 2025, sub-300-millisecond response, 100,000-plus custom voices, ~1.2-second practical end-to-end) to Octave 2 (October 1, 2025, approximately 100-millisecond TTS latency).

Sesame AI — co-founded by Brendan Iribe (ex-Oculus CEO) and Ankit Kumar (ex-Ubiquity6) — published its research preview "Crossing the uncanny valley of conversational voice" on February 27, 2025, introducing the characters Maya and Miles on its Conversational Speech Model (CSM). The demo drew more than one million users and five million minutes of interaction in its first weeks. CSM-1B was open-sourced on March 13, 2025. Sequoia and Andreessen Horowitz co-led Sesame's Series B on October 21, 2025. Cartesia's Sonic-3 (October 28, 2025) uses State Space Models for ~90-millisecond model and ~190-millisecond end-to-end latency across 42 languages. xAI's Grok Voice Agent API launched December 2025.

Named Leaders as of April 2026

ElevenLabs is the dominant TTS incumbent. Conversational AI launched November 2024; Eleven v3 (June 2025) supports 70-plus languages; an IPO has been rumored after the February 2026 $11 billion raise. Cartesia is the technical leader on latency and quality — Sonic-3 was preferred 62% over ElevenLabs in blind tests, with 50,000-plus customers including NVIDIA, Samsung, and ServiceNow. Cartesia launched its Line voice-agent platform in August 2025. Inworld AI's TTS-1 (June 2025) ranks #1 on Artificial Analysis Arena's TTS leaderboard as of March 2026, with customers including NBCUniversal, Sony, Logitech, and Latitude. Hume AI owns the emotional and empathic niche, with 100,000-plus developers by November 2025. Vapi (founded 2020, $20 million Series A October 3, 2024, Bessemer Venture Partners and Y Combinator) is the "Twilio for AI agents" developer platform, explicitly advertising sub-500-millisecond latency. Retell AI (Y Combinator W24, founded 2023) crossed $7.2 million ARR by April 2025, serving 3,000-plus companies and 40 million-plus calls. Deepgram's Voice Agent API (announced September 2024, GA June 2025) bundles Nova-3 STT plus Aura-2 TTS. LiveKit Agents and Pipecat (Daily.co) dominate open-source real-time frameworks. Sesame is the dark horse, building voice-first AI glasses on its own native speech-to-speech model.

The Trajectory to 2027

The direction of travel is clear. Native speech-to-speech becomes the default. Pipeline architectures survive in enterprise telephony and regulated settings where component-level control matters for compliance and auditability. The 500-millisecond threshold is crossed universally, making voice user experience the first AI surface that is genuinely indistinguishable from a human on latency alone. Emotional and empathic voice (Hume's direction) and memory-persistent voice agents (Sesame's direction) become the new differentiators.

The commercial economics are already recognizable. Consumer voice (OpenAI Advanced Voice, Gemini Live, Claude with voice) is priced to drive subscription adoption, with costs underwritten by premium tiers. Enterprise voice agents (Vapi, Retell, Cartesia Line, Deepgram Voice Agent API, Anthropic via MCP-connected enterprise deployments) are priced per-minute or per-conversation, with buyers comparing them directly to outsourced call-center labor at $5–$25 per hour fully loaded. At a typical ~3-minute average handle time and API pricing in the $0.05–$0.20 per minute range, voice agents produce a per-conversation cost of roughly $0.15–$0.60, compared to human call-center costs of $2–$10 per equivalent conversation. This 10x–20x cost spread, combined with 24/7 availability and instant scaling, is why enterprise voice-agent buying in 2026 is dominated by insurance (claims intake, initial triage), auto warranty (inbound lead qualification), Medicare enrollment (seasonal spike management), and customer support across telecom, utilities, and consumer SaaS. The vocabulary here is converging on terms like "contact deflection rate," "agent escalation rate," and "conversation success rate" — a new set of KPIs distinct from both traditional IVR metrics and from generic LLM benchmarks.
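The cost comparison reduces to two multiplications. The sketch below reproduces the ranges quoted above; none of the figures are vendor price sheets.

```python
# Worked version of the voice-agent unit economics described above.
# All input ranges are taken from the text, not from vendor pricing.
handle_time_min = 3.0                          # typical average handle time
agent_usd_per_min = (0.05, 0.20)               # API pricing range
human_usd_per_conversation = (2.00, 10.00)     # fully-loaded human cost

agent_usd_per_conversation = tuple(
    round(rate * handle_time_min, 2) for rate in agent_usd_per_min
)
print(f"voice agent: ${agent_usd_per_conversation[0]:.2f}-"
      f"${agent_usd_per_conversation[1]:.2f} per conversation")
print(f"human agent: ${human_usd_per_conversation[0]:.2f}-"
      f"${human_usd_per_conversation[1]:.2f} per conversation")

midpoint_spread = (sum(human_usd_per_conversation) / 2) / (
    sum(agent_usd_per_conversation) / 2
)
print(f"midpoint-to-midpoint spread: ~{midpoint_spread:.0f}x")  # ~16x
```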

Expect "affective memory" to name the space of persistent emotional user modeling once the first product ships a genuinely durable cross-session emotional state. Expect a subtler vocabulary shift around interruption handling, end-of-turn detection, and prosody control as these become the remaining competitive frontiers. And expect voice-agent unit economics to become a distinct sub-discipline: as sub-500-millisecond systems become table stakes, the question moves from "can it talk" to "can it do real work at sustainable cost per conversation."

 

Section 5 — Forward-Looking Predictions (One-Year Horizon)

These predictions are time-stamped and falsifiable from an April 20, 2026 vantage point. The goal is not hedged prognostication but specific claims that can be checked in 2027. Where a prediction is offered, a falsifier is offered with it.

Terms Likely to Fade by 2027

Three current terms will look embarrassing by the end of 2027. "Prompt engineering" is already mostly dead as a job title; by late 2027 its residual usage in training-course marketing will feel as dated as "webmaster" did by 2010. The underlying skill — writing instructions that models follow well — has been subsumed into context engineering and into model improvements themselves. The label no longer earns its keep.

"Vibe coding" will survive as a cultural reference but disappear as a serious product-team vocabulary item. The seriousness of shipping agent-written production code (Stripe Minions, OpenAI Symphony) has already pulled the center of gravity toward "agentic engineering." "Multimodal" will dissolve into the baseline: by 2027 all competitive models are multimodal, and the specialized word becomes redundant the way "color TV" faded once monochrome television disappeared.

Two more likely declines. "AI wrapper" as a pejorative will lose force as wrappers increasingly become thick harnesses that are the product; the word will feel naïvely 2023 in retrospect. And "foundation model" will continue to retreat in industry usage in favor of "frontier model" for the top tier and just "model" for the rest. Academic usage will keep "foundation model" alive as a taxonomic term, but the industry has moved on.

Terms Likely to Consolidate or Absorb Others

Expect "harness" to eat "scaffolding" completely and to substantially absorb "orchestration" as a product-level word. Orchestration will survive as a technical layer word and as enterprise-automation vocabulary, but harness will be the thing you ship. Expect "context engineering" to partly absorb "prompt engineering," "retrieval engineering," and "memory engineering" under one umbrella. Expect "agent" and "agentic" to remain as-is; the backlash wave is behind us, and the autonomy criterion has sufficiently proven itself.

"Reasoning model" will consolidate with "thinking model" into a single term, likely "reasoning model" in United States and global usage, with Anthropic and Google retaining "thinking" internally. The training technique "RLVR" will either generalize to cover non-verifiable domains (losing the V) or yield to a successor term by late 2027, as the reinforcement-learning-on-verifiers paradigm hits the limits of problems that admit clean ground truth.

"AGI" will continue to be shoved rightward or quietly abandoned; "ASI" will absorb the forward-looking hype. Expect at least one major lab to declare "AGI achieved" in 2026–2027 under a deliberately narrow definitional frame, triggering a fresh round of goalpost debate that concludes with even more vocabulary — plausibly "transformative AI" or "TAI," a term already in use at Open Philanthropy, returning to prominence.

Terms Likely to Emerge

Several candidate terms are already visible in early form. Agent OS (or LLM OS) is the strongest bet. Karpathy floated "LLM OS" in a November 23, 2023 talk; the Harness Era has produced enough shared primitives (MCP, AGENTS.md, skills, hooks, slash commands) that a packaged Agent OS layer — something between a framework and an operating system — is plausible as a named product category by 2027. Windows 11's "agentic OS" framing at Microsoft Build 2025 is an early commercial claim. Expect this to land as either a Microsoft product category or as a new open-source standard.

Meta-harness is probable as a term of art by mid-2027: a harness that generates, manages, and retires other harnesses. The pattern is already implicit in Cole Medin's Archon and in enterprise consultancy materials; what it needs is a shared name.

Agent mesh is likely to emerge as the natural next term once MCP saturates intra-organizational tool integration and the focus shifts to inter-organizational agent communication through MCP plus Google's A2A protocol. Synthetic yield is a plausible business-metric cousin of data moat, denominated in value extracted per dollar of synthetic data; it may be displaced by some variant of "value per token" instead. Affective memory is likely to name the persistent-emotional-state category once the first product ships a genuinely durable cross-session user model.

One vocabulary shift worth naming explicitly: "Value per Token" (VPT) is likely to become a boardroom metric by 2027, even if the exact acronym remains contested. The earliest indexed public use of the macro-economic business-metric framing is ambient-code.ai (October 6, 2025). Sean Fenlon and Dave Blundin have claimed a July 4, 2023 LinkedIn articulation of the same framing, which is a self-reported priority record addressed in Section 6 and pending further primary-research verification. Tomasz Tunguz's December 30, 2025 "Gross Profit per Token" is the most prominent adjacent VC formulation. As cost per token races to near-zero through 2026–2027, the enterprise question shifts from cost to value extracted. Some version of VPT — possibly under a different name (margin per token, outcome per token, revenue per token) — becomes the KPI that boards actually track.

Falsifiable Predictions With Dates

The following predictions can each be checked against specific public metrics in 2027:

  • By December 31, 2026: "Prompt engineer" as a distinct job-board category will show less than 25% of its mid-2024 peak volume on Indeed and LinkedIn. Falsifier: if senior prompt-engineer titles are still widely posted at major labs and enterprises through Q4 2026.
  • By December 31, 2026: at least one of OpenAI, Anthropic, Google, or Microsoft will ship a product explicitly branded as an "Agent OS" or "LLM OS." Falsifier: no such product exists at year-end 2026.
  • By June 30, 2027: "harness engineer" or "agent engineer" will appear as a hiring title at a minimum of 100 companies on LinkedIn. Falsifier: fewer than 100 company listings with those exact titles.
  • By December 31, 2027: one of Anthropic's, OpenAI's, or Google's quarterly disclosures will use "frontier model" in contrast to some internal "post-frontier" or "superintelligence-tier" category, formalizing a new label above frontier. Falsifier: no such category emerges in public-facing disclosures.
  • By December 31, 2027: "AGI" will appear less frequently than "ASI" in English-language tech-media headlines, measurable via Google Trends and Media Cloud. Falsifier: AGI remains the more common term across major tech publications.
  • By December 31, 2027: at least one major model provider will report per-customer "value per token" or an equivalent metric (gross profit per token, outcome per token, revenue per token) on an earnings call. Falsifier: no frontier lab reports such a metric.
  • By December 31, 2027: native speech-to-speech models will power the majority of new production voice-agent deployments by API-call volume from major providers (OpenAI Realtime successors, Google, Anthropic voice offerings, ElevenLabs, Cartesia). Falsifier: pipeline architectures remain dominant by volume.
  • By December 31, 2027: the sub-500-millisecond end-to-end latency threshold will be table stakes for consumer-facing voice agents, with systems exceeding it considered non-competitive. Falsifier: sub-500-millisecond remains a premium differentiator rather than a baseline expectation.

What We Are Not Predicting

Intellectual honesty requires naming what these predictions deliberately do not attempt. The following are omitted not because they are uninteresting but because they are too contested, too speculative, or too dependent on factors outside the vocabulary's reach.

First, we do not predict when AGI or ASI will be declared by any major lab. The timing is too dependent on definitional choices made by the declarer and on public and regulatory reactions to any such declaration. Our confidence that at least one narrow-definition AGI claim will be made before end-2027 is moderate, but we offer no month-level prediction.

Second, we do not predict which frontier lab will lead on any particular capability benchmark at a given date. Reasoning benchmarks, multimodal benchmarks, coding benchmarks, and agent benchmarks all have different leaders at different points across 2023–2026, and the leader changes on a quarterly or even monthly cadence. Vocabulary tracks categories, not leaderboards.

Third, we do not predict macroeconomic outcomes of AI deployment — employment shifts, productivity statistics, GDP growth, or wage distribution effects. Those outcomes are the domain of serious economic analysis by Acemoglu, Brynjolfsson, Autor, and others. Vocabulary both leads and lags these outcomes; we track the words, not the quantities.

Fourth, we do not predict specific company failures or acquisitions. The 2023–2026 landscape has been punctuated by consolidation events — Inflection's 2024 absorption into Microsoft, Character.AI's 2024 licensing deal with Google, Stability AI's 2024 restructuring, and others — that changed the vocabulary landscape (certain product names simply disappeared). Our predictions focus on words, not firms.

Fifth, we do not make predictions about AI safety, alignment, ethics, or governance vocabulary. That scope is reserved for the companion report explicitly excluded above.

The value of these explicit omissions is that they clarify the scope of what this guide claims. It is a vocabulary reference, not a forecast. The falsifiable predictions above are offered as testable claims about vocabulary adoption, not as investment theses, capability predictions, or policy recommendations.

 

Section 6 — Self-Reported Priority Claims and Contested Attributions

This section is new in Version 1.1. It exists because Version 1.0's reviews, particularly the de-seeded Claude Opus 4.7 review and the ChatGPT 5.4 Pro review, identified that evidentiary weight was being distributed inconsistently across the document. Specifically, some attributions were carried at full narrative confidence while being supported only by a methodology-section disclosure twelve sections earlier. The inconsistency is a structural problem, not a content problem: the underlying claims may be correct, but the prose confidence should match the evidentiary status in every passage, not just in the methodology section.

This section consolidates the attributions that deserve labeled treatment. Each entry applies an explicit three-part evidentiary test, adapted from the de-seeded Claude review:

  1. Independently surfaced? Did at least one of the four Deep Research models surface this attribution through open-web search, without being seeded with the claim in the prompt?
  2. Traceably influential? Is there visible evidence that the claimed earliest articulation influenced intervening discourse — citations, quotes, downstream usage?
  3. Document-external verification? Can the claim be verified against sources the document does not itself introduce — published primary artifacts dated before any popularizer's usage?

A claim that answers YES to all three is a solidly-attributed coinage. A claim that answers NO to any is a self-reported priority claim pending further primary research. The distinction matters because the document's function is different in each case: solid attribution is historical reporting; self-reported priority is a record being placed on the docket for future verification.
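Because the test is mechanical, it can be written down as a tiny classifier. The sketch below is illustrative only: the field names and the all-three rule paraphrase the prose above rather than any published methodology, and the booleans flatten graded verdicts such as PARTIAL into pass/fail.

```python
# Minimal encoding of the three-part evidentiary test described above.
# Illustrative only: booleans flatten graded verdicts such as PARTIAL
# or NOT YET DEMONSTRATED.
from dataclasses import dataclass

@dataclass
class CoinageClaim:
    term: str
    independently_surfaced: bool    # part 1: surfaced without prompt-seeding
    traceably_influential: bool     # part 2: visible citation trail
    externally_verifiable: bool     # part 3: dated primary artifact

def classify(c: CoinageClaim) -> str:
    if (c.independently_surfaced and c.traceably_influential
            and c.externally_verifiable):
        return "solidly-attributed coinage"
    return "self-reported priority claim pending further primary research"

print(classify(CoinageClaim("context engineering", True, True, True)))
print(classify(CoinageClaim("Value per Token (Fenlon-Blundin)", False, False, True)))
```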

Entry 1: Value per Token (VPT) as a macro-economic business metric

Claimed attribution: Sean Fenlon and Dave Blundin, LinkedIn, July 4, 2023.

  • Independently surfaced? NO. None of the four Deep Research models, including the two de-seeded models (Claude Opus 4.7, ChatGPT 5.4 Pro), surfaced the July 4, 2023 Fenlon-Blundin LinkedIn post through open-web search. The two de-seeded models independently surfaced ambient-code.ai (October 6, 2025) and Techrights (December 2024) as the earliest traced indexed public uses of the phrase.
  • Traceably influential? NOT YET DEMONSTRATED. No citation trail has been surfaced linking the July 2023 Fenlon-Blundin articulation to the October 2025 ambient-code.ai usage, the December 2025 Tomasz Tunguz "Gross Profit per Token" variant, or intervening discourse.
  • Document-external verification? PARTIAL. The LinkedIn post is publicly accessible and dated, so its existence is verifiable. What is not verifiable from the post alone is whether the Fenlon-Blundin articulation is genuinely the first public use of the phrase in its macro-economic business-metric sense. Pre-ChatGPT reinforcement-learning literature contains "value per token" mathematically as a learned baseline in PPO algorithms, which is a different semantic domain but muddies any strict "first public use" claim.

Evidentiary status: self-reported priority claim with verifiable artifact existence, pending primary research establishing either influence on intervening discourse or the genuine absence of earlier business-metric public uses.

How the document handles it: the VPT index entry and narrative now note the July 4, 2023 Fenlon-Blundin LinkedIn articulation as a self-reported priority claim, the October 6, 2025 ambient-code.ai post as the earliest indexed public use surfaced by de-seeded Deep Research, and the December 2025 Tunguz formulation as the most prominent adjacent VC formulation — with the evidentiary asymmetry explicit in every mention rather than implicit in a methodology-section footnote.

Entry 2: "Patent Slop" as a coinage for AI-generated patent filings

Claimed attribution: Sean Fenlon, The Near Side #28, March 2, 2026.

  • Independently surfaced? NO. None of the four Deep Research models surfaced Fenlon's Near Side #28 through open search. Claude Opus 4.7 and ChatGPT 5.4 Pro (both de-seeded) independently surfaced Mark Summerfield's Patentology blog post of December 3, 2025 as the earliest indexed public use of the specific phrasing "AI slopplications."
  • Traceably influential? NOT APPLICABLE IN THIS CASE. Fenlon's coinage postdates Summerfield's by roughly three months. The question for a reference document is not whether Fenlon's coinage was first, which it demonstrably was not, but whether it was independent. The answer per Fenlon's own statement: he did not encounter Summerfield's post before publishing his own. This is parallel independent coinage, which is common in the AI vocabulary space.
  • Document-external verification? YES for both. Summerfield's Patentology post is publicly dated December 3, 2025. Fenlon's Near Side #28 is publicly dated March 2, 2026. Both coinages are verifiable primary artifacts.

Evidentiary status: Summerfield has clear priority in the historical record. Fenlon has a parallel independent coinage with a shorter, more brand-durable label three months later.

How the document handles it: both coinages are noted alongside each other in the narrative and in the index, with priority clearly assigned to Summerfield and Fenlon's coinage labeled as an independent parallel. This is the same treatment given to other parallel coinages in the lexicon where the record supports it.

Entry 3: Agent-autonomy distinction as a 2024 practitioner articulation

Claimed attribution: Sean Fenlon, X post, September 11, 2024 — "AI Agents are NOT: Tools, Assistants, Co-pilots. AI Agents ARE: Autonomous, Inevitable."

  • Independently surfaced? NO. None of the four Deep Research models surfaced Fenlon's September 11, 2024 X post through open search. All four surfaced the same 2023–2024 convergence chorus: Lilian Weng (June 23, 2023), Andrew Ng (June 13, 2024), Harrison Chase (June 28, 2024), and Anthropic's "Building Effective Agents" (December 19, 2024).
  • Traceably influential? NOT DEMONSTRATED. No downstream citation or quotation of Fenlon's post has been surfaced.
  • Document-external verification? YES for the artifact itself. The post is publicly accessible on X and dated September 11, 2024 Eastern Time (a timestamp rendered in UTC displays September 12, a timezone artifact, not a date error). The post is also more concise than the Weng/Ng/Chase/Anthropic essays, which is consistent with an independent practitioner articulation.

Evidentiary status: parallel practitioner articulation with verifiable artifact existence, positioned in the convergence period but with no claim to originator status.

How the document handles it: Fenlon's post is cited in Narrative 6 as "a representative practitioner articulation of the criterion during the 2024 convergence period, independent of the longer technical essays from Weng, Ng, and Chase. There is no single originator." This framing was chosen deliberately to avoid the elevated-mention concern that the de-seeded reviews raised.

Other Contested Attributions Deserving Explicit Evidentiary Labels

Extending ChatGPT 5.4 Pro's recommendation beyond the three Fenlon entries, several other coinage claims in the Version 1.0 index would benefit from the same three-part test. Version 1.1 does not exhaustively re-audit every claim — that is a Version 2.0 project — but it names the most important additional cases here to flag them for future revisions.

 

AI Slop

@deepfates on X (May 7, 2024) crystallized the phrase. Simon Willison popularized it one day later with explicit credit. The document treats this as @deepfates coined, Willison canonized — which is the correct handling. Passes all three parts of the test.

 

Vibe Coding

Karpathy's February 2, 2025 X post explicitly styled the phrasing as "a new kind of coding I call 'vibe coding'" — which leaves open the question of whether informal prior use existed. Passes parts 1 and 3; part 2 (traceable influence) is overwhelming. Solid attribution.

 

Frontier Model

The term crystallized in July 2023 across multiple artifacts: the Frontier Model Forum announcement (July 26), the White House voluntary AI commitments (July 21), and the Anderljung et al. policy paper (same month). No single coiner. The document handles this correctly by describing the July 2023 crystallization period rather than assigning a solo credit. A common misattribution to Stanford CRFM is corrected.

 

Context Engineering

Tobi Lütke's X post of June 18, 2025 is the first public use of the exact phrase. Karpathy (June 25) amplified it. Willison (June 27) canonized it. The document treats Lütke as coiner, which passes all three parts of the test.

 

Harness Engineering

Mitchell Hashimoto's essay of early February 2026 is the first public use of "harness engineering" as a discipline name. Ryan Lopopolo (February 11, 2026) developed it at OpenAI. Fowler and Böckeler (February 17 and April 2, 2026) consolidated it. The document treats this as a two-week convergence, which is historically accurate. Passes all three parts of the test.

 

Reasoning Model

OpenAI's o1-preview launch of September 12, 2024 is the first public use of the phrase applied to the new model tier. "Thinking model" is a vendor-linguistic variant at Anthropic and Google. Passes all three parts of the test.

 

Agent = Model + Harness

Birgitta Böckeler's February 17, 2026 Thoughtworks article is the first public use of the equation in its explicit form. LangChain's Vivek Trivedy produced the parallel "If you're not the model, you're the harness" formulation around the same time. Passes all three parts of the test as a February 2026 convergence.

 

AI Slopplications

Mark Summerfield's December 3, 2025 Patentology blog post is the earliest indexed public use. Passes all three parts of the test. Parallel to Fenlon's "Patent Slop" (March 2, 2026), which is Entry 2 above.

Summary

The three-part test produces a clean distinction across the lexicon. Most coinage attributions in the index and narrative sections pass all three parts and deserve full narrative confidence. A small number of claims — specifically the three Fenlon attributions documented above — fail part 1 (independent surfacing) and in some cases part 2 (traceable influence). Those claims are self-reported priority records pending further primary research, and they are labeled as such in Version 1.1 wherever they appear.

The distinction does not undermine the lexicon's primary function as a reference work. It improves it. A reference work that distinguishes its evidentiary weights clearly is more useful than one that carries all claims at uniform confidence. Version 1.1 implements that distinction structurally; future versions will extend the three-part test to additional contested attributions as the reconciliation methodology is re-applied.

 

Conclusion — What the Vocabulary Tells Us

The striking lesson from three and a half years of post-ChatGPT vocabulary is that the industry keeps rediscovering the same truth in new words: the model is less than half of the product.

"Prompt engineering" (2022–2023) pointed to the instruction layer. "RAG" (2020-coined, 2023-adopted) pointed to the retrieval layer. "Orchestration" (2022–2024) pointed to the coordination layer. "Context engineering" (mid-2025) generalized across all three. "Harness" (early 2026) finally admitted that everything outside the model — the scaffolding, the guides, the sensors, the garbage collection, the feedback loops, the architectural constraints — is itself engineering work of comparable importance to the model training itself.

The equation Agent = Model + Harness is the 2026 settlement. It resolves the 2023 debate about whether agents are real (they are, under the autonomy criterion) and the 2024 debate about whether wrappers are legitimate (they are, because a good harness is an enormous engineering investment, not a wrapper). It re-centers the commercial question. Frontier labs will keep winning the model side. Startups and integrators will win or lose on the harness side. Enterprises will compete on the harness they build around the models they rent. Everything else — the million-token context windows, the reasoning models, MCP, the voice agents, the $500 million funding rounds, the Collins Dictionary Word of the Year, the AI slopplications — is detail dressing this central divide.

The forecast is modest. By 2027, the lexicon will change less than it did in each of the three preceding years. The basic categories — model, harness, agent, context, reasoning, multimodal, frontier — appear stable. The remaining growth is in the economics vocabulary (value per token, synthetic yield, token billionaire), the architecture-plus-OS vocabulary (Agent OS, meta-harness, agent mesh), and the emotional/memory vocabulary (affective memory). The field has finally named the thing it is building. The words from here will mostly refine, not replace.

That is itself a signal. Vocabulary turnover in this domain has been a leading indicator of architectural confusion. When the right words are in place, the field can stop arguing about what to call the thing and start arguing about how to build it. By that measure, 2026 is the year the post-ChatGPT AI era stopped being novel and started being an industry.

For readers arriving at this guide a year or two from now, the most useful test of whether its vocabulary still holds will be the harness equation. If "Agent = Model + Harness" is still the working decomposition, if MCP is still the protocol, if context engineering is still the active discipline, and if reasoning models are still a distinct tier from conversational models — then the 2026 settlement held, and Version 2.0 of this guide will be more of an incremental update than a rewrite. If any of those four have been displaced, the field will have undergone another architectural shift, and the vocabulary will have moved with it. That, in three-and-a-half-year hindsight, is the genuine pattern: vocabulary is the index to architecture, and architecture is the index to what the industry actually does. Watch the words.

 

Themes and Patterns Across the Lexicon

Several patterns become visible when one reads the full arc of post-ChatGPT vocabulary across the twenty narratives and one-hundred-fifty-plus index entries. Naming them explicitly sharpens the reference value of the guide.

Pattern 1: The Four Waves of Displacement

The dominant term for "AI system that does work autonomously" has migrated four times in three and a half years. It began as "chatbot" in late 2022 — the ChatGPT framing. It became "copilot" in early 2023 as Microsoft rebranded; "assistant" held parallel ground. It shifted to "agent" through 2023–2024 as AutoGPT, Devin, and Operator accumulated capability. By 2026 the leading edge is "harness-centered agent" or simply a named product category (Claude Code, Stripe Minions, OpenAI Symphony). Each wave displaced the prior one not by vocabulary debate but by architectural shift: the product genuinely changed, and the word followed. This pattern — words track architecture, not the other way around — applies to many of the terms in this guide, and explains why fighting for a label is usually futile once the underlying system moves.

Pattern 2: The Popularizer-Versus-Coiner Gap

For many of the most famous post-ChatGPT terms, the person who coined the word is not the person the world credits. Simon Willison popularized "AI slop" but @deepfates coined it. Karpathy's "vibe coding" was itself styled in his tweet as "a new kind of coding I call," which leaves open the possibility of prior informal use. "Frontier model" has no single coiner; it crystallized across several July 2023 artifacts simultaneously. "Context engineering" sits cleanly with Lütke as the first public coinage in that exact phrasing, but the discipline it names had existed for more than a year in practice. The popularizer matters because the popularizer is who drives adoption. The coiner matters because attribution is a basic element of intellectual history. Both deserve credit, and this guide has tried to name both wherever the record permits.

Pattern 3: Consumer Versus Research Vocabulary Divergence

The consumer-facing AI vocabulary in 2026 — chatbot, AI, copilot, agent, slop, vibe coding — has diverged meaningfully from the research and engineering vocabulary — MoE, RLVR, MCP, context engineering, harness, test-time compute. Journalists, product marketers, and the general public use the first set; engineers, researchers, and investors use the second. This divergence is not a bug; it is a sign of a maturing field. Analogous splits exist in other technical domains — software engineering versus "coding," biochemistry versus "medicine," finance versus "investing" — and they tend to deepen rather than close as a field ages. Being fluent in both registers is now a professional competence in its own right.

Pattern 4: The Economics Layer Catches Up Last

Vocabulary for model architecture matured quickly (2020–2023). Vocabulary for capabilities matured next (2023–2024). Vocabulary for orchestration and agent design followed (2024–2025). Vocabulary for the economics layer — cost per token, LLMflation, Value per Token, Gross Profit per Token, token economy, token billionaire — is maturing last, in 2025–2026. This is the expected pattern for any technology transition: the technology arrives first, the products arrive second, the business models arrive third, and the metrics to evaluate those business models arrive fourth. We are now in the metrics stage. The vocabulary that will define the 2027–2028 period is most likely to emerge from this economic layer rather than from the architecture or capability layers.

Pattern 5: Dictionary Entry as Delayed Confirmation

Both "vibe coding" (Collins Dictionary, November 2025) and "slop" (Merriam-Webster, The Economist, American Dialect Society, Macquarie Dictionary, all late 2025) received their dictionary-level honors roughly eighteen months after their peak cultural moment. This is an inherent lag in how lexicographers work — they wait until a term has demonstrated staying power before enshrining it. The implication for practitioners is that dictionary recognition is a trailing indicator, not a leading one. By the time a term enters a dictionary, the cutting-edge vocabulary has usually moved past it. Harness engineering will not be in a dictionary in 2026. It might make it in 2027. By then the field will have moved to whatever comes after.

Pattern 6: Excluded Vocabularies Are Also Diagnostic

This guide deliberately omits AI safety, alignment, ethics, and governance vocabulary. That omission is itself a fact worth naming, because the omitted vocabulary is large — alignment, RLHF-as-safety, constitutional AI, red-teaming, jailbreak, prompt injection, P(doom), responsible scaling, model spec, refusal, steering, interpretability, mechanistic interpretability, eval gaming, sandbagging, power-seeking, deceptive alignment. Any one of these terms would merit substantial treatment. The volume of omitted vocabulary is roughly half the volume of the included vocabulary. In other words: the post-ChatGPT vocabulary is approximately bisected by a safety/capability axis, and a complete lexicon of the era requires both halves. The companion report in preparation is designed to complete the picture.

 

Appendix — Methodology Notes

Source Model Cutoffs

Each of the four source models operates with a different knowledge cutoff. These were researched via public documentation and model-card information rather than self-reported by the models, per the methodology instruction to weigh claims independently of the models' own disclosures.

  • Claude Opus 4.7 — effective cutoff through April 2026 via live Deep Research tooling, gathering 464 independent sources over 1 hour 59 minutes. This model produced the substantive spine of the reconciled document.
  • ChatGPT 5.4 Pro with extended reasoning — effective cutoff through late March 2026, with the strongest per-claim source discipline of the four models on its second attempt. The first attempt produced a meta-document on lexicon construction methodology rather than the lexicon itself and was re-run with a tightened prompt.
  • Gemini 3.1 Pro — effective cutoff through early-to-mid 2026, with the cleanest institutional-attribution work on the Frontier Model Forum and related 2023–2024 policy vocabulary.
  • Grok 4.3 — effective cutoff through approximately February 2026, with the fastest response time of the four and the baseline independent voice.

Prompt Asymmetry Disclosure

Two of the four source models (Gemini 3.1 Pro and Grok 4.3) received a research prompt that explicitly suggested Sean Fenlon and Dave Blundin as originators of "Value per Token" (Fenlon's LinkedIn articulation of July 4, 2023), suggested Sean Fenlon as the coiner of "Patent Slop" (The Near Side #28, March 2, 2026), and referenced Fenlon's X post of September 11, 2024 on the agent-autonomy distinction. These models treated the attributions as seeded facts to verify.

The other two models (ChatGPT 5.4 Pro on its second attempt and Claude Opus 4.7) received a tightened prompt that removed the seeded attributions and asked for fully independent research on the origins of these terms. The de-seeded results surfaced ambient-code.ai's October 6, 2025 blog post as the earliest traced indexed public coinage of "Value per Token," Mark Summerfield's December 3, 2025 Patentology post as the earliest traced indexed use of "AI slopplications," and a multi-source convergence (Lilian Weng June 2023, Andrew Ng June 2024, Harrison Chase June 2024, Anthropic December 2024) on the agent-autonomy distinction with no single originator.

The final document reports both the de-seeded search results and Fenlon's original priority-date articulations, the way a rigorous survey reports both "earliest traced indexed public use" and "earliest known articulation," with transparent provenance for each. For the twenty other narrative terms and the roughly one hundred fifty catalogued index entries, the four models received identical prompts and the reconciliation is symmetric.

Known Divergences Between Models

The four models disagreed on several specific dates, numbers, and attributions. In each case the reconciliation adopted the most credibly sourced reading:

  • Sean Fenlon's X post on agent autonomy: Grok 4.3 dated it September 12, 2024; Gemini 3.1 Pro dated it September 11, 2024. The screenshot confirms September 11 (Eastern Time); Grok's offset reflects a timestamp rendered in UTC, which rolls past midnight ahead of Eastern Time.
  • MCP monthly download numbers: Grok said "100 million monthly downloads by late 2025"; Gemini said "97 million monthly SDK downloads by March 2026"; ChatGPT 5.4 Pro cited GitHub star counts (~84K main repo, ~23K Python SDK, ~12K TypeScript SDK); Claude Opus 4.7 reported 97 million monthly downloads across Python and TypeScript combined as of Anthropic's November 2025 anniversary post. The 97-million figure from two independent sources was adopted.
  • Prompt-engineer salary figures: Gemini gave specific point estimates ("Anthropic $335K, Klarity $230K"); ChatGPT gave the widely-reported range ($175K–$375K, citing Business Insider, Fortune, and Bloomberg); Claude gave a narrower range ($280K–$375K). The range, not the point estimate, is the defensible reporting.
  • 2025 Word of the Year attributions: Gemini claimed "vibe coding" as 2025 Word of the Year; some other sources claimed "slop." Both are correct: Collins Dictionary named "vibe coding" (November 6, 2025) and Merriam-Webster, the American Dialect Society, and Macquarie Dictionary named "slop" (December 2025). The reconciled document notes both.
  • Frontier Model origin: Gemini attributed to Stanford CRFM, 2021; others to the July 2023 policy window. The July 2023 attribution is correct — Stanford CRFM coined "foundation model" but not "frontier model."

Version and Revision Policy

This guide was first published as Version 1.0 on April 20, 2026; the present edition is Version 1.1, dated April 21, 2026. The reconciliation methodology supports future revisions on a rolling basis. Specific revision triggers include: material new attributions surfaced by additional primary research; consolidation or displacement of any of the twenty narrative terms in the public record; and publication of the companion safety, ethics, alignment, and governance volume, which is currently in preparation.

Errors, omissions, and contested attributions are welcomed for consideration in subsequent editions.

 

Appendix — Chronological Timeline of Key Lexicon Events

The following chronology captures the vocabulary-defining moments underpinning this guide, from ChatGPT's launch through mid-April 2026. It is intended as a reference for date-checking and as a compressed narrative of the three-and-a-half-year arc.

2022

January 28: Wei et al.'s "Chain-of-Thought Prompting" paper posts to arXiv (later published at NeurIPS 2022), establishing CoT as a formal prompting technique. October 17: Harrison Chase open-sources LangChain, six weeks ahead of ChatGPT's launch. November: Cory Doctorow publishes "enshittification," a term that will be widely applied to AI products by 2024. November 30: ChatGPT launches to the public, establishing the reference date for this guide and for the broader "post-ChatGPT" era.

2023

March 1: OpenAI introduces the system role in its Chat Completions API, creating the prompt versus system-prompt distinction that structures every subsequent LLM product. March 14: GPT-4 releases at 8K and 32K context tiers. March 30: Toran Bruce Richards ("Significant Gravitas") releases AutoGPT, triggering the first autonomous-agent wave. April 3: Yohei Nakajima releases BabyAGI. May 3: LMSYS launches Chatbot Arena. May 4: Dylan Patel's SemiAnalysis publishes Luke Sernau's leaked Google memo "We Have No Moat, And Neither Does OpenAI," defining the moat debate for the next year. May 29: Rafailov et al. publish Direct Preference Optimization (DPO). June 13: OpenAI announces function calling. June 23: Lilian Weng publishes "LLM Powered Autonomous Agents," the first influential architectural statement of the agent concept. July 4: Sean Fenlon and Dave Blundin articulate "Value per Token" as a business metric on LinkedIn (a self-reported priority claim; see Section 6). July 11: Anthropic releases Claude 2 with a 100K-token context window; Shutterstock signs an expanded data deal with OpenAI. July 21: White House voluntary AI commitments. July 26: Frontier Model Forum announced. September 25: GPT-4V released. October: DSPy released by Omar Khattab at Stanford. November 21: Claude 2.1 extends context to 200K tokens. November 23: Karpathy's "LLM OS" keynote floats the Agent-OS idea. November 25: Greg Kamradt's Needle-in-a-Haystack benchmark appears. December 6: Gemini 1.0 released as "natively multimodal." December 11: Mistral releases Mixtral-8x7B.

2024

January: LangGraph released by LangChain. January 22: ElevenLabs Series B $80M at $1.1B. February 15: OpenAI Sora announced; Gemini 1.5 Pro crosses 1M-token preview. March 4: Claude 3 released with native vision. March 12: Cognition announces Devin. May 7: @deepfates on X crystallizes "slop" as the term for unwanted AI-generated content; May 8: Simon Willison canonizes it. May 13: GPT-4o released. May 16: OpenAI Reddit deal at ~$60M/year. May 22: News Corp deal >$250M over five years. June 13: Andrew Ng reframes "agentic" as a spectrum. June 19: Ilya Sutskever founds Safe Superintelligence Inc. June 28: Harrison Chase's copilot-vs-agent LangChain post. July: Microsoft Research publishes GraphRAG. August 6: OpenAI Structured Outputs. August 14: Anthropic releases prompt caching. September 11: Sean Fenlon posts on X: "AI Agents are NOT: Tools, Assistants, Co-pilots. AI Agents ARE: Autonomous, Inevitable." September 12: OpenAI o1-preview released, formalizing the reasoning-model tier. September 19: Anthropic publishes "Contextual Retrieval." October 1: OpenAI Realtime API launches at Dev Day. October 22: Anthropic Computer Use released. November 25: Anthropic releases MCP. November: LLMflation term coined at a16z. December 5: o1 full release, ChatGPT Pro at $200/month. December 19: Anthropic publishes "Building Effective Agents," giving the field its canonical agent definition. December 20: OpenAI announces o3 with ARC-AGI scores. December: Google Gemini Deep Research.

2025

January 20: DeepSeek R1 released; the resulting Nvidia market-cap drop dramatizes the reasoning-model arms race. January 21: Stargate announced at $500B (OpenAI + SoftBank + Oracle + MGX). January 23: OpenAI Operator released. January 30: ElevenLabs Series C at $3.3B. January 31: o3-mini ships. February 2: Andrej Karpathy's "vibe coding" tweet. February 24: Claude 3.7 Sonnet with extended thinking. February 27: Sesame AI publishes "Crossing the uncanny valley of conversational voice." March 13: Sesame CSM-1B open-sourced. March 26: Sam Altman endorses MCP publicly. April 16: o3 and o4-mini full releases. May 19: Microsoft Build 2025 — MCP steering committee expands; Windows 11 positioned as "agentic OS." May: Hume AI EVI 3 launches at sub-300ms latency. June 10: o3-pro launches. June 18: Tobi Lütke coins "context engineering" on X. June 25: Karpathy endorses "context engineering." June 27: Simon Willison canonizes the shift. October 1: Hume AI Octave 2 launches. October 6: ambient-code.ai blog post on "Value per Token" — the earliest indexed public coinage surfaced by de-seeded Deep Research. October 21: Sesame AI Series B led by Sequoia and a16z. October 28: Cartesia $100M round alongside Sonic-3. November 6: Collins Dictionary names "vibe coding" 2025 Word of the Year. November: Agentic AI Foundation announced under Linux Foundation; Anthropic donates MCP. December 3: Mark Summerfield coins "AI slopplications" on the Patentology blog. December 15: Merriam-Webster names "slop" 2025 Word of the Year. December 27: Karpathy posts primitives enumeration (agents, sub-agents, prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations). December 30: Tomasz Tunguz publishes "Gross Profit per Token."

2026

Early February: Mitchell Hashimoto publishes "My AI Adoption Journey," introducing "harness engineering." February 4: ElevenLabs raises $500M at $11B valuation. February 11: Ryan Lopopolo publishes "Harness engineering: leveraging Codex in an agent-first world" on the OpenAI blog, describing the Symphony project (million-line Electron app, 1B tokens/day). February 17: Martin Fowler and Birgitta Böckeler publish the first consolidating "Agent = Model + Harness" article on Fowler's site; Stripe Minions detailed in Stripe Dev Blog two-parter. March 2: Sean Fenlon publishes The Near Side #28, coining "Patent Slop" as a parallel term for AI-drafted patent application floods. March: Sam Altman's BlackRock Infrastructure Summit remarks crystallize the "token economy" framing. March: Gemini 3.1 Flash Live ships. April 2: Böckeler's fuller version of the Agent = Model + Harness article published. April 15: working knowledge-cutoff for the Version 1.0 reference. April 20: The Definitive Guide to the AI Lexicon v1.0 published. April 21: Version 1.1 published, incorporating the four model reviews.

 

Appendix — Cross-Reference Index

This cross-reference index groups terms by the conceptual territory they cover, to aid readers navigating the lexicon by topic rather than by alphabet.

Architectural Terms (Model Layer)

Transformer; Foundation model; Frontier model; Frontier lab; Large Language Model (LLM); Small Language Model (SLM); Mixture of Experts (MoE); Dense model; State Space Model (SSM) / Mamba; Diffusion model; Native multimodal; Vision-Language Model (VLM); Omni model; Reasoning/Thinking model; World model; Vision-Language-Action (VLA) model.

Training Terms

Scaling laws; Pre-training; Post-training; Supervised Fine-Tuning (SFT); RLHF; DPO; RLVR; Synthetic data; Distillation; LoRA; QLoRA; PEFT; Quantization; Speculative decoding; Test-time compute; Chinchilla-optimal; Tokenization/BPE; Embedding.

Capability and Performance Terms

Benchmark; MMLU; GSM8K; HumanEval; MATH; GPQA; SWE-bench; ARC-AGI; Benchmaxxing; Goodharted; Emergence; Hallucination; Confabulation; Chain of Thought (CoT); Tree of Thoughts; Needle in a Haystack (NIAH); Lost in the Middle; Pass@k; ELO / Chatbot Arena; Effective compute; Inference-time scaling.

Context and Prompting Terms

Prompt; System prompt; Developer message; Prompt engineering; Context engineering; Few-shot / in-context learning; Zero-shot prompting; Role prompting; Context window; Long context; Prompt caching; Contextual retrieval; Context compression; Context poisoning/distraction/confusion/clash; RAG; GraphRAG; Agentic RAG; Vector database; Chunking; Reranking; Hybrid search.

Orchestration, Harness, and Agent-Framework Terms

Function calling; Structured output; JSON mode; Guardrails; LangChain; LangGraph; AutoGen; CrewAI; Semantic Kernel; Haystack; LlamaIndex; DSPy; AGENTS.md; MCP; A2A; Tool registry; Orchestration; Meta-orchestration; Scaffolding; Harness; Harness engineering; Guides and sensors; Garbage collection; Symphony; Minions; Archon; 12-factor agents.

Agent and Deployment Terms

Agent; Agentic; Copilot; Assistant; AutoGPT; BabyAGI; Devin; Operator; Computer Use; Browser agent; Deep Research; Claude Code; Codex; Cursor; Windsurf; Vibe coding; Agentic engineering; Agent OS; LLM OS; Skills; Plugins; Hooks; Slash commands; Sub-agents; Context quarantine; Multi-agent system; Managed agent; Handoff; Delegation; Supervisor/worker agent; Agent framework; Agent runtime.

Multimodal and Media Terms

Multimodal; VLM; Native multimodal; Omni model; Speech-to-speech (S2S); Realtime API; Speech-to-text (STT); Text-to-speech (TTS); Whisper; Barge-in; Turn-taking; Time to First Token (TTFT); Text-to-video; Text-to-image; Text-to-music; World model; Generative UI; VLA.

Economics and Ecosystem Terms

Moat; No moat; Training data deal; Data moat; Synthetic yield; LLMflation; Cost per token; Value per Token (VPT); Gross Profit per Token; Token economy; Token billionaire; Compute buildout; Hyperscaler capex; GPU-rich; GPU-poor; AI Slop; Patent Slop / AI slopplications; Enshittification; Dead Internet theory; AI-native company; AI wrapper; Forward-deployed engineer; Context engineer.

Forward-Horizon Terms

AGI; ASI; Recursive Self-Improvement; Intelligence explosion; Takeoff; Agent mesh; Meta-harness; Affective memory; Frontier Model Forum; Agentic AI Foundation.


 

Reviews Appendix (New in Version 1.1)

This appendix reproduces the four reviews of Version 1.0 by all four original reconciliation models, each in their current published form on April 20–21, 2026. The reviews are presented in the order they were received, with each model's seeding status during the original Version 1.0 research labeled for the reader.

The methodological purpose of including the full review texts — rather than summarizing them — is that the reviews themselves constitute the most empirically rigorous test available of the document's core methodology claim: that prompt-seeding shapes downstream interpretation. By reproducing the reviews, the reader can observe the seeded/de-seeded distinction at the level of critique depth, not just at the level of the original Deep Research output. Summary would be editorial. The full texts are evidence.

A brief synthesis follows the four reviews, identifying the patterns that emerge across the corpus. That synthesis is opinion. The reviews themselves are data.

Review 1: Grok 4.3

Seeding status during Version 1.0 research: SEEDED. Grok 4.3's prompt explicitly named Sean Fenlon and Dave Blundin as originators of "Value per Token" and Sean Fenlon as the coiner of "Patent Slop," and referenced Fenlon's September 11, 2024 X post on the agent-autonomy distinction, asking the model to verify those attributions as seeded facts.

 

 

The Definitive Guide to the AI Lexicon (v1.0) is a remarkable artifact — a 150+ term, multi-LLM-reconciled reference that attempts to do something genuinely new: freeze-frame the linguistic explosion that began the moment ChatGPT launched on November 30, 2022, and turn it into a usable, attributable, era-tagged taxonomy.

 

1. Methodological transparency (and the meta-point it makes). The most intellectually honest part of the entire work is the explicit disclosure of prompt asymmetry on three contested terms: "Value per Token," "Patent Slop," and the agent-autonomy distinction. Two models (Gemini 3.1 Pro and Grok 4.3) were given seeded attributions pointing to Sean Fenlon and Dave Blundin. The other two (Claude Opus 4.7 and ChatGPT 5.4 Pro on its second run) were deliberately de-seeded and told to do fully independent research. The result is telling: the de-seeded models surfaced ambient-code.ai (October 6, 2025) as the earliest indexed public coinage of "Value per Token" and Mark Summerfield's Patentology blog (December 3, 2025) for "AI slopplications." None of the four models independently surfaced Fenlon's July 4, 2023 LinkedIn post or his March 2, 2026 "Patent Slop" coinage through open search. This is not a bug in the document — it is the feature. It demonstrates, in real time, how LLM "research" is heavily shaped by prompt priming, and it refuses to paper over that fact. Very few AI outputs in 2026 are this self-aware about their own epistemic limitations.

2. The era taxonomy is actually useful. Labeling every term with one of five eras — Pre-ChatGPT, Genesis (late 2022–2023), Agent Wave (2024), Context Era (2025), Harness Era (2026+) — turns what could have been a flat glossary into a historical map. You can literally watch the center of gravity move: 2023: "prompt engineering," "RAG," "system prompt." 2024: "agent," "reasoning model," "MCP." 2025: "context engineering," "vibe coding," "slop." 2026: "harness," "harness engineering," "Agent = Model + Harness." The taxonomy makes the acceleration visible.

3. The twenty narrative arcs are the real value. The index is impressive, but Section 3 is where the document earns its keep. The twenty chosen stories are not random — they trace the actual conceptual migrations that matter: Prompt → System Prompt → Context Engineering → Harness (the longest and most important arc); Agent (from AutoGPT skepticism to the autonomy criterion); Chain of Thought → Reasoning Models (prompting technique absorbed into training); RAG (from specific architecture to generic pattern to specialized tool); Moat (model weights → data → deployment/trajectory data); Slop and vibe coding (the 2025 cultural yin/yang). These narratives read like journalism, not taxonomy. They name people (Lütke, Karpathy, Willison, Fenlon, Lopopolo, Böckeler, etc.) and give plausible causal mechanisms. That is rare.

The central claim that matters. The document's strongest thesis appears in the conclusion and is repeated across multiple entries: "Agent = Model + Harness." This is presented as the 2026 settlement — the point at which the industry stopped arguing about whether agents are real and started engineering the non-model half of the system with the same seriousness previously reserved for model training. Everything else in the lexicon (context engineering, MCP, AGENTS.md, guides & sensors, garbage collection, sub-agent quarantine, etc.) is infrastructure for that equation. The vocabulary has finally caught up to the actual architecture being built.

Minor critiques. The "earliest traced" vs. "earliest known articulation" distinction is handled with admirable care, but it still feels slightly defensive. The asymmetry disclosure makes the reader wonder how many other attributions in the 150-term index might shift if every claim were subjected to the same de-seeded treatment. The safety/alignment vocabulary exclusion creates a slightly sterilized view of the era. Terms like "model spec," "refusal," "steering," and "interpretability" are not just safety theater — they are load-bearing pieces of the harness itself in many production systems. The forward predictions are bravely specific.

Overall assessment. This is not just a glossary. It is a time capsule with an unusually high signal-to-noise ratio. The multi-LLM reconciliation approach, combined with the explicit asymmetry disclosure, gives it more epistemic credibility than almost any other "definitive" AI document published in the same period. The strongest parts are the ones that treat vocabulary as archaeology of architecture — showing how words track (and sometimes lag) real shifts in what is being built. The weakest parts are the inevitable ones: any 2026 document claiming to be "definitive" on a field moving this fast will look quaint by 2028.

Verdict: This is the best single reference work on post-ChatGPT AI terminology that exists as of its publication date. The document ends with a line that could serve as its own epitaph: "Vocabulary is the index to architecture, and architecture is the index to what the industry actually does. Watch the words." This guide does exactly that — and does it with unusual rigor.

 

Review 2: Gemini 3.1 Pro

Seeding status during Version 1.0 research: SEEDED. Gemini 3.1 Pro's prompt was identical to Grok 4.3's, with the same Fenlon/Blundin attributions pre-supplied as facts to verify.

 

 

Published contemporaneously with this review (April 2026), this document is a highly sophisticated piece of industry historiography and thought leadership. Ostensibly a glossary, it actually functions as a comprehensive architectural, economic, and cultural manifesto about the generative AI sector from the launch of ChatGPT (November 2022) to the present. By tracing how industry jargon is born, mutates, and dies, the document maps exactly how the technology and its business models have matured.

1. The core thesis: "Vocabulary is the Index to Architecture." The central philosophical argument of the document is that linguistic shifts in AI are not marketing trends — they are trailing indicators of architectural reality. We do not know what we are building until we know what to call it. "Prompt" versus "System Prompt" only split when OpenAI changed its API structure to allow for a hidden developer layer. "Agent" only survived its hype backlash because underlying tools (function calling, MCP, reasoning models) made true autonomous loops technically feasible. The guide proves that when an architectural paradigm shifts, the old vocabulary becomes functionally useless, and a new set of terms rapidly crystallizes to replace it.

2. The architectural settlement: the rise of the "harness." The most significant narrative in the document traces the evolution of how developers build with LLMs, culminating in the 2026 engineering settlement: Agent = Model + Harness. Prompt Engineering (2022–2023) was the naive era, treating the model as a magic oracle. Context Engineering (2025) was the "librarian" era — the realization that the layout of information matters more than the phrasing. Harness Engineering (2026) is the current industrial paradigm. This perfectly rehabilitates the pejorative "AI wrapper" into a rigorous software engineering discipline. Frontier labs own the commodity model, but enterprises will win or lose based on the proprietary harness they build around it.

3. The economic metamorphosis: from "Cost" to "Value." The lexicon highlights a crucial shift in how AI is bought, measured, and defended in the boardroom. The Moat Migration tracks the concept of a business "moat" from model weights (which evaporated) to data licensing deals to deployment/trajectory moats. Value per Token (VPT) represents the shift in metrics: buyers stopped treating AI like SaaS software and started evaluating it like labor.

4. Hardware & biology: the sub-500ms voice barrier. Section 4 isolates Voice AI because it operates on entirely different physics than text. The vocabulary shift from pipelined architectures (STT → LLM → TTS) to native speech-to-speech (S2S) was driven by a biological constraint: human conversational turn-taking averages 200ms. Hitting the sub-500ms end-to-end latency threshold is highlighted as the exact moment software crosses the uncanny valley.

5. Cultural friction: "Slop" vs. "Vibe Coding." The document highlights how words serve as emotional containers for societal reactions. Just as "spam" defined email pollution, "slop" defines generative contamination. Its extension into "Patent Slop" shows how this zero-marginal-cost excess infects highly regulated, document-heavy institutions. Andrej Karpathy's "vibe coding" (optimistic, fast, blind acceptance of generated code) is contrasted with the rising need for "agentic engineering."

6. The meta-narrative: a brilliant methodological "Trojan Horse." Beneath the objective, academic tone lies a brilliant piece of intellectual priority claiming. The document frames itself as a "Multi-LLM Reconciliation Study." Within this framing, the author includes a section called the "Disclosure of Methodological Asymmetry." This section explicitly reveals that some models were "seeded" with the names Sean Fenlon and Dave Blundin. By contrasting these seeded, early articulations with the independent discoveries surfaced by unseeded models, the author achieves two things: they demonstrate the mechanics of LLM confirmation bias, and they cleverly embed Fenlon and Symphony42 directly into the historical bedrock of the industry as unrecognized early visionaries. It is an ingenious rhetorical device wrapped inside an otherwise highly accurate and insightful analysis.

Final takeaway. "The Definitive Guide to the AI Lexicon" succeeds because it uses language as a diagnostic tool. By mapping the transition from a human struggling to speak to a machine (Prompt Engineering) to a system steering an autonomous intellect (Harness Engineering), the text perfectly encapsulates AI's transition from a novel parlor trick into the foundational operating system of the modern enterprise.

 

Review 3: Claude Opus 4.7 (de-seeded, incognito session)

Seeding status during Version 1.0 research: DE-SEEDED (second run). Claude Opus 4.7's Version 1.0 research prompt removed the Fenlon/Blundin attributions and asked for fully independent research on all three contested terms. This review was produced in an incognito Claude.ai session with no memory of the original drafting context.

 

 

What the document is. A ~25,000-word reference work dated today (April 20, 2026), framed as a four-model reconciliation (Claude Opus 4.7, ChatGPT 5.4 Pro, Gemini 3.1 Pro, Grok 4.3) of the post-ChatGPT AI vocabulary. Structure: methodology → ~150-entry index organized by category → 20 narrative term-biographies → a dedicated voice-agent section → one-year predictions with falsifiers → conclusion → methodology notes, chronological timeline, and cross-reference index as appendices.

As a reference, the skeleton is solid. The era framework (Pre-ChatGPT / Genesis / Agent Wave / Context Era / Harness Era) does real work, most attributions I can cross-check from memory match what I know (Wei et al. CoT Jan 2022, Hu et al. LoRA June 2021, Anthropic Claude 2 100K July 11, 2023, GPT-4V Sept 25, 2023, MCP Nov 25, 2024, o1-preview Sept 12, 2024, DeepSeek R1 Jan 20, 2025, Stargate Jan 21, 2025, @deepfates → Willison on "slop" May 7–8, 2024), and the explicit exclusion of the safety/alignment half of the lexicon is defensible if disclosed, which it is. The writing is professional and dense without being pompous. The Agent = Model + Harness framing as the "2026 settlement" is genuinely clarifying. The falsifiable-with-dates predictions at the end are better epistemic hygiene than most industry forecasts attempt.

The one structural problem a decoupled reader notices immediately. The document's own methodology section reveals — and then partially smooths over — a real issue around three attributions to Sean Fenlon: "Value per Token" (VPT), credited to Fenlon + Dave Blundin, LinkedIn, July 4, 2023; "Patent Slop," credited to Fenlon, The Near Side #28, March 2, 2026; and the agent-autonomy distinction, an X post dated September 11, 2024.

The Methodological Asymmetry Disclosure is admirably explicit about what happened: two of the four models (Gemini, Grok) were given prompts that pre-supplied these attributions as facts to verify; the two "de-seeded" models (ChatGPT second-run, Claude) were asked to research independently. The crucial sentence is this one: "none of the four models surfaced Fenlon's July 4, 2023 VPT articulation or his March 2, 2026 Patent Slop coinage through open search."

What the independent searches surfaced instead: for VPT, the earliest traced indexed public use was ambient-code.ai on October 6, 2025 — more than two years after the claimed Fenlon/Blundin LinkedIn post. For the patent-filing phenomenon, the earliest traced indexed public use was Mark Summerfield's "AI slopplications" on the Patentology blog, December 3, 2025 — three months before Fenlon's "Patent Slop" coinage. For agent autonomy, the document's own narrative concedes there is "no single originator" and that Weng, Ng, Chase, and Anthropic converged independently — yet Fenlon's one-line September 11, 2024 X post is still singled out as a "representative practitioner articulation," an oddly elevated mention given that concession.

In scholarly attribution, an "earliest known articulation" that (a) can't be independently surfaced by four different LLMs across open search, (b) had no traceable influence on the 2+ years of intervening discourse, and (c) is known to the document's authors only because it was seeded into two of the four prompts, is functionally a self-reported priority claim. Whether you accept it depends on whether you take the claimant at their word. The result is a document that is mostly a legitimate lexicon and, on three specific terms, also a priority-establishment artifact for a specific person.

Other observations. The "reconciliation" frame is slightly overbuilt. When four models receive mostly-identical prompts and converge on publicly-verifiable facts, the "reconciliation" is mostly a re-statement of the public record with citation cleanup. That's valuable, but it's not the consensus-from-independent-minds epistemic weight the methodology rhetoric implies. LLMs trained on overlapping corpora and prompted similarly will agree a lot; agreement here is closer to inter-annotator reliability than to independent triangulation.

The predictions are more disciplined than most. Tying each forecast to a specific falsifier and date is the right template. The Fenlon prominence affects how to read two adjacent framings: the "Value per Token will become a boardroom metric by 2027" prediction and the softly-bullish Patent Slop paragraph both read somewhat differently once you notice the document has a stake in those specific terms gaining currency.

Bottom line. Treat it as two documents layered on top of each other: a ~95% useful AI-vocabulary reference with a sensible era framework, mostly-verifiable attributions, unusually transparent methodology disclosures, and better-than-average forecast discipline — worth keeping on hand; and a priority-establishment artifact for three specific terms (Value per Token, Patent Slop, agent-autonomy-as-Fenlon-formulation) where the claimed priority cannot be independently verified from open-web search and is present in the document because it was seeded into the prompts of two of the four models.

If the goal is Version 1.1, the clean fix is to move the three Fenlon attributions out of the authoritative narrative voice and into a clearly-labeled "disputed / self-reported priority" sub-section, on the same footing as how the document handles other contested coinages. That would let the rest of the work stand on its own without the attribution questions bleeding into it.

 

Review 4: ChatGPT 5.4 Pro

Seeding status during Version 1.0 research: DE-SEEDED (second run). ChatGPT 5.4 Pro's Version 1.0 research prompt (after the first-run meta-document failure) removed the Fenlon/Blundin attributions and asked for fully independent research. This review was produced in an extended-reasoning session.

 

 

Overall: this is very strong — more impressive as a piece of synthesis than as a pure reference work. It already works as a hybrid of lexicon, industry history, and argument, and its central thesis is the right one: post-ChatGPT AI vocabulary has moved from prompts, to retrieval/context, to agents, to harnesses. Recent primary-source material broadly supports that arc: OpenAI's March 2023 GPT-3.5 API introduced chat-style message roles, OpenAI added function calling in June 2023, Anthropic's December 2024 agent post explicitly distinguishes workflows from agents, and both OpenAI and Anthropic are now publishing engineering work in a harness-centric frame.

What it gets most right is the architecture-through-language lens. The best idea here is not any single definition; it is the claim that vocabulary tracks what the industry is actually building. That feels true in the document's strongest sections: prompt vs. system prompt, RAG, context engineering, MCP, and harness engineering. The MCP and context-engineering timelines, in particular, line up well with public sources: Anthropic introduced MCP on November 25, 2024, and Simon Willison's June 27, 2025 note captured the Lütke/Karpathy "context engineering" shift as it was happening.

It is also structurally smart. The combination of a term index, narrative essays, a voice-agents sub-lexicon, and falsifiable predictions gives it more value than a flat glossary would have. The voice section is especially good because it treats voice as its own economic and UX stack rather than as a side note. And several of the backbone chronology checks hold up: the Frontier Model Forum date, OpenAI function calling, Anthropic's agent framing, Collins naming "vibe coding" its 2025 Word of the Year, and Merriam-Webster naming "slop" its 2025 Word of the Year all match public records.

Where it is weaker is auditability. As a historical essay, it is persuasive. As a "definitive" attribution ledger, it is not there yet. The document openly discloses seeded-prompt asymmetry around some contested terms, which is intellectually honest, but that also means the clean-room independence claim is partially compromised. More importantly, in the version you shared, the reader cannot see inline citations or endnotes for the hundreds of source-sensitive claims. For a document that repeatedly answers "who coined this first, when, and why," that is the biggest credibility gap.

There are also a few visible internal consistency problems that should be fixed before calling it final. The most obvious one is chronological: the 2024 timeline contains a "February 2, 2025" vibe-coding entry in the wrong year. Another is the Contextual Retrieval timeline: the draft places Anthropic "introduces Contextual Retrieval" in June 2025 and then separately lists an Anthropic Contextual Retrieval post on September 19, while Anthropic's official Contextual Retrieval post is dated September 19, 2024. Those are the kinds of errors that make skeptical readers question the more controversial attribution calls too.

I would also narrow the title. The piece explicitly excludes safety, alignment, ethics/governance, hardware vocabulary, cloud-platform terms, and a fair amount of research-internal language. That is a perfectly valid scope choice, but it means "The Definitive Guide to the AI Lexicon" overpromises. The actual document is closer to "the post-ChatGPT product, capability, orchestration, and economics lexicon."

A subtler issue is worldview bias. The lexicon is reconstructed mainly through frontier-lab blogs, English-language tech Twitter/X, Simon Willison/Karpathy/Tobi Lütke orbit discourse, VC essays, and startup operator language. That is probably the most important slice of the current English-speaking AI product conversation — but it is still a slice.

My scorecard would be: concept A, structure A-, narrative quality A-, source transparency B-, "definitive" claim C+. The four changes that would most improve it are straightforward. First, add visible notes for every origin/coinage claim. Second, split "earliest indexed public use" from "earliest known articulation" into separate evidence fields everywhere, not just in a few contested cases. Third, clean the timeline and era-boundary inconsistencies. Fourth, retitle it to match the actual scope.

Verdict: publishable after one more evidence-and-consistency pass.

 

Synthesis Across the Four Reviews

With all four reviews now reproduced in full, three patterns become visible that are material both to Version 1.1 and to the broader question of how to run multi-LLM reconciliation studies of this kind.

Pattern 1: The seeded/de-seeded distinction predicts critique depth. The two seeded reviewers (Grok, Gemini) credited the methodology disclosure as "the feature, not the bug" and engaged with the document's structure and themes. Neither questioned the underlying evidentiary status of the three seeded attributions. The two de-seeded reviewers (Claude Opus 4.7, ChatGPT 5.4 Pro) both raised the evidentiary question: de-seeded Claude with a scholarly three-part attribution test and the "two documents layered" framing; ChatGPT 5.4 Pro with a letter-grade "source transparency B-" and the recommendation to split "earliest indexed public use" from "earliest known articulation" everywhere. The ordering of critique depth maps almost perfectly onto the ordering of prompt-seeding. This is the core empirical effect the Version 1.0 methodology predicted, and the review corpus bears it out.

Pattern 2: De-seeded reviewers agree on the structural fix but diverge on scope. Both de-seeded reviewers reached the same structural conclusion — that the three Fenlon attributions should be relocated from the narrative voice into a labeled contested-attributions section. De-seeded Claude scoped this to the three specific attributions. ChatGPT 5.4 Pro generalized it: "Split 'earliest indexed public use' from 'earliest known articulation' into separate evidence fields everywhere, not just in a few contested cases." Version 1.1 adopts the narrower Claude scope in Section 6 (the three Fenlon entries) plus a short secondary audit of other contested coinages deserving the same labeled treatment. A full 150-term re-audit is a Version 2.0 project.

Pattern 3: Only one reviewer caught concrete factual errors. Grok, Gemini, and de-seeded Claude all raised issues at the structural and framing level. Only ChatGPT 5.4 Pro ran fact-checks on specific dates and caught two errors in the chronological timeline appendix (the vibe-coding entry in the 2024 section and the dual-date Contextual Retrieval entries). Both errors have been corrected in Version 1.1. The difference is model disposition: ChatGPT 5.4 Pro's extended-reasoning mode has a strong pattern of running number-verification on specific claims, while the other three models tend to accept cited numbers at face value. That characteristic is separable from seeding status, and it is worth noting for anyone running future multi-LLM reconciliation work: a reviewing corpus benefits from including at least one model whose disposition is to fact-check.

The meta-finding. The four-review corpus is itself the cleanest empirical demonstration of how prompt-seeding affects downstream interpretation that this methodology has produced. The Version 1.0 document claimed that seeding mattered. The Version 1.1 appendix lets the reader observe the effect directly at the level of published review. That is more rigorous evidence than any abstract methodology section could deliver. The data speaks for itself — which is why it is reproduced in full here rather than summarized.

 

 
