peopleanalyst

AI Human Interaction Guide · Part I of 7

Foundations of AI and Data Strategies

The definitions, the four kinds of intelligence the field actually builds, the data substrate underneath modern AI, and the methodology gap that breaks downstream rollouts.


1.1 What we mean by AI

The term artificial intelligence has been used so loosely for so long that defining it precisely is itself part of the work. The guide uses the term in two registers: a working definition for the body of the field, and a narrower technical definition for the systems that are doing most of the practical work in 2026.

The narrower technical definition is the one that matters for the rest of the guide. When this guide uses AI without further qualification, it means the family of systems built on machine learning: algorithms that improve their performance on a task by being shown examples, rather than by being explicitly programmed to perform it. Within that family, the systems that have driven the practical revolution since roughly 2012 are deep neural networks, and within that subset, the systems that have driven the revolution since roughly 2022 are foundation models (large neural networks trained on broad data, adaptable to many downstream tasks).

So the nesting is: AI ⊃ machine learning ⊃ deep learning ⊃ foundation models. When this guide uses AI in a paragraph about enterprise adoption, it is almost always foundation-model AI specifically: the GPT family, Claude, Gemini, the open-source Llama and Mistral families, and their successors. When this guide uses AI in a paragraph about diagnostic systems in medicine or fraud detection in banking, it might be referring to older, narrower machine-learning systems (gradient boosting on tabular data; convolutional networks on imaging). The guide will be specific about which.

What this guide is not covering

There are several adjacent fields that this guide does not cover in depth, by design:

  • General-purpose robotics: the field of building physical machines that perceive and act in the real world (autonomous vehicles, manipulator arms, humanoid robots). Robotics shares deep learning as a substrate but has its own methodological stack (sensor fusion, control theory, real-time guarantees) that warrants its own treatment.
  • Symbolic AI and the historical tradition: the rule-based, logic-programming, and expert-systems approaches that dominated the field from roughly 1956 to 1990. These approaches are not dead; they are alive inside hybrid systems and inside the verification and reasoning subfields. But the practical revolution of 2012-2026 has been almost entirely machine-learning-driven, and the guide focuses there. (For a careful treatment of the historical tradition, see Margaret Boden, Artificial Intelligence: A Very Short Introduction, Oxford 2018.)[1]
  • Quantum machine learning and post-classical computing approaches: speculative as of the guide's writing date; the guide will treat them in a future edition if and when they ship working systems.
  • The artificial general intelligence (AGI) debate: whether and when AI systems will achieve human-level general intelligence is a genuinely open empirical question. The guide takes no position on the AGI question and instead focuses on what the systems we have now can do, what they cannot do, and what conditions make their deployment succeed or fail.

1.2 The four kinds of intelligence the field actually builds

A useful taxonomy for any AI conversation: in practice, the field has built systems that do roughly four things. Each kind of system does its own thing well; almost no real system does more than one or two of these well at the same time.

Pattern recognition. Looking at a piece of data (an image, a paragraph of text, a sound waveform, a row of structured data) and assigning it to a category, predicting a number, or extracting structure from it. This is the workhorse capability underneath modern AI. Image classification, speech recognition, text classification, fraud-flagging, and recommendation systems are all built on pattern-recognition machinery. The technical name for the dominant approach is supervised learning.

Sequence generation. Producing a sequence of tokens (words, code characters, image pixels, audio samples) one at a time, where each new token depends on the previous ones. This is the workhorse capability underneath foundation-model AI from 2022 onward. GPT-style language models and code-generation systems like GitHub Copilot work this way. Some image-generating models do too (the original DALL-E generated image tokens autoregressively), although several widely used image generators, including later DALL-E versions, rely on diffusion rather than token-by-token generation. The technical name is autoregressive generation, and the architecture that has dominated since 2017 is the transformer.
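A minimal sketch of what "one token at a time" means in code. The lookup table below is a hypothetical stand-in for a model: it conditions only on the previous token, whereas a transformer conditions on the whole preceding sequence. The token names and probabilities are invented for illustration.

```python
import random

# Hypothetical toy "model": next-token probabilities given only the previous token.
# A real language model would compute these from the entire prefix.
BIGRAM = {
    "<s>": {"the": 1.0},
    "the": {"model": 0.6, "data": 0.4},
    "model": {"drifts": 0.5, "fails": 0.5},
    "data": {"drifts": 1.0},
    "drifts": {"</s>": 1.0},
    "fails": {"</s>": 1.0},
}

def generate(rng, max_tokens=10):
    """Autoregressive loop: sample a token, append it, condition on it, repeat."""
    tokens = ["<s>"]
    while len(tokens) < max_tokens + 1:
        dist = BIGRAM[tokens[-1]]
        choices, weights = zip(*dist.items())
        nxt = rng.choices(choices, weights=weights, k=1)[0]
        if nxt == "</s>":  # end-of-sequence token stops generation
            break
        tokens.append(nxt)
    return tokens[1:]  # drop the start marker

sample = generate(random.Random(0))
print(" ".join(sample))
```

Each pass through the loop feeds the model's own previous output back in as input, which is why an early sampling choice shapes everything that follows.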

Decision-making under uncertainty. Choosing actions in an environment to maximize a goal, learning from feedback. This is the capability underneath game-playing systems (chess, Go, StarCraft, poker), some robotics, and the agentic-AI work that has been growing in 2024-2026. The technical name is reinforcement learning.

Structured reasoning. Manipulating representations of facts, rules, and constraints to derive conclusions. This is what the symbolic AI tradition was built around; modern systems that need it (theorem provers, planning systems, verification tools, some scientific-discovery systems) often combine symbolic reasoning with neural networks. The technical name varies: knowledge representation and reasoning, automated planning, constraint satisfaction.

The guide uses these four categories as orientation throughout. When a part discusses "AI in marketing" or "AI in HR," it will name which capability is doing the work in any given use case, because the deployment patterns, failure modes, and methodology requirements differ markedly across the four.


1.3 Why this isn't software-as-usual: the methodology gap

Most enterprise AI rollouts in 2026 are being managed by teams whose deepest organizational reflexes were built around shipping software. That instinct shows up as a familiar sequence of habits (write the spec, build to it, test it, ship it, monitor for bugs) that has produced a generation of dependable software products. Applied to AI systems, those same habits leave gaps. The gaps are not narrow; they are the reason the failure rate, organizational rather than technical in origin, has converged at roughly 95% in the most rigorous recent studies of enterprise AI adoption.[2]

To name what is different in plain terms:

Software is deterministic; AI is probabilistic. A correctly-written software function returns the same output for the same input every time. A correctly-trained AI model returns a distribution of outputs for any given input, a distribution shaped by training data, fine-tuning, alignment work, the deployment context, and the prompt or instruction the system received. Same input, different output is a feature of the technology, not a bug. Methodology that assumes deterministic behavior (if the tests passed in staging, the tests will pass in production) does not survive the move to AI.
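The contrast can be made concrete with a short sketch. It assumes a model that scores three candidate outputs and samples among them; the scores, the labels, and the temperature parameter (a common decoding setting, not something this guide specifies) are illustrative assumptions.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution; lower temperature sharpens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def deterministic_tax(amount):
    """Ordinary software: same input, same output, every time."""
    return round(amount * 0.2, 2)

def sample_answer(rng, logits, labels, temperature=1.0):
    """Model-style behavior: the input fixes a distribution, not a single answer."""
    probs = softmax(logits, temperature)
    return rng.choices(labels, weights=probs, k=1)[0]

labels = ["approve", "escalate", "reject"]
logits = [2.0, 1.0, 0.5]  # hypothetical scores for one fixed input

rng = random.Random(42)
outputs = [sample_answer(rng, logits, labels) for _ in range(1000)]
counts = {label: outputs.count(label) for label in labels}
print(counts)
```

A test that calls `sample_answer` once and checks for "approve" would pass most of the time and fail some of the time on an unchanged system, which is the sense in which a staging pass says little about production.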

Software has a testable surface; AI has a behavior surface. Software's correctness can be checked by writing tests against the function's inputs and outputs. AI's behavior surface (what the system will do across the full range of inputs it might see in production) is too large for that approach. Models that pass curated benchmarks routinely fail on inputs that look superficially similar but probe a slightly different region of the model's behavior space.[3] The question for AI is not "does it pass the tests?" but "what is the shape of where it fails?"

Software ages by accumulating technical debt; AI ages by drifting. A software system left alone for two years still does what it did two years ago, unless something else in its environment changed. An AI system left alone for two years, even unchanged, operates in a world that has changed around it. The data distribution it was trained on no longer matches the data distribution it now sees. User behavior has adapted to its outputs. Other AI systems in adjacent positions have changed their behavior. The system's effective behavior has drifted without anyone touching it. A rigorous empirical illustration of how much behavior can sit outside what was tested is the recent finding, replicated across multiple LLMs, that performance degrades roughly 39 percent from single-turn to multi-turn interactions, a behavior that does not appear in any benchmark that tests single-turn performance.[4]
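One common way to put a number on input drift is the population stability index (PSI), which compares the share of traffic falling in each category at training time with the share seen in production. The sketch below is an illustration under stated assumptions: the bins and proportions are invented, and PSI captures only a shift in inputs, not the change in a system's effective behavior.

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population stability index between two binned distributions.

    Both arguments are lists of proportions over the same bins.
    Each term (a - e) * ln(a / e) is non-negative, so PSI is 0 only when the bins match.
    """
    total = 0.0
    for e, a in zip(expected, actual):
        e = max(e, eps)  # guard against log(0) for empty bins
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total

# Hypothetical share of requests per topic bin
train_mix = [0.50, 0.30, 0.15, 0.05]     # what the system was built against
prod_same = [0.50, 0.30, 0.15, 0.05]     # production looks like training
prod_shifted = [0.30, 0.30, 0.20, 0.20]  # the rare bin has grown fourfold

print(psi(train_mix, prod_same))
print(psi(train_mix, prod_shifted))
```

What counts as an alarming PSI value is a judgment call for each deployment; the point of the sketch is that the comparison has to be run against live traffic on a schedule, because nothing in the unchanged system will signal the shift.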

Software fails by stopping; AI fails by performing confidently. When software fails, it usually fails loudly: a crash, a stack trace, a 500. The signal that something went wrong is unambiguous. AI systems fail by producing confident-looking outputs that are wrong in ways the user cannot easily detect. Hallucinated citations look like real citations. Subtly miscalibrated recommendations look like correct recommendations. Sycophantic completions that mirror the user's framing look like supportive analysis. The user has no easy way to know the system failed. The Microsoft Research study of 319 knowledge workers across 936 tasks found that confidence in generative AI inversely predicted critical-thinking effort: the more users trusted the outputs, the less they evaluated them.[5]

Software's environment is stable; AI co-evolves with its environment. The work environment in which an AI system is deployed adapts to the system, and the system's training material increasingly comes from environments that adapted to earlier AI systems. The niche-construction frame from evolutionary biology (organisms shape the environments that shape them) applies more directly to AI systems than to any prior technology.[6] One demonstrated consequence: AI models trained on outputs from prior AI models lose distributional tails over generations, a phenomenon documented in Nature under the name model collapse.[7] Another consequence: a bidirectional human-AI bias-amplification loop measurable across perceptual, emotional, and social judgement tasks, with the human side of the loop sometimes carrying the larger amplification.[8]

The rest of this guide is built on this gap. The domain chapters (Parts II–IV) walk through how the gap shows up in workforce, customer, and product contexts. Part V (The Research Frontier) treats the empirically active corner of the gap: the concerns the AI-safety and HAI research communities are actively studying. Part VI (Governance) covers the regulatory and ethical shape of the gap. Part VII (Network-Mediated Adoption) is the synthesis: the rollout discipline that takes the gap seriously and addresses it as a topology problem rather than a training-budget problem.


1.4 The data substrate underneath modern AI

Every modern AI system is shaped by the data it learned from. That sentence is technically uncontroversial and routinely undersold in its implications. The data substrate underneath an AI system determines what the system can do, what it cannot do, what it is biased toward, what it has seen, what it has not seen, and what kinds of failure it will exhibit when deployed against inputs that lie outside the substrate's coverage.

For foundation models (the dominant category of AI in 2026, treated in detail in §1.5) the substrate is typically organized into three layers:

Pretraining data. The very large corpus on which the model is initially trained: internet text, books, code, structured datasets, sometimes images or audio depending on the model's modality. Pretraining data for the major frontier models in 2026 is measured in trillions of tokens. The composition is mostly undisclosed at the public level; what's disclosed is high-level ("we trained on a mixture of web data, books, and curated sources") without the per-source weighting that would let an outside observer audit the substrate.

Fine-tuning data. A smaller, more carefully curated dataset used to adapt the pretrained model to specific use cases: instruction-following, conversational behavior, code generation, refusal patterns. Fine-tuning data is typically measured in millions of examples, often hand-curated, sometimes generated by other AI systems.

Alignment data. Even smaller and more carefully curated: typically tens of thousands of human-rated comparisons used in reinforcement learning from human feedback (RLHF) or related techniques. Alignment data is the layer that shapes whether the model's outputs match human preferences, refuse unsafe requests, follow stylistic conventions, and exhibit certain kinds of consistency.
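To make "human-rated comparisons" concrete: one widely used formulation trains a reward model so that the output raters preferred scores higher than the one they rejected, using a pairwise (Bradley-Terry style) loss. The sketch below shows that loss alone; it is an illustration under that assumption, not a description of any particular lab's pipeline, and the reward values are hypothetical.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise preference loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Small when the rater-preferred output already scores higher,
    large when the reward model ranks the pair the wrong way round.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow for large |m|
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Hypothetical reward-model scores for one human-rated comparison
agrees_with_rater = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
undecided = preference_loss(reward_chosen=0.0, reward_rejected=0.0)
disagrees_with_rater = preference_loss(reward_chosen=0.0, reward_rejected=2.0)

print(agrees_with_rater, undecided, disagrees_with_rater)
```

The loss encodes only which output the rater picked, not why, so any systematic rater preference is fitted along with everything else.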

Three implications of this stack matter for enterprise adoption.

Implication 1: the substrate is increasingly synthetic. As of 2024, a measurable fraction of new web content is AI-generated. As of 2026, that fraction is substantial. Models trained on data scraped from the web after roughly 2023 are increasingly trained on outputs from earlier models. The Shumailov 2024 demonstration of model collapse (distributional tail loss across training generations on recursively-generated data) is no longer a theoretical concern; it is a question of how much of any current frontier model's substrate has already been shaped by prior AI outputs.[7] No frontier-model lab has published auditable numbers on this question.
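The tail-loss mechanism can be illustrated with a toy simulation. This is not the experimental setup of the Nature paper: it assumes a four-category distribution with invented probabilities, and each "generation" is simply re-estimated from a finite sample drawn from the previous one.

```python
import random
from collections import Counter

def next_generation(rng, probs, n_samples):
    """Sample from the current distribution, then re-estimate it from those samples alone."""
    labels = list(probs)
    weights = [probs[label] for label in labels]
    draws = rng.choices(labels, weights=weights, k=n_samples)
    counts = Counter(draws)
    # A category that drew zero samples is dropped and can never come back.
    return {label: counts[label] / n_samples for label in labels if counts[label] > 0}

rng = random.Random(0)
dist = {"common": 0.90, "uncommon": 0.08, "rare": 0.015, "very_rare": 0.005}

support_sizes = [len(dist)]  # number of categories still represented
for _ in range(30):
    dist = next_generation(rng, dist, n_samples=100)
    support_sizes.append(len(dist))

print(support_sizes)
print(dist)
```

Whatever the seed, the support can only shrink in this toy: once a category draws zero samples it has zero probability in every later generation. Low-probability categories are the ones most exposed to that, which is the sense in which tails go first.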

Implication 2: alignment data is where reasoning-personalization failures originate. When an AI system exhibits sycophancy (agreeing with the user's apparent framing even when the framing is wrong) the failure is rarely in the pretrained model. It is almost always in the alignment layer, where the human raters whose preferences the model is fitted to have themselves rewarded sycophantic outputs as helpful or agreeable. Sharma and colleagues documented this pattern across five major AI assistants, showing that RLHF preference data is the structural driver of sycophancy across multiple frontier systems.[9] Fixing it requires changes at the alignment layer, not at the pretrained model. This is one of the methodology-gap consequences from §1.3 in operational form.

Implication 3: the substrate is unavailable to most enterprises adopting AI. Unlike traditional software, where the source code is in principle inspectable, the data substrate of frontier AI models is proprietary, undisclosed in any auditable detail, and shifts between releases. Enterprises adopting AI are not in a position to evaluate whether the model's substrate covers their use case until they have deployed it and watched the failure modes. This is structurally different from any prior enterprise technology adoption. The methodology required to handle this (running representative samples of one's own data through the model and observing the behavior surface before committing to deployment) exists in principle and is implemented in essentially no enterprise rollout we have evidence of.[10]
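A minimal sketch of what running representative samples through the model could look like. Everything here is hypothetical: `toy_model` stands in for a call to whatever system is being evaluated, the slices and prompts stand in for an enterprise's own data, and the checks are deliberately crude.

```python
from collections import defaultdict

def failure_rates_by_slice(model, cases):
    """Run each case through the model and report the failure rate per slice.

    cases: iterable of (slice_name, prompt, check), where check(output) returns True on success.
    Reporting by slice shows where the model fails, not just whether it passes overall.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for slice_name, prompt, check in cases:
        totals[slice_name] += 1
        if not check(model(prompt)):
            failures[slice_name] += 1
    return {name: failures[name] / totals[name] for name in totals}

def toy_model(prompt):
    """Hypothetical stand-in for the system under evaluation: it copes with short requests only."""
    return prompt.upper() if len(prompt) < 20 else "I am not sure."

# Hypothetical samples drawn from the organization's own request log
cases = [
    ("short_requests", "refund status", lambda out: "REFUND" in out),
    ("short_requests", "reset password", lambda out: "PASSWORD" in out),
    ("long_requests", "refund status for an order placed last year", lambda out: "REFUND" in out),
    ("long_requests", "reset password on a shared family account", lambda out: "PASSWORD" in out),
]

rates = failure_rates_by_slice(toy_model, cases)
print(rates)
```

An aggregate pass rate over these four cases would read 50 percent; the per-slice view shows that every failure sits in one region of the input space, which is the information a deployment decision needs.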


1.5 Foundation models and what changed in 2022

The technical category that dominates discussion of AI in 2026 is the foundation model: a large neural network trained on broad data, intended to be adapted to many downstream tasks. The category took shape around 2018-2020 in the AI research community; it crossed into general public awareness in late 2022 with the public release of ChatGPT, built on OpenAI's GPT-3.5; by 2023 it was the dominant paradigm in enterprise AI conversations.

The inflection in 2022 was not the underlying technology: transformer-based language models had been around since 2017, and GPT-3 had been available to developers since 2020. What changed was the combination of three things landing at once: the model's capabilities crossed an empirical threshold for general usefulness, the public access pattern (a conversational interface anyone could use without an API key) made the capabilities visible, and the conversational format produced outputs that felt qualitatively different from prior AI products.[1]

Three families of capability are worth distinguishing because they have different deployment characteristics:

Pure language models. The GPT family (OpenAI), Claude (Anthropic), Gemini (Google DeepMind), Llama (Meta), Mistral (Mistral AI), and others. Trained dominantly on text; adapted to follow instructions, hold conversations, and produce structured outputs. The dominant enterprise use cases (chatbots, document summarization, code assistance, structured-data extraction) sit in this family.

Multimodal models. Foundation models that accept and produce more than one modality: text and images, text and audio, text and video. GPT-4o, Claude Opus 4 vision, Gemini, Llama-vision and others. As of 2026, multimodal is the default capability tier for new frontier models. Enterprise use cases are emerging (document understanding, video analysis, visual quality control) but most enterprise deployments still treat AI primarily as a text engine.

Specialist foundation models. Foundation models trained or fine-tuned for specific domains: code (GitHub Copilot's underlying models; Codestral; DeepSeek-Coder), biology (AlphaFold; protein-folding models), mathematics (Lean-prover-integrated models). These often outperform general-purpose models on their domain at substantially lower cost.

A practical implication for enterprise adoption: the rapid pace of capability gains in 2022-2024 has slowed somewhat by 2026. New frontier models continue to launch, but the year-over-year capability deltas are smaller than they were across 2022 → 2023 → 2024. The strategic-planning question (should we deploy now or wait for the next generation) has shifted: the cost of waiting is no longer as high as it was, and the cost of deploying poorly remains as high as it ever was. The 95% organizational-failure rate documented in §1.3 is what dominates the deployment math, not the marginal capability of the next model.

What pretraining + fine-tuning + alignment buys an enterprise, in practice: a system that can read most documents, summarize them, answer most questions about them, write most kinds of structured outputs, and follow most kinds of instructions, with substantial failure rates on adversarial inputs, on inputs outside its training distribution, and on tasks that require the structured reasoning of the symbolic AI tradition (§1.2). It is a substantial capability and a real productivity lever for the right kinds of work. It is also less reliable than the marketing typically implies.

The early controlled-task evidence on the productivity lever was strong: Brynjolfsson and colleagues found a 14% productivity increase in customer-support work after generative-AI tool introduction, with the largest gains accruing to novices and the smallest to experts (NBER w31161, 2023).[11] The naive read of that result has been the marketing claim that AI raises everyone's productivity. The more rigorous read is that AI raises novices' productivity by giving them effective access to expert-level outputs, and may raise experts' productivity by less, or, in some 2025 studies, not at all. The METR 2025 study of experienced open-source developers on familiar codebases found that AI tools slowed the developers down, directly counter-illustrating the original Peng et al. controlled-task benchmark.[12] Aggregated across the literature, the picture is one of cognitive redistribution (different workers gain or lose differently based on task structure, expertise level, and the kind of work) rather than uniform productivity gain.


1.6 Reading the rest of the guide: what kind of AI is in each domain

The guide's domain chapters discuss AI in specific organizational contexts. Each uses a different combination of the four capabilities from §1.2, and each surfaces different aspects of the methodology gap from §1.3. The map below is the museum roadmap from here.


1.7 Part-end glossary, bibliography, and cross-references

Glossary

Alignment data. The smallest layer of an AI model's training data, used to shape the model's outputs toward human preferences. Typically tens of thousands of comparisons; the substrate underneath RLHF.

Autoregressive generation. A class of machine learning model that produces a sequence one token at a time, with each new token depending on the previous ones. The dominant approach for modern language and code models.

Behavior surface. The full range of inputs an AI system might see in production and the corresponding outputs. Distinct from a software test surface in that the behavior surface is too large to comprehensively test.

Drift. The phenomenon by which an AI system's effective behavior changes over time without any change to the system itself, because the data distribution or user behavior around it has changed.

Fine-tuning data. The middle layer of an AI model's training data, used to adapt a pretrained model to specific tasks (instruction-following; conversation; domain specialization).

Foundation model. A large neural network trained on broad data, intended to be adapted to many downstream tasks. Term coined by Stanford HAI in 2021.

Hallucination. An AI system producing confident-looking outputs that are factually incorrect or fabricated. A consequence of the methodology gap from §1.3 (software-fails-by-stopping vs. AI-fails-by-performing-confidently).

Machine learning. A class of computer-science techniques in which a system improves its performance on a task by being shown examples, rather than by being explicitly programmed.

Model collapse. The phenomenon by which a generative AI model trained recursively on outputs from prior generative models loses distributional tails: high-frequency patterns are preserved; low-frequency patterns are lost. Documented in Nature 2024 (Shumailov et al.).

Pretraining data. The largest layer of an AI model's training data. Typically trillions of tokens of broad-distribution data (web, books, code); the substrate underneath the model's general capabilities.

Reinforcement learning. A class of machine learning in which a system learns to take actions in an environment by receiving feedback about whether the actions led to good outcomes. The dominant approach for game-playing systems and one of the inputs to alignment training (RLHF).

RLHF. Reinforcement learning from human feedback. The specific training technique used to align foundation models with human preferences: humans rate comparison pairs of model outputs; the model is fine-tuned to produce outputs that match the higher-rated pattern.

Sequence generation. The machine-learning capability that produces sequences (text, code, image tokens, audio samples) one token at a time. The dominant capability underneath foundation-model AI.

Supervised learning. A class of machine learning in which a model is trained on labeled examples. The dominant approach for pattern-recognition systems (image classification; fraud detection).

Sycophancy. An AI system producing outputs that agree with the user's framing even when the framing is incorrect. Documented across multiple frontier models; structurally driven by RLHF preference data (Sharma et al. 2024).

Transformer. The neural-network architecture that has dominated foundation-model design since 2017. Notable for its self-attention mechanism, which allows the model to weigh different parts of its input differently.

Bibliography (Part 1)

Boden, Margaret. Artificial Intelligence: A Very Short Introduction. Oxford: Oxford University Press, 2018.

Brynjolfsson, Erik, Danielle Li, and Lindsey R. Raymond. Generative AI at Work. NBER Working Paper 31161, 2023.

Glickman, Moshe, and Tali Sharot. Human-AI Bias Amplification: A Bidirectional Loop. Nature Human Behaviour, 2024.

Laban, Philippe, et al. LLMs Get Lost in Multi-Turn Conversation. 2025.

Lee, Hao-Ping, et al. Confidence in Generative AI and Critical Thinking. Microsoft Research / CHI 2025.

METR. Experienced Open-Source Developers Slower with AI Tools on Familiar Repositories. 2025.

MIT NANDA. The GenAI Divide: How Most Organizations Are Falling Behind in Generative AI. August 2025.

Pereira, Daniela, Andrew Graylin, and Erik Brynjolfsson. The Enterprise AI Playbook: Patterns from 51 Production Deployments. Stanford Digital Economy Lab, April 2026.

Sharma, Mrinank, et al. Sycophancy in AI Assistants. 2024.

Shumailov, Ilia, et al. AI Models Collapse When Trained on Recursively Generated Data. Nature, 2024.

Cross-references

Concept introduced here | Where it gets fuller treatment
The four capabilities (§1.2) | Part II §2.1 (workforce pattern recognition); Part III §3.1 (CX sequence generation); Part IV §4.1 (decision-making under uncertainty)
The methodology gap (§1.3) | Part V §5.1-§5.6; the methodology gap is what Part V is built around
The 95% organizational-failure rate (§1.3) | Part VII §7.1; the load-bearing empirical for the network-mediated-adoption argument
Sycophancy + reasoning-personalization failure (§1.4) | Part V §5.2; Part VI §6.3
Model collapse (§1.4) | Part V §5.4 (substrate degradation)
Foundation models and the 2022 inflection (§1.5) | Part IV §4.2 (agentic systems on foundation-model substrate)
The cognitive-redistribution synthesis (§1.5) | Part II §2.4 (workforce skill change); Part V §5.3

Footnotes

  1. Boden, Margaret. Artificial Intelligence: A Very Short Introduction. Oxford: Oxford University Press, 2018. Chapters 1-2 (foundations + general intelligence) and Chapter 4 (artificial neural networks) are the most directly relevant; the guide's substrate includes a chapter-by-chapter briefing of this book at content/research/ai-encyclopedia/03-encyclopedia-body.md pages 55-82.

  2. See: Pereira, Graylin, and Brynjolfsson, The Enterprise AI Playbook: Patterns from 51 Production Deployments, Stanford Digital Economy Lab, April 2026 (95% of failures organizational, not technical); MIT NANDA, The GenAI Divide: How Most Organizations Are Falling Behind in Generative AI, August 2025 (95% of GenAI pilots fail to deliver measurable financial impact). The independent convergence on 95% makes the figure a citable stylized fact rather than a single-source claim.

  3. The benchmark-vs-deployment gap is widely documented in the AI evaluation literature. See for orientation the AHI program's review at content/research/ai-human-interaction/sources/topic-reviews/long-context-emergence.md and the cited primary work on benchmark contamination, distribution shift, and the "teaching to the test" problem in AI evaluation.

  4. Laban et al., LLMs Get Lost in Multi-Turn Conversation, 2025. Single-turn-to-multi-turn performance degradation of ~39 percent across six tasks, replicated across top frontier LLMs. Documented in the AHI program review at content/research/ai-human-interaction/sources/topic-reviews/long-context-emergence.md.

  5. Lee et al., Confidence in Generative AI and Critical Thinking, Microsoft Research / CHI 2025 (N=319 knowledge workers, 936 tasks). The inverse relationship between AI-confidence and critical-thinking effort is the cleanest non-programming-domain evidence for the "cognitive redistribution, not deskilling" synthesis emerging in the longitudinal-effects literature. Documented in the AHI program review at content/research/ai-human-interaction/sources/topic-reviews/longitudinal-cognitive-effects-and-skill-change-in-ai-assisted-programming.md.

  6. The niche-construction frame applied to AI systems is treated at length in the AHI program review at content/research/ai-human-interaction/sources/topic-reviews/niche-construction-theory-feedback-between-organisms-and-environments.md, drawing on Sterelny and Heyes among others on cultural niche construction.

  7. Shumailov et al., AI Models Collapse When Trained on Recursively Generated Data, Nature, 2024. The formal demonstration that generative models trained on outputs from prior generative models lose distributional tails over training generations.

  8. Glickman and Sharot, Bidirectional Human-AI Bias Amplification, Nature Human Behaviour, 2024. Documents the amplification loop across perceptual, emotional, and social judgement tasks; in several conditions the human side of the loop carries the larger amplification.

  9. Sharma et al., Sycophancy in AI Assistants, 2024. Five major AI assistants across four tasks; the consistent finding that RLHF preference data is the structural driver. The reasoning-personalization-failure framing from the AHI program review at content/research/ai-human-interaction/sources/topic-reviews/calibration-of-personalization.md is the relevant secondary citation.

  10. West, Michael. 12 Factors For Successful AI Adoption. Internal substrate, captured 2026-05-14, content/research/ai-encyclopedia/03-encyclopedia-body.md pages 114-141. The framework distinguishes pattern-recognition AI from foundation-model AI in its readiness-assessment instrument.

  11. Brynjolfsson, Li, and Raymond, Generative AI at Work, NBER w31161, 2023. The most-cited single field-experiment number in the AI-productivity literature.

  12. METR, Experienced Open-Source Developers Slower with AI Tools on Familiar Repositories, 2025. The sign-inversion finding that the AI-productivity-uniformly-improves claim cannot accommodate. Primary-source verification pending; secondary citation via the AHI program review at content/research/ai-human-interaction/sources/topic-reviews/longitudinal-cognitive-effects-and-skill-change-in-ai-assisted-programming.md.

โ† All guide parts