Overview
What this research program is and why it exists. The frame the rest of the work hangs on.
What does AI do to human capability over months and years, and what kinds of system design support development rather than dependence? The arc covers authorship-system design, the Penwright Measurement Framework, and the longitudinal test that asks whether a writer is better with the system than without it after six months.
Why this matters
The portable claim — what this arc lets you understand outside the surface domain.
Existing AI–human-interaction research clusters in single-session, individual-level, descriptive studies. Almost no longitudinal work exists. The arc is a bet that capability-development can be measured, and that the design of the interaction structure — not the model on the other side — is what determines whether AI augments or substitutes. Penwright (inside Vela) is the lead empirical apparatus; the AHI program owns the published-paper trajectory.
Spans
Products this arc cuts through. Each application is sometimes the lead empirical apparatus, sometimes the funding/data-collection platform.
In the magazine
Editorial pieces from principal-issues that draw on this arc.
The trade-press AI-failure piece runs as a parade of executive embarrassment. The deployment record argues something else: the failure modes are not idiosyncratic — they recur because the methodology gap recurs. Seven cases drawn from the public record, each rendered as the category of failure it instantiates and mapped to the structural correction the literature has already named.
People analytics fails predictably when statistics and systems ship without the layer that defines constructs, defends labels, and names what can go wrong when you intervene. This is the long case for the Science S: the machine fantasy, science vs statistics, failure modes, and a usable minimum that does not require Google's headcount.
Every people-analytics team claims statistics. Most are bluffing — running tests without owning the protocol, reading p-values as findings, rolling up to N=12 cohorts and calling them comparable. This is the long case for the Statistics S: inferential honesty, protocol discipline, decision-grade reasoning, the five failure modes that show up across the deployment record, and a usable minimum that does not require NIH-scale infrastructure.
Most people-analytics functions ship dashboards from a reporting-team configuration and wonder why their work doesn't compound. The structural alternative is the function as a software stack — substrate, normalization, privacy, statistical, decision-support, surface — composed into capabilities rather than rebuilt per request. This is the long case for the Systems S, with the People Analytics Toolbox as the reference implementation.
Most HR strategy in 2026 is built as a deliverable rather than as decision-support substrate. The deck has twelve initiatives, four pillars, a north-star statement — and no named alternatives, no characterized uncertainty, no financial translation. This is the long case for the Strategy S: decision-shape clarity, Three A's lifecycle measurement, NAV financial translation, and the difference between the strategy that survives the budget cycle and the strategy that gets retired.
The twelve-factor AI-readiness self-assessment is not a pass/fail gate before deploying AI. It is a fuel-readiness map for where the rollout will catastrophically fail when it meets the early majority. Three dimensions reframed — management buy-in, change fatigue, workflow mapping — to teach the move; the remaining nine work the same way.
Capability, Alignment, Motivation, Support — four conditions for consistent above-expectation performance, every one of them required. The case for the conjunction, the eight-item survey, and the thresholds that turn the index into action at the team level.
A CHRO walks into a capital-allocation meeting without a number. NAV is the number — the single indexable KPI tying human-capital state to dollar outcomes, segment by segment. The math is simple; the discipline is in what NAV deliberately doesn't try to be.
Most organizations cannot do people analytics — not for lack of data, but for lack of methodology. The field has stayed stuck at a few elite organizations because most attempts copy Google's *outputs* without the underlying four-S capability that produced them. The principal-issues thesis names the load-bearing set, fast.
Read first
The general-audience entry point for this arc. Drill down below for the full set.
Public-audience version of Tier-1 Paper 1. Frames the case for capability-development AI writing systems for a general reader. Outline draft.
Drill-down: full arc surface
Cross-product. Source application shown on each entry.
Overview
What this research program is and why it exists. The frame the rest of the work hangs on.
Methodology
How the work is done — instruments, protocols, the standards each report inherits.
Reports
The actual research findings — phased results, research-question briefs, applied analyses.
Track A — twelve sub-papers across three tiers (foundational theory · measurement and mechanism · longitudinal empirical studies) drawing from a shared dataset generated by Penwright in production. Track B — six cross-cutting research programs the Penwright evidence may eventually seed.
Tier-1 foundational paper. Synthesizes phenomenology of skill, attention theory, transactive memory, standpoint theory, epistemic injustice, cognitive apprenticeship, working-alliance theory, psychoanalytic theory, improv theory, translation theory, niche construction, and institutional economics into a single positioning argument for Penwright. Outline draft.
Tier-1 foundational paper. Position-and-mechanism paper for the epistemic-control alternative to default-LLM-corpus inheritance. Theoretical positioning across standpoint theory (Harding, Haraway), epistemic injustice (Fricker), and the AI bias tradition (Bender et al., Bommasani et al., Birhane et al.). Distinguishes Corpus Control from adjacent technical approaches (RAG, fine-tuning, prompt engineering) at the layer level. Three mechanisms in Penwright (corpus selection · attribution visibility · genre-aware source integration), four threats to validity, and three production-data tests.
Tier-1 foundational paper. Position-and-mechanism paper for the structural-input alternative to freeform prompting. Five-field model (intent · structure · key ideas · relevant passages · counterpositions), theoretical positioning across transactive memory / translation theory / cognitive load theory, the seven non-negotiable rules carried verbatim, comparison to alternative input shapes (freeform, structured chain-of-thought, RAG, form-based), threats to validity, and the three production-data tests that will adjudicate.
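As a rough illustration of what a five-field structured input could look like as a data structure, the sketch below models the five fields named above. The class name `StructuredInput`, the field types, and the `missing_fields` completeness check are all hypothetical choices made for this example; the paper's actual schema and its seven rules are not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StructuredInput:
    """Hypothetical container for the five fields (names taken from the entry above)."""

    intent: str
    structure: str
    key_ideas: list[str]
    relevant_passages: list[str]
    counterpositions: list[str]

    def missing_fields(self) -> list[str]:
        """Return the names of fields left empty (an assumed completeness check)."""
        missing = []
        if not self.intent.strip():
            missing.append("intent")
        if not self.structure.strip():
            missing.append("structure")
        if not self.key_ideas:
            missing.append("key_ideas")
        if not self.relevant_passages:
            missing.append("relevant_passages")
        if not self.counterpositions:
            missing.append("counterpositions")
        return missing


draft = StructuredInput(
    intent="Argue that interaction structure matters more than model choice",
    structure="claim, evidence, objection, reply",
    key_ideas=["augmentation vs substitution"],
    relevant_passages=["paragraph on longitudinal measurement"],
    counterpositions=[],  # left empty on purpose to show the check
)
print(draft.missing_fields())
```

The point of the contrast with freeform prompting is that an empty field is visible and nameable before anything is sent to a model; how an empty field should be handled is a design decision this sketch does not settle.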
Tier-2 foundational paper. Operationalizes the Penwright Measurement Framework that underwrites every subsequent empirical paper. Three measurement layers (output · process · development), six skill dimensions, six derived indices (WQI · II · INTI · OI · MI · DV), five-step learning loop, and four non-negotiable failure modes acting as veto conditions. Measurement-theory positioning (reflective-vs-formative · latent-vs-observed · why standard psychometric scaffolding only partly applies under continuous measurement and model drift). Compares to HCI usability scales, educational skill assessments, and standardized writing rubrics — none fit for purpose for capability-development measurement. Names what's measurable in v1, what awaits Wave-2-through-6 features, what requires longitudinal data, and what requires external validation.
Tier-2 measurement-and-mechanism paper. Companion to Paper 4 — where Paper 4 specifies what gets measured, Paper 6 specifies the loop architecture the measurement is against. Five-step learning loop (write · analyze · reflect · practice · re-measure) and four intervention types (teach moments · constraint challenges · counterposition drills · reconstruction exercises). Theoretical positioning across cognitive apprenticeship (Collins, Brown & Newman), working-alliance theory (Bordin), retrieval practice and desirable difficulties (Roediger & Karpicke; Bjork), and self-regulated learning (Zimmerman). Argues that output-quality optimization is structurally incapable of producing capability development. Four threats to validity, three production-data tests.
Publication-and-product-integration strategy for the twelve-paper Penwright program. Three tiers, shared-dataset discipline, three-phase publication windows, paper-to-feature mapping. Verbatim source.
Audience tiers
The same headline research surfaced four ways: peer-review, engineering, general audience, product.
Public-audience version of Tier-1 Paper 1. Frames the case for capability-development AI writing systems for a general reader. Outline draft.
Senior referee's lens on the AHI program if submitted today as a sequenced series. Positioning against named bodies of theory (HCI, CSCW, educational psychology, working-alliance theory, transactive memory, niche construction, distributed cognition, phenomenology of skill) and an explicit threats-to-validity register.
Bibliography
Field positioning — formal references and literature maps grounding the research threads.
Index of 29 secondary literature reviews + 6 cross-LLM syntheses, organized by the twelve-branch HAI field map plus frontier zones, adjacent fields, and deep intersections. Honest gap-flagging at the bottom.
The twelve-branch HAI field map, frontier zones, adjacent fields the mainstream has under-engaged, deep intersections where genuine theoretical integration is possible, and six cross-cutting research programs. Verbatim source — load-bearing field orientation.
Preregistrations & protocols
Studies and intervention protocols filed before execution.
OSF-style preregistration for Paper 5 of the Penwright Research Program. Pre-registers a non-monotonic threshold model for the AI-utilization-vs-capability-transfer relationship, with H1 (primary), H2 (genre-specific thresholds), and H3 (loop-completion moderation). Two-stage analysis: v1.0 descriptive at small-N pilot enrollment; v2.0 confirmatory deferred to N≥20 writers. Genre-stratified across memoir / nonfiction / fiction. Threats-to-validity register includes auto-ethnography, self-selection, LLM-scoring drift, reflection-prompt social desirability, Constraint-Mode artifact, small-N detection-floor.
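To make "non-monotonic threshold model" concrete, the sketch below uses a piecewise-linear form: predicted capability transfer rises with AI utilization up to a threshold and falls beyond it. The functional form, the function name `threshold_model`, and every parameter name are assumptions for illustration only; the preregistration's actual specification is not given on this page, and the numbers below are arbitrary, not results.

```python
import math


def threshold_model(utilization: float, threshold: float,
                    rise: float, fall: float, baseline: float = 0.0) -> float:
    """Assumed piecewise-linear shape: up-slope before the threshold, down-slope after.

    With rise > 0 and fall > 0 the curve is non-monotonic and peaks at the threshold.
    """
    if utilization <= threshold:
        return baseline + rise * utilization
    peak = baseline + rise * threshold
    return peak - fall * (utilization - threshold)


# Arbitrary illustrative parameters (not estimates from any data).
below = threshold_model(0.2, threshold=0.5, rise=2.0, fall=1.0)
at_peak = threshold_model(0.5, threshold=0.5, rise=2.0, fall=1.0)
above = threshold_model(1.0, threshold=0.5, rise=2.0, fall=1.0)
print(below, at_peak, above)
```

Under this assumed shape, genre-specific thresholds (H2) would correspond to a separate `threshold` value per genre, though the preregistration may parameterize the relationship differently.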
OSF-style preregistration for Paper 7 of the Penwright Research Program. Pre-registers the test that adjudicates the program's most architecturally consequential commitment: the genre-fork. Three genre-specific predictions with pre-specified effect-size thresholds (memoir → emotional flattening; nonfiction → shallow argument; fiction → generic narrative). The no-world outcome (genre effects do not differ) is stated explicitly and carries a stated consequence: the program pre-commits to abandoning the load-bearing genre-fork claim if the data do not support it. Two-stage analysis (v1.0 descriptive, v2.0 confirmatory at N≥18). Paired with Paper 5 preregistration; shares analytical infrastructure.
The pilot protocol that mitigates the auto-ethnography threat-to-validity for the Penwright Research Program. Recruits 5–10 outside writers across memoir / nonfiction / fiction and across emerging / mid-career / established experience tiers, instruments them under the same Penwright protocol the principal investigator runs against himself, and establishes the data baseline for Paper 5 and Paper 7 v2.0 confirmatory analyses. Sections cover recruitment criteria, recruitment channels (PI network capped + writing-community + Prolific + open form), onboarding flow with consent and instrumentation discipline, data-handling protocol (anonymization + retention + consent), success criteria (cohort + data + quality thresholds), pilot-vs-formal-study boundary, six downstream PA-009a..f next-action assignments, and a pilot-specific threats-to-validity register.
Index entry for the Penwright Research Program preregistration set. Paper 5 (dependency) and Paper 7 (genre) preregistrations both drafted at v1.0 (filed 2026-05-05). External-operator pilot protocol drafted at v1.0 (filed 2026-05-09). Formal OSF filing follows once the external-operator pilot enrolls its first cohort and the analysis pipeline is built.