Overview
What this research program is and why it exists. The frame the rest of the work hangs on.
A research program on what AI does to human capability over time, and what kinds of system design support development rather than dependence. Penwright — an authorship-development system shipped inside Vela — is the lead empirical apparatus; the broader frame extends to AI as a long-term cognitive partner across professions, domains, and life stages.
Why this matters
The portable claim — what this research lets you understand outside the surface domain.
Existing AI–human-interaction research clusters in single-session, individual-level, descriptive studies. We know remarkably little about what happens to a person's reasoning, vocabulary, social life, or skill acquisition over months and years of daily interaction with capable AI systems. If Bainbridge's Ironies of Automation generalize (operator vigilance falling as system reliability rises), then the productivity story being told today systematically overstates the net effect, because it counts gains in output while leaving any erosion of the user's own capability unmeasured. The portable contribution: a measurement framework for AI-augmented capability development with explicit failure modes, a pre-registered twelve-paper empirical program, and a theoretical bridge between mainstream HAI and the bodies of theory it has under-engaged (companion-species studies, cognitive apprenticeship, working-alliance theory, distributed cognition, niche construction, transactive memory, ritual studies, indigenous relational ontologies). The methods generalize beyond writing: to coding, design, research, education, and clinical practice.
Read first
The general-audience explainer is the entry point. Everything below is the drill-down.
Public-audience version of Tier-1 Paper 1. Frames the case for capability-development AI writing systems for a general reader. Outline draft.
Drill-down: full research surface
Seven-slot baseline. Forthcoming slots shown openly.
Overview
What this research program is and why it exists. The frame the rest of the work hangs on.
Methodology
How the work is done — instruments, protocols, the standards each report inherits.
Reports
The actual research findings — phased results, research-question briefs, applied analyses.
Track A — twelve sub-papers across three tiers (foundational theory · measurement and mechanism · longitudinal empirical studies) drawing from a shared dataset generated by Penwright in production. Track B — six cross-cutting research programs the Penwright evidence may eventually seed.
Tier-1 foundational paper. Synthesizes phenomenology of skill, attention theory, transactive memory, standpoint theory, epistemic injustice, cognitive apprenticeship, working-alliance theory, psychoanalytic theory, improv theory, translation theory, niche construction, and institutional economics into a single positioning argument for Penwright. Outline draft.
Tier-1 foundational paper. Position-and-mechanism paper for the epistemic-control alternative to default-LLM-corpus inheritance. Theoretical positioning across standpoint theory (Harding, Haraway), epistemic injustice (Fricker), and the AI bias tradition (Bender et al., Bommasani et al., Birhane et al.). Distinguishes Corpus Control from adjacent technical approaches (RAG, fine-tuning, prompt engineering) at the layer level. Three mechanisms in Penwright (corpus selection · attribution visibility · genre-aware source integration), four threats to validity, and three production-data tests.
Tier-1 foundational paper. Position-and-mechanism paper for the structural-input alternative to freeform prompting. Five-field model (intent · structure · key ideas · relevant passages · counterpositions), theoretical positioning across transactive memory / translation theory / cognitive load theory, the seven non-negotiable rules carried verbatim, comparison to alternative input shapes (freeform, structured chain-of-thought, RAG, form-based), threats to validity, and the three production-data tests that will adjudicate.
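As a data structure, the five-field model can be pictured as a typed record that replaces a single freeform prompt string. The sketch below is illustrative only: the class name `StructuredInput`, the choice of free text for intent and structure versus lists for the other three fields, and the `missing_fields` completeness check are all assumptions, and the seven non-negotiable rules are not modeled.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class StructuredInput:
    """Hypothetical record for one request under the five-field model."""
    intent: str = ""
    structure: str = ""
    key_ideas: List[str] = field(default_factory=list)
    relevant_passages: List[str] = field(default_factory=list)
    counterpositions: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Names of fields left empty (empty string or empty list)."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


req = StructuredInput(
    intent="Argue that the essay's central claim survives its strongest objection",
    key_ideas=["claim", "objection", "reply"],
)
print(req.missing_fields())
# ['structure', 'relevant_passages', 'counterpositions']
```

Making the fields explicit is what lets a system notice, for example, that the writer supplied no counterpositions, which a freeform prompt cannot reveal.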
Tier-2 measurement-and-mechanism paper, and the foundation for the rest of the program. Operationalizes the Penwright Measurement Framework that underwrites every subsequent empirical paper. Three measurement layers (output · process · development), six skill dimensions, six derived indices (WQI · II · INTI · OI · MI · DV), a five-step learning loop, and four non-negotiable failure modes acting as veto conditions. Measurement-theory positioning (reflective-vs-formative · latent-vs-observed · why standard psychometric scaffolding only partly applies under continuous measurement and model drift). Compares the framework to HCI usability scales, educational skill assessments, and standardized writing rubrics, none of which is fit for purpose for capability-development measurement. Names what's measurable in v1, what awaits Wave-2-through-6 features, what requires longitudinal data, and what requires external validation.
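A veto condition differs from a low score: it blocks a result outright instead of lowering it. A minimal sketch of that gate, assuming each failure mode can be written as a predicate over a session record, follows. The names `Verdict`, `apply_vetoes`, the `FM-1` label, and the `ai_authored_share` field are hypothetical placeholders; the framework's actual failure modes and their definitions are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping

# A session record: derived-index scores plus any process measures the
# failure checks need. All keys used below are hypothetical.
Session = Mapping[str, float]
FailureCheck = Callable[[Session], bool]


@dataclass(frozen=True)
class Verdict:
    reportable: bool
    vetoed_by: List[str]


def apply_vetoes(session: Session, checks: Mapping[str, FailureCheck]) -> Verdict:
    """Veto gate: one triggered failure mode blocks the session's indices
    from being reported, regardless of how high those indices are."""
    triggered = [name for name, check in checks.items() if check(session)]
    return Verdict(reportable=not triggered, vetoed_by=triggered)


# Only one placeholder check is wired here; the framework specifies four.
checks = {"FM-1": lambda s: s.get("ai_authored_share", 0.0) > 0.5}

print(apply_vetoes({"WQI": 0.92, "ai_authored_share": 0.7}, checks))
# Verdict(reportable=False, vetoed_by=['FM-1'])
```

The design point is that a high index value cannot buy back a triggered failure mode; the gate runs before any index is surfaced.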
Tier-2 measurement-and-mechanism paper. Companion to Paper 4: where Paper 4 specifies what gets measured, Paper 6 specifies the loop architecture against which that measurement is taken. Five-step learning loop (write · analyze · reflect · practice · re-measure) and four intervention types (teach moments · constraint challenges · counterposition drills · reconstruction exercises). Theoretical positioning across cognitive apprenticeship (Collins, Brown & Newman), working-alliance theory (Bordin), retrieval practice and desirable difficulties (Roediger & Karpicke; Bjork), and self-regulated learning (Zimmerman). Argues that optimizing for output quality is structurally incapable of producing capability development. Four threats to validity, three production-data tests.
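The five-step loop reads naturally as an ordered cycle, and "did the writer complete a loop?" becomes a subsequence check over logged events. The sketch below assumes the loop is cyclic (re-measure feeds the next write) and that other events may be interleaved between steps; `LoopStep`, `next_step`, and `loop_completed` are hypothetical names, not Penwright's API. A per-writer completion flag of this kind is the sort of input a loop-completion moderator would need.

```python
from enum import Enum
from typing import Iterable


class LoopStep(Enum):
    WRITE = "write"
    ANALYZE = "analyze"
    REFLECT = "reflect"
    PRACTICE = "practice"
    REMEASURE = "re-measure"


LOOP_ORDER = list(LoopStep)  # Enum iteration preserves definition order


def next_step(current: LoopStep) -> LoopStep:
    """Advance the loop; re-measure wraps back to write (assumed cyclic)."""
    i = LOOP_ORDER.index(current)
    return LOOP_ORDER[(i + 1) % len(LOOP_ORDER)]


def loop_completed(events: Iterable[LoopStep]) -> bool:
    """True if the logged events contain all five steps in loop order.

    Other events may be interleaved; `step in it` consumes the iterator up
    to and including the first match, so the order is enforced.
    """
    it = iter(events)
    return all(step in it for step in LOOP_ORDER)
```

For instance, a log of write, analyze, write, reflect, practice, re-measure counts as a completed loop, while write, reflect, analyze, practice, re-measure does not, because reflection came before analysis.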
Publication-and-product-integration strategy for the twelve-paper Penwright program. Three tiers, shared-dataset discipline, three-phase publication windows, paper-to-feature mapping. Verbatim source.
Audience tiers
The same headline research surfaced four ways: peer-review, engineering, general audience, product.
Public-audience version of Tier-1 Paper 1. Frames the case for capability-development AI writing systems for a general reader. Outline draft.
Senior referee's lens on the AHI program if submitted today as a sequenced series. Positioning against named bodies of theory (HCI, CSCW, educational psychology, working-alliance theory, transactive memory, niche construction, distributed cognition, phenomenology of skill) and an explicit threats-to-validity register.
Engineering reviewer's lens on the Penwright Measurement Framework, the Adaptive Authorship Control Kernel (F-19), and the instrumentation discipline.
What the program tells us to build next — in Penwright and in adjacent authoring environments. Anchored in the measurement framework, the seven non-negotiable rules of authorship, and the genre-aware behavior pattern.
Bibliography
Field positioning — formal references and literature maps grounding the research threads.
Index of 29 secondary literature reviews + 6 cross-LLM syntheses, organized by the twelve-branch HAI field map plus frontier zones, adjacent fields, and deep intersections. Honest gap-flagging at the bottom.
The twelve-branch HAI field map, frontier zones, adjacent fields the mainstream has under-engaged, deep intersections where genuine theoretical integration is possible, and six cross-cutting research programs. Verbatim source — load-bearing field orientation.
Preregistrations & protocols
Studies and intervention protocols filed before execution.
OSF-style preregistration for Paper 5 of the Penwright Research Program. Pre-registers a non-monotonic threshold model for the AI-utilization-vs-capability-transfer relationship, with H1 (primary), H2 (genre-specific thresholds), and H3 (loop-completion moderation). Two-stage analysis: v1.0 descriptive at small-N pilot enrollment; v2.0 confirmatory deferred to N≥20 writers. Genre-stratified across memoir / nonfiction / fiction. Threats-to-validity register includes auto-ethnography, self-selection, LLM-scoring drift, reflection-prompt social desirability, Constraint-Mode artifact, small-N detection-floor.
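One simple way to picture a non-monotonic threshold relationship is segmented regression: sort observations by AI utilization, grid-search a breakpoint, fit a separate line on each side, and ask whether the slopes change sign. This is a swapped-in illustration, not the preregistered model specification, and the function names and variables are hypothetical. At small-N pilot enrollment a fit like this would be descriptive only.

```python
from typing import Sequence, Tuple


def _ols(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for y = a + b*x. Returns (a, b, sse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def fit_threshold(xs, ys, min_points=3):
    """Segmented regression with one breakpoint, found by grid search.

    Tries every split of the utilization-sorted data that leaves at least
    `min_points` observations on each side, fits a separate line to each
    side, and keeps the split with the lowest total squared error.
    Returns (tau, left_slope, right_slope).
    """
    pts = sorted(zip(xs, ys))
    best = None
    for k in range(min_points, len(pts) - min_points + 1):
        left, right = pts[:k], pts[k:]
        lx, ly = zip(*left)
        rx, ry = zip(*right)
        if len(set(lx)) < 2 or len(set(rx)) < 2:
            continue  # a segment needs at least two distinct x values
        _, b_left, sse_left = _ols(lx, ly)
        _, b_right, sse_right = _ols(rx, ry)
        tau = (left[-1][0] + right[0][0]) / 2  # midpoint between the halves
        if best is None or sse_left + sse_right < best[0]:
            best = (sse_left + sse_right, tau, b_left, b_right)
    if best is None:
        raise ValueError("not enough distinct observations to fit a breakpoint")
    _, tau, b_left, b_right = best
    return tau, b_left, b_right


def is_non_monotonic(b_left: float, b_right: float) -> bool:
    """Slopes of strictly opposite sign on either side of the breakpoint."""
    return b_left * b_right < 0
```

The sign test deliberately does not assume a direction (rise-then-fall or fall-then-rise); a preregistered H1 would pin that down, along with how uncertainty in the breakpoint is handled.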
OSF-style preregistration for Paper 7 of the Penwright Research Program. Pre-registers the test that adjudicates the program's most architecturally consequential commitment: the genre-fork. Three genre-specific predictions with pre-specified effect-size thresholds (memoir → emotional flattening; nonfiction → shallow argument; fiction → generic narrative). The no-world outcome (genre effects do not differ) is stated explicitly and is explicitly consequential: it pre-commits the program to abandoning the load-bearing genre-fork claim if the data do not support it. Two-stage analysis (v1.0 descriptive, v2.0 confirmatory at N≥18). Paired with the Paper 5 preregistration; shares analytical infrastructure.
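Both preregistrations gate their confirmatory stage on enrollment: Paper 5 at N≥20 writers and Paper 7 at N≥18. A minimal sketch of that gate follows; the thresholds come from the summaries above, while `plan_stage`, `StagePlan`, and the string labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Writers required before the v2.0 confirmatory analysis runs, as stated
# in the preregistration summaries (Paper 5: N >= 20; Paper 7: N >= 18).
CONFIRMATORY_MIN_N = {5: 20, 7: 18}


@dataclass(frozen=True)
class StagePlan:
    paper: int
    stage: str
    confirmatory: bool


def plan_stage(paper: int, enrolled_writers: int) -> StagePlan:
    """Pick the analysis stage for a preregistered paper given enrollment."""
    if paper not in CONFIRMATORY_MIN_N:
        raise KeyError(f"no preregistration summary for Paper {paper}")
    if enrolled_writers < 0:
        raise ValueError("enrolled_writers must be non-negative")
    confirmatory = enrolled_writers >= CONFIRMATORY_MIN_N[paper]
    stage = "v2.0 confirmatory" if confirmatory else "v1.0 descriptive"
    return StagePlan(paper, stage, confirmatory)
```

Under this gate, a cohort of 10 writers keeps both papers at the v1.0 descriptive stage, so a decision such as abandoning the genre-fork claim would wait for the confirmatory threshold.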
The pilot protocol that mitigates the auto-ethnography threat-to-validity for the Penwright Research Program. Recruits 5–10 outside writers across memoir / nonfiction / fiction and across emerging / mid-career / established experience tiers, instruments them under the same Penwright protocol the principal investigator runs against himself, and establishes the data baseline for Paper 5 and Paper 7 v2.0 confirmatory analyses. Sections cover recruitment criteria, recruitment channels (PI network capped + writing-community + Prolific + open form), onboarding flow with consent and instrumentation discipline, data-handling protocol (anonymization + retention + consent), success criteria (cohort + data + quality thresholds), pilot-vs-formal-study boundary, six downstream PA-009a..f next-action assignments, and a pilot-specific threats-to-validity register.
Index entry for the Penwright Research Program preregistration set. Paper 5 (dependency) and Paper 7 (genre) preregistrations both drafted at v1.0 (filed 2026-05-05). External-operator pilot protocol drafted at v1.0 (filed 2026-05-09). Formal OSF filing follows once the external-operator pilot enrolls its first cohort and the analysis pipeline is built.
Pipeline
What is running, what is queued, what is forthcoming.