Overview
What this research program is and why it exists. The frame the rest of the work hangs on.
What does it take to do longitudinal measurement that compounds across studies without confounding? Item-response accumulation, adaptive sampling, RID/SID architecture, instrument validation, and the canonical-vocabulary discipline that lets sibling studies share evidence rather than silo it.
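One way to picture adaptive sampling over a shared item bank is a minimal sketch that assumes a two-parameter logistic (2PL) item-response model with maximum-information item selection. Both are standard choices from computerized adaptive testing and are used here only for illustration, not necessarily as the model this arc uses. The `Item` fields, item IDs, and parameter values are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str  # hypothetical stable identifier shared across studies
    a: float      # discrimination
    b: float      # difficulty


def p_positive(theta: float, item: Item) -> float:
    """2PL probability of a positive response at latent level theta."""
    return 1.0 / (1.0 + math.exp(-item.a * (theta - item.b)))


def information(theta: float, item: Item) -> float:
    """Fisher information of a 2PL item at theta: a^2 * p * (1 - p)."""
    p = p_positive(theta, item)
    return item.a ** 2 * p * (1.0 - p)


def next_item(theta: float, bank: list[Item], administered: set[str]) -> Item | None:
    """Choose the not-yet-administered item that is most informative at theta."""
    candidates = [it for it in bank if it.item_id not in administered]
    if not candidates:
        return None
    return max(candidates, key=lambda it: information(theta, it))
```

With a current estimate of `theta = 0.0`, an item whose difficulty sits near 0 and whose discrimination is high is preferred over items far from the respondent's level. The choice of which question to ask next therefore depends on the evidence gathered so far.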
Why this matters
The portable claim — what this arc lets you understand outside the surface domain.
Most fields with measurement traditions reinvent instruments for each study and pool effect sizes after the fact. The adaptive-measurement arc bets that the right substrate is what lets a measurement program actually compound: questions as first-class objects with stable IDs, response data that accumulates across studies, and instrument-quality grading enforced at write-time. The technical contribution appears in Vela (Reincarnation engine), Principia (organizational-measurement registry), and Namesake (within-session preference calibration). The methodology travels: any field where rigorous measurement is unevenly distributed across its working community is a candidate for the same architectural pattern.
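The substrate described above can be illustrated with a minimal in-memory sketch. Each question is registered once under a stable ID. Responses from any study accumulate under that ID, with the study kept as a separate key so that study-level differences stay visible. A write-time check also rejects responses to instruments graded below a threshold. The class names, fields, and A-to-D instrument grade scale are hypothetical illustrations, not the arc's actual schema.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

# Hypothetical instrument-quality scale; higher rank means a stronger instrument.
GRADE_RANK = {"A": 4, "B": 3, "C": 2, "D": 1}


@dataclass(frozen=True)
class Question:
    question_id: str       # stable ID, never reused for different content
    text: str
    instrument_grade: str  # hypothetical grade label, "A" (best) to "D"


class ResponseRegistry:
    def __init__(self, min_grade: str = "B") -> None:
        self.min_grade = min_grade
        self.questions: dict[str, Question] = {}
        # responses[question_id][study_id] -> list of response values
        self.responses = defaultdict(lambda: defaultdict(list))

    def register(self, q: Question) -> None:
        existing = self.questions.get(q.question_id)
        if existing is not None and existing != q:
            raise ValueError(f"{q.question_id} is already registered with different content")
        self.questions[q.question_id] = q

    def record(self, question_id: str, study_id: str, value: float) -> None:
        q = self.questions.get(question_id)
        if q is None:
            raise KeyError(f"unknown question {question_id}")
        # Write-time quality gate: weak instruments never enter the shared pool.
        if GRADE_RANK[q.instrument_grade] < GRADE_RANK[self.min_grade]:
            raise ValueError(f"{question_id} is graded {q.instrument_grade}, below {self.min_grade}")
        self.responses[question_id][study_id].append(value)

    def pooled(self, question_id: str) -> dict[str, list[float]]:
        # Study stays an explicit key, so pooled evidence can still model study effects.
        return {study: list(vals) for study, vals in self.responses[question_id].items()}
```

Keying every response by both question ID and study ID lets sibling studies share evidence on the same item without collapsing study-level variation into a single pooled average.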
Spans
Products this arc cuts through. Depending on the study, an application serves either as the lead empirical apparatus or as the funding and data-collection platform.
Read first
The general-audience entry point for this arc. Drill down below for the full set.
A source-graded survey of organizational measurement, published as both a book and a queryable database rendered from one underlying registry. An outside-reader brief on what it codifies, why now, and how it relates to the rest of the portfolio via @measurement/core.
Drill-down: full arc surface
Cross-product. Source application shown on each entry.
Overview
What this research program is and why it exists. The frame the rest of the work hangs on.
Methodology
How the work is done — instruments, protocols, the standards each report inherits.
Adaptive measurement, dual-grade corpus ingestion, RID/SID architecture, and the standards reports inherit.
Methodological note on the Vela ingestion architecture: when to favor the dual-grade pattern and when multi-faceted extraction wins.
How cultural-diffusion research is conducted at Namesake — instruments, sources, and standards.
Source selection, source-quality grading rubric (A–D), statistical-metadata extraction protocol, schema discipline against @measurement/core, versioning + snapshots, novelty verification, threats to validity, author position.
Multi-LLM deep-research with synthesis discipline; Penwright production-instrument design; the Penwright Measurement Framework (six skill dimensions, six derived indices, three measurement layers, five-step learning loop) with four non-negotiable failure modes; pre-registered hypotheses; genre-aware analysis required.
Forthcoming — adaptive CAMS measurement, protected-feedback substrate, anchor-registry ingest, dual-grade corpus discipline, and evidence-chain standards the diagnostic inherits.
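The A–D source-quality rubric and the versioning-plus-snapshots discipline in the source-selection methodology entry can be combined in a short sketch. Graded source records are validated on construction, and a snapshot ID is derived as a content hash of a canonical serialization, so a report can cite the exact registry state it used. The ordering of A as strongest, the `sample_size` field, and the hashing scheme are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass

# Assumed order: A is the strongest grade, D the weakest.
GRADES = ("A", "B", "C", "D")


@dataclass(frozen=True)
class SourceRecord:
    source_id: str
    grade: str        # one of GRADES
    sample_size: int  # hypothetical extracted statistical-metadata field

    def __post_init__(self) -> None:
        # Schema checks run when the record is created, before it can enter a snapshot.
        if self.grade not in GRADES:
            raise ValueError(f"grade must be one of {GRADES}, got {self.grade!r}")
        if self.sample_size <= 0:
            raise ValueError("sample_size must be positive")


def snapshot_id(records: list[SourceRecord]) -> str:
    """SHA-256 of a canonical JSON serialization; independent of input order."""
    rows = sorted((asdict(r) for r in records), key=lambda d: d["source_id"])
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def at_least(records: list[SourceRecord], grade: str) -> list[SourceRecord]:
    """Keep records graded at or above `grade`."""
    cutoff = GRADES.index(grade)
    return [r for r in records if GRADES.index(r.grade) <= cutoff]
```

Because the hash covers every field, regrading a single source produces a new snapshot ID. A report that cites the old ID therefore still points at the evidence it was written against.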
Reports
The actual research findings — phased results, research-question briefs, applied analyses.
Which dimensions of variance separate participants and how stable they are.
First-volume analysis of the desire-index instrument.
Tier-2 foundational paper. Operationalizes the Penwright Measurement Framework that underwrites every subsequent empirical paper: three measurement layers (output · process · development), six skill dimensions, six derived indices (WQI · II · INTI · OI · MI · DV), a five-step learning loop, and four non-negotiable failure modes that act as veto conditions. Positions the framework in measurement theory (reflective vs. formative · latent vs. observed · why standard psychometric scaffolding only partly applies under continuous measurement and model drift). Compares it to HCI usability scales, educational skill assessments, and standardized writing rubrics, none of which is fit for measuring capability development. Names what is measurable in v1, what awaits the Wave 2 through Wave 6 features, what requires longitudinal data, and what requires external validation.
Tier-2 measurement-and-mechanism paper. Companion to Paper 4: where Paper 4 specifies what gets measured, Paper 6 specifies the loop architecture against which that measurement is taken. Covers the five-step learning loop (write · analyze · reflect · practice · re-measure) and four intervention types (teach moments · constraint challenges · counterposition drills · reconstruction exercises). Positions the loop theoretically across cognitive apprenticeship (Collins, Brown & Newman), working-alliance theory (Bordin), retrieval practice and desirable difficulties (Roediger & Karpicke; Bjork), and self-regulated learning (Zimmerman). Argues that output-quality optimization is structurally incapable of producing capability development. Includes four threats to validity and three production-data tests.
Anticipated thread — cross-study item-response accumulation without confounding.
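The Penwright Measurement Framework paper describes failure modes that act as veto conditions, and its companion paper describes a five-step learning loop. The two ideas can be sketched together, assuming each failure mode is a boolean check on a raw measurement. If any check fires, the measurement is invalidated outright instead of being averaged against otherwise good scores. The check names, score keys, and raw-measurement shape are hypothetical placeholders. The papers' actual four failure modes and six indices are not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# The five steps named in the companion paper; re-measure feeds back into write.
LOOP = ("write", "analyze", "reflect", "practice", "re-measure")

# Hypothetical: a failure-mode check returns True when the condition fires.
FailureCheck = Callable[[dict], bool]


@dataclass
class Measurement:
    scores: dict[str, float]
    vetoed_by: list[str] = field(default_factory=list)

    @property
    def valid(self) -> bool:
        return not self.vetoed_by


def measure(raw: dict, checks: dict[str, FailureCheck]) -> Measurement:
    fired = [name for name, check in checks.items() if check(raw)]
    # Veto semantics: any fired failure mode discards the scores entirely.
    scores = {} if fired else dict(raw["scores"])
    return Measurement(scores=scores, vetoed_by=fired)


def next_step(step: str) -> str:
    """Advance one step around the loop, wrapping from re-measure back to write."""
    i = LOOP.index(step)
    return LOOP[(i + 1) % len(LOOP)]
```

Treating failure modes as vetoes, not as penalty terms, means that no combination of high index scores can compensate for a disqualifying condition.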
Preregistrations & protocols
Studies and intervention protocols filed before execution.