
Peer-review framing

Positioning against the existing measurement-handbook tradition (Schmitt & Highhouse 2013; Borman et al. 2003) and the meta-analytic synthesis tradition (Hunter & Schmidt 2004; Cooper 2017). What Principia adds, what it does not claim to add.

By Mike West

Source: people-analyst/principia/docs/research/reviews/peer-review-framing.md

Principia: a peer-review framing

Where the registry sits against four established traditions in measurement and synthesis, what it claims to add, what it does not, and the threats-to-validity register a peer reviewer should hold us to.


This document is written for the reader who would be assigned Principia as a peer reviewer — an I/O psychologist, an organizational-science methodologist, a meta-analyst — and who has the legitimate question "is this useful, and is it honest about what it is?" Principia is neither a new measurement theory nor a new statistical method. It is infrastructure positioned against four mature traditions, each of which already does part of what the field needs. The contribution claim is structural: a queryable, source-graded, continuously-updated registry that complements those traditions rather than competing with them. The honest framing is easier to defend than the inflated one, so we will be precise about both.


1. The measurement-handbook tradition

The canonical reference points are Handbook of Psychology, Vol. 12: Industrial and Organizational Psychology (Borman, Ilgen & Klimoski, 2003) and its second edition (Schmitt & Highhouse, 2013), with the Hulin & Judge (2003) Job Attitudes chapter as a representative load-bearing entry. Their contribution is real: synthesis at depth, organized by domain, written by subject-area experts, with citation density no individual reader could reproduce alone.

What handbooks do well, Principia does not try to replace: domain synthesis at theoretical depth (a chapter on job attitudes carries the conceptual genealogy, including Locke's value-affect hypothesis, Weiss & Cropanzano's affective-events theory, and the Hulin family, in a form a graduate student can read once and come away understanding the shape of the field); editor-curated weighting (the implicit "here is what counts as load-bearing as of this date"); and argumentation as well as enumeration (the engagement chapter in the 2013 edition is not a list of instruments; it is a position on where the construct boundary sits).

What handbooks structurally cannot do is what Principia is positioned to do:

  • Continuous update. A handbook chapter freezes at publication. The next chapter on the same construct gets written ten to fifteen years later, by a different author, with a different theoretical lean. Validation studies and cross-cultural adaptations that arrive between editions tend to disappear into the gap. The format does not permit anything else; books are frozen objects.
  • Machine-readable rows. A reader who wants to know the population, sample size, design, country, and effect-size value for every study connecting engagement to performance must extract that from the citation tree by hand, then trace each citation to the primary source. The chapter is not queryable.
  • Transparent grading. Handbooks weight evidence implicitly. A reader cannot tell, from the chapter alone, why one cited study carries the chapter's argument and another is in a footnote.
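
To make the "machine-readable rows" point concrete, here is a minimal sketch of what a queryable effect-size row could look like. The interface, field names, and sample values are illustrative assumptions, not the actual @people-analyst/measurement-core schema, and the DOIs are placeholders.

```typescript
// Hypothetical row shape: field names are illustrative assumptions,
// not the actual @people-analyst/measurement-core schema.
type QualityGrade = "A" | "B" | "C" | "D";

interface EffectSizeRowSketch {
  id: string;
  fromConstruct: string;
  predicate: string;
  toConstruct: string;
  r: number; // effect size as reported, here a correlation
  sampleSize: number;
  design: "cross-sectional" | "longitudinal" | "experimental";
  country: string;
  doi: string;
  grade: QualityGrade;
}

// "A" is the best grade, "D" the worst.
const GRADE_ORDER: QualityGrade[] = ["A", "B", "C", "D"];

// Return every row for one (from, predicate, to) tuple, optionally
// restricted to rows graded at least as well as `minGrade`.
function queryTuple(
  rows: EffectSizeRowSketch[],
  from: string,
  predicate: string,
  to: string,
  minGrade: QualityGrade = "D",
): EffectSizeRowSketch[] {
  const cutoff = GRADE_ORDER.indexOf(minGrade);
  return rows.filter(
    (row) =>
      row.fromConstruct === from &&
      row.predicate === predicate &&
      row.toConstruct === to &&
      GRADE_ORDER.indexOf(row.grade) <= cutoff,
  );
}

// Made-up rows for illustration only; values and DOIs are placeholders.
const sampleRows: EffectSizeRowSketch[] = [
  { id: "es-1", fromConstruct: "engagement", predicate: "predicts", toConstruct: "performance",
    r: 0.3, sampleSize: 250, design: "longitudinal", country: "NL", doi: "10.0000/placeholder-1", grade: "A" },
  { id: "es-2", fromConstruct: "engagement", predicate: "predicts", toConstruct: "performance",
    r: 0.2, sampleSize: 120, design: "cross-sectional", country: "US", doi: "10.0000/placeholder-2", grade: "C" },
  { id: "es-3", fromConstruct: "job_satisfaction", predicate: "predicts", toConstruct: "performance",
    r: 0.25, sampleSize: 400, design: "cross-sectional", country: "DE", doi: "10.0000/placeholder-3", grade: "A" },
];

// Rows for engagement -> performance graded B or better.
const gradedRows = queryTuple(sampleRows, "engagement", "predicts", "performance", "B");
```

The population, sample size, design, country, and grade are columns on the row, so the question a handbook reader answers by hand becomes a filter.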

Handbooks are synthesis-as-argument; Principia is synthesis-as-infrastructure. The two coexist. A 2030 handbook author who sits on top of a populated registry spends attention on synthesis decisions rather than extraction; a registry user who needs to understand why vigor, dedication, and absorption are treated as facets of a single engagement construct still goes to the chapter.


2. The meta-analytic synthesis tradition

The canonical references are Hunter & Schmidt (2004), Methods of Meta-Analysis; Cooper (2017), Research Synthesis and Meta-Analysis; and Borenstein, Hedges, Higgins & Rothstein (2009), Introduction to Meta-Analysis. What this lineage made possible — pooling effect sizes with explicit weighting, confidence intervals, heterogeneity statistics, and publication-bias diagnostics — is what lets the field ask "what does forty years of evidence on this relationship say?" with any quantitative rigor at all.

What meta-analyses do well: quantitative pooling with explicit weights (DerSimonian-Laird random-effects; REML; the modern Hartung-Knapp-Sidik-Jonkman corrections); publication-bias machinery (funnel plots, Egger's test, trim-and-fill, PET-PEESE, p-curve, and selection models; most of this did not exist in usable form thirty years ago); and inclusion-criteria transparency (a well-conducted meta-analysis publishes search strategy and coding decisions in the same paper as the results).

Where Principia complements the meta-analytic tradition is not in pooling method — the synthesis engine in packages/registry/src/loop/prior-synthesis.ts is a fairly conventional random-effects implementation with quality weighting, with REML and HKSJ available for sensitivity — but in cadence and reuse:

  • Continuous pooling. A meta-analysis is a snapshot. The next meta-analysis on the same relationship comes out four to seven years later, by a different team, with a different inclusion-criteria reading and a different set of contributing studies. Principia runs the pool continuously: every new EffectSize row recomputes the relevant CanonicalPrior. The Cochrane equivalent is the living-systematic-review machinery (Elliott et al. on living reviews; PROSPERO registration); the I/O analog does not exist outside Principia, and the registry is structured to be that analog.
  • A queryable input layer for future meta-analysts. A meta-analyst in 2030 who needs effect-size rows for engagement → performance currently spends most of their attention on extraction: PDF retrieval, statistic-conversion, sample-size verification, design coding. If Principia accumulates as intended, that work is already done, with each row carrying its DOI, its quality grade, and its provenance back to the source paper. The meta-analyst starts from the registry table and spends attention on the synthesis decisions — which subset of rows to include, which heterogeneity model to fit, which moderators to test. The cost of writing a meta-analysis drops because the input layer is shared.
  • Provenance from synthesis to study. The Bayesian prior returned at /api/v1/priors/{from}/{predicate}/{to} carries contributing_effect_size_ids[]. Every claim a downstream consumer makes from the prior is traceable back to a study-level row, which is traceable back to a DOI. The synthesis is reproducible from the registry rows plus the BibTeX, without the consumer needing to trust the synthesis author.
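
The provenance chain can be sketched as follows. Only the path template and the `contributing_effect_size_ids` field come from the text above; the payload shape, the row shape, and both function names are hypothetical stand-ins, and no network call is made.

```typescript
// Only the path template and `contributing_effect_size_ids` come from the text;
// every other name here is a hypothetical stand-in.
interface PriorPayloadSketch {
  mean: number;
  contributing_effect_size_ids: string[];
}

interface StudyRowSketch {
  id: string;
  doi: string;
}

// Fill in /api/v1/priors/{from}/{predicate}/{to}, escaping each segment.
function priorPath(from: string, predicate: string, to: string): string {
  const segments = [from, predicate, to].map((s) => encodeURIComponent(s));
  return `/api/v1/priors/${segments.join("/")}`;
}

// Walk from a prior back to the DOIs of its contributing rows.
// Ids with no matching row are reported rather than silently dropped.
function traceProvenance(
  prior: PriorPayloadSketch,
  rows: StudyRowSketch[],
): { dois: string[]; missingIds: string[] } {
  const doiById = new Map<string, string>();
  rows.forEach((row) => doiById.set(row.id, row.doi));

  const dois: string[] = [];
  const missingIds: string[] = [];
  prior.contributing_effect_size_ids.forEach((id) => {
    const doi = doiById.get(id);
    if (doi === undefined) {
      missingIds.push(id);
    } else {
      dois.push(doi);
    }
  });
  return { dois, missingIds };
}

// Placeholder data for illustration only.
const examplePrior: PriorPayloadSketch = {
  mean: 0.28,
  contributing_effect_size_ids: ["es-1", "es-2", "es-9"],
};
const exampleStudyRows: StudyRowSketch[] = [
  { id: "es-1", doi: "10.0000/placeholder-1" },
  { id: "es-2", doi: "10.0000/placeholder-2" },
];
const trace = traceProvenance(examplePrior, exampleStudyRows);
```

A non-empty `missingIds` list would mean the prior cites a row the consumer cannot see, which is exactly the condition under which the synthesis stops being reproducible.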

What Principia explicitly does not add to the meta-analytic tradition: no new pooling estimator, no new heterogeneity statistic, no new bias-correction. The methods are the ones the meta-analytic literature already invented. The contribution is plumbing.


3. The IRT and measurement-theory literature

The third tradition is item response theory and the broader measurement-theory literature — Lord & Novick (1968), Lord (1980), Embretson & Reise (2000), with Messick (1989) and Kane (2013) on validity argumentation, and the measurement-invariance lineage (Vandenberg & Lance, 2000) for cross-population testing. This literature defines what it means for an instrument to be measuring a construct rather than merely correlated with it.

A common misreading would be to assume the registry replaces IRT analyses. It does not:

  • Principia hosts the data IRT and measurement-invariance work runs on. Items, response options, scoring rules, calibration parameters when published, and the validation evidence per population. The CanonicalSurveyItem entity (per the @people-analyst/measurement-core schema) is the item-level row; the Instrument entity is the scale; the Construct entity is the latent target.
  • Principia does not estimate IRT parameters in-house. When a published validation paper reports calibrated discrimination and difficulty parameters, those are ingested as evidence. When a paper reports configural / metric / scalar invariance testing against a new population, that is ingested as evidence. The registry is the place those analyses become discoverable; it is not where they are conducted.
  • The registry exposes the input layer the field needs for replication. A measurement-invariance test conducted in 2027 against a new cultural sample is reproducible by other researchers only if they can find the original item set, the calibration sample's response distribution where reported, and the prior invariance evidence in one place. The registry is structured to be that one place. Today, the same work requires reading three to five papers, emailing the original developer for the calibration matrix, and reconstructing the item set from supplementary materials that have moved across journal hosting providers.
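
The item, scale, and latent-target layering can be sketched as three linked row types plus an evidence table. The entity names follow the text; every field, the evidence shape, and the lookup function are assumptions for illustration, not the published schema.

```typescript
// Entity names follow the text; every field is a hypothetical illustration,
// not the published @people-analyst/measurement-core schema.
interface Construct {
  id: string;
  label: string;
  measurementModel: "reflective" | "formative";
}

interface Instrument {
  id: string;
  constructId: string; // the latent target this scale measures
  label: string;
}

interface CanonicalSurveyItem {
  id: string;
  instrumentId: string; // the scale this item belongs to
  text: string;
  responseOptions: string[];
}

type InvarianceLevel = "configural" | "metric" | "scalar";

// One published invariance result, ingested as evidence (never estimated in-house).
interface InvarianceEvidence {
  instrumentId: string;
  population: string;
  level: InvarianceLevel;
  supported: boolean;
  doi: string;
}

const INVARIANCE_LEVELS: InvarianceLevel[] = ["configural", "metric", "scalar"];

// Strongest invariance level with supporting published evidence for one
// instrument and population, or null when nothing has been ingested.
function strongestInvariance(
  evidence: InvarianceEvidence[],
  instrumentId: string,
  population: string,
): InvarianceLevel | null {
  let best = -1;
  evidence.forEach((e) => {
    if (e.instrumentId === instrumentId && e.population === population && e.supported) {
      best = Math.max(best, INVARIANCE_LEVELS.indexOf(e.level));
    }
  });
  return best >= 0 ? (INVARIANCE_LEVELS[best] ?? null) : null;
}

// Placeholder rows for illustration only.
const exampleConstruct: Construct = { id: "c-1", label: "engagement", measurementModel: "reflective" };
const exampleInstrument: Instrument = { id: "i-1", constructId: exampleConstruct.id, label: "example engagement scale" };
const exampleItem: CanonicalSurveyItem = {
  id: "it-1", instrumentId: exampleInstrument.id, text: "Example item wording.",
  responseOptions: ["never", "sometimes", "always"],
};
const exampleEvidence: InvarianceEvidence[] = [
  { instrumentId: "i-1", population: "sample-x", level: "configural", supported: true, doi: "10.0000/placeholder-4" },
  { instrumentId: "i-1", population: "sample-x", level: "metric", supported: true, doi: "10.0000/placeholder-4" },
  { instrumentId: "i-1", population: "sample-x", level: "scalar", supported: false, doi: "10.0000/placeholder-4" },
];
const strongest = strongestInvariance(exampleEvidence, exampleInstrument.id, "sample-x");
```

The lookup only reads what published papers reported; nothing here fits a model, which is the boundary the bullets above draw.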

The honest framing for IRT reviewers: Principia is infrastructure that makes IRT-tradition work discoverable, not a substitute for IRT-tradition work. If the registry inherits errors in the underlying calibration evidence, the errors propagate. The mitigation is the source-grading rubric: a calibration claim from an A-grade replicated validation paper carries different weight than the same claim from a C-grade unreplicated single-site study, and the grades travel with the row.


4. The open-science and preregistration tradition

The fourth tradition is the open-science movement — the OSF ecosystem, preregistration practice, replication initiatives (Many Labs; Registered Replication Reports), and the broader transparency framework that has reshaped psychology over the last fifteen years.

Principia inherits more from this tradition than from any other. The discipline is the same: every claim source-graded, every synthesis reproducible from the underlying rows, every artifact carrying its frozen-at date.

  • Source-grading at the row level. Every citation carries a quality grade (A | B | C | D) per the rubric in docs/research/methodology.md §2. Grades are not a private annotation; they are published alongside the citation and travel with every consumer's pull from the registry.
  • Snapshots and versioning. Each construct-family survey is frozen at a date; updates create new snapshots; consumers pin to a version the way they pin a dependency. The book-build artifact (the rendering of the registry into a PDF or HTML reference) carries a deterministic snapshot id (hash of the JSON store contents at render time). Two builds against the same input produce byte-identical artifacts.
  • Preregistration of synthesis-analytic decisions. When a construct-family survey surfaces a meta-analytic gap — a tuple the existing meta-analyses do not cover — Principia files a synthesis-analytic preregistration under the protocol at docs/research/preregistrations/synthesis-analytic-protocol.md. The protocol mirrors the Cochrane / PROSPERO discipline: inclusion criteria first, coding protocol with inter-rater reliability targets, heterogeneity-analysis plan, publication-bias diagnostics, sensitivity-analysis plan, deviations log. The point is not to declare new methodology; it is to refuse to let synthesis decisions get made retrospectively.
  • Verification log as published artifact. AI-assisted extraction will produce wrong rows. The mitigation, per methodology.md §7, is a paired second-pass: extractor produces the row, verifier reads the source and confirms the row's claims against the source text. Disagreements queue for human review. The verification log is part of the registry and exposed to consumers; rows carry a verification_status and a link to the verifier's notes.
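
The deterministic snapshot id can be illustrated with a small sketch: serialize the store with sorted keys, then hash the result. The canonicalization step and the FNV-1a hash below are our stand-ins chosen to keep the example dependency-free; the text says only that the id is a hash of the JSON store contents, and a real build would more plausibly use a cryptographic hash.

```typescript
// Assumption: the store is plain JSON (objects, arrays, strings, numbers,
// booleans, null). Key order is normalized so that logically identical
// stores serialize to identical text.
function canonicalJson(value: unknown): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map((v: unknown) => canonicalJson(v)).join(",") + "]";
  }
  const record = value as Record<string, unknown>;
  const keys = Object.keys(record).sort();
  return "{" + keys.map((k) => JSON.stringify(k) + ":" + canonicalJson(record[k])).join(",") + "}";
}

// 32-bit FNV-1a over UTF-16 code units, used here only as an illustrative,
// non-cryptographic stand-in for whatever hash the build actually uses.
function fnv1a32(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Same store contents in, same id out, regardless of key order.
function snapshotId(store: unknown): string {
  return fnv1a32(canonicalJson(store));
}

// Placeholder stores for illustration only.
const storeA = { constructs: [{ id: "c-1", grade: "A" }], frozenAt: "2025-01-01" };
const storeB = { frozenAt: "2025-01-01", constructs: [{ grade: "A", id: "c-1" }] };
const idA = snapshotId(storeA);
const idB = snapshotId(storeB);
```

Here `idA` and `idB` are equal because the two stores differ only in key order. Without the canonicalization step, two logically identical stores could hash differently, and the byte-identical guarantee would depend on incidental serialization order.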

What Principia does not claim to add to the open-science tradition: the methodology is borrowed. The discipline that the OSF ecosystem made standard practice — preregistration, transparent inclusion criteria, source-grading, snapshot-and-version, deviations log — is what Principia inherits. The contribution is the application of that discipline to organizational-measurement infrastructure at registry scale.


5. What Principia claims to add — explicitly

Pulling the four traditions together, the contribution claim:

  • A continuously-updated, source-graded registry of organizational measurement rows — constructs, instruments, items, citations, effect sizes, evidence edges, deployment evidence — typed against a shared canonical vocabulary (@people-analyst/measurement-core, currently v0.9.0).
  • A Bayesian-prior synthesis layer: CanonicalPrior rows produced from study-level EffectSize rows under a given (from_construct, predicate, to_construct) tuple, available via /api/v1/priors/{from}/{predicate}/{to} with provenance attached. The methodology is conventional random-effects meta-analysis with quality weighting; the contribution is that the prior is queryable and current as of T, not buried in a 2018 meta-analysis paper.
  • A theoretical-model layer as queryable data: CanonicalTheoreticalModel rows for named theories (Job Demands-Resources; Conservation of Resources; Self-Determination Theory; Affective Events Theory; Job Characteristics Model), each carrying canonical constructs, canonical relations with central / boundary flags, and foundational citations. Existing measurement catalogs surface instruments; the named-theory layer that owns those instruments is, as far as we have found, novel as a first-class registry entity.

Three structural additions, each layered on top of work the four traditions did before us.


6. What Principia does NOT claim to add — explicitly

The peer-review reading we want to pre-empt is the assumption that an infrastructure paper is also a methods paper. It is not. Principia does not claim:

  • Original empirical work. No new primary data is collected. The registry catalogs and synthesizes work other researchers have done.
  • Novel statistical methods. The synthesis engine implements DerSimonian-Laird random-effects meta-analysis with Fisher-z transform for correlations, quality-weighted pooling, and fixed-effects fallback at k=1. REML and HKSJ are available for sensitivity. None of these are new; the contribution is that they run continuously against the current registry state.
  • Novel measurement theory. Reflective vs formative, unidimensional vs multidimensional, latent vs observed: the registry carries these as measurement-model attributes on each Construct. It does not propose new conventions.
  • A replacement for existing meta-analyses. When a recent, well-graded meta-analysis answers a tuple's question, the registry cites it and defers to it. Most of the time, the right answer is: "the existing meta-analysis is the answer; here is the DOI; trust it; move on."
  • A peer-reviewed claim layer. v1 is single-author so that the proof-of-method is unambiguous. The post-v1 aspiration, peer-graded external extensions, is not yet implemented.
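
For readers who want the conventional method spelled out, here is a minimal sketch of DerSimonian-Laird pooling on Fisher-z transformed correlations, with a fixed-effects fallback at k=1. It is not the code in prior-synthesis.ts. How quality weighting enters the real engine is not stated here, so the grade multipliers and the choice to apply them only at the final pooling step are assumptions.

```typescript
type Grade = "A" | "B" | "C" | "D";

interface StudyEffect {
  r: number; // Pearson correlation
  n: number; // sample size
  grade: Grade;
}

interface PooledEstimate {
  r: number;
  ciLow: number;
  ciHigh: number;
  tau2: number; // between-study variance on the z scale
  k: number;
  model: "fixed" | "random";
}

// Hypothetical multipliers; the real rubric-to-weight mapping is not given in the text.
const GRADE_MULTIPLIER: Record<Grade, number> = { A: 1.0, B: 0.75, C: 0.5, D: 0.25 };

const sum = (xs: number[]): number => xs.reduce((a, b) => a + b, 0);

function poolCorrelations(studies: StudyEffect[]): PooledEstimate {
  if (studies.length === 0) throw new Error("need at least one study");
  const items = studies.map((s) => {
    if (s.n <= 3) throw new Error("sample size must exceed 3");
    if (Math.abs(s.r) >= 1) throw new Error("correlation must lie strictly between -1 and 1");
    // Fisher z transform; sampling variance of z is 1 / (n - 3).
    return { z: Math.atanh(s.r), v: 1 / (s.n - 3), q: GRADE_MULTIPLIER[s.grade] };
  });
  const k = items.length;

  // DerSimonian-Laird tau^2 from plain inverse-variance weights.
  // With a single study there is nothing to estimate, so tau^2 stays 0.
  let tau2 = 0;
  if (k > 1) {
    const sumW = sum(items.map((it) => 1 / it.v));
    const zFixed = sum(items.map((it) => it.z / it.v)) / sumW;
    const qStat = sum(items.map((it) => ((it.z - zFixed) * (it.z - zFixed)) / it.v));
    const c = sumW - sum(items.map((it) => 1 / (it.v * it.v))) / sumW;
    tau2 = Math.max(0, (qStat - (k - 1)) / c);
  }

  // Final pooling: random-effects weights scaled by the quality multiplier.
  const weighted = items.map((it) => ({ z: it.z, total: it.v + tau2, w: it.q / (it.v + tau2) }));
  const sumWt = sum(weighted.map((it) => it.w));
  const zPooled = sum(weighted.map((it) => it.w * it.z)) / sumWt;
  // Variance of a weighted mean whose weights are not pure inverse variances.
  const variance = sum(weighted.map((it) => it.w * it.w * it.total)) / (sumWt * sumWt);
  const se = Math.sqrt(variance);

  return {
    r: Math.tanh(zPooled),
    ciLow: Math.tanh(zPooled - 1.96 * se),
    ciHigh: Math.tanh(zPooled + 1.96 * se),
    tau2,
    k,
    model: k > 1 ? "random" : "fixed",
  };
}

// Made-up inputs for illustration only.
const pooledExample = poolCorrelations([
  { r: 0.1, n: 200, grade: "A" },
  { r: 0.5, n: 200, grade: "B" },
  { r: 0.3, n: 150, grade: "C" },
]);
```

Because the quality multipliers break the usual identity between weights and inverse variances, the sketch computes the variance of the weighted mean explicitly instead of using 1/Σw.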

7. Threats-to-validity register

This is the section a peer reviewer should hold us to most closely. We name the risks; we name the mitigations; we do not pretend the mitigations close them.

7.1 Source-grading subjectivity

The A-D rubric in methodology.md §2 is a curator judgment, not an algorithmic score. Two reasonable graders could disagree on whether a single study is B or C. The rubric is documented and conservative (when uncertain, take the lower grade); grading rationale stays attached to the citation. Mitigation: rubric public; grades published per row; post-v1 external graders produce inter-rater data v1 lacks. Residual: single-curator drift over time; audit log captures changes but does not correct for them.
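
The conservative tie-break ("when uncertain, take the lower grade") is simple enough to state as code. This is a sketch under our own assumptions: the record shape and function name are hypothetical, and the DOI is a placeholder.

```typescript
type RubricGrade = "A" | "B" | "C" | "D";

// Hypothetical record: the grade and its rationale travel together.
interface GradedCitation {
  doi: string;
  grade: RubricGrade;
  rationale: string;
}

// When the curator is torn between grades, keep the lower (more cautious) one.
function conservativeGrade(candidates: RubricGrade[]): RubricGrade {
  if (candidates.length === 0) throw new Error("need at least one candidate grade");
  // "A" < "B" < "C" < "D" alphabetically, so the last element after sorting
  // is the lowest grade on the rubric.
  const sorted = [...candidates].sort();
  return sorted[sorted.length - 1] ?? "D";
}

// Placeholder citation for illustration only.
const exampleCitation: GradedCitation = {
  doi: "10.0000/placeholder-5",
  grade: conservativeGrade(["B", "C"]),
  rationale: "Single-site sample; replication status unclear, so the lower candidate grade is kept.",
};
```

The rule removes one source of grader disagreement (which way to round) but not the underlying judgment about which grades are candidates in the first place.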

7.2 Author position — single-author registry

Single-author work (Mike West) with AI-assisted extraction. Positionality shapes what gets surveyed, what gets graded down, what enters the construct-family roadmap. The curator-policy decisions D1–D4 in docs/specification/loop/curator-policy.md codify daily operations (D4: proposals only; no auto EffectSize). Mitigation: rubric published; curator-policy decisions public and dated; roadmap justified, not assumed. Residual: a single curator's preferences shape coverage. A reviewer should read the construct-family roadmap as a load-bearing positionality statement.

7.3 Schema-extraction errors

AI-assisted extraction is fast and lossy. An effect size mis-extracted as Pearson r when the source reports Spearman ρ propagates into the synthesis layer without obvious surface signs. Mitigation: novelty-verification pass (PRN-032) — extractor produces the row, separate verifier reads the source and confirms each load-bearing claim. Disagreements queue for human resolution. Unverified rows carry pending_verification and are filtered from consumer-facing API responses. Residual: false confirms where both extractor and verifier agree on a wrong row. Multi-model verification (different model families for extraction and verification) is the partial mitigation; double-agreed false confirms remain open.
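
The gate described above can be sketched in a few lines. `pending_verification` and `verification_status` are named in this document; the other status values, the claim shape, and the decision to serve only verified rows are assumptions made for illustration.

```typescript
// `pending_verification` comes from the text; the other two statuses are assumed.
type VerificationStatus = "pending_verification" | "verified" | "needs_human_review";

interface ExtractedClaim {
  statistic: "pearson_r" | "spearman_rho";
  value: number;
  sampleSize: number;
}

interface RegistryRowSketch {
  id: string;
  claim: ExtractedClaim;
  verification_status: VerificationStatus;
}

// Compare the extractor's row with an independent verifier's reading of the
// source. Any disagreement on a load-bearing field queues the row for a human.
function reconcile(row: RegistryRowSketch, verifierReading: ExtractedClaim): RegistryRowSketch {
  const agrees =
    row.claim.statistic === verifierReading.statistic &&
    Math.abs(row.claim.value - verifierReading.value) < 1e-9 &&
    row.claim.sampleSize === verifierReading.sampleSize;
  return { ...row, verification_status: agrees ? "verified" : "needs_human_review" };
}

// Assumption: consumer-facing responses include verified rows only.
function consumerFacing(rows: RegistryRowSketch[]): RegistryRowSketch[] {
  return rows.filter((row) => row.verification_status === "verified");
}

// Made-up rows for illustration only.
const extractedRow: RegistryRowSketch = {
  id: "es-10",
  claim: { statistic: "pearson_r", value: 0.31, sampleSize: 180 },
  verification_status: "pending_verification",
};
// The verifier reads the source and finds the statistic was Spearman's rho.
const disputedRow = reconcile(extractedRow, { statistic: "spearman_rho", value: 0.31, sampleSize: 180 });
const confirmedRow = reconcile(extractedRow, { statistic: "pearson_r", value: 0.31, sampleSize: 180 });
const served = consumerFacing([extractedRow, disputedRow, confirmedRow]);
```

The sketch also makes the residual risk visible: if the verifier's reading repeats the extractor's mistake, `reconcile` marks the row verified and nothing downstream can tell.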

7.4 Instrument-selection bias in construct families

Tier-1 families (engagement, job satisfaction, organizational commitment, organizational climate) are chosen for densest accumulated literature, not representative sampling. Mitigation: ordering justified in PROGRAM.md; each family explicitly marked forthcoming until live. Residual: literature-thin but organizationally-important constructs reach registry maturity later than their importance warrants. Psychological safety is a present-tense example: the Edmondson-tradition validation work in the corpus today is sparser than the construct's organizational salience suggests.

7.5 Cross-cultural representation

Most validation literature in the corpus is Western and white-collar. Cross-cultural adaptations exist for many Tier-1 instruments (UWES has been translated into dozens of languages; the Allen & Meyer scales similarly), but underlying validation samples skew toward European, North American, and East Asian university and corporate populations. Mitigation: CulturalAdaptation schema carries per-adaptation evidence; cross-cultural notes required in every survey; PRN-045 surfaces culture-conditioned sub-priors when sample data permits. Residual: the registry inherits the cultural composition of the underlying literature. For many tuples, post-translation validation work is thin or absent. The registry reports culture_unavailable_note rather than fabricating a sub-prior, and the absence itself is a finding the registry surfaces.
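
The "report the absence rather than fabricate" behavior might look like the sketch below. `culture_unavailable_note` is named in the text; the minimum-row threshold, the field names, and the result shape are assumptions. The sketch only decides whether a sub-prior is attempted and which rows would feed it; it does not do the pooling.

```typescript
// `culture_unavailable_note` is named in the text; the threshold, field names,
// and result shape are assumptions for illustration.
interface CultureTaggedRow {
  id: string;
  cultureCluster: string;
}

type SubPriorResult =
  | { kind: "sub_prior"; cultureCluster: string; contributingIds: string[] }
  | { kind: "unavailable"; culture_unavailable_note: string };

const MIN_ROWS_FOR_SUB_PRIOR = 3; // hypothetical cutoff

// Decide whether enough rows exist for one culture cluster to attempt a
// sub-prior. When they do not, say so explicitly instead of guessing.
function cultureSubPrior(rows: CultureTaggedRow[], cultureCluster: string): SubPriorResult {
  const matching = rows.filter((row) => row.cultureCluster === cultureCluster);
  if (matching.length < MIN_ROWS_FOR_SUB_PRIOR) {
    return {
      kind: "unavailable",
      culture_unavailable_note:
        `Only ${matching.length} row(s) for ${cultureCluster}; ` +
        `at least ${MIN_ROWS_FOR_SUB_PRIOR} needed to synthesize a sub-prior.`,
    };
  }
  return { kind: "sub_prior", cultureCluster, contributingIds: matching.map((row) => row.id) };
}

// Made-up rows for illustration only.
const cultureRows: CultureTaggedRow[] = [
  { id: "es-1", cultureCluster: "cluster-a" },
  { id: "es-2", cultureCluster: "cluster-a" },
  { id: "es-3", cultureCluster: "cluster-a" },
  { id: "es-4", cultureCluster: "cluster-b" },
];
const denseCluster = cultureSubPrior(cultureRows, "cluster-a");
const sparseCluster = cultureSubPrior(cultureRows, "cluster-b");
```

Returning a tagged "unavailable" result, rather than null or an empty prior, keeps the gap in the literature visible to the consumer.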


8. What honest acceptance would look like

A peer reviewer is right to ask: under what conditions would I accept this work as a contribution?

Accept Principia as infrastructure when the engagement family's effect-size table is populated to ≥ 30 load-bearing rows with verification status on each; when ≥ 10 CanonicalPrior rows are synthesized from those effect sizes; when the verification pass rate is at or above 90% automated-propose / human-confirm; and when the time to onboard a new construct family is under two curator sessions. Those are the metrics in the research-ingestion vision document. Below them, the registry is a credible proof-of-method but not yet a load-bearing reference.
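
The four acceptance thresholds above can be written as a single check. The thresholds are the ones stated in the preceding paragraph; the metric field names and the function are hypothetical.

```typescript
// Field names are hypothetical; thresholds are the ones stated in the text.
interface RegistryMetrics {
  engagementRowsWithVerificationStatus: number; // load-bearing effect-size rows
  canonicalPriorRows: number;
  verificationPassRate: number; // fraction between 0 and 1
  curatorSessionsToOnboardFamily: number;
}

// Return a description of each unmet criterion; an empty list means all are met.
function unmetCriteria(m: RegistryMetrics): string[] {
  const unmet: string[] = [];
  if (m.engagementRowsWithVerificationStatus < 30) {
    unmet.push("engagement family needs at least 30 load-bearing rows with verification status");
  }
  if (m.canonicalPriorRows < 10) {
    unmet.push("at least 10 CanonicalPrior rows must be synthesized");
  }
  if (m.verificationPassRate < 0.9) {
    unmet.push("verification pass rate must be at or above 90%");
  }
  if (m.curatorSessionsToOnboardFamily >= 2) {
    unmet.push("onboarding a new construct family must take under two curator sessions");
  }
  return unmet;
}

// Made-up metrics for illustration only; they do not describe the live registry.
const hypotheticalMetrics: RegistryMetrics = {
  engagementRowsWithVerificationStatus: 12,
  canonicalPriorRows: 0,
  verificationPassRate: 0.95,
  curatorSessionsToOnboardFamily: 1,
};
const stillUnmet = unmetCriteria(hypotheticalMetrics);
```

A reviewer could ask for this list to be published alongside each snapshot, so that "credible proof-of-method" versus "load-bearing reference" is a checkable state, not a self-description.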

We are not above them yet. The engagement family is in deep extraction; the first CanonicalPrior rows are synthesized against fixtures rather than the live corpus; the verification pass is operational but still manual. The methodology is in place, the schema discipline is in place, the synthesis engine is in place — the substantive content is what the next two quarters produce.

The bet: the right shape for organizational-measurement knowledge is not a book and not a periodically refreshed meta-analysis. It is a registry that compounds. The four traditions named above each established part of what the field needs. Principia's claim is that the missing part is the queryable, source-graded, continuously-updated layer that sits on top of them.

If we have understated a tradition we should be positioning against, the literature map at docs/research/literature-map.md and the bibliography at docs/research/bibliography.bib are entry points. The grades are public. The disagreements are public. The verification log is public.