General-audience explainer
Public framing of the survey-as-instrument argument — what builders, practitioners, and researchers can do with a queryable, source-graded measurement registry that they could not do before.
By Mike West
The measurement library nobody built
A queryable, source-graded registry of workplace measurement — what it is, why fifty years of handbook chapters did not produce one, and what it does not yet know.
There is a meeting that happens in nearly every company of more than fifty people, once a quarter, and it goes like this. A senior executive turns to whoever is sitting in the HR seat and asks: "What's our engagement number?" The number comes back. Seventy-three. And then, depending on whether you are watching a competent organization or a typical one, one of two things happens.
The typical version is that the room agrees to behave as though seventy-three means something. The CHRO defends it. The board reviews it. The number goes up. Whether anything about the actual employees changed is a question that does not get asked, because the question that gets asked first is whether seventy-three is good. Answering that requires another number, a benchmark, which is itself a composite of other companies' seventy-threes, and none of those companies can say with confidence what its own seventy-three is measuring either.
The competent version of this meeting is rarer than you would think. In the competent version, somebody in the room can point to a piece of paper that names the specific items the number is built out of, the specific population it was collected from, the research that says those items measure what the company says they measure, and the evidence that says the thing they measure is connected — by how much, in which direction, with how much certainty — to whatever the company actually cares about.
That paper does not exist at most companies. It also, surprisingly, does not exist in the published research literature in a form anybody can actually use. There are handbooks. There are meta-analyses. There are measurement chapters in measurement chapters in measurement chapters. What there isn't, and what people analytics has been working around for forty years, is a single place a practitioner or a researcher can go to type engagement → performance into a box and get back: here are the instruments, here is what is known about each, here is the effect size from the studies that report one, here is the quality grade on each — go.
That is what Principia is trying to be. A measurement library, in the literal old-fashioned sense: rows you can look up, with sources you can trace, with grades you can argue with.
The problem the field has not named
There is a tradition in industrial-organizational psychology of producing very good handbooks. The industrial and organizational psychology volume of the Handbook of Psychology (Borman, Ilgen, and Klimoski, 2003) is one. Its second edition (Schmitt and Highhouse, 2013) is another. They are excellent books. They synthesize decades of measurement work into long, careful chapters with hundreds of citations each.
Here is the problem with handbooks. The chapter on engagement gets written once and enters print. The next chapter on engagement gets written ten or fifteen years later, by a different author, with a different theoretical lean, and it does not reliably tell you what changed. Instruments that came out in the meantime appear or do not depending on the new author's reading. Cross-cultural validation work done between editions tends to drop into a paragraph or vanish. When you actually go to use a handbook chapter — to pick an instrument, or to defend a measurement decision in a room full of skeptics — you find that the handbook tells you most of what was true on the day of the citation freeze, and almost nothing about what is true now.
This is not the handbook authors' fault. The format does not permit anything else. A book is a frozen object. The synthesis it produces is real — but the synthesis cannot be re-asked four years later with the post-2020 longitudinal evidence layered in. You would need a new book. New books are slow.
The way the field has worked around this is meta-analysis. The Hunter and Schmidt tradition takes a specific question and pools the available studies into a single estimate with confidence intervals, heterogeneity statistics, and publication-bias diagnostics. Methods of Meta-Analysis (2004) is the canonical reference; The Handbook of Research Synthesis and Meta-Analysis (Cooper, Hedges, and Valentine, 2019) is the methods textbook.
Meta-analyses are wonderful. They are also, structurally, the same problem at a different scale. A meta-analysis is written once. It freezes. The next one on the same question comes out four to seven years later, by a different team, with a different reading of the inclusion criteria. Evidence-based medicine knows this and has built elaborate machinery (the PROSPERO protocol registry, Cochrane's living systematic reviews) to keep its corner of research from drifting. The organizational-research literature has nothing comparable. The result is that any practitioner trying to make a measurement decision in 2026 is reading a meta-analysis whose search ended in 2018, citing studies whose data was collected in 2014, deploying an instrument whose validation work was done in 2009.
That is the problem nobody has quite named. Most fields with measurement traditions reinvent the instrument inventory per project. They pool effect sizes after the fact, in meta-analytic bursts that have to be redone every few years to stay current. Each researcher rebuilds, from scratch, knowledge hundreds of previous researchers already built. It is a strange amount of duplicated work for a field as mature as I-O psychology.
The bet of Principia is that the right shape for this knowledge is not a book and not a periodically refreshed meta-analysis. It is a registry. A registry is not frozen. It accumulates. New evidence enters as the literature produces it. Old evidence stays in it with its grade attached, so you can see whether the dominant claim today rests on twenty studies from 1995 or on eighty from the last decade. A registry can be queried — type a tuple, get back a row. A book cannot.
The closest analog from another field is something like UniProt for proteins, or arXiv for preprints. These are not books. They are infrastructures the field uses daily. They are queryable, source-tagged, versioned, and they compound. They are the thing the rest of the field can build on top of without having to redo the foundation work.
That is the shape Principia is reaching for. It is not yet that shape — at the time of this writing it is a partial shape, with a schema, a small handful of construct families, a public API, and a longer list of construct families that are not yet covered. We will come back to what it does not know.
What a row in the registry actually looks like
Plain English. No jargon yet.
For a construct — say, engagement — the registry stores the canonical name, the alternate names the field uses (job engagement, work engagement, employee engagement — distinct in their original sources, frequently merged in practice), the measurement-model assumptions, and a list of citations with grades attached.
For an instrument — say, the Utrecht Work Engagement Scale — the registry stores the developers, the items (the actual survey questions), the response scale, the reliability and validity evidence per population it has been deployed in, the cross-cultural adaptations, and the studies that used it to predict something else. With grades.
For an effect size — say, engagement → job performance — the registry stores every primary study Principia has ingested that reports the relationship, the effect value, the sample size, the design, the population, the country, the DOI, and the quality grade. Then it stores the synthesis: the pooled estimate across all the studies, the between-study heterogeneity, the publication-bias diagnostics, and a Bayesian prior downstream tools can plug into their own analyses.
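As a rough illustration of the rows described above, the sketch below types a citation and an effect-size row and looks rows up by tuple. Every class and field name here is hypothetical and chosen for readability; none is taken from the actual @people-analyst/measurement-core schema, and the sample row is a made-up placeholder, not real data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Citation:
    doi: str
    grade: str  # one of "A", "B", "C", "D"


@dataclass(frozen=True)
class EffectSizeRow:
    predictor: str      # canonical construct name, e.g. "engagement"
    outcome: str        # canonical construct name, e.g. "job_performance"
    effect: float       # reported effect value (scale depends on the study)
    sample_size: int
    design: str
    population: str
    country: str
    citation: Citation


@dataclass
class Registry:
    rows: list = field(default_factory=list)

    def add(self, row: EffectSizeRow) -> None:
        self.rows.append(row)

    def lookup(self, predictor: str, outcome: str) -> list:
        """Type a tuple, get back the matching rows."""
        return [r for r in self.rows
                if (r.predictor, r.outcome) == (predictor, outcome)]


registry = Registry()
# Placeholder row for illustration only; not a real study or DOI.
registry.add(EffectSizeRow(
    predictor="engagement",
    outcome="job_performance",
    effect=0.30,
    sample_size=250,
    design="cross-sectional",
    population="placeholder population",
    country="NL",
    citation=Citation(doi="10.0000/placeholder-1", grade="B"),
))

matches = registry.lookup("engagement", "job_performance")
```

The point of the frozen dataclasses is that a row, once ingested, is a fixed record with its source and grade attached; anything derived from it (a pooled estimate, a prior) is computed on top rather than written back into the row.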
That last piece — a Bayesian prior, returned by API — is the part nobody else has tried to build. The point is not to declare a single number. The point is to return a distribution, with provenance attached, that another researcher's analysis can use as informed-prior input. A grad student running a study in 2027 should not have to redo the meta-analysis to know what fifty years of literature has already shown. The prior should be a queryable artifact.
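To make "informed-prior input" concrete, here is a minimal sketch of how a downstream analysis might combine a registry prior with one new study. It assumes the prior is summarized as a normal distribution on the effect scale and that the new estimate has a known standard error; both are simplifying assumptions, the numbers are placeholders, and the names NormalPrior and update are hypothetical rather than part of the Principia API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalPrior:
    mean: float
    sd: float


def update(prior: NormalPrior, estimate: float, se: float) -> NormalPrior:
    """Normal-normal conjugate update: precision-weighted average."""
    prior_precision = 1.0 / prior.sd ** 2
    data_precision = 1.0 / se ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior.mean
                            + data_precision * estimate)
    return NormalPrior(mean=post_mean, sd=math.sqrt(post_var))


# Placeholder values, not registry output.
prior = NormalPrior(mean=0.30, sd=0.10)
posterior = update(prior, estimate=0.10, se=0.10)
```

With these placeholder inputs the prior and the new study carry equal precision, so the posterior mean lands halfway between them at 0.20, with a standard deviation smaller than either input. A small-N study with a large standard error would move the posterior much less, which is the practical value of starting from the accumulated literature instead of a flat prior.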
The grades are the discipline that makes this work. Every citation gets one of four letters: A for high-N, replicated, methodologically sound; B for solid single studies; C for methodologically caveated; D for pointer-only. Grading is conservative — if the rubric admits two grades, take the lower. The effect-size table promotes only A and B grades into headline numbers; C grades appear as caveated rows; D grades do not appear in the table at all. This is not novel. Cochrane has done it for decades. It is just not what the organizational-measurement literature has been doing.
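The two rules in that paragraph (take the lower grade when the rubric admits two, and promote only A and B into headline numbers) are simple enough to sketch directly. The function names and the section labels "headline" and "caveated" are hypothetical, used here only to illustrate the rule as stated.

```python
GRADE_ORDER = "ABCD"  # best to worst


def conservative_grade(candidates):
    """When the rubric admits more than one grade, take the lower one."""
    if not candidates:
        raise ValueError("at least one candidate grade is required")
    for grade in candidates:
        if grade not in GRADE_ORDER:
            raise ValueError(f"unknown grade: {grade!r}")
    # "Lower" means later in the A-to-D ordering.
    return max(candidates, key=GRADE_ORDER.index)


def table_section(grade):
    """Where a citation of this grade appears in the effect-size table."""
    if grade not in GRADE_ORDER:
        raise ValueError(f"unknown grade: {grade!r}")
    if grade in ("A", "B"):
        return "headline"
    if grade == "C":
        return "caveated"
    return None  # D is pointer-only and does not appear in the table


assigned = conservative_grade(["A", "B"])  # rubric admits either: take "B"
```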
The rest of the discipline is schema. Every construct, instrument, item, citation, and effect-size row is typed against the same canonical shape — a package called @people-analyst/measurement-core that other tools in this portfolio consume directly. The schema is the contract. When a sibling tool needs a field the schema does not yet have, the discussion happens at the schema level, not as a silent local extension. That is the part that takes the registry from "a database I could have built in a weekend" to "a thing the rest of the portfolio can actually rely on."
What Principia is not trying to replace
Principia does not replace the meta-analytic tradition. It complements it.
This is worth being precise about, because the easiest way to misread what we are doing is to assume we are claiming meta-analysis is obsolete. We are not. Hunter and Schmidt's work, and the broader machinery built by Cooper, Hedges, Valentine, Viechtbauer (whose metafor R package is what most modern meta-analyses run on), and a small army of careful methodologists, is what made it possible to ask "what does forty years of evidence on this say?" with any rigor at all.
What Principia adds is not a new pooling method. Principia's prior-synthesis engine is, under the hood, a fairly conventional random-effects DerSimonian–Laird meta-analysis with quality weighting, with REML and Hartung–Knapp–Sidik–Jonkman alternatives available for sensitivity. There is nothing in the engine the meta-analysis literature did not already invent.
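For readers who have not seen it written out, a plain DerSimonian-Laird random-effects pool fits in a few lines. This is a textbook sketch, not Principia's engine: it omits the quality weighting and the REML and Hartung-Knapp-Sidik-Jonkman alternatives mentioned above, it assumes each study supplies an effect estimate and its sampling variance on a common scale, and the inputs are made-up numbers.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate using the DerSimonian-Laird tau^2."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need two or more studies with matching variances")

    # Fixed-effect (inverse-variance) weights and estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q and the method-of-moments between-study variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau^2 to each study's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)
    return {"pooled": pooled, "se": se, "tau2": tau2, "q": q}


# Made-up inputs for illustration.
result = dersimonian_laird([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
```

For these three made-up studies, Q is 8 on 2 degrees of freedom, the between-study variance comes out at 0.03, and the pooled estimate is 0.30. When the studies agree closely, tau squared is truncated at zero and the result collapses to the fixed-effect estimate.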
What is new is the plumbing. The registry runs the pool continuously, against the current set of ingested rows, with the pool re-firing whenever new evidence enters. A meta-analysis is a snapshot. Principia is a standing meta-analysis — one that consumes new studies as they arrive, applies the same inclusion criteria, regrades, re-pools, and exposes the new prior at the same API endpoint, with the same shape, every time. The methods are the same. The cadence is different.
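The difference in cadence can be sketched as a pool that re-fires on every ingest and always returns a result of the same shape. This is an illustration under loud assumptions: the class and key names are hypothetical, the inclusion rule is reduced to "grades A and B only", and a simple inverse-variance fixed-effect pool stands in for the real random-effects engine to keep the example short.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StandingPool:
    included_grades: frozenset = frozenset({"A", "B"})
    rows: list = field(default_factory=list)
    version: int = 0
    prior: Optional[dict] = None

    def ingest(self, effect: float, variance: float, grade: str) -> Optional[dict]:
        """Add one study row, then re-pool against the current row set."""
        self.rows.append((effect, variance, grade))
        self.version += 1
        self._repool()
        return self.prior

    def _repool(self) -> None:
        # Same inclusion criteria applied on every run.
        eligible = [(y, v) for y, v, g in self.rows
                    if g in self.included_grades]
        if not eligible:
            self.prior = None
            return
        weights = [1.0 / v for _, v in eligible]
        total = sum(weights)
        mean = sum(w * y for w, (y, _) in zip(weights, eligible)) / total
        # Same output shape every time; only the values and version move.
        self.prior = {
            "mean": mean,
            "sd": math.sqrt(1.0 / total),
            "k": len(eligible),
            "version": self.version,
        }


pool = StandingPool()
first = pool.ingest(0.2, 0.01, "A")
second = pool.ingest(0.4, 0.01, "B")
third = pool.ingest(0.9, 0.01, "C")  # stored, but excluded from the pool
```

The C-graded row is kept in the registry but does not move the pooled mean, and the version counter still advances so a consumer can tell which state of the registry a given prior came from.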
When an existing meta-analysis is recent, well-graded, and answers the question, Principia uses it. It does not duplicate it. The registry cites it and refers downstream consumers to it. A synthesis-analytic preregistration is filed only when there is a real gap: a tuple the existing meta-analyses do not cover, a population they did not include, a time window they predate, an instrument generation they did not separate. Most of the time, the right answer is that the existing meta-analysis is the answer: here is the DOI, trust it, move on.
The complementary relationship runs the other direction too. A registry of typed, source-graded primary-study rows is the input a future meta-analyst wants. If Principia accumulates as intended, the meta-analyst of 2030 starts from the registry's effect-size table, with grades and DOIs attached, and spends attention on the synthesis decisions rather than the extraction grind. That is a real productivity gain for the field, and it costs the registry nothing — the rows have to be extracted to power the standing pool anyway.
What it does not know yet
Honest accounting.
At the time of this writing, Principia has a working schema, a working ingestion pipeline, a working public API, a working prior-synthesis engine — and approximately one construct family approaching what we would call survey-ready coverage. Engagement, the densest accumulated literature in the space and our proof-of-method, is in deep extraction. Job satisfaction and organizational commitment are queued. Burnout has skeleton coverage. Psychological safety has skeleton-of-skeleton coverage — the corpus has almost no Edmondson-scale validation work in the inventory, and what is there is mixed in with occupational-safety-climate work that is a different construct under the same English-language word.
This is what "forthcoming" looks like in practice. The schema can hold the rows, the API can return them, the registry recognizes the canonical name, and the actual extracted, graded, verified content is not yet there. A reader who looks up organizational citizenship → contextual performance today gets back a thin row with a small handful of citations and a note saying the construct family is queued for survey work in late 2026. The whole point of the grade-and-source discipline is that the registry tells you what it knows and tells you what it does not know with equal precision.
Other limits worth naming.
Coverage bias. What gets surveyed first shapes which families look mature; the current order is sequenced by literature density, not by perceived organizational importance.
Extraction error. AI-assisted extraction is fast and lossy; the mitigation is a verification log that is itself a published artifact.
Author position. This is single-author work, with the rubric public so you can argue with the grades; the planned post-v1 model opens the registry to peer-graded extensions.
Selection effects in the underlying literature. The registry inherits the publication bias of the literature it is built on; the diagnostics are run and reported, but a diagnostic is not a correction.
None of this is the language of a finished product. It is the language of a registry being built honestly in public. The alternative — a polished marketing surface that overstates coverage and hides the limits — is exactly the failure mode the field has been working around for forty years.
What it lets you actually do
Three concrete readers, three concrete cases. Not hypothetical.
The practitioner picking an instrument. A people-analytics lead is being asked to recommend an engagement instrument the company will deploy quarterly for the next several years — Gallup Q12, UWES-9, or an internally developed instrument the previous CHRO commissioned. A registry that returns, per instrument, the reliability evidence per population, the validity evidence against the outcomes the company cares about, and the deployment evidence lets the practitioner make that decision with sources in hand. Without it, the decision is whichever vendor pitched most recently.
The researcher needing a defensible prior. A graduate student running a structural equation model on engagement antecedents in a small-N organizational sample needs an informative prior for the engagement → performance path. The choices today: use a flat prior (which ignores decades of evidence), cite a single meta-analysis (better, but frozen), or hand-construct a prior (correct, but a research project in itself). A registry that returns a Bayesian prior at /v1/priors/engagement/predicts/performance, with provenance attached, lets the student plug it in and get on with the study.
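The endpoint path above follows a predictor/predicts/outcome pattern, so the tuple a researcher has in mind maps straight onto a URL. The helper below only builds that URL. The https scheme and the way the host is joined to the path are assumptions, the function name is hypothetical, and the response format is not specified here, so the sketch deliberately stops short of parsing one.

```python
from urllib.parse import quote

# Host taken from the article's footer; the https scheme is an assumption.
BASE_URL = "https://peopleprincipia.com/api"


def prior_url(predictor: str, outcome: str) -> str:
    """Build the prior endpoint URL for a predictor -> outcome tuple."""
    return (f"{BASE_URL}/v1/priors/{quote(predictor, safe='')}"
            f"/predicts/{quote(outcome, safe='')}")


url = prior_url("engagement", "performance")
```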
The author of the next handbook chapter. The team writing the 2030 edition of Handbook of Industrial and Organizational Psychology sits down to update the engagement chapter. They need the post-2020 longitudinal evidence, the cross-cultural validation work, the new instrument generations, the meta-analytic updates. A registry that holds all of that, grades it, and exposes a snapshot version they can pin their chapter to makes the chapter writable in months rather than years. The handbook is not obsolete. It is faster to write.
Practitioners, researchers, synthesizers. Each one currently spends time on work the field could have organized once and didn't.
What you should take from this
If you came wondering whether to use the registry today: probably not yet, unless your interest is engagement, in which case watch this space. Construct-family coverage is thin. Instrument coverage is thinner. The effect-size table will be sparse on most tuples through 2027. We will not pretend otherwise.
If you came wondering whether the idea is worth taking seriously: the bet has a long tail. The cost of writing one more handbook chapter is low. The cost of building a registry that the next handbook chapter can sit on top of is higher, and it is paid once. If the registry compounds — if a construct family that took a year to survey the first time takes a quarter the second time and a week the third — the productivity argument lands. The math says it should. We will know in a few years whether the math survives contact with the actual literature.
If you came from the I-O measurement literature itself — if you have written one of these handbook chapters, or meta-analyses, or instrument-validation papers — what we owe you is honesty about what this is. It is not a replacement for your work. It is a continuous index against your work, with grades attached, with the parts that hold up and the parts that have been superseded both visible. The grades are conservative. The disagreements are public.
We are early. The registry knows what it knows. It knows what it does not know. If your favorite construct is not in here yet, file a request. If your favorite instrument got graded down, argue. If you find a bad row, the verification log is public — point at it. The registry gets better when the field shows up to it, which is the whole point of building it in public in the first place.
API live at peopleprincipia.com/api/v1/*. Methodology at docs/research/methodology.md.