General-audience explainer

Public framing of the survey-as-instrument argument — what builders, practitioners, and researchers can do with a queryable, source-graded measurement registry that they could not do before.

By Mike West

Principia · Audience tiers · General audience · source: people-analyst/principia/docs/research/reviews/general-audience-explainer.md

The measurement library nobody built

A queryable, source-graded registry of workplace measurement — what it is, why fifty years of handbook chapters did not produce one, and what it does not yet know.

There is a meeting that happens in approximately every company over fifty people, once a quarter, and it goes like this. A senior executive turns to whoever is sitting in the HR seat and asks: "What's our engagement number?" The number comes back. Seventy-three. And then — depending on whether you are watching a competent organization or a typical one — one of two things happens.

The typical version is that the room agrees to behave as though seventy-three means something. The CHRO defends it. The board reviews it. The number goes up. Whether anything about the actual employees changed is a question that does not get asked, because the question that gets asked first is whether seventy-three is good — and the answer requires another number, a benchmark, which is itself a composite of other companies' seventy-threes, none of whom can tell you with confidence what their seventy-three is measuring either.

The competent version of this meeting is rarer than you would think. In the competent version, somebody in the room can point to a piece of paper that names the specific items the number is built out of, the specific population it was collected from, the research that says those items measure what the company says they measure, and the evidence that says the thing they measure is connected — by how much, in which direction, with how much certainty — to whatever the company actually cares about.

That paper does not exist at most companies. It also, surprisingly, does not exist in the published research literature in a form anybody can actually use. There are handbooks. There are meta-analyses. There are measurement chapters in measurement chapters in measurement chapters. What there isn't, and what people analytics has been working around for forty years, is a single place a practitioner or a researcher can go to type engagement → performance into a box and get back: here are the instruments, here is what is known about each, here is the effect size from the studies that report one, here is the quality grade on each — go.

That is what Principia is trying to be. A measurement library, in the literal old-fashioned sense: rows you can look up, with sources you can trace, with grades you can argue with.

The problem the field has not named

There is a tradition in industrial-organizational psychology of producing very good handbooks. The Handbook of Psychology (Borman, Ilgen, and Klimoski, 2003) is one. Handbook of Industrial and Organizational Psychology (Schmitt and Highhouse, 2013) is another. They are excellent books. They synthesize decades of measurement work into long, careful chapters with hundreds of citations each.

Here is the problem with handbooks. The chapter on engagement gets written once and enters print. The next chapter on engagement gets written ten or fifteen years later, by a different author, with a different theoretical lean, and it does not reliably tell you what changed. Instruments that came out in the meantime appear or do not depending on the new author's reading. Cross-cultural validation work done between editions tends to drop into a paragraph or vanish. When you actually go to use a handbook chapter — to pick an instrument, or to defend a measurement decision in a room full of skeptics — you find that the handbook tells you most of what was true on the day of the citation freeze, and almost nothing about what is true now.

This is not the handbook authors' fault. The format does not permit anything else. A book is a frozen object. The synthesis it produces is real — but the synthesis cannot be re-asked four years later with the post-2020 longitudinal evidence layered in. You would need a new book. New books are slow.

The way the field has worked around this is meta-analysis. The Hunter and Schmidt tradition takes a specific question and pools the available studies into a single estimate with confidence intervals, heterogeneity statistics, and publication-bias diagnostics. Methods of Meta-Analysis (2004) is the canonical reference; The Handbook of Research Synthesis and Meta-Analysis (Cooper, Hedges, and Valentine, 2019) is the methods textbook.

Meta-analyses are wonderful. They are also, structurally, the same problem at a different scale. A meta-analysis is written once. It freezes. The next one on the same question comes out four to seven years later, by a different team, with a different inclusion-criteria reading. The Cochrane reviewers know this and have built elaborate machinery (PROSPERO, living systematic reviews) to keep their corner of medicine from drifting. The organizational-research literature has nothing comparable. The result is that any practitioner trying to make a measurement decision in 2026 is reading a meta-analysis whose search ended in 2018, citing studies whose data was collected in 2014, deploying an instrument whose validation work was done in 2009.

That is the problem nobody has quite named. Most fields with measurement traditions reinvent the instrument inventory per project. They pool effect sizes after the fact, in meta-analytic bursts that have to be redone every few years to stay current. Each researcher rebuilds, from scratch, knowledge hundreds of previous researchers already built. It is a strange amount of duplicated work for a field as mature as I-O psychology.

The bet of Principia is that the right shape for this knowledge is not a book and not a periodically refreshed meta-analysis. It is a registry. A registry is not frozen. It accumulates. New evidence enters as the literature produces it. Old evidence stays in it with its grade attached, so you can see whether the dominant claim today rests on twenty studies from 1995 or on eighty from the last decade. A registry can be queried — type a tuple, get back a row. A book cannot.

The closest analog from another field is something like UniProt for proteins, or arXiv for preprints. These are not books. They are infrastructures the field uses daily. They are queryable, source-tagged, versioned, and they compound. They are the thing the rest of the field can build on top of without having to redo the foundation work.

That is the shape Principia is reaching for. It is not yet that shape. At the time of this writing it is a partial shape: a schema, a public API, coverage of the main construct families that is broad but often shallow, and a long tail of narrower construct families that are not yet covered. We will come back to what it does not know.

What a row in the registry actually looks like

Plain English. No jargon yet.

For a construct — say, engagement — the registry stores the canonical name, the alternate names the field uses (job engagement, work engagement, employee engagement — distinct in their original sources, frequently merged in practice), the measurement-model assumptions, and a list of citations with grades attached.

For an instrument — say, the Utrecht Work Engagement Scale — the registry stores the developers, the items (the actual survey questions), the response scale, the reliability and validity evidence per population it has been deployed in, the cross-cultural adaptations, and the studies that used it to predict something else. With grades.

For an effect size — say, engagement → job performance — the registry stores every primary study Principia has ingested that reports the relationship, the effect value, the sample size, the design, the population, the country, the DOI, and the quality grade. Then it stores the synthesis: the pooled estimate across all the studies, the between-study heterogeneity, the publication-bias diagnostics, and a Bayesian prior downstream tools can plug into their own analyses.

That last piece — a Bayesian prior, returned by API — is the part nobody else has tried to build. The point is not to declare a single number. The point is to return a distribution, with provenance attached, that another researcher's analysis can use as informed-prior input. A grad student running a study in 2027 should not have to redo the meta-analysis to know what fifty years of literature has already shown. The prior should be a queryable artifact.
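To make "informed-prior input" concrete, here is a minimal sketch of how a downstream analysis might combine a returned prior with a new study. It assumes the prior can be summarized as a normal distribution (a mean and a standard deviation), which the text above does not specify. The function name and all numbers are illustrative, not registry output.

```python
import math
from typing import Tuple


def update_normal_prior(prior_mean: float, prior_sd: float,
                        estimate: float, std_error: float) -> Tuple[float, float]:
    """Conjugate normal-normal update: returns (posterior_mean, posterior_sd)."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / std_error ** 2
    post_precision = prior_precision + data_precision
    # The posterior mean is a precision-weighted average of prior and data.
    post_mean = (prior_precision * prior_mean
                 + data_precision * estimate) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)


# Illustrative numbers only: a prior centred at 0.40 (sd 0.10) meets a
# small study that reports 0.10 with a standard error of 0.20.
post_mean, post_sd = update_normal_prior(0.40, 0.10, 0.10, 0.20)
print(round(post_mean, 3), round(post_sd, 3))
```

The small study pulls the estimate down only modestly, because the prior carries four times its precision. That is the practical meaning of not having to redo the meta-analysis.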

The grades are the discipline that makes this work. Every citation gets one of four letters: A for high-N, replicated, methodologically sound; B for solid single studies; C for methodologically caveated; D for pointer-only. Grading is conservative — if the rubric admits two grades, take the lower. The effect-size table promotes only A and B grades into headline numbers; C grades appear as caveated rows; D grades do not appear in the table at all. This is not novel. Cochrane has done it for decades. It is just not what the organizational-measurement literature has been doing.
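The two rules in that paragraph (take the lower grade when the rubric admits two, and promote only A and B into headline numbers) can be sketched in a few lines. The function names and the placement labels are hypothetical, chosen for illustration; the actual rubric lives in the methodology document.

```python
from typing import List

GRADE_ORDER = "ABCD"  # best to worst


def conservative_grade(candidates: List[str]) -> str:
    """When the rubric admits several grades, take the lowest (worst) one."""
    return max(candidates, key=GRADE_ORDER.index)


def table_placement(grade: str) -> str:
    """Where a citation lands in the effect-size table, per the rules above."""
    if grade in ("A", "B"):
        return "headline"
    if grade == "C":
        return "caveated"
    if grade == "D":
        return "excluded"
    raise ValueError(f"unknown grade: {grade!r}")


# A citation that could defensibly be graded A or B is recorded as B.
grade = conservative_grade(["A", "B"])
print(grade, table_placement(grade))
```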

The rest of the discipline is schema. Every construct, instrument, item, citation, and effect-size row is typed against the same canonical shape — a package called @people-analyst/measurement-core that other tools in this portfolio consume directly. The schema is the contract. When a sibling tool needs a field the schema does not yet have, the discussion happens at the schema level, not as a silent local extension. That is the part that takes the registry from "a database I could have built in a weekend" to "a thing the rest of the portfolio can actually rely on."
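As a rough picture of what a typed effect-size row means in practice, here is a minimal Python sketch. The class and field names are assumptions drawn from the prose description above. They are not the actual @people-analyst/measurement-core schema, and the example values are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EffectSizeRow:
    """One primary-study row. Field names are hypothetical, not the real schema."""
    predictor: str        # e.g. "engagement"
    outcome: str          # e.g. "job_performance"
    effect: float         # reported effect value (here, a correlation)
    n: int                # sample size
    design: str           # e.g. "cross-sectional", "longitudinal"
    population: str
    country: str
    doi: Optional[str]    # None when the source has no DOI
    grade: str            # "A", "B", "C", or "D"

    def __post_init__(self) -> None:
        # Reject rows that break the contract instead of storing them silently.
        if self.grade not in {"A", "B", "C", "D"}:
            raise ValueError(f"unknown grade: {self.grade!r}")
        if self.n <= 0:
            raise ValueError("sample size must be positive")


# Illustrative values only; not taken from the registry.
row = EffectSizeRow(
    predictor="engagement", outcome="job_performance", effect=0.30, n=250,
    design="cross-sectional", population="nurses", country="NL",
    doi=None, grade="B",
)
```

The validation in `__post_init__` is the small-scale version of "the schema is the contract": a row that does not fit the shape fails loudly at creation time.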

The number on the wall

Go back to the executive asking for the engagement number. The registry as described so far could not quite answer that question, because seventy-three is not a construct. It is an operational metric: a number computed from a survey administration (or, for attrition, from an HRIS export), living in a different world from the latent thing it gestures at. Constructs are inferred; metrics are calculated. They do not collapse into the same table.

Principia now holds that second world too, and attaches it to the first. The metrics organizations actually report — the engagement score, the attrition rate, time-to-fill, the pay gap, absenteeism, quality-of-hire, around a hundred of them — are first-class rows that operationalize the constructs beneath them. Which means the two questions in that quarterly meeting finally have sourced answers instead of vibes.

What is seventy-three measuring? → the construct it operationalizes, the specific items underneath it, the reliability and validity evidence for reading those items as that construct.

Is seventy-three good? → two things. A per-item reference range, because items have wildly different baselines — a middling compensation score may be perfectly typical and a high "my manager respects me" score merely average — so a raw number only becomes a judgment against a benchmark. And then the part that actually earns the meeting: why it matters — the evidenced links from that construct to the outcomes the company says it cares about. Engagement is not interesting because it is seventy-three; it is interesting because the registry can show you, with effect sizes and the studies behind them, that engagement predicts task performance (around ρ = .49 across six meta-analyses and eighty thousand people) and tracks turnover and absence. The metric is a door; what it matters for is on the other side of it.
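The per-item reference range can be illustrated with a small sketch. It assumes a benchmark is expressed as a 25th-to-75th percentile band per item, which is one plausible representation and not necessarily the registry's. The class name, item labels, and numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceRange:
    """Hypothetical per-item benchmark: the 25th and 75th percentile scores."""
    item: str
    p25: float
    p75: float

    def judge(self, score: float) -> str:
        """Read a raw score against this item's own baseline."""
        if score < self.p25:
            return "below typical range"
        if score > self.p75:
            return "above typical range"
        return "typical"


# Illustrative ranges only: items have very different baselines.
compensation = ReferenceRange("satisfied with pay", p25=45.0, p75=60.0)
respect = ReferenceRange("my manager respects me", p25=78.0, p75=90.0)

print(compensation.judge(55.0))  # a middling raw score, typical for this item
print(respect.judge(85.0))       # a high raw score, still only typical
print(respect.judge(58.0))       # the same middling score is low for this item
```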

And the whole chain is walkable, in both directions and all the way down: metric → the construct it measures → what predicts or follows that construct → the effect size → the citation → and, newest, a verdict on how the literature itself receives that source (does the citing literature support it, dispute it, or merely mention it). That last layer is the answer to the only question a skeptic in that meeting actually has — how do you know? You can hand them the sentence the literature wrote, not just a number you assert. Building that evidence layer so it can be shown publicly, sourced from open-access papers rather than rented from a vendor, is work in progress; the verdicts cover most of the corpus already.

What Principia is not trying to replace

Principia does not replace the meta-analytic tradition. It complements it.

This is worth being precise about, because the easiest way to misread what we are doing is to assume we are claiming meta-analysis is obsolete. We are not. Hunter and Schmidt's work, and the broader machinery built by Cooper, Hedges, Valentine, Viechtbauer (whose metafor R package is what most modern meta-analyses run on), and a small army of careful methodologists, is what made it possible to ask "what does forty years of evidence on this say?" with any rigor at all.

What Principia adds is not a new pooling method. Principia's prior-synthesis engine is, under the hood, a fairly conventional random-effects DerSimonian–Laird meta-analysis with quality weighting, with REML and Hartung–Knapp–Sidik–Jonkman alternatives available for sensitivity. There is nothing in the engine the meta-analysis literature did not already invent.
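For readers who have not seen it written out, the following is a minimal sketch of textbook DerSimonian–Laird random-effects pooling. It is not Principia's engine: the quality weighting, REML, and Hartung–Knapp–Sidik–Jonkman options mentioned above are omitted, and the function name and inputs are assumptions for illustration.

```python
import math
from typing import List, Tuple


def dersimonian_laird(effects: List[float],
                      variances: List[float]) -> Tuple[float, float, float]:
    """Random-effects pooling; returns (pooled_estimate, standard_error, tau_squared)."""
    k = len(effects)
    if k != len(variances) or k < 2:
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]                    # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)                  # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]      # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2


# Illustrative inputs: two studies that disagree, with equal sampling variance.
pooled, se, tau2 = dersimonian_laird([0.0, 1.0], [0.1, 0.1])
print(pooled, se, tau2)
```

When the studies disagree by more than sampling error alone would explain, the estimated between-study variance is positive and the standard error widens accordingly. That widening is the heterogeneity the registry reports alongside each pooled estimate.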

What is new is the plumbing. The registry runs the pool continuously, against the current set of ingested rows, with the pool re-firing whenever new evidence enters. A meta-analysis is a snapshot. Principia is a standing meta-analysis — one that consumes new studies as they arrive, applies the same inclusion criteria, regrades, re-pools, and exposes the new prior at the same API endpoint, with the same shape, every time. The methods are the same. The cadence is different.

When an existing meta-analysis is recent, well-graded, and answers the question, Principia uses it. It does not duplicate it. The registry cites it and refers downstream consumers to it. A synthesis-analytic preregistration is filed only when there is a real gap: a tuple the existing meta-analyses do not cover, a population they did not include, a time window they predate, an instrument generation they did not separate. Most of the time, the right answer is: the existing meta-analysis is the answer; here is the DOI; trust it; move on.

The complementary relationship runs the other direction too. A registry of typed, source-graded primary-study rows is the input a future meta-analyst wants. If Principia accumulates as intended, the meta-analyst of 2030 starts from the registry's effect-size table, with grades and DOIs attached, and spends attention on the synthesis decisions rather than the extraction grind. That is a real productivity gain for the field, and it costs the registry nothing — the rows have to be extracted to power the standing pool anyway.

What it does not know yet

Honest accounting.

At the time of this writing, Principia has a working schema, ingestion pipeline, public API, and prior-synthesis engine — and roughly five hundred synthesized priors across about a hundred and thirty construct families: leadership, organizational justice, engagement and the job-demands–resources tradition, commitment, turnover and its antecedents, personality and selection validity, safety, compensation, diversity. The spine of the field is, for the first time, actually covered. A reader who looks up organizational citizenship → contextual performance today gets a real row, not a placeholder.

So the honest accounting has moved. The limit is no longer breadth; it is depth. Most of those five hundred priors currently rest on a single meta-analysis rather than several, which means the honest reading of a row is often "one good pooled estimate" rather than "the settled consensus of the field." A prior built on one meta-analysis and a prior built on six look similar in the API and are not similar in what they license you to believe. The registry shows you which one you are holding (the k, the heterogeneity), but the work of adding the second and third independent meta-analysis to each relationship, so the credibility interval earns its name, is ongoing, not done. That is the current frontier, and the registry is precise about which relationships have crossed it and which have not.

Other limits worth naming.

Coverage bias. What gets surveyed first shapes which families look mature; the current order is sequenced by literature density, not by perceived organizational importance.

Extraction error. AI-assisted extraction is fast and lossy; the mitigation is a verification log that is itself a published artifact.

Author position. This is single-author work, with the rubric public so you can argue with the grades; the planned post-v1 model opens the registry to peer-graded extensions.

Selection effects in the underlying literature. The registry inherits the publication bias of the literature it is built on; the diagnostics are run and reported, but a diagnostic is not a correction.

None of this is the language of a finished product. It is the language of a registry being built honestly in public. The alternative — a polished marketing surface that overstates coverage and hides the limits — is exactly the failure mode the field has been working around for forty years.

What it lets you actually do

Three concrete readers, three concrete cases. Not hypothetical.

The practitioner picking an instrument. A people-analytics lead is being asked to recommend an engagement instrument the company will deploy quarterly for the next several years — Gallup Q12, UWES-9, or an internally developed instrument the previous CHRO commissioned. A registry that returns, per instrument, the reliability evidence per population, the validity evidence against the outcomes the company cares about, and the deployment evidence lets the practitioner make that decision with sources in hand. Without it, the decision is whichever vendor pitched most recently.

The researcher needing a defensible prior. A graduate student running a structural equation model on engagement antecedents in a small-N organizational sample needs an informative prior for the engagement → performance path. The choices today: flat prior (ignores 40 years of evidence), cite a single meta-analysis (better, but frozen), or hand-construct one (correct, but a research project in itself). A registry that returns a Bayesian prior at /v1/priors/engagement/predicts/performance, with provenance attached, lets the student plug it in and get on with the study.
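A query for that prior might look roughly like the sketch below. The path pattern and host come from this page. The https scheme, the JSON response format, and the helper names are assumptions; the actual response fields are not documented here, so the sketch returns the decoded payload without interpreting it.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://peopleprincipia.com/api"  # https scheme is an assumption


def prior_url(predictor: str, outcome: str, relation: str = "predicts") -> str:
    """Build the prior endpoint URL following the path pattern quoted above."""
    parts = (quote(p, safe="") for p in (predictor, relation, outcome))
    return f"{BASE_URL}/v1/priors/" + "/".join(parts)


def fetch_prior(predictor: str, outcome: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the response, assuming the API returns a JSON object."""
    url = prior_url(predictor, outcome)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


print(prior_url("engagement", "performance"))
# fetch_prior("engagement", "performance") would perform the network call.
```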

The author of the next handbook chapter. The team writing the 2030 edition of Handbook of Industrial and Organizational Psychology sits down to update the engagement chapter. They need the post-2020 longitudinal evidence, the cross-cultural validation work, the new instrument generations, the meta-analytic updates. A registry that holds all of that, grades it, and exposes a snapshot version they can pin their chapter to makes the chapter writeable in months rather than years. The handbook is not obsolete. It is faster.

Practitioners, researchers, synthesizers. Each one currently spends time on work the field could have organized once and didn't.

What you should take from this

If you came wondering whether to use the registry today: for the covered spine (leadership, justice, engagement, commitment, turnover, selection validity, safety, compensation, diversity), yes, you can query a prior and get a sourced distribution back now. Where it is still thin is depth (many relationships rest on a single meta-analysis) and the long tail of narrower construct families. The registry tells you which is which, on every row. We will not pretend a prior resting on a single meta-analysis is a consensus.

If you came wondering whether the idea is worth taking seriously: the bet has a long tail. The cost of writing one more handbook chapter is low. The cost of building a registry that the next handbook chapter can sit on top of is higher, and it is paid once. If the registry compounds — if a construct family that took a year to survey the first time takes a quarter the second time and a week the third — the productivity argument lands. The math says it should. We will know in a few years whether the math survives contact with the actual literature.

If you came from the I-O measurement literature itself — if you have written one of these handbook chapters, or meta-analyses, or instrument-validation papers — what we owe you is honesty about what this is. It is not a replacement for your work. It is a continuous index against your work, with grades attached, with the parts that hold up and the parts that have been superseded both visible. The grades are conservative. The disagreements are public.

We are early. The registry knows what it knows. It knows what it does not know. If your favorite construct is not in here yet, file a request. If your favorite instrument got graded down, argue. If you find a bad row, the verification log is public — point at it. The registry gets better when the field shows up to it, which is the whole point of building it in public in the first place.

API live at peopleprincipia.com/api/v1/*. Methodology at docs/research/methodology.md.