You Can't Compute With a PDF
The number she needed existed. That was the maddening part.
An analyst is building the case for a hiring change, and she needs one figure: how much does a structured interview actually predict job performance, on average, across the studies that have asked? It is one of the most-researched questions in her field. The answer has been estimated, re-estimated, meta-analyzed, and argued over for forty years. It is known. And it is sitting in a results table on page nineteen of a paper she's looking at through a university proxy, a single corrected correlation with a confidence interval beside it — visible, citable, and completely inert. She cannot drop it into her model. She cannot fuse it with her own data. She can read it and retype it, by hand, hoping she copied the right row.
So she does what nearly everyone does at that moment: she gives up on the literature and uses a round number she half-remembers. The most studied question in the field, and the decision gets made on a guess — not because the evidence is missing, but because the evidence is unusable. You cannot compute with a PDF.
That gap — between research that exists and research you can use — is the subject of this essay, and closing it is a different kind of work than producing more research. It's the difference between a catalog and a calculator.
They say it's in the literature
Raise the problem with most people and they'll point you to the catalog. It's in the handbooks. It's in the review articles. It's a Google Scholar search away. The research is out there — which is true, and beside the point.
A catalog tells you what is known. It does not hand you something you can act with. The handbook entry on a construct tells you what it is, how it's measured, and roughly what the field believes — beautifully, authoritatively, and in prose. None of that enters a model. None of it tightens an estimate. When the moment comes to actually use a finding — to set a prior, size a study, weight a predictor, answer "how do we know" in a meeting — the catalog leaves you exactly where the analyst was: looking at a number you can see but can't compute with.
The reference shelf was built for a reading animal, not a computing one. That was the right design for a century in which the consumer of research was a person turning pages. It is the wrong design for the moment we're in, where the consumer is increasingly a model, an analysis, a pipeline — something that needs the number as data, not as a sentence.
The number on the page is three kinds of broken
Even when you do retype the figure off the page, what you've got is weaker than it looks. A published estimate, lifted into your spreadsheet, is broken in three ways.
It's frozen. It was true as of the studies available when that paper went to press, and the literature has not stopped moving since. The handbook synthesizes once and sets, like concrete, at its publication date; everything discovered after is in a different book you haven't opened.
It's naked. What you copied is usually a point estimate — a single number — stripped of the uncertainty that is the most important thing about it. An effect of "about a third" pooled from sixty tight studies and the same number pooled from three noisy ones are different facts, and the bare figure hides which one you have.
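The arithmetic behind "different facts" is short. This is a sketch under simplifying assumptions that are ours, not the essay's: a fixed-effect, inverse-variance pool in which every study reports the same estimate with the same standard error $s$. The standard errors (0.05 and 0.20) are invented purely for illustration.

```latex
\begin{aligned}
\mathrm{SE}_{\text{pooled}} &= \Bigl(\textstyle\sum_{i=1}^{k} s_i^{-2}\Bigr)^{-1/2} = \frac{s}{\sqrt{k}} \quad \text{when every } s_i = s,\\
k = 60,\ s = 0.05:&\quad \mathrm{SE} = 0.05/\sqrt{60} \approx 0.0065,\quad \text{95\% interval} \approx \text{estimate} \pm 0.013,\\
k = 3,\ s = 0.20:&\quad \mathrm{SE} = 0.20/\sqrt{3} \approx 0.115,\quad \text{95\% interval} \approx \text{estimate} \pm 0.23.
\end{aligned}
```

Same point estimate, but the second interval is roughly eighteen times wider. Real literatures are rarely this tidy; when studies genuinely differ, a random-effects model is usually more appropriate, and it widens both intervals.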
And it's biased in a direction you can't see by looking. The published record over-represents the studies that found something. This is the file-drawer problem, named decades ago: null results quietly never get written up, so the literature you can read is a survivor's account of the experiments that worked.[1] On top of that, the first published estimate of an effect tends to be the inflated one: the winner's curse of discovery, the reason so many findings shrink when someone finally repeats them.[2] A naive catalog launders all of this. It presents the surviving, inflated, frozen point estimate with the authority of print, and you compute against it as if it were the truth.
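A small simulation shows how the file drawer inflates an estimate without any single study being dishonest. Everything in it is an assumption for illustration only: a true effect of 0.2, small studies with a standard error of 0.2, and a rule that only significant, positive results get published. The function name `simulate_file_drawer` is hypothetical.

```python
import random
import statistics

# Hypothetical simulation parameters, chosen for illustration only.
TRUE_EFFECT = 0.2   # the effect every simulated study is actually measuring
STD_ERROR = 0.2     # sampling error of each (small) study
CRITICAL_Z = 1.96   # two-sided p < .05 threshold


def simulate_file_drawer(n_studies=10_000, seed=1):
    """Run many studies of one true effect and 'publish' only the significant positive ones.

    Returns (mean of all estimates, mean of published estimates, number published).
    """
    rng = random.Random(seed)
    all_estimates = [rng.gauss(TRUE_EFFECT, STD_ERROR) for _ in range(n_studies)]
    # The file drawer: anything not significant in the expected direction stays unpublished.
    published = [y for y in all_estimates if y / STD_ERROR > CRITICAL_Z]
    return statistics.mean(all_estimates), statistics.mean(published), len(published)


everything, survivors, k = simulate_file_drawer()
print(f"mean of all studies:       {everything:.3f}")
print(f"mean of published studies: {survivors:.3f} (k = {k})")
```

By construction every published estimate clears 1.96 × 0.2 ≈ 0.39, so the published mean sits well above the true 0.2. The first significant study in such a literature is drawn from that filtered tail, which is the winner's curse in miniature.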
Catalog and calculator
Here is the turn. The fix is not a better-organized shelf. It's to stop cataloging the literature and start computing it.
The first people to make that move were the meta-analysts. Faced with fifty studies that all measured the same thing and disagreed, they refused to just list them. They pooled them, weighted by precision, corrected for the artifacts that bias any single study, and produced one synthesized estimate with its uncertainty attached.[3] That was the conceptual leap: a body of research is not a bibliography to be read but a dataset to be computed. Psychometric meta-analysis took it further, correcting the pooled estimate for the measurement unreliability and range restriction that attenuate every individual study.[4]
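The psychometric steps can be sketched in a few lines. This follows the general shape of a Hunter & Schmidt analysis (a sample-size-weighted mean, then corrections for criterion unreliability and for direct range restriction via Thorndike's Case II formula), but it is a minimal sketch, not the full method: a real analysis also corrects the variance, may need an indirect range-restriction model, and chooses the order of corrections carefully. The function names and all numbers (the three studies, `r_yy = 0.60`, `u = 1.5`) are hypothetical.

```python
import math


def bare_bones_mean(correlations, sample_sizes):
    """Sample-size-weighted mean correlation, the 'bare-bones' first step."""
    if len(correlations) != len(sample_sizes) or not correlations:
        raise ValueError("need one sample size per correlation, and at least one study")
    total_n = sum(sample_sizes)
    return sum(n * r for r, n in zip(correlations, sample_sizes)) / total_n


def correct_criterion_unreliability(r, r_yy):
    """Disattenuate for measurement error in the criterion only: r / sqrt(r_yy)."""
    return r / math.sqrt(r_yy)


def correct_range_restriction(r, u):
    """Thorndike Case II correction for direct range restriction on the predictor.

    u = predictor SD in the applicant pool / predictor SD among those hired (u >= 1).
    """
    return (u * r) / math.sqrt(1 + r**2 * (u**2 - 1))


# Invented inputs for three hypothetical studies (not figures from any real literature).
observed = bare_bones_mean([0.20, 0.30, 0.25], [100, 50, 150])
# One common ordering: fix criterion unreliability first, then range restriction.
operational = correct_range_restriction(
    correct_criterion_unreliability(observed, r_yy=0.60), u=1.5
)
print(f"observed mean r = {observed:.3f}, corrected estimate = {operational:.3f}")
```

Both corrections push the estimate up, because unreliability and range restriction both attenuate an observed correlation toward zero.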
A registry of record is meta-analysis made queryable and live. It takes the synthesized estimate (a distribution, with its credible interval and its sources riding along) and makes it something you can hit over an API, drop straight into your model, and fuse with your own data the way the local-validity loop does. The catalog answers "What is known about this?" The calculator answers the question the analyst actually had: "Give me the number, with its uncertainty, in a form I can compute with, now."
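One common way to "fuse" a prior with local data is a normal-normal conjugate update, which is a precision-weighted average. The essay does not say which method the registry or the local-validity loop uses, so treat this as an assumption-laden sketch: both the prior and the local estimate are taken as normal, the correlation is handled on its raw scale (a Fisher z transform would be more careful), and the names `NormalPrior` and `fuse` and all numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalPrior:
    mean: float
    sd: float


def fuse(prior: NormalPrior, local_estimate: float, local_se: float) -> NormalPrior:
    """Normal-normal conjugate update: precision-weighted average of prior and local estimate."""
    prior_precision = 1.0 / prior.sd**2
    local_precision = 1.0 / local_se**2
    posterior_precision = prior_precision + local_precision
    posterior_mean = (
        prior_precision * prior.mean + local_precision * local_estimate
    ) / posterior_precision
    return NormalPrior(posterior_mean, math.sqrt(1.0 / posterior_precision))


# Invented numbers: a registry prior and a noisier local validation estimate.
posterior = fuse(NormalPrior(mean=0.40, sd=0.05), local_estimate=0.20, local_se=0.10)
print(posterior)
```

The posterior lands at 0.36, closer to the prior because the prior is four times as precise, and its SD (about 0.045) is tighter than either input on its own.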
That is the whole distinction. A reference work catalogs the research. A registry computes with it.
Provenance is the difference between citable and convenient
There's an obvious objection, and it's the right one: a number you can compute with instantly is dangerous if you can't trust where it came from. Convenience without provenance is how you get confidently wrong, fast — and in an age where a language model will cheerfully invent a plausible effect size to fill a gap, that danger is now the default.
So the computing has to carry its receipts. Every figure in a registry of record has to trace back through the synthesis to the studies to the specific table it was read from — source-graded, citation-verified, followable. And when there is no defensible prior for something, the registry has to say so: a first-class, honest "no prior available" state, never a guessed number dressed as a finding. That discipline — that the system structurally cannot hand you a figure it can't source — is what separates a registry you can cite in the meeting where it gets challenged from one that's merely faster to be wrong with.
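"Structurally cannot hand you a figure it can't source" can be expressed in the types themselves. The sketch below is one hypothetical way to do it, not the registry's actual design: a `Prior` refuses to exist without at least one source, and a lookup miss returns a distinct `NoPrior` value rather than a default number, so callers have to handle absence explicitly. All class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class SourceRef:
    """Where a figure was read from: the study and the specific table."""
    citation: str
    table: str


@dataclass(frozen=True)
class Prior:
    mean: float
    sd: float
    sources: Tuple[SourceRef, ...]

    def __post_init__(self):
        # Structural rule: a figure that cannot be traced to a source cannot exist as a Prior.
        if not self.sources:
            raise ValueError("a Prior must carry at least one source")


@dataclass(frozen=True)
class NoPrior:
    """First-class 'no prior available' result, returned instead of a guessed number."""
    construct: str
    reason: str


LookupResult = Union[Prior, NoPrior]


def lookup(registry: Dict[str, Prior], construct: str) -> LookupResult:
    prior = registry.get(construct)
    if prior is None:
        return NoPrior(construct, "no defensible prior in the registry")
    return prior


result = lookup({}, "structured interview -> job performance")
if isinstance(result, NoPrior):
    print(f"{result.construct}: {result.reason}")
```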
It keeps moving
The last difference is the one the analyst feels six months later. The handbook she didn't use is already going stale; the field published three relevant studies the quarter after it shipped, and the book will not mention them until its next edition, years out. A catalog is a photograph of what was known on a date. A registry of record is curated continuously — it keeps absorbing the literature as the literature arrives, so the prior you pull next year is sharper than the one you pull today, not older. Evidence that updates is worth more than evidence that's merely correct-as-of-press-time, because the decisions don't stop coming.
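Absorbing studies as they arrive does not require re-running the whole synthesis. Under a fixed-effect, inverse-variance model, the pool is just two running sums. This is a hypothetical sketch (`RunningPool` and the study values are invented), and the guarantee it shows is model-specific: with fixed effects the interval can only narrow as studies arrive, whereas a random-effects model, usually more realistic when studies differ, can widen if new studies disagree with the old ones.

```python
import math


class RunningPool:
    """Fixed-effect inverse-variance pool that absorbs studies as they are published."""

    def __init__(self):
        self.sum_w = 0.0    # running sum of weights (1 / SE^2)
        self.sum_wy = 0.0   # running sum of weight * estimate
        self.k = 0          # number of studies absorbed so far

    def add(self, estimate, std_error):
        w = 1.0 / std_error**2
        self.sum_w += w
        self.sum_wy += w * estimate
        self.k += 1

    def summary(self):
        """Return (pooled estimate, pooled SE); refuse to answer before any study arrives."""
        if self.k == 0:
            raise ValueError("no studies absorbed yet")
        return self.sum_wy / self.sum_w, math.sqrt(1.0 / self.sum_w)


# Invented studies arriving over time.
pool = RunningPool()
history = []
for estimate, se in [(0.25, 0.10), (0.35, 0.12), (0.30, 0.08)]:
    pool.add(estimate, se)
    history.append(pool.summary())

for step, (mean, se) in enumerate(history, start=1):
    print(f"after study {step}: estimate {mean:.3f}, SE {se:.3f}")
```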
What you can finally do
Go back to the analyst with the hiring case. In the catalog world she retyped a half-remembered round number and hoped. In the calculator world she queries the construct, gets the synthesized prior with its interval and its citations, drops it into her model on day one — on a smaller sample than she'd otherwise need, because the prior is doing real work — and when someone asks "how do you know," she follows the figure back through the synthesis to the studies, in front of them. The AI sitting beside her answers with that number instead of inventing one. And the whole apparatus gets more credible as more evidence lands, not less.
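Why a prior means a smaller local sample can be made concrete. The sketch assumes the analyst works on the Fisher z scale, where a correlation from n cases has standard error 1/sqrt(n − 3), and that the registry prior is normal on that scale and actually relevant to her setting. The function name, the target SD of 0.125, and the prior SD of 0.25 are all hypothetical.

```python
import math


def local_n_needed(target_sd, prior_sd=None):
    """Local sample size needed for the Fisher-z posterior SD to reach target_sd.

    Uses SE(z) = 1 / sqrt(n - 3) for the local correlation and a normal prior on z.
    With no prior, this is the smallest n for which 1 / sqrt(n - 3) <= target_sd.
    """
    prior_precision = 0.0 if prior_sd is None else 1.0 / prior_sd**2
    needed_local_precision = 1.0 / target_sd**2 - prior_precision
    if needed_local_precision <= 0:
        return 0  # the prior alone already meets the target
    return math.ceil(needed_local_precision + 3)


without_prior = local_n_needed(0.125)
with_prior = local_n_needed(0.125, prior_sd=0.25)
print(without_prior, with_prior)
```

Under these made-up settings she needs 67 cases on her own but 51 with the prior, because the prior supplies part of the precision the target demands.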
The research was never the bottleneck. It exists; it has existed for decades, accumulating in results tables on page nineteen of papers nobody can compute with. The bottleneck was that we built the evidence for reading and then asked it to do arithmetic. Catalog the literature and it stays a thing you cite. Compute with it and it becomes a thing you can actually use, which, for a body of knowledge that cost a century of work to produce, is the least it owes us.
This is a companion essay in the Measurement Meets AI program, which argues that the machinery of measurement science is the under-used answer to what we're now asking data to do. It sits beneath its siblings Borrowed Validity and Themes Aren't Evidence: the registry of record is the substrate that makes their priors computable. The capability described here (synthesizing the literature into source-graded, queryable priors that update) is Principia; the capability positioning carries the compressed version. No effect sizes are quoted in this essay; every claim about the literature is about its structure, not an invented number.
Footnotes
[1] Robert Rosenthal, "The File Drawer Problem and Tolerance for Null Results," Psychological Bulletin, 86 (1979): 638–641. The published record over-represents statistically significant findings because null results disproportionately go unpublished, biasing any naive reading of "what the literature says."
[2] John P. A. Ioannidis, "Why Most Published Research Findings Are False," PLoS Medicine, 2 (2005): e124. Among the reasons published effects are systematically inflated, and why initial estimates of an effect tend to shrink on replication (the "winner's curse" of discovery).
[3] Larry V. Hedges & Ingram Olkin, Statistical Methods for Meta-Analysis (1985). The statistical foundation for pooling effect sizes across studies, weighting by precision and quantifying heterogeneity, rather than narratively listing them.
[4] John E. Hunter & Frank L. Schmidt, Methods of Meta-Analysis: Correcting Error and Bias in Research Findings. Psychometric meta-analysis, which corrects the pooled estimate for artifacts (measurement unreliability, range restriction) that attenuate individual studies, yielding an operational estimate rather than a raw average.