People Analytics Is Not Data Science for HR
I reduced two libraries to their underlying models — 25 books on people analytics, 25 on data science and business intelligence. They are not the same discipline, and the gap between them turned out to be the most useful thing I learned all year.
Here is a move I have watched companies make for fifteen years. Someone decides it is time to do people analytics. They hire a data scientist — a real one, someone who can defend a cross-validation scheme without sweating — point them at the HRIS, and wait for insight to fall out. The assumption underneath is so ordinary that nobody says it out loud: people analytics is data science with an HR dataset. Same tools, different table.
It is a reasonable-sounding assumption. It is also wrong, and this year I finally have something better than an opinion to back that up, because I did something slightly absurd to check.
I took two shelves of books and turned each one into a diagram.
On one shelf, twenty-five books on people analytics — the McNulty handbooks, The Power of People, the Fitz-enz lineage, the HR-metrics manuals, the era-of-big-data surveys. On the other, twenty-five books on data science, business intelligence, big data, and machine learning — Davenport and Harris's Competing on Analytics, Provost and Fawcett's Data Science for Business, An Introduction to Statistical Learning, Kimball's Data Warehouse Toolkit, Géron's Hands-On Machine Learning. For each book I extracted the model it argues for — not its topics, its model: which things the author claims drive which other things. Then I reconciled each shelf into one map of what that whole literature believes causes what.
Two maps. I expected them to overlap a lot — two analytics disciplines, surely mostly the same. They barely overlap at all. And the way they fail to overlap is the entire argument.
What each shelf is actually a model of
Start with the shapes, because the shapes are the tell.
The people analytics map has thirty-three concepts, and they sort into two unequal piles. A small pile — maybe ten — is the build-the-function layer: analytics maturity, methodology, data quality, sponsorship, culture, storytelling, evidence-based decisions. The other twenty-three are a model of human beings at work: selection and quality of hire, learning, compensation, leadership quality, engagement, capability, personality, goal clarity, wellbeing, productive behavior, performance, turnover, networks, the labor market. The literature spends most of its oxygen on people. The analytics is a thin shell wrapped around a thick theory of why people perform, stay, and leave.
The data-science map has forty-one concepts, and they sort the opposite way. The big pile — better than twenty — is method: feature engineering, model complexity, regularization, overfitting and the bias-variance tradeoff, validation, cross-validation, evaluation metrics, generalization, statistical inference, sampling, plus the whole engineering stack — warehouse architecture, dimensional modeling, ETL, deployment, system reliability. And the subject these methods get pointed at? There isn't one. The data-science library has no model of any domain in the world. Its substance is the method. It will teach you, in punishing and admirable detail, how to fit a model and how to know whether the fit will hold — about anything at all.
So the honest one-line summary is this. The people analytics shelf is a deep theory of people with a thin coat of analytics. The data-science shelf is a deep theory of method with no subject underneath it. Hand those two shelves to the same person and tell them it's "the same field with a different dataset," and you can see immediately that something is off. They are not the same field. They are two different halves of one. Or, if you'd rather: they are Adam and Eve in the garden, apple on the ground between them, fig leaves in hand.
The four-S thesis, written by the books themselves
I have argued for a while that high-quality people analytics is the integration of four things (behavioral Science, Statistics, computer Systems, and business Strategy) and that the work is excellent exactly when all four are present and load-bearing at once. I will admit I expected to have to argue that case here. Instead the two corpora argued it without me.
The data-science shelf is the Statistics pillar and the Systems pillar. Bias-variance, validation, generalization — that is Statistics in its modern, computational dress. Warehouses, schemas, pipelines, reliability — that is Systems, meaning actual computer systems, not "systems thinking." What the shelf almost entirely lacks is behavioral Science: there is no construct on it for what a person is or why they act. And its Strategy is generic — competitive advantage, business value — true of any firm doing anything.
The people analytics shelf is the mirror image. It carries the behavioral Science — an entire articulated model of selection, motivation, leadership, attrition — and a domain-specific Strategy: workforce planning, talent segmentation, human-capital risk. What it mostly assumes, rather than models, is the Statistics and the Systems. It cites the methods; it rarely teaches them.
Which means the merger that matters is not "data science, now with HR data." It is behavioral Science fused with statistical method — two halves that were written in different buildings by people who mostly didn't read each other. People analytics done right is that fusion. Done wrong, it is one half borrowing the other half's letterhead. This is the chocolate-and-peanut-butter point I keep making, except now I can point at the shelves: each is delicious and each is incomplete, and nobody set out to combine them.
Where the four S's came from
I should say where the four-S idea came from, because I didn't go looking for a framework. It came from at-bats — years of watching what actually produced lasting success and what only looked like it from across the room. What I kept noticing was that each S, taken alone, is already a finished thing. Investigate any one of them and you find a deep, applied body of knowledge — fifty years of it, in places a hundred. Statistics did not begin with machine learning. Behavioral science did not begin with the engagement survey. These are mature tools, and they more or less work. Almost anyone could apply them and get some results now and then, but a master deeply understands what they say and what they don't say, and how we arrived at them.
But a mature tool is not the thing. The thing — the part that is actually new, notable, worth talking about, the part that isn't just a louder repeat of what came before — is what happens when you bring the tools together in a precise context where each one, without the others, quietly fails to deliver its full value. That doesn't sit on a shelf. It floats out there as a mystery to be discovered, a puzzle to be solved — a real jam. And the thing I'm most sure of is that solving it is the job. That is what a practitioner in this field is for: not to operate the tools, but to stay open, grasp around with them, and work the gap until it gives.
Each library's blind spot is the other one's core
Here is where it stops being a cute symmetry and starts being a warning.
Run down what the data-science shelf has that the people analytics shelf does not: the whole apparatus of overfitting, regularization, validation, inference validity, generalization. People analytics, as a literature, largely takes those for granted. So the data scientist you hired is right to worry — a lot of people-analytics practice is methodologically naïve, extrapolating insights about a whole company from a few hundred employees and reporting the lift as if it would survive contact with next year.
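To make the small-sample worry concrete, here is a minimal sketch using a normal-approximation confidence interval for a difference in two rates. The counts are invented for illustration (150 employees per group, attrition of 27 versus 18) and come from no book on either shelf; `diff_ci` is a hypothetical helper name.

```python
import math

def diff_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Difference in two proportions with a normal-approximation interval.

    z=1.96 corresponds to roughly 95% coverage.
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    diff = p_a - p_b
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se

# Hypothetical numbers: attrition without the program vs. with it.
diff, low, high = diff_ci(27, 150, 18, 150)
print(f"observed lift: {diff:.3f}, interval: ({low:.3f}, {high:.3f})")
```

With groups this size the interval around a six-point lift includes zero, so the data cannot distinguish a real effect from noise. That is the sense in which a reported lift may not survive contact with next year.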
Now run it the other way. What does the people analytics shelf have that the data-science shelf does not? More than a subject. A theory of the thing being measured. The data-science library can validate a model to three decimal places while modeling the wrong constructs entirely — predicting "high performer" from a measure that is really just being liked by one manager, and never noticing, because noticing is not a statistical question. It is a behavioral-science question, and that shelf doesn't carry it.
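A toy simulation can show how a model validates cleanly against the wrong construct. Everything here is an assumption made for illustration: the recorded "high performer" rating is driven entirely by how much one manager likes the person, true contribution is independent of that liking, and the single feature is a noisy proxy for liking. Real data would be messier and less extreme.

```python
import random

random.seed(0)

def simulate(n):
    """Return rows of (feature, recorded_rating, true_high_performer)."""
    rows = []
    for _ in range(n):
        liking = random.gauss(0, 1)              # how much one manager likes the person
        true_perf = random.gauss(0, 1)           # actual contribution, independent of liking here
        feature = liking + random.gauss(0, 0.3)  # an observable proxy that tracks liking
        rating = liking > 0                      # the label that ends up in the HRIS
        rows.append((feature, rating, true_perf > 0))
    return rows

train, test = simulate(2000), simulate(2000)

# "Fit" on the training set: threshold at the midpoint of the two class means.
rated_high = [f for f, rating, _ in train if rating]
rated_low = [f for f, rating, _ in train if not rating]
threshold = (sum(rated_high) / len(rated_high) + sum(rated_low) / len(rated_low)) / 2

def agreement(rows, target_index):
    """Share of rows where the thresholded feature matches the chosen target."""
    hits = sum((row[0] > threshold) == row[target_index] for row in rows)
    return hits / len(rows)

acc_vs_rating = agreement(test, 1)  # what held-out validation reports
acc_vs_true = agreement(test, 2)    # what the business actually cares about

print(f"held-out accuracy vs. recorded rating: {acc_vs_rating:.3f}")
print(f"held-out accuracy vs. true performance: {acc_vs_true:.3f}")
```

The held-out score against the recorded rating comes out high, while agreement with true performance sits near a coin flip. No validation step flags the problem, because the validation is run against the rating, not against the construct the rating was supposed to stand for.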
So the two failure modes are exact opposites, and each shop tends to own the one it cannot see. The data scientist parachuted into the people team builds something that generalizes beautifully and measures the wrong thing. The HR analyst with no statistics adopts a vendor's model that measures roughly the right thing and generalizes too much, not at all, or incorrectly. Both are confident. Both are half-equipped. And the org, watching two confident half-equipped efforts, concludes that "the analytics didn't work" — when what didn't work was the merge that never happened.
The part that surprised me: the failure is the same
I went in expecting the differences. I did not expect the agreement.
Strip the technical cores away — the bias-variance math on one side, the engagement-and-attrition model on the other — and look at what's left on each shelf. On both, about a third of the concepts are the same handful: executive sponsorship, an analytical culture, the right talent, trust in the numbers, decision quality, demonstrated business value, analytical maturity. Two libraries, swept independently, written by communities that do not cite one another, and they converge — almost construct for construct — on the same theory of why analytics dies inside organizations.
That convergence is worth sitting with. The disciplines could not disagree more about the technical work. They do not disagree at all about the organizational work. The binding constraint on the value of analytics — any analytics — is not the model. It is whether anyone with power sponsored it, whether the culture will act on a number that contradicts a gut, whether people trust the measure enough to change a decision. Both shelves know this. They just can't supply it, because sponsorship and trust aren't in a Python package or a competency model. To his credit, Davenport has been saying the organizational half is the hard half since Competing on Analytics; the people analytics canon says it too, in its own vocabulary. When two literatures that share no methods share a diagnosis, the diagnosis is probably true.
The deepest difference is what counts as an answer
There is one more split, and it is the one I would put on the wall.
The data-science shelf has, sitting right in the middle of it, a construct I'd name the prediction mindset: if a correlation is strong and it generalizes, you can act on it — you don't need to know why. That is not a flaw. For most of what business intelligence and data science are for — what will churn, what will fail, what to stock — it is exactly right. The question that shelf is built to answer is will it generalize?
The people analytics shelf is built around a different question, and it can't help it, because of what it's a model of. Its spine is causal: leadership shapes engagement, engagement shapes performance, performance shapes retention. It asks why do people perform and stay? — and it asks that because the whole point of the work is to intervene, and you cannot intervene on a prediction. A model that flags a flight risk with 90% accuracy and cannot tell you which lever moved them is, for a leader trying to keep that person, close to useless. Knowing who is leaving is a data-science answer. Knowing what would make them stay is a behavioral-science answer. Those are different sports played with some of the same equipment.
This is the real reason the hire-a-data-scientist move underdelivers. It imports a discipline optimized for predict-and-act into a domain that runs on explain-and-intervene, and then everyone is quietly puzzled that the beautiful model didn't change anyone's Monday.
So what
Nobody has the whole truth; it requires a synthesis. I believe that about most things, and the two shelves made it literal: here are two libraries, each genuinely deep, each missing precisely what the other one is made of. The mistake is not hiring the data scientist. The data scientist is half the answer, and stronger on Statistics and Systems than most HR teams will ever become on their own. The mistake is believing that half is the whole: pointing it at people and waiting, when what the work actually needs is for that half to be married to a behavioral-science subject, a domain strategy, and the organizational spine that both libraries swear by and neither can install.
People analytics is not data science for HR. It is the place where statistical method has to learn what a person is, and behavioral science has to learn to validate. The companies that treat it as data science for HR keep getting half a result and calling the field disappointing. The ones that treat it as the merge are, to borrow my own phrase, playing a different sport.
For what it's worth, the gap is OK. More than OK. If I came to work and the whole path were already laid out for me in a how-to manual, I wouldn't be much interested in spending my life on it; the unprescribed part is the part worth showing up for. I used to wonder whether that was just temperament talking. Now that AI is quietly absorbing the manual-able work, I think I was either right about that or lucky, and at this point I'll take either.
What I've managed to offer so far is limited, and I know it. Mostly it has been to point at patterns like this one — the gap, the four S's, the thing that lives on neither shelf — and say look. My hope is that before I stop working and pass the work on, this gets articulated better than I've managed here, packaged better, made repeatable enough that the next practitioner doesn't have to rediscover it from scratch.
I read fifty books so I could tell you the gap between two shelves. The gap is the work.
The work, packaged — the "made repeatable" this piece asks for. Each S is now a capability guide synthesized across its half of the canon: Science (the behavioral model), Statistics (choosing and trusting the method), Systems (the data and AI engineering), and Strategy (the decision it all serves). And the resolution of the four into one functional system — the framework, the failure modes, and a process you can run — is the Four-S master guide. The merge of the four is the discipline.
How this was made. I reduced 50 books — 25 on people analytics, 25 on data science, business intelligence, big data, and machine learning — to the model each one argues: the constructs its author treats as load-bearing and the causal links it claims between them. Reconciled, the two shelves came to 33 and 41 distinct constructs, joined by 38 and 47 claimed relationships. It is a mechanical method and an imperfect one — a model of a book is not the book — but it is traceable, and that is the point. Every title, its author, and the model I pulled from it sits in the collection, each linked to a fuller read. Check the shelves yourself; I would rather you did than take my word for it.
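For readers who want to picture the mechanics, the reduction described above can be sketched roughly as follows. This is an illustrative reconstruction, not the tooling actually used: each book's model is assumed to be a set of (driver, outcome) links, a shelf is reconciled by taking the union, and the two maps are compared by the overlap of their constructs. The book names and the handful of links are placeholders.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (driver, outcome): one claimed relationship

def reconcile(shelf: Dict[str, Set[Edge]]) -> Set[Edge]:
    """Merge every book's claimed links into one map for the shelf."""
    merged: Set[Edge] = set()
    for edges in shelf.values():
        merged |= edges
    return merged

def constructs(edges: Set[Edge]) -> Set[str]:
    """Every distinct concept that appears on either end of a link."""
    return {node for edge in edges for node in edge}

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared constructs divided by all constructs across both maps."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Placeholder shelves: hypothetical titles, a few links each.
people_shelf = {
    "hypothetical_book_a": {("leadership quality", "engagement"),
                            ("engagement", "performance")},
    "hypothetical_book_b": {("performance", "turnover"),
                            ("executive sponsorship", "decision quality")},
}
data_shelf = {
    "hypothetical_book_c": {("model complexity", "overfitting"),
                            ("cross-validation", "generalization")},
    "hypothetical_book_d": {("executive sponsorship", "decision quality"),
                            ("data quality", "generalization")},
}

people_map = reconcile(people_shelf)
data_map = reconcile(data_shelf)
shared = constructs(people_map) & constructs(data_map)
overlap = jaccard(constructs(people_map), constructs(data_map))

print(sorted(shared))
print(f"construct overlap: {overlap:.2f}")
```

In this toy version the only shared constructs are the organizational ones, which mirrors the pattern reported above. The real reconciliation also has to decide when two authors' differently named concepts are the same construct, a judgment call this sketch skips entirely.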