peopleanalyst

The PeopleAnalyst Guide to Work Rules · Ch 08

The Two Tails

What Bock argues

The claim is that your best people and your struggling people are not two ends of one normal curve to be managed by the same machinery — they are two different populations that need two different responses. Study your best (most companies spend all their attention on the bottom and learn nothing from the top); for the struggling, help genuinely and then, if it doesn't take, move them with dignity. The move underneath it, and the one with the most teeth, is statistical: talent is not normally distributed. A small number of people produce an outsized share of the output — so the bell curve that hides inside forced ranking, "rate everyone a 3," and equal-treatment policy is the wrong model of reality.

That's not a management opinion; it's a measured fact, and once you take it seriously it breaks several common HR practices on contact.

What the research actually says (and where 2015 needs an update)

The bell curve is the default assumption almost no one examines, and it is wrong for individual performance. O'Boyle and Aguinis ("The Best and the Rest," 2012) looked across a wide range of domains (researchers, entertainers, politicians, athletes) and found that individual output follows a heavy-tailed, power-law-like distribution far more often than a normal one: a large hump of modest contributors sitting below the mean, a long thin tail, and a few extreme producers at its end. The practical translation is brutal for standard practice. If output is power-law, the mean is a misleading summary (dragged upward by the top tail, so most people sit below it), forced distributions built on a normal curve assign rating buckets that don't match how output is actually spread, and "average performance" describes no one in particular.
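To make the point about the mean and the top tail concrete, here is a minimal Python sketch on simulated data. It assumes, purely for illustration, that output is either normal (mean 100, SD 15) or Pareto with shape 1.5; neither population comes from the O'Boyle and Aguinis data, and the helper names `top_share` and `share_below_mean` are hypothetical.

```python
import random
import statistics


def top_share(outputs, frac=0.10):
    """Share of total output produced by the top `frac` of people."""
    ranked = sorted(outputs, reverse=True)
    k = max(1, int(len(ranked) * frac))
    return sum(ranked[:k]) / sum(ranked)


def share_below_mean(outputs):
    """Fraction of people whose output falls below the group mean."""
    mean = statistics.fmean(outputs)
    return sum(x < mean for x in outputs) / len(outputs)


rng = random.Random(42)  # fixed seed so the illustration is repeatable
N = 10_000
# Hypothetical populations: a bell curve versus a Pareto (power-law) tail with shape 1.5.
bell = [rng.gauss(100, 15) for _ in range(N)]
heavy = [rng.paretovariate(1.5) for _ in range(N)]

for name, xs in [("normal", bell), ("pareto", heavy)]:
    ratio = statistics.fmean(xs) / statistics.median(xs)
    print(f"{name:>6}: top-10% share={top_share(xs):.2f}  "
          f"below mean={share_below_mean(xs):.2f}  mean/median={ratio:.2f}")
```

For a Pareto shape of 1.5, theory puts the top 10% at roughly 46% of total output (0.1 raised to the power 1 - 1/1.5) and about 81% of people below the mean; the normal population should land near 13% and 50%. The simulated figures will wobble around those values from seed to seed.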

But here is the part the star-performer enthusiasm skips, and it's why this chapter belongs next to the reliability book. Identifying the tails is exactly where unreliable measurement does the most damage. The tails are, by definition, small-N (a handful of people), and a single noisy rater (Ch 7) crowning a "star" off one impression is more likely to be reading rater idiosyncrasy than talent. Heavy tails and small samples compound the problem in both directions: one genuine star can drag a whole team's average up and make a mediocre team look excellent, and one bad read can exile someone who was fine. So the power-law finding and the reliability finding are the same caution wearing two hats: the tail is where the decisions are most consequential and the measurement is least trustworthy. Take it seriously and you get more careful at the tails, not more confident.
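A small simulation shows why a single rater is a poor instrument at the tail. This is a toy model under stated assumptions, not a model of any real rating system: talent is standard normal, each rater adds independent normal noise of the same size (so one rater's reliability is 0.5), and a person's observed score is the mean across raters. `tail_hit_rate` is a hypothetical helper reporting what share of the observed top five are also in the true top five.

```python
import random
import statistics


def top_k(scores, k):
    """Indices of the k highest scores."""
    return set(sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k])


def tail_hit_rate(n_people=100, k=5, n_raters=1, noise_sd=1.0, trials=200, seed=7):
    """Average share of the *observed* top-k who are also in the *true* top-k.

    Assumed toy model: true talent ~ N(0, 1); each rater sees talent plus
    independent N(0, noise_sd) idiosyncrasy; a person's observed score is the
    mean over raters. With noise_sd = 1, a single rater's reliability is 0.5.
    """
    rng = random.Random(seed)
    hits = []
    for _ in range(trials):
        talent = [rng.gauss(0, 1) for _ in range(n_people)]
        observed = [
            statistics.fmean([t + rng.gauss(0, noise_sd) for _ in range(n_raters)])
            for t in talent
        ]
        hits.append(len(top_k(observed, k) & top_k(talent, k)) / k)
    return statistics.fmean(hits)


for raters in (1, 3, 10):
    print(f"{raters:>2} rater(s): {tail_hit_rate(n_raters=raters):.2f} of crowned stars are in the true top 5")
```

Under these assumptions the Spearman-Brown formula puts a ten-rater mean at a reliability of about 0.91, and the simulated hit rate should rise well above the single-rater figure; the exact numbers depend on the seed and trial count.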

A caveat to carry honestly: the power-law result is contested at the edges (whether the shape is strictly power-law, lognormal, or job-dependent), and it is clearly weaker for highly interdependent work, where "individual output" is barely a coherent quantity. The defensible claim is strong enough without the overreach: individual performance is routinely heavy-tailed, not normal, so any practice that assumes a bell curve is mis-specified.
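Testing the shape instead of assuming it can start as simply as comparing maximum-likelihood fits. The sketch below is a minimal stdlib Python version that pits a normal fit against a lognormal one, used here only as a convenient heavy-tailed alternative. It does not test for a power law specifically; separating power-law from lognormal tails needs dedicated tail methods, such as the likelihood-ratio approach of Clauset, Shalizi and Newman. The data are simulated and the function names are hypothetical.

```python
import math
import random
import statistics


def normal_loglik(xs):
    """Maximised log-likelihood of a normal fit (MLE: sample mean, population variance)."""
    n = len(xs)
    var = statistics.pvariance(xs)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def lognormal_loglik(xs):
    """Maximised log-likelihood of a lognormal fit; needs strictly positive data."""
    if min(xs) <= 0:
        raise ValueError("a lognormal fit needs strictly positive output measures")
    logs = [math.log(x) for x in xs]
    n = len(logs)
    var = statistics.pvariance(logs)
    # Normal log-likelihood of the logs, minus sum(log x): the Jacobian of x -> log x,
    # which puts both models' likelihoods on the same (original) scale.
    return -0.5 * n * (math.log(2 * math.pi * var) + 1) - sum(logs)


def compare_shapes(xs):
    """Both models have two parameters, so the higher log-likelihood also has the lower AIC."""
    ll_normal, ll_lognormal = normal_loglik(xs), lognormal_loglik(xs)
    better = "lognormal" if ll_lognormal > ll_normal else "normal"
    return {"normal": ll_normal, "lognormal": ll_lognormal, "better": better}


rng = random.Random(1)
skewed = [rng.lognormvariate(0, 1) for _ in range(5_000)]  # simulated heavy-tailed output
bell = [rng.gauss(100, 15) for _ in range(5_000)]          # simulated bell-shaped output
print(compare_shapes(skewed)["better"], compare_shapes(bell)["better"])
```

Because both models have two parameters, comparing log-likelihoods is equivalent to comparing AIC. A lognormal fit also requires strictly positive output measures, which many performance metrics (ratings centred on zero, net contributions) are not.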

Where 2015 needs the update: AI makes it cheap to characterize the distribution and to spot top-performer patterns at scale — and equally cheap to do it badly, by surveilling individuals or by treating one model's confident ranking as truth. The discipline is the same as everywhere in this book: measure the distribution before you assume its shape, and identify the tails with reliability attached, not with a single confident reader.

How you run it

The analysis you can execute

Run the performance-distribution / star-identification analysis in calculus (distribution-aware: fit and test normal against heavy-tailed rather than forcing a curve), paired with the reliability discipline from the Consensus Coder / G-theory program, so that tail identification carries a reliability estimate and an honest confidence interval. Gate on minimum N throughout: the tails are small, and small plus sensitive is exactly where the privacy gate earns its keep. The build is mostly composition over existing spokes.
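The reliability side of that analysis can be sketched with a one-way random-effects ICC, used here as a simplified stand-in for a full generalizability-theory (G-study) decomposition, plus the Spearman-Brown projection to a panel of raters and a min-N gate. `MIN_N = 10` and all function names are hypothetical placeholders, illustrative only and not how calculus implements it; the sketch assumes a balanced design in which every person is rated by the same number of raters.

```python
import statistics

MIN_N = 10  # hypothetical privacy threshold; substitute your organisation's own min-N rule


def icc1(ratings):
    """One-way random-effects ICC(1): estimated reliability of a single rater.

    `ratings` is a list of per-person lists, each holding the same number k >= 2
    of ratings from different raters. Can come out negative when raters disagree
    more about a person than people differ from each other.
    """
    n, k = len(ratings), len(ratings[0])
    if k < 2 or any(len(r) != k for r in ratings):
        raise ValueError("need a balanced design with at least two raters per person")
    person_means = [statistics.fmean(r) for r in ratings]
    grand = statistics.fmean(person_means)
    ms_between = k * sum((m - grand) ** 2 for m in person_means) / (n - 1)
    ms_within = sum((x - m) ** 2 for r, m in zip(ratings, person_means) for x in r) / (n * (k - 1))
    denom = ms_between + (k - 1) * ms_within
    if denom == 0:
        raise ValueError("no variance in the ratings; reliability is undefined")
    return (ms_between - ms_within) / denom


def spearman_brown(r1, k):
    """Projected reliability of the mean of k raters, given single-rater reliability r1."""
    return k * r1 / (1 + (k - 1) * r1)


def reliability_report(ratings, min_n=MIN_N):
    """Return reliability figures, or None when the group falls below the min-N gate."""
    if len(ratings) < min_n:
        return None  # too few people: suppress rather than report a fragile, identifying number
    r1 = icc1(ratings)
    k = len(ratings[0])
    return {"n_people": len(ratings), "raters": k,
            "single_rater": r1, "panel_mean": spearman_brown(r1, k)}
```

With a single-rater reliability of 0.5, Spearman-Brown projects a three-rater mean to 0.75 and a ten-rater mean to about 0.91. Returning `None` below the gate, rather than a number with a wide interval, is a deliberate choice: small tail groups are both statistically fragile and easy to re-identify.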

The AI-era turn

Use AI to characterize the distribution and the top-performer pattern, not to surveil people or to rank them with a single confident model. The two-tails idea is only as good as your ability to tell a real star from a lucky read — which is a reliability problem, and the one this whole program is about. A power-law world rewards getting the tail right and punishes getting it confidently wrong.

What to do Monday

  1. Plot your performance distribution and test it against normal. If it's heavy-tailed, stop forcing ratings to a curve that doesn't describe your people.
  2. Re-identify your tails with reliability — never off a single rater; show the uncertainty, especially before any consequential action at the bottom tail.
  3. Make the top tail a study, not just a list. What's teachable about how they work?
  4. Before any AI "top-talent identifier," ask: what's its reliability, and is it one rater? The tail is the worst place to trust a single confident reader.
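Items 2 and 4 can be made concrete with a rater bootstrap: resample the raters many times and count how often each person lands in the top k. This minimal Python sketch assumes a fully crossed design (every rater scores every person); `top_k_probability` and its parameters are hypothetical, and ties go to whoever is listed first.

```python
import random
import statistics


def top_k_probability(ratings, k, n_boot=2000, seed=0):
    """Bootstrap the raters to estimate how often each person lands in the top k.

    `ratings[i][j]` is rater j's score for person i (fully crossed design assumed).
    Raters are resampled as whole columns, so each replicate asks: with a different
    draw of raters, who would have been crowned?
    """
    rng = random.Random(seed)
    n_people, n_raters = len(ratings), len(ratings[0])
    counts = [0] * n_people
    for _ in range(n_boot):
        cols = [rng.randrange(n_raters) for _ in range(n_raters)]
        means = [statistics.fmean([row[c] for c in cols]) for row in ratings]
        # sorted() is stable, so tied people keep their original order.
        ranked = sorted(range(n_people), key=means.__getitem__, reverse=True)
        for i in ranked[:k]:
            counts[i] += 1
    return [c / n_boot for c in counts]


# Hypothetical example: 4 people scored by the same 3 raters.
ratings = [[4, 5, 4], [5, 3, 5], [2, 2, 3], [5, 5, 4]]
print(top_k_probability(ratings, k=1))
```

Note what happens with one rater: there is nothing to resample, so every probability comes back as exactly 0 or 1. That false certainty is the single-confident-reader problem in numerical form.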

Cross-refs: Book 1 Unreliable Ch 3/7–9 and content/magazine/the-reliability-problem.md (tail identification is a reliability problem); Ch 7 (rating variance); Ch 10 (the power law is what makes wide pay dispersion legitimate).