

We judge leaders by how they're perceived — 360s, reputation, room presence — not by the measurable state of the systems they're responsible for. The second is harder, slower, and the only one that should decide who leads.

By Mike West

June 17, 2026

Measured Against Reality

The board spent an hour on the leader and never once looked at the team.

They had the 360 results — a fan of competency scores, peer and report and self, rendered to two decimals. They had the reputation, the room presence, the way the quarterly narrative landed. They debated whether she was "strategic enough," whether he "inspired the org." What they did not have, and did not ask for, was the only thing that would have settled it: the measurable state of the systems these people were actually responsible for. Was the team's performance variance shrinking or widening under them? Were resources flowing to where the value was? Was the org more aligned this year than last? The leaders were assessed at length. What they had done to the system went unmeasured.

This is the quiet default in how organizations judge leadership, and it's worth naming plainly: we measure leaders by how they are perceived, not by the state they produce. The essay's claim is that the second is measurable, that the first is mostly noise, and that the gap between them is where careers and companies go wrong.

They say leadership is what the ratings say

Open the standard leadership assessment and you are looking at perception dressed as measurement. The 360 asks people what they think of the leader; the competency model scores the leader against a list of traits; the talent review trades in reputation. All of it feels rigorous — there are numbers, there are rubrics — and almost all of it inherits a problem the measurement literature settled decades ago.

When researchers decompose where a performance or leadership rating actually comes from, the largest single source of variance is not the person being rated. It is the idiosyncrasy of the rater (each evaluator's personal, systematic way of seeing), which routinely accounts for more of the score than the target's actual behavior does.[1] A 360 doesn't escape this by averaging; it averages a chorus of biased instruments and reports the mean to two decimals, which is precision without accuracy. And underneath the rater problem sits a deeper one: we don't perceive leaders neutrally at all. We carry a prototype of what a leader looks like and score people against the prototype, so the tall, confident, articulate candidate reads as "more of a leader" before they've produced a single result.[2]

The whole apparatus measures the leader's image. It was built to.
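Why averaging alone can't rescue the score is easier to see with the arithmetic written out. What follows is a minimal sketch, not a model taken from the cited studies: it assumes a rating is the sum of three independent pieces (the leader's actual behavior, a bias all raters share, such as the leader prototype, and each rater's private noise). The function name and every variance value are hypothetical placeholders chosen for illustration.

```python
def true_share_of_mean(var_true: float, var_shared_bias: float,
                       var_idio: float, k: int) -> float:
    """Share of variance in a k-rater average that reflects actual behavior.

    Assumed (hypothetical) additive model with independent components:
        rating = true behavior + bias shared by all raters + rater-specific noise
    Averaging k raters divides only the rater-specific variance by k;
    the shared bias passes through the average untouched.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    total = var_true + var_shared_bias + var_idio / k
    return var_true / total


# Hypothetical variance components, not estimates from any study.
one_rater = true_share_of_mean(1.0, 1.0, 2.0, k=1)
eight_raters = true_share_of_mean(1.0, 1.0, 2.0, k=8)
ceiling = 1.0 / (1.0 + 1.0)  # limit as k grows: only private noise averages out
```

With these placeholder values, one rater's score is 25% signal, eight raters get to roughly 44%, and no number of raters gets past 50%. Under this assumed model, more raters buy a steadier mean, but whatever the raters share stays in the score.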

The romance, and what it costs

There's a reason the image is so seductive, and it has a name. The romance of leadership is the well-documented tendency to over-attribute an organization's outcomes, good and bad, to its leader, because a single causal protagonist is a more satisfying story than the tangle of market, structure, luck, and team that actually drives results.[3] When the company wins, the CEO was a genius; when it loses, the CEO was the problem. The narrative is clean and mostly wrong about magnitude, and it drives real decisions: who gets promoted, who gets fired, what a board spends its scarce hour on.

The romance is expensive precisely because it's measured in perception. You can ride a strong market with a weak hand and rate beautifully; you can hold a deteriorating system together with an excellent one and rate poorly, because the system's decline is legible and your prevention of a worse decline is not. Judge leaders on the story and you reward the ones who produce a good story, which is not the same population as the ones who produce a good state.

None of this means leadership doesn't matter. Measured properly, against organizational reality rather than reputation, leadership has a real and substantial effect on how organizations fare.[4] The point is not that leadership is a myth. It's that we are measuring the myth instead of the leadership.

What "reality" means, concretely

The alternative isn't softer; it's harder, which is why it's rarer. Measuring a leader against reality means measuring the state of the system they're accountable for and its trajectory — the things that are true whether or not anyone is charmed.

A few that are genuinely measurable: Is the performance variance of the team shrinking — are more people doing well for reasons you can name, or is success still a lottery? Is the team's alignment tightening — do people lower down share the priorities of people higher up, measured by dispersion, not by a town-hall show of hands? Are resources moving toward where the value and impact actually are, or peanut-buttered to keep the peace? Do the leader's forecasts come true — is their read of their own world calibrated, or confident and wrong? And are the conditions that actually drive a team's output — the ones a diagnostic can isolate — improving on their watch, or quietly eroding while the slide deck improves?

Each of those is a measurement against the world, not a vote. None of them cares whether the leader is tall. Together they answer the question the 360 only pretends to: not how does this person come across, but what is becoming true in the part of the organization they were given.
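Three of those questions reduce to a few lines of arithmetic. The sketch below is one plausible way to compute them, not a prescribed instrument: the function names, the input shapes, and the choice of the Brier score for forecast calibration are all assumptions made for illustration.

```python
from statistics import mean, pvariance


def variance_trend(period_scores):
    """Variance of individual performance scores, one value per period.

    period_scores: list of lists, one list of per-person scores per period.
    A falling sequence means the spread of outcomes on the team is narrowing.
    """
    return [pvariance(scores) for scores in period_scores]


def priority_dispersion(rankings):
    """Average disagreement about priorities across respondents.

    rankings: list of dicts mapping the same priority names to a rank.
    Returns the mean per-priority variance of ranks; 0 means everyone
    ranks the priorities identically.
    """
    items = list(rankings[0])
    return mean([pvariance([r[item] for r in rankings]) for item in items])


def brier_score(forecasts):
    """Mean squared gap between stated probability and what happened.

    forecasts: list of (probability, outcome) pairs, outcome being 0 or 1.
    Lower is better; confident-and-wrong forecasts are penalized hardest.
    """
    return mean([(p - outcome) ** 2 for p, outcome in forecasts])


# Hypothetical inputs, for illustration only.
spread = variance_trend([[1, 5], [2, 4]])
agreement = priority_dispersion([{"growth": 1, "cost": 2},
                                 {"growth": 2, "cost": 1}])
calibration = brier_score([(0.9, 1), (0.9, 0)])
```

None of these takes a rater's opinion of the leader as input. Each takes a record of what happened, which is the whole point of the switch.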

The objection, and the honest limit

The fair objection is attribution: a leader doesn't control the market, the legacy systems, the talent they inherited, so isn't measuring "the state of their system" just blaming them for things outside their hands? It's the right worry, and the answer is the same discipline that rescues any causal claim — you measure the change against a baseline and a comparison, you separate the conditions a leader plausibly moves from the ones they plainly don't, and where you can't cleanly attribute, you say so rather than assigning credit by vibe. That's harder than reading a 360. It is also the difference between an assessment that can be wrong — and therefore means something — and one that merely reflects the room.
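The essay does not name a specific method for "a baseline and a comparison," but one common reading is a difference-in-differences estimate: the change in the leader's unit minus the change in a comparable unit over the same period. The sketch below assumes that reading. The function name and inputs are hypothetical, and the estimate only means something if the two units would have moved in parallel without the leader, which is the part that takes judgment.

```python
from statistics import mean


def diff_in_diff(unit_before, unit_after, comparison_before, comparison_after):
    """Change in the leader's unit, net of the change in a comparison unit.

    Each argument is a list of metric values for one group in one period.
    Assumes (hypothetically) that both groups would have followed the same
    trend absent the leader; if that fails, the number is not attributable.
    """
    unit_change = mean(unit_after) - mean(unit_before)
    comparison_change = mean(comparison_after) - mean(comparison_before)
    return unit_change - comparison_change


# Hypothetical values: the unit rose 4 points while a comparable unit rose 3,
# so only 1 point is even a candidate for credit.
net_effect = diff_in_diff([10, 10], [14, 14], [10, 10], [13, 13])
```

The subtraction is the easy part. Choosing a comparison that shares the market, the legacy systems, and the inherited talent is where the discipline lives, and where it can't be found, the honest output is "not attributable."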

And reality-based measurement has a limit worth stating: it lags. Perception is available the day someone walks in; the state they produce takes quarters to show and longer to attribute. That lag is exactly why the romance fills the gap — it offers an answer now. The discipline is to resist buying the fast, satisfying, perception-based answer when the decision can wait for the true one, and to instrument the slow signals early so they're ready when it can't.

What changes when you switch yardsticks

Go back to the board with the hour to spend. On the perception yardstick they debate whether she's "strategic enough" and adjourn with a feeling. On the reality yardstick they put up the trajectory of the system she owns — variance, alignment, resource-to-value, forecast calibration, the conditions underneath — and ask what she did to it and what she'd do next. The conversation stops being about whether they like her and becomes about whether the part of the company she runs is measurably better for her running it. One of those is an opinion poll with decimals. The other is a measurement, and only the measurement should decide who leads.

We have spent a long time getting very precise about leaders' reputations and staying almost willfully vague about their results. The instruments to measure the second exist; they're just harder to read and slower to ripen than the chorus of opinion we've settled for. The leaders worth keeping are the ones who improve the state of the system on a yardstick that doesn't care how they come across. The least we can do is hold up that yardstick.


This piece is part of the People Analyst program. It states the thesis behind the Measuring Leadership work and the forthcoming Leading People with Data guide: judge leaders against the measurable state of the systems they're responsible for, not against perception. It shares its backbone with The Reliability Problem: the rater noise that makes 360s untrustworthy is the same noise this essay routes around. No numbers are invented here; the claims about rater variance and the leadership–outcome link trace to the cited sources.

Footnotes

  1. Michael J. Scullen, Michael K. Mount & Maynard Goff, "Understanding the Latent Structure of Job Performance Ratings," Journal of Applied Psychology, 85 (2000): 956–970 — idiosyncratic rater effects are the single largest variance component in performance ratings, exceeding actual ratee performance. See also Viswesvaran, Ones & Schmidt (1996) on the modest interrater reliability of single-source ratings.

  2. Robert G. Lord and colleagues on implicit leadership theories: observers evaluate leaders against a cognitive prototype of "leader," so prototype-matching (appearance, confidence, fluency) inflates leadership ratings independent of actual effectiveness.

  3. James R. Meindl, Sanford B. Ehrlich & Janet M. Dukerich, "The Romance of Leadership," Administrative Science Quarterly, 30 (1985): 78–102 — the documented tendency to over-attribute organizational outcomes to leaders relative to the situational and collective factors that drive them.

  4. Robert B. Kaiser, Robert Hogan & S. Bartholomew Craig, "Leadership and the Fate of Organizations," American Psychologist, 63 (2008): 96–110 — argues that leadership, assessed against organizational effectiveness rather than reputation or ratings, has a real and substantial impact, and critiques the field's reliance on perception-based criteria.
