The PeopleAnalyst Guide to Work Rules · Ch 13

It's Not All Rainbows and Unicorns

What Bock argues

The claim is humility about the model itself: Google's open, high-freedom, employee-empowered approach is not all rainbows and unicorns — it carries real tensions, costs, and failures, and an honest culture names them instead of polishing the story. Bock walks the flops (programs that didn't work, freedoms that got abused, the friction of running radical openness at scale) and lands on a load-bearing norm: the "obligation to dissent" — the duty to speak up when you think a decision is wrong, not the mere permission to. A culture only learns from its failures and tensions if surfacing them is required, not just tolerated.

So this chapter is two honesties braided together: the high-freedom model has genuine downsides you must report (the failure-reporting discipline that makes Chapter 12's experimentation real), and the people inside it must be obligated to voice what's not working (the dissent norm that makes Chapter 6's "voice" and Chapter 2's psychological safety operational). Celebrate-only is not a learning culture; it's a marketing culture, and it learns the wrong lessons fastest.

What the research actually says (and where 2015 needs an update)

Two findings make "report the failures" a discipline rather than a virtue.

The first is the file-drawer problem (publication bias; Rosenthal, 1979, and the decades since): when only positive results get reported, the visible record systematically overstates what works, because the nulls sit in a drawer. This is usually told about academic journals, but it runs identically inside organizations — only the pilot that "worked" gets the all-hands slide; the three that did nothing are never mentioned. The result is an internal evidence base that is confidently, structurally wrong: you "know" things that aren't true because you only ever saw the wins. Chapter 12's nudges are exactly where this bites — import the published effect, skip the nulls, and you'll deploy interventions that do nothing while believing they work.
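The arithmetic behind the file drawer can be sketched with a small simulation. This is an illustration under stated assumptions, not an analysis from the book: every pilot has a true effect of exactly zero, outcomes are standard normal, and a pilot "works" if a crude z-test comes out positive and significant. The function name and all parameters are hypothetical.

```python
import random
import statistics


def simulate_pilots(n_pilots=2000, true_effect=0.0, n_per_arm=50, seed=0):
    """Run many identical pilots and return (observed_effect, worked) pairs.

    Assumption for illustration: outcomes are normal with unit variance,
    and 'worked' means a positive result that passes a crude z-test.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_pilots):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_arm)]
        diff = statistics.fmean(treated) - statistics.fmean(control)
        # Standard error of a difference in means between two independent arms.
        se = (statistics.variance(control) / n_per_arm
              + statistics.variance(treated) / n_per_arm) ** 0.5
        results.append((diff, diff / se > 1.96))
    return results


results = simulate_pilots()
all_effects = [diff for diff, _ in results]
reported_effects = [diff for diff, worked in results if worked]

mean_all = statistics.fmean(all_effects)            # the full record, nulls included
mean_reported = statistics.fmean(reported_effects)  # only the all-hands slides

print(f"pilots run: {len(all_effects)}, pilots reported: {len(reported_effects)}")
print(f"mean effect, every pilot:    {mean_all:+.3f}")
print(f"mean effect, reported only:  {mean_reported:+.3f}")
```

The intervention does nothing in this setup, yet the handful of pilots that clear the bar show a clearly positive average effect. Reading only those is how an organization comes to "know" something that isn't true.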

The second is psychological safety (Edmondson, 1999 — the same construct as Chapter 2): people report failures, admit error, and surface bad news only when they believe it's safe to do so. A team without safety doesn't have fewer failures; it has the same failures, hidden — which is the most expensive kind, because hidden failure can't be learned from and compounds. So "report the failures" is not an exhortation you can will into being; it's a condition you have to engineer, and the condition is safety.

The pre-registration discipline closes the loop: you decide what you'll measure and what would count as success before you run, so you can't quietly redefine a flop as a win after the fact. That after-the-fact redefinition is the failure behind HARKing (hypothesizing after the results are known) and p-hacking. It's the same honesty Unreliable's program applies to its own results: report the uncomfortable parts, pre-commit to the analysis.

Where 2015 needs the update: AI evaluation has a raging file-drawer problem of its own — the benchmark wins get the press release, the failures and the regressions get a quiet model-card footnote, if that. The same discipline transfers exactly: pre-register what you're testing, report what didn't work, and treat a model's failures as data you owe the people it affects, not embarrassment to bury. Honest null reporting is the AI-era version of Bock's humility.

How you run it

The analysis you can execute

The analysis is an experiment registry plus a pre-registration surface (tied directly to the transparency thesis and to Chapter 12's experiment/nudge analysis), run alongside the psychological-safety pulse from Chapter 2. The registry's discipline is the deliverable: pre-committed criteria, nulls reported, no quiet drawer. Almost none of this is net-new work: it's the experimentation machinery (Ch 12) plus the honesty rule (Ch 2 and the program's own pre-registration practice).
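One way such a registry might look in code is sketched below. It is a minimal illustration, not the program's actual tooling: the class names, fields, and example experiments are all hypothetical, and "effect" is assumed to be scored so that higher is better. The three rules it enforces are the ones named above: criteria are fixed before the run, a result can't be logged for an experiment that was never registered, and the report lists every registered experiment whether it won, came up null, or was never reported.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class Preregistration:
    """Criteria written down before the run; frozen so they can't be edited later."""
    name: str
    metric: str
    success_threshold: float  # smallest effect that counts as a win (higher is better)
    registered_on: date


class ExperimentRegistry:
    def __init__(self) -> None:
        self._plans: Dict[str, Preregistration] = {}
        self._results: Dict[str, float] = {}

    def preregister(self, plan: Preregistration) -> None:
        if plan.name in self._plans:
            raise ValueError(f"{plan.name!r} is already registered; criteria are fixed")
        self._plans[plan.name] = plan

    def record_result(self, name: str, observed_effect: float, run_started: date) -> None:
        plan = self._plans.get(name)
        if plan is None:
            raise KeyError(f"{name!r} was never pre-registered")
        if run_started < plan.registered_on:
            raise ValueError(f"{name!r} started before its criteria were registered")
        self._results[name] = observed_effect

    def report(self) -> List[str]:
        """One line per registered experiment: wins, nulls, and unreported runs alike."""
        lines = []
        for name, plan in sorted(self._plans.items()):
            effect = self._results.get(name)
            if effect is None:
                status = "NOT YET REPORTED"
            elif effect >= plan.success_threshold:
                status = f"WIN ({plan.metric}: {effect:+.2f})"
            else:
                status = f"NULL ({plan.metric}: {effect:+.2f})"
            lines.append(f"{name}: {status}")
        return lines


# Hypothetical usage: two experiments registered, one result in so far.
registry = ExperimentRegistry()
registry.preregister(
    Preregistration("manager-checkin-nudge", "retention_gain_pts", 1.0, date(2025, 1, 6)))
registry.preregister(
    Preregistration("onboarding-buddy", "ramp_days_saved", 2.0, date(2025, 1, 6)))
registry.record_result("manager-checkin-nudge", 0.2, date(2025, 1, 13))

report = registry.report()
print("\n".join(report))
```

The design choice that matters is that `report()` iterates over what was registered, not over what was recorded. An experiment that quietly never gets written up still shows up, marked as unreported.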

The AI-era turn

Treat AI evaluation the way an honest lab treats experiments: pre-register the test, report the failures and regressions, and refuse the benchmark-win-only narrative. The file-drawer problem doesn't care whether the rater is human or a model — it cares whether you reported the nulls. Honest AI evaluation is just this chapter applied to the instruments from the rest of the book.
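Applied to model evaluation, the same rule might be sketched as follows. This is a hedged illustration only: the benchmark names and scores are invented, higher scores are assumed to be better, and `compare_evals` is a hypothetical helper, not any evaluation library's API. The list of benchmarks is fixed before the candidate model is scored, and every one of them appears in the output, regressions included.

```python
from typing import Dict, List, Tuple

Report = Dict[str, List[Tuple[str, float]]]


def compare_evals(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    preregistered: List[str],
    tolerance: float = 0.0,
) -> Report:
    """Classify every pre-registered benchmark as a win, a regression, or unchanged.

    Assumes higher scores are better. Raises if any pre-registered benchmark
    has no score, so a bad result can't be dropped by simply not running it.
    """
    missing = [b for b in preregistered if b not in baseline or b not in candidate]
    if missing:
        raise ValueError(f"pre-registered benchmarks missing scores: {missing}")

    report: Report = {"wins": [], "regressions": [], "unchanged": []}
    for name in preregistered:
        delta = candidate[name] - baseline[name]
        if delta > tolerance:
            bucket = "wins"
        elif delta < -tolerance:
            bucket = "regressions"
        else:
            bucket = "unchanged"
        report[bucket].append((name, round(delta, 4)))
    return report


# Hypothetical scores for illustration only.
preregistered = ["reasoning", "safety_refusals", "summarization"]
baseline = {"reasoning": 0.62, "safety_refusals": 0.91, "summarization": 0.74}
candidate = {"reasoning": 0.70, "safety_refusals": 0.85, "summarization": 0.74}

eval_report = compare_evals(baseline, candidate, preregistered, tolerance=0.01)
for bucket, rows in eval_report.items():
    print(bucket, rows)
```

A release note built from `eval_report["wins"]` alone would be the benchmark-win-only narrative. Publishing all three buckets is the model-evaluation version of reporting the nulls.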

What to do Monday

  1. Pre-register your next people experiment — write down what you'll measure and what would count as success, before you run it.
  2. Stand up a one-page failure log and report the next null with the same prominence as the next win. The asymmetry is the disease.
  3. Run the psychological-safety pulse — if it's low, your failure reporting is fiction no matter what process you mandate.
  4. For AI you deploy, pre-register the eval and publish the regressions, not just the wins.

Cross-refs: Ch 12 (experiments — this is the honest other half); Ch 2 (psychological safety; the transparency thesis); content/magazine/the-reliability-problem.md and the program's own pre-registration practice (report the uncomfortable parts, pre-commit the analysis).