peopleanalyst

Tools · People analytics

Kirkpatrick Evaluation

Prove training changed behavior and results — not just that people enjoyed it.

The method

Kirkpatrick four-level training evaluation (+ Phillips ROI extension)

A leadership program is up for renewal and the only evidence on the table is a 4.6-out-of-5 satisfaction average. Everyone in the room knows a happy-sheet score is not proof the program changed anything, and nobody has the measurement design to show whether it did. The L&D budget gets defended on anecdote — which works until the year it doesn't.

The Kirkpatrick model's four levels — Reaction, Learning, Behavior, Results — are less a framework than a standing accusation: most training evaluation stops at level one because levels three and four require designing measurement before the program runs, not surveying afterward. Laszlo Bock's account in Work Rules! is the practitioner's version of the claim — Google's people organization concluded that measuring training means measuring behavioral change, not satisfaction, and adopted Kirkpatrick's levels as the way to do it. The mechanics that matter are the chain of evidence: instruments, timing, and success bars at each level, specified in advance, so that a level-four claim can trace back through behavior change and learning gain instead of leaping from attendance to revenue.

Diez, Bussin, and Lee treat training in Fundamentals of HR Analytics as an investment with a computable return — their claim is that ROI calculations must carry the full costs and the long-term benefits, and that the Kirkpatrick structure is what keeps the benefit side honest. Cascio and Boudreau's chapter on the costs and benefits of HR development programs adds the decision-science posture: development spending is an investment under uncertainty, and the analysis should say so. The honest limit sits at attribution — results move for a hundred reasons, and isolating the program's share requires a comparison design or an explicit, defensible adjustment. An ROI figure without an isolation method is a marketing number wearing a finance costume.

In the books this is where you'd be told to go design the instruments; here the four-level design comes back with the chain of evidence drawn and the weak links named — and when you supply financials, ROI and benefit-cost ratios are computed deterministically in code, never estimated by the model.

The books behind this tool

How it works

Grounded in the Kirkpatrick/Phillips corpus (people-analytics): designs measurement at all four levels (Reaction → Learning → Behavior → Results) with instruments, timing, and success bars, the chain-of-evidence between levels, an optional Phillips Level-5 ROI extension (per-stream monetization + isolation plans with credibility rules; ROI% and BCR computed deterministically in code when financials are supplied — never by the model), and honest caveats about attribution. Reuses the reliability stats engine for Level-2 assessment.

You bring

{ program, context?, include_roi?, financials? (program_cost · benefit_streams · isolation_adjustment), cluster? }

You get

{ program_summary, levels[1..4] (measures · instruments · timing · success_indicator), chain_of_evidence, level5? (benefit_streams · computed ROI%/BCR · honesty_notes), roi_level5?, caveats[], grounded_in, provenance }

Use it for

See it work

example output

Program: a two-day frontline-manager coaching skills workshop for a 600-person retail chain, meant to lift team engagement and cut first-year turnover.

Kirkpatrick Evaluation — Frontline-Manager Coaching Workshop

Program: A two-day coaching-skills workshop for ~120 store managers, targeting better team engagement and lower first-year turnover.

Level 1 — Reaction

Did they find it relevant and worth their time?

  • Measures: perceived relevance, intent to apply, facilitator effectiveness.
  • Instrument: end-of-session survey (relevance + intent-to-apply items, not just satisfaction).
  • Timing: end of day 2.
  • Success bar: ≥80% rate "will apply this on my team next week."

Level 2 — Learning

Did coaching skill actually increase?

  • Measures: knowledge of the coaching model; skill demonstrated in a role-play.
  • Instruments: pre/post knowledge check + observer-scored role-play rubric (inter-rater reliability checked via the reliability engine).
  • Timing: pre-program baseline; post at end of day 2.
  • Success bar: mean post-score ≥ baseline by a pre-set effect size, rater agreement α ≥ 0.7.

Level 3 — Behavior

Are they coaching differently on the floor?

  • Measures: frequency/quality of coaching conversations; use of the model.
  • Instruments: direct-report pulse items; manager self-log; spot observation.
  • Timing: +30 / +60 / +90 days.
  • Success bar: sustained lift in direct-report-reported coaching at +90 days.

Level 4 — Results

Did engagement and retention move?

  • Measures: team engagement index; first-year turnover.
  • Instruments: engagement survey + HRIS turnover data.
  • Timing: +6 months, trended.
  • Success bar: turnover in trained-manager stores below matched-control stores.

Chain of evidence

Reaction → Learning protects against "liked it but learned nothing"; Learning → Behavior is the weakest link (skill rarely survives the floor) so the +30/60/90 cadence is load-bearing; Behavior → Results is read against control stores.

Level 5 — ROI (Phillips)

Isolate via matched-control comparison; benefit = avoided replacement cost of retained first-year hires minus fully-loaded program cost.

Caveats

  • Turnover is multiply-determined (pay, scheduling, labor market) — control stores are essential or attribution collapses.
  • Self-logged coaching frequency is inflation-prone; weight the direct-report signal.

Grounded in: people-analytics cluster · Kirkpatrick four levels, Phillips Level-5 ROI, isolation-of-effects · sources: Kirkpatrick's Four Levels (Kirkpatrick & Kirkpatrick), Return on Investment in Training (Phillips).

Run it on your data

Call it on your own inputs — over the API, or hand it to your AI agent via MCP. Discovery is open; running it is metered.

REST  POST /api/bicycle/kirkpatrick-evaluation
MCP   design_kirkpatrick_evaluation

← All tools