
analytics · Q7 · to verify

Huffcutt, Culbertson & Weyhrauch 2013 — interview interrater reliability .74 (panel) vs .44 (separate interviewers)

Updating the meta-analytic estimates of employment-interview interrater reliability with 125 coefficients (total N = 32,428), the authors found mean interrater reliability of .74 for panel interviews versus .44 for separate interviews conducted by different interviewers — and showed that credible estimates require accounting for all three sources of measurement error (random response, transient, and conspect/rater).
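The "mean interrater reliability" is an average across the primary-study coefficients. Below is a minimal sketch that assumes bare-bones Hunter-Schmidt-style weighting by sample size. The card does not state the paper's exact weighting or any artifact corrections, and the three coefficients are hypothetical, not drawn from the 125 in the meta-analysis.

```python
from dataclasses import dataclass


@dataclass
class Coefficient:
    """One interrater-reliability coefficient from a primary study (hypothetical)."""
    r: float  # observed interrater reliability reported by the study
    n: int    # sample size behind that coefficient


def weighted_mean_reliability(coefs: list[Coefficient]) -> tuple[float, int]:
    """Sample-size-weighted mean reliability and total N.

    Assumed bare-bones weighting (each r weighted by its n); the paper's
    exact procedure may differ.
    """
    total_n = sum(c.n for c in coefs)
    mean_r = sum(c.r * c.n for c in coefs) / total_n
    return mean_r, total_n


# Hypothetical panel-interview coefficients for illustration only; NOT data from the paper.
panel_studies = [Coefficient(0.80, 120), Coefficient(0.70, 300), Coefficient(0.75, 80)]
mean_r, total_n = weighted_mean_reliability(panel_studies)
print(f"weighted mean r = {mean_r:.3f} over N = {total_n}")  # weighted mean r = 0.732 over N = 500
```

Weighting by n lets larger studies count for more. With 125 coefficients and N = 32,428, the average study contributes roughly 259 ratees.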

Figure: Mean interrater reliability of employment interviews by format (panel vs separate interviewers)

Mean interrater reliability ≈ .74 for panel interviews vs ≈ .44 for separate interviews by different interviewers. Estimates depend on modeling all three sources of measurement error (random response, transient, conspect); highly structured interviews conducted separately showed lower-than-expected reliability.
Sample
125 interrater-reliability coefficients; total sample size 32,428
Methodology
Psychometric meta-analysis of interrater reliability partitioned by interview structure and format, decomposing random-response, transient, and conspect (rater) error sources.
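The card names three error sources but does not show how leaving one out inflates a coefficient. The sketch below assumes a simple additive classical-test-theory model: observed variance is true variance plus the three error variances. All variance values are hypothetical and are not estimates from Huffcutt et al. Any error source that a coefficient does not count gets absorbed into "true" variance. This is the over-claiming mechanism the card attributes to earlier estimates.

```python
from dataclasses import dataclass


@dataclass
class VarianceComponents:
    """Hypothetical partition of observed interview-rating variance (illustrative assumption)."""
    true: float             # stable candidate differences the interview should capture
    random_response: float  # moment-to-moment noise in the candidate's answers
    transient: float        # occasion-specific state (e.g., a bad day), constant within one sitting
    conspect: float         # rater-specific (idiosyncratic scorer) error

    @property
    def total(self) -> float:
        return self.true + self.random_response + self.transient + self.conspect


def reliability(v: VarianceComponents, counted: set[str]) -> float:
    """Reliability when only the error sources named in `counted` are treated as error.

    Sources left out of `counted` are silently treated as true-score variance,
    so omitting a source pushes the estimate upward.
    """
    error = sum(getattr(v, name) for name in counted)
    return (v.total - error) / v.total


v = VarianceComponents(true=0.50, random_response=0.15, transient=0.20, conspect=0.15)
all_three = reliability(v, {"random_response", "transient", "conspect"})  # ≈ 0.50
no_transient = reliability(v, {"random_response", "conspect"})             # ≈ 0.70
random_only = reliability(v, {"random_response"})                          # ≈ 0.85
print(f"{all_three:.2f} {no_transient:.2f} {random_only:.2f}")
```

Only the coefficient that counts all three sources recovers the (assumed) true share of variance. Each omitted source adds its variance to the apparent reliability.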

What this means

  • Quantifies the multi-rater fix in the interview domain: interrater reliability is ≈ .74 when raters share a panel versus ≈ .44 when the candidate is seen by separate interviewers, the same averaging-buys-reliability result found in performance rating and in LLM ensembles, restated for interviews.
  • A single interviewer's judgment (.44) is a strikingly unreliable instrument, reinforcing that the disease is single-rater measurement; the panel is not bureaucratic overhead but the mechanism that makes the interview a defensible measurement.
  • The three-source error decomposition (random-response, transient, conspect) is generalizability-theory machinery applied to interviews: most reliability over-claims come from estimates that ignore transient and rater-specific (conspect) error, exactly the systematic-rather-than-random rater variance the essay foregrounds.
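The "averaging buys reliability" logic in the first bullet is usually formalized with the Spearman-Brown prophecy formula. The sketch below assumes that formula applies, meaning parallel raters with independent errors, which the card does not establish. It projects what averaging several separate interviewers, each at ≈ .44, might yield. These numbers are projections and were not reported by Huffcutt et al.

```python
def spearman_brown(r_single: float, k: float) -> float:
    """Reliability of the mean of k parallel raters, each with reliability r_single."""
    return k * r_single / (1 + (k - 1) * r_single)


def raters_needed(r_single: float, r_target: float) -> float:
    """Inverse Spearman-Brown: number of parallel raters needed to reach r_target."""
    return r_target * (1 - r_single) / (r_single * (1 - r_target))


# Projected reliability of an averaged score from k separate interviewers at r = .44.
for k in (1, 2, 3, 4):
    print(k, round(spearman_brown(0.44, k), 3))
# 1 0.44
# 2 0.611
# 3 0.702
# 4 0.759

# Raters needed to lift .44 to .74 under the same assumption.
print(round(raters_needed(0.44, 0.74), 2))  # 3.62
```

Under these assumptions, averaging roughly four separate interviewers would match the panel figure. That is one way to read the .44 → .74 gap as a measurement cost rather than overhead.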

Source

Employment Interview Reliability: New Meta-Analytic Estimates by Structure and Format

International Journal of Selection and Assessment · Allen I. Huffcutt et al. · 2013 · peer-reviewed

Context

What came before
Earlier interview-reliability estimates often ignored transient and conspect error, inflating apparent reliability. Conway, Jako & Goodman 1995 had established structure as a reliability moderator; this study updated the magnitudes and isolated the panel-vs-separate gap.
What comes next
Supplies the multi-rater coefficients (.74 panel vs .44 separate) for the interview case study's fix section, alongside Conway 1995 (validity ceilings) and Gardner 2022 (ICC gain from structure and training). Sets the human reliability bar against which AI and async-video interview reliability should be measured. Verify the .74/.44 split and N = 32,428 against the full text.