peopleanalyst


analytics · Q6 · to verify

Wilson & Caliskan 2024: embedding-model resume screeners replicate name-based bias, favoring White-associated names in 85.1% of cases

Running a resume-audit study through a document-retrieval framework that simulates candidate selection, the authors tested Massive Text Embedding (MTE) models on 500+ resumes against 500+ job descriptions across nine occupations. The models significantly favored White-associated names in 85.1% of cases and female-associated names in only 11.1% of cases; Black males were disadvantaged in up to 100% of cases — replicating the human resume-audit pattern in the AI screener.

Share of resume-screening cases in which the embedding model favored a protected-group-associated name, by group
White-associated names favored in 85.1% of cases; female-associated names favored in only 11.1% of cases; Black males disadvantaged in up to 100% of cases. Document length and corpus frequency of names also affected selection.
Sample
500+ publicly available resumes × 500+ job descriptions across 9 occupations; a selection of Massive Text Embedding (MTE) models
Methodology
Document-retrieval framework simulating candidate selection; resume-audit design (names varied by race/gender) ported to LLM-embedding retrieval; statistical comparison of selection rates across protected groups, testing three intersectionality hypotheses.
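A minimal sketch of the retrieval-style audit described above, under stated assumptions. It is not the authors' code: `toy_embed` is a bag-of-words stand-in for a real embedding model, and `selection_rates`, the `top_fraction` cutoff, and all names and texts are hypothetical. The idea is to prepend a group-associated name to otherwise identical resumes, rank every variant by similarity to the job description, and compare each group's share of the selected set.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in for a real embedding model: lowercase bag-of-words counts."""
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def selection_rates(
    job: str,
    resume_bodies: List[str],
    names_by_group: Dict[str, List[str]],
    embed: Callable[[str], Vector] = toy_embed,
    top_fraction: float = 0.1,  # assumed cutoff, not taken from the paper
) -> Dict[str, float]:
    """Return each group's share of the top-ranked name-augmented resumes."""
    job_vec = embed(job)
    scored = []
    for group, names in names_by_group.items():
        for name in names:
            for body in resume_bodies:
                # Same resume body, only the name line differs.
                score = cosine(job_vec, embed(f"{name}\n{body}"))
                scored.append((score, group))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(scored) * top_fraction))
    selected = Counter(group for _, group in scored[:k])
    return {group: selected[group] / k for group in names_by_group}


# Hypothetical toy input: the surname "Baker" overlaps with the job text,
# so the name alone moves those resumes up the ranking.
rates = selection_rates(
    "hiring a baker",
    ["bakes bread", "writes code"],
    {"a": ["Tom Baker"], "b": ["Sam Smith"]},
    top_fraction=0.5,
)
print(rates)  # {'a': 1.0, 'b': 0.0}
```

Equal scores keep their insertion order in this sketch, so a real audit would need an explicit tie-breaking rule and a significance test on the resulting rates.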

What this means

  • The AI screener walks into the same wall: the disease is single-rater judgment of construct-irrelevant signals, and swapping a human screener for an embedding model does not cure it. The name-driven bias reappears, with White-associated names favored in 85.1% of cases.
  • The study is methodologically the AI analogue of Bertrand & Mullainathan: the same audit design (randomized race/gender name signals on otherwise-comparable applications) applied to the new substrate, which is precisely why the findings are directly comparable to the human baseline.
  • Intersectional structure persists (Black males disadvantaged up to 100% of cases), and the bias couples to surface features the model is sensitive to (document length, name corpus frequency) — evidence that the model is scoring text statistics, not the underlying construct of candidate fit.

Source

Gender, Race, and Intersectional Bias in Resume Screening via Language Model Retrieval

arXiv · Kyra Wilson & Aylin Caliskan · 2024 · peer-reviewed

Context

What came before
AI resume-screening tools are marketed as more objective than human reviewers. The human resume-audit literature (Bertrand & Mullainathan 2004; Quillian et al. 2017) established that human screeners exhibit large name-driven callback bias.
What comes next
Pairs directly with the Bertrand & Mullainathan card as the human-vs-AI comparison for the resume-screening case study. The fix is the same as in the human case: standardize/anonymize inputs, validate selection criteria against an outcome (criterion validity), and audit for adverse impact. Verify exact model list and per-occupation breakdown against full text; note this is a preprint at capture time.
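One common way to run the adverse-impact audit mentioned above is the four-fifths rule of thumb from the EEOC Uniform Guidelines: a group whose selection rate falls below 80% of the highest group's rate is flagged for review. The sketch below assumes that rule; the function name `adverse_impact_ratios` and the counts in the example are hypothetical and are not figures from the study.

```python
from typing import Dict, Tuple


def adverse_impact_ratios(
    selected: Dict[str, int],
    applicants: Dict[str, int],
    threshold: float = 0.8,  # four-fifths rule of thumb
) -> Dict[str, Tuple[float, bool]]:
    """Map each group to (impact ratio, flagged).

    The impact ratio is the group's selection rate divided by the highest
    group's selection rate. Assumes every applicant count is positive.
    """
    rates = {group: selected[group] / applicants[group] for group in applicants}
    top_rate = max(rates.values())
    result = {}
    for group, rate in rates.items():
        ratio = rate / top_rate if top_rate else 0.0
        result[group] = (ratio, ratio < threshold)
    return result


# Made-up counts for illustration only.
audit = adverse_impact_ratios(
    selected={"group_a": 40, "group_b": 20},
    applicants={"group_a": 200, "group_b": 200},
)
for group, (ratio, flagged) in audit.items():
    print(f"{group}: ratio={ratio:.2f} flagged={flagged}")
```

A flag here is a prompt for closer review, not a finding of bias; small samples usually call for a significance test alongside the ratio.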