Wilson & Caliskan 2024: embedding-model resume screeners replicate name-based bias, favoring White-associated names in 85% of cases
Running a resume-audit study through a document-retrieval framework that simulates candidate selection, the authors tested Massive Text Embedding (MTE) models on 500+ resumes against 500+ job descriptions across nine occupations. The models significantly favored White-associated names in 85.1% of cases and female-associated names in only 11.1% of cases; Black males were disadvantaged in up to 100% of cases — replicating the human resume-audit pattern in the AI screener.
Share of resume-screening cases in which the embedding model favored a protected-group-associated name, by group
White-associated names favored in 85.1% of cases; female-associated names favored in only 11.1% of cases; Black males disadvantaged in up to 100% of cases. Document length and corpus frequency of names also affected selection.
- Sample
- 500+ publicly available resumes x 500+ job descriptions across 9 occupations; selection of Massive Text Embedding (MTE) models
- Methodology
- Document-retrieval framework simulating candidate selection; resume-audit design (names varied by race/gender) ported to LLM-embedding retrieval; statistical comparison of selection rates across protected groups, testing three intersectionality hypotheses.
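The retrieval-style selection described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the toy two-dimensional vectors stand in for real MTE embeddings, and the function names, group labels, and the top-k cutoff are all assumptions made for the example.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_top_k(job_vec, resumes, k):
    """Rank resumes by similarity to the job description and keep the top k.

    `resumes` is a list of (group_label, embedding) pairs. The top-k cutoff
    is a hypothetical selection rule chosen for this sketch.
    """
    ranked = sorted(resumes, key=lambda r: cosine(job_vec, r[1]), reverse=True)
    return [group for group, _ in ranked[:k]]


def selection_rates(job_vec, resumes, k):
    """Share of each group's resumes that land in the top k."""
    selected = Counter(select_top_k(job_vec, resumes, k))
    totals = Counter(group for group, _ in resumes)
    return {group: selected[group] / totals[group] for group in totals}


# Toy data: made-up embeddings, not outputs of any real model.
job_vec = [1.0, 0.0]
resumes = [
    ("group_a", [1.0, 0.1]),
    ("group_a", [0.9, 0.3]),
    ("group_b", [0.8, 0.6]),
    ("group_b", [0.5, 0.9]),
]
rates = selection_rates(job_vec, resumes, k=2)
print(rates)
```

In an audit of this kind the resumes in each group would be otherwise-comparable documents differing only in the name signal, so a gap in selection rates points at the name rather than at qualifications.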
What this means
- The AI screener walks into the same wall: the disease is single-rater judgment of construct-irrelevant signals, and swapping a human screener for an embedding model does not cure it. The name-driven bias reappears, here with White-associated names favored in 85.1% of cases.
- The study is methodologically the AI analogue of Bertrand & Mullainathan: the same audit design (randomized race/gender name signals on otherwise-comparable applications) applied to the new substrate, which is precisely why the findings are directly comparable to the human baseline.
- Intersectional structure persists (Black males disadvantaged in up to 100% of cases), and the bias couples to surface features the model is sensitive to (document length, name corpus frequency). This is evidence that the model is scoring text statistics, not the underlying construct of candidate fit.
Source
Gender, Race, and Intersectional Bias in Resume Screening via Language Model Retrieval
arXiv · Kyra Wilson & Aylin Caliskan · 2024 · peer-reviewed
Context
- What came before
- AI resume-screening tools are marketed as more objective than human reviewers. The human resume-audit literature (Bertrand & Mullainathan 2004; Quillian et al. 2017) established that human screeners exhibit large name-driven callback bias.
- What comes next
- Pairs directly with the Bertrand & Mullainathan card as the human-vs-AI comparison for the resume-screening case study. The fix is the same as in the human case: standardize or anonymize inputs, validate selection criteria against an outcome (criterion validity), and audit for adverse impact. Verify the exact model list and per-occupation breakdown against the full text; note this is a preprint at capture time.
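One common way to audit for adverse impact is the four-fifths rule of thumb: compare each group's selection rate to the highest group's rate and flag ratios below 0.8. The sketch below is illustrative only; the function names and the applicant counts are hypothetical and do not come from the study.

```python
def adverse_impact_ratios(selected, applicants):
    """Each group's selection rate divided by the highest group's rate."""
    rates = {group: selected[group] / applicants[group] for group in applicants}
    best = max(rates.values())
    if best == 0:
        raise ValueError("no group had any selections; ratios are undefined")
    return {group: rate / best for group, rate in rates.items()}


def flag_adverse_impact(ratios, threshold=0.8):
    """Groups whose impact ratio falls below the four-fifths threshold."""
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


# Hypothetical counts for illustration, not data from the paper.
applicants = {"group_a": 100, "group_b": 100}
selected = {"group_a": 50, "group_b": 30}

ratios = adverse_impact_ratios(selected, applicants)
flagged = flag_adverse_impact(ratios)
print(ratios, flagged)
```

A ratio below the threshold is a screening signal for further review, not proof of discrimination on its own; small samples in particular call for a significance test alongside it.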