What is PeopleAnalyst?

PeopleAnalyst is the front door for people-analytics research: 205+ works indexed and profiled, 40+ citation-grade findings extracted, and peer-reviewed behavioral science translated from academic to actionable — the missing manual for the people analytics you always meant to do.

What is people analytics?

People analytics is not a dashboard. It is behavioral science and statistical inference applied to workforce decisions — a discipline with its own methodology, spanning measurement, organizational design, talent, leadership, and analytics craft.

Why does AI in HR need measurement science?

AI is being deployed in high-stakes people decisions — hiring, performance, attrition — without the measurement science to evaluate whether it works or whom it harms. Construct validity, effect sizes, and criterion validity are the vocabulary for asking an AI vendor the right questions.

How is the research made accessible?

The evidence is indexed and searchable: 205+ works, 40+ citation-grade insight cards, and 8 research arcs, so the right finding reaches the right decision at the right time.

What separates good people measurement from assertion?

Good measurement has a method: construct validity, reliability, and effect-size interpretation are not optional — they are what separates evidence from assertion.


Scale Validator

Upload a home-grown survey scale (items + a response matrix) — get a psychometric credibility report you can defend.

The method

Classical test theory scale validation (reliability · dimensionality · item analysis · DIF)

Somebody built the engagement survey in a workshop (twelve items, a Likert scale, a name), and the organization has been steering on its scores ever since. Nobody has checked whether the items hang together, whether they measure one thing or three, or whether the scale reads differently across the groups you compare. Decisions are riding on an instrument that has never been inspected.

DeVellis and Thorpe's Scale Development is the field's standard walkthrough of a claim most survey-writers never confront: a scale is a measurement instrument for an unobservable construct, and its quality is an empirical question, not a matter of whether the items sound right. Reliability, in their classical-test-theory framing, is the proportion of score variance that is true-score rather than noise — estimated by coefficient alpha and its relatives — and it is earned through item quality, content sampling, and an adequate development sample, then verified, not assumed. Carmines and Zeller's slim Reliability and Validity Assessment supplies the distinction that keeps the whole exercise honest: reliability is threatened by random error, validity by systematic error, and the two are different failures. A scale can be beautifully consistent and consistently measure the wrong thing; a high alpha is where validation starts, not where it ends.
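To make the classical-test-theory definition concrete, coefficient alpha can be computed from a respondents × items matrix in a few lines. This is a minimal sketch, not the tool's engine: the function name `cronbach_alpha` and the sample data are invented for illustration, and it assumes complete numeric rows with reverse-scored items already recoded.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Coefficient alpha for a respondents x items matrix (list of rows)."""
    k = len(responses[0])  # number of items
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item (column) and of the summed score (row totals).
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total score has zero variance")
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Illustrative data only: 5 respondents x 3 items on a 1-5 scale.
sample = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
    [1, 2, 2],
]
print(round(cronbach_alpha(sample), 3))
```

A high value here only says the items covary. As the paragraph above notes, it does not show that they measure the construct you named.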

Watkins's Exploratory Factor Analysis adds the caution that applies directly to dimensionality checks: EFA is a century old and still routinely misapplied, largely because researchers accept software defaults. The classic trap is the Kaiser eigenvalue-greater-than-one rule for deciding how many factors a scale contains — a convenient default that Watkins's evidence-based-practice review treats as a first estimate at best. So when a dimensionality report gives you a Kaiser factor count, the literature's own advice is to read it alongside the first-factor share and the item loadings rather than as a verdict. That is the right posture for the whole report: psychometric statistics are diagnostics that tell you which items to keep, review, or drop — they do not certify that the construct is the one you named.

The textbooks end with formulas and an exhortation to go compute them; here you upload items and a response matrix and the statistics — alpha, omega, eigenvalues, item-level flags, and Mantel-Haenszel DIF across groups — run in deterministic code, with the model confined to narrating numbers already computed.

The books behind this tool

Scale Development: Theory and Applications — Robert F. DeVellis & Carolyn T. Thorpe
Reliability and Validity Assessment — Edward G. Carmines & Richard A. Zeller
Exploratory Factor Analysis — Marley W. Watkins

How it works

The statistics come from code; the reading comes from the corpus. A deterministic engine (the vendored MF-158 reliability library) computes four things. Internal consistency: Cronbach's α, McDonald's ω, and mean inter-item r. Dimensionality: correlation-matrix eigenvalues, Kaiser factor count, and first-factor share. Per-item quality: corrected item-total r, α-if-deleted, and single-factor loading, which feed the keep/review/drop flags. And, when a two-group variable is supplied, Mantel-Haenszel differential item functioning with ETS A/B/C classification. The model then narrates the already-computed numbers in plain language for a non-statistician, grounded in the measurement corpus. This validates the INSTRUMENT (survey-scale psychometrics), NOT rater or rating quality (rater convergence and calibration), which is a different question. Numbers always ship; the narrative degrades gracefully if the model is unavailable.
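The per-item statistics named above can be sketched as follows. This is an illustration under assumptions, not the MF-158 library: the helper names are invented, the output keys only mirror the field names listed under "You get", and the `dropImprovesAlpha` rule (α-if-deleted greater than the full-scale α) is an assumed reading of the flag name. The sketch needs at least three items, so that alpha is still defined once one item is removed.

```python
from math import sqrt
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cronbach_alpha(rows):
    """Coefficient alpha for a respondents x items matrix."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    return (k / (k - 1)) * (1 - sum(item_vars) / variance([sum(r) for r in rows]))


def item_analysis(rows):
    """Corrected item-total r and alpha-if-deleted for every item."""
    full_alpha = cronbach_alpha(rows)
    report = []
    for j in range(len(rows[0])):
        item = [r[j] for r in rows]
        # "Corrected": the item is correlated with the total of the OTHER items,
        # so it does not inflate its own correlation.
        rest = [[v for i, v in enumerate(r) if i != j] for r in rows]
        alpha_without = cronbach_alpha(rest)
        report.append({
            "item": j,
            "itemTotalCorrelation": pearson(item, [sum(r) for r in rest]),
            "alphaIfDeleted": alpha_without,
            # Assumed rule, not the tool's documented threshold.
            "dropImprovesAlpha": alpha_without > full_alpha,
        })
    return report


# Illustrative data only: the fourth item is unrelated to the first three.
rows = [
    [5, 4, 5, 2],
    [4, 4, 4, 5],
    [2, 3, 2, 3],
    [1, 2, 1, 4],
    [3, 3, 3, 1],
    [4, 5, 4, 2],
]
for entry in item_analysis(rows):
    print(entry)
```

In this toy data the fourth item correlates negatively with the rest of the scale and removing it raises alpha, which is the pattern a review flag is meant to surface.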

You bring

{ scaleName, items: [{ id?, text }], responses: number[][], groups?, referenceGroup?, cluster? } — responses is a respondents × items matrix; reverse-scored items already recoded
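Before sending a matrix, it can help to apply the constraints this page states: one value per item in every row, at least 3 respondents and 3 items, exactly two distinct groups for the DIF check, and listwise dropping of non-finite rows. The sketch below is a hypothetical client-side pre-check, not the tool's own validation. The function name, the error messages, and the choice to apply the minimums after dropping rows are all assumptions.

```python
import math


def prepare_responses(responses, groups=None):
    """Drop non-finite rows listwise, then apply the documented minimums."""
    if not responses:
        raise ValueError("responses is empty")
    n_items = len(responses[0])
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent row must have one value per item")
    if groups is not None and len(groups) != len(responses):
        raise ValueError("groups must have one label per respondent row")

    kept_rows, kept_groups = [], []
    for i, row in enumerate(responses):
        # Listwise deletion: a single NaN/inf/non-numeric value drops the row.
        if all(isinstance(v, (int, float)) and math.isfinite(v) for v in row):
            kept_rows.append(row)
            if groups is not None:
                kept_groups.append(groups[i])

    if len(kept_rows) < 3 or n_items < 3:
        raise ValueError("need at least 3 respondents and 3 items")
    if groups is not None and len(set(kept_groups)) != 2:
        raise ValueError("the DIF check needs exactly two distinct groups")

    return {
        "responses": kept_rows,
        "groups": kept_groups if groups is not None else None,
        "nDropped": len(responses) - len(kept_rows),
    }


# Illustrative data only: the second respondent has a missing value.
raw = [[1, 2, 3], [2, float("nan"), 3], [3, 3, 3], [4, 4, 5]]
cleaned = prepare_responses(raw, groups=["a", "b", "a", "b"])
print(cleaned["nDropped"], cleaned["groups"])
```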

You get

{ scaleName, nItems, nRespondents, nDropped, reliability (cronbachAlpha · mcdonaldOmega · meanInterItemCorrelation · tier), dimensionality (eigenvalues · nFactorsKaiser · firstFactorShare · unidimensional), items[] (itemTotalCorrelation · alphaIfDeleted · loading · flags · recommendation), dif (Mantel-Haenszel · ETS A/B/C · flagged[]), verdict, narrative, grounded_in, provenance }

Use it for

→ Validate a home-grown engagement/climate scale before you trust its scores: items + a response matrix → α, ω, dimensionality, and which items to keep/review/drop
→ Check measurement invariance before comparing groups: supply a two-group variable → a DIF pass that flags items functioning differently across groups (review before comparing means)
→ One-click demo: run the built-in UWES-9 sample data to see the full credibility report (α≈.92, unidimensional, one DIF-flagged item)

See it work

example output

A home-grown 6-item "Manager Trust Scale" with a 220-respondent × 6-item response matrix and a two-group tenure variable for a DIF check.

Scale credibility report — Manager Trust Scale

6 items · 220 respondents · 4 dropped (listwise on non-finite rows) → n = 216

All statistics computed deterministically; the LLM narrates the already-computed numbers.

Reliability — tier: good

Metric	Value
Cronbach's α	0.87
McDonald's ω	0.88
Mean inter-item r	0.53

Reading: internal consistency is strong enough to trust composite scores for group-level decisions.

Dimensionality — unidimensional: yes

Eigenvalues: [3.71, 0.78, 0.54, 0.40, 0.32, 0.25] · Kaiser factors: 1 · first-factor share: 62% · first-to-second ratio: 4.8. One dominant factor — the items measure one construct.
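The three summary figures above follow directly from the eigenvalue list, which can be checked with a short sketch. The function name is invented for illustration and the dictionary keys only mirror the field names under "You get". The rule the tool uses to set `unidimensional` is not stated on this page, so it is deliberately left out.

```python
def summarize_eigenvalues(eigenvalues):
    """Kaiser count, first-factor share, and first-to-second ratio."""
    ordered = sorted(eigenvalues, reverse=True)
    return {
        # Kaiser rule: count eigenvalues greater than 1.
        "nFactorsKaiser": sum(1 for ev in ordered if ev > 1.0),
        # Eigenvalues of a correlation matrix sum to the number of items,
        # so this is the share of total variance on the first factor.
        "firstFactorShare": ordered[0] / sum(ordered),
        "firstToSecondRatio": ordered[0] / ordered[1],
    }


summary = summarize_eigenvalues([3.71, 0.78, 0.54, 0.40, 0.32, 0.25])
print(summary["nFactorsKaiser"])                 # 1
print(round(summary["firstFactorShare"], 2))     # 0.62
print(round(summary["firstToSecondRatio"], 1))   # 4.8
```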

Item quality

Item	Item-total r	α-if-deleted	Loading	Flags	Rec
MT1 "My manager keeps promises"	0.71	0.84	0.78	—	keep
MT2 "…admits mistakes"	0.66	0.85	0.73	—	keep
MT3 "…has my back"	0.74	0.83	0.80	—	keep
MT4 "…shares context"	0.69	0.84	0.75	—	keep
MT5 "…plays favorites" (R)	0.31	0.89	0.36	weak_discrimination, drop_improves_alpha	review
MT6 "…is fair"	0.63	0.85	0.71	—	keep

DIF / measurement invariance — tenure (ref: tenured · focal: new-hire)

Likert items dichotomized at the median for Mantel-Haenszel. Summary: A = 5 · B = 1 · C = 0. Flagged: MT5 (ETS B, ΔMH = -1.06, p = .03, favors reference) — functions slightly differently across tenure; review before comparing tenured vs. new-hire means.
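For readers who want to see where a ΔMH value comes from, here is a minimal sketch of the Mantel-Haenszel common odds ratio and its ETS delta transform, ΔMH = -2.35 · ln(α_MH). It assumes the 2×2 count tables have already been built, one per matched total-score stratum, from the dichotomized item. The function names are invented, and the significance tests that the ETS categories depend on are passed in as booleans rather than computed.

```python
import math


def mh_delta(tables):
    """ETS delta from per-stratum counts.

    Each table is (ref_endorse, ref_not, focal_endorse, focal_not).
    A negative delta means the item favors the reference group.
    """
    num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("common odds ratio is undefined for these tables")
    alpha_mh = num / den  # Mantel-Haenszel common odds ratio
    return -2.35 * math.log(alpha_mh)


def ets_class(delta, differs_from_zero, exceeds_one):
    """ETS A/B/C category; the two significance results are supplied by the caller."""
    if abs(delta) < 1.0 or not differs_from_zero:
        return "A"  # negligible DIF
    if abs(delta) >= 1.5 and exceeds_one:
        return "C"  # large DIF
    return "B"      # moderate DIF


# Illustrative counts only: two score strata for one item.
delta = mh_delta([(30, 10, 24, 16), (18, 22, 12, 28)])
print(round(delta, 2), ets_class(delta, differs_from_zero=True, exceeds_one=False))
```

Under this scheme the flagged item in the report (ΔMH = -1.06, p = .03) lands in category B: the magnitude is at least 1 and differs significantly from zero, but it is below the 1.5 needed for C.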

Verdict — acceptable

A trustworthy one-factor scale with one weak, possibly-miskeyed reverse item.

Review or replace MT5 ("plays favorites") — low discrimination and the lone DIF flag; dropping it would raise α to .89.
Safe to report a composite; re-run invariance after revising MT5.

Grounded in the measurement corpus (Cronbach, McDonald, ETS DIF conventions).

Run it now

Validate a survey scale

Paste a home-grown scale's items and a respondents × items response matrix to get a psychometric credibility report — Cronbach's α, McDonald's ω, dimensionality (eigenvalues), per-item keep/review/drop flags, and a differential-item-functioning check across groups. The statistics are computed in code; the model only explains them. (The fields below are pre-filled with sample UWES-9 data — just hit run, or replace it with your own.)

Scale name *

Items (one wording per line) *

One item per line, in the same order as the columns of your response matrix. Reverse-scored items must already be recoded.

Response matrix (one respondent per line, values comma-separated) *

One row per respondent; one number per item, comma- or space-separated. At least 3 respondents and 3 items.

Group per respondent (optional — enables the DIF check)

Optional. One group label per line (exactly two distinct groups), same order as the rows above. Leave blank to skip the differential-item-functioning check.

DIF reference group (optional)

Which group is the reference (the other is the focal group). Defaults to the first group seen.

Prefer code? Call it over the API (POST /api/bicycle/scale-validator) or hand it to your AI agent via MCP (the validate_scale tool).
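A request to the endpoint might be assembled roughly as below. Only the path and the field names come from this page. The host is a placeholder, the JSON content type is an assumption, any authentication is unknown, and the item wordings and responses are invented. The sketch builds the request without sending it.

```python
import json
import urllib.request

# Hypothetical placeholder: this page gives the path but not the host.
BASE_URL = "https://example.invalid"


def build_request(payload, base_url=BASE_URL):
    """Build (but do not send) a POST request for the scale-validator endpoint."""
    return urllib.request.Request(
        url=base_url + "/api/bicycle/scale-validator",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


# Illustrative payload only: 4 respondents x 3 items, two groups.
payload = {
    "scaleName": "Example Scale",
    "items": [{"text": "Item A"}, {"text": "Item B"}, {"text": "Item C"}],
    "responses": [[4, 5, 4], [2, 2, 3], [5, 4, 5], [3, 3, 3]],
    "groups": ["tenured", "new-hire", "tenured", "new-hire"],
    "referenceGroup": "tenured",
}
request = build_request(payload)
print(request.get_method(), request.full_url)
```

With a real host substituted for the placeholder, `urllib.request.urlopen(request)` would send it. The response should then follow the shape listed under "You get".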
