What is PeopleAnalyst?

PeopleAnalyst is the front door for people-analytics research: 205+ works indexed and profiled, 40+ citation-grade findings extracted, and peer-reviewed behavioral science translated from academic to actionable — the missing manual for the people analytics you always meant to do.

What is people analytics?

People analytics is not a dashboard. It is behavioral science and statistical inference applied to workforce decisions — a discipline with its own methodology, spanning measurement, organizational design, talent, leadership, and analytics craft.

Why does AI in HR need measurement science?

AI is being deployed in high-stakes people decisions — hiring, performance, attrition — without the measurement science to evaluate whether it works or whom it harms. Construct validity, effect sizes, and criterion validity are the vocabulary for asking an AI vendor the right questions.

How is the research made accessible?

The evidence is indexed and searchable: 205+ works, 40+ citation-grade insight cards, and 8 research arcs, so the right finding reaches the right decision at the right time.

What separates good people measurement from assertion?

Good measurement has a method: construct validity, reliability, and effect-size interpretation are not optional — they are what separates evidence from assertion.


HR Data Quality

Find out what your HR data can and can't answer — before the analysis embarrasses you.

The method

Data-quality dimensions audit with fitness-for-purpose assessment

The CHRO asks for an attrition analysis by Friday. The analyst opens the warehouse and finds three HRIS migrations, terminations coded five different ways, and a manager field in which a third of the values are stale. The data's problems will surface either way; the only choice is whether they surface in profiling or in front of the executive team.

Ferrar and Green's Excellence in People Analytics, built on research with over a hundred organizations, treats data as one of nine dimensions of a working analytics function — and pointedly not the first. Their case-study organizations start from business questions and invest in governance and the data foundation deliberately, as infrastructure for value, rather than reactively after an analysis embarrasses someone. The ordering matters: data quality is not a virtue to maximize in the abstract, it is a capability you build toward the questions you intend to answer.

Guenole, Ferrar, and Feinzig's The Power of People makes that concrete with their eight-step model: frame the business question and build hypotheses before touching data, so that gathering and quality-checking serve the question. The implication practitioners live daily is that fitness is purpose-relative — the same dataset can be perfectly fit for a headcount report and unusable for a survival analysis, because the two make different demands on grain, history, and coding consistency. Edwards, Edwards, and Jang's Predictive HR Analytics shows the same truth from the trenches: their click-by-click case studies work only because messy organizational data gets converted, field by field, into something a statistical test can honestly run on. What the data cannot support, the analysis cannot claim.
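The purpose-relative point can be made concrete by comparing what an analysis demands with what a dataset provides. This is a minimal sketch under stated assumptions. The requirement fields (grain, years of history, coding consistency) and the thresholds are hypothetical illustrations, not the tool's actual logic. It also uses only fit and not_fit, leaving out fit_with_caveats.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DatasetProfile:
    # Hypothetical summary of what the data actually supports.
    grain: str               # "snapshot" (point-in-time counts) or "event" (dated hire/term events)
    years_of_history: int
    consistent_coding: bool  # e.g. termination codes stable across migrations

@dataclass(frozen=True)
class AnalysisNeeds:
    # Hypothetical demands an intended analysis makes on the data.
    name: str
    needs_event_grain: bool
    min_years: int
    needs_consistent_coding: bool

def fitness(profile: DatasetProfile, needs: AnalysisNeeds) -> tuple[str, list[str]]:
    """Return (verdict, blocking_issues) for one intended analysis."""
    issues = []
    if needs.needs_event_grain and profile.grain != "event":
        issues.append("needs dated events, data is snapshot-only")
    if profile.years_of_history < needs.min_years:
        issues.append(f"needs {needs.min_years}y history, has {profile.years_of_history}y")
    if needs.needs_consistent_coding and not profile.consistent_coding:
        issues.append("coding drifts across migrations")
    return ("fit" if not issues else "not_fit", issues)

profile = DatasetProfile(grain="snapshot", years_of_history=4, consistent_coding=False)
headcount = AnalysisNeeds("headcount report", False, 1, False)
survival = AnalysisNeeds("survival analysis", True, 3, True)

print(fitness(profile, headcount))  # ('fit', [])
print(fitness(profile, survival))
```

The same profile passes one check and fails the other. The data did not change; only the demands placed on it did.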

The method's honest boundary: no audit can certify accuracy from a description. A described stack supports finding structural risk — join keys that will not join, coding drift across migrations, staleness in slowly-updated fields — but accuracy claims need profiling against the actual rows.
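Two of the structural risks named here are also the first things profiling measures once rows are available: join keys that will not join, and staleness in slowly-updated fields. This is a minimal sketch in plain Python. The field names (`employee_id`, `manager_updated`), the example IDs, and the 365-day staleness threshold are all hypothetical.

```python
from datetime import date

def join_match_rate(left_keys, right_keys):
    """Share of distinct left-side keys that find a partner on the right side."""
    left = set(left_keys)
    if not left:
        return 0.0
    return len(left & set(right_keys)) / len(left)

def stale_share(rows, field, as_of, max_age_days=365):
    """Share of rows whose `field` date is older than max_age_days (missing counts as stale)."""
    if not rows:
        return 0.0
    stale = sum(
        1 for r in rows
        if r.get(field) is None or (as_of - r[field]).days > max_age_days
    )
    return stale / len(rows)

hris_ids = ["E1", "E2", "E3", "E4"]
payroll_ids = ["E1", "E3", "X9"]
print(join_match_rate(hris_ids, payroll_ids))  # 0.5

rows = [
    {"employee_id": "E1", "manager_updated": date(2024, 11, 1)},
    {"employee_id": "E2", "manager_updated": date(2021, 3, 15)},
    {"employee_id": "E3", "manager_updated": None},
]
print(round(stale_share(rows, "manager_updated", as_of=date(2025, 1, 1)), 2))  # 0.67
```

Both numbers are structural signals, not accuracy claims. A fresh manager field can still name the wrong manager.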

Describe the systems and the analyses you intend, and the audit works through all seven dimensions. Every finding comes with a severity, what it breaks downstream, and one concrete fix. Each intended analysis gets a fitness verdict, and the audit closes with a leverage-ordered remediation plan. Where only profiling can answer, it says so honestly: the service never invents facts about data it has not seen.

The books behind this tool

Excellence in People Analytics — Jonathan Ferrar & David Green
The Power of People — Nigel Guenole, Jonathan Ferrar & Sheri Feinzig
Predictive HR Analytics — Martin R. Edwards, Kirsten Edwards & Daisung Jang

How it works

Grounded in the people-analytics corpus, it audits a described HR dataset (systems, fields, known issues, and optionally pasted schema or profile stats) across the seven canonical data-quality dimensions: completeness, validity, consistency, uniqueness, timeliness, accuracy, and lineage/joinability. Every finding carries a severity, what it breaks downstream, and one concrete remediation. Every intended analysis gets a fitness-for-purpose verdict, and the audit closes with a leverage-ordered remediation plan. It never invents facts about data it hasn't seen; instead it raises explicit cannot-assess and needs-profiling flags.
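Once actual rows are available, three of the seven dimensions reduce to simple counts: completeness, uniqueness, and validity. This is a minimal sketch, assuming records arrive as a list of dicts. The field names and the allowed termination codes are hypothetical. The other four dimensions generally need history, other systems, or a source of truth, so they are not covered here.

```python
from collections import Counter

def profile_field(rows, field, allowed=None):
    """Completeness, uniqueness, and (optionally) validity for one field."""
    values = [r.get(field) for r in rows]
    present = [v for v in values if v not in (None, "")]
    result = {
        # Completeness: share of rows with a non-blank value.
        "completeness": len(present) / len(values) if values else 0.0,
        # Uniqueness: values that appear more than once.
        "duplicates": sorted(v for v, n in Counter(present).items() if n > 1),
    }
    if allowed is not None:
        # Validity: values outside the allowed code list (case-sensitive on purpose).
        result["invalid"] = sorted({v for v in present if v not in allowed})
    return result

# Hypothetical terminations extract after a migration.
rows = [
    {"employee_id": "E1", "term_reason": "VOL"},
    {"employee_id": "E2", "term_reason": "vol"},
    {"employee_id": "E2", "term_reason": ""},
    {"employee_id": "E4", "term_reason": "INVOL"},
]
print(profile_field(rows, "employee_id"))
# {'completeness': 1.0, 'duplicates': ['E2']}
print(profile_field(rows, "term_reason", allowed={"VOL", "INVOL"}))
# {'completeness': 0.75, 'duplicates': [], 'invalid': ['vol']}
```

The lowercase `vol` is the kind of coding drift a migration leaves behind. A case-insensitive check would hide it, which is why the comparison is strict.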

You bring

{ dataset, intended_analyses?, cluster? }

You get

{ dataset_summary, dimensions[] (findings · cannot_assess), fitness_for_purpose[] (verdict · blocking_issues), remediation_plan[], needs_profiling[], grounded_in, provenance }
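The output fields listed above lend themselves to a quick triage step. This is a minimal sketch, assuming a response already parsed into a Python dict. It relies on the `fitness_for_purpose` and `needs_profiling` keys named above and on the documented `verdict` and `blocking_issues` fields. The per-item `analysis` key is an assumption about the exact shape, not documented API. The example values are taken from the example output on this page.

```python
def triage(response: dict) -> dict:
    """Group intended analyses by verdict and surface what only profiling can settle."""
    groups = {"fit": [], "fit_with_caveats": [], "not_fit": []}
    for item in response.get("fitness_for_purpose", []):
        # "analysis" is an assumed key naming the intended analysis.
        groups.setdefault(item["verdict"], []).append(
            (item["analysis"], item.get("blocking_issues", []))
        )
    groups["needs_profiling"] = list(response.get("needs_profiling", []))
    return groups

example = {
    "fitness_for_purpose": [
        {"analysis": "source-of-hire quality", "verdict": "not_fit",
         "blocking_issues": ["no shared employee key between Greenhouse and Workday"]},
        {"analysis": "attrition drivers by tenure and level", "verdict": "fit_with_caveats",
         "blocking_issues": ["2024 level re-mapping breaks level comparability"]},
    ],
    "needs_profiling": [],
}
for name, issues in triage(example)["not_fit"]:
    print(name, "->", "; ".join(issues))
```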

Use it for

→Pre-flight an attrition analysis: describe the HRIS+ATS+survey stack → which analyses are fit, fit-with-caveats, or not-fit
→Data-platform business case: the remediation plan is the prioritized backlog
→Field-kit candidate: run the profiling checklist inside the org's own Sheets/Excel (PII never leaves)

See it work

example output

Workday + Greenhouse + anonymous Qualtrics stack headed into attrition, source-of-hire, and engagement-linkage analyses — with backfilled term reasons and no ATS↔HRIS key.

Dataset: This is a three-source people-analytics dataset: Workday HRIS (4 years of core HR), Greenhouse ATS (3 years, with no shared employee key to Workday), and an annual Qualtrics engagement survey that is anonymous and only available as team-level rollups. Known defects include term_reason values backfilled from memory for 2021-2022, a 2024 re-leveling event that re-mapped job levels, and contractor records commingled into headcount tables. The intended analyses — attrition drivers by tenure and level, source-of-hire quality, and engagement-attrition linkage — each depend on joins and comparability that the current setup partially or wholly breaks.

Completeness

[high] term_reason for 2021-2022 was backfilled from memory rather than captured at the time of termination, so this field is functionally missing/unreliable for two of the four Workday years.
- Breaks: attrition drivers by tenure and level; engagement-attrition linkage
- Fix: HRIS owner should flag 2021-2022 term_reason as 'reconstructed' in Workday and exclude or separately model those records; capture prospective term_reason at exit going forward.

Validity

[high] Contractor records are mixed into headcount tables, so worker type is not a clean, valid categorical filter and headcount/attrition denominators include non-employees.
- Breaks: attrition drivers by tenure and level
- Fix: HRIS owner should add/validate a worker_type flag in Workday and exclude contractors from employee attrition denominators.
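The effect of this fix on the attrition rate shows up in a toy calculation. This is a minimal sketch under several assumptions: the `worker_type` values are illustrative, the formula is simply terminations over headcount (many teams divide by average headcount instead), and the counts are made up for the example.

```python
def attrition_rate(records, employees_only=True):
    """Terminations / headcount for one period; optionally keep only worker_type == 'employee'."""
    if employees_only:
        # Rows with a missing or unexpected worker_type are dropped too; count them first.
        records = [r for r in records if r.get("worker_type") == "employee"]
    if not records:
        return 0.0
    return sum(1 for r in records if r["terminated"]) / len(records)

# Illustrative toy period: 100 employees (10 leavers) and 40 contractors (15 roll-offs).
records = (
    [{"worker_type": "employee", "terminated": i < 10} for i in range(100)]
    + [{"worker_type": "contractor", "terminated": i < 15} for i in range(40)]
)
print(round(attrition_rate(records, employees_only=False), 3))  # 0.179
print(attrition_rate(records))                                  # 0.1
```

Contractor roll-offs nearly double the apparent rate in this toy case, and that inflation is what the worker_type flag removes.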

Consistency

[high] The 2024 re-leveling re-mapped job levels, so level values are not comparable across the 4-year window; the same person/role may sit at different level codes pre- and post-2024.
- Breaks: attrition drivers by tenure and level
- Fix: HRIS/comp owner should build a level-crosswalk mapping pre-2024 levels to the post-2024 scheme and apply a consistent normalized level for time-series analysis.
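A crosswalk of this kind is a lookup applied before any time-series grouping. This is a minimal sketch with placeholder values. The level codes, the mapping, and the cutover date are hypothetical stand-ins for whatever the HRIS/comp owner defines. Unmapped codes raise an error instead of passing through silently.

```python
from datetime import date

CUTOVER = date(2024, 1, 1)  # hypothetical effective date of the 2024 re-leveling

# Hypothetical mapping: pre-2024 code -> post-2024 normalized level.
PRE_2024_TO_NORMALIZED = {"L1": "IC1", "L2": "IC2", "L3": "IC2", "L4": "IC3"}

def normalized_level(level_code: str, as_of: date) -> str:
    """Express a level code on the post-2024 scheme, given the date it was recorded."""
    if as_of >= CUTOVER:
        return level_code  # already on the post-2024 scheme
    try:
        return PRE_2024_TO_NORMALIZED[level_code]
    except KeyError:
        # Fail loudly: an unmapped code is a gap in the crosswalk, not a level.
        raise ValueError(f"no crosswalk entry for pre-2024 level {level_code!r}") from None

print(normalized_level("L3", date(2022, 6, 30)))   # IC2
print(normalized_level("IC2", date(2024, 9, 1)))   # IC2
```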

Timeliness

[medium] Engagement data is an annual snapshot, giving coarse temporal resolution that may not align in time with attrition events for linkage.
- Breaks: engagement-attrition linkage
- Fix: Analytics team should record the date of each survey wave and align attrition windows to the survey period (e.g., attrition in the 12 months following each wave) when constructing the linkage.
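The alignment step can be sketched as a forward-looking window per wave. This is a minimal sketch with several assumptions. The wave date and termination events are hypothetical. Terminations are counted per team because the survey is only available as team-level rollups. "12 months" is approximated as 365 days.

```python
from datetime import date, timedelta

def terminations_after_wave(wave_date, terminations, window_days=365):
    """Count terminations per team in the window (wave_date, wave_date + window_days]."""
    end = wave_date + timedelta(days=window_days)
    counts = {}
    for team, term_date in terminations:
        if wave_date < term_date <= end:
            counts[team] = counts.get(team, 0) + 1
    return counts

wave = date(2023, 10, 1)  # hypothetical survey close date
terms = [
    ("Sales", date(2023, 9, 15)),    # before the wave: excluded
    ("Sales", date(2024, 2, 1)),
    ("Support", date(2024, 9, 30)),  # last day of the window: included
    ("Support", date(2024, 10, 2)),  # after the window: excluded
]
print(terminations_after_wave(wave, terms))  # {'Sales': 1, 'Support': 1}
```

A real linkage would divide these counts by each team's headcount at the wave date before relating them to the team's engagement scores.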

Accuracy

[high] term_reason backfilled from memory for 2021-2022 is subject to recall bias and is likely inaccurate, corrupting any voluntary/involuntary or driver attribution for those years.
- Breaks: attrition drivers by tenure and level; engagement-attrition linkage
- Fix: Analytics team should treat 2021-2022 term_reason as low-confidence, report driver analysis restricted to prospectively-captured years, and validate a sample against exit documentation where it exists.
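Flagging and restricting can happen in one pass, so the reconstructed rows stay countable and reviewable instead of disappearing. This is a minimal sketch. The `term_year` field and the `reconstructed` flag name are hypothetical stand-ins for the Workday flag described in the completeness fix.

```python
RECONSTRUCTED_YEARS = {2021, 2022}  # term_reason backfilled from memory in these years

def split_by_capture(terminations):
    """Separate prospectively captured term_reason rows from reconstructed ones."""
    prospective, reconstructed = [], []
    for t in terminations:
        # Copy each row and stamp an explicit reconstructed flag on it.
        row = dict(t, reconstructed=t["term_year"] in RECONSTRUCTED_YEARS)
        (reconstructed if row["reconstructed"] else prospective).append(row)
    return prospective, reconstructed

terms = [
    {"id": "E1", "term_year": 2021, "term_reason": "VOL"},
    {"id": "E2", "term_year": 2023, "term_reason": "INVOL"},
    {"id": "E3", "term_year": 2024, "term_reason": "VOL"},
]
prospective, reconstructed = split_by_capture(terms)
print(len(prospective), len(reconstructed))  # 2 1
```

Driver analysis then runs on `prospective`. A sample from `reconstructed` can be checked against exit documentation where it exists.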

Lineage/joinability

[critical] Greenhouse ATS has no shared employee key with Workday, so hires cannot be reliably joined back to their eventual tenure/performance/attrition outcomes — the core input for source-of-hire quality.
- Breaks: source-of-hire quality
- Fix: Recruiting Ops/IT should establish a candidate-to-employee key (e.g., stamp Workday employee_id onto the Greenhouse hire record at offer-accept, or build a deterministic email/name+start-date match table).
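The fallback in this fix, a deterministic match table, can be sketched as an exact-match cascade. Normalized email is tried first, then normalized name plus start date. Anything ambiguous is left unmatched for manual review. This is a minimal sketch. The record shapes and field names are hypothetical, not Greenhouse or Workday export formats.

```python
def norm(s):
    """Lowercase and collapse whitespace; None or blank becomes ''."""
    return " ".join(s.lower().split()) if s else ""

def build_match_table(ats_hires, hris_employees):
    """Return {candidate_id: employee_id} from exact, unambiguous matches only."""
    by_email, by_name_start = {}, {}
    for e in hris_employees:
        if norm(e.get("email")):
            by_email.setdefault(norm(e["email"]), []).append(e["employee_id"])
        key = (norm(e["name"]), e["start_date"])
        by_name_start.setdefault(key, []).append(e["employee_id"])

    matches = {}
    for c in ats_hires:
        email = norm(c.get("email"))
        email_hits = by_email.get(email, []) if email else []
        name_hits = by_name_start.get((norm(c["name"]), c["start_date"]), [])
        # Rule 1: email; rule 2: name + start date. Zero or multiple hits fall through.
        for hits in (email_hits, name_hits):
            if len(hits) == 1:
                matches[c["candidate_id"]] = hits[0]
                break
    return matches

hris = [
    {"employee_id": "W100", "email": "ana.diaz@corp.com", "name": "Ana Diaz", "start_date": "2023-03-06"},
    {"employee_id": "W101", "email": "", "name": "Li  Wei", "start_date": "2023-05-01"},
    {"employee_id": "W102", "email": "", "name": "Sam Lee", "start_date": "2023-07-10"},
    {"employee_id": "W103", "email": "", "name": "Sam Lee", "start_date": "2023-07-10"},
]
ats = [
    {"candidate_id": "C1", "email": "Ana.Diaz@corp.com", "name": "Ana Diaz", "start_date": "2023-03-06"},
    {"candidate_id": "C2", "email": "li.wei@example.org", "name": "li wei", "start_date": "2023-05-01"},
    {"candidate_id": "C3", "email": None, "name": "Sam Lee", "start_date": "2023-07-10"},
]
print(build_match_table(ats, hris))  # {'C1': 'W100', 'C2': 'W101'}
```

C3 stays unmatched because two employees share the name and start date. The sketch prefers a gap to a wrong join. It does not yet check whether one employee_id was claimed by two candidates, which is worth adding before trusting the table.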

Fitness for purpose

attrition drivers by tenure and level → fit_with_caveats (blocked by: 2024 level re-mapping breaks level comparability)
source-of-hire quality → not_fit (blocked by: no shared employee key between Greenhouse and Workday)
engagement-attrition linkage → fit_with_caveats (blocked by: survey is anonymous, team-level only)

Remediation plan (top 3)

{"priority":1,"action":"Recruiting Ops/IT establish a candidate-to-employee key (stamp Workday employee_id onto Greenhouse hire records at offer-accept, or build a deterministic match table).","addresses":["lineage_joinability"],"unblocks":["source-of-hire quality"]}
{"priority":2,"action":"HRIS/comp owner build a pre-2024-to-post-2024 job-level crosswalk and apply a normalized level for time-series analysis.","addresses":["consistency"],"unblocks":["attrition drivers by tenure and level"]}
{"priority":3,"action":"HRIS owner add/validate a worker_type flag in Workday and exclude contractors from employee attrition denominators.","addresses":["validity"],"unblocks":["attrition drivers by tenure and level","engagement-attrition linkage"]}
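Each remediation item is a JSON object that lists `unblocks` explicitly, so the plan can be inverted into an analysis-to-actions view, which is often the shape a data-platform backlog needs. This is a minimal sketch that reuses the three items above with their action text abbreviated. It assumes one JSON object per line, as shown.

```python
import json

PLAN_LINES = [
    '{"priority":1,"action":"Establish a candidate-to-employee key","addresses":["lineage_joinability"],"unblocks":["source-of-hire quality"]}',
    '{"priority":2,"action":"Build a job-level crosswalk","addresses":["consistency"],"unblocks":["attrition drivers by tenure and level"]}',
    '{"priority":3,"action":"Add/validate a worker_type flag","addresses":["validity"],"unblocks":["attrition drivers by tenure and level","engagement-attrition linkage"]}',
]

def actions_by_analysis(lines):
    """Invert the plan: analysis -> priorities of the actions that unblock it, in order."""
    index = {}
    items = sorted((json.loads(line) for line in lines), key=lambda item: item["priority"])
    for item in items:
        for analysis in item["unblocks"]:
            index.setdefault(analysis, []).append(item["priority"])
    return index

for analysis, priorities in actions_by_analysis(PLAN_LINES).items():
    print(f"{analysis}: {priorities}")
```

Read this way, the attrition-drivers analysis needs both items 2 and 3, while source-of-hire quality depends on item 1 alone.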

Run it on your data

Call it on your own inputs — over the API, or hand it to your AI agent via MCP. Discovery is open; running it is metered.

REST POST /api/bicycle/hr-data-quality

MCP audit_hr_data_quality
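Over REST, a call is a JSON POST whose body follows the `{ dataset, intended_analyses?, cluster? }` shape under "You bring". This is a minimal sketch that only builds the request with the standard library. Several details are assumptions, not documented API: the base URL, the bearer-token header, and passing `dataset` as a free-text description. Check the service's API documentation for the real authentication and field formats. `cluster` is omitted because its expected values are not described here.

```python
import json
import urllib.request

BASE_URL = "https://example.com"  # hypothetical: substitute the service's real host

def build_audit_request(dataset, intended_analyses=None, token="YOUR_API_KEY"):
    """Build (but do not send) a POST to the hr-data-quality endpoint."""
    payload = {"dataset": dataset}
    if intended_analyses:
        payload["intended_analyses"] = intended_analyses
    return urllib.request.Request(
        url=f"{BASE_URL}/api/bicycle/hr-data-quality",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )

req = build_audit_request(
    dataset="Workday HRIS (4 years) + Greenhouse ATS (3 years, no shared employee key) + "
            "anonymous Qualtrics engagement survey (team-level rollups only)",
    intended_analyses=["attrition drivers by tenure and level", "source-of-hire quality"],
)
print(req.get_method(), req.full_url)
# Sending is a metered call: urllib.request.urlopen(req)
```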
