What is PeopleAnalyst?

PeopleAnalyst is the front door for people-analytics research: 205+ works indexed and profiled, 40+ citation-grade findings extracted, and peer-reviewed behavioral science translated from academic to actionable — the missing manual for the people analytics you always meant to do.

What is people analytics?

People analytics is not a dashboard. It is behavioral science and statistical inference applied to workforce decisions — a discipline with its own methodology, spanning measurement, organizational design, talent, leadership, and analytics craft.

Why does AI in HR need measurement science?

AI is being deployed in high-stakes people decisions — hiring, performance, attrition — without the measurement science to evaluate whether it works or whom it harms. Construct validity, effect sizes, and criterion validity are the vocabulary for asking an AI vendor the right questions.

How is the research made accessible?

The evidence is indexed and searchable: 205+ works, 40+ citation-grade insight cards, and 8 research arcs, so the right finding reaches the right decision at the right time.

What separates good people measurement from assertion?

Good measurement has a method: construct validity, reliability, and effect-size interpretation are not optional — they are what separates evidence from assertion.

Computerized Adaptive Testing: A Primer

In a sentence

A comprehensive primer on how to build, calibrate, maintain, and validate computerized adaptive tests (CAT) that tailor item selection to each examinee while preserving measurement quality, fairness, and security.

Computerized Adaptive Testing: A Primer assembles three decades of psychometric research into a single accessible guide to the design and operation of CAT systems. Using a unifying hypothetical example, the Gedanken Computerized Adaptive Test (GCAT), the authors walk readers through every component required to field a viable adaptive test: system hardware and software design, item-pool construction and review, item response theory (IRT) and item calibration, testing algorithms for item selection and stopping, scaling and equating, reliability and measurement precision, and validity. The second edition adds hard-won operational lessons about examinee access, item-pool security (including the famous Kaplan/GRE exploit), and the economics of computerized testing. Technical where it must be and richly illustrated with examples, the book is the standard reference for anyone designing, evaluating, or critiquing adaptive measurement systems, and it candidly weighs when computerization is, and is not, worth the cost.

The four lenses

Science
Statistics
Systems
Strategy

The model

A factor model expressing how design levers (item pool quality, IRT model fit, item selection algorithm, exposure control, content balancing) and contextual conditions (administration medium, continuous-testing demand) produce psychological and behavioral states during testing (examinee engagement, proficiency estimation) and ultimately outcome metrics (measurement precision, score comparability, validity of inferences, and test security).

Item Pool Quality [design lever]

The degree to which the pool of candidate items is well written, sensitivity-reviewed, pretested, covers a wide range of difficulty for each content area, and conforms to the assumptions of the psychometric model used for calibration and scoring.

IRT Model Fit / Unidimensionality [design lever]

The extent to which the chosen item response theory model adequately describes examinee responses, including conformity to assumptions of unidimensionality, local (conditional) independence, item fungibility, and absence of differential item functioning.

Item Selection Algorithm [design lever]

The set of rules determining which item is presented first, which item follows each response, and when testing stops, typically maximizing information or expected posterior precision near the current proficiency estimate subject to content and exposure constraints.
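
The "maximize information near the current proficiency estimate" rule can be sketched as follows. This is a minimal illustration, not the book's algorithm: it assumes a two-parameter logistic (2PL) model, ignores content and exposure constraints, and uses a hypothetical three-item pool and hypothetical function names.

```python
import math

def prob_correct(theta, a, b):
    """2PL probability of a correct response (assumed model for this sketch)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = prob_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta, pool, administered):
    """Return the id of the unused item with maximum information at theta."""
    candidates = [item_id for item_id in pool if item_id not in administered]
    return max(candidates, key=lambda item_id: item_information(theta, *pool[item_id]))

# Hypothetical pool: item id -> (discrimination a, difficulty b)
pool = {"easy": (1.0, -1.5), "medium": (1.0, 0.0), "hard": (1.0, 1.5)}
next_item = select_next_item(theta=0.2, pool=pool, administered=set())
```

With equal discriminations, the most informative item is the one whose difficulty lies closest to the current estimate, so `next_item` is `"medium"` here.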

Content Balancing Constraints [design lever]

Constraints imposed on item selection so that each examinee receives a comparable mixture of items across content subdomains and informal contexts, preserving comparability of scores and content validity despite individualized item administration.

Item Exposure Control [design lever]

Mechanisms that limit how frequently any item is administered, either proportionally or in absolute number, to flatten the item usage distribution and reduce the chance that memorized items materially inflate examinee scores.
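
One simple way to flatten item usage is to choose at random among the few most informative items instead of always taking the single best one, an approach sometimes called randomesque selection. The sketch below is an assumption-laden illustration and not necessarily the mechanism the book recommends; the information values, the item ids, and the choice of `k=3` are all hypothetical.

```python
import random

def randomesque_pick(info_by_item, k, rng):
    """Pick uniformly at random among the k most informative items."""
    ranked = sorted(info_by_item, key=info_by_item.get, reverse=True)
    return rng.choice(ranked[:k])

# Hypothetical information values at one proficiency estimate
info_by_item = {"i1": 0.90, "i2": 0.85, "i3": 0.80, "i4": 0.20}

rng = random.Random(0)  # seeded so the simulation is repeatable
counts = {item_id: 0 for item_id in info_by_item}
for _ in range(3000):
    counts[randomesque_pick(info_by_item, k=3, rng=rng)] += 1
```

Pure maximum-information selection would administer `i1` every time; here the three strongest items share the exposure roughly equally, at a small cost in information per item.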

Administration Medium (Computer vs Paper) [contextual condition]

The contextual condition of whether items are presented and responded to via computer versus paper-and-pencil, which can change the cognitive task, difficulty, timing, and response process for some items, producing medium (mode) effects.

Continuous Testing Demand [contextual condition]

The economically and operationally imposed requirement that computerized tests be administered continuously over time rather than at a few fixed mass-administration dates, shaping examinee access, item exposure, and security challenges.

Proficiency Estimation Accuracy [psychological state]

The accuracy and stability with which an examinee's latent proficiency is estimated during and at the end of an adaptive test, reflected in the width of the posterior distribution or the standard error of the proficiency estimate.
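
The "width of the posterior distribution" can be made concrete with an expected a posteriori (EAP) estimate computed on a grid. This is a sketch under stated assumptions: a 2PL response model, a standard normal prior, and a grid from -4 to 4; the function name and these choices are illustrative, not taken from the source.

```python
import math

def eap_estimate(responses, grid=None):
    """Posterior mean and SD of proficiency under a standard normal prior.

    responses: list of (a, b, u) with discrimination a, difficulty b,
    and scored response u (1 = correct, 0 = incorrect).
    """
    if grid is None:
        grid = [-4.0 + 0.1 * k for k in range(81)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for a, b, u in responses:
            p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
            w *= p if u == 1 else 1.0 - p   # multiply in the 2PL likelihood
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

prior_mean, prior_sd = eap_estimate([])
post_mean, post_sd = eap_estimate([(1.0, 0.0, 1)])  # one correct answer
```

Before any response the estimate sits at the prior mean with a posterior SD near 1; each scored response shifts the mean and narrows the posterior, and that narrowing is what the standard error of the proficiency estimate reports.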

Examinee Engagement and Comfort [psychological state]

The examinee's behavioral and affective state during testing—being challenged but not discouraged, avoiding boredom from too-easy items or frustration from too-hard items, and freedom from undue test or computer anxiety.

Measurement Precision / Information [outcome metric]

The amount of test information (inverse of error variance) achieved across the proficiency range, indicating how precisely the test measures each examinee and how efficiently it uses items.

Score Comparability / Equating Quality [outcome metric]

The degree to which scores from different adaptive forms, and from CAT versus legacy paper-and-pencil tests, can be placed on a common scale and used interchangeably, satisfying same-construct, equity, population invariance, and symmetry conditions.
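
The symmetry condition can be illustrated with linear equating, one simple classical method that matches means and standard deviations across two forms. The source does not say which equating method the book uses, so treat this as an assumed example; the score lists and the function name are hypothetical.

```python
import statistics

def linear_equate(x, from_scores, to_scores):
    """Map score x from one form's scale to another by matching mean and SD."""
    mu_x, mu_y = statistics.mean(from_scores), statistics.mean(to_scores)
    sd_x, sd_y = statistics.pstdev(from_scores), statistics.pstdev(to_scores)
    return mu_y + (sd_y / sd_x) * (x - mu_x)

# Hypothetical score distributions on two forms
form_x = [10, 12, 14, 16, 18]
form_y = [20, 25, 30, 35, 40]

y = linear_equate(14, form_x, form_y)      # the mean of X maps to the mean of Y
x_back = linear_equate(y, form_y, form_x)  # the reverse mapping recovers 14
```

Symmetry means the X-to-Y conversion is the inverse of the Y-to-X conversion, which is why the round trip returns the original score. A regression of Y on X would not satisfy this condition.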

Validity of Score Inferences [outcome metric]

The appropriateness, meaningfulness, and usefulness of the specific inferences (selection, placement, prediction) made from test scores, encompassing content, construct, and criterion-related validity and freedom from threats such as differential item functioning.

Test Security [outcome metric]

The degree to which the item pool is protected against compromise, theft, and preknowledge that would allow examinees to inflate scores without commensurate proficiency, threatening score validity.

How they connect

item pool quality → predicts measurement precision
item selection algorithm → predicts proficiency estimation accuracy
proficiency estimation accuracy → predicts measurement precision
item selection algorithm → mediates measurement precision
content balancing → moderates validity of inferences
IRT model fit → moderates validity of inferences
exposure control → predicts test security
test security → predicts validity of inferences
continuous testing demand − influences test security
administration medium − moderates score comparability
item pool quality → predicts score comparability
item selection algorithm → predicts examinee engagement
measurement precision → predicts validity of inferences
continuous testing demand − influences examinee engagement

The story

The reader

A psychometrician, test developer, or measurement professional who wants to build, evaluate, or responsibly deploy a computerized adaptive test.

External problem

Conventional paper-and-pencil tests waste examinee time with inappropriately easy or hard items and cannot individualize measurement, while building a CAT requires solving many interlocking statistical, operational, and security problems.

Internal problem

The reader feels overwhelmed by a scattered, highly technical literature and uncertain whether they can field a defensible adaptive test that is fair, accurate, and economically sensible.

Philosophical problem

Tests should measure people as efficiently, fairly, and validly as possible—asking everyone the same items, or wasting their effort on inappropriate ones, is wasteful and inequitable.

The plan

Understand the history and rationale of adaptive testing and IRT.
Design the system: hardware, software, and human factors for administration.
Construct, review, pretest, and calibrate a high-quality item pool.
Implement testing algorithms for starting, continuing, and stopping, with exposure control and content balancing.
Scale and equate scores to make them comparable across forms and to legacy tests.
Evaluate reliability, measurement precision, and validity.
Confront operational realities of access, security, and cost before deciding to computerize.
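
The starting, continuing, and stopping rules in the plan above can be tied together in a small simulation. This is a sketch only, assuming a 2PL model, a standard normal prior, maximum-information selection without exposure control or content balancing, and a stopping rule based on a target posterior SD or a maximum test length; the pool, the simulated examinee, and all names are hypothetical.

```python
import math
import random

GRID = [-4.0 + 0.1 * k for k in range(81)]

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info(theta, item):
    """2PL item information at theta for item = (a, b)."""
    p = p_correct(theta, *item)
    return item[0] ** 2 * p * (1.0 - p)

def posterior(responses):
    """Grid-based posterior mean and SD with a standard normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for a, b, u in responses:
            p = p_correct(t, a, b)
            w *= p if u else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(pool, answer, se_target=0.4, max_items=20):
    """pool: list of (a, b); answer: callable (a, b) -> 0 or 1."""
    theta, se = 0.0, 1.0  # start: prior mean and prior SD
    unused = list(pool)
    responses = []
    # stop: precision target reached, length cap hit, or pool exhausted
    while unused and len(responses) < max_items and se > se_target:
        item = max(unused, key=lambda it: info(theta, it))  # continue
        unused.remove(item)
        responses.append((item[0], item[1], answer(*item)))
        theta, se = posterior(responses)
    return theta, se, len(responses)

rng = random.Random(1)
pool = [(1.5, -3.0 + 0.1 * k) for k in range(61)]  # hypothetical pool
true_theta = 1.0  # hypothetical simulated examinee
simulee = lambda a, b: int(rng.random() < p_correct(true_theta, a, b))
theta_hat, se, n_used = run_cat(pool, simulee)
```

An operational system would add exposure control, content balancing, and an item bank calibrated from pretest data; the loop only shows how the three rules interact.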

Success

An adaptive test that measures each examinee accurately in about half the time of a fixed test.
Scores that are comparable across forms and equatable to legacy instruments.
A defensible, fair, secure testing program whose inferences are valid for their intended use.
Confidence to know when computerization adds value and when it does not.

At stake

A poorly constructed CAT with flawed or insecure items that compromises fairness and validity.
Score inflation through item theft and uncontrolled exposure.
Wasted money computerizing a test that gains nothing from it.
Legal and ethical challenges over incomparable scores or differential validity.

Chapter by chapter

Ch. 1: Introduction and History
This chapter traces the historical trajectory of mental testing, particularly in military and educational contexts, highlighting the evolution of testing methods and their implications for future assessments.
Ch. 2: System Design and Operation
This chapter explores the critical elements of system design and operation, highlighting common issues that arise during implementation and strategies for effective management.
Ch. 3: Item Pools
This chapter presents a structured approach to creating effective item pools for assessments, emphasizing the importance of dimensionality and systematic development.

Related in the literature

The measurement literature behind this signal — sourced, so you can defend it.

“Once tests are administered on a computer, describe some testing options that would be open that are currently not available. _______________ 1 There is a distinction between sets of items for which cumulative and noncumulative (or unfolding) response models are appropriate.…”
Source: computerized_adaptive_testing (match 69%)
“A., & Gade, P. A. (1983). An application of computerized adaptive testing in Army recruiting. Journal of Computer-Based Instruction, 10, 37–89. Sands, W. A., Waters, B, & McBride, J. R. (Eds.). (1997). Computerized adaptive testing: From inquiry to operation. Washington, DC:…”
Source: computerized_adaptive_testing (match 68%)
“Title: Computerized Adaptive Testing. Author: Wainer, Howard; Dorans, Neil J.; Flaugher, Ronald; Green, Bert F.; Mislevy, Robert J. ASIN: B000SHL2Z2. ISBN: 9781135660819. Computerized Adaptive Testing: A Primer, Second Edition. Computerized Adaptive…”
Source: computerized_adaptive_testing (match 67%)

Resources: computerized_adaptive_testing