Item Response Theory Fundamentals

In a sentence

This book provides a practical and accessible introduction to Item Response Theory (IRT), a modern measurement framework that overcomes the limitations of classical test theory to enable more precise, fair, and efficient psychological and educational assessment.

Fundamentals of Item Response Theory offers a comprehensive yet accessible guide to the powerful psychometric framework that has revolutionized educational and psychological testing. It systematically addresses the shortcomings of classical test theory, such as sample-dependent item statistics and test-dependent ability scores, and presents IRT as a superior alternative. Readers will learn the core concepts, models (one-, two-, and three-parameter logistic), and assumptions of IRT, alongside practical guidance on parameter estimation, model-fit assessment, and the interpretation of ability scales. The book then demonstrates the utility of IRT in solving complex measurement problems, including test construction, identifying biased items, equating test scores, and designing computerized adaptive tests, making it an essential resource for measurement practitioners, researchers, and students seeking to understand and apply modern assessment methods.

The four lenses

Science
Statistics
Systems
Strategy

The model

This model describes how the principles of Item Response Theory (IRT) are applied to improve psychological and educational measurement. It outlines how controllable characteristics of test items (difficulty, discrimination, guessing) and testing procedures (length, adaptivity) influence the probability of an examinee's response based on their latent ability. This, in turn, leads to key measurement outcomes like precision, parameter invariance, fairness, score comparability, and testing efficiency, which represent the major advantages of IRT over classical test theory.

Item Difficulty (b-parameter) [design lever]

A parameter representing the location of an item on the latent ability scale. It is the point on the ability scale where an examinee has a 0.5 probability of a correct response (in the 1-PL and 2-PL models). Higher values indicate more difficult items.

Item Discrimination (a-parameter) [design lever]

A parameter proportional to the slope of the Item Characteristic Curve (ICC) at the item's difficulty level. It indicates how well an item differentiates between examinees with abilities slightly below and slightly above the item's difficulty. Higher values indicate better discrimination.

Item Pseudo-Guessing (c-parameter) [design lever]

A parameter representing the probability that a very low-ability examinee will answer the item correctly by chance. It corresponds to the lower asymptote of the ICC. Lower values are desirable for better measurement.

Test Length [design lever]

The total number of items included in a test administered to an examinee.

Adaptive Item Selection [design lever]

The process of selecting the next item to administer to an examinee based on their current ability estimate, with the goal of maximizing the information obtained from that item and thereby increasing measurement efficiency.
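A minimal sketch of maximum-information item selection, assuming a three-parameter logistic item bank. The function names, the `bank` dictionary, its item parameters, and the scaling constant D = 1.7 are illustrative assumptions, not taken from the book; operational CAT systems typically add exposure control and content constraints on top of this rule.

```python
import math

def item_information(theta, a, b, c=0.0, D=1.7):
    """Item information at theta under the 3-PL model."""
    p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def select_next_item(theta_hat, bank, administered):
    """Return the id of the unused item that is most informative at theta_hat."""
    candidates = {k: v for k, v in bank.items() if k not in administered}
    return max(candidates, key=lambda k: item_information(theta_hat, *candidates[k]))

# Hypothetical bank: item id -> (a, b, c)
bank = {
    "easy": (1.0, -2.0, 0.0),
    "medium": (1.0, 0.0, 0.0),
    "hard": (1.0, 2.0, 0.0),
}

first_item = select_next_item(0.1, bank, administered=set())
second_item = select_next_item(1.5, bank, administered={"medium"})
```

With equal discrimination and no guessing, the rule reduces to picking the unused item whose difficulty is closest to the current ability estimate.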

Examinee Latent Ability (Theta) [psychological state]

The unobservable, underlying proficiency, trait, or skill that a test is designed to measure. It is the primary factor that explains an examinee's performance on the test items.

Probability of Correct Response [behavioral pattern]

The likelihood that an examinee of a given ability will answer a specific item correctly. This probability is modeled by the Item Characteristic Curve (ICC), which is a function of the examinee's ability and the item's parameters.
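As a rough sketch, the ICC under the three-parameter logistic model can be written as P(θ) = c + (1 − c) / (1 + exp(−D·a·(θ − b))). The function name `icc_3pl`, the example parameter values, and the scaling constant D = 1.7 are assumptions made for illustration. Setting c = 0 gives the 2-PL model, and additionally fixing a to a common value gives the 1-PL model.

```python
import math

def icc_3pl(theta, a=1.0, b=0.0, c=0.0, D=1.7):
    """Probability of a correct response under the 3-PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# At theta == b the 2-PL probability is 0.5.
p_2pl = icc_3pl(theta=0.5, a=1.2, b=0.5)

# With a guessing parameter c, the probability at theta == b is (1 + c) / 2.
p_3pl = icc_3pl(theta=0.5, a=1.2, b=0.5, c=0.2)
```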

Measurement Precision [outcome metric]

The degree to which an ability estimate is free from random error. In IRT, it is a function of ability level and is quantified by the Test Information Function, which is inversely related to the standard error of the ability estimate.
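The inverse relation between information and standard error can be sketched as follows: item information values add up to the test information I(θ), and SE(θ) = 1 / sqrt(I(θ)). The function names, the item parameters, and D = 1.7 are illustrative assumptions.

```python
import math

def item_information(theta, a, b, c=0.0, D=1.7):
    """Item information at theta under the 3-PL model."""
    p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def total_information(theta, items):
    """Test information: the sum of item information values at theta."""
    return sum(item_information(theta, a, b, c) for a, b, c in items)

def standard_error(theta, items):
    """Standard error of the ability estimate at theta."""
    return 1.0 / math.sqrt(total_information(theta, items))

# Hypothetical three-item test: (a, b, c) per item
items = [(1.0, -1.0, 0.2), (1.5, 0.0, 0.2), (0.8, 1.0, 0.2)]
info = total_information(0.0, items)
se = standard_error(0.0, items)
```

Because information is additive, doubling the test with parallel items doubles I(θ) and shrinks SE(θ) by a factor of sqrt(2), which is how test length feeds into precision.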

Parameter Invariance [outcome metric]

The cornerstone property of IRT, where item characteristic parameters (a, b, c) are independent of the distribution of ability in the group of examinees, and examinee ability parameters are independent of the specific set of test items administered. This property holds only when the model fits the data.

Test Fairness (Absence of DIF) [outcome metric]

The extent to which a test is free from bias. In IRT, this is operationalized as the absence of Differential Item Functioning (DIF), where examinees of the same ability from different subgroups (e.g., gender, ethnicity) have the same probability of answering an item correctly.

Comparability of Scores [outcome metric]

The ability to place scores from different test forms, administered to different groups at different times, onto a common scale. This process, known as equating or linking, allows for meaningful comparison of scores.

Testing Efficiency [outcome metric]

The ability to achieve a target level of measurement precision with the minimum number of items. This is the primary goal and benefit of computerized adaptive testing (CAT).

How they connect

examinee latent ability → predicts probability of correct response
item difficulty → negatively influences probability of correct response
item discrimination → moderates probability of correct response
item guessing parameter → influences probability of correct response
item discrimination → influences measurement precision
item guessing parameter → negatively influences measurement precision
test length → influences measurement precision
measurement precision → influences parameter invariance
parameter invariance → influences test fairness
parameter invariance → influences comparability of scores
adaptive item selection → predicts testing efficiency

A candidate measure

Item Response Theory Fundamentals — derived measurement candidates

Item Difficulty (b-parameter)

The b-parameter value estimated by an IRT software package (e.g., BILOG, LOGIST).

self-report suitability: none

Item Discrimination (a-parameter)

The a-parameter value estimated by an IRT software package.

self-report suitability: none

Item Pseudo-Guessing (c-parameter)

The c-parameter value estimated by an IRT software package.

self-report suitability: none

Test Length

Count of items presented.

self-report suitability: none

Adaptive Item Selection

Log file from the CAT administration showing the sequence of items administered and the reason for their selection (e.g., 'maximum information').

self-report suitability: none

Examinee Latent Ability (Theta)

The theta (θ) value estimated by an IRT software package based on the examinee's response pattern.

self-report suitability: none
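To make the ability estimate concrete, here is a minimal Newton-Raphson sketch of maximum likelihood estimation of θ from a response pattern, assuming 2-PL items with known parameters. It is not the algorithm of any particular software package; the function names and item values are hypothetical, and the procedure has no finite solution for all-correct or all-incorrect patterns.

```python
import math

def p_2pl(theta, a, b, D=1.7):
    """Probability of a correct response under the 2-PL model."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def estimate_theta(responses, items, theta=0.0, D=1.7, tol=1e-8, max_iter=50):
    """Newton-Raphson MLE of theta; requires a mixed (not all 0 or all 1) pattern."""
    for _ in range(max_iter):
        ps = [p_2pl(theta, a, b, D) for a, b in items]
        # First derivative of the log-likelihood with respect to theta
        gradient = sum(D * a * (u - p) for (a, _), u, p in zip(items, responses, ps))
        # Test information (negative second derivative)
        information = sum((D * a) ** 2 * p * (1.0 - p) for (a, _), p in zip(items, ps))
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta

# Hypothetical (a, b) pairs and a response pattern (1 = correct, 0 = incorrect)
items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
theta_hat = estimate_theta([1, 1, 0], items)
```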

Probability of Correct Response

Proportion of correct responses for a group of examinees at a specific, narrow ability interval.

self-report suitability: none

Measurement Precision

Value of the Test Information Function I(θ) at a given ability level; standard error of the ability estimate, SE(θ).

self-report suitability: none

Parameter Invariance

Correlation coefficient between item difficulty estimates from two different subgroups (e.g., high-ability vs. low-ability); scatterplot of item parameter estimates from two subgroups, assessed for linearity.

self-report suitability: none
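The correlation check for parameter invariance can be sketched as below. Because IRT parameters are identified only up to a linear transformation of the scale, the question is whether the two sets of estimates are linearly related, not whether they are identical. The difficulty values are made-up numbers for illustration only, and `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up b-parameter estimates for the same five items in two subgroups
b_low_group = [-1.2, -0.4, 0.1, 0.9, 1.6]
b_high_group = [-1.0, -0.5, 0.2, 1.0, 1.5]

r = pearson_r(b_low_group, b_high_group)
```

A correlation close to 1 is consistent with invariance; items that fall far from the line in the matching scatterplot are candidates for closer inspection.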

Test Fairness (Absence of DIF)

Chi-square statistic for the difference in item parameters across groups; area between the ICCs for two groups; Mantel-Haenszel statistic.

self-report suitability: none
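Of the DIF indices listed, the Mantel-Haenszel statistic is the simplest to sketch. Examinees are matched on total score, and each score stratum contributes a 2×2 table of group by item outcome. The counts below are invented for illustration, and the function names are hypothetical.

```python
import math

def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across score strata.

    Each stratum is (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

def mh_d_dif(odds_ratio):
    """Delta-scale transformation: -2.35 * ln(odds ratio)."""
    return -2.35 * math.log(odds_ratio)

# Invented counts: within each stratum both groups have the same odds of success
no_dif_strata = [(40, 10, 20, 5), (30, 30, 10, 10)]
# Invented counts: the reference group has four times the odds of success
dif_strata = [(40, 10, 10, 10)]

alpha_no_dif = mantel_haenszel_odds_ratio(no_dif_strata)
alpha_dif = mantel_haenszel_odds_ratio(dif_strata)
```

An odds ratio near 1 (delta near 0) suggests the item functions similarly for matched examinees in both groups; values far from 1 flag the item for review.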

Comparability of Scores

The scaling constants (alpha and beta) derived from an anchor-test design; the root mean square difference between scores on two forms after equating.

self-report suitability: none
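One common way to obtain the scaling constants is the mean and sigma method, sketched here under the assumption that the anchor items have difficulty estimates on both scales. This is one of several linking methods and may not be the one a given study uses; the function names and values are hypothetical.

```python
import statistics

def mean_sigma_constants(b_anchor_x, b_anchor_y):
    """Return (alpha, beta) that place scale X onto scale Y using anchor difficulties."""
    alpha = statistics.pstdev(b_anchor_y) / statistics.pstdev(b_anchor_x)
    beta = statistics.mean(b_anchor_y) - alpha * statistics.mean(b_anchor_x)
    return alpha, beta

def rescale_item(a, b, alpha, beta):
    """Transform item parameters from scale X to scale Y."""
    return a / alpha, alpha * b + beta

# Made-up anchor-item difficulties estimated separately in two calibrations
b_x = [-1.0, 0.0, 1.0]
b_y = [-1.5, 0.5, 2.5]

alpha, beta = mean_sigma_constants(b_x, b_y)
a_new, b_new = rescale_item(1.0, 0.0, alpha, beta)
```

Ability estimates transform the same way as difficulties, θ_Y = alpha · θ_X + beta, while the guessing parameter is unaffected by the change of scale.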

Testing Efficiency

Average test length in a CAT administration; relative efficiency index I_A(θ) / I_B(θ), comparing two tests A and B.

self-report suitability: none
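The relative efficiency index can be sketched as the ratio of two test information functions at the same ability level. For brevity this example assumes 2-PL items; the forms, parameter values, and function names are hypothetical.

```python
import math

def info_2pl(theta, a, b, D=1.7):
    """Item information at theta under the 2-PL model."""
    p = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * p * (1.0 - p)

def relative_efficiency(theta, items_a, items_b):
    """Ratio I_A(theta) / I_B(theta) for two sets of (a, b) item parameters."""
    info_a = sum(info_2pl(theta, a, b) for a, b in items_a)
    info_b = sum(info_2pl(theta, a, b) for a, b in items_b)
    return info_a / info_b

# Hypothetical forms: A has twice as many identical items as B
form_a = [(1.0, 0.0)] * 20
form_b = [(1.0, 0.0)] * 10

re_at_zero = relative_efficiency(0.0, form_a, form_b)
```

A value of 2 at a given θ means form B would need to be roughly twice as long, with comparable items, to match form A's precision at that ability level.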

The story

The reader

A measurement practitioner, test developer, or researcher who uses classical test theory but is frustrated by its limitations. They want to build higher quality, more efficient, and fairer tests, and need to understand and apply modern psychometric methods to solve complex testing problems.

External problem

Classical test methods produce group-dependent item statistics and test-dependent ability scores, making it difficult to build robust item banks, equate different test forms, and construct tests with specified precision.

Internal problem

They feel uncertain and perhaps intimidated by the complexity of modern measurement theories, worrying their methods are outdated and that their tests may not be technically defensible against challenges.

Philosophical problem

It's wrong that an examinee's measured ability should depend on the specific test they happen to take, or that an item's characteristics should change depending on the group of people tested. Measurement should be objective and invariant.

The plan

Learn the fundamental concepts and models of IRT.
Master the procedures for estimating parameters and assessing how well the model fits your data.
Apply IRT to solve key measurement challenges: building better tests, detecting item bias, equating scores, and implementing adaptive testing.

Success

They can design and build technically superior tests with specified levels of precision across the ability spectrum.
They are able to create robust item banks with invariant item parameters, enabling fair comparisons and efficient test development.
They can confidently equate different test forms, detect biased items, and implement advanced applications like computerized adaptive testing.
They become a competent, modern measurement specialist whose work is technically sound, efficient, and defensible.

At stake

They will continue to be constrained by the inherent limitations and conceptual problems of classical test theory.
Their tests will remain less efficient, less precise, and potentially unfair.
They risk falling behind the state-of-the-art in their field, unable to leverage modern tools to meet the growing demands for more sophisticated and defensible assessments.

Chapter by chapter

Chapter 1: Background (part 1/2)
Dr. Testmaker confronts a pivotal shift from classical test theory to item response theory as he seeks to enhance the validity and reliability of educational assessments amid growing client demands.
Chapter 1: Background (part 2/2)
This chapter delves into the intricacies of item response theory (IRT), emphasizing the importance of model-data fit and the challenges posed by different item difficulty levels in educational assessments.
Chapter 5: The Ability Scale (part 1/2)
This chapter demystifies how ability scores are computed from test responses, emphasizing the need for careful validation of these scores to accurately reflect an examinee’s true abilities.
Chapter 5: The Ability Scale (part 2/2)
This chapter delves into how Computerized Adaptive Testing (CAT) optimizes the measurement precision of ability estimates by tailoring test items to an examinee's skill level, enhancing efficiency and validity in educational measurement.
- Computerized Adaptive Testing represents a significant evolution in educational assessment, maximizing measurement precision while reducing test time.
- Item Response Theory provides a robust framework that underpins adaptive testing, allowing for personalized examinee experiences.
- The selection of optimal item difficulties is critical; each item should give the examinee roughly a 50% to 60% chance of a correct response for maximum information gain.
- Employing a dual approach of maximum likelihood and Bayesian estimation can yield more robust ability estimates, particularly in CAT contexts.
Chapter 11: Future Directions of Item Response Theory
This chapter explores the evolving landscape of Item Response Theory (IRT), emphasizing the importance of adaptive methods and innovative applications while recognizing the limitations and areas still needing research.
- Engagement with IRT models is crucial for effective assessment design, yet reliance solely on theoretical knowledge is insufficient in practice.
- Polytomous and multidimensional models are areas ripe for exploration and should be prioritized by measurement specialists.
- Authentic measurement linked to performance testing challenges specialists to rethink item format and scoring methods.
- The incorporation of diagnostic information in testing will enhance the utility of assessment scores beyond mere ranking.

Related in the library

Psychometric Theory (Nunnally & Bernstein)