What is PeopleAnalyst?

PeopleAnalyst is the front door for people-analytics research: 205+ works indexed and profiled, 40+ citation-grade findings extracted, and peer-reviewed behavioral science translated from academic to actionable — the missing manual for the people analytics you always meant to do.

What is people analytics?

People analytics is not a dashboard. It is behavioral science and statistical inference applied to workforce decisions — a discipline with its own methodology, spanning measurement, organizational design, talent, leadership, and analytics craft.

Why does AI in HR need measurement science?

AI is being deployed in high-stakes people decisions — hiring, performance, attrition — without the measurement science to evaluate whether it works or whom it harms. Construct validity, effect sizes, and criterion validity are the vocabulary for asking an AI vendor the right questions.

How is the research made accessible?

The evidence is indexed and searchable: 205+ works, 40+ citation-grade insight cards, and 8 research arcs, so the right finding reaches the right decision at the right time.

What separates good people measurement from assertion?

Good measurement has a method: construct validity, reliability, and effect-size interpretation are not optional — they are what separates evidence from assertion.

Reliability and Validity Assessment

Edward G. Carmines and Richard A. Zeller

In a sentence

A concise, foundational guide to how social scientists can assess whether their measures consistently capture (reliability) and accurately represent (validity) the abstract concepts they intend to measure.

Reliability and Validity Assessment is the essential primer for any researcher who must turn fuzzy theoretical concepts into trustworthy empirical measures. Carmines and Zeller lucidly define measurement as the process of linking abstract concepts to observable indicators, then dissect the twin properties every good measure must possess: reliability (consistency across repeated measurements, threatened by random error) and validity (the degree to which an indicator measures what it purports to, threatened by nonrandom error). They walk through the three classic forms of validity—criterion-related, content, and construct—arguing that construct validity is the most broadly applicable to the abstract concepts that dominate social science. They ground reliability in classical test theory (observed score = true score + error), explain parallel measurements, and then evaluate four practical reliability estimation methods: retest, alternative-form, split-halves, and internal consistency (Cronbach's alpha), plus correction for attenuation. An appendix shows how factor analysis aids—but cannot replace—theory-driven reliability and validity assessment. Accessible to anyone familiar with simple correlation, it equips researchers to avoid the misleading conclusions that flow from poor measurement.

The four lenses

Science
Statistics
Systems
Strategy

Tags

applied-statistics, research-methods

The model

A structural model of how measurement design choices and error sources determine the reliability and validity of empirical indicators, and how these in turn determine the trustworthiness of inferences about theoretical concepts. Design levers (item construction, number of items, theoretical embedding) influence error states (random and nonrandom error) which drive reliability and validity outcomes.

Concept-Indicator Linkage Strength (design lever)

The degree to which an observable empirical indicator faithfully represents an underlying unobservable theoretical concept; the central relationship that measurement seeks to establish and the basis for valid inference.

Item and Instrument Design Quality (design lever)

The care with which measurement items are constructed, including specification of the content domain, sampling of content, item wording, and avoidance of systematic biasing features, which shapes the resulting measurement properties of an instrument.

Number of Items in Scale (design lever)

The count of distinct items or measurements composing a scale. The Spearman-Brown formula and coefficient alpha both show that reliability increases as items are added, provided the average interitem correlation is not reduced.
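
As a rough illustration of the item-count effect described above, the sketch below applies the Spearman-Brown prophecy formula. The function name and the example values are hypothetical and chosen only for illustration.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Project reliability when a scale is lengthened by `length_factor`.

    Assumes the added items are comparable to the existing ones, so the
    average interitem correlation does not drop.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


# Hypothetical case: doubling a scale whose current reliability is 0.60.
projected = spearman_brown(0.60, 2.0)
print(round(projected, 3))  # 0.75
```

The same expression gives standardized alpha when `reliability` is the average interitem correlation and `length_factor` is the number of items.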

Theoretical Network Embedding (contextual condition)

The extent to which a concept is situated within a network of theoretically derived hypotheses relating it to other concepts, which is a precondition for construct validation and generating testable predictions.

Random Measurement Error (psychological state)

Unsystematic chance factors that confound measurement, distributed symmetrically around the true score with an expected value of zero, that cause repeated measurements to differ from one another and from the true score.

Nonrandom (Systematic) Measurement Error (psychological state)

Systematic biasing influences on a measuring instrument, such as method artifacts or the presence of additional unintended constructs, that cause indicators to represent something other than the intended theoretical concept.

Reliability (outcome metric)

The extent to which a measuring procedure yields consistent results on repeated trials, formally the ratio of true score variance to observed variance, inversely related to the amount of random measurement error present.

Validity (outcome metric)

The extent to which an indicator measures what it is intended to measure for a specified purpose, dependent on the absence of nonrandom error and assessed via criterion-related, content, or construct validation strategies.

Quality of Scientific Inference (outcome metric)

The degree to which analysis of empirical indicators leads to correct, non-misleading conclusions about relationships among underlying theoretical concepts, the ultimate payoff of good measurement.

How they connect

item design quality − influences random error
item design quality − influences nonrandom error
number of items → predicts reliability
random error − influences reliability
nonrandom error − influences validity
reliability → predicts validity
theoretical embedding → moderates validity
concept indicator linkage → predicts validity
reliability → predicts inference quality
validity → predicts inference quality

The process

The book outlines a systematic playbook for conducting rigorous social science research, centered on the critical processes of measurement. The core workflow begins with the careful development of a measurement instrument, which involves translating abstract theoretical concepts into concrete, empirical indicators. This foundational process is immediately followed by two crucial, iterative validation loops: assessing the instrument's validity to ensure it measures the intended construct, and assessing its reliability to confirm its consistency. The book details multiple methods for both validity (content, criterion, construct) and reliability (test-retest, internal consistency, etc.), allowing the researcher to select the most appropriate techniques. Once a valid and reliable instrument is established and data is collected, a final analytical process allows for correcting observed correlations for measurement error, providing a clearer view of the true relationships between theoretical concepts. This comprehensive approach ensures that research findings are built upon a solid, quantifiable foundation.

Develop a Measurement Instrument

To systematically link abstract theoretical concepts to empirical indicators for accurate representation in social research.

When to use: At the beginning of a research project when operationalizing theoretical constructs for data collection.

Step 1: Identify the abstract concept that needs to be measured.
Entry: A research question or hypothesis involving an abstract concept has been formulated.
Exit: The abstract concept is clearly and unambiguously defined.
In: Research question, Theoretical framework · Out: A clear definition of the abstract concept
ch01
Step 2: Define the empirical indicators that will represent this concept.
Entry: The abstract concept has been defined.
Exit: A set of potential empirical indicators is identified.
- Which empirical indicators best represent the abstract concept?
In: Definition of the abstract concept · Out: A list of empirical indicators
ch01
Step 3: Develop an organized plan for classifying and quantifying the indicators.
Entry: Empirical indicators have been selected.
Exit: A draft measurement instrument is created.
- Which instruments are most suitable for data collection?
In: List of empirical indicators · Out: Draft measurement instrument
ch01
Step 4: Assess the validity of the measurement instrument.
Entry: A draft measurement instrument exists.
Exit: Evidence for the instrument's validity has been established.
In: Draft measurement instrument · Out: Validity assessment report, Refined measurement instrument
ch01
Step 5: Assess the reliability of the measurement instrument.
Entry: A draft measurement instrument exists.
Exit: The instrument's reliability coefficient is calculated and deemed acceptable.
In: Draft measurement instrument · Out: Reliability assessment report
ch01

Evaluate Measurement Validity

To ensure that a measuring instrument accurately reflects the abstract concept it is intended to measure.

When to use: During the development and testing phase of a measurement instrument, after initial items have been created.

Step 1: Define the theoretical concept and the specific domain of content to be measured.
Entry: A measurement instrument has been drafted.
Exit: The theoretical construct and its content domain are clearly specified.
In: Draft measurement instrument, Theoretical framework · Out: Specified content domain
ch01 · ch02
Step 2: Select the appropriate validity assessment method(s).
Entry: The concept and domain are defined.
Exit: One or more validity assessment methods are chosen.
- Which type of validity is most critical for the research context?
In: Research goals · Out: Selected validity assessment plan
ch02
Step 3: Gather data using the measuring instrument and any relevant external criteria or related measures.
Entry: An assessment plan is in place.
Exit: All necessary data for the validity assessment is collected.
In: Measurement instrument, Test subjects · Out: Collected data
ch01 · ch02
Step 4: Analyze the relationship between the instrument's results and the theoretical concept or criteria.
Entry: Data has been collected.
Exit: Statistical analysis is complete and a validity coefficient (if applicable) is calculated.
In: Collected data · Out: Validity coefficient, Statistical analysis results
ch01 · ch02
Step 5: Make adjustments to the instrument or theory based on the findings.
Entry: Validity has been assessed.
Exit: A decision is made to accept, revise, or reject the instrument.
- Are the indicators suitable or do they need replacement?
- Are adjustments needed to improve validity?
In: Validity assessment results · Out: Validated or revised measurement instrument
ch01

Assess Measurement Reliability

To determine the consistency of measurement results across repeated tests, different forms, or items within a scale.

When to use: During the development and pre-testing of a measurement instrument to ensure it produces stable and consistent results.

Step 1: Select an appropriate reliability assessment method.
Entry: A draft measurement instrument is ready for testing.
Exit: A specific reliability testing method (e.g., test-retest, Cronbach's alpha) is chosen.
- Which method best suits the measurement context and minimizes potential biases like memory effects?
In: Measurement instrument, Research context · Out: Reliability assessment plan
ch04
Step 2: Administer the instrument(s) according to the chosen method's requirements.
Entry: A reliability assessment plan is in place and a sample of subjects is available.
Exit: Data collection is complete.
In: Measurement instrument(s), Test subjects · Out: Raw test scores
ch01 · ch04
Step 3: Calculate the appropriate reliability coefficient.
Entry: Raw test scores have been collected.
Exit: A reliability coefficient is calculated.
In: Raw test scores · Out: Reliability coefficient
ch01 · ch03 · ch04
Step 4: Evaluate the coefficient to determine if the measurement has an acceptable level of reliability.
Entry: A reliability coefficient has been calculated.
Exit: A judgment on the instrument's reliability is made.
- Is the level of reliability acceptable for the study's purpose?
- What actions should be taken if reliability is unsatisfactory (e.g., revise items)?
In: Reliability coefficient · Out: Reliability assessment report
ch01
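
When internal consistency is the chosen method, Step 3 might look like the sketch below, which computes Cronbach's alpha from a respondents-by-items score table. The function name and the response data are hypothetical.

```python
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_variance_sum = sum(statistics.variance(item) for item in items)
    total_variance = statistics.variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1.0 - item_variance_sum / total_variance)


# Hypothetical responses: 6 respondents, 4 items on a 1-5 scale.
responses = [
    [4, 5, 4, 5],
    [2, 3, 3, 2],
    [5, 5, 4, 4],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
    [4, 4, 5, 5],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Whether the resulting value is acceptable is the judgment called for in Step 4 and depends on the purpose of the study.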

Correct Correlation for Attenuation

To estimate the true correlation between two concepts by adjusting their observed correlation for the unreliability of their respective measures.

When to use: During data analysis, after calculating the observed correlation between two variables and determining the reliability of each variable's measure.

Step 1: Calculate the observed correlation between two variables (X and Y).
Entry: Data for variables X and Y has been collected.
Exit: The observed correlation coefficient (r_xy) is known.
In: Dataset with variables X and Y · Out: Observed correlation coefficient
ch04
Step 2: Determine the reliability coefficients for the measures of both variable X and variable Y.
Entry: The measures for X and Y have been assessed for reliability.
Exit: Reliability coefficients for both measures (r_xx and r_yy) are known.
In: Reliability assessment reports for measures of X and Y · Out: Reliability coefficients for each variable
ch04
Step 3: Apply the correction for attenuation formula to calculate the estimated true correlation.
Entry: Observed correlation and both reliability coefficients are known.
Exit: The corrected correlation coefficient is calculated.
In: Observed correlation coefficient, Reliability coefficients for each variable · Out: Corrected correlation coefficient
ch04
Step 4: Interpret the corrected correlation to understand the relationship between the variables absent measurement error.
Entry: The corrected correlation has been calculated.
Exit: An interpretation of the true relationship between the variables is formulated.
In: Corrected correlation coefficient · Out: Interpretation of the true relationship between variables
ch04
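
The four steps above can be sketched in a few lines. The correction divides the observed correlation by the square root of the product of the two reliabilities. The function name and the example values are hypothetical.

```python
import math


def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Estimate the correlation between true scores from the observed r_xy.

    r_xx and r_yy are the reliability coefficients of the X and Y measures.
    """
    if not (0.0 < r_xx <= 1.0 and 0.0 < r_yy <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical values: observed r = 0.40, reliabilities 0.80 and 0.50.
corrected = correct_for_attenuation(0.40, 0.80, 0.50)
print(round(corrected, 3))  # 0.632
```

Because reliabilities are themselves estimates, a corrected value can exceed 1 in a given sample, which is usually a sign that one of the reliability estimates is too low.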

A candidate measure

Reliability and Validity Assessment — derived measurement candidates

Concept-Indicator Linkage Strength

validity coefficient magnitude; pattern consistency across studies

self-report suitability: low

Item and Instrument Design Quality

expert content-validity ratings; presence/absence of method-artifact features

self-report suitability: low

Number of Items in Scale

item count; Spearman-Brown projected reliability

self-report suitability: none

Theoretical Network Embedding

count of theoretically relevant external variables; degree of theory systematization

self-report suitability: none

Random Measurement Error

error variance estimate; 1 minus reliability

self-report suitability: none

Nonrandom (Systematic) Measurement Error

method-factor loadings; discrepancy between corrected validity and observed validity

self-report suitability: none

Reliability

test-retest correlation; alternative-form correlation; Spearman-Brown split-half; Cronbach's alpha; KR-20; theta; omega

self-report suitability: none

Validity

criterion validity coefficient; construct-validation prediction confirmations; multitrait-multimethod matrix

self-report suitability: low

Quality of Scientific Inference

replication rate; match between corrected estimates and theory

self-report suitability: none

The story

The reader

A social scientist or student who wants to produce credible research by ensuring their measures of abstract concepts are trustworthy.

External problem

Their empirical indicators may not consistently or accurately represent the theoretical concepts they care about.

Internal problem

They feel uncertain whether their findings reflect reality or merely measurement error and flawed instruments.

Philosophical problem

Drawing conclusions from invalid or unreliable measures is scientifically wrong because it cannot advance genuine understanding of social phenomena.

The plan

Understand measurement as linking abstract concepts to empirical indicators.
Distinguish reliability (consistency) from validity (accuracy) and their respective error types.
Choose an appropriate validity strategy, favoring construct validity for abstract concepts.
Apply classical test theory to understand random error and true scores.
Estimate reliability using sound methods such as alternative-form and Cronbach's alpha.
Correct correlations for attenuation and report reliability transparently.
Use factor analysis as a theory-guided aid, not a substitute for theoretical reasoning.

Success

Measures that reliably and validly represent intended concepts, yielding trustworthy inferences and scientifically defensible conclusions.

At stake

Misleading conclusions, prior research called into question, and no genuine advance in understanding because measurement was inadequate.

Chapter by chapter

ch01: Introduction
This chapter argues that while measurement is acknowledged as vital in social sciences, a systematic approach to measurement is sorely lacking, which compromises the validity and reliability of research findings.
- Measurement is a critical yet often underappreciated aspect of social science research that necessitates a systematic approach.
- The traditional definition of measurement does not adequately capture the complexities involved in quantifying abstract concepts.
- Reliability reflects the consistency of results across repeated tests, while validity assesses whether measures accurately represent intended constructs.
- Both reliability and validity are imperative for ensuring the credibility of research findings in social sciences.
ch02: Validity
This chapter explores the nuanced concept of validity in measuring instruments, particularly focusing on the distinct types of validity applicable in social sciences and emphasizing the importance of the intended use of measures.
- Validity is a context-dependent aspect of measurement: 'One validates, not a test, but an interpretation of data arising from a specified procedure.'
- The three types of validity—criterion-related, content, and construct—serve distinct purposes and come with their challenges and limitations.
- Criterion-related validity emphasizes the correlation between a test and a relevant external criterion to predict specific behaviors or outcomes.
- Content validity requires a thorough understanding of the domain that a measure intends to cover, but lacks precise methods for evaluation in social sciences.
ch03: Classical Test Theory
This chapter establishes the foundational principles of Classical Test Theory, focusing on how random measurement error affects the reliability of scores.
ch04: Assessing Reliability
This chapter explores four fundamental methods for estimating the reliability of empirical measurements and discusses their strengths and weaknesses, as well as their implications for interpreting correlations.
ch05: Appendix: The Place of Factor Analysis in Reliability and Validity Assessment
This chapter elucidates the critical role of factor analysis in assessing reliability and validity in measurement theory, emphasizing its value beyond conventional methodologies.
- Factor analysis serves as a crucial tool in both reliability and validity assessment, offering insights often overlooked by traditional methods.
- Employing both theta and omega coefficients can lead to a more nuanced understanding of measurement reliability, challenging the supremacy of coefficient alpha.
- The chapter highlights the critical intersection of reliability and construct validity, reinforcing the argument for integrating factor analysis in measurement design.
- Researchers are urged to adopt modern statistical methods to ensure the accuracy and integrity of their findings, especially in light of prior flawed studies.

Questions this book answers

What is measurement in the social sciences and how does it differ from physical-science measurement?
How can researchers determine whether an indicator consistently captures a concept (reliability)?
How can researchers determine whether an indicator accurately represents the intended concept (validity)?
What are the differences among criterion-related, content, and construct validity, and when is each applicable?
How does classical test theory formalize random measurement error?

Glossary

Concept-Indicator Linkage Strength: The degree of correspondence between an observable empirical indicator and the unobservable theoretical concept it is intended to represent.
Item and Instrument Design Quality: The degree of methodological care in constructing measurement items and instruments.
Number of Items in Scale: The count of distinct items composing a measurement scale.
Theoretical Network Embedding: The extent to which a concept is situated in a network of theoretically derived hypotheses linking it to other concepts.
Random Measurement Error: Unsystematic chance factors that confound measurement and cause inconsistent results across repeated trials.
Nonrandom (Systematic) Measurement Error: Systematic biasing influences that cause an instrument to represent something other than the intended concept.
Reliability: The consistency of results across repeated measurements, formally the ratio of true score variance to observed variance.
Validity: The extent to which an indicator measures what it is intended to measure for a specified purpose.
