Psychometric Theory
Jum C. Nunnally, Ira H. Bernstein
In a sentence
A comprehensive textbook for graduate students and researchers on the theory and statistical methods for creating, evaluating, and applying psychological measures, covering both classical and modern approaches.
The third edition of Nunnally's "Psychometric Theory" stands as a cornerstone text, updated by Ira Bernstein to bridge the gap between classical test theory and modern measurement innovations. This comprehensive guide is essential for graduate students and researchers in psychology, education, and business who need to construct or evaluate quantitative measures. It systematically builds from fundamental statistical concepts to advanced topics like item response theory, generalizability theory, and structural equation modeling. The book's strength lies in its emphasis on core principles, providing a robust framework for understanding measurement error, validity, reliability, and factor analysis. It doesn't just present formulas; it fosters a deep conceptual understanding of why and how psychological tests work, empowering readers to create scientifically sound instruments and critically assess the vast landscape of existing measures.
The four lenses
- Science
- Statistics
- Systems
- Strategy
The model
This model, implicit in psychometric theory, illustrates how the design characteristics of a multi-item measure (such as its length and the properties of its items) influence its reliability, which in turn is a necessary prerequisite for establishing its validity and ultimate scientific utility.
Test Length (design lever)
The number of discrete components (items) that are aggregated to form a composite measure.
Average Item Inter-correlation (design lever)
The average degree of linear relationship among the items within a measure, reflecting the extent to which they measure something in common.
Content Homogeneity (design lever)
The degree to which the items in a measure tap into a single, unitary psychological attribute or domain of content.
Methodological Heterogeneity (design lever)
The use of diverse methods, item formats, or situations to measure a construct, ensuring the resulting measure is not confounded with a specific method and possesses greater generalizability.
Measurement Reliability (psychological state)
The extent to which a measure is free from random measurement error, reflecting its consistency and precision. It is formally the ratio of true score variance to observed score variance.
Construct Validity (outcome metric)
The degree to which a measure accurately reflects the theoretical construct it is intended to measure, demonstrated through a network of evidence about its internal structure and external relationships.
Predictive Validity (outcome metric)
The degree to which a measure accurately forecasts a specific, external criterion behavior.
Content Validity (outcome metric)
The degree to which the content of a measure represents an adequate and representative sample of a defined domain of content or behavior.
Scientific Utility (outcome metric)
The overall usefulness of a measure for advancing scientific understanding or solving practical problems, representing the ultimate goal of measurement.
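The variance-ratio definition of reliability given above can be written out under the classical true-score model. This is a minimal sketch, not a quotation from the book: the symbols X (observed score), T (true score), E (random error), and r_XX (reliability) are notation assumed here for illustration.

```latex
% Classical true-score decomposition (notation assumed for illustration)
X = T + E, \qquad \operatorname{Cov}(T, E) = 0
% Because T and E are uncorrelated, the variances add
\sigma_X^2 = \sigma_T^2 + \sigma_E^2
% Reliability: proportion of observed-score variance due to true scores
r_{XX} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

Read this way, any design lever that shrinks the error variance relative to the observed variance raises reliability.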
How they connect
- test length → influences measurement reliability
- average item inter-correlation → influences measurement reliability
- content homogeneity → influences average item inter-correlation
- measurement reliability → influences construct validity
- measurement reliability → influences predictive validity
- methodological heterogeneity → influences construct validity
- construct validity → predicts scientific utility
- predictive validity → predicts scientific utility
- content validity → predicts scientific utility
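The first two links, and the dependence of predictive validity on reliability, can be illustrated numerically. The sketch below assumes the standardized-alpha form of the Spearman-Brown formula and the classical attenuation bound. The function names `standardized_alpha` and `max_observed_validity` are hypothetical helpers introduced here, and the average inter-item correlation of 0.20 is an invented value chosen only for illustration.

```python
import math


def standardized_alpha(k: int, mean_r: float) -> float:
    """Reliability of a k-item composite whose items have an average
    inter-item correlation of mean_r (Spearman-Brown / standardized alpha)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return (k * mean_r) / (1.0 + (k - 1) * mean_r)


def max_observed_validity(rel_x: float, rel_y: float = 1.0) -> float:
    """Upper bound on the observed predictor-criterion correlation implied
    by the attenuation formula: sqrt(rel_x * rel_y)."""
    return math.sqrt(rel_x * rel_y)


# Hypothetical scenario: items correlate 0.20 on average; vary test length.
for k in (5, 10, 20, 40):
    rel = standardized_alpha(k, 0.20)
    ceiling = max_observed_validity(rel)
    print(f"k={k:2d}  reliability={rel:.3f}  validity ceiling={ceiling:.3f}")
```

With these illustrative inputs, reliability rises from about 0.56 at 5 items to about 0.91 at 40 items, and the ceiling on observable validity rises with it. Raising the average inter-item correlation at a fixed length has the same direction of effect.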
The story
The reader
Graduate students and researchers in psychology, education, and related behavioral sciences who need to create, evaluate, or apply quantitative measures of human attributes. They want to conduct rigorous, defensible research and make sound decisions, but are often unsure how to navigate the complex statistical landscape of psychometrics.
External problem
Developing or selecting a good psychological measure is difficult. It requires navigating complex statistical concepts, choosing among different theoretical models (classical vs. modern), and rigorously assessing properties like reliability and validity.
Internal problem
They feel intimidated by the mathematical complexity of measurement theory and uncertain about the quality of their own or others' measures. They fear their research might be built on a shaky foundation, leading to invalid conclusions and wasted effort.
Philosophical problem
It's just plain wrong for scientific progress and important real-world decisions to be hindered by poorly understood or improperly constructed measurement tools. Rigorous measurement is the bedrock of scientific psychology.
The plan
- Establish a firm grasp of the fundamental concepts of measurement, scales, statistics, and validity.
- Master Classical Test Theory to understand measurement error and build reliable multi-item scales using techniques like domain sampling and Cronbach's alpha.
- Learn to use factor analysis (both exploratory and confirmatory) to uncover and test the underlying structure of your measures and constructs.
- Explore modern approaches, including Item Response Theory (IRT) and other advanced statistical models, to tackle specialized measurement challenges like test bias and adaptive testing.
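As a concrete illustration of the Cronbach's alpha step in the plan above, here is a minimal sketch that computes alpha from a respondents-by-items score table using only the standard library. The function name `cronbach_alpha` and the response data are hypothetical and invented for illustration. The formula used is the usual one: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 denominator) for items and total scores.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    if any(len(r) != k for r in rows):
        raise ValueError("every respondent must answer all k items")
    item_columns = list(zip(*rows))  # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in item_columns)
    total_var = variance([sum(r) for r in rows])  # variance of scale totals
    return (k / (k - 1)) * (1.0 - item_var_sum / total_var)


# Hypothetical data: 5 respondents answering 4 items on a 1-5 scale.
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
print(f"alpha = {cronbach_alpha(responses):.3f}")
```

A sample this small is far too small for a real item analysis; the point is only to show which quantities enter the coefficient.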
Success
- Confidently design and validate high-quality psychological measures.
- Critically evaluate the psychometric properties of instruments used in research and practice.
- Produce more rigorous, replicable, and theoretically meaningful research findings.
- Make more accurate, fair, and defensible decisions in applied settings like education, industry, and clinical practice.
At stake
- Continue to use measurement tools without fully understanding their properties or limitations.
- Produce research with questionable validity that fails to replicate.
- Risk making flawed decisions about individuals based on unreliable or invalid test scores.
- Remain on the sidelines of quantitative research, unable to fully participate in or critique the methods that drive the field.
Questions this book answers
- What are the fundamental principles of scientific measurement and the different types of measurement scales?
- How is the validity of a psychological measure established across its different forms: content, construct, and predictive?
- What is the statistical foundation of psychometric theory, including correlation, regression, and the properties of linear combinations?
- How does Classical Test Theory (CTT), particularly the domain-sampling model, conceptualize and quantify measurement error to assess reliability?
- What are the practical steps and statistical procedures for constructing conventional multi-item tests, from item writing and analysis to norming?
Glossary
- Test Length: The number of discrete components (items) that are aggregated to form a composite measure. According to the domain-sampling model, a longer test provides a larger and more stable sample of the content domain.
- Average Item Inter-correlation: The average degree of linear relationship among the items within a measure. It reflects the extent to which the items share a common core or underlying factor.
- Content Homogeneity: The degree to which the items in a measure tap into a single, unitary, and clearly defined psychological attribute or domain of content. A homogeneous measure is unidimensional.
- Methodological Heterogeneity: The degree to which a construct is measured using a diversity of methods, item formats, or situations. This practice ensures that the measured construct is not merely an artifact of a single method and enhances its generalizability.
- Measurement Reliability: The extent to which a measure is free from random measurement error, reflecting its consistency, repeatability, and precision. Formally, it is the proportion of observed score variance that is attributable to true score variance.
- Construct Validity: The degree to which a measure faithfully represents the theoretical construct it purports to assess. It is the central, unifying concept of validity, supported by a cumulative network of evidence regarding a measure's meaning.
- Predictive Validity: The degree to which a measure accurately forecasts a specific, external criterion behavior. It is a pragmatic form of validity focused on the functional relationship between a predictor and an outcome.
- Content Validity: The degree to which the items or content of a measure constitute a representative and adequate sample of a defined domain of knowledge, skill, or behavior.