Multilevel statistical models

Harvey Goldstein

In a sentence

A comprehensive statistical textbook that introduces the theory and application of multilevel models for analyzing hierarchically structured and cross-classified data common in the social and biological sciences.

Researchers and analysts in fields like education, epidemiology, and economics frequently encounter data with a natural hierarchy: students are nested within schools, patients within clinics, or repeated measurements within individuals. Single-level methods such as ordinary least squares (OLS) regression assume independent observations, so applying them to clustered data typically yields incorrect standard errors and flawed conclusions. "Multilevel Statistical Models" provides a systematic framework for analyzing this type of data correctly. The book starts with the foundational two-level linear model, explaining how to partition variance and model relationships that vary across groups. It then progressively extends this framework to handle real-world complexities, including multivariate responses, nonlinear relationships, discrete and categorical outcomes, event history data, cross-classified structures, measurement errors, and missing data. Written by a pioneer in the field, the book serves as both a graduate-level textbook and a reference, giving readers the theory, practical examples, and advanced techniques needed to draw more valid conclusions from complex data.

The four lenses

Science
Statistics
Systems
Strategy

Individual-Level Predictors

Pre-test scores; Age, gender, socio-economic status indicators (e.g., parental occupation); Responses on Likert-scale attitude surveys

self-report suitability: high

Group-Level Predictors

School's policy on student tracking (e.g., streaming vs. mixed-ability); Per-pupil expenditure; School-level mean of student prior achievement scores; Percentage of students from low-income families

self-report suitability: medium

Group Contextual Structure

Student ID and School ID variables in a dataset; Subject ID and Occasion ID for repeated measures; Student ID, Primary School ID, and Secondary School ID for cross-classification

self-report suitability: none

Random Effects (Between-Group Heterogeneity)

Variance of the level-2 residuals (random intercept variance); Variance of the coefficients of level-1 predictors across level-2 units (random slope variance); Covariance between random intercepts and slopes; Intra-class Correlation Coefficient (ICC)

self-report suitability: none

Individual-Level Outcome

Score on a final examination or assessment; Binary choice in a survey (e.g., voted/did not vote); Count of behaviors in an observation period; Time until recovery from an illness

self-report suitability: high

The story

The reader

A quantitative researcher, data analyst, or graduate student working with complex, clustered data, such as students nested within schools, repeated measurements on individuals, or patients within hospitals.

External problem

Traditional statistical models like OLS regression assume observations are independent, an assumption that is violated by hierarchical data. Using these standard methods produces incorrect standard errors, leading to flawed statistical inferences and an inability to properly study group-level effects.
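
To give a feel for how much clustering can distort standard errors, here is a minimal sketch using the standard design-effect approximation, 1 + (m - 1) * ICC, for equal-sized clusters. The cluster size, ICC, and sample size below are illustrative assumptions, not figures from the book, and the approximation applies most directly to means and group-level predictors.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal-sized clusters."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_total, cluster_size, icc):
    """Number of independent observations the clustered sample is 'worth'."""
    return n_total / design_effect(cluster_size, icc)


# Hypothetical scenario: 1,000 students in classes of 25, ICC of 0.10.
deff = design_effect(cluster_size=25, icc=0.10)
n_eff = effective_sample_size(n_total=1000, cluster_size=25, icc=0.10)
se_understatement = math.sqrt(deff)  # factor by which a naive SE is too small

print(f"design effect: {deff:.2f}")
print(f"effective sample size: {n_eff:.0f}")
print(f"naive SE too small by a factor of about {se_understatement:.2f}")
```

Under these assumed values, even a modest ICC means the sample carries far less independent information than its nominal size suggests.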

Internal problem

The researcher feels uncertain and frustrated, knowing that their standard analysis is likely invalid but lacking the specialized knowledge and tools to correctly model the complex structure of their data. They worry their findings are not defensible.

Philosophical problem

It is fundamentally wrong to ignore the real-world contexts and groupings that structure data; a proper statistical analysis must respect and model this structure, not pretend it doesn't exist.

The plan

Understand the fundamentals by learning the basic two-level linear model.
Master the estimation procedures and learn to interpret fixed effects, random effects, and variance components.
Extend the basic model to handle more complex data structures, including three or more levels and complex variance patterns.
Apply the multilevel framework to a wide variety of data types, including multivariate, nonlinear, discrete, and longitudinal data.
Learn advanced techniques for handling non-nested structures (cross-classifications), measurement error, and missing data.
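
As a reference point for the first two steps, a two-level linear model with a random intercept and a random slope can be written as follows. The notation is a common convention and is an assumption here; the book's own symbols may differ.

```latex
% Two-level model: individual i within group j (notation is illustrative)
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij} + e_{ij}

% Random effects: group-level intercept and slope residuals, individual-level residual
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} \sim N\!\left(\mathbf{0},\;
\begin{pmatrix} \sigma^2_{u0} & \sigma_{u01} \\ \sigma_{u01} & \sigma^2_{u1} \end{pmatrix}\right),
\qquad e_{ij} \sim N(0, \sigma^2_e)

% Intra-class correlation in the random-intercept case (u_{1j} = 0)
\rho = \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_e}
```

Here the fixed effects are the betas, the random effects are the u terms, and the variance components are the sigma parameters.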

Success

The researcher can confidently and correctly analyze complex hierarchical and cross-classified data.
They can produce statistically valid and efficient estimates, enabling robust and defensible research conclusions.
They are able to partition variance across levels and explicitly model contextual effects, leading to deeper and more nuanced insights into their data (e.g., quantifying school effectiveness).
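
Partitioning variance across levels can be illustrated with a simple method-of-moments (one-way ANOVA) estimate of the intra-class correlation. This is a rough sketch that assumes equal group sizes and uses made-up scores; it is not the likelihood-based estimation a multilevel package would perform.

```python
from statistics import mean


def anova_icc(groups):
    """Moment (one-way ANOVA) estimate of the ICC for equal-sized groups."""
    n_groups = len(groups)
    size = len(groups[0])
    if any(len(g) != size for g in groups):
        raise ValueError("this sketch assumes equal group sizes")

    group_means = [mean(g) for g in groups]
    grand_mean = mean(group_means)

    # Between-group mean square: spread of group means around the grand mean.
    ms_between = size * sum((gm - grand_mean) ** 2 for gm in group_means) / (n_groups - 1)

    # Within-group mean square: spread of individuals around their group mean.
    ss_within = sum((y - gm) ** 2 for g, gm in zip(groups, group_means) for y in g)
    ms_within = ss_within / (n_groups * (size - 1))

    # Between-group variance component, truncated at zero if negative.
    var_between = max((ms_between - ms_within) / size, 0.0)
    return var_between / (var_between + ms_within)


# Hypothetical test scores for three schools of three students each.
scores = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
icc = anova_icc(scores)
print(round(icc, 3))  # 0.897
```

An ICC this high would mean most of the variation in scores lies between schools rather than between students within a school.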

At stake

Continuing to use inappropriate single-level models, leading to flawed conclusions, invalid inferences, and research that may be challenged or retracted.
Resorting to simplistic and incorrect approaches like aggregating data to the group level, which discards information and leads to the ecological fallacy.
Failing to understand and quantify the crucial role that group contexts play in shaping individual outcomes.
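
The aggregation risk can be made concrete with a toy example. The data below are invented so that the relationship within every group is negative while the relationship between group means is positive; the group labels and values are purely illustrative.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical groups: (x values, y values) for each.
groups = {
    "A": ([1, 2, 3], [3, 2, 1]),
    "B": ([4, 5, 6], [6, 5, 4]),
    "C": ([7, 8, 9], [9, 8, 7]),
}

# Aggregated analysis: regress group mean of y on group mean of x.
between_slope = ols_slope(
    [mean(xs) for xs, _ in groups.values()],
    [mean(ys) for _, ys in groups.values()],
)

# Within-group analysis: centre each variable on its group mean, then pool.
x_centered, y_centered = [], []
for xs, ys in groups.values():
    mx, my = mean(xs), mean(ys)
    x_centered += [x - mx for x in xs]
    y_centered += [y - my for y in ys]
within_slope = ols_slope(x_centered, y_centered)

print(between_slope)  # 1.0
print(within_slope)   # -1.0
```

Inferring the individual-level relationship from the aggregated slope would get the sign wrong here, which is the ecological fallacy in miniature.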

Chapter by chapter

ch01 Introduction
This chapter introduces the complexities of analyzing hierarchically structured data, highlighting the implications of ignoring such structures in statistical modelling, particularly in fields like education and social sciences.
- Hierarchical structures in data are foundational to understanding phenomena in educational and social sciences.
- Ignoring these structures can lead to invalid conclusions and hamper effective policy decision-making.
- Multilevel modeling provides tools for accurate estimation of relationships, accommodating the complexities of data structures.
- The distinction between individual-level analyses and group-level effects is crucial for valid inference and interpretation.
ch03 Extensions to the Basic Multilevel Model
This chapter explores variance structures in multilevel models, extending the basic framework so that the variance at each level can be modelled as a function of explanatory variables.
- Assuming a single, constant variance at each level can obscure critical relationships within hierarchical data.
- Modeling variance as a function of explanatory variables enriches interpretations and aids in the formulation of policy decisions.
- The choice of modeling approach can heavily influence findings, necessitating thorough justification for analytical techniques.
- Incorporating complex variance structures can yield significantly more accurate estimates compared to traditional models, particularly in educational settings.
ch04 The Multivariate Multilevel Model
This chapter explores how to model multivariate responses through multilevel modeling, using practical applications in educational assessments to illustrate the method's utility and effectiveness.
ch05 Nonlinear Multilevel Models
This chapter explores the intricacies of nonlinear multilevel models, emphasizing their application to growth data and other contexts where linear assumptions fall short.
ch06 Models for Repeated Measures Data
This chapter explores the complexities and methodologies for modeling repeated measures data, emphasizing the distinction between conditional and unconditional models, the need for appropriate multilevel specifications, and the implications of autocorrelation in data.
ch07 Multilevel Models for Discrete Response Data
This chapter details the application of multilevel models to discrete response data, focusing on methods suited to proportions and counts and on the particular challenges such data present.
- The transition from continuous to discrete models is not merely a methodological shift; it fundamentally alters data interpretation.
- Generalized Linear Models provide the necessary flexibility to address the complexities inherent in discrete response data.
- The choice of link function significantly influences model outcomes and should be chosen based on the specific nature of the data.
- Extra-binomial variation is a crucial concept in the analysis of discrete data and must be tested for if inferences are to be accurate.
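
One common, simple check for extra-binomial variation is the Pearson dispersion statistic: under a binomial model with a single common probability it should be close to 1, and values well above 1 suggest overdispersion. The sketch below is a generic single-level diagnostic with invented counts, not the chapter's own procedure.

```python
def binomial_dispersion(successes, trials):
    """Pearson chi-square divided by its degrees of freedom (clusters - 1)."""
    p_hat = sum(successes) / sum(trials)
    if not 0.0 < p_hat < 1.0:
        raise ValueError("pooled proportion must lie strictly between 0 and 1")
    chi2 = sum(
        (y - n * p_hat) ** 2 / (n * p_hat * (1.0 - p_hat))
        for y, n in zip(successes, trials)
    )
    return chi2 / (len(trials) - 1)


# Hypothetical data: successes out of 20 trials in each of three clusters.
dispersion = binomial_dispersion(successes=[2, 10, 18], trials=[20, 20, 20])
print(round(dispersion, 2))  # 12.8
```

A value this far above 1 would indicate that cluster proportions vary much more than a single binomial probability allows, which is the situation a cluster-level random effect is meant to capture.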
ch08 Multilevel Cross Classification
This chapter explores the complexities of multilevel cross classification models, illustrating how varying levels of data interpretation impact educational assessments and their outcomes.
- Variance interpretation in educational assessments is multifaceted; simplistic conclusions can misrepresent school effectiveness.
- Prior achievement metrics need to be carefully considered in analyses to provide a fair evaluation of educational outcomes.
- Computational efficiency matters: models with extensive cross-classification can be slow to fit without yielding additional insight.
- The structure and complexity of student classifications significantly impact data interpretation, reflecting real educational environments.
ch09 Multilevel Event History Models
This chapter explores the complexities of modeling duration data using multilevel event history approaches, focusing on how to handle right-censored data and random effects within nested data structures.
ch10 Multilevel Models with Measurement Errors
This chapter addresses the critical implications of measurement errors in multilevel models, emphasizing the need to appropriately account for these errors to ensure accurate statistical inferences.
- Measurement errors are ubiquitous in social and biological research, necessitating thoughtful consideration in analysis.
- Without addressing measurement errors, researchers risk drawing conclusions that may be fundamentally flawed or misleading.
- Reliability is context-dependent; its interpretation varies significantly across demographic and social subgroups.
- The integration of sophisticated estimation methods for measurement errors can significantly enhance the validity of findings.
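
To show why unaddressed measurement error matters, the simulation below illustrates classical attenuation in the simplest single-level case: random error in one predictor shrinks the estimated slope by roughly the predictor's reliability. The true slope, error variances, seed, and sample size are all assumptions chosen for illustration, and the multilevel corrections the chapter covers are more general than this division by a known reliability.

```python
import random


def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(42)  # fixed seed so the sketch is reproducible
n = 20000

true_x = [rng.gauss(0, 1) for _ in range(n)]        # true predictor, variance 1
y = [2.0 * x + rng.gauss(0, 1) for x in true_x]      # assumed true slope of 2
observed_x = [x + rng.gauss(0, 1) for x in true_x]   # measurement error, variance 1

slope_true = ols_slope(true_x, y)
slope_observed = ols_slope(observed_x, y)

# Reliability = true variance / observed variance = 1 / (1 + 1).
reliability = 0.5
slope_corrected = slope_observed / reliability

print(f"slope using true predictor:     {slope_true:.2f}")
print(f"slope using error-prone values: {slope_observed:.2f}")
print(f"slope after reliability fix:    {slope_corrected:.2f}")
```

With a reliability of 0.5, the slope estimated from the error-prone predictor should come out near half the true value.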
ch11 Software for Multilevel Modelling; Missing Data and Multilevel Structural Equation Models
This chapter surveys specialized software for multilevel modelling and why complex data structures require it, then addresses the challenges of missing data and the implementation of multilevel structural equation models.
- The development of specialized software for multilevel modeling is crucial for analyzing complex datasets that reflect the hierarchies within real-world structures.
- Missing data must be understood not merely as a nuisance but as a critical factor that can substantially affect the validity of research findings.
- Incorporating robust statistical methods for handling missingness, such as multiple imputation, can greatly enhance data integrity and research outcomes.
- The evolution of Bayesian methods and their implementation in software represents a progressive shift toward more sophisticated multilevel analyses.
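
For the multiple-imputation point above, results from several imputed datasets are usually combined with Rubin's rules: average the estimates, then add the within-imputation variance to an inflated between-imputation variance. The sketch below uses invented estimates and variances from three hypothetical imputed datasets.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and squared standard errors."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # sample variance across imputations (m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


# Hypothetical coefficient estimates and their variances from three imputations.
pooled_estimate, total_variance = pool_rubin(
    estimates=[1.0, 1.2, 0.8],
    variances=[0.04, 0.04, 0.04],
)
pooled_se = math.sqrt(total_variance)

print(f"pooled estimate: {pooled_estimate:.2f}")
print(f"pooled standard error: {pooled_se:.3f}")
```

The pooled standard error is larger than any single imputation's standard error because it also reflects uncertainty about the missing values themselves.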

Questions this book answers

How can we statistically model data with a hierarchical or clustered structure (e.g., students in schools, repeated measures on individuals)?
Why are traditional statistical methods like Ordinary Least Squares (OLS) regression often invalid for analyzing hierarchical data, and what are the consequences of ignoring the data structure?
What are the core components of a multilevel model, including fixed effects, random effects, and variance components?
How are the parameters of a multilevel model estimated, and how are the results interpreted?
How can the basic linear multilevel model be extended to handle more complex data types like multivariate outcomes, nonlinear relationships, discrete responses, and event history (survival) data?

Glossary

Individual-Level Predictors: A set of explanatory variables measured at the lowest level (Level 1) of a data hierarchy. These variables represent characteristics specific to each individual observation, such as a student's prior test score, a patient's demographic information, or the time point of a repeated measurement.
Group-Level Predictors: A set of explanatory variables measured at a higher level (Level 2 or above) of a data hierarchy. These variables represent characteristics of the context or group to which lower-level units belong. They can be intrinsic properties of the group or aggregates of individual-level characteristics ('compositional effects').
Group Contextual Structure: The fundamental structure of the data whereby individual units are nested within higher-level groups, or are cross-classified by multiple grouping factors. This structure creates statistical dependence among observations in the same group, violating the independence assumption of single-level models.
Random Effects (Between-Group Heterogeneity): The unobserved, stochastic component of the model that captures the heterogeneity between groups. It represents the extent to which baseline outcomes (intercepts) and within-group relationships (slopes) vary across the population of higher-level units.
Individual-Level Outcome: The primary dependent or response variable of interest, measured at the lowest level of the data hierarchy (Level 1).