What is PeopleAnalyst?

PeopleAnalyst is the front door for people-analytics research: 205+ works indexed and profiled, 40+ citation-grade findings extracted, and peer-reviewed behavioral science translated from academic to actionable — the missing manual for the people analytics you always meant to do.

What is people analytics?

People analytics is not a dashboard. It is behavioral science and statistical inference applied to workforce decisions — a discipline with its own methodology, spanning measurement, organizational design, talent, leadership, and analytics craft.

Why does AI in HR need measurement science?

AI is being deployed in high-stakes people decisions — hiring, performance, attrition — without the measurement science to evaluate whether it works or whom it harms. Construct validity, effect sizes, and criterion validity are the vocabulary for asking an AI vendor the right questions.

How is the research made accessible?

The evidence is indexed and searchable: 205+ works, 40+ citation-grade insight cards, and 8 research arcs, so the right finding reaches the right decision at the right time.

What separates good people measurement from assertion?

Good measurement has a method: construct validity, reliability, and effect-size interpretation are not optional — they are what separates evidence from assertion.

Bayesian Multilevel Models for Repeated Measures Data: A Conceptual and Practical Introduction in R

Santiago Barreda, Noah Silbert

In a sentence

A practical, hands-on introduction to building, fitting, and interpreting Bayesian multilevel models for repeated measures data using the R package brms.

This book offers a hands-on, conceptual introduction to Bayesian multilevel models for analyzing repeated measures data, a common data type in linguistics, psychology, and cognitive science. Starting with simple models and progressing to more complex ones like multinomial regression, the authors use a single, realistic experimental dataset throughout to provide fully worked examples in R using the `brms` package. Instead of getting bogged down in mathematical theory, the book focuses on building intuitive, geometric understanding and practical coding skills, making it accessible for readers with any level of statistical background who want to move beyond traditional methods and harness the flexibility of Bayesian modeling for their own research.

The four lenses

Science
Statistics
Systems
Strategy

Conducting a Bayesian Multilevel Model Analysis

To analyze repeated measures data using Bayesian multilevel regression models in R to answer specific research questions.

When to use: When a researcher has collected data, particularly with a repeated measures or hierarchical structure, and needs to perform statistical analysis to test hypotheses or explore relationships between variables.

Step 1: Define research questions and prepare the data.
Entry: A dataset and a set of research questions are available.
Exit: The data is in a tidy format (e.g., long format) in an R data frame, ready for modeling.
In: Raw experimental data, Research questions · Out: A clean data frame for analysis, Initial data visualizations
Step 2: Specify the model structure and priors.
Entry: Data is prepared and research questions are defined.
Exit: A complete `brm` formula and a set of prior definitions are ready.
- Which predictors to include as fixed vs. random effects?
- Which interactions to model?
- What data distribution (family) and link function to use?
- What should the prior distributions be?
In: Clean data frame, Research questions · Out: brms model formula, Prior specifications
Step 3: Fit the model using the `brm` function.
Entry: A complete model specification (formula and priors) is defined.
Exit: The MCMC sampling is complete and a `brmsfit` model object is created.
- How many chains and iterations are needed for convergence?
In: brms model formula, Prior specifications, Data frame · Out: A fitted `brmsfit` object containing posterior samples
Step 4: Assess model convergence and fit.
Entry: A fitted `brmsfit` object is available.
Exit: Confidence that the model has converged and provides a reasonable representation of the data.
- Does the model need to be refit with more iterations or different settings (e.g., adapt_delta)?
In: Fitted `brmsfit` object · Out: Convergence diagnostics (Rhat, ESS), Posterior predictive check plots
Step 5: Interpret the posterior distributions of parameters.
Entry: A converged and well-fitting model is available.
Exit: A clear understanding of the magnitude, direction, and uncertainty of all relevant model parameters.
In: Fitted `brmsfit` object · Out: Summary tables of coefficients, Visualizations of effects, Derived quantities of interest
Step 6: Compare candidate models if applicable.
Entry: Two or more fitted `brmsfit` objects are available.
Exit: A decision on the preferred model, or an understanding of the trade-offs between models.
- Is the improvement in model fit from added complexity meaningful?
In: Multiple fitted `brmsfit` objects · Out: Model comparison table (e.g., from `loo_compare`)
Step 7: Synthesize and report the findings.
Entry: Model interpretation and comparison are complete.
Exit: A complete report or manuscript describing the research.
In: All model outputs and interpretations · Out: Written report, presentation, or publication
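
As a rough illustration of Steps 1 to 5, the sketch below builds a small simulated dataset and wraps the `brms` calls in helper functions. The data, the column names (`listener`, `speaker`, `age`, `height`), the prior values, and the helper names `fit_height_model` and `inspect_fit` are all assumptions made for this example and are not taken from the book. Fitting requires the `brms` package and a working Stan toolchain.

```r
# Sketch of Steps 1 to 5. All data and column names are hypothetical.
# Fitting assumes the brms package and a working Stan toolchain are installed.

# Step 1: long-format data, one row per listener-by-speaker judgment
set.seed(1)
judgments <- expand.grid(listener = factor(1:10), speaker = factor(1:20))
judgments$age <- factor(ifelse(as.integer(judgments$speaker) <= 10, "adult", "child"))
judgments$height <- rnorm(
  nrow(judgments),
  mean = ifelse(judgments$age == "adult", 170, 140),
  sd = 8
)

# Step 2: population-level effect of age, listener-specific intercepts and
# age effects, and speaker-specific intercepts
height_formula <- height ~ age + (age | listener) + (1 | speaker)

# Steps 2 and 3: attach priors (illustrative values, in cm) and run the sampler
fit_height_model <- function(data) {
  priors <- c(
    brms::set_prior("normal(155, 30)", class = "Intercept"),
    brms::set_prior("normal(0, 30)", class = "b"),
    brms::set_prior("normal(0, 30)", class = "sd"),
    brms::set_prior("normal(0, 30)", class = "sigma"),
    brms::set_prior("lkj(2)", class = "cor")
  )
  brms::brm(
    height_formula, data = data, family = gaussian(), prior = priors,
    chains = 4, cores = 4, iter = 2000, warmup = 1000,
    control = list(adapt_delta = 0.9), seed = 1
  )
}

# Steps 4 and 5: convergence diagnostics, a posterior predictive check,
# and the population-level estimates
inspect_fit <- function(fit) {
  print(summary(fit))                      # reports Rhat, Bulk_ESS, Tail_ESS
  print(brms::pp_check(fit, ndraws = 50))  # observed vs. replicated outcomes
  brms::fixef(fit)                         # estimates with 95% intervals
}
```

A typical call would be `fit <- fit_height_model(judgments)` followed by `inspect_fit(fit)`. If the sampler reports divergent transitions, raising `adapt_delta` or the number of iterations is the usual first adjustment.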

Performing a Prior Predictive Check

To understand the implications of chosen prior distributions by simulating data from the priors alone, ensuring they generate plausible outcomes before conditioning on the data.

When to use: During the model specification phase (Step 2 of the main analysis process), before fitting the final model to the data.

Step 1: Specify the full model, including the formula and priors.
Entry: A candidate model structure and set of priors have been chosen.
Exit: A `brm` formula and a list of priors are defined.
In: Model structure, Proposed prior distributions · Out: brms model formula, Prior specifications
Step 2: Fit a 'prior-only' version of the model.
Entry: Model formula and priors are specified.
Exit: A `brmsfit` object containing samples from the prior predictive distribution is created.
In: brms model formula, Prior specifications · Out: Fitted prior-only `brmsfit` object
Step 3: Generate and visualize data from the prior predictive distribution.
Entry: A prior-only model has been fit.
Exit: Visualizations of the data simulated from the priors are created.
In: Fitted prior-only `brmsfit` object · Out: Simulated datasets, Plots of prior predictive distributions
Step 4: Evaluate the plausibility of the simulated data.
Entry: Visualizations of prior predictive distributions are available.
Exit: A judgment on the appropriateness of the current priors.
- Are the priors too wide, too narrow, or centered incorrectly?
In: Plots of prior predictive distributions, Domain knowledge · Out: Assessment of prior plausibility
Step 5: Adjust priors and repeat if necessary.
Entry: The current priors have been deemed implausible.
Exit: A set of priors that generate plausible data has been identified.
In: Assessment of prior plausibility · Out: Revised prior specifications
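
One way to carry out the steps above is sketched here, under the assumption that the outcome is a height in centimeters and the family is Gaussian. The helper names (`fit_prior_only`, `simulate_from_priors`, `implausible_share`) and the plausibility bounds are hypothetical. In `brms`, setting `sample_prior = "only"` draws from the priors without conditioning on the outcome, which requires a proper prior on every parameter.

```r
# Sketch of a prior predictive check. Helper names and bounds are hypothetical.
# Assumes the brms package and a working Stan toolchain are installed.

# Step 2: sample from the priors alone. Every parameter needs a proper prior,
# otherwise brms stops with an error.
fit_prior_only <- function(formula, data, priors) {
  brms::brm(
    formula, data = data, family = gaussian(), prior = priors,
    sample_prior = "only", chains = 2, iter = 1000, seed = 1
  )
}

# Step 3: plot and return datasets simulated from the priors
# (rows = simulated datasets, columns = observations)
simulate_from_priors <- function(prior_fit, ndraws = 100) {
  print(brms::pp_check(prior_fit, ndraws = 50))
  brms::posterior_predict(prior_fit, ndraws = ndraws)
}

# Step 4: share of simulated values outside a range judged plausible
implausible_share <- function(simulated, lower, upper) {
  mean(simulated < lower | simulated > upper)
}
```

For adult and child heights, a call such as `implausible_share(sims, lower = 50, upper = 250)` gives a quick numerical companion to the plots. A large share suggests the priors are too wide or poorly centered.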

Comparing Bayesian Models using Information Criteria

To select the best-performing model from a set of candidates by estimating their out-of-sample predictive accuracy.

When to use: During the model selection phase (Step 6 of the main analysis process), after fitting several plausible models.

Step 1: Fit all candidate models.
Entry: A dataset and at least two competing model specifications are available.
Exit: Multiple fitted `brmsfit` objects are created.
In: Data frame, Multiple model formulas · Out: Multiple fitted `brmsfit` objects
Step 2: Calculate the information criterion for each model.
Entry: Multiple fitted `brmsfit` objects are available.
Exit: Each model object is updated to include the calculated information criterion.
In: Fitted `brmsfit` objects · Out: `brmsfit` objects with LOO/WAIC information
Step 3: Compare the models using `loo_compare`.
Entry: All candidate models have had information criteria calculated.
Exit: A model comparison table is generated.
In: `brmsfit` objects with LOO/WAIC information · Out: A comparison table ranking models by ELPD
Step 4: Interpret the comparison table and check diagnostics.
Entry: A model comparison table is available.
Exit: A decision about the relative performance of the models.
- Is the difference in predictive performance large enough to justify selecting the more complex model?
In: Model comparison table · Out: Identification of the preferred model based on predictive accuracy
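
The comparison steps above might look like the following sketch for two already-fitted models. The helper names `compare_two_models` and `flag_clear_differences` are hypothetical, and the default threshold of 2 standard errors is a rule of thumb assumed for illustration, not a value taken from the book.

```r
# Sketch of a LOO-based comparison of two fitted brmsfit objects.
# Helper names and the threshold are assumptions for illustration.

# Steps 2 and 3: store the LOO criterion on each fit, then compare
compare_two_models <- function(fit_simple, fit_complex) {
  fit_simple  <- brms::add_criterion(fit_simple, criterion = "loo")
  fit_complex <- brms::add_criterion(fit_complex, criterion = "loo")
  # Printing fit_simple$criteria$loo also shows the Pareto k diagnostics
  brms::loo_compare(fit_simple, fit_complex, criterion = "loo")
}

# Step 4: flag models whose ELPD difference from the best model is large
# relative to its standard error
flag_clear_differences <- function(cmp, threshold = 2) {
  ratio <- abs(cmp[, "elpd_diff"]) / cmp[, "se_diff"]
  ratio[is.nan(ratio)] <- 0  # the best model's row is 0 / 0
  ratio > threshold
}
```

The first row of the table returned by `loo_compare` is the model with the highest estimated ELPD, and the other rows show differences from it. A difference that is small relative to `se_diff` gives little reason to prefer the more complex model.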

Conducting a Bayesian Analysis of Variance (BANOVA)

To decompose the variance in a dependent variable and assess the relative importance of different groups ('batches') of predictors in a complex multilevel model.

When to use: During the interpretation phase of a complex model (within Step 5 of the main analysis process) to get a high-level overview of which factors are driving the variation in the outcome.

Step 1: Fit a full, complex Bayesian multilevel model.
Entry: A complex model structure has been defined.
Exit: A fitted `brmsfit` object for the full model is available.
In: Full model specification · Out: Fitted `brmsfit` object
Step 2: Calculate the standard deviation for each batch of parameters.
Entry: A fitted `brmsfit` object is available.
Exit: A summary table of standard deviations for each variance component is created.
In: Fitted `brmsfit` object · Out: A data frame of standard deviation summaries for each parameter batch
Step 3: Create a Bayesian ANOVA plot.
Entry: Standard deviations for parameter batches have been calculated.
Exit: A BANOVA plot is generated.
In: Standard deviation summaries · Out: BANOVA plot
Step 4: Interpret the plot to assess the importance of predictors.
Entry: A BANOVA plot is available.
Exit: An understanding of the relative importance of each variance component in the model.
- Are some predictors or interactions negligible and could potentially be removed in a simpler model?
In: BANOVA plot · Out: Ranked assessment of predictor importance
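
Steps 2 and 3 can be sketched in base R once posterior draws are available as a matrix. The example below uses simulated draws in place of a fitted model, and the function names `batch_sd_summary` and `plot_banova` are hypothetical rather than the book's own code. It computes, for each posterior draw, the standard deviation among the parameter values in a batch, which corresponds to the finite-population view. The `sd_` parameters reported by `brms` give the superpopulation counterpart directly.

```r
# Sketch of batch-level standard deviations from posterior draws.
# Function names and the simulated draws are assumptions for illustration.

# draws: matrix with rows = posterior draws, columns = parameters
# batches: named list of column-name vectors (two or more columns each)
batch_sd_summary <- function(draws, batches) {
  rows <- lapply(batches, function(cols) {
    per_draw_sd <- apply(draws[, cols, drop = FALSE], 1, sd)
    c(mean(per_draw_sd), quantile(per_draw_sd, c(0.025, 0.975)))
  })
  out <- as.data.frame(do.call(rbind, rows))
  names(out) <- c("estimate", "lower", "upper")
  out$batch <- names(batches)
  out
}

# Dot plot of batch SDs with 95% intervals, smallest at the bottom
plot_banova <- function(sd_table) {
  ord <- order(sd_table$estimate)
  dotchart(sd_table$estimate[ord], labels = sd_table$batch[ord],
           xlim = range(0, sd_table$upper),
           xlab = "Standard deviation of batch")
  segments(sd_table$lower[ord], seq_along(ord),
           sd_table$upper[ord], seq_along(ord))
}

# Simulated stand-in for posterior draws of listener and speaker effects
set.seed(1)
n_draws <- 1000
draws <- cbind(
  matrix(rnorm(n_draws * 10, sd = 5), n_draws, 10),
  matrix(rnorm(n_draws * 20, sd = 2), n_draws, 20)
)
colnames(draws) <- c(paste0("listener_", 1:10), paste0("speaker_", 1:20))
batches <- list(
  listener = paste0("listener_", 1:10),
  speaker = paste0("speaker_", 1:20)
)
sd_table <- batch_sd_summary(draws, batches)
```

Calling `plot_banova(sd_table)` draws the plot. With a real model, the draws matrix could come from `as.matrix(fit)`, with batches defined by matching column names.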

A candidate measure

Bayesian Multilevel Models for Repeated Measures Data: A Conceptual and Practical Introduction in R (derived measurement candidates)

Fundamental Frequency (f0)

Mean f0 in Hertz (Hz) across a vowel nucleus.

self-report suitability: none

Acoustic Vocal-Tract Length (VTL)

Geometric mean of formant frequencies; VTL in centimeters, derived from formant frequencies relative to a reference.

self-report suitability: none

Apparent Age

Binary classification (Child vs. Adult); categorical choice from multiple options (e.g., Boy, Girl, Man, Woman).

self-report suitability: high

Apparent Gender

Binary classification (Male vs. Female).

self-report suitability: high

Apparent Height

Height estimate in centimeters or feet/inches from a slider or numerical input.

self-report suitability: high

Listener Variation

Standard deviation of listener-level random effects (e.g., intercepts, slopes) from a multilevel model.

self-report suitability: none

Speaker Variation

Standard deviation of speaker-level random effects from a multilevel model.

self-report suitability: none
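
The geometric mean listed under Acoustic Vocal-Tract Length can be computed as in this small sketch. The formant values are invented for illustration, and the conversion from this summary to a length in centimeters is not shown because the reference it depends on is not specified here.

```r
# Geometric mean of formant frequencies. The values below are hypothetical.
geometric_mean <- function(x) exp(mean(log(x)))

formants_hz <- c(F1 = 500, F2 = 1500, F3 = 2500)
gm_hz <- geometric_mean(formants_hz)
```

Working on the log scale keeps the computation stable and makes equal frequency ratios count equally, which is the usual reason to prefer a geometric over an arithmetic mean for formants.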

The story

The reader

Researchers, graduate students, and senior undergraduates in fields like linguistics, psychology, and cognitive science. They have repeated measures data and want to analyze it properly, but they find traditional statistics curricula frustratingly slow and indirect. They may feel intimidated by the math or coding involved in modern statistical methods, or feel that their existing skills with simpler models don't fully translate to the Bayesian framework. They want to build flexible, appropriate models for their data and gain a practical, intuitive understanding without getting lost in abstract theory.

External problem

They have complex, repeated measures data, but introductory statistics courses teach inappropriate, oversimplified models (like t-tests or ANOVA) that don't handle this structure, leading to incorrect inferences and publication difficulties.

Internal problem

They feel overwhelmed, confused, and discouraged by statistics. They believe they are "not good at math" and are frustrated that they can't seem to find a straightforward path to analyzing their own research data effectively.

Philosophical problem

It's just plain wrong that researchers should have to learn a series of outdated and inappropriate statistical methods before they are taught the correct, modern tools for their work. Statistical education should be practical, direct, and empowering, not a theoretical gatekeeping exercise.

The plan

Start with a single, realistic repeated measures dataset that will be used throughout the book to build cumulative understanding.
Introduce the core concepts of Bayesian multilevel models conceptually and geometrically, not with dense mathematical proofs.
Provide fully-worked code examples for building, interpreting, and checking progressively more complex models in `brms`.
Offer practical advice on interpreting output, manipulating posterior samples, and writing up results in a publication-ready format.

Success

The reader feels confident and competent in analyzing their own repeated measures data.
They can build, fit, and interpret a wide range of sophisticated Bayesian multilevel models in R.
They have a strong intuitive grasp of statistical concepts and can effectively communicate their analytical methods and results.
They move from feeling statistically anxious to feeling empowered and curious.

At stake

The reader continues to feel stuck, using inappropriate statistical methods or avoiding quantitative analysis altogether.
They remain intimidated by Bayesian methods and modern statistical programming.
Their research is held back by their inability to properly analyze their data, leading to weaker inferences and difficulty publishing.

Chapter by chapter

ch01p01 Introduction: Experiments and variables (part 1/2)
This chapter introduces the foundational elements of experimental design and statistical inference, focusing on defining key variables and the structure of the experimental data to be analyzed.
ch01p02 Introduction: Experiments and variables (part 2/2)
This chapter delves into the significance of probabilities and statistical inference in research, addressing how these concepts are critical for understanding data and drawing conclusions from it.
- Understanding probabilities is crucial for reliable inferences in research.
- Point estimates alone do not capture the full variability inherent in data; credible ranges are essential.
- The differences between empirical and theoretical probabilities significantly impact research conclusions.
- Conditional probabilities provide a powerful lens through which to view data variation and interdependencies.
ch02 Probabilities, likelihood, and inference
This chapter unpacks the concepts of probabilities, joint probabilities, and the nuances of likelihood functions, emphasizing their significance in statistical modeling and inference.
ch03 Fitting Bayesian regression models with brms
This chapter explores how to fit Bayesian regression models using the brms package, detailing the theory behind regression, Bayesian inference, and practical implementation.
ch04 Inspecting a ‘single group’ of observations using a Bayesian multilevel model
This chapter explores the intricacies of analyzing a single group of observations with Bayesian multilevel models, focusing on the impact of repeated measures data and the importance of modeling within and between group variations.
- Repeated measures data requires careful attention to structure; blindly pooling data can obscure important differences among groups.
- Bayesian multilevel models can reveal systematic patterns in data that traditional models often miss due to their assumption of independence.
- The proper estimation of both within and between group variances is vital for credible model parameters and insights.
- Understanding adaptive pooling allows for more reliable estimates that leverage similarities across observations.
ch05 Comparing two groups of observations: Factors and contrasts
This chapter examines how to compare the means of two groups, focusing on the statistical design considerations necessary to accurately interpret observed differences in experimental data.
ch06 Variation in parameters (‘random effects’) and model comparison
This chapter examines the variation in listener judgments of height based on the apparent age of the speaker, using random effects models to reveal listener-specific differences that were previously masked.
- Listener judgments of height based on apparent age are varied and cannot be accurately represented with fixed effect models alone.
- The transition to random effects modeling provides a framework that acknowledges and incorporates individual differences in perception.
- Visualizations of the data show how complex auditory judgments are, which calls for analysis at the level of the individual listener.
- Recognizing listener-specific variations is crucial for making informed generalizations in auditory research and communication modeling.
ch07 Variation in parameters (‘random effects’) and model comparison
This chapter addresses how to incorporate random effects in statistical models to account for variability among listeners, shedding light on interactions between variables that may influence results in nuanced ways.
- Effective statistical modeling demands a recognition of the inherent variability among respondents; neglecting this can lead to misleading conclusions.
- Incorporating random effects allows modelers to better capture the nuances of listener-specific responses, yielding deeper insights into apparent age perceptions.
- The significance of interaction effects highlights the conditional dependencies that exist within datasets, indicating that understanding must be contextually grounded.
- Model specifications should prioritize adaptive pooling to enhance parameter estimation reliability, particularly when working with varied groups.
ch08 Model comparison
This chapter explores the nuances and methodologies required for effective Bayesian model comparison, emphasizing the delicate balance between model complexity and predictive accuracy.
- Complex models are not inherently better; they can obscure generalizability and lead to overfitting.
- The log pointwise predictive density (lpd) is helpful but should not be the sole determinant for model selection.
- Adjusting lpd to account for model complexity via WAIC can enhance decision-making in model selection.
- Leave-one-out cross-validation (LOO) provides a powerful tool for assessing out-of-sample predictive power efficiently.
ch09p01 Comparing many groups, interactions, and posterior predictive checks (part 1/2)
This chapter explores methodologies for analyzing complex experimental data involving multiple groups, focusing on the effects of perceived gender and age on height judgments while addressing the statistical challenges and intricacies involved.
- Understanding the impact of perceived characteristics on judgments requires careful statistical analysis of data involving multiple groups.
- Employing random effects allows for more nuanced interpretations of group interactions in multifactorial datasets.
- Posterior predictive checks are essential for validating the fit of statistical models against actual data.
- Interaction terms can elucidate how different factors interrelate and influence outcomes, underscoring the need for thorough examination in analyses.
ch09p02 Comparing many groups, interactions, and posterior predictive checks (part 2/2)
This chapter navigates the intricacies of comparing multiple groups through Bayesian modeling, underscoring the significance of prior predictive checks and the flexibility offered by Bayesian approaches.
- Bayesian models facilitate straightforward comparisons of group effects with greater flexibility than traditional tools such as `lmer`.
- Prior predictive checks are critical for validating the plausibility of prior distributions and enhancing model reliability.
- Analysts should anchor their priors on domain-specific knowledge to avoid the pitfalls of uninformative model settings.
- Heteroscedastic modeling offers a nuanced understanding of variability within datasets, providing richer insights.
ch10 Varying variances, more about priors, and prior predictive checks
This chapter discusses the implementation of varying variances in hierarchical models, emphasizing the significance of selecting appropriate priors and conducting prior predictive checks to enhance model accuracy and reliability.
ch11 Varying variances, more about priors, and prior predictive checks
This chapter delves into the complexities of modeling variance in data, focusing on issues of model identifiability, linear dependence, and the implications of incorporating different types of predictors.
ch12 Quantitative predictors and their interactions with factors
This chapter explores the utilization of quantitative predictors in models, specifically focusing on their linear relationships and interactions with categorical factors affecting the measurement of speaker height.
- Establishing a linear relationship between quantitative and categorical predictors enriches understanding of perceptual judgments regarding speaker characteristics.
- Centering predictors is crucial for deriving interpretable intercepts in regression models, leading to more meaningful analyses.
- The interactions between categorical predictors significantly influence outcomes, stressing the need for comprehensive modeling that considers all relevant factors.
- The findings support the idea that quantitative metrics like VTL can offer substantial insights into listener perceptions when appropriately modeled.
ch13 Logistic regression and signal detection theory models
This chapter explores the mechanisms and applications of logistic regression and signal detection theory, demonstrating how they can be used to analyze dichotomous variables and enhance our understanding of classification tasks in social perception.
- Logistic regression allows for precise modeling of dichotomous outcomes, providing actionable insights into classification tasks.
- Understanding the relationship between predictors and categorical outcomes is critical to sound data interpretation.
- Signal detection theory offers a set of metrics for assessing performance in classification tasks.
- Identifying biases in perception can help refine models and lead to more ethically informed conclusions.
ch14 Multiple quantitative predictors, dealing with large models, and Bayesian ANOVA
This chapter explores the complexities of modeling with multiple quantitative predictors using Bayesian ANOVA, emphasizing the advantages of Bayesian methods in high-dimensional settings while outlining best practices for model interpretation and diagnostics.
ch15 Bayesian Analysis of Variance
This chapter elucidates how Bayesian analysis can be utilized to interpret the components of variation in dependent variables, primarily through the framework of Bayesian ANOVA, distinguishing it from traditional ANOVA methodologies.
- Bayesian Analysis of Variance (BANOVA) offers a more nuanced understanding of variance and predictor importance compared to traditional methods.
- Emphasizing estimation rather than null hypothesis testing is crucial for interpreting data meaningfully in the age of complex datasets.
- Batches of predictors can meaningfully outline sources of variation, guiding researchers toward significant findings instead of superficial conclusions.
- The concept of superpopulation versus finite-population estimates is essential in understanding the uncertainty surrounding model predictions.
ch16p01 Multinomial and Ordinal Regression (part 1/2)
This chapter explores multinomial regression as a robust statistical tool for predicting categorical responses based on multiple independent variables, laying the groundwork for understanding ordinal regression.
ch16p02 Multinomial and Ordinal Regression (part 2/2)
This chapter navigates the complexities of analyzing experimental data through multinomial and ordinal regression models, providing a narrative coherence that is often overlooked in academic writing.
- Academic writing should be approached as a narrative structure where each section contributes to an overarching story about your research.
- The effectiveness of your analysis hinges not just on statistical rigor but also on the clarity with which you present your findings.
- Developing coherent models helps clarify the relationships between variables, facilitating reader understanding and engagement.
- Embracing multiple analytical perspectives can enrich the narrative, allowing for a deeper understanding of complex datasets.
ch17 Writing up Experiments
This chapter investigates how listeners perceive apparent speaker characteristics, such as age, gender, and height, through speech acoustics, analyzing the systematic errors and consistent biases in these judgments.

Questions this book answers

How can I analyze repeated measures data using a modern statistical framework?
What are Bayesian multilevel models and how do they work conceptually?
How do I specify, fit, and interpret Bayesian multilevel models in R using `brms`?
How do I build progressively more complex models to answer different research questions (e.g., comparing groups, handling interactions, modeling quantitative predictors, logistic/multinomial/ordinal outcomes)?
How do concepts like pooling, random effects, priors, and posterior distributions apply in practice?

Glossary

Fundamental Frequency (f0): The rate of vibration of a speaker's vocal folds, which is the primary acoustic correlate of perceived voice pitch. It serves as a cue for listeners to infer speaker characteristics like size, age, and gender.
Acoustic Vocal-Tract Length (VTL): The effective acoustic length of the vocal tract, from the vocal folds to the lips. It determines the resonance frequencies of speech and is a primary acoustic correlate of perceived speaker size and physical stature.
Apparent Age: The listener's perceptual judgment of a speaker's age group (child vs. adult), derived from vocal cues. This categorization is treated as a psychological state that moderates how other acoustic cues are interpreted.
Apparent Gender: The listener's perceptual judgment of a speaker's gender (male vs. female), derived from vocal cues. This categorization is treated as a psychological state that moderates how other acoustic cues are interpreted.
Apparent Height: The listener's estimate of a speaker's physical height, based on their interpretation of vocal cues. This is the primary behavioral outcome measure in the book's main analysis.
Listener Variation: Systematic, idiosyncratic differences between individual listeners in their baseline judgments and their use of acoustic and categorical cues. This is modeled as a source of non-independence in repeated measures data.
Speaker Variation: Systematic, idiosyncratic differences between individual speakers' voices that influence listener perceptions but are not captured by the main acoustic predictors (f0 and VTL). This represents unexplained but consistent variance attributable to the speaker.