An Introduction to Statistical Learning: with Applications in R

Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani · 2013

In a sentence

A practical, accessible introduction to statistical learning that teaches the major supervised and unsupervised methods for understanding and predicting from data, with hands-on Python labs.

An Introduction to Statistical Learning, With Applications in Python (ISLP) demystifies modern data analysis by presenting the most important statistical and machine learning methods in an accessible, application-focused way. Rather than dwelling on heavy mathematics, it builds intuition for the core trade-offs that govern model performance, especially the bias-variance trade-off and the gap between training and test error. Covering linear and logistic regression, resampling, regularization, splines and generalized additive models, tree ensembles, support vector machines, deep learning, survival analysis, unsupervised learning, and multiple testing, the book pairs each topic with a Python lab so readers can immediately implement what they learn. It is ideal for advanced undergraduates, master's students, and working data scientists who want to become informed, capable users of statistical learning tools in a data-driven world.

The four lenses

Science
Statistics
Systems
Strategy

The model

A causal framework model: design levers (model flexibility, regularization, dimension reduction, resampling-based tuning) and contextual conditions (sample size, dimensionality, signal-to-noise ratio) drive the intermediate states of the analysis (bias, variance, training error, overfitting risk), which jointly determine the outcome metrics (test error and generalization, interpretability, error-rate control).

Model Flexibility · design lever

The degree to which a chosen statistical learning method can fit a wide range of functional forms, controlled by choices such as polynomial degree, number of knots, tree depth, number of neighbors K, or network size.

Regularization Strength · design lever

The amount of penalty applied to constrain or shrink model coefficients, such as the ridge or lasso tuning parameter lambda, dropout rate, or shrinkage in boosting, used to reduce variance.
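
As a minimal sketch of how the penalty shrinks a coefficient, consider ridge regression with a single predictor and no intercept, where the estimate has the closed form sum(x*y) / (sum(x^2) + lambda). The data values and the function name `ridge_slope` are illustrative assumptions, not taken from the book.

```python
def ridge_slope(xs, ys, lam):
    """Ridge estimate for one predictor with no intercept.

    Minimizes sum((y - b*x)^2) + lam * b^2, whose solution is
    b = sum(x*y) / (sum(x^2) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Illustrative (made-up) data with a true slope near 2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-4.1, -1.9, 0.2, 2.1, 3.9]

lambdas = [0.0, 1.0, 10.0, 100.0]
slopes = [ridge_slope(xs, ys, lam) for lam in lambdas]

for lam, b in zip(lambdas, slopes):
    print(f"lambda={lam:6.1f}  slope={b:.3f}")
```

With lambda = 0 the estimate is ordinary least squares; as lambda grows, the slope is pulled toward zero, trading a little bias for lower variance.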

Resampling-Based Tuning · design lever

The practice of using cross-validation, the validation set approach, or the bootstrap to estimate test error and select tuning parameters or model complexity rather than relying on training error.
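
One way to picture this is a small k-fold cross-validation loop that picks the number of neighbors K for K-nearest-neighbors regression. This is a sketch under stated assumptions: the simulated sine-curve data, the candidate values of K, and the helper names `knn_predict` and `kfold_cv_mse` are all hypothetical choices for illustration.

```python
import math
import random


def knn_predict(train_x, train_y, x0, k):
    """Average the responses of the k training points nearest to x0."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x0))[:k]
    return sum(train_y[i] for i in nearest) / k


def kfold_cv_mse(xs, ys, k_neighbors, n_folds=5, seed=0):
    """Estimate the test MSE of KNN regression by n_folds-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    sq_err = 0.0
    for held_out in folds:
        held = set(held_out)
        # Fit on every observation outside the held-out fold.
        tr_x = [xs[i] for i in idx if i not in held]
        tr_y = [ys[i] for i in idx if i not in held]
        # Score only on the held-out observations.
        for i in held_out:
            sq_err += (ys[i] - knn_predict(tr_x, tr_y, xs[i], k_neighbors)) ** 2
    return sq_err / len(xs)


# Simulated data (an assumption for illustration): noisy sine curve.
rng = random.Random(1)
xs = [rng.uniform(0, 1) for _ in range(100)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.3) for x in xs]

candidates = [1, 3, 5, 10, 20, 40]
cv_errors = {k: kfold_cv_mse(xs, ys, k) for k in candidates}
best_k = min(cv_errors, key=cv_errors.get)

for k in candidates:
    print(f"K={k:2d}  CV MSE={cv_errors[k]:.3f}")
print("selected K:", best_k)
```

The tuning parameter is chosen by held-out error only; training error never enters the selection.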

Data Dimensionality and Sample Size · contextual condition

Contextual condition describing the number of predictors p relative to the number of observations n, including high-dimensional settings where p is comparable to or exceeds n.

Signal-to-Noise Ratio · contextual condition

Contextual condition capturing how much of the variation in the response is explained by the true relationship versus irreducible error, influencing achievable predictive accuracy.

Model Bias · psychological state

The error introduced by approximating a complex real-world relationship with a simpler model; decreases as flexibility increases.

Model Variance · psychological state

The amount by which an estimated model would change if fit on a different training set; increases as flexibility increases and decreases with regularization.
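
The definition above can be checked by simulation: draw many training sets from the same process, refit, and see how much the prediction at one fixed point moves. The sketch below assumes a noisy sine curve as the data-generating process and uses K-nearest-neighbors regression, where a small K is the flexible fit; the function names and all numeric settings are hypothetical.

```python
import math
import random
import statistics


def knn_predict(train_x, train_y, x0, k):
    """Average the responses of the k training points nearest to x0."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x0))[:k]
    return sum(train_y[i] for i in nearest) / k


def prediction_variance(k, x0=0.5, n_train=100, n_reps=300, seed=0):
    """Variance of the KNN prediction at x0 across fresh training sets."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_reps):
        # Each repetition simulates a new training set from the same process.
        xs = [rng.random() for _ in range(n_train)]
        ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.3) for x in xs]
        preds.append(knn_predict(xs, ys, x0, k))
    return statistics.pvariance(preds)


var_flexible = prediction_variance(k=1)   # very flexible fit
var_smooth = prediction_variance(k=25)    # much less flexible fit

print(f"variance at x0 with K=1:  {var_flexible:.4f}")
print(f"variance at x0 with K=25: {var_smooth:.4f}")
```

In practice only one training set is available, so the same quantity is usually approximated by refitting on bootstrap resamples rather than on fresh draws.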

Training Error · behavioral pattern

The error a fitted model makes on the data used to train it, which decreases monotonically with flexibility and underestimates test error.

Overfitting Risk · behavioral pattern

Behavioral pattern in which a model captures noise in the training data, yielding low training error but high test error.
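
A small simulation can make the train-test gap concrete. The sketch below fits step functions (one mean per equal-width bin) with an increasing number of bins and compares training and test MSE. The simulated sine-curve data, the bin counts, and the helper names are assumptions chosen for illustration.

```python
import math
import random


def make_data(n, rng, noise_sd=0.3):
    """Simulate n observations from a noisy sine curve on [0, 1)."""
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


def fit_step_function(xs, ys, n_bins):
    """Piecewise-constant fit: one mean per equal-width bin on [0, 1)."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)  # fallback for bins with no training data
    return [s / c if c else overall for s, c in zip(sums, counts)]


def mse(bin_means, xs, ys):
    """Mean squared error of the step-function fit on (xs, ys)."""
    n_bins = len(bin_means)
    total = 0.0
    for x, y in zip(xs, ys):
        total += (y - bin_means[min(int(x * n_bins), n_bins - 1)]) ** 2
    return total / len(xs)


rng = random.Random(42)
train_x, train_y = make_data(50, rng)
test_x, test_y = make_data(2000, rng)

results = []
for n_bins in [1, 2, 4, 8, 16, 32, 64]:  # more bins = more flexibility
    fit = fit_step_function(train_x, train_y, n_bins)
    results.append((n_bins, mse(fit, train_x, train_y), mse(fit, test_x, test_y)))

for n_bins, train_mse, test_mse in results:
    print(f"bins={n_bins:2d}  train={train_mse:.3f}  test={test_mse:.3f}  "
          f"gap={test_mse - train_mse:.3f}")
```

Because each bin count doubles the previous one, the partitions are nested and training MSE can only fall as bins are added; the test MSE is under no such constraint, which is what the gap column exposes.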

Test Error / Generalization · outcome metric

Outcome metric measuring how well a model predicts on previously unseen data, exhibiting a U-shape as flexibility increases; the primary criterion for method selection.

Model Interpretability · outcome metric

Outcome metric capturing how easily the relationship between predictors and response can be understood from the fitted model; tends to decrease with flexibility.

Error-Rate Control in Inference · outcome metric

Outcome metric capturing how well false-positive errors are controlled when drawing inferential conclusions, especially under multiple testing (FWER or FDR).
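
To illustrate the difference between the two error rates, here is a minimal sketch of the Bonferroni correction (which controls the FWER) and the Benjamini-Hochberg procedure (which controls the FDR). The p-values are made up for illustration, and the function names are hypothetical.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m; controls the family-wise error rate."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; controls the false discovery rate."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank whose sorted p-value is under its threshold.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    reject = [False] * m
    for i in order[:cutoff_rank]:
        reject[i] = True
    return reject


# Illustrative (made-up) p-values from ten hypothesis tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

fwer_rejections = bonferroni(pvals)
fdr_rejections = benjamini_hochberg(pvals)

print("unadjusted (p <= 0.05):", sum(p <= 0.05 for p in pvals))
print("Bonferroni rejections: ", sum(fwer_rejections))
print("BH rejections:         ", sum(fdr_rejections))
```

Controlling the FDR is less strict than controlling the FWER, so Benjamini-Hochberg typically rejects at least as many hypotheses as Bonferroni at the same level.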

How they connect

model flexibility → model bias (influences, −)
model flexibility → model variance (influences, +)
model flexibility → training error (influences, −)
model bias → test error (influences, +)
model variance → test error (influences, +)
model flexibility → overfitting risk (predicts, +)
overfitting risk → test error (predicts, +)
regularization strength → model variance (influences, −)
regularization strength → model bias (influences, +)
regularization strength → interpretability (influences, +)
resampling-based tuning → test error (moderates, −)
data dimensionality and sample size → overfitting risk (moderates, +)
signal-to-noise ratio → test error (moderates, −)
model flexibility → interpretability (influences, −)
resampling-based tuning → error-rate control (influences, +)
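
The links from bias and variance to test error can be summarized by the standard bias-variance decomposition of the expected test MSE at a point. The notation here (a test point x_0 with response y_0, a fitted function \hat{f}, and an irreducible error term \varepsilon) is introduced for illustration and does not appear elsewhere on this page.

```latex
\mathbb{E}\big[(y_0 - \hat{f}(x_0))^2\big]
  = \operatorname{Var}\big(\hat{f}(x_0)\big)
  + \big[\operatorname{Bias}\big(\hat{f}(x_0)\big)\big]^2
  + \operatorname{Var}(\varepsilon)
```

More flexibility usually lowers the squared-bias term and raises the variance term, while the last term sets a floor that no method can go below. That combination is why test error tends to trace a U-shape.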

A candidate measure

An Introduction to Statistical Learning: with Applications in R — derived measurement candidates

Model Flexibility

effective degrees of freedom; parameter count

self-report suitability: low

Regularization Strength

value of penalty parameter; fraction of dropped units

self-report suitability: none

Resampling-Based Tuning

number of folds; number of bootstrap samples

self-report suitability: medium

Data Dimensionality and Sample Size

p; n; p/n ratio

self-report suitability: none

Signal-to-Noise Ratio

R-squared; residual variance

self-report suitability: none

Model Bias

squared bias in simulation

self-report suitability: none

Model Variance

prediction variance across resamples

self-report suitability: none

Training Error

training MSE; training error rate

self-report suitability: none

Overfitting Risk

train-test error gap

self-report suitability: low

Test Error / Generalization

test MSE; test error rate; cross-validated error

self-report suitability: none

Model Interpretability

number of nonzero coefficients; model complexity score

self-report suitability: medium

Error-Rate Control in Inference

FWER; FDR

self-report suitability: none

The story

The reader

An advanced student or working data analyst who wants to understand and apply modern statistical learning methods to make sense of complex data.

External problem

They face large, complex datasets and need to predict outcomes or discover structure without knowing which method to use or how to evaluate it.

Internal problem

They feel intimidated by the highly technical, mathematical presentations of machine learning and worry they will misuse methods or overfit.

Philosophical problem

It is wrong that powerful analytical tools should remain locked away behind impenetrable mathematics, accessible only to specialists.

The plan

Build intuition for estimating f and the bias-variance trade-off.
Learn core supervised methods for regression and classification.
Use resampling to honestly estimate test error and tune complexity.
Apply regularization and dimension reduction for high-dimensional data.
Work through hands-on Python labs at the end of each chapter.
Extend to trees, SVMs, deep learning, survival analysis, and unsupervised learning.

Success

Confidently selects appropriate methods and evaluates them with cross-validation or test sets.
Avoids overfitting and interprets models honestly, including in high dimensions.
Implements analyses in Python and contributes effectively in a data-driven field.

At stake

Chooses methods blindly, overfits, and reports misleading results.
Misinterprets training error or p-values and makes false discoveries.
Remains a passive consumer of black-box tools without understanding their trade-offs.

Questions this book answers

How do we estimate the function f relating predictors to a response, and why?
How do we choose between flexible and inflexible models given the bias-variance trade-off?
How do we honestly estimate test error and select model complexity?
Which methods are appropriate for prediction versus inference, and for regression versus classification?
How do we handle high-dimensional data, regularize models, and avoid overfitting?

Glossary

Model Flexibility: The capacity of a chosen statistical learning method to represent a wide variety of functional forms relating predictors to response.
Regularization Strength: The degree to which model coefficients are penalized or constrained to reduce variance.
Resampling-Based Tuning: The use of resampling procedures to estimate test error and select model complexity.
Data Dimensionality and Sample Size: The relationship between the number of predictors and the number of observations in a dataset.
Signal-to-Noise Ratio: The proportion of variation in the response attributable to the true relationship versus irreducible error.
Model Bias: The systematic error from approximating a complex true relationship with a simpler model.
Model Variance: The variability of a fitted model across different training samples.
Training Error: The error a model makes on the data used to fit it.
