What is PeopleAnalyst?

PeopleAnalyst is the front door for people-analytics research: 205+ works indexed and profiled, 40+ citation-grade findings extracted, and peer-reviewed behavioral science translated from academic to actionable — the missing manual for the people analytics you always meant to do.

What is people analytics?

People analytics is not a dashboard. It is behavioral science and statistical inference applied to workforce decisions — a discipline with its own methodology, spanning measurement, organizational design, talent, leadership, and analytics craft.

Why does AI in HR need measurement science?

AI is being deployed in high-stakes people decisions — hiring, performance, attrition — without the measurement science to evaluate whether it works or whom it harms. Construct validity, effect sizes, and criterion validity are the vocabulary for asking an AI vendor the right questions.

How is the research made accessible?

The evidence is indexed and searchable: 205+ works, 40+ citation-grade insight cards, and 8 research arcs, so the right finding reaches the right decision at the right time.

What separates good people measurement from assertion?

Good measurement has a method: construct validity, reliability, and effect-size interpretation are not optional — they are what separates evidence from assertion.


Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow

Aurélien Géron · 2019

In a sentence

A hands-on, code-first guide that teaches the concepts, tools, and techniques needed to build intelligent systems using Scikit-Learn, Keras, and TensorFlow, from fundamental ML algorithms to deep learning.

This book takes a reader who knows close to nothing about Machine Learning and equips them to implement programs capable of learning from data. Rather than dwelling on theory, it favors a hands-on approach—growing intuitive understanding through concrete working examples and just enough math. Part I covers the fundamentals using Scikit-Learn: framing problems, end-to-end project workflow, classification, training models, support vector machines, decision trees, ensemble methods, dimensionality reduction, and unsupervised learning. Part II dives into neural networks and deep learning with Keras and TensorFlow, covering CNNs for vision, RNNs and Transformers for sequences and language, autoencoders and GANs for generative learning, reinforcement learning, and deploying models at scale. With production-ready frameworks, runnable Jupyter notebooks, and practical guidance on the inevitable challenges of bad data and bad algorithms, it is a comprehensive on-ramp for anyone who wants to apply Machine Learning to real projects.

The four lenses

Science
Statistics
Systems
Strategy

The model

A causal framework expressing how design levers in an ML workflow (data quantity/quality, feature engineering, model complexity, regularization, hyperparameter tuning, and validation rigor) influence intermediate states like overfitting/underfitting and training/validation error, ultimately determining generalization performance in production.

Training Data Quantity (design lever)

The number of training examples available to the learning algorithm; the book stresses that most ML algorithms need a lot of data to work properly, and that data often matters more than the algorithm for complex problems.

Training Data Quality and Representativeness (design lever)

The degree to which training data is clean, low in noise and errors, free of irrelevant features, and representative of the cases the model must generalize to; nonrepresentative or poor-quality data harms generalization.

Feature Engineering (design lever)

The process of selecting, extracting, and creating useful features to train on, including feature scaling and combining attributes; better features reduce underfitting and improve predictive performance.
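A minimal sketch of two feature-engineering steps named above, combining attributes and feature scaling, written in plain Python rather than Scikit-Learn so the arithmetic is visible. The column names `total_rooms` and `households` and the three toy rows are illustrative assumptions.

```python
from statistics import mean, pstdev


def add_ratio_feature(rows, numerator, denominator, name):
    """Return copies of dict rows with a new combined attribute numerator / denominator."""
    out = []
    for row in rows:
        new_row = dict(row)
        den = row[denominator]
        # Guard against division by zero; NaN marks the value as missing.
        new_row[name] = row[numerator] / den if den else float("nan")
        out.append(new_row)
    return out


def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no signal; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical toy data for illustration only.
rows = [
    {"total_rooms": 800, "households": 200},
    {"total_rooms": 1500, "households": 300},
    {"total_rooms": 600, "households": 100},
]
rows = add_ratio_feature(rows, "total_rooms", "households", "rooms_per_household")
scaled = standardize([r["rooms_per_household"] for r in rows])
```

In practice, compute the mean and standard deviation on the training set only and reuse them on validation and test data. A fitted scaler inside a reusable pipeline does exactly that.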

Model Complexity (design lever)

The capacity or number of effective parameters and degrees of freedom of the chosen model relative to the data; too much complexity causes overfitting while too little causes underfitting.

Regularization Strength (design lever)

The amount of constraint placed on a model (e.g., Ridge, Lasso, Elastic Net penalties, dropout, early stopping) to reduce its effective freedom and the risk of overfitting, controlled by hyperparameters such as alpha.
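To make the role of alpha concrete, here is a sketch of Ridge regression reduced to a single feature with no intercept. In that case the penalized least-squares objective, sum of (y − w·x)² plus alpha·w², has a closed-form solution. This is a simplification for illustration, since real Ridge models fit many coefficients and usually an intercept. The data points are made up.

```python
def ridge_slope(xs, ys, alpha):
    """Closed-form ridge coefficient for one feature and no intercept.

    Minimizes sum((y - w * x) ** 2) + alpha * w ** 2. Setting the derivative
    to zero gives w = sum(x * y) / (sum(x * x) + alpha).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]  # hypothetical points scattered around y = 2x

for alpha in [0.0, 1.0, 10.0, 100.0]:
    # Larger alpha constrains the coefficient more strongly toward zero.
    print(f"alpha={alpha:>6}: w={ridge_slope(xs, ys, alpha):.4f}")
```

With alpha = 0 this is ordinary least squares. As alpha grows, the slope shrinks toward zero, trading some bias for lower variance.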

Hyperparameter Tuning and Model Selection (design lever)

The systematic search (grid search, randomized search) over hyperparameter values using validation data to select the best model configuration for generalization.
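A minimal sketch of grid search scored by k-fold cross-validation, written from scratch so the mechanics are visible. For brevity it reuses a one-feature ridge estimator as a stand-in model and splits the data into contiguous, unshuffled folds. The candidate alpha grid and the data are hypothetical. In Scikit-Learn the same idea is packaged as `GridSearchCV`.

```python
def ridge_slope(xs, ys, alpha):
    """One-feature, no-intercept ridge coefficient (stand-in model)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)


def kfold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cv_mse(xs, ys, alpha, k):
    """Mean validation MSE across k folds for a given alpha."""
    fold_scores = []
    for val_idx in kfold_indices(len(xs), k):
        held_out = set(val_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        w = ridge_slope(train_x, train_y, alpha)  # fit on training folds only
        errors = [(ys[i] - w * xs[i]) ** 2 for i in val_idx]
        fold_scores.append(sum(errors) / len(errors))
    return sum(fold_scores) / len(fold_scores)


def grid_search(xs, ys, alphas, k=3):
    """Return the alpha with the lowest cross-validated MSE, plus all scores."""
    scores = {alpha: cv_mse(xs, ys, alpha, k) for alpha in alphas}
    best = min(scores, key=scores.get)
    return best, scores


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]  # hypothetical data
best_alpha, all_scores = grid_search(xs, ys, alphas=[0.0, 0.1, 1.0, 10.0, 100.0])
```

The test set plays no part in this search. It is used once, after the winning configuration has been chosen.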

Validation and Evaluation Rigor (design lever)

The discipline of setting aside representative test, validation, and train-dev sets and using cross-validation to honestly estimate generalization error and avoid data snooping and data mismatch.
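One way to keep a held-out test set representative is stratified sampling. Each subgroup is split separately, so the test set mirrors the subgroup proportions of the full data. The sketch below assumes rows are dicts and that a caller-supplied `key` function defines the strata. The `band` field and the 25/75 mix are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, test_frac=0.2, seed=42):
    """Split rows into (train, test) so each stratum keeps roughly test_frac in test."""
    rng = random.Random(seed)  # fixed seed: the same split on every run
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    train, test = [], []
    for group in strata.values():
        rng.shuffle(group)
        n_test = round(len(group) * test_frac)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical data: 25 "high" rows and 75 "low" rows.
rows = [{"id": i, "band": "high" if i % 4 == 0 else "low"} for i in range(100)]
train, test = stratified_split(rows, key=lambda r: r["band"])
```

The fixed seed matters for honest evaluation. If the split changed on every run, you would eventually see the whole dataset, which is a form of data snooping.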

Overfitting/Underfitting State (intermediate state)

The intermediate condition of the trained model in which it either performs well on training data but poorly on new data (overfitting) or poorly on both (underfitting), reflecting the bias/variance balance.
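The two failure modes can be read off a pair of error numbers, and the sketch below encodes that reading. `target_error` (the error you would accept) and `max_gap` (the train/validation gap you treat as overfitting) are hypothetical, task-specific thresholds, not values from the book.

```python
def diagnose_fit(train_error, val_error, target_error, max_gap):
    """Rough bias/variance reading from training and validation error.

    - High training error: the model cannot even fit the training data (underfitting).
    - Low training error but a large validation gap: it fit noise (overfitting).
    """
    if train_error > target_error:
        return "underfitting"
    if val_error - train_error > max_gap:
        return "overfitting"
    return "reasonable fit"


print(diagnose_fit(train_error=0.30, val_error=0.32, target_error=0.10, max_gap=0.05))  # underfitting
print(diagnose_fit(train_error=0.02, val_error=0.25, target_error=0.10, max_gap=0.05))  # overfitting
print(diagnose_fit(train_error=0.06, val_error=0.08, target_error=0.10, max_gap=0.05))  # reasonable fit
```

Computing both errors at several training-set sizes yields a learning curve.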

Generalization Performance (outcome metric)

The model's accuracy or error on unseen data in production, measured by generalization error and task-appropriate metrics like RMSE, precision/recall, F1, and ROC AUC; the ultimate goal of the ML workflow.
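Most of the task-appropriate metrics listed above reduce to short formulas. This plain-Python sketch computes RMSE for regression, and accuracy, precision, recall, and F1 for binary classification. Returning 0.0 when a denominator is zero is a convention assumed here. The toy labels are hypothetical and deliberately imbalanced, to show why accuracy alone can mislead.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical imbalanced labels: 2 positives out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10
print(accuracy(y_true, always_negative))             # 0.8
print(precision_recall_f1(y_true, always_negative))  # (0.0, 0.0, 0.0)
```

A model that never predicts the positive class scores 80% accuracy here while finding none of the cases that matter.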

Deployment Monitoring and Maintenance (behavioral pattern)

The ongoing process of serving the model, monitoring live performance, checking input data quality, and retraining or rolling back as data evolves and models rot over time.
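Checking input data quality in production often means comparing the live distribution of a feature with its training distribution. The sketch below uses the Population Stability Index (PSI). PSI is a common drift statistic chosen here for illustration and is not necessarily the book's method. The bin edges, the epsilon for empty bins, and the data are assumptions.

```python
import math
from bisect import bisect_right


def bin_shares(values, edges):
    """Fraction of values falling in each bin defined by sorted interior cut points."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index: sum over bins of (a - e) * ln(a / e).

    expected: feature values seen in training; actual: values seen live.
    eps replaces empty-bin shares so the logarithm stays finite.
    """
    total = 0.0
    for e, a in zip(bin_shares(expected, edges), bin_shares(actual, edges)):
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


training_values = list(range(100))          # hypothetical training feature
live_values = [v + 30 for v in range(100)]  # hypothetical shifted live feature
edges = [25, 50, 75]
print(psi(training_values, training_values, edges))  # 0.0: identical distributions
print(psi(training_values, live_values, edges))      # positive: the distribution moved
```

A monitoring job might compute this per feature on a schedule. It could then raise an alert, or trigger retraining, when the value crosses a threshold chosen for the use case.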

How they connect

(→ marks a positive relationship; − marks a negative one.)

training data quantity − influences overfitting state
training data quality → predicts generalization performance
feature engineering − influences overfitting state
model complexity → influences overfitting state
regularization strength − moderates overfitting state
overfitting state − predicts generalization performance
model complexity → mediates generalization performance
hyperparameter tuning → predicts generalization performance
validation rigor → moderates generalization performance
deployment monitoring → moderates generalization performance

A candidate measure

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow — derived measurement candidates

Training Data Quantity

number of training instances; ratio of instances to features

self-report suitability: none

Training Data Quality and Representativeness

percent missing per feature; outlier rate; statistical distance between training and production distributions

self-report suitability: low
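Two of the quality measures above are easy to compute directly. This sketch reports percent missing for one feature, treating an absent key or `None` as missing (an assumption). It also reports an outlier rate using Tukey's fences at 1.5 times the interquartile range, a conventional rule chosen here for illustration. It requires Python 3.8+ for `statistics.quantiles`.

```python
from statistics import quantiles


def percent_missing(rows, feature):
    """Percentage of rows where the feature is absent or None."""
    missing = sum(1 for row in rows if row.get(feature) is None)
    return 100.0 * missing / len(rows)


def iqr_outlier_rate(values, k=1.5):
    """Share of values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sum(1 for v in values if v < low or v > high) / len(values)


# Hypothetical records and values for illustration.
rows = [{"age": 30}, {"age": None}, {"age": 45}, {"tenure": 3}]
print(percent_missing(rows, "age"))
print(iqr_outlier_rate([10, 11, 12, 13, 14, 15, 100]))
```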

Feature Engineering

feature importance scores; correlation of engineered features with target; number of pipeline steps

self-report suitability: low

Model Complexity

parameter count; polynomial degree; tree max depth; number of layers/neurons

self-report suitability: none
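Two of these complexity counts follow from simple formulas. In a fully connected network, each dense layer has one weight per input-output pair plus one bias per output. Polynomial features of degree d over n inputs produce C(n + d, d) output columns, including the constant term. The layer sizes below are an example, not a recommendation.

```python
from math import comb


def dense_param_count(layer_sizes):
    """Trainable parameters of a stack of dense layers: (inputs + 1 bias) * outputs per layer."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def polynomial_feature_count(n_features, degree):
    """Number of monomials of total degree <= degree, including the constant term."""
    return comb(n_features + degree, degree)


# A 28x28 image flattened to 784 inputs, two hidden layers, 10 output classes.
print(dense_param_count([784, 300, 100, 10]))
print(polynomial_feature_count(2, 3))
```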

Regularization Strength

alpha value; l1_ratio; dropout rate; epoch at early stop

self-report suitability: none
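The "epoch at early stop" measure comes from a patience rule. Training stops once validation loss has failed to improve for a set number of epochs, and the best epoch seen is kept. The sketch below applies that rule to a precomputed list of losses. The loss values are hypothetical. The parameter names echo, but do not reproduce, Keras's `EarlyStopping` callback (`patience`, `min_delta`).

```python
def early_stopping(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 0-indexed, under a patience rule."""
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Improvement: remember it and reset the patience counter.
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    # Patience never ran out: training used every epoch.
    return best_epoch, len(val_losses) - 1


losses = [1.00, 0.80, 0.60, 0.55, 0.57, 0.60, 0.62, 0.70]  # hypothetical
print(early_stopping(losses))  # (3, 6)
```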

Hyperparameter Tuning and Model Selection

number of combinations tested; best cross-validation score; selected hyperparameters

self-report suitability: none

Validation and Evaluation Rigor

presence of held-out test set; number of CV folds; representativeness checks performed

self-report suitability: low

Overfitting/Underfitting State

train-validation error gap; learning curve convergence pattern

self-report suitability: none

Generalization Performance

test RMSE/MAE; accuracy; precision; recall; F1; ROC AUC; generalization error with confidence interval

self-report suitability: none
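Two of these measures benefit from a worked sketch.

The confidence interval follows a common approach: build an interval on the mean squared error, then take square roots. It uses a normal approximation (z = 1.96) in place of a Student's t quantile, an assumption that is reasonable only for large test sets.

ROC AUC is computed as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting one half. This pairwise form costs O(positives × negatives) and is meant for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def rmse_confidence_interval(squared_errors, z=1.96):
    """Approximate CI for RMSE: normal-approximation CI on the MSE, then square roots."""
    mse = mean(squared_errors)
    half_width = z * stdev(squared_errors) / sqrt(len(squared_errors))
    low = max(mse - half_width, 0.0)  # MSE cannot be negative
    return sqrt(low), sqrt(mse + half_width)


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
low, high = rmse_confidence_interval([1.0, 4.0, 9.0, 16.0, 25.0])  # hypothetical errors
```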

Deployment Monitoring and Maintenance

downstream business metrics; input data drift metrics; retraining frequency; alert count; human-rater agreement on sampled outputs

self-report suitability: medium


The story

The reader

An aspiring practitioner or developer who knows close to nothing about Machine Learning but wants to build intelligent systems that learn from data and apply them to real projects.

External problem

They need to implement working ML systems—classifiers, regressors, neural networks—but lack the concepts, tools, and intuition to do so.

Internal problem

They feel intimidated by the math, jargon, and overwhelming breadth of ML, unsure where to start or which techniques to trust.

Philosophical problem

It is wrong to be locked out of a transformative technology simply because the path from beginner to capable practitioner seems opaque and theory-heavy.

The plan

Learn what ML is and the main categories of ML systems.
Work through a complete end-to-end project to internalize the workflow.
Master core algorithms for classification and regression and how to evaluate them.
Learn to prepare data with reusable pipelines and to regularize and fine-tune models.
Build and train deep neural networks with Keras and TensorFlow for vision, sequences, and generative tasks.
Deploy, monitor, and maintain models at scale.

Success

You can frame, build, evaluate, and deploy ML systems on real-world data.
You confidently choose appropriate algorithms, metrics, and regularization for each task.
You can build deep neural networks for complex perception, language, and generative problems.
You maintain production models that adapt to changing data.

At stake

You remain stuck applying brittle, hand-coded rules that are hard to maintain.
You build models that overfit or are misled by skewed metrics, deploying systems that fail in production.
You miss the value hidden in your data and fall behind as ML transforms industry.

Questions this book answers

What is Machine Learning and when should you use it instead of traditional programming?
What are the main categories of ML systems (supervised/unsupervised, batch/online, instance-based/model-based)?
How do you carry out an end-to-end ML project from framing the problem to deployment?
How do you train, evaluate, regularize, and fine-tune models without overfitting or underfitting?
How do you build and train deep neural networks for vision, sequences, language, and generative tasks?

Glossary

Training Data Quantity: The volume of labeled or unlabeled examples available for training a model, which determines how well the algorithm can learn underlying patterns rather than noise.
Training Data Quality and Representativeness: The cleanliness, relevance, and representativeness of training data relative to the production distribution.
Feature Engineering: The deliberate transformation, selection, extraction, scaling, and creation of features to make the data more learnable.
Model Complexity: The effective capacity of a model to fit varied functions, governed by parameters and degrees of freedom.
Regularization Strength: The degree to which a model's freedom is constrained to reduce overfitting.
Hyperparameter Tuning and Model Selection: The systematic search over model configurations to select the best-generalizing model.
Validation and Evaluation Rigor: The discipline applied to estimating generalization honestly through proper data splits and cross-validation.
Overfitting/Underfitting State: The bias/variance condition of the trained model reflecting its fit to training versus unseen data.
