Data Science from Scratch: First Principles with Python

Joel Grus · 2015

In a sentence

A hands-on introduction to data science that teaches the core concepts, algorithms, and mathematics by implementing everything from scratch in Python rather than relying on existing libraries.

Data Science from Scratch teaches you data science by having you build the tools and algorithms yourself, from the ground up, in clear and readable Python. Rather than treating libraries like NumPy, scikit-learn, and pandas as magic black boxes, Joel Grus walks you through implementing linear algebra, statistics, probability, gradient descent, machine learning models, neural networks, deep learning, clustering, NLP, network analysis, and recommender systems by hand. Framed around the fictional 'DataSciencester' social network, the book grounds abstract concepts in concrete problems while developing the hacking skills and mathematical intuition that are at the core of doing real data science. By the end you understand not just how to use data science tools, but how and why they work.

The four lenses

Science
Statistics
Systems
Strategy

The model

A model expressing how the book's design levers (learning from scratch, mathematical foundations, data handling, modeling choices) drive learner and model states (understanding, code correctness, model fit) and outcomes (data science competence, predictive performance, ethical responsibility).

From-Scratch Implementation Practice · design lever

The pedagogical practice of building data science tools and algorithms by hand in clear Python rather than relying on existing libraries, in order to better understand how they work internally.

Mathematical Foundation (Linear Algebra, Statistics, Probability) · design lever

The learner's grounding in the core mathematics underpinning data science, including vectors, matrices, central tendency, dispersion, correlation, probability distributions, and inference, which the book treats as essential prerequisites.
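As an illustration of the kind of mathematics this covers, here is a minimal sketch of central tendency, dispersion, and correlation written by hand in plain Python. The function names and the choice of the sample (n - 1) variance are assumptions made for this example, not quotations from the book.

```python
import math
from typing import List

Vector = List[float]


def dot(v: Vector, w: Vector) -> float:
    """Sum of component-wise products of two equal-length vectors."""
    assert len(v) == len(w), "vectors must be the same length"
    return sum(v_i * w_i for v_i, w_i in zip(v, w))


def mean(xs: Vector) -> float:
    """Arithmetic mean (central tendency)."""
    return sum(xs) / len(xs)


def de_mean(xs: Vector) -> Vector:
    """Subtract the mean from every element, so the result has mean 0."""
    x_bar = mean(xs)
    return [x - x_bar for x in xs]


def variance(xs: Vector) -> float:
    """Sample variance (dispersion), dividing by n - 1."""
    assert len(xs) >= 2, "variance requires at least two elements"
    deviations = de_mean(xs)
    return dot(deviations, deviations) / (len(xs) - 1)


def standard_deviation(xs: Vector) -> float:
    return math.sqrt(variance(xs))


def covariance(xs: Vector, ys: Vector) -> float:
    assert len(xs) == len(ys), "xs and ys must have the same length"
    return dot(de_mean(xs), de_mean(ys)) / (len(xs) - 1)


def correlation(xs: Vector, ys: Vector) -> float:
    """Pearson correlation; returns 0.0 when either input has no variation."""
    sd_x = standard_deviation(xs)
    sd_y = standard_deviation(ys)
    if sd_x > 0 and sd_y > 0:
        return covariance(xs, ys) / sd_x / sd_y
    return 0.0
```

Calling `correlation([1, 2, 3], [2, 4, 6])` on two perfectly linearly related lists should give a value of 1 (up to floating-point rounding).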

Data Acquisition and Cleaning Effort · behavioral pattern

The work of getting data via files, web scraping, and APIs, and then cleaning, munging, exploring, rescaling, and manipulating it into usable form, which consumes a large fraction of a data scientist's time.
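A small sketch of what cleaning and rescaling can look like in plain Python follows. The helper names (`parse_float`, `clean_column`, `rescale`) and the rule of silently dropping unparseable values are assumptions chosen for illustration; a real pipeline might log or impute those values instead.

```python
import math
from typing import List, Optional


def parse_float(raw: str) -> Optional[float]:
    """Return the value as a float, or None if it is missing or not finite."""
    try:
        value = float(raw)
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def clean_column(raw_values: List[str]) -> List[float]:
    """Strip whitespace, parse each entry, and drop the ones that fail."""
    parsed = [parse_float(raw.strip()) for raw in raw_values]
    return [value for value in parsed if value is not None]


def rescale(xs: List[float]) -> List[float]:
    """Standardize to mean 0 and sample standard deviation 1."""
    assert len(xs) >= 2, "rescaling requires at least two values"
    mu = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / (len(xs) - 1))
    if sd == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in xs]
    return [(x - mu) / sd for x in xs]
```

For example, `clean_column(["1.0", " 2.5 ", "n/a", "", "nan"])` keeps only the two usable numbers, and `rescale` then puts a column on a unit-free scale so features measured in different units can be compared.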

Model Complexity · design lever

The richness of the chosen model in terms of number of parameters, features, and flexibility, which influences how well the model can fit training data and how prone it is to overfitting or underfitting.

Gradient Descent Optimization · behavioral pattern

The iterative technique of computing gradients of a loss function and taking steps in the opposite direction to fit model parameters, serving as a unifying method for fitting many models throughout the book.
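The idea can be sketched by fitting a straight line with batch gradient descent on mean squared error. This is a minimal illustration under stated assumptions: the function names, the learning rate, the epoch count, and the toy data are all chosen for this example rather than taken from the book.

```python
from typing import List, Tuple


def mse_gradient(xs: List[float], ys: List[float],
                 slope: float, intercept: float) -> Tuple[float, float]:
    """Gradient of mean squared error with respect to (slope, intercept)."""
    n = len(xs)
    errors = [(slope * x + intercept) - y for x, y in zip(xs, ys)]
    grad_slope = sum(2 * e * x for e, x in zip(errors, xs)) / n
    grad_intercept = sum(2 * e for e in errors) / n
    return grad_slope, grad_intercept


def fit_line(xs: List[float], ys: List[float],
             learning_rate: float = 0.05,
             epochs: int = 2000) -> Tuple[float, float]:
    """Repeatedly step against the gradient, starting from (0, 0)."""
    assert len(xs) == len(ys) and len(xs) > 0
    slope, intercept = 0.0, 0.0
    for _ in range(epochs):
        grad_slope, grad_intercept = mse_gradient(xs, ys, slope, intercept)
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Toy data generated from y = 3x + 1 (an assumption for the demo).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [3 * x + 1 for x in xs]
slope, intercept = fit_line(xs, ys)
```

On this noiseless data the fitted values should approach a slope of 3 and an intercept of 1. A learning rate that is too large for the scale of the inputs can make the updates diverge, which is one reason rescaling features first is often helpful.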

Conceptual Understanding · psychological state

The learner's genuine internalized grasp of how and why data science algorithms and mathematics work, as opposed to merely being able to invoke library functions, which the book treats as its central pedagogical goal.

Code Correctness · behavioral pattern

The degree to which implemented code does what it is intended to do, supported by clean coding, type annotations, and liberal use of assert statements and automated testing.
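A short sketch of that style: a type-annotated function that checks its own preconditions, followed by assert statements that act as lightweight tests. The `add` function here is a hypothetical example, not a quotation from the book.

```python
from typing import List

Vector = List[float]


def add(v: Vector, w: Vector) -> Vector:
    """Add two vectors component-wise."""
    # Precondition: fail loudly instead of silently truncating via zip.
    assert len(v) == len(w), "vectors must be the same length"
    return [v_i + w_i for v_i, w_i in zip(v, w)]


# Inline checks that run every time the module is loaded.
assert add([1, 2, 3], [4, 5, 6]) == [5, 7, 9], "component-wise sum is wrong"
assert add([], []) == [], "empty vectors should add to an empty vector"
```

The annotations document intent and let a static checker such as mypy flag mismatched types, while the asserts catch behavioral regressions at run time. Note that Python skips assert statements when run with the `-O` flag, so they suit development checks rather than input validation in production.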

Model Fit Quality · outcome metric

How well a fitted model captures the patterns in data, reflected in goodness-of-fit measures like R-squared and loss, and balanced against generalization to unseen data.
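For reference, R-squared can be computed by hand as one minus the ratio of residual variation to total variation. The sketch below assumes the predictions have already been produced by some fitted model; the function name is illustrative.

```python
from typing import List


def r_squared(ys: List[float], predictions: List[float]) -> float:
    """Fraction of variation in ys captured by the predictions."""
    assert len(ys) == len(predictions), "inputs must have the same length"
    y_bar = sum(ys) / len(ys)
    residual_ss = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    total_ss = sum((y - y_bar) ** 2 for y in ys)
    assert total_ss > 0, "R-squared is undefined when ys has no variation"
    return 1.0 - residual_ss / total_ss
```

A model that predicts every value exactly scores 1, and a model that always predicts the mean of `ys` scores 0. Computed on training data alone, a high value says nothing about generalization, so it is usually read alongside performance on held-out data.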

Predictive Performance · outcome metric

How well a model performs on new, unseen data, measured by metrics such as accuracy, precision, recall, and F1 score, and judged against the risk of overfitting.
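These four metrics all derive from the counts in a binary confusion matrix. The sketch below assumes those counts (true positives, false positives, false negatives, true negatives) are already tallied on held-out data; returning 0.0 when a denominator is zero is a convention adopted for this example.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all predictions that were correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def precision(tp: int, fp: int) -> float:
    """Of the cases predicted positive, the fraction that really were."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Of the truly positive cases, the fraction the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p = precision(tp, fp)
    r = recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0
```

With made-up counts of 8 true positives, 2 false positives, 4 false negatives, and 86 true negatives, accuracy is 0.94 while recall is only about 0.67, which shows why accuracy alone can flatter a model when positives are rare.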

Data Science Competence · outcome metric

The learner's overall ability to do data science independently, combining hacking skills, mathematical intuition, and understanding of models, which is the ultimate aspiration the book sets for the reader.

Ethical Responsibility · outcome metric

The data scientist's commitment to considering and mitigating the ethical consequences of their work, including bias, fairness, privacy, interpretability, and the wide-reaching effects of scalable technology.

How they connect

from scratch implementation → predicts conceptual understanding
mathematical foundation → predicts conceptual understanding
from scratch implementation → influences code correctness
code correctness → influences model fit quality
data acquisition and cleaning → influences model fit quality
gradient descent optimization → mediates model fit quality
model complexity → influences model fit quality
model complexity − moderates predictive performance
model fit quality → predicts predictive performance
conceptual understanding → predicts data science competence
predictive performance → influences data science competence
data science competence → influences ethical responsibility
data acquisition and cleaning → correlates with ethical responsibility

A candidate measure

Data Science from Scratch: First Principles with Python — derived measurement candidates

From-Scratch Implementation Practice

proportion of algorithms implemented by hand; count of library imports for core logic (inverse); presence of explanatory comments

self-report suitability: high
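One of these indicators, the count of library imports, could in principle be measured directly from source code. The sketch below is a hypothetical operationalization using Python's standard `ast` module; the function name and the default set of libraries treated as "prebuilt" are assumptions for illustration.

```python
import ast
from typing import FrozenSet

# Assumed list of libraries that would count against "from scratch".
DEFAULT_LIBRARIES: FrozenSet[str] = frozenset({"numpy", "pandas", "sklearn"})


def count_library_imports(source: str,
                          libraries: FrozenSet[str] = DEFAULT_LIBRARIES) -> int:
    """Count import statements whose top-level package is in `libraries`."""
    count = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import numpy as np, math" yields one alias per module.
            for alias in node.names:
                if alias.name.split(".")[0] in libraries:
                    count += 1
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports like "from . import x".
            if node.module and node.module.split(".")[0] in libraries:
                count += 1
    return count
```

A lower count would suggest more of the core logic was written by hand, though the number says nothing about whether the hand-written code is correct, so it is best read together with the other indicators.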

Mathematical Foundation (Linear Algebra, Statistics, Probability)

scores on math problem sets; self-rated topic familiarity; error rate in computations

self-report suitability: medium

Data Acquisition and Cleaning Effort

time spent on data preparation; number of cleaning steps performed; fraction of invalid rows handled

self-report suitability: medium

Model Complexity

parameter count; number of nonzero coefficients; polynomial degree; regularization strength

self-report suitability: low

Gradient Descent Optimization

loss reduction per epoch; number of epochs to converge; learning rate

self-report suitability: low

Conceptual Understanding

explanation quality scores; independent reimplementation success; method selection accuracy

self-report suitability: medium

Code Correctness

assertion pass rate; test pass rate; number of type errors (inverse)

self-report suitability: low

Model Fit Quality

R-squared; mean squared error; training loss

self-report suitability: none

Predictive Performance

accuracy; precision; recall; F1 score

self-report suitability: none

Data Science Competence

portfolio quality rubric; method selection accuracy; successful library usage

self-report suitability: medium

Ethical Responsibility

presence of ethical review processes; bias/fairness audit results; privacy safeguards in place

self-report suitability: high

The story

The reader

An aspiring data scientist with some mathematical aptitude and programming skill who wants to genuinely understand how data science works, not just call library functions.

External problem

They need to learn the core algorithms, mathematics, and tools of data science well enough to actually do the work.

Internal problem

They feel like an underachiever or impostor who can use libraries but doesn't truly understand what's happening under the hood.

Philosophical problem

Treating data science tools as magic black boxes is the wrong way to learn; true competence comes from understanding things from first principles.

The plan

Get comfortable with Python and the language features that matter for data science.
Build a foundation in linear algebra, statistics, and probability.
Learn to get, clean, explore, and manipulate real data.
Implement core machine learning models and evaluation techniques from scratch.
Advance to neural networks, deep learning, clustering, NLP, and recommender systems.
Consider the ethical consequences of your data work and then move on to using production libraries.

Success

You possess a solid understanding of the fundamentals of data science.
You can build, train, and evaluate models while understanding how they work.
You can confidently use production libraries because you know what they do under the hood.
You can find datasets that interest you and do your own data science projects.

At stake

You remain dependent on libraries you don't understand and can't debug or extend them.
You build models that overfit, mislead, or behave unethically without realizing it.
You stay stuck feeling like an impostor unable to do real data science work.

Questions this book answers

What is data science and what does a data scientist actually do?
How do the fundamental algorithms of machine learning work under the hood?
What mathematics (linear algebra, statistics, probability) do you need to do data science?
How do you get, clean, explore, and manipulate data?
How can you build predictive models, neural networks, and recommender systems from first principles?

Glossary

From-Scratch Implementation Practice: The practice of building data science algorithms and tools by hand in readable Python instead of using prebuilt libraries, undertaken to understand how the methods work internally.
Mathematical Foundation (Linear Algebra, Statistics, Probability): A learner's grounding in the core mathematical disciplines that underpin data science methods.
Data Acquisition and Cleaning Effort: The work involved in obtaining data and transforming it into a clean, usable form before analysis or modeling.
Model Complexity: The flexibility of a model as captured by its number of parameters, features, or representational capacity.
Gradient Descent Optimization: The iterative parameter-fitting process that minimizes a loss function by repeatedly stepping against the gradient.
Conceptual Understanding: A learner's internalized grasp of how and why data science techniques work, beyond surface-level tool usage.
Code Correctness: The degree to which implemented code behaves as intended, supported by testing, assertions, and type checking.
Model Fit Quality: How well a fitted model captures the structure in the data on which it was trained.
