R for Data Science

Hadley Wickham and Garrett Grolemund

In a sentence

A practical, hands-on guide to doing data science in R using the tidyverse, walking the reader through the complete workflow of importing, tidying, transforming, visualizing, modeling, and communicating data.

R for Data Science teaches you how to turn raw data into understanding, insight, and knowledge using R and the tidyverse collection of packages. Rather than starting with the boring parts (data ingest and cleaning), the book begins with visualization and transformation of clean data so your motivation stays high, then progressively layers in programming skills, data wrangling, modeling, and communication. Written by Hadley Wickham (creator of much of the tidyverse) and Garrett Grolemund, the book unabashedly focuses on the most important 80% of data science tasks (hypothesis generation and exploratory data analysis on rectangular, in-memory datasets), giving you a coherent, opinionated toolkit (ggplot2, dplyr, tidyr, readr, purrr, and more) whose packages share a common philosophy and work together naturally. By the end you'll have a reusable mental model of the data science process and the concrete R skills to execute it, plus pointers to deeper resources for the remaining 20%.

The four lenses

Science
Statistics
Systems
Strategy

Tags

applied-statistics
software-engineering

The model

A causal-framework model of how adopting tidy data practices, a coherent integrated toolkit, and code-duplication-reduction habits drives psychological states (motivation, cognitive clarity) and behavioral patterns (iterative exploration, reproducible communication), which in turn produce the outcomes of insight generation and analytic productivity. The model is inferred from the book's repeated arguments that consistent data structure and an opinionated toolkit let analysts focus their struggle on questions rather than tool-fighting.

Tidy Data Adoption (design lever)

The degree to which an analyst stores data in a consistent form where each variable is a column, each observation is a row, and each value is a cell, matching dataset semantics to storage structure.
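The tidy-data rule above can be made concrete with a small reshaping sketch. The book works in R with tidyr; the version below uses plain Python only to illustrate the idea. The helper name `pivot_longer` is borrowed for readability and is a hypothetical stand-in written here, not a library call, and the table values are made up.

```python
# Made-up "wide" table: one row per country, one column per year.
# The year is a variable, but here it is hidden in the column names.
wide = [
    {"country": "A", "1999": 10, "2000": 20},
    {"country": "B", "1999": 30, "2000": 40},
]


def pivot_longer(rows, id_col, names_to, values_to):
    """Turn every non-id column into its own row (hypothetical helper)."""
    tidy_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_col:
                continue
            tidy_rows.append({id_col: row[id_col], names_to: key, values_to: value})
    return tidy_rows


# After reshaping, each variable (country, year, cases) is a column,
# each observation (one country in one year) is a row.
tidy = pivot_longer(wide, id_col="country", names_to="year", values_to="cases")
for record in tidy:
    print(record)
```

The two wide rows become four tidy rows, one per country-year observation, which is the shape that grouping, filtering, and plotting tools generally expect.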

Integrated Toolkit Use (design lever)

The extent to which an analyst uses a coherent, philosophically consistent set of tools (the tidyverse: ggplot2, dplyr, tidyr, readr, purrr) designed to work together naturally rather than ad hoc, inconsistent tools.

Code Duplication Reduction (behavioral pattern)

The practice of extracting repeated code into functions and using iteration tools to avoid copying and pasting, following the Don't Repeat Yourself principle to reduce errors and clarify intent.
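As a minimal sketch of that practice, the snippet below replaces what would otherwise be three copy-pasted rescaling expressions with one function and one iteration step. It is written in Python for illustration rather than in the book's R, and the column names and values are invented.

```python
def rescale01(values):
    """Linearly rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Invented columns that all need the same treatment.
columns = {"a": [1, 2, 3], "b": [10, 20, 40], "c": [5, 5, 5]}

# One iteration construct instead of one pasted block per column.
rescaled = {name: rescale01(vals) for name, vals in columns.items()}
print(rescaled["a"])  # [0.0, 0.5, 1.0]
```

Because the logic lives in one place, a fix such as the constant-column guard applies to every column at once, which is the error-reduction argument the definition makes.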

Analyst Motivation (psychological state)

The psychological state of sustained engagement and willingness to persist through frustration, kept high by experiencing early payoff from visualization before enduring tedious tasks like data ingest and tidying.

Cognitive Clarity (psychological state)

The reduced cognitive load and increased ability to focus attention on substantive data questions rather than on wrangling data into the right form or deciphering inconsistent tools and code.

Iterative Exploration Behavior (behavioral pattern)

The behavioral pattern of rapidly generating questions, visualizing, transforming, and modeling data, then refining questions and repeating, to generate many promising leads about the data.

Reproducible Communication Behavior (behavioral pattern)

The practice of integrating prose, code, and results into reproducible documents (R Markdown) and capturing reasoning so analyses can be understood, re-run, and shared with others.

Insight Generation (outcome metric)

The outcome of discovering true patterns and relationships in data—turning raw data into understanding, insight, and knowledge—while filtering out noise and recognizing the subtler signals that remain after removing strong patterns.

Analytic Productivity (outcome metric)

The outcome of being able to tackle a wide variety of data science challenges efficiently—covering roughly 80% of project needs with fewer errors, faster iteration, and less rework.

Data Complexity and Messiness (contextual condition)

The contextual condition describing how messy, non-rectangular, or large a dataset is, which conditions how strongly tidy practices and the integrated toolkit translate into productivity gains.

How they connect

tidy data adoption → predicts cognitive clarity
integrated toolkit use → predicts cognitive clarity
tidy data adoption → influences integrated toolkit use
cognitive clarity → predicts iterative exploration
analyst motivation → predicts iterative exploration
integrated toolkit use → influences analyst motivation
duplication reduction → predicts analytic productivity
iterative exploration → predicts insight generation
duplication reduction → predicts cognitive clarity
cognitive clarity → predicts analytic productivity
reproducible communication → influences insight generation
data complexity → moderates tidy data adoption
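One way to read the links above is as a directed graph, which makes it possible to list every route from a design lever to an outcome. The sketch below is only an illustration of that reading: it treats "predicts" and "influences" alike as plain directed edges, and it leaves out the data complexity link on the assumption that a moderator conditions a relationship rather than feeding a node.

```python
from collections import defaultdict

# Directed edges transcribed from the list above (moderation link omitted).
EDGES = [
    ("tidy data adoption", "cognitive clarity"),
    ("integrated toolkit use", "cognitive clarity"),
    ("tidy data adoption", "integrated toolkit use"),
    ("cognitive clarity", "iterative exploration"),
    ("analyst motivation", "iterative exploration"),
    ("integrated toolkit use", "analyst motivation"),
    ("duplication reduction", "analytic productivity"),
    ("iterative exploration", "insight generation"),
    ("duplication reduction", "cognitive clarity"),
    ("cognitive clarity", "analytic productivity"),
    ("reproducible communication", "insight generation"),
]


def build_graph(edges):
    """Map each source construct to the constructs it points at."""
    graph = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)
    return graph


def all_paths(graph, start, goal, path=None):
    """Depth-first search for every simple path from start to goal."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    found = []
    for nxt in graph.get(start, []):
        if nxt not in path:  # guard against revisiting a node
            found.extend(all_paths(graph, nxt, goal, path))
    return found


graph = build_graph(EDGES)
paths = all_paths(graph, "tidy data adoption", "insight generation")
for p in paths:
    print(" -> ".join(p))
```

Under these assumptions, tidy data adoption reaches insight generation by three routes, all of which pass through iterative exploration.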

A candidate measure

R for Data Science — derived measurement candidates

Tidy Data Adoption

proportion of project datasets meeting tidy criteria; count of gather/spread/separate/unite calls per project; rate of non-tidy storage patterns flagged in code review

self-report suitability: medium

Integrated Toolkit Use

share of function calls from tidyverse packages; count of %>% pipelines per script; breadth of tidyverse packages used

self-report suitability: high
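The toolkit-use indicators above could be approximated by scanning script text. The sketch below is a rough, assumption-laden illustration in Python: it counts `%>%` operators rather than whole pipelines, detects only explicitly namespaced calls of the form `pkg::fn`, and strips comments naively. The function name `toolkit_metrics` and the sample script are hypothetical.

```python
import re

# Tidyverse packages named in the text above; extend as needed.
TIDYVERSE = {"ggplot2", "dplyr", "tidyr", "readr", "purrr"}


def toolkit_metrics(script):
    """Compute rough toolkit-use indicators from the text of one R script."""
    # Naive comment stripping: drops everything after '#', even inside strings.
    code = "\n".join(line.split("#", 1)[0] for line in script.splitlines())

    pipe_operators = code.count("%>%")
    attached = set(re.findall(r"library\(\s*([A-Za-z][A-Za-z0-9.]*)\s*\)", code))
    namespaced = re.findall(r"\b([A-Za-z][A-Za-z0-9.]*)::", code)

    from_tidyverse = [pkg for pkg in namespaced if pkg in TIDYVERSE]
    share = len(from_tidyverse) / len(namespaced) if namespaced else None
    breadth = len((attached | set(namespaced)) & TIDYVERSE)

    return {
        "pipe_operators": pipe_operators,
        "tidyverse_namespaced_share": share,
        "tidyverse_breadth": breadth,
    }


# Hypothetical R script used only to exercise the function.
SAMPLE = r"""
library(dplyr)
library(ggplot2)
flights %>%
  filter(month == 1) %>%  # keep January only
  dplyr::summarise(delay = mean(dep_delay, na.rm = TRUE))
stats::median(c(1, 2, 3))
"""

metrics = toolkit_metrics(SAMPLE)
print(metrics)
```

A text scan like this misses calls made through attached packages without a namespace prefix, so a measure meant for real use would more plausibly parse the R code itself.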

Code Duplication Reduction

ratio of duplicated code blocks to abstracted functions; number of function definitions per project; count of iteration constructs replacing copy-paste

self-report suitability: medium

Analyst Motivation

self-rated motivation/engagement; exercise completion rate; session continuation after encountering errors

self-report suitability: high

Cognitive Clarity

self-rated focus/clarity; ratio of analysis time to wrangling time; count of tool-friction incidents per session

self-report suitability: high

Iterative Exploration Behavior

count of plots per session; count of transformations per session; count of model fits per session

self-report suitability: medium

Reproducible Communication Behavior

number of .Rmd files that knit successfully; presence/density of narrative explanation per analysis; use of dependency version tracking

self-report suitability: medium

Insight Generation

number of validated hypotheses generated; expert-rated quality of insights; count of patterns confirmed in independent data

self-report suitability: low

Analytic Productivity

time-to-completion per task; error/bug rate per project; breadth of problem types solved

self-report suitability: medium

Data Complexity and Messiness

dataset size in MB/GB and row count; percent of missing or malformed values; classification of data structure as rectangular vs. non-rectangular

self-report suitability: low

The story

The reader

An aspiring or working data analyst who wants to turn raw data into understanding, insight, and knowledge and tackle a wide variety of data science challenges.

External problem

They have data but lack a coherent, efficient toolkit and workflow to import, clean, explore, model, and communicate it in R.

Internal problem

They feel frustrated and overwhelmed by R's pickiness and the sprawling, inconsistent landscape of tools and techniques.

Philosophical problem

Data analysis shouldn't require fighting your tools to get data into the right shape—you should be able to focus your struggle on questions about the data.

The plan

Install R, RStudio, and the tidyverse.
Start with visualization and transformation of clean data to build momentum.
Learn exploratory data analysis to ask and answer questions about data.
Wrangle messy data into tidy form using import and tidying tools.
Acquire programming skills (functions, vectors, iteration) to tackle harder problems.
Use models to extract patterns and residuals from data.
Communicate results reproducibly with R Markdown.

Success

You can tackle about 80% of any data science project with the tools you've learned.
You generate many promising leads through rapid, iterative data exploration.
You produce elegant, informative plots and reproducible reports.
You write clear, reusable code that you and others can understand later.

At stake

You remain stuck fighting your data into the right form instead of answering real questions.
You make incidental copy-and-paste errors and create inconsistent, buggy analyses.
You can't communicate your results, so even great analysis goes to waste.
You stay overwhelmed by R's idiosyncrasies and never get up and running.

Questions this book answers

What is the typical workflow of a data science project?
How do you import, tidy, transform, visualize, model, and communicate data in R?
What does it mean for data to be 'tidy' and why does it matter?
How can the grammar of graphics (ggplot2) be used to build any kind of plot?
How do you write functions and use iteration to reduce code duplication?

Glossary

Tidy Data Adoption: The degree to which an analyst structures datasets so each variable is a column, each observation a row, and each value a cell, aligning data semantics with storage.
Integrated Toolkit Use: The extent to which an analyst relies on a coherent, philosophically consistent set of tidyverse tools that work together rather than ad hoc, inconsistent tools.
Code Duplication Reduction: The practice of extracting repeated code into named functions and iteration constructs to avoid copy-and-paste, following the DRY principle.
Analyst Motivation: The psychological state of sustained engagement and willingness to persist through frustration while learning and doing data science.
Cognitive Clarity: Reduced cognitive load enabling the analyst to focus attention on substantive data questions rather than on tool-fighting or data-wrangling friction.
Iterative Exploration Behavior: The behavioral pattern of repeatedly generating questions, visualizing, transforming, and modeling data, then refining and repeating to surface promising leads.
Reproducible Communication Behavior: The practice of integrating prose, code, and results into reproducible documents and capturing reasoning so analyses can be re-run, understood, and shared.
Insight Generation: The outcome of discovering true patterns and knowledge in data while filtering out noise, including subtler signals revealed after removing strong patterns.

Tools these methods power