What is PeopleAnalyst?

PeopleAnalyst is the front door for people-analytics research: 205+ works indexed and profiled, 40+ citation-grade findings extracted, and peer-reviewed behavioral science translated from academic to actionable — the missing manual for the people analytics you always meant to do.

What is people analytics?

People analytics is not a dashboard. It is behavioral science and statistical inference applied to workforce decisions — a discipline with its own methodology, spanning measurement, organizational design, talent, leadership, and analytics craft.

Why does AI in HR need measurement science?

AI is being deployed in high-stakes people decisions — hiring, performance, attrition — without the measurement science to evaluate whether it works or whom it harms. Construct validity, effect sizes, and criterion validity are the vocabulary for asking an AI vendor the right questions.

How is the research made accessible?

The evidence is indexed and searchable: 205+ works, 40+ citation-grade insight cards, and 8 research arcs, so the right finding reaches the right decision at the right time.

What separates good people measurement from assertion?

Good measurement has a method: construct validity, reliability, and effect-size interpretation are not optional — they are what separates evidence from assertion.


Python for Data Analysis

Wes McKinney

In a sentence

A practical, hands-on guide to manipulating, processing, cleaning, and analyzing structured data in Python using pandas, NumPy, and the Jupyter/IPython ecosystem.

Written by the creator of pandas, this book teaches the foundational programming skills and library workflows needed to become an effective data analyst in Python. Rather than focusing on statistical methodology, it concentrates on the data-oriented Python toolset—NumPy arrays for fast numerical computing, pandas Series and DataFrames for tabular data wrangling, matplotlib and seaborn for visualization, and IPython/Jupyter for interactive development. Through detailed, reproducible examples and real-world datasets (Bitly links, MovieLens ratings, US baby names, USDA food data, FEC contributions), readers learn to load, clean, transform, merge, reshape, group, aggregate, and visualize data, and to handle time series and feed cleaned data into modeling libraries like statsmodels and scikit-learn. It is ideal both for analysts new to Python and for Python programmers new to data work, serving as a durable foundation for moving on to more advanced data science and machine learning resources.

The four lenses

Science
Statistics
Systems
Strategy

Foundational Tool Mastery

Number of library features used correctly; Exercise completion rate; Self-reported familiarity level

self-report suitability: medium

Use of Vectorized Operations

Ratio of vectorized operations to loops in code; Count of explicit loops over arrays

self-report suitability: low
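
One way to approximate the "count of explicit loops" indicator is to parse an analyst's script and count loop statements. The sketch below is only a rough proxy and not something the book prescribes: the function name `count_explicit_loops` and the sample script are hypothetical, and the count cannot tell whether a loop actually iterates over an array.

```python
import ast


def count_explicit_loops(source: str) -> int:
    """Count `for` and `while` statements in a piece of Python source.

    Hypothetical helper: comprehensions are not counted, and the source
    is only parsed, never executed.
    """
    tree = ast.parse(source)
    return sum(isinstance(node, (ast.For, ast.While)) for node in ast.walk(tree))


# Hypothetical analyst script, stored as text so it is parsed rather than run.
sample = """
total = 0
for x in values:
    total += x
while total > 100:
    total -= 1
result = values.sum()
"""

n_loops = count_explicit_loops(sample)
print(n_loops)
```

A ratio of vectorized calls to loops would need a further assumption about which calls count as vectorized, so it is left out here.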

Structured Data Representation

Data-quality/tidiness checklist score; Proportion of columns with correct dtypes

self-report suitability: medium
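
The "proportion of columns with correct dtypes" indicator can be sketched with the type-checking helpers in `pandas.api.types`. This is an illustration under assumptions: the function `dtype_correctness`, the column names, and the choice of expected types are all hypothetical.

```python
import pandas as pd
from pandas.api import types as ptypes


def dtype_correctness(df, checks):
    """Return the share of expected columns whose dtype passes its check.

    `checks` maps a column name to a predicate such as
    `ptypes.is_integer_dtype`. Missing columns count as failures.
    """
    passed = [
        col for col, check in checks.items()
        if col in df.columns and check(df[col])
    ]
    return len(passed) / len(checks)


# Hypothetical data: salary was loaded as text, so it should fail its check.
df = pd.DataFrame({
    "employee_id": [1, 2, 3],
    "salary": ["50000", "62000", "58000"],
    "hire_date": pd.to_datetime(["2021-03-01", "2022-07-15", "2020-01-10"]),
})

expected = {
    "employee_id": ptypes.is_integer_dtype,
    "salary": ptypes.is_numeric_dtype,
    "hire_date": ptypes.is_datetime64_any_dtype,
}

score = dtype_correctness(df, expected)
print(score)
```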

Workflow Efficiency

Task completion time; Number of iterations to result; Perceived ease rating

self-report suitability: high

Analyst Confidence and Competence

Self-reported confidence level; Self-reported reduction in overwhelm

self-report suitability: high

Analysis-Ready Data

Missing-data resolution rate; Type correctness rate; Structural consistency checks

self-report suitability: low
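
A missing-data resolution rate could be computed by comparing missing-value counts before and after cleaning. The definition used below (share of originally missing cells that are no longer missing) is an assumption, as are the function name and the example columns.

```python
import numpy as np
import pandas as pd


def missing_resolution_rate(raw, cleaned):
    """Share of cells missing in `raw` that are resolved in `cleaned`.

    Assumed definition: (missing before - missing after) / missing before.
    Returns 1.0 when nothing was missing to begin with.
    """
    before = int(raw.isna().sum().sum())
    after = int(cleaned.isna().sum().sum())
    if before == 0:
        return 1.0
    return (before - after) / before


# Hypothetical raw data with three missing cells.
raw = pd.DataFrame({
    "tenure_years": [1.5, np.nan, 4.0, np.nan],
    "team": ["a", None, "b", "b"],
})

# Fill only the numeric column; the missing team label stays unresolved.
cleaned = raw.copy()
cleaned["tenure_years"] = cleaned["tenure_years"].fillna(
    cleaned["tenure_years"].median()
)

rate = missing_resolution_rate(raw, cleaned)
print(rate)
```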

Effective Insight Extraction

Correctness of analytical outputs; Quality rating of visualizations; Model fit/usefulness

self-report suitability: medium


The story

The reader

An analyst or programmer who wants to effectively manipulate, clean, and analyze data in Python.

External problem

Raw, messy data is hard to load, clean, transform, and analyze, and the Python data tooling is large and confusing to navigate.

Internal problem

They feel overwhelmed by the breadth of libraries and options and unsure they're using the right, efficient approach.

Philosophical problem

Data professionals should not have to spend most of their time fighting cumbersome tooling instead of extracting insight from data.

The plan

Set up a Python environment with the essential data libraries.
Learn the Python language basics and the IPython/Jupyter interactive workflow.
Master NumPy arrays and vectorized computation.
Learn the pandas Series and DataFrame structures for tabular data manipulation.
Practice loading data from many file formats and sources.
Clean, transform, merge, reshape, and aggregate data.
Visualize data and handle time series.
Bridge cleaned data into modeling libraries and apply skills to real datasets.
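
The middle steps of the plan (load, clean, aggregate, and a first look at dates) can be compressed into a few lines of pandas. This is a minimal sketch on made-up data, not an example taken from the book; the column names and the median fill are assumptions chosen for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content; one salary value is missing.
csv_text = """team,hire_date,salary
eng,2021-03-01,100
eng,2022-07-15,
ops,2020-01-10,80
ops,2023-05-20,90
"""

# Load: parse the date column while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["hire_date"])

# Clean: fill the missing salary with the column median.
df["salary"] = df["salary"].fillna(df["salary"].median())

# Aggregate: mean salary and headcount per team.
summary = df.groupby("team")["salary"].agg(["mean", "count"])

# Time-based view: number of hires per calendar year.
hires_per_year = df.groupby(df["hire_date"].dt.year).size()

print(summary)
print(hires_per_year)
```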

Success

The reader can confidently load, clean, and prepare messy real-world data.
They use vectorized pandas/NumPy operations to efficiently transform and aggregate data.
They can visualize results and handle time series competently.
They are well prepared to move on to advanced data science and machine learning resources.

At stake

The reader stays stuck spending most of their time wrestling with awkward data manipulation.
They write slow, error-prone element-by-element code.
They remain unable to navigate the Python data ecosystem effectively and cannot reach the analysis or modeling stage.
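
To make the contrast with element-by-element code concrete, the sketch below computes the same result twice with NumPy: once with an explicit Python loop and once with a single vectorized expression. The arrays and variable names are hypothetical.

```python
import numpy as np

# Hypothetical inputs: hours worked and hourly rate per person.
hours = np.array([38.0, 42.5, 40.0, 45.0])
rate = np.array([30.0, 28.0, 35.0, 32.0])

# Element-by-element style: one Python-level multiplication per item.
pay_loop = []
for h, r in zip(hours, rate):
    pay_loop.append(h * r)

# Vectorized style: one expression applied to whole arrays.
pay_vec = hours * rate

print(pay_vec)
```

Both versions give the same values; the vectorized form is shorter and moves the iteration out of Python-level code.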

Questions this book answers

Which parts of the Python language and library ecosystem do I need to do effective data analysis?
How do I load, clean, and prepare messy real-world data into a tabular, analysis-ready form?
How do I use NumPy arrays and pandas DataFrames to manipulate, filter, transform, and aggregate data efficiently?
How do I combine, merge, reshape, and pivot datasets?
How do I group data and compute summary statistics or apply custom group operations?
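
The questions about merging, pivoting, and custom group operations can be illustrated together. The sketch below uses two small made-up tables; the names `employees`, `ratings`, and the score-spread calculation are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical tables sharing an emp_id key.
employees = pd.DataFrame({"emp_id": [1, 2, 3], "team": ["eng", "eng", "ops"]})
ratings = pd.DataFrame({
    "emp_id": [1, 1, 2, 3],
    "year": [2022, 2023, 2023, 2023],
    "score": [3, 4, 5, 2],
})

# Merge: attach each rating to its employee's team.
merged = employees.merge(ratings, on="emp_id", how="left")

# Pivot: mean score per team (rows) and year (columns).
wide = merged.pivot_table(index="team", columns="year", values="score", aggfunc="mean")

# Custom group operation: spread between best and worst score per team.
spread = merged.groupby("team")["score"].apply(lambda s: s.max() - s.min())

print(wide)
print(spread)
```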

Glossary

Foundational Tool Mastery: The degree to which an analyst has acquired competency in the core Python data tools and language constructs taught by the book.
Use of Vectorized Operations: The habitual adoption of vectorized array/DataFrame operations in place of explicit Python loops.
Structured Data Representation: The practice of organizing data into clean, labeled tabular structures suitable for analysis.
Workflow Efficiency: The speed and ease with which an analyst can perform data manipulation tasks, reducing friction and time spent.
Analyst Confidence and Competence: The analyst's perceived capability and comfort working within the Python data ecosystem.
Analysis-Ready Data: Data that has been loaded, cleaned, transformed, and reshaped into a reliable, analyzable form.
Effective Insight Extraction: The successful production of meaningful summaries, visualizations, and analytical/modeling results from data.

Tools these methods power