Designing Machine Learning Systems

Chip Huyen · 2022

In a sentence

A holistic, iterative framework for designing production-ready machine learning systems that are reliable, scalable, maintainable, and adaptive across every stage from data engineering to continual learning.

Designing Machine Learning Systems by Chip Huyen offers the comprehensive, end-to-end guide that ML engineers and data scientists have long needed to bridge the gap between academic model-building and the messy realities of production. Rather than treating the ML algorithm as the centerpiece, Huyen situates it as just one small component within a much larger system encompassing business objectives, data pipelines, feature engineering, deployment infrastructure, monitoring, and responsible AI. Drawing on her experience at NVIDIA, Netflix, Snorkel AI, and Stanford—where she teaches the course CS 329S: Machine Learning Systems Design—she walks readers through every stage of the ML project lifecycle with concrete case studies, trade-off discussions, and practical frameworks. The book covers everything from sampling strategies and labeling techniques, through model development and offline evaluation, to online prediction, data distribution shift detection, continual learning, MLOps infrastructure, and the human and ethical dimensions of deploying AI at scale. Whether you are deploying your first model or managing hundreds in production, this book provides the principled vocabulary and decision-making framework to do it right.

ML Infrastructure and Workflow Management

To design, configure, and manage the necessary infrastructure, development environments, and automated workflows to support the entire machine learning lifecycle efficiently and scalably.

When to use: This is a foundational process, typically initiated before or during the early stages of developing production-level ML systems. It is revisited periodically to assess and optimize performance and costs.

Step 1: Evaluate the company's current and future ML application needs to define infrastructure requirements.
Entry: A strategic decision has been made to invest in productionalizing machine learning.
Exit: A clear document outlining the infrastructure requirements is created.
- Choosing between cloud-based, on-premises, or hybrid solutions.
- Determining the required scale of compute and storage resources.
In: Company's ML strategy, Inventory of current and planned ML applications · Out: Infrastructure requirements specification
ch13
Step 2: Standardize the development environment for data scientists and ML engineers.
Entry: Infrastructure requirements have been defined.
Exit: A reproducible, standardized development environment is available to the team.
- Choosing which tools and libraries to include in the standard environment.
- Deciding between local containerized environments and cloud-based IDEs.
In: Team preferences, List of required tools and libraries · Out: Standardized development environment (e.g., Docker image, setup scripts), Onboarding documentation
ch13
Step 3: Implement a workflow management system to orchestrate and automate ML tasks.
Entry: Repetitive ML tasks and their dependencies have been identified.
Exit: An automated, robust workflow for ML tasks is operational.
- Selecting the appropriate workflow management tool (e.g., Airflow, Argo, Prefect, Kubeflow, Metaflow).
- Defining the retry and error handling logic for failed tasks.
In: Workflow definitions (DAGs), Task dependencies · Out: Automated ML workflows
ch13
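
A minimal sketch of what Step 3 automates, assuming a workflow is a DAG of named tasks with a fixed retry budget. The names `run_workflow`, `dag`, and `tasks` are hypothetical and not taken from the book or from any orchestration tool; real workflow managers add scheduling, logging, and distributed execution on top of this idea.

```python
from graphlib import TopologicalSorter


def run_workflow(dag, tasks, max_retries=2):
    """Run tasks in dependency order, retrying each failing task.

    dag maps a task name to the set of task names it depends on.
    tasks maps a task name to a zero-argument callable.
    """
    completed = []
    for name in TopologicalSorter(dag).static_order():
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception as exc:
                # Give up only after the retry budget is exhausted.
                if attempt == max_retries:
                    raise RuntimeError(
                        f"task {name!r} failed after {attempt + 1} attempts"
                    ) from exc
    return completed


# Hypothetical pipeline: featurize depends on extract, train on featurize.
calls = {"featurize": 0}


def flaky_featurize():
    # Fails once to simulate a transient error, then succeeds.
    calls["featurize"] += 1
    if calls["featurize"] == 1:
        raise IOError("transient failure")


dag = {"extract": set(), "featurize": {"extract"}, "train": {"featurize"}}
tasks = {"extract": lambda: None, "featurize": flaky_featurize, "train": lambda: None}
order = run_workflow(dag, tasks)
```

The retry count and the decision to abort the whole run on a permanent failure are the kind of error-handling choices the step lists; both are illustrative defaults here.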

Managing User Experience and Responsible AI

To ensure ML systems are developed and deployed ethically, fairly, and in a way that provides a positive and trustworthy user experience.

When to use: This process should be initiated at the very beginning of a project and continue throughout the entire lifecycle.

Step 1: Establish a framework for Responsible AI.
Entry: An ML project is being conceptualized.
Exit: A set of guidelines and documentation practices for responsible AI is adopted by the team.
- Determining which ethical frameworks and fairness metrics to adopt.
In: Research on AI ethics, Stakeholder requirements, Best practices for fairness and transparency · Out: Responsible AI guidelines, Model card templates
ch14
Step 2: Foster cross-functional team collaboration.
Entry: Project team is being formed.
Exit: A collaborative workflow is established involving all relevant stakeholders.
- Deciding on the level and frequency of SME involvement.
- Choosing collaboration tools and platforms.
In: Project goals, List of relevant stakeholders and SMEs · Out: Integrated project team, ML systems incorporating diverse domain expertise
ch14
Step 3: Design the ML system to ensure a consistent and positive user experience.
Entry: The user-facing aspects of the ML system are being designed.
Exit: The system's behavior is predictable, reliable, and user-friendly.
- Deciding on the trade-off between prediction accuracy and consistency.
- Determining the latency threshold that triggers a switch to a backup model.
In: User interaction designs, Latency and performance requirements · Out: ML system with predictable behavior, Fallback mechanisms for model failures
ch14

Problem Framing and Requirements Definition

To accurately define a business problem, determine if ML is a suitable solution, frame it in technical terms, and establish the non-functional requirements for the system.

When to use: At the very beginning of a project, before any data collection or model development begins.

Step 1: Define the business problem and understand its context.
Entry: A potential business opportunity or problem has been identified.
Exit: A clear, concise business problem statement is agreed upon by all stakeholders.
- Determining whether machine learning is the most appropriate solution for the problem.
In: Business requirements, Customer feedback, Stakeholder interviews · Out: Business problem statement
ch01 · ch02
Step 2: Frame the problem for machine learning.
Entry: The business problem has been defined.
Exit: A clearly defined ML problem with specified inputs, outputs, and objectives.
- Choosing the type of ML problem (e.g., classification, regression, clustering).
- Selecting the primary metric to optimize.
In: Business problem statement · Out: ML problem definition
ch02
Step 3: Define the system's non-functional requirements.
Entry: The ML problem has been framed.
Exit: A comprehensive specification document detailing system requirements.
- Determining the required level of performance for each requirement (e.g., 99.9% uptime).
In: ML problem definition, Understanding of the operational context · Out: ML system requirements specification
ch02

Data Preparation for Machine Learning

To transform raw data from various sources into a clean, structured, and feature-rich dataset suitable for training robust and accurate machine learning models, while preventing data leakage.

When to use: After the problem has been framed and before model training begins. This is often the most time-consuming part of an ML project.

Step 1: Extract, Transform, and Load (ETL) data from sources.
Entry: Data sources have been identified.
Exit: Raw data is consolidated and stored in a central location.
- Choosing between ETL and ELT (Extract, Load, Transform) patterns.
- Defining data validation and cleaning rules.
In: Data sources, Transformation rules, Target schema · Out: Cleaned and consolidated data in a target destination
ch03
Step 2: Acquire labels for the training data.
Entry: Consolidated raw data is available.
Exit: A labeled dataset ready for model training.
- Deciding between in-house labeling, crowdsourcing, or automated methods.
- Choosing the appropriate level of supervision required.
In: Raw data, Labeling guidelines · Out: Labeled dataset
ch05
Step 3: Prevent data leakage by splitting the dataset.
Entry: A complete, labeled dataset is available.
Exit: Data is correctly partitioned into training, validation, and test sets.
- Choosing the splitting strategy (random vs. time-based vs. stratified).
- Determining the size of each split.
In: Labeled dataset · Out: Training set, Validation set, Test set
ch06
Step 4: Perform data cleaning and preprocessing on the training set.
Entry: Data has been split into train/validation/test sets.
Exit: The training set is cleaned and balanced.
- Choosing between deletion and imputation for missing values.
- Selecting a strategy to handle class imbalance.
In: Training set · Out: Cleaned and balanced training set
ch05 · ch06
Step 5: Conduct feature engineering and transformation.
Entry: The training set has been cleaned.
Exit: Model-ready feature sets for training, validation, and testing.
- Choosing the appropriate scaling and encoding methods.
- Deciding which features to cross based on domain knowledge.
In: Cleaned training set, Validation set, Test set · Out: Transformed training, validation, and test feature sets
ch06
Step 6: Apply data augmentation to increase dataset size and diversity.
Entry: A baseline model's performance suggests data scarcity is an issue.
Exit: An augmented training dataset is created.
- Choosing the appropriate augmentation techniques for the data modality.
In: Original training set · Out: Augmented training set
ch05
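
The ordering of Steps 3 to 5 (split first, then clean and transform) is what prevents leakage. A minimal sketch, assuming rows carry a timestamp field `ts` and a single numeric feature `x`; both field names and all function names are hypothetical. Scaling statistics are fit on the training split only and then reused unchanged on the other splits.

```python
from statistics import mean, pstdev


def time_based_split(rows, n_valid, n_test):
    """Sort rows by timestamp and cut them oldest-first into train/valid/test."""
    ordered = sorted(rows, key=lambda r: r["ts"])
    n_train = len(ordered) - n_valid - n_test
    return (
        ordered[:n_train],
        ordered[n_train:n_train + n_valid],
        ordered[n_train + n_valid:],
    )


def fit_scaler(train_rows, feature):
    """Compute scaling statistics from the training split only."""
    values = [r[feature] for r in train_rows]
    return mean(values), pstdev(values) or 1.0  # guard against zero spread


def apply_scaler(rows, feature, mu, sigma):
    """Standardize a feature using statistics fitted elsewhere."""
    return [(r[feature] - mu) / sigma for r in rows]


rows = [{"ts": t, "x": float(t)} for t in range(10)]
train, valid, test = time_based_split(rows, n_valid=2, n_test=2)
mu, sigma = fit_scaler(train, "x")             # never sees valid or test rows
test_scaled = apply_scaler(test, "x", mu, sigma)
```

A random or stratified split would replace `time_based_split`; the part that matters for leakage is that `fit_scaler` is called on the training rows alone.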

Data Flow Management

To design and implement efficient, reliable methods for passing data between different processes or services in a distributed system that do not share memory.

When to use: During the architectural design phase of a production ML system.

Step 1: Identify the required mode of data passing between processes.
Entry: The high-level architecture of the ML system has been designed.
Exit: The data passing requirements for each inter-process communication link are defined.
- Determining whether real-time, low-latency communication is needed.
- Assessing whether asynchronous communication is acceptable or beneficial.
In: System architecture diagram, Performance requirements · Out: Data flow requirements specification
ch03
Step 2: Choose and implement the appropriate data passing method.
Entry: Data flow requirements are specified.
Exit: The chosen data passing mechanism is implemented and functional.
- Select the specific technology for the chosen method (e.g., REST vs. gRPC for APIs; Kafka vs. Kinesis for brokers).
In: Data flow requirements specification · Out: Implemented data communication channels between services
ch03 · ch04

Batch Data Processing

To periodically process large volumes of historical, bounded data to generate insights, reports, or features for machine learning models.

When to use: When analysis can be performed on a fixed schedule (e.g., hourly, daily) using a finite dataset.

Step 1: Schedule a batch job to run at specified intervals.
Entry: The logic for the batch job has been developed.
Exit: The job is scheduled to run automatically.
- Determining the optimal frequency for the batch job.
In: A schedule (e.g., 'run daily at 2 AM') · Out: A scheduled job
ch04
Step 2: Retrieve a batch of historical data for analysis.
Entry: The scheduled job is triggered.
Exit: The required data is loaded into the processing environment.
In: Historical data stored in a database or data lake · Out: A batch of data
ch04
Step 3: Compute the required insights or features.
Entry: The batch of data is available.
Exit: The computation is complete.
In: A batch of data · Out: Computed results (e.g., insights, features, predictions)
ch04
Step 4: Store or output the results of the analysis.
Entry: The computation is complete.
Exit: The results are successfully stored and accessible.
In: Computed results · Out: Stored historical insights or features
ch04

Stream Data Processing

To process unbounded, continuous streams of data in real-time or near-real-time to derive immediate insights and features.

When to use: When insights are needed from data as it is generated.

Step 1: Set up a stream computation engine and ingest data.
Entry: A real-time data source is available.
Exit: The stream processing engine is actively ingesting data.
- Choosing the appropriate stream processing engine.
In: Streaming data from sources (e.g., sensors, application logs) · Out: An active data stream within the processing engine
ch04
Step 2: Compute metrics or insights on the streaming data.
Entry: Data is being ingested.
Exit: Real-time computations are being performed.
- Defining the logic for the stream computation (e.g., size of time windows for aggregation).
In: An active data stream · Out: A stream of computed results
ch04
Step 3: Use the computed insights for further action.
Entry: A stream of computed results is being produced.
Exit: Downstream actions are being triggered by the stream processing results.
In: A stream of computed results · Out: Real-time metrics, insights, or features used by other systems
ch04
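
The "size of time windows for aggregation" decision in Step 2 can be illustrated with a tumbling (fixed, non-overlapping) window. This is a sketch under simplifying assumptions: events are `(timestamp_seconds, key)` pairs, the function name is hypothetical, and the input is a finite list, whereas a real stream engine would emit results incrementally as windows close.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    Returns a dict mapping (window_start, key) to the event count.
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Integer division assigns each event to exactly one window.
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(0, "user_a"), (12, "user_a"), (59, "user_b"), (61, "user_a"), (130, "user_b")]
windows = tumbling_window_counts(events, window_seconds=60)
```

A smaller `window_seconds` gives fresher features at the cost of noisier counts; a larger one does the opposite.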

Model Development and Selection

To systematically select, build, and iteratively improve machine learning models, starting with simple baselines and progressing to more complex solutions only when justified by performance gains.

When to use: After data has been prepared and is ready for modeling.

Step 1: Establish baselines with non-ML solutions and heuristics.
Entry: A prepared dataset and evaluation metrics are available.
Exit: One or more baseline performance scores are established.
In: Prepared dataset, Evaluation metrics · Out: Baseline performance metrics
ch08
Step 2: Select and implement a simple first ML model.
Entry: Baselines have been established.
Exit: A simple ML model is trained and evaluated.
- Choosing the initial model based on problem type, data size, and interpretability needs.
In: Prepared dataset · Out: Trained simple model, Initial model performance metrics
ch07 · ch08
Step 3: Iteratively improve the model through feature engineering and data adjustments.
Entry: An initial model has been evaluated.
Exit: Model performance has plateaued or meets the required threshold.
- Deciding which features to engineer or which data issues to address next based on error analysis.
In: Trained model, Evaluation results, Prepared dataset · Out: Improved model, Updated performance metrics
ch07 · ch08
Step 4: Explore more complex models if necessary.
Entry: Performance of simpler models has been maximized but is still below requirements.
Exit: A final model that meets performance requirements is selected.
- Deciding if the performance gain from a complex model is worth the trade-offs in training time, cost, and maintainability.
In: Optimized simple model performance metrics, Project requirements · Out: Selected final model architecture
ch07 · ch08
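
A minimal sketch of Step 1, assuming a classification task and using a majority-class predictor as the non-ML baseline. The data and the threshold rule standing in for a trained model are made up for illustration.

```python
from collections import Counter


def majority_class_baseline(train_labels):
    """Return a predictor that always outputs the most frequent training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda _features: most_common_label


def accuracy(predict, features, labels):
    """Fraction of examples for which predict(features) equals the label."""
    correct = sum(predict(x) == y for x, y in zip(features, labels))
    return correct / len(labels)


train_labels = [0, 0, 0, 1, 0, 1, 0, 0]
test_features = [0.1, 0.9, 0.2, 0.8, 0.3]
test_labels = [0, 1, 0, 1, 0]

baseline = majority_class_baseline(train_labels)


def threshold_model(x):
    # Hypothetical stand-in for a trained model.
    return int(x > 0.5)


baseline_acc = accuracy(baseline, test_features, test_labels)
model_acc = accuracy(threshold_model, test_features, test_labels)
```

A candidate model is only interesting to the extent that `model_acc` clears `baseline_acc` by a margin that justifies its added cost.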

Experiment Tracking and Versioning

To systematically log, organize, and version all components of machine learning experiments to ensure reproducibility, facilitate comparison, and track model lineage.

When to use: Throughout the model development lifecycle, whenever a new model is trained or an existing one is modified.

Step 1: Implement a tool for tracking experiments.
Entry: The model development phase is beginning.
Exit: An experiment tracking system is in place and accessible to the team.
- Choosing between an open-source, commercial, or in-house tracking tool.
In: Team requirements for experiment tracking · Out: Configured experiment tracking platform
ch07
Step 2: Define and log experiment parameters and configurations.
Entry: A new experiment is being run.
Exit: All configuration parameters for the run are logged.
In: Model configuration, Hyperparameters · Out: Logged parameters
ch07
Step 3: Version control code and datasets.
Entry: A new experiment is being run.
Exit: The code and data versions used in the run are recorded.
In: Training script, Dataset · Out: Git commit hash, Data version identifier
ch07
Step 4: Log performance metrics and model artifacts.
Entry: A model is being trained and evaluated.
Exit: All metrics and artifacts from the run are logged and stored.
In: Model performance during training, Final evaluation metrics, Trained model file · Out: Logged metrics, Stored model artifacts
ch07
Step 5: Review and compare logged experiments to make decisions.
Entry: Multiple experiments have been logged.
Exit: An informed decision is made based on a comparison of experiment results.
- Deciding which model configuration performs best and should be pursued further.
In: Comprehensive logs of multiple experiments · Out: Analysis of model performance, Decision on next steps
ch07
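
The five steps amount to recording, per run, everything needed to reproduce and compare it. The sketch below is an in-memory illustration, not the interface of any tracking tool; `log_run`, `best_run`, and the field names are hypothetical, and `code_version` stands in for a Git commit hash.

```python
import hashlib
import json


def log_run(store, params, metrics, code_version, data_version):
    """Append one experiment record and return its run id.

    The run id is a hash of everything that defines the run, so two runs
    with identical parameters, code, and data share the same id.
    """
    definition = {
        "params": params,
        "code_version": code_version,
        "data_version": data_version,
    }
    digest = hashlib.sha256(json.dumps(definition, sort_keys=True).encode())
    run_id = digest.hexdigest()[:12]
    store.append({"run_id": run_id, **definition, "metrics": metrics})
    return run_id


def best_run(store, metric):
    """Return the logged run with the highest value of the given metric."""
    return max(store, key=lambda run: run["metrics"][metric])


runs = []
log_run(runs, {"lr": 0.1, "depth": 3}, {"val_f1": 0.71},
        code_version="abc123", data_version="v1")
log_run(runs, {"lr": 0.01, "depth": 5}, {"val_f1": 0.78},
        code_version="abc123", data_version="v1")
winner = best_run(runs, "val_f1")
```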

Hyperparameter Tuning and Architecture Search

To systematically and automatically find the optimal set of hyperparameters or model architecture components to maximize model performance.

When to use: When seeking to extract maximum performance from a chosen model type, typically after initial iterative improvements have plateaued.

Step 1: Define the search space.
Entry: A baseline model architecture is selected.
Exit: A well-defined search space is configured.
- Deciding which hyperparameters are most important to tune.
- Setting reasonable ranges for each hyperparameter.
In: Model architecture · Out: Defined search space
ch08
Step 2: Select a search strategy and performance estimation strategy.
Entry: The search space is defined.
Exit: A search strategy and evaluation plan are chosen.
- Choosing the search strategy based on the size of the search space and available computational budget (Random Search is often a good default).
In: Defined search space, Computational budget · Out: Selected search strategy
ch08
Step 3: Execute the search process.
Entry: Search strategy and evaluation plan are in place.
Exit: The search process has completed, and performance for many configurations has been logged.
In: Training and validation datasets, Search strategy · Out: Performance results for each evaluated configuration
ch08
Step 4: Select the best configuration and report final performance.
Entry: The search process is complete.
Exit: The best hyperparameter set is identified and final model performance is reported.
In: Performance results for all configurations, Test dataset · Out: Optimized set of hyperparameters or architecture, Final model performance metrics on the test set
ch08
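
A minimal random-search sketch covering Steps 1 to 3, assuming every hyperparameter is continuous and sampled uniformly from a `(low, high)` range. The objective is a made-up function standing in for "train a model and return its validation score"; the hyperparameter names are illustrative.

```python
import random


def random_search(objective, search_space, n_trials, seed=0):
    """Sample n_trials configurations uniformly and keep the best-scoring one."""
    rng = random.Random(seed)  # seeded so the search is reproducible
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {
            name: rng.uniform(low, high)
            for name, (low, high) in search_space.items()
        }
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def validation_score(config):
    # Hypothetical objective that peaks at learning_rate=0.1, dropout=0.3.
    return -((config["learning_rate"] - 0.1) ** 2 + (config["dropout"] - 0.3) ** 2)


space = {"learning_rate": (0.0, 1.0), "dropout": (0.0, 0.9)}
best_config, best_score = random_search(validation_score, space, n_trials=200)
```

The search scores configurations on validation data only; per Step 4, the test set is touched once, for the final report on the selected configuration.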

Ensemble Methods Implementation

To combine the predictions of multiple individual machine learning models (base learners) to produce a single, often more accurate and robust, prediction.

When to use: As an advanced technique during model development, typically after optimizing a single model, to further boost performance.

Step 1: Select a set of diverse base learners.
Entry: A dataset is prepared for training.
Exit: A set of base model types is chosen.
- Deciding on the type and number of base learners to include.
In: Knowledge of various ML algorithms · Out: A list of selected base learners
ch07
Step 2: Train each base learner.
Entry: Base learners are selected.
Exit: All base learners are trained.
In: Training dataset · Out: A set of trained base models
ch07
Step 3: Define and implement a method for combining predictions.
Entry: Base learners are trained.
Exit: A prediction combination mechanism is implemented.
- Choosing the method for combining predictions.
In: Predictions from each base learner · Out: Final ensemble prediction
ch07
Step 4: Evaluate the performance of the ensemble model.
Entry: The ensemble model is fully defined.
Exit: The ensemble model's performance is measured and documented.
In: Validation or test dataset · Out: Ensemble model performance metrics
ch07
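
One simple choice for Step 3 is majority voting. The sketch below assumes three already-trained binary classifiers whose predictions are given as lists; the prediction values are made up so that each model errs on a different example.

```python
from collections import Counter


def majority_vote(base_predictions):
    """Combine predictions; base_predictions[i][j] is model i's label for example j."""
    combined = []
    for votes in zip(*base_predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


labels = [1, 0, 1, 1, 0]
model_a = [1, 0, 0, 1, 0]  # wrong on example 2
model_b = [1, 1, 1, 1, 0]  # wrong on example 1
model_c = [0, 0, 1, 1, 1]  # wrong on examples 0 and 4

ensemble = majority_vote([model_a, model_b, model_c])
ensemble_acc = accuracy(ensemble, labels)
```

The ensemble beats every base learner here only because their errors do not overlap, which is why Step 1 asks for diverse learners. With an even number of voters a tie-breaking rule would also be needed.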

Comprehensive Model Evaluation

To rigorously assess a trained model's performance, robustness, fairness, and reliability beyond simple accuracy metrics, ensuring it is ready for production.

When to use: After a model has been trained and tuned, and before it is considered for deployment.

Step 1: Define baseline metrics for comparison.
Entry: A trained model is ready for evaluation.
Exit: A set of baseline performance scores is available.
In: Validation/test dataset · Out: Baseline performance metrics
ch08
Step 2: Perform slice-based evaluation to check for fairness and hidden biases.
Entry: Critical data slices have been identified.
Exit: A report detailing model performance across all critical slices.
- Determining which data slices are most critical to analyze based on domain knowledge and fairness considerations.
In: Validation/test dataset, Defined data slices · Out: Slice-based performance reports
ch08
Step 3: Conduct behavioral tests for robustness and correctness.
Entry: A trained model is available.
Exit: The model's behavior under various tests is documented.
In: Trained model, Test cases for behavioral testing · Out: Behavioral testing results
ch08
Step 4: Ensure model calibration.
Entry: The model produces probabilistic outputs.
Exit: The model's calibration is measured and, if necessary, corrected.
In: Model predictions with probabilities · Out: Calibration plot, Calibrated model
ch08
Step 5: Establish confidence thresholds for predictions.
Entry: The model's performance and calibration have been analyzed.
Exit: A confidence threshold and a policy for low-confidence predictions are defined.
- Selecting the appropriate confidence threshold based on the trade-off between coverage and precision.
In: Model predictions with confidence scores · Out: Defined confidence threshold
ch08
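
Step 2 can be sketched as computing the same metric per slice and comparing it with the overall figure. The slice key `segment` and the records are hypothetical; they are chosen so that a healthy-looking overall accuracy hides a weak slice.

```python
from collections import defaultdict


def accuracy_by_slice(records, slice_key):
    """Return accuracy computed separately for each value of slice_key."""
    totals, correct = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r[slice_key]] += 1
        correct[r[slice_key]] += int(r["prediction"] == r["label"])
    return {s: correct[s] / totals[s] for s in totals}


records = [
    {"segment": "desktop", "label": 1, "prediction": 1},
    {"segment": "desktop", "label": 0, "prediction": 0},
    {"segment": "desktop", "label": 1, "prediction": 1},
    {"segment": "desktop", "label": 0, "prediction": 0},
    {"segment": "mobile", "label": 1, "prediction": 0},
    {"segment": "mobile", "label": 0, "prediction": 0},
]

overall = sum(r["prediction"] == r["label"] for r in records) / len(records)
per_slice = accuracy_by_slice(records, "segment")
```

Overall accuracy is above 0.8 while the `mobile` slice sits at 0.5, the kind of gap an aggregate metric alone would not reveal.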

Model Deployment and Serving

To take a trained and validated machine learning model and make it operational and accessible in a production environment to serve predictions.

When to use: After a model has been comprehensively evaluated and approved for production use.

Step 1: Serialize and package the model.
Entry: A final, trained model has been selected.
Exit: A serialized model file is created.
- Choosing the serialization format based on the training framework and the deployment target's compatibility.
In: Trained machine learning model · Out: Serialized model file
ch09
Step 2: Choose a deployment strategy and environment.
Entry: The model is serialized.
Exit: A deployment strategy and target environment are chosen.
- Deciding between cloud and edge deployment based on latency, connectivity, and privacy requirements.
- Choosing between online and batch prediction based on the application's use case.
In: System requirements (e.g., latency, throughput) · Out: Deployment plan
ch09
Step 3: Create a prediction service with an API endpoint.
Entry: A deployment plan is in place.
Exit: A functional prediction service is running, accessible via an API.
In: Serialized model file · Out: A deployed prediction service with an accessible endpoint
ch09
Step 4: Set up initial monitoring for the deployed service.
Entry: The prediction service is deployed.
Exit: A dashboard or alerting system for key operational metrics is active.
In: Deployed prediction service · Out: Monitoring system for operational metrics
ch09
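
A minimal sketch of Steps 1 and 3, assuming the model is a linear scorer whose parameters fit in JSON. The payload layout, the `format_version` field, and both function names are hypothetical; a real service would wrap `predict` in an HTTP or RPC handler and use whatever serialization format its training framework supports.

```python
import json


def serialize_model(weights, bias):
    """Package model parameters as a portable JSON string."""
    return json.dumps({"format_version": 1, "weights": weights, "bias": bias})


def load_predictor(payload):
    """Rebuild a prediction function from a serialized payload."""
    spec = json.loads(payload)
    weights, bias = spec["weights"], spec["bias"]

    def predict(features):
        # Validate the request before scoring, as an endpoint would.
        if len(features) != len(weights):
            raise ValueError(f"expected {len(weights)} features")
        score = bias + sum(w * x for w, x in zip(weights, features))
        return {"score": score, "label": int(score > 0)}

    return predict


payload = serialize_model(weights=[0.5, -0.25], bias=0.1)
predict = load_predictor(payload)
response = predict([1.0, 2.0])
```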

Production Testing and Rollout

To safely test, validate, and gradually deploy new or updated machine learning models in a live production environment to minimize risk and measure real-world impact.

When to use: After a model has been deployed to production infrastructure but before it is fully serving traffic to all users.

Step 1: Implement shadow deployment for silent testing.
Entry: A new model is ready for production deployment.
Exit: The challenger model's performance on live traffic is analyzed and deemed satisfactory.
- Deciding on the duration of the shadow period.
- Defining the metrics for comparing the challenger and champion.
In: Deployed challenger model, Live production traffic · Out: Logs of challenger model predictions, Comparative performance analysis
ch12
Step 2: Conduct A/B testing for direct impact measurement.
Entry: The challenger model has passed shadow testing or risk is deemed low enough for a live test.
Exit: A statistically significant conclusion is reached about the new model's performance.
- Determining the percentage of traffic to include in the test.
- Choosing the primary business metric for evaluation.
In: Deployed challenger and champion models, Traffic routing mechanism · Out: A/B test results, Decision to roll out or roll back the new model
ch12
Step 3: Use canary releases for a gradual rollout.
Entry: The new model has been validated through shadow or A/B testing.
Exit: The new model is serving 100% of production traffic.
- Defining the stages of the gradual rollout (e.g., 1%, 5%, 20%, 50%, 100%).
- Setting the criteria for proceeding to the next stage.
In: Validated new model, Traffic routing mechanism · Out: Fully deployed new model
ch12
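
A sketch of the traffic-routing mechanism behind Step 3, assuming users are identified by a string id and assigned to a stable bucket from 0 to 99 by hashing. The function names, the health signal, and the rollback-to-zero policy are illustrative assumptions; the stage percentages mirror the example in the step.

```python
import hashlib


def route(user_id, canary_percent):
    """Deterministically send canary_percent of users to the new model."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "challenger" if bucket < canary_percent else "champion"


def next_stage(stages, current_percent, healthy):
    """Advance to the next rollout stage if healthy, otherwise roll back to 0%."""
    if not healthy:
        return 0
    later = [s for s in stages if s > current_percent]
    return later[0] if later else current_percent


stages = [1, 5, 20, 50, 100]
users = [f"user-{i}" for i in range(1000)]
share_at_20 = sum(route(u, 20) == "challenger" for u in users) / len(users)
```

Because the bucket depends only on the user id, a user who sees the new model at 5% keeps seeing it at 20%, which keeps the experience consistent as the rollout widens.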

Model Monitoring and Maintenance

To continuously monitor a deployed model's performance and the distribution of its input data in production, in order to detect degradation and trigger necessary maintenance actions like retraining.

When to use: This is an ongoing process that starts as soon as a model is deployed to production.

Step 1: Set up monitoring for operational and ML-specific metrics.
Entry: A model is deployed in production.
Exit: Dashboards and logging are in place for key metrics.
In: Live prediction requests and responses · Out: Time-series data of operational and ML metrics
ch10
Step 2: Detect data distribution shifts.
Entry: Sufficient production data has been collected to form a current distribution.
Exit: A statistical measure of the difference between current and baseline distributions is calculated.
- Choosing the appropriate statistical test for the data type and dimensionality.
- Setting a threshold for what constitutes a significant shift.
In: Current production data distribution, Historical baseline data distribution · Out: Data drift score or p-value
ch10
Step 3: Create an alerting system for significant shifts or performance degradation.
Entry: Monitoring and drift detection are in place.
Exit: An automated alerting system is active.
In: Monitoring metrics, Drift scores · Out: Alerts sent to stakeholders
ch10
Step 4: Trigger model retraining in response to degradation.
Entry: An alert for model degradation has been received and verified.
Exit: The model retraining process has been initiated.
- Choosing between full retraining and incremental fine-tuning.
- Selecting the appropriate dataset for retraining (e.g., only recent data, or a combination of old and new).
In: Degradation alert, Recent production data · Out: A decision to retrain the model
ch10
Step 5: Validate and deploy the retrained model.
Entry: A new model has been retrained.
Exit: The updated model is deployed in production and its performance is being monitored.
In: Retrained model, Validation dataset · Out: Deployed updated model
ch10
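
For a single numeric feature, one common choice for the drift score in Step 2 is the two-sample Kolmogorov-Smirnov statistic. The sketch below computes it in plain Python; the alert threshold of 0.2 is an arbitrary assumption for illustration, and in practice a library routine such as `scipy.stats.ks_2samp` would also supply a p-value.

```python
def ks_statistic(baseline, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    baseline, current = sorted(baseline), sorted(current)
    i = j = 0
    max_gap = 0.0
    while i < len(baseline) and j < len(current):
        x = min(baseline[i], current[j])
        # Advance both samples past x so each index counts values <= x.
        while i < len(baseline) and baseline[i] <= x:
            i += 1
        while j < len(current) and current[j] <= x:
            j += 1
        max_gap = max(max_gap, abs(i / len(baseline) - j / len(current)))
    return max_gap


DRIFT_THRESHOLD = 0.2  # assumed value; tune per feature and sample size

baseline = [i / 100 for i in range(100)]
shifted = [v + 0.3 for v in baseline]  # simulated shift in production data

drift_score = ks_statistic(baseline, shifted)
alert = drift_score > DRIFT_THRESHOLD
```

The same score computed on two samples from an unchanged distribution stays near zero, so the threshold separates ordinary sampling noise from a shift worth an alert.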

Continual Learning Implementation

To create a system where machine learning models are automatically and continuously updated in production to adapt to evolving data distributions, moving from manual updates to a fully automated process.

When to use: As an evolution of the model monitoring and maintenance process, for organizations seeking to automate the model update lifecycle.

Step 1: Start with manual, stateless retraining.
Entry: An initial version of the model is in production.
Exit: A manual process for retraining and redeploying the model exists.
In: Deployed model, Performance degradation reports · Out: Manually retrained and deployed model
ch12
Step 2: Automate the stateless retraining process.
Entry: The manual retraining process is well-defined.
Exit: An automated, scheduled pipeline for retraining the model is operational.
- Choosing the frequency for the scheduled retraining.
In: Retraining script, Data source · Out: Automatically retrained model
ch12
Step 3: Transition to automated, stateful training (fine-tuning).
Entry: An automated retraining pipeline exists.
Exit: The pipeline is updated to perform stateful fine-tuning.
- Choosing between continuing with stateful updates or reverting to a full retrain based on performance.
- Determining the optimal size of the micro-batches for fine-tuning.
In: Existing model checkpoint, Fresh data for fine-tuning · Out: Fine-tuned model
ch11 · ch12
Step 4: Implement a robust evaluation and promotion pipeline.
Entry: A challenger model has been produced via stateful training.
Exit: A decision is made to either promote or discard the challenger model.
- Deciding when to transition from challenger to champion based on evaluation results.
In: Challenger model, Champion model, Evaluation data and metrics · Out: Updated champion model in production
ch11 · ch12
Step 5: Evolve to true continual learning.
Entry: An automated, stateful training and evaluation pipeline is in place.
Exit: The model update lifecycle is fully automated and triggered by performance monitoring.
In: Monitoring alerts (e.g., data drift detected) · Out: A self-updating ML system
ch12
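
The promotion decision in Step 4 can be sketched as a gate that compares champion and challenger on the same evaluation data. The `min_gain` margin, the function name, and the toy threshold models are assumptions made for illustration.

```python
def promote_if_better(champion, challenger, evaluate, min_gain=0.01):
    """Return the model that should serve traffic after a side-by-side evaluation."""
    champion_score = evaluate(champion)
    challenger_score = evaluate(challenger)
    # Require a margin so noise alone does not trigger a swap.
    if challenger_score >= champion_score + min_gain:
        return challenger
    return champion


eval_data = [(0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1), (0.45, 1)]


def evaluate(model):
    """Accuracy of a model on the shared evaluation set."""
    return sum(model(x) == y for x, y in eval_data) / len(eval_data)


def champion(x):
    # Hypothetical model currently in production.
    return int(x > 0.5)


def challenger(x):
    # Hypothetical model produced by the latest fine-tuning run.
    return int(x > 0.42)


serving = promote_if_better(champion, challenger, evaluate)
```

Keeping the champion by default means a failed or marginal update leaves production unchanged.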

The story

The reader

ML engineers, data scientists, and engineering managers who want to build ML systems that actually work reliably in production (not just in notebooks) and who are frustrated by the gap between academic ML and real-world deployment complexity.

External problem

Their models perform well in development but degrade, fail silently, or cause harmful outcomes once deployed to real users at scale.

Internal problem

They feel overwhelmed, under-equipped, and uncertain about what to do next when production systems break in ways that unit tests and accuracy scores never revealed.

Philosophical problem

It is wrong for powerful ML systems to be deployed without the discipline, infrastructure, and ethical safeguards that protect users and society from their failures.

The plan

Establish business objectives and translate them into ML objectives and system requirements (reliability, scalability, maintainability, adaptability).
Frame the ML problem correctly: choose the right task type, objective function, and decoupled multi-objective structure.
Master data engineering fundamentals: data sources, formats, storage engines, and dataflow modes.
Create high-quality training data through principled sampling, labeling strategies (natural labels, weak supervision, active learning), and handling class imbalance and data augmentation.
Engineer features carefully: avoid data leakage, and prioritize feature importance and generalization.
Develop models iteratively, starting simple, using ensembles where justified, tracking experiments, and evaluating with baselines plus slice-based and calibration-aware methods.
Deploy models with an understanding of batch versus online prediction trade-offs, model compression, and edge-versus-cloud considerations.
Monitor production systems continuously for data distribution shifts, using statistical methods and time-windowed telemetry.
Implement continual learning infrastructure so models can be updated as frequently as business value requires.
Build or buy the right MLOps infrastructure—dev environment, resource management, model store, feature store—and embed responsible AI practices throughout.

Success

ML models remain accurate and reliable long after deployment, with degradation detected and corrected quickly.
Teams move from manual, months-long model update cycles to automated, data-driven retraining triggered by real performance signals.
Data scientists own the full ML lifecycle confidently, supported by infrastructure that abstracts away operational complexity.
ML systems earn the trust of users and society because they are fair, transparent, well-monitored, and built with responsible AI practices from day one.
Business stakeholders can clearly see how ML investments translate to measurable business outcomes.

At stake

Models deployed without monitoring degrade silently, eroding user trust and business value until a crisis forces an expensive rebuild.
Teams remain stuck in manual, ad hoc update cycles that cannot keep pace with shifting data distributions.
Biased or opaque ML systems cause harm to underrepresented users, triggering public backlash, regulatory action, and organizational damage.
Without proper infrastructure, ML projects remain one-off experiments that never reach the scale or reliability needed for real business impact.

Chapter by chapter

ch01 Overview of Machine Learning Systems
The chapter outlines the essential components and considerations for operationalizing machine learning systems, emphasizing the distinction between machine learning in research and production environments.
- Machine learning is not merely about algorithms; it encompasses a systematic approach that includes data, stakeholder engagement, and operational processes.
- Successful ML systems in production are characterized by constant adaptation to changing patterns and rigorous monitoring of model performance.
- Understanding the differences between ML in research and production is crucial for deploying ML effectively in a real-world context.
- Stakeholder alignment on project goals and requirements can significantly enhance the effectiveness of ML deployments.
ch02 Introduction to Machine Learning Systems Design
This chapter emphasizes the critical importance of aligning machine learning (ML) systems with business objectives, detailing essential requirements for their design and the iterative process required for successful implementation.
ch03 Data Engineering Fundamentals
This chapter introduces the foundational concepts of data engineering, emphasizing the intricacies of data sources, formats, models, and storage techniques crucial for building machine learning systems.
- The relationship between machine learning and big data is critical, demanding a solid understanding of data engineering basics for successful implementation.
- Recognizing the importance of formatting and structuring data enhances the efficiency of ML systems.
- Data models are integral as they dictate how information is organized, which directly impacts system performance and integrity.
- The ETL process remains central to effective data management, ensuring data is clean, relevant, and ready for analysis.
ch04 Data Engineering Fundamentals
This chapter contends that understanding how data passes between processes and architectures is essential for managing efficient data flow in modern applications, particularly those with real-time requirements.
ch05 Training Data
This chapter navigates the critical yet often overlooked realm of training data in machine learning, addressing essential techniques and challenges in obtaining and preparing data that significantly impact model performance.
- Training data is foundational for successful machine learning applications, warranting careful management and preparation.
- Sampling methods can introduce significant biases if not approached with a robust understanding of the underlying population.
- Label acquisition poses operational challenges that can be mitigated through innovative strategies like weak and semi-supervision.
- Class imbalance must be addressed proactively to ensure ML models are equitable and effective, particularly in critical applications like healthcare and finance.
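The class-imbalance point above can be made concrete with a small sketch. One common heuristic, assumed here for illustration rather than taken from the book, weights each class by n_samples / (n_classes * class_count), so that rare classes contribute as much to the loss as frequent ones. The function name balanced_class_weights and the 90/10 label split are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced dataset: 90 negatives, 10 positives.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)

# Each class now carries the same total weight (50 each in this example).
total_weight_per_class = {
    cls: weights[cls] * labels.count(cls) for cls in weights
}
```

Weights like these are typically passed to a loss function or a sampler; whether reweighting, resampling, or a different loss is the better choice depends on the task.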
ch06 Feature Engineering
This chapter argues that effective feature engineering is the cornerstone of successful machine learning models, often improving performance more than a switch to more advanced algorithms would.
ch07 Model Development and Offline Evaluation
This chapter navigates the complexities of selecting and evaluating machine learning (ML) models, emphasizing a systematic approach to development, performance evaluation, and iterative improvement before deployment.
- Machine learning model development is an iterative process that thrives on continuous evaluation and refinement.
- It is crucial to select models based on problem-specific requirements rather than the latest trends or perceived ‘state-of-the-art’ techniques.
- Employing ensemble methods can lead to significant performance improvements by leveraging the strengths of multiple models.
- Comprehensive experiment tracking and versioning practices are vital for reproducible results and effective team collaboration.
ch08 Model Development and Offline Evaluation
The chapter presents a comprehensive approach to developing machine learning models, emphasizing the crucial aspects of model evaluation, especially through offline methodologies, to ensure robust performance in real-world applications.
- Implementing a systematic approach to model development rooted in phases—ranging from basic heuristics to complex models—can fundamentally enhance performance outcomes.
- Baseline evaluations are essential to contextualizing model performance; without them, metrics lose their meaning and can lead to faulty conclusions.
- Incorporating evaluation methodologies such as perturbation and invariance tests helps in understanding how models may perform under various real-world conditions, including exposure to noise.
- Slice-based evaluation can expose biases that aggregate metrics hide, including reversals such as Simpson's paradox, and helps ensure that models serve all user segments fairly.
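To see why aggregate metrics alone can mislead, here is a minimal sketch of slice-based evaluation. The counts are illustrative (they follow the shape of the classic Simpson's paradox example, not results from the book), and the names results, overall_accuracy, and per_slice_accuracy are hypothetical. Model B looks better overall, yet model A is better on every slice, because the two models were evaluated on different mixes of easy and hard examples.

```python
# (correct, total) per slice; illustrative counts only.
results = {
    "model_a": {"slice_1": (81, 87), "slice_2": (192, 263)},
    "model_b": {"slice_1": (234, 270), "slice_2": (55, 80)},
}


def overall_accuracy(slices):
    """Accuracy pooled across all slices."""
    correct = sum(c for c, _ in slices.values())
    total = sum(t for _, t in slices.values())
    return correct / total


def per_slice_accuracy(slices):
    """Accuracy computed separately for each slice."""
    return {name: c / t for name, (c, t) in slices.items()}


overall = {model: overall_accuracy(s) for model, s in results.items()}
by_slice = {model: per_slice_accuracy(s) for model, s in results.items()}

for model in results:
    print(model, round(overall[model], 3),
          {k: round(v, 3) for k, v in by_slice[model].items()})
```

Reporting both views side by side is the point: a model chosen on the pooled number alone would be the worse one for every user segment here.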
ch09 Model Deployment and Prediction Service
Deploying machine learning models is a critical step that transforms theoretical constructs into accessible, real-time applications; this chapter dissects the nuances, challenges, and methodologies of effective deployment.
- Deployment is not just an afterthought; it represents a critical phase that determines the long-term success of machine learning initiatives.
- Continuous updates to ML models should be the norm, not the exception; as user data evolves, models should be updated to stay relevant.
- The choice between online and batch prediction carries significant implications for user experience and system architecture.
- Understanding environmental factors—whether cloud or edge—can drastically alter deployment outcomes and operational efficiency.
ch10 Data Distribution Shifts and Monitoring
The chapter investigates the critical issue of data distribution shifts in machine learning (ML) models, arguing that continuous monitoring and adaptation are essential to maintain model performance over time.
- ML models require continuous monitoring to maintain effectiveness post-deployment, as performance can degrade due to distribution shifts.
- Understanding the three types of data distribution shifts—covariate shift, label shift, and concept drift—is essential for anticipating model performance issues.
- ML systems also suffer ordinary software system failures, so traditional engineering practices are needed alongside ML-specific monitoring.
- Robust monitoring frameworks are essential for capturing both operational and ML-specific metrics to preemptively detect performance issues.
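As one illustration of a statistical check for covariate shift, the sketch below compares a reference window of a single numeric feature against a current window using a two-sample Kolmogorov-Smirnov statistic. This is an assumed choice of test, not necessarily the one a given team should use, and it only handles one-dimensional data. The names ks_statistic and shift_detected are hypothetical, and the threshold uses the standard large-sample approximation for the critical value.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    d = 0.0
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        d = max(d, abs(cdf_ref - cdf_cur))
    return d


def shift_detected(reference, current, alpha=0.05):
    """Flag a shift when the KS statistic exceeds the approximate critical value."""
    n, m = len(reference), len(current)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


# Synthetic windows: training-time feature values vs. a shifted production window.
random.seed(0)
reference_window = [random.gauss(0.0, 1.0) for _ in range(500)]
shifted_window = [random.gauss(1.0, 1.0) for _ in range(500)]

print(shift_detected(reference_window, shifted_window))
```

In practice such a check would run per feature over sliding time windows, and a flagged shift is a prompt to investigate rather than proof that model performance has degraded.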
ch11 Continual Learning and Test in Production
This chapter addresses the critical need for ongoing adaptation of machine learning models to data distribution shifts through continual learning, emphasizing the infrastructure required for efficient updates and the practice of testing models in production.
- Continual learning is fundamentally an infrastructural challenge that can enhance the adaptability of machine learning systems to data distribution shifts.
- Employing micro-batching and stateful training methodologies can yield significant improvements in model performance and resource efficiency.
- The champion-versus-challenger model strategy is crucial for safe deployments, reducing the risk of catastrophic failures in production environments.
- Flexibility in retraining schedules allows organizations to remain agile in response to changing data landscapes.
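The champion-versus-challenger idea above can be sketched as a simple promotion gate. This is a minimal illustration under stated assumptions: both models score the same requests, ground-truth labels eventually arrive, and accuracy is the metric that matters. The names accuracy and should_promote, the min_gain threshold, and the toy predictions are all hypothetical.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def should_promote(champion_preds, challenger_preds, labels, min_gain=0.01):
    """Promote only if the challenger beats the champion by at least min_gain."""
    gain = accuracy(challenger_preds, labels) - accuracy(champion_preds, labels)
    return gain >= min_gain


# Toy traffic sample scored by both models.
labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
champion_preds = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]    # 7 of 10 correct
challenger_preds = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # 9 of 10 correct

promote = should_promote(champion_preds, challenger_preds, labels)
```

A real gate would also account for sample size and statistical significance before replacing the champion; the fixed margin here only stands in for that decision rule.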
ch12 Continual Learning and Test in Production
This chapter argues for the imperative of continual learning in machine learning systems to ensure adaptability in rapidly shifting data environments, emphasizing practical challenges and strategies for implementation.
ch13 Infrastructure and Tooling for MLOps
Navigating the complexities of machine learning (ML) infrastructure is essential for practitioners to effectively implement ML systems and avoid stagnation due to inadequate tooling and support.
- Adequate infrastructure is a prerequisite for successful ML implementation; neglecting this leads to operational bottlenecks.
- Organizations vary in their infrastructure requirements; no one-size-fits-all solution exists.
- Standardized environments enhance productivity and reduce friction in development, facilitating smoother transitions to production.
- The cloud provides scalable solutions, but workload variations may prompt a return to on-prem infrastructure as organizations mature.
ch14 The Human Side of Machine Learning
This chapter emphasizes the critical role of human factors—user experience, team structure, and societal impacts—in the design and implementation of machine learning systems.
ch15 Epilogue
The epilogue reflects on the journey of learning and applying machine learning principles, emphasizing the potential for innovation while acknowledging the challenges that lie ahead.

Questions this book answers

When should you use ML and when should you not?
How do you translate business objectives into ML objectives?
How do you create high-quality, representative training data at scale?
How do you detect and handle data distribution shifts in production?
How often should you retrain your models, and how?

Related in the library

Tools these methods power