What is PeopleAnalyst?

PeopleAnalyst is the front door for people-analytics research: 205+ works indexed and profiled, 40+ citation-grade findings extracted, and peer-reviewed behavioral science translated from academic to actionable — the missing manual for the people analytics you always meant to do.

What is people analytics?

People analytics is not a dashboard. It is behavioral science and statistical inference applied to workforce decisions — a discipline with its own methodology, spanning measurement, organizational design, talent, leadership, and analytics craft.

Why does AI in HR need measurement science?

AI is being deployed in high-stakes people decisions — hiring, performance, attrition — without the measurement science to evaluate whether it works or whom it harms. Construct validity, effect sizes, and criterion validity are the vocabulary for asking an AI vendor the right questions.

How is the research made accessible?

The evidence is indexed and searchable: 205+ works, 40+ citation-grade insight cards, and 8 research arcs, so the right finding reaches the right decision at the right time.

What separates good people measurement from assertion?

Good measurement has a method: construct validity, reliability, and effect-size interpretation are not optional — they are what separates evidence from assertion.


Introduction to Survey Sampling (Quantitative Applications in the Social Sciences)

Quantitative Applications in the Social Sciences

In a sentence

A concise, practical guide to designing and analyzing probability sample surveys, balancing sampling theory with the real-world problems of frames, nonresponse, and complex designs.

Graham Kalton's Introduction to Survey Sampling distills the essential techniques of probability sampling into a highly readable text for researchers who use surveys but are not statisticians. Beginning with simple random sampling, it builds step by step through systematic sampling, stratification, clustering, multistage and probability-proportional-to-size designs, then confronts the messy realities of imperfect sampling frames, nonresponse, weighting, and the estimation of sampling errors from complex designs. Two worked examples (a national face-to-face survey and a telephone RDD survey) and a discussion of nonprobability and quota sampling show how the pieces combine in practice. The book teaches the reader to weigh precision against cost, to recognize when standard formulas mislead, and to anticipate the practical pitfalls that can ruin an otherwise well-conceived study.

The four lenses

Science
Statistics
Systems
Strategy

Define the Target Population

To establish a clear and precise definition of the population to be studied, ensuring survey results are relevant and accurate.

When to use: At the very beginning of any survey design process.

Step 1: Identify the overall objectives of the survey to guide population definition.
Entry: The research questions and goals are clearly formulated.
Exit: Survey objectives are documented and agreed upon.
In: Survey objectives · Out: A clear understanding of the survey's goals
ch01
Step 2: Define the ideal target population based on survey goals, specifying inclusion criteria.
Entry: Survey objectives are defined.
Exit: A definition of the ideal population is created.
In: Survey goals · Out: Ideal target population definition
ch01
Step 3: Assess practical constraints and modify the ideal population into a feasible survey population.
Entry: Ideal population is defined.
Exit: A practical survey population is defined.
- Which elements to exclude based on practical constraints.
In: Ideal target population definition, List of practical constraints · Out: Defined survey population
ch01
Step 4: Explicitly document all exclusions from the target population and assess their impact.
Entry: Survey population is defined.
Exit: A list of exclusions and their potential consequences is documented.
In: Survey population definition · Out: Documented exclusions
ch01

Determine Sample Size

To calculate the appropriate number of subjects to include in a survey to achieve a specified level of precision in the results.

When to use: During the survey planning phase, after defining the target population and before selecting the sample.

Step 1: Specify the required degree of precision for the survey estimators.
Entry: Research objectives are clear.
Exit: A target precision level is documented.
- Choosing the level of precision, which involves a trade-off with cost.
In: Research objectives · Out: Desired precision level (E)
ch11
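Step 2: Calculate the initial sample size (n') with the standard simple random sampling formula; for a percentage P estimated to within a confidence-interval half-width E, n' = z²P(100 − P)/E², where z is the normal deviate for the chosen confidence level (1.96 for 95%).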
Entry: Precision level is specified.
Exit: An initial sample size is calculated.
In: Desired precision level (E), Estimated population percentage (P) · Out: Initial sample size (n')
ch11
Step 3: If the sample is a large fraction of the total population, adjust the sample size with the finite population correction (FPC): n = n'/(1 + n'/N).
Entry: Initial sample size and total population size are known.
Exit: FPC-adjusted sample size is calculated.
In: Initial sample size (n'), Population size (N) · Out: Adjusted sample size
ch11
Step 4: Multiply the sample size by the design effect (deff) expected for the chosen sampling method.
Entry: A sampling method has been chosen.
Exit: Sample size is adjusted for the design effect.
In: Adjusted sample size, Predicted design effect · Out: Design-adjusted sample size
ch11
Step 5: Divide the sample size by the anticipated response rate so that the expected number of respondents still reaches the target.
Entry: An expected response rate is estimated.
Exit: Final target sample size is determined.
In: Design-adjusted sample size, Anticipated response rate · Out: Final sample size
ch11
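The five steps above chain into one calculation. The sketch below assumes the usual normal-approximation formula for estimating a percentage P under simple random sampling, with E read as the half-width of the confidence interval in percentage points; the function name and the planning figures in the example are hypothetical.

```python
import math


def required_sample_size(E, P, N=None, deff=1.0, response_rate=1.0, z=1.96):
    """Sample size needed to estimate a percentage P to within +/- E points.

    Assumes the normal-approximation SRS formula; N=None skips the
    finite population correction.
    """
    # Step 2: initial SRS size, n' = z^2 * P * (100 - P) / E^2
    n = z ** 2 * P * (100 - P) / E ** 2
    # Step 3: finite population correction, n = n' / (1 + n'/N)
    if N is not None:
        n = n / (1 + n / N)
    # Step 4: inflate by the design effect predicted for the chosen design
    n *= deff
    # Step 5: inflate so the expected number of respondents still meets the target
    n /= response_rate
    return math.ceil(n)


print(required_sample_size(5, 50))  # 385
print(required_sample_size(5, 50, N=2000, deff=1.5, response_rate=0.7))  # 691
```

P = 50 is the conservative guess because P(100 − P) is largest there.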

Select Sampling Methodology

To choose an appropriate method for selecting a sample from the population that aligns with the survey's objectives, budget, and requirements for generalizability.

When to use: In the survey planning phase, after defining the population and before creating the sampling frame.

Step 1: Determine whether to conduct a complete enumeration (census) or use sampling.
Entry: Target population is defined.
Exit: A decision to use sampling is made.
- Whether to use a census or a sample.
In: Defined target population, Available resources · Out: Decision to sample
ch01
Step 2: Choose between probability sampling and nonprobability sampling methods.
Entry: Decision to sample has been made.
Exit: A primary sampling paradigm (probability vs. nonprobability) is chosen.
- Choosing between probability and nonprobability methods.
In: Research objectives, Resource constraints · Out: Selected sampling paradigm
ch01
Step 3: If probability sampling is chosen, select a specific technique.
Entry: Probability sampling paradigm is chosen.
Exit: A specific probability sampling technique is selected.
- Selecting the appropriate probability sampling technique.
In: Population characteristics, Sampling frame details · Out: Chosen sampling technique
ch01

Create and Maintain the Sampling Frame

To develop and refine a comprehensive list or map of all elements in the survey population to ensure that every element has a chance of selection.

When to use: After defining the survey population and before drawing the sample.

Step 1: Obtain or create a list of all elements in the defined survey population.
Entry: Survey population is defined.
Exit: An initial sampling frame is procured or created.
In: Survey population definition, Existing population lists or databases · Out: Initial sampling frame
ch01
Step 2: Evaluate the frame for coverage errors, such as missing elements (incompleteness).
Entry: Initial frame exists.
Exit: The extent of missing elements is understood.
In: Initial sampling frame · Out: Assessment of frame completeness
ch01 · ch08
Step 3: Address missing elements by using supplementary frames or linking procedures.
Entry: Missing elements have been identified.
Exit: Frame coverage is improved.
- Deciding whether to redefine the population to exclude missing elements or to find supplementary frames.
In: Assessment of frame completeness, Supplementary lists · Out: An updated frame with better coverage
ch08
Step 4: Identify and resolve duplicate listings within the frame.
Entry: Frame coverage has been addressed.
Exit: Duplicate listings are removed or flagged.
- Choosing between removing duplicates and using a weighting adjustment.
In: Updated sampling frame · Out: A frame free of duplicates or with flagged duplicates
ch08
Step 5: Identify and manage listings that represent clusters of elements (e.g., households).
Entry: Frame is clean of duplicates.
Exit: A clear procedure for handling clusters is established.
- Choosing between taking every element in each selected cluster and subsampling within clusters.
In: Cleaned sampling frame · Out: Final, ready-to-use sampling frame
ch08
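Step 4 leaves a choice between removing duplicate listings and compensating for them in weighting. The sketch below shows both, under the simplifying assumption that the frame is a Python list of element identifiers with one entry per listing; the function names and IDs are illustrative. An element listed k times has k chances of selection, so a weighting factor of 1/k restores equal probabilities.

```python
from collections import Counter


def deduplicate(frame):
    """Option 1: keep only the first listing of each element (order preserved)."""
    return list(dict.fromkeys(frame))


def multiplicity_factors(frame, sampled_positions):
    """Option 2: keep duplicates, then weight each sampled listing by 1/k,
    where k is the number of times its element appears in the frame."""
    listings = Counter(frame)
    return [1 / listings[frame[pos]] for pos in sampled_positions]


frame = ["A", "B", "B", "C", "D", "D", "D"]  # hypothetical element IDs
print(deduplicate(frame))                      # ['A', 'B', 'C', 'D']
print(multiplicity_factors(frame, [0, 2, 5]))  # [1.0, 0.5, 0.3333333333333333]
```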

Apply Simple Random Sampling (SRS)

To select a sample from a population in such a way that every possible sample of a given size has an equal chance of being selected, which gives every element the same selection probability and yields unbiased estimators of population means and totals.

When to use: When a complete sampling frame is available and the population is relatively homogeneous.

Step 1: Assign a unique identification number to every element in the sampling frame.
Entry: A complete sampling frame exists.
Exit: All population elements are uniquely numbered.
In: Sampling frame · Out: Numbered list of population elements
ch02
Step 2: Determine the required sample size (n).
Entry: The sample size determination process is complete.
Exit: The final sample size is known.
In: Output from sample size calculation · Out: Sample size (n)
ch02
Step 3: Select n numbers using a random method, such as a random number table or computer generator.
Entry: Population is numbered and sample size is known.
Exit: A set of n random numbers is generated.
- Decide whether to sample with or without replacement (without is standard).
In: Numbered list of population elements, Sample size (n) · Out: List of selected random numbers
ch02
Step 4: Match the selected random numbers to the corresponding elements in the sampling frame to form the sample.
Entry: Random numbers have been selected.
Exit: The final sample of elements is identified.
In: List of selected random numbers, Numbered list of population elements · Out: Final sample
ch02
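A minimal Python sketch of the four SRS steps, assuming the frame fits in memory as a list. `random.Random.sample` draws without replacement, which matches the standard choice noted in Step 3. The frame of employee codes is hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw an SRS of n elements without replacement from a list frame."""
    rng = random.Random(seed)
    # Step 1: give every element a unique number 1..N
    numbered = dict(enumerate(frame, start=1))
    # Step 3: draw n distinct random numbers (sampling without replacement)
    chosen = rng.sample(range(1, len(frame) + 1), n)
    # Step 4: match the selected numbers back to frame elements
    return [numbered[i] for i in sorted(chosen)]


employees = [f"E{i:03d}" for i in range(1, 501)]  # hypothetical frame of 500
sample = simple_random_sample(employees, 20, seed=42)
print(sample)
```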

Apply Systematic Sampling

To efficiently select a representative sample from an ordered list by choosing elements at a regular interval.

When to use: As a simpler, more convenient alternative to simple random sampling when a complete list of the population is available.

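Step 1: Check the ordering of the sampling frame: a random order makes systematic sampling equivalent to simple random sampling, an order related to the survey variables gives implicit stratification, and any periodicity that coincides with the sampling interval must be avoided.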
Entry: A complete sampling frame exists.
Exit: The frame is confirmed to have no problematic periodicity.
In: Sampling frame · Out: An ordered sampling frame
ch03
Step 2: Calculate the sampling interval (k) by dividing the population size (N) by the desired sample size (n).
Entry: Population size (N) and sample size (n) are known.
Exit: The sampling interval (k) is determined.
- If k is not an integer, decide whether to round, treat the list as circular, or use a fractional interval.
In: Population size (N), Sample size (n) · Out: Sampling interval (k)
ch03
Step 3: Select a random starting point (r) between 1 and k.
Entry: Sampling interval (k) is determined.
Exit: A random start number is selected.
In: Sampling interval (k) · Out: Random start (r)
ch03
Step 4: Select the first element at the random starting point, and then select every kth element thereafter.
Entry: Random start and sampling interval are known.
Exit: The final sample is identified.
In: Ordered sampling frame, Random start (r), Sampling interval (k) · Out: Final sample
ch03
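The sketch below implements the fractional-interval option from Step 2, assuming a list frame that has already been checked for periodicity. Counting the random start in units of 1/n of an interval keeps the arithmetic in integers, so k = N/n never needs rounding and every element keeps selection probability n/N; when k is a whole number this reduces to a random start between 1 and k followed by every kth element. The function name is illustrative.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Systematic sample of n elements with interval k = N / n (may be fractional)."""
    N = len(frame)
    rng = random.Random(seed)
    # Random start r/n in [0, k), expressed as an integer r in 0..N-1
    r = rng.randrange(N)
    # The j-th selection is 0-based position floor(r/n + j*k) = (r + j*N) // n
    return [frame[(r + j * N) // n] for j in range(n)]


print(systematic_sample(list(range(1, 101)), 10, seed=7))  # interval k = 10
print(systematic_sample(list(range(1, 104)), 10, seed=7))  # interval k = 10.3
```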

Apply Stratified Sampling

To improve the precision of survey estimates and ensure representation of key subgroups by dividing the population into homogeneous strata and sampling from each.

When to use: When the population consists of subgroups with different characteristics and there is a need to ensure each subgroup is represented in the sample, or to improve overall precision.

Step 1: Identify relevant stratification variables and classify the population into distinct, non-overlapping strata.
Entry: Target population is defined and supplementary information is available.
Exit: The population is partitioned into strata.
In: Population data, Supplementary information (e.g., demographics) · Out: Defined strata
ch04
Step 2: Determine the sample size to be drawn from each stratum.
Entry: Strata are defined and overall sample size is known.
Exit: Sample sizes for each stratum are determined.
- Choosing between proportionate and disproportionate stratification.
In: Defined strata, Overall sample size · Out: Per-stratum sample sizes
ch04
Step 3: Draw a separate, independent sample from each stratum using a method like SRS or systematic sampling.
Entry: Per-stratum sample sizes are determined.
Exit: A sample is drawn from every stratum.
In: Stratified sampling frame, Per-stratum sample sizes · Out: Subsamples from each stratum
ch04
Step 4: Combine the stratum results, weighting each stratum by its share of the population (W_h = N_h/N), to produce an overall estimate for the population.
Entry: Data has been collected from all subsamples.
Exit: An overall population estimate is calculated.
In: Collected data from each stratum · Out: Overall population mean and variance estimates
ch04
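Steps 2 and 4 in code: proportionate allocation, and the standard stratified estimator with its variance, assuming SRS without replacement within each stratum. The stratum labels and values are made up for illustration, and rounded allocations should be checked to confirm they still sum to n.

```python
from statistics import mean, variance


def proportionate_allocation(stratum_sizes, n):
    """n_h = n * N_h / N, rounded (check that the results still sum to n)."""
    N = sum(stratum_sizes.values())
    return {h: round(n * N_h / N) for h, N_h in stratum_sizes.items()}


def stratified_mean(stratum_sizes, samples):
    """ybar_st = sum_h W_h * ybar_h, with
    v(ybar_st) = sum_h W_h^2 * (1 - f_h) * s_h^2 / n_h.
    Requires at least two sampled values in every stratum."""
    N = sum(stratum_sizes.values())
    estimate, var = 0.0, 0.0
    for h, ys in samples.items():
        W_h = stratum_sizes[h] / N        # stratum weight
        f_h = len(ys) / stratum_sizes[h]  # sampling fraction in stratum h
        estimate += W_h * mean(ys)
        var += W_h ** 2 * (1 - f_h) * variance(ys) / len(ys)
    return estimate, var


print(proportionate_allocation({"A": 600, "B": 300, "C": 100}, 100))  # {'A': 60, 'B': 30, 'C': 10}
print(stratified_mean({"A": 60, "B": 40}, {"A": [1, 2, 3], "B": [4, 6]}))
```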

Apply Cluster and Multistage Sampling

To reduce the cost and logistical complexity of sampling, especially for geographically dispersed populations, by sampling groups (clusters) of elements.

When to use: When the population is spread over a wide area, making it too expensive to sample individuals directly, or when the only available sampling frames are for clusters.

Step 1: Divide the population into non-overlapping clusters.
Entry: Target population is defined.
Exit: A frame of clusters is created.
In: Population data · Out: List of clusters
ch05
Step 2: Analyze cost factors and estimate the intraclass correlation (ρ) to determine the optimal number of clusters and subsample size.
Entry: Cost data and estimates of cluster homogeneity are available.
Exit: Optimal number of clusters and subsample size are determined.
- Adjusting the number of clusters versus the size of subsamples to balance cost and precision.
In: Cost model, Estimates of intraclass correlation (ρ) · Out: Optimal sampling configuration
ch05
Step 3: Select a sample of clusters using a probability sampling method like SRS or PPS.
Entry: A frame of clusters exists.
Exit: A sample of clusters is selected.
In: List of clusters · Out: Sample of clusters
ch05
Step 4: Decide between single-stage and multistage sampling.
Entry: Clusters have been sampled.
Exit: The number of stages for sampling is decided.
- Choosing between single-stage (take-all) and two-stage (subsampling) approaches.
In: Sample of clusters · Out: Sampling stage decision
ch05
Step 5: If using a multi-stage design, draw a sample of elements from within each selected cluster.
Entry: Two-stage sampling is chosen.
Exit: The final sample of elements is identified.
In: Sample of clusters · Out: Final sample
ch05
Step 6: Calculate the design effect to evaluate the efficiency of the cluster design.
Entry: Data has been collected.
Exit: The design effect is calculated.
In: Collected data, Intraclass correlation coefficient (ρ) · Out: Design effect value
ch05
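Steps 2 and 6 rest on two standard textbook approximations for equal-sized clusters: the design effect deff = 1 + (b − 1)ρ, and, under the cost model C = a·c1 + a·b·c2 (c1 per cluster, c2 per element), the optimum subsample size b = √((c1/c2)(1 − ρ)/ρ). The budget, costs, and ρ below are hypothetical planning figures.

```python
import math


def design_effect(b, rho):
    """Approximate design effect for clusters subsampled to b elements each."""
    return 1 + (b - 1) * rho


def optimal_subsample_size(cost_per_cluster, cost_per_element, rho):
    """b_opt = sqrt((c1 / c2) * (1 - rho) / rho) under C = a*c1 + a*b*c2."""
    return math.sqrt((cost_per_cluster / cost_per_element) * (1 - rho) / rho)


budget, c1, c2, rho = 10_000, 100, 4, 0.04       # hypothetical planning figures
b = round(optimal_subsample_size(c1, c2, rho))   # sqrt(600) is about 24.5, so 24
a = budget // (c1 + b * c2)                      # clusters affordable at that b
deff = design_effect(b, rho)
n_eff = a * b / deff                             # SRS-equivalent sample size
print(b, a, round(deff, 2), round(n_eff, 1))     # 24 51 1.92 637.5
```

A larger ρ pushes b_opt down: when clusters are internally similar, extra interviews in the same cluster add little information.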

Apply Probability Proportional to Size (PPS) Sampling

To select clusters (Primary Sampling Units or PSUs) in a way that gives larger clusters a higher chance of selection, which helps to stabilize sample sizes and improve efficiency in multistage designs.

When to use: Typically in the first stage of a multistage survey to select large units like counties or schools, before subsampling a fixed number of elements from each selected unit.

Step 1: Create a sampling frame of all clusters (PSUs) with a measure of size for each.
Entry: A list of all PSUs is available.
Exit: A frame of PSUs with their sizes is created.
In: List of PSUs, Cluster size data · Out: Sampling frame of PSUs with sizes
ch06
Step 2: Calculate the cumulative size for the list of PSUs.
Entry: Frame of PSUs with sizes exists.
Exit: A cumulative size range is assigned to each PSU.
In: Sampling frame of PSUs with sizes · Out: List of PSUs with cumulative size ranges
ch06
Step 3: Select the desired number of PSUs by drawing random numbers within the total cumulative size range.
Entry: Cumulative sizes are calculated.
Exit: A sample of PSUs is selected.
In: List of PSUs with cumulative size ranges, Number of PSUs to select · Out: Sample of PSUs
ch06
Step 4: For each selected PSU, sample a fixed number of elements in the second stage.
Entry: PSUs have been selected.
Exit: The final sample of elements is identified.
In: Sample of PSUs, Fixed number of elements to sample per PSU · Out: Final sample
ch06
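A sketch of Steps 2 and 3 using the cumulative-total method, drawing with replacement for simplicity (systematic PPS selection is a common alternative that avoids repeats). The PSU names and sizes are hypothetical. The second helper shows why Step 4 takes a fixed number of elements per PSU: the first-stage factor M_i/M and the second-stage factor b/M_i cancel, giving every element the same overall rate a·b/M.

```python
import bisect
import itertools
import random


def pps_with_replacement(psus, a, seed=None):
    """Cumulative-total PPS selection of a PSUs, with replacement.

    psus: list of (psu_id, measure_of_size). PSU i owns the integer range
    (cumulative[i-1], cumulative[i]], so each draw hits it with probability M_i / M.
    """
    rng = random.Random(seed)
    cumulative = list(itertools.accumulate(size for _, size in psus))
    total = cumulative[-1]
    picks = []
    for _ in range(a):
        draw = rng.randint(1, total)  # random number within the cumulative range
        picks.append(psus[bisect.bisect_left(cumulative, draw)][0])
    return picks


def overall_selection_rate(a, b, size_i, total):
    """Expected selections of PSU i (a * M_i / M) times the within-PSU
    fraction (b / M_i): equal to a * b / M for every element."""
    return (a * size_i / total) * (b / size_i)


schools = [("North", 400), ("South", 0), ("East", 250), ("West", 350)]  # hypothetical
print(pps_with_replacement(schools, 3, seed=1))
```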

Apply Two-Phase Sampling (Double Sampling)

To improve efficiency by collecting some information from a large initial sample and then collecting more detailed information from a smaller subsample.

When to use: To screen for a rare population or to gather information for stratification before drawing the main sample.

Step 1: Draw a large first-phase sample from the population.
Entry: Target population is defined.
Exit: A large initial sample is selected.
In: Sampling frame · Out: First-phase sample
ch07
Step 2: Collect basic, inexpensive information from the first-phase sample.
Entry: First-phase sample is selected.
Exit: Basic data is collected from the first-phase sample.
In: First-phase sample · Out: First-phase data
ch07
Step 3: Use the first-phase data to draw a second-phase subsample.
Entry: First-phase data is collected.
Exit: A second-phase subsample is selected.
- Determine the stratification and allocation for the second-phase sample.
In: First-phase data · Out: Second-phase subsample
ch07
Step 4: Collect detailed, expensive information from the second-phase subsample.
Entry: Second-phase subsample is selected.
Exit: Detailed data is collected.
In: Second-phase subsample · Out: Final detailed dataset
ch07
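When the first phase is used to build strata (for example, screening for a rare group), the usual estimator weights each second-phase stratum mean by that stratum's share of the first-phase sample. A minimal sketch with hypothetical data and an illustrative function name; it assumes every stratum found in the first phase has at least one second-phase measurement.

```python
from collections import Counter
from statistics import mean


def two_phase_mean(first_phase_strata, second_phase_values):
    """Double sampling for stratification: ybar = sum_h w_h * ybar_h, where
    w_h = n'_h / n' comes from the first-phase sample and ybar_h from the
    second-phase subsample in stratum h."""
    counts = Counter(first_phase_strata)
    n1 = len(first_phase_strata)
    return sum(counts[h] / n1 * mean(ys) for h, ys in second_phase_values.items())


# Hypothetical screener: 20 of 100 first-phase cases fall in the "rare" stratum
screen = ["rare"] * 20 + ["common"] * 80
detail = {"rare": [10, 20], "common": [1, 3]}  # second-phase measurements
print(two_phase_mean(screen, detail))  # 0.2 * 15 + 0.8 * 2, about 4.6
```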

Apply Replicated Sampling

To simplify the calculation of standard errors for complex sample designs and to measure sources of nonsampling error, such as interviewer variance.

When to use: When variance estimation is a major challenge or when there is a need to estimate the contribution of interviewers to total survey error.

Step 1: Design the total sample as a set of multiple, independent subsamples (replicates).
Entry: Overall sample design and size are determined.
Exit: A replicated sample design is specified.
In: Overall sample design · Out: Replicated sample design
ch07
Step 2: Select each of the replicate subsamples independently from the population.
Entry: Replicated design is specified.
Exit: All replicate subsamples are drawn.
In: Sampling frame · Out: A set of replicate subsamples
ch07
Step 3: Assign each replicate subsample to a different interviewer or team of interviewers.
Entry: Replicate subsamples are drawn.
Exit: Subsamples are assigned to interviewers.
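- Decide on the assignment strategy; replicates must be assigned to interviewers at random (interpenetration) if differences between replicates are to measure interviewer variance.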
In: Set of replicate subsamples, List of interviewers · Out: Interviewer assignments
ch07
Step 4: Calculate the estimate of interest separately for each replicate.
Entry: Data collection is complete.
Exit: An estimate is calculated for each replicate.
In: Collected data · Out: A set of replicate estimates
ch07
Step 5: Calculate the final estimate by averaging the replicate estimates, and calculate the standard error from the variability among the replicate estimates.
Entry: Replicate estimates are calculated.
Exit: Final estimate and its standard error are calculated.
In: A set of replicate estimates · Out: Final estimate, Standard error of the estimate
ch07
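Step 5 in code: with c independent replicates, the overall estimate is the mean of the replicate estimates, and its standard error is the standard deviation of those estimates divided by √c. The replicate values and the function name are hypothetical.

```python
import math
from statistics import mean


def replicated_estimate(replicate_estimates):
    """Mean of c replicate estimates and its standard error,
    se = sqrt(sum_g (y_g - ybar)^2 / (c * (c - 1)))."""
    c = len(replicate_estimates)
    if c < 2:
        raise ValueError("need at least two replicates")
    ybar = mean(replicate_estimates)
    ss = sum((y - ybar) ** 2 for y in replicate_estimates)
    return ybar, math.sqrt(ss / (c * (c - 1)))


# Hypothetical estimates from four replicates (one per interviewer team)
print(replicated_estimate([10, 12, 14, 16]))  # mean 13, se = sqrt(20 / 12)
```

The formula needs nothing about the design inside each replicate, which is why replication simplifies variance estimation for complex designs; the price is few degrees of freedom when c is small.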

Apply Panel Design Sampling

To collect data from the same sample of individuals or households at multiple points in time to measure change.

When to use: When the primary research goal is to study individual-level change (gross change) rather than just net change in the population.

Step 1: Select an initial probability sample to serve as the panel.
Entry: Research objectives require longitudinal data.
Exit: An initial panel sample is recruited.
In: Sampling frame · Out: Initial panel
ch07
Step 2: Collect data from the panel at the first time point (wave 1).
Entry: Panel is recruited.
Exit: Wave 1 data is collected.
In: Initial panel · Out: Wave 1 dataset
ch07
Step 3: Develop and implement a strategy for retaining panel members for subsequent waves.
Entry: Wave 1 is complete.
Exit: A panel retention strategy is in place.
Out: Panel retention plan
ch07
Step 4: Collect data from the same panel members at defined future intervals or events (waves 2, 3, etc.).
Entry: Panel retention plan is active.
Exit: Longitudinal data is collected over multiple waves.
In: Panel members · Out: Multi-wave panel dataset
ch07
Step 5: Implement a panel rotation strategy if needed to manage respondent fatigue and maintain representativeness.
Entry: The panel study is long-term.
Exit: Panel composition is refreshed over time.
- Evaluating the panel rotation strategy (e.g., what fraction to rotate).
ch07

Design a National Face-to-Face Interview Survey

To design a practical and representative stratified multistage area sampling framework for conducting national face-to-face interview surveys.

When to use: When high-quality, representative data from a national population is required and face-to-face interviews are the chosen mode of data collection.

Step 1: Define Primary Sampling Units (PSUs), typically as metropolitan areas or counties.
Entry: The need for a national area sample is established.
Exit: A list of all PSUs in the country is created.
In: Census data · Out: List of PSUs
ch12
Step 2: Identify the largest PSUs as self-representing and include them in the sample with certainty.
Entry: List of PSUs is created.
Exit: Self-representing PSUs are identified.
In: List of PSUs · Out: Certainty PSU selections
ch12
Step 3: Stratify the remaining non-self-representing PSUs based on geography and demographic factors.
Entry: Self-representing PSUs are removed from the frame.
Exit: A stratified frame of non-certainty PSUs is created.
In: List of non-certainty PSUs, Census data · Out: Stratified PSU frame
ch12
Step 4: Select one PSU from each stratum using Probability Proportional to Estimated Size (PPES).
Entry: PSUs are stratified.
Exit: A sample of non-certainty PSUs is selected.
In: Stratified PSU frame · Out: Sample of PSUs
ch12
Step 5: Within each selected PSU, select smaller clusters (e.g., city blocks) using PPES.
Entry: PSUs are selected.
Exit: A sample of clusters within each PSU is selected.
In: Sample of PSUs, Census block data · Out: Sample of clusters
ch12
Step 6: Use field staff to create a list of all housing units in the selected clusters (segments).
Entry: Clusters are selected.
Exit: A list of housing units in selected segments is created.
In: Sample of clusters · Out: Housing unit sampling frame
ch12
Step 7: Select a sample of housing units from the created list to be contacted for interviews.
Entry: Housing unit frame is created.
Exit: Final sample of housing units is selected.
In: Housing unit sampling frame · Out: Final sample
ch12

Design a Telephone Interview Survey

To design a sampling framework for telephone surveys that maximizes coverage of the telephone household population while managing inefficiencies.

When to use: When a rapid and cost-effective method of reaching a broad population is needed.

Step 1: Evaluate potential sampling frames, recognizing the limitations of telephone directories.
Entry: Telephone survey mode is chosen.
Exit: The decision is made to avoid directory-based frames.
- Choosing between directories and Random-Digit Dialing (RDD).
Out: Decision to use RDD
ch12
Step 2: Implement a Random-Digit Dialing (RDD) method to generate a sample of telephone numbers.
Entry: RDD method is chosen.
Exit: A list of potential telephone numbers is generated.
In: List of area/central office codes · Out: RDD sample of numbers
ch12
Step 3: Apply a two-stage method such as the Waksberg scheme to reduce the share of calls spent on nonworking and nonresidential numbers.
Entry: A basic RDD sample is generated.
Exit: A more efficient sample of telephone numbers is created.
In: RDD sample of numbers · Out: An efficiency-improved RDD sample
ch12
Step 4: Once a residential household is contacted, select an individual respondent.
Entry: A household has been successfully contacted.
Exit: A single respondent within the household is selected.
- Choosing the within-household respondent selection method.
In: List of household members · Out: Selected respondent
ch12
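A simplified simulation of the two-stage Waksberg (Mitofsky-Waksberg) scheme from Step 3. It assumes 100-banks identified by an eight-digit prefix and a caller-supplied `is_residential` function standing in for the outcome of actually dialing a number; both, and the rule that each bank is used at most once, are assumptions of this sketch rather than details from the text. A bank is kept only when its first random number reaches a residence, and each retained bank is then worked until k residential numbers are found.

```python
import random


def random_number_in_bank(bank, rng):
    """Complete a 100-bank prefix with two random final digits."""
    return f"{bank}{rng.randrange(100):02d}"


def mitofsky_waksberg(banks, n_banks, k, is_residential, seed=None, max_tries=10_000):
    """Return residential numbers from n_banks retained banks, k per bank
    (fewer if a bank runs out). is_residential is a hypothetical dialing oracle."""
    rng = random.Random(seed)
    retained, sample = set(), []
    tries = 0
    while len(retained) < n_banks:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not retain enough banks")
        bank = rng.choice(banks)
        if bank in retained:
            continue  # simplification: each bank is used at most once
        first = random_number_in_bank(bank, rng)
        if not is_residential(first):
            continue  # stage 1: reject the whole bank
        retained.add(bank)
        dialed, found = {first}, [first]
        while len(found) < k and len(dialed) < 100:  # stage 2
            number = random_number_in_bank(bank, rng)
            if number not in dialed:
                dialed.add(number)
                if is_residential(number):
                    found.append(number)
        sample.extend(found)
    return sample


def demo_is_residential(number):
    """Hypothetical oracle: numbers ending 00-29 are residential."""
    return int(number[-2:]) < 30


print(mitofsky_waksberg(["21255501", "21255502", "21255503"], 2, 5, demo_is_residential, seed=4))
```

Because a bank survives stage 1 with probability proportional to its residential share, and each survivor then contributes a fixed number of residences, the scheme works like PPS sampling of banks.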

Apply Haphazard or Convenience Sampling

To quickly and inexpensively gather data from readily available subjects.

When to use: When speed and low cost are the highest priorities and the risks of bias are acceptable.

Step 1: Identify a location or group of subjects that are easily accessible.
Entry: A non-probability approach has been deemed acceptable.
Exit: A source of convenient subjects is identified.
Out: Identified source of subjects
ch13
Step 2: Collect data from these subjects until the desired sample size is reached.
Entry: Source of subjects is identified.
Exit: Data collection is complete.
In: Readily accessible participants · Out: Collected data
ch13

Apply Judgment or Purposive Sampling

To select a sample that is deemed representative or particularly informative based on an expert's opinion.

When to use: When studying a very specific or hard-to-reach population, or when an expert's knowledge is trusted to select a representative or illustrative sample.

Step 1: Identify an expert or set of experts in the subject matter.
Entry: A non-probability approach has been deemed acceptable.
Exit: An expert is identified.
Out: Identified expert
ch13
Step 2: The expert selects subjects or cases that they believe are most representative or will provide the richest information.
Entry: An expert is identified.
Exit: A sample is selected.
- Different experts may select different samples, leading to different results.
In: Expertise in the subject, Criteria for representativeness · Out: A judgment-based sample
ch13

Apply Quota Sampling

To create a sample that mirrors the population's demographic proportions on a few key variables, without using probability sampling.

When to use: As a nonprobability alternative to stratified sampling, often used in commercial research for its speed and cost-effectiveness.

Step 1: Identify key demographic variables and determine their proportions in the target population.
Entry: A non-probability approach has been deemed acceptable.
Exit: Demographic quotas are defined.
In: Demographic data (e.g., from census) · Out: Defined quotas
ch13
Step 2: Assign interviewers specific quotas to fill for different types of respondents.
Entry: Quotas are defined.
Exit: Quotas are assigned to interviewers.
In: Defined quotas · Out: Interviewer assignments
ch13
Step 3: Interviewers find and survey people who fit their assigned quotas until they are filled.
Entry: Quotas are assigned.
Exit: All quotas are filled and data collection is complete.
- Interviewers choose how to fill quotas, which can introduce selection bias.
Out: A sample that matches the population on quota variables
ch13
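Steps 1 to 3 as a small sketch: converting population shares into integer quotas and letting an interviewer check whether a cell is still open. The largest-remainder rounding rule, the cell labels, and the function names are assumptions for illustration. Nothing in the code controls which people within a cell get interviewed, which is exactly where the selection bias noted in Step 3 enters.

```python
import math


def quota_targets(proportions, n):
    """Integer quotas proportional to population shares, summing to n.
    Uses largest-remainder rounding (an assumed rule, not the only option)."""
    raw = {cell: share * n for cell, share in proportions.items()}
    quotas = {cell: math.floor(x) for cell, x in raw.items()}
    shortfall = n - sum(quotas.values())
    # Hand the leftover interviews to the cells with the largest fractional parts
    by_remainder = sorted(raw, key=lambda cell: raw[cell] - quotas[cell], reverse=True)
    for cell in by_remainder[:shortfall]:
        quotas[cell] += 1
    return quotas


def try_accept(cell, quotas, filled):
    """Interviewer's check: take the respondent only if the cell is still open."""
    if filled.get(cell, 0) < quotas.get(cell, 0):
        filled[cell] = filled.get(cell, 0) + 1
        return True
    return False


quotas = quota_targets({"women 18-44": 0.5, "men 18-44": 0.25, "45+": 0.25}, 10)
print(quotas)
```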

Conduct Survey and Minimize Nonresponse

To execute the data collection plan while implementing strategies to maximize the response rate and reduce potential bias from nonresponse.

When to use: During the fieldwork or data collection phase of the survey.

Step 1: Rigorously train interviewers on engagement techniques and survey protocols.
Entry: Interviewers are hired and the survey instrument is final.
Exit: Interviewers are fully trained and ready for fieldwork.
In: Survey questionnaire, Interviewer manual · Out: Trained interviewers
ch09
Step 2: Make initial contact with sampled individuals, providing assurances of anonymity and confidentiality.
Entry: Sample has been drawn.
Exit: Initial contact attempts are made for the entire sample.
In: Final sample · Out: Initial contact outcomes (completed interview, refusal, not available)
ch09
Step 3: For sampled persons who are not available, schedule and make multiple callbacks at varied times and days.
Entry: Initial contact resulted in a 'not available' outcome.
Exit: A predefined number of callback attempts have been made.
- Deciding how many callbacks to make before classifying a case as a non-contact.
In: List of non-contacts · Out: Updated contact outcomes
ch09
Step 4: For mail surveys, implement follow-up procedures for non-respondents.
Entry: Initial deadline for mail survey return has passed.
Exit: Follow-up mailings are complete.
In: List of non-respondents · Out: Additional completed questionnaires
ch09
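Once callbacks and follow-ups are finished, the final outcome codes can be summarized as rates over eligible cases. The sketch assumes hypothetical disposition codes and treats every case as being of known eligibility; real outcome schemes usually also need a rule for cases whose eligibility was never established.

```python
from collections import Counter


def outcome_rates(outcomes):
    """Simple rates over eligible cases from final disposition codes.
    Codes are hypothetical; unknown-eligibility cases are not handled."""
    counts = Counter(outcomes)
    eligible = len(outcomes) - counts["ineligible"]
    return {
        "response_rate": counts["complete"] / eligible,
        "refusal_rate": counts["refusal"] / eligible,
        "noncontact_rate": counts["noncontact"] / eligible,
    }


cases = ["complete"] * 60 + ["refusal"] * 20 + ["noncontact"] * 20 + ["ineligible"] * 10
print(outcome_rates(cases))  # {'response_rate': 0.6, 'refusal_rate': 0.2, 'noncontact_rate': 0.2}
```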

Handle Item Nonresponse and Impute Data

To address missing data on individual survey questions (item nonresponse) so that analyses can use every responding case rather than only the complete cases, and to reduce potential bias.

When to use: During the data processing and analysis phase, after data collection is complete.

Step 1: Analyze the extent and pattern of item nonresponse.
Entry: Raw survey dataset is available.
Exit: A report on missing data patterns is created.
In: Collected survey data · Out: Missing data analysis
ch09
Step 2: Choose an appropriate method for handling the missing data.
Entry: Missing data has been analyzed.
Exit: An imputation strategy is chosen.
- Choice of imputation method (e.g., mean substitution, regression, hot deck).
In: Missing data analysis · Out: Selected imputation method
ch09
Step 3: Apply the chosen imputation method to estimate and fill in the missing values.
Entry: Imputation method is chosen.
Exit: Missing values in the dataset are replaced with imputed values.
In: Survey dataset with missing values · Out: A complete dataset with no missing values
ch09
Step 4: Create a flag variable in the dataset to indicate which values have been imputed.
Entry: Imputation is complete.
Exit: Imputed values are clearly flagged in the final dataset.
- Decision to either flag or leave imputed values unmarked.
In: Imputed dataset · Out: Final, flagged dataset
ch09
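A minimal random hot-deck sketch covering Steps 3 and 4: each missing value is filled from a randomly chosen respondent (donor) in the same imputation class, and a flag records which values were imputed. The record layout, class variable, and function name are hypothetical; in practice the classes are formed from variables related to the item being imputed.

```python
import random


def hot_deck(records, var, class_var, seed=None):
    """Random hot-deck imputation within classes, adding a '<var>_imputed' flag.
    Every class containing a missing value (None) needs at least one donor."""
    rng = random.Random(seed)
    donors = {}
    for rec in records:
        if rec[var] is not None:
            donors.setdefault(rec[class_var], []).append(rec[var])
    completed = []
    for rec in records:
        rec = dict(rec)  # leave the input records untouched
        missing = rec[var] is None
        rec[var + "_imputed"] = missing  # Step 4: flag imputed values
        if missing:
            rec[var] = rng.choice(donors[rec[class_var]])  # Step 3: borrow a donor value
        completed.append(rec)
    return completed


data = [  # hypothetical survey records
    {"dept": "sales", "tenure": 3},
    {"dept": "sales", "tenure": None},
    {"dept": "sales", "tenure": 5},
    {"dept": "ops", "tenure": 10},
    {"dept": "ops", "tenure": None},
]
out = hot_deck(data, "tenure", "dept", seed=0)
print(out)
```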

Apply Sample Weights

To adjust survey data to compensate for unequal selection probabilities, nonresponse, and noncoverage, ensuring that estimates are representative of the target population.

When to use: During the data analysis phase, before calculating population estimates.

Step 1: Calculate the base weight for each sampled element as the inverse of its selection probability (1/π), to account for unequal selection probabilities.
Entry: The selection probability for each sampled unit is known.
Exit: A base weight is calculated for each sampled element.
In: Selection probability (π) for each element · Out: Base weights
ch10
Step 2: Adjust the base weights for nonresponse.
Entry: Base weights are calculated.
Exit: Nonresponse-adjusted weights are calculated.
In: Base weights, Data on respondents and non-respondents · Out: Nonresponse-adjusted weights
ch10
Step 3: Apply poststratification or raking so that weighted sample totals match known population totals.
Entry: Nonresponse-adjusted weights are calculated.
Exit: Final weights are calculated.
In: Nonresponse-adjusted weights, Known population totals (e.g., from census) · Out: Final analysis weights
ch10
Step 4: Analyze the coefficient of variation of the final weights, since highly variable weights inflate the variance of estimates.
Entry: Final weights are calculated.
Exit: The variability of weights is assessed.
In: Final analysis weights · Out: Coefficient of variation of weights
ch10
Step 5: Use the final weights in all subsequent analyses to produce population estimates.
Entry: Final weights are created and checked.
Exit: Weighted analysis is performed.
In: Final analysis weights, Survey data · Out: Weighted population estimates
ch10
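Steps 1 to 4 as a sketch with made-up data: base weights 1/π, a weighting-class nonresponse adjustment, poststratification to known totals, and the coefficient of variation of the final weights. The cell definitions, population totals, and function names are hypothetical, and raking (iterative fitting to several margins) is not shown.

```python
from collections import defaultdict
from statistics import mean, pstdev


def nonresponse_adjust(base, responded, cells):
    """Within each cell, multiply respondents' weights by
    (sum of all sampled weights) / (sum of respondent weights); nonrespondents get 0."""
    total, resp = defaultdict(float), defaultdict(float)
    for w, r, c in zip(base, responded, cells):
        total[c] += w
        if r:
            resp[c] += w
    return [w * total[c] / resp[c] if r else 0.0 for w, r, c in zip(base, responded, cells)]


def poststratify(weights, strata, population_totals):
    """Scale weights so each poststratum's weighted count equals its known total.
    Every poststratum must contain at least one positive weight."""
    weighted = defaultdict(float)
    for w, s in zip(weights, strata):
        weighted[s] += w
    return [w * population_totals[s] / weighted[s] for w, s in zip(weights, strata)]


def weight_cv(weights):
    """CV of the positive weights; Kish's approximation puts the variance
    inflation due to weighting at about 1 + CV^2."""
    positive = [w for w in weights if w > 0]
    return pstdev(positive) / mean(positive)


base = [1 / p for p in [0.1, 0.1, 0.1, 0.05, 0.05]]   # Step 1: w = 1 / pi
responded = [True, True, False, True, False]
adjusted = nonresponse_adjust(base, responded, ["x", "x", "x", "y", "y"])  # Step 2
final = poststratify(adjusted, ["m", "f", "m", "f", "f"], {"m": 50, "f": 60})  # Step 3
print(adjusted, round(weight_cv(final), 3))           # Step 4
```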

Calculate Sampling Error for Complex Designs

To accurately estimate the sampling error (e.g., standard errors, confidence intervals) for survey estimates derived from complex sampling designs.

When to use: During the final analysis phase, when reporting survey estimates and their precision.

Step 1: Identify the specific features of the sample design that must be accounted for in variance estimation.
Entry: Survey data and sample design information are available.
Exit: Key design features are documented for analysis.
In: Sample design documentation · Out: List of relevant design features
ch10
Step 2: Choose an appropriate method for variance estimation.
Entry: Design features are identified.
Exit: A variance estimation method is selected.
- Choosing between linearization and replication methods based on the design and available software.
In: List of relevant design features · Out: Selected variance estimation method
ch10
Step 3: Use specialized survey analysis software to apply the chosen method.
Entry: Variance estimation method is selected.
Exit: Sampling errors are calculated.
In: Survey dataset with weights, Stratum and cluster identifiers · Out: Estimated sampling errors (standard errors, confidence intervals)
ch10
Step 4: Report the estimates along with their corresponding standard errors or confidence intervals.
Entry: Sampling errors are calculated.
Exit: Final results are reported with measures of precision.
In: Survey estimates, Calculated sampling errors · Out: Final survey report
ch10
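To show what the software does in Steps 2 and 3, here is a minimal Python sketch of Taylor-series linearization, one of the two method families named above, for a weighted mean under a stratified multistage design. It assumes the common ultimate-cluster approximation (PSUs treated as drawn with replacement within strata, at least two PSUs per stratum); `linearized_se` is a hypothetical name, and real analyses should still use dedicated survey software. A replication method (jackknife or balanced repeated replication) would instead recompute the estimate over replicate weights.

```python
from collections import defaultdict

def linearized_se(y, w, stratum, psu):
    """Weighted mean r = sum(w*y)/sum(w) and its linearization standard error."""
    wsum = sum(w)
    r = sum(wi * yi for wi, yi in zip(w, y)) / wsum
    # Linearized value of each element, summed to PSU totals within strata.
    z = defaultdict(float)
    for yi, wi, h, i in zip(y, w, stratum, psu):
        z[(h, i)] += wi * (yi - r) / wsum
    by_stratum = defaultdict(list)
    for (h, _), total in z.items():
        by_stratum[h].append(total)
    var = 0.0
    for totals in by_stratum.values():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        mean_h = sum(totals) / n_h
        # Between-PSU variation within the stratum.
        var += n_h / (n_h - 1) * sum((t - mean_h) ** 2 for t in totals)
    return r, var ** 0.5

# Usage with invented data: two strata, two PSUs per stratum.
mean, se = linearized_se(y=[1, 0, 1, 1, 0, 0], w=[1] * 6,
                         stratum=["A", "A", "A", "B", "B", "B"],
                         psu=[1, 1, 2, 3, 4, 4])
print(mean, se, (mean - 1.96 * se, mean + 1.96 * se))
```

Only PSU totals enter the variance, which is why Step 3 needs the stratum and cluster identifiers and not just the weights.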

A candidate measure

Introduction to Survey Sampling (Quantitative Applications in the Social Sciences) — derived measurement candidates

Probability Design Choice

design type classification; selection equation parameters

self-report suitability: low

Stratification

number of strata; sampling fraction by stratum

self-report suitability: none

Clustering / Multistage Structure

cluster size; subsample size b; number of stages

self-report suitability: none

Sampling Frame Quality

coverage rate; blank rate; duplicate rate

self-report suitability: none

Response Rate

completed/eligible ratio; refusal proportion; contact attempts

self-report suitability: low

Equality of Selection Probabilities

coefficient of variation of weights; weight range

self-report suitability: none

Design Effect

v(z)/v(z₀) ratio; intraclass correlation ρ

self-report suitability: none

Sample Size

n records; n by domain

self-report suitability: none

Estimate Precision

standard error; confidence interval width; coefficient of variation

self-report suitability: none

Estimate Bias

W_m × (Ȳ_r − Ȳ_m), the nonresponse rate times the respondent/nonrespondent difference; benchmark comparison

self-report suitability: none

Survey Cost

cost per cluster; cost per element; total budget

self-report suitability: low


The story

The reader: A social-science researcher or survey practitioner who wants to draw valid, efficient samples and produce trustworthy population estimates.

External problem

Designing a sample that yields precise, unbiased estimates within budget while coping with imperfect frames and nonresponse.

Internal problem

Feeling that sampling is an intimidating technical black box best left to statisticians.

Philosophical problem

It is wrong to let a poorly designed sample undermine otherwise careful research; researchers should understand the foundation of their evidence.

The plan

Define the target and survey populations carefully.
Choose an appropriate probability design (SRS, systematic, stratified, cluster, multistage, PPS).
Build and assess the sampling frame, handling missing, clustered, blank, and duplicate listings.
Minimize and compensate for nonresponse.
Apply weights and compute sampling errors appropriate to the design.
Determine sample size from precision, design effect, and nonresponse, balancing cost.

Success

Surveys produce defensible, precise estimates with quantified uncertainty.
The researcher confidently navigates frame and nonresponse problems and complex designs.
Resources are used efficiently, matching precision to need and budget.

At stake

Selection bias and frame errors render results untrustworthy.
Sampling errors are misstated, overstating precision.
Time and money are wasted on a sample whose results cannot support valid inference.

Chapter by chapter

ch01 Introduction
Surveys have become an essential tool for collecting data across various fields, yet the methodology behind sample surveys is historically recent and complex, requiring careful design to ensure valid results.
- Sample surveys are vital tools that provide essential data across multiple disciplines, yet they require rigorous design methods to ensure accuracy.
- Defining the target population is a complex but necessary step that greatly influences survey outcomes.
- A well-executed sample can produce higher-quality data than a complete enumeration of the population, because resources can be concentrated on collecting the data well.
- Probability sampling methods offer theoretical rigor, which makes them preferable to nonprobability methods that rely on subjective judgments.
ch02 Simple Random Sampling
Simple random sampling (SRS) is the foundation of probability sampling: every possible sample of n elements has the same chance of selection, which makes SRS the benchmark against which other designs are judged.
- Simple random sampling ensures all elements have equal selection probabilities, critical for unbiased data collection.
- The impracticality of the lottery method highlights the necessity for more systematic approaches, such as random number tables.
- Sampling without replacement yields more precise estimates than sampling with replacement, which makes it the preferred method.
- An estimator is judged by its bias and its variance, which together determine how reliable the conclusions drawn from sampled data are.
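The precision advantage of sampling without replacement comes from the finite population correction (1 − f), where f = n/N is the sampling fraction. A minimal sketch of the SRS mean, its standard error, and a confidence interval, assuming a 95% normal-approximation interval (z = 1.96); `srs_mean_ci` is a hypothetical helper.

```python
import math
import statistics

def srs_mean_ci(sample, N, z=1.96):
    """Mean, standard error, and confidence interval for an SRS drawn without replacement."""
    n = len(sample)
    ybar = statistics.fmean(sample)
    s2 = statistics.variance(sample)   # element variance, divisor n - 1
    f = n / N                          # sampling fraction
    var = (1 - f) * s2 / n             # (1 - f) is the finite population correction;
                                       # it is dropped for sampling with replacement
    se = math.sqrt(var)
    return ybar, se, (ybar - z * se, ybar + z * se)

print(srs_mean_ci([2, 4, 4, 4, 5, 5, 7, 9], N=80))
```

With f = 0.1 the correction shrinks the variance by 10%; when N is large relative to n it is close to 1 and can usually be ignored.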
ch03 Systematic Sampling
This chapter examines the efficiencies and drawbacks of systematic sampling, which simplifies selection by taking every kth element after a random start.
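A minimal sketch of systematic selection, assuming the population list is already in frame order. When k = N/n is not a whole number, it uses one fractional-interval approach (an assumption here; other treatments exist), implemented with integer arithmetic so that every element keeps selection probability exactly n/N. `systematic_sample` is a hypothetical name.

```python
import random

def systematic_sample(population, n, rng=None):
    """Every k-th element after a random start, with k = N / n possibly fractional."""
    rng = rng or random.Random()
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    r = rng.randrange(N)  # random start, measured in units of 1/n of an element
    # Position j is floor((r + j*N) / n); each element is hit for exactly n of the N starts.
    return [population[(r + j * N) // n] for j in range(n)]

print(systematic_sample(list(range(100)), 10, random.Random(1)))
```

When k is an integer the positions reduce to the familiar r, r + k, r + 2k, and so on, with the start r drawn from the first k elements.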
ch04 Stratification
Stratification divides the population into distinct strata based on relevant characteristics and samples each stratum separately, which lets researchers control the allocation of the sample and improve the precision of their estimates.
- Stratification allows for tailored sample designs that improve precision and representativity in survey sampling.
- Proportionate stratification ensures that estimates from the resulting sample are, in practice, no less precise than those from a simple random sample of the same size.
- Disproportionate stratification can improve precision when element variances or costs differ across strata, and it can yield adequate sample sizes for small subpopulations.
- The effective use of stratification contributes to the ethical obligation of researchers to deliver accurate and fair representation of diverse populations.
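Allocation is where proportionate and disproportionate stratification differ. This sketch compares proportionate allocation (n_h ∝ N_h) with Neyman allocation (n_h ∝ N_h S_h, the optimum when per-element costs are equal) using the standard variance of the stratified mean, Σ W_h² (1 − f_h) S_h² / n_h with W_h = N_h/N. The function names and the stratum sizes and standard deviations in the usage lines are invented for illustration; in practice S_h are advance estimates and each n_h must be rounded.

```python
def allocate(N_h, S_h, n):
    """Proportionate and Neyman allocations of a total sample n across strata."""
    N = sum(N_h)
    prop = [n * Nh / N for Nh in N_h]
    total_ns = sum(Nh * Sh for Nh, Sh in zip(N_h, S_h))
    neyman = [n * Nh * Sh / total_ns for Nh, Sh in zip(N_h, S_h)]
    return prop, neyman

def stratified_var(N_h, S_h, n_h):
    """Variance of the stratified mean: sum of W_h^2 (1 - f_h) S_h^2 / n_h."""
    N = sum(N_h)
    return sum((Nh / N) ** 2 * (1 - nh / Nh) * Sh ** 2 / nh
               for Nh, Sh, nh in zip(N_h, S_h, n_h))

N_h, S_h = [6000, 3000, 1000], [10, 20, 40]
prop, neyman = allocate(N_h, S_h, 500)
print(prop, stratified_var(N_h, S_h, prop))
print(neyman, stratified_var(N_h, S_h, neyman))
```

Because the small third stratum is the most variable, Neyman allocation moves sample into it and lowers the variance from about 0.646 to about 0.478.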
ch05 Cluster and Multistage Sampling
This chapter explores cluster and multistage sampling techniques as alternatives to simple random sampling, addressing their distinct purposes and the trade-offs related to precision and cost-effectiveness.
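The precision cost of clustering is usually summarized by the design effect. A minimal sketch, assuming equal-sized clusters with b sampled elements each, of the standard approximation Deff ≈ 1 + (b − 1)ρ, the resulting effective sample size, and Kish's well-known approximation for the cost-optimal b given a cost per cluster and a cost per element. Function names and example numbers are hypothetical.

```python
import math

def design_effect(b, rho):
    """Approximate design effect of a cluster sample taking b elements per cluster."""
    return 1 + (b - 1) * rho

def effective_n(n, b, rho):
    """SRS sample size that gives the same precision as n clustered elements."""
    return n / design_effect(b, rho)

def optimal_b(cost_per_cluster, cost_per_element, rho):
    """Kish's approximation to the cost-optimal number of elements per cluster."""
    return math.sqrt((cost_per_cluster / cost_per_element) * (1 - rho) / rho)

print(design_effect(20, 0.05), effective_n(1000, 20, 0.05), optimal_b(200, 10, 0.05))
```

Even a small intraclass correlation matters: with ρ = 0.05 and 20 elements per cluster, 1,000 interviews carry roughly the information of 513 SRS interviews.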
ch06 Probability Proportional to Size Sampling
This chapter explores the complexities and methodologies of Probability Proportional to Size (PPS) sampling, emphasizing the importance of accurate size measures in sampling designs that account for varying group sizes.
- Ignoring cluster size variability in sampling designs can lead to significant inaccuracies in data representation.
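- Probability proportional to size (PPS) sampling copes with unequal cluster sizes: selecting clusters with probability proportional to size and then a fixed number of elements within each gives every element the same overall selection probability and a fixed total sample size.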
- Effective stratification of samples can reduce variability, but careful consideration is required to avoid compromising the integrity of the final sample.
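- Distinguishing between true sizes and estimated sizes is critical when employing PPS or PPES (probability proportional to estimated size) designs: inaccurate size estimates make the achieved subsample sizes vary, and mistaking estimated sizes for true ones can bias the results.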
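A minimal sketch of PPS selection using a systematic pass through cumulated size measures (one common way to draw a PPS sample; an assumption here), followed by the selection equation for a fixed subsample of b elements per selected cluster. The names are hypothetical; a cluster larger than the interval M/a is a certainty selection and can be hit more than once.

```python
import random

def pps_systematic(sizes, a, rng=None):
    """Select a clusters with probability proportional to their size measures."""
    rng = rng or random.Random()
    M = sum(sizes)
    k = M / a                                   # interval on the cumulated-size scale
    start = rng.random() * k                    # random start in [0, k)
    points = [start + j * k for j in range(a)]
    selected, cum, i = [], 0, 0
    for idx, size in enumerate(sizes):
        cum += size                             # cluster idx covers [cum - size, cum)
        while i < a and points[i] < cum:
            selected.append(idx)
            i += 1
    return selected

def overall_prob(size, M, a, b):
    """P(cluster) * P(element | cluster) = (a * size / M) * (b / size) = a * b / M."""
    return (a * size / M) * (b / size)

print(pps_systematic([50, 30, 20, 100], 2, random.Random(0)))
```

The size term cancels in `overall_prob`, so every element has probability ab/M regardless of its cluster's size: the epsem property described above.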
ch07 Other Probability Designs
This chapter examines three specialized designs: two-phase sampling, replicated sampling (which simplifies variance estimation), and panel designs (which measure change over time), each suited to particular survey needs.
ch08 Sampling Frames
This chapter examines the problems that arise with sampling frames (missing elements, clusters of elements, blanks, and duplicate listings) and the remedies for each, since frame defects directly affect the validity of survey estimates.
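Two of these frame problems have simple arithmetic remedies, sketched here under stated assumptions: an element listed k times on the frame has k chances of selection, so its weight can be divided by k; and because selected blanks are simply discarded, a common practice is to draw extra listings in proportion to the expected blank rate. Both helper names are hypothetical.

```python
import math

def multiplicity_weights(base_weights, listings):
    """Divide each weight by the number of frame listings the element has."""
    return [w / k for w, k in zip(base_weights, listings)]

def inflate_for_blanks(n_target, blank_rate):
    """Listings to select so that about n_target remain after blanks are discarded."""
    return math.ceil(n_target / (1 - blank_rate))

print(multiplicity_weights([10, 10, 10], [1, 2, 5]), inflate_for_blanks(1000, 0.25))
```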
ch09 Nonresponse
Nonresponse poses a serious risk of bias: when respondents differ systematically from nonrespondents, the respondent group no longer represents the population, and response rates have been falling as the public becomes less willing to take part in surveys.
- Nonresponse is a critical issue with potential biases that can skew survey insights significantly.
- The prevalence of refusals and not-at-homes necessitates dedicated strategies to enhance engagement in survey efforts.
- Nonresponse can be concentrated in particular subgroups, such as marginalized populations, which distorts the estimates derived from the data.
- Employing follow-up strategies can mitigate nonresponse, yet researchers must remain alert to the risk of imputed values distorting original data distributions.
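The bias of the respondent mean can be written as W_m(Ȳ_r − Ȳ_m), where W_m is the nonresponse rate and Ȳ_r and Ȳ_m are the respondent and nonrespondent means. A minimal sketch with invented numbers; `nonresponse_bias` is a hypothetical helper, and in practice Ȳ_m is unknown, which is why benchmark comparisons are used.

```python
def nonresponse_bias(response_rate, mean_resp, mean_nonresp):
    """Bias of the respondent mean: W_m * (Ybar_r - Ybar_m), with W_m = 1 - response rate."""
    w_m = 1 - response_rate
    return w_m * (mean_resp - mean_nonresp)

# 70% respond; 60% of respondents but only 40% of nonrespondents have the attribute.
print(nonresponse_bias(0.7, 0.6, 0.4))  # about 0.06
```

The bias depends on both the nonresponse rate and the respondent/nonrespondent difference, and it does not shrink as the sample grows.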
ch10 Survey Analysis
This chapter examines the specialized considerations in analyzing survey data derived from complex sample designs, focusing on the use of weights and the calculation of sampling errors.
ch11 Sample Size
Determining the appropriate sample size means balancing precision against cost, and it relies on advance estimates of population parameters and of the design effect.
- Understanding the necessary sample size is crucial for accurate survey estimations; overspecifying precision can lead to impractical sample requirements.
- Using conservative advance estimates of population parameters (for example, assuming P = 0.5 for a proportion) protects against an undersized sample.
- The finite population correction should be checked: it can reduce the required sample size appreciably when the sampling fraction is large, although for large populations its effect is negligible.
- Sample design decisions, such as whether to stratify or maintain simple random sampling, impact required sample size and data precision.
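A minimal sample-size sketch for estimating a proportion, assuming a 95% normal-approximation margin of error, the finite population correction n₀/(1 + n₀/N), and simple multiplicative inflation for an assumed design effect and response rate. This order of adjustments is one common convention, not necessarily the book's; `sample_size` is a hypothetical helper.

```python
import math

def sample_size(p, margin, N=None, deff=1.0, response_rate=1.0, z=1.96):
    """Elements to sample to estimate a proportion p to within +/- margin."""
    n0 = z ** 2 * p * (1 - p) / margin ** 2   # SRS size, ignoring population size
    if N is not None:
        n0 = n0 / (1 + n0 / N)                # finite population correction
    n = n0 * deff                             # inflate for clustering or weighting
    return math.ceil(n / response_rate)       # inflate for expected nonresponse

print(sample_size(0.5, 0.05))                                  # SRS, large population
print(sample_size(0.5, 0.05, deff=1.5, response_rate=0.8))     # complex design
print(sample_size(0.5, 0.05, N=1000))                          # small population
```

With the conservative P = 0.5, a margin of ±5 points needs 385 SRS cases; a design effect of 1.5 and an 80% response rate push that to 721, while a population of only 1,000 cuts it to 278.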
ch12 Two Examples
This chapter presents two sample designs, one for a national face-to-face survey and one for a telephone survey using random digit dialing (RDD), illustrating how established sampling techniques are implemented in practice.
ch13 Nonprobability Sampling
This chapter scrutinizes nonprobability sampling methods, emphasizing their practical applications despite the inherent risks of bias and the absence of statistical rigor compared to probability sampling.
ch14 Concluding Remarks
This chapter stresses the importance of sound sampling practice and warns that researchers new to sampling may jeopardize their results if they do not consult experienced survey statisticians.
- Survey sampling is an intricate domain that requires careful attention to detail to ensure valid results.
- Engaging with experienced survey statisticians is crucial for those unfamiliar with the nuances of sampling techniques.
- A robust understanding of the literature surrounding sampling can empower researchers to avoid common pitfalls in survey design.
- Inadequate sampling can lead to significant distortions in research findings, impacting the credibility of subsequent conclusions.

Questions this book answers

How should a sample be selected so that valid statistical inferences about a population can be made?
What are the trade-offs in precision and cost among simple random, systematic, stratified, cluster, multistage, and PPS designs?
How do imperfections in sampling frames and nonresponse threaten survey estimates and how can they be mitigated?
How are weights and sampling errors computed for complex sample designs?
How large a sample is needed, and when is nonprobability sampling acceptable?

Glossary

Probability Design Choice: The chosen probabilistic mechanism for selecting sample elements such that each has a known nonzero probability of inclusion.
Stratification: Partitioning of the population into strata and drawing separate samples from each to control allocation and improve precision.
Clustering / Multistage Structure: Use of grouped units (clusters/PSUs) with selection of clusters and subsampling of elements to economize fieldwork.
Sampling Frame Quality: The degree to which the frame correctly and completely lists each population element exactly once.
Response Rate: The proportion of eligible sampled elements yielding usable data.
Equality of Selection Probabilities: The extent to which all elements share the same selection probability (epsem: equal probability of selection method).
Design Effect: Ratio of complex-design estimator variance to the SRS variance of the same sample size.
Sample Size: The number of elements from which survey data are collected.

Related in the literature

The measurement literature behind this signal — sourced, so you can defend it.

“Title: Introduction to Survey Sampling (Quantitative Applications in the Social Sciences). Author: Kalton, Graham. ASIN: B01FMG3ZGM. ISBN: 9781452237374. Series/Number: 07-035. Introduction to Survey Sampling, Graham Kalton, University of Michigan…”
Source: Introductiontosurveysamplingquantitative (match 75%)
“The foundation of survey research, of course, lies in sampling procedures. No matter how good the questions asked and no matter how elegant the analysis, little knowledge will be gained if the sample itself is poorly designed and executed. Despite the obviousness of these…”
Source: Introductiontosurveysamplingquantitative (match 68%)
“One is the lack of the need for a sampling frame for selecting respondents within sampled areas. The other is the avoidance of the requirement that interviewers make callbacks to contact specified respondents. With a quota sample, if an eligible person is unavailable when the…”
Source: Introductiontosurveysamplingquantitative (match 66%)

Resources: Introductiontosurveysamplingquantitative