peopleanalyst

Introduction to Survey Sampling (Quantitative Applications in the Social Sciences)

In a sentence

A concise, practical guide to designing and analyzing probability sample surveys, balancing sampling theory with the real-world problems of frames, nonresponse, and complex designs.

Graham Kalton's Introduction to Survey Sampling distills the essential techniques of probability sampling into a highly readable text for researchers who use surveys but are not statisticians. Beginning with simple random sampling, it proceeds step by step through systematic sampling, stratification, clustering, multistage and probability-proportional-to-size designs, then confronts the messy realities of imperfect sampling frames, nonresponse, weighting, and the estimation of sampling errors from complex designs. Two worked examples (a national face-to-face survey and a telephone RDD survey) and a discussion of nonprobability and quota sampling show how the pieces combine in practice. The book teaches the reader to weigh precision against cost, to recognize when standard formulas mislead, and to anticipate the practical pitfalls that can ruin an otherwise well-conceived study.

The story it tells the reader

The reader

A social-science researcher or survey practitioner who wants to draw valid, efficient samples and produce trustworthy population estimates.

External problem

Designing a sample that yields precise, unbiased estimates within budget while coping with imperfect frames and nonresponse.

Internal problem

Feeling that sampling is an intimidating technical black box best left to statisticians.

Philosophical problem

It is wrong to let a poorly designed sample undermine otherwise careful research; researchers should understand the foundation of their evidence.

The plan

  1. Define the target and survey populations carefully.
  2. Choose an appropriate probability design (SRS, systematic, stratified, cluster, multistage, PPS).
  3. Build and assess the sampling frame, handling missing, clustered, blank, and duplicate listings.
  4. Minimize and compensate for nonresponse.
  5. Apply weights and compute sampling errors appropriate to the design.
  6. Determine sample size from precision, design effect, and nonresponse, balancing cost.
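Step 6 of the plan can be sketched numerically. The following is a minimal illustration, assuming the quantity of interest is a proportion, the population is large enough to ignore the finite population correction, and the design effect and response rate are known in advance. The function name and parameters are hypothetical, not taken from the book.

```python
import math

def required_sample_size(p, margin, z=1.96, deff=1.0, response_rate=1.0):
    """Number of elements to select when estimating a proportion p.

    Assumes a large population (no finite population correction).
    """
    # Sample size needed under simple random sampling for the
    # requested half-width (margin) of the confidence interval.
    n_srs = (z ** 2) * p * (1 - p) / margin ** 2
    # Inflate for the complex design, then for expected nonresponse.
    n_complex = n_srs * deff
    return math.ceil(n_complex / response_rate)

print(required_sample_size(p=0.5, margin=0.05))                                # 385
print(required_sample_size(p=0.5, margin=0.05, deff=1.5, response_rate=0.8))   # 721
```

The second call shows how a moderate design effect and 80% response nearly double the number of elements that must be selected for the same precision.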

Success

  • Surveys produce defensible, precise estimates with quantified uncertainty.
  • The researcher confidently navigates frame and nonresponse problems and complex designs.
  • Resources are used efficiently, matching precision to need and budget.

At stake

  • Selection bias and frame errors render results untrustworthy.
  • Sampling errors are misstated, overstating precision.
  • Time and money are wasted on a sample whose results cannot support valid inference.

Model of the world · 11 constructs · 12 relations

A framework linking sample design choices and survey conditions to intermediate statistical and operational states (probabilistic coverage, frame quality, response, design effect) that determine the bias, precision, and cost of survey estimates.

Design levers

  • Sample Size
  • Clustering / Multistage Structure
  • Probability Design Choice
  • Stratification

Intermediate states & behaviors

  • Design Effect
  • Equality of Selection Probabilities
  • Response Rate

Outcomes

  • Estimate Precision
  • Estimate Bias
  • Survey Cost

Moderators / context: Sampling Frame Quality

Consolidated shape of the book’s model — full constructs and relationships below.

Probability Design Choice · design lever

The selection of a probability sampling scheme (SRS, systematic, stratified, cluster, multistage, PPS) that assigns each population element a known nonzero selection probability, forming the structural backbone of the sample.
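As a rough illustration of two of these schemes, the sketch below draws a simple random sample and a systematic sample from a small list frame. The frame, the seed, and the function names are hypothetical; the systematic selection uses a fractional interval with a random start, which is one common way to handle a frame size that is not a multiple of the sample size.

```python
import random

def simple_random_sample(frame, n, rng):
    # Sampling without replacement: every subset of size n is equally likely.
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    N = len(frame)
    k = N / n                     # sampling interval, may be fractional
    start = rng.random() * k      # random start within the first interval
    # Take the listing under each selection point start, start + k, ...
    # The min() guard protects against floating-point rounding at the end.
    return [frame[min(int(start + i * k), N - 1)] for i in range(n)]

frame = list(range(1, 101))       # hypothetical frame of 100 listings
rng = random.Random(2024)         # fixed seed so the sketch is repeatable
print(simple_random_sample(frame, 10, rng))
print(systematic_sample(frame, 10, rng))
```

Both selections give each listing the same probability n/N, but the systematic sample spreads evenly through the list, so its precision depends on how the frame is ordered.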

Stratification · design lever

The classification of the population into internally homogeneous strata from which separate samples are drawn, controlling sample sizes by stratum to improve precision or guarantee domain estimates.
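A small sketch of how stratum sizes enter both the allocation and the estimate, assuming proportionate allocation and known stratum population sizes. The stratum labels and figures are invented for illustration.

```python
def proportionate_allocation(stratum_sizes, n):
    """Allocate a total sample of n in proportion to stratum population sizes."""
    N = sum(stratum_sizes.values())
    # Rounding can make the allocations sum to slightly more or less than n.
    return {h: round(n * N_h / N) for h, N_h in stratum_sizes.items()}

def stratified_mean(stratum_sizes, stratum_means):
    """Combine stratum sample means using population weights W_h = N_h / N."""
    N = sum(stratum_sizes.values())
    return sum(stratum_sizes[h] / N * stratum_means[h] for h in stratum_sizes)

sizes = {"urban": 6000, "suburban": 3000, "rural": 1000}   # hypothetical N_h
means = {"urban": 40.0, "suburban": 50.0, "rural": 60.0}   # hypothetical sample means

print(proportionate_allocation(sizes, 500))
print(stratified_mean(sizes, means))
```

Under proportionate allocation the sample is self-weighting; disproportionate allocation would require carrying the weights W_h through to the analysis in the same way `stratified_mean` does.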

Clustering / Multistage Structure · design lever

The use of grouped sampling units (clusters, PSUs) in which only a sample of clusters is selected and elements are subsampled within them, accepting some loss of precision in exchange for substantial cost economies in data collection.

Sampling Frame Quality · contextual condition

The degree to which the frame lists each population element once and only once, free of missing elements, clusters of elements, blanks/foreign elements, and duplicate listings, determining coverage of the target population.
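One way to make these frame problems concrete is a simple audit that compares the listings with a known set of population elements. This is only a sketch: in practice the full population is rarely known, so the comparison set here is a hypothetical benchmark, and the function and key names are assumptions.

```python
from collections import Counter

def frame_audit(listings, population):
    """Summarize coverage, blanks/foreign elements, and duplicates in a frame."""
    # Listings that correspond to a real population element.
    valid = [x for x in listings if x in population]
    counts = Counter(valid)
    distinct = len(counts)
    return {
        # Share of population elements that appear on the frame at least once.
        "coverage_rate": distinct / len(population),
        # Share of listings that are blank or do not belong to the population.
        "blank_or_foreign_rate": (len(listings) - len(valid)) / len(listings),
        # Share of listings that repeat an element already listed.
        "duplicate_rate": (len(valid) - distinct) / len(listings),
    }

population = {"a", "b", "c", "d", "e"}        # hypothetical target elements
listings = ["a", "b", "b", "c", None, "x"]    # hypothetical frame entries
print(frame_audit(listings, population))
```

Here "d" and "e" are missing from the frame, "b" is listed twice, and two listings (a blank and a foreign element) do not correspond to any population element.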

Response Rate · behavioral pattern

The proportion of eligible sampled elements from which usable data are obtained, reflecting success in avoiding refusals, noncontacts, and incapacity, and bounding potential nonresponse bias.

Equality of Selection Probabilities · psychological state

The extent to which the realized design is epsem (equal probability of selection), since unequal probabilities from frame or design features necessitate weighting and typically reduce precision.
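The precision cost of unequal weights can be approximated with a widely used rule of thumb usually attributed to Kish: the variance inflation is roughly one plus the squared coefficient of variation of the weights. The sketch below assumes the weights are unrelated to the survey variable; it is not a formula quoted from this summary, and the function name is hypothetical.

```python
def weighting_design_effect(weights):
    """Approximate variance inflation from unequal weights.

    Computes n * sum(w^2) / (sum(w))^2, which equals 1 + CV(w)^2.
    """
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2

print(weighting_design_effect([1, 1, 1, 1]))   # equal weights: no inflation
print(weighting_design_effect([1, 1, 1, 3]))   # one large weight: about 1.33
```

An epsem design gives a value of 1, and the value grows as the weights become more variable.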

Design Effect · psychological state

The ratio of the variance of an estimator under the complex design to its variance under simple random sampling of the same size, summarizing how stratification, clustering, and unequal weights jointly affect precision.
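For cluster samples with roughly equal subsample sizes, the design effect is commonly approximated as 1 + (b - 1) * rho, where b is the number of elements taken per cluster and rho is the intraclass correlation. The sketch below applies that approximation and converts a nominal sample size into an effective one; the function names and input values are illustrative assumptions.

```python
def cluster_design_effect(b, rho):
    """Approximate design effect for a cluster sample: 1 + (b - 1) * rho."""
    return 1 + (b - 1) * rho

def effective_sample_size(n, deff):
    """Size of the simple random sample that would give the same precision."""
    return n / deff

deff = cluster_design_effect(b=10, rho=0.05)   # about 1.45
print(deff)
print(effective_sample_size(1000, deff))       # about 690
```

Even a small intraclass correlation matters once b is large, which is why the subsample size per cluster is a key design decision.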

Sample Size · design lever

The number of elements from which data are collected, the single most important determinant of sampling variance for large populations and a primary cost driver subject to precision and budget trade-offs.

Estimate Precision · outcome metric

The smallness of the sampling error (standard error / variance) of a survey estimator, determining the width of confidence intervals around population parameters such as means and proportions.
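To show how a standard error translates into a confidence interval, here is a minimal sketch for a sample proportion under simple random sampling, with an optional finite population correction. The function name and the example figures are assumptions made for illustration.

```python
import math

def proportion_confidence_interval(p, n, N=None, z=1.96):
    """Standard error and confidence interval for a proportion under SRS."""
    # Finite population correction (1 - n/N); treated as 1 when N is unknown or very large.
    fpc = 1 - n / N if N else 1.0
    se = math.sqrt(fpc * p * (1 - p) / (n - 1))
    return se, (p - z * se, p + z * se)

se, (low, high) = proportion_confidence_interval(p=0.4, n=601)
print(round(se, 4), round(low, 4), round(high, 4))
```

For a complex design, the standard error from this formula would need to be multiplied by the square root of the design effect before forming the interval.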

Estimate Bias · outcome metric

The systematic deviation of the expected value of a survey estimator from the true population parameter, arising chiefly from frame noncoverage, nonresponse, and selection bias.
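The nonresponse component of this bias can be written as the nonrespondent proportion times the difference between respondent and nonrespondent means. The sketch below evaluates that product; the input values are hypothetical, since the nonrespondent mean is not observable in a real survey.

```python
def nonresponse_bias(nonresponse_rate, respondent_mean, nonrespondent_mean):
    """Bias of the respondent mean: W_m * (Ybar_r - Ybar_m)."""
    return nonresponse_rate * (respondent_mean - nonrespondent_mean)

# Hypothetical: 30% nonresponse, respondents at 0.50, nonrespondents at 0.40.
print(nonresponse_bias(0.30, 0.50, 0.40))   # about 0.03
```

The bias shrinks if either factor shrinks, which is why both raising the response rate and checking how nonrespondents differ are worthwhile.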

Survey Cost · outcome metric

The total resources (money, time, interviewer effort, travel) required to execute the sample design and data collection, which constrains achievable sample size and precision.
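The trade-off between cost and precision in a two-stage design can be sketched with a simple cost model: total cost = (number of clusters) x (cost per cluster + subsample size x cost per element). Under that model, a standard textbook result puts the optimal subsample size per cluster at the square root of (cost per cluster / cost per element) x (1 - rho) / rho. The cost model, function names, and figures below are assumptions for illustration, not values from the book.

```python
import math

def optimal_subsample_size(cost_per_cluster, cost_per_element, rho):
    """Subsample size b that minimizes variance for a fixed budget."""
    return math.sqrt((cost_per_cluster / cost_per_element) * (1 - rho) / rho)

def design_for_budget(budget, cost_per_cluster, cost_per_element, rho):
    """Return (clusters a, elements per cluster b, total sample size a * b)."""
    b = max(1, round(optimal_subsample_size(cost_per_cluster, cost_per_element, rho)))
    # Each selected cluster costs its fixed cost plus b element interviews.
    a = int(budget // (cost_per_cluster + b * cost_per_element))
    return a, b, a * b

print(design_for_budget(budget=38000, cost_per_cluster=190, cost_per_element=10, rho=0.05))
```

With these hypothetical inputs the sketch selects 100 clusters of 19 elements each. Higher cluster costs push toward fewer, larger clusters, while higher intraclass correlation pushes toward more, smaller ones.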

How they connect

  • stratification influences design effect
  • clustering influences design effect
  • clustering influences survey cost
  • design effect influences estimate precision
  • sample size predicts estimate precision
  • sample size predicts survey cost
  • probability design choice influences equality of selection probabilities
  • equality of selection probabilities influences design effect
  • sampling frame quality influences estimate bias
  • response rate influences estimate bias
  • sampling frame quality influences equality of selection probabilities
  • required estimate precision influences sample size

Possible measures & feedback loops

A candidate team / org survey built from this book’s model — exploratory operationalizations, not validated instruments. Where a construct maps to a validated measure in Principia, we’ll point to that instead.

Probability Design Choice

design type classification; selection equation parameters

self-report suitability: low

Stratification

number of strata; sampling fraction by stratum

self-report suitability: none

Clustering / Multistage Structure

cluster size; subsample size b; number of stages

self-report suitability: none

Sampling Frame Quality

coverage rate; blank rate; duplicate rate

self-report suitability: none

Response Rate

completed/eligible ratio; refusal proportion; contact attempts

self-report suitability: low

Equality of Selection Probabilities

coefficient of variation of weights; weight range

self-report suitability: none

Design Effect

v(z)/v(z0) ratio (complex-design variance over simple-random-sampling variance); intraclass correlation rho

self-report suitability: none

Sample Size

n records; n by domain

self-report suitability: none

Estimate Precision

standard error; confidence interval width; coefficient of variation

self-report suitability: none

Estimate Bias

Wm (nonrespondent proportion) times respondent-nonrespondent difference; benchmark comparison

self-report suitability: none

Survey Cost

cost per cluster; cost per element; total budget

self-report suitability: low

Frameworks & instruments in this book

  • Define an ideal target population first, then explicitly note exclusions to form the survey population.
  • Each element must have a known, nonzero probability of selection for valid inference.
  • Form strata to be internally homogeneous; form clusters to be internally heterogeneous.
  • Use the design effect to translate between complex-design and simple-random-sampling precision.
  • Keep nonresponse small because its bias is the product of nonresponse rate and respondent-nonrespondent difference.
  • Choose between designs by balancing precision against survey cost.

Several of these are operationalized as tools in the People Analytics Toolbox.
