This guide is for someone who reads research, makes decisions informed by it, or is about to run their own study — and wants their conclusions to hold up under scrutiny rather than rest on opinion. The through-line is a chain: a clear theoretical question shapes how you define concepts; definitions become measurements; measurements must be reliable and valid; you sample observations and pick methods that fit the question; you apply analytic procedures rigorously; and only then are you entitled to an inference. Each link enables the next, and a weak link anywhere caps the trustworthiness of everything downstream. The corpus spans quantitative, qualitative, psychometric, survey, experimental, and organizational traditions — and it genuinely disagrees on some deep points (what 'rigor' even means, how you generalize, whether validity is a property of measures or of inferences). Where the corpus splits on worldview, this guide maps the camps and helps you choose for your situation. Where a claim is an outlier resting on thin support, it says so. You don't need to occupy this world day-to-day yet; the aim is to build the capability from wherever you start.
The path
- Anchor the study in a theory, paradigm, and a clearly framed research question.
- Conceptualize your fuzzy concepts and operationalize them into concrete indicators.
- Establish that your measures are reliable — consistent and free of random error.
- Establish that your measures are valid — they capture the intended construct, not something else.
- Choose a sampling design whose logic of generalization matches your question.
- Select methods and models that fit the question, the outcome type, and the data structure.
- Apply analytic procedures rigorously, screening data and respecifying only on principled grounds.
- Draw inferences that the chain of evidence actually supports — and report enough that others can check.
Theoretical Grounding and Paradigm Selection
Foundations
Research is not aimless observation. It is oriented by a theory, an epistemological stance, and a clearly framed research question that together decide which concepts are relevant, what counts as data, and how findings get interpreted. The Practice of Social Research frames science as resting on two pillars — logic (theory) and observation (data) — which must both make sense and correspond to reality. The Foundations of Social Research goes deeper, showing that every method is grounded in a methodology, a theoretical perspective, and an epistemology (objectivism, constructionism, or subjectivism), and that naming these makes your work transparent and accountable. Before you collect anything, you choose a purpose — exploration, description, or explanation — and, in modeling traditions like SEM, you commit to specifying your model from theory rather than fishing in the data.
Why it matters. Skip this and you get a study that measures the wrong things and cannot interpret its own results. If your concepts are not selected by a question and a framework, you will collect data that look impressive and answer nothing — and you will have no principled basis to defend why your approach is the right one. Kline's central rule is that model specification must be driven by theory and a priori hypotheses; a data-driven model dressed up as confirmatory is a common and costly self-deception.
The myth: Good research means being objective and theory-free — you just go look at the data and let it speak.
The reality: There is no theory-free observation. The Foundations of Social Research shows every method already carries an epistemology and a theoretical perspective; the honest move is to make yours explicit so the process can be scrutinized, not to pretend you have none.
The myth: The research question is a formality you write up after you know what you found.
The reality: The question is the practical point of departure that determines design, relevant concepts, and interpretation. Westland is blunt: causality is a theoretical assumption, not a statistical output — your framework decides what the numbers can mean.
How to:
- State the real-life issue your inquiry addresses, then write a single research question and decide its purpose: exploration, description, or explanation.
- Name your epistemological stance — objectivist, constructionist, or subjectivist — and the theoretical perspective behind your method, so your logic and criteria are explicit (The Foundations of Social Research).
- List the concepts your question makes relevant; this list, not convenience, drives what you will measure.
- If you will model relationships, write the hypothesized paths and their directions from substantive knowledge before touching data (Kline; Jöreskog).
- Describe your intended process as specifically as possible so an outside observer could trace and challenge it.
Watch out for:
- Dressing up a fishing expedition as confirmatory research — specifying a model after seeing the data and reporting it as a priori (sem_principles_practice_kline; sem_paths_to_networks_westland).
- Borrowing a method without its paradigm: techniques carry assumptions about the human world that may clash with your question (the_foundations_of_social_research).
- Treating 'agreement reality' — what everyone assumes is true — as evidence; the discipline is deliberate, careful observation that can overturn it (the_practice_of_social_research).
Grounded in: The Practice of Social Research; The Foundations of Social Research; Sem Principles Practice Kline; Sem Paths to Networks Westland; Scale Development (Applied Social Research Methods); Case Study Research Design Methods Yin
Conceptualization and Operationalization
Foundations
Abstract concepts like 'centralization,' 'job performance,' or 'alienation' are fuzzy until you make them specific. Conceptualization gives a concept a precise nominal definition with explicit boundaries and dimensions; operationalization specifies the concrete operations — the indicators and measurement procedures — that link concept to observation. The Practice of Social Research treats this as the move that makes a concept studiable at all. Scale Development and the rapid-assessment guide insist you fix the construct's definition and level of specificity before writing a single item, and the Handbook of Organizational Measurement shows the payoff: deconstructing complex umbrella concepts into distinct, measurable components is what lets a field accumulate comparable findings.
Why it matters. If your operational definition does not match your concept, every later number is precise about the wrong thing. A measure of 'absenteeism' that accidentally counts authorized vacation, or a 'performance' measure that captures results rather than behavior, will produce clean statistics and false conclusions. Schmitt's selection work makes the point sharply: performance is behavior, distinct from the results of that behavior — conflate them in your operationalization and you misjudge your predictors.
The myth: Everyone knows what the concept means, so a definition is busywork.
The reality: Shared intuition hides disagreement. The Handbook of Organizational Measurement defines 'absenteeism' to explicitly exclude vacations and layoffs precisely because the everyday word smuggles in the wrong cases. Precise boundaries are what make findings comparable across studies.
The myth: More items means better coverage, so pile them on.
The reality: Scale Development distinguishes relevant content redundancy — multiple items expressing the same construct-relevant idea, which builds reliability — from superficial wording redundancy, which inflates it artifactually. Content sampling must representatively cover the domain without adding construct-irrelevant variance.
How to:
- Write a nominal definition that states explicitly what the construct includes and excludes (its domain boundaries) and its level of specificity (scale_development; developing_validating_rapid_assessment_instruments).
- Identify the dimensions of the concept, then specify indicators for each — the observable signs that stand in for the unobservable concept.
- Spell out the exact measurement operation: how an observation gets categorized or scored, so the procedure is repeatable.
- Match the specificity of items to the specificity of the construct and the research question (scale_development).
- For complex constructs, deconstruct the umbrella term into distinct measurable components rather than one vague index (handbook_organizational_measurement_price).
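The steps above can be sketched as a scoring rule. This is a minimal illustration, assuming a simple record format; the `MissedDay` class, the reason codes, and the `count_absences` helper are hypothetical names invented for the example, not a procedure taken from the Handbook of Organizational Measurement. The point is only that the exclusions in the nominal definition (vacations, layoffs) become an explicit, repeatable operation.

```python
from dataclasses import dataclass

# Hypothetical reason codes; a real study would take these from its own codebook.
EXCLUDED_REASONS = {"vacation", "layoff"}  # outside the construct's stated boundaries


@dataclass
class MissedDay:
    employee_id: str
    reason: str  # e.g. "vacation", "layoff", "unscheduled"


def count_absences(records):
    """Count missed days per employee that fall inside the nominal definition."""
    counts = {}
    for rec in records:
        if rec.reason in EXCLUDED_REASONS:
            continue  # explicitly excluded by the definition, so never scored
        counts[rec.employee_id] = counts.get(rec.employee_id, 0) + 1
    return counts


records = [
    MissedDay("e1", "vacation"),
    MissedDay("e1", "unscheduled"),
    MissedDay("e2", "layoff"),
    MissedDay("e2", "unscheduled"),
    MissedDay("e2", "unscheduled"),
]
absences = count_absences(records)
print(absences)
```

Writing the exclusion set down as data, rather than leaving it to each coder's judgment, is what lets a second researcher apply the same operation and get the same scores.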
Watch out for:
- Double-barreled or vaguely worded items that ask two things at once — item-design flaws that corrupt the concept-indicator link (scale_development).
- Construct underrepresentation (too narrow a sample of the domain) or construct-irrelevant variance (items tapping something else) (developing_validating_rapid_assessment_instruments).
- Operationalizing results when you mean behavior, or vice versa (personnel_selection_in_organizations_schmitt).
Grounded in: The Practice of Social Research; The Foundations of Social Research; Reliability and Validity Assessment; Scale Development (Applied Social Research Methods); Developing and Validating Rapid Assessment Instruments (Pocket Guides to Social Work Research Methods); Psychometric Theory Nunnally Bernstein; Handbook of organizational measurement
Measurement Reliability
Foundations
Reliability is the consistency, repeatability, and precision of a measure. Formally, Psychometric Theory defines it as the proportion of observed-score variance attributable to true score — the degree to which a measure is free from random error. Under the domain-sampling model, two levers raise it: aggregating more items and raising their average intercorrelation. Reliability and Validity Assessment offers a working benchmark — reliabilities for widely used scales should generally not fall below .80 — and reminds you that adding items helps only if you don't drag down their average intercorrelation. The same logic appears in qualitative work as inter-rater agreement on codes and in case study work as the goal that another investigator following your procedures would reach the same findings.
Why it matters. Random measurement error is noise that no amount of clever analysis removes after the fact — it attenuates every relationship you estimate. Hunter and Schmidt show that unreliable measures systematically shrink observed effect sizes below their true value, so an unreliable study can hide a real effect entirely. And because reliability sets a ceiling on validity, an inconsistent measure cannot be a valid one.
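The shrinkage described above can be made concrete with the classical attenuation formula, in which the observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities. The sketch below assumes that classical-test-theory relationship; the function names and the example values are illustrative, not figures from Hunter and Schmidt.

```python
import math


def attenuated_correlation(true_r, rel_x, rel_y):
    """Observed correlation expected when both measures contain random error."""
    return true_r * math.sqrt(rel_x * rel_y)


def disattenuated_correlation(observed_r, rel_x, rel_y):
    """Estimate of the true-score correlation given the two reliabilities."""
    return observed_r / math.sqrt(rel_x * rel_y)


# Illustrative values: a true correlation of .50 measured with
# reliabilities of .70 and .60.
observed = attenuated_correlation(0.50, 0.70, 0.60)
recovered = disattenuated_correlation(observed, 0.70, 0.60)
print(round(observed, 3), round(recovered, 3))
```

With these inputs the observed correlation is roughly a third smaller than the true one, which is the sense in which unreliability can hide a real effect.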
The myth: A reliable measure is an accurate one.
The reality: Reliability is only consistency. A bathroom scale five pounds off is perfectly reliable and perfectly wrong. Scale Development states it plainly: reliability is necessary but not sufficient for validity.
The myth: Just add more items to push reliability up.
The reality: Test length helps only if the new items share the same common core. Adding items that lower the average inter-item correlation does not help, and may signal you are measuring more than one thing (reliability_and_validity_assessment; psychometric_theory_nunnally_bernstein).
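One way to see why is the standardized form of coefficient alpha, which depends only on the number of items and their average intercorrelation. The sketch below assumes that standard formula; the item counts and correlations are made-up values chosen to show the trade-off.

```python
def standardized_alpha(n_items, mean_r):
    """Standardized alpha from item count and mean inter-item correlation."""
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)


# Illustrative comparison: a tight 5-item scale versus the same scale
# padded to 8 items that pull the average intercorrelation down.
short_scale = standardized_alpha(5, 0.50)
padded_scale = standardized_alpha(8, 0.30)
print(round(short_scale, 3), round(padded_scale, 3))
```

In this example the longer scale ends up less reliable than the shorter one, because the added items dilute the common core faster than the extra length compensates.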
How to:
- Choose the reliability evidence that fits: internal consistency (coefficient alpha) for a single administration, test-retest for stability over time, inter-rater agreement for judged or coded data (developing_validating_rapid_assessment_instruments; the_coding_manual_for_qualitative_researchers).
- Aim for internal-consistency reliability of at least .80 for established scales, treating lower values as a flag, not a verdict (reliability_and_validity_assessment).
- Build reliability in at design time through enough items with genuine content overlap, not after the fact (psychometric_theory_nunnally_bernstein).
- When measures are imperfect, model the error: SEM uses multiple indicators per latent construct precisely to separate true score from random error (sem_principles_practice_kline).
- In qualitative coding, define codes clearly enough that a second coder applies them consistently, and check agreement (the_coding_manual_for_qualitative_researchers).
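For the agreement check in the last step, one commonly used index is Cohen's kappa, which corrects raw agreement for the agreement two coders would reach by chance. The source does not name a specific statistic, so treat kappa as one reasonable option rather than the prescribed one; the codes and data below are invented for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders on the same segments."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must rate the same segments")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Expected agreement if each coder assigned codes independently
    # at their own base rates.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


coder_a = ["a", "a", "b", "b", "a", "b", "a", "a", "b", "b"]
coder_b = ["a", "a", "b", "a", "a", "b", "a", "b", "b", "b"]
kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 2))
```

Here the coders agree on 8 of 10 segments, but half of that agreement is expected by chance alone, so kappa is noticeably lower than the raw 80 percent.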
Watch out for:
- Chasing a high alpha by stacking near-identical items — that is superficial redundancy inflating reliability without improving the measure (scale_development).
- Assuming a published scale's reliability transfers to your population; reliability is a property of scores in a context, not a fixed attribute of the instrument (developing_validating_rapid_assessment_instruments).
- Ignoring that correlated/clustered data carries less information than its raw N suggests, which affects the precision you can claim (beyond_multiple_linear_regression).
Grounded in: Psychometric Theory Nunnally Bernstein; Reliability and Validity Assessment; Scale Development (Applied Social Research Methods); Developing and Validating Rapid Assessment Instruments (Pocket Guides to Social Work Research Methods); The Practice of Social Research; The Coding Manual for Qualitative Researchers; Item Response Theory Fundamentals
Measurement and Construct Validity
Practitioner
Validity is the extent to which a measure reflects the construct you intend rather than something else. Psychometric Theory treats construct validity as the central, unifying concept — content, criterion, and predictive evidence all feed one cumulative case about what a measure means. Reliability and Validity Assessment adds that construct validation requires a surrounding theoretical network: you confirm a measure by showing it relates to other constructs the way theory predicts. Crucially, validity is not a fixed property stamped on an instrument — Scale Development insists it resides in how a tool is used in a given context and population. Systematic (nonrandom) error is the enemy here: a measure that consistently captures the wrong thing is reliable but invalid.
Why it matters. Construct validity is the link that turns measurement into inference. If your indicators don't reflect the intended construct, even a flawless analysis yields a confident statement about the wrong thing. In personnel selection this is decisive: Schmitt's principle that all validation is construct validation means a predictor that looks job-relevant but measures something else will mispredict performance and may produce unfair, legally indefensible decisions (Cook; Edenborough).
The myth: A test is valid, full stop — validity is a property the instrument carries everywhere.
The reality: Validity is a property of the inferences you draw from scores in a specific use, population, and context (scale_development; developing_validating_rapid_assessment_instruments). The same instrument can be valid for one purpose and invalid for another.
The myth: If it predicts the outcome, it's valid — correlation is enough.
The reality: Criterion evidence is one strand, not the whole rope. Construct validity requires embedding the measure in a theoretical network and showing the whole pattern of relationships holds (reliability_and_validity_assessment; psychometric_theory_nunnally_bernstein).
How to:
- Assemble content validity first: have experts judge whether items representatively cover the defined domain (developing_validating_rapid_assessment_instruments).
- Gather criterion/predictive evidence where a meaningful outcome exists — e.g., does a selection measure forecast later job performance (personnel_selection_adding_value_cook).
- Build construct evidence by specifying, in advance, which other constructs your measure should and should not correlate with, then testing that pattern (reliability_and_validity_assessment).
- Use diverse methods and item formats so the construct is not an artifact of one method (methodological heterogeneity) (psychometric_theory_nunnally_bernstein).
- Treat validity as an integrated, evolving judgment built from multiple lines of evidence, re-evaluated for each new population and use (developing_validating_rapid_assessment_instruments).
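The third step, predicting the pattern before testing it, can be sketched as follows. Everything here is an assumption made for illustration: the scale names, the toy scores, and the cutoffs of .50 and .30 are hypothetical and are not thresholds given by the source. What matters is that the predictions are written down before the correlations are computed.

```python
import math


def pearson_r(x, y):
    """Pearson correlation computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def check_pattern(target, others, predictions, high=0.5, low=0.3):
    """Compare observed correlations with predictions stated in advance."""
    results = {}
    for name, expected in predictions.items():
        r = pearson_r(target, others[name])
        ok = abs(r) >= high if expected == "related" else abs(r) <= low
        results[name] = (round(r, 3), ok)
    return results


new_scale = [1, 2, 3, 4, 5, 6, 7, 8]
others = {
    "established_scale": [2, 1, 4, 3, 6, 5, 8, 7],  # should converge
    "unrelated_trait": [4, 5, 4, 5, 4, 5, 4, 5],    # should not
}
predictions = {"established_scale": "related", "unrelated_trait": "unrelated"}
results = check_pattern(new_scale, others, predictions)
print(results)
```

A failed prediction in either direction is informative: a weak convergent correlation suggests the measure misses the construct, while a strong correlation with something theory says is unrelated suggests it is picking up something else.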
Watch out for:
- Confusing reliability with validity — a measure can be highly consistent and still systematically wrong (reliability_and_validity_assessment).
- Reifying factors: treating a factor-analytic dimension as a real entity rather than a model approximation that needs theoretical interpretation (exploratory_factor_analysis_step_by_step; reliability_and_validity_assessment).
- Assuming a validated instrument stays valid when you move populations, settings, or purposes (scale_development).
Grounded in: Psychometric Theory Nunnally Bernstein; Reliability and Validity Assessment; Scale Development (Applied Social Research Methods); Developing and Validating Rapid Assessment Instruments (Pocket Guides to Social Work Research Methods); Personnel Selection Adding Value Cook; Experimental Quasiexperimental Designs Shadish; Sem Principles Practice Kline
Sampling Design Rigor
Practitioner
How you select observations governs whether your findings travel beyond your data. In the probability-sampling tradition, every population element must have a known, nonzero (ideally equal) chance of selection — that is what permits unbiased estimation and quantified sampling error. The survey-sampling guide walks the practical sequence: define an ideal target population, note exclusions to form the survey population, build a quality frame, and choose among stratification (strata internally homogeneous, to improve precision) and clustering (clusters internally heterogeneous, to cut cost). Fowler's Total Survey Design reminds you that nonresponse bias is the product of the nonresponse rate and the respondent-nonrespondent difference, so a high response rate matters. In factor analysis and SEM, sample adequacy reappears as the condition for stable estimates.
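The nonresponse relationship stated above is simple enough to write out directly. This is a minimal sketch for a survey mean; the function name and the numbers are illustrative, and in practice the nonrespondent mean is unknown and has to be estimated from follow-up data.

```python
def nonresponse_bias(nonresponse_rate, respondent_mean, nonrespondent_mean):
    """Bias of the respondent mean: rate times respondent-nonrespondent gap."""
    return nonresponse_rate * (respondent_mean - nonrespondent_mean)


# Same 40% nonresponse, very different consequences.
large_gap = nonresponse_bias(0.40, 52.0, 50.0)
no_gap = nonresponse_bias(0.40, 50.0, 50.0)
# A low nonresponse rate can still bias the estimate if the gap is wide.
low_rate_wide_gap = nonresponse_bias(0.10, 60.0, 50.0)
print(large_gap, no_gap, low_rate_wide_gap)
```

The comparison shows why the rate alone is not a sufficient quality indicator: both terms of the product have to be examined.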
Why it matters. A biased or inadequate sample undermines every downstream claim no matter how sophisticated the analysis. Fowler's principle is that survey quality is a function of all its components, and weakness in one — here, sampling — can invalidate strengths elsewhere. Westland makes the parallel point for modeling: without sufficient sample size and information content, statistical analysis is meaningless, and small samples in some estimators produce badly biased coefficients.
The myth: A big sample is a representative sample.
The reality: Size does not cure selection bias. An element's known, nonzero selection probability is what licenses inference; a huge convenience sample with unknown selection chances still cannot support unbiased generalization (introduction_to_survey_sampling_quantitative_app; survey_research_methods_fowler).
The myth: Stratifying and clustering are interchangeable ways to organize a sample.
The reality: They serve opposite goals. You form strata to be internally homogeneous to improve precision; you form clusters to be internally heterogeneous to economize fieldwork — and clustering generally costs precision (the design effect tells you how much) (introduction_to_survey_sampling_quantitative_app).
How to:
- Define the target population, then the survey population after explicit exclusions, before sampling anything (introduction_to_survey_sampling_quantitative_app).
- Audit your sampling frame for coverage: does it list every element in the survey population, and each one only once? (introduction_to_survey_sampling_quantitative_app; survey_research_methods_fowler).
- Choose a probability design and document selection probabilities; use the design effect to translate complex-design precision back to simple-random terms.
- Plan nonresponse procedures — repeated contacts, incentives, refusal conversion — and gather data on nonrespondents to estimate bias (survey_research_methods_fowler).
- For factor analysis and SEM, check sample adequacy against model complexity before trusting estimates (exploratory_factor_analysis_step_by_step; sem_paths_to_networks_westland).
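To make the design-effect step concrete, here is a sketch using a widely used approximation for cluster samples with roughly equal cluster sizes: the design effect is 1 plus (cluster size minus 1) times the intraclass correlation. The source refers to the design effect without giving a formula, so this particular expression and the example values are assumptions supplied for illustration.

```python
def design_effect(avg_cluster_size, icc):
    """Approximate design effect for a cluster sample with equal-size clusters."""
    return 1 + (avg_cluster_size - 1) * icc


def effective_sample_size(n, deff):
    """Size of the simple random sample that would give the same precision."""
    return n / deff


# Illustrative values: 2,000 respondents in clusters of 20,
# with an intraclass correlation of .05.
deff = design_effect(20, 0.05)
n_eff = effective_sample_size(2000, deff)
print(round(deff, 2), round(n_eff))
```

Even a modest intraclass correlation nearly halves the effective sample size in this example, which is the precision cost that clustering trades for cheaper fieldwork.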
Watch out for:
- Treating a frame as the population — coverage gaps silently bias estimates (survey_research_methods_fowler).
- Letting nonresponse climb without studying who is missing; the bias is rate times difference, not rate alone (introduction_to_survey_sampling_quantitative_app).
- Running latent-variable models on samples too small to yield stable estimates, which also invites non-convergence (sem_principles_practice_kline; sem_paths_to_networks_westland).
Grounded in: Introduction to Survey Sampling (Quantitative Applications in the Social Sciences); Survey Research Methods - Fowler; The Practice of Social Research; Exploratory Factor Analysis (Understanding Statistics); Sem Paths to Networks Westland; Using Multivariate Statistics Tabachnick Fidell
Method and Model Selection Appropriateness
Practitioner
The method of observation and the statistical model must fit the research question, the outcome type, and the structure of the data. Beyond Multiple Linear Regression states the governing rule: the statistical model must match the structure of the data. Non-normal responses need a link function, and correlated (nested or repeated) data violate the independence assumption of ordinary regression, demanding multilevel models. The people-analytics handbook turns this into a decision: choose the regression family by the outcome variable type and the data hierarchy. In factor analysis, the choices are the extraction method, the rotation, and the use of the common factor model rather than PCA when the goal is latent structure. In qualitative work, Saldaña frames coding-method selection as a deliberate, design-level choice tied to your question, paradigm, and data forms. Yin's logic is the meta-rule: pick the method (case study, survey, experiment) by the form of the question; 'how/why' questions favor case studies and experiments, while 'what/who/how many' questions favor surveys.
Why it matters. The wrong model produces confidently wrong numbers. Using ordinary regression on nested data yields incorrect standard errors and misleading coefficients — the 'unit of analysis' problem — so you can declare a finding significant that isn't, or miss one that is (Raudenbush & Bryk; Goldstein). Beyond MLR shows that ignoring a binary or count outcome's distribution invalidates the inference outright.
The myth: Linear regression is the default; reach for something else only if it breaks.
The reality: There is no neutral default. Beyond MLR teaches that the response distribution and the dependence structure dictate the model from the start — count, binary, and nested data each demand a specific framework, chosen up front, not patched in later.
The myth: More method sophistication is always better.
The reality: The people-analytics handbook and Westland both argue for matching method to question and data, and for parsimony — adding complexity that yields no analytic benefit is a cost, not a virtue. Saldaña: good research is about good thinking, not fancy methods.
How to:
- Classify your outcome (continuous, binary, count, ordinal, time-to-event) and pick the regression family accordingly (regression_modeling_in_people_analytics; beyond_multiple_linear_regression).
- Diagnose data structure: is it nested or repeated-measures? If so, use a multilevel/hierarchical model that partitions variance across levels (hierarchical_linear_models_raudenbush_bryk; multilevel_statistical_models_goldstein).
- Match the method to the question's form: 'how/why' favors case studies or experiments; descriptive 'what/how many' favors surveys (case_study_research_design_methods_yin).
- In factor analysis, justify the extraction method, rotation, and correlation type, and choose the common factor model over PCA when modeling latent structure (exploratory_factor_analysis_step_by_step).
- In qualitative analysis, select coding methods deliberately to fit paradigm, question, and data form rather than defaulting to one (the_coding_manual_for_qualitative_researchers).
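The first two steps amount to a lookup on outcome type plus a check for nesting. The sketch below is a coarse illustration under that reading; the specific family names are common textbook pairings supplied here as assumptions, not a list quoted from the cited handbooks, and `suggest_model` is a hypothetical helper.

```python
# Common textbook pairings of outcome type and model family (assumed here,
# not quoted from the source). Real choices also depend on diagnostics such
# as overdispersion for counts.
FAMILY_BY_OUTCOME = {
    "continuous": "linear regression",
    "binary": "logistic regression",
    "count": "Poisson regression",
    "ordinal": "ordinal (proportional odds) logistic regression",
    "time-to-event": "survival analysis (e.g. Cox proportional hazards)",
}


def suggest_model(outcome_type, nested=False):
    """Return a starting-point model family for the outcome and data structure."""
    family = FAMILY_BY_OUTCOME[outcome_type]  # raises KeyError for unknown types
    return f"multilevel {family}" if nested else family


print(suggest_model("binary"))
print(suggest_model("count", nested=True))
```

A lookup like this only narrows the field. The suggested family is a starting point whose assumptions still have to be checked against the data.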
Watch out for:
- Running single-level models on clustered data and reporting overconfident standard errors (multilevel_statistical_models_goldstein).
- Choosing a method for convenience or convention rather than fit to question and data (sem_paths_to_networks_westland).
- Interpreting coefficients without respecting the link function or the between- vs. within-group distinction in multilevel models (beyond_multiple_linear_regression).
Grounded in: Beyond Multiple Linear Regression Applied Generalized Linear Models And Multilevel Models in R; Handbook of Regression Modeling in People Analytics; Hierarchical Linear Models Raudenbush Bryk; Multilevel statistical models; Exploratory Factor Analysis (Understanding Statistics); Case Study Research Design Methods Yin; The Coding Manual for Qualitative Researchers; Sem Paths to Networks Westland
Analytic Rigor and Procedure Application
Advanced
Having chosen the right tool, you must apply it with discipline. Across traditions this means: screen the data before the main analysis; estimate correctly; respecify only on theoretical grounds; and detect patterns honestly. Using Multivariate Statistics makes data screening the first commandment — inspect for errors, missing data, outliers, and assumption violations (normality, linearity, collinearity) before trusting any result. Kline extends this to SEM with model identification and theory-guided respecification, and warns that statistically equivalent models exist, so you must justify your preferred one on substance. The people-analytics handbook adds: always validate assumptions before declaring results valid, and prefer inference over prediction in consequential, small-sample human contexts. In qualitative analysis, rigor is the systematic movement from coding to categories to concepts (Corbin & Strauss), and Yin's pattern-matching, explanation-building, and rival-explanation testing are the disciplined analytic strategies for case studies.
Why it matters. Rigor at this stage is what separates a defensible estimate from an artifact. Tabachnick and Fidell show that unhandled outliers and assumption violations silently distort estimates and significance tests. Kline's warning about equivalent models is the deep one: the data alone cannot tell you which of several models that fit equally well is correct — only theory and ruling out rivals can, and skipping that step is how spurious 'confirmed' models get published.
The myth: If the model fits the data well, the model is correct.
The reality: Good fit is necessary, not sufficient. Statistically equivalent models fit identically; Kline requires you to acknowledge them and justify your choice on theory. Westland, echoing George Box's aphorism: all models are wrong, but some are useful. Fit is not truth.
The myth: Data screening is a preliminary chore you can rush.
The reality: Tabachnick and Fidell treat screening as the analysis's foundation: undetected errors, missing data, and assumption violations are the most common cause of invalid multivariate results. Screening is where rigor is won or lost.
How to:
- Screen first: check for data-entry errors, examine missing-data patterns, identify outliers and influential cases, and test the assumptions of your chosen technique (using_multivariate_statistics_tabachnick_fidell; sem_principles_practice_kline).
- Confirm model identification before estimation in SEM, and validate model assumptions before interpreting any regression output (sem_principles_practice_kline; regression_modeling_in_people_analytics).
- Respecify only with theoretical justification; never let modification indices alone redesign your model (sem_principles_practice_kline).
- Actively test rival explanations — operationalize and try to disconfirm alternatives, in both statistical and case-study work (case_study_research_design_methods_yin).
- In qualitative analysis, write analytic memos as you code, move systematically from codes to categories to concepts, and use theoretical sampling to chase emerging questions (basics_qualitative_research_grounded_theory_corbin_strauss; the_coding_manual_for_qualitative_researchers).
- Report effect sizes, not just significance — statistical significance is not practical importance (using_multivariate_statistics_tabachnick_fidell).
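The screening step at the top of this list can start very simply. The sketch below counts missing values and flags univariate outliers by standardized distance from the mean; the `screen_variable` helper and the cutoff of 3.0 are illustrative assumptions, not a rule taken from Tabachnick and Fidell, and a fuller screen would also cover multivariate outliers and the assumptions of the chosen technique.

```python
import statistics


def screen_variable(values, z_cutoff=3.0):
    """Return the missing-value count and positions of extreme values.

    z_cutoff is an illustrative default, not a threshold from the source.
    """
    observed = [(i, v) for i, v in enumerate(values) if v is not None]
    n_missing = len(values) - len(observed)
    nums = [v for _, v in observed]
    mean = statistics.mean(nums)
    sd = statistics.stdev(nums)  # sample standard deviation
    outliers = [
        i for i, v in observed if sd > 0 and abs(v - mean) / sd > z_cutoff
    ]
    return {"n_missing": n_missing, "outlier_positions": outliers}


# Toy variable: 29 plausible scores, one missing value, one suspicious entry.
scores = [10, 11, 9] * 9 + [None, 12, 100]
report = screen_variable(scores)
print(report)
```

A flagged value is a prompt to investigate, not an instruction to delete: it may be a data-entry error, or a legitimate case that the model needs to accommodate.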
Watch out for:
- Mistaking method artifacts for substantive findings in factor analysis without theoretical guidance (reliability_and_validity_assessment; exploratory_factor_analysis_step_by_step).
- Capitalizing on chance through data-driven respecification dressed as confirmation (sem_principles_practice_kline).
- In small, consequential people-data settings, over-relying on prediction metrics when inference is what the decision needs (regression_modeling_in_people_analytics).
Grounded in: Using Multivariate Statistics Tabachnick Fidell; Sem Principles Practice Kline; Handbook of Regression Modeling in People Analytics; Exploratory Factor Analysis (Understanding Statistics); Basics Qualitative Research Grounded Theory Corbin Strauss; The Coding Manual for Qualitative Researchers; Case Study Research Design Methods Yin
Validity of Inference and Conclusions
Advanced
This is the convergence point of the whole chain: the degree to which your substantive conclusions correctly represent true population relationships and underlying constructs. Shadish's reframing is pivotal — validity is a property of inferences, not of methods — and his framework distinguishes statistical conclusion validity, internal validity, construct validity, and external validity, each with its own threats to rule out. The causal logic is a process of identifying and rendering implausible the alternative explanations for an observed relationship. In the measurement traditions, valid inference means defensible score interpretation; in regression and SEM, it means parameter estimates and standard errors that reflect true relationships and their uncertainty (Beyond MLR's inferential validity). The qualitative traditions reach inference through credibility — member checking, triangulation, theoretical saturation — rather than statistical generalization.
Why it matters. Every prior link exists to make this one trustworthy, and a flaw anywhere caps it. The most consequential mistake is the causal one: Research Methods in Psychology shows that without experimental control — manipulation, random assignment, ruling out confounds — a correlation cannot license a causal claim. Confounding gives an alternative explanation for your result, and if you have not designed it out or argued it away, your conclusion does not follow from your evidence however significant the p-value.
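A small simulation shows how a confounder manufactures a correlation. In the sketch below, x has no effect on y at all; both are driven by a third variable z. The data-generating setup, the variable names, and the use of a partial correlation are assumptions chosen for illustration. Note that statistical adjustment only helps for confounders you have measured, which is why the source puts its weight on design.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng = random.Random(0)  # fixed seed so the run is repeatable
z = [rng.gauss(0, 1) for _ in range(2000)]   # the confounder
x = [zi + rng.gauss(0, 0.5) for zi in z]     # depends on z only
y = [zi + rng.gauss(0, 0.5) for zi in z]     # depends on z only, not on x

raw_r = pearson_r(x, y)
adjusted_r = partial_r(x, y, z)
print(round(raw_r, 2), round(adjusted_r, 2))
```

The raw correlation is strong even though x does nothing to y, and it collapses toward zero once z is held constant. A significance test on the raw correlation would pass easily and still support no causal claim.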
The myth: A statistically significant result establishes that the effect is real and caused by my variable.
The reality: Significance speaks only to statistical conclusion validity — and even that assumes the model is right. Causal inference additionally requires internal validity: ruling out selection, history, maturation, and confounds (experimental_quasiexperimental_designs_shadish; research_methods_in_psychology).
The myth: Validity is something the method guarantees if I follow the steps.
The reality: Shadish's core claim is that validity belongs to the inference, not the method. The same method yields valid inferences in one context and invalid ones in another; you must argue, every time, why the plausible alternatives are ruled out.
How to:
- Name the inference you want to make (descriptive, associational, or causal) and hold it to the matching standard of evidence (experimental_quasiexperimental_designs_shadish).
- For causal claims, establish covariation, time order, and the absence of confounds — through manipulation and random assignment where possible, or structural design elements where not (research_methods_in_psychology; experimental_quasiexperimental_designs_shadish).
- List plausible rival explanations explicitly and show how your design or analysis renders each implausible (case_study_research_design_methods_yin; experimental_quasiexperimental_designs_shadish).
- Check that estimates and standard errors are trustworthy given assumptions and data structure before drawing conclusions (beyond_multiple_linear_regression; using_multivariate_statistics_tabachnick_fidell).
- In qualitative work, build credibility through triangulating multiple evidence sources, informant review of drafts, and a traceable chain of evidence (case_study_research_design_methods_yin; basics_qualitative_research_grounded_theory_corbin_strauss).
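The causal standard in the steps above can be made concrete with a small simulation. This is a minimal sketch, not a method taken from any of the cited books: the variable names, effect sizes, and the "hidden confounder" are all hypothetical. The treatment has no causal effect on the outcome by construction, yet a confounder that drives both produces a sizeable association. Random assignment breaks that link.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation computed from first principles."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, randomize=False, seed=0):
    """Return corr(treatment, outcome) when treatment has NO causal effect.

    A hidden confounder drives the outcome. Without randomization it also
    influences who gets treated; with randomization a coin flip decides.
    """
    rng = random.Random(seed)
    treatment, outcome = [], []
    for _ in range(n):
        confounder = rng.gauss(0, 1)  # hypothetical, e.g. prior motivation
        if randomize:
            t = 1 if rng.random() < 0.5 else 0
        else:
            t = 1 if confounder + rng.gauss(0, 1) > 0 else 0
        y = 2.0 * confounder + rng.gauss(0, 1)  # note: t does not appear here
        treatment.append(t)
        outcome.append(y)
    return pearson_r(treatment, outcome)


observational_r = simulate(randomize=False)
randomized_r = simulate(randomize=True)
print(f"observational r = {observational_r:.2f}")
print(f"randomized r    = {randomized_r:.2f}")
```

With a sample this large the observational correlation would pass any significance test, which is the point: the p-value says nothing about whether the rival explanation has been ruled out.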
Watch out for:
- Sliding from association to causation without an internal-validity argument (research_methods_in_psychology).
- Reactivity and demand characteristics quietly producing the result you expected (research_methods_in_psychology).
- Generalizing beyond what your sampling logic supports — a recurring error addressed directly in the tensions below (experimental_quasiexperimental_designs_shadish; case_study_research_design_methods_yin).
Grounded in: Experimental Quasiexperimental Designs Shadish; Research Methods In Psychology; Sem Principles Practice Kline; Beyond Multiple Linear Regression Applied Generalized Linear Models And Multilevel Models in R; Using Multivariate Statistics Tabachnick Fidell; Case Study Research Design Methods Yin; Basics Qualitative Research Grounded Theory Corbin Strauss
Overall Research Quality and Trustworthiness
Advanced
The composite outcome the whole chain serves: the credibility, accuracy, generalizability, defensibility, and usefulness of your conclusions. Fowler's Total Survey Design captures the structural truth: quality is a function of all components, and a weakness in one can invalidate strengths in the others, so trustworthiness is the minimum across links, not the average. The Foundations of Social Research locates soundness in the process itself, laid out for an observer to scrutinize: transparent, accountable, defensible. Yin operationalizes this through the chain of evidence that lets an external reader trace any conclusion back to the original question. And the corpus insists the point is not internal elegance but downstream value: practical usefulness and stakeholder impact, meaning findings that inform policy, practice, decisions, and instrument use.
Why it matters. You can do nine things well and have one broken link sink the study: a biased sample, an invalid measure, an unjustified causal leap. Treating quality as a chain rather than a checklist changes how you allocate effort: you protect the weakest link, because that is what determines whether anyone should believe or act on your conclusions. Westland's reminder of George Box's aphorism, that all models are wrong but some are useful, is the working standard: perfection isn't available, defensible usefulness is.
The myth: A study with advanced statistics is a high-quality study.
The reality: Sophisticated analysis cannot rescue a broken upstream link. Fowler's principle is that a weakness in one component can invalidate the whole; a beautiful model on a biased sample or an invalid measure produces a confident wrong answer.
The myth: Quality is whether the conclusions are correct.
The reality: You usually can't observe correctness directly. The Foundations of Social Research grounds soundness in the transparency and accountability of the process — quality is whether an outside observer can trace and challenge your reasoning from question to conclusion.
How to:
- Audit your study link by link — grounding, conceptualization, reliability, validity, sampling, method, analysis, inference — and invest where the weakest link is (survey_research_methods_fowler).
- Build a traceable chain of evidence so an external reader can follow any conclusion back to the question and data (case_study_research_design_methods_yin).
- Report your methods fully and specifically enough that others can evaluate and replicate them (the_foundations_of_social_research; survey_research_methods_fowler).
- Assess generalizability honestly by the logic your design supports — statistical or analytic — and state its limits (experimental_quasiexperimental_designs_shadish; case_study_research_design_methods_yin).
- Tie conclusions to downstream usefulness: what decision, policy, or practice they can responsibly inform (regression_modeling_in_people_analytics; personnel_selection_adding_value_cook).
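The link-by-link audit can be sketched as a tiny calculation. The 0-to-1 scores below are hypothetical self-assessments invented for illustration; none of the cited books prescribes a numeric scale. The sketch only shows why the average is the wrong summary and the minimum is the binding constraint.

```python
from statistics import fmean

# Hypothetical self-assessed scores on a 0-1 scale; illustrative only.
link_scores = {
    "grounding": 0.9,
    "conceptualization": 0.85,
    "reliability": 0.9,
    "validity": 0.8,
    "sampling": 0.3,  # e.g. a convenience sample used for a population claim
    "method": 0.9,
    "analysis": 0.95,
    "inference": 0.85,
}


def audit(scores):
    """Return the weakest link, its score, and the (misleading) average."""
    weakest = min(scores, key=scores.get)
    return weakest, scores[weakest], fmean(scores.values())


weakest, floor, average = audit(link_scores)
print(f"average = {average:.2f}, binding constraint = {weakest} ({floor:.2f})")
```

Here the average looks respectable while the sampling link caps what the study can support, so that is where the next unit of effort should go.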
Watch out for:
- Averaging quality across components instead of finding the binding constraint (survey_research_methods_fowler).
- Overclaiming generalization beyond the sampling or replication logic actually used (experimental_quasiexperimental_designs_shadish).
- Reporting that hides choices, making the process impossible to scrutinize or replicate (the_foundations_of_social_research).
Grounded in: Survey Research Methods - Fowler; The Foundations of Social Research; Case Study Research Design Methods Yin; Experimental Quasiexperimental Designs Shadish; The Practice of Social Research; Sem Paths to Networks Westland; Basics Qualitative Research Grounded Theory Corbin Strauss
Live tensions in the field
Where the corpus genuinely disagrees — these are choices to make for your situation, not settled answers.
What counts as 'rigor' for causal claims — experimental control versus qualitative credibility.
Experimental/quasi-experimental: internal validity via manipulation and random assignment is supreme; rigor means ruling out confounds by design (research_methods_in_psychology, experimental_quasiexperimental_designs_shadish). · Qualitative/grounded-theory: rejects experimental control entirely; rigor comes from reflexivity, theoretical sampling, triangulation, and member checking (basics_qualitative_research_grounded_theory_corbin_strauss, the_coding_manual_for_qualitative_researchers, case_study_research_design_methods_yin).
This is context-contingent (wide consensus that both routes are legitimate for different questions, contested on which is 'better'). Let the question's form decide, as Yin advises: if you can manipulate a variable and your question is whether X causes Y in a way that should hold across cases, the experimental route gives the strongest causal warrant. If your question is how and why a process unfolds in a real-world setting you cannot control, the qualitative route is not a weaker version of an experiment; it is the appropriate instrument, and its credibility checks are its analog to ruling out confounds. The error is applying one camp's standards to judge the other's work.
How findings generalize — statistical generalization versus analytic generalization.
Survey/experimental: generalize from a probability sample to a population via sampling theory (introduction_to_survey_sampling_quantitative_app, survey_research_methods_fowler). · Case-study/grounded-theory: generalize to theory through analytic generalization and replication logic, not to populations (case_study_research_design_methods_yin, basics_qualitative_research_grounded_theory_corbin_strauss).
Context-contingent and incompatible in logic, so name which one you are using and don't borrow the other's authority. If you need a population estimate (what fraction of employees are disengaged), you need probability sampling; a case study cannot give it. If you need to know whether a theoretical mechanism holds and under what conditions, analytic generalization across deliberately selected cases is the right logic, and multiple cases work by replication, not by sampling. Shadish's generalization-focused sampling sits between them: deliberate, non-random selection of instances to support reasoned generalization. State your generalization logic explicitly so a reader knows what is being claimed.
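For the population-estimate case, statistical generalization can be sketched as follows. This is a minimal illustration under stated assumptions: a simple random sample, a normal-approximation (Wald) interval, and no finite population correction. The population of 10,000 employees with 30% disengaged is hypothetical, made up so the estimate can be compared with a known truth.

```python
import math
import random


def proportion_ci(successes, n, z=1.96):
    """Normal-approximation (Wald) 95% interval for a simple random sample."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)


# Hypothetical population: 10,000 employees, 30% disengaged (1 = disengaged).
rng = random.Random(42)
population = [1] * 3000 + [0] * 7000
sample = rng.sample(population, 400)  # every employee equally likely to be drawn

p_hat, low, high = proportion_ci(sum(sample), len(sample))
print(f"estimate = {p_hat:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The interval is only meaningful because the selection probabilities are known. The same arithmetic applied to a handful of deliberately chosen cases would produce a number with no sampling-theory warrant behind it.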
What validity is — a property of measures, a property of inferences, or emancipation.
Psychometric/SEM: validity is accumulated through measurement evidence about an instrument (psychometric_theory_nunnally_bernstein, sem_principles_practice_kline). · Shadish: validity is a property of inferences, partitioned into four types (experimental_quasiexperimental_designs_shadish). · Critical/feminist: the terminal aim of research is emancipation and critique of hegemony, not accuracy (the_foundations_of_social_research).
Largely reconcilable rather than a war: the psychometric and inference views are complementary — you accumulate measurement evidence (psychometric) in order to license a defensible interpretation (inference), and modern psychometrics already treats construct validity as a judgment about score interpretations, which is close to Shadish. Hold both. The critical/feminist reframing is a genuine outlier within this corpus — asserted by one book and resting on a philosophical stance rather than methodological evidence — so weigh it as a statement about research purpose, not a competing technical standard. It is a legitimate reminder that 'accuracy for whom and to what end' is a real question, but it does not replace reliability and validity evidence when your aim is to know what is so.
Does reliability precede validity, or are they parallel co-outcomes of measurement quality?
Sequential: reliability is a necessary prerequisite that enables validity (scale_development, developing_validating_rapid_assessment_instruments, psychometric_theory_nunnally_bernstein). · Parallel: reliability and validity are co-outcomes of measurement quality to be assessed and reported together, not strictly ordered (reliability_and_validity_assessment, the_practice_of_social_research).
Mostly a framing difference, not a real disagreement, and the consensus is wide: a measure cannot be more valid than it is reliable, because random error limits how strongly its scores can relate to anything else, so in that precise sense reliability precedes validity. But operationally you assess and report both, and a measure can be highly reliable and entirely invalid, so treating reliability as 'done' before you think about validity is the trap. Practical rule: build reliability in at design time, then treat validity as the larger, continuing case to which reliability is one input.
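The sense in which reliability caps validity can be shown with the standard classical test theory result that an observed correlation cannot exceed the square root of the product of the two measures' reliabilities. The sketch below is an illustration of that textbook relation and of Spearman's correction for attenuation; the function names and the example values are hypothetical and are not drawn from the cited books.

```python
import math


def validity_ceiling(reliability_x, reliability_y=1.0):
    """Largest observed correlation possible under classical test theory."""
    return math.sqrt(reliability_x * reliability_y)


def disattenuate(observed_r, reliability_x, reliability_y):
    """Spearman's correction: estimated correlation between true scores."""
    return observed_r / math.sqrt(reliability_x * reliability_y)


# Hypothetical values: a scale with reliability 0.64, a criterion with 0.81.
print(validity_ceiling(0.64))          # ceiling against a perfectly reliable criterion
print(validity_ceiling(0.64, 0.81))    # ceiling against the noisy criterion
print(disattenuate(0.36, 0.64, 0.81))  # observed r of 0.36, corrected
```

The ceiling is a limit, not a guarantee: a scale with reliability near 1.0 can still correlate near zero with the criterion it was meant to predict, which is why high reliability never settles the validity question.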
PLS versus covariance-based SEM at small samples.
PLS proponents: PLS-PA is defensible and useful for prediction and small samples (a position Westland surveys and critiques). · Covariance-based SEM with concern for estimator bias: PLS-PA with small samples is prone to higher bias (sem_paths_to_networks_westland).
This is contested and somewhat technical, but the evidence in this corpus leans one way: Westland documents that PLS-PA produces biased path coefficients in small samples, so the 'PLS solves small-sample problems' claim is weak where unbiased structural estimates are the goal. Take the position the evidence supports — if your aim is confirmatory hypothesis testing about structural parameters, do not reach for PLS to escape a small sample; fix the sample or lower your claims. If your aim is exploratory prediction and you understand the bias you are accepting, PLS can be a reasonable tool. Match the method to whether you are exploring or confirming, which Westland makes the prior question.
Do organizational/personnel subject-matter constructs belong in the same model as the methodology spine?
Integrated: substantive constructs like capabilities, performance, and selection validity are the point — methodology serves them (personnel_selection_adding_value_cook, personnel_selection_in_organizations_schmitt, assessment_methods_recruitment_selection_edenborough, handbook_organizational_measurement_price). · Separable: the dominant methodology spine (grounding through inference) stands on its own; the organizational subject layer is asserted by few books and largely disconnected from it.
This is a structural observation about the corpus more than a debate to resolve, and the evidence base is thin and one-sided (a handful of books assert the subject-layer constructs with little cross-linking to the methodology spine). Treat the methodology spine as the transferable capability — it applies whether you study schools, voters, or employees. Treat the organizational constructs as a worked domain: personnel selection is where reliability, predictive validity, and construct validity get their highest-stakes, most concrete application (Cook's productivity argument, Schmitt's construct-oriented view, Edenborough's competency models). Learn the spine first; use the organizational books as the case that makes it real, not as a separate theory you must reconcile.