
Exploratory Factor Analysis (Understanding Statistics)

In a sentence

A practical, formula-light, step-by-step guide to conducting exploratory factor analysis (EFA) in SPSS using evidence-based best practices.

Exploratory factor analysis is over a century old and ubiquitous across the behavioral, medical, and social sciences, yet surveys repeatedly show that it is routinely misapplied because researchers receive little formal training and lean on poor software defaults. Marley Watkins addresses this gap with a concise, accessible, applied manual that walks the reader through every decision step of an EFA: choosing variables and participants, screening data, judging whether EFA is appropriate, selecting the model, extraction method, number of factors, and rotation, then interpreting and reporting the results. Each step is illustrated with annotated SPSS screenshots, syntax, downloadable datasets, and scholarly citations. With minimal mathematics and a calm, jargon-light tone, the book equips students and seasoned researchers alike to produce defensible, replicable factor-analytic results and to respond confidently to editorial reviews.

The story it tells the reader

The reader

An applied researcher or graduate student who wants to conduct a credible, publishable exploratory factor analysis in SPSS.

External problem

They must make many technical EFA decisions in SPSS with little training and unsound software defaults.

Internal problem

They feel uncertain, intimidated by the math, and worried their analysis is wrong or indefensible.

Philosophical problem

Sloppy, default-driven factor analysis distorts science by creating false certainty and non-replicable results, which is just plain wrong.

The plan

  1. Follow the ten-step EFA decision checklist in order.
  2. Screen data and verify EFA is appropriate before analyzing.
  3. Choose the common factor model with a justified extraction method.
  4. Use multiple criteria (parallel analysis, MAP, scree, theory) to decide factor number.
  5. Apply oblique rotation, interpret competing models, and report every decision transparently.

Success

  • The reader produces defensible, replicable EFA results.
  • They can justify every analytic choice to reviewers with citations.
  • They confidently interpret, name, and report factors and understand when to use EFA versus CFA.

At stake

  • The reader accepts unsound defaults and produces distorted, meaningless solutions.
  • Their flawed results mislead theory and instrument development and fail to replicate.
  • They are unable to defend their methods against editorial review.

Model of the world · 13 constructs · 14 relations

A process model in which the quality of an exploratory factor analysis solution depends on a sequence of researcher decisions (design levers) operating on data conditions, mediated by adherence to evidence-based practice and the appropriateness of the correlation structure, producing interpretable, replicable, well-reported factor solutions.

Design levers

  • Factor Retention Accuracy
  • Variable Selection Quality
  • Data Screening Rigor
  • Correlation Type Appropriateness
  • Common Factor Model Choice
  • Extraction Method Appropriateness
  • Rotation Appropriateness

Intermediate states & behaviors

  • Adherence to Evidence-Based Practice

Outcomes

  • Solution Interpretability
  • Reporting Transparency
  • Replicability and Construct Validity

Moderators / context: Correlation Matrix Appropriateness · Sample Adequacy

Consolidated shape of the book’s model — full constructs and relationships below.

Variable Selection Quality · design lever

The degree to which the measured variables included in the analysis adequately and validly sample the domain of interest with sufficient reliability and at least three indicators per anticipated factor.

Sample Adequacy · contextual condition

The degree to which the participant sample is appropriate in representativeness and sufficiently large given communality, factor overdetermination, data type, and missingness to yield stable factor recovery.

Data Screening Rigor · design lever

The thoroughness with which linearity, distributional normality, outliers, restricted range, and missing data are inspected and appropriately handled prior to factor analysis using both statistics and graphics.
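
As a rough illustration of the distributional part of this screening, the sketch below computes moment-based skewness and excess kurtosis for each variable and flags the ones that exceed a cutoff. The function names, the sample data, and the cutoffs (2.0 and 7.0) are assumptions made for this example, not values taken from this page, and the simple moment estimators used here can differ slightly from the sample-adjusted statistics that SPSS prints.

```python
def skew_and_kurtosis(values):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def flag_nonnormal(columns, skew_limit=2.0, kurtosis_limit=7.0):
    """Return the variables whose |skew| or |excess kurtosis| exceeds the
    (assumed, illustrative) limits, with their rounded statistics."""
    flagged = {}
    for name, values in columns.items():
        skew, kurt = skew_and_kurtosis(values)
        if abs(skew) > skew_limit or abs(kurt) > kurtosis_limit:
            flagged[name] = (round(skew, 2), round(kurt, 2))
    return flagged


# Hypothetical item scores: one symmetric item, one with a single extreme value.
data = {
    "symmetric_item": [1, 2, 3, 4, 5],
    "skewed_item": [1] * 19 + [50],
}
flags = flag_nonnormal(data)
print(flags)
```

Numbers like these are only half of the picture described above; a histogram or scatterplot of each flagged variable would still be needed to judge outliers and nonlinearity.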

Correlation Matrix Appropriateness · contextual condition

The extent to which the correlation matrix contains sufficient common variance for factoring, evidenced by coefficients at or above .30, an acceptable determinant, statistically significant Bartlett's test, and adequate KMO sampling adequacy values.
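
The four indicators named above can be computed directly from a correlation matrix. The following is a minimal sketch using only the Python standard library: it assumes a positive-definite correlation matrix supplied as a list of lists, uses the standard formulas for Bartlett's test of sphericity and the overall KMO index, and leaves the p-value lookup to a chi-square table or a statistics package. The function names and the example matrix are hypothetical.

```python
import math


def inverse_and_det(matrix):
    """Gauss-Jordan elimination with partial pivoting: returns (inverse, determinant)."""
    n = len(matrix)
    # Work on a copy of the matrix augmented with the identity.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        pivot_value = aug[col][col]
        det *= pivot_value
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], det


def factorability(r_matrix, n_obs):
    """Determinant, Bartlett chi-square and df, overall KMO, and share of |r| >= .30."""
    p = len(r_matrix)
    inv, det = inverse_and_det(r_matrix)
    pairs = [(i, j) for i in range(p) for j in range(i + 1, p)]
    share_salient = sum(abs(r_matrix[i][j]) >= 0.30 for i, j in pairs) / len(pairs)
    # Bartlett's test of sphericity: -(n - 1 - (2p + 5) / 6) * ln|R|
    chi_square = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(det)
    # KMO compares squared correlations with squared partial correlations.
    sum_r2 = sum(r_matrix[i][j] ** 2 for i, j in pairs)
    sum_partial2 = sum((-inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])) ** 2
                       for i, j in pairs)
    return {
        "determinant": det,
        "bartlett_chi_square": chi_square,
        "bartlett_df": p * (p - 1) // 2,
        "kmo": sum_r2 / (sum_r2 + sum_partial2),
        "share_r_at_least_30": share_salient,
    }


# Hypothetical 3-variable matrix with every correlation equal to .50, N = 100.
R = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
result = factorability(R, n_obs=100)
print(result)
```

What counts as an acceptable determinant or KMO value is a judgment the book addresses; the sketch only produces the numbers.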

Correlation Type Appropriateness · design lever

The degree to which the type of correlation coefficient used (Pearson versus polychoric or other) matches the measurement level and distributional characteristics of the variables, especially for ordinal or nonnormal data.

Common Factor Model Choice · design lever

The decision to use the common factor model (EFA) rather than principal components analysis when the goal is to represent latent structure by separating common variance from unique and error variance.
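
To make the split between common and unique variance concrete, here is a small sketch that derives each variable's communality from a loading matrix and treats the remainder of its standardized variance as unique (specific plus error). It assumes uncorrelated factors, so a communality is simply the row sum of squared loadings; the loadings and function names are hypothetical.

```python
def communalities(loadings):
    """Communality of each variable: row sum of squared loadings.
    Valid as written only when the factors are uncorrelated."""
    return [sum(value ** 2 for value in row) for row in loadings]


def variance_split(loadings):
    """Split each standardized variable's variance into common and unique parts."""
    return [{"common": h2, "unique": 1.0 - h2} for h2 in communalities(loadings)]


# Hypothetical two-factor loadings for three variables.
loadings = [
    [0.8, 0.0],
    [0.6, 0.3],
    [0.1, 0.7],
]
for row in variance_split(loadings):
    print(round(row["common"], 2), round(row["unique"], 2))
```

Principal components analysis, by contrast, analyzes the full unit variance of every variable and so does not produce this separation.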

Extraction Method Appropriateness · design lever

The degree to which the chosen factor extraction method (e.g., maximum likelihood versus least-squares/principal axis) matches the data's distributional assumptions, sample size, and factor strength to recover factors accurately.

Factor Retention Accuracy · design lever

The degree to which the number of factors retained matches the true latent dimensionality, determined using convergent evidence from parallel analysis, minimum average partial, scree, theory, and a model-selection comparison rather than discredited single rules.
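
The sketch below covers only the final comparison step of parallel analysis and a simple tally of how many criteria agree. It assumes the observed eigenvalues and the random-data reference eigenvalues have already been computed elsewhere (for example in SPSS); every number, name, and criterion label in it is hypothetical.

```python
def factors_to_retain(observed, random_reference):
    """Count the leading observed eigenvalues that exceed their random-data
    counterparts, stopping at the first one that does not."""
    count = 0
    for real, rand in zip(observed, random_reference):
        if real <= rand:
            break
        count += 1
    return count


def criteria_agreement(suggestions):
    """Group criteria by the number of factors each one suggests."""
    tally = {}
    for criterion, n_factors in suggestions.items():
        tally.setdefault(n_factors, []).append(criterion)
    return tally


# Hypothetical eigenvalues for six variables (real data vs. random data).
observed = [2.90, 1.50, 0.60, 0.45, 0.35, 0.20]
random_reference = [1.25, 1.12, 1.03, 0.95, 0.87, 0.78]

suggestions = {
    "parallel analysis": factors_to_retain(observed, random_reference),
    "MAP": 2,      # assumed result, for illustration only
    "scree": 3,    # assumed result, for illustration only
    "theory": 2,   # assumed result, for illustration only
}
print(criteria_agreement(suggestions))
```

When the criteria disagree, as in this made-up case, the tally gives a range of candidate models to extract and compare rather than a single answer.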

Rotation Appropriateness · design lever

The suitability of the rotation choice—favoring oblique rotations that allow correlated factors—for improving interpretability and honoring the typical intercorrelation among social-science constructs.
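
One practical consequence of allowing factors to correlate is that an oblique solution has two coefficient matrices. As a minimal sketch, the structure coefficients (variable-factor correlations) can be obtained by multiplying the pattern coefficients by the interfactor correlation matrix. The pattern values, the interfactor correlation of .50, and the function name are hypothetical.

```python
def structure_matrix(pattern, phi):
    """Structure coefficients S = P * Phi, where P holds the pattern
    coefficients and Phi the interfactor correlations."""
    n_factors = len(phi)
    return [
        [sum(row[k] * phi[k][j] for k in range(n_factors)) for j in range(n_factors)]
        for row in pattern
    ]


# Hypothetical oblique solution: two factors correlated at .50.
pattern = [
    [0.7, 0.0],
    [0.0, 0.6],
]
phi = [
    [1.0, 0.5],
    [0.5, 1.0],
]
structure = structure_matrix(pattern, phi)
print(structure)
```

With uncorrelated factors Phi is an identity matrix and the two sets of coefficients coincide, which is why the distinction only matters after an oblique rotation.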

Adherence to Evidence-Based Practice · behavioral pattern

The overall extent to which the researcher follows documented best-practice recommendations across all decision steps rather than accepting unsound software defaults or arbitrary conventions.

Solution Interpretability · outcome metric

The degree to which the resulting factor solution exhibits approximate simple structure, salient and theoretically coherent loadings, adequate scale reliability, and small residuals without symptoms of over- or underextraction.
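
The "small residuals" part of this definition can be checked numerically. The sketch below reproduces the off-diagonal correlations from a loading matrix, subtracts them from the observed correlations, and reports the root mean square residual (RMSR) together with the number of residuals whose absolute value exceeds .10. It assumes uncorrelated factors (or a single factor), and the matrix, loadings, and function name are hypothetical.

```python
import math


def residual_summary(r_matrix, loadings, cutoff=0.10):
    """Return (RMSR, number of |residuals| above the cutoff) for the
    off-diagonal elements. Assumes uncorrelated factors, so the reproduced
    correlation is the sum of products of the two variables' loadings."""
    p = len(r_matrix)
    residuals = []
    for i in range(p):
        for j in range(i + 1, p):
            reproduced = sum(a * b for a, b in zip(loadings[i], loadings[j]))
            residuals.append(r_matrix[i][j] - reproduced)
    rmsr = math.sqrt(sum(e ** 2 for e in residuals) / len(residuals))
    n_large = sum(abs(e) > cutoff for e in residuals)
    return rmsr, n_large


# Hypothetical observed correlations and one-factor loadings.
R = [
    [1.00, 0.56, 0.63],
    [0.56, 1.00, 0.42],
    [0.63, 0.42, 1.00],
]
loadings = [[0.8], [0.7], [0.6]]
rmsr, n_large = residual_summary(R, loadings)
print(round(rmsr, 4), n_large)
```

In this made-up case the factor reproduces two correlations almost exactly and misses the third by about .15, so one residual is flagged.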

Reporting Transparency · outcome metric

The completeness and clarity with which all analytic decisions, software, statistics, and results are reported so that an independent reader could review, replicate, and accumulate knowledge from the study.

Replicability and Construct Validity · outcome metric

The ultimate scientific value of the factor solution, reflected in its reproducibility across samples and methods and the meaningfulness of its relationships with external criteria within a construct-validation program.
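
One common way to quantify reproducibility across samples is Tucker's congruence coefficient, which compares the loadings of the same variables on a factor estimated in two samples. The page does not say which index the book uses, so the choice of Tucker's coefficient is an assumption made for this sketch, as are the loadings and the function name.

```python
import math


def tucker_congruence(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two loading vectors that
    list the same variables in the same order."""
    numerator = sum(a * b for a, b in zip(loadings_a, loadings_b))
    denominator = math.sqrt(
        sum(a ** 2 for a in loadings_a) * sum(b ** 2 for b in loadings_b)
    )
    return numerator / denominator


# Hypothetical loadings for one factor in two independent samples.
sample_1 = [0.80, 0.70, 0.60]
sample_2 = [0.75, 0.72, 0.58]
print(round(tucker_congruence(sample_1, sample_2), 3))
```

Values near 1 indicate that the factor has essentially the same loading profile in both samples; relationships with external criteria still have to be examined separately.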

How they connect

  • Variable Selection Quality influences Correlation Matrix Appropriateness
  • Sample Adequacy influences Correlation Matrix Appropriateness
  • Data Screening Rigor influences Correlation Matrix Appropriateness
  • Correlation Type Appropriateness moderates Correlation Matrix Appropriateness
  • Correlation Matrix Appropriateness predicts Factor Retention Accuracy
  • Common Factor Model Choice influences Solution Interpretability
  • Extraction Method Appropriateness influences Solution Interpretability
  • Factor Retention Accuracy predicts Solution Interpretability
  • Rotation Appropriateness influences Solution Interpretability
  • Adherence to Evidence-Based Practice influences Factor Retention Accuracy
  • Adherence to Evidence-Based Practice mediates Solution Interpretability
  • Solution Interpretability predicts Replicability and Construct Validity
  • Reporting Transparency influences Replicability and Construct Validity
  • Adherence to Evidence-Based Practice influences Reporting Transparency
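
The fourteen relations listed above can be held in a small directed graph, which makes it easy to ask what a given construct ultimately feeds into. This is a minimal sketch: the edge list transcribes the relations using the construct names defined earlier on the page, while the data layout and the `downstream` helper are choices made for this example.

```python
from collections import deque

# (source, relation, target), transcribed from the list of relations.
EDGES = [
    ("Variable Selection Quality", "influences", "Correlation Matrix Appropriateness"),
    ("Sample Adequacy", "influences", "Correlation Matrix Appropriateness"),
    ("Data Screening Rigor", "influences", "Correlation Matrix Appropriateness"),
    ("Correlation Type Appropriateness", "moderates", "Correlation Matrix Appropriateness"),
    ("Correlation Matrix Appropriateness", "predicts", "Factor Retention Accuracy"),
    ("Common Factor Model Choice", "influences", "Solution Interpretability"),
    ("Extraction Method Appropriateness", "influences", "Solution Interpretability"),
    ("Factor Retention Accuracy", "predicts", "Solution Interpretability"),
    ("Rotation Appropriateness", "influences", "Solution Interpretability"),
    ("Adherence to Evidence-Based Practice", "influences", "Factor Retention Accuracy"),
    ("Adherence to Evidence-Based Practice", "mediates", "Solution Interpretability"),
    ("Solution Interpretability", "predicts", "Replicability and Construct Validity"),
    ("Reporting Transparency", "influences", "Replicability and Construct Validity"),
    ("Adherence to Evidence-Based Practice", "influences", "Reporting Transparency"),
]


def downstream(start, edges=EDGES):
    """Breadth-first search: every construct reachable from `start`."""
    graph = {}
    for source, _relation, target in edges:
        graph.setdefault(source, []).append(target)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(downstream("Sample Adequacy")))
```

Traversing from any design lever ends at Replicability and Construct Validity, which matches the page's description of a sequence of decisions leading to replicable results.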

Possible measures & feedback loops

A candidate team / org survey built from this book’s model — exploratory operationalizations, not validated instruments. Where a construct maps to a validated measure in Principia, we’ll point to that instead.

Variable Selection Quality

reliability coefficients; indicators-per-factor count; communality estimates

self-report suitability: low
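
As an example of the first measure, a reliability coefficient such as Cronbach's alpha can be computed from item scores with the standard library alone. Alpha is used here as one familiar reliability coefficient, not because the page singles it out for this construct; the item scores and the function name are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha. `items` holds one list of scores per item, with
    respondents in the same order in every list."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores of five respondents on three items.
items = [
    [2, 3, 4, 4, 5],
    [1, 3, 3, 5, 5],
    [2, 2, 4, 4, 4],
]
print(round(cronbach_alpha(items), 3))
```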

Sample Adequacy

N; participant:variable ratio; communality × overdetermination interaction

self-report suitability: low

Data Screening Rigor

skew/kurtosis values; outlier counts; percent missing

self-report suitability: low

Correlation Matrix Appropriateness

KMO value; Bartlett's chi-square/p; determinant; proportion of r ≥ .30

self-report suitability: none

Correlation Type Appropriateness

number of ordered categories; skew/kurtosis; matrix type used

self-report suitability: none

Common Factor Model Choice

model type reported; communality estimation method

self-report suitability: none

Extraction Method Appropriateness

method reported; Heywood-case occurrence; iterations to convergence

self-report suitability: none

Factor Retention Accuracy

criteria agreement count; real vs random eigenvalues; MAP minimum

self-report suitability: none

Rotation Appropriateness

rotation type reported; interfactor correlations; pattern/structure coefficients

self-report suitability: none

Adherence to Evidence-Based Practice

proportion of steps with stated rationale; default-avoidance count

self-report suitability: medium

Solution Interpretability

RMSR; count of residuals > .10; salient-loading pattern; alpha/omega

self-report suitability: none

Reporting Transparency

checklist element coverage; presence of software/version and matrices

self-report suitability: medium

Replicability and Construct Validity

congruence across samples; stability across methods; external correlate magnitudes

self-report suitability: none


Frameworks & instruments in this book

  • Parsimony: explain the most common variance with the fewest interpretable factors.
  • Simple structure: each variable should load saliently on as few factors as possible.
  • Evidence-based decision-making over reliance on software defaults.
  • Transparency and replicability in reporting all analytic choices.
  • Models only approximate reality; factors must be validated and not reified.

Several of these are operationalized as tools in the People Analytics Toolbox.
