Variance decomposition

Namesake · Reports · source: people-analyst/baby-namer/docs/research/reports/variance_decomposition_report.md

5.9 Variance Decomposition: What Fraction of Naming Is Cultural?

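Nested OLS with incremental R² reporting. Each model adds one group of covariates to the previous model, so ΔR² is the marginal explanatory contribution of that group given the groups entered before it. Because the decomposition is sequential, a group's ΔR² depends on entry order whenever groups are correlated with each other.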
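The nested procedure can be sketched as follows. This is a minimal illustration on synthetic data, not the report's pipeline: the helper names `r_squared` and `incremental_r2`, the random features, and the outcome `y` are all assumptions made for the example, and only two of the five blocks are simulated.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on X; an intercept column is added here."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def incremental_r2(blocks, y):
    """blocks: ordered list of (name, 2-D array of features).
    Returns rows of (name, features_added, cumulative_r2, delta_r2, adj_r2)."""
    n = len(y)
    X = np.empty((n, 0))          # design matrix grows one block at a time
    prev_r2 = 0.0
    rows = []
    for name, block in blocks:
        X = np.column_stack([X, block])
        r2 = r_squared(X, y)
        p = X.shape[1]            # cumulative number of predictors
        adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
        rows.append((name, block.shape[1], r2, r2 - prev_r2, adj))
        prev_r2 = r2
    return rows

# Synthetic stand-ins (hypothetical data, not the report's features).
rng = np.random.default_rng(0)
n = 200
event = rng.normal(size=(n, 4))
name_matching = rng.normal(size=(n, 4))
y = name_matching[:, 0] + 0.1 * event[:, 0] + rng.normal(size=n)

table = incremental_r2([("event", event), ("name_matching", name_matching)], y)
for name, added, r2, delta, adj in table:
    print(f"{name:15s} +{added}  R2={r2:.4f}  dR2={delta:.4f}  adjR2={adj:.4f}")
```

Reordering the `blocks` list changes each group's ΔR² but not the final cumulative R², which is why the entry order matters for interpretation.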

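| Step | Group | Features Added | Cumulative R² | ΔR² | Adj R² | n |
|------|-------|----------------|---------------|-----|--------|---|
| A | event | 4 | 0.0104 | 0.0104 | -0.0098 | 200 |
| B | name_matching | 4 | 0.5093 | 0.4989 | 0.4887 | 200 |
| C | name_independent | 7 | 0.5181 | 0.0088 | 0.4788 | 200 |
| D | phonetic | 3 | 0.5222 | 0.0041 | 0.4747 | 200 |
| E | cycle | 1 | 0.5223 | 0.0001 | 0.4718 | 200 |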
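As a consistency check, the adjusted R² column can be recomputed from the cumulative R² column. The sketch below assumes the standard adjustment, 1 - (1 - R²)(n - 1)/(n - p - 1), with p taken as the running total of the "features added" column; the report does not state which formula it used, so this is an assumption.

```python
n = 200

# (step, features added, cumulative R^2, reported adjusted R^2), copied from the table.
steps = [
    ("A", 4, 0.0104, -0.0098),
    ("B", 4, 0.5093, 0.4887),
    ("C", 7, 0.5181, 0.4788),
    ("D", 3, 0.5222, 0.4747),
    ("E", 1, 0.5223, 0.4718),
]

p = 0                 # cumulative predictor count (assumed: sum of features added)
recomputed = []
for step, added, r2, adj_reported in steps:
    p += added
    adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    recomputed.append((step, p, adj, adj_reported))
    print(f"{step}: p={p:2d}  recomputed={adj:.4f}  reported={adj_reported:.4f}")
```

Under this assumption the recomputed values agree with the reported column to within the rounding of the four-decimal inputs.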

Interpretation

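Event characteristics alone explain about 1% of the variation in the per-event synthetic-control divergence (the ate_t2 column): R² = 0.0104, with a negative adjusted R² (-0.0098), meaning the block fits no better than four uninformative predictors would be expected to. The full model with all five groups reaches R² = 0.5223 (adjusted R² = 0.4718).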

A-209 caveat on the name_matching block. The name_matching block's ΔR² of 0.4989 is partly tautological: those features (syllable count, log_pre_rank, pre_spike_trajectory_3yr, phonetic_neighborhood_size) are inputs to the Phase 8a synthetic-control donor matching. A well-fit donor pool mechanically reduces the divergence variance left over for those variables to explain. The honest comparison is event ΔR² vs name_independent + phonetic + cycle ΔR², where the name features are independent of the matching procedure.
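The comparison suggested by this caveat is simple arithmetic on the ΔR² column. A small sketch, with values copied from the table; the dictionary and variable names are illustrative only:

```python
# Delta R^2 per block, copied from the variance decomposition table.
delta_r2 = {
    "event": 0.0104,
    "name_matching": 0.4989,
    "name_independent": 0.0088,
    "phonetic": 0.0041,
    "cycle": 0.0001,
}

# Name-side blocks whose features are not inputs to donor matching.
independent_name = sum(
    delta_r2[g] for g in ("name_independent", "phonetic", "cycle")
)

print(f"event dR2:                        {delta_r2['event']:.4f}")
print(f"matching-independent name dR2:    {independent_name:.4f}")
```

On these figures the event block (0.0104) and the matching-independent name blocks (0.0130 combined) are of similar size, each close to 1% of the variance, in contrast to the 0.4989 attributed to name_matching.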

Residual variance: 47.8% (1 - 0.5223), attributable to idiosyncratic factors, measurement error, SUTVA-violation noise from phonetic spillover (see A-208), and fundamentally unpredictable cultural dynamics.

Multicollinearity Warning

The following features have VIF > 10:

  • log_budget: VIF = 40.9
  • log_revenue: VIF = 38.1

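Standard errors on these coefficients may be unreliable: a VIF near 40 inflates a coefficient's standard error by a factor of about 6 (the square root of the VIF) relative to a design with uncorrelated predictors. The block-level R² and ΔR² values are not affected, because collinearity among predictors does not change the fitted values.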
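For reference, a VIF can be computed by regressing each feature on all the others and taking 1 / (1 - R²). The sketch below uses synthetic data; the columns `a` and `b` are hypothetical stand-ins for a highly correlated pair such as log_budget and log_revenue, and the `vif` helper is written for this example rather than taken from the report's code.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X.
    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on the remaining columns plus an intercept."""
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ beta
        r2 = 1.0 - float(resid @ resid) / float(((target - target.mean()) ** 2).sum())
        out.append(1.0 / (1.0 - r2))
    return out

# Synthetic example: a and b are nearly collinear, c is independent.
rng = np.random.default_rng(1)
n = 200
a = rng.normal(size=n)
b = a + 0.1 * rng.normal(size=n)
c = rng.normal(size=n)

vifs = vif(np.column_stack([a, b, c]))
for name, v in zip(["a", "b", "c"], vifs):
    print(f"{name}: VIF = {v:.1f}, SE inflation = {np.sqrt(v):.1f}x")
```

The collinear pair shows large VIFs while the independent column stays close to 1, which is the same pattern as the two flagged features in the warning above.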

Analysis based on 200 events with valid causal ATEs.