peopleanalyst

research / namesake / bibliography

Literature map

Field positioning across the cultural-diffusion literature.

Namesake·Bibliography·source: people-analyst/baby-namer/docs/research/literature-map.md

Literature Map — Namesake Research Program (Cultural Diffusion)

Status: First merged draft Date: 2026-04-29 Scope: every cited or method-naming work that appears in docs/research/reports/ and the master spec docs/research/PHD_STUDY_SPEC.md. Each section names the construct or method, the Namesake measure, the canonical source, and Namesake's divergence or extension. Companion bibliography.bib carries BibTeX with verified DOIs. Method: Citations harvested by walking every report under docs/research/reports/ and the methods rows of PHD_STUDY_SPEC.md §1, then cross-referenced against shipped artifacts (null_model_summary.json, granger_results.parquet, event_panel.parquet, etc.). Citation style: APA 7 inline; BibTeX keys lastname_year_firstword in bibliography.bib. DOI policy: DOIs and URLs included only when verified to resolve. Public-dataset references appear in their own section.

Column legend:

  • Coordinate: the construct, method, or pipeline component being positioned.
  • Namesake measure: the concrete column, parquet artifact, or scoring step.
  • Primary source (APA 7): the canonical citation.
  • DOI / URL: verified publisher or stable identifier.
  • Relevance: why this source matters for the Namesake claim (≤40 words).
  • Namesake divergence or extension: what we do differently from the cited work.

A. The Lieberson neutral-fashion debate (PHD_STUDY_SPEC §1, Phase 5)

The whole research program is structured around a falsifiable contest: do cultural events drive naming, or does most of the observed turnover fall out of a neutral-copying ("internal fashion") process? Phase 5's null model is the referee.

CoordinateNamesake measurePrimary source (APA)DOI / URLRelevanceNamesake divergence
Internal-fashion / "matter of taste" theoryPhase 5 null model; bucket field on name_cultural_events × null_model_thresholds.parquet (drift-consistent, partially-cultural, culturally-influenced, strongly-cultural)Lieberson, S. (2000). A matter of taste: How names, fashions, and culture change. Yale University Press.Yale UPOrigin of the claim that name churn arises predominantly from endogenous taste cycling rather than discrete cultural events. The ceiling our pipeline tries to break.We attribute 843 specific cultural events with a confidence-thresholded LLM pipeline and benchmark each against the same neutral simulation, so the contrast becomes per-event rather than population-level.
Neutral drift / random copying applied to baby namesInnovation rate $\mu = 0.0505$ and effective population sizes ($N_e^F = 9{,}852$, $N_e^M = 22{,}320$) in null_model_summary.jsonHahn, M. W., & Bentley, R. A. (2003). Drift as a mechanism for cultural change: an example from baby names. Proceedings of the Royal Society B, 270(Suppl. 1), S120–S123.10.1098/rsbl.2003.0045Establishes the Wright–Fisher analogy for SSA-grade naming data and the magnitude of drift expected with realistic $N_e$.We carry the framework forward to 2024, refit per-sex $N_e$ on the modern panel, and report the share of name-years that beat the null at p95 (4.57%) and p99 (2.83%).
Random-copying as a general cultural-change mechanismPhase 5 phonetic-fashion null in fit_phonetic_fashion.py; beats_phonetic_p99 flag (5.27% of name-years at p95)Bentley, R. A., Lipo, C. P., Herzog, H. A., & Hahn, M. W. (2007). Regular rates of popular culture change reflect random copying. Evolution and Human Behavior, 28(3), 151–158.10.1016/j.evolhumbehav.2006.10.002Generalizes the drift result to dog breeds, pottery, and other cultural domains, predicting power-law turnover where random copying dominates.The phonetic-fashion null incorporates Berger-style sound-neighborhood pull, so divergence between drift and phonetic nulls quantifies how much structure sits in sound communities versus generic copying.
Wright–Fisher population geneticsPosterior $N_e$ fits used to parameterize Phase 5 simulationWright, S. (1931). Evolution in Mendelian populations. Genetics, 16(2), 97–159.10.1093/genetics/16.2.97Foundational drift-only model the cultural application borrows from.We use it as a null, not as a description; cultural innovation rate $\mu$ is not biological mutation.
Genetical theory of natural selectionSameFisher, R. A. (1930). The genetical theory of natural selection. Clarendon Press.Internet ArchiveCo-foundational source for the drift framework Phase 5 inherits via Hahn & Bentley.We extend the analogy only as far as turnover statistics; do not import "fitness" as a per-name parameter.
Neutral theory of molecular evolutionConceptual analogue for the "most change is drift" framing in cultural-diffusion.mdKimura, M. (1968). Evolutionary rate at the molecular level. Nature, 217(5129), 624–626.10.1038/217624a0Provides the rhetorical template: most observed change is consistent with the null, with selection visible only in the tails.Our findings reverse Kimura's emphasis: the modal name year is null-consistent, but the tail (4.57% at p95) is large enough that population-level claims of "drift dominates" understate culture's role.

B. Phonetic neighborhoods and sound-driven fashion (PHD_STUDY_SPEC §1, Phases 3 + 6)

The single sentence the cultural-diffusion essay keeps returning to — names do not travel alone; they travel in sound families — is operationalized as a phoneme-edit-distance graph.

CoordinateNamesake measurePrimary source (APA)DOI / URLRelevanceNamesake divergence
Phonetic neighborhood as the unit of cultural diffusion in baby namesphonetic_spillover_results.parquet (166,046 phonetic pairs); phonetic_clusters.parquet (1,221 clusters); Phase 6 Welch t = 7.21 (p ≈ 1.1 × 10⁻¹²) for phonetic vs random pairsBerger, J., Bradlow, E. T., Braunstein, A., & Zhang, Y. (2012). From Karen to Katie: Using baby names to understand cultural evolution. Psychological Science, 23(10), 1067–1073.10.1177/0956797612443371Establishes that phonetic neighbors co-vary in popularity at rates substantially above chance — the empirical foundation for Namesake's spillover analysis.Berger inferred phonemes from spelling heuristics on a smaller window; we use the CMU Pronouncing Dictionary plus a g2p fallback across the full SSA universe (1880–2024) and persist the neighborhood graph as a queryable artifact.
ARPAbet phoneme strings for English given namesresearch_name_phonemes table; phoneme/sonority/onset-rhyme features in data/research/derived/phonetic_features.parquetCarnegie Mellon University. (2014). The CMU Pronouncing Dictionary, version 0.7b [Software].CMUProvides ARPAbet for ~134K English words including a large fraction of attested first names; the substrate for the Phase 3 graph.We backfill out-of-vocabulary names with a g2p model rather than dropping them, which matters because the most "culturally interesting" recent names (Khaleesi, Wrenley) are exactly the ones missing from CMU.

C. Cultural diffusion theory (PHD_STUDY_SPEC §1, Phase 7c)

The Bass model and the structural-virality framework are the two diffusion theories Namesake actually fits, not just cites.

CoordinateNamesake measurePrimary source (APA)DOI / URLRelevanceNamesake divergence
Innovation/imitation decomposition of adoption curvesPhase 7c bass_params.parquet (60,470 fits, R² median 0.992): p, q, classification ∈ {broadcast 26.1%, peer 42.7%, mixed 26.8%, unfit 4.4%}Bass, F. M. (1969). A new product growth for model consumer durables. Management Science, 15(5), 215–227.10.1287/mnsc.15.5.215Provides the (p, q) decomposition that lets us separate broadcast-driven from peer-driven names — the key behavioral prediction of the cultural-events theory.Bass was about durable goods with a fixed market potential m; we fit on adoption-curve segments with rolling segmentation, accept that m is partly identified through the SSA registration ceiling, and report stratified medians by event_type rather than firm-level dynamics.
Diffusion of innovations as a general frameworkBackground motivation for Phase 7c segmentation choices and the "innovators / early adopters / majority" framing in the cultural-diffusion essayRogers, E. M. (2003). Diffusion of innovations (5th ed.). Free Press.Free PressDefines the conceptual vocabulary (innovator, imitator, S-curve) we operationalize via Bass and the spike-attribution panel.Rogers is qualitative; we instantiate his typology as the Bass classification rule (broadcast vs peer) on 60K fitted curves.
Broadcast vs viral cascadesConceptual frame for the side-quest panel on cascade structure (docs/research/reports/side_quests.md) and the Bass-classification stratificationGoel, S., Anderson, A., Hofman, J., & Watts, D. J. (2016). The structural virality of online diffusion. Management Science, 62(1), 180–196.10.1287/mnsc.2015.2158Formalizes the broadcast↔viral spectrum that Bass collapses into a single (p, q) ratio; constrains how to interpret peer-dominant names.We apply the broadcast/viral framing to names rather than to URL cascades; the analogue of "tree depth" is the Phase 6 spillover ratio onto phonetic cousins.

D. Self-exciting processes and contagion dynamics (PHD_STUDY_SPEC §1, Phase 7b)

Hawkes processes give us a continuous-time complement to the annual-grain Granger work.

CoordinateNamesake measurePrimary source (APA)DOI / URLRelevanceNamesake divergence
Self-exciting point processes; branching ratio as a contagion measurePhase 7b hawkes_params.parquet (6,328 fits, 6,033 stable): branching ratio (median 0.241), half-life weeks (median 1.39), background rate $\mu$ (median 0.064), excitation $\alpha$, decay $\beta$Hawkes, A. G. (1971). Spectra of some self-exciting and mutually exciting point processes. Biometrika, 58(1), 83–90.10.1093/biomet/58.1.83The canonical statement of the model; defines branching ratio < 1 as the stability boundary that 95.3% of fits satisfy.We fit Hawkes on weekly Google Trends spike series rather than on births directly; the half-life captures the memory of cultural attention, not of the SSA file. The mismatch is intentional — short-horizon search excitation is exactly what the Granger annual aggregation cannot resolve (limitation §3 of the Phase 7a report).

E. Time-series causal inference — Granger causality and panel VAR (PHD_STUDY_SPEC §1, Phase 7a)

CoordinateNamesake measurePrimary source (APA)DOI / URLRelevanceNamesake divergence
Granger causality between paired time seriesPhase 7a granger_results.parquet (4,185 names, 21.5% significant at p<0.05 raw, 1.0% under BH at α=0.05; median optimal lag 1 year); validation canaries (Elsa, Arya, James, Khaleesi)Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37(3), 424–438.10.2307/1912791Defines the predictive-causality test that anchors the search→births claim.The Khaleesi non-result identifies a class of within-year cultural shocks Granger cannot detect by construction — Phase 7b Hawkes is the explicit complement.
Panel vector autoregression and impulse-response analysisWithin-name demeaned panel VAR(p=3) on 4,206 names; pooled IRF +0.0063 at h=1 with 95% CI [+0.0045, +0.0081]; subsample IRFs reveal aspirational (+) vs cautionary (−) splitSims, C. A. (1980). Macroeconomics and reality. Econometrica, 48(1), 1–48.10.2307/1912017Foundational specification for VAR/IRF analysis applied across the stacked panel; the orthogonalized IRF is what isolates the response of births to a one-SD search shock.We apply Sims' macro tooling to a name-level cultural panel; the most consequential finding (negative IRF for event_news_event at h=2: −0.0226, 95% CI [−0.033, −0.009]) is a within-name fixed-effects result, not a cross-name comparison.

F. Synthetic-control causal inference (PHD_STUDY_SPEC §1, Phase 8)

CoordinateNamesake measurePrimary source (APA)DOI / URLRelevanceNamesake divergence
Synthetic-control method (origination)Donor-pool construction in phase8_causal/synthetic_controls.py; placebo distribution on 1,000 non-spiking namesAbadie, A., & Gardeazabal, J. (2003). The economic costs of conflict: A case study of the Basque Country. American Economic Review, 93(1), 113–132.10.1257/000282803321455188Introduces the convex-weights matching that lets us build a counterfactual for a single treated unit (here, a name with a cultural spike).We treat 200 events instead of one and do per-event placebo inference rather than narrative comparison.
Synthetic-control method (formalization, placebo inference)research_event_ate ATE_t1..ATE_t5 with placebo p-value; covariate set in PHD_STUDY_SPEC §4 (is_lead_character, is_title_character, actor_popularity, budget, revenue, genres)Abadie, A., Diamond, A., & Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program. Journal of the American Statistical Association, 105(490), 493–505.10.1198/jasa.2009.ap08746Provides the MSPE-minimizing donor weights and the placebo-test framework for empirical p-values that we use directly.The donor pool is matched on phonetic-neighborhood density (not just covariates), an extension that follows from Phase 6 spillover.

G. Spatial autocorrelation (PHD_STUDY_SPEC §1, Phase 10a)

CoordinateNamesake measurePrimary source (APA)DOI / URLRelevanceNamesake divergence
Moran's I for spatial autocorrelationDecade-by-decade Moran's I in moran_report.md (1960s: 0.515 → 2020s: 0.268); pre-streaming mean 0.429 vs post-streaming 0.291Moran, P. A. P. (1950). Notes on continuous stochastic phenomena. Biometrika, 37(1/2), 17–23.10.1093/biomet/37.1-2.17Defines the I statistic that quantifies whether name popularity is geographically clustered — the empirical lever for the streaming-era homogenization claim.We compute Moran's I on the SSA state file (ssa_state_year.parquet) rather than on contiguous neighborhoods; spatial weights are state-adjacency, and the trend (decreasing I) is read as evidence of streaming-era national-exposure flattening.

H. Predictability ceiling (PHD_STUDY_SPEC §1, Phase 10b)

CoordinateNamesake measurePrimary source (APA)DOI / URLRelevanceNamesake divergence
Predictability of cultural-success outcomes; MusicLab paradigmPhase 10b train-on-2004–2014, test-on-2015–2024 model panel (predictability_ceiling.md): full-feature logistic AUC 0.999, PR-AUC 0.881, P@100 = 0.98Salganik, M. J., Dodds, P. S., & Watts, D. J. (2006). Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311(5762), 854–856.10.1126/science.1121066Establishes the framing of "ceiling" prediction in cultural domains: even with full information about the past, future hits remain partially unpredictable.Our high AUC is partly an artifact of structural features (prior popularity, phonetic density) constraining the possibility space — a divergence from Salganik's experimental setup, where path dependence on early plays is the main predictability bottleneck. We flag this in the report's interpretation rather than claiming we have broken Salganik's ceiling.

I. Saturation, exposure, and reactance — the Blockbuster Paradox (PHD_STUDY_SPEC §1, Phase 8b)

CoordinateNamesake measurePrimary source (APA)DOI / URLRelevanceNamesake divergence
Hill equation as a saturation form for exposure → responsePhase 8b Hill fits in blockbuster_paradox_report.md (revenue: $E_{\max}$, EC50, h; reactance term γ)Hill, A. V. (1910). The possible effects of the aggregation of the molecules of haemoglobin on its dissociation curves. Journal of Physiology, 40, iv–vii.WileyOriginal Hill curve; the analytical scaffold for the saturation hypothesis in name adoption.We add a reactance term to the standard Hill curve to test for non-monotonicity at high exposure — a structural deviation from Hill's one-way saturation.
Modern review and parameterization of the Hill equationSameGoutelle, S., Maurin, M., Rougier, F., Barbaut, X., Bourguignon, L., Ducher, M., & Maire, P. (2008). The Hill equation: a review of its capabilities in pharmacological modelling. Fundamental & Clinical Pharmacology, 22(6), 633–648.10.1111/j.1472-8206.2008.00633.xPractical guide we follow for parameter identification, especially the Hill coefficient h.We accept that h < 1 is concavity, not paradox; the paradox claim is only triggered by a significant negative reactance γ — which the current sample does not support.
Adstock / advertising memoryConceptual motivation for the Hill+exposure framing (PHD_STUDY_SPEC §1)Broadbent, S. (1979). One way TV advertisements work. Journal of the Market Research Society, 21(3), 139–166.Marketing-mix-modeling source for the decay-of-exposure assumption that exposure has a half-life and a saturation shoulder.We use the adstock framing only metaphorically; the actual decay structure on names comes from Phase 7b Hawkes fits, not an adstock fit.
Psychological reactance as the mechanism behind the Blockbuster ParadoxReactance term γ in the Hill+reactance specification (blockbuster_paradox_report.md)Brehm, J. W. (1966). A theory of psychological reactance. Academic Press.Provides the theoretical reason a parent might avoid a name precisely because it has become ambient — the missing piece in a pure-saturation account.We test reactance empirically against null and find it not significant in the current sample (γ p > 0.05 across exposure measures); the Blockbuster Paradox remains a hypothesis with negative evidence rather than a confirmed finding.

J. Survival analysis (PHD_STUDY_SPEC §1, Phase 9 / future work)

CoordinateNamesake measurePrimary source (APA)DOI / URLRelevanceNamesake divergence
Cox proportional-hazards modelPlanned analysis of name "lifespan" after a cultural event (PHD_STUDY_SPEC §1, supporting role)Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society. Series B (Methodological), 34(2), 187–220.10.1111/j.2517-6161.1972.tb00899.xStandard hazard-model machinery for time-to-decay analyses; the "half-life" intuition in the cultural-diffusion essay would be formalized here.Currently scoped but not yet fit; flagged in the preregistration deviations log because the Phase 7b Hawkes half-life partially substitutes for the Cox lifespan estimand.

K. Multiple-testing control

CoordinateNamesake measurePrimary source (APA)DOI / URLRelevanceNamesake divergence
Benjamini–Hochberg false-discovery-rate proceduregranger_results.parquet columns pvalue_bh_adjusted, granger_significant_05_bh (42 names / 1.0% survive at α=0.05 vs 901 / 21.5% raw)Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1), 289–300.10.1111/j.2517-6161.1995.tb02031.xProvides the standard FDR control we apply across the ~5K per-name Granger family.We persist both raw and BH-adjusted columns in the canonical artifact and route smart-list product features through the BH-adjusted gate, which is unusual; most published analyses report only one.

L. Public data sources referenced as research instruments

These are not theoretical works but they are the empirical substrate; the methodology section of every report eventually grounds out here.

SourceNamesake useCitation
SSA national + state name filesexternal/ssa_national_year.parquet (2,149,477 rows), external/ssa_state_year.parquet (6,600,640 rows).U.S. Social Security Administration. (2025). Beyond the top 1000 names [Data set]. ssa.gov
Google Trendsexternal/google_trends_* parquets; weekly fetcher in cloud/modal_app.py::fetch_search_trends_weekly.Google. (2026). Google Trends. trends.google.com
GDELT 2.0external/gdelt_name_mentions.parquet.GDELT Project. (2024). The GDELT Project, version 2.0. gdeltproject.org
OMDb APISpike-attribution evidence in scripts/attribute_spikes.py.Fritz, B. (2024). OMDb API: The Open Movie Database. omdbapi.com
TMDb API v3Cast / role metadata in tmdb_titles.parquet, tmdb_cast.parquet.The Movie Database. (2024). TMDb API, version 3. themoviedb.org
Google Books Ngramsexternal/google_ngrams_names.parquet (5.8M token rows) for pre-internet baseline.Google Books Ngram Viewer. (2020). Google Books Ngram Corpus. ngrams datasets v3

M. What Namesake adds — gap analysis

Each row in this section is a place where the literature sets up a question the field has not answered with this depth of data, and where Namesake's pipeline produces the missing number.

Gap in the literatureWhat Namesake contributesWhere in the corpus
Lieberson 2000 vs Hahn–Bentley 2003 leave the question open: how much of observed turnover is drift, how much is cultural? Both sides argue from population-level statistics.A per-name-year null-vs-cultural classification on 46,412 names with 4 buckets (drift-consistent → strongly-cultural). 4.57% of name-years beat neutral drift at p95; 5.27% beat the phonetic null.phase5_null_model_results.md
Berger et al. 2012 establishes phonetic spillover at the population level but does not measure how much cultural mass leaks from the focal name to its acoustic cousins for specific events.Welch t = 7.21 across 166K phonetic pairs; Phase 6 produces the focal share and spillover ratio per attributed event.phase6_phonetic_spillover.md; planned per-event extension in Phase 8
Granger 1969 / Sims 1980: the literature has applied these to macroeconomic series for decades but rarely to attention → behavior conversion at scale.4,185 per-name Granger tests with FDR correction; pooled IRF with subsample decomposition that uncovers an aspirational (+) vs cautionary (−) sign split by event_type.phase7a_granger_causality.md
Salganik 2006 sets a predictability ceiling in an experimental cultural market but the natural-experiment version was not feasible without the Trends + SSA join.Train-on-2004–2014 → test-on-2015–2024 model panel with full-feature logistic at AUC 0.999, deliberately presented as a structural-features upper bound rather than a Salganik-paradigm refutation.predictability_ceiling.md
Bass 1969 / Goel 2016: broadcast-vs-peer typology is widely cited but rarely fit at corpus scale on a single domain.60,470 Bass fits stratified by event_type; sports moments are the most peer-dominant (47.0% peer); animation is the most broadcast-dominant.phase7c_bass_diffusion.md
Marketing-mix modeling treats exposure → outcome as a Hill curve but rarely tests a reactance term.Hill+reactance specification across three exposure measures (revenue, spike magnitude, vote count); reactance non-significant in the current sample, which is itself a publishable null.blockbuster_paradox_report.md
Most diffusion studies treat events as exogenous binary shocks.Synthetic-controls design with per-event placebo p-values, planned for the 200 best-attributed events in Phase 8a, with covariate set documented in PHD_STUDY_SPEC §4.Phase 8 (in flight at time of writing)

Companion files: bibliography.bib (BibTeX), PHD_STUDY_SPEC.md (master design), PIPELINE_STATUS.md (artifact freshness), and the per-phase reports under reports/.