Pipeline status

Namesake · Pipeline · source: people-analyst/baby-namer/docs/research/PIPELINE_STATUS.md

Research Pipeline — Status & Next Steps

Auto-generated: 2026-04-30 10:55 UTC by scripts/python/research/report_pipeline_status.py

This report is rewritten each time npm run research:status is invoked. Every number is sourced from actual parquet metadata or Supabase SELECT COUNT(*). Do not edit this file by hand — edits will be overwritten.

Summary

| Phase | Status | Detail |
| --- | --- | --- |
| Phase 0: Scaffold | ✅ Complete | Scripts scaffolded |
| Phase 1: Internal snapshot | ⚪ Not Started | Missing required files: raw/name_enrichment.parquet, raw/name_stats.parquet, raw/name_rank_history.parquet, raw/name_search_trends.parquet |
| Phase 2: External data acquisition | ⚪ Not Started | Missing required files: external/cmu_pronouncing_dict.parquet, external/ssa_national_year.parquet, external/google_ngrams_names.parquet |
| Phase 3a: Phoneme decomposition | ⚪ Not Started | Missing required files: derived/name_phonemes.parquet |
| Phase 3b: Phonetic neighborhood graph | ⚪ Not Started | Missing required files: derived/phonetic_neighbors.parquet |
| Phase 4: Panel construction | ⚪ Not Started | Missing required files: derived/annual_panel.parquet, derived/weekly_panel.parquet |
| Phase 5: Null model (neutral drift) | ⚪ Not Started | Missing required files: processed/null_model_thresholds.parquet |
| Phase 6: Phonetic spillover | ⚪ Not Started | Missing required files: processed/phonetic_spillover_results.parquet |
| Phase 7: Timeseries (Hawkes/Bass/Granger) | ⚪ Not Started | No outputs yet |
| Phase 8: Causal analysis | ⚪ Not Started | No outputs yet |
| Phase 9: Heterogeneity decomposition | ⚪ Not Started | No outputs yet |
| Phase 10: Geographic + predictability | ⚪ Not Started | No outputs yet |
| Phase 11: Final report | ⚪ Not Started | No outputs yet |
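
The Status and Detail columns appear to follow from whether each phase's required files exist on disk. A minimal sketch of such a rule is shown here, under stated assumptions: the `REQUIRED` mapping is a hypothetical two-phase subset (file names copied from the table), the data root is passed in by the caller, and the "In Progress" state is an illustration that this report does not show. The actual logic in scripts/python/research/report_pipeline_status.py may differ.

```python
from pathlib import Path

# Hypothetical subset of a phase -> required-files mapping. The file names
# come from the Summary table; the structure itself is an assumption.
REQUIRED = {
    "Phase 3a: Phoneme decomposition": ["derived/name_phonemes.parquet"],
    "Phase 4: Panel construction": [
        "derived/annual_panel.parquet",
        "derived/weekly_panel.parquet",
    ],
}


def phase_status(root: Path, required: list[str]) -> tuple[str, str]:
    """Return (status, detail) for one phase based on file presence."""
    missing = [rel for rel in required if not (root / rel).is_file()]
    if not missing:
        return "Complete", "All required files present"
    detail = "Missing required files: " + ", ".join(missing)
    if len(missing) == len(required):
        return "Not Started", detail
    # Assumed intermediate state: some, but not all, outputs exist.
    return "In Progress", detail


def summarize(root: Path) -> dict[str, tuple[str, str]]:
    """Evaluate every phase in REQUIRED against the given data root."""
    return {phase: phase_status(root, files) for phase, files in REQUIRED.items()}
```

Calling `summarize(Path("data"))` on an empty directory would report both phases as "Not Started", which matches the shape of the rows above.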

Data Files

| File | Phase | Status | Rows | Size | Updated |
| --- | --- | --- | --- | --- | --- |
| raw/name_enrichment.parquet | Phase 1 | | | | |
| raw/name_stats.parquet | Phase 1 | | | | |
| raw/name_rank_history.parquet | Phase 1 | | | | |
| raw/name_search_trends.parquet | Phase 1 | | | | |
| raw/name_spike_events.parquet | Phase 1 | | | | |
| raw/name_cultural_events.parquet | Phase 1 | | | | |
| external/ssa_national_year.parquet | Phase 2 | | | | |
| external/ssa_state_year.parquet | Phase 2 | | | | |
| external/cmu_pronouncing_dict.parquet | Phase 2 | | | | |
| external/google_ngrams_names.parquet | Phase 2 | | | | |
| external/gdelt_name_mentions.parquet | Phase 2 | | | | |
| external/cdc_natality_monthly.parquet | Phase 2 | | | | |
| external/place_names.parquet | Phase 2 | | | | |
| external/babynames_tidy.parquet | Phase 2 | | | | |
| external/omdb_titles.parquet | Phase 2 | | | | |
| derived/name_phonemes.parquet | Phase 3a | | | | |
| derived/phonetic_neighbors.parquet | Phase 3b | | | | |
| derived/annual_panel.parquet | Phase 4a | | | | |
| derived/weekly_panel.parquet | Phase 4b | | | | |
| derived/event_panel.parquet | Phase 4c | | | | |
| processed/null_model_thresholds.parquet | Phase 5 | | | | |
| processed/phonetic_spillover_results.parquet | Phase 6 | | | | |
| processed/phonetic_clusters.parquet | Phase 6 | | | | |
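
The Status, Rows, Size and Updated cells are empty here because none of the files exist yet. As a rough sketch of how one row might be filled in, the helper below collects those values for a single path. The function name `describe_file`, its return keys, and the injected `count_rows` callable are all hypothetical; the real script may read these values differently.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Optional


def describe_file(
    path: Path,
    count_rows: Optional[Callable[[Path], int]] = None,
) -> dict:
    """Collect the Status / Rows / Size / Updated values for one data file."""
    if not path.is_file():
        # Missing files have no metadata to report.
        return {"status": "missing", "rows": None, "size_bytes": None, "updated": None}
    stat = path.stat()
    # Row counting is injected so this sketch has no third-party dependency.
    rows = count_rows(path) if count_rows is not None else None
    updated = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "status": "present",
        "rows": rows,
        "size_bytes": stat.st_size,
        "updated": updated.strftime("%Y-%m-%d %H:%M UTC"),
    }
```

If pyarrow is available, one option for `count_rows` is `lambda p: pyarrow.parquet.read_metadata(str(p)).num_rows`, which reads only the parquet footer rather than the data pages.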

Database Spot Check

| Table | DB Rows | Parquet Rows | Divergence | Status |
| --- | --- | --- | --- | --- |
| name_enrichment | ⚠️ NEXT_PUBLIC_SUPABASE_URL and SUPABASE_SE | | | |
| name_stats | ⚠️ NEXT_PUBLIC_SUPABASE_URL and SUPABASE_SE | | | |
| name_search_trends | ⚠️ NEXT_PUBLIC_SUPABASE_URL and SUPABASE_SE | | | |
| name_spike_events | ⚠️ NEXT_PUBLIC_SUPABASE_URL and SUPABASE_SE | | | |
| name_cultural_events | ⚠️ NEXT_PUBLIC_SUPABASE_URL and SUPABASE_SE | | | |
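
The truncated warnings suggest the spot check could not query Supabase because credentials were not configured in the environment, which would leave every DB count unknown. The sketch below illustrates how such a check might degrade gracefully. It makes several assumptions: the helper names are hypothetical, the divergence metric (relative difference against the DB count) and the 1% tolerance are illustrative choices, and the required variable names are supplied by the caller because the warning text is cut off in this report.

```python
import os
from typing import Optional


def missing_env(names: list[str]) -> list[str]:
    """Return the required environment variables that are unset or empty."""
    return [n for n in names if not os.environ.get(n)]


def divergence(db_rows: Optional[int], parquet_rows: Optional[int]) -> Optional[float]:
    """Relative difference between DB and parquet row counts (assumed metric).

    Returns None when either count is unknown.
    """
    if db_rows is None or parquet_rows is None:
        return None
    if db_rows == 0:
        return 0.0 if parquet_rows == 0 else float("inf")
    return abs(db_rows - parquet_rows) / db_rows


def spot_check_status(
    db_rows: Optional[int],
    parquet_rows: Optional[int],
    tolerance: float = 0.01,  # assumed threshold, not taken from the report
) -> str:
    """Classify one table as 'unknown', 'ok', or 'diverged'."""
    d = divergence(db_rows, parquet_rows)
    if d is None:
        return "unknown"
    return "ok" if d <= tolerance else "diverged"
```

With no credentials set, a caller would skip the query, pass `db_rows=None`, and get "unknown", consistent with the "DB count = unknown" wording under Blockers.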

Blockers

  • Google Trends: raw/name_search_trends.parquet missing — Phase 4b/5 blocked
  • name_spike_events: DB count = unknown — spike detection not run (blocks attribution, Phase 4c, Phase 6+)
  • name_cultural_events: DB count = unknown — attribution not run (blocks D05/D10, Phase 8 causal)
  • Missing required file: raw/name_enrichment.parquet (Phase 1)
  • Missing required file: raw/name_stats.parquet (Phase 1)
  • Missing required file: raw/name_rank_history.parquet (Phase 1)
  • Missing required file: raw/name_search_trends.parquet (Phase 1)
  • Missing required file: raw/name_spike_events.parquet (Phase 1)
  • Missing required file: raw/name_cultural_events.parquet (Phase 1)
  • Missing required file: external/ssa_national_year.parquet (Phase 2)
  • Missing required file: external/ssa_state_year.parquet (Phase 2)
  • Missing required file: external/cmu_pronouncing_dict.parquet (Phase 2)
  • Missing required file: external/google_ngrams_names.parquet (Phase 2)
  • Missing required file: external/gdelt_name_mentions.parquet (Phase 2)
  • Missing required file: external/cdc_natality_monthly.parquet (Phase 2)
  • Missing required file: external/place_names.parquet (Phase 2)
  • Missing required file: external/babynames_tidy.parquet (Phase 2)
  • Missing required file: derived/name_phonemes.parquet (Phase 3a)
  • Missing required file: derived/phonetic_neighbors.parquet (Phase 3b)
  • Missing required file: derived/annual_panel.parquet (Phase 4a)
  • Missing required file: derived/weekly_panel.parquet (Phase 4b)
  • Missing required file: derived/event_panel.parquet (Phase 4c)
  • Missing required file: processed/null_model_thresholds.parquet (Phase 5)

Next Steps

  1. Complete Phase 1 (Internal snapshot). Missing required files: raw/name_enrichment.parquet, raw/name_stats.parquet, raw/name_rank_history.parquet, raw/name_search_trends.parquet
  2. Run scripts/attribute_spikes.py to populate cultural events
  3. Resume Google Trends fetch (A-025)

Master spec: PHD_STUDY_SPEC.md
Operator manual: ../../scripts/python/research/README.md
Auto-refreshed nightly by .github/workflows/research-status-nightly.yml