Namesake·Pipeline·source: people-analyst/baby-namer/docs/research/PIPELINE_STATUS.md
Research Pipeline — Status & Next Steps
Auto-generated: 2026-04-30 10:55 UTC by scripts/python/research/report_pipeline_status.py
This report is rewritten each time `npm run research:status` is invoked. Every number is sourced from actual parquet metadata or a Supabase `SELECT COUNT(*)`. Do not edit this file by hand; edits will be overwritten.
Summary
| Phase | Status | Detail |
|---|---|---|
| Phase 0: Scaffold | ✅ Complete | Scripts scaffolded |
| Phase 1: Internal snapshot | ⚪ Not Started | Missing required files: raw/name_enrichment.parquet, raw/name_stats.parquet, raw/name_rank_history.parquet, raw/name_search_trends.parquet |
| Phase 2: External data acquisition | ⚪ Not Started | Missing required files: external/cmu_pronouncing_dict.parquet, external/ssa_national_year.parquet, external/google_ngrams_names.parquet |
| Phase 3a: Phoneme decomposition | ⚪ Not Started | Missing required files: derived/name_phonemes.parquet |
| Phase 3b: Phonetic neighborhood graph | ⚪ Not Started | Missing required files: derived/phonetic_neighbors.parquet |
| Phase 4: Panel construction | ⚪ Not Started | Missing required files: derived/annual_panel.parquet, derived/weekly_panel.parquet |
| Phase 5: Null model (neutral drift) | ⚪ Not Started | Missing required files: processed/null_model_thresholds.parquet |
| Phase 6: Phonetic spillover | ⚪ Not Started | Missing required files: processed/phonetic_spillover_results.parquet |
| Phase 7: Timeseries (Hawkes/Bass/Granger) | ⚪ Not Started | No outputs yet |
| Phase 8: Causal analysis | ⚪ Not Started | No outputs yet |
| Phase 9: Heterogeneity decomposition | ⚪ Not Started | No outputs yet |
| Phase 10: Geographic + predictability | ⚪ Not Started | No outputs yet |
| Phase 11: Final report | ⚪ Not Started | No outputs yet |
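
The Status and Detail columns above appear to follow from which required files exist on disk. A minimal sketch of that rule, assuming a hypothetical `phase_status` helper and a data root directory; neither is taken from `report_pipeline_status.py`, and the intermediate "In Progress" state is an assumption:

```python
from pathlib import Path


def phase_status(data_root, required_files):
    """Return (status, detail) for one phase from the files present on disk.

    Hypothetical helper: the real report script may apply different rules.
    """
    root = Path(data_root)
    if not required_files:
        # Assumption: phases with no declared outputs report "No outputs yet".
        return "Not Started", "No outputs yet"
    missing = [f for f in required_files if not (root / f).is_file()]
    if not missing:
        return "Complete", f"{len(required_files)} required files present"
    detail = "Missing required files: " + ", ".join(missing)
    # Assumption: a phase with some, but not all, files counts as in progress.
    status = "Not Started" if len(missing) == len(required_files) else "In Progress"
    return status, detail
```

Called as `phase_status("data", ["derived/name_phonemes.parquet"])` against an empty directory, this yields the same kind of detail string shown for Phase 3a.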
Data Files
| File | Phase | Status | Rows | Size | Updated |
|---|---|---|---|---|---|
| raw/name_enrichment.parquet | Phase 1 | ❌ | — | — | — |
| raw/name_stats.parquet | Phase 1 | ❌ | — | — | — |
| raw/name_rank_history.parquet | Phase 1 | ❌ | — | — | — |
| raw/name_search_trends.parquet | Phase 1 | ❌ | — | — | — |
| raw/name_spike_events.parquet | Phase 1 | ❌ | — | — | — |
| raw/name_cultural_events.parquet | Phase 1 | ❌ | — | — | — |
| external/ssa_national_year.parquet | Phase 2 | ❌ | — | — | — |
| external/ssa_state_year.parquet | Phase 2 | ❌ | — | — | — |
| external/cmu_pronouncing_dict.parquet | Phase 2 | ❌ | — | — | — |
| external/google_ngrams_names.parquet | Phase 2 | ❌ | — | — | — |
| external/gdelt_name_mentions.parquet | Phase 2 | ❌ | — | — | — |
| external/cdc_natality_monthly.parquet | Phase 2 | ❌ | — | — | — |
| external/place_names.parquet | Phase 2 | ❌ | — | — | — |
| external/babynames_tidy.parquet | Phase 2 | ❌ | — | — | — |
| external/omdb_titles.parquet | Phase 2 | ⚪ | — | — | — |
| derived/name_phonemes.parquet | Phase 3a | ❌ | — | — | — |
| derived/phonetic_neighbors.parquet | Phase 3b | ❌ | — | — | — |
| derived/annual_panel.parquet | Phase 4a | ❌ | — | — | — |
| derived/weekly_panel.parquet | Phase 4b | ❌ | — | — | — |
| derived/event_panel.parquet | Phase 4c | ❌ | — | — | — |
| processed/null_model_thresholds.parquet | Phase 5 | ❌ | — | — | — |
| processed/phonetic_spillover_results.parquet | Phase 6 | ⚪ | — | — | — |
| processed/phonetic_clusters.parquet | Phase 6 | ⚪ | — | — | — |
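
Each row above combines a file-existence check with parquet metadata and filesystem stats. A rough sketch of how one row could be produced, assuming a hypothetical `data_file_row` helper and an optional `pyarrow` install; the size and date formats are illustrative choices, not the report script's actual ones:

```python
from datetime import datetime, timezone
from pathlib import Path

try:
    import pyarrow.parquet as pq  # optional; only needed for row counts
except ImportError:
    pq = None


def data_file_row(data_root, rel_path, phase):
    """Format one Data Files table row (hypothetical helper)."""
    path = Path(data_root) / rel_path
    if not path.is_file():
        # Missing file: no rows, size, or timestamp to report.
        return f"| {rel_path} | {phase} | ❌ | — | — | — |"
    stat = path.stat()
    # The row count lives in the parquet footer, so no data pages are read.
    rows = pq.ParquetFile(path).metadata.num_rows if pq else "?"
    size = f"{stat.st_size / 1_000_000:.1f} MB"
    updated = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).strftime("%Y-%m-%d")
    return f"| {rel_path} | {phase} | ✅ | {rows} | {size} | {updated} |"
```

Reading `metadata.num_rows` keeps the report cheap to regenerate even for large snapshots, since only the file footer is parsed.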
Database Spot Check
| Table | DB Rows | Parquet Rows | Divergence | Status |
|---|---|---|---|---|
| name_enrichment | — | — | — | ⚠️ NEXT_PUBLIC_SUPABASE_URL and SUPABASE_SE |
| name_stats | — | — | — | ⚠️ NEXT_PUBLIC_SUPABASE_URL and SUPABASE_SE |
| name_search_trends | — | — | — | ⚠️ NEXT_PUBLIC_SUPABASE_URL and SUPABASE_SE |
| name_spike_events | — | — | — | ⚠️ NEXT_PUBLIC_SUPABASE_URL and SUPABASE_SE |
| name_cultural_events | — | — | — | ⚠️ NEXT_PUBLIC_SUPABASE_URL and SUPABASE_SE |
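
The Divergence column compares the live table count with the snapshot row count. One plausible way to compute it, assuming divergence is the absolute difference relative to the DB count and assuming a 1% tolerance; both are illustrative, since the report does not state its formula, and the function names are hypothetical:

```python
def divergence(db_rows, parquet_rows):
    """Relative gap between a DB count and a parquet row count, or None if unknown."""
    if db_rows is None or parquet_rows is None:
        return None  # either side unavailable, e.g. missing credentials or file
    if db_rows == 0:
        return 0.0 if parquet_rows == 0 else float("inf")
    return abs(db_rows - parquet_rows) / db_rows


def spot_check_status(db_rows, parquet_rows, threshold=0.01):
    """Map a divergence to a label; the threshold is a hypothetical default."""
    d = divergence(db_rows, parquet_rows)
    if d is None:
        return "unknown"
    return "ok" if d <= threshold else "stale"
```

With both counts unavailable, as in the table above, `spot_check_status(None, None)` returns `"unknown"` rather than reporting a misleading zero divergence.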
Blockers
- Google Trends: raw/name_search_trends.parquet missing — Phase 4b/5 blocked
- name_spike_events: DB count = unknown — spike detection not run (blocks attribution, Phase 4c, Phase 6+)
- name_cultural_events: DB count = unknown — attribution not run (blocks D05/D10, Phase 8 causal)
- Missing required file: raw/name_enrichment.parquet (Phase 1)
- Missing required file: raw/name_stats.parquet (Phase 1)
- Missing required file: raw/name_rank_history.parquet (Phase 1)
- Missing required file: raw/name_search_trends.parquet (Phase 1)
- Missing required file: raw/name_spike_events.parquet (Phase 1)
- Missing required file: raw/name_cultural_events.parquet (Phase 1)
- Missing required file: external/ssa_national_year.parquet (Phase 2)
- Missing required file: external/ssa_state_year.parquet (Phase 2)
- Missing required file: external/cmu_pronouncing_dict.parquet (Phase 2)
- Missing required file: external/google_ngrams_names.parquet (Phase 2)
- Missing required file: external/gdelt_name_mentions.parquet (Phase 2)
- Missing required file: external/cdc_natality_monthly.parquet (Phase 2)
- Missing required file: external/place_names.parquet (Phase 2)
- Missing required file: external/babynames_tidy.parquet (Phase 2)
- Missing required file: derived/name_phonemes.parquet (Phase 3a)
- Missing required file: derived/phonetic_neighbors.parquet (Phase 3b)
- Missing required file: derived/annual_panel.parquet (Phase 4a)
- Missing required file: derived/weekly_panel.parquet (Phase 4b)
- Missing required file: derived/event_panel.parquet (Phase 4c)
- Missing required file: processed/null_model_thresholds.parquet (Phase 5)
Next Steps
- Complete Phase 1 (Internal snapshot). Missing required files: raw/name_enrichment.parquet, raw/name_stats.parquet, raw/name_rank_history.parquet, raw/name_search_trends.parquet
- Run scripts/attribute_spikes.py to populate cultural events
- Resume Google Trends fetch (A-025)
Master spec: PHD_STUDY_SPEC.md
Operator manual: ../../scripts/python/research/README.md
Auto-refreshed nightly by .github/workflows/research-status-nightly.yml