peopleanalyst


Predictability ceiling

Where the upper bound of name-spread prediction lives.

Namesake · Reports · source: people-analyst/baby-namer/docs/research/reports/predictability_ceiling.md

5.11 The Predictability Ceiling — A-239 honest respec

Methodology change. The prior framing (predict whether a name enters the SSA top 100 next year, trained on 2004–2014) reached AUC = 0.999 in the canonical run. That number was almost entirely an artifact: the positive base rate was ≈ 0.46%, and an AR(1) baseline on its own already reached 0.997. Per A-239, the task is now framed against a denser positive class and a longer horizon: for names with rank in [201, 5000] in year t, predict whether the name enters the SSA top 200 in any of years t+1 to t+3. Train: 2004–2018; test: 2019–2021 (the 3-year horizon of the last test year, 2021, completes in 2024). The previous report's table is preserved in the agent-assignments archive as the auditable record.
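The labeling rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline's code: the `ranks[name][year]` dictionary, the `label` and `split` helpers, and the toy names are hypothetical. Only the [201, 5000] eligibility window, the top-200 target, the 3-year horizon, and the train/test years come from the report.

```python
from typing import Dict, Iterable, Optional

# Hypothetical input shape: ranks[name][year] -> SSA rank (1 = most popular).
# A name missing from a year's list is treated as unranked in that year.
Ranks = Dict[str, Dict[int, int]]

ELIGIBLE = range(201, 5001)  # rank window in year t: [201, 5000]
TOP_N = 200                  # positive if the name reaches rank <= 200 ...
HORIZON = 3                  # ... in any of years t+1, t+2, t+3


def label(ranks: Ranks, name: str, year: int) -> Optional[int]:
    """1/0 label for an eligible (name, year) row; None if the row is not eligible.

    Assumes all outcome years up to year + HORIZON are loaded: a missing year
    counts as "not in the top 200", which would bias labels for an incomplete horizon.
    """
    history = ranks.get(name, {})
    r = history.get(year)
    if r is None or r not in ELIGIBLE:
        return None
    future: Iterable[Optional[int]] = (history.get(year + k) for k in range(1, HORIZON + 1))
    return int(any(f is not None and f <= TOP_N for f in future))


def split(year: int) -> Optional[str]:
    """Time-based split used in the report: train 2004-2018, test 2019-2021."""
    if 2004 <= year <= 2018:
        return "train"
    if 2019 <= year <= 2021:
        return "test"
    return None


# Toy data (hypothetical names and ranks).
example = {"Ava": {2019: 450, 2020: 260, 2021: 190}, "Zed": {2019: 150}}
print(label(example, "Ava", 2019))  # 1: ranked 190 in 2021
print(label(example, "Zed", 2019))  # None: already inside the top 200 in 2019
```

Names already in the top 200 in year t are excluded rather than labeled, which is what keeps the task about breakthroughs instead of persistence.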

Models

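| Model | AUC | PR-AUC | Brier | P@25 | P@50 | P@100 | n_test | positives |
|---|---|---|---|---|---|---|---|---|
| Baseline A: rank-threshold rule | 0.987 | 0.324 | 0.307 | 0.480 | 0.500 | 0.380 | 26,717 | 126 |
| Baseline B: AR(1) prior rank | 0.978 | 0.158 | 0.305 | 0.120 | 0.160 | 0.210 | 26,717 | 126 |
| Full: Logistic Regression | 0.990 | 0.324 | 0.042 | 0.320 | 0.400 | 0.410 | 26,717 | 126 |
| Full: LightGBM | 0.991 | 0.595 | 0.004 | 0.960 | 0.760 | 0.660 | 26,717 | 126 |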
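For readers checking the columns above, here is a minimal pure-Python sketch of the four metrics. The helper names are hypothetical, PR-AUC is computed here as average precision (an assumption; the report does not name its estimator), and library implementations such as scikit-learn's `roc_auc_score` and `average_precision_score` group tied scores differently, so small discrepancies are possible. P@k is simply the share of true positives among the k highest-scored test rows.

```python
from typing import Sequence


def precision_at_k(y_true: Sequence[int], scores: Sequence[float], k: int) -> float:
    """Share of positives among the k highest-scoring rows (ties keep input order)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(y_true[i] for i in order[:k]) / k


def brier(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared error between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise PR-AUC: mean of the precision at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy example (hypothetical labels and scores).
y = [1, 0, 1, 0, 0]
s = [0.9, 0.8, 0.7, 0.2, 0.1]
print(precision_at_k(y, s, 2), roc_auc(y, s), average_precision(y, s), brier(y, s))
```

The pairwise AUC loop is quadratic in the number of positive/negative pairs (about 3.4 million pairs for this test set), which is fine for a check but not how a library would do it.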

Acceptance gates (A-239)

Two gates report whether the full feature set adds value above the AR(1) baseline:

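| Metric | AR(1) baseline | Full model (best: LightGBM) | Δ | Gate | Pass? |
|---|---|---|---|---|---|
| AUC | 0.978 | 0.991 | +0.013 | ≥ +0.05 | ⚠️ no |
| PR-AUC | 0.158 | 0.595 | +0.437 | ≥ +0.10 | ✅ yes |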

Read. AUC is saturated: in a rank-based prediction task the AR(1) baseline picks up most of the signal by construction, so a small AUC delta is expected even when the full model adds real value. With AR(1) already at 0.978, a +0.05 gain would require an AUC above 1, so the AUC gate cannot pass on this task. PR-AUC is the more informative metric under this much class imbalance (positive rate ≈ 0.47% in the test set, 126 of 26,717 rows), and the PR-AUC gate clears comfortably: the best full model (LightGBM) lifts PR-AUC from 0.158 to 0.595 and P@25 from 0.120 to 0.960, so it is far better than AR(1) at placing the actual breakthroughs at the top of its predicted-probability ranking.
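A no-skill reference point makes the PR-AUC comparison concrete. This sketch uses only the test-set counts from the Models table: a random ranking has expected precision equal to the prevalence at every cutoff, so its PR-AUC sits near 126 / 26,717 ≈ 0.0047, while its ROC-AUC is 0.5 at any prevalence. Against that floor, AR(1)'s 0.158 is roughly 33.5 times the no-skill level and LightGBM's 0.595 roughly 126 times.

```python
N_TEST, POSITIVES = 26_717, 126   # test-set size and positives from the Models table

# A random ranking's PR-AUC is close to the prevalence; its ROC-AUC is 0.5 regardless.
prevalence = POSITIVES / N_TEST

pr_auc = {"Baseline B: AR(1)": 0.158, "Full: LightGBM": 0.595}
for model, value in pr_auc.items():
    print(f"{model}: PR-AUC {value:.3f}, {value / prevalence:.1f}x the no-skill level")
print(f"no-skill PR-AUC ~ {prevalence:.4f}")
```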

Top features (logistic / GBT)

Full: Logistic Regression — top-5 |coef| (standardized):

  • rank: -3.474
  • births_count: +1.722
  • rank_lag1: +0.728
  • births_per_1000: -0.646
  • gender_pct_male: -0.485
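A standardized coefficient is easier to read as an odds multiplier. The sketch below assumes "standardized" means the model was fit on z-scored features, so exp(coef) is the multiplicative change in the odds of entering the top 200 for a one-standard-deviation increase in that feature, holding the others fixed. For `rank`, exp(-3.474) ≈ 0.031: a numerically larger (less popular) rank sharply lowers the odds. Correlated inputs such as `births_count` and `births_per_1000`, or `rank` and `rank_lag1`, make individual signs unreliable to interpret one at a time.

```python
import math

# Standardized logistic-regression coefficients copied from the list above.
coefs = {
    "rank": -3.474,
    "births_count": +1.722,
    "rank_lag1": +0.728,
    "births_per_1000": -0.646,
    "gender_pct_male": -0.485,
}

# Order by absolute size (how the top-5 list is ranked) and convert each
# coefficient to an odds multiplier for a one-standard-deviation increase.
for name, b in sorted(coefs.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:>16}: coef {b:+.3f} -> odds x{math.exp(b):.3f} per +1 SD")
```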

Full: LightGBM — top-5 importance:

  • rank_3yr_trend: 4031
  • phonetic_density: 2920
  • search_3yr_mean: 2405
  • births_count: 2292
  • sex_pct_male: 2279

Calibration (test set)

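Predicted probabilities are grouped into ten bins of width 0.1 (bin 0 covers 0.0 to 0.1, bin 9 covers 0.9 to 1.0), and each row compares the predicted probability in a bin with the observed positive rate. A well-calibrated model has observed rates close to the predicted ones in every bin, i.e. points near the diagonal of a reliability plot.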

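| Bin | Predicted (Baseline B) | Observed (Baseline B) | Predicted (full LR) | Observed (full LR) |
|---|---|---|---|---|
| 0 | 0.019 | 0.000 | 0.009 | 0.000 |
| 1 | 0.152 | 0.000 | 0.145 | 0.000 |
| 2 | 0.251 | 0.000 | 0.247 | 0.000 |
| 3 | 0.350 | 0.000 | 0.346 | 0.005 |
| 4 | 0.450 | 0.000 | 0.448 | 0.003 |
| 5 | 0.550 | 0.000 | 0.550 | 0.000 |
| 6 | 0.650 | 0.000 | 0.648 | 0.009 |
| 7 | 0.750 | 0.000 | 0.752 | 0.012 |
| 8 | 0.850 | 0.002 | 0.851 | 0.026 |
| 9 | 0.930 | 0.069 | 0.955 | 0.194 |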
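A table like this can be rebuilt from test-set labels and predicted probabilities. This sketch assumes "decile-binned" means ten equal-width bins over [0, 1], which the predicted columns (≈ 0.15, 0.25, up to 0.85 in the middle bins) suggest, and that the predicted column is the mean predicted probability per bin. `reliability_table` is a hypothetical helper; scikit-learn's `sklearn.calibration.calibration_curve(y_true, y_prob, n_bins=10, strategy="uniform")` returns the same observed and mean-predicted columns.

```python
from typing import List, Sequence, Tuple


def reliability_table(y_true: Sequence[int], probs: Sequence[float],
                      n_bins: int = 10) -> List[Tuple[int, int, float, float]]:
    """Equal-width calibration bins over [0, 1].

    Returns (bin, count, mean predicted, observed positive rate) for each non-empty bin.
    """
    stats = [[0, 0.0, 0] for _ in range(n_bins)]  # per bin: count, sum of probs, positives
    for y, p in zip(y_true, probs):
        b = min(int(p * n_bins), n_bins - 1)      # p == 1.0 falls into the last bin
        stats[b][0] += 1
        stats[b][1] += p
        stats[b][2] += y
    return [(b, n, s / n, pos / n) for b, (n, s, pos) in enumerate(stats) if n]


# Toy example (hypothetical labels and probabilities).
for row in reliability_table([0, 0, 1, 1], [0.05, 0.15, 0.12, 0.95]):
    print(row)
```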

Auditable record (old framing)

The previous framing (top 100, 1-year horizon, full SSA cohort) reached AUC = 0.999 with an AR(1) baseline at 0.997: an incremental contribution of +0.002 that the popular press could quote as "99.9% predictable". That table is preserved in docs/agent-assignments-archive.md under A-239's predecessor; it was not informative as a research claim and is no longer surfaced here. See A-202 for the consumer-copy gate that prevents the old number from leaking back into marketing material.

Train: 135,275 (name, year) rows in [2004, 2018]; test: 26,717 rows in [2019, 2021]. Positive class: the name enters the SSA top 200 within the next 3 years.