Predictability ceiling
Where the upper bound of name-spread prediction lives.
5.11 The Predictability Ceiling — A-239 honest respec
Methodology change. The prior framing — predict whether a name enters the SSA top 100 next year, trained on 2004–2014 — reached AUC = 0.999 in the canonical run. That number was almost entirely an artifact of class imbalance (positive base rate ≈ 0.46%) and an AR(1) baseline that already reached 0.997. Per A-239, the task is now framed against a denser positive class and a longer horizon: for names with rank in [201, 5000] in year t, predict whether the name enters the SSA top 200 in any of the next 3 years. Train: 2004–2018; test: 2019–2021 (the 3-year horizon completes by 2024). The previous report's table is preserved in the agent-assignments archive as the auditable record.
Models
| Model | AUC | PR-AUC | Brier | P@25 | P@50 | P@100 | n_test | positives |
|---|---|---|---|---|---|---|---|---|
| Baseline A: rank-threshold rule | 0.987 | 0.324 | 0.307 | 0.480 | 0.500 | 0.380 | 26,717 | 126 |
| Baseline B: AR(1) prior rank | 0.978 | 0.158 | 0.305 | 0.120 | 0.160 | 0.210 | 26,717 | 126 |
| Full: Logistic Regression | 0.990 | 0.324 | 0.042 | 0.320 | 0.400 | 0.410 | 26,717 | 126 |
| Full: LightGBM | 0.991 | 0.595 | 0.004 | 0.960 | 0.760 | 0.660 | 26,717 | 126 |
Acceptance gates (A-239)
Two gates report whether the full feature set adds value above the AR(1) baseline:
| Metric | AR(1) baseline | Full model (best) | Δ | Gate | Pass? |
|---|---|---|---|---|---|
| AUC | 0.978 | 0.991 | +0.013 | ≥ +0.05 | ⚠️ |
| PR-AUC | 0.158 | 0.595 | +0.437 | ≥ +0.10 | ✅ |
Read. AUC is saturated: in a rank-based prediction task the AR(1) baseline picks up most of the signal by construction, so a small AUC delta is expected even when the full model adds real value. PR-AUC is the more informative metric under this much class imbalance (positive rate ~0.5% in the test set), and the PR-AUC gate clears comfortably — the full model is dramatically better than AR(1) at identifying the actual breakthroughs at the top of its predicted-probability ranking.
Top features (logistic / GBT)
Full: Logistic Regression — top-5 |coef| (standardized):
rank: -3.474births_count: +1.722rank_lag1: +0.728births_per_1000: -0.646gender_pct_male: -0.485
Full: LightGBM — top-5 importance:
rank_3yr_trend: 4031phonetic_density: 2920search_3yr_mean: 2405births_count: 2292sex_pct_male: 2279
Calibration (test set)
Decile-binned probability vs observed positive rate. A well-calibrated model produces values close to the diagonal.
| Bin | predicted (Baseline B) | observed (B) | predicted (full LR) | observed (LR) |
|---|---|---|---|---|
| 0 | 0.019 | 0.000 | 0.009 | 0.000 |
| 1 | 0.152 | 0.000 | 0.145 | 0.000 |
| 2 | 0.251 | 0.000 | 0.247 | 0.000 |
| 3 | 0.350 | 0.000 | 0.346 | 0.005 |
| 4 | 0.450 | 0.000 | 0.448 | 0.003 |
| 5 | 0.550 | 0.000 | 0.550 | 0.000 |
| 6 | 0.650 | 0.000 | 0.648 | 0.009 |
| 7 | 0.750 | 0.000 | 0.752 | 0.012 |
| 8 | 0.850 | 0.002 | 0.851 | 0.026 |
| 9 | 0.930 | 0.069 | 0.955 | 0.194 |
Auditable record (old framing)
The previous (top-100, 1y horizon, full SSA-cohort) framing reached AUC = 0.999 with an AR(1) baseline at 0.997 — a +0.002 incremental contribution that pop-press could quote as '99.9% predictable'. That table is preserved in docs/agent-assignments-archive.md under A-239's predecessor; it was not informative as a research claim and is no longer surfaced here. See A-202 for the consumer-copy gate that prevents the old number from leaking back into marketing material.
Train: 135,275 (name, year) rows in [2004, 2018]; test: 26,717 rows in [2019, 2021]. Positive class = entered SSA top 200 within 3 years.