Predictability ceiling

Where the upper bound of name-spread prediction lives.

Namesake·Reports·source: people-analyst/baby-namer/docs/research/reports/predictability_ceiling.md

5.11 The Predictability Ceiling — A-239 honest respec

Methodology change. The prior framing — predict whether a name enters the SSA top 100 next year, trained on 2004–2014 — reached AUC = 0.999 in the canonical run. That number was almost entirely an artifact of class imbalance (positive base rate ≈ 0.46%) and an AR(1) baseline that already reached 0.997. Per A-239, the task is now framed against a denser positive class and a longer horizon: for names with rank in [201, 5000] in year t, predict whether the name enters the SSA top 200 in any of the next 3 years. Train: 2004–2018; test: 2019–2021 (the 3-year horizon completes by 2024). The previous report's table is preserved in the agent-assignments archive as the auditable record.

Models

Model	AUC	PR-AUC	Brier	P@25	P@50	P@100	n_test	positives
Baseline A: rank-threshold rule	0.987	0.324	0.307	0.480	0.500	0.380	26,717	126
Baseline B: AR(1) prior rank	0.978	0.158	0.305	0.120	0.160	0.210	26,717	126
Full: Logistic Regression	0.990	0.324	0.042	0.320	0.400	0.410	26,717	126
Full: LightGBM	0.991	0.595	0.004	0.960	0.760	0.660	26,717	126

Acceptance gates (A-239)

Two gates report whether the full feature set adds value above the AR(1) baseline:

Metric	AR(1) baseline	Full model (best)	Δ	Gate	Pass?
AUC	0.978	0.991	+0.013	≥ +0.05	⚠️
PR-AUC	0.158	0.595	+0.437	≥ +0.10	✅

Read. AUC is saturated: in a rank-based prediction task the AR(1) baseline picks up most of the signal by construction, so a small AUC delta is expected even when the full model adds real value. PR-AUC is the more informative metric under this much class imbalance (positive rate ~0.5% in the test set), and the PR-AUC gate clears comfortably — the full model is dramatically better than AR(1) at identifying the actual breakthroughs at the top of its predicted-probability ranking.

Top features (logistic / GBT)

Full: Logistic Regression — top-5 |coef| (standardized):

rank: -3.474
births_count: +1.722
rank_lag1: +0.728
births_per_1000: -0.646
gender_pct_male: -0.485

Full: LightGBM — top-5 importance:

rank_3yr_trend: 4031
phonetic_density: 2920
search_3yr_mean: 2405
births_count: 2292
sex_pct_male: 2279

Calibration (test set)

Decile-binned probability vs observed positive rate. A well-calibrated model produces values close to the diagonal.

Bin	predicted (Baseline B)	observed (B)	predicted (full LR)	observed (LR)
0	0.019	0.000	0.009	0.000
1	0.152	0.000	0.145	0.000
2	0.251	0.000	0.247	0.000
3	0.350	0.000	0.346	0.005
4	0.450	0.000	0.448	0.003
5	0.550	0.000	0.550	0.000
6	0.650	0.000	0.648	0.009
7	0.750	0.000	0.752	0.012
8	0.850	0.002	0.851	0.026
9	0.930	0.069	0.955	0.194

Auditable record (old framing)

The previous (top-100, 1y horizon, full SSA-cohort) framing reached AUC = 0.999 with an AR(1) baseline at 0.997 — a +0.002 incremental contribution that pop-press could quote as '99.9% predictable'. That table is preserved in docs/agent-assignments-archive.md under A-239's predecessor; it was not informative as a research claim and is no longer surfaced here. See A-202 for the consumer-copy gate that prevents the old number from leaking back into marketing material.

Train: 135,275 (name, year) rows in [2004, 2018]; test: 26,717 rows in [2019, 2021]. Positive class = entered SSA top 200 within 3 years.