Engineering critique
Engineering reviewer's lens — schema discipline, the ETL pipeline, the verification-log infrastructure, the hub-and-spoke @measurement/core story. How the registry is built and where it can fail.
By Mike West
Principia: an engineering critique
An engineering-reviewer's lens — schema discipline, ETL, verification, hub-and-spoke distribution, queryable-registry architecture, public reader UI. Architectural choices with their trade-offs named, and the failure modes the design buys itself.
This document is written for the reader who would assess Principia as software rather than as a substantive contribution to organizational measurement — software engineer, data engineer, ML practitioner. The substantive claims are reviewed in peer-review-framing.md. The question here is whether the infrastructure under those claims is built sensibly, where it has trade-offs, and what the failure modes are.
The honest framing: Principia is a small system with an oversized hub-and-spoke discipline. Schemas are shared across four downstream consumers; deploy cadence is daily; the data store is the JSON-files-bundled-into-the-Vercel-runtime version most teams would consider temporary. Several of those choices are temporary. The discipline holds anyway, because the alternative — local schema drift across consumers — produces failure modes that compound faster than deploy-cadence ones.
1. Schema discipline against @measurement/core
The load-bearing decision is that schemas are consumed, never re-implemented. The canonical vocabulary lives in @people-analyst/measurement-core (currently published to GitHub Packages at v0.9.0; renamed from @measurement/core during PRN-014c when the deploy story hardened). Principia, meta-factory, and the people-analytics-toolbox/hub all import the same versioned package. New types do not land locally in any consumer; they land in the shared package via a coordinated PR with both meta-factory and Principia owners signing off.
The trade-off: coordination overhead at every schema change vs. vocabulary stability across consumers.
The cost is real. When CanonicalTheoreticalModel shipped (PRN-037), the bump to v0.9.0 happened in meta-factory, the publish workflow ran, Principia's package.json and .npmrc got updated, the Vercel NODE_AUTH_TOKEN had to be confirmed live, and the eight seed theories landed in a separate principia commit. Five steps across two repos to add one schema. The temptation to short-circuit it (inline a TheoreticalModel in Principia and "promote it later") is constant and incorrect; every "promote it later" type in the past has produced the field drift the hub-and-spoke discipline was built to prevent.
The discipline pays back in linkage cases. CanonicalSurveyItem.canonical_item_id is the same identifier the toolbox's Reincarnation engine uses for its RID (SPEC §5 Delta 5). When the toolbox writes a DeploymentEvidence row back to the registry (PRN-039), the linkage works on the first try because both sides resolve against the same type. If either side had inlined its own version, the round-trip would fail at the field-name boundary in production rather than at typecheck.
The failure mode: schema drift between consumers. Mitigation: publish-and-pin (each consumer pins a specific version; .npmrc resolves @people-analyst:registry=https://npm.pkg.github.com; NODE_AUTH_TOKEN flows through CI at build time). Residual: optional-field additions that bump the version without a coordinated PR; consumer typecheck still passes, but semantic drift enters. An AGENTS.md rule requires coordinated review by both owners.
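The pin half of publish-and-pin can be checked mechanically. The following is a minimal sketch, assuming a CI step that has already parsed a consumer's package.json; the helper names isExactPin and checkCorePin are hypothetical, and only the package name comes from the text above.

```typescript
// Hypothetical CI helper. The package name is from the text; the check itself is an assumption.
const CORE_PACKAGE = "@people-analyst/measurement-core";

// True only for a bare semver such as "0.9.0" (optionally with a prerelease tag).
// Ranges and tags ("^0.9.0", "~0.9.0", ">=0.9.0", "0.9.x", "*", "latest") are rejected.
function isExactPin(spec: string): boolean {
  return /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/.test(spec.trim());
}

// Returns human-readable problems; an empty list means the pin is acceptable.
function checkCorePin(dependencies: Record<string, string>): string[] {
  const spec = dependencies[CORE_PACKAGE];
  if (spec === undefined) {
    return [`${CORE_PACKAGE} is not listed as a dependency`];
  }
  if (!isExactPin(spec)) {
    return [`${CORE_PACKAGE} uses a range ("${spec}"); pin an exact version`];
  }
  return [];
}
```

A check like this catches unpinned ranges only. It does nothing for the residual named above, where a pinned version bump carries an optional field whose meaning differs between consumers.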
2. ETL pipeline — meta-factory extracts, Principia curates
The split is structural. Meta-factory runs extraction pipelines producing canonical JSON outputs from PDFs, book chapters, deep-research dispatches. Principia runs curation: dedup against existing canonical entities, source-grading per the A-D rubric, novelty-verification, promotion to load-bearing rows.
Components in the loop today:
- LiteratureMonitor (SPEC §5 Delta 8): persistent search queries against external sources (Scholar via SerpAPI; CrossRef; OpenAlex; deep-research; Scite when MF-14 ships). Each monitor carries a watchlist and a cadence; the scheduler walks them, dispatches ingest, writes match rows. Five engagement-family watchlists seeded: new meta-analyses, UWES validation, longitudinal predictors, Q12 deployments, key authors.
- EnrichmentJob queue (SPEC §5 Delta 9): background jobs that fire when evidence arrives. Types: synthesize_canonical_prior (re-pool on new EffectSize), refresh_citation_metadata, validate_non_doi_citation, extract_psychometrics_from_pdf, import_scite_citation_context. Retries tested; idempotency via deterministic job_id.
- AcquisitionRequest queue: for citations Principia can't auto-fetch (paywalled, gray literature, monograph chapters). Status flows queued → in_progress → fulfilled → abandoned. Surfaces as "you could pull this" cards in the curator's reading flow.
- DeepResearchDispatcher (PRN-016..023): for literature gaps watchlists won't fill. Dispatcher writes DeepResearchRequest JSON to the meta-factory side and ingests the result back. Round-trip closed with MF-PRINCIPIA-15.
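Idempotency via a deterministic job_id can be illustrated with a small sketch. The job type names come from the list above; the payload shape, the EnrichmentQueue class, and the choice to use a canonical string as the id (rather than a hash of it) are assumptions made for illustration.

```typescript
// Hypothetical shapes: job type names are from the text; fields are assumptions.
type JobType = "synthesize_canonical_prior" | "refresh_citation_metadata";
type JobPayload = Record<string, string | number>;

interface EnrichmentJob {
  job_id: string;
  type: JobType;
  payload: JobPayload;
}

// Serialize with sorted keys so that key order never changes the id.
function stableStringify(payload: JobPayload): string {
  const keys = Object.keys(payload).sort();
  return JSON.stringify(keys.map((k) => [k, payload[k]]));
}

// The id is a pure function of (type, payload): same input, same id.
function deterministicJobId(type: JobType, payload: JobPayload): string {
  return `${type}:${stableStringify(payload)}`;
}

class EnrichmentQueue {
  private jobs = new Map<string, EnrichmentJob>();

  // Returns true if the job was newly queued, false if an identical job already exists.
  enqueue(type: JobType, payload: JobPayload): boolean {
    const job_id = deterministicJobId(type, payload);
    if (this.jobs.has(job_id)) return false;
    this.jobs.set(job_id, { job_id, type, payload });
    return true;
  }

  get size(): number {
    return this.jobs.size;
  }
}
```

Because the id depends only on the job's content, a retried dispatch or a duplicate trigger from two monitors collapses into one queued job instead of two.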
The trade-off: asynchronous and idempotent throughout vs. operational complexity at small scale.
At the current size (79 canonical variables across five Tier-1 families; 147 citations; 0 live EffectSize rows), the loop is more machinery than the workload requires. The reason: Phase C+ workload is fundamentally different — continuous ingest from multiple sources on cadences that range from daily to per-call. The same pipeline has to handle one citation from a manual curator promotion and a hundred from a CrossRef poll without surface-level differences. Building the queue infrastructure now is cheaper than retrofitting after the corpus is dense.
Audit-trail discipline drove the asynchronous design. Every ingest path writes an audit row; every store mutation traces to a request id; curator-policy D3 (reject = extraction_status: rejected with rejection_reason, never hard-delete) is store-level so no automation path can shortcut it.
The failure mode: queue stalls. A handler that throws on malformed input can stop the queue. Mitigation: retry counters with exponential backoff, transition to failed after a limit, admin-UI surface for resolution. Residual: silent-failure (job marks complete with wrong output); novelty verification catches most, but the verification pass is itself queue-managed and can cascade.
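The mitigation can be sketched as a pure state transition. The attempt limit, base delay, cap, and every status name other than failed are assumptions; the source states only that retries use exponential backoff and that a job transitions to failed after a limit.

```typescript
// Hypothetical job record; status names other than "failed" are assumptions.
type JobStatus = "queued" | "retry_scheduled" | "failed";

interface RetryState {
  status: JobStatus;
  attempts: number;
  next_run_at_ms: number | null;
}

const MAX_ATTEMPTS = 5;      // assumed limit
const BASE_DELAY_MS = 1000;  // assumed base delay
const MAX_DELAY_MS = 60000;  // assumed cap

// Delay doubles with each failed attempt (1s, 2s, 4s, ...) up to MAX_DELAY_MS.
function backoffDelayMs(attempts: number): number {
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempts - 1), MAX_DELAY_MS);
}

// Called when a handler throws. The job is rescheduled or parked as failed,
// so one malformed input cannot block the jobs behind it.
function onHandlerError(state: RetryState, nowMs: number): RetryState {
  const attempts = state.attempts + 1;
  if (attempts >= MAX_ATTEMPTS) {
    return { status: "failed", attempts, next_run_at_ms: null };
  }
  return {
    status: "retry_scheduled",
    attempts,
    next_run_at_ms: nowMs + backoffDelayMs(attempts),
  };
}
```

Nothing here detects the residual case: a handler that returns successfully with wrong output never reaches onHandlerError.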
3. Verification-log infrastructure
Nothing automated writes to load-bearing entities without leaving an audit row. The curator-policy decisions (locked 2026-05-20 at docs/specification/loop/curator-policy.md):
- D1: daily digests gitignored; weekly rollups committable.
- D2: extraction_status: verified does not auto-promote to survey_candidate: true until the corresponding construct-family survey is live on PA-site. Per-monitor auto_promote_verified opens later, only for trusted monitors (seed: monitor.engagement.new_meta_analyses only; not key_authors until tuned).
- D3: reject = extraction_status: rejected + rejection_reason (off_topic | duplicate | not_peer_reviewed | hallucination | other) + curator provenance. Never hard-delete. Rejected rows stay in the store with audit trail.
- D4: unattended automation paths write proposals only. No automation may set verified without CrossRef + multi-model agreement; no automation may write an EffectSize. Promotion goes through promote-effect-size (PRN-030).
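D3 and D4 are easiest to reason about when they are enforced by the store's API rather than by convention. The sketch below is an illustration under assumptions: the CuratedStore class, the proposed status, the field names beyond extraction_status and rejection_reason, and the threshold of two agreeing models are all hypothetical.

```typescript
// Rejection reasons and the verified/rejected statuses are from the text; "proposed" is assumed.
type RejectionReason = "off_topic" | "duplicate" | "not_peer_reviewed" | "hallucination" | "other";
type ExtractionStatus = "proposed" | "verified" | "rejected";

interface Row {
  id: string;
  extraction_status: ExtractionStatus;
  rejection_reason?: RejectionReason;
  rejected_by?: string; // curator provenance (hypothetical field name)
}

interface VerificationEvidence {
  crossref_match: boolean;
  agreeing_models: number; // how many verifier models agree (threshold below is assumed)
}

class CuratedStore {
  private rows = new Map<string, Row>();

  propose(id: string): void {
    this.rows.set(id, { id, extraction_status: "proposed" });
  }

  // D3: rejection is a status change with a reason and provenance. There is no delete method.
  reject(id: string, reason: RejectionReason, curator: string): void {
    const row = this.mustGet(id);
    row.extraction_status = "rejected";
    row.rejection_reason = reason;
    row.rejected_by = curator;
  }

  // D4: automation may mark a row verified only with CrossRef plus multi-model agreement.
  verifyByAutomation(id: string, evidence: VerificationEvidence): boolean {
    if (!evidence.crossref_match || evidence.agreeing_models < 2) return false;
    this.mustGet(id).extraction_status = "verified";
    return true;
  }

  get(id: string): Row | undefined {
    return this.rows.get(id);
  }

  private mustGet(id: string): Row {
    const row = this.rows.get(id);
    if (!row) throw new Error(`unknown row: ${id}`);
    return row;
  }
}
```

Since the class exposes no delete path, an automation that wants to discard a row has to go through reject and leave a reason behind.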
The trade-off: strict audit discipline vs. throughput at scale.
At Phase C+, the policy imposes real throughput limits. If every load-bearing EffectSize requires manual curator promotion, registry growth is bounded by curator hours. The Phase D plan (PRN-035 agent executor reading bounded ResearchTask JSON, writing proposals, humans approving) is the long-term mitigation. Intermediate: auto_promote_verified per-monitor flag, but only after the engagement proof chain ships end-to-end.
The verification log is a published artifact. Each row carries a verification_status; pending and complete states are visible via /api/v1/audit/recent (gated to read:internal scope; the methodology of what gets verified, when, by which model, is public).
The failure mode: verification debt — rows ingested faster than the verifier processes accumulate in pending_verification. Consumer-facing endpoints filter on verified status, so the failure is hidden from readers; the curator sees it in the digest. Phase C target: backlog age < 48 hours.
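The backlog-age target reduces to a single comparison. A minimal sketch follows, assuming each row records an ingest timestamp; the ingested_at_ms field and both function names are hypothetical, while the 48-hour threshold and the pending_verification status come from the text.

```typescript
interface PendingRow {
  id: string;
  verification_status: "pending_verification" | "verified";
  ingested_at_ms: number; // hypothetical field
}

const BACKLOG_TARGET_MS = 48 * 60 * 60 * 1000; // Phase C target: backlog age under 48 hours

// Age of the oldest row still waiting for verification, or 0 if nothing is pending.
function oldestPendingAgeMs(rows: PendingRow[], nowMs: number): number {
  let oldest = 0;
  for (const row of rows) {
    if (row.verification_status !== "pending_verification") continue;
    oldest = Math.max(oldest, nowMs - row.ingested_at_ms);
  }
  return oldest;
}

function backlogWithinTarget(rows: PendingRow[], nowMs: number): boolean {
  return oldestPendingAgeMs(rows, nowMs) < BACKLOG_TARGET_MS;
}
```

Measuring the oldest pending row rather than the queue length keeps the metric honest when ingest is bursty: a short queue with one week-old row still fails the check.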
4. Hub-and-spoke distribution
@people-analyst/measurement-core is consumed by:
- meta-factory: the extraction-and-emission layer; produces ConstructCard, InstrumentCard, Citation, etc. JSON in the shape Principia ingests.
- principia: the curation layer; consumes meta-factory's emissions, resolves them against the canonical spine (CanonicalVariable, CanonicalSurveyItem), grades them, and exposes them via REST + MCP.
- people-analytics-toolbox: the runtime layer; consumes Principia's REST surface for measurement-graded calls in production features (Calculus's engagement-metric coverage, AnyComp's tenure-modeling improvements, Reincarnation's adaptive-measurement work).
- eventually, People Analytics Platform apps: Calculus, AnyComp, Reincarnation as standalone consumer apps once they shed their toolbox-internal status and ship as platform services.
Distribution is GitHub Packages (https://npm.pkg.github.com). Each consumer's .npmrc resolves the @people-analyst scope; NODE_AUTH_TOKEN is set at consumer build time. The publish workflow is workflow_dispatch-triggered from meta-factory after a coordinated PR lands.
The trade-off: versioned distribution overhead vs. independent install hygiene.
Earlier, @measurement/core was consumed via file:../../../meta-factory/packages/measurement-core workspace deps. That worked locally; it broke on Vercel build runners because Vercel only clones the principia repo. The migration to GH Packages (PRN-014c, 2026-05-19) was forced by the deploy reality. The benefits (semver pinning, a working npm outdated, a clean exports map forcing public-API discipline; the CanonicalVariable relocation onto the schema spine happened as part of this) were available the whole time; the migration cost only looked justified once a deploy started failing. Submodules were considered and rejected: 30 minutes cheaper now, but accumulating debt every session, because without version pinning breaking changes propagate silently.
The failure mode: consumer-side NODE_AUTH_TOKEN misconfiguration. A new CI environment fails the install with an opaque 401. Mitigation: .npmrc checked into every consumer; deploy-readiness checklist in SPEC names the env var. Residual: token rotation across every consumer's Vercel project — documented, but bites whenever expiry catches the team unaware.
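One way to make the 401 less opaque is a preflight that runs before the install step. This is a sketch under assumptions: the function name is hypothetical, and it takes the environment as a plain object so it can be exercised without a real CI runner.

```typescript
// Hypothetical preflight. Returns an explanatory message, or null when the variable is present.
function checkRegistryAuth(env: Record<string, string | undefined>): string | null {
  const token = env["NODE_AUTH_TOKEN"];
  if (token === undefined || token.trim() === "") {
    return (
      "NODE_AUTH_TOKEN is not set; installing @people-analyst packages from " +
      "https://npm.pkg.github.com will fail with a 401"
    );
  }
  return null;
}
```

The check covers a missing or blank variable only. An expired or revoked token still passes it and still fails at install time, which is the rotation residual described above.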
5. Queryable-registry architecture
The data layer today is JSON files. The pre-built JsonRegistryStore at apps/web/data/store/ is committed to the repo (79 canonical variables / 170 constructs / 48 instruments / 147 citations across five Tier-1 families); Next.js's outputFileTracingIncludes packs it into the function bundle. Re-seed via npm run seed:store from meta-factory canonical outputs.
The trade-off: zero-infrastructure for v1 reads vs. data updates require redeploy.
The PRN-014e decision got the public API live in a day. Every read goes through the Next.js function with no database call. Cost: every data update is a deploy. PRN-033 (Postgres via Neon) is filed to lift the store out of the bundle. The RegistryStore interface (get / put / iterate / index) was deliberately built abstract enough that the backend can swap to SQLite for Phase 2 and Postgres for Phase 3 without touching ingest or MCP.
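The backend-swap claim rests on callers depending only on the interface. The sketch below shows the idea under assumptions: the source names the four operations (get / put / iterate / index) but not their signatures, so the shapes here are hypothetical, and a database-backed implementation would likely need asynchronous versions.

```typescript
// Hypothetical signatures: the text names get / put / iterate / index but not their shapes.
interface Entity {
  id: string;
  kind: string; // e.g. "construct", "instrument", "citation"
}

interface RegistryStore {
  get(id: string): Entity | undefined;
  put(entity: Entity): void;
  iterate(kind: string): Entity[];
  index(field: keyof Entity): Map<string, Entity[]>;
}

// In-memory stand-in for the JSON-backed store. Another backend would implement
// the same interface without changes to callers.
class InMemoryRegistryStore implements RegistryStore {
  private byId = new Map<string, Entity>();

  get(id: string): Entity | undefined {
    return this.byId.get(id);
  }

  put(entity: Entity): void {
    this.byId.set(entity.id, entity);
  }

  iterate(kind: string): Entity[] {
    return Array.from(this.byId.values()).filter((e) => e.kind === kind);
  }

  // Groups all entities by the value of one field.
  index(field: keyof Entity): Map<string, Entity[]> {
    const out = new Map<string, Entity[]>();
    Array.from(this.byId.values()).forEach((e) => {
      const key = e[field];
      const bucket = out.get(key) ?? [];
      bucket.push(e);
      out.set(key, bucket);
    });
    return out;
  }
}

// A caller written against the interface, not the implementation.
function countKind(store: RegistryStore, kind: string): number {
  return store.iterate(kind).length;
}
```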
The HTTP surface is dual:
- REST /api/v1/*: versioned public API. Endpoints span constructs, instruments, citations, effects, evidence, models, theoretical-models, priors, recommend (POST), deployment-evidence (POST). Shape: { items, next_cursor } for lists, { entity } for single. Bearer-token auth via PRINCIPIA_KEY_* env vars; per-consumer sliding-window rate limit; coarse scope (read:public | read:internal | read:all | write:deployment_evidence).
- MCP /api/mcp: Model Context Protocol surface for AI consumers. Tools mirror REST: principia.{constructs,instruments,items,citations,effects,evidence,models,theoretical_models,priors,recommend}.{list,lookup,search}. Gateway pattern lifted from toolbox per SPEC §11.
Shared handlers (packages/registry/src/mcp/handlers/*.ts) back both transports. No duplicate logic; REST route handlers under apps/web/app/api/v1/ invoke the same functions the MCP server registers.
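The shared-handler arrangement can be sketched with one list function and two thin adapters. Only the { items, next_cursor } shape comes from the text; the function names, the sample data, and the cursor encoding (the id of the last item on the previous page) are assumptions.

```typescript
// Only the { items, next_cursor } shape is from the text; everything else is illustrative.
interface ListResult<T> {
  items: T[];
  next_cursor: string | null;
}

interface Construct {
  id: string;
  name: string;
}

// Made-up sample rows standing in for the store.
const CONSTRUCTS: Construct[] = [
  { id: "construct.a", name: "A" },
  { id: "construct.b", name: "B" },
  { id: "construct.c", name: "C" },
];

// Shared handler. An unknown cursor restarts from the first page in this sketch.
function listConstructs(cursor: string | null, limit: number): ListResult<Construct> {
  const start = cursor === null ? 0 : CONSTRUCTS.findIndex((c) => c.id === cursor) + 1;
  const items = CONSTRUCTS.slice(start, start + limit);
  const more = items.length > 0 && start + limit < CONSTRUCTS.length;
  return { items, next_cursor: more ? items[items.length - 1].id : null };
}

// REST adapter: query parameters arrive as strings, then it delegates.
function restListConstructs(query: { cursor?: string; limit?: string }): ListResult<Construct> {
  return listConstructs(query.cursor ?? null, Number(query.limit ?? "2"));
}

// MCP adapter: tool arguments arrive typed, then it delegates to the same function.
function mcpListConstructs(args: { cursor?: string; limit?: number }): ListResult<Construct> {
  return listConstructs(args.cursor ?? null, args.limit ?? 2);
}
```

With this arrangement a pagination bug is fixed once, and the two transports cannot disagree about what a page contains.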
The failure mode: rate-limit per-instance counters. The sliding window is in-memory per Vercel function instance; with N live instances the effective ceiling is N × the configured limit. For internal consumers at current scale this is fine; PRN-014d-followup is filed to swap to a global counter (Upstash Redis or Postgres token bucket) when traffic outgrows per-instance counting.
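The N × limit effect is easy to reproduce in miniature. In the sketch below, each SlidingWindowLimiter object stands in for one function instance; the class, its method names, and the round-robin routing are assumptions for illustration, not the registry's actual limiter.

```typescript
// Hypothetical limiter: one object of this class stands in for one function instance.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the call is allowed and records it in the consumer's window.
  allow(consumer: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    const recent = (this.hits.get(consumer) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(consumer, recent);
      return false;
    }
    recent.push(nowMs);
    this.hits.set(consumer, recent);
    return true;
  }
}

// Routes calls round-robin across instances that share no state and counts how many pass.
function allowedAcrossInstances(instances: SlidingWindowLimiter[], calls: number): number {
  let allowed = 0;
  for (let i = 0; i < calls; i++) {
    const instance = instances[i % instances.length];
    if (instance.allow("consumer-a", 0)) allowed++;
  }
  return allowed;
}
```

With a limit of 3, one instance admits 3 of 10 simultaneous calls and two instances admit 6. A global counter removes the multiplier because every instance reads and writes the same window.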
6. Public reader UI
The public reader UI at peopleprincipia.com/registry/* (PRN-038a–d) is Next.js App Router, SSR-rendered, zero auth (Option D locked: registry fully open; monetization is on sister apps, not the registry).
SEO-driven design constraints:
- Server-rendered HTML for crawlability. Every entity page emits SSR markup with construct definition, instrument names, citation list, and (for prior pages) inline-SVG plots. View-source contains substantive content; crawlers do not need to execute JS to index a page.
- Sitemap and OpenGraph. app/sitemap.ts enumerates every entity; app/robots.ts allows /registry/*, disallows /api/*. Per-entity OpenGraph via generateMetadata. Full JSON-LD structured data (ScholarlyArticle for citations, DefinedTerm for constructs, Dataset for the registry, Book for the book) is in PRN-038d.
- Zero-JS distribution plots. The CanonicalPrior viewer renders inline-SVG plots without client JS. Pure-JS probability-density (PDF) math (Lanczos logGamma + A&S erf + Beasley-Springer-Moro normal-quantile) runs server-side. The Stan/PyMC/brms/R/NumPy code-copy widget uses CSS-only radio-driven tabs.
- No third-party trackers. Vercel Analytics only. Option D commits the registry as a public good; no tracking-pixel baggage.
The trade-off: zero auth + zero JS + maximum SEO discipline vs. progressive enhancement. SSR-SVG plots cost design surface — static, not interactive — and bought crawlability + load time. Lighthouse target: ≥ 95 on Performance + Accessibility + SEO + Best Practices.
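A server-rendered density plot needs nothing more than a function from parameters to an SVG string. The sketch below assumes a normal prior for simplicity; the actual distribution families, the function names, and the plot dimensions are not stated in the source and are chosen here for illustration.

```typescript
// Normal density, used here as a stand-in for whatever family the prior actually has.
function normalPdf(x: number, mean: number, sd: number): number {
  const z = (x - mean) / sd;
  return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2 * Math.PI));
}

// Renders the density over mean ± 4 sd as one SVG <path>. The returned string can be
// embedded directly in server-rendered HTML; no client JS is involved.
function densitySvg(mean: number, sd: number, width = 320, height = 120, points = 64): string {
  const xMin = mean - 4 * sd;
  const xMax = mean + 4 * sd;
  const yMax = normalPdf(mean, mean, sd); // a normal density peaks at its mean
  const segments: string[] = [];
  for (let i = 0; i <= points; i++) {
    const x = xMin + ((xMax - xMin) * i) / points;
    const px = (width * i) / points;
    // SVG y grows downward, so the peak maps to y = 0.
    const py = height - (normalPdf(x, mean, sd) / yMax) * height;
    segments.push(`${i === 0 ? "M" : "L"}${px.toFixed(1)} ${py.toFixed(1)}`);
  }
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 ${width} ${height}" role="img">` +
    `<path d="${segments.join(" ")}" fill="none" stroke="currentColor"/></svg>`
  );
}
```

The output is static markup, which is the design cost named above: there is no hover, zoom, or tooltip unless client JS is added later as an enhancement.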
The failure mode: routes rendering against missing data. A user clicks /registry/priors/construct.work_engagement/predicts/construct.task_performance and the prior hasn't been synthesized yet. Mitigation: honest empty state ("no synthesized prior yet; contributing studies are listed below"), not 404 or a fabricated default. Residual: a curator who promotes an EffectSize without re-running synthesis can leave the prior stale; last_updated surfaces the staleness, but a consumer who doesn't read it can be misled.
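Both the empty state and the staleness residual can be made explicit in the page's view model. The sketch is an assumption-laden illustration: the type and function names are hypothetical, and comparing last_updated against the latest EffectSize promotion time is one possible staleness rule, not a documented one.

```typescript
// Hypothetical shapes; last_updated and the empty-state wording are from the text.
interface SynthesizedPrior {
  last_updated_ms: number;
  mean: number;
}

type PriorView =
  | { state: "ready"; prior: SynthesizedPrior; stale: boolean }
  | { state: "empty"; message: string; contributing_study_ids: string[] };

function viewPrior(
  prior: SynthesizedPrior | undefined,
  contributingStudyIds: string[],
  latestEffectSizePromotedMs: number | null,
): PriorView {
  if (prior === undefined) {
    // Honest empty state: no 404 and no fabricated default.
    return {
      state: "empty",
      message: "no synthesized prior yet; contributing studies are listed below",
      contributing_study_ids: contributingStudyIds,
    };
  }
  // Assumed rule: stale when an EffectSize was promoted after the last synthesis run.
  const stale =
    latestEffectSizePromotedMs !== null && latestEffectSizePromotedMs > prior.last_updated_ms;
  return { state: "ready", prior, stale };
}
```

Carrying an explicit stale flag means the page can show a warning itself instead of relying on the reader to compare timestamps.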
7. Cross-cutting failure modes
Pulling it together, the failure modes the design buys itself:
- Schema drift between consumers. Mitigated by @people-analyst/measurement-core versioning + AGENTS.md coordinated-PR rule. Residual: optional-field additions that don't break typecheck but introduce semantic drift.
- Source-grade rubric inconsistency. Single-curator reduces inter-rater drift but introduces single-rater bias. Post-v1 external-contribution model produces inter-rater data v1 lacks.
- Primary-source PDF extraction errors. Handled by novelty verification (PRN-032) + multi-vendor consensus (PRN-023 multi-model verifier across model families). Residual: false confirms where extractor and verifier agree on a wrong row.
- Versioning and snapshot edge cases. frozen_at semantics + book-build.ts deterministic snapshot id (sha256 over JSON store contents). Two builds against the same input produce byte-identical artifacts. Residual: a curator who edits the underlying store between snapshot builds produces an id that no longer matches the version pinned by a downstream consumer; audit log captures the change but does not prevent it.
- Queue stalls and verification debt. Per-job retry counters with exponential backoff + admin-UI surfaces. Residual: silent-failure cases where a job marks complete with wrong output.
8. What an engineering reviewer should hold us to
Three things, in order of how cheaply they can be checked:
- The schema is consumed, not re-implemented. Grep across the consumer repos (meta-factory, principia, toolbox) for any inline definition of types that live in @people-analyst/measurement-core. There should be none, except in fixture or test scaffolds where the inlining is local and explicit.
- Every load-bearing row carries provenance. Pick a CanonicalPrior row from the public API; trace its contributing_effect_size_ids[] to study-level rows; trace each study row to a Citation with a DOI; trace the DOI to the source paper. The chain should be unbroken.
- Verification status is visible to consumers, not hidden. Pick a row that has not yet passed verification; confirm that the consumer-facing API filters it out, and that the verification_status field is exposed to internal-scope callers who want to see the pending queue.
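The provenance check in the second item can be automated as a walk over the chain. The sketch below works on in-memory maps and is hypothetical throughout: contributing_effect_size_ids is the field named above, while the other field names, the record shapes, and the function name are assumptions. The final hop, from DOI to the source paper, still needs a human or an external resolver.

```typescript
// Hypothetical records; contributing_effect_size_ids is the field named in the text.
interface Prior {
  id: string;
  contributing_effect_size_ids: string[];
}
interface EffectSizeRow {
  id: string;
  citation_id: string;
}
interface CitationRow {
  id: string;
  doi?: string;
}

// Walks prior -> effect sizes -> citations and returns every broken link found.
function traceProvenance(
  prior: Prior,
  effectSizes: Map<string, EffectSizeRow>,
  citations: Map<string, CitationRow>,
): string[] {
  const problems: string[] = [];
  if (prior.contributing_effect_size_ids.length === 0) {
    problems.push(`${prior.id}: no contributing effect sizes`);
  }
  for (const esId of prior.contributing_effect_size_ids) {
    const es = effectSizes.get(esId);
    if (!es) {
      problems.push(`${esId}: effect size not found`);
      continue;
    }
    const citation = citations.get(es.citation_id);
    if (!citation) {
      problems.push(`${es.citation_id}: citation not found`);
      continue;
    }
    if (!citation.doi) problems.push(`${citation.id}: citation has no DOI`);
  }
  return problems;
}
```

An empty result means every link the registry itself can see is intact; any entry in the list names the exact row where the chain breaks.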
If those three hold, the discipline the design is built around is working. If any of them slip, the failure is structural and visible; the audit log is where the slip is found.
The rest is plumbing. The plumbing is documented in docs/specification/SPEC.md, the curator policy at docs/specification/loop/curator-policy.md, the research-ingestion vision at docs/specification/loop/research-ingestion-vision.md, and the assignment queue at docs/AGENT-ASSIGNMENTS.md. The code is consumable from the public REST surface at peopleprincipia.com/api/v1/* and the MCP surface at peopleprincipia.com/api/mcp. The grades are public. The disagreements are public. The failure modes are named.
That is what engineering review can hold us to.