peopleanalyst

MetaFactory

Two-shell production-factory substrate for the portfolio — an engine (OLD, AI-controlled, no human UI) that ingests books and research at chapter-respecting fidelity and runs a roster of named factories (Persona Factory, Survey Factory, Competency Factory, Models Factory, Requirements Factory, Prompt Factory, Publishing Factory, Business Ideas Factory, Application Designs Factory) producing canonical outputs; and an API host (PROD, Vercel) that exposes a v1.2.0 REST + MCP contract plus a cross-portfolio library layer (Stream 7, ~944 records) the rest of the portfolio reads from.

Microstory
Customer
Portfolio properties — PA-site, vela, principia, Performix, DevPlane, Fourth & Two, anycomp, segmentation-studio — that need the same library, the same canonical IDs, and the same passages queryable the same way.
Problem · external
Each consumer re-implements retrieval, identity, and integrity against the substrate, with definitions drifting per product.
Problem · internal
Every cross-portfolio question feels like a week of plumbing before the actual question gets answered.
Problem · philosophical
Cross-portfolio infrastructure that ultimately speaks to humans about humans should not have seven slightly-different definitions of competency, persona, or passage — it should be one substrate, legible to every consumer the same way.
Guide
`meta-factory-prod` — the thin Vercel API host that exposes the OLD meta-factory engine over a frozen v1.x REST + MCP contract plus a build-time-bundled cross-portfolio library snapshot every product reads from.
Plan
Pin to a v1.x contract version → read snapshot, library, and passages via REST or MCP per `docs/CONSUMERS.md` → migrate explicitly on major bumps; the engine and write-side stay in OLD, the host stays thin.
Success
One library, one canonical-output spec, one passage search, one identity authority — and a contract changelog the consumer can read in a sitting.
Failure avoided
Per-product taxonomy drift, runtime coupling to engine internals, and the substrate sliding back toward the half-renovated state the OLD/PROD split was designed to fix.
The problem

Cross-cutting infrastructure (book ingestion at chapter-respecting fidelity, schema-conformant extraction of behavioral constructs, job / competency / persona generation, survey factories, and the canonical-vocabulary substrate every consumer needs to compare measurements across products) gets re-implemented per consumer when there is no production-factory substrate. The cost isn't only engineering: each consumer ends up with its own slightly drifted definition of competency, persona, engagement, and effect-size, so behavioral constructs that should be comparable across products become incomparable. The earlier 'Universal Information Factory' framing tried to do too much; the resulting analytics-vs-production drift left the repo half-renovated for a stretch. The fix was a deliberate split: production-factory engine in one shell, consumer-facing API in another, with a clean seam between.

What I built

Two-repo architecture. **OLD (`meta-factory`)** — AI-controlled engine, local-only on Mike's Mac, no human UI. Owns ingestion pipelines (collector → organizer → referee → classifier 14-stage book flow; research_agent + deep_research_agent for articles), the named factory roster (Persona Factory, Survey Factory, Competency Factory, Models Factory, Requirements Factory, Prompt Factory, Publishing Factory, Business Ideas Factory, Application Designs Factory, plus checkpoint-charlie and orchestrator infrastructure), the asset registry (5,013 entries as of 2026-05-11), and canonical_outputs production. Cryptographic provenance with SHA-256 tracking on every source file; safe-delete invariants require hash verification before any local delete. Dual-grade corpus ingestion migrated in from Vela — same database holds editorially-selected curator passages and bulk research chunks, distinguished by tag. **PROD (`meta-factory-prod`)** — pure API host on Vercel. Owns the v1.2.0 REST + MCP contract (`docs/API-CONTRACT.md` + `CONTRACT-CHANGELOG.md`), the cross-portfolio library layer (Stream 7, library snapshot of ~944 records consumed by every portfolio product), consumer-onboarding doc (`CONSUMERS.md`), auth boundary (`META_FACTORY_API_SECRET` shared-secret), and cloud content access (bundled snapshot + Supabase Storage). Same engine; different shells.
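The safe-delete invariant described above can be sketched in a few lines. This is a minimal illustration, not the engine's actual code: the `DeleteCheck` shape and the function names `sha256Hex` and `safeToDeleteLocal` are assumptions introduced here, and the sketch assumes that "hash verification" means re-hashing the backup copy and the durable-storage copy and comparing both against the digest recorded at ingestion.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest of a byte buffer.
function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Hypothetical shape: the digest recorded when the source file was ingested,
// plus the bytes read back from each remote copy (null means the copy is missing).
interface DeleteCheck {
  recordedSha256: string;
  backupBytes: Uint8Array | null;
  durableBytes: Uint8Array | null;
}

// A local delete is allowed only when both remote copies exist and both
// re-hash to the recorded digest. Any mismatch or missing copy blocks it.
function safeToDeleteLocal(check: DeleteCheck): boolean {
  const { recordedSha256, backupBytes, durableBytes } = check;
  if (backupBytes === null || durableBytes === null) return false;
  return (
    sha256Hex(backupBytes) === recordedSha256 &&
    sha256Hex(durableBytes) === recordedSha256
  );
}
```

The check fails closed: a missing copy is treated the same as a corrupted one, which is the property that keeps a careless deletion from losing source material.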

What's novel
  • 01 Split engine + API host architecture: OLD does the work, PROD makes it accessible. The seam is a snapshot + cloud-storage refresh, run manually. The two repos can evolve independently without coupling consumer integration to engine internals.
  • 02 Named factories as units of production: Persona Factory, Survey Factory, Competency Factory, Models Factory, Requirements Factory, Prompt Factory, Publishing Factory, Business Ideas Factory, Application Designs Factory. Each is a package that outputs a structured behavioral artifact; analytics offerings are explicitly excluded. The roster grows; the substrate doesn't.
  • 03 Cross-portfolio library layer (Stream 7): a single read surface for ~944 corpus records that every product consumes, with a shared lifecycle and spec at `peopleanalyst-site/docs/library/SPEC.md`. The portfolio has one library, not seven slightly different ones.
  • 04 Dual-grade corpus ingestion: the same database holds editorially selected curator passages and bulk research chunks, distinguished by tag. Vela's pipeline lifted into the substrate.
  • 05 Cryptographic provenance contract: SHA-256 tracked for every source file; safe-delete invariants require hash verification on backup and durable storage before any local delete. The substrate cannot lose source material to a careless deletion.
  • 06 ~$0.13 per research-run synthesis at 30K+ passage scale: most 'AI research' tools run 10–100× more expensive because they retrieve without pattern extraction. The discipline is statistical: extract patterns once, cite the evidence, don't re-retrieve.
  • 07 Schema-extracted measurement vocabulary (@measurement/core) shared across consumers: constructs, items, instruments, and effect-sizes are defined once in canonical form, so behavioral measurement compares cleanly across Performix / Principia / PA Platform / the toolbox. Cross-product comparison becomes structurally possible rather than aspirational.
  • 08 v1.2.0 REST + MCP contract: PROD exposes the same engine over HTTP for engineers and MCP for AI agents. Consumer integration is contract-versioned with a changelog; consumers pin to the version they were built against and migrate explicitly on major bumps.
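The pin-and-migrate rule in the contract item can be illustrated with a small compatibility check. This is a sketch under stated assumptions, not the documented consumer flow in `docs/CONSUMERS.md`: the names `parseSemVer`, `checkPin`, and `PinResult` are hypothetical, and how a consumer learns the version the host currently serves is left open. The only rule taken from the page is that a v1.x contract is additive-only and that a major bump requires explicit migration.

```typescript
// Parsed semantic version; plain major.minor.patch only, no pre-release tags.
interface SemVer {
  major: number;
  minor: number;
  patch: number;
}

function parseSemVer(version: string): SemVer {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version.trim());
  if (!match) throw new Error(`Not a major.minor.patch version: ${version}`);
  return {
    major: Number(match[1]),
    minor: Number(match[2]),
    patch: Number(match[3]),
  };
}

type PinResult = "compatible" | "host-too-old" | "major-bump";

// With an additive-only contract, a host at the same major and an equal or
// newer minor/patch still serves everything the pinned consumer relies on.
// A different major means the consumer has to migrate explicitly.
function checkPin(pinned: string, served: string): PinResult {
  const p = parseSemVer(pinned);
  const s = parseSemVer(served);
  if (p.major !== s.major) return "major-bump";
  const hostOlder =
    s.minor < p.minor || (s.minor === p.minor && s.patch < p.patch);
  return hostOlder ? "host-too-old" : "compatible";
}
```

A consumer built against 1.2.0 would get `"compatible"` from a host serving any later 1.x release and `"major-bump"` from a 2.0.0 host, which is the point at which the changelog becomes required reading.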
Recent ships
  1. 2026-05-18 **DP-161 + DP-163 (Phase 1):** MetaFactory Console v2 lift, with 10 admin routes under app/admin/substrate/* + login, 57 shadcn primitives, console widgets (substrate-browser, record-detail, ingestion-jobs, integrity-dashboard, pathb-planner, drm-queue, cross-property-memberships), and gold/Geist-Mono operator-console visual discipline (P233). SHA d5914a3.
  2. 2026-05-18 **DP-161 Phase 2 prep:** supabase/migrations/002_metafactory_console_phase_2.sql (10 tables) + 6 admin HTTP routes + scripts/mcp-meta-factory.ts v1.5.0 with 6 mirrored admin write tools; gated on DATABASE_URL so nothing 500s pre-activation. SHA 2e3917b.
  3. 2026-05-18 **MF-150:** chat-capture pipeline (Chrome extension → /api/capture/chat-turn → chat_turns_raw → /books). SHA e6492ed.
  4. 2026-05-18 **DP-162:** portfolio adapter corrected for structured consumes[] edges. SHA db65cca.
  5. 2026-05-18 **MF-012:** receiver-archive notification handoff (HRIS fold landed in toolbox; segmentation-studio receiver repo safe to archive).
  6. 2026-05-14 **MF-200:** research-discovery engine (problem-anchored scans). SHA 85b9fae.
  7. 2026-05-13 **Stream 8 / state-ui:** six accessibility wins: /state/fixes, /state/queues, filters, evidence pane, weekly cron. SHA fdca4bf.
  8. 2026-05-13 **MF-100 → MF-106:** Content-State Service v1: /state page + REST API, cloud-mirrored canonical_outputs + restore, quality validators (caught 350+ registry mismaps), reconciliation reports + registry consolidation (846 proposals), re-extraction workflow with budget gates.
  9. 2026-05-11 **MF-050:** chapter-level passages search baseline: 32,768 chunks indexed across 513 books, 99 ms/query, MCP search_passages tool v1.3.0. SHA 96d214a.
  10. 2026-05-09 **Phase 1C cloud-press:** REST + MCP host live at meta-factory-prod.vercel.app; library snapshot + Supabase Storage content; documented in docs/handoff/2026-05-09-phase-1c-shipped.md.
In progress
  • DP-161 Phase 2 activation: pending Neon provisioning (Vercel Marketplace) + psql "$DATABASE_URL" -f supabase/migrations/002_metafactory_console_phase_2.sql; smoke-test 6 admin HTTP routes + 6 MCP write tools after.
  • MetaFactory Console v2 Phase 3: wrap the OLD engine MCP for ingestion orchestration through the same console (queued behind Phase 2).
  • PA-022 Path B curate batch: title-match plan output landed (4 actionable, 23 NOT_FOUND, 17 MISSING_TEXT); awaiting Mike's budget-reset decision on the 68-book aggregate-fixable rerun (~$68 actual vs $25 authorized).
  • PAT-47 substrate-first disposition: producer half dropped per 2026-05-18 stash disposition; substrate stays canonical, consumer half tracked in toolbox.
  • MF-031 UI/UX leverage survey: pills production-ready (0.5d lift), player 2-3d, kanban 1-3d; rollout plan staged in docs/ui/MF-031-leverage-survey.md.
  • PA-SPEC §5 alignment ask: off-session ask to PA-site; unblocks every Phase 2 canonical_id join.
Packageable components
| Component | Stage | Reuse |
| --- | --- | --- |
| Cross-portfolio library snapshot (`lib/library/data/library.snapshot.json`) | production | Consumed by PA-site, vela, principia, Performix, DevPlane, Fourth & Two via REST + MCP (Stream 7). |
| Asset registry snapshot (`lib/v1/data/asset-registry.snapshot.json`) | production | Bundled read surface for the engine's 5,013-entry registry. |
| MCP server scaffold (`scripts/mcp-meta-factory.ts`) | production (v1.5.0) | Reference implementation for portfolio MCP servers: 13 read tools + 6 admin write tools + job-family-agent suite. |
| Operator-console visual discipline (P233) (`app/admin/substrate/*`, `components/ui/*`, `components/console-layout.tsx`, `app/globals.css`) | early-build | Gold accent + green-only-status + Geist Mono for code-shaped strings; inherit across new admin routes per AGENTS.md. |
| Admin write-surface scaffold (`app/api/v1/admin/{jobs,records/[id]/{tags,overlay,memberships,lifecycle},remediation}/route.ts` + `lib/db/client.ts`) | early-build | dbNotProvisionedError() gating pattern: durable-write routes ship dark and activate on env. |
| Library importer (`scripts/import-library-from-pa-site.ts`) | production | Snapshot + cloud-storage refresh seam between OLD engine outputs and PROD host. |
Architecture

`meta-factory-prod` is a thin API host on Vercel over a build-time-bundled library snapshot, with the engine and write-side living in OLD `people-analyst/meta-factory` (MF-DEC-1 settled). The contract is frozen at v1.x and additive-only (`docs/API-CONTRACT.md` + `CONTRACT-CHANGELOG.md`), so consumers pin a version and migrate explicitly on majors. Phase 1C shipped 2026-05-09; Phase 2 — durable operational DB on Neon plus the MetaFactory Console write surface — is staged behind a `DATABASE_URL` gate so nothing 500s pre-activation, and Phase 3 wraps the OLD engine MCP for ingestion orchestration through the same console. The seam between OLD and PROD is a manual snapshot + Supabase-Storage refresh; the two repos evolve independently.
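The `DATABASE_URL` gate can be sketched as a wrapper that short-circuits before any database work when the variable is absent. This is a framework-free illustration with several assumptions: `GateResult`, `Env`, `dbNotProvisioned`, and `withDbGate` are hypothetical names, the real `dbNotProvisionedError()` signature is not shown on this page, and the 503 status is a guess at what "nothing 500s" looks like in practice.

```typescript
// Minimal result type so the sketch does not depend on a web framework.
interface GateResult<T> {
  status: number;
  body: T | { error: string };
}

// Environment passed in explicitly, which keeps the gate easy to test.
type Env = Record<string, string | undefined>;

// Hypothetical stand-in for the gating helper named in the components table.
// The 503 status is an assumption; the page only says the routes do not 500.
function dbNotProvisioned(): GateResult<never> {
  return { status: 503, body: { error: "database not provisioned" } };
}

// Runs the handler only when DATABASE_URL is set; otherwise the route
// "ships dark" and returns the not-provisioned response untouched.
function withDbGate<T>(
  env: Env,
  handler: (databaseUrl: string) => T,
): GateResult<T> {
  const url = env["DATABASE_URL"];
  if (!url) return dbNotProvisioned();
  return { status: 200, body: handler(url) };
}
```

In a deployed route the caller would pass `process.env` as `env`. Setting the variable then activates the write surface without a code change, which matches the "activate on env" behavior described for the admin routes.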

Outcome

Private. OLD/PROD split shipped + cleanly separated in the portfolio registry (DP-149) and the briefing composite (DP-146 ripped the canonical.inspect fallback); v1.2.0 API contract live on Vercel at `meta-factory-prod.vercel.app`. Asset registry at 5,013 entries; cross-portfolio library snapshot at ~944 records (Stream 7, 2026-05-11). All major portfolio consumers integrated (PA-site, vela, principia, Fourth & Two, Performix, DevPlane). **MetaFactory Console v2 Phase 1** lift shipped 2026-05 (DP-161 + DP-163; Phase 2 prep DP-161-PHASE2-PREP added SQL schema + admin write surface + MCP v1.5.0). **MF-150 chat-capture pipeline** shipped (Chrome extension → `/books`) — the first new ingestion-side capability since 2026-05-11, alongside MF-200 (filed) research-discovery engine for problem-anchored scans. Provides capabilities to portfolio consumers: `library-api`, `chat-capture`, `research-discovery` (filed). Operator commands (`registry:build`, `registry:verify`, `snapshot:registry`, `upload:content`) are the engine-health surface for Mike or an AI agent. The half-renovated state is behind us; the substrate is legible in thirty seconds.

MetaFactory exists because cross-portfolio infrastructure can't be re-built per consumer and can't sit half-renovated forever. Two structural decisions defined the long-term shape. The first was the narrowing decision — production-factory artifacts (competencies, personas, instruments, job profiles) over open-ended analytics offerings; the cuts were executed and the roster sharpened. The second was the OLD/PROD split — the engine that does the work is too heavy to ship on Vercel and too dangerous to expose to consumers as an internal surface, and the API host that consumers actually integrate against is too thin to carry the ingestion pipelines. Two repos, one system, one seam. Underneath the systems argument is a measurement argument: every artifact a factory ships becomes an input to a human decision somewhere — a competency model becomes a development plan, a persona shapes a product call, a survey instrument runs at a client. The substrate is narrowed not just for engineering legibility, but because the artifacts go to people who need them defensibly grounded.

Surface
MetaFactory workflow — sources → 14-stage ingestion → 5,013-entry asset registry → nine named factories → canonical outputs → v1.2.0 REST + MCP contract → six portfolio consumers. (Animated illustration: MetaFactory has no UI; the diagram stands in for a screenshot.)
