What is PeopleAnalyst?

PeopleAnalyst is the front door for people-analytics research: 205+ works indexed and profiled, 40+ citation-grade findings extracted, and peer-reviewed behavioral science translated from academic to actionable — the missing manual for the people analytics you always meant to do.

What is people analytics?

People analytics is not a dashboard. It is behavioral science and statistical inference applied to workforce decisions — a discipline with its own methodology, spanning measurement, organizational design, talent, leadership, and analytics craft.

Why does AI in HR need measurement science?

AI is being deployed in high-stakes people decisions — hiring, performance, attrition — without the measurement science to evaluate whether it works or whom it harms. Construct validity, effect sizes, and criterion validity are the vocabulary for asking an AI vendor the right questions.

How is the research made accessible?

The evidence is indexed and searchable: 205+ works, 40+ citation-grade insight cards, and 8 research arcs, so the right finding reaches the right decision at the right time.

What separates good people measurement from assertion?

Good measurement has a method: construct validity, reliability, and effect-size interpretation are not optional — they are what separates evidence from assertion.

People Analytics Toolbox — reusable patterns

Nineteen live, co-deployed analytical microservices behind one Next.js app, each with its own Postgres schema, typed Zod contract, and parallel HTTP + MCP transports. They compose in two tiers: low-level measurement and decision building blocks (PA Instruments) snap together into finished products like a Leadership Index and an analytics-plan generator. The patterns underneath document how one repo runs many services without drift: per-spoke contract versioning that flows into a central registry, the same algorithm reachable from browsers and AI agents, anonymity gates and tokenization as composable primitives, and a fire-and-forget audit discipline that never blocks the call.

20 patterns · source: people-analyst/people-analytics-toolbox/docs/REUSABLE_PATTERNS.md

People Analytics Toolbox — Reusable Engineering Patterns

Production-validated patterns from the People Analytics Toolbox codebase, stripped of domain context (HR / comp / psychometrics) and written to be dropped into any new system.

Each entry has the same shape: Problem → The Pattern (TS sketch) → Design decisions → Tradeoffs → Citations. Citations point at the production-validated original inside this repo.

Scope (v1.0, 2026-05-24)

PA Toolbox is one Next.js deploy wrapping 19 live, independently-versioned analytical microservices ("spokes"), each owning a Postgres schema, a typed Zod contract, and parallel HTTP + MCP transports. Those services compose in two tiers: low-level measurement and decision building blocks (the PA Instruments) snap together into finished products such as a Leadership Index and an analytics-plan generator. The patterns below are the cross-cutting shapes that make that consolidation work: the contract registry, the multi-transport dispatcher, the per-tenant scope check, the privacy gate, the metric envelope. Spoke-internal algorithms (IRT scoring, Oaxaca decomposition, Workday OAuth) are domain-specific; they're not in this catalog.

Patterns marked CROSS recur in 2+ portfolio repos and reflect architectural convictions that hardened across products.

Pattern Index

#	Pattern Name	Core Technology	Problem Solved
P01	Per-Spoke Contract Versioning with Central Registry Aggregation	TypeScript re-export + literal	Track API versions across many co-deployed services so version bumps happen at the spoke and propagate to discovery without manual edits
P02	Parallel HTTP + MCP Transports Over a Shared Core	Next.js routes + MCP SDK	Expose the same capability to humans, browsers, and AI agents without duplicating algorithm code. CROSS
P03	Vendored Typed Contract Pattern for Cross-Repo Service Consumption	Per-service `contracts/types.ts` + Zod	Let downstream apps depend on a service's types without importing the service's runtime. CROSS
P04	Scope-Matched Per-Consumer API Keys	`Map<consumer, scope[]>` + glob match	Issue per-consumer credentials that grant exactly the tools each caller needs, with a wildcard escape for operators
P05	Two-Header Auth with Single-Warning Bypass for Dev	NextResponse + env probe	Support multiple auth header conventions during a credential migration, and stay frictionless in local dev
P06	Per-Schema Heartbeat Health Check with Aggregate Roll-Up	Drizzle raw SQL + `Promise.all`	Detect when one of many co-deployed services has lost its data layer without inventing a new probe per service
P07	Anonymity-Gate-as-Primitive (Min-N Cohort Check)	Pure evaluator + discriminated response	Block individually-attributable rollups at the response boundary using one canonical predicate every caller can compose
P08	Deterministic Per-Tenant HMAC Tokenization	Node `createHmac` + tenant-derived key	Replace sensitive values with stable, non-reversible tokens that are consistent within a tenant but un-correlatable across tenants
P09	Per-Route Structured Logger as Higher-Order Handler	Closure wrapper + `console.log(JSON)`	Get one structured JSON log line per request without an APM dependency or per-route boilerplate
P10	Fire-and-Forget Audit Write with Stderr-on-Failure	Unawaited promise + `.catch` to stderr	Persist audit rows for every privileged call without ever blocking the call on a slow or failing audit DB
P11	Stateless Multi-Tenant Request Context Resolution	Header → body → legacy fallback	Pull tenant identity off a request from any of three header/body conventions during a long migration window
P12	Modular Tool Registration with Per-Spoke Self-Registration	Aggregator + one `register<Spoke>Tools(server, ctx)` per module	Onboard a new microservice with one import + one call in the aggregator — no central catalog edits
P13	Cross-Cutting Domain Envelope (Value + Provenance + Enrichment)	Zod schema + nullable enrichment	Pass numeric facts between services with their lineage, confidence, and comparison metadata in one shape
P14	Data-Shape-Driven Method Selection for Confidence Intervals	Pure dispatch on input arity / type	Pick the statistically appropriate CI method (Wilson / t / normal / bootstrap) automatically from the data, not a caller flag
P15	Generic Hypothesis-Walk Diagnostic Chain	Generic `<Context, Evidence>` type parameters	Run an ordered chain of "what's causing this?" hypotheses and rank the survivors by evidence strength, reusable across diagnostic spokes
P16	Per-Request In-Memory Fixed-Window IP Rate Limiter	`Map<key, {count, windowStart}>`	Throttle a public POST endpoint without adding Redis when the deploy is one regional Next.js process
P17	Stateless MCP Gateway with Per-Request Server Bundle	MCP SDK + JSON-RPC envelope	Run MCP on serverless without session-store infrastructure by accepting that every POST creates a fresh transport
P18	Discriminated-Union Response with Block-vs-OK Status	Zod discriminated union	Surface "we refused to answer this query" as a typed first-class response shape, not an exception
P19	Idempotent Bootstrap Migration from Bundled JSON	Empty-table check + bundled import	Ship a service with its seed corpus inside the build so first boot populates the DB without a separate migration step
P20	Per-Module Lazy Import to Defer Side-Effectful Deps	`await import()` inside the call path	Keep a module's static imports light when one of its dependencies needs an env var that may not exist at test time

Patterns

P01. Per-Spoke Contract Versioning with Central Registry Aggregation

Problem You're running many co-deployed services in one repo, each exposing a typed contract that downstream consumers vendor. Every service evolves on its own cadence and bumps its own semver. A central catalog needs to advertise the current version of every service — but you don't want the central catalog to be the thing that decides when a service bumps, and you don't want a manual edit every time.

The Pattern

// Per-service contract module — the source of truth for this service's version.
// services/widget/contracts/types.ts
import { z } from "zod";

export const CONTRACT_VERSION = "1.3.0";
export const CreateWidgetRequestSchema = z.object({ name: z.string() });
// ... rest of the contract

// Central registry — imports CONTRACT_VERSION from each service.
// lib/contracts/registry.ts
import { CONTRACT_VERSION as WIDGET_VERSION } from "@/services/widget/contracts/types";
import { CONTRACT_VERSION as GADGET_VERSION } from "@/services/gadget/contracts/types";

export type ServiceEntry = {
  slug: string;
  contractVersion: string;            // pulled from the spoke at build time
  status: "live" | "reserved" | "coming-soon";
  contractsTypesPath: string;         // for the discovery UI / docs
  endpoints: EndpointEntry[];
};

const REGISTRY: ServiceEntry[] = [
  {
    slug: "widget",
    contractVersion: WIDGET_VERSION,  // moves automatically on bump
    status: "live",
    contractsTypesPath: "src/services/widget/contracts/types.ts",
    endpoints: [/* ... */],
  },
  {
    slug: "gadget",
    contractVersion: GADGET_VERSION,
    status: "live",
    contractsTypesPath: "src/services/gadget/contracts/types.ts",
    endpoints: [/* ... */],
  },
];

export function buildRegistry(): { generatedAt: string; services: ServiceEntry[] } {
  return { generatedAt: new Date().toISOString(), services: REGISTRY };
}

Pair with a per-service CHANGELOG.md discipline: every bump appends a row. Major (X.0.0) bumps additionally require an explicit list of affected consumers.

Design decisions

Re-export, don't duplicate. The version literal lives once, in the service's contracts/types.ts. The registry imports it. There is no place where two copies of the version string can disagree.
Status enum on the registry, not the service. The service doesn't know whether it's live / reserved / coming-soon — that's a deployment decision. Keep it in the registry where the deployment view lives.
Endpoints listed centrally even though they live in route files. The registry is the discovery surface (consumers + ops dashboards read it); routes are the runtime. Discovery shouldn't require walking the filesystem.
Pull the path to the contract file into the entry. Consumers vendoring the contract need a stable reference to copy from; the registry is where they look that up.

Tradeoffs

Strengths	Weaknesses
Version bumps require zero registry edits	Adding a new service requires a registry edit (one import + one entry)
The registry is grep-able truth about who exposes what	Endpoints array drifts from route files if not disciplined
Discovery clients (dashboards, agents) can introspect everything from one file	The registry becomes large in repos with many services
Per-service CHANGELOG discipline is local to the service, not a shared changelog	Cross-service version changes (a breaking primitive bump) still need a manual cross-service note
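
The endpoint-drift weakness in the table above can be checked mechanically. The following is a minimal sketch, not part of the Toolbox: `findEndpointDrift` and its inputs are hypothetical, and it assumes the registry's advertised paths and the route paths found on disk have already been collected as plain strings.

```typescript
// Hypothetical drift check: compares the endpoint paths the registry advertises
// against the route paths that actually exist. How the two lists are gathered
// (filesystem walk, build manifest) is deliberately left open.
type EndpointDrift = {
  missingRoutes: string[];  // advertised in the registry, no route found
  unlistedRoutes: string[]; // route exists, registry does not mention it
};

function findEndpointDrift(registryPaths: string[], routePaths: string[]): EndpointDrift {
  const registered = new Set(registryPaths);
  const routed = new Set(routePaths);
  return {
    missingRoutes: registryPaths.filter((p) => !routed.has(p)).sort(),
    unlistedRoutes: routePaths.filter((p) => !registered.has(p)).sort(),
  };
}

// Illustrative paths only; they are not taken from the real registry.
const drift = findEndpointDrift(
  ["/api/services/widget/redact", "/api/services/widget/health"],
  ["/api/services/widget/redact", "/api/services/widget/export"],
);
console.log(drift);
```

Run in CI, a check of this shape would turn "if not disciplined" into a failing build rather than a silent mismatch.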

Citations

src/lib/contracts/registry.ts — ~20-service registry; each entry imports CONTRACT_VERSION from its spoke
src/spokes/*/contracts/types.ts — the per-spoke source-of-truth files
src/spokes/*/CHANGELOG.md — per-spoke version log; one row per bump
docs/PATTERNS/contract-versioning.md — internal pattern doc that codifies the bump procedure

P02. Parallel HTTP + MCP Transports Over a Shared Core

Problem You've built a capability and you want three kinds of caller to use it: a browser app via HTTP+JSON, an AI agent via MCP (Model Context Protocol), and a future script via direct import. If you let each transport own its own algorithm code, the three eventually disagree on edge cases, error messages, or units. You also don't want every service to invent its own answer to "how does my POST handler talk to the MCP tool handler?"

The Pattern

// core/run-redaction.ts — the algorithm lives here, transport-agnostic.
// Takes a parsed, validated request and returns a typed response.
// Not pure: it reads the tenant's rules and writes an audit row.
export async function runRedaction(input: RedactionRequest): Promise<RedactionResponse> {
  const rules = await storage.getActiveRules(input.tenantId);
  const result = redactText({ text: input.text, rules, fieldName: input.fieldName });
  const validated = RedactionResponseSchema.parse(result);
  await storage.writeAuditRow({ /* ... */ });
  return validated;
}

// app/api/services/widget/redact/route.ts — HTTP transport.
export const POST = withRouteLogger("widget.redact", async (request) => {
  const denied = requireServiceKey(request);
  if (denied) return denied;
  const parsed = RedactionRequestSchema.parse(await request.json());
  const response = await runRedaction(parsed);
  return NextResponse.json(response);
});

// services/widget/mcp/register.ts — MCP transport, same core.
export function registerWidgetTools(server: McpServer, ctx: RegistrationCtx): void {
  if (!toolVisible("widget.redact", ctx.scopes, ctx.spokeSlug)) return;
  server.registerTool(
    "widget.redact",
    {
      inputSchema: RedactionRequestSchema.shape,
      outputSchema: RedactionResponseSchema.shape,
    },
    wrapTool(ctx, "widget.redact", async (args, handlerCtx) => {
      const parsed = RedactionRequestSchema.parse(args);
      return runRedaction(parsed);
    }),
  );
}

Design decisions

The algorithm is one function. Both transports parse the request with the same Zod schema and hand it to the same core function. If the function changes, both transports change. Drift is impossible at the algorithm layer.
Each transport carries its own concerns. HTTP gets the service-key gate and structured logger; MCP gets scope filtering, tenant context, and audit. Those concerns don't belong in the core function — they're transport-specific.
Same Zod schemas for both. The request type is parsed once per transport, but it's the same schema. The response is validated against the same shape. Consumers vendoring the contract see one truth.
Scope-gated tool visibility in MCP. toolVisible(toolName, scopes, spokeSlug) short-circuits registration. A consumer with widget.* scope sees this tool; one without it never knows it exists.

Tradeoffs

Strengths	Weaknesses
Three callers, one algorithm, zero drift	Two boilerplate files per capability (HTTP route + MCP register)
Transport-specific concerns stay in transport-specific code	The shared core can't access transport-level state (e.g., HTTP request headers); pass it in
AI agents and browsers can both rely on the same response shape	MCP tools and HTTP routes need parallel test coverage
Adding a third transport (e.g., direct script import) is free — the core is already extracted	The split discourages "just do it inline" prototyping
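
The parallel-test-coverage cost noted above can be reduced with a parity check that feeds one fixture through both transports. This is a dependency-free sketch under stated assumptions: `runEcho`, `httpAdapter`, `mcpAdapter`, and `checkParity` are hypothetical stand-ins for a real core and its two adapters, and hand-written validation replaces the Zod schema so the block runs on its own.

```typescript
// Stand-in contract and core (hypothetical; the real core redacts text).
type EchoRequest = { tenantId: string; text: string };
type EchoResponse = { tenantId: string; length: number };

// Shared validation, playing the role of the single Zod schema.
function parseEchoRequest(raw: unknown): EchoRequest {
  if (typeof raw !== "object" || raw === null) throw new Error("request must be an object");
  const r = raw as Record<string, unknown>;
  const tenantId = r["tenantId"];
  const text = r["text"];
  if (typeof tenantId !== "string" || typeof text !== "string") {
    throw new Error("tenantId and text must be strings");
  }
  return { tenantId, text };
}

// The one transport-agnostic core function.
async function runEcho(input: EchoRequest): Promise<EchoResponse> {
  return { tenantId: input.tenantId, length: input.text.length };
}

// HTTP-shaped adapter: receives a JSON body string.
async function httpAdapter(body: string): Promise<EchoResponse> {
  return runEcho(parseEchoRequest(JSON.parse(body)));
}

// MCP-shaped adapter: receives already-decoded tool arguments.
async function mcpAdapter(args: unknown): Promise<EchoResponse> {
  return runEcho(parseEchoRequest(args));
}

// Parity check: the same fixture through both adapters must serialize identically.
async function checkParity(fixture: EchoRequest): Promise<boolean> {
  const [viaHttp, viaMcp] = await Promise.all([
    httpAdapter(JSON.stringify(fixture)),
    mcpAdapter(fixture),
  ]);
  return JSON.stringify(viaHttp) === JSON.stringify(viaMcp);
}
```

Because both adapters only parse and delegate, one parity fixture per capability covers the transport seam, and the algorithm's edge cases can be tested once against the core.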

Citations

src/spokes/data-anonymizer/core/redact-orchestrator.ts — shared core
src/app/api/spokes/data-anonymizer/redact/route.ts — HTTP transport
src/spokes/data-anonymizer/mcp/register.ts — MCP transport
src/lib/mcp/registration.ts — shared wrapTool + toolVisible helpers used by every spoke

See also: DevPlane P14 (Parallel API Surfaces over Shared Core) — same shape generalized to REST + MCP + CLI; Performix #4 (MCP consumer with lazy session affinity) — same MCP discipline from the consumer side.

P03. Vendored Typed Contract Pattern for Cross-Repo Service Consumption

Problem You own a service in repo A. Consumer apps in repos B, C, D want the type-safety of import { CreateRequest } from "@your-service/contracts" but you don't want them importing your runtime — they shouldn't pick up your DB client, your env-var requirements, or your transitive dependencies. You also don't want to publish a private npm package for what's effectively a single types file.

The Pattern

// Producer side (repo A): the contract lives at a stable path.
// service-a/src/services/widget/contracts/types.ts

import { z } from "zod";

export const CONTRACT_VERSION = "1.3.0";

export const CreateWidgetRequestSchema = z.object({
  tenantId: z.string().min(1),
  name: z.string().min(1),
  colorHex: z.string().regex(/^#[0-9a-f]{6}$/i),
});
export type CreateWidgetRequest = z.infer<typeof CreateWidgetRequestSchema>;

export const CreateWidgetResponseSchema = z.object({
  id: z.string().uuid(),
  contractVersion: z.literal(CONTRACT_VERSION),
});
export type CreateWidgetResponse = z.infer<typeof CreateWidgetResponseSchema>;

// Consumer side (repo B): vendored copy.
// consumer-b/src/integrations/widget-service/contract-vendored.ts
// vendored from people-analyst/service-a @ <commit-sha>
// pinned to contractVersion 1.3.0 — bump via re-vendor only.

import { z } from "zod";
// ... identical schema body ...

// Consumer's adapter uses the vendored types only.
import { CreateWidgetRequestSchema, CreateWidgetResponseSchema, type CreateWidgetResponse }
  from "./integrations/widget-service/contract-vendored";

// Base URL of the producer deployment (the env var name is illustrative).
const SERVICE_A_URL = process.env.SERVICE_A_URL!;

export async function createWidget(
  input: unknown,
): Promise<CreateWidgetResponse> {
  const parsed = CreateWidgetRequestSchema.parse(input);
  const res = await fetch(`${SERVICE_A_URL}/api/widgets`, {
    method: "POST",
    headers: { "x-service-key": process.env.SERVICE_A_KEY!,
               "Content-Type": "application/json" },
    body: JSON.stringify(parsed),
  });
  if (!res.ok) throw new Error(`widget create failed: ${res.status}`);
  // Re-parse on the wire so contract drift fails here, not deep in business code.
  return CreateWidgetResponseSchema.parse(await res.json());
}

Design decisions

Vendor the file, not the package. A copy on disk in repo B with a vendored from <repo> @ <commit> header. Re-vendoring on a producer bump is a deliberate, audited act — not an automatic npm update.
Pin to a contract version. The consumer header names the producer commit + version. Anyone reading the consumer code can answer "are we current?" without running a tool.
Schema is the contract, not the prose. Zod schemas carry the structural shape; consumers re-parse on the wire to catch transport-level drift. Prose comments are nice-to-have; the schema is the legal document.
Producer never imports from consumers. Vendoring is unidirectional. The producer ships its own contract; consumers do the copy. No build-time cross-repo dependency.
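
The "are we current?" question can also be answered by a script that reads the vendored header. A minimal sketch, assuming the two-line header format shown in the consumer example; `parseVendoredHeader` and `isStale` are hypothetical helpers, and the commit SHA in the sample is a placeholder.

```typescript
// Provenance recorded at the top of a vendored contract file.
type VendoredPin = { repo: string; commit: string; contractVersion: string };

// Hypothetical parser for the "vendored from ... @ ..." / "pinned to contractVersion ..." header.
function parseVendoredHeader(source: string): VendoredPin | null {
  const from = source.match(/^\/\/ vendored from (\S+) @ (\S+)$/m);
  const pin = source.match(/^\/\/ pinned to contractVersion (\d+\.\d+\.\d+)\b/m);
  if (!from || !pin) return null;
  return { repo: from[1]!, commit: from[2]!, contractVersion: pin[1]! };
}

// A vendored copy is stale when its pinned version differs from the producer's.
function isStale(pin: VendoredPin, upstreamVersion: string): boolean {
  return pin.contractVersion !== upstreamVersion;
}

// Sample file contents; "3f2c1ab" is a made-up commit SHA.
const vendoredFile = [
  "// vendored from people-analyst/service-a @ 3f2c1ab",
  "// pinned to contractVersion 1.3.0 (bump via re-vendor only)",
  'import { z } from "zod";',
].join("\n");

const pinned = parseVendoredHeader(vendoredFile);
console.log(pinned, pinned ? isStale(pinned, "1.4.0") : "no header found");
```

A check like this only reports drift; the re-vendor itself stays a deliberate, reviewed copy.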

Tradeoffs

Strengths	Weaknesses
Consumer build doesn't pull producer runtime	Re-vendoring on bump is manual (audit-able, but manual)
Cross-repo coupling is explicit and `grep`-able by commit SHA	Drift between vendored copy and upstream is possible if discipline lapses
No private npm registry, no monorepo dependency	Consumer codebase grows with one vendored file per consumed service
Wire-level Zod re-parse catches drift at the boundary, not deep in business code	Major version bumps require coordinated re-vendor across all consumers

Citations

src/spokes/*/contracts/types.ts — producer-side contracts (one per spoke)
docs/EXTERNAL-CONSUMERS.md — onboarding runbook documenting the vendor-this-file pattern
src/spokes/job-family-agent/core/service.ts — top-of-file vendored from <path> @ <commit> header convention

See also: Performix #3 (Vendored typed contracts for cross-repo service consumption) — same pattern from the consumer side, with the same commit-SHA header convention.

P04. Scope-Matched Per-Consumer API Keys

Problem You have one service deployment and many distinct callers (browser apps, batch jobs, AI agents, partner services). They need different subsets of your capabilities. Issuing one shared key gives everyone everything; issuing one key per route doesn't scale. You want a middle path: per-consumer keys, each mapped to a scope set that's expressive enough to say "this caller gets all tools under namespace X, plus exactly tool Y from namespace Z."

The Pattern

// auth/consumer-keys.ts
type Scope = string;  // "*", "<namespace>.*", or exact "<namespace>.<tool>"

const CONSUMER_SCOPE_MAP: Record<string, Scope[]> = {
  // App consumer — full access to widget service, plus three exact gadget tools.
  "app-frontend": [
    "widget.*",
    "gadget.list",
    "gadget.get",
    "gadget.health",
  ],
  // Operator / orchestrator — wildcard; explicit second-class permission.
  "ops-console": ["*"],
};

function extractKey(request: Request): string | null {
  const direct = request.headers.get("x-service-key");
  if (direct) return direct;
  const authz = request.headers.get("authorization");
  if (authz?.toLowerCase().startsWith("bearer ")) {
    return authz.slice(7).trim();
  }
  return null;
}

export function resolveConsumer(
  request: Request,
): { consumerId: string; scopes: Scope[] } | null {
  const key = extractKey(request);
  if (!key) return null;
  // Each consumer gets its own env: SERVICE_KEY_<NAME>=…
  for (const [envName, envValue] of Object.entries(process.env)) {
    const m = envName.match(/^SERVICE_KEY_(.+)$/);
    if (!m || envValue !== key) continue;
    const consumerId = m[1]!.toLowerCase().replace(/_/g, "-");
    return { consumerId, scopes: CONSUMER_SCOPE_MAP[consumerId] ?? [] };
  }
  return null;
}

// Pattern matching: "widget.*" matches "widget.create" and "widget"; "*" matches anything.
export function matchesScope(toolName: string, scopes: Scope[]): boolean {
  for (const s of scopes) {
    if (s === "*") return true;
    if (s === toolName) return true;
    if (s.endsWith(".*")) {
      const prefix = s.slice(0, -2);
      if (toolName === prefix || toolName.startsWith(`${prefix}.`)) return true;
    }
  }
  return false;
}

Design decisions

Env var per consumer, not one shared secret store. SERVICE_KEY_<NAME> is grep-able in the deploy platform's UI and rotates per-consumer. Compromise of one key doesn't compromise all.
Scope strings are minimal: three patterns only. "*", "<ns>.*", exact "<ns>.<tool>". Anything more expressive (role hierarchies, deny lists) is YAGNI for this surface.
Wildcard reserved for operators. Only the orchestrator console gets ["*"]. New consumers default to no access (empty scope list) unless explicitly added to the map.
Resolution returns null, not throws. Auth is a runtime decision at the boundary; failing fast at the routing layer keeps the call path obvious.
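
The matching rules are easiest to verify with concrete calls. The sketch below repeats `matchesScope` from the pattern so it runs standalone; the `appFrontend` list mirrors the `app-frontend` entry, and `consumerIdFromEnvName` is a hypothetical extraction of the env-name mapping inside `resolveConsumer`.

```typescript
type Scope = string; // "*", "<namespace>.*", or exact "<namespace>.<tool>"

// Same three-pattern matcher as in the pattern above.
function matchesScope(toolName: string, scopes: Scope[]): boolean {
  for (const s of scopes) {
    if (s === "*") return true;
    if (s === toolName) return true;
    if (s.endsWith(".*")) {
      const prefix = s.slice(0, -2);
      if (toolName === prefix || toolName.startsWith(`${prefix}.`)) return true;
    }
  }
  return false;
}

// Hypothetical helper: SERVICE_KEY_APP_FRONTEND -> "app-frontend".
function consumerIdFromEnvName(envName: string): string | null {
  const m = envName.match(/^SERVICE_KEY_(.+)$/);
  return m ? m[1]!.toLowerCase().replace(/_/g, "-") : null;
}

const appFrontend: Scope[] = ["widget.*", "gadget.list", "gadget.get", "gadget.health"];

const checks = {
  namespaceWildcard: matchesScope("widget.create", appFrontend),  // true
  bareNamespace: matchesScope("widget", appFrontend),             // true
  exactTool: matchesScope("gadget.get", appFrontend),             // true
  unlistedTool: matchesScope("gadget.delete", appFrontend),       // false
  lookalikePrefix: matchesScope("widgets.create", appFrontend),   // false: prefix must end at a dot
  operatorWildcard: matchesScope("anything.at.all", ["*"]),       // true
  emptyScopes: matchesScope("widget.create", []),                 // false: default is no access
};
console.log(checks, consumerIdFromEnvName("SERVICE_KEY_APP_FRONTEND"));
```

The `lookalikePrefix` case is why the matcher appends a dot before calling `startsWith`: a `widget.*` grant should not leak into a `widgets` namespace.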

Tradeoffs

Strengths	Weaknesses
Per-consumer key rotation is independent	Adding a consumer requires a code edit, not just an env var change
Scope patterns are tiny and grep-able	No deny-list expressiveness; if a consumer needs "all of widget except `widget.delete`," you list every other tool
Tool visibility is enforced at registration, not just at call time — agents can't even discover tools they lack scope for	Per-consumer env-var sprawl in the deploy platform
`["*"]` orchestrator key is an obvious audit target	Compromised orchestrator key is total compromise — needs special rotation procedure

Citations

src/lib/mcp/auth.ts — CONSUMER_SCOPE_MAP + resolveConsumer
src/lib/mcp/scope.ts — the three-pattern matchesScope function
src/lib/mcp/registration.ts toolVisible — gates registration on scope match
docs/EXTERNAL-CONSUMERS.md — runbook for adding a new consumer (key generation, scope choice, audit verification)

P05. Two-Header Auth with Single-Warning Bypass for Dev

Problem You're migrating consumers from one auth header convention to another (e.g., Authorization: Bearer → a dedicated x-service-key header). You can't break in-flight consumers. You also want local dev to work without setting any env vars, because friction in dev means people disable the gate entirely.

The Pattern

import { NextResponse } from "next/server";

const ENV_VAR = "SERVICE_KEY";
const HEADER_NAME = "x-service-key";
let warnedMissing = false;

function extractKey(request: Request): string | null {
  // Prefer the new header.
  const direct = request.headers.get(HEADER_NAME);
  if (direct) return direct;
  // Fall back to legacy Bearer for in-flight consumers.
  const authz = request.headers.get("authorization");
  if (authz?.toLowerCase().startsWith("bearer ")) {
    return authz.slice("bearer ".length).trim();
  }
  return null;
}

export function requireServiceKey(request: Request): NextResponse | null {
  const expected = process.env[ENV_VAR];
  if (!expected) {
    // Local dev: don't reject, but log once so the dev sees it.
    if (!warnedMissing) {
      console.warn(
        `[auth] ${ENV_VAR} env var is not set; write endpoints are not gated. ` +
        `OK for local dev. Production deploys MUST set this env var.`,
      );
      warnedMissing = true;
    }
    return null;  // permit
  }
  const got = extractKey(request);
  if (got !== expected) {
    return NextResponse.json(
      { error: "Service key required or invalid",
        accepts: [HEADER_NAME, "Authorization: Bearer <key>"] },
      { status: 401 },
    );
  }
  return null;  // permit
}

// Usage in a route handler:
export async function POST(request: Request) {
  const denied = requireServiceKey(request);
  if (denied) return denied;
  // ... rest of handler
}

Design decisions

Two header conventions accepted simultaneously. The migration window is finite; the code is small. Drop the legacy one once the consumer logs show no Bearer calls for N days.
Single warning per process boot. A noisy warn-on-every-request trains devs to ignore it. Once-per-process is loud enough to notice, quiet enough to not be wallpaper.
Failure mode is permit + warn, not deny + crash. Local dev should never need to set the env var. Production absolutely must. The asymmetry is intentional — the production checklist is the gate, not the runtime.
The 401 body lists the accepted headers. When a consumer gets denied, the response tells them how to fix it. No "consult the docs."
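
Deciding when the legacy header can be dropped requires knowing which convention each request used. A minimal sketch, assuming a hypothetical `extractKeyWithSource` variant of `extractKey` and a small `HeaderReader` interface standing in for `request.headers`:

```typescript
// Minimal reader interface so the sketch does not depend on a specific Request type.
type HeaderReader = { get(name: string): string | null };
type KeySource = "x-service-key" | "bearer";

// Hypothetical variant of extractKey that also reports which convention matched.
function extractKeyWithSource(
  headers: HeaderReader,
): { key: string; via: KeySource } | null {
  const direct = headers.get("x-service-key");
  if (direct) return { key: direct, via: "x-service-key" };
  const authz = headers.get("authorization");
  if (authz && authz.toLowerCase().startsWith("bearer ")) {
    return { key: authz.slice("bearer ".length).trim(), via: "bearer" };
  }
  return null;
}

// In-memory stand-in for request headers with case-insensitive lookup.
function headersOf(entries: Record<string, string>): HeaderReader {
  const lower: Record<string, string> = {};
  Object.keys(entries).forEach((headerName) => {
    const value = entries[headerName];
    if (value !== undefined) lower[headerName.toLowerCase()] = value;
  });
  return { get: (headerName) => lower[headerName.toLowerCase()] ?? null };
}

const modern = extractKeyWithSource(headersOf({ "x-service-key": "k-123" }));
const legacy = extractKeyWithSource(headersOf({ Authorization: "Bearer k-123" }));
const both = extractKeyWithSource(
  headersOf({ "x-service-key": "k-new", Authorization: "Bearer k-old" }),
);
const neither = extractKeyWithSource(headersOf({}));
console.log(modern?.via, legacy?.via, both?.via, neither);
```

Logging the `via` field alongside the consumer gives the "no Bearer calls for N days" signal directly, without inferring it from raw header dumps.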

Tradeoffs

Strengths	Weaknesses
Migrating consumers off the legacy header is non-breaking	Two-header support is permanent debt until you actively delete it
Local dev needs no env config	The permit-when-unset behavior is dangerous if deployed misconfigured; relies on env-var checklist discipline
Tiny handler-side boilerplate (one `if (denied) return denied;` line)	The warning fires only once per process, so it is easy to miss in long logs; a single shared flag would also report only the first of several missing env vars
Self-documenting 401 response	Two simultaneous header conventions widen the attack surface (more parsers to keep right)

Citations

src/lib/auth/service-key.ts — full implementation
docs/DECISIONS/2026-05-10-pat-11-deployment-protection.md — decision memo for why dual-header is acceptable during the PAT-11 migration

P06. Per-Schema Heartbeat Health Check with Aggregate Roll-Up

Problem You have N services co-deployed against the same Postgres instance, but each service owns a separate schema (the privacy boundary). When something breaks, the operator needs to know: is the whole DB down, or is just service Y's schema unreachable? You want a /health endpoint per service that probes only that service's schema, plus an aggregate endpoint that rolls them all up.

The Pattern

// lib/health/check.ts — the per-service probe.
import { db } from "@/db/client";
import { sql } from "drizzle-orm";

export type SpokeHealthStatus = "ok" | "degraded" | "down";

export type SpokeHealth = {
  spoke: string;
  status: SpokeHealthStatus;
  contractVersion: string;
  schemaReachable: boolean;
  latencyMs: number;
  checkedAt: string;
};

const DEGRADED_THRESHOLD_MS = 500;

export async function checkSpokeHealth(input: {
  slug: string;
  schemaName: string;
  contractVersion: string;
}): Promise<SpokeHealth> {
  const start = Date.now();
  let schemaReachable = false;
  try {
    // Every schema MUST have a `heartbeat` table — convention, not config.
    // `sql.raw` does no escaping, so schemaName must come from trusted config
    // (the registry), never from request input.
    await db.execute(
      sql.raw(`SELECT count(*) FROM "${input.schemaName}".heartbeat`),
    );
    schemaReachable = true;
  } catch {
    schemaReachable = false;
  }
  const latencyMs = Date.now() - start;
  const status: SpokeHealthStatus =
    !schemaReachable ? "down"
    : latencyMs > DEGRADED_THRESHOLD_MS ? "degraded"
    : "ok";
  return {
    spoke: input.slug,
    status,
    contractVersion: input.contractVersion,
    schemaReachable,
    latencyMs,
    checkedAt: new Date().toISOString(),
  };
}

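// lib/health/aggregate.ts: roll up every live service in parallel.
import { buildRegistry } from "@/lib/contracts/registry";
import { checkSpokeHealth, type SpokeHealth, type SpokeHealthStatus } from "./check";

export async function aggregateHealth(): Promise<{
  status: SpokeHealthStatus;
  spokes: SpokeHealth[];
}> {
  // Assumes each registry entry also carries its Postgres schema name as `schema`
  // (a field the P01 ServiceEntry sketch does not show).
  const probes = await Promise.all(
    buildRegistry().services
      .filter((s) => s.status === "live")
      .map((s) => checkSpokeHealth({
        slug: s.slug,
        schemaName: s.schema,
        contractVersion: s.contractVersion,
      })),
  );
  const worst: SpokeHealthStatus =
    probes.some((p) => p.status === "down") ? "down"
    : probes.some((p) => p.status === "degraded") ? "degraded"
    : "ok";
  return { status: worst, spokes: probes };
}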

Design decisions

Convention: every schema has a heartbeat table. Onboarding a new service requires adding the table; the health-check primitive needs no per-service code. The cost is one row migration per schema.
Schema reachability is the proxy for service health. Schemas are the privacy boundary, so a reachable schema implies the data layer is up. The check deliberately does NOT exercise feature tables — they can be empty or in-migration without meaning "down."
Latency-as-status, not just up/down. A schema that responds in 800ms isn't healthy; it's degraded. This surfaces the case where one service's schema is slow or overloaded while every other spoke is fine.
Aggregate fans out in parallel. Promise.all so one slow schema doesn't serialize the whole probe.
Worst-case roll-up. Aggregate status is the worst per-service status. No averaging, no quorum; if anyone is down, the system is down for that consumer.
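
The status rules reduce to two small pure functions, which makes the boundary cases easy to see. A minimal sketch with the database probe removed; `classify` and `worstOf` are hypothetical names for logic that sits inline in the pattern above.

```typescript
type SpokeStatus = "ok" | "degraded" | "down";

const DEGRADED_THRESHOLD_MS = 500; // same global threshold as the pattern

// Per-spoke rule: unreachable is down; reachable but slow is degraded.
function classify(schemaReachable: boolean, latencyMs: number): SpokeStatus {
  if (!schemaReachable) return "down";
  return latencyMs > DEGRADED_THRESHOLD_MS ? "degraded" : "ok";
}

// Aggregate rule: the worst per-spoke status wins; no averaging, no quorum.
function worstOf(statuses: SpokeStatus[]): SpokeStatus {
  if (statuses.some((s) => s === "down")) return "down";
  if (statuses.some((s) => s === "degraded")) return "degraded";
  return "ok";
}

const probeResults: SpokeStatus[] = [
  classify(true, 120), // ok
  classify(true, 800), // degraded: reachable but over the threshold
  classify(true, 500), // ok: the comparison is strictly greater-than
];
const overall = worstOf(probeResults);                             // "degraded"
const withOutage = worstOf([...probeResults, classify(false, 5)]); // "down"
const noLiveSpokes = worstOf([]);                                  // "ok"
console.log(overall, withOutage, noLiveSpokes);
```

Note the last case: an empty probe list rolls up to `"ok"`, so a registry with no live services would report healthy. Whether that is acceptable is a policy choice for the aggregate endpoint.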

Tradeoffs

Strengths	Weaknesses
Per-schema isolation surfaces partial failures clearly	The `heartbeat` table convention is brittle if someone forgets to add it on a new spoke
Latency-as-degraded catches "creeping slowness" before total failure	Threshold (500ms) is global, not per-service — slow-by-design spokes look perpetually degraded
Aggregate is `Promise.all`, so the whole probe takes ~max(per-spoke latency)	Postgres `SELECT count(*)` on a small table is cheap, but on a huge one is expensive — keep heartbeat tiny
Public `/health` works for uptime monitors without special config	Doesn't catch logical bugs (write paths failing); only data-layer reachability

Citations

src/lib/health/check.ts — per-service probe
src/lib/health/aggregate.ts — registry-driven fan-out
src/app/api/spokes/*/health/route.ts — per-spoke health endpoints
src/app/api/health/route.ts — aggregate roll-up

P07. Anonymity-Gate-as-Primitive (Min-N Cohort Check)

Problem Any system that returns aggregated stats over individuals (engagement scores by team, salary by department, response rates by manager) risks re-identifying small cohorts. You need a single canonical predicate every caller can compose — and a typed response shape that distinguishes "the rollup is blocked for privacy" from "the rollup is empty."

The Pattern

// core/min-n-gate.ts — pure evaluator, no I/O.
export type MinNCheckResponse = {
  segmentId: string;
  respondentCount: number;
  threshold: number;
  ok: boolean;
  reason?: string;
};

export function evaluateMinN(input: {
  segmentId: string;
  respondentCount: number;
  threshold: number;
}): MinNCheckResponse {
  const { segmentId, respondentCount, threshold } = input;
  const ok = respondentCount >= threshold;
  return {
    segmentId,
    respondentCount,
    threshold,
    ok,
    reason: ok
      ? undefined
      : `Segment "${segmentId}" has ${respondentCount} respondents, below threshold ${threshold}.`,
  };
}

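// Callers compose at the response boundary, never the storage boundary.
// (Storage stays raw; the gate is applied right before returning.)
// Named so it does not shadow the global Fetch `Response`; the `weights` shape is domain-specific.
type PreferenceWeightsResponse =
  | { status: "blocked"; surveyId: string; anonymity: MinNCheckResponse }
  | { status: "ok"; surveyId: string; anonymity: MinNCheckResponse; weights: unknown };

export async function getPreferenceWeights(surveyId: string): Promise<PreferenceWeightsResponse> {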
  const survey = await store.getSurvey(surveyId);
  const respondentCount = await store.getCompletedResponseCount(surveyId);
  const gate = evaluateMinN({
    segmentId: surveyId,
    respondentCount,
    threshold: survey.minimumResponseThreshold,
  });

  if (!gate.ok) {
    // Discriminated-union response — typed "blocked" status, not an exception.
    return { status: "blocked", surveyId, anonymity: gate };
  }

  const weights = await computeWeights(survey, respondentCount);
  return { status: "ok", surveyId, anonymity: gate, weights };
}

// Per-segment composition: skip below-threshold cohorts silently.
async function withSegmentBreakdown(surveyId: string, threshold: number) {
  const rowsBySegment = await store.getRowsBySegment(surveyId);
  const breakdown = [];
  for (const [segmentId, rows] of rowsBySegment) {
    const gate = evaluateMinN({ segmentId, respondentCount: rows.length, threshold });
    if (!gate.ok) continue;   // segment too small — omit, don't block the whole response
    breakdown.push({ segmentId, weights: aggregate(rows), anonymity: gate });
  }
  return breakdown;
}

Design decisions

Pure evaluator; no I/O, no storage coupling. The gate is a typed predicate. Callers fetch the respondent count from their own source and pass it in. This lets the gate compose with any data layer.
Per-segment skip, top-level block. If the whole rollup is below threshold, return status: "blocked" (the caller sees a typed refusal). If one segment in a breakdown is below threshold, silently omit it (the caller sees N-1 segments). Two failure modes; one primitive.
reason carries the user-facing explanation. Privacy refusals are part of the product surface — operators see "this segment has 2 respondents (threshold 5)" not a generic "denied." The exact text is part of the contract.
Threshold is per-caller, not global. Survey A wants threshold 5; survey B wants 10. The gate doesn't decide; the caller passes the policy in.
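To make the two failure modes concrete, the sketch below re-declares a gate with the same predicate and runs it over a small in-memory breakdown. The `checkMinN` and `summarize` helpers, the `omittedSegmentCount` field, and the sample counts are hypothetical additions for illustration; they are not taken from the cited implementation.

```typescript
// Minimal re-declaration of the gate so this sketch runs on its own.
type MinNCheck = { segmentId: string; respondentCount: number; threshold: number; ok: boolean };

function checkMinN(segmentId: string, respondentCount: number, threshold: number): MinNCheck {
  return { segmentId, respondentCount, threshold, ok: respondentCount >= threshold };
}

type SegmentCount = { segmentId: string; count: number };

type Summary =
  | { status: "blocked"; anonymity: MinNCheck }
  | { status: "ok"; anonymity: MinNCheck; segments: string[]; omittedSegmentCount: number };

// Hypothetical composer: block at the top level, skip at the segment level.
function summarize(segmentCounts: SegmentCount[], threshold: number): Summary {
  const total = segmentCounts.reduce((sum, s) => sum + s.count, 0);
  const overall = checkMinN("overall", total, threshold);
  if (!overall.ok) return { status: "blocked", anonymity: overall };

  const segments: string[] = [];
  let omittedSegmentCount = 0;
  for (const s of segmentCounts) {
    if (checkMinN(s.segmentId, s.count, threshold).ok) segments.push(s.segmentId);
    else omittedSegmentCount += 1; // reported as a count only, never by name
  }
  return { status: "ok", anonymity: overall, segments, omittedSegmentCount };
}

const report = summarize(
  [
    { segmentId: "eng", count: 12 },
    { segmentId: "legal", count: 2 },
    { segmentId: "sales", count: 7 },
  ],
  5,
);
const tiny = summarize([{ segmentId: "legal", count: 2 }], 5);
```

Here `report` is an "ok" summary that lists only the cohorts at or above the threshold, while `tiny` is a typed "blocked" refusal. Note that publishing an overall respondent count next to per-segment counts lets a reader derive the size of an omitted cohort by subtraction, which is the cross-rollup weakness listed in the tradeoffs.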

Tradeoffs

Strengths	Weaknesses
One primitive every caller composes; consistent privacy contract	Threshold is a runtime value — wrong policy at the caller passes the gate
Pure function → trivially testable, no DB mock required	Doesn't catch correlated re-identification across rollups (k-anonymity is one of many privacy properties)
Typed "blocked" status is a first-class response shape	Two failure modes (top-level block vs per-segment skip) is a discipline; callers can confuse them
Composable with breakdowns: small cohorts vanish, not the whole report	Operators must understand why a segment is missing; document the omission semantics

Citations

src/spokes/data-anonymizer/core/min-n-gate.ts — pure evaluator
src/spokes/preference-modeler/core/anonymity-threshold.ts — parallel implementation with the same shape (different spoke)
src/spokes/preference-modeler/core/preference-aggregate.ts — composer that uses the gate at both rollup and per-segment levels
Discriminated-union response surface: see Pattern P18

P08. Deterministic Per-Tenant HMAC Tokenization

Problem You need to replace a sensitive value (employee ID, email, salary) with a token. Three properties:

Stable within a tenant — the same email always maps to the same token (so joins still work).
Un-correlatable across tenants — tenant A's token for alice@x ≠ tenant B's token for alice@x (so a leaked token from tenant A reveals nothing about tenant B).
One-way — given a token, you can't recover the input by inverting it; even with the master secret, the only option is to guess candidate values and recompute their tokens.

The Pattern

import { createHmac } from "node:crypto";
import { faker } from "@faker-js/faker"; // used by the "faker-name" strategy below

const SECRET_ENV_VAR = "TOKEN_SECRET";
const SECRET_FALLBACK = "dev-secret-do-not-use-in-prod";

function masterSecret(): string {
  return process.env[SECRET_ENV_VAR] ?? SECRET_FALLBACK;
}

// Derive a per-tenant key. Same tenantId → same key. Different tenantId → uncorrelated key.
export function tenantKey(tenantId: string): string {
  return createHmac("sha256", masterSecret())
    .update(`tenant:${tenantId}`)
    .digest("hex");
}

// Token = HMAC(tenantKey, "<field>:<value>")
// Same (tenantId, field, value) within a tenant → same token.
// Different tenant → completely different token for the same value.
export function tokenFor(tenantId: string, field: string, value: string): string {
  const key = tenantKey(tenantId);
  return createHmac("sha256", key).update(`${field}:${value}`).digest("hex");
}

// Composed with a strategy registry for richer transforms:
export function anonymizeValue(input: {
  tenantId: string;
  field: string;
  value: string;
  strategy: "tokenize" | "faker-name" | "salary-band";
}): string {
  const seed = tokenFor(input.tenantId, input.field, input.value);
  switch (input.strategy) {
    case "tokenize":
      return seed;  // raw token
    case "faker-name":
      // Same seed → same fake name within a tenant.
      faker.seed(parseInt(seed.slice(0, 8), 16));
      return faker.person.fullName();
    case "salary-band": {
      // Braces scope `num` to this case instead of the whole switch.
      const num = parseInt(input.value, 10);
      return String(Math.round(num / 5000) * 5000);  // banded, no HMAC needed
    }
  }
}

Design decisions

Two-layer HMAC, not one. First HMAC derives a per-tenant key from tenantId. Second HMAC tokenizes the value with that key. This is the technically correct way to scope a keyed PRF — single-layer HMAC(secret, tenant + value) would also work but offers weaker domain separation guarantees.
Field is part of the value input. "<field>:<value>" so the same email value in from_email and to_email tokenizes to different strings. Otherwise correlation across columns leaks structure.
Field-name colon-prefix is a poor-man's domain separator. It's good enough when the field set is small + grep-able; if you have hundreds of fields, switch to a more rigorous tagged construction.
Faker-name strategy seeds from the token. The same real name always yields the same fake name within a tenant (a stable pseudonym, not a reversible mapping), so consumers can join on the fake name without seeing the real one.
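The three properties from the problem statement can be checked directly. The sketch below repeats the two-layer construction with a fixed demo secret so the results are reproducible; `DEMO_SECRET`, `demoTenantKey`, `demoTokenFor`, and the sample tenant IDs and email are assumptions made for this example only.

```typescript
import { createHmac } from "node:crypto";

// Assumption: a fixed demo secret so results are reproducible. Never hard-code a real secret.
const DEMO_SECRET = "demo-only-secret";

// Layer 1: derive a per-tenant key from the master secret.
function demoTenantKey(tenantId: string): string {
  return createHmac("sha256", DEMO_SECRET).update(`tenant:${tenantId}`).digest("hex");
}

// Layer 2: tokenize "<field>:<value>" under the tenant key.
function demoTokenFor(tenantId: string, field: string, value: string): string {
  return createHmac("sha256", demoTenantKey(tenantId)).update(`${field}:${value}`).digest("hex");
}

const a1 = demoTokenFor("tenant-a", "email", "alice@example.com");
const a2 = demoTokenFor("tenant-a", "email", "alice@example.com");
const b1 = demoTokenFor("tenant-b", "email", "alice@example.com");
const aFrom = demoTokenFor("tenant-a", "from_email", "alice@example.com");

const stableWithinTenant = a1 === a2;       // same (tenant, field, value) gives the same token
const distinctAcrossTenants = a1 !== b1;    // same value, different tenant
const distinctAcrossFields = a1 !== aFrom;  // same value, different field
const tokenLength = a1.length;              // a SHA-256 digest in hex is 64 characters
```

Because tokens are deterministic, the stability that keeps joins working is also what makes small value spaces guessable, so the choice of which fields to tokenize matters as much as the construction.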

Tradeoffs

Strengths	Weaknesses
Same (tenant, field, value) → same token forever; joins survive anonymization	Master secret rotation invalidates every token everywhere
Different tenants get uncorrelated tokens; cross-tenant leak is contained	Field-name domain separation is informal; collisions if field names overlap by accident
No external dep — `node:crypto` only	HMAC is fast but not parallelizable across cores in pure JS
Composable with strategy registry (faker, banding, redaction)	Equal inputs produce equal tokens, which exposes frequency patterns; anyone able to compute tokens can enumerate small value spaces (e.g., a role field with about 32 possible values)

Citations

src/spokes/data-anonymizer/core/tokenization.ts — tenantKey + tokenFor
src/spokes/data-anonymizer/core/strategies.ts — applyStrategy switch with faker / banding / token strategies
Salary-band variant: salaryBand() for numeric values that need bucketing rather than tokenization

P09. Per-Route Structured Logger as Higher-Order Handler

Problem You want one structured JSON log line per request — {ts, requestId, route, status, latencyMs} — without an APM dependency, without per-route boilerplate, and without breaking the framework's error handling (Next.js still has to see the throw to render its 500).

The Pattern

import type { NextResponse } from "next/server";

type Handler<C = unknown> = (
  request: Request,
  ctx: C,
) => Promise<NextResponse | Response>;

let requestCounter = 0;
function nextRequestId(): string {
  requestCounter = (requestCounter + 1) >>> 0;  // unsigned 32-bit wrap; `& 0xffffffff` would go negative past 2^31 - 1
  return `${Date.now().toString(36)}-${requestCounter.toString(36)}`;
}

export function withRouteLogger<C>(name: string, handler: Handler<C>): Handler<C> {
  return async (request, ctx) => {
    const requestId = nextRequestId();
    const start = Date.now();
    const url = new URL(request.url);
    let status = 0;
    let errorStack: string | undefined;

    try {
      const response = await handler(request, ctx);
      status = response.status;
      return response;
    } catch (err) {
      status = 500;
      errorStack = err instanceof Error ? err.stack : String(err);
      throw err;     // re-throw so framework still handles it
    } finally {
      const latencyMs = Date.now() - start;
      const line = {
        ts: new Date().toISOString(),
        requestId,
        route: name,                 // canonical name, not URL path
        method: request.method,
        path: url.pathname,
        status,
        latencyMs,
        ...(errorStack ? { errorStack } : {}),
      };
      console.log(JSON.stringify(line));
    }
  };
}

// Usage — one wrap per route, name is canonical contract ID.
export const POST = withRouteLogger("widget.create", async (request) => {
  const denied = requireServiceKey(request);
  if (denied) return denied;
  // ... rest of handler
});

// For routes with params (Next.js 15 async params):
export const GET = withRouteLogger<{ params: Promise<{ id: string }> }>(
  "widget.get",
  async (_request, { params }) => { /* ... */ },
);

Design decisions

route is a canonical name, not the URL. "widget.create" is grep-able and stable across URL changes. The URL path is logged separately as path, so log searches that filter on route keep working even if the endpoint moves.
Catch, log, re-throw. The framework still owns 500 rendering; we just observe the error and emit the line. Swallowing would hide bugs behind silent 500s.
finally block, not after-success. The line emits whether the handler returned, threw, or was cancelled. Latency is always recorded.
One JSON line per request, console.log. Vercel's log pipeline already structures this. No agent, no batching, no sidecar. If you outgrow console.log, swap one function; everything else stays.
Counter wraps at 32 bits. Request IDs are <time>-<counter> — collision-safe within a process even after a counter wrap.
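The catch, log, re-throw and finally decisions are easy to verify in isolation. The sketch below is a reduced variant of the wrapper with an injectable sink in place of console.log; the `Req`, `Res`, and `LogLine` types, `withLoggerSketch`, and `runDemo` are hypothetical stand-ins so the example runs without Next.js.

```typescript
type Req = { method: string; path: string };
type Res = { status: number };
type LogLine = {
  route: string;
  method: string;
  path: string;
  status: number;
  latencyMs: number;
  error?: string;
};

// Hypothetical variant of the wrapper: same try/catch/finally shape, injectable sink.
function withLoggerSketch(
  name: string,
  handler: (req: Req) => Promise<Res>,
  sink: (line: LogLine) => void,
): (req: Req) => Promise<Res> {
  return async (req) => {
    const start = Date.now();
    let status = 0;
    let error: string | undefined;
    try {
      const res = await handler(req);
      status = res.status;
      return res;
    } catch (err) {
      status = 500;
      error = err instanceof Error ? err.message : String(err);
      throw err; // the caller (the framework, in the real pattern) still sees the failure
    } finally {
      // Runs on both the return path and the throw path.
      sink({
        route: name,
        method: req.method,
        path: req.path,
        status,
        latencyMs: Date.now() - start,
        ...(error ? { error } : {}),
      });
    }
  };
}

const lines: LogLine[] = [];
const okRoute = withLoggerSketch("widget.create", async () => ({ status: 201 }), (l) => lines.push(l));
const failingRoute = withLoggerSketch(
  "widget.get",
  async () => {
    throw new Error("boom");
  },
  (l) => lines.push(l),
);

// Runs both routes and reports whether the failure reached the caller.
async function runDemo(): Promise<{ rethrown: boolean; logged: LogLine[] }> {
  await okRoute({ method: "POST", path: "/api/widgets" });
  let rethrown = false;
  try {
    await failingRoute({ method: "GET", path: "/api/widgets/1" });
  } catch {
    rethrown = true;
  }
  return { rethrown, logged: lines };
}
```

Both calls produce exactly one line, and the failing call still rejects, which is the property that lets the framework render its own 500.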

Tradeoffs

Strengths	Weaknesses
Zero deps; works on any FaaS that captures stdout	No sampling, no batching — every request emits one line (cost at high volume)
`route` name is grep-able and stable	Every route must be wrapped by hand; a route that is never wrapped emits no log line
Catches all paths (return, throw, cancel) via `finally`	Doesn't capture downstream call timings unless you add child spans manually
Re-throw preserves framework error semantics	Errors are logged with stack at the route boundary, but stack-walking for inner errors needs separate logging

Citations

src/lib/log/route-logger.ts — full implementation
Used by every src/app/api/spokes/*/*/route.ts — usage is consistent across ~100+ routes

P10. Fire-and-Forget Audit Write with Stderr-on-Failure

Problem Every privileged call (MCP tool invocation, write endpoint) needs an audit row in the DB. But the audit write must never:

Block the response (a slow audit DB shouldn't add latency to the call).
Fail the call (an audit DB outage shouldn't make tools 500).
Disappear silently (operators must see audit-write failures, even if consumers don't).

The Pattern

import { db } from "@/db/client";
import { auditTable } from "@/db/schema";

export type AuditEvent = {
  consumerId: string;
  toolName: string;
  latencyMs: number;
  status: "ok" | "error";
  errorStack?: string;
};

function persistAudit(event: AuditEvent): Promise<unknown> {
  return db.insert(auditTable).values({
    consumerId: event.consumerId,
    toolName: event.toolName,
    status: event.status,
    latencyMs: event.latencyMs,
    errorMessage: event.errorStack ?? null,
  });
}

export function logToolCall(input: AuditEvent): void {
  // Always emit the stdout line first — Vercel log search is the primary surface.
  console.log(JSON.stringify({
    ts: new Date().toISOString(),
    route: `mcp.tool.${input.toolName}`,
    kind: "mcp_tool",
    consumerId: input.consumerId,
    toolName: input.toolName,
    latencyMs: input.latencyMs,
    status: input.status,
    ...(input.errorStack ? { errorStack: input.errorStack } : {}),
  }));

  // Fire-and-forget DB write. We intentionally do NOT await — the tool response
  // should never be blocked on an audit write, and an audit-write failure should
  // never surface as a tool failure to the consumer.
  void persistAudit(input).catch((err) => {
    console.error(JSON.stringify({
      ts: new Date().toISOString(),
      route: `mcp.audit.persist-failed`,
      kind: "audit_error",
      toolName: input.toolName,
      consumerId: input.consumerId,
      errorMessage: err instanceof Error ? err.message : String(err),
    }));
  });
}

Design decisions

void prefix is intentional. It says "I know this is a promise and I deliberately won't await it." A bare un-awaited promise would be flagged by lint rules such as @typescript-eslint/no-floating-promises; the attached .catch already counts as handling for that rule, so void here mainly documents intent. Runtime behavior is unchanged either way.
Two log lines on failure: stdout ok + stderr audit_error. The consumer sees success; the operator sees the audit failure. Decoupled.
stdout line is always emitted first. If the audit DB write hangs the process (it won't — we don't await — but in principle), the structured log is already out.
console.error for failure, console.log for success. Vercel + most log pipelines route stderr to a separate severity bucket. Operators alert on audit_error without alerting on every call.
No retry. If the audit DB is down, the audit row is lost; the operator sees it in stderr. Audit reliability is a function of audit-DB uptime, not of in-app queueing — pushing complex retry logic into the audit path adds failure modes.
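The ordering guarantees above can be demonstrated with a dependency-injected variant. In this sketch `persist` stands in for the DB insert and the two sinks stand in for console.log and console.error; `makeAuditLogger`, `AuditRecord`, and the simulated outage are assumptions made for the example.

```typescript
type AuditRecord = { toolName: string; status: "ok" | "error" };

// Hypothetical, dependency-injected version of the pattern.
function makeAuditLogger(
  persist: (record: AuditRecord) => Promise<unknown>,
  out: (line: string) => void,
  err: (line: string) => void,
): (record: AuditRecord) => void {
  return (record) => {
    // 1. Structured line first, synchronously.
    out(JSON.stringify({ kind: "mcp_tool", ...record }));

    // 2. Fire-and-forget: no await; a failure goes to the error sink only.
    void persist(record).catch((e: unknown) => {
      err(
        JSON.stringify({
          kind: "audit_error",
          toolName: record.toolName,
          errorMessage: e instanceof Error ? e.message : String(e),
        }),
      );
    });
  };
}

const outLines: string[] = [];
const errLines: string[] = [];
const logCall = makeAuditLogger(
  () => Promise.reject(new Error("audit db down")), // simulated outage
  (line) => outLines.push(line),
  (line) => errLines.push(line),
);

logCall({ toolName: "widget.create", status: "ok" }); // returns immediately and does not throw
const outCountAtReturn = outLines.length; // the stdout line is already written
const errCountAtReturn = errLines.length; // the failure has not surfaced yet; it arrives asynchronously
```

The caller observes a normal return even though the write fails, and the failure shows up later on the error sink. One caveat to verify for your platform: some serverless runtimes suspend the process once the response is sent, in which case un-awaited work may not complete.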

Tradeoffs

Strengths	Weaknesses
Audit writes can't slow or fail consumer calls	Lost audit rows on DB outage — replay impossible
Operators alert on stderr without false positives from happy path	Two log surfaces (DB + stderr) to query when investigating
No queue, no retry, no extra infrastructure	High-traffic systems can lose meaningful audit volume during outages
`void` + `.catch` is idiomatic and locally legible	Easy to accidentally `await` and break the no-block guarantee

Citations

src/lib/mcp/audit.ts — logToolCall + persistMcpAudit
src/lib/log/route-logger.ts — same structured-log convention on the HTTP side (Pattern P09)

P11. Stateless Multi-Tenant Request Context Resolution

Problem A request needs to be bound to a tenant. Different consumers send the tenant ID differently:

Modern consumers send a typed JSON header (x-tenant-context: {...}).
Body-shaped consumers embed { tenantContext: {...} } in the JSON body.
Legacy consumers send only { organizationId: "..." } or { tenantId: "..." } in the body.

You need one resolution primitive that handles all three, picks the best available, and returns null (not throws) when nothing matches.

The Pattern

import { z } from "zod";

export const TenantContextSchema = z.object({
  organizationId: z.string().min(1),
  principal: z.string().min(1),
  scopes: z.array(z.string()),
});
export type TenantContext = z.infer<typeof TenantContextSchema>;

export async function readTenantContext(
  request: Request,
): Promise<TenantContext | null> {
  // 1. Dedicated header — modern path.
  const headerValue = request.headers.get("x-tenant-context");
  if (headerValue) {
    try {
      const parsed = TenantContextSchema.safeParse(JSON.parse(headerValue));
      if (parsed.success) return parsed.data;
    } catch { /* fall through */ }
  }

  // 2. Body field — middle-era path.
  if (request.body) {
    try {
      const cloned = request.clone();   // clone — body is a stream, single-consumer
      const body = (await cloned.json()) as Record<string, unknown> | null;
      if (body && typeof body === "object") {
        if (body.tenantContext) {
          const parsed = TenantContextSchema.safeParse(body.tenantContext);
          if (parsed.success) return parsed.data;
        }
        // 3. Legacy body field — back-compat path.
        const orgId =
          (typeof body.organizationId === "string" && body.organizationId) ||
          (typeof body.tenantId === "string" && body.tenantId);
        if (orgId) {
          return TenantContextSchema.parse({
            organizationId: orgId,
            principal: deriveImplicitPrincipal(request),
            scopes: [],     // legacy callers get empty scopes (no scope check)
          });
        }
      }
    } catch { /* body wasn't JSON-shaped — return null */ }
  }
  return null;
}

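// Minimal definition so the snippet is self-contained; the message text is illustrative.
export class TenantContextMissingError extends Error {
  constructor() {
    super("No tenant context found in header or body.");
    this.name = "TenantContextMissingError";
  }
}

export async function requireTenantContext(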
  request: Request,
): Promise<TenantContext> {
  const ctx = await readTenantContext(request);
  if (!ctx) throw new TenantContextMissingError();
  return ctx;
}

function deriveImplicitPrincipal(request: Request): string {
  return request.headers.get("x-consumer") ?? "anonymous";
}

Design decisions

Priority order matters and is fixed. Header > body field > legacy body field. Once a layer parses successfully, lower layers don't run. This makes mixed payloads deterministic.
request.clone() before reading the body. The request body is a stream; reading it consumes it. The actual handler still needs the body, so this primitive clones first.
safeParse, not parse, until you commit. Each layer tries to parse and falls through silently on failure. The whole function only throws via the explicit requireTenantContext wrapper.
Legacy callers get empty scopes. This is a deliberate downgrade — legacy callers can't pass scope assertions. New code that requires scopes will reject; new code that tolerates the absence (e.g., read-only endpoints) will accept.
readTenantContext returns null; requireTenantContext throws. Two functions, two semantics. Callers pick: "I need a tenant" → require; "I'd prefer a tenant" → read.
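The fixed priority order is easiest to test when the resolution logic is separated from Request handling. The sketch below is a pure core that takes an already-extracted header string and body; `parseCtx` is a hand-rolled stand-in for TenantContextSchema.safeParse, and `resolveTenant` and the sample payloads are hypothetical names for this example.

```typescript
type Ctx = { organizationId: string; principal: string; scopes: string[] };

// Stand-in for schema validation: returns the typed value or null, never throws.
function parseCtx(input: unknown): Ctx | null {
  if (typeof input !== "object" || input === null) return null;
  const c = input as Record<string, unknown>;
  const org = c["organizationId"];
  const principal = c["principal"];
  const scopes = c["scopes"];
  if (typeof org !== "string" || org.length === 0) return null;
  if (typeof principal !== "string" || principal.length === 0) return null;
  if (!Array.isArray(scopes) || !scopes.every((s) => typeof s === "string")) return null;
  return { organizationId: org, principal, scopes: scopes as string[] };
}

// Hypothetical pure core: header > body.tenantContext > legacy body field.
function resolveTenant(headerValue: string | null, body: unknown, consumer: string | null): Ctx | null {
  if (headerValue) {
    try {
      const fromHeader = parseCtx(JSON.parse(headerValue));
      if (fromHeader) return fromHeader;
    } catch {
      /* malformed header JSON: fall through to the body */
    }
  }
  if (typeof body === "object" && body !== null) {
    const b = body as Record<string, unknown>;
    const fromBody = parseCtx(b["tenantContext"]);
    if (fromBody) return fromBody;
    const organizationId = b["organizationId"];
    const tenantId = b["tenantId"];
    const orgId =
      (typeof organizationId === "string" && organizationId) ||
      (typeof tenantId === "string" && tenantId);
    // Legacy callers get empty scopes, as in the pattern above.
    if (orgId) return { organizationId: orgId, principal: consumer ?? "anonymous", scopes: [] };
  }
  return null;
}

const header = JSON.stringify({ organizationId: "org-h", principal: "svc", scopes: ["read"] });
const mixed = resolveTenant(header, { organizationId: "org-legacy" }, null);
const fallThrough = resolveTenant(
  "{not json",
  { tenantContext: { organizationId: "org-b", principal: "svc", scopes: [] } },
  null,
);
const legacy = resolveTenant(null, { tenantId: "org-legacy" }, "billing");
const none = resolveTenant(null, { unrelated: true }, null);
```

A mixed payload resolves to the header's tenant, a malformed header falls through to the body, and an unrecognized payload yields null rather than an exception.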

Tradeoffs

Strengths	Weaknesses
Three input conventions accepted (header, body field, legacy body field); migration is non-breaking	Body clone has CPU + memory cost on every request
Predictable resolution order, no surprise overrides	Three input paths = three places a malformed payload can confuse parsing
Pure-function-shaped (single Request → typed context or null)	Legacy fallback creates contexts with empty scopes silently; downstream scope checks must handle this
`safeParse` → no throws on the read path	Long-term: removing legacy path requires consumer audit + coordinated removal

Citations

src/lib/tenant-context/index.ts — readTenantContext + requireTenantContext
src/lib/mcp/wrap.ts resolveTenantContextForMcpCall — parallel implementation for the MCP transport (header + body + legacy)

P12. Modular Tool Registration with Per-Spoke Self-Registration

Problem You have N microservices in one deploy, each exposing M tools to an agent transport (MCP, gRPC, or any RPC server). When a new service comes online, you want the registration cost to be O(1): one import + one call in an aggregator. You don't want a central catalog that the new service edits — that's a coordination bottleneck.

The Pattern

// Each service owns its own register module — only modifies its own code.
// services/widget/transport/register.ts
import type { McpServer } from "@modelcontextprotocol/sdk/server/mcp";
import { wrapTool, toolVisible, type RegistrationCtx } from "@/lib/transport/registration";
import { CreateWidgetRequestSchema, CreateWidgetResponseSchema } from "../contracts/types";
import { runCreateWidget } from "../core/run-create-widget";

export function registerWidgetTools(server: McpServer, ctx: RegistrationCtx): void {
  if (toolVisible("widget.create", ctx.scopes, ctx.spokeSlug)) {
    server.registerTool(
      "widget.create",
      {
        inputSchema: CreateWidgetRequestSchema.shape,
        outputSchema: CreateWidgetResponseSchema.shape,
      },
      wrapTool(ctx, "widget.create", async (args) => {
        const parsed = CreateWidgetRequestSchema.parse(args);
        return runCreateWidget(parsed);
      }),
    );
  }
  // ... more tools in this service
}

// services/gadget/transport/register.ts — same shape, different service.
export function registerGadgetTools(server: McpServer, ctx: RegistrationCtx): void {
  // ...
}

// lib/transport/register-tools.ts — the single aggregator.
// Adding a new service: ONE import + ONE call.
import { registerWidgetTools } from "@/services/widget/transport/register";
import { registerGadgetTools } from "@/services/gadget/transport/register";

export function registerAllTools(server: McpServer, ctx: RegistrationCtx): void {
  registerWidgetTools(server, ctx);
  registerGadgetTools(server, ctx);
  // ... new service goes here as a single line
}

Design decisions

Aggregator is dumb. It only imports + calls. No knowledge of what tools exist, no scope enforcement (that's in the per-spoke register), no catalog. The aggregator is the only place that needs editing on new-spoke onboarding.
Per-spoke register self-filters via toolVisible. A consumer with scope widget.* only sees widget tools; the aggregator doesn't need to know — each registrar checks.
One file, one service. The register module lives next to the service's contracts and core. Reviewing a service touches one filesystem location.
RegistrationCtx carries the cross-cutting context. Consumer ID, scopes, optional spoke filter — passed in, never derived inside the registrar.
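The source does not show how toolVisible matches scopes, so the sketch below assumes a simple convention: "widget.*" grants a namespace, an exact name grants one tool, and "*" grants everything. `scopeAllows`, `toolVisibleSketch`, `FakeServer`, and the tool names are hypothetical, and the optional spoke filter is left out.

```typescript
// Hypothetical scope matcher; the real toolVisible may use different rules.
function scopeAllows(scope: string, toolName: string): boolean {
  if (scope === "*") return true;
  if (scope.endsWith(".*")) return toolName.startsWith(scope.slice(0, -1)); // keeps the trailing dot
  return scope === toolName;
}

function toolVisibleSketch(toolName: string, scopes: string[]): boolean {
  return scopes.some((scope) => scopeAllows(scope, toolName));
}

// Minimal stand-in for a tool server: records names and rejects duplicates.
class FakeServer {
  readonly tools: string[] = [];
  registerTool(name: string): void {
    if (this.tools.includes(name)) throw new Error(`duplicate tool: ${name}`);
    this.tools.push(name);
  }
}

type Registrar = (server: FakeServer, scopes: string[]) => void;

// Each registrar filters its own tools.
const registerWidget: Registrar = (server, scopes) => {
  for (const name of ["widget.create", "widget.get"]) {
    if (toolVisibleSketch(name, scopes)) server.registerTool(name);
  }
};
const registerGadget: Registrar = (server, scopes) => {
  if (toolVisibleSketch("gadget.list", scopes)) server.registerTool("gadget.list");
};

// The aggregator only calls registrars; it knows nothing about tools or scopes.
function registerAll(server: FakeServer, scopes: string[]): void {
  registerWidget(server, scopes);
  registerGadget(server, scopes);
}

const widgetOnly = new FakeServer();
registerAll(widgetOnly, ["widget.*"]);

const everything = new FakeServer();
registerAll(everything, ["*"]);
```

With this convention a consumer holding only "widget.*" never sees gadget tools, and the aggregator stays unaware of the difference. The duplicate check in `FakeServer` mirrors the tradeoff that naming collisions surface only at registration time.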

Tradeoffs

Strengths	Weaknesses
Onboarding a new service edits one shared file (the aggregator) and adds the new service tree	The aggregator can grow long; a 50-service repo has 50 imports plus 50 calls
Each service's registrar lives next to the service code (high locality)	Cross-service tool naming collisions are caught only at registration time
Scope filtering at registration prevents agents from discovering unauthorized tools	Per-spoke registrars repeat the wrapping boilerplate (alternative: a shared `registerSimple` helper, but adds an abstraction)
The aggregator is grep-able truth about which services are online	No declarative manifest — agents can't introspect "what services exist" without invoking the aggregator

Citations

src/lib/mcp/register-tools.ts — the aggregator (currently ~27 imports + calls)
src/spokes/*/mcp/register.ts — per-spoke register modules (one per service)
src/lib/mcp/registration.ts — shared wrapTool + toolVisible helpers

See also: DevPlane P14 (Parallel API Surfaces over Shared Core) — same self-registration discipline applied to REST + CLI + MCP at once.

P13. Cross-Cutting Domain Envelope (Value + Provenance + Enrichment)

Problem Numeric facts pass between services and surfaces. A bare number (engagement = 4.2) is useless on receipt — the receiver can't tell what it measures, when it was computed, against what cohort, with what confidence interval, or whether it's better or worse than last quarter. You want one canonical envelope shape every service emits and every consumer renders.

The Pattern

import { z } from "zod";

// Where did this number come from?
export const ProvenanceSchema = z.object({
  source: z.string(),              // "service.compute-method"
  computedAt: z.string(),          // ISO-8601
  method: z.string().optional(),   // optional algorithm name
  notes: z.string().optional(),
});

// What do we know about it beyond the raw value?
export const EnrichmentSchema = z.object({
  ci: z.object({
    lower: z.number(),
    upper: z.number(),
    level: z.number().min(0).max(1),
    method: z.enum(["wilson", "normal", "bootstrap", "t"]),
  }).optional(),
  zScore: z.number().optional(),
  percentile: z.number().min(0).max(100).optional(),
  changeRate: z.number().optional(),       // vs previousValue
  previousValue: z.number().optional(),
  effectSize: z.enum(["small", "medium", "large", "negligible"]).optional(),
});

// The envelope itself: keyed by (metric, segment, period) for cross-cohort comparison.
export const EnvelopeSchema = z.object({
  metricKey: z.string().min(1),
  segmentId: z.string().nullable(),        // null = unsegmented (overall)
  period: z.string(),                      // ISO month/quarter — caller-defined granularity
  value: z.number(),
  sampleSize: z.number().int().min(0).optional(),
  provenance: ProvenanceSchema,
  enrichment: EnrichmentSchema.optional(),
  tenantId: z.string().nullable().optional(),   // privacy boundary
});
export type Envelope = z.infer<typeof EnvelopeSchema>;

// Enrichment is computed once and merged onto a bare envelope.
export function enrichEnvelope(
  envelope: Envelope,
  comparison?: { distribution?: number[]; previousValue?: number },
): Envelope {
  const enrichment = computeEnrichment({
    value: envelope.value,
    distribution: comparison?.distribution,
    previousValue: comparison?.previousValue,
  });
  return { ...envelope, enrichment: { ...envelope.enrichment, ...enrichment } };
}

Design decisions

Three layers: identity (metricKey + segmentId + period), value (value + sampleSize), context (provenance + enrichment). Each layer has its own evolution cadence — identity rarely changes; value is the data; context grows with what you know.
segmentId: null means "unsegmented." Not an empty string, not absent — explicit null. This removes any doubt about what an empty string would mean.
Provenance is mandatory. No envelope without a source + computedAt. The cost of optional provenance is "where did this number come from?" debug sessions; mandatory is cheap.
Enrichment is optional but additive. You can have just a value; you can compute CI + zScore later and merge. The schema doesn't force the producer to know everything up front.
tenantId lives on the envelope, not separately. Privacy boundary travels with the data. A persister that strips tenantId is a bug, not a feature.
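The "optional but additive" decision can be seen by enriching the same envelope in two passes. The sketch below uses plain TypeScript types in place of the Zod schemas; `EnvelopeSketch`, `mergeEnrichment`, the metric key, and the sample numbers are illustrative assumptions.

```typescript
type Enrichment = {
  zScore?: number;
  percentile?: number;
  changeRate?: number;
  previousValue?: number;
};

type EnvelopeSketch = {
  metricKey: string;
  segmentId: string | null; // null = unsegmented
  period: string;
  value: number;
  provenance: { source: string; computedAt: string };
  enrichment?: Enrichment;
};

// Additive merge: new fields are added; existing ones survive unless recomputed.
function mergeEnrichment(envelope: EnvelopeSketch, extra: Enrichment): EnvelopeSketch {
  return { ...envelope, enrichment: { ...envelope.enrichment, ...extra } };
}

// A bare envelope: identity, value, and mandatory provenance, with no enrichment yet.
const bare: EnvelopeSketch = {
  metricKey: "engagement.mean", // hypothetical metric key
  segmentId: null,
  period: "2024-Q4",
  value: 4.2,
  provenance: { source: "calculus.mean", computedAt: "2025-01-06T00:00:00Z" },
};

// Pass 1: the producer learns last quarter's value.
const previousValue = 4.0;
const withChange = mergeEnrichment(bare, {
  previousValue,
  changeRate: (bare.value - previousValue) / previousValue,
});

// Pass 2: a later step adds a percentile without touching pass 1's fields.
const withPercentile = mergeEnrichment(withChange, { percentile: 71 });
```

The bare envelope is already valid, and each pass returns a new object, so a consumer holding an earlier version never sees it change underneath them.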

Tradeoffs

Strengths	Weaknesses
One render surface (chart, table, card) works for every metric across every service	The envelope is verbose for trivially-shaped data
Provenance + enrichment travel with the value — no orphaned numbers	All consumers must understand the shape (vs simple `{value: number}`)
Optional enrichment lets producers ship before they have CIs	Many envelopes with optional fields → consumer-side defensiveness
Tenant boundary embedded in the data; cross-tenant leak requires explicit strip	Hard to evolve the schema once consumers vendor it — backwards-compat needs `.optional()` discipline

Citations

src/spokes/calculus/contracts/types.ts — MetricEnvelope (the original; PA Toolbox terminology)
src/spokes/calculus/core/stats-enrich.ts — enrichment computation
src/spokes/calculus/core/factory.ts — combinatorial envelope-grid builder

P14. Data-Shape-Driven Method Selection for Confidence Intervals

Problem You're enriching a numeric value with a confidence interval. The right method depends on the data shape: proportion data (counts of successes / trials) wants Wilson-score; small samples want a t-interval; large samples can use a normal approximation. Forcing the caller to pick adds friction and gets it wrong. You want the algorithm to pick the right method from the input shape.

The Pattern

export type ConfidenceInterval = {
  lower: number;
  upper: number;
  level: number;
  method: "wilson" | "normal" | "bootstrap" | "t";
};

export function wilsonInterval(p: number, n: number, conf = 0.95): ConfidenceInterval { /* ... */ }
export function tInterval(values: number[], conf = 0.95): ConfidenceInterval { /* ... */ }
export function normalInterval(values: number[], conf = 0.95): ConfidenceInterval { /* ... */ }

/** Pick the right CI method from the input shape, no caller flag. */
export function computeEnrichment(params: {
  value: number;
  distribution?: number[];          // raw observations (continuous data)
  previousValue?: number;
  proportionDenominator?: number;   // if set, value is a proportion
}): { ci?: ConfidenceInterval; zScore?: number; percentile?: number; changeRate?: number } {
  const enrichment: ReturnType<typeof computeEnrichment> = {};

  if (params.proportionDenominator) {
    // Proportion data → Wilson-score (handles boundary 0/1 correctly).
    enrichment.ci = wilsonInterval(params.value, params.proportionDenominator);
  } else if (params.distribution && params.distribution.length >= 2) {
    const dist = params.distribution;
    // Continuous data: t-interval for small samples, normal for large.
    enrichment.ci = dist.length < 30 ? tInterval(dist, 0.95) : normalInterval(dist, 0.95);
    enrichment.zScore = zScore(params.value, dist);
    enrichment.percentile = percentileRank(params.value, dist);
  }

  if (params.previousValue !== undefined && params.previousValue !== 0) {
    enrichment.changeRate = (params.value - params.previousValue) / params.previousValue;
  }

  return enrichment;
}

Design decisions

Shape implies method, not a caller flag. proportionDenominator present → Wilson; distribution present → t or normal by n; nothing → no CI. The caller can't pick the wrong method by misreading the docs.
n=30 cutoff for t-vs-normal. Textbook threshold; "good enough" for the common case. Callers wanting a different threshold compose tInterval / normalInterval directly.
Method enum on the response. The consumer can render "Wilson 95% CI [0.42, 0.58]" without re-deriving which method ran. Auditability for free.
No CI is a valid result. If the caller passes neither distribution nor denominator, return an enrichment without ci. Forcing a CI from a single point is wrong; refusing is right.
Same function for HTTP route + MCP tool. Both transports compute via this one function so the picked method never disagrees across surfaces.
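The interval functions are elided in the snippet above. As a point of reference, here is a minimal sketch of a Wilson score interval using the standard formula; it is not the cited implementation. Assumptions: `wilsonSketch` is a hypothetical name, and z is fixed at 1.96 (two-sided 95%) because other levels would need a normal quantile function.

```typescript
type Interval = { lower: number; upper: number; level: number; method: "wilson" };

// Wilson score interval for a proportion p observed over n trials.
// Assumption: z is hard-coded to 1.96, so only the 95% level is supported.
function wilsonSketch(p: number, n: number): Interval {
  if (!(n > 0) || p < 0 || p > 1) throw new RangeError("need n > 0 and 0 <= p <= 1");
  const z = 1.96;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const halfWidth = (z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))) / denom;
  return {
    lower: Math.max(0, center - halfWidth), // clamp against floating-point spill below 0
    upper: Math.min(1, center + halfWidth),
    level: 0.95,
    method: "wilson",
  };
}

const evenSplit = wilsonSketch(0.5, 100); // 50 successes in 100 trials
const zeroHits = wilsonSketch(0, 10);     // 0 successes in 10 trials
```

For the even split the interval is roughly 0.404 to 0.596. For zero successes the upper bound stays well above zero, whereas a plain normal approximation (p ± z·sqrt(p(1−p)/n)) collapses to zero width at p = 0, which is the boundary behavior the dispatch comment refers to.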

Tradeoffs

Strengths	Weaknesses
Callers don't have to know statistics to use it correctly	Cutoff of n=30 for t-vs-normal is opinionated; some domains prefer 50 or 100
Method is recorded on the output for auditability	No bootstrap path here — for non-parametric data, caller must compute externally
Shape-driven dispatch is small and testable	Adds a function-level branching point that's easy to grow into a god-function — resist new methods unless the shape genuinely demands it
Same function across transports → no drift	Method choice depends on accurate parameter naming; mislabeling proportion-as-distribution silently produces a wrong CI type

Citations

src/spokes/calculus/core/stats.ts — wilsonInterval, tInterval, normalInterval, tCritical
src/spokes/calculus/core/stats-enrich.ts — computeStatsEnrichment (the shape-driven selector)

P15. Generic Hypothesis-Walk Diagnostic Chain

Problem You're building "why did this happen?" answers across multiple services — rating misalignments, performance drops, validity scorecards. Each service has its own hypotheses + evidence shapes, but the orchestration is the same: try each hypothesis, score it on the evidence, skip when inapplicable, return the survivors sorted by strength. You want one primitive every service reuses.

The Pattern

/** One step in an ordered diagnostic chain. */
export type Hypothesis<Context, Evidence> = {
  id: string;
  name: string;
  evidenceQuery: (ctx: Context) => Evidence | Promise<Evidence>;
  /** Score on 0..1, or null when the hypothesis doesn't apply. */
  scoringFn: (evidence: Evidence) => number | null;
  narrate: (evidence: Evidence, score: number) => string;
};

export type ChainResult<Evidence> = {
  rankedFindings: Array<{
    hypothesisId: string;
    hypothesisName: string;
    score: number;
    evidence: Evidence;
    narration: string;
  }>;
  skipped: Array<{ hypothesisId: string; reason: string }>;
};

export async function runDiagnosticChain<Context, Evidence>(
  context: Context,
  chain: Hypothesis<Context, Evidence>[],
  opts?: { minScore?: number },
): Promise<ChainResult<Evidence>> {
  const minScore = opts?.minScore ?? 0;
  const rankedFindings: ChainResult<Evidence>["rankedFindings"] = [];
  const skipped: ChainResult<Evidence>["skipped"] = [];

  for (const h of chain) {
    const evidence = await h.evidenceQuery(context);
    const score = h.scoringFn(evidence);

    if (score === null) {
      skipped.push({ hypothesisId: h.id, reason: "Hypothesis does not apply (scorer returned null)." });
      continue;
    }
    if (!Number.isFinite(score)) {
      skipped.push({ hypothesisId: h.id, reason: "Scorer produced a non-finite score." });
      continue;
    }
    if (score < minScore) continue;

    rankedFindings.push({
      hypothesisId: h.id,
      hypothesisName: h.name,
      score,
      evidence,
      narration: h.narrate(evidence, score),
    });
  }

  rankedFindings.sort((a, b) => b.score - a.score);
  return { rankedFindings, skipped };
}

// Service-specific usage: define the Context / Evidence types, build a chain, run.
type RatingContext = { employeeId: string; cycle: string };
type RatingEvidence = { managerChange: boolean; promotedRecently: boolean; /* ... */ };

const RATING_CHAIN: Hypothesis<RatingContext, RatingEvidence>[] = [
  {
    id: "manager-change",
    name: "Manager change during cycle",
    evidenceQuery: async (ctx) => fetchManagerHistory(ctx),
    scoringFn: (e) => e.managerChange ? 0.7 : null,
    narrate: (e, s) => `Manager changed during the cycle (confidence ${s.toFixed(2)}).`,
  },
  // ...more hypotheses
];

const result = await runDiagnosticChain({ employeeId: "...", cycle: "Q4" }, RATING_CHAIN);

Design decisions

Generic over Context and Evidence. The primitive doesn't know what you're diagnosing; both type parameters are caller-defined. Reuse it across rating, performance, validity, and anomaly chains.
null means "does not apply." It distinguishes "this hypothesis doesn't apply" from "this hypothesis scored zero." Non-applicable hypotheses stay out of the ranked list but are reported in skipped, so operators can see they ran.
Sequential, not parallel. Each hypothesis runs to completion before the next, which keeps execution order and failures easy to trace. Note that the loop fetches evidence before scoring, so a hypothesis that rules itself out via scoringFn returning null has already paid for its evidence query. (Use Promise.all if your hypotheses are I/O-heavy and independent.)
narrate is part of the contract. Findings carry explanatory text per-hypothesis; the consumer doesn't compose narration from raw scores.
Sort by descending score on output. Strongest evidence first. Callers can take the top-N or threshold by minScore.
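
The Promise.all escape hatch mentioned above could look like the following. This is a minimal sketch, not the cited implementation: runDiagnosticChainParallel and the Async/Parallel type names are hypothetical, the types are redeclared so the block stands alone, and evidenceQuery is assumed to always return a Promise.

```typescript
// Hypothetical parallel variant of the chain runner.
type AsyncHypothesis<Context, Evidence> = {
  id: string;
  name: string;
  evidenceQuery: (ctx: Context) => Promise<Evidence>;
  scoringFn: (evidence: Evidence) => number | null;
  narrate: (evidence: Evidence, score: number) => string;
};

type ParallelFinding<Evidence> = {
  hypothesisId: string;
  hypothesisName: string;
  score: number;
  evidence: Evidence;
  narration: string;
};

type ParallelResult<Evidence> = {
  rankedFindings: ParallelFinding<Evidence>[];
  skipped: Array<{ hypothesisId: string; reason: string }>;
};

async function runDiagnosticChainParallel<Context, Evidence>(
  context: Context,
  chain: AsyncHypothesis<Context, Evidence>[],
  opts?: { minScore?: number },
): Promise<ParallelResult<Evidence>> {
  const minScore = opts?.minScore ?? 0;

  // Start every evidence query up front; each is scored as soon as it resolves.
  const scored = await Promise.all(
    chain.map((h) =>
      h.evidenceQuery(context).then((evidence) => ({
        h,
        evidence,
        score: h.scoringFn(evidence),
      })),
    ),
  );

  const result: ParallelResult<Evidence> = { rankedFindings: [], skipped: [] };
  for (const { h, evidence, score } of scored) {
    if (score === null) {
      result.skipped.push({ hypothesisId: h.id, reason: "Hypothesis does not apply (scorer returned null)." });
    } else if (!Number.isFinite(score)) {
      result.skipped.push({ hypothesisId: h.id, reason: "Scorer produced a non-finite score." });
    } else if (score >= minScore) {
      result.rankedFindings.push({
        hypothesisId: h.id,
        hypothesisName: h.name,
        score,
        evidence,
        narration: h.narrate(evidence, score),
      });
    }
  }

  result.rankedFindings.sort((a, b) => b.score - a.score);
  return result;
}
```

One behavioral difference to weigh: with Promise.all, a single rejected evidence query rejects the whole run, and every query is issued even if an early one would have failed.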

Tradeoffs

Strengths	Weaknesses
One primitive for many domain-specific diagnostic spokes	Sequential evidence-fetch can be slow when hypotheses are I/O-heavy
`null = skip` keeps non-applicable hypotheses out of the ranked list	Score scale (0..1) is convention, not enforced — divergent scales across hypotheses skew ranking
Narration travels with the finding, no separate templating step	No support for hypothesis dependencies (e.g., "only run B if A scored > 0.5")
Type-parameterized → fully typed at the call site	Easy to abuse by stuffing arbitrary side-effects into `scoringFn`; keep scorers pure
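
Because the 0..1 scale is only a convention, one way to make it enforceable is to wrap each scorer before it joins a chain. The sketch below is an assumption about how that could be done; enforceUnitScore is a hypothetical helper and is not part of the cited files.

```typescript
// Hypothetical guard: wraps a scorer so out-of-range scores fail loudly
// instead of silently skewing the ranking.
type Scorer<Evidence> = (evidence: Evidence) => number | null;

function enforceUnitScore<Evidence>(
  hypothesisId: string,
  scorer: Scorer<Evidence>,
): Scorer<Evidence> {
  return (evidence) => {
    const score = scorer(evidence);
    // null ("does not apply") and non-finite values pass through untouched,
    // so the chain runner's existing skip handling still sees them.
    if (score === null || !Number.isFinite(score)) return score;
    if (score < 0 || score > 1) {
      throw new RangeError(
        `Hypothesis "${hypothesisId}" returned ${score}; expected a score in 0..1.`,
      );
    }
    return score;
  };
}
```

A chain entry would then use scoringFn: enforceUnitScore("manager-change", (e) => (e.managerChange ? 0.7 : null)). Throwing is a deliberate choice in this sketch: a scorer on the wrong scale is a programming error, not a data condition.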

Citations

src/lib/diagnostic-chain/run-diagnostic-chain.ts — full implementation
src/lib/diagnostic-chain/types.ts — Hypothesis + ChainResult types
Used by performance-validity, manager-effectiveness, rating-divergence chains across multiple spokes

P16. Per-Request In-Memory Fixed-Window IP Rate Limiter

Problem You have a public POST endpoint (one that's deliberately not behind an auth gate — e.g., a read-shaped classify endpoint). You want to throttle abuse without adding Redis. The deploy is one regional Next.js process; per-process state is acceptable for v1.

The Pattern

type Bucket = { count: number; windowStart: number };

const WINDOW_MS = 60_000;
const DEFAULT_MAX = 100;
const buckets = new Map<string, Bucket>();

function clientKey(request: Request): string {
  // Prefer the platform's forwarded-for; fall back to direct headers.
  const fwd = request.headers.get("x-forwarded-for");
  if (fwd) return fwd.split(",")[0]!.trim() || "unknown";
  const realIp = request.headers.get("x-real-ip");
  if (realIp) return realIp.trim();
  return "unknown";
}

/**
 * Fixed-window counter. Returns a 429 Response when over limit, null when allowed.
 * Caller pattern: `const limited = rateLimitClassify(req); if (limited) return limited;`
 */
export function rateLimitClassify(
  request: Request,
  maxPerWindow: number = DEFAULT_MAX,
): Response | null {
  const key = `classify:${clientKey(request)}`;
  const now = Date.now();
  let b = buckets.get(key);
  if (!b || now - b.windowStart >= WINDOW_MS) {
    b = { count: 0, windowStart: now };
    buckets.set(key, b);
  }
  if (b.count >= maxPerWindow) {
    const retryAfterSec = Math.ceil((WINDOW_MS - (now - b.windowStart)) / 1000) || 1;
    return new Response(
      JSON.stringify({
        error: { code: "rate_limited",
                 message: `Maximum ${maxPerWindow} requests per minute. Retry after ${retryAfterSec}s.` },
      }),
      {
        status: 429,
        headers: { "Content-Type": "application/json", "Retry-After": String(retryAfterSec) },
      },
    );
  }
  b.count += 1;
  return null;
}

Design decisions

Fixed window, not sliding. Sliding requires storing timestamps; fixed needs one int + one window-start. Easier to reason about, easier to debug.
Per-process, not global. Acknowledged limitation — a multi-region deploy gets max × regions. For most use cases that's still a useful throttle. The pattern explicitly punts to Redis when global is needed.
Returns a 429 response rather than throwing. The caller pattern is if (limited) return limited; which is the same shape as the auth-deny pattern (P05), so the checks compose linearly.
Retry-After header on the response. Standards-compliant; well-behaved clients back off correctly.
No bucket cleanup. The Map gains one bucket per client IP and never deletes any; an existing bucket is only reset in place when its window expires. Long-running processes with high IP churn would benefit from cleanup; the current scale (one Next.js process, modest IP set) doesn't need it.
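
If IP churn ever makes cleanup worthwhile, a sweep over the Map is enough. The helper below is a minimal sketch under that assumption; sweepExpiredBuckets is a hypothetical name and does not exist in the cited file.

```typescript
// Hypothetical eviction helper for the limiter's Map.
type SweepBucket = { count: number; windowStart: number };

/** Deletes buckets whose window has fully elapsed; returns how many were removed. */
function sweepExpiredBuckets(
  buckets: Map<string, SweepBucket>,
  now: number,
  windowMs: number,
): number {
  let removed = 0;
  buckets.forEach((bucket, key) => {
    // An expired bucket would be reset on its next request anyway, so
    // deleting it does not change what the limiter allows.
    if (now - bucket.windowStart >= windowMs) {
      buckets.delete(key);
      removed += 1;
    }
  });
  return removed;
}
```

Calling it opportunistically (for example when buckets.size crosses a cap) avoids needing a timer, which suits processes that may be frozen between requests.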

Tradeoffs

Strengths	Weaknesses
Zero deps; works the moment the route is written	Per-process means N regions = N × limit (not truly global)
In-process Map is microsecond-fast	A process restart clears all buckets, so every client's count resets on each crash or redeploy
`Retry-After` is correct + standard	Fixed window has the classic edge case (2× limit at the window boundary)
Caller composes the same way as auth-deny	No bucket eviction → unbounded growth over long uptime + diverse IP set
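
The window-boundary edge case is easiest to see with an injectable clock. The sketch below re-creates the same counter logic with hypothetical names (makeFixedWindowLimiter, allow) purely for demonstration; it is not the production limiter.

```typescript
// Same fixed-window logic as the pattern, but with the clock passed in.
type WindowBucket = { count: number; windowStart: number };

function makeFixedWindowLimiter(max: number, windowMs: number) {
  const buckets = new Map<string, WindowBucket>();
  /** Returns true when a request for `key` at time `now` (ms) is allowed. */
  return function allow(key: string, now: number): boolean {
    let b = buckets.get(key);
    if (!b || now - b.windowStart >= windowMs) {
      b = { count: 0, windowStart: now };
      buckets.set(key, b);
    }
    if (b.count >= max) return false;
    b.count += 1;
    return true;
  };
}

// A client spends its quota at the end of one window and again at the start
// of the next.
const allow = makeFixedWindowLimiter(100, 60_000);
let accepted = 0;
if (allow("ip", 0)) accepted += 1; // opens the window at t=0
for (let i = 0; i < 99; i++) if (allow("ip", 59_900)) accepted += 1;
for (let i = 0; i < 100; i++) if (allow("ip", 60_000)) accepted += 1;
// accepted === 200, and 199 of those requests landed within 100 ms.
```

A sliding window or token bucket would close this gap at the cost of more per-client state, which is the trade the pattern declines for v1.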

Citations

src/spokes/job-family-agent/core/rate-limit.ts — full implementation
Applied at the route boundary in the corresponding public POST handler

P17. Stateless MCP Gateway with Per-Request Server Bundle

Problem MCP (Model Context Protocol) is session-oriented by default — the spec has an Mcp-Session-Id header and the SDK supports a session-store. Serverless deploys don't have session affinity; a follow-up POST might land on a different instance. The classic fix (sticky sessions or shared session store) adds infrastructure. You want MCP to work on stateless serverless.

The Pattern

import { resolveConsumer, type ResolvedConsumer } from "@/lib/transport/auth";
import { createMcpSessionBundle } from "@/lib/transport/server";

export async function handleMcpRequest(
  request: Request,
  opts?: { spokeSlug?: string },
): Promise<Response> {
  // GET disabled: clients use POST-carried SSE only (serverless has no instance affinity).
  if (request.method === "GET") {
    return new Response(null, { status: 405, headers: { Allow: "POST, DELETE" } });
  }

  const consumer = resolveConsumer(request);
  if (!consumer) {
    return new Response(
      JSON.stringify({
        error: "Unauthorized",
        accepts: ["Authorization: Bearer <key>", "x-service-key: <key>"],
      }),
      { status: 401, headers: { "Content-Type": "application/json" } },
    );
  }

  // DELETE is a no-op in stateless mode — there's no per-session state.
  if (request.method === "DELETE") {
    return new Response(
      JSON.stringify({ jsonrpc: "2.0", result: { note: "stateless gateway" }, id: null }),
      { status: 200, headers: { "Content-Type": "application/json" } },
    );
  }

  if (request.method !== "POST") {
    return new Response(null, { status: 405, headers: { Allow: "POST, DELETE" } });
  }

  let rawBody: unknown;
  try {
    rawBody = await request.json();
  } catch {
    return new Response(
      JSON.stringify({
        jsonrpc: "2.0",
        error: { code: -32700, message: "Parse error: invalid JSON" },
        id: null,
      }),
      { status: 400, headers: { "Content-Type": "application/json" } },
    );
  }

  // Stateless: every POST gets a fresh server bundle. Configured with
  // `sessionIdGenerator: undefined` so the transport returns no session-id
  // header and doesn't reject non-initialize POSTs for missing session state.
  const { transport } = await createMcpSessionBundle({
    consumer,
    spokeSlug: opts?.spokeSlug,
  });
  return transport.handleRequest(request, { parsedBody: rawBody });
}

Design decisions

GET = 405. The MCP spec supports standalone SSE on GET; we disable it. POST-carried SSE only — every "session" is a single POST that returns a streaming response.
sessionIdGenerator: undefined on the transport. This tells the SDK to skip session-ID generation and skip the "must be initialized" check on non-initialize POSTs. The cost is no cross-request session memory; the benefit is no infra requirement.
DELETE is a 200 no-op. Spec-compliant for clients that issue cleanup, harmless when there's nothing to clean.
Auth before parse. Unauthorized requests don't waste CPU on JSON parsing.
spokeSlug opt narrows the bundle. When the request is /api/transport/services/widget, the bundle only registers widget.* tools. Per-service endpoints have stricter visibility than the global /api/transport.

Tradeoffs

Strengths	Weaknesses
Works on serverless without sticky sessions or shared state	Multi-step tool flows that depend on session state need to encode state into tool arguments
Every POST is a fresh, isolated bundle — no cross-request data leakage	Fresh-bundle creation has latency overhead per call
Per-service narrow endpoints reduce attack surface	Clients expecting standard MCP session semantics need a transport-layer adapter to fit POST-only
Auth-then-parse minimizes wasted work on unauthorized calls	DELETE-as-noop is spec-compliant but functionally inert; clients that rely on session cleanup get no real-world cleanup
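
Encoding state into tool arguments usually means the tool hands back an opaque token that the client returns on the next call. The following is an illustrative sketch only: listItems, the cursor format, and the page size are assumptions, not tools that exist in the gateway.

```typescript
// Hypothetical two-step flow on a stateless gateway: the server keeps nothing
// between POSTs, so the position travels in an opaque cursor argument.
type PageState = { offset: number; pageSize: number };
type ListResult = { page: string[]; nextCursor?: string };

function encodeCursor(state: PageState): string {
  return encodeURIComponent(JSON.stringify(state));
}

function decodeCursor(cursor: string | undefined, defaultPageSize: number): PageState {
  if (cursor === undefined) return { offset: 0, pageSize: defaultPageSize };
  const parsed: unknown = JSON.parse(decodeURIComponent(cursor));
  const state = parsed as Partial<PageState> | null;
  const offset = state?.offset;
  const pageSize = state?.pageSize;
  // The cursor is client-supplied input, so validate it like any other argument.
  if (
    typeof offset !== "number" || !Number.isInteger(offset) || offset < 0 ||
    typeof pageSize !== "number" || !Number.isInteger(pageSize) || pageSize < 1
  ) {
    throw new Error("Invalid cursor");
  }
  return { offset, pageSize };
}

/** Tool-handler shape: everything needed to continue arrives in `args`. */
function listItems(items: string[], args: { cursor?: string | undefined }): ListResult {
  const { offset, pageSize } = decodeCursor(args.cursor, 2);
  const page = items.slice(offset, offset + pageSize);
  const next = offset + pageSize;
  return next < items.length
    ? { page, nextCursor: encodeCursor({ offset: next, pageSize }) }
    : { page };
}
```

The cursor here is unsigned and readable by the client; if it ever carried anything a caller should not be able to forge, it would need a signature or server-side lookup.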

Citations

src/lib/mcp/gateway.ts — handleMcpRequest
src/lib/mcp/server.ts — createToolboxMcpSessionBundle (the fresh-per-request bundle factory)
src/lib/mcp/session-store.ts — kept as a no-op shape for forward compatibility

P18. Discriminated-Union Response with Block-vs-OK Status

Problem A query endpoint sometimes returns data and sometimes returns "we refused to answer for privacy reasons." Returning null data with a side-channel error loses information; throwing an exception breaks the "this is a successful HTTP request that happened to refuse" semantics. You want one typed response shape where status: "blocked" and status: "ok" carry different fields.

The Pattern

import { z } from "zod";

const AnonymitySchema = z.object({
  threshold: z.number(),
  respondentCount: z.number(),
  allowed: z.boolean(),
  reason: z.string().optional(),
});

// One response schema, two cases.
export const PreferenceWeightsResponseSchema = z.discriminatedUnion("status", [
  z.object({
    status: z.literal("blocked"),
    surveyId: z.string(),
    tenantId: z.string().nullable(),
    questionId: z.string().nullable(),
    anonymity: AnonymitySchema,
    // ...no weights field — TypeScript narrows it out.
  }),
  z.object({
    status: z.literal("ok"),
    surveyId: z.string(),
    tenantId: z.string().nullable(),
    surveyName: z.string(),
    questionId: z.string(),
    questionType: z.string(),
    method: z.string(),
    weights: z.array(z.object({ optionId: z.string(), weight: z.number() })),
    anonymity: AnonymitySchema,
    bySegment: z.array(z.object({
      segmentId: z.string(),
      respondentCount: z.number(),
      weights: z.array(z.object({ optionId: z.string(), weight: z.number() })),
      anonymity: AnonymitySchema,
    })).optional(),
  }),
]);
export type PreferenceWeightsResponse = z.infer<typeof PreferenceWeightsResponseSchema>;

// Caller side: TypeScript narrows by status.
function render(resp: PreferenceWeightsResponse) {
  if (resp.status === "blocked") {
    // resp.weights does not exist here: narrowing has removed it from the type.
    return renderBlockedNotice(resp.anonymity.reason);
  }
  // resp.weights is required here.
  return renderWeightsChart(resp.weights);
}

Design decisions

Discriminator field is status, a z.literal. The two cases are distinguished by an obvious string the caller switches on. Not a boolean (isOk: true) — strings are self-documenting in logs and devtools.
Block carries an anonymity object explaining the refusal. The caller can render a meaningful message ("threshold 5, got 3") without parsing strings.
OK carries the full payload; block carries the minimum. TypeScript narrows correctly — accessing weights on a blocked response is a compile error.
Both cases share fields where it makes sense. surveyId, tenantId, anonymity appear in both — they're stable identifiers that callers want regardless of status.
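Zod discriminatedUnion, not regular union. It reads the status value to pick the matching branch, so validation errors point at the one relevant case instead of listing failures for every member of a plain z.union. TypeScript narrowing on the inferred type is the same either way.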

Tradeoffs

Strengths	Weaknesses
Refusal-with-reason is a first-class response, not a 4xx	Adds API surface — every new status grows the union
TypeScript narrows the response correctly at the call site	Callers must narrow on status before reading OK-only fields; untyped consumers (plain JavaScript, other languages) get no compile-time reminder
Zod validates the union at the boundary, catching shape drift	Discriminated unions are slightly more verbose than `{ ok: boolean; data?: T; reason?: string }`
Adding a third status (e.g., "partial") is additive, not breaking for OK consumers	Schema documentation tools sometimes render discriminated unions poorly
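
For callers that want the compiler to flag a newly added status, an exhaustive switch is one option. The sketch uses a trimmed, hand-written stand-in for the inferred response type so it runs without Zod; assertNever and summarize are illustrative names.

```typescript
// Plain-TypeScript stand-in for the inferred response type (fields trimmed).
type Anonymity = {
  threshold: number;
  respondentCount: number;
  allowed: boolean;
  reason?: string;
};

type WeightsResponse =
  | { status: "blocked"; surveyId: string; anonymity: Anonymity }
  | {
      status: "ok";
      surveyId: string;
      anonymity: Anonymity;
      weights: { optionId: string; weight: number }[];
    };

function assertNever(value: never): never {
  throw new Error(`Unhandled status: ${JSON.stringify(value)}`);
}

function summarize(resp: WeightsResponse): string {
  switch (resp.status) {
    case "blocked":
      return `blocked: ${resp.anonymity.respondentCount} of ${resp.anonymity.threshold} required respondents`;
    case "ok":
      return `ok: ${resp.weights.length} weights`;
    default:
      // If a third status (say "partial") joins the union, this line stops
      // compiling until the new case is handled.
      return assertNever(resp);
  }
}
```

This trades some of the "additive, not breaking" property for safety: consumers that opt into the exhaustive check will get a compile error when the union grows, which is usually the point.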

Citations

src/spokes/preference-modeler/contracts/types.ts — PreferenceWeightsResponseSchema
src/spokes/preference-modeler/core/preference-aggregate.ts — emits both shapes from the same function

P19. Idempotent Bootstrap Migration from Bundled JSON

Problem You're shipping a service whose data layer needs to be populated before the service can answer queries (a SOC code registry, a glossary, a seed corpus). You don't want a separate migration step, you don't want the first request to find an empty table, and you want subsequent redeploys to skip the work without re-checking every row.

The Pattern

import postgres from "postgres";
import * as fs from "node:fs";
import * as path from "node:path";

export class JsonbRegistry {
  private readonly sql;
  private readonly table = "registry_rows";
  private readonly bundledRoot?: string;
  private inited = false;

  constructor(connectionString: string, opts: { bundledRoot?: string } = {}) {
    this.sql = postgres(connectionString, { prepare: false });
    this.bundledRoot = opts.bundledRoot;
  }

  async init(): Promise<void> {
    if (this.inited) return;

    // Idempotent DDL — every boot ensures the table exists.
    await this.sql.unsafe(`
      CREATE TABLE IF NOT EXISTS ${this.table} (
        id text NOT NULL,
        type text NOT NULL,
        payload jsonb NOT NULL,
        created_at timestamptz NOT NULL DEFAULT now(),
        PRIMARY KEY (type, id)
      );
    `);

    // One-shot bootstrap: only runs when the table is empty AND a bundle exists.
    // Subsequent boots see populated tables and skip this branch entirely.
    if (this.bundledRoot && (await this.isEmpty())) {
      await this.migrateBundled(this.bundledRoot);
    }
    this.inited = true;
  }

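  private async isEmpty(): Promise<boolean> {
    // EXISTS stops at the first row; COUNT(*) scans the whole table, and a
    // LIMIT on an aggregate does not shorten that scan.
    const [row] = await this.sql<{ has_rows: boolean }[]>`
      SELECT EXISTS (SELECT 1 FROM ${this.sql(this.table)}) AS has_rows
    `;
    return !row?.has_rows;
  }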

  private async migrateBundled(root: string): Promise<void> {
    if (!fs.existsSync(root)) return;
    for (const file of fs.readdirSync(root).filter((f) => f.endsWith(".json"))) {
      const parsed = JSON.parse(fs.readFileSync(path.join(root, file), "utf-8"));
      for (const r of parsed.rows ?? []) {
        await this.sql`
          INSERT INTO ${this.sql(this.table)} (type, id, payload)
          VALUES (${parsed.type}, ${r.id}, ${this.sql.json(r.payload)})
          ON CONFLICT (type, id) DO NOTHING
        `;
      }
    }
  }
}

Design decisions

CREATE TABLE IF NOT EXISTS every boot. Cheap, idempotent. No "is this a fresh schema?" check needed.
Empty-table check gates the bootstrap. First boot finds the table empty → migrates. Subsequent boots find rows → skip. No version table needed, no "was this already migrated?" flag.
ON CONFLICT (type, id) DO NOTHING as belt + suspenders. Even if two instances race the bootstrap, neither corrupts; both end with the same row set.
Bundle lives in the repo. JSON files in data/ are committed with the code. Deploys are self-contained — no separate seed step, no "did the seed run?" question.
Bootstrap is on-init, not on-first-request. The first request finds populated tables; users never see "warming up."
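
The interaction between the empty-table gate and ON CONFLICT DO NOTHING can be checked without a database. The sketch below models the table as a Map keyed by (type, id); FakeRegistryTable, seed, and bootstrap are hypothetical stand-ins, not the JsonbRegistry class itself.

```typescript
// In-memory stand-in for the registry table, keyed like its composite primary key.
type SeedRow = { type: string; id: string; payload: unknown };

class FakeRegistryTable {
  private rows = new Map<string, SeedRow>();

  isEmpty(): boolean {
    return this.rows.size === 0;
  }

  /** Mirrors INSERT ... ON CONFLICT (type, id) DO NOTHING; true if a row was written. */
  insertIfAbsent(row: SeedRow): boolean {
    const key = JSON.stringify([row.type, row.id]);
    if (this.rows.has(key)) return false;
    this.rows.set(key, row);
    return true;
  }

  size(): number {
    return this.rows.size;
  }
}

/** Mirrors migrateBundled(): returns the number of rows actually written. */
function seed(table: FakeRegistryTable, bundle: SeedRow[]): number {
  let written = 0;
  for (const row of bundle) {
    if (table.insertIfAbsent(row)) written += 1;
  }
  return written;
}

/** Mirrors init(): seed only when the table is empty. */
function bootstrap(table: FakeRegistryTable, bundle: SeedRow[]): number {
  if (!table.isEmpty()) return 0;
  return seed(table, bundle);
}
```

A first bootstrap writes the whole bundle and a second writes nothing. If two instances both pass the isEmpty check before either inserts, the second seed call still writes zero rows, which is the race the conflict clause covers.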

Tradeoffs

Strengths	Weaknesses
Deploys are self-contained; no separate migration step	Changing seed data after first deploy requires manual UPDATE — the bootstrap will skip
First request never sees empty tables	Bundle size adds to deploy artifact size
Idempotent under concurrent boots (ON CONFLICT)	Doesn't handle schema evolution — bootstrap only seeds, doesn't migrate existing rows
No version tracking required for v1	If a row is deleted in production, redeploy won't restore it (empty-table check only fires on fully empty table)

Citations

src/spokes/job-family-agent/core/service.ts — JSON-backed registry pattern (the SOC + family + function bundles)
src/spokes/job-family-agent/data/*.json — bundled seed files
See Principia P01 for a Postgres JSONB variant with explicit migration tracking

P20. Per-Module Lazy Import to Defer Side-Effectful Deps

Problem A module needs a function from another module — but importing it statically causes a side effect at module-load time. The classic case: importing a registry that transitively pulls in a DB client that throws when DATABASE_URL isn't set, which breaks every test that doesn't need the DB.

The Pattern

// catalog-only function — should work in tests without DB env vars.
export async function describeService(serviceSlug: string): Promise<ServiceDescription> {
  // Dynamic import keeps module evaluation light. The static-import alternative
  //   `import { buildCatalogText, buildValidNameSet } from "@/lib/catalog";`
  // would pull `@/lib/catalog` which pulls `@/lib/register-tools` which pulls
  // spoke-specific modules, some of which import `@/db/client` and throw when
  // DATABASE_URL is unset.
  const { buildCatalogText, buildValidNameSet } = await import("@/lib/catalog");
  const catalogText = buildCatalogText();
  const knownNames = buildValidNameSet();

  // ...rest of the function uses catalogText / knownNames
  return { slug: serviceSlug, knownToolCount: knownNames.size };
}

// Lazy import for storage providers too:
export async function getStorage(): Promise<StorageBackend> {
  // Default backend imports the DB client; test callers supplying their own
  // storage never trigger that import path.
  return (await import("@/lib/storage/default-backend")).defaultStorage;
}

// Test path: caller supplies a mock storage, bypassing the lazy import entirely.
export async function getThing(opts?: { storage?: StorageBackend }) {
  const storage = opts?.storage ?? (await import("@/lib/storage/default-backend")).defaultStorage;
  return storage.fetch();
}

Design decisions

await import() instead of import. Top-level static imports run once at module-load. Dynamic imports run on first call, and only when the call path actually exercises the branch.
Pair with caller-supplied optional storage. Tests can pass their own backend without ever triggering the default backend's import path. This is the killer feature — pure unit tests need no env vars at all.
Document why above the dynamic import. "Why isn't this a static import?" is a question every reader asks. A one-line comment ("would pull X which pulls Y which needs Z env var") saves the next reader fifteen minutes.

Cache the dynamic import at module level. Repeated calls don't re-import (Node caches the module), but if the function sits on a hot path, capture the value:

let cached: Catalog | null = null;
async function getCatalog() {
  if (cached) return cached;
  cached = (await import("@/lib/catalog")).getCatalog();
  return cached;
}
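
The snippet above caches the resolved value, so two callers that arrive before the first import resolves each run the load. Caching the in-flight promise instead shares a single load between them. A minimal sketch, where lazyOnce is a hypothetical helper and not something in the cited files:

```typescript
// Hypothetical helper: memoizes the in-flight promise rather than the resolved
// value, so concurrent first calls share one load.
function lazyOnce<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = load().catch((err: unknown) => {
        pending = undefined; // allow a retry after a failed load
        throw err;
      });
    }
    return pending;
  };
}

// Intended use with a dynamic import (module path taken from the example above):
//   const getCatalog = lazyOnce(async () => (await import("@/lib/catalog")).getCatalog());
```

Clearing the cached promise on failure is a design choice in this sketch; without it, one transient import error would be replayed to every later caller.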

Tradeoffs

Strengths	Weaknesses
Modules don't accidentally pull heavy deps into test environments	Dynamic imports lose static analysis (tree-shaking, dead-code elimination) for the lazy branch
Pairs cleanly with dependency injection (optional storage param)	Type narrowing through `await import()` is awkward — explicit casts sometimes needed
`await import()` cost is paid once per module via Node cache	A growing number of lazy imports signals deeper coupling that should be refactored
The comment-above pattern documents the constraint inline	First call is slower (import + parse); subsequent calls are cached

Citations

src/lib/intent-router/interpret.ts — await import("@/lib/intent-router/tool-catalog") to avoid pulling spoke-registration side effects
src/lib/connectors/refresh-policy.ts getPendingLegislationJurisdictions — await import("@/lib/connectors/refresh-policy-storage") so test callers with custom storage never touch the DB-backed default

How to use this catalog

These patterns are not a checklist. They're a reference for "I'm building X — has anything like this shipped here?" When you see a match, read the production-validated original, copy the structure, then specialize.

For cross-product matches — when a pattern here also lives in DevPlane, Vela, Performix, Namesake, Principia, or Fourth & Two — the See also lines at the bottom of each pattern's Citations name the equivalent in the other repo. The cross-product equivalents are the architectural convictions that have hardened across the portfolio; if the shape recurs in three independent codebases, the shape is probably right.

When you ship a new pattern in PA Toolbox that meets the bar at the top of EXTRACTION-SPEC.md, add it here with the next available PNN. Don't pad; mediocre patterns dilute the strong ones.