Many designs. One conviction. Feedback makes people — and AI — better. Data workflow is the spine.
A range of analytical products across coding, fantasy football, baby naming, figurative art, AI-augmented authorship, and enterprise people analytics, human performance included. They sound unrelated. They are all instances of the same loop — measurement in, reflection within, better decisions out — applied to whatever you care about.
Projects
DevPlane
A cockpit for multi-tool software development — assignment registry, two-phase actor handoff, coordination-event log. The operator-side measurement layer that AI coding tools' agent-side metrics miss.
The problem
AI coding tools' productivity claims rest on agent-side measurements — lines produced, tasks completed, time-to-PR. If the Ironies of Automation are operative — operator vigilance falling as agent reliability rises — those measurements systematically overstate net effect. There is no operator-side cockpit catching the loss.
What I built
A multi-agent kanban with a completion-block protocol that tracks per-card execution across heterogeneous AI tools. Continuous coordination-event log. Two-phase actor handoff (builder → reviewer) where the second transition requires an artifact only the reviewer can produce. Cross-tool sync via a hub SDK so an operator coordinates Cursor, Claude Code, Replit, and other agents through one board. The continuous production telemetry that runs the C1 risk-compensation field study — a pre-registered test of Bainbridge 1983 in real coding work, with hypotheses, analysis plan, and falsification criteria specified before data accumulates.
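A minimal sketch of what those two constraints look like in code, with hypothetical type and field names (the real DevPlane schema is private): a structured completion block closes phase one, and phase two requires a review verdict that only a non-builder actor can supply.

```ts
// Hypothetical sketch; type and field names are illustrative, not
// DevPlane's actual schema.
type Actor = { id: string; role: "builder" | "reviewer" };

// Structured, machine-readable completion: every assignment ends with
// one of these rather than a free-text close-out.
interface CompletionBlock {
  cardId: string;
  actorId: string;
  summary: string;
  artifacts: string[]; // paths or URLs produced by the work
  checksPassed: boolean;
}

// The artifact only a reviewer can produce: a verdict tied to a
// reviewed commit, from an identity distinct from the builder's.
interface ReviewVerdict extends CompletionBlock {
  verdict: "approve" | "request_changes";
  reviewedCommit: string;
}

type CardState = "in_progress" | "in_review" | "done";

function transition(
  state: CardState,
  card: { builderId: string },
  actor: Actor,
  block: CompletionBlock | ReviewVerdict
): CardState {
  switch (state) {
    case "in_progress":
      // Phase 1: the builder closes out with a completion block.
      if (actor.id !== card.builderId) throw new Error("only the builder closes phase 1");
      return "in_review";
    case "in_review":
      // Phase 2: requires a ReviewVerdict, and the builder cannot self-review.
      if (!("verdict" in block)) throw new Error("phase 2 requires a review verdict");
      if (actor.id === card.builderId) throw new Error("builder cannot self-review");
      return block.verdict === "approve" ? "done" : "in_progress";
    case "done":
      throw new Error("card is already closed");
  }
}
```

The point of the shape: review is enforced by the type of artifact the transition demands, not by trusting anyone to have done it.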
What's novel
01Two-phase actor handoff (builder → reviewer) where the second transition requires an artifact only the reviewer can produce — enforces review without trusting it
02Coordination-event log as a research instrument, not just an audit trail — the apparatus for the C1 risk-compensation field study
03Completion blocks as a protocol — every assignment ends with structured machine-readable completion, not free-text close-out
04Hub-and-spoke sync between heterogeneous AI tools so an operator coordinates Cursor + Claude Code + Replit + custom agents through one board
Outcome
Private. The operator-side coordination spine for the multi-app portfolio. Live, measuring, instrumented for the C1 field study.
DevPlane exists because the productivity claims being made for AI coding tools are largely grounded in agent-side measurements — and those measurements systematically miss what an operator running multiple agents actually has to do. The bet: build the operator's cockpit, instrument it, and run a pre-registered field study against the agents-on-tap-make-everyone-faster claim. Either the data validates the claim or it qualifies it; either way it is more honest than what the field has today.
Fourth & Two
A fantasy football platform with a magazine front and a strategy engine back — SI-style covers, Monte Carlo decisions, weekly LLM-generated newsroom stories.
The problem
Fantasy products are either marketplaces with shallow projections or hardcore stat tools that do not make a case. The middle — readable intelligence with a point of view — is empty.
What I built
A multi-app monorepo: a public newsroom (apps/web), a GM workflow console (gm-console), a strategy portal with cover-art game modes (apps/strategy), and a Python analytics API. Live MFL adapter, projections provider, Monte Carlo strategy engine, weekly LLM-generated stories under editorial discipline.
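The per-decision shape, sketched with a deliberately toy value model (the conversion, punt, and point values below are placeholders, not the engine's): simulate each option many times, average, and recommend the max.

```ts
// Toy per-decision Monte Carlo: simulate each option, average the value,
// recommend the max. All probabilities and point values are placeholders.
type Option = "go" | "punt" | "field_goal";

interface SimInputs {
  conversionProb: number; // P(convert) on fourth down, from a projections provider
  fgProb: number;         // P(field goal is good)
  trials: number;
}

// One simulated outcome, valued in expected-points terms (toy numbers).
function simulateOnce(option: Option, inp: SimInputs): number {
  switch (option) {
    case "go":
      return Math.random() < inp.conversionProb ? 3.5 : -1.5;
    case "field_goal":
      return Math.random() < inp.fgProb ? 3.0 : -2.0;
    case "punt":
      return 0.5; // field-position value, held constant in this toy model
  }
}

function evaluate(inp: SimInputs): { best: Option; ev: Record<Option, number> } {
  const options: Option[] = ["go", "punt", "field_goal"];
  const ev = {} as Record<Option, number>;
  for (const opt of options) {
    let total = 0;
    for (let i = 0; i < inp.trials; i++) total += simulateOnce(opt, inp);
    ev[opt] = total / inp.trials;
  }
  const best = options.reduce((a, b) => (ev[a] >= ev[b] ? a : b));
  return { best, ev };
}

// evaluate({ conversionProb: 0.48, fgProb: 0.71, trials: 10_000 })
// returns the EV of each call and the recommended one.
```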
What's novel
01Newsroom-as-product: weekly LLM-generated stories with editorial structure, briefs, and StoryNumbers
02Monte Carlo strategy engine running per-decision (fourth-down, lineup, waivers)
03Multi-app monorepo with formal adapter contract for swapping fantasy providers
04Magazine-grade cover art system for each game mode (the-pick, the-matchup, survivor, drive-duel, fourth-down-gambit)
Outcome
Private. Multi-app monorepo with engine, adapters, magazine, and analytics API in active development.
Fourth & Two is the bet that intelligence is more readable when it has a sensibility — a magazine voice on the front, a Monte Carlo engine on the back, and editorial discipline binding them. The platform exists because the middle — analytics with a point of view — was empty.
Namesake
Intentional baby naming, instrumented — live trend frequency, sibling-set acoustic checks, narrative association from a curated literary corpus.
The problem
Most baby-naming tools are SEO listicles. They cannot tell you whether a name is rising or fading this month, whether it sounds right next to your other kids, or what its narrative texture actually is.
What I built
A naming tool that pulls live trend data, scores phonetic and sibling fit, and surfaces narrative associations from a literary corpus. A calibration loop that learns parent preferences over the session.
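A toy illustration of what a sibling-set acoustic check can look like. The heuristics and weights below are invented for this sketch, not Namesake's actual scoring model.

```ts
// Toy sibling-set acoustic check; heuristics and weights are invented
// for this sketch, not Namesake's scoring model.
function onset(name: string): string {
  const m = name.toLowerCase().match(/^[^aeiou]+/); // leading consonant cluster
  return m ? m[0] : "";
}

function ending(name: string): string {
  return name.toLowerCase().slice(-2);
}

// Pairwise fit in [0, 1]: penalize shared onsets (confusable when called
// across a house) and shared endings (rhyming sets).
function pairFit(a: string, b: string): number {
  let score = 1.0;
  if (onset(a) !== "" && onset(a) === onset(b)) score -= 0.4; // "Liam" / "Luna"
  if (ending(a) === ending(b)) score -= 0.3;                  // "Ella" / "Stella"
  if (Math.abs(a.length - b.length) > 4) score -= 0.1;        // very uneven lengths
  return Math.max(0, score);
}

// Sibling-set score: mean pairwise fit of the candidate against each sibling.
function siblingFit(candidate: string, siblings: string[]): number {
  if (siblings.length === 0) return 1.0;
  return siblings.reduce((s, sib) => s + pairFit(candidate, sib), 0) / siblings.length;
}

// siblingFit("Stella", ["Ella", "Mae"]) is penalized for the rhyme with "Ella".
```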
What's novel
01Live trend frequency from external data sources, not stale annual rankings
02Sibling-set acoustic fit scoring: how a candidate sounds next to the sibling names it will live alongside
03Narrative association layer pulled from curated literary references
04Within-session preference calibration so the tool gets sharper as parents use it
Outcome
Private beta. Decision tool for one of the highest-stakes irreversible decisions parents make.
Naming a child is high-stakes and irreversible. Most tools treat it as entertainment. Namesake treats it as a decision — with live signal, acoustic structure, and literary depth — because that is what the decision actually warrants.
Vela
A contemplative platform that began with fine-art figurative work and has broadened into an adaptive-authorship substrate — magazine paced per-reader, Editorial Office of staff voices, Penwright in /labs, and three editorial axes (figurative response · emotion architecture · developmental theology).
The problem
Image platforms either flatten taste into engagement metrics or hide behind gatekeepers. Editorial platforms publish on calendars rather than to readers. Neither produces a reading rhythm. Neither learns from you. Vela is built on the bet that a single substrate can do both — taste-driven figurative discovery and longform editorial work — when adaptive measurement runs underneath.
What I built
385 active works pulled from museum APIs (ARTIC, Met, BnF, Smithsonian, Europeana) under full attribution and license discipline. The Reincarnation engine learns per-reader desire and pool composition across visual rhyme and emotional register. A magazine with original fiction, editorial criticism, and three load-bearing editorial axes — figurative response, emotion architecture, developmental theology — paced per-user (each reader's magazine begins when they arrive, not on a calendar). Penwright lives in /labs/penwright (F-03 Authorship Packet UI MVP shipped; F-19 Adaptive Authorship Control Kernel is the spine). Editorial Office: Writer's Desk for 1:1 with each writer; The Office for multi-writer convening with round-2 react. Stripe membership in live mode. Derivative pipeline that produces new transformative works under license.
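A minimal sketch of the per-reader pacing idea, assuming a hypothetical Reader shape and a weekly cadence: the issue a reader sees is a function of their own join date, not a shared publication calendar.

```ts
// Per-reader pacing sketch: every reader starts at issue 0 the day they
// arrive. The Reader shape and weekly cadence are illustrative.
const ISSUE_CADENCE_DAYS = 7;

interface Reader {
  id: string;
  joinedAt: Date;
}

// Which issue this reader should see right now: a function of their own
// clock, not a shared publication calendar.
function currentIssueIndex(reader: Reader, now: Date = new Date()): number {
  const days = (now.getTime() - reader.joinedAt.getTime()) / 86_400_000;
  return Math.max(0, Math.floor(days / ISSUE_CADENCE_DAYS));
}

// A reader who joined 30 days ago is on issue 4; today's signup is on
// issue 0. Two readers on the same day can be reading different issues.
```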
What's novel
01Reincarnation engine: per-user desire scoring with RID/SID adaptive measurement and visual-rhyme sequencing
02Per-user magazine pacing — each reader's editorial schedule begins when they arrive; positioning wedge is "your magazine begins when you do"
03Three editorial axes (figurative response · emotion architecture · developmental theology) coexist on a single substrate; emotion is now the core editorial axis (since the 2026-04-30 pivot)
04Editorial Office — Writer's Desk + multi-writer convening turns the writer roster from production tool into colleagues
05Adaptive-authorship substrate underneath — Vela is property #1; siblings reuse the substrate
06Museum-grade attribution and license discipline as a first-class feature, not a footnote
Outcome
Live at vela.study. Stripe membership in live mode. Magazine publishing weekly. ~1,300 commits since project start. Penwright in early build inside the same repo; Editorial Office and per-user magazine pacing in active development.
Vela began as a bet that taste compounds when given a substrate. The substrate is the asymmetry: AI holds the survived corpus, humans hold the unsurvived response. Vela is the place where those two meet — careful sourcing on one side, calibrated human signal on the other, and a magazine for the language in between. The bet has broadened: the same substrate now hosts Penwright (authorship system in /labs), the Editorial Office (writer collaboration), per-user magazine pacing, and three editorial axes that coexist without collapsing. It is also the reference implementation for an adaptive-authorship platform that future siblings will sit on top of.
Penwright
An AI-augmented authorship system — corpus control, packet-shaped composition, and a measurement framework that asks whether the writer is better with it than without it in six months.
The problem
Most AI writing tools optimize for output fluency. They make it easier to produce something faster — and that something is often shaped by the model rather than the writer. The longer-term cost (capability erosion, voice flattening, sycophancy spirals, source attribution buried) is barely measured because the field measures what is easy to measure. The result is a generation of tools that look like assistants and act like substitutes.
What I built
An authorship environment that inverts the prompt-then-edit pattern. Writers assemble Authorship Packets — intent · structure · key ideas · relevant passages · counterpositions — before the AI is invoked. Corpus selection is explicit: writers choose which sources influence the work rather than inheriting the model's training distribution. The Adaptive Authorship Control Kernel (F-19) is the spine — central registry of skill measurement, intervention, and genre-aware behavior (memoir / nonfiction / fiction never collapsed). The Penwright Measurement Framework — six skill dimensions, six derived indices, three measurement layers, five-step learning loop, and four non-negotiable failure modes — determines whether a session made the writer better. Lives inside Vela's repo (app/labs/penwright/) for now; graduates when the design stabilizes.
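A sketch of the packet as a typed input unit, written with Zod since the portfolio already leans on Zod contracts elsewhere. The five fields come from the packet model; this particular schema shape is illustrative.

```ts
import { z } from "zod";

// Authorship Packet as a typed input unit. The five fields come from the
// packet model; the exact schema shape here is illustrative.
const PassageRef = z.object({
  sourceId: z.string(), // must point into the writer-selected corpus
  excerpt: z.string(),
});

export const AuthorshipPacket = z.object({
  intent: z.string().min(1),             // what the piece is trying to do
  structure: z.array(z.string()).min(1), // ordered structural moves
  keyIdeas: z.array(z.string()).min(1),
  relevantPassages: z.array(PassageRef), // corpus control: sources are explicit
  counterpositions: z.array(z.string()), // positions the piece must answer
  genre: z.enum(["memoir", "nonfiction", "fiction"]), // never collapsed
});

export type AuthorshipPacket = z.infer<typeof AuthorshipPacket>;

// The model is invoked only after a packet parses; the structure itself
// becomes data the measurement framework can learn from.
```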
What's novel
01Authorship Packet Model — replaces freeform prompting with structured input units; the structure itself is data
02Corpus Control Layer — writer selects sources rather than inheriting the LLM's training distribution
03Adaptive Authorship Control Kernel (F-19) — central registry of measurement and intervention; genre-aware behavior forks copy + schema enums + prompts + metrics rather than collapsing them
04Penwright Measurement Framework — first multi-dimensional measurement system for AI-augmented writing skill development; four non-negotiable failure modes (output-only optimization · over-automation · weak measurement · ignoring genre differences) act as veto
05Anti-invention constraint — when a structural rhetorical move requires biographical material the user has not supplied, the tool refuses to render rather than confabulating
06Has its own published research program at peopleanalyst.com/research/ai-human-interaction (12-paper Penwright Research Program across three tiers)
Outcome
Early build inside Vela's repo (app/labs/penwright). F-03 (Authorship Packet UI MVP) shipped. F-19 (Adaptive Authorship Control Kernel) is the architectural spine; it ships first or in parallel with the first feature. 19 features (F-01..F-19) sequenced across 6 implementation waves; 79 ASNs in flight.
Penwright exists because the field of AI writing is being measured by output and not by capability. The longitudinal test — a better writer with Penwright than without it in six months — is unfashionable but load-bearing. The alternative bet — better outputs faster, optimization toward fluency — is the bet most of the field has already taken. Penwright is the bet on the other side: that writers can become more capable inside an AI-augmented environment, and that this can be measured rigorously enough to fail on its own terms. Seven non-negotiable rules in §7 of the vision doc act as the spine for every product decision (don't build generic AI writing features · don't collapse genre distinctions · don't hide source attribution · don't flatten emotional nuance · don't optimize for speed over authorship · don't make AI compliant · don't over-moralize).
Performix
Performance review is the densest behavioral data in HR — rater bias, calibration politics, distribution shape, motivational consequences of every rating call. Performix treats it that way. First vertical port of the library-core pattern outside Vela.
The problem
Most products treat performance review as a transaction record (HRIS posture) or a dashboard (BI posture). Neither matches the posture the data warrants — comparative, longitudinal, distribution-aware, small-N-honest, decision-support-shaped, behavioral. And the data carries plenty: rater bias (halo, recency, leniency, severity), distribution politics (forced ranking, ceiling compression, sandbagging-under-cap), multi-rater agreement, longitudinal career-arc shape, the motivational consequences of every rating call. Among the most poorly-instrumented data in any organization.
What I built
Early build. Two simultaneous bets. The analytical bet: a behaviorally-grounded posture — rater dynamics treated as signal not noise, multi-rater agreement (ICC, kappa) surfaced not averaged away, distribution comparison across managers and units, longitudinal career-arc shape under small-N inference (most managers rate <10 people) — produces insights neither HRIS-posture products nor BI-posture dashboards can. The architectural bet: Performix is the first vertical port of the library-core multi-property pattern outside Vela — testing whether the dual-grade research-ingest, adaptive measurement, and canonical-vocabulary discipline host a non-editorial vertical cleanly. Substance arrives as the research-ingest extraction lands in MetaFactory's packages/research-ingest/ and Performix becomes its first consumer.
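For concreteness, here is one of the agreement statistics named above, Cohen's kappa for two raters, as a self-contained sketch:

```ts
// Cohen's kappa for two raters over the same people: one of the
// agreement diagnostics named above. Inputs are parallel rating arrays.
function cohensKappa(raterA: string[], raterB: string[]): number {
  if (raterA.length !== raterB.length || raterA.length === 0) {
    throw new Error("need parallel, non-empty rating arrays");
  }
  const n = raterA.length;
  const categories = Array.from(new Set([...raterA, ...raterB]));

  // Observed agreement: proportion of exact matches.
  let matches = 0;
  for (let i = 0; i < n; i++) if (raterA[i] === raterB[i]) matches++;
  const po = matches / n;

  // Chance agreement from each rater's marginal distribution.
  let pe = 0;
  for (const c of categories) {
    const pA = raterA.filter((r) => r === c).length / n;
    const pB = raterB.filter((r) => r === c).length / n;
    pe += pA * pB;
  }
  if (pe === 1) return 1; // degenerate case: both raters use one category
  return (po - pe) / (1 - pe); // 1 = perfect, 0 = chance-level, < 0 = worse
}

// cohensKappa(["exceeds", "meets", "meets"], ["exceeds", "meets", "below"]) === 0.5:
// two of three exact matches, corrected for what the marginals predict by chance.
```

Surfacing the statistic, rather than averaging the two raters, is the whole posture: agreement is the signal.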
What's novel
01Behavioral posture on performance data — rater dynamics, calibration politics, and distribution shape treated as the signal, not the noise to be averaged away
02Multi-rater statistics surfaced where most products show only the average — ICC and rater-agreement diagnostics make compression and rater-effect bias legible
03Longitudinal career-arc analysis under small-N constraints — most managers rate <10 people; the stats discipline has to match
04First non-Vela consumer of the dual-grade research-ingest pipeline — a real port test of the library-core pattern
05If the substrate ports cleanly to a vertical that shares no editorial subject matter, the cross-property architecture is real; if it doesn't, the substrate is Vela-specific and the multi-property bet has to be re-thought
Outcome
Early build. Repo seeded; substance gated on the research-ingest extraction landing in MetaFactory.
Performix is two bets at once. The analytical bet: performance review data is behavioral (rater dynamics, distribution politics, motivational consequences of every rating call) and statistically tractable under the right discipline (multi-rater agreement, distribution comparison, longitudinal shape under small-N) — and the field today has neither posture. The architectural bet: the library-core substrate from Vela — adaptive measurement, dual-grade research-ingest, canonical vocabulary — ports into a non-editorial vertical, or it doesn't. If both land, you have a behaviorally-honest performance product on a portable substrate. If either fails, the failure is informative.
MetaFactory
Production-factory monorepo for the portfolio — competency models, personas, survey instruments, job profiles, pay analyses, all extracted from a dual-grade research corpus with cryptographic provenance. Internal substrate; every artifact it ships is human-facing.
The problem
Cross-cutting infrastructure — book ingestion at chapter-respecting fidelity, schema-conformant extraction of behavioral constructs, job/competency/persona generation, survey factories — gets re-implemented per consumer when there's no production-factory substrate. The cost isn't only engineering: each consumer ends up with its own slightly-drifted definition of competency, persona, engagement, and effect-size, so behavioral constructs that should be comparable across products become incomparable. The half-renovated state is worse than either clean decision would have been: a 'Universal Information Factory' framing that tried to do too much, the resulting analytics-vs-production drift, and a multi-hundred-file midden of cleanup docs in the repo root documenting an effort that didn't land. The substrate has to be narrowed and its purpose made legible in 30 seconds.
What I built
A monorepo of production factories — book ingestion (collector / organizer / referee with statistical-quality referee gates); job-matching from extracted role-DNA; competency models extracted from organizational-psychology canon; persona factory; survey-instrument factory with item-bank-honest construct mapping; pay; business-plan; variableizer — plus core infrastructure (asset registry, storage resolver, schemas, prompts, deep-research agent). Currently undergoing the consolidation drive (ASN-1003): KEEP/DROP rule fully executed (production factories KEEP; analytics offerings DROP); the dual-grade research-ingest pipeline migrated in from Vela; the cleanup-doc midden archived; CLAUDE.md / AGENTS.md / README rewritten to actual current state.
What's novel
01Dual-grade corpus ingestion — same database holds editorially-selected curator passages and bulk research chunks, distinguished by tag
02Cryptographic provenance contract — SHA-256 tracked for every source file; safe-delete invariants require hash verification on backup and durable storage before any local delete (sketched after this list)
03~$0.13 per research-run synthesis at 30K+ passage scale — most 'AI research' tools run 10–100× more expensive because they retrieve without pattern extraction. The discipline is statistical: extract patterns once, cite the evidence, don't re-retrieve
04Schema-extracted measurement vocabulary (@measurement/core) shared across consumers — constructs, items, instruments, effect-sizes defined once in canonical form, so behavioral measurement compares cleanly across Performix / Principia / PA Platform / the toolbox. Cross-product comparison becomes structurally possible rather than aspirational
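A sketch of the safe-delete invariant under stated assumptions: fetchBackupHash and fetchDurableHash are hypothetical stand-ins for the real storage resolvers; the hashing is standard Node crypto.

```ts
import { createHash } from "node:crypto";
import { readFile, unlink } from "node:fs/promises";

// Safe-delete invariant sketch: a local file may be removed only after
// both backup and durable storage verify against the SHA-256 the
// registry recorded at ingest. The resolver callbacks are hypothetical.
async function sha256(path: string): Promise<string> {
  return createHash("sha256").update(await readFile(path)).digest("hex");
}

async function safeDelete(
  path: string,
  registryHash: string,
  fetchBackupHash: (p: string) => Promise<string>,
  fetchDurableHash: (p: string) => Promise<string>
): Promise<void> {
  const local = await sha256(path);
  if (local !== registryHash) throw new Error("local file drifted from registry hash");

  const [backup, durable] = await Promise.all([
    fetchBackupHash(path),
    fetchDurableHash(path),
  ]);
  if (backup !== registryHash || durable !== registryHash) {
    throw new Error("refusing delete: replicas do not verify against registry hash");
  }

  await unlink(path); // every copy verified; the local delete is now safe
}
```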
Outcome
Private. Five-track consolidation in flight (audit · pruning · production-factory finalization · research-ingest migration · consumer rewiring). Eight sub-ASNs (ASN-1004..ASN-1011) under the parent (ASN-1003).
MetaFactory exists because cross-portfolio infrastructure can't be re-built per consumer and can't sit half-renovated forever. The narrowing decision is made — production over analytics — and what's outstanding is executing the cuts, finalizing what remains, and lifting Vela's research-ingest pipeline back in (because Vela built dual-grade ingestion when MetaFactory's API-based approach lost fidelity, and that pipeline now belongs in the substrate). The interesting bet is whether a focused production-factory substrate compounds across consumers fast enough to justify the consolidation cost. Underneath the systems argument is a measurement argument: every artifact a factory ships becomes an input to a human decision somewhere — a competency model becomes a development plan, a persona shapes a product call, a survey instrument runs at a client. The substrate has to be narrowed not just for engineering legibility, but because the artifacts go to people who need them defensibly grounded.
People Analytics Toolbox
Seven independently-versioned analytical microservices for people analytics — psychometric diagnostics, preference modeling, privacy primitives, segmentation, statistical enrichment, compensation logic, decision forecasting — deployed as a single Next.js application and exposed over two transports: HTTP for engineers, MCP for AI agents. One Vercel project, one Supabase project. The behavioral and statistical substrate consumer apps compose against.
The problem
HR analytics products treat behavioral data as if it were transactional. Engagement gets reduced to a survey score; performance to a rating average; retention to a churn rate; compensation to a band. The richer questions — what actually drives engagement in this organization, what signal in performance distributions matters for decisions, what statistical posture handles small-N segmentations honestly, what value would more information have before another study runs — get lost. Cross-cutting concerns (anonymization, metric calculation, segmentation, survey delivery, decision support) also get re-implemented per product, brittle and fragmented. The combination — wrong analytical posture and fragmented infrastructure — is the field's default state.
What I built
Seven live spokes — reincarnation (adaptive psychometric diagnostic engine; IRT-weighted item selection; pool-based item lifecycle), preference-modeler (Likert/multi-choice/free-text plus MaxDiff, conjoint, penny allocation, paired comparison; BIBD-balanced task generation; MNL utility estimation via Newton-Raphson), data-anonymizer (PII detection, deterministic HMAC tokenization, k-anonymity min-N gate, substitution-strategy registry), segmentation-studio (HRIS canonical-field normalization with 35-field priority catalog; multi-membership cohort resolution; OneModel adapter; recipes; pack publishing), calculus (statistical enrichment, anomaly detection, time-series imputation, metric × segment × period combinatorial factory; auto-selects Wilson / t-interval / normal CI), anycomp (comp models, market band math, stateless evaluation, auditable cycle runs), forecasting (Monte Carlo simulation, EVPI, discrete EVSI on aligned-chance decision trees) — plus a reserved namespace job-family-agent whose canonical home is meta-factory-prod. Each spoke owns its own schema, contract, and audit trail. 51 MCP tools, 48 HTTP routes, hand-authored tool descriptions with whenToUse / whenNotToUse disambiguation. Per-route structured logs plus per-tool audit rows in mcp.mcp_audit. Consumer apps vendor the typed contracts; the algorithms live here.
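The Wilson interval itself is textbook; the routing rule below is a toy stand-in for the auto-selection, with thresholds that are assumptions rather than calculus internals.

```ts
// Wilson score interval for a proportion: the small-N-honest choice when
// a metric is a rate. z = 1.96 gives a 95% interval.
function wilsonInterval(successes: number, n: number, z = 1.96): [number, number] {
  const p = successes / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const half = (z / denom) * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n));
  return [Math.max(0, center - half), Math.min(1, center + half)];
}

// Toy routing rule; the real selection logic is calculus-internal and
// the n < 30 threshold here is an assumption.
function selectCI(kind: "proportion" | "mean", n: number): "wilson" | "t-interval" | "normal" {
  if (kind === "proportion") return "wilson";
  return n < 30 ? "t-interval" : "normal";
}

// wilsonInterval(4, 12) is roughly [0.14, 0.61]: wide, as a 12-person
// segment deserves; a plain normal approximation would understate it.
```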
What's novel
01Substrate, not product. Consumer apps (Performix, Vela, future analytical products) vendor only the typed Zod contracts; the algorithms stay in the toolbox. Adoption is spoke-by-spoke; no all-or-nothing migration.
02AI-native by construction. Every spoke is callable from AI agents over MCP (Model Context Protocol) without bespoke integration. Per-consumer auth, scope-restricted keys, fire-and-forget audit log. Performix migrated to MCP transport 2026-05-11 as the first external consumer; DevPlane operates the wildcard key.
03Behavioral science in the algorithms, not bolted on. Reincarnation runs IRT a-parameter-weighted adaptive item selection with Cronbach α tracking. Preference-modeler runs MNL utility estimation on real MaxDiff/conjoint designs. Calculus auto-selects the right confidence-interval method by data shape. These are textbook psychometrics and choice theory implemented as service APIs.
04Privacy is a service, not a setting. Data-anonymizer is cross-cutting — every spoke that surfaces team-level rollups calls min-N-check before responding. Anonymity-gated aggregations return blocked status below the floor; tokenization is deterministic and cache-backed; substitution strategies (mask / pseudonymize / synthetic-realistic) live in a registry. A sketch of the min-N gate follows this list.
05Systems × survey × behavioral-science join is first-class. Segmentation-studio normalizes HRIS canonical fields; data-anonymizer makes the join safe under k-anonymity; calculus enriches the joined records into MetricEnvelope objects. The same envelope shape carries data from a Workday extract, a survey response, or a derived rollup — consumers don't care which.
06Explicit contract versioning. Every spoke ships CONTRACT_VERSION; every additive change is a semver bump; every breaking change is a major bump with affected-consumer notes. Consumers vendor a copy and re-vendor on major bumps. The deploy boundary is clean.
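A sketch of the gate's shape, with illustrative types rather than the data-anonymizer contract: below the floor, the consumer receives a typed blocked result, never a number.

```ts
// Anonymity-gated aggregation sketch: below the k floor, the rollup
// returns a blocked status instead of a value. Types are illustrative,
// not the data-anonymizer contract.
const MIN_N = 5; // hypothetical floor; the real value is policy-configured

type Rollup =
  | { status: "ok"; n: number; value: number }
  | { status: "blocked"; n: number; reason: "below_min_n" };

function teamRollup(values: number[]): Rollup {
  const n = values.length;
  if (n < MIN_N) return { status: "blocked", n, reason: "below_min_n" };
  const mean = values.reduce((a, b) => a + b, 0) / n;
  return { status: "ok", n, value: mean };
}

// teamRollup([3.2, 4.1, 3.8]) yields { status: "blocked", n: 3, ... }.
// Consumers must handle "blocked" in the type; there is no numeric
// fallback to leak a small team's scores through.
```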
Outcome
Seven live spokes plus one reserved namespace. 51 MCP tools (3 toolbox-discovery + 48 spoke); 48 HTTP routes. All health endpoints green; mcp.mcp_audit writing real rows; database migrations auto-run on Vercel production builds (DP-134). First external consumer (Performix) migrated to MCP transport 2026-05-11. Solo build.
The toolbox exists because every HR analytics product I worked with kept re-implementing the same five things — anonymization, metric definitions, segmentation, surveys, decision support — and getting each one slightly wrong. Building them once, well, and letting verticals consume them is the architectural bet. Two things changed in the last twelve months that sharpened the bet. The first was the narrowing decision: an earlier roster of broader spokes was cut down to seven that actually pull weight at production scale. The second was MCP. Once the algorithms can be called directly from AI agents — typed, scoped, audited — the toolbox is no longer a back-end someone else's UI sits on top of; it is the legible service substrate that both engineering teams and AI consumers compose against. Underneath all of that the original measurement bet is unchanged: HR analytics works only when behavioral science and statistical rigor are first-class. Constructs defined defensibly. Anonymity thresholds enforced in the contract, not in a settings page. Decisions getting value-of-information treatment rather than dashboard intuition. Small-N segmentations handled honestly rather than averaged into uselessness. The architecture is what makes one operator productive at the scale of a software company.