Glass Ox (tested, transparent data coding)

The standard for every step of data work in the toolbox: tested, transparent, fail-loud data coding you can see through, in the manner of an AI-native Alteryx. Every step is visible, every assumption is asserted, and a step that violates an assertion halts with a stated reason instead of silently defaulting. It consists of a primitive engine plus a spoke that persists a durable, queryable run history.

Data · origin: people-analyst · source: people-analyst/devplane/docs/CAPABILITIES/glass-ox.md

Type: data
Origin repo(s): people-analyst (the People Analytics Toolbox); the Glass Ox primitive (src/lib/glass-ox/) plus the glass-ox spoke that persists its run history
Extraction readiness: live. The primitive is the authoritative engine; the spoke adds durable run history, a read API, a service-key-gated run endpoint, and an MCP surface
Depends on: the data-profiling, plan-running, provenance, and mapping-cascade primitives it unifies
Last reviewed: 2026-06-08

What it is

The standard for every step of data work in the toolbox: tested, transparent, fail-loud data coding you can see through. Think of it as an AI-native Alteryx — the data-coding workhorse, except every transformation step is visible, every assumption is asserted, and a step that violates an assertion halts with a stated reason instead of silently defaulting.

The name is the promise: the ox is the workhorse that does the heavy data lifting; the glass is that you can see inside every step it takes. Where a normal pipeline quietly fills a missing value or lets a bad coding slip through, Glass Ox surfaces it — and where it cannot proceed honestly, it quarantines the offending rows with a reason rather than producing a clean-looking wrong answer.
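To make "quarantine with a reason" concrete, here is a minimal Python sketch. It is not the Glass Ox implementation: the `LEVEL_CODES` table, the `CodingResult` type, and the `code_job_levels` function are hypothetical names invented for illustration. It shows only the shape of the idea, in which a row that cannot be coded is set aside with an explanation instead of receiving a default value.

```python
from dataclasses import dataclass, field

# Hypothetical coding table for illustration; not taken from Glass Ox itself.
LEVEL_CODES = {"analyst": 1, "senior analyst": 2, "manager": 3}


@dataclass
class CodingResult:
    coded: list = field(default_factory=list)        # rows that were coded
    quarantined: list = field(default_factory=list)  # (row, reason) pairs set aside


def code_job_levels(rows):
    """Code job_title into a numeric level; quarantine rows that cannot be coded."""
    result = CodingResult()
    for row in rows:
        title = (row.get("job_title") or "").strip().lower()
        if not title:
            result.quarantined.append((row, "job_title is missing"))
        elif title not in LEVEL_CODES:
            result.quarantined.append((row, f"job_title {title!r} has no level code"))
        else:
            # Only rows with a known title receive a job_level; nothing is defaulted.
            result.coded.append({**row, "job_level": LEVEL_CODES[title]})
    return result


rows = [
    {"id": 1, "job_title": "Analyst"},
    {"id": 2, "job_title": None},
    {"id": 3, "job_title": "Wizard"},
]
result = code_job_levels(rows)
for row, reason in result.quarantined:
    print(row["id"], reason)
```

In this sketch the two problem rows never reach the coded output, and each carries the reason it was held back, so a reviewer can decide whether to fix the source data or extend the coding table.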

Who it's for

The data professional who has been burned by a silent-default bug — the join that quietly dropped half the rows, the column that collapsed to one dominant value and poisoned every model downstream — and now wants every coding step they ship to be visible, asserted, and fail-loud. It is for the team that has to trust the inputs before they trust the analysis: the comp pipeline coding pay levels, the segmentation run normalizing HRIS fields, the analyst preparing a dataset for a fairness check. The concrete outcome is a run report that shows each transformation, the assertions that had to hold, and — when one fails — a halt with a stated reason plus quarantined rows, instead of a clean-looking wrong number. It is positioned as the standard for data work in the toolbox, not an optional linter: an AI-native Alteryx where the work is legible because you can see and check it, not because you were told to trust it.

How it works

A run is a sequence of coded steps. Each step runs through an assertion catalog — the checks that have to hold for the result to be trustworthy. A canonical example: a concentration assert that catches a column that has silently collapsed to one dominant value (the kind of bug that turned a real job-level field into 84% one level — garbage that would have poisoned every downstream model). When an assertion fails, the run halts loudly and the report says exactly what failed and why.
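A concentration assert of the kind described above might look like the following Python sketch. The function name `assert_concentration`, the `AssertionFailed` exception, and the 80% default threshold are all assumptions made for illustration; the actual assertion catalog and its thresholds are not specified in this document.

```python
from collections import Counter


class AssertionFailed(Exception):
    """Raised to halt a run; the message is the stated reason."""


def assert_concentration(values, column, max_share=0.8):
    """Halt if a single value accounts for more than max_share of the column."""
    if not values:
        raise AssertionFailed(f"{column}: column is empty, concentration cannot be checked")
    top_value, top_count = Counter(values).most_common(1)[0]
    share = top_count / len(values)
    if share > max_share:
        raise AssertionFailed(
            f"{column}: value {top_value!r} makes up {share:.0%} of rows "
            f"(limit {max_share:.0%})"
        )
    return share


# A job-level column that has collapsed: 84 of 100 rows carry one level.
job_levels = ["L3"] * 84 + ["L1"] * 6 + ["L2"] * 5 + ["L4"] * 5
try:
    assert_concentration(job_levels, "job_level")
except AssertionFailed as failure:
    print(f"RUN HALTED: {failure}")
```

Raising an exception rather than returning a flag is a deliberate choice in this sketch: a caller cannot ignore the failure by accident, which matches the fail-loud intent.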

The primitive is the authoritative engine — it runs the steps, applies the assertions, builds the run report, and renders the steps for a data-lens view. The spoke that sits beside it owns the durable side: a Postgres history of plans, runs, steps, assertions, and quarantines, a versioned wire contract, read APIs to query past runs, a gated endpoint to execute a registered plan and persist it, and an MCP surface so an agent can run and inspect coding work the same way a human can.
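The split between engine and history can be sketched as follows. This is an illustrative Python outline under stated assumptions, not the real primitive or spoke: `Step`, `RunReport`, `run_plan`, and the `history_sink` callback are hypothetical names, and an in-memory list stands in for the Postgres history. The point it illustrates is that the engine builds a complete run report on its own, and persistence is an optional hook passed in from outside.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    transform: Callable[[list], list]               # rows in, rows out
    assertions: list = field(default_factory=list)  # (label, check(before, after)) pairs


@dataclass
class RunReport:
    steps: list = field(default_factory=list)  # one entry per executed step
    halted: bool = False
    reason: Optional[str] = None


def run_plan(rows, plan, history_sink=None):
    """Run steps in order; halt at the first failed assertion with a stated reason."""
    report = RunReport()
    for step in plan:
        before = rows
        after = step.transform(before)
        entry = {"step": step.name, "rows_in": len(before), "rows_out": len(after), "failed": None}
        report.steps.append(entry)
        for label, check in step.assertions:
            if not check(before, after):
                entry["failed"] = label
                report.halted = True
                report.reason = f"step {step.name!r} failed assertion {label!r}"
                break
        if report.halted:
            break
        rows = after
    # Persistence is optional: the engine behaves the same with no sink at all.
    if history_sink is not None:
        history_sink(report)
    return report


plan = [
    Step("drop_blank_ids", lambda rows: [r for r in rows if r.get("id")]),
    Step(
        "keep_active",
        lambda rows: [r for r in rows if r.get("active")],
        assertions=[
            ("keeps at least half of its input rows",
             lambda before, after: len(after) * 2 >= len(before)),
        ],
    ),
]
rows = [
    {"id": 1, "active": True},
    {"id": 2, "active": False},
    {"id": 3, "active": False},
    {"id": None, "active": True},
]
history = []  # stand-in for a durable store
report = run_plan(rows, plan, history_sink=history.append)
print(report.halted, report.reason)
```

Calling `run_plan(rows, plan)` with no sink produces the same report, which is the property the design relies on: the engine never depends on the persistence layer.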

Why it is shaped this way

  • Fail loud, never silent-default. The whole point is to catch the silent-default bug class — the column that quietly became one value, the join that quietly dropped half its rows. A halt with a reason beats a clean number that is wrong.
  • Every step visible and tested. A data lens shows each transformation; the assertion catalog tests each one. Trust comes from being able to see and check the work, not from being told to trust it.
  • Quarantine with a reason. Rows that cannot be coded honestly are set aside with an explanation, not forced through.
  • Primitive owns the engine; the spoke owns the history. The engine never depends on the persistence layer, so the standard travels even where the run history does not.

Related capabilities

  • CSV ingestion + column detection: the front door whose output Glass Ox codes.
  • AI field mapping → canonical schema: a coding step Glass Ox makes visible and testable.
  • Statistical analysis engine: the trust-the-numbers layer that assumes clean, tested inputs underneath.