Multi-faceted vs. dual-grade ingestion
Methodological note on the Vela ingestion architecture — when to favor the dual-grade pattern, when multi-faceted extraction wins.
Multi-faceted vs ingest:dual — comparison memo
ASN-689 acceptance artifact. Updated 2026-04-29 with empirical probe data from the path-B kickoff. Full-book live runs still open.
Why we run both
ingest:dual and the multi-faceted pipeline serve different consumers.
They are not substitutes; this memo is the running record of what each
produces on the same source.
ingest:dual | multi-faceted | |
|---|---|---|
| Consumer | Mosaic editorial · retrieval · audits | /sources/[code] profile · factory corpus surfaces |
| Output shape | Curator passages (~15) + research-bulk (~50–150) | Models / Constructs / Instruments / Factor model / Chapter summaries / Case studies (+opt-in Propositions / Supports) |
| Granularity | Passage chunks (~500 words) | Typed entities, per chapter |
| Per-book cost (sonnet 4.6, default tier, with chapter caching) | ~$0.30 (embed + chunk LLMs are cheap) | $4.30–$7.54 (12-chapter book) |
| Per-book cost (deep tier, +propositions) | n/a | $6.46–$12.34 |
| Fail mode if absent | Retrieval against the source returns unrelated high-similarity hits | /sources/[code] panel is empty for the source |
Probe data — Maslow, A Theory of Human Motivation (60KB chunk, 2026-04-29)
Path-B validation runs on a single representative chunk.
Probe 1: entities bundle vs three singles control (untuned)
| Bundle | Singles | |
|---|---|---|
| Models | 1 | 1 |
| Constructs | 10 | 8 |
| Instruments | 0 | 0 |
| Cost | $0.115 | $0.252 |
| Stop | clean | clean |
54.1% savings; comparable counts; bundle's first-model fields slightly
shallower than singles (e.g. key_meta_analytic_correlates: [] vs 5
entries; aggregation_rules shorter).
Probe 2: entities bundle (tuned with FIELD COMPLETENESS) vs singles
| Bundle | Singles | |
|---|---|---|
| Models | 1 | 1 |
| Constructs | 9 | 9 |
| Instruments | 0 | 0 |
| Cost | $0.121 | $0.256 |
| Stop | clean | clean (1 parse error on instruments singleton) |
52.9% savings; counts match; bundle's aggregation_rules is now richer
than singles (captured the specific 85/70/50/40/10 satisfaction
percentages). Crucial finding: singles hallucinated
first_publication_year: 1943; bundle correctly returned null because
the chunk contains zero mentions of 1943. The FIELD COMPLETENESS rule
made the bundle more source-faithful, not just cheaper. Memory:
feedback_bundled_prompts_source_fidelity.
Probe 3: argument bundle (factor_model + propositions + supports)
| Result | |
|---|---|
| stop_reason | max_tokens |
| Output truncated | yes, mid-string at position 46274 |
| Cost | $0.257 |
The bundle exceeded the 12K output-token budget. Propositions extraction is intrinsically high-output (20–50+ per chapter, each ~50 tokens). The bundle pattern doesn't fit propositions. Decision: split argument back — factor_model becomes a singleton (small output), propositions becomes a separate opt-in single.
Probe 4: narrative bundle (chapter_summary + case_studies)
| Result | |
|---|---|
| stop_reason | clean (end_turn) |
| chapter_summary | 1 (rich — logline + pitch + story_script + 12 takeaways + 7 key_questions + 10 misconceptions) |
| case_studies | 7 (each typed and outcome-coded) |
| Output tokens | 15,442 of 16,000 budget |
| Cost | $0.308 |
Works at the limit of the 16K budget. High-value output but expensive.
Per-bundle decision (path-B post-probe)
| Bundle | Status | Cost/chapter | In default tier? |
|---|---|---|---|
entities | ✓ ship | ~$0.12 | yes |
factor_model (singleton) | ✓ ship | ~$0.05 | yes |
narrative | ✓ ship | ~$0.31 | yes |
propositions (singleton) | ✓ ship | ~$0.20–$0.40 | opt-in via --include-propositions |
Tier model
Mike, 2026-04-29: "books we need or want to go deep on, or that we intend to do a broader guide based on (in synthesis with other hand selected books) — then we get the more expensive propositions. That until we are flush with money and can spend the extra money to just have it on hand."
- Default tier (entities + factor_model + narrative): $4.30–$7.54 per 12-chapter book. Always run on vela-tagged sources we want public source profiles for.
- Deep tier (+propositions): $6.46–$12.34 per book. Reserved for
books that anchor a guide / cross-source synthesis or that we want
full argument-graph treatment on. Toggle:
--include-propositions.
What multi-faceted captures that ingest:dual does not
Concrete examples from the Maslow probe outputs:
- Named theoretical models with explicit factor structure. The
entities bundle returned Maslow's Hierarchy as a typed entity with
primary_factors: [Physiological, Safety, Love, Esteem, Self-Actualization, Self-Transcendence, Cognitive Needs],aggregation_rulescapturing the partial-satisfaction percentages (85/70/50/40/10), and acluster_category: Motivation. Passage chunks fromingest:dualcould not surface this in queryable form. - Constructs as cards. 9 typed constructs (Physiological Needs, Safety Needs, Self-Actualization, Self-Transcendence, Prepotency, …) each with definition + level + linked models. A query for "self-actualization" hits the construct card directly rather than passage excerpts.
- Chapter summary as a story-shape. logline, pitch, character / problem / guide / plan / call-to-action, transformative_journey, key questions, misconceptions, takeaways. Useful for editorial reading guides built from a source.
- Case studies typed and outcome-coded. "The Chronically and Extremely Hungry Man" / "The Psychopathic Personality" / "Martyrs and Idealists" — each with type, outcome_type, conflict, illustrates, themes, quotes.
What ingest:dual captures that multi-faceted does not
- Long-form passage retrieval for any claim. Multi-faceted indexes typed entities; it does not replace passage-level evidence retrieval.
- Magazine-grade pull quotes with charge tier and arc-stage tagging (curator pass).
- Embedding-backed similarity at passage level. Multi-faceted entities are not embedded in this iteration (follow-up scope).
Cost / time per book
Filled per book after the first live full-book run. The chunk-level probe data already published above.
| Book | book_key | dual-grade passages | multi-faceted entities | tier | run log | actual cost |
|---|---|---|---|---|---|---|
| TBD | TBD | TBD | TBD | TBD | TBD | TBD |
| TBD | TBD | TBD | TBD | TBD | TBD | TBD |
| TBD | TBD | TBD | TBD | TBD | TBD | TBD |
Decision matrix
When a vela-tagged book lands, which pipelines should we run, in what order?
- Always run
ingest:dualfirst. Cheap; fixes retrieval; everything else depends on a queryable corpus. - Run default-tier
multi-facetedfor vela-audience books we want to feature on the source profile. Reserve for sources reached from magazine articles, museum tours, or the reading list. - Add
--include-propositionsfor books that anchor a guide — broader cross-source synthesis where the argument-graph queries pay off. Mike's "until we're flush" tier. - Skip multi-faceted entirely for factory-internal sources unless we later want a per-source factory studio surface. The current ASN-685 panel is public-facing; factory-internal extraction adds spend without UI payoff.
Open questions
- Should multi-faceted entities also be embedded for retrieval (parallel
to
mosaic_passages.embedding)? Decide once ASN-690's suggestion engine for the magazine editor defines the retrieval shape. - Aggregator + referee passes are scaffolded in the design doc but not
yet wired in the orchestrator's
--executepath. Add when the per-chapter loop is validated end-to-end on a real book. - Long books (>1MB source, chapters >200KB each) will land toward the high end of the cost envelope because input-cost grows. May want a sub-chapter split strategy honoring sub-headers (per book-ingestion rule 2). Defer until a real long-book run shows the actual hit.
How to extend this memo
After every multi-faceted run:
- Append a row to the Cost / time per book table with concrete numbers.
- If the run surfaced something the prompt set didn't anticipate, note it in What multi-faceted captures.
- If a bundle's actuals differ materially from the envelope, update the
per-bundle envelope in
lib/factory/multi-faceted-pipeline.tsand re-publish here. - Revise the decision matrix if a pattern emerges.
The run log lives at data/factory-outputs/research/run-logs/<book_key>-<ts>.json
and is the source of truth for these numbers.