peopleanalyst

research / vela / reports

Emotion corpus expansion (2026-04)

Notes on the emotion-corpus expansion plan.

By Mike West

Vela·Reports·source: people-analyst/vela/docs/research/emotion-corpus-expansion-2026-04.md

Emotion corpus expansion — sourcing log (ASN-930)

Wave: Emotion Surfaces (ASN-930..938)
Goal: Broaden CC0/PD figurative coverage so crowd-rating pools (ASN-934) are not biased toward grief/yearning/tenderness-adjacent tones.
Image-level emotion labels still flow through passage ↔ image pairings (ASN-931); this doc tracks library growth and passage-side counts as a tilt proxy.

Underserved primaries (query packs live in code)

Canonical list: EMOTION_CORPUS_UNDERSERVED_PRIMARIES in lib/artwork/emotion-discovery-packs.ts.

PrimaryRunbook target (approved units)Notes
anger≥40Myth/combat + moral outrage subjects
anxiety≥40Night, storm, expressionist isolation
confusion≥40Dreamlike / ambiguous crowds
disgust≥40Grotesque, memento mori (curatorial register)
excitement≥40Triumph, festival, dance
fear≥40Horror, inferno, skeleton — boundary-sensitive
joy≥40Non-tenderness celebration (distinct from tenderness)
realization≥40Vision, conversion, annunciation-adjacent
surprise≥40Astonishment, discovery

If a museum/API genuinely returns fewer than 25 viable hits after boundary + license filters, document the gap here and lower that row only for that emotion.

Per-emotion audit — passage corpus (loom_passage_tags)

Passage tags are the only per-emotion row counts wired today (ASN-588). Refresh counts anytime:

npm run artwork:discover:emotion -- --audit

Paste TSV output below when you run it (initial snapshot placeholder until first run lands in git).

Snapshot: passage_primary counts (2026-04-29, linked dev DB)

admiration	0
anger	0
anxiety	0
chagrin	0
confusion	0
desire	0
disappointment	0
disgust	0
embarrassment	0
excitement	0
exposure-dread	0
fear	0
grief	0
guilt	0
humiliation	0
joy	0
love	0
mortification	0
nostalgia	0
pride	0
pride-as-defense	0
realization	0
relief	0
remorse	0
sadness	0
shame	0
surprise	0
tenderness	0
yearning	0

All zeros here means loom_passage_tags is not yet populated on this database (or bulk tagging has not run); re-run npm run artwork:discover:emotion -- --audit after memoir taxonomy ingest.

Active image library size

1356 experience_units with status = 'active' (same audit run). Image↔emotion distributions await ASN-931 pairings + later decomposition/telemetry.

Discovery commands

Preview candidates (Met + ARTIC + BnF default):

npm run artwork:discover:emotion -- --emotion fear --sources met,artic,bnf --limit-per-query 12

Dry-run (queries only):

npm run artwork:discover:emotion -- --emotion anger --dry-run

Europeana (probationary key, session cap — see scripts/sources/europeana.ts):

npm run artwork:discover:emotion -- --emotion fear --sources europeana

Smithsonian EDAN bulk has no HTTP search client in-repo; hints for shard grep:

npm run artwork:discover:emotion -- --emotion disgust --sources smithsonian --dry-run

Then use substring hints printed with scripts/ingest/smithsonian.ts workflows.

Sourcing log — operator-maintained

EmotionQueued (IDs / batch)ApprovedRejectedCurator notes

After approval: units follow existing decompose + content_attribution pipeline like all ingests.

Sourcing log — 2026-05-04 deep-research arrival

External LLM bring-back returned: 36 runs (30 per-emotion via Prompt 1 on OpenAI Deep Research; 6 cross-cutting via Prompts 2–5 split between OpenAI and Claude). Full archive at docs/research/emotion-corpus-expansion/external-runs/; INDEX, plan, and cross-citation digest alongside.

Counts:

SourceCountNotes
Per-emotion runs (Prompt 1, OpenAI)30one per primary emotion
Cross-cutting source-discovery runs (Prompt 2)2OpenAI + Claude (paired for convergence reading)
Cross-cutting historiography runs (Prompt 3)2OpenAI + Claude
Cross-cultural ethnography runs (Prompt 4)1Claude only — OpenAI counterpart future bring-back
Therapy-transcript discovery runs (Prompt 5)1Claude only — OpenAI counterpart future bring-back
Total36

Citations extracted: ~1,200 unique works across the 36 runs (per EXTRACTED-CITATIONS-2026-05-04.md). Section A (per-emotion canonical texts) covered Anger through Grief in full; remaining 15 emotions referenced structurally — follow-up Phase 2.5 needed for full extraction.

Downstream surfaces updated by this arrival:

  • docs/research/corpus-acquisition/EMOTION-CORPUS.mdBring-back arrival — 2026-05-04 section appended; ~70 new MIKE-ACQUIRE rows; ~17 new PD-AUTO candidates.
  • docs/research/corpus-acquisition/THERAPY-TRANSCRIPT-CORPUS.md — same appendage; ~30 new rows.
  • docs/research/corpus-acquisition/CROSS-CULTURAL-EMOTION-ETHNOGRAPHY.md — NEW sibling tracker created from Prompt 4 (cross-cultural ethnography) + cross-cultural sections of per-emotion runs.
  • docs/RESEARCH-PROGRAM.md §VI-B — new emotion-research thread block; provisional EM.1..EM.5 RQs.
  • docs/research/papers/emotion-research-program-public-introduction.md — public-facing essay (draft v1).
  • docs/research/emotion-research-program-literature-map.md — scaffold with column structure + cross-cutting rows populated.
  • docs/research/emotion-research-program-bibliography.bib — draft v1 with ~95 BibTeX entries spanning historiography (§A), constructionist/appraisal/affect-theory (§B), cross-cultural ethnography (§C), religious + contemplative (§D), per-emotion canonical Anger–Grief (§E), memoir CONVERGENT canon (§F), therapy canon (§G).
  • docs/research/OVERVIEW.md — emotion-research thread row added to parallel threads + what is published.
  • docs/research/PIPELINE_STATUS.md — emotion-research entry added to Running.

Convergent-source highlights (cited 3+ times across per-emotion runs OR cross-cutting LLM agreement):

  • Catherine Lutz, Unnatural Emotions (1988) — 8+ emotion runs.
  • Anna Wierzbicka, Emotions Across Languages and Cultures (1999) — 10+ emotion runs.
  • Aristotle, Rhetoric — 4+ emotion runs (anger, fear, contempt, etc.).
  • Søren Kierkegaard, The Concept of Anxiety (1844) — 3+ emotion runs.
  • Heidegger, Being and Time §40 — 4+ emotion runs.
  • Mary Karr, The Liars' Club + Lit — both CONVERGENT across OpenAI and Claude on Prompt 2.
  • Joan Didion, The Year of Magical Thinking — 4+ emotion runs (grief, bewilderment, disappointment).
  • C. S. Lewis, A Grief Observed + Surprised by Joy — both CONVERGENT.
  • Yalom, Love's Executioner — 4+ runs (desire, grief, source-discovery, therapy-discovery).

Honest gaps flagged across the bring-back (consolidated from §J of digest): East/Southeast Asian memoir; Continental European clinical writing; non-Anglophone ethnography beyond the canonical Lutz/Levy/Briggs lineage; emerging indigenous emotion frameworks beyond Kimmerer/Krenak/Akomolafe; disability/trans/sex-work memoir under-cited; mid-20th-c. Eastern European confessional traditions suspected to exist but undercited.

Next:

  • Phase 2.5 — extract per-emotion canonical anchors for the remaining 15 emotions (joy, hope, jealousy, longing, loneliness, love, nostalgia, pride, relief, resentment, sadness, shame, tenderness, trust, despair, contentment, contempt, boredom, bewilderment, awe). Filed as a follow-up under ASN-1023.
  • Phase 4lib/loom/taxonomy/emotion.ts extension covering 12 candidate primary additions identified in the bring-back INDEX (awe, bewilderment, boredom, contempt, contentment, despair, gratitude, hope, jealousy, loneliness, resentment, trust). Plus longing↔yearning collapse-or-keep decision per the OpenAI longing run §1.

Engineering shipped (2026-04-29)

  • lib/artwork/emotion-discovery-packs.ts — shared packs + Europeana strategy shapes + Smithsonian EDAN term hints.
  • scripts/artwork/discover-by-emotion.ts — CLI: --emotion, --sources, --audit, --list-emotions.
  • getEmotionDiscoveryQueriesForMet|Artic|Bnf, getSmithsonianEmotionEdanTermHints on museum clients.
  • searchEuropeana({ strategies }) override for emotion-targeted Europeana runs.
  • npm run artwork:discover:emotion.