research / ai-human-interaction / preregistrations & protocols

External-Operator Pilot Protocol

The pilot protocol that mitigates the auto-ethnography threat-to-validity for the Penwright Research Program. Recruits 5–10 outside writers across memoir / nonfiction / fiction and across emerging / mid-career / established experience tiers, instruments them under the same Penwright protocol the principal investigator runs against himself, and establishes the data baseline for Paper 5 and Paper 7 v2.0 confirmatory analyses. Sections cover recruitment criteria, recruitment channels (PI network capped + writing-community + Prolific + open form), onboarding flow with consent and instrumentation discipline, data-handling protocol (anonymization + retention + consent), success criteria (cohort + data + quality thresholds), pilot-vs-formal-study boundary, six downstream PA-009a..f next-action assignments, and a pilot-specific threats-to-validity register.

By Mike West

AI–Human Interaction·Preregistrations & protocols

External-Operator Pilot Protocol

Status: v1.0 — 2026-05-09 (PA-009). Owner: Mike West (principal investigator). Agent-supportable for protocol drafting; recruitment execution requires Mike-direct authority. Scope: the pilot that mitigates the auto-ethnography threat-to-validity for the Penwright Research Program — recruits, onboards, and tracks 5–10 outside writers under the same Penwright instrumentation that the principal investigator runs against himself.

1. Why this pilot exists

The Penwright Research Program ships its longitudinal papers (Papers 5, 7, 8) on a foundation that has one structural threat-to-validity named explicitly in every preregistration: the principal investigator (Mike West) is also Penwright's designer and most active user. This is acceptable for descriptive work — auto-ethnography produces real, situated, defensible findings — and explicitly not acceptable for the causal claims the longitudinal papers need to defend.

A program-of-one cannot generalize. A program of one writer plus five-to-ten outside writers, under the same instrumentation, with the same authorship-packet discipline and the same measurement framework, can begin to.

This is not the formal study. It is the data-baseline-builder for the formal study. The pilot's job is to demonstrate that:

Penwright's instrumentation produces interpretable signal across writers other than the PI;
The Authorship Packet Model is operable by writers who did not invent it;
The genre-aware behavior pattern (memoir / nonfiction / fiction never collapsed) holds when applied to writers whose primary genre is not the PI's;
The four failure modes the Penwright Measurement Framework names — voice flattening, dependency drift, surface-feature drift, sycophancy spiral — are detectable in writers other than the PI;
The longitudinal effects framing ("better with Penwright, than without it, in 6 months") admits a cohort comparison.

If those things hold across 5–10 outside writers active 3+ months, Paper 5 and Paper 7 confirmatory analyses can begin. If they do not, the program's confirmatory work is paused and the protocol is rewritten.

2. Recruitment criteria

The pilot recruits 5–10 writers. The criteria below are all requirements; each writer in the cohort meets each.

2.1 Genre coverage

The cohort spans the three genres the program treats as non-collapsible:

Memoir — at least 2 writers actively producing personal-narrative work.
Nonfiction — at least 2 writers actively producing analytical, journalistic, or expository nonfiction.
Fiction — at least 2 writers actively producing literary or narrative fiction.

A single cohort member can claim two genres if they actively produce in both, but their primary genre (the one they are most committed to growing in) must be specified at onboarding and is the genre their data is analyzed under unless they explicitly switch.

2.2 Experience-level diversity

To prevent the cohort from being uniformly novice or uniformly seasoned (either of which would produce skewed findings), recruit across three experience tiers:

Emerging — fewer than 5 years actively writing in their primary genre; no published book; minimal published shorter work. At least 2 writers.
Mid-career — 5–15 years; at least one published book or substantial published shorter-work portfolio. At least 2 writers.
Established — 15+ years; multiple published books or extensive professional credit. 1–2 writers.

2.3 Writers who would benefit from Penwright

The pilot is not a stress-test of unwilling participants. Writers who join should plausibly benefit from a tool designed to support authorship discipline. Specifically:

They have at least one in-progress writing project they are committed to sustaining for 6+ months;
They are open to using AI assistance (have used some AI tool at least occasionally for writing-related work; not strictly anti-AI);
They are also discerning about AI use — they are not looking for an AI to write for them; they are looking for a tool that supports their own writing process;
They are willing to be instrumented (they understand and consent to the data-collection protocol in §5);
They can commit to the time profile in §3.

2.4 Exclusions

Writers whose primary writing surface is professional-context-only (corporate communications, legal writing, technical documentation). The program studies authorship-as-craft; professional writing has different goals and would muddy the signal.
Writers on academic or publication deadlines that would force pacing inconsistent with the pilot's measurement cadence.
Writers who have a personal or professional relationship with the PI close enough to compromise their independence as an evaluator (immediate family; current colleagues; current consulting clients).
Writers younger than 18 (the consent and ethical-review structure for adult writers does not transfer).

3. Time profile

Each pilot writer commits to:

Initial onboarding session (~90 minutes; remote, recorded with consent) — context-gathering, instrumentation walkthrough, packet-discipline introduction.
Active writing time using Penwright — at writer's natural cadence; the pilot does not impose a writing-hours quota. The instrumentation is passive; the writer writes when they would have written anyway.
Quarterly check-in (~45 minutes; remote) — surfaces issues, captures qualitative signal, validates the data-collection is producing interpretable findings.
Optional final session at month 6 (~60 minutes) — debrief, comparative-claim work ("better with Penwright, than without it, in 6 months" — calibrating the writer's own assessment of the comparative).

The pilot's commitment to each writer is reciprocal:

Penwright access free for the duration of the pilot;
A clear, written summary of every signal collected about them, available on request at any time;
The right to withdraw at any time, with all data destroyed if requested at withdrawal;
A copy of any aggregate findings from the pilot, before publication.

4. Recruitment channels

The cohort is recruited through complementary channels rather than a single source, to avoid over-representing any one writing community.

4.1 Direct outreach (PI's network)

The PI has a network of writers via past consulting work, conference contacts, and the broader writing-tools community. Outreach is one-to-one, by email, with a written description of the pilot, the time commitment, and the data-handling protocol. Cap: 2 writers from PI's direct network, to prevent the cohort from skewing toward writers already aligned with the PI's views.

4.2 Writing-community channels

Posts (with the relevant moderators' permission) in:

One memoir-focused writing community (e.g., Lighthouse Writers Workshop's working-writer list; Memoir Writers' Collective);
One nonfiction / journalism community (e.g., a Substack writers' Slack; the Sigma Tau Delta nonfiction listserv; a nonfiction-writing publication's freelancer network);
One fiction-writing community (e.g., a sci-fi/fantasy or literary-fiction Slack; SFWA-adjacent channels for genre fiction; a literary-fiction listserv).

These posts state the time commitment, the inclusion criteria, the absence of compensation in dollars (and the presence of compensation in tool access + research transparency), and the application process. Target: 2–4 writers from this channel.

4.3 Paid recruitment platform (Prolific or similar)

For the experience-tier writers least represented by 4.1 and 4.2 — typically emerging writers, who are less likely to be in established writing communities — recruit via Prolific or similar academic-research-recruitment platform. Pre-screen for: active writing project; genre primary; AI-use openness; minimum age. Compensate at standard Prolific rates for the onboarding session and the quarterly check-ins; tool access remains the recurring compensation. Target: 1–3 writers from this channel.

4.4 Open application form

For balance: post a brief open application form on peopleanalyst.com/research/ai-human-interaction/external-operator-pilot/apply (route to be created if the pilot proceeds) and on Mike's professional channels (LinkedIn, Substack). Treat as overflow — fills genre or experience-tier slots that 4.1–4.3 didn't fill. Target: 0–2 writers from this channel.

4.5 Selection process

Applications collected for ~3 weeks. Screened against criteria in §2. Final selection prioritizes genre coverage and experience-tier diversity over depth in any single channel — the cohort's value is its breadth.

5. Onboarding flow

Onboarding is structured to set expectations clearly and to instrument each writer consistently.

5.1 Pre-onboarding

Consent form signed (digital signature acceptable) — explicit consent for: instrumentation data collection, anonymized aggregate analysis, retention of raw data for the pilot duration plus 12 months, deletion on request.
Genre + experience-tier classification confirmed.
Active writing project identified — the project the writer will be using Penwright for during the pilot, in writing.

5.2 Onboarding session (90 min)

Context-gathering (15 min) — writer's history, genre commitment, current project, prior AI-tool experience.
Penwright walkthrough (30 min) — Authorship Packet Model overview; what the seven non-negotiable rules are and why they matter; what the system does and does not do for the writer.
Instrumentation walkthrough (15 min) — what's being measured, how the data flows, how the writer can inspect their own data at any time.
Expectations-setting (15 min) — the writer is not a research subject in a controlled experiment; they are a co-investigator whose writing the program will learn from. The pilot is not a usability study; the writer's own subjective experience of using Penwright is one signal among several.
First-session writing exercise (15 min) — writer composes a short authorship packet against an in-progress section of their project, while the PI observes. Surfaces immediate issues and produces the first datum.

5.3 Account provisioning

Penwright account created with the writer's preferred email.
Genre setting configured at the writer's primary genre.
The writer's project is registered in the system as the active project (subsequent project additions are allowed).
The writer receives a written confirmation of: their account, the data-collection protocol, the consent terms, and the contact path for issues or withdrawal.

5.4 Instrumentation discipline

The writer is asked to:

Compose authorship packets for substantive writing sessions (not for every email or short note — for sessions of 30+ minutes of focused writing on the active project);
Tag each session by its writing-mode (drafting / revising / sharpening / editing) at start;
After each session, leave the system's reflection prompt unanswered if it does not feel interpretable; do not force a reflection.

The writer is not asked to:

Hit a writing-hours-per-week quota;
Use Penwright on every writing session, especially short ones;
Match the PI's intensity or volume.

The pilot is calibrated against the writer's natural cadence, not against a target volume.

6. Data-handling protocol

6.1 What data is collected

Authorship packet contents — intent, structure, key ideas, relevant passages, counterpositions per the Authorship Packet Model;
Generated draft outputs from the AI-assistance phase of each session;
Writer's edits and acceptances of generated content (what they kept, what they changed, what they discarded);
Reflection prompt responses (where given);
Session metadata — timestamps, duration, mode tag;
Quarterly check-in transcripts (with consent; recorded and transcribed; raw audio retained 12 months then destroyed).

6.2 Anonymization

All published findings reference writers by anonymous ID (W1, W2, …) and by genre + experience-tier only.
Quotations from authorship packets or check-ins are used in published findings only with explicit per-quotation consent from the writer at publication time.
Writers' specific projects are referenced by anonymized description (e.g., "a memoir about post-divorce identity reformation") only when the writer has approved the description.

6.3 Retention

Raw instrumentation data: pilot duration + 12 months.
Aggregate analytical artifacts: indefinite (for cumulative program findings).
Recorded check-in audio: 12 months then destroyed (transcripts retained anonymized).
On withdrawal: per-writer raw data destroyed within 30 days; aggregate analyses already published cannot be reverse-engineered to identify the withdrawing writer.

The pilot is not a formal IRB study at this stage — Penwright is a tool, not a clinical intervention, and the work is descriptive rather than experimental. However:

The consent form is modeled on standard ethical-research-participant consent;
The PI commits to not publishing findings about an individual writer that the writer has not seen and approved;
The PI commits to making the protocol — this document — available to each writer before consent;
If at any point the program decides to formalize beyond the pilot (Paper 5 / 7 / 8 confirmatory analyses), an IRB submission becomes part of that formalization.

7. Success criteria

The pilot is complete (in the sense of having delivered its purpose) when:

7.1 Cohort threshold

5+ writers active for 3+ months;
Genre coverage holds (≥2 memoir + ≥2 nonfiction + ≥2 fiction; can overlap if writer claims multiple primaries);
Experience-tier coverage holds (≥2 emerging + ≥2 mid-career + ≥1 established).

7.2 Data-collection threshold

Each active writer has produced ≥10 instrumented writing sessions, with a minimum total duration of 8 hours of measured writing;
Per-session data is interpretable (authorship packet contents readable; generated drafts available; edits captured);
The cohort's pooled data is sufficient for v1.0 (descriptive) analyses for Paper 5 and Paper 7 to run.

7.3 Quality threshold

The four failure modes named in the Penwright Measurement Framework (voice flattening, dependency drift, surface-feature drift, sycophancy spiral) are detectable in the pooled data — not all four need to be observed in every writer, but the framework's instrumentation must produce readings that distinguish the modes from each other and from baseline.
The genre-fork holds — analyzing the data with genres collapsed produces meaningfully different signal than analyzing with genres separated. (If genres collapse usefully, that's a Paper 7 finding the program needs to know.)

7.4 Pilot disposition

When all three thresholds are met, the pilot's initial purpose is complete. Two outcomes are possible:

Continue the cohort into the formal study. Paper 5 v2.0 and Paper 7 v2.0 confirmatory analyses begin against the cohort data, with cohort-size growth toward N≥18 (per Paper 7 v2.0) and N≥20 (per Paper 5 v2.0) through additional recruitment.
Restructure. If the pilot data surfaces a fundamental issue — instrumentation produces non-interpretable signal in a specific genre, the Authorship Packet Model is not operable by writers other than the PI, the genre-fork dissolves under outside-writer data — the formal study is paused and the protocol is rewritten before recruitment continues.

8. What this pilot does NOT do

To prevent scope creep:

It is not the formal study. Formal study is Paper 5 v2.0 and Paper 7 v2.0 (and eventually Paper 8). The pilot establishes the data baseline; the formal study makes the confirmatory claims.
It does not test causal effects. The pilot is descriptive. "Better with Penwright, than without it, in 6 months" requires within-writer comparison or matched-cohort comparison; the pilot is too small for either.
It does not adjudicate the architecture-level claims in Papers 1–4. Those papers stand on auto-ethnography and theoretical positioning; the pilot is a different kind of evidence.
It does not compensate writers in dollars (except for the Prolific-recruited tier, which is compensated per platform standard). Tool access + research transparency are the recurring compensation.
It does not guarantee outside writers will use Penwright productively. Some pilot writers will. Some won't. Both signals matter; the protocol does not pre-screen for guaranteed success.

9. Next-action assignments to file

The pilot's execution requires concrete recruitment and onboarding work that lives downstream of this protocol. Per the PA-009 prompt, these are filed as separate PA-NNN entries, not embedded here:

PA-009a — Recruitment outreach (PI direct network). One-to-one email outreach in PI's network meeting criteria, subject to the cap of 2 writers from this channel (§4.1); first-pass response collected; consent process initiated for those who proceed.
PA-009b — Recruitment outreach (writing-community channels). Identify and contact moderators for memoir / nonfiction / fiction communities; post recruitment notices with their permission; collect applications.
PA-009c — Prolific (or alternative) campaign setup. Configure the platform, write the screener, set compensation rates per platform standard, launch the campaign, screen applicants.
PA-009d — Selection + onboarding execution. From the pooled application pool, select cohort by criteria; schedule and conduct onboarding sessions; provision accounts; close the recruitment cycle.
PA-009e — Quarterly check-in cadence. Set the 3-month and 6-month check-in calendar for each cohort member; standardize the check-in instrument; run sessions.
PA-009f — Pilot-completion review. At ~6 months from the cohort's start, evaluate against §7 success criteria; decide on disposition (continue / restructure).

These are filed as OPEN entries in docs/AGENT-ASSIGNMENTS.md (PA-009a..f); execution is deferred until the principal investigator launches the pilot. Flip each stream to DONE in that doc when its acceptance criteria are satisfied. The protocol itself (this document) is the load-bearing artifact PA-009 commits to.

10. Threats to validity (pilot-specific)

The pilot has its own threats-to-validity register, separate from the program's overall TTV.

Selection bias from PI's network channel. Writers known to the PI may share his analytical or aesthetic predispositions. Mitigation: cap §4.1 at 2 writers; supplement heavily through §4.2 and §4.3.
Self-selection bias. Writers who volunteer for an instrumentation pilot are not a representative sample. Mitigation: explicit acknowledgement; pilot findings are descriptive of willing-to-be-instrumented writers, not of writers in general.
Hawthorne effects. Writers under instrumentation may write differently than they would otherwise. Mitigation: long-duration cadence (3+ months of measurement) attenuates novelty effects; quarterly check-ins surface where Hawthorne behavior may persist.
Genre-tag instability. Writers who claim a primary genre may drift across genres during the pilot. Mitigation: re-confirm at quarterly check-ins; analyze flexibly.
Tool-version drift. Penwright will continue to evolve during the pilot. Mitigation: log every version-change event in the instrumentation; report findings stratified by version where the change is material.

These threats are pilot-specific — they layer onto, not replace, the program-level TTV register documented in each preregistration.

11. Cross-references

content/research/ai-human-interaction/penwright-paper-05-dependency.md — Paper 5 preregistration; v2.0 confirmatory analyses depend on this pilot.
content/research/ai-human-interaction/penwright-paper-07-genre-effects.md — Paper 7 preregistration; v2.0 confirmatory analyses depend on this pilot.
content/research/ai-human-interaction/penwright-paper-04-measurement.md — Paper 4 measurement framework; the four failure modes named in §7.3 are defined here.
vela/docs/VISION-PENWRIGHT-AUTHORSHIP.md — the Authorship Packet Model and seven non-negotiable rules of authorship referenced in §5.2.
vela/docs/VISION-PENWRIGHT-MEASUREMENT.md — the measurement framework instrumented in §6.1.
docs/AGENT-ASSIGNMENTS.md — PA-009 (this protocol); PA-009a..f (downstream recruitment + execution work, filed separately as the pilot proceeds).

12. Decision log

2026-05-09 — v1.0 — Initial protocol drafted (PA-009). Eleven sections covering: rationale, recruitment criteria (genre + experience-tier + benefit-fit), time profile, recruitment channels (PI network capped + writing-community + Prolific + open form), onboarding flow (consent + walkthrough + account provisioning + instrumentation discipline), data-handling protocol (collection + anonymization + retention + consent), success criteria (cohort + data + quality thresholds), explicit pilot-vs-formal-study boundary, downstream PA-009a..f next-action assignments to file, pilot-specific threats-to-validity register, cross-references.

External-Operator Pilot Protocol

1. Why this pilot exists

2. Recruitment criteria

2.1 Genre coverage

2.2 Experience-level diversity

2.3 Writers who would benefit from Penwright

2.4 Exclusions

3. Time profile

4. Recruitment channels

4.1 Direct outreach (PI's network)

4.2 Writing-community channels

4.3 Paid recruitment platform (Prolific or similar)

4.4 Open application form

4.5 Selection process

5. Onboarding flow

5.1 Pre-onboarding

5.2 Onboarding session (90 min)

5.3 Account provisioning

5.4 Instrumentation discipline

6. Data-handling protocol

6.1 What data is collected

6.2 Anonymization

6.3 Retention

6.4 Consent and ethics

7. Success criteria

7.1 Cohort threshold

7.2 Data-collection threshold

7.3 Quality threshold

7.4 Pilot disposition

8. What this pilot does NOT do

9. Next-action assignments to file

10. Threats to validity (pilot-specific)

11. Cross-references

12. Decision log