Landing Contributor PRs: The Maintainer Runbook
Audience: Maintainers taking contributor PRs from "open with red CI" to "merge-ready, evidence posted, operator merges".
Issue: Priivacy-ai/spec-kitty#2341
Origin: The 2026-07-04 landing pass (#2332, #2336, #2338, #2239, #2238), where this workflow was run end-to-end and its friction points were logged.
The deliverable of a landing pass is never a merge. It is a PR that is green, un-drafted, carries a full evidence trail in its comment thread, and states any landing-order constraints — so the operator can merge it without re-deriving the adjudication. The maintainer never merges (see step 11).
The workflow at a glance
- Claim before touching
- One isolated worktree per PR
- Rebase onto current upstream/main first
- Classify every red check
- Folds: remediation commits on the contributor branch
- Red-first verification for bugfix PRs
- Review focus areas beyond CI
- Adversarial squad for architectural or API-surface PRs
- Push discipline
- Post the remediation summary
- Hand-off — the operator merges
- Follow-up hygiene
1. Claim before touching
Post a claim comment on the PR before any rebase or review work: what you are picking up, in which landing queue, and what you plan to do. One claim per PR in the pass, posted first.
unset GITHUB_TOKEN # keyring auth has full repo scope; a limited env token may not
gh pr comment <N> --repo Priivacy-ai/spec-kitty \
--body "Claiming this PR for today's landing pass: rebase onto upstream/main, adjudicate red checks, fold fixes as needed. Evidence to follow."
Why: it prevents duplicated maintainer effort when several PRs are being landed in parallel, and it means the contributor is never surprised by maintainer commits appearing on their branch.
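Because the rule is one claim per PR, it can help to look for an existing claim before posting a new one. The following is a minimal sketch, assuming claim comments begin with the phrase used in the example above; the helper name already_claimed is hypothetical and not part of the repository.

```shell
# already_claimed: read PR comment bodies on stdin and succeed if any line
# starts with the claim phrase. Matching on this phrase is an assumption
# taken from the example claim text, not a documented project convention.
already_claimed() {
  grep -q '^Claiming this PR'
}

# Intended use (needs gh and network access, so it is left as a comment):
#   gh pr view <N> --repo Priivacy-ai/spec-kitty --json comments \
#     --jq '.comments[].body' | already_claimed && echo "already claimed"
```

If the check succeeds, coordinate with whoever posted the existing claim rather than posting a second one.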
2. One isolated worktree per PR
Never touch the primary checkout — a mission session may own it. Give every PR its own worktree:
git fetch upstream pull/<N>/head:pr-<N>-local
git worktree add .worktrees/pr-<N>-landing pr-<N>-local
cd .worktrees/pr-<N>-landing
Each worktree builds its own uv virtualenv on the first uv run — expect
roughly 40 seconds and some disk on that first command. That is normal, not a
hang.
3. Rebase onto current upstream/main first
Contributor branches are routinely 100+ commits behind. Every adjudication — tests, gates, review — happens on the rebased tip, not the stale base:
git fetch upstream main
git rebase upstream/main
Changelog conflicts resolve in docs/changelog/CHANGELOG.md, which is the
canonical changelog. The root CHANGELOG.md is a symlink to it (since the
symlink cutover that rode #2338), so both paths reach the same file — resolve
the conflict once, in the canonical location.
4. Classify every red check
This is the core reviewer decision point. Diagnose each red check on the rebased tip and classify it into exactly one of four bins:
| Classification | What it looks like | Action |
|---|---|---|
| PR defect | The PR's own change breaks a test or gate | Fix it on the branch (a "fold", step 5) |
| Contract the PR legitimately crosses | A seam move-set completeness gate, a census tolerance band | Re-pin the contract in the same PR, with a dated rationale in the pin |
| Pre-existing main breakage | The same red reproduces on an unrelated main-based branch | Prove it cross-branch, file an upstream issue (campsite-cleaning standing order); do not fix it inside the contributor PR and do not retry-to-green |
| Perf-budget flake | A budget gate trips without a correctness signal | Note it, watch for recurrence, tune the budget at the root if it repeats — never retry-to-green |
The cross-branch reproduction for the third bin is cheap and decisive:
git fetch upstream main
git worktree add --detach /tmp/repro-main upstream/main # detached HEAD at current upstream/main
cd /tmp/repro-main && PWHEADLESS=1 uv run pytest <failing test> -q
If it is red there too, the PR does not wear the failure — the filed issue does. See the test-flakiness handling policy for the never-retry-to-green rule behind the fourth bin.
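The four bins can be read as a short decision procedure. The sketch below is one possible ordering of the questions, not a rule stated by the runbook: it asks about cross-branch reproduction first because that check is described above as cheap and decisive. The function name, argument order and bin labels are all illustrative.

```shell
# classify_red_check REPRODUCES_ON_MAIN CORRECTNESS_SIGNAL CROSSES_CONTRACT
# Each argument is "yes" or "no":
#   REPRODUCES_ON_MAIN  the same red shows up on an unrelated main-based branch
#   CORRECTNESS_SIGNAL  a real assertion failed, not only a time or size budget
#   CROSSES_CONTRACT    the PR legitimately moves a pinned contract
# The body runs in a subshell so the variables stay local.
classify_red_check() (
  reproduces_on_main=$1
  correctness_signal=$2
  crosses_contract=$3

  if [ "$reproduces_on_main" = yes ]; then
    echo "pre-existing-main-breakage"  # prove it, file an issue, do not fix in the PR
  elif [ "$correctness_signal" = no ]; then
    echo "perf-budget-flake"           # note it, watch for recurrence, never retry-to-green
  elif [ "$crosses_contract" = yes ]; then
    echo "contract-crossed"            # re-pin in the same PR with a dated rationale
  else
    echo "pr-defect"                   # fix on the branch as a fold
  fi
)

classify_red_check no yes yes   # prints: contract-crossed
```

The value of writing it down this way is that every red check ends in exactly one bin, and each bin maps to exactly one action from the table.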
5. Folds: remediation commits on the contributor branch
Folds are maintainer commits pushed directly to the contributor branch. This
relies on maintainerCanModify, which is normally true by default on PRs from
user-owned forks; the contributor can switch it off, so confirm it before
pushing.
- Keep each fold small and single-purpose.
- Label the commit subject "landing fold: ...".
- Explain every fold in the remediation summary comment (step 10).
Typical folds: canonical-source fixes (the changelog lives in
docs/changelog/), seam re-pins with dated rationale, retired-shim API
migrations, and doc/contract artifact sync.
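Two of the fold rules are mechanical enough to check before pushing: that the PR accepts maintainer pushes, and that each fold commit carries the agreed subject prefix. A minimal sketch follows; the helper name is_fold_subject is hypothetical, and the exact prefix spelling is taken from the labelling rule above.

```shell
# is_fold_subject SUBJECT
# Succeeds when the commit subject starts with "landing fold: " and has
# at least one character of description after the prefix.
is_fold_subject() {
  case "$1" in
    "landing fold: "?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Intended use inside the landing worktree (left as comments because it
# needs a repository and network access):
#   gh pr view <N> --repo Priivacy-ai/spec-kitty \
#     --json maintainerCanModify --jq '.maintainerCanModify'   # expect: true
#   is_fold_subject "$(git log -1 --format=%s)" || echo "rename the fold commit"
```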
6. Red-first verification for bugfix PRs
A fix whose test is green before and after the fix captures nothing. Prove the PR's test actually witnesses the bug by swapping the pre-fix product file back in:
git checkout upstream/main -- <product-file>
PWHEADLESS=1 uv run pytest <the PR's test> -q # MUST FAIL
git checkout HEAD -- <product-file>
PWHEADLESS=1 uv run pytest <the PR's test> -q # must pass again
Post the result on the PR. If the test never goes red, the fold is a better test — not a green checkmark.
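"MUST FAIL" is worth reading narrowly. pytest exits with status 1 when tests ran and failed, but it also exits non-zero when it was interrupted, hit a usage error, or collected no tests, and none of those show that the test witnesses the bug. The sketch below interprets the two exit statuses from the swap above; the function name and verdict labels are illustrative, and it assumes uv run passes pytest's exit status through unchanged.

```shell
# red_first_verdict PRE_FIX_EXIT POST_FIX_EXIT
#   PRE_FIX_EXIT   pytest status with the pre-fix product file swapped in
#   POST_FIX_EXIT  pytest status with the PR's product file restored
# pytest statuses: 0 = all passed, 1 = tests failed, 2-5 = interrupted,
# internal error, usage error, or no tests collected.
red_first_verdict() (
  pre_fix_exit=$1
  post_fix_exit=$2

  if [ "$post_fix_exit" -ne 0 ]; then
    echo "fix-not-green"   # the PR's own test does not pass with the fix
  elif [ "$pre_fix_exit" -eq 1 ]; then
    echo "witnessed"       # red without the fix, green with it
  elif [ "$pre_fix_exit" -eq 0 ]; then
    echo "not-witnessed"   # green both ways: the test captures nothing
  else
    echo "inconclusive"    # pytest did not actually run the test
  fi
)

red_first_verdict 1 0   # prints: witnessed
```

Capture the statuses with `pre=$?` and `post=$?` immediately after each pytest command, then post the verdict alongside the raw output.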
7. Review focus areas beyond CI
What the maintainer reads the diff for, beyond the checks:
- Canonical sources: does the change edit the source of truth, or a generated mirror/agent copy? (Agent directories under .claude/, .amazonq/, etc. are generated; sources live under src/doctrine/.)
- SSOT / duplication: does new code near-copy an existing canonical seam or resolver? Justified divergence must be adjudicated explicitly (name the contract difference), never assumed.
- Contract artifacts: a new command or field on a versioned surface must land in the machine contract (upstream_contract.json), the version ledger, and the human docs together, in the same PR.
- Scope-vs-spec: an apparent scope surprise may be required by the mission spec; check the FRs and constraints before flagging creep.
- Error-handling nets: best-effort helpers must catch the actual exception types their callees raise, not a guessed superset.
- Terminology canon: on any prose or doctrine touch, run the guard locally:
  PWHEADLESS=1 uv run pytest tests/architectural/test_no_legacy_terminology.py -q
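The canonical-sources check in the list above can be partly automated by scanning the PR's changed paths for generated agent directories. This is a rough sketch: it only knows the two directories named above (the list ends in "etc.", so the real set is larger), and the helper name flag_generated_paths is hypothetical.

```shell
# flag_generated_paths: read changed file paths on stdin, one per line,
# and print those that sit under a generated agent directory.
# Only .claude/ and .amazonq/ are listed here; extend the pattern with
# the repository's other agent directories as needed.
flag_generated_paths() {
  while IFS= read -r path; do
    case "$path" in
      .claude/*|.amazonq/*) printf 'generated: %s\n' "$path" ;;
    esac
  done
}

# Intended use on the rebased tip (left as a comment, needs the repo):
#   git diff --name-only upstream/main...HEAD | flag_generated_paths
```

Any path it prints is a prompt to ask whether the matching source under src/doctrine/ was edited as well.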
8. Adversarial squad for architectural or API-surface PRs
For changes to versioned contracts or shared seams, dispatch profile-loaded
review lenses in parallel — for example architect-alphonso for design and
contract adherence, paula-patterns for SSOT and duplication — with
read-only access to the landing worktree.
- Fold their MAJOR findings (step 5).
- File their MINORs and NOTEs as one follow-up issue, parented under the relevant functional epic.
9. Push discipline
Before any force-push to a fork branch, check for commits you have not seen — Copilot-review commits and parallel-session commits get cherry-picked, never clobbered:
git fetch <fork-remote> <branch>
git log <old-head>..FETCH_HEAD --oneline # anything here? cherry-pick it first
LEASE_SHA=$(git rev-parse FETCH_HEAD)
git push <fork-remote> HEAD:refs/heads/<branch> --force-with-lease=<branch>:"$LEASE_SHA"
Two lease lessons from the 2026-07-04 pass, both worth internalizing:
- A bare --force-with-lease fails with (stale info) on fork branches you have never fetched: the lease has no remote-tracking ref to compare against locally. The explicit <branch>:<sha> form above is the standard flow, not a workaround.
- The lease sha must come from git rev-parse, never retyped from a display. Two pushes in the pass were rejected because a lease sha was retyped from a 9-character abbreviated prefix.
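A cheap guard against the abbreviated-sha mistake is to refuse to push unless the lease value looks like a complete object name. The sketch below assumes a SHA-1 repository (40 lowercase hex characters, which is what git rev-parse prints there); a SHA-256 repository would need 64. The helper name is_full_sha is hypothetical.

```shell
# is_full_sha VALUE
# Succeeds only for exactly 40 lowercase hexadecimal characters.
is_full_sha() {
  case "$1" in
    *[!0-9a-f]*) return 1 ;;   # any non-hex character disqualifies it
  esac
  [ "${#1}" -eq 40 ]
}

# Intended use just before the push (left as a comment, needs the repo):
#   LEASE_SHA=$(git rev-parse FETCH_HEAD)
#   is_full_sha "$LEASE_SHA" || { echo "lease sha is not a full sha" >&2; exit 1; }
```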
10. Post the remediation summary
After pushing folds, post one structured comment on the PR:
- the review verdict;
- each fold, with its why;
- squad verdicts, if a squad ran;
- local test evidence: counts, plus ruff/mypy results;
- pre-existing failures called out with the filed issue number;
- the state: e.g. "watching CI; merge-ready on green".
Contributor-education notes (for example, which file is the canonical changelog) go in this comment too, addressed to the author.
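A small template keeps the summary comment in the same shape across every PR in a pass. The sketch below renders the six items listed above as plain text. The section labels, defaults and function name are illustrative choices, not an established project format.

```shell
# remediation_summary VERDICT FOLDS SQUAD EVIDENCE PREEXISTING STATE
# Prints the structured comment body; missing arguments get a default.
remediation_summary() {
  cat <<EOF
Review verdict: ${1:-TBD}

Folds:
${2:-none}

Squad verdicts: ${3:-no squad ran}

Local test evidence: ${4:-TBD}

Pre-existing failures: ${5:-none}

State: ${6:-watching CI; merge-ready on green}
EOF
}

# Intended use (left as a comment, needs gh and network access):
#   remediation_summary "approve with folds" "- landing fold: ..." \
#     | gh pr comment <N> --repo Priivacy-ai/spec-kitty --body-file -
```

Contributor-education notes can be appended after the State line before posting.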
11. Hand-off — the operator merges
The operator merges; the maintainer never runs gh pr merge. The hand-off
deliverable is:
- green CI;
- the PR un-drafted;
- the evidence trail on the PR;
- landing-order constraints stated explicitly — for example, a structural cutover riding one PR forces an order on the rest of the pass.
12. Follow-up hygiene
Everything discovered but out of scope gets a tracked home the same day: filed, labeled, and parented under a functional epic (never a meta rollup). New issues get processed by a triage pass immediately, so the next landing pass starts from a clean queue.
Gotchas
Field notes from the 2026-07-04 landing pass. Where the friction has since been fixed, the end-state is stated instead of the trap.
- The changelog has one canonical home. docs/changelog/CHANGELOG.md is the canonical changelog; the root CHANGELOG.md is a symlink to it. Edits and conflict resolutions land in the canonical file either way; there is no longer a generated root mirror to trip docs-freshness.
- --force-with-lease on never-fetched fork branches. See step 9: use the explicit <branch>:<sha> lease form, and take the sha from git rev-parse. Never retype it from an abbreviated display.
- Pre-existing main breakage surfaces mid-pass. One broken contract on main (#2339: dotted migration_id vs the dry-run JSON contract) turned local runs red on every rebased branch in the pass. Adjudication cost one cross-branch reproduction per PR until the issue was filed. File early: the filed issue is what lets subsequent PRs skip the reproduction.
- Saturated tolerance bands trip on the next legitimate change. The CLI visible-count census sat at the top of its band, so the next legitimate command (#2338) tripped it. A saturated band needs a re-pin with a dated rationale, e.g. the 2026-07-04 re-pin at 236 visible (tolerance 212–259) in tests/docs/test_check_cli_reference_freshness.py; the current values live in that test, not here. That re-pin-in-the-same-PR pattern is the model for the second bin of step 4.
- Seam completeness gates are invisible to contributors. Adding a def to src/specify_cli/cli/commands/agent/tasks_move_task.py also requires joining the _MOVE_SET pin in tests/specify_cli/cli/commands/agent/test_tasks_move_task_seam.py and the re-export block in agent/tasks.py. Expect this as a fold on PRs that touch decomposed command modules.
- CI-only architectural gates land late. Repo-wide gates (terminology, shim retirement, seam boundaries) run in the integration-tests-core-misc (architectural) shard, so a PR can pass every fast shard and fail ~40 minutes later. Run tests/architectural/ locally on the rebased tip before declaring a branch green.
- Shard path-filters mask pre-existing failures. The changes filter skips shards like fast-tests-cli on PRs that do not touch those paths, so a pre-existing red only surfaces on the first PR that does, and the innocent PR wears the failure. Classify it as pre-existing (bin three of step 4), not as the PR's defect.
- scripts/ invocations need "PYTHONPATH=.". The docs scripts import scripts.docs.* as a package; without it they crash with ModuleNotFoundError: scripts:
  PYTHONPATH=. uv run python scripts/docs/check_docs_freshness.py --ci
- build_cli_reference.py defaults to the wrong output path. Its defaults write docs/reference/, while the live canonical reference is docs/api/cli-commands.md. Always pass the outputs explicitly:
  PYTHONPATH=. uv run python scripts/docs/build_cli_reference.py \
    --output docs/api/cli-commands.md \
    --agent-output docs/api/agent-subcommands.md
- Per-worktree venv rebuild. The first uv run in a fresh landing worktree rebuilds the virtualenv (~40 s + disk). Budget for it; do not debug it.
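For the saturated-band gotcha above, a little arithmetic shows how much headroom a re-pin buys. The quoted figures (236 visible, tolerance 212–259) are consistent with a band of plus or minus 10% rounded down, but that is an inference from the numbers, not something this runbook states; the real rule lives in the census test. Both helper names are hypothetical.

```shell
# band_for_pin PIN
# Prints "LOW HIGH" for an ASSUMED +/-10% band using integer (floor) division.
band_for_pin() (
  pin=$1
  echo "$(( pin * 90 / 100 )) $(( pin * 110 / 100 ))"
)

# band_is_saturated COUNT HIGH
# Succeeds when the count already sits at (or past) the upper bound, so the
# next legitimately added command would trip the gate.
band_is_saturated() {
  [ "$1" -ge "$2" ]
}

band_for_pin 236   # prints: 212 259
```

Under that assumption, a pin at 236 leaves room for 23 more visible commands before the band saturates again.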
See also
- Review gates: pre-PR / pre-review checklist — the contributor-side hygiene this runbook assumes.
- Test-flakiness handling policy — the never-retry-to-green rule and budget-gate tuning.
- Guides index