#334 Fix invalidate-then-reconcile sequencing in doctor extensions; failure-mode RowStates unreachable in sourceDetails
Opened by stack72 · 5/12/2026· Shipped 5/12/2026
Summary
swamp doctor extensions is designed to be the canonical way to inspect the
true on-disk state of extension catalog rows. After follow-up work in W3 and
W6, it should reliably surface every RowState (Indexed, BundleBuildFailed,
ValidationFailed, EntryPointUnreadable, OrphanedBundleOnly,
Tombstoned) in aggregateState.sourceDetails[].
In practice, BundleBuildFailed, ValidationFailed, and
EntryPointUnreadable are unreachable in sourceDetails[] for the most
common user-observable scenario (a previously-Indexed local extension whose
on-disk content is replaced with broken content). The row stays Indexed
indefinitely. Failures surface only in registries.<kind>.failures[] —
a different output shape with different field names — defeating the W6
unified aggregate-state surface for the cases users care about most.
Root cause is a sequencing bug in how doctor extensions invalidates
the catalog vs. when reconcile decides whether to fire. Details below.
Reproduction
# Setup: fresh repo with a valid local model extension
swamp repo init
mkdir -p extensions/models/test
cat > extensions/models/test/entry.ts <<'TS'
export const model = {
type: "@test/example",
attributes: { foo: { type: "string" } }
};
TS
# First doctor: row reaches Indexed
swamp doctor extensions --json --verbose | jq '.aggregateState.sourceDetails'
# → [{ sourcePath: ".../entry.ts", stateTag: "Indexed", ... }]
# Replace source with regex-matching but schema-invalid content
cat > extensions/models/test/entry.ts <<'TS'
export const model = "not an object";
TS
# Second doctor: expected BundleBuildFailed in sourceDetails. Observed:
# stays Indexed. Failure (if any) appears in registries.model.failures[].
swamp doctor extensions --json --verbose | jq '.aggregateState.sourceDetails'
# → [{ sourcePath: ".../entry.ts", stateTag: "Indexed", ... }] ❌Same symptom with unresolvable imports (import { x } from './nonexistent.ts'; export const model = x;) and with deleted source files for local origin.
Root cause analysis
The reconcile path (ReconcileFromDiskService) and the legacy buildIndex
path coexist after W3 shipped. Reconcile is the canonical failure-recording
mechanism (writes through the Extension aggregate, hits I-Repo-1, surfaces
in sourceDetails[]). Legacy buildIndex is the fallback (writes to
result.failed → registries.<kind>.failures[]).
Which one runs is gated by repository.anyKindNeedsInvalidation() evaluated
at process startup in src/cli/mod.ts:291-303:
// mod.ts startup — runs once, before any command handler
if (
repository.anyKindNeedsInvalidation() ||
repository.manifestIdentityChanged(localManifestIdentity)
) {
const reconciler = new ReconcileFromDiskService({...});
await reconciler.execute(); // ← reconcile fires here, or not at all
}doctor extensions then runs its own invalidation in the handler at
src/cli/commands/doctor_extensions.ts:163-180:
// AFTER startup. Too late.
const rescanRepo = new ExtensionRepository({...});
try {
rescanRepo.invalidateAll(); // clears isPopulated flags
} finally {
rescanRepo.close();
}
// Then registries.ensureLoaded() → triggers buildIndexThe sequence within a single swamp doctor extensions invocation:
- Process startup (
mod.ts:291): catalog is populated from prior run.anyKindNeedsInvalidation()returns false → reconcile is skipped. - Doctor handler (
doctor_extensions.ts:174): callsinvalidateAll()on a separaterescanRepo. ClearsisPopulatedflags after reconcile has already decided not to run. ensureLoadedper registry: entersExtensionLoader.buildIndex(extension_loader.ts:237+). SeesisPopulated === false, falls through to the full-bootstrap path (this.load(...)at line 318), which usesfindStaleFiles— a different freshness check from reconcile's fingerprint comparison.buildIndexthen re-marks the catalog populated during its run.
Net effect: invalidate within a single doctor command has no effect on the
reconcile-write path. It only swaps in the legacy buildIndex write-path
within the current invocation, and the re-population at step 4 even defeats
the intended next-invocation effect (subject to verification — see
acceptance criteria below).
The user-visible result: failure modes that should produce
BundleBuildFailed rows go through buildIndex instead, land in
registries.<kind>.failures[], and the source row in sourceDetails[]
stays at Indexed (its last successful state).
Proposed fix
The intent is clearly: "doctor extensions should reflect on-disk reality,
recomputing fingerprints and re-evaluating bundle validity for every
source." Three implementation options, in increasing order of architectural
ambition:
Option A — Drive reconcile synchronously from the doctor handler
In doctor_extensions.ts, after invalidateAll(), explicitly construct
and run a ReconcileFromDiskService instance against the same repository
the loaders are using:
rescanRepo.invalidateAll();
const reconciler = new ReconcileFromDiskService({
denoRuntime, repository, lockfileRepository, repoDir, localManifestIdentity
});
await reconciler.execute();
// THEN registries.ensureLoaded() — which will now see the post-reconcile catalogPros: smallest change. Doesn't touch the legacy buildIndex path. Doctor
becomes the one command that authoritatively re-evaluates everything.
Cons: doctor is now subtly slower (always reconciles). The reconcile + buildIndex combination needs careful ordering to avoid double-bundling.
Option B — Move the invalidate before startup
Restructure so doctor extensions (and any future commands that want
forced rescan) sets a marker before the swamp process initializes its
repository, so mod.ts:291 sees anyKindNeedsInvalidation() === true and
fires reconcile at startup.
Pros: aligns with the existing single-trigger-point design. No new synchronization between handlers and startup.
Cons: requires either pre-startup state (a flag file, an env var) or restructuring the CLI to defer repository setup until after argument parsing. Likely a bigger blast radius.
Option C — Eliminate the dual path entirely (W7)
Migrate buildIndex's freshness/rebundle/failure-recording entirely to
ReconcileFromDiskService. buildIndex becomes a thin wrapper that just
calls reconcile and registers types from the resulting catalog. Failures
land in one place (sourceDetails[]); registries.<kind>.failures[] is
either deprecated or kept as a denormalized view.
Pros: closes the architectural debt class entirely. Hugely simplifies the mental model. Makes the UAT matrix tractable for failure-mode states.
Cons: large refactor. Should be a workstream of its own ("W7 — unify failure recording through the aggregate"), not a tactical fix.
Recommendation
Ship Option A now, file Option C as a follow-up workstream.
Option A unblocks the UAT failure-mode matrix immediately, is a small focused change with clear acceptance criteria, and doesn't preclude Option C later. Option B is more invasive than Option A without buying anything extra. Option C is the right destination but shouldn't gate the UAT work.
Acceptance criteria
- The reproduction above produces
stateTag: \"BundleBuildFailed\"for the test source row in the seconddoctor extensions --json --verboseoutput. - An equivalent reproduction with regex-matching unresolvable-import
content (
import { x } from './nonexistent.ts'; export const model = x;) producesBundleBuildFailedinsourceDetails[]. - An equivalent reproduction with a deleted-source-file on a pulled
extension (lockfile entry preserved, file removed) produces
EntryPointUnreadableinsourceDetails[]. - An equivalent reproduction with a deleted-source-file on a local
extension (bundle preserved) produces
OrphanedBundleOnly. - The dual-write concern is addressed: any failure recorded in
sourceDetails[]MUST NOT also appear inregistries.<kind>.failures[]for the same source path. If both paths fire today, the fix must ensure reconcile's write "wins" and the legacy path either short-circuits or reads from the catalog rather than re-bundling. - No regression in existing
doctor extensionsUAT tests (swamp-uat/tests/cli/extension/state/*on branchextension-uat-phase-1). - The
--applyrepair path is not regressed (must still only pruneTombstonedrows;OrphanedBundleOnlydeferred per W6 decision).
Files an implementing agent should read first
src/cli/mod.ts:270-330— startup reconcile gate, see exactly what conditions trigger reconcile todaysrc/cli/commands/doctor_extensions.ts:140-200— the misplacedinvalidateAlland the registry ensure-load loopsrc/libswamp/extensions/reconcile_from_disk_service.ts:1-150, 395-510— reconcile entry point and the per-extension reconcile loop including the try/catch that writesBundleBuildFailedsrc/domain/extensions/extension_loader.ts:237-330— the legacybuildIndexpath that currently runs after invalidatesrc/domain/extensions/extension_loader.ts:389-460— thebundleAndIndexOneregex gate (line 394-397) and the validation-throw path (line 422-426) — both matter for understanding what content reaches the catch blocksrc/domain/extensions/bundle_freshness.ts:262-400—findStaleFiles(legacy freshness) andmarkCatalogValidationFailed(the dead-code validation-write path, see related context below)
Related context — DO NOT expand scope into these
The deeper investigation that produced this issue surfaced two adjacent architectural smells. They are NOT in scope for this fix:
recordValidationFailedhas zero production callers. The reconcile catch block collapses validation-throw and bundle-throw intorecordBundleBuildFailed. SoValidationFailedas a RowState is currently dead in production. Fixing this requires distinguishing the throw source in the catch block — a small change but with semantic implications. Track separately if prioritized.Tombstonedrows are DELETEd in the same transaction that records the transition (extension_repository.ts:applyDiffForExtensionaround line 523). SoTombstonedis never observable insourceDetails[]. The state exists only inReconcileResult.transitions[], which doctor doesn't currently surface. Either expose transitions in the doctor JSON or document Tombstoned as transient. Track separately.
Both smells were originally captured in earlier issue filings that are now being closed in favor of this one. The smells remain real but they don't block the UAT failure-mode matrix; this sequencing fix does.
Test plan
Integration tests must live in integration/extensions/ and drive the
CLI end-to-end (subprocess swamp doctor extensions, not unit-level
service invocation). Suggested test cases — one per acceptance criterion
1-4. Use the existing fixture pattern from
integration/extensions/doctor_extensions_aggregate_test.ts.
The fix should also enable the swamp-uat agent to remove the temporary
ValidationFailed-unreachable / BundleBuildFailed-unreachable shims
from tests/cli/extension/state/*_test.ts on branch
extension-uat-phase-1. Coordinate the UAT branch rebase as part of
landing this fix.
Why this matters for the rearchitecture story
After W1-W6, the public contract of doctor extensions was supposed to
be: "this is the one place users go to see the truth about their
extensions." Today it isn't, for the failure modes users hit most often.
This is the last user-visible loose thread from the rearchitecture
sprint.
Shipped
Click a lifecycle step above to view its details.
Sign in to post a ripple.