#332 Extension failure recording has dual write paths (legacy buildIndex vs W3 reconcile)
Opened by stack72 · 5/12/2026
Summary
Discovered during post-W6 UAT investigation: extension bundle/validation failures are recorded via two parallel write paths, and which one fires depends on whether anyKindNeedsInvalidation() triggered W3 reconcile this process. The same on-disk failure can surface in different places in doctor extensions --json --verbose output depending on prior process state.
Verdict from investigation: C (Hybrid with implementation gap) — not user-visible broken behavior today (the W3 rebundle-loop bug class is closed in practice), but hidden architectural debt with non-deterministic observability and a known invariant bypass.
Full investigation report: swamp-uat/ROWSTATE_INVESTIGATION.md (1,918 words, empirically verified).
The two paths
Path 1 — W3 reconcile (intended)
- Triggered when anyKindNeedsInvalidation() fires (sources changed, guards tripped, etc.)
- ReconcileFromDiskService → extension.recordBundleBuildFailed(...) → repository.saveAll([...])
- Surfaces as aggregateState.sourceDetails[].stateTag === "BundleBuildFailed"
- Goes through the Extension aggregate root (I-Repo-1 invariant fires)
Path 2 — legacy buildIndex (fallback)
- Triggered when reconcile does NOT fire this process
- Failures captured into result.failed during buildIndex
- Surfaces as registries.<kind>.failures[] with shape { file, error }
- Bypasses the Extension aggregate entirely
Same failure, different output shape and different field names, no normalized surface for consumers.
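Until the paths are consolidated, a consumer can read both surfaces and fold them into one list. The following is a minimal sketch of that idea, not the project's code: only `stateTag`, `file`, and `error` come from this issue, while the `DoctorOutput` type, the `file` and `error` fields on `sourceDetails[]` entries, the `NormalizedFailure` shape, and the `collectFailures` helper are hypothetical.

```typescript
// Hypothetical shapes. Only stateTag, and { file, error } on registry
// failures, are named in the issue; the rest is assumed for illustration.
interface SourceDetail {
  file: string; // assumed field
  stateTag: string; // e.g. "BundleBuildFailed"
  error?: string; // assumed field
}

interface RegistryFailure {
  file: string;
  error: string;
}

interface DoctorOutput {
  aggregateState?: { sourceDetails?: SourceDetail[] };
  registries?: Record<string, { failures?: RegistryFailure[] }>;
}

interface NormalizedFailure {
  file: string;
  via: "aggregate" | "registry"; // which write path produced the record
  stateTag?: string; // present only for the aggregate path
  kind?: string; // registry key, present only for the legacy path
  error?: string;
}

// The three failure states the issue lists as having dual-path behavior.
const FAILURE_TAGS = new Set([
  "BundleBuildFailed",
  "ValidationFailed",
  "EntryPointUnreadable",
]);

function collectFailures(doc: DoctorOutput): NormalizedFailure[] {
  const out: NormalizedFailure[] = [];
  // Path 1: failures recorded through the Extension aggregate.
  for (const d of doc.aggregateState?.sourceDetails ?? []) {
    if (FAILURE_TAGS.has(d.stateTag)) {
      out.push({ file: d.file, via: "aggregate", stateTag: d.stateTag, error: d.error });
    }
  }
  // Path 2: failures captured by the legacy buildIndex path.
  for (const [kind, registry] of Object.entries(doc.registries ?? {})) {
    for (const f of registry.failures ?? []) {
      out.push({ file: f.file, via: "registry", kind, error: f.error });
    }
  }
  return out;
}
```

With a helper like this, an assertion can check that a given file failed without caring which path recorded it, and the `via` field still shows which path fired.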
Two related findings (nested under this)
Finding A — recordValidationFailed has zero production callers.
Production code writes ValidationFailed rows via bundle_freshness.ts:398 (markCatalogValidationFailed), which calls catalog.upsert({state: 'ValidationFailed'}) directly — bypassing the Extension aggregate. The extension.recordValidationFailed method exists only for tests.
Finding B — Tombstoned rows are unreachable in sourceDetails[].
applyDiffForExtension:523-526 DELETEs Tombstoned catalog rows in the same transaction that records the transition. State is observable only via ReconcileResult.transitions[], which doctor extensions --json doesn't expose. Any test or consumer expecting Tombstoned in sourceDetails[] is checking an unreachable state.
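A small in-memory model shows why the state never appears in `sourceDetails[]`. This is a sketch under assumptions, not the real `applyDiffForExtension`: the `InMemoryCatalog` class, the `tombstoneRemoved` function, and the `Transition` fields are hypothetical stand-ins.

```typescript
// Hypothetical model: the real catalog and transition types are not shown
// in the issue, so these shapes are assumptions.
interface Transition {
  file: string;
  from: string;
  to: string;
}

interface ReconcileResult {
  transitions: Transition[];
}

class InMemoryCatalog {
  private rows = new Map<string, string>(); // file -> state

  upsert(file: string, state: string): void {
    this.rows.set(file, state);
  }

  get(file: string): string | undefined {
    return this.rows.get(file);
  }

  delete(file: string): void {
    this.rows.delete(file);
  }

  // Stand-in for what a doctor-style report could read back from the catalog.
  sourceDetails(): { file: string; stateTag: string }[] {
    return Array.from(this.rows.entries()).map(([file, stateTag]) => ({ file, stateTag }));
  }
}

function tombstoneRemoved(catalog: InMemoryCatalog, removedFiles: string[]): ReconcileResult {
  const transitions: Transition[] = [];
  for (const file of removedFiles) {
    const from = catalog.get(file);
    if (from === undefined) continue; // nothing recorded for this file
    // The transition is recorded in the result...
    transitions.push({ file, from, to: "Tombstoned" });
    // ...and the row is deleted in the same step, so no Tombstoned row is
    // ever left in the catalog for a later read to find.
    catalog.delete(file);
  }
  return { transitions };
}
```

In this model the only place `Tombstoned` can be observed is the returned `transitions` array, which matches the behavior described above.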
Why this matters
- Non-deterministic observability. Tools and tests querying failure state via doctor extensions --json get different shapes depending on process history. Test-authoring agents have to pin reconcile state explicitly to get predictable assertions.
- Aggregate invariant bypassed. The markCatalogValidationFailed direct upsert means I-Repo-1 (cross-aggregate uniqueness) is not enforced for that write path.
- Architectural docs are wrong. W1b documented 7 RowStates as a uniform surface; reality is 5 reachable in sourceDetails[], 1 transient at construction (Bundled), 1 transient at the persistence layer (Tombstoned).
Proposed resolution
Three discrete pieces of work, in priority order:
- Consolidate failure write paths: pick one canonical mechanism (recommend W3 reconcile path since it goes through the aggregate). Migrate buildIndex to use the same path, or document/normalize the registries.<kind>.failures[] surface as a stable contract.
- Route validation-failed writes through the aggregate: replace the markCatalogValidationFailed direct upsert with repository.saveAll([extension.recordValidationFailed(...)]) so I-Repo-1 fires.
- Tombstoned visibility decision: either expose ReconcileResult.transitions[] in doctor extensions --json (richer doctor surface), or document Tombstoned as transient-at-persistence (lighter option, matches current reality).
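The second item can be illustrated with a toy repository that checks an invariant on `saveAll` but not on a direct upsert. This is a sketch only: the `Extension` and `ExtensionRepository` classes below are simplified stand-ins, `upsertDirect` is a hypothetical name for the bypassing write, and reading I-Repo-1 as "a file may be owned by only one extension" is an assumption made for the example.

```typescript
interface CatalogRow {
  extensionId: string;
  file: string;
  state: string;
}

// Simplified stand-in for the Extension aggregate root.
class Extension {
  private pending: CatalogRow[] = [];

  constructor(readonly id: string) {}

  recordValidationFailed(file: string): Extension {
    this.pending.push({ extensionId: this.id, file, state: "ValidationFailed" });
    return this; // returned so it can be passed straight to saveAll([...])
  }

  pendingRows(): CatalogRow[] {
    return this.pending.slice();
  }

  clearPending(): void {
    this.pending = [];
  }
}

class ExtensionRepository {
  private rows = new Map<string, CatalogRow>(); // keyed by file

  // Aggregate path: the invariant is checked before anything is written.
  saveAll(extensions: Extension[]): void {
    const owners = new Map<string, string>();
    for (const row of this.rows.values()) owners.set(row.file, row.extensionId);

    const staged: CatalogRow[] = [];
    for (const ext of extensions) {
      for (const row of ext.pendingRows()) {
        const owner = owners.get(row.file);
        // Assumed reading of I-Repo-1: one file, one owning extension.
        if (owner !== undefined && owner !== row.extensionId) {
          throw new Error(`I-Repo-1 violated: ${row.file} already owned by ${owner}`);
        }
        owners.set(row.file, row.extensionId);
        staged.push(row);
      }
    }
    for (const row of staged) this.rows.set(row.file, row);
    for (const ext of extensions) ext.clearPending();
  }

  // Bypass path: writes the row with no invariant check at all.
  upsertDirect(row: CatalogRow): void {
    this.rows.set(row.file, row);
  }

  rowFor(file: string): CatalogRow | undefined {
    return this.rows.get(file);
  }
}
```

In this model a conflicting write through `upsertDirect` silently overwrites the existing owner, while the same write through `saveAll` is rejected before any row changes.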
Impact on UAT matrix
The swamp-uat extension test suite (EXTENSION_UAT_SUITE.md §9 RowState matrix) is being authored against current empirical reality, not the architectural ideal. When this issue ships, several test entries will become eligible for simplification:
- ValidationFailed / BundleBuildFailed / EntryPointUnreadable dual-path test pairs collapse to single canonical-surface tests
- Tombstoned absence-from-sourceDetails tests can become positive presence tests (if option 3a is taken)
Not blocking on this — matrix work proceeds against current implementation.
Environment
- Discovered: 2026-05-02 during W6 (doctor extensions aggregate state) UAT integration testing
- Affected: all swamp versions post-W3 (which introduced ReconcileFromDiskService alongside the legacy buildIndex path)
- Surfaces: swamp doctor extensions --json --verbose, integration tests asserting on RowState
Related
- W1-W6 rearchitecture: see design/extension-rearchitecture.md
- W3 reconcile service: src/libswamp/extensions/reconcile_from_disk_service.ts
- Catalog DELETE behavior: src/infrastructure/persistence/extension_repository.ts:523-526
- Aggregate-bypass write: src/domain/extensions/bundle_freshness.ts:398
Closed
stack72 commented 5/12/2026, 3:55:25 PM
▎ Closing in favor of #334, which captures the actionable subset of the architectural debt described here (invalidate-then-reconcile sequencing). The broader unification work (collapse registries.failures[] into sourceDetails[], route validation-failed writes through the aggregate, surface Tombstoned transitions) is real but premature to track as one ticket; better filed when a workstream is actually prioritized. See #334's "Related context" section for the deferred items.