01 Issue
Bug · Shipped · Swamp CLI
Assignees: stack72

#665 data gc --dry-run ignores the flag and performs a destructive GC

Opened by mgreten · 6/17/2026 · Shipped 6/17/2026

Summary

swamp data gc --dry-run does not perform a dry run. Despite passing --dry-run, it executed a real garbage-collection pass and deleted data.

Environment

  • swamp 20260609.232501.0-sha.873563f2 (newer build 20260616.195738.0-sha.2c8bba58 available but not yet installed — unable to confirm whether fixed there; reproducible on the installed version)
  • Datastore: @swamp/s3-datastore v2026.06.03.2 (S3-compatible / MinIO backend)

Steps to reproduce

  1. Run swamp data gc --dry-run against a repo with accumulated data versions.

Expected

Per swamp help data gc, --dry-run should 'Show what would be deleted without deleting'. No mutation.

Actual

The command ran a real GC and reported:

INF data·gc GC complete: deleted 3430 expired items, 66382 excess versions reclaimed (137588260 bytes)
INF datastore·sync Pushing changes to "@swamp/s3-datastore"...

~70k versions / ~131 MB were reclaimed for real — the flag was silently ignored and a destructive operation proceeded.
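The rounded figures above can be checked against the log line. This is a minimal arithmetic sketch, assuming "~70k versions" means expired items plus excess versions and that "~131 MB" is the byte count in binary units (MiB):

```python
# Figures copied from the "GC complete" log line above.
expired_items = 3430
excess_versions = 66382
reclaimed_bytes = 137588260

# Assumption: the "~70k" figure is the sum of both counters.
total_removed = expired_items + excess_versions

# Assumption: "~131 MB" is binary megabytes (MiB, 2**20 bytes).
reclaimed_mib = reclaimed_bytes / 2**20
# The same byte count in decimal megabytes, for comparison.
reclaimed_mb = reclaimed_bytes / 10**6

print(total_removed, round(reclaimed_mib, 1), round(reclaimed_mb, 1))
```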

Impact

--dry-run is the safe primitive operators reach for before a destructive GC. Silently executing the real operation under a dry-run flag is a footgun that can cause unrecoverable data loss. The flag should either gate all mutation or the command should error if the flag isn't honored.
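One way to make a dry-run flag gate all mutation is to split the pass into a read-only planning step and a separate apply step that is skipped when the flag is set. The sketch below is illustrative only and is not swamp's implementation; `InMemoryStore`, `GcReport`, and `collect_garbage` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class GcReport:
    dry_run: bool
    expired: list = field(default_factory=list)   # what the plan selected
    deleted: list = field(default_factory=list)   # what was actually removed


class InMemoryStore:
    """Hypothetical stand-in for a datastore: maps item name -> is_expired."""

    def __init__(self, items):
        self.items = dict(items)

    def expired_items(self):
        # Read-only: never mutates the store.
        return sorted(name for name, expired in self.items.items() if expired)

    def delete(self, name):
        del self.items[name]


def collect_garbage(store, dry_run):
    # Planning step: safe to run in both modes.
    report = GcReport(dry_run=dry_run, expired=store.expired_items())
    if dry_run:
        # Return before any mutating call is reachable.
        return report
    # Apply step: only reached for a real run.
    for name in report.expired:
        store.delete(name)
        report.deleted.append(name)
    return report
```

Keeping every mutating call behind the single early return makes the dry-run guarantee easy to test: after a dry run, the store must be unchanged and `deleted` must be empty.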

Note

Also observed: data gc emitted dozens of identical WRN swamp·domain·data·lifecycle Ephemeral lifetime is not yet implemented lines (one per item) — likely should be deduped/summarized, but secondary to the dry-run bug.
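The deduping suggested above could look roughly like this. It is a sketch only: `summarize_warnings` is a hypothetical helper, not part of swamp, and the "(repeated N times)" suffix is an invented format.

```python
from collections import Counter


def summarize_warnings(lines):
    """Collapse identical warning lines into one line with a repeat count.

    Order of first appearance is preserved (Counter keeps insertion order).
    """
    counts = Counter(lines)
    summary = []
    for message, n in counts.items():
        summary.append(message if n == 1 else f"{message} (repeated {n} times)")
    return summary


warnings = [
    "WRN swamp·domain·data·lifecycle Ephemeral lifetime is not yet implemented"
] * 3
for line in summarize_warnings(warnings):
    print(line)
```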

02 Bog Flow
OPEN · TRIAGED · IN PROGRESS · SHIPPED · + 1 MORE · ASSIGNED · + 2 MORE · REVIEW · + 3 MORE · PR_MERGED · + 1 MORE · CONTRIBUTOR_NOTIFIED

Shipped

6/17/2026, 6:29:16 PM


03 Sludge Pulse
stack72 assigned stack72 · 6/17/2026, 5:01:29 PM

mgreten commented 6/17/2026, 4:18:29 PM

Confirmed still present on 20260616.195738.0-sha.2c8bba58

Re-ran swamp data gc --dry-run after swamp update to the newer build. The flag is still ignored — it performed a real destructive GC:

INF data·gc GC complete: deleted 3433 expired items, 66394 excess versions reclaimed (137629018 bytes)
INF datastore·sync Pushing changes to "@swamp/s3-datastore"...

Additional finding: reclaimed versions are not actually deleted from the S3 backend

The counts are nearly identical across two separate gc runs (66382 then 66394 excess versions, ~131 MB each). The durable object count in the S3 datastore did not drop between runs (data/ stayed at ~217k objects, index stayed ~110 MB). This suggests GC reclaims versions only in the local catalog/WAL and never persists the deletions to the S3 backend, so every run re-reports the same backlog and the bucket never shrinks.

Net: (1) --dry-run still mutates, and (2) the mutation it performs doesn't reach the backend, so it neither dry-runs nor durably collects. Both reproduce on the latest build.
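The object-count comparison used above can be written as a small check. This is a sketch under stated assumptions: `count_objects` and `run_gc` are hypothetical callables supplied by the caller (for example, a bucket listing and a wrapper around the CLI), and concurrent writes can mask a small drop, so the result is a signal rather than proof.

```python
from typing import Callable


def check_gc_run(
    count_objects: Callable[[], int],
    run_gc: Callable[[], None],
    dry_run: bool,
) -> str:
    """Compare the backend object count before and after a gc run."""
    before = count_objects()
    run_gc()
    after = count_objects()
    if dry_run:
        # A dry run must never reduce the durable object count.
        return "ok" if after >= before else "dry run deleted objects"
    # A real gc that reclaimed anything should shrink the backend.
    return "ok" if after < before else "real gc did not shrink the backend"
```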

mgreten commented 6/17/2026, 4:21:36 PM

Confirming for the record: swamp update + repo upgrade did not resolve this

To be explicit beyond the earlier ripple — after updating the CLI to 20260616.195738.0-sha.2c8bba58 and running swamp repo upgrade, swamp data gc --dry-run still performs a real destructive GC (flag ignored), and the reclaimed versions still don't durably leave the S3 backend (same ~66k versions / ~131 MB re-reported on each run; bucket object count unchanged). Both behaviors reproduce on the latest build. Filing this note so the issue isn't assumed fixed by the version bump.

stack72 commented 6/17/2026, 6:29:27 PM

Thanks @mgreten for reporting this! The fix has been merged and a release is on its way. We appreciate your contribution to swamp.

mgreten commented 6/17/2026, 6:49:27 PM

Thank you!!!!!!!

mgreten commented 6/17/2026, 11:03:57 PM

Verified fixed on 20260617.212026.0-sha.396e0952 — thank you

Confirmed the --dry-run half of this is resolved on the shipped build:

INF data·gc GC dry run: would delete 3619 expired items, would reclaim 71018 excess versions (144202579 bytes)

Two good signals: the output now reads "GC dry run: would delete … would reclaim …" (was "GC complete: deleted …"), and the durable object count did not drop across the run (266501 → 266527; it went up, not down, and the small increase comes from unrelated concurrent writes). So the flag is now honored and the dry run mutates nothing.
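The first signal can be checked mechanically by parsing the summary line. The sketch below assumes only the two line shapes quoted in this issue ("GC complete: …" and "GC dry run: …"); `parse_gc_summary` is a hypothetical helper, and other builds may word the line differently.

```python
import re


def parse_gc_summary(line: str) -> dict:
    """Extract the mode and counters from a data·gc summary log line."""
    if "GC dry run:" in line:
        mode = "dry-run"
    elif "GC complete:" in line:
        mode = "real"
    else:
        raise ValueError("not a GC summary line")

    expired = re.search(r"(\d+) expired items", line)
    versions = re.search(r"(\d+) excess versions", line)
    size = re.search(r"\((\d+) bytes\)", line)
    if not (expired and versions and size):
        raise ValueError("GC summary line is missing a counter")

    return {
        "mode": mode,
        "expired_items": int(expired.group(1)),
        "excess_versions": int(versions.group(1)),
        "bytes": int(size.group(1)),
    }


line = (
    "INF data·gc GC dry run: would delete 3619 expired items, "
    "would reclaim 71018 excess versions (144202579 bytes)"
)
print(parse_gc_summary(line)["mode"])
```

A wrapper script could refuse to continue if it passed --dry-run but the parsed mode is "real".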

One open question: is the secondary finding covered?

This issue also captured a second behavior (earlier ripple): a real data gc reported reclaiming ~66k versions / ~131 MB but the versions never durably left the S3 backend — the bucket object count didn't drop and the same backlog was re-reported on each subsequent run. The dry run now correctly predicts reclaiming ~71k versions, but I haven't run a real (non-dry) gc on this build to confirm the deletions actually persist to the backend now.

Could you confirm whether the shipped status covers just the dry-run flag, or also the backend-durability behavior? If the latter is fixed too, a real gc should shrink the bucket object count — happy to verify and report back if useful.
