01 Issue
Feature · Closed · Swamp CLI
Assignees: None

Relationships

#806 Optional scheduled / automatic datastore GC (retention-policy-driven pruning)

Opened by mgreten · 6/24/2026

First off, thank you for the recent @swamp/s3-datastore@2026.06.24.1 work (#666 per-namespace locks, #788 gc actually deleting S3 objects). #788 in particular is great: a real swamp data gc run now durably reclaims space (a recent run cleared ~104k versions / ~196 MB of expired poller history that previously was never cleared).

This is a "could", not a "should" — just floating it in case it's useful:

On a high-churn S3 datastore (several frequent pollers each writing many short-lived, duration-expiring artifacts), expired data accumulates continuously, and reclaiming it requires running swamp data gc on a schedule. Today that means each user hand-rolls their own out-of-band scheduler + wrapper (dry-run first, a long sync timeout for the post-delete push, soft-skip on lock contention).
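For illustration, the control flow of such a hand-rolled wrapper might look roughly like the sketch below. Everything in it is hypothetical: the `GcSteps` interface, the `runScheduledGc` name, the outcome labels, and the 30-minute default timeout are assumptions made for the example, and the three steps are injected as functions so that no swamp flags or output formats are assumed. Only `swamp data gc` itself comes from the description above.

```typescript
// Hypothetical sketch of the scheduler glue: dry-run first, soft-skip on
// lock contention, then a long timeout for the post-delete sync.
// None of these names are part of swamp; the steps are injected by the caller.

interface StepResult {
  ok: boolean;             // step finished without error
  lockContended?: boolean; // step could not acquire the datastore lock
  expiredCount?: number;   // dry run only: how many items would be removed
}

interface GcSteps {
  dryRun(): StepResult;                // assumed to preview what gc would delete
  gc(): StepResult;                    // assumed to wrap a real `swamp data gc` run
  sync(timeoutMs: number): StepResult; // assumed to push the deletions afterwards
}

type Outcome = "nothing-to-do" | "skipped-lock" | "completed" | "failed";

// Map one step result to an early-exit outcome, or null to keep going.
function earlyExit(result: StepResult): Outcome | null {
  if (result.lockContended) return "skipped-lock"; // soft-skip, retry next tick
  if (!result.ok) return "failed";
  return null;
}

function runScheduledGc(
  steps: GcSteps,
  syncTimeoutMs: number = 30 * 60 * 1000, // assumed default: 30 minutes
): Outcome {
  const preview = steps.dryRun();
  const previewExit = earlyExit(preview);
  if (previewExit !== null) return previewExit;
  if ((preview.expiredCount ?? 0) === 0) return "nothing-to-do";

  const gcExit = earlyExit(steps.gc());
  if (gcExit !== null) return gcExit;

  const syncExit = earlyExit(steps.sync(syncTimeoutMs));
  return syncExit ?? "completed";
}
```

A scheduler (cron, launchd, or similar) would call `runScheduledGc` on each tick and treat "skipped-lock" as a non-error, which is the soft-skip behaviour described above.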

It might be nice if the datastore could optionally handle this itself — e.g. a config setting for periodic/automatic GC of duration-expired data, or a retention policy that prunes on write/sync so expired artifacts never pile up. Either would save every at-scale user from reinventing the same scheduler glue. Totally understand if it is out of scope or already planned — appreciate all the recent fixes regardless.

Upstream repository: https://github.com/systeminit/swamp-extensions

Environment

  • Extension: @swamp/s3-datastore@2026.06.24.1
  • swamp: 20260624.181631.0-sha.aa2ae00f
  • OS: darwin (aarch64)
  • Deno: 2.8.3
  • Shell: /bin/zsh
02 Bog Flow
OPEN · TRIAGED · IN PROGRESS · CLOSED

Closed

6/25/2026, 9:54:20 PM

No activity in this phase yet.

03 Sludge Pulse
stack72 commented 6/25/2026, 9:54:19 PM

Hey @mgreten — we've consolidated this with #811 (method-summary report artifacts dominating the catalog export) into #823, which covers opt-in automatic GC for datastore data. Both issues share the same root cause: GC policies are declared on data metadata but only enforced during manual swamp data gc runs, causing unbounded catalog export growth on high-churn datastores. #823 captures the full picture and the approaches explored during triage. Thanks for filing this and for the detailed profiling data — it was instrumental in understanding the problem.
