01 Issue
Feature · Shipped · Swamp CLI
Assignees: stack72


#826 Batch / prefix delete for swamp data delete (single lock acquisition)

Opened by mgreten · 6/26/2026 · Shipped 6/26/2026

Problem

When a model has accumulated many data artifacts that need removing (e.g. a batch of orphaned per-run scratch artifacts, or a deprecated data-name family), the only tool today is swamp data delete <model> <name>, which removes one artifact at a time. Each call acquires the datastore lock and does a full index sync, so removing N artifacts means N separate lock acquisitions and N full index syncs.

On a high-churn shared datastore that's self-defeating: clearing a backlog of (in our case) ~2,500 artifacts via 2,500 sequential data delete calls would contend on the very namespace lock the cleanup is meant to relieve, and at ~60–90 s per lock hold it can't practically be completed. GC doesn't help here because the artifacts aren't expired (separate ripple on #823 re: ephemeral collection), and deleting them directly from object storage, behind the datastore's index, risks leaving the catalog and backing store inconsistent.
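
For a sense of scale, here is a back-of-envelope estimate built only from the figures quoted above. It assumes the deletes run strictly one after another and that each holds the lock for the full 60 to 90 seconds; time spent waiting behind other writers is ignored, so the real figure would likely be higher.

```python
# Rough estimate of cumulative lock-hold time for per-name deletes.
# Inputs are the approximate figures from this issue, not measurements.
ARTIFACTS = 2_500
LOCK_HOLD_SECONDS = (60, 90)  # approximate low/high bound per lock hold


def total_hours(count: int, seconds_per_call: float) -> float:
    """Cumulative lock-hold time in hours for `count` sequential deletes."""
    return count * seconds_per_call / 3600


low, high = (total_hours(ARTIFACTS, s) for s in LOCK_HOLD_SECONDS)
print(f"{low:.1f} to {high:.1f} hours of cumulative lock time")
# prints "41.7 to 62.5 hours of cumulative lock time"
```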

Proposed solution

A batch/bulk delete on swamp data delete that performs the removal in a single lock acquisition + single index sync, e.g. one of:

  • swamp data delete <model> --prefix <str> — delete all data-names starting with a prefix.
  • swamp data delete <model> --glob '<pattern>' — glob match on data-name.
  • swamp data delete <model> --names a,b,c / --names-file <file> — explicit batch list.

A --dry-run (listing what would be removed) and confirmation gate would match the existing data gc ergonomics.
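
To make the intended semantics concrete, here is a minimal in-memory sketch of what we have in mind. It is not swamp's implementation: InMemoryStore, select, and delete_batch are hypothetical names, the lock is a plain threading.Lock standing in for the datastore lock, and the glob matching uses Python's fnmatch rules, which may differ from whatever swamp would choose.

```python
import fnmatch
import threading
from typing import Iterable, List, Optional, Set


def select(all_names: Iterable[str],
           prefix: Optional[str] = None,
           glob: Optional[str] = None,
           names: Optional[Iterable[str]] = None) -> Set[str]:
    """Pick the data-names to delete using exactly one selection mode."""
    if prefix is not None:
        return {n for n in all_names if n.startswith(prefix)}
    if glob is not None:
        return {n for n in all_names if fnmatch.fnmatchcase(n, glob)}
    if names is not None:
        # Explicit list: ignore names that are not in the index.
        return set(names) & set(all_names)
    raise ValueError("one of prefix, glob, or names is required")


class InMemoryStore:
    """Hypothetical stand-in for a datastore: one lock, one index."""

    def __init__(self, names: Iterable[str]) -> None:
        self.index: Set[str] = set(names)
        self.lock = threading.Lock()
        self.lock_acquisitions = 0
        self.index_syncs = 0

    def delete_batch(self, prefix: Optional[str] = None,
                     glob: Optional[str] = None,
                     names: Optional[Iterable[str]] = None,
                     dry_run: bool = False) -> List[str]:
        """Remove every match under a single lock hold and a single sync."""
        with self.lock:
            self.lock_acquisitions += 1
            matched = sorted(select(self.index, prefix, glob, names))
            if not dry_run:
                self.index.difference_update(matched)
                self.index_syncs += 1  # one sync for the whole batch
            return matched


store = InMemoryStore(["scratch-run-1", "scratch-run-2", "report"])
preview = store.delete_batch(prefix="scratch-", dry_run=True)  # lists only
removed = store.delete_batch(glob="scratch-run-*")             # deletes
print(preview, removed, sorted(store.index), store.index_syncs)
```

The dry run returns the same list the real call would remove, which is what a confirmation gate would show before proceeding. However many names match, the real call costs one lock acquisition and one index sync.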

Alternatives considered

  • Looping swamp data delete per name — the status quo; lock contention makes it impractical at scale.
  • A model-side cleanup method — not currently possible: the model-method dataRepository context exposes findAllForModel / getContent / listVersions but no delete primitive.
  • Direct object-store deletion — bypasses the index and risks catalog/store divergence.
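
The divergence risk in the last alternative can be illustrated with a toy model. The two dictionaries below are hypothetical stand-ins for the catalog (index) and the object store; the actual layout of @swamp/s3-datastore is not assumed here.

```python
# Toy model: the index maps data-names to object keys, and the object
# store maps keys to bytes. Both structures are illustrative only.
index = {"scratch-1": "obj/001", "report": "obj/002"}
object_store = {"obj/001": b"payload-1", "obj/002": b"payload-2"}

# Direct deletion from the object store, bypassing the index.
del object_store["obj/001"]

# The index still lists a name whose backing object no longer exists.
dangling = sorted(name for name, key in index.items()
                  if key not in object_store)
print(dangling)  # prints ['scratch-1']
```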

This generalizes beyond our case — any at-scale user accumulating per-run artifacts hits the same "how do I clear a family of data-names without N lock acquisitions" wall. Totally understand if it's out of scope or better folded into auto-GC; appreciate the consideration either way.

Environment

  • @swamp/s3-datastore@2026.06.24.1
  • swamp 20260625.225837.0
02 Bog Flow
OPEN · TRIAGED · IN PROGRESS · SHIPPED · ASSIGNED · REVIEW · PR_MERGED · CONTRIBUTOR_NOTIFIED

Shipped

6/26/2026, 3:41:08 PM


03 Sludge Pulse
stack72 assigned stack72 · 6/26/2026, 10:01:56 AM

stack72 commented 6/26/2026, 3:41:18 PM

Thanks @mgreten for reporting this! The fix has been merged and a release is on its way. We appreciate your contribution to swamp.
