#666 Single __global__ datastore lock serializes unrelated writes across all repos/namespaces
Opened by mgreten · 6/17/2026 · Shipped 6/24/2026
Summary
The S3 datastore serializes every operation behind one __global__ lock. With conditional-write locking enabled (v2026.06.03.2), an interactive write in one repo waits behind any operation from any other repo or scheduled job sharing the bucket — including unrelated workflow runs and high-frequency pollers. A trivial ~1 KB vault write took ~70s end-to-end, with the dominant cost being lock acquisition, not the payload.
Environment
- @swamp/s3-datastore v2026.06.03.2 (MinIO backend, conditional-write locking)
- swamp 20260609.232501.0-sha.873563f2
- Shared bucket: repos at the bucket root and under per-repo prefixes/namespaces; several launchd jobs poll on 60s/120s/180s intervals and acquire the same global lock.
Observed
vault put timeline (two consecutive writes):
- 'Syncing vault data from datastore' leg: ~35-40s each
- 'Pushing vault changes to datastore' leg: ~35s each, ending 'no changes'
Re-timing in isolation once the lock was warm and uncontended:
- datastore sync (0 pulled / 0 pushed): ~1.0s
- datastore compact: ~11s, of which ~9s was 'Acquiring lock for "global"'
So steady-state sync is ~1s; the multi-second-to-minute stalls are lock-wait while another process holds __global__. The contention is independent of what is being written.
Contributing factor
The datastore index (.datastore-index.json) is ~110 MB over ~262k objects (the catalog is shared across all repos via prefixes). Every sync pulls/pushes the whole index under the global lock, lengthening each lock hold and widening the window for collisions.
Expected
Lock scope should match the unit of isolation the datastore already exposes (datastore namespace / per-repo prefixes). A write to namespace A should not block a write to namespace B.
Proposed
- Per-namespace (or per-prefix) locks instead of one __global__ lock, so cross-repo / cross-namespace writes run concurrently.
- And/or incremental index sync so a small write doesn't pull/push the entire ~110 MB manifest under lock.
Workaround in place
Staggering the polling intervals of the scheduled jobs that share the bucket to reduce collision frequency — mitigates but doesn't fix the underlying single-lock serialization.
Upstream repository: https://github.com/systeminit/swamp-extensions
Environment
- Extension: @swamp/s3-datastore@2026.06.03.2
- swamp: 20260609.232501.0-sha.873563f2
- OS: darwin (aarch64)
- Deno: 2.8.2
- Shell: /bin/zsh
mgreten commented 6/17/2026, 4:21:28 PM
Not resolved by swamp update + repo upgrade
Updated the CLI 20260609.232501.0-sha.873563f2 → 20260616.195738.0-sha.2c8bba58 and ran swamp repo upgrade. The __global__ lock contention is unchanged — datastore sync/compact/vault put still serialize behind a single global lock, so a write in one repo/namespace still waits on operations from any other sharing the bucket.
Note: swamp update/repo upgrade does not pull the @swamp/s3-datastore extension (still v2026.06.03.2 here), so the datastore behavior wouldn't change from a CLI bump alone — but flagging that the upgrade path did not surface or ship a datastore version that addresses this. Still reproduces on the latest CLI build.
Current mitigation on our side: de-aligned the polling intervals of the launchd jobs sharing the bucket. They were 60/120/180s, all multiples of a 60s beat, so some pair fired together every 120s and all three every 360s. They are now 90/150/210s, so the closest pair (90/150) coincides only every 450s and all three only every 3150s. This reduces collision frequency and narrows the window, but it does not address the single-lock serialization.
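How often fixed polling intervals line up is just the least common multiple (LCM) of the intervals. The sketch below assumes every job starts in phase at t = 0 and fires exactly on its interval (no launchd jitter). It only counts coincident start times, not overlapping lock holds, so it understates real contention when each hold lasts tens of seconds. The function names are illustrative, not part of swamp.

```typescript
// Greatest common divisor (Euclid's algorithm).
function gcd(a: number, b: number): number {
  while (b !== 0) {
    [a, b] = [b, a % b];
  }
  return a;
}

// Least common multiple: how often two fixed intervals fire at the same instant.
function lcm(a: number, b: number): number {
  return (a / gcd(a, b)) * b;
}

// How often *every* job fires together (assumes all started in phase at t = 0).
function allJobsCoincideEvery(intervals: number[]): number {
  return intervals.reduce((acc, x) => lcm(acc, x));
}

// Shortest period after which *some* pair of jobs fires together.
function firstPairCoincidesEvery(intervals: number[]): number {
  let best = Infinity;
  for (let i = 0; i < intervals.length; i++) {
    for (let j = i + 1; j < intervals.length; j++) {
      best = Math.min(best, lcm(intervals[i], intervals[j]));
    }
  }
  return best;
}

for (const intervals of [[60, 120, 180], [90, 150, 210]]) {
  console.log(
    intervals.join("/"),
    "pair every", firstPairCoincidesEvery(intervals), "s;",
    "all every", allJobsCoincideEvery(intervals), "s",
  );
}
// 60/120/180 pair every 120 s; all every 360 s
// 90/150/210 pair every 450 s; all every 3150 s
```

Because 90, 150 and 210 still share a 30s beat, collisions become rarer but are not eliminated. Any overlap of multi-second lock holds also contends, so staggering can only narrow the window.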
mgreten commented 6/17/2026, 7:03:03 PM
Root cause located — this is a swamp-core issue, not an @swamp/s3-datastore extension issue
Please re-route/re-scope: the __global__ lock is hardcoded in swamp core, not in the S3 datastore extension. Apologies for filing under the extension — the contention symptom surfaces through the S3 backend, but the lock-key decision lives in core.
Where
src/infrastructure/persistence/datastore_sync_coordinator.ts:
```typescript
// line 81
export const GLOBAL_LOCK_KEY = "__global__";

// line 224: the public entry point always passes the global constant
export async function registerDatastoreSync(options: RegisterDatastoreSyncOptions): Promise<void> {
  await registerDatastoreSyncNamed(GLOBAL_LOCK_KEY, options);
}
```

So every datastore sync, across every repo and every namespace sharing a bucket, serializes on the single key __global__.
Why namespacing does not help today
swamp datastore namespace set + migrate change the object prefix layout, but the lock-acquisition path never reads the namespace — it calls registerDatastoreSync, which hardcodes GLOBAL_LOCK_KEY. Verified locally: with zero namespaces registered, datastore compact/sync still acquire __global__. Migrating to namespaces would relocate objects but give no contention relief. So there is currently no swamp-native way to avoid the global lock other than reducing collision frequency (e.g. staggering scheduled writers).
Suggested fix (the plumbing already exists)
registerDatastoreSyncNamed(key, options) is fully key-parameterized — registerDatastoreSync is the only thing forcing the global constant. Deriving the key from the repo's namespace (or repo id) instead of hardcoding GLOBAL_LOCK_KEY would scope locks per-namespace, so a write to namespace A no longer blocks a write to namespace B. This looks like a small, well-contained change: thread the namespace through the single call site at line 224 rather than passing the constant.
Open question for maintainers: is the global lock intentionally coarse to protect the shared .datastore-index.json (one manifest for the whole bucket)? If the index is bucket-global, per-namespace locks would also need a per-namespace (sharded) index to be safe — otherwise concurrent namespaced writers would race on the shared manifest. If so, the index sharding from the earlier note in this thread and the lock-scoping fix are the same piece of work.
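A minimal sketch of the call-site change suggested above, assuming the options object can carry the repo's namespace. The namespace field and the stubbed registerDatastoreSyncNamed are hypothetical stand-ins for swamp-core internals (only the two function names and GLOBAL_LOCK_KEY come from the excerpt). The stub just records which key each registration would lock on.

```typescript
const GLOBAL_LOCK_KEY = "__global__";

// Hypothetical options shape: the real RegisterDatastoreSyncOptions may not have `namespace`.
interface RegisterDatastoreSyncOptions {
  namespace?: string;
}

// Stub for the key-parameterized function: records the lock key instead of syncing.
const lockKeysUsed: string[] = [];
async function registerDatastoreSyncNamed(
  key: string,
  _options: RegisterDatastoreSyncOptions,
): Promise<void> {
  lockKeysUsed.push(key);
}

// Proposed entry point: scope the lock to the namespace, falling back to the global key.
async function registerDatastoreSync(
  options: RegisterDatastoreSyncOptions,
): Promise<void> {
  await registerDatastoreSyncNamed(options.namespace ?? GLOBAL_LOCK_KEY, options);
}
```

With this shape, registerDatastoreSync({ namespace: "agentic-tooling" }) locks on agentic-tooling, while a repo with no namespace keeps __global__. As the open question notes, this is only safe if the index the lock protects is also scoped per namespace.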
Environment
- swamp 20260616.195738.0-sha.2c8bba58 (reproduces on latest)
- @swamp/s3-datastore v2026.06.03.2, MinIO backend
stack72 commented 6/17/2026, 11:47:58 PM
Hey @mgreten, thanks for the detailed report — the timeline breakdowns are really helpful.
Before we dig into a fix, we want to understand your setup a bit better. There are actually two existing mechanisms that scope the lock away from the single global .datastore.lock:
1. S3 prefix: the S3 client prepends the configured prefix to every key, including the lock. So if repo A uses prefix: "repo-a" and repo B uses prefix: "repo-b", their locks land at different S3 keys (repo-a/.datastore.lock vs repo-b/.datastore.lock) and don't contend.
2. Core namespace (giga-swamp): when namespace is set in .swamp.yaml, the lock moves to .locks/{namespace}.lock. Combined with the S3 prefix, that's {prefix}/.locks/{namespace}.lock.
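The two mechanisms compose into a single lock object key. A minimal sketch of that layout, assuming exactly the paths described above; lockObjectKey and contend are hypothetical helpers, not the extension's API. Two writers contend only when they resolve to the same key.

```typescript
// Hypothetical helper mirroring the described layout:
//   no namespace: {prefix}/.datastore.lock
//   namespace:    {prefix}/.locks/{namespace}.lock
// With no prefix configured, the key sits at the bucket root.
function lockObjectKey(prefix?: string, namespace?: string): string {
  const relative = namespace ? `.locks/${namespace}.lock` : ".datastore.lock";
  return prefix ? `${prefix}/${relative}` : relative;
}

type RepoConfig = { prefix?: string; namespace?: string };

// Writers contend only if their lock keys are identical.
function contend(a: RepoConfig, b: RepoConfig): boolean {
  return lockObjectKey(a.prefix, a.namespace) === lockObjectKey(b.prefix, b.namespace);
}
```

Two jobs in the same repo (same prefix, same namespace) always resolve to the same key, so neither mechanism adds concurrency when many writers target one repo.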
Both of these are already wired end-to-end through the S3 extension. So a few questions to narrow down what's happening:
- Do your repos each have a distinct prefix in their S3 datastore config, or do they share the same prefix (or no prefix)?
- Do any of the repos have namespace configured?
- Are the launchd jobs operating on different repos, or are some of them running multiple commands against the same repo?
The answer changes the shape of the fix significantly — if the repos share a prefix and don't have namespaces, the quickest path might be configuring distinct prefixes or namespaces. If they already have isolation configured and the contention is still happening, that's a different problem.
The 110MB index is a separate concern — even with scoped locks, a large index extends lock hold times. We're tracking that as an independent optimization (per-namespace index partitioning), but want to understand the lock contention piece first.
mgreten commented 6/18/2026, 6:01:12 PM
Thanks Stack — that's the missing piece. Here's how my setup is configured, plus an update: after digging into your two mechanisms, I went ahead and migrated, which changed some of the answers.
Q1 — prefixes: Before today, all my S3 repos had distinct prefixes except agentic-tooling, which had none (wrote to the bucket root). The others:
| Repo | prefix |
|---|---|
| moment-savor | moment-savor |
| matgreten.dev | matgreten-dev |
| offline-meeting-transcriber | offline-meeting-transcriber |
| homelab-docker-configs | docker-configs |
(unifi / Obsidian / swamp-cli-agent are local-filesystem datastores, not in the bucket.)
Q2 — namespaces: None were set originally. As of today I've namespaced agentic-tooling (see migration below), so it now has namespace: agentic-tooling; the others are still un-namespaced but prefix-isolated.
Q3 — launchd jobs: This turned out to be the crux. All my high-frequency bridges — pr-watch, todoist-poll, marketing-smoke, db-scale-sync — plus several ADW agents (oop-*, review-monitor-*, todoist-watcher) target the same repo, agentic-tooling. So my contention isn't cross-repo at all; it's many writers against one repo. I confirmed in the bucket there was exactly one lock, at the root, held by agentic-tooling — the other repos' locks sit under their own prefixes and never contended.
What I did: Since agentic-tooling was the lone repo writing to the bucket root, I set a namespace (agentic-tooling) and ran datastore namespace migrate --confirm to move it off the root into agentic-tooling/ (~270k objects / 248 MB), then deleted the orphaned old root layout. Goal was to get it off the shared root and onto the "correct" isolated config your mechanisms expect.
A couple of findings from the migration, in case they're useful:
1. namespace set immediately split-brains a busy repo. The moment the namespace is set, new writes route to the namespaced path while existing data stays at root, and then migrate refuses with a merge conflict (both directions blocked). I had to pause all writers (14 launchd agents) and force-release a stale lock before a clean set → migrate → sync worked. Might be worth a doc note that migration requires a quiesced repo.
2. The push exceeded the 300s default sync timeout at this scale; --timeout 1800 got it through.
3. A monolithic root .datastore-index.json keeps regenerating even under the namespace: after migrating and deleting it, a couple of routine commands recreated a ~100 MB root index. So the namespace relocates data and the lock, but the root index doesn't seem fully scoped out.
The result — and where I think this lands: the migration is good hygiene (agentic-tooling is now properly isolated), but it didn't change my contention, exactly as expected: the lock just moved to agentic-tooling/.locks/..., and the same set of jobs still serialize on that one lock because they all operate on this single repo. Splitting the tools into separate repos isn't viable — it's one cohesive toolset sharing state. So my remaining question is the same: is there a lever for concurrency within a single repo (per-command / per-model lock scoping for these read/write profiles), or is staggering the writers the expected remedy? Spreading my poller intervals apart is what's actually helped so far.
(And agreed the ~110 MB index is a separate concern — though finding #3 above suggests the root index lingers even after namespacing, if that's relevant to the partitioning work.)
mgreten commented 6/23/2026, 6:12:40 PM
Index-composition data (re: the ~110 MB manifest contributing factor)
Profiled our agentic-tooling namespace to quantify what fills the manifest that gets pulled/pushed under the global lock.
95% of the index by size is non-latest versions. Of 109,152 catalog rows / 156.6 MB tracked, only 13,884 rows (7.1 MB) are current state; the rest is historical versions dragged through every sync.
Two structural drivers, both swamp-side defaults rather than user data:
- Auto-generated per-run report artifacts (report-swamp-method-summary/-json), emitted on every model method run and retained as versions. These alone are ~half the manifest (adw-session 67.6 MB, github-pr-feed 37.8 MB, cli-agent 3.3 MB). High-frequency runners inflate this fast.
- Per-data_name version accumulation for frequent writers: single names reaching version 5,000+.
Two adjacent notes:
- Even with per-model locks now in place, the end-of-run push still takes the global lock and ships the whole per-file manifest, so manifest size gates every write, not just structural commands.
- The ephemeral lifetime logs "Ephemeral lifetime is not yet implemented", so ephemeral-tagged artifacts never auto-collect.
This reinforces the incremental/sharded-index ask: capping report-artifact retention by default + incremental manifest sync would shrink the global-lock hold independent of the lock-scoping work.
Important caveat: the obvious mitigation (swamp data gc) does not shrink the manifest on S3 datastores — it prunes the catalog but never deletes the backing S3 objects, so the per-file manifest is unchanged. Filed separately as #788.
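A minimal sketch of a per-name retention cap, the kind of default the comment asks for. CatalogRow, capRetention and the keep-newest-N policy are illustrative assumptions, not swamp's data model or gc semantics. Given the caveat about swamp data gc, pruning catalog rows only shrinks the manifest if the backing S3 objects for the pruned rows are deleted as well.

```typescript
// Hypothetical catalog row: one stored version of a named data item.
interface CatalogRow {
  dataName: string;
  version: number;
}

// Keep the newest `keep` versions of each data name; everything older is a prune candidate.
function capRetention(
  rows: CatalogRow[],
  keep: number,
): { retain: CatalogRow[]; prune: CatalogRow[] } {
  const byName = new Map<string, CatalogRow[]>();
  for (const row of rows) {
    const versions = byName.get(row.dataName) ?? [];
    versions.push(row);
    byName.set(row.dataName, versions);
  }
  const retain: CatalogRow[] = [];
  const prune: CatalogRow[] = [];
  for (const versions of byName.values()) {
    versions.sort((a, b) => b.version - a.version); // newest first
    retain.push(...versions.slice(0, keep));
    prune.push(...versions.slice(keep));
  }
  return { retain, prune };
}
```

With single names reaching version 5,000+, a cap such as keep = 10 drops nearly all of that history from what each sync ships. The push still rewrites the whole index, though, unless sync is also incremental.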
mgreten commented 6/24/2026, 4:39:12 PM
Update on 20260617.212026.0-sha.396e0952 (newer than the 20260609 this was filed against): the per-namespace split appears to have partly landed, but within-namespace serialization is still total.
The lock key is now .locks/<namespace>.lock (observed .locks/agentic-tooling.lock) rather than the single __global__ lock — so the "namespace A shouldn't block namespace B" ask looks at least partially addressed. Worth confirming that's intended.
However, writers in the same namespace are still fully serialized with no queuing or fairness, and routine concurrency fails hard. Reproduction: 4 concurrent datastore sync runs against the same namespace, creds healthy (datastore status = 22ms), nothing else running:
```
run 1: LockTimeoutError after 60043ms (holder pid 32747)
run 2: LockTimeoutError after 60062ms (holder pid 32747)
run 3: LockTimeoutError after 60021ms (holder pid 32747)
run 4: Sync complete: 36 pulled, 0 pushed (real 109.62s)
```

So 1 of 4 wins (taking 109s), and the other 3 hit the 60s maxWaitMs and fail with LockTimeoutError. This matches the contributing factor you noted: the winning sync's ~110s lock hold lines up with the ~110 MB index being pulled/pushed under lock. Because a hold that long outlasts the 60s maxWaitMs, any other same-namespace caller that arrives during it has to wait, and at most one waiter per hold can acquire the lock before its timeout; every other waiter fails.
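A toy model of why holds longer than maxWaitMs produce one winner and N-1 timeouts. It assumes a single lock with no queue, callers that keep polling until maxWaitMs, and that the earliest-arriving waiter wins each release. simulateLock is hypothetical and ignores retry backoff and network latency.

```typescript
interface Outcome {
  id: number;
  acquiredAtMs?: number; // set if the caller got the lock
  timedOutAtMs?: number; // set if it gave up after maxWaitMs
}

// Toy model of one lock: callers arrive at the given times, each winner holds it
// for holdMs, waiters give up after maxWaitMs, and the earliest waiter wins each release.
function simulateLock(arrivalsMs: number[], holdMs: number, maxWaitMs: number): Outcome[] {
  const order = arrivalsMs
    .map((t, id) => ({ id, t }))
    .sort((a, b) => a.t - b.t || a.id - b.id);
  let freeAtMs = -Infinity; // when the current holder releases the lock
  const outcomes: Outcome[] = [];
  for (const { id, t } of order) {
    const startMs = Math.max(t, freeAtMs);
    if (startMs - t > maxWaitMs) {
      outcomes.push({ id, timedOutAtMs: t + maxWaitMs });
    } else {
      outcomes.push({ id, acquiredAtMs: startMs });
      freeAtMs = startMs + holdMs;
    }
  }
  return outcomes.sort((a, b) => a.id - b.id);
}
```

simulateLock([0, 0, 0, 0], 110_000, 60_000) gives one caller the lock at 0 and times the other three out at 60,000 ms, the same shape as the repro. With a hypothetical 5s hold, the same four callers all get through (at 0, 5s, 10s and 15s), which is the effect a shorter, incremental sync would have.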
Two practical consequences worth flagging for the fix:
- Two ordinary same-namespace invocations at once (e.g. an ADW phase-runner step and a my-linear method call in one repo) reliably produce a 60s LockTimeoutError on the loser; this is not an edge case.
- A ~110s lock hold is long enough that a caller under a ~2-min command/CI timeout can be SIGKILL'd mid-hold, orphaning the lock (the #218 path), so the index-hold duration and the orphan loop are coupled. Incremental index sync would shrink both the contention window and the orphan risk.
mgreten commented 6/24/2026, 7:02:23 PM
Confirming the per-namespace lock split shipped in @swamp/s3-datastore@2026.06.24.1 — thank you, the cross-namespace half of this is resolved. Verified on darwin after swamp update (binary 20260624.181631.0) + swamp extension update @swamp/s3-datastore:
- The lock key is now .locks/<namespace>.lock (was the prefix-less root .datastore.lock). Writes in different namespaces no longer serialize against each other. 👍
One note in case it's useful for scoping the remainder: the second ask in the original report — incremental index sync so a small write doesn't pull/push the full ~110 MB manifest under the lock — does not appear to have landed, and it still produces total serialization within a single namespace. Repro (creds healthy ~20ms, lock clean, nothing else running), 4 concurrent datastore sync against the same namespace:
```
run A: Sync complete (1 pulled); held the lock ~50s
run B: LockTimeoutError after 60500ms (same holder)
run C: LockTimeoutError after 60746ms (same holder)
run D: LockTimeoutError after 60266ms (same holder)
```

So 1 winner + N-1 timeouts at the 60s maxWait, with the winner's hold dominated by the whole-index pull under lock. (The hold did drop from ~117s pre-update to ~50s, presumably as the index converged.) Our real-world trigger is several launchd pollers all writing the same namespace, which the per-namespace split doesn't separate.
Would you prefer to track the incremental-index-sync / intra-namespace-concurrency piece by reopening this, or as a fresh feature request? Happy to file a focused one with the repro either way. Thanks again for the quick turnaround on both this and #788.
mgreten commented 6/24/2026, 8:12:56 PM
One more concrete data point for the incremental-index-sync half, in case it helps make the case (and thanks again for the per-namespace split):
Running a periodic data gc on the high-churn namespace, I noticed the lock-hold time is essentially independent of how much is being deleted — it's dominated by the full-index rewrite/push. A GC that expired only 7 entries still held the namespace lock for ~5.5 minutes, because the post-delete push rewrites the whole ~110 MB .datastore-index.json under the lock. (A larger ~104k-version GC took about the same order of magnitude.) So even tiny, frequent maintenance writes pay the full-index cost while holding the lock — which is exactly the window where the other intra-namespace pollers then time out at the 60s maxWait.
That seems like a strong argument for the incremental / partitioned index sync mentioned in the original report: it would shrink both the lock-hold time and the collision window in one go. No urgency — just adding the measurement. Appreciate the work here.
mgreten commented 6/25/2026, 4:14:10 AM
Followed up on my reopen-vs-new question above by filing the remainder as a fresh feature, #812 (same-namespace writers still fully serialize on .locks/<namespace>.lock; the 4-concurrent-sync repro lives there). Also split out the manifest-size contributor — auto-generated method-summary artifact retention — as #811. Both cross-link back here. Totally happy to consolidate into #666 instead if you'd rather reopen it; just didn't want to flip a correctly-shipped issue back open. Thanks again for the per-namespace split.
mgreten commented 6/26/2026, 2:02:54 AM
A follow-up with fresh measurements, in case it's useful for scoping the remaining incremental-sync work — no pressure on prioritization, just data.
Per-namespace locks (this issue) are working as intended: the lock is now .locks/<namespace>.lock and isolates writes between namespaces. But for a single busy namespace, write throughput is still gated by something downstream of the lock: every write pulls and pushes the whole index under the lock, so lock-hold scales with total index size rather than with the shard a given write touches.
Measured on aligned latest versions (@swamp/s3-datastore@2026.06.24.1, swamp 20260625.225837.0 on both writer machines), sampling the lock object every 2s and attributing each hold to its command:
- p50 hold ≈ 94–102s, max 268s, across a ~20-minute window.
- The longest holds were spread across completely different operations — a notification send, a small provider-lookup, a poller refresh — all converging on the same ~268s ceiling. That uniformity is the tell: the duration tracks the shared per-write index sync, not what the operation logically does.
- Index at the time was ~93 MB; a single high-cardinality data stream was ~40% of it, so it inflated the hold for every writer in the namespace, not just its own.
Net effect under normal concurrent load (a few pollers + a pipeline + a second machine): writers routinely time out at the 60s lock-acquire ceiling because a prior writer is mid-sync.
The partitioned _index/ shards reduced storage nicely, but the lock-hold seems to still be a full-index operation. If incremental/scoped sync (only sync the partition a write actually touched) is on the roadmap, this is the case it would help most. Glad to share the raw episode log or the sampling script if that'd be useful. Thanks for all the datastore work lately — #788 in particular has been reclaiming real space for us.
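One way to picture "only sync the partition a write actually touched": keep the last-synced content of each index shard and push only the shards whose content changed. This is a sketch under stated assumptions (shards keyed by the first path segment of a catalog key, shards compared by serialized content); IndexEntry, buildShards and dirtyShards are hypothetical, and swamp's actual _index/ shard layout may differ.

```typescript
// Hypothetical catalog entry.
interface IndexEntry {
  key: string; // e.g. "github-pr-feed/report"
  version: number;
}

type ShardedIndex = Map<string, string>; // shard id -> serialized shard contents

// Assumption: shard by the first path segment of the key.
function shardId(key: string): string {
  return key.split("/")[0];
}

// Group entries into shards and serialize each one deterministically.
function buildShards(entries: IndexEntry[]): ShardedIndex {
  const groups = new Map<string, IndexEntry[]>();
  for (const e of entries) {
    const id = shardId(e.key);
    const group = groups.get(id) ?? [];
    group.push(e);
    groups.set(id, group);
  }
  const shards: ShardedIndex = new Map();
  for (const [id, group] of groups) {
    group.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : a.version - b.version));
    shards.set(id, JSON.stringify(group));
  }
  return shards;
}

// Shards that must be pushed: new, changed, or deleted since the last successful sync.
function dirtyShards(lastSynced: ShardedIndex, current: ShardedIndex): string[] {
  const dirty = new Set<string>();
  for (const [id, body] of current) {
    if (lastSynced.get(id) !== body) dirty.add(id);
  }
  for (const id of lastSynced.keys()) {
    if (!current.has(id)) dirty.add(id);
  }
  return [...dirty].sort();
}
```

A write to github-pr-feed then pushes one shard rather than the whole index, so time under the lock tracks that shard's size. Preventing writers to different shards from clobbering each other would still need conditional writes or per-shard locking, which this sketch does not cover.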
mgreten commented 6/26/2026, 2:33:13 AM
Following up with the workload context behind my earlier note — I don't think I've ever actually described the shape of what I'm running, and it may explain why per-namespace locks (which are great) didn't move the needle for my case. Sharing in case it's useful for thinking about the incremental-sync remainder; no ask on priority.
The workload: a fan-out pipeline that funnels into one namespace.
I run an automated dev-workflow system that executes many pipeline runs concurrently, each in its own git worktree (isolated filesystem, branch, ports). Each run is event-driven and emits a datastore write on every phase transition and every agent call — so a single run produces dozens of writes, and I often have several runs in flight at once. Separately, ~13 scheduled jobs on one machine and a second machine write the same datastore continuously.
The catch: every one of those concurrent runs writes to the same namespace. The worktrees isolate the filesystem, but they all point their datastore calls at one control-plane repo → one namespace → one lock. So per-namespace locking (#666) isolates me from other namespaces, but my entire concurrent workload is inside a single namespace and serializes on its lock. With the whole index synced under the lock per write (the measurements in my earlier ripple: trivial ops holding 85–268s), concurrent writers routinely hit the 60s lock-acquire timeout.
Where this points (thinking out loud, not a prescription):
Two things would each independently unblock a fan-out workload like this:
Incremental/scoped index sync (the remainder here) — if a write only synced the partition it touched rather than the whole index, even same-namespace concurrent writers would stop blocking each other for tens of seconds. This is the more general fix and would help every at-scale user, not just fan-out ones.
A lighter-weight per-run/ephemeral namespace primitive — something that let a short-lived job cheaply get its own lock scope (and fold its data into a parent namespace afterward) without standing up a whole separate repo checkout + recreating model instances by hand. Today the only way to get a second lock is a second repo checkout, which is heavy for a per-run pattern.
Totally possible #1 alone makes #2 unnecessary, and equally possible this is all already in view for the incremental-sync work. Mostly I wanted to put the actual topology on the record so the design has the fan-out case in mind. Happy to share the per-run write trace or the lock-sampling data if it'd help. Thanks again — the recent datastore work has been genuinely useful even as I work through this.