01 Issue
Bug · Shipped · Swamp CLI
Assignees: stack72

#220 workflow run search returns empty after S3 datastore setup, despite run YAMLs in datastore

Opened by ynm · 5/4/2026 · Shipped 5/4/2026

Description

After switching a repo from filesystem to S3 datastore via swamp datastore setup extension @swamp/s3-datastore --config '{...}' --skip-migration, swamp workflow run search returns no results — even though the S3 bucket contains a populated workflow-runs/<workflow-id>/workflow-run-*.yaml tree from prior runs (executed by other contributors against the same shared bucket).

The --skip-migration path is currently the only supported route into this configuration on macOS because of swamp Lab #213 (deno TLS panic on migration), so this is the realistic onboarding flow for any new developer joining a project that's already on a shared S3 datastore.

swamp datastore status reports the S3 datastore as healthy. swamp datastore sync reports "already up to date" / "0 pulled, 0 pushed", yet the local cache .swamp/workflow-runs/ stays empty until the files are fetched manually with aws s3 sync. Even after the files are in the local cache, swamp workflow run search still returns empty results — so the search command appears to consult a separate index (similar to .swamp/data/_catalog.db) that is neither shipped via S3 nor rebuilt from the synced YAML files.

Steps to reproduce

  1. Start with a repo whose shared S3 datastore already has run history (i.e. another contributor has executed workflows against it).
  2. On a fresh clone, run:
    swamp datastore setup extension @swamp/s3-datastore \
      --config '{"bucket":"<bucket>","region":"<region>"}' \
      --skip-migration
  3. swamp datastore status --json → reports @swamp/s3-datastore, healthy.
  4. swamp workflow search --json → returns the workflow definitions from S3.
  5. swamp workflow run search --limit 100 --json → returns {"query":"","results":[]}.
  6. aws s3 ls s3://<bucket>/workflow-runs/ → shows per-workflow directories with workflow-run-*.yaml and .log files.
  7. swamp datastore sync → reports already up to date / no changes.
  8. Manually aws s3 sync s3://<bucket>/workflow-runs/ .swamp/workflow-runs/ → 20+ run files land locally.
  9. Re-run swamp workflow run search --limit 100 --json → still {"query":"","results":[]}.
  10. Parsing the local YAML files directly confirms the run history is well-formed (status, startedAt, workflowName all populated).
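
The check in step 10 can be scripted. The following is a minimal sketch in Python (not part of swamp), assuming that status, startedAt and workflowName appear as unindented `key: value` lines in each run file. The helper names are hypothetical, and the scalar reader is a deliberate simplification rather than a full YAML parser.

```python
from pathlib import Path

# Fields the report says are populated in every run file.
REQUIRED_FIELDS = ("status", "startedAt", "workflowName")


def read_top_level_scalars(path: Path) -> dict:
    """Collect unindented `key: value` lines; nested YAML is ignored."""
    fields = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        # Skip blanks, indented lines, comments, list items and `---`.
        if not line or line[0] in " \t#-":
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip("'\"")
    return fields


def find_incomplete_runs(cache_dir: Path) -> dict:
    """Map each run file to the required fields that are missing or empty."""
    problems = {}
    for run_file in sorted(cache_dir.glob("*/workflow-run-*.yaml")):
        fields = read_top_level_scalars(run_file)
        missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
        if missing:
            problems[str(run_file)] = missing
    return problems
```

Calling `find_incomplete_runs(Path(".swamp/workflow-runs"))` returns an empty dict when every synced run file carries all three fields.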

Expected behavior

swamp workflow run search should surface workflow runs that exist in the shared datastore. Either:

  • swamp datastore sync should populate whatever local index workflow run search consults (mirroring how the s3-datastore extension already syncs data/, workflows-evaluated/, etc.), OR
  • swamp workflow run search should fall back to scanning the workflow-runs/ directory directly when the index is empty / stale.

Affected components

  • @swamp/s3-datastore extension's sync logic — appears to skip workflow-runs/ (the directory exists in S3 but isn't part of the sync manifest).
  • swamp workflow run search — depends on a local index that the S3 datastore's sync flow doesn't seed for new contributors.

Fix approach (high-level)

Either include workflow-runs/ in the s3-datastore sync set so the local index can be rebuilt from synced YAMLs, or add an index-rebuild step to datastore setup and datastore sync that scans workflow-runs/ and indexes any runs not already present. A lazy rebuild on first workflow run search call (when the index is empty but YAML files exist) would also make the failure self-healing.
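
To make the lazy-rebuild idea concrete, here is a rough sketch in Python. It is an illustration only, not swamp's implementation: the SQLite table `runs`, its columns, and the function names are assumptions, since the report does not describe the real index format. Only the workflow-runs/<workflow-id>/workflow-run-*.yaml layout is taken from the report.

```python
import sqlite3
from pathlib import Path

# Hypothetical index schema; the real index layout is not known.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS runs "
    "(path TEXT PRIMARY KEY, workflow_id TEXT NOT NULL)"
)
QUERY = "SELECT path, workflow_id FROM runs ORDER BY path LIMIT ?"


def index_missing_runs(conn: sqlite3.Connection, cache_dir: Path) -> int:
    """Index run files not already present; return how many were added."""
    conn.execute(SCHEMA)
    added = 0
    for run_file in sorted(cache_dir.glob("*/workflow-run-*.yaml")):
        cursor = conn.execute(
            "INSERT OR IGNORE INTO runs (path, workflow_id) VALUES (?, ?)",
            (run_file.relative_to(cache_dir).as_posix(), run_file.parent.name),
        )
        added += cursor.rowcount  # 0 when the row already existed
    conn.commit()
    return added


def search_runs(conn: sqlite3.Connection, cache_dir: Path, limit: int = 100):
    """Query the index; if it is empty, rebuild from YAML files and retry."""
    conn.execute(SCHEMA)
    rows = conn.execute(QUERY, (limit,)).fetchall()
    if not rows and index_missing_runs(conn, cache_dir) > 0:
        rows = conn.execute(QUERY, (limit,)).fetchall()
    return rows
```

The same `index_missing_runs` step could run at the end of datastore setup and datastore sync, which would cover the eager variant of the fix. Because the insert ignores rows that already exist, running it repeatedly is harmless.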

Environment

  • swamp version: 20260421.213501.0-sha.0432a31a
  • Datastore extension: @swamp/s3-datastore 2026.04.28.4
  • OS: macOS (Darwin 23.1.0)
  • Setup invoked with --skip-migration due to Lab #213.

Related issues

  • #213: Filesystem→S3 migration TLS panic (forces --skip-migration, which is the path that exposes this bug).
  • #218: S3 datastore stale-lock loop (different S3 sync issue).
  • #201: Catalog retains stale source-file entries (catalog/index consistency theme).
02 Bog Flow

Shipped

5/4/2026, 2:12:20 PM

03 Sludge Pulse
stack72 assigned stack72 · 5/4/2026, 12:06:14 PM

ynm commented 5/4/2026, 11:49:56 AM

Re-verified on swamp 20260502.153639.0-sha.907c6883 (the original report was filed against 20260421.213501.0-sha.0432a31a). The bug is unchanged:

  • swamp datastore status --json → healthy @swamp/s3-datastore.
  • rm -rf .swamp/workflow-runs/* then swamp datastore sync → reports "already up to date / 0 pulled, 0 pushed", and .swamp/workflow-runs/ stays empty even though aws s3 ls s3://<bucket>/workflow-runs/ shows populated per-workflow directories with workflow-run-*.yaml and .log files.
  • swamp workflow run search --limit 100 --json → still {"query":"","results":[]}.

So the upgrade between 20260421 and 20260502 doesn't address the issue — workflow-runs/ is still excluded from the s3-datastore sync manifest, and workflow run search still has no fallback to scan the synced YAML files when the local index is empty.

stack72 commented 5/4/2026, 1:26:22 PM

Filed sibling lab #221 for the user-facing manual edits at content/manual/reference/datastore-configuration.md (Setup Pipeline list, --skip-migration flag descriptions). Tracked separately so the swamp-club docs flip with the swamp release that ships PR https://github.com/systeminit/swamp/pull/1290, rather than with the swamp commit itself.

ynm commented 5/4/2026, 4:53:59 PM

Filed #225 for the residual case the ship in this issue did not cover: buckets that already have data under standard prefixes but no .datastore-index.json at the root. The new Hydrating cache from remote... step in setup reports Hydrated: 0 pulled against such a bucket because the path is purely index-driven; there is no ListObjectsV2 fallback when HeadObject on the index returns 404.

Concretely on our shared bucket: 7 populated workflow-runs/<id>/ prefixes plus data/, outputs/, etc., and aws s3api list-objects-v2 --query 'Contents[?starts_with(Key, .)].Key' returns only .datastore.lock. After the upgrade swamp datastore status is healthy and setup exits clean, but the local cache stays empty and follow-up reads (swamp data list, swamp workflow run search, …) return nothing.

#225 proposes the discover-unindexed-on-404 fallback in pullIndex() so hydrate is self-healing for any bucket regardless of how its contents got there.
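
The discover-on-404 idea can be sketched as follows. This is an illustration in Python under stated assumptions, not the code proposed in #225: the `s3` argument is any object exposing boto3-style `head_object` and `list_objects_v2` methods, the function names are hypothetical, and skipping keys that start with a dot (such as .datastore.lock) is an assumed filter. Only the .datastore-index.json name and the HeadObject / ListObjectsV2 calls come from the comment above.

```python
from typing import Any, List, Optional

INDEX_KEY = ".datastore-index.json"


def is_not_found(exc: Exception) -> bool:
    """Recognise a botocore-style 404 without importing botocore."""
    error = getattr(exc, "response", {}).get("Error", {})
    return str(error.get("Code")) in {"404", "NoSuchKey", "NotFound"}


def list_all_keys(s3: Any, bucket: str) -> List[str]:
    """Page through ListObjectsV2 and return every key in the bucket."""
    keys: List[str] = []
    token = None
    while True:
        kwargs = {"Bucket": bucket}
        if token is not None:
            kwargs["ContinuationToken"] = token
        page = s3.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            return keys
        token = page["NextContinuationToken"]


def discover_unindexed(s3: Any, bucket: str) -> Optional[List[str]]:
    """Return None if the index exists, else the keys found by listing."""
    try:
        s3.head_object(Bucket=bucket, Key=INDEX_KEY)
    except Exception as exc:
        if not is_not_found(exc):
            raise  # permission or network errors must not be masked
        # Assumed filter: skip root dot-files such as .datastore.lock.
        return [k for k in list_all_keys(s3, bucket) if not k.startswith(".")]
    return None
```

A `None` result means the normal index-driven pull applies. A list means the bucket has no index and the returned keys are the candidates for hydration.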