Feature · Open · UAT
Assignees: None

#630 Remote execution: UAT coverage for tokens, enrollment, and dispatch

Opened by adam · 6/11/2026

Remote execution shipped in swamp-club#535 (PR #1547) with unit and integration tests, but there is no user-acceptance test suite that exercises the feature the way an operator would — real processes, real sockets, real restarts. This issue records the UAT scenarios that need writing.

Scope of changes

A UAT suite (scripted scenarios runnable against a built swamp binary, orchestrator and workers as separate OS processes) covering the scenarios below. Each scenario should assert on user-visible surfaces: CLI output (both log and --json modes), swamp worker list / swamp worker token list state, and workflow run results — not on internals.
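
A harness for these assertions could wrap the binary in a small runner that captures output in both modes. The sketch below is one possible shape in Python. `run_cli` and `CliResult` are hypothetical names, and the only CLI detail taken from this issue is that `--json` is appended to a command. It assumes JSON mode writes a single JSON document to stdout on success, which should be checked against the documented output.

```python
import json
import subprocess
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class CliResult:
    exit_code: int
    stdout: str
    stderr: str
    parsed: Optional[Any]  # decoded JSON when --json was requested, else None


def run_cli(binary: Sequence[str], args: Sequence[str],
            json_mode: bool = False, timeout: float = 30.0) -> CliResult:
    """Run one CLI command as a separate OS process and capture its output.

    `binary` is the command prefix for the built binary under test.
    Assumption: in JSON mode a successful command prints exactly one JSON
    document on stdout.
    """
    argv = [*binary, *args]
    if json_mode:
        argv.append("--json")
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    parsed = None
    if json_mode and proc.returncode == 0:
        # json.loads raises ValueError if the output is not valid JSON,
        # which fails the scenario.
        parsed = json.loads(proc.stdout)
    return CliResult(proc.returncode, proc.stdout, proc.stderr, parsed)
```

A scenario would then call, for example, `run_cli(["swamp"], ["worker", "list"], json_mode=True)` and assert on `parsed`, and call it again without `json_mode` to assert on the log-mode text.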

UAT scenarios needed

Token lifecycle

  1. Mint — swamp worker token create prints the <name>.<secret> credential exactly once; plaintext never appears in swamp data output or on disk outside the vault.
  2. Duplicate name — minting the same token name twice fails with the documented error.
  3. List states — swamp worker token list shows unused after mint, enrolled (with the bound machine id) after first connect, expired past the lifetime, and revoked after revoke.
  4. Revoke — a revoked token cannot enroll; revoking twice is idempotent.
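
The "plaintext never appears on disk outside the vault" check in the Mint scenario could be approximated by scanning the directories a test run touches for the secret half of the credential. This is a minimal sketch, not the project's method: `split_credential` and `find_leaks` are hypothetical helpers, the split assumes the token name itself contains no dot, and which directories count as "the vault" is left to the caller.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def split_credential(credential: str) -> Tuple[str, str]:
    """Split a '<name>.<secret>' credential at the first dot.

    Assumption: token names contain no dots.
    """
    name, sep, secret = credential.strip().partition(".")
    if not sep or not name or not secret:
        raise ValueError("credential is not in <name>.<secret> form")
    return name, secret


def find_leaks(secret: str, roots: Iterable[Path],
               exclude: Iterable[Path] = ()) -> List[Path]:
    """Return every file under `roots` whose bytes contain `secret`.

    Paths under `exclude` (for example the vault directory) are skipped.
    Files are read whole, which is fine for logs and small state files
    but would need chunked scanning for large artifacts.
    """
    needle = secret.encode("utf-8")
    excluded = [Path(p).resolve() for p in exclude]
    hits: List[Path] = []
    for root in roots:
        for path in sorted(Path(root).rglob("*")):
            if not path.is_file():
                continue
            resolved = path.resolve()
            if any(resolved == e or e in resolved.parents for e in excluded):
                continue
            if needle in path.read_bytes():
                hits.append(path)
    return hits
```

A scenario would assert that `find_leaks(secret, roots, exclude=[vault_dir])` returns an empty list after the mint. The same scan applies to the worker cache directory and worker logs in the secrets scenario further down.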

Enrollment & machine binding

  1. Happy path — worker connects, enrolls, shows idle and connected in swamp worker list; labels/platform/arch recorded.
  2. Restart survival — worker with --cache-dir is killed and restarted; it re-enrolls on the same token as the same pool member. Repeat across a simulated reboot (fresh process tree).
  3. Temp cache dir caveat — worker WITHOUT --cache-dir is restarted; re-enrollment is rejected (already bound to another machine) and the worker process exits permanently rather than retry-looping.
  4. Second machine rejected — same token from a different machine (or different cache dir) while the first is enrolled: rejected, first worker unaffected.
  5. Concurrent duplicate — second live connection presenting the same token while the first is connected: rejected with already_connected.
  6. Version lockstep — a worker binary with a different protocol version is rejected at enrollment with protocol_mismatch and stops cleanly.
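
Several scenarios above require that a rejected worker stops instead of retry-looping. One way to assert that is to start the worker as a child process and require it to exit within a bound. The helper below is a sketch only: `run_until_exit` is a hypothetical name, the worker command line is supplied by the caller because this issue does not spell it out, and no particular exit code is assumed.

```python
import subprocess
from typing import Sequence, Tuple


def run_until_exit(argv: Sequence[str], within: float) -> Tuple[int, str]:
    """Start a process and require it to terminate within `within` seconds.

    Returns (exit_code, combined stdout and stderr). Raises AssertionError
    if the process is still alive at the deadline, which is how a
    retry-looping worker would show up.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    try:
        output, _ = proc.communicate(timeout=within)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.communicate()  # reap the killed child
        raise AssertionError(
            f"process still running after {within}s; it may be retry-looping")
    return proc.returncode, output
```

A scenario can then check that the captured output names the rejection reason from this issue, such as already_connected or protocol_mismatch, and that swamp worker list still shows the first worker unaffected.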

Expiry enforcement

  1. Hard deadline while connected — mint a short-lived token, connect, wait past expiry: orchestrator disconnects the worker, the reconnect attempt fails with expired, the worker process stops, the pool entry is removed after the grace window, and token list shows expired.
  2. Expired before first use — an unused token past its lifetime cannot enroll.
  3. Expiry while disconnected — worker drops, token expires during the grace window, reconnect is rejected.
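
The expiry scenarios all wait for a state change that happens at some point after a deadline or grace window, so fixed sleeps would be either flaky or slow. A bounded polling helper is one common alternative. The sketch below is generic: `wait_for` is a hypothetical name, and the probe that reads token or pool state is left to the caller.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(probe: Callable[[], Optional[T]], timeout: float,
             interval: float = 0.2,
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `probe` until it returns a truthy value or `timeout` elapses.

    `clock` and `sleep` are injectable so the helper can be tested
    without real waiting.
    """
    deadline = clock() + timeout
    while True:
        value = probe()
        if value:
            return value
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        sleep(interval)


# Hypothetical usage, where token_state() wraps `swamp worker token list`:
#   wait_for(lambda: token_state("ci") == "expired", timeout=lifetime + grace)
```

The timeout passed in should cover the token lifetime plus the grace window, with some slack for process scheduling on a loaded CI machine.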

Reconnection & failure semantics

  1. Socket blip — kill the network path (not the process); worker reconnects within the grace window as the same pool member; a queued step then dispatches to it.
  2. Worker death mid-step, no writes — kill the worker during a read-only step: the step re-dispatches (to a reconnected or another matching worker) and the run succeeds.
  3. Worker death mid-step, after a write — kill the worker after the step persisted data: the run fails with the documented write-then-fail behavior and partial data is visible/inspectable.
  4. Cancellation — cancel a workflow with a step in flight on a worker: the dispatch is cooperatively cancelled, the worker returns to idle, no orphaned lease remains.

Dispatch & scheduling

  1. Label selection — steps with labels: selectors dispatch only to matching workers; with no match the step queues and dispatches when a matching worker enrolls.
  2. Direct targeting — target: by worker name and by instance uuid.
  3. Serialization — two steps placed on one worker run serially; a busy worker is not double-dispatched.
  4. Fan-out — a matrix/expanded workflow spreads steps across multiple enrolled workers and all outputs land in the datastore with correct versions.
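
The serialization scenario reduces to a check that no two steps placed on the same worker have overlapping execution windows. The sketch below assumes the test can recover a worker name plus start and finish times for each step from the run results or logs. That record shape (`StepRun`) is hypothetical, not a documented swamp format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class StepRun(NamedTuple):
    step: str
    worker: str
    started: float   # seconds, any monotonic or epoch timebase
    finished: float


def overlapping_steps(runs: Iterable[StepRun]) -> List[Tuple[str, str]]:
    """Return pairs of step names whose windows overlap on one worker.

    An empty result means every worker ran its steps serially.
    """
    by_worker: Dict[str, List[StepRun]] = defaultdict(list)
    for run in runs:
        by_worker[run.worker].append(run)

    overlaps: List[Tuple[str, str]] = []
    for worker_runs in by_worker.values():
        worker_runs.sort(key=lambda r: r.started)
        latest = worker_runs[0]  # the run that finishes last so far
        for current in worker_runs[1:]:
            if current.started < latest.finished:
                overlaps.append((latest.step, current.step))
            if current.finished > latest.finished:
                latest = current
    return overlaps
```

For the fan-out scenario the same records can be grouped by worker to confirm that more than one enrolled worker received steps.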

Data, secrets, and environment

  1. Capability proxying — a placed step reads prior step data, queries with CEL, and writes outputs; results identical to local execution.
  2. Bundle shipping — a step using a pulled extension runs on a worker with no local extensions; second dispatch hits the worker bundle cache (assert via timing or logs).
  3. Secrets stay home — a step resolving a vault secret succeeds; the secret value never appears in the worker cache directory or worker logs.
  4. Environment snapshot scoping — orchestrator env vars are visible to the method during the step and gone from the worker process after it completes.
  5. Large artifacts — a step producing a multi-hundred-MB file output streams through the data plane and round-trips intact (checksum).
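
For the large-artifact scenario, the checksum comparison should stream the file so the test itself does not hold hundreds of megabytes in memory. A minimal sketch using SHA-256 follows. The choice of hash and the helper names are assumptions, since the issue only says "checksum".

```python
import hashlib
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def sha256_file(path: PathLike, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def assert_round_trip(source: PathLike, retrieved: PathLike) -> None:
    """Fail if the retrieved artifact differs from what the step produced."""
    expected = sha256_file(source)
    actual = sha256_file(retrieved)
    if expected != actual:
        raise AssertionError(
            f"artifact changed in transit: {expected} != {actual}")
```

The step under test would write a file of known content on the worker, and the scenario would compare it against the copy read back from the datastore on the orchestrator side.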

Operability & UX

  1. JSON mode parity — every swamp worker * command produces valid, documented JSON with --json and human-readable output without.
  2. swamp serve --server — workflows and methods run through a server end-to-end.
  3. TLS — worker connects over wss://; a wrong/self-signed certificate is rejected unless pinned/trusted as documented.
  4. Cross-platform workers — Enrollment & machine binding scenarios 1–3 (happy path, restart survival, temp cache dir caveat) pass with workers on Linux, macOS, and Windows (path handling for the cache directory and machine-id file).

Acceptance criteria

  • Each scenario is automated and runnable against a release binary in CI or a bench environment, with orchestrator and workers as separate processes.
  • Scenarios assert on user-visible behavior (CLI output, pool/token state, run results), not internal APIs.
  • A coverage map ties each scenario above to its script so gaps stay visible.
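
The coverage map in the last criterion could be a plain mapping from scenario identifier to script path, checked in CI so that a missing or stale entry fails the build. The sketch below shows one possible check. The identifier scheme (for example "token-lifecycle/mint") and the function name are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def coverage_gaps(scenarios: Iterable[str], coverage: Dict[str, str],
                  scripts_root: Path) -> Dict[str, List[str]]:
    """Compare the scenario list from this issue against the coverage map.

    Returns three lists:
      unmapped        scenarios with no entry in the map
      missing_script  scenarios mapped to a script that does not exist
      unknown         map entries that match no listed scenario
    """
    scenario_list = list(scenarios)
    unmapped = [s for s in scenario_list if s not in coverage]
    missing_script = [
        s for s in scenario_list
        if s in coverage and not (scripts_root / coverage[s]).is_file()
    ]
    unknown = sorted(set(coverage) - set(scenario_list))
    return {"unmapped": unmapped,
            "missing_script": missing_script,
            "unknown": unknown}
```

A CI job would fail when any of the three lists is non-empty, which keeps gaps visible as scenarios are added or scripts are moved.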