01 Issue
Feature · Open · Swamp Club
Assignees: None

#629 Remote execution: comprehensive reference documentation

Opened by adam · 6/11/2026

Remote execution shipped in swamp-club#535 (PR #1547). The only documentation today is the internal design doc (design/remote-execution.md) and a short section in the swamp skill — there is no user-facing reference. This issue records the full coverage a reference doc set needs.

Scope of changes

A new reference section for remote execution, written for two audiences: operators provisioning workers, and workflow authors placing steps on them. Affected components are documentation only (docs site / reference tree plus cross-links from the workflow and serve docs).

Documentation coverage needed

1. Concepts & architecture overview

  • Orchestrator vs worker roles; workers are stateless compute that dial home (outbound ws control socket + HTTP/2 data plane); orchestrator owns the durable world.
  • The executor concept (local loopback vs remote worker) and how it replaced execution drivers — include a migration note for anyone with driver: fields.
  • Capability proxying: every datastore/vault/definition access goes back through the orchestrator; workers hold no credentials.

2. Getting started walkthrough

  • Mint a token, start swamp serve, connect a worker, run a placed workflow step end-to-end.
  • Deployment one-liners: ssh, cloud-init, container, k8s Job, CI runner.

3. CLI reference (every command, both log and --json output documented)

  • swamp worker token create <name> --duration <dur> [--vault <vault>] — plaintext shown exactly once; <name>.<secret> credential format.
  • swamp worker token list — recorded vs effective (display) state, expiry, bound machine.
  • swamp worker token revoke <name>.
  • swamp worker connect <url> --token [--label k=v ...] [--cache-dir] [--data-plane-url] [--no-reconnect].
  • swamp worker list — pool view, status, labels.
  • swamp serve, and the --server flag for running workflows and methods through a server.
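
The reference pages will need to spell out the two small input formats named above: the <name>.<secret> credential and the repeated --label k=v flag. A minimal sketch of both follows, assuming the credential splits at its first dot and that each label splits at its first "="; neither rule is confirmed by this issue, and the function names are hypothetical rather than taken from swamp.

```python
from __future__ import annotations


def split_credential(credential: str) -> tuple[str, str]:
    """Split a '<name>.<secret>' credential into its two parts.

    Assumption: the name contains no dot, so the first dot is the separator.
    """
    name, sep, secret = credential.partition(".")
    if not sep or not name or not secret:
        raise ValueError("expected a credential of the form <name>.<secret>")
    return name, secret


def parse_labels(pairs: list[str]) -> dict[str, str]:
    """Turn repeated 'k=v' label arguments into a dict.

    Assumption: the key ends at the first '=', and a later duplicate key wins.
    """
    labels: dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected k=v, got {pair!r}")
        labels[key] = value
    return labels
```

Whatever the real parsing rules are, the reference should state them this explicitly, including what happens to a malformed value.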

4. Enrollment token lifecycle reference

  • State machine: unused → enrolled → expired, plus revoked; what each state means and what triggers each transition.
  • Machine binding: token binds to a machine-id file in the worker cache directory on first redemption; the bound machine reconnects across blips, restarts, and reboots; any other machine is rejected.
  • The --cache-dir requirement for a stable machine identity (default temp dir = new identity per process) — this is the most likely operator footgun and deserves a callout box.
  • Expiry is a hard deadline: the orchestrator disconnects a connected worker when the lifetime elapses and rejects re-enrollment; revoke for early cutoff.
  • Restart/reboot scenarios spelled out as a table: socket blip, process restart, machine reboot, token expired, token revoked, second machine.
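
The lifecycle rules above can be sketched as a small model, which may also help when filling in the restart/reboot table. This is an illustrative sketch only: the Token class, the redeem function, and the "enrolled" and "reconnected" result strings are hypothetical, and the order in which revocation and expiry are checked is an assumption. The rejection strings reuse the failure names listed in this issue.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    UNUSED = "unused"
    ENROLLED = "enrolled"
    EXPIRED = "expired"
    REVOKED = "revoked"


@dataclass
class Token:
    expires_at: float                    # absolute deadline, seconds
    recorded: State = State.UNUSED       # state as stored
    bound_machine: Optional[str] = None  # set on first redemption


def effective_state(token: Token, now: float) -> State:
    """Derive the display state: a stored state can be overtaken by expiry."""
    if token.recorded is State.REVOKED:
        return State.REVOKED
    if now >= token.expires_at:
        return State.EXPIRED
    return token.recorded


def redeem(token: Token, machine_id: str, now: float) -> tuple[bool, str]:
    """Decide whether a connection attempt from machine_id is accepted."""
    state = effective_state(token, now)
    if state is State.REVOKED:
        return False, "revoked"
    if state is State.EXPIRED:
        return False, "expired"
    if state is State.UNUSED:
        # First redemption binds the token to this machine.
        token.recorded = State.ENROLLED
        token.bound_machine = machine_id
        return True, "enrolled"
    if token.bound_machine == machine_id:
        return True, "reconnected"  # blip, restart, or reboot of the bound machine
    return False, "already bound to another machine"
```

Under this model a worker started without a stable --cache-dir would present a fresh machine id on every process start and land in the last branch, which is the footgun the callout box should describe.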

5. Workflow placement reference

  • Placement fields on a step: target: (worker name or instance uuid) and labels: selector map; queuing behavior when no worker matches; interaction with non-placed steps.
  • Outputs, reports, and follow-up actions: document that they behave the same on a remote executor as on the local one.
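
One way to read the placement fields is sketched below. It assumes that target matches either the worker name or the instance uuid, that every key in labels must match exactly, and that a step giving both must satisfy both; the issue does not confirm these semantics, and Worker and pick_worker are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Worker:
    name: str
    instance_id: str
    labels: dict[str, str] = field(default_factory=dict)


def matches(worker: Worker, target: Optional[str], labels: Optional[dict[str, str]]) -> bool:
    """Check one worker against a step's placement fields."""
    if target is not None and target not in (worker.name, worker.instance_id):
        return False
    # Assumption: the selector is a conjunction of exact key/value matches.
    return all(worker.labels.get(k) == v for k, v in (labels or {}).items())


def pick_worker(
    pool: list[Worker],
    target: Optional[str] = None,
    labels: Optional[dict[str, str]] = None,
) -> Optional[Worker]:
    """Return the first matching worker, or None so the caller can queue the step."""
    for worker in pool:
        if matches(worker, target, labels):
            return worker
    return None
```

The None result corresponds to the queuing case; the reference should say how long a step waits there and how a queued step is surfaced to the author.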

6. Failure & reconnection semantics

  • Reconnection grace window; what happens to an in-flight dispatch on socket drop (no-write steps re-dispatch; write-bearing steps fail the run — the write-then-fail rule).
  • Cooperative cancellation of dispatched steps.
  • Session credential lifetime and sliding refresh (operator-visible only as log lines, but worth documenting for debugging).
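
The write-then-fail rule above reduces to a single decision, sketched here as a hedged illustration. It assumes "write-bearing" means the in-flight step may already have written when the socket dropped; whether the real check is declared or observed is not stated in this issue. DropOutcome and on_socket_drop are hypothetical names, and the grace window is left out because its exact behavior is not described here.

```python
from enum import Enum


class DropOutcome(Enum):
    REDISPATCH = "redispatch"  # safe to run the step again
    FAIL_RUN = "fail_run"      # a repeat could duplicate writes


def on_socket_drop(step_has_writes: bool) -> DropOutcome:
    """Decide what happens to an in-flight dispatch when the socket drops."""
    if step_has_writes:
        # Write-then-fail: never silently repeat a step that may have written.
        return DropOutcome.FAIL_RUN
    return DropOutcome.REDISPATCH
```

The documentation should pair this rule with a concrete example of each outcome so authors can predict which of their steps are safe to lose.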

7. Security reference

  • Trust model: bearer {token, machineId} over wss; the machine id is client-asserted, so binding guards against accidental token reuse but not against a malicious token holder; recommend short enrollment lifetimes, TLS, and certificate pinning.
  • What a worker can and cannot see: no datastore/vault credentials, per-dispatch environment snapshot held in memory only, secrets resolved orchestrator-side.
  • The reverse trust direction: connecting a worker to an untrusted URL grants that URL code execution (same model as a self-hosted CI runner).

8. Operations guide

  • Monitoring the pool (swamp worker list, querying the built-in models with swamp data).
  • Worker state as swamp data: swamp/worker, swamp/enrollment-token, swamp/step-lease built-in models and their queryable fields.
  • Troubleshooting table: protocol_mismatch (binary version lockstep), already_connected, already bound to another machine, expired, revoked, does not match — cause and fix for each.
  • Upgrade guidance: protocol version lockstep means orchestrator and workers upgrade together.
  • Cross-links to add: workflow guide (placement fields), serve docs (--server), vault docs (secret resolution path), extension docs (bundle shipping/fingerprints).

Acceptance criteria

  • Every CLI command above has a reference page with flags, examples, and JSON output shape.
  • The lifecycle/restart table and the --cache-dir callout exist.
  • The troubleshooting table covers all permanent enrollment failure messages.
  • Getting-started walkthrough verified end-to-end on a clean machine.
02 Bog Flow
Phases: Open, Triaged, In Progress, Shipped

Current phase: Open (6/11/2026, 8:18:55 PM)
