01 Issue
Feature · Shipped · Swamp CLI
Assignees: adam

#535 Remote execution: orchestrator/worker fan-out (replaces execution drivers)

Opened by adam · 6/3/2026 · Shipped 6/11/2026

Problem

swamp execution is single-host and in-process. There is no way to fan a workflow or method run out across many machines, and the existing execution-driver abstraction (raw / docker) only chooses how a method runs locally — not which machine it runs on. We want first-class remote execution: a central orchestrator distributing work across a fleet of disposable workers.

Proposed solution

A simple, brutalist design — bidirectional libswamp over a worker-initiated connection — that replaces execution drivers entirely. The design has been validated against the current code; the summary below reflects the revised design.

Core shape

* A single **orchestrator** owns the durable world: DAG/run state, datastore,
  vaults, catalog, definitions, extension bundles, locks, scheduler, tokens,
  audit.
* **Workers** are disposable swamp processes that carry no repository, datastore,
  vault, or extension state — just a binary, a token, and a URL. They **dial
  home** (outbound, NAT-friendly), enroll, and run whatever is dispatched.
* Every side-effecting capability a running method touches is **proxied back
  to the orchestrator**, the single durable authority. This drops onto the
  existing `MethodContext` / libswamp `*Deps` injection seam: remote execution
  swaps the *leaves* of the dependency tree for proxy adapters. Method and
  operation code are unchanged.
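A minimal sketch of the leaf swap described above, in TypeScript. `DataReader`, `LocalDataReader`, `ProxyDataReader`, `CapabilityCall`, `Send`, and `summarize` are hypothetical names for illustration, not the actual `MethodContext` or `*Deps` types; only the `getData` verb name comes from this issue.

```typescript
// Hypothetical leaf port: one data-read capability a method depends on.
interface DataReader {
  getData(dataId: string, version: number): Promise<string>;
}

// Local leaf: reads from an in-memory stand-in for the datastore.
class LocalDataReader implements DataReader {
  private store: Map<string, string>;
  constructor(store: Map<string, string>) {
    this.store = store;
  }
  async getData(dataId: string, version: number): Promise<string> {
    const value = this.store.get(`${dataId}@${version}`);
    if (value === undefined) throw new Error(`no data ${dataId}@${version}`);
    return value;
  }
}

// Assumed wire shape for a proxied call; the real protocol may differ.
type CapabilityCall = { verb: "getData"; dataId: string; version: number };
type Send = (call: CapabilityCall) => Promise<string>;

// Remote leaf: same port, but each call is forwarded to the orchestrator.
class ProxyDataReader implements DataReader {
  private send: Send;
  constructor(send: Send) {
    this.send = send;
  }
  getData(dataId: string, version: number): Promise<string> {
    return this.send({ verb: "getData", dataId, version });
  }
}

// Method code sees only the port, so it is identical in both modes.
async function summarize(deps: { data: DataReader }): Promise<number> {
  const body = await deps.data.getData("report", 1);
  return body.length;
}
```

Because `summarize` depends only on the port, passing a `ProxyDataReader` instead of a `LocalDataReader` is the whole change between local and remote execution in this sketch.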

The capability protocol is a closed set of fourteen verbs, generated by walking the actual MethodContext, DataWriter, and injected service ports (not estimated): getData, queryData, listVersions, persistResource, persistFile, appendData, deleteData, resolveSecret, putSecret, readDefinition, readOutput, resolveModel, getExtensionFile, log/event. Every other context member has an explicit disposition: repoDir becomes a per-dispatch scratch dir; subprocess/network/SDK compute stays worker-local; CEL environments construct locally over proxied leaves; vault/datastore provider code never ships (it runs orchestrator-side behind the verbs); followUpActions ride back serialized for the orchestrator to perform. Two write modes get remote shapes: writeLine maps to a per-request-durable appendData (live-log contract preserved), and getFilePath becomes a worker-local spool file uploaded on finalize().
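One way to keep the verb set closed is a literal union that the dispatcher checks before routing anything. The verb names below are the fourteen listed above; `CAPABILITY_VERBS`, `isCapabilityVerb`, and `assertVerb` are hypothetical helpers, and treating `log/event` as a single verb string is an assumption based on the stated count.

```typescript
// The fourteen verbs named in this issue, as a closed literal union.
const CAPABILITY_VERBS = [
  "getData", "queryData", "listVersions", "persistResource", "persistFile",
  "appendData", "deleteData", "resolveSecret", "putSecret", "readDefinition",
  "readOutput", "resolveModel", "getExtensionFile", "log/event",
] as const;

type CapabilityVerb = (typeof CAPABILITY_VERBS)[number];

// Runtime guard: narrows an arbitrary string to the closed union.
function isCapabilityVerb(name: string): name is CapabilityVerb {
  return (CAPABILITY_VERBS as readonly string[]).includes(name);
}

// Hypothetical dispatcher entry check: anything outside the set is rejected.
function assertVerb(name: string): CapabilityVerb {
  if (!isCapabilityVerb(name)) {
    throw new Error(`unknown capability verb: ${name}`);
  }
  return name;
}
```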

Environment shipping. The orchestrator snapshots its environment and ships it with every dispatch: worker-memory only, applied to the method and its subprocesses, dropped when the step ends. This is the ambient-credential path (cloud SDKs, shell-model CLIs). The snapshot overlays the worker's base env, and variables on a fixed denylist of process-identity/host-runtime names (HOME, USER, PATH, TMPDIR, DENO_*, SWAMP_*, etc.) are never shipped: those describe where the process runs, which is exactly what remote execution changes. The denylist is pinned in code and versioned with the protocolVersion.
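The overlay rule can be sketched as a small pure function. The denylist here is only the illustrative subset named above (the issue's list ends in "etc."), and `Env`, `isDenied`, and `buildStepEnv` are hypothetical names. Where the filtering happens (orchestrator or worker) is not stated in this issue; the sketch only shows the resulting environment.

```typescript
type Env = Record<string, string>;

// Illustrative subset of the pinned denylist, not the full versioned list.
const DENY_EXACT = new Set(["HOME", "USER", "PATH", "TMPDIR"]);
const DENY_PREFIXES = ["DENO_", "SWAMP_"];

// True for variables that describe where the process runs.
function isDenied(name: string): boolean {
  return DENY_EXACT.has(name) ||
    DENY_PREFIXES.some((prefix) => name.startsWith(prefix));
}

// Overlay the orchestrator snapshot on the worker's base env,
// skipping denylisted names so the worker's own values survive.
function buildStepEnv(workerBase: Env, snapshot: Env): Env {
  const env: Env = { ...workerBase };
  for (const [name, value] of Object.entries(snapshot)) {
    if (!isDenied(name)) env[name] = value;
  }
  return env;
}
```

The returned object is a fresh copy, so discarding it when the step ends leaves the worker's base environment untouched.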

Drivers removed. No ExecutionDriver, no raw/docker/custom selection, no driver type registry. The driver:/driverConfig: fields live in the workflow/job/step/definition schemas, .swamp.yaml defaults, --driver CLI flags, and run events (not on ExecutionRequest, which never carried them) — removal is a user-facing schema deprecation across those surfaces. Isolation/environment become a deployment property of the worker.

Two transports (both worker-initiated):

* **Control plane — WebSocket**: enrollment, dispatch, cancel, run events, and
  the metadata capability verbs. A symmetric two-handler-registry protocol built
  by splitting `src/serve/`'s server-role from its executor-role; the existing
  client protocol (`@swamp-club/swamp-lib`) is unchanged alongside it.
* **Data plane — HTTP/2**: byte-heavy ops only — `GET /data/{dataId}/{version}`,
  `PUT /data/...`, append requests, `GET /bundle/{fingerprint}`, and lazy
  co-located asset fetch via `GET /bundle/{fingerprint}/file/{relPath}`.
  HTTP/2 gives multiplexing + flow control natively. (Single-connection
  ws-over-h2 per RFC 8441 is the ideal but Deno doesn't support it yet.)
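For the data-plane routes above, a worker-side client mostly needs path builders. The route shapes come from this issue; the helper names, the numeric `version`, and the percent-encoding choices are assumptions for illustration.

```typescript
// Path for GET /data/{dataId}/{version}. A numeric version is assumed.
function dataPath(dataId: string, version: number): string {
  return `/data/${encodeURIComponent(dataId)}/${version}`;
}

// Path for GET /bundle/{fingerprint}.
function bundlePath(fingerprint: string): string {
  return `/bundle/${encodeURIComponent(fingerprint)}`;
}

// Path for GET /bundle/{fingerprint}/file/{relPath}.
// Each segment is encoded separately so "/" keeps its meaning as a separator.
function bundleFilePath(fingerprint: string, relPath: string): string {
  const encoded = relPath
    .split("/")
    .map((segment) => encodeURIComponent(segment))
    .join("/");
  return `${bundlePath(fingerprint)}/file/${encoded}`;
}
```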

Checks and reports run on the executor, over the same proxied context; report-provider bundles ship by fingerprint exactly like model bundles.

Enrollment & tokens. A named, time-boxed enrollment token enrolls exactly one worker once (bound to a per-instance UUID), then reconnects that same instance for its lifetime. Enrollment issues a short-lived bearer session credential (sliding-window refresh) for the data plane. The datastore has no compare-and-swap, so token/lease transition atomicity comes from the single orchestrator process serializing those transitions in memory (it is the sole writer of these models); a conditional save is the future scale-out primitive. CLI surface: swamp worker token create/list/revoke, plus swamp worker connect.
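The enroll-once, reconnect-same-instance rule can be sketched as an in-memory registry. Because each method body runs synchronously to completion in a single process, transitions are serialized without compare-and-swap, which mirrors the reasoning above. `EnrollmentRegistry`, `TokenRecord`, and the result strings are hypothetical, and applying the time box only to the first enrollment (not to later reconnects) is an assumption.

```typescript
type TokenRecord = {
  name: string;
  expiresAt: number; // epoch ms; the enrollment time box
  boundInstance?: string; // per-instance UUID, set on first enrollment
  revoked: boolean;
};

type EnrollResult = "enrolled" | "reconnected" | "rejected";

class EnrollmentRegistry {
  private tokens = new Map<string, TokenRecord>();

  create(token: string, name: string, expiresAt: number): void {
    this.tokens.set(token, { name, expiresAt, revoked: false });
  }

  revoke(token: string): void {
    const rec = this.tokens.get(token);
    if (rec) rec.revoked = true;
  }

  // Synchronous, so no other transition can interleave with this one.
  enroll(token: string, instanceId: string, now: number): EnrollResult {
    const rec = this.tokens.get(token);
    if (!rec || rec.revoked) return "rejected";
    if (rec.boundInstance === undefined) {
      if (now >= rec.expiresAt) return "rejected"; // time box elapsed
      rec.boundInstance = instanceId; // bind to exactly one instance
      return "enrolled";
    }
    return rec.boundInstance === instanceId ? "reconnected" : "rejected";
  }
}
```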

Worker state is swamp data. The pool, token lifecycle, and step leases are persisted by first-class built-in models (worker, enrollment-token, step-lease) through the normal datastore/catalog — provisioning/autoscaling are just workflows that data.query the pool (content fields via attributes.*), and lifecycle history is queryable. The built-in models declare retention (garbageCollection/lifetime) and the orchestrator runs data GC periodically so control-plane churn stays bounded.

Scheduling. Steps declare requirements with new target: / labels: / platform: YAML fields; matching is direct target → labels → platform, least-loaded tiebreak, queue when busy. forEach is the fan-out construct — forEach over a list plus a label selector fans out across the fleet, with step-level concurrency capping in-flight dispatches. v1 dispatch slots into the existing level-parallel execution loop, so fan-out breadth is bounded by the current topological level; a cross-level ready-step queue is future work.
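A sketch of the matching order with a least-loaded tiebreak. It reads "target → labels → platform" as: a direct target short-circuits the other checks, otherwise every requested label must be present and the platform must match. That reading, the `capacity` field, and the names `PoolWorker`, `Requirements`, `isEligible`, and `pickWorker` are assumptions.

```typescript
type PoolWorker = {
  id: string;
  labels: string[];
  platform: string;
  inFlight: number; // dispatches currently running on this worker
  capacity: number; // assumed per-worker slot count
};

type Requirements = { target?: string; labels?: string[]; platform?: string };

function isEligible(w: PoolWorker, req: Requirements): boolean {
  // A direct target bypasses label and platform matching.
  if (req.target !== undefined) return w.id === req.target;
  const wanted = req.labels ?? [];
  if (!wanted.every((label) => w.labels.includes(label))) return false;
  return req.platform === undefined || w.platform === req.platform;
}

// Returns the least-loaded eligible worker with a free slot,
// or undefined when the caller should queue the step.
function pickWorker(pool: PoolWorker[], req: Requirements): PoolWorker | undefined {
  const free = pool.filter((w) => isEligible(w, req) && w.inFlight < w.capacity);
  if (free.length === 0) return undefined;
  return free.reduce((best, w) => (w.inFlight < best.inFlight ? w : best));
}
```

A forEach fan-out would call `pickWorker` once per item, incrementing `inFlight` on each dispatch, with the step-level concurrency cap bounding how many calls are outstanding at once.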

Host launching is a swamp workflow. A built-in mint model writes the token to a vault (via the existing vaultService.put) and returns a reference; user-authored launch models read it via ${{ vault.get(...) }} and boot a worker. Bootstrapped on the loopback executor, so no chicken-and-egg.

Data semantics (verified against current code). Writes are immediately durable (no staging); a PUT completes only once repo.save() persists. Write-scoping is the existing declared-spec enforcement — spec-name only; schema validation is warn-only in the writer today. Data is versioned-immutable, so workers cache artifact bytes by (dataId, version) and latest/queries stay live.
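Since a given (dataId, version) never changes, a worker-side cache needs no invalidation for versioned reads. The sketch below caches by that pair and deliberately has no "latest" entry point, which would have to stay live. `VersionedCache` and `Fetcher` are hypothetical names, and the fetcher stands in for the data-plane GET.

```typescript
type Fetcher = (dataId: string, version: number) => Promise<Uint8Array>;

class VersionedCache {
  private entries = new Map<string, Promise<Uint8Array>>();
  private fetcher: Fetcher;

  constructor(fetcher: Fetcher) {
    this.fetcher = fetcher;
  }

  // Immutable versions make the (dataId, version) pair a safe cache key.
  get(dataId: string, version: number): Promise<Uint8Array> {
    const key = JSON.stringify([dataId, version]);
    let hit = this.entries.get(key);
    if (hit === undefined) {
      hit = this.fetcher(dataId, version);
      this.entries.set(key, hit);
      // Do not keep failed fetches; a later call should retry.
      hit.catch(() => this.entries.delete(key));
    }
    return hit;
  }
}
```

Storing the promise rather than the bytes means concurrent requests for the same version share one fetch.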

Failure semantics. Control socket = liveness. A drop opens a reconnection grace window so reconnect and re-dispatch never double-execute. Reads resume; an in-flight write is the ambiguity case → fail the run. No transparent retry of write-bearing steps in v1 (needs write idempotency first).
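One possible reading of these rules as a decision function. The lease fields, the decision names, and in particular the post-window behavior (re-dispatch only when the step has not written anything, otherwise fail) are assumptions drawn from "no transparent retry of write-bearing steps", not confirmed behavior.

```typescript
type LeaseState = {
  droppedAt: number | null; // when the control socket dropped; null if connected
  writeInFlight: boolean; // a write was outstanding at the moment of the drop
  hasWritten: boolean; // the step already completed at least one write
};

type Decision = "running" | "hold" | "resume" | "redispatch" | "fail-run";

function decide(
  lease: LeaseState,
  event: "tick" | "reconnect",
  now: number,
  graceMs: number,
): Decision {
  if (lease.droppedAt === null) return "running";
  // An in-flight write has an unknown outcome: the ambiguity case.
  if (lease.writeInFlight) return "fail-run";
  const withinGrace = now - lease.droppedAt <= graceMs;
  if (withinGrace) {
    // Inside the window the step is never re-dispatched,
    // so a reconnecting worker cannot race a second execution.
    return event === "reconnect" ? "resume" : "hold";
  }
  // Window expired: only steps with no writes are safe to run again (assumed).
  return lease.hasWritten ? "fail-run" : "redispatch";
}
```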

Scope / non-goals

In scope: dial-home enrollment, drivers removed, extension code shipped (model + report bundles, lazy assets), the full 14-verb remote MethodContext incl. spool/append write modes, per-dispatch environment shipping with the identity denylist, checks/reports on the executor, ws control + h2 data plane, worker state as swamp data with declared retention, label + direct scheduling with forEach fan-out, reconnection grace window, host-launch-as-workflow.

Non-goals (v1): remote datastore config for workers (one proxied plane only); data-locality scheduler affinity; shipping cloud/k8s launch integrations; cross-level ready-step queue; per-token/label environment scoping.

Full design

The complete, converged design — ubiquitous language, protocol, capability inventory, reuse-vs-new map, known limits — lives in the repo at design/remote-execution.md. This issue tracks building it.

02 Bog Flow
OPEN · TRIAGED · IN PROGRESS · SHIPPED (+1 more) · ASSIGNED (+5 more) · REVIEW (+3 more) · PR_MERGED (+1 more) · NOTIFICATION_SKIPPED

Shipped

6/11/2026, 8:14:53 PM

03 Sludge Pulse
adam assigned adam · 6/9/2026, 9:30:02 PM
