Workflow-level workspace for docker driver: stateful multi-step workflows
Opened by stack72 · 4/11/2026
Problem
The docker driver currently runs each step in an isolated container. This is great for provisioning-style workflows where steps are independent, but it makes CI-style workflows painful: those workflows share filesystem state (checkouts, installed dependencies, build artifacts) across multiple steps.
Concrete example: we're building a multi-model-eval workflow that mirrors a GitHub Actions workflow for evaluating skill triggers across multiple LLMs. The workflow structure is:
1. checkout: clone the swamp repo
2. setup-npm: run npm install in evals/promptfoo/
3. run-evals: run 4 parallel evals (forEach over models) against the checkout
4. cleanup: remove the checkout
All four eval steps in step 3 need to read the same checkout and share the same node_modules tree. In raw mode this "just works" because everything runs on the host filesystem. In docker mode, each step is a fresh container with no shared state, so we have to:
- Explicitly configure a volume mount in driverConfig.volumes
- Use identical host and container paths (e.g., /tmp/swamp-eval-workspace:/tmp/swamp-eval-workspace) so the same path string is valid in both raw and docker modes
- Store the clone path as a data artifact so downstream steps can look it up via data.latest('swamp-repo', 'repository').attributes.path
- Add a dedicated setup-npm job that runs once before the parallel evals to populate the shared volume with node_modules (otherwise 4 parallel npm install runs race against each other)
- Add a cleanup job to remove the checkout when done
- Worry about host/container path parity, volume lifecycle, and npm cache persistence
Every CI-shaped workflow that uses the docker driver will have to reinvent this pattern. It's a lot of ceremony for something that should be "give me a shared working directory."
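The path-parity requirement in the list above is easy to get wrong by hand. As a minimal sketch (the helper names parseVolume and hasPathParity are hypothetical and not part of swamp, and the host:container[:mode] string form is assumed from Docker's -v syntax rather than confirmed for driverConfig.volumes), a check like this could flag volume entries whose path would differ between raw and docker modes:

```typescript
// Hypothetical helpers for illustration; swamp does not ship these.
interface VolumeSpec {
  hostPath: string;
  containerPath: string;
}

// Parses "host:container" or "host:container:mode" (POSIX paths only).
function parseVolume(spec: string): VolumeSpec {
  const parts = spec.split(":");
  const [hostPath = "", containerPath = ""] = parts;
  if (
    parts.length < 2 ||
    parts.length > 3 ||
    hostPath === "" ||
    containerPath === ""
  ) {
    throw new Error(`invalid volume spec: ${spec}`);
  }
  return { hostPath, containerPath };
}

// True when the same path string is valid on the host and in the container.
function hasPathParity(spec: string): boolean {
  const { hostPath, containerPath } = parseVolume(spec);
  return hostPath === containerPath;
}
```

The mount used in this workflow, /tmp/swamp-eval-workspace:/tmp/swamp-eval-workspace, passes the check. A mount such as /tmp/swamp-eval-workspace:/workspace would fail it, and the clone path stored as a data artifact would then be valid in only one of the two modes.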
What we'd like
A first-class "workspace" concept for workflows. One of the following, in order of preference:
Option A: Workflow-level workspace primitive
Each workflow run gets an automatically-provisioned working directory that's mounted into every step's container at a stable path. Lifecycle is tied to the workflow run — created on start, cleaned up on end (unless --preserve-workspace is passed for debugging). Referenced in CEL as workspace.path.

    workspace:
      enabled: true
      persistent: false  # optional: survive between runs for caching
    jobs:
      - name: checkout
        steps:
          - name: clone
            task:
              type: model_method
              modelIdOrName: swamp-repo
              methodName: clone
              inputs:
                # workDir defaults to workspace.path

This eliminates:
- Manual driverConfig.volumes config
- Host/container path parity hacks
- Manual cleanup jobs
- Storing workdir paths as data artifacts purely for path propagation
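The lifecycle rule for Option A (create on start, clean up on end unless the workspace is persistent or --preserve-workspace is passed) could be sketched roughly as follows. Every name here (withWorkspace, WorkspaceStore, createMemoryStore, the /tmp/swamp-workspaces base path) is hypothetical, since swamp has no workspace API today, and the in-memory store stands in for real directory creation so the sketch runs without touching disk:

```typescript
// Hypothetical names throughout; swamp has no workspace API today.
interface WorkspaceOptions {
  persistent: boolean; // mirrors the proposed workspace.persistent key
  preserveWorkspace: boolean; // mirrors the proposed --preserve-workspace flag
}

// Minimal storage surface so the sketch runs without touching disk.
interface WorkspaceStore {
  create(runId: string): string; // returns the workspace path
  remove(path: string): void;
}

// Runs the workflow body with a workspace and applies the cleanup rule.
// Synchronous for brevity; a real driver would await each step.
function withWorkspace<T>(
  store: WorkspaceStore,
  runId: string,
  options: WorkspaceOptions,
  runSteps: (workspacePath: string) => T,
): T {
  const workspacePath = store.create(runId);
  try {
    return runSteps(workspacePath);
  } finally {
    // Clean up on success and on failure, unless caching or debugging
    // asks for the directory to be kept.
    if (!options.persistent && !options.preserveWorkspace) {
      store.remove(workspacePath);
    }
  }
}

// In-memory store used to demonstrate the lifecycle.
function createMemoryStore(): { store: WorkspaceStore; live: Set<string> } {
  const live = new Set<string>();
  const store: WorkspaceStore = {
    create(runId: string): string {
      const path = `/tmp/swamp-workspaces/${runId}`; // assumed base path
      live.add(path);
      return path;
    },
    remove(path: string): void {
      live.delete(path);
    },
  };
  return { store, live };
}
```

Putting the removal in a finally block means a failed step does not leak the checkout, which is the job the manual cleanup step does today only when the workflow reaches it.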
Option B: Session-mode docker driver
Instead of one container per step, one container per workflow run. Steps execute as sub-operations inside the same long-lived container. State naturally persists without volume mounts. Parallel steps run as concurrent operations within the container.

    driver: docker
    driverConfig:
      mode: session  # vs. "per-step" (current default)
      image: ghcr.io/systeminit/swamp-eval-runner:latest

This matches how GitHub Actions / GitLab CI actually work (one runner hosts the entire job) and matches what users intuitively expect from "CI in a container." Per-step mode stays available for the current use cases.
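To make the difference between the two modes concrete, here is a rough sketch of the docker CLI invocations each one implies. It is not a description of how swamp's docker driver is implemented: the function names and the swamp-session container name are hypothetical, and it assumes the image provides sh and a sleep that accepts infinity. Per-step mode starts a throwaway container for every step, while session mode starts one detached container, runs each step with docker exec, and removes the container at the end:

```typescript
// Hypothetical sketch; not swamp's actual driver code.
interface Step {
  name: string;
  command: string;
}

// Current behaviour as described in this issue: one fresh container per step.
function perStepCommands(image: string, steps: Step[]): string[][] {
  return steps.map((step) => [
    "docker", "run", "--rm", image, "sh", "-c", step.command,
  ]);
}

// Proposed session mode: one long-lived container per workflow run.
function sessionCommands(
  image: string,
  runId: string,
  steps: Step[],
): string[][] {
  const container = `swamp-session-${runId}`; // assumed naming scheme
  return [
    // Start the container detached and keep it alive between steps.
    ["docker", "run", "-d", "--name", container, image, "sleep", "infinity"],
    // Each step runs inside the same container, so files persist.
    ...steps.map((step) => [
      "docker", "exec", container, "sh", "-c", step.command,
    ]),
    // Tear the container down when the workflow run ends.
    ["docker", "rm", "-f", container],
  ];
}
```

Parallel steps would map to concurrent docker exec calls against the same container, so they share one filesystem without any volume configuration.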
Option C: Step output files as implicit inputs
A lighter-weight version: a step can declare "this file or directory is my output," and swamp makes it available at the same path in downstream steps' containers. Similar to GHA's upload-artifact/download-artifact but automatic based on step dependencies.
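One way to read this option is that swamp would resolve each declared output into a per-step list of paths to make available. A minimal sketch of that resolution, assuming the outputs / makeAvailableTo shape proposed in this option (resolveImplicitInputs and the interface names are hypothetical, and how the files are actually transferred between containers is left out):

```typescript
// Hypothetical types mirroring the proposed YAML keys.
interface OutputDecl {
  path: string;
  makeAvailableTo: string[];
}

interface StepDecl {
  name: string;
  outputs?: OutputDecl[];
}

// Maps each step name to the output paths it should receive.
function resolveImplicitInputs(steps: StepDecl[]): Map<string, string[]> {
  const inputs = new Map<string, string[]>();
  for (const step of steps) {
    inputs.set(step.name, []);
  }
  for (const step of steps) {
    for (const output of step.outputs ?? []) {
      for (const consumer of output.makeAvailableTo) {
        const received = inputs.get(consumer);
        if (received === undefined) {
          // Fail early on a typo rather than silently dropping the output.
          throw new Error(
            `step "${step.name}" exports to unknown step "${consumer}"`,
          );
        }
        received.push(output.path);
      }
    }
  }
  return inputs;
}
```

For the declaration proposed in this option, install-deps would contribute node_modules to the input list of run-evals and to no other step.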

    steps:
      - name: install-deps
        task: ...
        outputs:
          - path: node_modules
            makeAvailableTo: [run-evals]

Why this matters
- CI is a first-class use case. swamp is pitching itself as a general automation framework. Multi-step CI workflows with shared state are one of the most common automation patterns.
- The current workarounds leak driver-specific concerns into workflow YAML. Users have to know that docker steps are isolated containers and plan around it. A workspace primitive abstracts this away.
- The workarounds don't compose. If another workflow wants a similar pattern, it has to re-solve volume mounts, path parity, setup steps, and cleanup from scratch. That's a sign we're missing an abstraction.
- It unblocks parallelism. Right now we can run 4 parallel eval steps, but only after carefully engineering around shared state. A workspace primitive or session mode makes parallelism the default, not a puzzle.
Concrete reference
The full multi-model-eval workflow and extension code are in this repo at:
- workflows/workflow-8a88a569-4620-431c-9028-643df0118c72.yaml
- extensions/models/ci_git.ts
- extensions/models/ci_promptfoo_eval.ts
- extensions/reports/ci_eval_analysis.ts
- extensions/reports/ci_eval_result.ts
It's a complete, working example of the workarounds described above, in case it's useful to look at when designing the primitive.