#186 Workflow tasks should support ephemeral model instances (modelType + globalArgs) so workflows are zero-prereq
Opened by swamp_lord · 4/29/2026
Problem
Workflow steps today must reference a pre-existing model instance by modelIdOrName. This forces the workflow author (and ultimately the operator) to manually create every model instance the workflow uses before the first run, with the right --global-arg values.
For an extension that ships a useful "diagnose this whole namespace" workflow (@john/debug-namespace-deep is the motivating case), this means the user can't just run:
swamp extension pull @john/k8s
swamp workflow run @john/debug-namespace-deep --input namespace=my-broken-ns
They first have to do 8 manual swamp model create calls (one per resource type: pod, service, deployment, event, configmap, pvc, secret, netpol), each with the right --global-arg namespace=my-broken-ns, before the workflow will execute. That's far higher friction than the equivalent imperative path (kubectl get …), and it's the single biggest reason an LLM agent given a debugging task chooses raw kubectl over invoking an existing swamp workflow.
Empirically: in our k8s-debug benchmark (https://github.com/systeminit/swamp-benchmark), agents find the @john/debug-namespace-deep workflow but consistently choose to reimplement its diagnosis with model_method primitives because of the prerequisite cost.
Proposed solution
Extend the existing model_method task type to accept a model type + globalArgs as an alternative to a pre-existing instance name. When modelType is given, swamp creates an ephemeral instance for the call (or caches one per workflow run), invokes the method, and tears down (or GCs) when the run completes.
# Today: requires pre-existing `foo-pod` instance
- name: list-pods
  task:
    type: model_method
    modelIdOrName: foo-pod
    methodName: list

# Proposed: workflow author specifies the type + globalArgs;
# swamp instantiates ephemerally for this call.
- name: list-pods
  task:
    type: model_method
    modelType: "@john/pod"   # NEW: alternative to modelIdOrName
    globalArgs:              # NEW: only used when modelType is given
      namespace: ${{ inputs.namespace }}
    methodName: list

modelIdOrName and modelType are mutually exclusive on a single task. When modelType is used:
- A transient instance is created if one with matching (type, globalArgs) doesn't already exist for this workflow run.
- It's reused across steps within the same run that match the same (type, globalArgs).
- It's torn down at workflow-run completion (or marked for GC under the existing data lifecycle rules).
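The create / reuse / tear-down lifecycle in the list above can be sketched as a small per-run cache. This is an illustrative TypeScript sketch only: `RunScopedInstances`, `EphemeralInstance`, `instanceKey`, and the generated id format are hypothetical names, not part of swamp's actual codebase, and real instance creation and deletion are left to the caller.

```typescript
// Hypothetical types: none of these names come from swamp itself.
type GlobalArgs = Record<string, string>;

interface EphemeralInstance {
  id: string;
  modelType: string;
  globalArgs: GlobalArgs;
}

// Build a stable cache key. Keys are sorted so that
// { a: "1", b: "2" } and { b: "2", a: "1" } map to the same instance.
function instanceKey(modelType: string, globalArgs: GlobalArgs): string {
  const sorted = Object.keys(globalArgs)
    .sort()
    .map((k) => [k, globalArgs[k]]);
  return JSON.stringify([modelType, sorted]);
}

class RunScopedInstances {
  private readonly runId: string;
  private readonly instances = new Map<string, EphemeralInstance>();
  private counter = 0;

  constructor(runId: string) {
    this.runId = runId;
  }

  // Return the instance for (modelType, globalArgs), creating it on first use.
  getOrCreate(modelType: string, globalArgs: GlobalArgs): EphemeralInstance {
    const key = instanceKey(modelType, globalArgs);
    let inst = this.instances.get(key);
    if (inst === undefined) {
      this.counter += 1;
      inst = {
        id: `${this.runId}-ephemeral-${this.counter}`,
        modelType,
        globalArgs: { ...globalArgs },
      };
      this.instances.set(key, inst);
    }
    return inst;
  }

  // Called at workflow-run completion: forget every transient instance and
  // return their ids so the caller can delete them or mark them for GC.
  teardown(): string[] {
    const ids = [...this.instances.values()].map((i) => i.id);
    this.instances.clear();
    return ids;
  }
}
```

Under these assumptions, two steps that both ask for `@john/pod` with the same namespace share one instance, while a step targeting a different namespace gets its own. Keying on sorted globalArgs is one possible design choice; it keeps reuse independent of the order in which a workflow author writes the arguments.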
Data artefacts produced by the ephemeral instance are tagged with the workflow run id (already done) so output lookups via data.findBySpec(...) continue to work — they reference the produced dataNames rather than the (transient) instance name.
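The rule that modelIdOrName and modelType are mutually exclusive on a single task could be checked when the workflow is loaded. A minimal TypeScript sketch follows; the `ModelMethodTask` interface and `validateModelMethodTask` function are hypothetical, and two behaviours are assumptions rather than part of the proposal: that exactly one of the two fields is required, and that globalArgs without modelType is reported as an error rather than silently ignored.

```typescript
// Hypothetical shape of a model_method task; field names follow the proposal.
interface ModelMethodTask {
  type: "model_method";
  methodName: string;
  modelIdOrName?: string;
  modelType?: string;
  globalArgs?: Record<string, string>;
}

// Returns human-readable problems; an empty array means the task is valid.
function validateModelMethodTask(task: ModelMethodTask): string[] {
  const errors: string[] = [];
  const hasInstance = task.modelIdOrName !== undefined;
  const hasType = task.modelType !== undefined;

  if (hasInstance && hasType) {
    errors.push("modelIdOrName and modelType are mutually exclusive");
  }
  // Assumption: a task must name either an instance or a type.
  if (!hasInstance && !hasType) {
    errors.push("one of modelIdOrName or modelType is required");
  }
  // Assumption: stray globalArgs is an error rather than being ignored.
  if (task.globalArgs !== undefined && !hasType) {
    errors.push("globalArgs is only used when modelType is given");
  }
  return errors;
}
```

Returning a list instead of throwing on the first problem lets a loader report every issue in a workflow file in one pass.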
Alternatives considered
- Workflow `bootstrap` job using `command/shell`: works, but requires a one-time `swamp model create command/shell shell` and pollutes the workflow with shell out / shell quoting / error swallowing. Tried this; it works but feels like a workaround.
- Extension manifests declare default instances: the extension ships with a manifest entry like "create a default `pod` instance of type `@john/pod` on extension pull." Cleaner than bootstrap-via-shell, but doesn't solve namespace parameterisation (the global arg is fixed at create time, so one instance can only serve one namespace).
- Make models accept namespace as a method input rather than a global arg: done in the @john/k8s extension as a backwards-compatible refinement. Helps direct CLI usage and pairs nicely with this proposal (the ephemeral instance can serve any namespace), but on its own still requires an instance to exist before the workflow can run. So this is complementary, not a substitute.
Impact
- Workflow authors: ship genuinely zero-prereq workflows. "Run this command, get the diagnosis."
- Operators / agents: from `swamp extension pull X` to `swamp workflow run X/foo` is one hop, no instance bookkeeping.
- LLM agents specifically: removes the largest friction we observe pushing them toward reimplementing diagnosis in shell rather than using the framework.
Empirical context
Benchmarked in https://github.com/systeminit/swamp-benchmark (the k8s-debug-v2 challenge). With the current pre-create requirement, swamp-prompted agents take ~140 turns / ~$1.50 per run for a 4-fault namespace because they spend most turns rebuilding diagnosis from primitives. Agents with the same toolchain but no swamp steer (raw kubectl) finish the same task in ~30 turns / ~$0.30, roughly 5x fewer turns and 5x lower cost. That gap in turns and cost is largely attributable to the prerequisite-instance step the workflow path forces before any structured diagnosis can run.
Open