USE EPHEMERAL DATA IN WORKFLOWS
Prerequisites: Swamp installed, a repository with at least one model definition. Familiarity with workflows and data queries.
Override output lifetime to ephemeral
Add a dataOutputOverrides block to the workflow step that produces
intermediate data. Set lifetime: ephemeral on each output spec you want to
keep in memory only:

    steps:
      - name: list-instances
        task:
          type: model_method
          modelIdOrName: instance-lister
          methodName: execute
        dataOutputOverrides:
          - specName: result
            lifetime: ephemeral
          - specName: log
            lifetime: ephemeral

This overrides the model type's default lifetime (usually infinite) for this
workflow step only. The model definition itself is unchanged: other workflows
and standalone runs still use the original lifetime.
Read ephemeral data in a downstream step
A downstream step reads ephemeral data the same way it reads any other data —
through data.latest() in a CEL expression:

    steps:
      - name: list-instances
        task:
          type: model_method
          modelIdOrName: instance-lister
          methodName: execute
        dataOutputOverrides:
          - specName: result
            lifetime: ephemeral
          - specName: log
            lifetime: ephemeral
      - name: report
        task:
          type: model_method
          modelIdOrName: instance-reporter
          methodName: execute
          inputs:
            run: "echo \"Instances: ${{ data.latest('instance-lister', 'result').attributes.stdout }}\""
        dependsOn:
          - step: list-instances
            condition:
              type: succeeded

The data.latest() expression in task.inputs is a deferred step-output
dependency. The engine leaves it unevaluated during workflow evaluation and
resolves it at step execution time, after the upstream step has written its
ephemeral output.
Run the workflow

    $ swamp workflow run instance-scan
    info workflow·run·instance-scan Starting workflow
    info workflow·run·instance-scan·main Job started
    info workflow·run·instance-scan·main·list-instances Step started
    ...
    info workflow·run·instance-scan·main·list-instances Step completed
    info workflow·run·instance-scan·main·report Step started
    ...
    info workflow·run·instance-scan·main·report Step completed
    info workflow·run·instance-scan·main Job completed
    info workflow·run·instance-scan Workflow "succeeded"

Both steps succeed. The report step resolved
data.latest('instance-lister', 'result') from the in-memory ephemeral store.
Verify ephemeral data is gone
After the workflow completes, query the producing model's data:

    $ swamp data list instance-lister
    Data for instance-lister (command/shell)
    report (2 items):
      report-swamp-method-summary v1 text/markdown 672B ...
      report-swamp-method-summary-json v1 application/json 2.8KB ...

The result and log entries are absent because they lived only in memory during
the workflow run. Reports were not overridden and persisted normally.
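
To script this check rather than read the listing by eye, one option is to scan the output for spec group headings. The sketch below is only an illustration: it assumes every spec is listed under a heading shaped like the `report (2 items):` line above, and the helper name `has_data_group` is hypothetical, not part of Swamp.

```shell
#!/bin/sh
# has_data_group NAME: reads a `swamp data list` listing on stdin and
# succeeds if it contains a group heading such as "NAME (2 items):".
# Assumption: all specs are listed under headings of that shape.
has_data_group() {
  grep -Eq "^[[:space:]]*$1 \([0-9]+ items?\):"
}

# Run the check only where the swamp CLI is actually installed.
if command -v swamp >/dev/null 2>&1; then
  listing="$(swamp data list instance-lister)"
  for spec in result log; do
    if printf '%s\n' "$listing" | has_data_group "$spec"; then
      echo "unexpected: $spec was persisted" >&2
    fi
  done
fi
```

If the listing format differs in your Swamp version, adjust the pattern accordingly; the point is only that an ephemeral spec should produce no heading at all.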
Override selectively
You do not need to override every output spec. Override only the outputs that are intermediate. In a three-step pipeline where step 1 produces raw data, step 2 transforms it, and step 3 writes a final report, override steps 1 and 2 to ephemeral and leave step 3 at its default lifetime:

    steps:
      - name: fetch
        task:
          type: model_method
          modelIdOrName: fetcher
          methodName: execute
        dataOutputOverrides:
          - specName: result
            lifetime: ephemeral
      - name: transform
        task:
          type: model_method
          modelIdOrName: transformer
          methodName: execute
          inputs:
            raw: "${{ data.latest('fetcher', 'result').attributes.stdout }}"
        dependsOn:
          - step: fetch
            condition:
              type: succeeded
        dataOutputOverrides:
          - specName: result
            lifetime: ephemeral
      - name: report
        task:
          type: model_method
          modelIdOrName: reporter
          methodName: execute
          inputs:
            transformed: "${{ data.latest('transformer', 'result').attributes.stdout }}"
        dependsOn:
          - step: transform
            condition:
              type: succeeded

After this workflow runs, only the reporter's output is on disk.
Tune the memory budget
Ephemeral data is capped at 512 MB by default. If your workflow processes large
intermediate datasets, raise the budget with the SWAMP_EPHEMERAL_BUDGET
environment variable:

    SWAMP_EPHEMERAL_BUDGET=1g swamp workflow run instance-scan

Accepted formats: 256m, 1g, 2048m. If a step's write would exceed the
budget, it fails with EphemeralBudgetExceededError.
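
When the budget is set from a wrapper script or CI configuration, a small pre-flight check can catch a mistyped value before the workflow starts. The sketch below is deliberately conservative: it accepts only the shapes listed above (a positive integer followed by a lowercase `m` or `g`). Whether Swamp also accepts other spellings is not covered here, and the helper name `is_documented_budget` is hypothetical.

```shell
#!/bin/sh
# is_documented_budget VALUE: succeeds only for values shaped like the
# documented examples (256m, 1g, 2048m).
# Assumption: other spellings may be valid in Swamp; this check does not know.
is_documented_budget() {
  printf '%s\n' "$1" | grep -Eq '^[1-9][0-9]*[mg]$'
}

# Warn (without failing) if the variable is set to an undocumented shape.
budget="${SWAMP_EPHEMERAL_BUDGET:-}"
if [ -n "$budget" ] && ! is_documented_budget "$budget"; then
  echo "warning: SWAMP_EPHEMERAL_BUDGET=$budget is not in a documented format" >&2
fi
```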
Use with --last-evaluated
When re-running the same workflow rapidly (e.g., polling infrastructure state on
a schedule), --last-evaluated skips the full evaluation pass:

    swamp workflow run instance-scan --last-evaluated

Deferred data expressions (data.latest(), data.query(), data.findByTag())
inside step task.inputs are still resolved at step execution time. The
ephemeral data produced by upstream steps is available to downstream steps
exactly as in a full evaluation run. Refer to the
workflows reference for details.
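
For the polling case, a plain shell loop is one way to drive repeated runs. The sketch below is an illustration, not a Swamp feature: `run_repeatedly` is a hypothetical helper, and the count and interval are arbitrary example values. A scheduler such as cron would work equally well.

```shell
#!/bin/sh
# run_repeatedly COUNT INTERVAL CMD...: runs CMD COUNT times, sleeping
# INTERVAL seconds between runs, and stops at the first failing run.
run_repeatedly() {
  count=$1
  interval=$2
  shift 2
  i=1
  while [ "$i" -le "$count" ]; do
    "$@" || return 1
    # Sleep between runs, but not after the final one.
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
}

# Example: five runs, 60 seconds apart (only where swamp is installed).
if command -v swamp >/dev/null 2>&1; then
  run_repeatedly 5 60 swamp workflow run instance-scan --last-evaluated
fi
```

Stopping at the first failure keeps a broken run from being repeated silently; drop the `|| return 1` if you would rather keep polling regardless.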
Further reading
- Data lifetimes — when to choose ephemeral vs workflow vs duration vs infinite
- Data reference — lifetime policies and expiration rules
- The Data Layer — why Swamp uses versioned, immutable data as the composition mechanism