Skip to main content

USE EPHEMERAL DATA IN WORKFLOWS

Prerequisites: Swamp installed, a repository with at least one model definition. Familiarity with workflows and data queries.

Override output lifetime to ephemeral

Add a dataOutputOverrides block to the workflow step that produces intermediate data. Set lifetime: ephemeral on each output spec you want to keep in memory only:

steps:
  - name: list-instances
    task:
      type: model_method
      modelIdOrName: instance-lister
      methodName: execute
    dataOutputOverrides:
      - specName: result
        lifetime: ephemeral
      - specName: log
        lifetime: ephemeral

This overrides the model type's default lifetime (usually infinite) for this workflow step only. The model definition itself is unchanged — other workflows and standalone runs still use the original lifetime.

Read ephemeral data in a downstream step

A downstream step reads ephemeral data the same way it reads any other data — through data.latest() in a CEL expression:

steps:
  - name: list-instances
    task:
      type: model_method
      modelIdOrName: instance-lister
      methodName: execute
    dataOutputOverrides:
      - specName: result
        lifetime: ephemeral
      - specName: log
        lifetime: ephemeral
  - name: report
    task:
      type: model_method
      modelIdOrName: instance-reporter
      methodName: execute
      inputs:
        run: "echo \"Instances: ${{ data.latest('instance-lister', 'result').attributes.stdout }}\""
    dependsOn:
      - step: list-instances
        condition:
          type: succeeded

The data.latest() expression in task.inputs is a deferred step-output dependency. The engine leaves it unevaluated during workflow evaluation and resolves it at step execution time, after the upstream step has written its ephemeral output.

Run the workflow

$ swamp workflow run instance-scan
info    workflow·run·instance-scan Starting workflow
info    workflow·run·instance-scan·main Job started
info    workflow·run·instance-scan·main·list-instances Step started
...
info    workflow·run·instance-scan·main·list-instances Step completed
info    workflow·run·instance-scan·main·report Step started
...
info    workflow·run·instance-scan·main·report Step completed
info    workflow·run·instance-scan·main Job completed
info    workflow·run·instance-scan Workflow "succeeded"

Both steps succeed. The report step resolved data.latest('instance-lister', 'result') from the in-memory ephemeral store.

Verify ephemeral data is gone

After the workflow completes, query the producing model's data:

$ swamp data list instance-lister
Data for instance-lister (command/shell)

report (2 items):
  report-swamp-method-summary  v1  text/markdown  672B  ...
  report-swamp-method-summary-json  v1  application/json  2.8KB  ...

The result and log entries are absent — they lived only in memory during the workflow run. Reports were not overridden and persisted normally.

Override selectively

You do not need to override every output spec. Override only the outputs that are intermediate. In a three-step pipeline where step 1 produces raw data, step 2 transforms it, and step 3 writes a final report, override steps 1 and 2 to ephemeral and leave step 3 at its default lifetime:

steps:
  - name: fetch
    task:
      type: model_method
      modelIdOrName: fetcher
      methodName: execute
    dataOutputOverrides:
      - specName: result
        lifetime: ephemeral
  - name: transform
    task:
      type: model_method
      modelIdOrName: transformer
      methodName: execute
      inputs:
        raw: "${{ data.latest('fetcher', 'result').attributes.stdout }}"
    dependsOn:
      - step: fetch
        condition:
          type: succeeded
    dataOutputOverrides:
      - specName: result
        lifetime: ephemeral
  - name: report
    task:
      type: model_method
      modelIdOrName: reporter
      methodName: execute
      inputs:
        transformed: "${{ data.latest('transformer', 'result').attributes.stdout }}"
    dependsOn:
      - step: transform
        condition:
          type: succeeded

After this workflow runs, only the reporter's output is on disk.

Tune the memory budget

Ephemeral data is capped at 512 MB by default. If your workflow processes large intermediate datasets, raise the budget with the SWAMP_EPHEMERAL_BUDGET environment variable:

SWAMP_EPHEMERAL_BUDGET=1g swamp workflow run instance-scan

Accepted formats: 256m, 1g, 2048m. If a step's write would exceed the budget, it fails with EphemeralBudgetExceededError.

Use with --last-evaluated

When re-running the same workflow rapidly (e.g., polling infrastructure state on a schedule), --last-evaluated skips the full evaluation pass:

swamp workflow run instance-scan --last-evaluated

Deferred data expressions — data.latest(), data.query(), data.findByTag() — inside step task.inputs are still resolved at step execution time. The ephemeral data produced by upstream steps is available to downstream steps exactly as in a full evaluation run. Refer to the workflows reference for details.

Further reading

  • Data lifetimes — when to choose ephemeral vs workflow vs duration vs infinite
  • Data reference — lifetime policies and expiration rules
  • The Data Layer — why Swamp uses versioned, immutable data as the composition mechanism