WORKING WITH DATA

In this tutorial, we will create two models, wire one's output into the other using a data.latest() expression, run them through a workflow, then query and inspect the versioned data they produce.

What we will build

We are going to set up two command/shell models. The first collects system information. The second reads the first's output and produces a summary. We will connect them in a workflow, run it, then explore what ended up in The Swamp — listing data, reading specific artifacts, querying across models, and looking at version history.

Prerequisites

  • Swamp installed (Hello World covers installation)
  • A terminal open in an empty directory

Initialize the repository

First, we create a fresh Swamp repo:

$ mkdir data-tutorial
$ cd data-tutorial
$ swamp repo init --tool none

You will see the Swamp banner followed by:

info    repo·init Initialized swamp repository at "..." (tools: "none")

Create the first model

We create a model called system-info that runs uname -a:

$ swamp model create command/shell system-info
Created: system-info (command/shell)
Path: .../models/command/shell/....yaml

Methods:
  execute - Execute the shell command and capture stdout, stderr, and exit code
    Inputs:
      run (string) *required
      ...

Now we edit the definition to set the command. Open the YAML file shown in the Path line above — or use swamp model edit system-info — and set the run global argument:

globalArguments:
  run: "uname -a"

Run the first model

$ swamp model method run system-info execute

You will see a progress tree, then output like:

info    model·method·run·system-info·execute Executing method "execute"
info    model·method·run·system-info·execute Darwin Mac.localdomain 25.3.0 ...
info    model·method·run·system-info·execute Method "execute" completed on "system-info"

The model ran uname -a and stored the result. Let us look at what it produced:

$ swamp data list system-info
Data for system-info (command/shell)

file (1 item):
  log  v1  text/plain  146B  ...

resource (1 item):
  result  v1  application/json  251B  ...

report (2 items):
  report-swamp-method-summary  v1  text/markdown  471B  ...
  report-swamp-method-summary-json  v1  application/json  2.6KB  ...

Notice three kinds of data: a file (the raw log output), a resource (the structured JSON result), and report entries (summaries generated automatically). We will work with the result resource.

Read the data

$ swamp data get system-info result
Data: result (v1)
Model: system-info (command/shell)
Content: application/json, 251B
Lifetime: infinite | GC: 10
Tags: type=resource, specName=result, modelName=system-info
Owner: model-method (...)
Created: ...
Path: .swamp/data/command/shell/.../result/1/raw

{
  "exitCode": 0,
  "executedAt": "...",
  "command": "uname -a",
  "durationMs": 7,
  "stdout": "Darwin Mac.localdomain 25.3.0 ...",
  "stderr": ""
}

Notice the metadata header above the JSON content — it shows the version (v1), content type, lifetime, tags, and where the file lives on disk. The JSON underneath is the actual data the model produced.
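
If you want to act on this result from a script, one option is to parse the JSON body and check its fields. The following is a minimal Python sketch, assuming the JSON body has been copied into a string. The values for executedAt and stdout are placeholders for the elided ones above, and describe_result is a hypothetical helper, not part of Swamp.

```python
import json

# Sample payload shaped like the result shown above.
# The executedAt and stdout values are placeholders, not real output.
RESULT_JSON = """
{
  "exitCode": 0,
  "executedAt": "...",
  "command": "uname -a",
  "durationMs": 7,
  "stdout": "Darwin example-host ...",
  "stderr": ""
}
"""


def describe_result(raw: str) -> str:
    """Return a one-line summary of a command/shell result document."""
    result = json.loads(raw)
    if result["exitCode"] == 0:
        status = "ok"
    else:
        status = f"failed ({result['exitCode']})"
    return f"{result['command']}: {status} in {result['durationMs']}ms"


print(describe_result(RESULT_JSON))
```

A non-zero exitCode is reported as a failure together with the code, which is usually the first thing to check before trusting stdout.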

Create a model that reads another model's data

Now we create a second model that reads from system-info. This is where data.latest() comes in:

$ swamp model create command/shell summariser

Open the new definition file (swamp model edit summariser) and set the run global argument to a CEL expression that reads the first model's output:

globalArguments:
  run: "echo \"System: ${{ data.latest('system-info', 'result').attributes.stdout }}\""

The ${{ }} delimiters mark a CEL (Common Expression Language) expression. data.latest('system-info', 'result') fetches the latest version of system-info's result data, and .attributes.stdout reads the stdout field from it.
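
To make the resolution step concrete, here is a conceptual Python sketch of what happens to that string. It is not Swamp's implementation: the in-memory STORE, the latest and resolve helpers, and the regular expression are all assumptions made for illustration, and the pattern only handles the single expression shape used in this tutorial rather than real CEL.

```python
import re

# Hypothetical in-memory stand-in for stored data:
# (model name, data name) -> list of versions. Values are made up.
STORE = {
    ("system-info", "result"): [
        {"version": 1, "attributes": {"stdout": "Darwin old-host"}},
        {"version": 2, "attributes": {"stdout": "Darwin new-host"}},
    ],
}


def latest(model: str, name: str) -> dict:
    """Pick the entry with the highest version number."""
    return max(STORE[(model, name)], key=lambda entry: entry["version"])


# Matches only ${{ data.latest('<model>', '<name>').attributes.<field> }}.
PATTERN = re.compile(
    r"\$\{\{\s*data\.latest\('([^']+)',\s*'([^']+)'\)\.attributes\.(\w+)\s*\}\}"
)


def resolve(template: str) -> str:
    """Replace each matching expression with the referenced attribute value."""
    def substitute(match):
        model, name, field = match.groups()
        return str(latest(model, name)["attributes"][field])
    return PATTERN.sub(substitute, template)


command = resolve(
    "echo \"System: ${{ data.latest('system-info', 'result').attributes.stdout }}\""
)
print(command)
```

The point of the sketch is the ordering: the expression is replaced with a concrete value first, and only then does the resulting string run as a shell command.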

Now run it:

$ swamp model method run summariser execute
info    model·method·run·summariser·execute Evaluating expressions
info    model·method·run·summariser·execute Executing method "execute"
info    model·method·run·summariser·execute System: Darwin Mac.localdomain 25.3.0 ...
info    model·method·run·summariser·execute Method "execute" completed on "summariser"

Notice the "Evaluating expressions" line: Swamp resolved the data.latest() expression before running the command. The summariser read system-info's output from The Swamp by name, and system-info needed no knowledge of the summariser.

Connect them in a workflow

We have been running models one at a time. Now we wire them into a workflow so they run together:

$ swamp workflow create info-pipeline
Created: info-pipeline
Path: .../workflows/workflow-....yaml

Open the workflow file (swamp workflow edit info-pipeline) and replace its contents with:

id: <keep the id from the generated file>
name: info-pipeline
description: Gather system info and summarise it
jobs:
  - name: main
    description: Run both models in sequence
    steps:
      - name: gather
        description: Collect system information
        task:
          type: model_method
          modelIdOrName: system-info
          methodName: execute
      - name: summarise
        description: Summarise the gathered data
        depends_on:
          - gather
        task:
          type: model_method
          modelIdOrName: summariser
          methodName: execute

Notice depends_on: [gather] on the summarise step — it waits for the gather step to complete before starting.
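
The ordering that depends_on implies can be sketched as a topological sort. This Python example uses the standard library's graphlib (Python 3.9+) and is only an illustration of the idea. How Swamp schedules steps internally is not shown in this tutorial, and execution_order is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Return step names in an order that respects every depends_on entry."""
    try:
        # Each key is a step; its value lists the steps it waits for.
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as err:
        # args[1] holds the list of nodes that form the detected cycle.
        raise ValueError(f"cyclic step dependencies: {err.args[1]}") from err


# The two steps from the workflow above.
steps = {"gather": [], "summarise": ["gather"]}
order = execution_order(steps)
print(order)
```

A dependency cycle makes such an ordering impossible, which is why a check for cyclic step dependencies is a natural part of validating a workflow.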

Validate the workflow:

$ swamp workflow validate info-pipeline
Validating: info-pipeline
  ✓ Schema validation
  ✓ Unique job names
  ✓ Unique step names in job 'main'
  ✓ Valid job dependency references
  ✓ Valid step dependency references in job 'main'
  ✓ No cyclic job dependencies
  ✓ No cyclic step dependencies in job 'main'
  ✓ Step inputs for 'gather' in job 'main' (system-info.execute)
  ✓ Step inputs for 'summarise' in job 'main' (summariser.execute)
  ...
Summary: 11 passed
Result: PASSED

Now run it:

$ swamp workflow run info-pipeline
info    workflow·run·info-pipeline Starting workflow
info    workflow·run·info-pipeline·main Job started
info    workflow·run·info-pipeline·main·gather Step started
info    workflow·run·info-pipeline·main·gather Step completed
info    workflow·run·info-pipeline·main·summarise Step started
info    workflow·run·info-pipeline·main·summarise Step completed
info    workflow·run·info-pipeline·main Job completed
info    workflow·run·info-pipeline Workflow "succeeded"

Both models ran in sequence. We can see the workflow-scoped data:

$ swamp data list --workflow info-pipeline
Data for workflow info-pipeline (run ...)

file (2 items):
  log  v3  system-info  main.gather  146B
  log  v2  summariser  main.summarise  154B

resource (2 items):
  result  v3  system-info  main.gather  251B
  result  v2  summariser  main.summarise  405B

report (6 items):
  ...

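Notice the version numbers: each run writes a new version. The output above shows result v3 for system-info because it had been run three times when this was captured (twice manually, once through the workflow). If you ran system-info manually only once, as in the steps above, you will see v2 instead.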

Query across models

So far we have looked at data one model at a time. swamp data query searches across everything in The Swamp using CEL predicates:

$ swamp data query 'tags.type == "resource"'
┌────────┬─────────────┬──────────┬──────────┬─────────┬──────┐
│ name   │ modelName   │ specName │ dataType │ version │ size │
├────────┼─────────────┼──────────┼──────────┼─────────┼──────┤
│ result │ system-info │ result   │ resource │ ...     │ 251B │
├────────┼─────────────┼──────────┼──────────┼─────────┼──────┤
│ result │ summariser  │ result   │ resource │ ...     │ 405B │
└────────┴─────────────┴──────────┴──────────┴─────────┴──────┘

2 results

The predicate tags.type == "resource" matched every resource across both models. We could also query by model name, content type, or any tag.
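
Conceptually, a predicate like this is a filter applied to every data record. The Python sketch below mimics that with a list comprehension. The records are hand-written stand-ins based on the listings above (the type=file tag on the log entries is an assumption), and nothing here calls Swamp or evaluates real CEL.

```python
# Hand-written stand-ins for stored data records.
# Tag values for the log entries are assumed for illustration.
records = [
    {"name": "log", "modelName": "system-info", "tags": {"type": "file"}},
    {"name": "result", "modelName": "system-info", "tags": {"type": "resource"}},
    {"name": "log", "modelName": "summariser", "tags": {"type": "file"}},
    {"name": "result", "modelName": "summariser", "tags": {"type": "resource"}},
]

# Equivalent of the predicate: tags.type == "resource"
resources = [r for r in records if r["tags"]["type"] == "resource"]

# Narrowing further by model name, as a second condition.
system_info_resources = [
    r for r in resources if r["modelName"] == "system-info"
]

for record in resources:
    print(record["modelName"], record["name"])
```

The same shape extends to other fields: each extra condition in the predicate narrows the set of matching records.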

View version history

Every time a model runs, it creates a new version of its data. Let us see system-info's version history:

$ swamp data versions system-info result

You will see output like:

{
  "dataName": "result",
  "modelName": "system-info",
  "versions": [
    {
      "version": 3,
      "createdAt": "...",
      "size": 251,
      "isLatest": true
    },
    {
      "version": 2,
      "createdAt": "...",
      "size": 251,
      "isLatest": false
    },
    {
      "version": 1,
      "createdAt": "...",
      "size": 251,
      "isLatest": false
    }
  ],
  "total": 3
}

One version for each time system-info ran (three in the output above; you will see two if you ran it manually only once). data.latest() always reads the most recent version. Older versions stay available until garbage collection removes them.
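
If you save the version history to a string or file, picking out the latest and older versions is a small parsing job. This is a minimal Python sketch, assuming the JSON shape shown above; the createdAt values remain elided, and the variable names are illustrative only.

```python
import json

# Version history in the shape shown above; timestamps stay elided.
VERSIONS_JSON = """
{
  "dataName": "result",
  "modelName": "system-info",
  "versions": [
    {"version": 3, "createdAt": "...", "size": 251, "isLatest": true},
    {"version": 2, "createdAt": "...", "size": 251, "isLatest": false},
    {"version": 1, "createdAt": "...", "size": 251, "isLatest": false}
  ],
  "total": 3
}
"""

history = json.loads(VERSIONS_JSON)

# The entry flagged isLatest is the one a latest-version lookup would return.
latest_entry = next(v for v in history["versions"] if v["isLatest"])

# Everything else is an older version that is still stored.
older_versions = [v["version"] for v in history["versions"] if not v["isLatest"]]

print(latest_entry["version"], older_versions)
```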

What we built

We created two models, connected them with a data.latest() expression so one reads the other's output, ran them through a workflow, and explored the data they produced — listing artifacts, reading content, querying across models with CEL predicates, and inspecting version history.