Data

Data outputs are the versioned artifacts produced when model methods execute. Each method execution can write structured data (resources) and unstructured content (files). Data outputs are stored in .swamp/data/ within the repository, organized by model type, model ID, and data name.

Output Types

Model types declare their output specifications using two categories: resources and files. Each declared spec has a name (the spec name) that identifies it within the model type.

Resource Outputs

Structured JSON data validated against a schema.

Field	Type	Required	Default	Description
`description`	string	No	None	Human-readable description
`schema`	Zod schema	Yes	—	Validates data on write
`lifetime`	`Lifetime`	Yes	—	Retention policy
`garbageCollection`	`GarbageCollectionPolicy`	Yes	—	Version retention policy
`tags`	`Record<string, string>`	No	`{}`	Default tags (auto-includes `type: "resource"`)
`sensitiveOutput`	boolean	No	`false`	Treat all fields as sensitive
`vaultName`	string	No	First available vault	Vault for storing sensitive field values

Resource content is always stored as application/json.

File Outputs

Binary or text content identified by MIME type.

Field	Type	Required	Default	Description
`description`	string	No	None	Human-readable description
`contentType`	string	Yes	—	MIME type (e.g., `text/plain`)
`lifetime`	`Lifetime`	Yes	—	Retention policy
`garbageCollection`	`GarbageCollectionPolicy`	Yes	—	Version retention policy
`streaming`	boolean	No	`false`	Line-oriented append mode
`tags`	`Record<string, string>`	No	`{}`	Default tags (auto-includes `type: "file"`)

Example: `command/shell` Model Type

The built-in command/shell model type declares one resource and one file output:

resources:
  result        [resource]  — Shell command execution result (infinite)
  
files:
  log           [file]      — Shell command output, text/plain (infinite, streaming)

After execution, both are visible via swamp data list:

Data for hello-world (command/shell)

file (1 item):
  log  v2  text/plain  19B  2026-04-07

resource (1 item):
  result  v1  application/json  135B  2026-04-07

report (2 items):
  report-swamp-method-summary  v2  text/markdown  482B  2026-04-07
  report-swamp-method-summary-json  v2  application/json  2.6KB  2026-04-07

Lifetime

Lifetime determines how long data is retained before it becomes eligible for garbage collection.

Value	Description
Duration string	`1h`, `5m`, `10d`, `2w`, `1mo`, `10y`
`ephemeral`	Intended to be deleted when the process ends (not yet enforced; see Expiration Rules)
`infinite`	Never automatically deleted
`job`	Lives until the job completes
`workflow`	Lives until the workflow completes

Duration format: {number}{unit} where unit is h (hours), m (minutes), d (days), w (weeks), mo (months), or y (years).

Zero-duration strings (e.g., 0h, 0d) are normalized to workflow.

Duration Conversion

Unit	Conversion
`m`	value × 60,000 ms
`h`	value × 3,600,000 ms
`d`	value × 86,400,000 ms
`w`	value × 604,800,000 ms
`mo`	value × 2,592,000,000 ms (30 days)
`y`	value × 31,536,000,000 ms (365 days)
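The conversion table and the zero-duration rule can be sketched in TypeScript as follows. This is a minimal illustration rather than swamp's own implementation: `durationToMs` and `normalizeLifetime` are hypothetical helper names, and the regular expression assumes a duration is a non-negative integer immediately followed by its unit.

```typescript
// Milliseconds per unit, copied from the Duration Conversion table.
const UNIT_MS: Record<string, number> = {
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
  w: 604_800_000,
  mo: 2_592_000_000, // 30 days
  y: 31_536_000_000, // 365 days
};

// Hypothetical helper: returns milliseconds, or null when the input is
// not a {number}{unit} duration string (e.g. "infinite").
function durationToMs(value: string): number | null {
  // "mo" is listed before "m" so that "1mo" is not read as minutes.
  const match = /^(\d+)(mo|m|h|d|w|y)$/.exec(value);
  if (match === null) return null;
  return Number(match[1]) * UNIT_MS[match[2]];
}

// Hypothetical helper: zero-duration strings are normalized to "workflow".
function normalizeLifetime(value: string): string {
  return durationToMs(value) === 0 ? "workflow" : value;
}
```

For example, `durationToMs("10d")` yields 864,000,000 ms, and `normalizeLifetime("0h")` yields `workflow`.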

Expiration Rules

infinite: Never expires.
ephemeral: Not yet implemented — treated as non-expiring.
job / workflow: Expires when the associated workflow run no longer exists. Requires workflowId and workflowRunId in the owner definition. If either is missing, the data is treated as not expired.
Duration strings: Expires when createdAt + duration is in the past.
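The rules above can be read as a single decision function. The sketch below is an assumption-laden illustration: `ExpiryInput`, `lifetimeMs`, and `isExpired` are hypothetical names, and the `workflowRunExists` callback stands in for however swamp actually checks whether a run still exists.

```typescript
// Milliseconds per duration unit (same values as the Duration Conversion table).
const MS_PER_UNIT: Record<string, number> = {
  m: 60_000, h: 3_600_000, d: 86_400_000,
  w: 604_800_000, mo: 2_592_000_000, y: 31_536_000_000,
};

// Hypothetical shape: only the fields the expiration rules mention.
interface ExpiryInput {
  lifetime: string;
  createdAt: string; // ISO 8601 timestamp
  workflowId?: string;
  workflowRunId?: string;
}

function lifetimeMs(lifetime: string): number | null {
  const match = /^(\d+)(mo|m|h|d|w|y)$/.exec(lifetime);
  return match === null ? null : Number(match[1]) * MS_PER_UNIT[match[2]];
}

function isExpired(
  data: ExpiryInput,
  now: Date,
  workflowRunExists: (workflowId: string, runId: string) => boolean,
): boolean {
  switch (data.lifetime) {
    case "infinite":
    case "ephemeral": // not yet implemented, so treated as non-expiring
      return false;
    case "job":
    case "workflow":
      // Without both identifiers the data is not expired.
      if (data.workflowId === undefined || data.workflowRunId === undefined) {
        return false;
      }
      return !workflowRunExists(data.workflowId, data.workflowRunId);
    default: {
      const ms = lifetimeMs(data.lifetime);
      // Assumption: an unrecognized lifetime keeps the data.
      if (ms === null) return false;
      return Date.parse(data.createdAt) + ms < now.getTime();
    }
  }
}
```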

Garbage Collection Policy

Garbage collection controls how many versions of a data item are retained.

Value	Description
integer	Keep the N most recent versions
Duration string	Keep versions created within the duration

An integer value must be positive. A duration string uses the same format as lifetime durations and must be greater than zero.

# Keep the 10 most recent versions
garbageCollection: 10

# Keep versions from the last 7 days
garbageCollection: 7d

Garbage collection runs when swamp data gc is invoked and as part of the lifecycle service. It operates in two phases:

Expired data deletion — removes all versions of data items whose lifetime has elapsed.
Version pruning — for non-expired data, removes old versions that exceed the garbage collection policy.
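The version-pruning phase can be sketched as a function that returns the version numbers falling outside the policy. The names `VersionInfo`, `policyMs`, and `versionsToPrune` are hypothetical. Whether swamp additionally protects the latest version under a duration policy is not stated here, so this sketch makes no such exception.

```typescript
interface VersionInfo {
  version: number;
  createdAt: string; // ISO 8601 timestamp
}

const POLICY_UNIT_MS: Record<string, number> = {
  m: 60_000, h: 3_600_000, d: 86_400_000,
  w: 604_800_000, mo: 2_592_000_000, y: 31_536_000_000,
};

function policyMs(value: string): number | null {
  const match = /^(\d+)(mo|m|h|d|w|y)$/.exec(value);
  return match === null ? null : Number(match[1]) * POLICY_UNIT_MS[match[2]];
}

// Returns the versions that exceed the policy and would be removed.
function versionsToPrune(
  versions: VersionInfo[],
  policy: number | string,
  now: Date,
): number[] {
  if (typeof policy === "number") {
    if (!Number.isInteger(policy) || policy < 1) {
      throw new Error("count policy must be a positive integer");
    }
    // Keep the N most recent versions, prune the rest.
    return [...versions]
      .sort((a, b) => b.version - a.version)
      .slice(policy)
      .map((v) => v.version);
  }
  const ms = policyMs(policy);
  if (ms === null || ms <= 0) {
    throw new Error("duration policy must be greater than zero");
  }
  // Keep versions created within the duration, prune older ones.
  const cutoff = now.getTime() - ms;
  return versions
    .filter((v) => Date.parse(v.createdAt) < cutoff)
    .map((v) => v.version);
}
```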

Versioning

Each data item is versioned with sequential positive integers starting at 1. Every method execution that writes to the same data name produces a new version.

$ swamp data versions hello-world result --json

{
  "dataName": "result",
  "modelName": "hello-world",
  "modelType": "command/shell",
  "versions": [
    {
      "version": 2,
      "createdAt": "2026-04-07T18:03:08.737Z",
      "size": 135,
      "checksum": "d58d16...",
      "isLatest": true
    },
    {
      "version": 1,
      "createdAt": "2026-04-07T18:02:58.146Z",
      "size": 157,
      "checksum": "c631b6...",
      "isLatest": false
    }
  ],
  "total": 2
}

The "latest" Pointer

Each data item has a latest file in its directory containing the current version number as plain text. When data is retrieved without an explicit --version flag, the latest version is returned.

.swamp/data/command/shell/{model-id}/result/
  1/
  2/
  latest          # Contains: "2"

The name latest is reserved — it cannot be used as a data name.

Checksums

Each version includes a SHA-256 checksum computed from the content file at finalization time. The checksum is stored in metadata.yaml and returned by data access commands.

Storage Layout

Data is stored on disk under .swamp/data/ in a hierarchical directory structure:

.swamp/data/
  {model-type-path}/
    {model-id}/
      {data-name}/
        1/
          metadata.yaml
          raw
        2/
          metadata.yaml
          raw
        latest

{model-type-path} — the model type as a directory path (e.g., command/shell).
{model-id} — the UUID of the model definition.
{data-name} — the instance name given when data is written.
{version}/ — a numbered directory for each version.
metadata.yaml — full metadata for the version.
raw — the content (JSON for resources, binary or text for files).
latest — text file containing the current version number.
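As a rough illustration of the layout above, the following sketch builds the on-disk paths for a data item and parses the latest pointer. The helper names and the `DataLocation` shape are hypothetical, and trimming whitespace from the pointer file is an assumption.

```typescript
// Hypothetical shape identifying one data item.
interface DataLocation {
  modelType: string; // e.g. "command/shell"
  modelId: string;   // UUID of the model definition
  dataName: string;  // instance name given when data is written
}

function dataItemDir(loc: DataLocation): string {
  return [".swamp/data", loc.modelType, loc.modelId, loc.dataName].join("/");
}

function versionPaths(
  loc: DataLocation,
  version: number,
): { metadata: string; raw: string } {
  const dir = `${dataItemDir(loc)}/${version}`;
  return { metadata: `${dir}/metadata.yaml`, raw: `${dir}/raw` };
}

function latestPointerPath(loc: DataLocation): string {
  return `${dataItemDir(loc)}/latest`;
}

// The pointer file holds the current version number as plain text.
function parseLatestPointer(text: string): number {
  const version = Number(text.trim());
  if (!Number.isInteger(version) || version < 1) {
    throw new Error(`Invalid latest pointer: ${JSON.stringify(text)}`);
  }
  return version;
}
```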

Metadata File

Each version's metadata.yaml contains the complete data record:

id: e204ea55-3d64-48a0-aa78-32fea656fdac
name: result
version: 1
contentType: application/json
lifetime: infinite
garbageCollection: 10
streaming: false
tags:
  type: resource
  specName: result
  modelName: hello-world
ownerDefinition:
  ownerType: model-method
  ownerRef: 7347cf2c-cc9e-4203-8897-e10845af9732
createdAt: "2026-04-07T18:02:58.146Z"
size: 157
checksum: c631b676cd069af1decf4f20c27568f44bcccf062846bb32bbeae187573c2fe6

The tags block always includes three auto-injected keys:

Tag Key	Auto-Injected	Description
`type`	Yes	`resource`, `file`, or `report`
`specName`	Yes	Output spec name from the model type
`modelName`	Yes	Definition name for orphan data recovery

Streaming

When streaming: true is set on a file output spec, the data writer operates in line-oriented append mode. Lines are written incrementally to disk as they are produced, rather than buffered in memory.

The command/shell model type declares streaming: true on its log file output to capture stdout and stderr as the command executes.

Streaming file writers support three write patterns:

writeLine(line) — appends a single line with a newline character.
writeStream(stream) — pipes a ReadableStream to disk, invoking optional line callbacks as newlines are encountered.
getFilePath() — returns the allocated content path for direct file writes.

Non-streaming outputs use writeAll(content) or writeText(text) to write the complete content at once.
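To illustrate how line callbacks can be derived from a stream whose chunks do not align with line boundaries, here is a small in-memory sketch. `LineSplitter` is a hypothetical class and is not part of the swamp writer API; it only shows the buffering idea behind line-oriented processing.

```typescript
// Hypothetical helper: turns arbitrary text chunks into complete lines.
class LineSplitter {
  private pending = "";
  private readonly onLine: (line: string) => void;

  constructor(onLine: (line: string) => void) {
    this.onLine = onLine;
  }

  // Accept a chunk and emit every complete line it finishes.
  push(chunk: string): void {
    this.pending += chunk;
    let idx = this.pending.indexOf("\n");
    while (idx !== -1) {
      this.onLine(this.pending.slice(0, idx));
      this.pending = this.pending.slice(idx + 1);
      idx = this.pending.indexOf("\n");
    }
  }

  // Emit any trailing text that was not newline-terminated.
  flush(): void {
    if (this.pending.length > 0) {
      this.onLine(this.pending);
      this.pending = "";
    }
  }
}
```

Only the unfinished tail of the current line is held in memory, which is the property that makes incremental writes cheaper than buffering the whole output.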

Sensitive Output

Resource output specs can mark fields as sensitive. Sensitive values are stored in a vault and replaced with vault.get() reference expressions before the data is persisted to disk.

Field-Level Sensitivity

Individual fields are marked sensitive through Zod schema metadata:

schema: z.object({
  apiKey: z.string().meta({ sensitive: true }),
  publicId: z.string(),
});

Only apiKey is stored in the vault. publicId is persisted as-is.

Whole-Output Sensitivity

When sensitiveOutput: true is set on the resource spec, all top-level fields are treated as sensitive.

Vault Resolution Order

The vault used for storing sensitive fields is resolved in this order:

Field-level vaultName from schema metadata
Spec-level vaultName from the resource output specification
First available vault from the vault service

If sensitive fields exist but no vault is configured, an error is thrown.
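The resolution order amounts to a three-step fallback. In this sketch, `resolveVaultName` and its parameters are hypothetical names, and the error message is illustrative only.

```typescript
// Picks the vault for a sensitive field using the documented precedence:
// field-level name, then spec-level name, then the first available vault.
function resolveVaultName(
  fieldVaultName: string | undefined,
  specVaultName: string | undefined,
  availableVaults: string[],
): string {
  const resolved = fieldVaultName ?? specVaultName ?? availableVaults[0];
  if (resolved === undefined) {
    throw new Error("Sensitive fields exist but no vault is configured");
  }
  return resolved;
}
```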

Vault Key Format

Auto-generated vault keys follow this pattern:

{sanitized-model-type}-{model-id}-{method-name}-{field-path}

Sanitization: @ and null bytes are removed, / and \ are replaced with -, .. is replaced with ..

Persisted Format

After processing, persisted resource data contains vault references:

apiKey: "${{ vault.get('my-secrets', 'command-shell-abc-execute-apiKey') }}"
publicId: "pk_12345"

On read, vault references are automatically resolved back to their original values. Resolved secrets are registered with the secret redactor to prevent log leakage.

Non-string sensitive values are JSON-stringified before vault storage.
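The key pattern and the persisted reference can be sketched as below. The function names are hypothetical. Only the `@`, null-byte, `/`, and `\` sanitization rules are implemented; the handling of `..` is deliberately left out of this sketch.

```typescript
// Sanitizes the model type: "@" and null bytes removed, "/" and "\" become "-".
// Note: the ".." rule is intentionally not implemented in this sketch.
function sanitizeModelType(modelType: string): string {
  return modelType.replace(/[@\u0000]/g, "").replace(/[\/\\]/g, "-");
}

// {sanitized-model-type}-{model-id}-{method-name}-{field-path}
function vaultKey(
  modelType: string,
  modelId: string,
  methodName: string,
  fieldPath: string,
): string {
  return [sanitizeModelType(modelType), modelId, methodName, fieldPath].join("-");
}

// Builds the reference expression that replaces the sensitive value on disk.
function vaultReference(vaultName: string, key: string): string {
  return "${{ vault.get('" + vaultName + "', '" + key + "') }}";
}

// Non-string sensitive values are JSON-stringified before vault storage.
function toVaultValue(value: unknown): string {
  return typeof value === "string" ? value : JSON.stringify(value);
}
```

With model type `command/shell`, model ID `abc`, method `execute`, and field `apiKey`, this reproduces the key and reference shown in the persisted example above.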

Lifecycle States

Each data entry has a lifecycle state.

State	Description
`active`	Normal, live data (default)
`deleted`	Tombstone marker — the data has been deleted or renamed

Deletion Markers

A deletion marker is a version with lifecycle: "deleted", application/json content type, and streaming: false. It signals that the data was intentionally removed.

Rename Markers

A rename marker is a deletion marker with an additional renamedTo field pointing to the new data name. When the latest version of a data item is a rename marker, lookups without an explicit version follow the forward reference to the new name (up to 5 levels deep).

$ swamp data rename hello-world result execution-result --json

{
  "oldName": "result",
  "newName": "execution-result",
  "modelId": "7347cf2c-cc9e-4203-8897-e10845af9732",
  "modelName": "hello-world",
  "modelType": "command/shell",
  "copiedVersion": 2,
  "newVersion": 1,
  "warning": "Any workflows or models that produce data under \"result\" will overwrite the forward reference. Update them to use \"execution-result\" instead."
}

The rename process:

Copies the latest version of the old data name to version 1 under the new name.
Writes a tombstone with a forward reference on the old name.
Updates the latest marker on the old name to point to the tombstone.
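The forward-reference lookup can be sketched as a bounded loop over the latest entry of each name. `LatestEntry` and `resolveRenamedName` are hypothetical names, and returning `null` once the depth limit is exceeded is an assumption about behavior this page does not spell out.

```typescript
// Hypothetical view of the latest version of a data item.
interface LatestEntry {
  lifecycle: "active" | "deleted";
  renamedTo?: string; // present only on rename markers
}

const MAX_RENAME_DEPTH = 5;

// Follows rename markers from `name` to the live data name, or returns
// null when the data is missing, deleted, or the chain is too deep.
function resolveRenamedName(
  name: string,
  latestByName: Map<string, LatestEntry>,
): string | null {
  let current = name;
  for (let hops = 0; hops <= MAX_RENAME_DEPTH; hops++) {
    const entry = latestByName.get(current);
    if (entry === undefined) return null;            // no such data
    if (entry.lifecycle === "active") return current; // live data found
    if (entry.renamedTo === undefined) return null;   // plain deletion marker
    current = entry.renamedTo;                        // follow the reference
  }
  return null; // more than MAX_RENAME_DEPTH forward references
}
```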

Owner Definition

Every data entry tracks its owner — who created it.

Field	Type	Required	Description
`ownerType`	string	Yes	`model-method`, `workflow-step`, or `manual`
`ownerRef`	string	Yes	Identifier of the owner (model ID, step ref)
`workflowId`	string	No	Workflow UUID (for `job`/`workflow` lifetimes)
`workflowRunId`	string	No	Workflow run UUID

Ownership is validated on write — new versions of an existing data name must have the same ownerType and ownerRef as the original.

Data Output Overrides

Workflow steps can override the default output spec settings for data produced by their tasks. See dataOutputOverrides in the workflows reference.

Field	Type	Description
`specName`	string	Output spec name to override
`lifetime`	`Lifetime`	Override retention policy
`garbageCollection`	`GarbageCollectionPolicy`	Override version retention
`tags`	`Record<string, string>`	Additional tags merged with output tags
`vary`	`string[]`	Input key names to vary by (composite data names)

When vary is set, the resolved values of the named input keys are appended as a suffix to the data instance name. This produces distinct data items per iteration in forEach steps.

CEL Access

Data outputs are accessible in CEL expressions through the data namespace:

Function	Description
`data.latest(modelName, dataName)`	Latest version of a data item
`data.latest(modelName, dataName, varyValues[])`	Latest version with vary suffix
`data.version(modelName, dataName, version)`	Specific version
`data.version(modelName, dataName, varyValues[], version)`	Specific version with vary suffix
`data.listVersions(modelName, dataName)`	All version numbers
`data.listVersions(modelName, dataName, varyValues[])`	All version numbers with vary suffix
`data.findByTag(tagKey, tagValue)`	Search by tag
`data.findBySpec(modelName, specName)`	Find by output spec name
`data.query(predicate, select?)`	CEL predicate query

data.latest() and data.version() return null when the data item does not exist. Use the .? (optional select) operator to chain through null safely instead of throwing, and .orValue() to provide inline defaults:

data.latest("factory", "code-review").?attributes.?findings
data.latest("factory", "code-review").?attributes.?findings.orValue([])
data.version("scanner", "result", 1).?attributes.?status.orValue("unknown")

See Optional Select in the CEL reference for full syntax and usage guidance.

These functions return DataRecord objects with the following fields:

Field	Type	Description
`id`	string	Data UUID
`name`	string	Data instance name
`version`	number	Version number
`createdAt`	string	ISO 8601 timestamp
`attributes`	`Record<string, unknown>`	Parsed JSON content (resources only)
`tags`	`Record<string, string>`	All tags
`modelName`	string	Model definition name
`modelType`	string	Model type path
`specName`	string	Output spec name
`dataType`	string	`resource`, `file`, or `report`
`contentType`	string	MIME type
`lifetime`	string	Lifetime policy
`ownerType`	string	Owner type
`streaming`	boolean	Whether streaming is enabled
`size`	number	Content size in bytes
`content`	string	Raw content string

CLI Commands

All data commands accept --json to output structured JSON instead of human-readable text.

`swamp data get <model> <data_name>`

Retrieve data by model name and data name. Returns the latest version by default.

Option	Description
`--version`	Retrieve a specific version number
`--workflow`	Get data produced by a workflow
`--run`	Specific workflow run ID
`--no-content`	Show metadata only, without content
`--server`	Run against a remote `swamp serve` instance (env: `SWAMP_SERVE_URL`)
`--token`	Server token in `<name>.<secret>` format; only with `--server` (overrides stored credentials and SWAMP_SERVER_TOKEN)
`--repo-dir`	Repository directory (default `.`)

$ swamp data get hello-world result --json

{
  "id": "e204ea55-3d64-48a0-aa78-32fea656fdac",
  "name": "result",
  "modelName": "hello-world",
  "modelType": "command/shell",
  "version": 1,
  "contentType": "application/json",
  "lifetime": "infinite",
  "garbageCollection": 10,
  "streaming": false,
  "tags": {
    "type": "resource",
    "specName": "result",
    "modelName": "hello-world"
  },
  "ownerDefinition": {
    "ownerType": "model-method",
    "ownerRef": "7347cf2c-cc9e-4203-8897-e10845af9732"
  },
  "createdAt": "2026-04-07T18:02:58.146Z",
  "size": 157,
  "checksum": "c631b676cd...",
  "contentPath": ".swamp/data/command/shell/.../result/1/raw",
  "content": {
    "exitCode": 0,
    "executedAt": "2026-04-07T18:02:58.143Z",
    "command": "echo \"Hello from the swamp!\"",
    "durationMs": 4,
    "stdout": "Hello from the swamp!",
    "stderr": ""
  }
}

`swamp data list [model]`

List all data for a model, grouped by type.

Option	Description
`--type`	Filter by data type (`resource`, `file`, `report`)
`--workflow`	List data produced by a workflow
`--run`	Specific workflow run ID
`--server`	Run against a remote `swamp serve` instance (env: `SWAMP_SERVE_URL`)
`--token`	Server token in `<name>.<secret>` format; only with `--server` (overrides stored credentials and SWAMP_SERVER_TOKEN)
`--repo-dir`	Repository directory (default `.`)

$ swamp data list hello-world

Data for hello-world (command/shell)

file (1 item):
  log  v2  text/plain  19B  2026-04-07

resource (1 item):
  result  v1  application/json  135B  2026-04-07

report (2 items):
  report-swamp-method-summary  v2  text/markdown  482B  2026-04-07
  report-swamp-method-summary-json  v2  application/json  2.6KB  2026-04-07

`swamp data versions <model> <data_name>`

Show all versions of a data item.

Option	Description
`--server`	Run against a remote `swamp serve` instance (env: `SWAMP_SERVE_URL`)
`--token`	Server token in `<name>.<secret>` format; only with `--server` (overrides stored credentials and SWAMP_SERVER_TOKEN)
`--repo-dir`	Repository directory (default `.`)

$ swamp data versions hello-world result --json

{
  "dataName": "result",
  "modelName": "hello-world",
  "modelType": "command/shell",
  "versions": [
    {
      "version": 2,
      "createdAt": "2026-04-07T18:03:08.737Z",
      "size": 135,
      "checksum": "d58d1607...",
      "isLatest": true
    },
    {
      "version": 1,
      "createdAt": "2026-04-07T18:02:58.146Z",
      "size": 157,
      "checksum": "c631b676...",
      "isLatest": false
    }
  ],
  "total": 2
}

`swamp data search [query]`

Search across all data in the repository. Opens an interactive picker in a terminal, or returns JSON with --json.

Option	Description
`--type`	Filter by data type tag (`resource`, `file`, `report`)
`--lifetime`	Filter by lifetime (`ephemeral`, `infinite`, `job`, `workflow`, or duration)
`--owner-type`	Filter by owner type (`model-method`, `workflow-step`, `manual`)
`--workflow`	Filter to data tagged with this workflow name
`--model`	Filter to data owned by this model name
`--content-type`	Filter by MIME content type
`--since`	Only data created within duration (`1h`, `1d`, `7d`, `1w`, `1mo`)
`--output`	Filter by output ID
`--run`	Filter by workflow run ID
`--tag`	Filter by tag (`KEY=VALUE`, repeatable)
`--streaming`	Only show streaming data
`--limit`	Maximum results (default 50)
`--server`	Run against a remote `swamp serve` instance (env: `SWAMP_SERVE_URL`)
`--token`	Server token in `<name>.<secret>` format; only with `--server` (overrides stored credentials and SWAMP_SERVER_TOKEN)
`--repo-dir`	Repository directory (default `.`)

$ swamp data search --type resource --json

{
  "query": "",
  "filters": {
    "type": "resource"
  },
  "results": [
    {
      "id": "5e7d72ab-7e0d-492e-ab3d-61463d9d4a85",
      "name": "execution-result",
      "version": 1,
      "contentType": "application/json",
      "type": "resource",
      "lifetime": "infinite",
      "ownerType": "model-method",
      "modelName": "hello-world",
      "modelType": "command/shell",
      "streaming": false,
      "size": 135,
      "createdAt": "2026-04-07T18:03:27.361Z",
      "tags": {
        "type": "resource",
        "specName": "result",
        "modelName": "hello-world"
      }
    }
  ],
  "total": 1,
  "limited": false
}

`swamp data query [predicate]`

Query data using a CEL predicate. The predicate evaluates against DataRecord fields directly (not prefixed with data.).

Option	Description
`--select`	CEL expression to project fields (e.g., `name`)
`--limit`	Maximum results (default 100)
`--server`	Run against a remote `swamp serve` instance (env: `SWAMP_SERVE_URL`)
`--token`	Server token in `<name>.<secret>` format; only with `--server` (overrides stored credentials and SWAMP_SERVER_TOKEN)
`--repo-dir`	Repository directory (default `.`)

Available fields in the predicate: attributes, content, contentType, createdAt, dataType, id, lifetime, modelName, modelType, name, ownerType, size, specName, streaming, tags, version.

$ swamp data query 'tags.type == "resource"' --json

{
  "predicate": "tags.type == \"resource\"",
  "results": [
    {
      "id": "85c471af-a4c8-4f03-a5df-768351388d09",
      "name": "result",
      "version": 2,
      "tags": {
        "type": "resource",
        "specName": "result",
        "modelName": "hello-world"
      },
      "modelName": "hello-world",
      "modelType": "command/shell",
      "dataType": "resource",
      "contentType": "application/json",
      "lifetime": "infinite",
      "streaming": false,
      "size": 135
    }
  ],
  "total": 1,
  "limited": false
}

With --select to project a single field:

$ swamp data query 'tags.type == "resource"' --select 'name' --json

{
  "results": ["result", "execution-result"],
  "total": 2,
  "limited": false
}

`swamp data rename <model> <old_name> <new_name>`

Rename a data item. Creates a copy under the new name and writes a tombstone with a forward reference on the old name.

Option	Description
`--server`	Run against a remote `swamp serve` instance (env: `SWAMP_SERVE_URL`)
`--token`	Server token in `<name>.<secret>` format; only with `--server` (overrides stored credentials and SWAMP_SERVER_TOKEN)
`--repo-dir`	Repository directory (default `.`)

$ swamp data rename hello-world result execution-result --json

{
  "oldName": "result",
  "newName": "execution-result",
  "modelId": "7347cf2c-cc9e-4203-8897-e10845af9732",
  "modelName": "hello-world",
  "modelType": "command/shell",
  "copiedVersion": 2,
  "newVersion": 1,
  "warning": "Any workflows or models that produce data under \"result\" will overwrite the forward reference. Update them to use \"execution-result\" instead."
}

Lookups for the old name without an explicit version follow the forward reference to the new name (up to 5 levels deep).

`swamp data delete <model> <data_name>`

Permanently delete a data artifact, or a single version of one. Without --version, every version of the artifact is removed.

Option	Description
`--version`	Delete only that version. The `latest` pointer follows the highest remaining version
`-f, --force`	Skip the `[y/N]` confirmation prompt
`--server`	Run against a remote `swamp serve` instance (env: `SWAMP_SERVE_URL`). Batch modes (`--prefix`, `--all`) are not available remotely
`--token`	Server token in `<name>.<secret>` format; only with `--server` (overrides stored credentials and SWAMP_SERVER_TOKEN)
`--repo-dir`	Repository directory (default `.`)

By default the command prompts for confirmation before deleting. With --force or --json it runs non-interactively.

$ swamp data delete hello-world result --log
About to delete 1 version(s) of "result" from hello-world. Proceed? [y/N] y
19:00:25.924 INF data·delete Deleted 1 version(s) of "result" for "hello-world" ("command/shell")

$ swamp data delete hello-world result --version 1 --force --json

{
  "modelId": "589d3566-0144-478c-a5fc-dcf069d455af",
  "modelName": "hello-world",
  "modelType": "command/shell",
  "dataName": "result",
  "version": 1,
  "versionsDeleted": 1
}

The version field is omitted when every version is deleted. versionsDeleted is the count of versions actually removed.

Error responses:

Missing model — Model not found: <ref>
Missing artifact — No data named "<name>" exists for model <model>
Missing version — Version <V> does not exist for "<name>" (available versions: <list>)

If old_name was previously renamed to new_name (see Rename Markers), swamp data delete <model> <old_name> removes the tombstone forwarder literally — it does not traverse the forward reference to delete new_name. Lookups under the old name afterwards fail with Data not found; the rename target is unaffected.

`swamp data gc`

Run garbage collection — delete expired data and prune old versions.

Option	Description
`--dry-run`	Show what would be deleted
`-f, --force`	Skip confirmation prompt
`--repo-dir`	Repository directory (default `.`)

Two phases execute in sequence:

Expired data deletion — removes all versions of data whose lifetime has elapsed.
Version pruning — removes old versions exceeding the garbage collection policy for non-expired data.

$ swamp data gc --dry-run --json

{
  "dataEntriesExpired": 0,
  "versionsDeleted": 0,
  "bytesReclaimed": 0,
  "dryRun": true,
  "expiredEntries": []
}

GC can also run automatically after each model method run when autoGc: true is set in .swamp.yaml. Automatic GC runs after the completed event, so the user sees the method result first. Errors are caught and logged at warn level, so an auto-GC failure never fails the method run. See Repository Configuration for the full autoGc field reference.

Validation Rules

Data names must be non-empty strings.
Data names must not contain .., /, \, or null bytes (path traversal protection).
The name latest is reserved (case-insensitive) and cannot be used as a data name.
Resource data is validated against the spec's Zod schema on write. Schema mismatches produce a warning, not an error.
New writes to an existing data name must have the same owner (ownerType + ownerRef) as the original.
Tags must include a type key.
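Several of these rules are simple enough to express as pure checks. The sketch below uses hypothetical function names and error messages, and it omits schema validation because that depends on the spec's Zod schema.

```typescript
interface OwnerDefinition {
  ownerType: string;
  ownerRef: string;
}

// Returns a list of problems with a data name (empty when the name is valid).
function dataNameErrors(name: string): string[] {
  const errors: string[] = [];
  if (name.length === 0) {
    errors.push("data name must be non-empty");
  }
  // Path traversal protection.
  if (["..", "/", "\\", "\u0000"].some((s) => name.includes(s))) {
    errors.push("data name must not contain '..', '/', '\\', or null bytes");
  }
  // "latest" is reserved, compared case-insensitively.
  if (name.toLowerCase() === "latest") {
    errors.push("'latest' is reserved");
  }
  return errors;
}

// New writes must come from the same owner as the original.
function sameOwner(original: OwnerDefinition, incoming: OwnerDefinition): boolean {
  return original.ownerType === incoming.ownerType &&
    original.ownerRef === incoming.ownerRef;
}

// Tags must include a "type" key.
function hasTypeTag(tags: Record<string, string>): boolean {
  return Object.prototype.hasOwnProperty.call(tags, "type");
}
```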

Querying

swamp data query searches across all data artifacts in a repository using CEL predicates. Each predicate is evaluated against every data record, and matching records are returned. The same query engine powers the data.query() function in CEL expressions.

Command

swamp data query [predicate]

Option	Type	Default	Description
`--select`	string	None	CEL expression to project fields from matching records
`--limit`	number	`100`	Maximum number of results returned
`--server`	string	None	Run against a remote `swamp serve` instance (env: `SWAMP_SERVE_URL`)
`--token`	string	None	Server token in `<name>.<secret>` format; only with `--server` (overrides stored credentials and SWAMP_SERVER_TOKEN)
`--repo-dir`	string	`.`	Repository directory
`--json`	flag	—	Output in JSON format

When no predicate is provided and stdout is a TTY, the command opens an interactive TUI for browsing and filtering data. When no predicate is provided in non-interactive mode (piped output or --json), the command returns an error.

DataRecord Fields

Predicates evaluate against DataRecord fields as top-level variables. No namespace prefix is needed — use modelName, not data.modelName.

Field	Type	Description
`id`	`string`	Record UUID
`name`	`string`	Data item name
`version`	`int`	Version number
`createdAt`	`string`	ISO 8601 timestamp
`attributes`	`map<string, dyn>`	Parsed JSON content (resource data)
`tags`	`map<string, string>`	Metadata tags
`modelName`	`string`	Owning model definition name
`modelType`	`string`	Model type path (e.g., `command/shell`)
`specName`	`string`	Output spec name
`dataType`	`string`	`resource`, `file`, or `report`
`contentType`	`string`	MIME type (e.g., `application/json`)
`lifetime`	`string`	Retention policy (e.g., `infinite`, `30d`)
`ownerType`	`string`	`model-method`, `workflow-step`, or `manual`
`streaming`	`bool`	Whether the data uses streaming writes
`size`	`int`	Content size in bytes
`content`	`string`	Raw text content

Referencing an unknown field produces an error listing the available fields:

$ swamp data query 'badField == "test"' --json

{
  "error": "Unknown field \"badField\" in query predicate.\nAvailable: attributes, content, contentType, createdAt, dataType, id, lifetime, modelName, modelType, name, ownerType, size, specName, streaming, tags, version"
}

Lazy-Loaded Fields

The attributes and content fields are loaded from disk only when referenced in the predicate or the --select expression. All other fields are read from the metadata catalog without touching data files.

attributes: Populated for application/json data. The raw content is parsed as JSON. Invalid JSON is treated as an empty map.
content: Populated for text content types (text/plain, text/markdown, application/json, application/yaml, etc.). Binary content types produce an empty string.

When neither field is referenced, queries run entirely against the catalog index.

Predicates

A predicate is a CEL expression that evaluates to a boolean. Records where the predicate returns true are included in the results.

Comparison

$ swamp data query 'modelName == "scanner"' --json

{
  "predicate": "modelName == \"scanner\"",
  "results": [
    {
      "id": "7a3708d4-4767-4e45-912e-6e4ab42f5ea5",
      "name": "log",
      "version": 1,
      "tags": {
        "type": "file",
        "specName": "log",
        "modelName": "scanner",
        "env": "prod"
      },
      "modelName": "scanner",
      "modelType": "command/shell",
      "specName": "log",
      "dataType": "file",
      "contentType": "text/plain",
      "lifetime": "infinite",
      "streaming": true,
      "size": 22
    }
  ],
  "total": 1,
  "limited": false
}

Note

Examples on this page show a subset of DataRecord fields for brevity. Actual JSON output includes all fields listed in the table above (e.g., createdAt, ownerType, attributes, content).

Numeric Comparison

$ swamp data query 'size > 100' --limit 2 --json

When results are truncated by --limit, the response includes "limited": true.

Boolean Fields

$ swamp data query 'streaming == true' --json

Returns all data records with streaming enabled.

Compound Predicates

Combine conditions with && (and) and || (or):

$ swamp data query 'modelName == "scanner" && specName == "result"' --json

String Methods

CEL string methods work on string fields:

swamp data query 'name.contains("result")'
swamp data query 'modelName.startsWith("scan")'
swamp data query 'contentType.matches("application/.*")'

See String Methods in the CEL reference for the complete list.

Attribute Filtering

Access nested fields within attributes to filter on resource content. This triggers lazy loading of the content from disk.

$ swamp data query 'dataType == "resource" && attributes.exitCode == 0' --json

When a record's attributes map does not contain the referenced key, the record is excluded from results rather than producing an error.

Tag Filtering

Tags are accessible as a nested map via the tags field. Use dot notation or bracket notation to access tag values:

swamp data query 'tags.env == "prod"'
swamp data query 'tags["env"] == "prod"'
swamp data query 'tags.type == "resource"'

Records that do not have the referenced tag key are silently excluded from results (no error).

Tag Sources

Tags on data records come from multiple sources, resolved in order. See the Tag Resolution Chain above for the full precedence.

Three tags are always present on every data record:

Tag Key	Description
`type`	`resource`, `file`, or `report`
`specName`	Output spec name from the model type
`modelName`	Model definition name

Custom tags are added via --tag flags on method runs, workflow dataOutputOverrides, or the output spec's tags field in the model definition.

Projections

The --select flag transforms each matching record into a specified shape. The select expression is a CEL expression evaluated against each matching record's fields.

Scalar Projection

Extract a single field value. Returns an array of values.

$ swamp data query 'dataType == "resource"' --select name --json

{
  "results": [
    "result",
    "result"
  ],
  "total": 2,
  "limited": false
}

Map Projection

Build an object from selected fields. Returns an array of objects.

$ swamp data query 'dataType == "resource"' --select '{"name": name, "model": modelName, "size": size}' --json

List Projection

Build an array from selected fields. Returns an array of arrays.

$ swamp data query 'dataType == "resource"' --select '[name, modelName, size]' --json

Accessing Nested Data in Projections

Select expressions can reference attributes and content even if the predicate does not. The query engine detects field references in both the predicate and select expression to determine which fields to load from disk.

swamp data query 'dataType == "resource"' --select 'attributes.stdout'

If a record's attributes do not contain the referenced key, the projection produces null for that record.

Null-Safe Access with `.?`

When using data.query() results in CEL expressions, the .? (optional select) operator provides null-safe field access. If the receiver is null, .? returns null instead of throwing. Use .orValue() to provide a fallback:

data.query("modelName == \"scanner\"", "attributes.?findings.orValue([])")

See Optional Select in the CEL reference for full syntax and usage guidance.

`data.query()` in CEL Expressions

The data.query() function provides the same query capability inside CEL expressions used in model definitions, workflow steps, and data output overrides.

data.query("modelName == \"scanner\" && size > 1000")
data.query("modelName == \"scanner\"", "attributes.status")

Signature	Returns	Description
`data.query(predicate)`	`list<DataRecord>`	Records matching the predicate
`data.query(predicate, select)`	`list<dyn>`	Projected values from matches

The predicate and select arguments are strings containing CEL expressions. The same DataRecord fields and operators are available as in the CLI command.

Interactive Mode

When invoked without a predicate in a terminal, swamp data query opens an interactive TUI for browsing data. The TUI supports:

Filtering by tag keys and values
Text search across record fields
Selecting and inspecting individual records

Non-interactive invocations (piped output, --json flag, or no TTY) require a CEL predicate argument.

Query Result Structure

JSON output includes these top-level fields:

Field	Type	Description
`predicate`	`string`	The CEL predicate used (omitted with `--select`)
`results`	`list`	Matching DataRecords or projected values
`total`	`int`	Number of results returned
`limited`	`bool`	`true` when results were truncated by `--limit`

Without --select, each result is a full DataRecord. With --select, each result is the projected value (scalar, map, or list depending on the select expression shape).
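The result envelope can be modeled with a small generic type. The sketch below is illustrative only: `QueryResult` and `runQuery` are hypothetical names, the predicate and projection are passed as plain functions rather than CEL strings, and applying the limit before the projection is an assumption.

```typescript
interface QueryResult<T> {
  predicate?: string; // omitted when a select expression is used
  results: T[];
  total: number;      // number of results returned
  limited: boolean;   // true when results were truncated by the limit
}

function runQuery<R, T = R>(
  records: R[],
  predicateText: string,
  matches: (record: R) => boolean,
  limit: number,
  select?: (record: R) => T,
): QueryResult<R | T> {
  const matched = records.filter(matches);
  const page = matched.slice(0, limit);
  const results: Array<R | T> = select === undefined ? page : page.map(select);
  const body = {
    results,
    total: results.length,
    limited: matched.length > limit,
  };
  return select === undefined ? { predicate: predicateText, ...body } : body;
}
```

A scalar projection corresponds to a `select` that returns one field, a map projection to one that returns an object, and a list projection to one that returns an array.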
