DATA
Data outputs are the versioned artifacts produced when model methods execute.
Each method execution can write structured data (resources) and unstructured
content (files). Data outputs are stored in .swamp/data/ within the
repository, organized by model type, model ID, and data name.
Output Types
Model types declare their output specifications using two categories: resources and files. Each declared spec has a name (the spec name) that identifies it within the model type.
Resource Outputs
Structured JSON data validated against a schema.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| description | string | No | None | Human-readable description |
| schema | Zod schema | Yes | — | Validates data on write |
| lifetime | Lifetime | Yes | — | Retention policy |
| garbageCollection | GarbageCollectionPolicy | Yes | — | Version retention policy |
| tags | Record<string, string> | No | {} | Default tags (auto-includes type: "resource") |
| sensitiveOutput | boolean | No | false | Treat all fields as sensitive |
| vaultName | string | No | First available vault | Vault for storing sensitive field values |
Resource content is always stored as application/json.
File Outputs
Binary or text content identified by MIME type.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| description | string | No | None | Human-readable description |
| contentType | string | Yes | — | MIME type (e.g., text/plain) |
| lifetime | Lifetime | Yes | — | Retention policy |
| garbageCollection | GarbageCollectionPolicy | Yes | — | Version retention policy |
| streaming | boolean | No | false | Line-oriented append mode |
| tags | Record<string, string> | No | {} | Default tags (auto-includes type: "file") |
Example: command/shell Model Type
The built-in command/shell model type declares one resource and one file
output:
resources:
result [resource] — Shell command execution result (infinite)
files:
log [file] — Shell command output, text/plain (infinite, streaming)
After execution, both are visible via swamp data list:
Data for hello-world (command/shell)
file (1 item):
log v2 text/plain 19B 2026-04-07
resource (1 item):
result v1 application/json 135B 2026-04-07
report (2 items):
report-swamp-method-summary v2 text/markdown 482B 2026-04-07
report-swamp-method-summary-json v2 application/json 2.6KB 2026-04-07
Lifetime
Lifetime determines how long data is retained before it becomes eligible for garbage collection.
| Value | Description |
|---|---|
| Duration string | 1h, 5m, 10d, 2w, 1mo, 10y |
| ephemeral | Deleted when the process ends |
| infinite | Never automatically deleted |
| job | Lives until the job completes |
| workflow | Lives until the workflow completes |
Duration format: {number}{unit} where unit is h (hours), m (minutes), d
(days), w (weeks), mo (months), or y (years).
Zero-duration strings (e.g., 0h, 0d) are normalized to workflow.
Duration Conversion
| Unit | Conversion |
|---|---|
| m | value × 60,000 ms |
| h | value × 3,600,000 ms |
| d | value × 86,400,000 ms |
| w | value × 604,800,000 ms |
| mo | value × 2,592,000,000 ms (30 days) |
| y | value × 31,536,000,000 ms (365 days) |
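The conversion table above can be sketched as a small parser. This is an illustrative sketch, not swamp's own code: the name `durationToMs` is hypothetical, and it assumes the numeric part is a non-negative integer.

```typescript
// Milliseconds per unit, copied from the conversion table above.
const MS_PER_UNIT: Record<string, number> = {
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
  w: 604_800_000,
  mo: 2_592_000_000, // 30 days
  y: 31_536_000_000, // 365 days
};

// Hypothetical helper: converts a duration string such as "10d" to milliseconds.
// Returns null when the string does not match {number}{unit}.
function durationToMs(duration: string): number | null {
  // "mo" is tried before "m" so the two-letter unit is matched whole.
  const match = /^(\d+)(mo|m|h|d|w|y)$/.exec(duration);
  if (match === null) return null;
  return Number(match[1]) * MS_PER_UNIT[match[2]];
}

console.log(durationToMs("1h")); // 3600000
console.log(durationToMs("2w")); // 1209600000
console.log(durationToMs("1mo")); // 2592000000
```

The sketch does not apply the zero-duration normalization to workflow; a caller would handle that before converting.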
Expiration Rules
- infinite: Never expires.
- ephemeral: Not yet implemented — treated as non-expiring.
- job/workflow: Expires when the associated workflow run no longer exists. Requires workflowId and workflowRunId in the owner definition. If either is missing, the data is not expired.
- Duration strings: Expires when createdAt + duration is in the past.
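The expiration rules above could be expressed as a single predicate. The sketch below is an assumption-laden illustration: `isExpired`, the `Entry` shape, and the `runExists` callback are hypothetical names, and treating an unrecognized lifetime as non-expiring is a choice made here, not documented behavior.

```typescript
type Owner = { workflowId?: string; workflowRunId?: string };
type Entry = { lifetime: string; createdAt: string; owner: Owner };

const UNIT_MS: Record<string, number> = {
  m: 60_000, h: 3_600_000, d: 86_400_000,
  w: 604_800_000, mo: 2_592_000_000, y: 31_536_000_000,
};

// runExists is a hypothetical callback reporting whether a workflow run still exists.
function isExpired(
  entry: Entry,
  now: Date,
  runExists: (workflowId: string, workflowRunId: string) => boolean,
): boolean {
  const { lifetime, owner } = entry;
  // infinite never expires; ephemeral is treated as non-expiring.
  if (lifetime === "infinite" || lifetime === "ephemeral") return false;
  if (lifetime === "job" || lifetime === "workflow") {
    // Without both IDs the data is not expired.
    if (!owner.workflowId || !owner.workflowRunId) return false;
    return !runExists(owner.workflowId, owner.workflowRunId);
  }
  const match = /^(\d+)(mo|m|h|d|w|y)$/.exec(lifetime);
  if (match === null) return false; // assumption: unknown lifetimes are kept
  const expiresAt = Date.parse(entry.createdAt) + Number(match[1]) * UNIT_MS[match[2]];
  return expiresAt < now.getTime();
}
```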
Garbage Collection Policy
Garbage collection controls how many versions of a data item are retained.
| Value | Description |
|---|---|
| integer | Keep the N most recent versions |
| Duration string | Keep versions created within the duration |
The integer must be a positive integer. The duration string uses the same format as lifetime durations and must be greater than zero.
# Keep the 10 most recent versions
garbageCollection: 10
# Keep versions from the last 7 days
garbageCollection: 7d
Garbage collection runs as part of swamp data gc and during the lifecycle
service. It operates in two phases:
- Expired data deletion — removes all versions of data items whose lifetime has elapsed.
- Version pruning — for non-expired data, removes old versions that exceed the garbage collection policy.
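The version pruning phase can be pictured with the sketch below. It is only an illustration under assumptions: `versionsToPrune` and the `Version` shape are hypothetical, and whether the real implementation always keeps the latest version under a duration policy is not stated in this reference.

```typescript
type Version = { version: number; createdAt: string };
type GcPolicy = number | string; // positive integer count, or a duration string

const GC_UNIT_MS: Record<string, number> = {
  m: 60_000, h: 3_600_000, d: 86_400_000,
  w: 604_800_000, mo: 2_592_000_000, y: 31_536_000_000,
};

// Returns the version numbers that fall outside the policy, newest first.
function versionsToPrune(versions: Version[], policy: GcPolicy, now: Date): number[] {
  const newestFirst = [...versions].sort((a, b) => b.version - a.version);
  if (typeof policy === "number") {
    // Keep the N most recent versions, prune the rest.
    return newestFirst.slice(policy).map((v) => v.version);
  }
  const match = /^(\d+)(mo|m|h|d|w|y)$/.exec(policy);
  if (match === null) throw new Error(`Invalid duration: ${policy}`);
  // Keep versions created within the duration, prune anything older.
  const cutoff = now.getTime() - Number(match[1]) * GC_UNIT_MS[match[2]];
  return newestFirst
    .filter((v) => Date.parse(v.createdAt) < cutoff)
    .map((v) => v.version);
}
```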
Versioning
Each data item is versioned with sequential positive integers starting at 1. Every method execution that writes to the same data name produces a new version.
$ swamp data versions hello-world result --json
{
"dataName": "result",
"modelName": "hello-world",
"modelType": "command/shell",
"versions": [
{
"version": 2,
"createdAt": "2026-04-07T18:03:08.737Z",
"size": 135,
"checksum": "d58d16...",
"isLatest": true
},
{
"version": 1,
"createdAt": "2026-04-07T18:02:58.146Z",
"size": 157,
"checksum": "c631b6...",
"isLatest": false
}
],
"total": 2
}
The "latest" Pointer
Each data item has a latest file in its directory containing the current
version number as plain text. When data is retrieved without an explicit
--version flag, the latest version is returned.
.swamp/data/command/shell/{model-id}/result/
  1/
  2/
  latest # Contains: "2"
The name latest is reserved — it cannot be used as a data name.
Checksums
Each version includes a SHA-256 checksum computed from the content file at
finalization time. The checksum is stored in metadata.yaml and returned by
data access commands.
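A checksum of this kind can be recomputed to verify a content file. The sketch below assumes the stored value is the hex-encoded SHA-256 digest of the raw bytes, which matches the 64-character hex strings shown in the examples; `sha256Hex` is a hypothetical helper name.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest of a content buffer or string.
function sha256Hex(content: Uint8Array | string): string {
  return createHash("sha256").update(content).digest("hex");
}

// A verification step would compare this value with the checksum
// field read from metadata.yaml.
console.log(sha256Hex("abc"));
```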
Storage Layout
Data is stored on disk under .swamp/data/ in a hierarchical directory
structure:
.swamp/data/
  {model-type-path}/
    {model-id}/
      {data-name}/
        1/
          metadata.yaml
          raw
        2/
          metadata.yaml
          raw
        latest
- {model-type-path} — the model type as a directory path (e.g., command/shell).
- {model-id} — the UUID of the model definition.
- {data-name} — the instance name given when data is written.
- {version}/ — a numbered directory for each version.
- metadata.yaml — full metadata for the version.
- raw — the content (JSON for resources, binary or text for files).
- latest — text file containing the current version number.
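The layout above maps directly onto path construction. The following sketch uses hypothetical helper names (`versionPaths`, `latestPointerPath`) and assumes forward-slash separators relative to the repository root.

```typescript
const DATA_ROOT = ".swamp/data";

// Paths of the two files stored for one version of a data item.
function versionPaths(
  modelType: string,
  modelId: string,
  dataName: string,
  version: number,
): { metadata: string; raw: string } {
  const dir = [DATA_ROOT, modelType, modelId, dataName, String(version)].join("/");
  return { metadata: `${dir}/metadata.yaml`, raw: `${dir}/raw` };
}

// Path of the text file holding the current version number.
function latestPointerPath(modelType: string, modelId: string, dataName: string): string {
  return [DATA_ROOT, modelType, modelId, dataName, "latest"].join("/");
}

console.log(versionPaths("command/shell", "{model-id}", "result", 1).raw);
// .swamp/data/command/shell/{model-id}/result/1/raw
```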
Metadata File
Each version's metadata.yaml contains the complete data record:
id: e204ea55-3d64-48a0-aa78-32fea656fdac
name: result
version: 1
contentType: application/json
lifetime: infinite
garbageCollection: 10
streaming: false
tags:
  type: resource
  specName: result
  modelName: hello-world
ownerDefinition:
  ownerType: model-method
  ownerRef: 7347cf2c-cc9e-4203-8897-e10845af9732
createdAt: "2026-04-07T18:02:58.146Z"
size: 157
checksum: c631b676cd069af1decf4f20c27568f44bcccf062846bb32bbeae187573c2fe6
Tags
Tags are key-value string pairs attached to data. They are used for filtering, discovery, and categorization.
Tag Resolution Chain
Tags are resolved in order, with later steps overriding earlier ones:
- Type auto-tag — type: "resource" or type: "file" (always present).
- Definition tags — tags from the model definition.
- Spec defaults — tags declared on the output specification.
- Method write overrides — tags passed by the method when writing.
- specName auto-tag — the output spec name (always injected).
- modelName auto-tag — the definition name (always injected).
- Workflow tag overrides — tags from workflow step context.
- Runtime tags — tags provided via --tag CLI flags.
- Data output overrides — tags from workflow dataOutputOverrides.
The type tag is required on all data. Data without a type tag fails
validation.
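The resolution chain can be read as a sequence of object merges in which later sources win. The sketch below illustrates that reading; `resolveTags` and the `TagSources` shape are hypothetical and do not describe swamp's internal API.

```typescript
type Tags = Record<string, string>;

interface TagSources {
  dataType: "resource" | "file";
  definitionTags?: Tags;
  specDefaults?: Tags;
  methodOverrides?: Tags;
  specName: string;
  modelName: string;
  workflowTags?: Tags;
  runtimeTags?: Tags;
  dataOutputOverrides?: Tags;
}

// Later entries in the object literal override earlier ones,
// mirroring the order of the resolution chain.
function resolveTags(s: TagSources): Tags {
  return {
    type: s.dataType,
    ...s.definitionTags,
    ...s.specDefaults,
    ...s.methodOverrides,
    specName: s.specName,
    modelName: s.modelName,
    ...s.workflowTags,
    ...s.runtimeTags,
    ...s.dataOutputOverrides,
  };
}

const tags = resolveTags({
  dataType: "resource",
  specName: "result",
  modelName: "hello-world",
  specDefaults: { env: "dev" },
  runtimeTags: { env: "prod" },
});
console.log(tags.env); // prod
```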
Common Tag Values
| Tag Key | Auto-Injected | Description |
|---|---|---|
| type | Yes | resource, file, or report |
| specName | Yes | Output spec name from the model type |
| modelName | Yes | Definition name for orphan data recovery |
Streaming
When streaming: true is set on a file output spec, the data writer operates in
line-oriented append mode. Lines are written incrementally to disk as they are
produced, rather than buffered in memory.
The command/shell model type declares streaming: true on its log file
output to capture stdout and stderr as the command executes.
Streaming file writers support three write patterns:
- writeLine(line) — appends a single line with a newline character.
- writeStream(stream) — pipes a ReadableStream to disk, invoking optional line callbacks as newlines are encountered.
- getFilePath() — returns the allocated content path for direct file writes.
Non-streaming outputs use writeAll(content) or writeText(text) to write the
complete content at once.
Sensitive Output
Resource output specs can mark fields as sensitive. Sensitive values are stored
in a vault and replaced with vault.get() reference
expressions before the data is persisted to disk.
Field-Level Sensitivity
Individual fields are marked sensitive through Zod schema metadata:
schema: z.object({
  apiKey: z.string().meta({ sensitive: true }),
  publicId: z.string(),
});
Only apiKey is stored in the vault. publicId is persisted as-is.
Whole-Output Sensitivity
When sensitiveOutput: true is set on the resource spec, all top-level fields
are treated as sensitive.
Vault Resolution Order
The vault used for storing sensitive fields is resolved in this order:
- Field-level vaultName from schema metadata
- Spec-level vaultName from the resource output specification
- First available vault from the vault service
If sensitive fields exist but no vault is configured, an error is thrown.
Vault Key Format
Auto-generated vault keys follow this pattern:
{sanitized-model-type}-{model-id}-{method-name}-{field-path}
Sanitization: @ and null bytes are removed, / and \ are replaced with -,
.. is replaced with ..
Persisted Format
After processing, persisted resource data contains vault references:
apiKey: "${{ vault.get('my-secrets', 'command-shell-abc-execute-apiKey') }}"
publicId: "pk_12345"
On read, vault references are automatically resolved back to their original values. Resolved secrets are registered with the secret redactor to prevent log leakage.
Non-string sensitive values are JSON-stringified before vault storage.
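The substitution described in this section can be sketched as follows. Everything here is illustrative: `vaultRef` and `persistWithVault` are hypothetical names, a `Map` stands in for the vault, and the key prefix is passed in already sanitized rather than derived.

```typescript
// Builds a reference expression in the persisted format shown above.
function vaultRef(vaultName: string, key: string): string {
  return `\${{ vault.get('${vaultName}', '${key}') }}`;
}

// Stores each sensitive top-level field in the vault stand-in and
// replaces it with a reference; other fields are copied unchanged.
function persistWithVault(
  data: Record<string, unknown>,
  sensitiveFields: string[],
  vaultName: string,
  keyPrefix: string,
  vault: Map<string, string>,
): Record<string, unknown> {
  const persisted: Record<string, unknown> = { ...data };
  for (const field of sensitiveFields) {
    if (!(field in data)) continue;
    const value = data[field];
    const key = `${keyPrefix}-${field}`;
    // Non-string values are JSON-stringified before storage.
    vault.set(key, typeof value === "string" ? value : JSON.stringify(value));
    persisted[field] = vaultRef(vaultName, key);
  }
  return persisted;
}
```

With the example spec above, only apiKey would appear in `sensitiveFields`; with sensitiveOutput: true, every top-level field name would.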
Lifecycle States
Each data entry has a lifecycle state.
| State | Description |
|---|---|
| active | Normal, live data (default) |
| deleted | Tombstone marker — the data has been deleted or renamed |
Deletion Markers
A deletion marker is a version with lifecycle: "deleted", application/json
content type, and streaming: false. It signals that the data was intentionally
removed.
Rename Markers
A rename marker is a deletion marker with an additional renamedTo field
pointing to the new data name. When the latest version of a data item is a
rename marker, lookups without an explicit version follow the forward reference
to the new name (up to 5 levels deep).
$ swamp data rename hello-world result execution-result --json
{
"oldName": "result",
"newName": "execution-result",
"modelId": "7347cf2c-cc9e-4203-8897-e10845af9732",
"modelName": "hello-world",
"modelType": "command/shell",
"copiedVersion": 2,
"newVersion": 1,
"warning": "Any workflows or models that produce data under \"result\" will overwrite the forward reference. Update them to use \"execution-result\" instead."
}
The rename process:
- Copies the latest version of the old data name to version 1 under the new name.
- Writes a tombstone with a forward reference on the old name.
- Updates the latest marker on the old name to point to the tombstone.
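Following forward references on lookup might look like the sketch below. The names `resolveName` and `LatestMarker` are hypothetical, and returning null when the chain exceeds five hops is an assumption; this reference only states the depth limit.

```typescript
type LatestMarker =
  | { lifecycle: "active" }
  | { lifecycle: "deleted"; renamedTo?: string };

const MAX_RENAME_DEPTH = 5;

// latestOf is a hypothetical lookup returning the latest version's marker
// for a data name, or undefined when the name does not exist.
function resolveName(
  name: string,
  latestOf: (name: string) => LatestMarker | undefined,
): string | null {
  let current = name;
  for (let hops = 0; hops <= MAX_RENAME_DEPTH; hops++) {
    const latest = latestOf(current);
    if (latest === undefined) return null; // no such data
    if (latest.lifecycle === "active") return current;
    if (latest.renamedTo === undefined) return null; // plain deletion marker
    current = latest.renamedTo; // follow the forward reference
  }
  return null; // more than five forward references
}
```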
Owner Definition
Every data entry tracks its owner — who created it.
| Field | Type | Required | Description |
|---|---|---|---|
| ownerType | string | Yes | model-method, workflow-step, or manual |
| ownerRef | string | Yes | Identifier of the owner (model ID, step ref) |
| workflowId | string | No | Workflow UUID (for job/workflow lifetimes) |
| workflowRunId | string | No | Workflow run UUID |
Ownership is validated on write — new versions of an existing data name must
have the same ownerType and ownerRef as the original.
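A minimal sketch of that ownership check, assuming only the two fields named above take part in the comparison; `sameOwner` and the `OwnerDefinition` type are illustrative names.

```typescript
type OwnerDefinition = {
  ownerType: "model-method" | "workflow-step" | "manual";
  ownerRef: string;
  workflowId?: string;
  workflowRunId?: string;
};

// True when a new version may be written under an existing data name.
// workflowId and workflowRunId are ignored, since only ownerType and
// ownerRef are described as part of the check.
function sameOwner(original: OwnerDefinition, incoming: OwnerDefinition): boolean {
  return (
    original.ownerType === incoming.ownerType &&
    original.ownerRef === incoming.ownerRef
  );
}
```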
Data Output Overrides
Workflow steps can override the default output spec settings for data produced
by their tasks. See
dataOutputOverrides in the
workflows reference.
| Field | Type | Description |
|---|---|---|
| specName | string | Output spec name to override |
| lifetime | Lifetime | Override retention policy |
| garbageCollection | GarbageCollectionPolicy | Override version retention |
| tags | Record<string, string> | Additional tags merged with output tags |
| vary | string[] | Input key names to vary by (composite data names) |
When vary is set, the resolved values of the named input keys are appended as
a suffix to the data instance name. This produces distinct data items per
iteration in forEach steps.
CEL Access
Data outputs are accessible in
CEL expressions through the data
namespace:
| Function | Description |
|---|---|
| data.latest(modelName, dataName) | Latest version of a data item |
| data.latest(modelName, dataName, varyValues[]) | Latest version with vary suffix |
| data.version(modelName, dataName, version) | Specific version |
| data.version(modelName, dataName, varyValues[], version) | Specific version with vary suffix |
| data.listVersions(modelName, dataName) | All version numbers |
| data.listVersions(modelName, dataName, varyValues[]) | All version numbers with vary suffix |
| data.findByTag(tagKey, tagValue) | Search by tag |
| data.findBySpec(modelName, specName) | Find by output spec name |
| data.query(predicate, select?) | CEL predicate query |
data.latest() and data.version() return null when the data item does not
exist. Use the .? (optional select) operator to chain through null safely
instead of throwing, and .orValue() to provide inline defaults:
data.latest("factory", "code-review").?attributes.?findings
data.latest("factory", "code-review").?attributes.?findings.orValue([])
data.version("scanner", "result", 1).?attributes.?status.orValue("unknown")
See Optional Select in the CEL reference for full syntax and usage guidance.
These functions return DataRecord objects with the following fields:
| Field | Type | Description |
|---|---|---|
| id | string | Data UUID |
| name | string | Data instance name |
| version | number | Version number |
| createdAt | string | ISO 8601 timestamp |
| attributes | Record<string, unknown> | Parsed JSON content (resources only) |
| tags | Record<string, string> | All tags |
| modelName | string | Model definition name |
| modelType | string | Model type path |
| specName | string | Output spec name |
| dataType | string | resource or file |
| contentType | string | MIME type |
| lifetime | string | Lifetime policy |
| ownerType | string | Owner type |
| streaming | boolean | Whether streaming is enabled |
| size | number | Content size in bytes |
| content | string | Raw content string |
CLI Commands
All data commands accept --json to output structured JSON instead of
human-readable text.
swamp data get <model> <data_name>
Retrieve data by model name and data name. Returns the latest version by default.
| Option | Description |
|---|---|
| --version | Retrieve a specific version number |
| --workflow | Get data produced by a workflow |
| --run | Specific workflow run ID |
| --no-content | Show metadata only, without content |
| --server | Run against a remote swamp serve instance (env: SWAMP_SERVE_URL) |
| --token | Server token in <name>.<secret> format; only with --server (overrides stored credentials and SWAMP_SERVER_TOKEN) |
| --repo-dir | Repository directory (default .) |
$ swamp data get hello-world result --json
{
"id": "e204ea55-3d64-48a0-aa78-32fea656fdac",
"name": "result",
"modelName": "hello-world",
"modelType": "command/shell",
"version": 1,
"contentType": "application/json",
"lifetime": "infinite",
"garbageCollection": 10,
"streaming": false,
"tags": {
"type": "resource",
"specName": "result",
"modelName": "hello-world"
},
"ownerDefinition": {
"ownerType": "model-method",
"ownerRef": "7347cf2c-cc9e-4203-8897-e10845af9732"
},
"createdAt": "2026-04-07T18:02:58.146Z",
"size": 157,
"checksum": "c631b676cd...",
"contentPath": ".swamp/data/command/shell/.../result/1/raw",
"content": {
"exitCode": 0,
"executedAt": "2026-04-07T18:02:58.143Z",
"command": "echo \"Hello from the swamp!\"",
"durationMs": 4,
"stdout": "Hello from the swamp!",
"stderr": ""
}
}
swamp data list [model]
List all data for a model, grouped by type.
| Option | Description |
|---|---|
| --type | Filter by data type (resource, file, report) |
| --workflow | List data produced by a workflow |
| --run | Specific workflow run ID |
| --server | Run against a remote swamp serve instance (env: SWAMP_SERVE_URL) |
| --token | Server token in <name>.<secret> format; only with --server (overrides stored credentials and SWAMP_SERVER_TOKEN) |
| --repo-dir | Repository directory (default .) |
$ swamp data list hello-world
Data for hello-world (command/shell)
file (1 item):
log v2 text/plain 19B 2026-04-07
resource (1 item):
result v1 application/json 135B 2026-04-07
report (2 items):
report-swamp-method-summary v2 text/markdown 482B 2026-04-07
report-swamp-method-summary-json v2 application/json 2.6KB 2026-04-07
swamp data versions <model> <data_name>
Show all versions of a data item.
| Option | Description |
|---|---|
| --server | Run against a remote swamp serve instance (env: SWAMP_SERVE_URL) |
| --token | Server token in <name>.<secret> format; only with --server (overrides stored credentials and SWAMP_SERVER_TOKEN) |
| --repo-dir | Repository directory (default .) |
$ swamp data versions hello-world result --json
{
"dataName": "result",
"modelName": "hello-world",
"modelType": "command/shell",
"versions": [
{
"version": 2,
"createdAt": "2026-04-07T18:03:08.737Z",
"size": 135,
"checksum": "d58d1607...",
"isLatest": true
},
{
"version": 1,
"createdAt": "2026-04-07T18:02:58.146Z",
"size": 157,
"checksum": "c631b676...",
"isLatest": false
}
],
"total": 2
}
swamp data search [query]
Search across all data in the repository. Opens an interactive picker in a
terminal, or returns JSON with --json.
| Option | Description |
|---|---|
| --type | Filter by data type tag (resource, file, report) |
| --lifetime | Filter by lifetime (ephemeral, infinite, job, workflow, or duration) |
| --owner-type | Filter by owner type (model-method, workflow-step, manual) |
| --workflow | Filter to data tagged with this workflow name |
| --model | Filter to data owned by this model name |
| --content-type | Filter by MIME content type |
| --since | Only data created within duration (1h, 1d, 7d, 1w, 1mo) |
| --output | Filter by output ID |
| --run | Filter by workflow run ID |
| --tag | Filter by tag (KEY=VALUE, repeatable) |
| --streaming | Only show streaming data |
| --limit | Maximum results (default 50) |
| --server | Run against a remote swamp serve instance (env: SWAMP_SERVE_URL) |
| --token | Server token in <name>.<secret> format; only with --server (overrides stored credentials and SWAMP_SERVER_TOKEN) |
| --repo-dir | Repository directory (default .) |
$ swamp data search --type resource --json
{
"query": "",
"filters": {
"type": "resource"
},
"results": [
{
"id": "5e7d72ab-7e0d-492e-ab3d-61463d9d4a85",
"name": "execution-result",
"version": 1,
"contentType": "application/json",
"type": "resource",
"lifetime": "infinite",
"ownerType": "model-method",
"modelName": "hello-world",
"modelType": "command/shell",
"streaming": false,
"size": 135,
"createdAt": "2026-04-07T18:03:27.361Z",
"tags": {
"type": "resource",
"specName": "result",
"modelName": "hello-world"
}
}
],
"total": 1,
"limited": false
}
swamp data query [predicate]
Query data using a CEL predicate. The predicate evaluates against DataRecord
fields directly (not prefixed with data.).
| Option | Description |
|---|---|
| --select | CEL expression to project fields (e.g., name) |
| --limit | Maximum results (default 100) |
| --server | Run against a remote swamp serve instance (env: SWAMP_SERVE_URL) |
| --token | Server token in <name>.<secret> format; only with --server (overrides stored credentials and SWAMP_SERVER_TOKEN) |
| --repo-dir | Repository directory (default .) |
Available fields in the predicate: attributes, content, contentType,
createdAt, dataType, id, lifetime, modelName, modelType, name,
ownerType, size, specName, streaming, tags, version.
$ swamp data query 'tags.type == "resource"' --json
{
"predicate": "tags.type == \"resource\"",
"results": [
{
"id": "85c471af-a4c8-4f03-a5df-768351388d09",
"name": "result",
"version": 2,
"tags": {
"type": "resource",
"specName": "result",
"modelName": "hello-world"
},
"modelName": "hello-world",
"modelType": "command/shell",
"dataType": "resource",
"contentType": "application/json",
"lifetime": "infinite",
"streaming": false,
"size": 135
}
],
"total": 1,
"limited": false
}
With --select to project a single field:
$ swamp data query 'tags.type == "resource"' --select 'name' --json
{
"results": ["result", "execution-result"],
"total": 2,
"limited": false
}
swamp data rename <model> <old_name> <new_name>
Rename a data item. Creates a copy under the new name and writes a tombstone with a forward reference on the old name.
| Option | Description |
|---|---|
| --server | Run against a remote swamp serve instance (env: SWAMP_SERVE_URL) |
| --token | Server token in <name>.<secret> format; only with --server (overrides stored credentials and SWAMP_SERVER_TOKEN) |
| --repo-dir | Repository directory (default .) |
$ swamp data rename hello-world result execution-result --json
{
"oldName": "result",
"newName": "execution-result",
"modelId": "7347cf2c-cc9e-4203-8897-e10845af9732",
"modelName": "hello-world",
"modelType": "command/shell",
"copiedVersion": 2,
"newVersion": 1,
"warning": "Any workflows or models that produce data under \"result\" will overwrite the forward reference. Update them to use \"execution-result\" instead."
}
Lookups for the old name without an explicit version follow the forward reference to the new name (up to 5 levels deep).
swamp data delete <model> <data_name>
Permanently delete a data artifact, or a single version of one. Without
--version, every version of the artifact is removed.
| Option | Description |
|---|---|
| --version | Delete only that version. The latest pointer follows the highest remaining version |
| -f, --force | Skip the [y/N] confirmation prompt |
| --server | Run against a remote swamp serve instance (env: SWAMP_SERVE_URL). Batch modes (--prefix, --all) are not available remotely |
| --token | Server token in <name>.<secret> format; only with --server (overrides stored credentials and SWAMP_SERVER_TOKEN) |
| --repo-dir | Repository directory (default .) |
By default the command prompts for confirmation before deleting. With --force
or --json it runs non-interactively.
$ swamp data delete hello-world result --log
About to delete 1 version(s) of "result" from hello-world. Proceed? [y/N] y
19:00:25.924 INF data·delete Deleted 1 version(s) of "result" for "hello-world" ("command/shell")
$ swamp data delete hello-world result --version 1 --force --json
{
"modelId": "589d3566-0144-478c-a5fc-dcf069d455af",
"modelName": "hello-world",
"modelType": "command/shell",
"dataName": "result",
"version": 1,
"versionsDeleted": 1
}
The version field is omitted when every version is deleted. versionsDeleted
is the count of versions actually removed.
Error responses:
- Missing model — Model not found: <ref>
- Missing artifact — No data named "<name>" exists for model <model>
- Missing version — Version <V> does not exist for "<name>" (available versions: <list>)
If old_name was previously renamed to new_name (see
Rename Markers), swamp data delete <model> <old_name>
removes the tombstone forwarder literally — it does not traverse the forward
reference to delete new_name. Lookups under the old name afterwards fail with
Data not found; the rename target is unaffected.
swamp data gc
Run garbage collection — delete expired data and prune old versions.
| Option | Description |
|---|---|
| --dry-run | Show what would be deleted |
| -f, --force | Skip confirmation prompt |
| --repo-dir | Repository directory (default .) |
Two phases execute in sequence:
- Expired data deletion — removes all versions of data whose lifetime has elapsed.
- Version pruning — removes old versions exceeding the garbage collection policy for non-expired data.
$ swamp data gc --dry-run --json
{
"dataEntriesExpired": 0,
"versionsDeleted": 0,
"bytesReclaimed": 0,
"dryRun": true,
"expiredEntries": []
}
GC can also run automatically after each model method run when autoGc: true is
set in .swamp.yaml. Automatic GC runs after the completed event (the user sees
their method result first) and errors are caught at warn level, so auto-GC
failure never fails the method run. See
Repository Configuration for the
full autoGc field reference.
Validation Rules
- Data names must be non-empty strings.
- Data names must not contain .., /, \, or null bytes (path traversal protection).
- The name latest is reserved (case-insensitive) and cannot be used as a data name.
- Resource data is validated against the spec's Zod schema on write. Schema mismatches produce a warning, not an error.
- New writes to an existing data name must have the same owner (ownerType + ownerRef) as the original.
- Tags must include a type key.
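The data-name rules above can be sketched as a single check. The function name `validateDataName` and the error messages are hypothetical; only the rules themselves come from this reference.

```typescript
// Returns null for a valid data name, or a description of the first rule it breaks.
function validateDataName(name: string): string | null {
  if (name.length === 0) {
    return "Data name must be a non-empty string";
  }
  // Path traversal protection: reject "..", "/", "\" and null bytes.
  if (name.includes("..") || name.includes("/") || name.includes("\\") || name.includes("\0")) {
    return "Data name must not contain .., /, \\, or null bytes";
  }
  // The reserved name check is case-insensitive.
  if (name.toLowerCase() === "latest") {
    return "The name latest is reserved";
  }
  return null;
}

console.log(validateDataName("result")); // null
console.log(validateDataName("Latest") !== null); // true
```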
Querying
swamp data query searches across all data artifacts in a repository using CEL
predicates. Each predicate is evaluated against every data record, and matching
records are returned. The same query engine powers the data.query() function
in CEL expressions.
Command
swamp data query [predicate]
| Option | Type | Default | Description |
|---|---|---|---|
| --select | string | None | CEL expression to project fields from matching records |
| --limit | number | 100 | Maximum number of results returned |
| --server | string | None | Run against a remote swamp serve instance (env: SWAMP_SERVE_URL) |
| --token | string | None | Server token in <name>.<secret> format; only with --server (overrides stored credentials and SWAMP_SERVER_TOKEN) |
| --repo-dir | string | . | Repository directory |
| --json | flag | — | Output in JSON format |
When no predicate is provided and stdout is a TTY, the command opens an
interactive TUI for browsing and filtering data. When no predicate is provided
in non-interactive mode (piped output or --json), the command returns an
error.
DataRecord Fields
Predicates evaluate against DataRecord fields as top-level variables. No
namespace prefix is needed — use modelName, not data.modelName.
| Field | Type | Description |
|---|---|---|
| id | string | Record UUID |
| name | string | Data item name |
| version | int | Version number |
| createdAt | string | ISO 8601 timestamp |
| attributes | map<string, dyn> | Parsed JSON content (resource data) |
| tags | map<string, string> | Metadata tags |
| modelName | string | Owning model definition name |
| modelType | string | Model type path (e.g., command/shell) |
| specName | string | Output spec name |
| dataType | string | resource, file, or report |
| contentType | string | MIME type (e.g., application/json) |
| lifetime | string | Retention policy (e.g., infinite, 30d) |
| ownerType | string | model-method, workflow-step, or manual |
| streaming | bool | Whether the data uses streaming writes |
| size | int | Content size in bytes |
| content | string | Raw text content |
Referencing an unknown field produces an error listing the available fields:
$ swamp data query 'badField == "test"' --json
{
"error": "Unknown field \"badField\" in query predicate.\nAvailable: attributes, content, contentType, createdAt, dataType, id, lifetime, modelName, modelType, name, ownerType, size, specName, streaming, tags, version"
}
Lazy-Loaded Fields
The attributes and content fields are loaded from disk only when referenced
in the predicate or the --select expression. All other fields are read from
the metadata catalog without touching data files.
- attributes: Populated for application/json data. The raw content is parsed as JSON. Invalid JSON is treated as an empty map.
- content: Populated for text content types (text/plain, text/markdown, application/json, application/yaml, etc.). Binary content types produce an empty string.
When neither field is referenced, queries run entirely against the catalog index.
Predicates
A predicate is a CEL expression that evaluates to a boolean. Records where the
predicate returns true are included in the results.
Comparison
$ swamp data query 'modelName == "scanner"' --json
{
"predicate": "modelName == \"scanner\"",
"results": [
{
"id": "7a3708d4-4767-4e45-912e-6e4ab42f5ea5",
"name": "log",
"version": 1,
"tags": {
"type": "file",
"specName": "log",
"modelName": "scanner",
"env": "prod"
},
"modelName": "scanner",
"modelType": "command/shell",
"specName": "log",
"dataType": "file",
"contentType": "text/plain",
"lifetime": "infinite",
"streaming": true,
"size": 22
}
],
"total": 1,
"limited": false
}
Note
Examples on this page show a subset of DataRecord fields for brevity. Actual
JSON output includes all fields listed in the table above (e.g., createdAt,
ownerType, attributes, content).
Numeric Comparison
$ swamp data query 'size > 100' --limit 2 --json
When results are truncated by --limit, the response includes
"limited": true.
Boolean Fields
$ swamp data query 'streaming == true' --json
Returns all data records with streaming enabled.
Compound Predicates
Combine conditions with && (and) and || (or):
$ swamp data query 'modelName == "scanner" && specName == "result"' --json
String Methods
CEL string methods work on string fields:
swamp data query 'name.contains("result")'
swamp data query 'modelName.startsWith("scan")'
swamp data query 'contentType.matches("application/.*")'
See String Methods in the CEL reference for the complete list.
Attribute Filtering
Access nested fields within attributes to filter on resource content. This
triggers lazy loading of the content from disk.
$ swamp data query 'dataType == "resource" && attributes.exitCode == 0' --json
When a record's attributes map does not contain the referenced key, the record
is excluded from results rather than producing an error.
Tag Filtering
Tags are accessible as a nested map via the tags field. Use dot notation or
bracket notation to access tag values:
swamp data query 'tags.env == "prod"'
swamp data query 'tags["env"] == "prod"'
swamp data query 'tags.type == "resource"'
Records that do not have the referenced tag key are silently excluded from results (no error).
Tag Sources
Tags on data records come from multiple sources, resolved in order. See the Tag Resolution Chain above for the full precedence.
Three tags are always present on every data record:
| Tag Key | Description |
|---|---|
| type | resource, file, or report |
| specName | Output spec name from the model type |
| modelName | Model definition name |
Custom tags are added via --tag flags on method runs, workflow
dataOutputOverrides, or the output spec's tags field in the
model definition.
Projections
The --select flag transforms each matching record into a specified shape. The
select expression is a CEL expression evaluated against each matching record's
fields.
Scalar Projection
Extract a single field value. Returns an array of values.
$ swamp data query 'dataType == "resource"' --select name --json
{
"results": [
"result",
"result"
],
"total": 2,
"limited": false
}
Map Projection
Build an object from selected fields. Returns an array of objects.
$ swamp data query 'dataType == "resource"' --select '{"name": name, "model": modelName, "size": size}' --json
List Projection
Build an array from selected fields. Returns an array of arrays.
$ swamp data query 'dataType == "resource"' --select '[name, modelName, size]' --json
Accessing Nested Data in Projections
Select expressions can reference attributes and content even if the
predicate does not. The query engine detects field references in both the
predicate and select expression to determine which fields to load from disk.
swamp data query 'dataType == "resource"' --select 'attributes.stdout'
If a record's attributes do not contain the referenced key, the projection
produces null for that record.
Null-Safe Access with .?
When using data.query() results in CEL expressions, the .? (optional select)
operator provides null-safe field access. If the receiver is null, .?
returns null instead of throwing. Use .orValue() to provide a fallback:
data.query("modelName == \"scanner\"", "attributes.?findings.orValue([])")
See Optional Select in the CEL reference for full syntax and usage guidance.
data.query() in CEL Expressions
The data.query() function provides the same query capability inside
CEL expressions used in model definitions,
workflow steps, and data output overrides.
data.query("modelName == \"scanner\" && size > 1000")
data.query("modelName == \"scanner\"", "attributes.status")
| Signature | Returns | Description |
|---|---|---|
| data.query(predicate) | list<DataRecord> | Records matching the predicate |
| data.query(predicate, select) | list<dyn> | Projected values from matches |
The predicate and select arguments are strings containing CEL expressions. The same DataRecord fields and operators are available as in the CLI command.
Interactive Mode
When invoked without a predicate in a terminal, swamp data query opens an
interactive TUI for browsing data. The TUI supports:
- Filtering by tag keys and values
- Text search across record fields
- Selecting and inspecting individual records
Non-interactive invocations (piped output, --json flag, or no TTY) require a
CEL predicate argument.
Query Result Structure
JSON output includes these top-level fields:
| Field | Type | Description |
|---|---|---|
| predicate | string | The CEL predicate used (omitted with --select) |
| results | list | Matching DataRecords or projected values |
| total | int | Number of results returned |
| limited | bool | true when results were truncated by --limit |
Without --select, each result is a full DataRecord. With --select, each
result is the projected value (scalar, map, or list depending on the select
expression shape).