Issue
Bug · Closed · Swamp CLI
Assignees: stack72

context.readModelData returns different results depending on invocation context (manual vs workflow)

Opened by stack72 · 4/7/2026 · GitHub #1113

Description

context.readModelData(modelName, specName) produces different results depending on whether the method is invoked manually (swamp model method run) or within a workflow. This makes manual runs unreliable for debugging workflow behavior.

  • Manual run: readModelData returns ALL historical data for the source model (no workflowRunId available → no scoping)
  • Workflow run: readModelData is scoped to data produced by the current workflow run (via workflowRunId tag filtering in raw_execution_driver.ts lines 138-140)

This means a method that works correctly in a workflow can produce wildly different (and incorrect) results when run manually for debugging.

Concrete Example

anime-source has 27 configured shows. search_configured produces 182 episodes per run.

# Manual run — returns 921 items (all historical data, including removed shows)
swamp model method run dedup filter --input sourceModel=anime-source
→ Read 921 episodes from "anime-source"
→ 304 "new" episodes (many are false positives from orphaned data)

# Workflow run — returns 182 items (current run only)
swamp workflow run discover-and-download
→ Read 182 episodes from "anime-source"
→ correct dedup results

The 921 items include data from shows that were removed from the config months ago (e.g., "Dark Gathering" removed from globalArgs, but its data persists with lifetime: infinite). This orphaned data is invisible in workflow runs but pollutes manual runs.

Why This Matters

  1. You can't debug workflows with manual runs. The primary way to test a model method is swamp model method run. If it returns different data than the workflow, you're debugging a different system.

  2. False confidence in fixes. A dedup fix that looks correct in manual testing may behave completely differently in the workflow (or vice versa). We spent significant time chasing dedup bugs that only manifested in one invocation context.

  3. No way to opt into scoping manually. There's no --scope-to-latest-run flag or equivalent. Manual runs always get the unscoped path.

Current Implementation

In raw_execution_driver.ts:

const workflowRunId = this.context.tagOverrides?.["workflowRunId"];
const readModelData = (modelName: string, specName?: string) =>
  dataAccessService.readModelData(modelName, specName, workflowRunId);

When workflowRunId is undefined (manual run), readModelData returns everything. When set (workflow run), it filters by workflowRunId tag.
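The scoping rule above can be modeled in a few lines. This is a minimal sketch, not swamp's code: the DataRecord shape, its tags map, and readModelDataSketch are hypothetical stand-ins for dataAccessService.readModelData. The sketch assumes only that each stored record carries its model name, spec name, and an optional workflowRunId tag.

```typescript
// Hypothetical record shape; swamp's real storage types are not shown in this issue.
interface DataRecord {
  modelName: string;
  specName: string;
  tags: Record<string, string>;
}

// Sketch of the scoping rule: an optional workflowRunId narrows the result,
// and its absence (a manual run) means no scoping at all.
function readModelDataSketch(
  records: DataRecord[],
  modelName: string,
  specName?: string,
  workflowRunId?: string,
): DataRecord[] {
  return records.filter(
    (r) =>
      r.modelName === modelName &&
      (specName === undefined || r.specName === specName) &&
      (workflowRunId === undefined || r.tags["workflowRunId"] === workflowRunId),
  );
}

// Two runs' worth of episodes for the same source model.
const records: DataRecord[] = [
  { modelName: "anime-source", specName: "episode", tags: { workflowRunId: "run-1" } },
  { modelName: "anime-source", specName: "episode", tags: { workflowRunId: "run-1" } },
  { modelName: "anime-source", specName: "episode", tags: { workflowRunId: "run-2" } },
];

// Manual run: no workflowRunId in tagOverrides, so every historical record comes back.
const manual = readModelDataSketch(records, "anime-source", "episode");
// Workflow run: only records tagged with the current run's ID are returned.
const workflow = readModelDataSketch(records, "anime-source", "episode", "run-2");
console.log(manual.length, workflow.length); // 3 1
```

With no workflowRunId, the filter reduces to a model/spec match. That is why the manual run in the concrete example read 921 items instead of 182.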

Proposed Solution

readModelData should behave consistently regardless of invocation context. Options:

  1. Default to latest execution's output — when no workflowRunId is available, scope to the source model's most recent method output instead of returning all historical data
  2. Add a CLI flag: swamp model method run ... --scope-to-latest to simulate workflow scoping during manual runs
  3. Always scope by default — return only the latest version of each unique data name, with an explicit opt-in for historical data

Any of these would make manual runs trustworthy for debugging.
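Options 1 and 3 differ in what "latest" means, and a small sketch makes the difference visible. Everything here is hypothetical: the VersionedRecord fields (dataName, version, executionId, createdAt) and both functions are assumptions for illustration, not swamp's storage model.

```typescript
// Hypothetical stored-record shape: each write of a named item gets a new version,
// and every record remembers which method execution produced it.
interface VersionedRecord {
  dataName: string; // e.g. one episode's identifier
  version: number; // increases on each write of the same dataName
  executionId: string; // the method execution that wrote this version
  createdAt: number; // epoch millis of the write
}

// Option 1 sketch: keep only records written by the most recent execution.
function latestExecutionOnly(records: VersionedRecord[]): VersionedRecord[] {
  if (records.length === 0) return [];
  // The newest record's executionId identifies the latest execution.
  const newest = records.reduce((a, b) => (b.createdAt > a.createdAt ? b : a));
  return records.filter((r) => r.executionId === newest.executionId);
}

// Option 3 sketch: keep the highest version of each unique dataName.
function latestVersionPerName(records: VersionedRecord[]): VersionedRecord[] {
  const latest = new Map<string, VersionedRecord>();
  for (const r of records) {
    const current = latest.get(r.dataName);
    if (current === undefined || r.version > current.version) {
      latest.set(r.dataName, r);
    }
  }
  return Array.from(latest.values());
}

// exec-1 ran while a since-removed show was still configured; exec-2 ran after.
const runs: VersionedRecord[] = [
  { dataName: "dark-gathering-ep1", version: 1, executionId: "exec-1", createdAt: 100 },
  { dataName: "show-a-ep1", version: 1, executionId: "exec-1", createdAt: 100 },
  { dataName: "show-a-ep1", version: 2, executionId: "exec-2", createdAt: 200 },
  { dataName: "show-a-ep2", version: 1, executionId: "exec-2", createdAt: 200 },
];

const fromLatestRun = latestExecutionOnly(runs);
const latestPerName = latestVersionPerName(runs);
console.log(fromLatestRun.map((r) => r.dataName).join(",")); // show-a-ep1,show-a-ep2
console.log(latestPerName.map((r) => r.dataName).join(",")); // dark-gathering-ep1,show-a-ep1,show-a-ep2
```

Under these assumptions, option 3 still returns dark-gathering-ep1. A removed show's data is never rewritten, so its last version stays "latest" indefinitely. Option 1 drops it, but only works if the latest execution re-emits every item the caller needs.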

Environment

  • swamp version: 20260206.200442.0
  • Extension: @keeb/mms/dedup calling readModelData("anime-source", "episode")

Related Issues

  • #1020 — closed as not-a-bug (findBySpec run-scoped, but same inconsistency exists)
  • #966 — forEach data.findBySpec resolves empty when data written by prior job
  • #914 — context.readModelData feature request

Automoved by swampadmin from GitHub issue #1113

Bog Flow

Closed · 4/17/2026, 8:48:46 PM


Sludge Pulse
stack72 assigned stack72 · 4/17/2026, 8:44:20 PM

stack72 commented 4/17/2026, 8:48:45 PM

Closing as already-fixed by #1145 (commit d9562498, merged 2026-04-08).

That PR removed all hidden workflowRunId scoping from readModelData, findBySpec, and queryData. Current behavior: manual runs and workflow runs both return all data — the inconsistency described here no longer exists.

Relevant code:

  • src/domain/drivers/raw_execution_driver.ts:139-140 — readModelData is called without workflowRunId.
  • src/domain/data/data_access_service.ts:105-117 — signature is readModelData(modelName, specName?); no scoping.
  • src/domain/data/data_access_service_test.ts:399, :480 — tests assert all data is returned regardless of workflowRunId.

Note: the underlying concern about orphaned data (e.g. removed shows persisting with lifetime: infinite) still exists, but now affects both contexts equally rather than causing a manual-vs-workflow divergence. If a --scope-to-latest-run flag or similar debugging affordance is still wanted, please file a fresh feature request — the proposed solutions in this issue conflict with #1145's explicit "remove hidden scoping" design direction.
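Since #1145 leaves any narrowing to the caller, one explicit way to ignore orphaned data is to filter against the current configuration. This is a hypothetical sketch: the Episode shape, dropOrphaned, and the show names are illustrative assumptions, not part of @keeb/mms/dedup or swamp.

```typescript
// Hypothetical episode shape; the real @keeb/mms records are not shown in this issue.
interface Episode {
  show: string;
  episode: number;
}

// Keep only episodes whose show is still in the current configuration, so data
// left behind by removed shows (lifetime: infinite) is ignored explicitly.
function dropOrphaned(episodes: Episode[], configuredShows: readonly string[]): Episode[] {
  const active = new Set(configuredShows);
  return episodes.filter((e) => active.has(e.show));
}

const stored: Episode[] = [
  { show: "Dark Gathering", episode: 1 }, // show was removed from globalArgs
  { show: "Show A", episode: 1 },
  { show: "Show A", episode: 2 },
];

const current = dropOrphaned(stored, ["Show A"]);
console.log(current.length); // 2
```

Because the filter lives in the extension's own code, it behaves identically in manual and workflow runs, consistent with the "remove hidden scoping" direction.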
