Expose run/job/step identifiers as SWAMP_* env vars + CEL values, and template placement selectors (extends #331's run.id)

#331 shipped run.id in the CEL templating context — the workflow-run UUID — which unblocked run-scoped resource KEYS. Three gaps remain for CI-style and self-provisioning workflows:

1. CEL-only, not env. run.id is reachable when expanding a step's inputs, but a model method's body (and any subprocess or container it spawns) has no ambient way to learn the run it belongs to. The only path today is to thread run.id in as an explicit input arg to every method, then re-thread it into every spawned docker run -e … / child process. GitHub Actions solves this by injecting a fixed GITHUB_* env set into every step's environment, so any tool the step shells out to can self-identify without the author plumbing it.
2. Only the run, nothing finer. There is no job id, step id, fan-out (forEach) item coordinate, retry/attempt counter, or "which worker executed this step." Correlating the artifacts of a fan-out needs at least run + job + item-index to tell parallel branches apart.
3. CEL works in inputs but NOT in placement label selectors. ${{ … }} is evaluated when expanding a step's task.inputs, but a placed step's labels: selector is NOT templated: a run: ${{ run.id }} selector passes through VERBATIM and matches no worker (No connected worker matches labels …run=${{ run.id }}), while swamp workflow validate accepts it silently. So even though the worker side CAN advertise a run-scoped label (it's a model-method input, where CEL works), the selector side can't match on it, which makes per-run placement scoping inexpressible. (Verified on build 20260623; corrects an earlier draft of this issue that assumed run.id was usable in a placement selector.)

Concretely: we run a workflow that provisions its own pool of resource-capped containerized remote-execution workers (a phase = provision N workers -> fan out work across them -> tear down) via a model method that docker runs sibling containers. To scope placement deterministically and to label each container so docker ps --filter label=… and log correlation work, every container needs a stable id tying it to (a) the run and (b) the job/phase that spawned it. We CAN stamp the worker side with run.id (it's a CEL input on the provision method), but we can neither (a) match it on the placement selector (gap 3) nor (b) have the spawning method self-identify without threading run.id through as an input (gap 1) — and there is nothing for the job/step/item axis at all (gap 2). We're forced to derive a synthetic scope from inputs and thread it everywhere — exactly the boilerplate #331 set out to remove for resource keys.

Proposed Solution

Expose a small, GitHub-Actions-inspired family of identifiers in BOTH surfaces — the CEL templating context AND the per-step process environment under a SWAMP_ prefix — so each is usable from ${{ … }} expressions and from Deno.env.get(…) / shell $SWAMP_* alike. A step's spawned children inherit the env, so the contract reaches one level deeper than CEL can. AND run placement-label selectors through the same CEL pass that step inputs already get, so these ids are usable on the matching side too (gap 3).
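
To make the selector half of this concrete, here is a minimal sketch of expanding ${{ … }} in selector label values. It is not swamp's evaluator: `resolvePath` is a hypothetical stand-in that only handles plain dotted member access, whereas the real pass would be the same CEL evaluation step inputs already get. `TemplateContext` and `expandSelector` are illustrative names.

```typescript
// Hypothetical shape of the templating context; field names follow the strawman table.
type TemplateContext = { [key: string]: string | number | TemplateContext };

// Resolve a dotted path such as "run.id" against the context.
// Stand-in for the real CEL evaluator: plain member access only.
function resolvePath(ctx: TemplateContext, path: string): string {
  let current: string | number | TemplateContext = ctx;
  for (const segment of path.split(".")) {
    if (typeof current !== "object" || !(segment in current)) {
      throw new Error(`unknown template variable: ${path}`);
    }
    current = current[segment];
  }
  if (typeof current === "object") {
    throw new Error(`template variable is not a scalar: ${path}`);
  }
  return String(current);
}

// Expand every ${{ ... }} occurrence in each selector label value.
function expandSelector(
  labels: Record<string, string>,
  ctx: TemplateContext,
): Record<string, string> {
  const expanded: Record<string, string> = {};
  for (const key of Object.keys(labels)) {
    expanded[key] = labels[key].replace(
      /\$\{\{\s*([\w.]+)\s*\}\}/g,
      (_match: string, path: string) => resolvePath(ctx, path),
    );
  }
  return expanded;
}
```

Failing loudly on an unknown variable is a deliberate choice in this sketch: a selector that silently keeps its literal template text is exactly the failure mode described in gap 3.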

Proposed set (GH Actions analogue shown for reference; names are a strawman):

| CEL | env var | GH Actions analogue | notes |
| --- | --- | --- | --- |
| `run.id` | `SWAMP_RUN_ID` | `GITHUB_RUN_ID` | already in CEL via #331; add the env var |
| `run.attempt` | `SWAMP_RUN_ATTEMPT` | `GITHUB_RUN_ATTEMPT` | increments on retry / resume |
| `run.workflowName` | `SWAMP_WORKFLOW_NAME` | `GITHUB_WORKFLOW` | |
| `run.workflowId` | `SWAMP_WORKFLOW_ID` | `GITHUB_WORKFLOW_REF` | the definition UUID |
| `run.actor` | `SWAMP_ACTOR` | `GITHUB_ACTOR` | principal that launched the run (audit / auth) |
| `run.trigger` | `SWAMP_EVENT_NAME` | `GITHUB_EVENT_NAME` | manual / scheduled / resumed |
| `job.id` | `SWAMP_JOB_ID` | (none) | UUID of this job within the run |
| `job.name` | `SWAMP_JOB_NAME` | `GITHUB_JOB` | |
| `step.id` | `SWAMP_STEP_ID` | `GITHUB_ACTION` | |
| `step.name` | `SWAMP_STEP_NAME` | (none) | |
| `item.index` / `item.key` | `SWAMP_ITEM_INDEX` / `SWAMP_ITEM_KEY` | matrix context | `forEach` fan-out coordinate |
| `worker.id` | `SWAMP_WORKER_ID` | `RUNNER_NAME` | which remote worker executed the step (placement debugging) |

run.id is the load-bearing value; the minimum that unblocks us is SWAMP_RUN_ID as an env var, job.name / item.index for fan-out correlation, and CEL-templated placement selectors. The rest is the natural CI surface and can land incrementally. Happy to converge on swamp's vocabulary (run vs execution, etc.).
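
As an illustration of why the minimum set includes the job and item axes, the sketch below builds a fan-out correlation key from the proposed variables. The variable names are the strawman ones from the table, `correlationKey` is a hypothetical helper, and it assumes `SWAMP_ITEM_INDEX` is simply unset outside a forEach. The environment is passed in as a plain map so the same function works with `Deno.env.toObject()` or any other source.

```typescript
type Env = Record<string, string | undefined>;

// Build "run/job[/item]" from the proposed (strawman) SWAMP_* variables.
function correlationKey(env: Env): string {
  const runId = env["SWAMP_RUN_ID"];
  const jobName = env["SWAMP_JOB_NAME"];
  if (!runId || !jobName) {
    throw new Error("SWAMP_RUN_ID and SWAMP_JOB_NAME are required");
  }
  const parts = [runId, jobName];
  // Assumption: the item index is only present inside a forEach fan-out.
  const itemIndex = env["SWAMP_ITEM_INDEX"];
  if (itemIndex !== undefined && itemIndex !== "") {
    parts.push(itemIndex);
  }
  return parts.join("/");
}
```

Two parallel branches of the same job then differ only in the last segment, which is what log and artifact correlation needs to tell them apart.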

Example — the self-provisioning worker case SHOULD collapse to this (gap 3 must be closed first; today the selector is not templated):

# DESIRED: placement selector matches ONLY this run's workers — a prior run's
# leaked/zombie worker carries a different run.id and is never selected.
# Does NOT work today: selector labels aren't CEL-evaluated (gap 3).
labels:
  run: ${{ run.id }}

// the provision method stamps each spawned container with no input threading
// (needs gap 1 — SWAMP_RUN_ID in the env):
//   docker run --label run=$SWAMP_RUN_ID --label job=$SWAMP_JOB_NAME …
// and `docker ps --filter label=run=<uuid>` then lists exactly that run's fleet.
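
Assuming gap 1 is closed, the provision method's side of that example could look roughly like the sketch below. The helper names are hypothetical, and the SWAMP_* variables are the proposed ones, not something swamp injects today. The functions only build argument lists; handing them to a process spawner is left to the caller.

```typescript
type Env = Record<string, string | undefined>;

// docker label name -> proposed SWAMP_* variable that feeds it (strawman names).
const LABEL_SOURCES: Array<[string, string]> = [
  ["run", "SWAMP_RUN_ID"],
  ["job", "SWAMP_JOB_NAME"],
];

// Build the `--label key=value` arguments for `docker run`,
// skipping any variable that is unset so no "run=undefined" label is stamped.
function dockerLabelArgs(env: Env): string[] {
  const args: string[] = [];
  for (const [label, variable] of LABEL_SOURCES) {
    const value = env[variable];
    if (value) {
      args.push("--label", `${label}=${value}`);
    }
  }
  return args;
}

// Arguments for listing exactly one run's containers.
function dockerPsFilterArgs(runId: string): string[] {
  return ["ps", "--filter", `label=run=${runId}`];
}
```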

Affected Components

CEL evaluator / templating — extend variable resolution with job, step, item, worker, and the extra run.* fields (run.id already wired by #331); and run placement-label selector values through the same evaluator (gap 3).
Workflow runner / step executor — inject the SWAMP_* env vars into each step's process environment at the same point inputs are bound, and forward them in the dispatch envelope so a remote-worker-executed step sees identical $SWAMP_* to a loopback one.
swamp workflow validate — at minimum, flag a literal ${{ … }} left unexpanded in a selector instead of accepting it (today it passes, then silently matches nothing at runtime).
Docs — add the SWAMP_* env table + new CEL bindings wherever inputs.* / self.* / run.id are listed.
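
The validate item above could start as a check of roughly this shape. This is a sketch only: `findUnexpandedTemplates` and `SelectorFinding` are hypothetical names, and how swamp workflow validate actually represents selectors internally is not known here. Once selectors are templated, the same check would run on the post-expansion values instead.

```typescript
interface SelectorFinding {
  label: string;
  value: string;
}

// Report every selector label whose value still contains a literal ${{ ... }}.
function findUnexpandedTemplates(
  labels: Record<string, string>,
): SelectorFinding[] {
  const findings: SelectorFinding[] = [];
  for (const label of Object.keys(labels)) {
    const value = labels[label];
    if (/\$\{\{[\s\S]*?\}\}/.test(value)) {
      findings.push({ label, value });
    }
  }
  return findings;
}
```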

High-Level Approach

These values already exist in the run / job / step records the engine walks — this is exposure, not new state. The env-var half is a dictionary merge into the step's spawn environment; the CEL half is the same extension #331 made for run.id, widened to the job / step / item records already in scope at expansion time, and applied to selector labels as well. The one piece of genuine plumbing is forwarding the env to remote workers so the contract holds regardless of where a step lands.
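
The "dictionary merge" can be sketched as follows. `StepContext` is a hypothetical shape covering a subset of the strawman fields, not the engine's actual record types. The sketch lets the SWAMP_* values win over anything inherited, so a stale variable from a parent process cannot leak into the step.

```typescript
// Hypothetical subset of the run / job / step records in scope at expansion time.
interface StepContext {
  run: { id: string; attempt?: number };
  job?: { name?: string };
  step?: { name?: string };
  item?: { index?: number };
  worker?: { id?: string };
}

// Flatten the context into the proposed SWAMP_* variables, omitting unset fields.
function swampEnv(ctx: StepContext): Record<string, string> {
  const candidates: Array<[string, string | number | undefined]> = [
    ["SWAMP_RUN_ID", ctx.run.id],
    ["SWAMP_RUN_ATTEMPT", ctx.run.attempt],
    ["SWAMP_JOB_NAME", ctx.job?.name],
    ["SWAMP_STEP_NAME", ctx.step?.name],
    ["SWAMP_ITEM_INDEX", ctx.item?.index],
    ["SWAMP_WORKER_ID", ctx.worker?.id],
  ];
  const env: Record<string, string> = {};
  for (const [variable, value] of candidates) {
    // Compare against undefined so that an item index of 0 is kept.
    if (value !== undefined) {
      env[variable] = String(value);
    }
  }
  return env;
}

// Merge into the step's spawn environment; SWAMP_* entries override inherited ones.
function buildSpawnEnv(
  base: Record<string, string | undefined>,
  ctx: StepContext,
): Record<string, string> {
  const merged: Record<string, string> = {};
  for (const key of Object.keys(base)) {
    const value = base[key];
    if (value !== undefined) {
      merged[key] = value;
    }
  }
  return { ...merged, ...swampEnv(ctx) };
}
```

For the remote-worker case, the output of `swampEnv` is the part that would travel in the dispatch envelope, with the worker performing the final merge against its own base environment.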

Why It Matters

Self-provisioning, fan-out, and CI-driven workflows all need to correlate what a run spawns — containers, child processes, intermediate resources, logs — back to the run (and job / item) that owns them, AND to place work only on the resources that run provisioned. #331 unblocked run-scoped resource KEYS via CEL; this extends the same idea to the EXECUTION ENVIRONMENT (spawned processes self-identify with zero input threading), to FINER granularity (job / step / fan-out item), and to the PLACEMENT SELECTOR (so a run can target only its own workers). It mirrors the GITHUB_* contract every CI tool already expects, which makes swamp workflows legible to anyone arriving from Actions or GitLab CI.

Builds directly on #331 (run.id in CEL, shipped).
#519 (persistent, queryable workflow runs) — same run-identity surface, complementary.

02Bog Flow

Open

6/26/2026, 6:12:34 PM
