01 Issue
Feature · Closed · Swamp CLI
Assignees: None

#598 Add 'creek' extension kind for cross-querying external systems alongside swamp data

Opened by john · 6/8/2026

Summary

Today the only way for swamp to fetch data from an external system (Postgres, MySQL, a SaaS API, anything) is to author a model whose read/list method hits that system and writes the result back into the local Data catalog. That bakes external state into local storage and forces a write/persist step before the data can be queried — there's no way to express "join my swamp data with a row from select … from production_db.users at query time."

This proposes a new first-class extension contribution kind, creek, that lets authors ship a TypeScript module wrapping any external system and lets end users call its methods directly from CEL inside swamp data query. Creeks sit alongside models, vaults, drivers, datastores, and reports in the extension manifest and are authored, bundled, and registered exactly like every other kind.

Two new built-in CEL functions make this bidirectional:

  • creek(type, method, args) — outbound (swamp → external) lookups
  • swamp.data(predicate, select?) — inbound (external → swamp) lookups

Both share a per-query memoisation cache, so a 1000-row predicate that calls creek("@me/jira", "issue", {key: name}) for 30 distinct names hits Jira 30 times, not 1000. swamp data query also gains --order-by / --order-direction so cross-source "latest updated" ordering works in one command.
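
As a rough illustration of the 30-versus-1000 arithmetic above, here is a minimal sketch of a per-query memoisation wrapper. It is not the POC's code: `CreekCall`, `stableKey`, and `memoisePerQuery` are hypothetical names, and keying on sorted top-level argument keys is an assumption about how equal calls might be recognised.

```typescript
// Hypothetical shape of the 3-arg creek call; not taken from the POC.
type CreekCall = (
  type: string,
  method: string,
  args: Record<string, unknown>,
) => Promise<unknown>;

// Build a cache key that ignores top-level key order, so {a: 1, b: 2} and
// {b: 2, a: 1} share an entry. Nested objects are not normalised here.
function stableKey(
  type: string,
  method: string,
  args: Record<string, unknown>,
): string {
  const sorted = Object.keys(args).sort().map((k) => [k, args[k]]);
  return JSON.stringify([type, method, sorted]);
}

// Wrap a creek call with a cache that lives for one query only.
// Promises are cached (not resolved values), so concurrent rows asking for
// the same key share a single in-flight request.
function memoisePerQuery(call: CreekCall): CreekCall {
  const cache = new Map<string, Promise<unknown>>();
  return (type, method, args) => {
    const key = stableKey(type, method, args);
    let hit = cache.get(key);
    if (hit === undefined) {
      hit = call(type, method, args);
      cache.set(key, hit);
    }
    return hit;
  };
}
```

Under these assumptions, evaluating a predicate over 1000 rows with 30 distinct `key` values invokes the wrapped call 30 times. A fresh wrapper would be created for each query, which matches the "no cross-query cache in v1" trade-off listed further down.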

The naming is intentional — creeks are tributaries that flow into a larger body of water, which is exactly what external data does to a swamp query. It also dodges the near-collision that datasource would have had with the existing datastore extension kind.

POC

Branch: feat/creek-cross-query

Latest commits:

  • b6b4d83d..a7ba6470 — feat(creeks): add creek extension kind for cross-querying external systems
  • 9c6dba92 — fix(creeks): wire findNearestDenoConfig into the kind adapter
  • d7408964 — docs(creeks): add Postgres + MySQL + swamp cross-query example

Worked example lives in examples/creek-cross-query/ and includes:

  • fixtures/docker-compose.yaml — Postgres on :5439, MySQL on :3309, both pre-seeded
  • extensions/creeks/postgres_creek.ts: @example/postgres, wrapping the customers + incidents tables
  • extensions/creeks/mysql_creek.ts: @example/mysql, wrapping a deploy_audits log
  • extensions/models/deployment_seeder.ts: @example/deployment, so there's swamp-side data to join against
  • README.md walking through five cross-query examples, all verified to work end-to-end

The headline example, a single swamp data query joining swamp + Postgres + MySQL, looks like this:

dataType == "resource"
&& attributes.service == "api"
&& creek("@example/postgres", "customer", {"name": attributes.customer}).plan == "enterprise"
&& creek("@example/mysql", "last_deploy_for", {"customer": attributes.customer}).status == "success"

…with a --select that projects fields from all three sources into a single result row.

Scope of the POC

Done:

  • Creek domain layer (registry, definition, Proxy-wrapped CreekHandle, per-query memoisation cache)
  • Built-in @swamp/echo-creek test fixture
  • Extension kind adapter (Zod-validated, parallel to datastore)
  • Manifest + catalog + sources + repo-marker scaffolding (mirrors datastore)
  • libswamp services: creekDescribe, creekTypeSearch, creekCall
  • CLI commands: swamp creek type search, swamp creek describe, swamp creek call
  • CEL integration: creek(type, method, args) function + swamp namespace with .data(predicate, select?) + recursion depth guard (cap 3)
  • DataQueryService cross-query async path + --order-by + --order-direction
  • MethodContext.creek(type) for in-model code calls (Proxy-style receiver-method syntax)
  • Loader wiring in cli/mod.ts so user creeks in extensions/creeks/*.ts load at CLI startup
  • Tests: ~50 new unit + integration tests; full project suite (6770+ tests) passes with no regressions
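
To make the recursion depth guard in the list above concrete, here is a minimal sketch. It assumes "cap 3" means three nested cross-query levels are allowed and a fourth is rejected; the issue does not spell that out. `QueryCtx` and `descend` are hypothetical names. Depth is carried as an immutable value rather than a shared counter, so concurrent sibling calls cannot inflate each other's depth.

```typescript
// Cap named in the issue. Reading it as "at most 3 nested levels" is an assumption.
const MAX_CROSS_QUERY_DEPTH = 3;

// Hypothetical per-evaluation context; the top-level query starts at depth 0.
interface QueryCtx {
  readonly depth: number;
}

// Called whenever a creek(...) or swamp.data(...) evaluation would start a
// nested evaluation. Returns the child context or throws once the cap is hit.
function descend(ctx: QueryCtx, label: string): QueryCtx {
  if (ctx.depth >= MAX_CROSS_QUERY_DEPTH) {
    throw new Error(
      `cross-query recursion limit (${MAX_CROSS_QUERY_DEPTH}) exceeded at ${label}`,
    );
  }
  return { depth: ctx.depth + 1 };
}
```

A creek whose method calls swamp.data, whose predicate in turn calls another creek, would pass the child context down at each hop, so a cycle fails fast instead of looping.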

Intentionally deferred (not blocking the cross-query runtime):

  • Full extension publishing pipeline through swamp extension push/pull/doctor for creeks (resolve_extension_files, push.ts archive layout, doctor reconcile). Creeks load from local extensions/creeks/*.ts today; registry round-trips are the follow-up.

One DX wrinkle worth flagging

cel-js requires pre-registered method signatures for receiver dispatch, but creek method names are author-defined and unknown at registration time. So the CEL surface uses a 3-arg form:

creek("@me/jira", "issue", {"key": name}).status == "open"

…rather than the receiver-style creek("@me/jira").issue({key: name}).status from the original sketch. The TS programmatic interface on MethodContext keeps the receiver style via a Proxy, so model authors writing in TS get ctx.creek("@me/jira").issue({key: "FOO-1"}). Open to feedback if the CEL shape should change.
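
For readers unfamiliar with the Proxy trick, here is a minimal sketch of how a receiver-style handle could be layered over a 3-arg call. It is an illustration, not the POC's CreekHandle: `ThreeArgCall`, `ReceiverHandle`, and `creekHandle` are hypothetical names, and only the standard JavaScript `Proxy` API is used.

```typescript
type CallArgs = Record<string, unknown>;

// Hypothetical 3-arg entry point, mirroring the CEL form creek(type, method, args).
type ThreeArgCall = (
  type: string,
  method: string,
  args: CallArgs,
) => Promise<unknown>;

// Any property name is treated as a creek method taking one args object.
type ReceiverHandle = Record<string, (args?: CallArgs) => Promise<unknown>>;

function creekHandle(type: string, call: ThreeArgCall): ReceiverHandle {
  return new Proxy({} as ReceiverHandle, {
    get(_target, prop) {
      // Symbols are never creek methods. "then" is excluded so that awaiting
      // the handle itself does not treat it as a thenable.
      if (typeof prop !== "string" || prop === "then") return undefined;
      // Method names are resolved at call time, so nothing has to be
      // registered up front.
      return (args: CallArgs = {}) => call(type, prop, args);
    },
  });
}
```

With this in place, `creekHandle("@me/jira", call).issue({ key: "FOO-1" })` forwards to `call("@me/jira", "issue", { key: "FOO-1" })`. That late binding is what works in TS via a Proxy but is unavailable in cel-js, which needs method signatures registered ahead of time.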

Open trade-offs documented in the POC

  • Per-query memoisation only — no cross-query TTL cache in v1. Vault invalidation, identity changes, and key salting all need UX before that lands.
  • Output validation via an optional returns: z.ZodTypeAny schema is warn-by-default; strict checking is opt-in per method with strictReturns: true. Strict-by-default would mean one bad API response fails every row that references it.
  • No concurrency cap in v1 — a 1000-row predicate fires 1000 concurrent outbound calls. Documented gotcha; the v2 hook is an optional batch(args[], ctx) method.
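
The warn-by-default versus strictReturns trade-off above can be sketched as follows. This is an assumption-laden illustration, not the POC's code: `SchemaLike` is a hypothetical stand-in exposing only `safeParse`, which Zod schemas also provide, and `MethodDef` and `checkReturn` are made-up names.

```typescript
// Minimal structural stand-in for a Zod schema: only safeParse is needed here.
interface SchemaLike {
  safeParse(value: unknown): { success: boolean };
}

// Hypothetical per-method definition carrying the two options named in the issue.
interface MethodDef {
  returns?: SchemaLike;
  strictReturns?: boolean;
}

// Validate a creek method's return value.
// - No schema: pass the value through untouched.
// - Schema fails and strictReturns is true: throw, failing the row.
// - Schema fails otherwise: warn and pass the value through.
function checkReturn(
  def: MethodDef,
  value: unknown,
  warn: (msg: string) => void = console.warn,
): unknown {
  if (!def.returns) return value;
  if (def.returns.safeParse(value).success) return value;
  if (def.strictReturns) {
    throw new Error("creek return value failed schema validation");
  }
  warn("creek return value failed schema validation; passing it through");
  return value;
}
```

Because results are memoised per query, a thrown validation error for one key would surface on every row that references that key, which is the failure mode the warn-by-default choice avoids.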

Happy to chat through any of the above and tighten the POC before promoting it.

02 Bog Flow
OPEN · TRIAGED · IN PROGRESS · CLOSED

Closed

6/8/2026, 10:15:12 PM
