Issue
Bug · Open · Swamp CLI
Assignees: None

#636 model method run OOMs at 4GB V8 heap on long high-fan-out methods; non-configurable heap + crash leaves run stuck in "running"

Opened by john · 6/12/2026

Description

A swamp model method run (invoked as a workflow step) executing a long-running, high-fan-out extension method crashed after ~11.5 hours with "Fatal JavaScript out of memory: Ineffective mark-compacts near heap limit". Memory (RSS and V8 old-space) climbed steadily to the ~4GB V8 ceiling across ~3.7M HTTP calls (1.24M objects × HEAD + GET + PUT). Three swamp-core gaps compounded the problem:

  1. Heap ceiling is not configurable. The compiled binary runs with V8's default old-space limit (~4GB here); neither scripts/compile.ts nor deno.json passes --max-old-space-size (or any other --v8-flags), and there is no environment-variable passthrough. Long fan-out methods therefore hit a hard ceiling that users cannot tune.

  2. No streaming/batched fan-out primitive. A single method execution processing ~1.24M items must hold its working set for the method's entire lifetime; there is no checkpoint/stream/batch API to bound memory across a long run.

  3. OOM leaves the run record wedged (no reaper). The datastore lock self-heals via heartbeat + TTL (verified: after the crash, swamp datastore lock status reports "no lock held"), but the workflow run record stays in running permanently. There is no TTL or reaper on run status, so a crashed run is indistinguishable from a live one until it is corrected manually. (The stale record does not block re-runs.)
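A minimal sketch of the kind of primitive the second gap calls for, written in TypeScript to match swamp's Deno toolchain. `fanOutInBatches`, its options, and the `checkpoint` hook are hypothetical names, not an existing swamp API. Items are pulled lazily from an iterable, at most `batchSize` run at once, and nothing but a counter survives between batches, so peak memory tracks one batch rather than the whole ~1.24M-item working set (provided `work` itself retains nothing).

```typescript
// Hypothetical helper sketched for this issue; not an existing swamp API.
interface FanOutOptions<T> {
  // Maximum number of items in flight (and held in memory) at once.
  batchSize: number;
  // Optional hook run after each batch; a real version might persist a resume cursor.
  checkpoint?: (processed: number, lastItem: T) => void | Promise<void>;
}

async function fanOutInBatches<T>(
  items: Iterable<T> | AsyncIterable<T>,
  work: (item: T) => Promise<void>,
  opts: FanOutOptions<T>,
): Promise<number> {
  if (!Number.isInteger(opts.batchSize) || opts.batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  let processed = 0;
  let batch: T[] = [];

  const flush = async (): Promise<void> => {
    // Run the current batch concurrently; per-item results are not collected.
    await Promise.all(batch.map((item) => work(item)));
    processed += batch.length;
    const lastItem = batch[batch.length - 1];
    batch = []; // release references before the next batch fills
    await opts.checkpoint?.(processed, lastItem);
  };

  // Pull items lazily so the full input never has to be materialized.
  for await (const item of items) {
    batch.push(item);
    if (batch.length >= opts.batchSize) await flush();
  }
  if (batch.length > 0) await flush();
  return processed;
}
```

Batch-at-a-time is the simplest bounded scheme; its cost is idle slots while the slowest item in each batch finishes. A fixed-size worker pool keeps every slot busy under the same memory bound but takes more code.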

Steps to reproduce

  1. Author an extension method that fans out over ~1M+ items, making per-item HTTP requests.
  2. Run it via swamp model method run (or as a workflow step) against a real dataset.
  3. Watch RSS climb steadily; after enough iterations the process aborts with the V8 mark-compact OOM.
  4. Observe that the run record remains in running afterward (no terminal status is written).

Environment

  • swamp 20260527.193433.0-sha.ec40452d, macOS (darwin 25.5.0)
  • Workflow sync-events-archive-from-aws, method @swamp/digitalocean/space.sync_from_s3
  • ~1,245,694 source objects; crash at ~4GB heap after 11h35m
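For scale, a back-of-envelope estimate from the figures above. It assumes (these are not measurements) that all three requests per object were issued before the crash, that growth was roughly linear from a negligible baseline, and that "~4GB" means 4 GiB (2^32 bytes):

```latex
\[
\frac{4 \times 2^{30}\,\text{B}}{1{,}245{,}694 \times 3\ \text{calls}}
  = \frac{4{,}294{,}967{,}296\,\text{B}}{3{,}737{,}082\ \text{calls}}
  \approx 1.15\,\text{kB retained per call},
\qquad
\frac{4{,}294{,}967{,}296\,\text{B}}{41{,}700\,\text{s}\ (11\text{h}\,35\text{m})}
  \approx 103\,\text{kB/s} \approx 371\,\text{MB/h}.
\]
```

Roughly a kilobyte left behind per request is easy to miss in a short test run, which is consistent with the ceiling only being reached after hours rather than minutes.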

Suggested fixes

  • Expose a heap-size flag or environment-variable passthrough (e.g. for --max-old-space-size) for swamp model method run.
  • Provide a streaming/batched fan-out helper so methods need not retain their full working set.
  • When the method process dies, reap the run and mark it failed (a TTL/heartbeat on run status, mirroring the lock's self-healing) so stale running records don't persist.
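A sketch of the third fix, mirroring the datastore lock's heartbeat + TTL. The record shape (`RunRecord`, `lastHeartbeatMs`) and `reapStaleRuns` are hypothetical; swamp's actual run schema is not shown in this issue. The idea is that the live method process refreshes `lastHeartbeatMs` periodically, and any reader of run status can call the reaper before reporting.

```typescript
// Hypothetical run-record shape; swamp's real schema is not shown in the issue.
type RunStatus = "running" | "succeeded" | "failed";

interface RunRecord {
  id: string;
  status: RunStatus;
  // Refreshed periodically by the live method process (epoch milliseconds).
  lastHeartbeatMs: number;
  error?: string;
}

// Marks each "running" record whose last heartbeat is older than ttlMs as failed,
// mutating the records in place. Returns the ids that were reaped so the caller
// can persist and log them.
function reapStaleRuns(runs: RunRecord[], nowMs: number, ttlMs: number): string[] {
  const reaped: string[] = [];
  for (const run of runs) {
    if (run.status !== "running") continue;
    const silentMs = nowMs - run.lastHeartbeatMs;
    if (silentMs > ttlMs) {
      run.status = "failed";
      run.error = `no heartbeat for ${silentMs} ms (TTL ${ttlMs} ms); process presumed dead`;
      reaped.push(run.id);
    }
  }
  return reaped;
}
```

The reaper has to live outside the method process: a V8 fatal out-of-memory error aborts the process, so no in-process catch or cleanup handler gets a chance to write a terminal status. The TTL should span several heartbeat intervals so a briefly stalled but healthy run is not reaped, and a real implementation would need to write the status change atomically so it cannot race a late heartbeat.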

Notes

The triggering extension also contributed a per-iteration leak (undrained PUT response bodies across ~1.2M requests), which we are fixing separately. The swamp-core asks above are about making such a run survivable, tunable, and observable rather than a silent 11.5h march to a hard ceiling.
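The extension-side leak in the note above is outside the swamp-core asks, but for context, here is a minimal sketch of the usual fix with the standard Fetch API: settle every response body, even when only the status is needed. `putObject` and the injectable `FetchLike` type are hypothetical names, not the extension's actual code, and the request body is simplified to a string; `res.body.cancel()` is the standard `ReadableStream` method.

```typescript
// Hypothetical signature so the sketch can be exercised without a network.
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

async function putObject(fetchFn: FetchLike, url: string, body: string): Promise<number> {
  const res = await fetchFn(url, { method: "PUT", body });
  // Settle the body on every path, success or error. cancel() discards any
  // unread payload so the runtime can free the stream instead of holding it.
  await res.body?.cancel();
  if (!res.ok) {
    throw new Error(`PUT ${url} failed with HTTP ${res.status}`);
  }
  return res.status;
}
```

Cancelling before the status check matters: an early `throw` on a non-2xx response is exactly the path where an unread body is most often left behind.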

Bog Flow
Open → Triaged → In Progress → Shipped

Open: 6/12/2026, 9:23:46 AM
