01 Issue
Bug · Closed · Swamp CLI
Assignees: stack72

#708 Eager render stack (Ink/yoga-layout + marked-terminal) adds ~540ms to every command's startup

Opened by magistr · 6/19/2026

Summary

The terminal-render stack — Ink/React (+ the yoga-layout WASM engine) and marked-terminal (+ highlight.js) — is statically imported through the command tree, so it loads on every invocation, including commands that render no TUI and no markdown (--version, config list, model list, anything with --json, all non-interactive *search). It costs ~540 ms (~27%) of startup that the vast majority of commands never use.

Impact

Every non-rendering subcommand pays ~540 ms of import/instantiate cost for code it doesn't execute. Since the per-command telemetry tax is now small, this eager render stack is the single largest reducible component of the startup floor — and it's paid on essentially every invocation.

Evidence

1. Startup CPU attribution (deno run --v8-flags=--prof main.ts --version, node --prof-process, ~551 in-V8 ticks):

| dep | share of in-V8 startup | pulled in by |
| --- | --- | --- |
| yoga-layout (WASM, instantiated at import) | ~41% (228 ms) | Ink |
| highlight.js (registers ~190 languages) | ~26% (145 ms) | marked-terminal |
| parse5 + string-width + cli-highlight + cli-table3 + react + marked | ~13% | render stack |

~80% of in-V8 startup CPU is the render stack.

2. Module-stub POC — aliasing the four entry packages (ink, react, marked, marked-terminal) to no-op stubs in deno.json (dropping their transitive closure from the loaded graph):

| | --version median | npm modules in graph |
| --- | --- | --- |
| baseline (render stack eager) | 1963 ms | 201 |
| render stack removed | 1424 ms | 117 |
| delta | −539 ms (−27%) | −84 modules |

--version still printed correctly and exited 0 under the stubs, so this is the genuine cost of loading a stack --version never uses, not a crash-fast artifact. Stubbed runs were also far more stable (no multi-second tail).
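
The delta row can be re-derived from the two measured rows. The sketch below only redoes that arithmetic with the figures from the table; the type and function names are illustrative and do not come from the swamp codebase.

```typescript
// Figures copied from the stub POC table above; all names are illustrative.
interface StartupSample {
  medianMs: number;
  npmModules: number;
}

const baseline: StartupSample = { medianMs: 1963, npmModules: 201 };
const stubbed: StartupSample = { medianMs: 1424, npmModules: 117 };

/** Absolute and relative change going from `before` to `after`. */
function delta(before: number, after: number): { abs: number; pct: number } {
  const abs = after - before;
  return { abs, pct: (abs / before) * 100 };
}

const timeDelta = delta(baseline.medianMs, stubbed.medianMs);
const moduleDelta = delta(baseline.npmModules, stubbed.npmModules);

console.log(`${timeDelta.abs} ms (${timeDelta.pct.toFixed(1)}%)`); // -539 ms (-27.5%)
console.log(`${moduleDelta.abs} modules`); // -84 modules
```

The relative change is about 27.5%, which the table reports as −27%.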

Root cause

  • src/presentation/markdown_renderer.ts statically imports marked + marked-terminal (and instantiates them at module eval), and is pulled in by ~8 renderers.
  • The Ink .tsx renderers statically import ink/react; ink instantiates the yoga-layout WASM engine at import time (~228 ms) regardless of whether any TUI is shown.
  • All of these are reachable statically from the command tree built in src/cli/mod.ts, so they load for every command.

Proposed fix

Lazy-load the render stack so only commands that actually render pay for it (once, on first render):

  • Split each Ink .tsx renderer from its command definition; await import("./<renderer>.tsx") inside the .action() handler, gated on interactive-TTY mode (not --json, isatty). Keeps the Cliffy command tree light.
  • Make markdown_renderer.ts's render functions async and dynamic-import() marked/marked-terminal on first use.
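
A minimal sketch of both bullets, under stated assumptions: every name here (lazy, loadMarkdown, renderMarkdown, shouldLoadTui, runCommand) is hypothetical, and the loaders are inline stand-ins so the example runs without ink or marked installed. In real code the loader bodies would be dynamic import() calls for the actual renderer modules.

```typescript
// Hypothetical sketch: none of these names come from the swamp codebase.

/** Wrap a loader so the underlying module is requested at most once. */
function lazy<T>(loader: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = loader(); // first call pays the load cost
    }
    return cached; // later calls reuse the same promise
  };
}

/** Minimal shape a markdown module would need to expose for this sketch. */
interface MarkdownModule {
  render(markdown: string): string;
}

let markdownLoads = 0; // instrumentation for the sketch only

// Stand-in loader. A real version would dynamically import the renderer
// dependencies here instead of returning an inline stub.
const loadMarkdown = lazy<MarkdownModule>(async () => {
  markdownLoads++;
  return { render: (markdown: string) => markdown.trim() };
});

/** Async render function: the heavy dependency loads on first use only. */
async function renderMarkdown(markdown: string): Promise<string> {
  const mod = await loadMarkdown();
  return mod.render(markdown);
}

/** Flags assumed to be available inside a command's action handler. */
interface OutputMode {
  json: boolean;
  isTty: boolean;
}

/** Gate from the proposal: TUI only on an interactive TTY without --json. */
function shouldLoadTui(mode: OutputMode): boolean {
  return mode.isTty && !mode.json;
}

/** Action handler sketch: the TUI loader is only invoked behind the gate. */
async function runCommand(
  mode: OutputMode,
  loadTui: () => Promise<{ show(data: string): void }>,
  data: string,
): Promise<string> {
  if (!shouldLoadTui(mode)) {
    return JSON.stringify({ data }); // plain path never touches the loader
  }
  const tui = await loadTui();
  tui.show(data);
  return "rendered";
}
```

Caching the promise rather than the resolved module means concurrent first calls share one load instead of triggering several.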

Estimated recovery: ~540 ms (~27%) off startup for the ~80% of invocations that render neither a TUI nor markdown.

Environment

  • swamp current main (and shipped 20260619); the eager static imports are present in main
  • macOS, Intel x86_64
  • Measured from source via deno run (warm cache); the shipped binary embeds a snapshot but reaches the same ~1.9 s floor
02 Bog Flow

Closed

6/23/2026, 9:16:22 PM

03 Sludge Pulse
stack72 assigned stack72 6/23/2026, 8:56:25 PM

stack72 commented 6/23/2026, 9:16:21 PM

Thanks for the thorough analysis here @magistr — the profiling, the stub POC, and the module attribution breakdown are all really well done.

After looking at this, we've decided not to pursue this change. The render stack is deeply woven through the command tree (23 command files, 39 .tsx renderers, plus the markdown renderer chain), and the approaches available to us — dynamic imports, command tree restructuring, or swapping out the underlying libraries — all carry significant complexity and maintenance burden that we don't think justifies the payoff.

The ~540ms cost is real from source via deno run, but the compiled binary (which is what most users run) mitigates much of this through V8 snapshots. Our profiling was on Apple Silicon where the compiled binary runs in ~400ms total — we acknowledge your measurements on Intel x86_64 may tell a different story, but even so, the available fixes aren't a path we want to go down. We'd rather invest effort elsewhere.

Closing this one out — appreciate the detailed write-up.
