Loom Stack - durable async agent loops

Python · async · local-first

Loom Stack

Canonical hub for five Loom projects: stack-safe loops, a durable runtime, offline traces, and two proof apps. Each layer does one job. Wire them together when you need durability and inspectability. The stack is not a planner, not memory, and not AGI.


How the stack fits together

loom-tailcalls  →  loom-runner  →  flow-xray
  primitive          runtime       microscope
       \                \             \
        \                +---- loom-run / loom-ops proof apps
         +---- loom-stack docs hub
  1. Write explicit state transitions with stack-safe @tailrec.
  2. Run with SQLite checkpoint/resume, retries, and CLI inspection.
  3. Debug locally — one HTML file, no hosted tracing account.

Packages

loom-tailcalls

Stack-safe async transitions

Turn tail-position return await fn(...) into an async while loop. Keep the state-machine shape; lose the stack growth.

  • @tailrec and @tailstream
  • explain_tailcalls() for rejected shapes
  • Python 3.11+
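
To make the idea concrete, here is a minimal sketch of the trampoline technique that keeps a tail-recursive async function from growing the stack. It is not the loom-tailcalls implementation or API: `TailCall`, `trampoline`, and `countdown` are hypothetical names, and the sketch returns an explicit marker object instead of rewriting `return await fn(...)`.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass
class TailCall:
    """Hypothetical marker: 'continue with fn(*args)' instead of awaiting it recursively."""
    fn: Callable[..., Awaitable[Any]]
    args: tuple


async def trampoline(fn: Callable[..., Awaitable[Any]], *args: Any) -> Any:
    """Drive transitions in a while loop so the call stack stays flat."""
    result = await fn(*args)
    while isinstance(result, TailCall):
        result = await result.fn(*result.args)
    return result


async def countdown(n: int, steps: int = 0) -> Any:
    if n == 0:
        return steps                                # a plain value ends the loop
    return TailCall(countdown, (n - 1, steps + 1))  # next transition, no nested await


print(asyncio.run(trampoline(countdown, 100_000)))  # prints 100000
```

Written as `return await countdown(n - 1, steps + 1)`, the same function would nest one coroutine per step and hit `RecursionError` long before 100,000 steps.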

loom-runner

Checkpoint, resume, inspect

Durable async transition runner on SQLite. Idempotent managed tool calls, retry policies, and CLI commands to explain a run after the fact.

  • run / resume / explain
  • history, attempts, tool-calls
  • --trace trace.html via flow-xray

flow-xray

Local HTML execution traces

See LLM calls, tool calls, branches, errors, tokens, and cost in one offline HTML file. Use standalone or through loom-runner.

  • @trace decorator
  • Single-file HTML export
  • No cloud account
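
As a rough illustration of what "single-file HTML export" means, the sketch below collects events in memory and renders them into one standalone page. It does not use or mirror the flow-xray API: the `Trace` class and its `record` and `to_html` methods are hypothetical.

```python
import html
import json


class Trace:
    """Collects events in memory and renders them as one standalone HTML page."""

    def __init__(self) -> None:
        self.events: list[dict] = []

    def record(self, kind: str, name: str, **detail) -> None:
        self.events.append({"kind": kind, "name": name, "detail": detail})

    def to_html(self) -> str:
        rows = "\n".join(
            "<tr><td>{}</td><td>{}</td><td><pre>{}</pre></td></tr>".format(
                html.escape(e["kind"]),
                html.escape(e["name"]),
                html.escape(json.dumps(e["detail"], indent=2)),  # escape untrusted text
            )
            for e in self.events
        )
        return (
            "<!doctype html><html><head><meta charset='utf-8'>"
            "<title>trace</title></head><body><table border='1'>"
            "<tr><th>kind</th><th>name</th><th>detail</th></tr>\n"
            f"{rows}\n</table></body></html>"
        )


trace = Trace()
trace.record("llm_call", "plan", tokens=128)
trace.record("tool_call", "search", query="<b>loom</b>")
page = trace.to_html()
```

Writing `page` to disk (for example with `pathlib.Path("trace.html").write_text(page)`) yields a file that opens in any browser with no server and no account.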

loom-run

Dev stack showcase

Runnable reference product: durable chat agent on your repo, multi-agent supervisor, MCP tools, mock LLM for CI. Wires tailcalls + runner + flow-xray end-to-end.

  • loom-run chat / supervise
  • Checkpoint/resume + explain
  • --trace trace.html

loom-ops

Ops / runbook product

Incident and deploy runbooks: planner → parallel executor + verifier, HITL approve, allowlisted shell, workspace memory. Same loom-runner durability as loom-run.

  • loom-ops supervise / runbook
  • HITL approve + audit trail
  • Optional Telegram gateway
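
The combination of an allowlisted shell, a human approval gate, and an audit trail can be sketched as follows. This is not loom-ops code: `ALLOWED_COMMANDS`, `audit_log`, `run_guarded`, and the `approve` callback are hypothetical names chosen for the example.

```python
import shlex
import subprocess
from typing import Callable, Optional

ALLOWED_COMMANDS = {"uptime", "df", "echo"}  # hypothetical allowlist
audit_log: list[tuple[str, str]] = []        # (command, decision) pairs


def run_guarded(command: str, approve: Callable[[str], bool]) -> Optional[str]:
    """Run a command only if it is allowlisted and a human approves it."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        audit_log.append((command, "rejected"))
        raise PermissionError(f"not allowlisted: {command!r}")
    if not approve(command):
        audit_log.append((command, "denied"))
        return None
    audit_log.append((command, "approved"))
    # An argv list without shell=True means shell metacharacters are not interpreted.
    done = subprocess.run(argv, capture_output=True, text=True, check=False)
    return done.stdout
```

In an interactive session, `approve` could be as simple as `lambda cmd: input(f"run {cmd}? [y/N] ") == "y"`. Every decision, including rejections, lands in `audit_log`.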

MCP adapters (docs-memory, rule-based-verifier, ops-runtime) are listed in ECOSYSTEM.md.

Which product?

  • loom-run — dev assistant: read repo, search, tests.
  • loom-ops — ops runbooks: incident response, approve gates, audit.
  • loom-runner — library only: embed checkpoint/resume in your app.

Shared runtime

Both products use loom-runner + flow-xray. Choose the CLI and tools for your domain; do not mix dev chat with production runbooks in one binary.

Who it is for

  • Authors of long-running async agent loops who need checkpoint/resume without building their own store.
  • Teams that want inspectable runs (explain, attempt history) instead of a black box.
  • Indie builders who want a small Python slice before adopting a full orchestration platform.

Who it is not for

  • Single LLM call + single tool call workflows.
  • Teams already happy with LangGraph, Temporal, or similar persistence.
  • Anyone looking for reasoning, planning, memory, or an AGI product.

What changed

None of this was impossible before, but it used to mean adopting Temporal, LangGraph, or hosted tracing, or spending months on custom infra. Loom Stack makes a local-first durable agent runtime realistic for one builder.

Problem                        | Before                               | With Loom Stack
-------------------------------+--------------------------------------+-----------------------------------------
100k-step loop                 | RecursionError or messy manual while | @tailrec (tailcalls)
Crash on step 47               | Start over                           | loom-runner resume
What happened?                 | Log grep or cloud trace account      | explain + local HTML (flow-xray)
Retries duplicate side effects | Roll your own idempotency            | RunContext.call_tool
Everything offline             | Hard to assemble                     | Three pip packages, SQLite, one HTML file
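
The "retries duplicate side effects" problem is usually solved by storing each tool result under a stable key and replaying it on retry. The sketch below shows that idea with plain `sqlite3`. It does not reproduce `RunContext.call_tool`, whose signature is not documented here: `open_tool_log`, `call_tool_once`, and the `tool_calls` table are hypothetical.

```python
import json
import sqlite3


def open_tool_log(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tool_calls ("
        "run_id TEXT, call_key TEXT, result TEXT, PRIMARY KEY (run_id, call_key))"
    )
    return conn


def call_tool_once(conn, run_id, call_key, tool, *args):
    """Run tool(*args) at most once per (run_id, call_key); replay the result after that."""
    row = conn.execute(
        "SELECT result FROM tool_calls WHERE run_id = ? AND call_key = ?",
        (run_id, call_key),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # stored result, side effect is skipped
    result = tool(*args)
    with conn:
        conn.execute(
            "INSERT INTO tool_calls VALUES (?, ?, ?)",
            (run_id, call_key, json.dumps(result)),
        )
    return result


sent = []


def send_email(to):
    sent.append(to)  # stand-in for a real side effect
    return {"sent_to": to}


conn = open_tool_log()
first = call_tool_once(conn, "demo", "step-3:email", send_email, "ops@example.com")
retry = call_tool_once(conn, "demo", "step-3:email", send_email, "ops@example.com")
print(first == retry, len(sent))  # prints: True 1
```

This sketch still repeats the side effect if the process dies after the tool runs but before the row is written. Closing that gap needs an idempotency key that the external service itself honors.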

What you can build

The stack is a runtime primitive, not a finished product. These are concrete directions people can ship on top — without AGI claims.

Resumable local agent

Hours or days · crash-safe · explainable

LLM + tools on Ollama or an API. Run until budget or completion; kill the process; resume later. After an incident: explain plus --trace trace.html.

Before: custom checkpoint store, retries, and debug tooling.
Now: start from the loom-run showcase, or glue your model and tools onto AgentRunner. See also demo-loom-flow.

Inspectable automation

Not an LLM · long pipelines · no cloud

Poll APIs, transform, notify, repeat — as an explicit state machine. Every transition checkpointed; history and attempts when something breaks at 3am.

Before: Celery, cron scripts, or hope it does not crash.
Now: same @tailrec loop + loom-runner.
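
A poll, transform, notify pipeline written as an explicit state machine might look like the sketch below. It uses only the standard library and none of the Loom packages: the transition functions, the `TRANSITIONS` table, and `drive` are hypothetical, and the "API response" is a hard-coded stand-in.

```python
import asyncio


async def poll(state: dict):
    # Stand-in for an API call; a real transition would fetch data here.
    return "transform", {**state, "raw": [3, 1, 2]}


async def transform(state: dict):
    return "notify", {**state, "sorted": sorted(state["raw"])}


async def notify(state: dict):
    message = f"{len(state['sorted'])} items processed"
    return "done", {**state, "message": message}


TRANSITIONS = {"poll": poll, "transform": transform, "notify": notify}


async def drive(start: str, state: dict):
    """Run transitions until 'done', recording the name of each step taken."""
    history = []
    name = start
    while name != "done":
        history.append(name)  # a durable runner would checkpoint here
        name, state = await TRANSITIONS[name](state)
    return state, history


final_state, history = asyncio.run(drive("poll", {}))
print(history)  # prints ['poll', 'transform', 'notify']
```

Each transition returns the name of the next state plus the new state, so the full run is a list of small, inspectable steps rather than one opaque function.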

Agent regression lab

CI gate · compare runs · local traces

Golden scenarios with mock or real models. Compare checkpoint history and HTML traces across prompt or code changes — a verifiable gate, not vibes.

Before: bespoke harness + hosted observability.
Now: pytest + loom-runner + flow-xray in CI.
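
A golden-history gate can be as small as a diff function plus one pytest test. The sketch below assumes a run's history is available as a list of `(step_name, status)` tuples; that shape, `diff_histories`, and `GOLDEN` are assumptions made for the example, not loom-runner's actual history format.

```python
from itertools import zip_longest


def diff_histories(golden, actual):
    """Return human-readable mismatches between two step histories."""
    problems = []
    for i, (want, got) in enumerate(zip_longest(golden, actual), start=1):
        if want != got:
            problems.append(f"step {i}: expected {want!r}, got {got!r}")
    return problems


GOLDEN = [("plan", "ok"), ("search", "ok"), ("answer", "ok")]


def test_run_matches_golden():
    # In a real suite this list would be read from the run's checkpoint store.
    actual = [("plan", "ok"), ("search", "ok"), ("answer", "ok")]
    assert diff_histories(GOLDEN, actual) == []
```

Returning a list of mismatches instead of a boolean gives CI logs a per-step explanation when a prompt or code change alters the run.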

Audit-friendly runner

Every step stored · offline explain

Workflows where you must show what ran, how many times, and why it failed — SQLite checkpoints plus CLI inspection, no enterprise orchestration platform required.

Before: workflow engine procurement.
Now: narrow vertical on loom-runner.

Intermittent / offline loops

Bad network · checkpoint · continue later

Field sync, protocol sessions, or batch jobs that cannot assume a stable process or connection. Durable state machines that are not tied to LLMs at all.

Before: hand-rolled persistence per project.
Now: tailcalls shape + runner durability.

loom-ops — runbook ops agent

Product fork for incident/deploy automation where checkpoint/resume and audit matter more than 24/7 chat. Built on the same stack as loom-run.

Repo: github.com/kroq86/loom-ops · Stack map: docs/STACK.md

loom-ops supervise  →  planner + executor∥verifier  →  loom-runner  →  --trace trace.html

For dev-repo chat and MCP verifier tools, use loom-run instead.

loom-run — dev showcase

Reference product that wires the stack into a resumable chat agent — coordinator step, mock/OpenAI LLM, local tools, optional MCP, multi-agent supervisor (v0.2).

The smallest stack that lets a Python agent run for days, survive crashes, and show you exactly what happened — on your machine.

Official showcase: github.com/kroq86/loom-run

loom-run chat  →  coordinator (LLM + tools)  →  loom-runner  →  --trace trace.html

Extended kroq86 stack (optional MCP): agents_architecture pattern, rule-based-verifier, mcp-docs-memory.

Not a full port of agents_architecture — a portable shell with local fallbacks when MCP is unset.

Not a full agent framework

           | LangGraph / Temporal / …     | Loom Stack
-----------+------------------------------+------------------------------------
Scope      | Orchestration, graphs, infra | Focused primitives you compose
Philosophy | Platform                     | Libraries
Tracing    | Often hosted                 | Local HTML (flow-xray)
Claim      | Often broad                  | Runtime shape + durability + inspect

Quick start

# 1. Stack-safe loop shape
pip install loom-tailcalls

# 2. Durable run + inspect
pip install loom-runner

# 3. Optional local trace
pip install flow-xray

# Example (loom-runner)
loom-runner run examples/counter_agent.py --run-id demo --db runs.sqlite --max-steps 5
loom-runner resume examples/counter_agent.py --run-id demo --db runs.sqlite
loom-runner explain examples/counter_agent.py --run-id demo --db runs.sqlite
loom-runner run examples/counter_agent.py --run-id demo --trace trace.html

# Dev showcase (loom-run)
loom-run chat "hello" --run-id demo --db runs.sqlite --mock-llm
loom-run supervise "hello" --run-id team --db runs.sqlite --mock-llm

# Ops product (loom-ops)
loom-ops supervise "incident: API latency" --run-id inc-001 --db ops.sqlite --mock-llm

The integration lab in demo-loom-flow combines tailcalls + flow-xray (+ optional Ollama). For the dev demo, see loom-run. For ops runbooks, see loom-ops.

AGI and marketing

There is no direct AGI claim. Autonomous systems are modeled as long state machines: think → act → observe → checkpoint → resume. Loom Stack covers shape, durability, and local observability for that pattern. These are building blocks for agent runtimes, not a path to superintelligence.