design: thread-based agent scheduling with named workers #19

Closed
opened 2026-03-28 14:36:42 +00:00 by dev-bot · 0 comments
Collaborator

Context

The current cron-based scheduler runs each agent type independently every 5 minutes. This doesn't model the real constraints:

  • Anthropic API: stateless, fast context, but OAuth concurrency limits and rate limits
  • Local llama: stateful KV cache, slow context load, unlimited tokens. Context switches between agents thrash the cache — a 100k token dev session gets evicted when a reviewer loads, then must reload from scratch.
  • Cron environment: doesn't inherit Docker compose env vars (FORGE_TOKEN, FORGE_URL), causing silent failures that are hard to diagnose

Design

Three concepts

Thread — an inference slot tied to a backend with its own concurrency rules.

  • Anthropic thread: 1 slot, agents can overlap (stateless, fast context switch)
  • Llama thread: N slots (matching -parallel), agents within a slot can NOT overlap (sticky sessions, protect KV cache)
  • Each thread runs its own scheduling loop (started from entrypoint, inherits full container env — no more cron env issues)

Agent — a named worker with a Forgejo account and a profile repo (#761).

  • Has identity: name, model, capabilities, role (dev, review, gardener, etc.)
  • Created via disinto hire-an-agent (creates Forgejo user, seeds profile repo)
  • Profile repo holds persistent memory, learned patterns, track record
  • Agents are assigned to threads by the operator

Thread loop — the scheduler per inference source.

  • Replaces cron entries with a while sleep loop started from entrypoint
  • Inherits all compose env vars naturally (solves the cron env problem)
  • Scans for work (backlog issues, unreviewed PRs, gardener schedule)
  • Picks a free agent assigned to this thread
  • Respects thread concurrency rules (sticky for llama, overlapping for API)
  • Sends agent off, tracks PID, reaps on completion

Constraints

  • Never pair local dev with local review (quality too low). Local dev needs strong API review.
  • Llama threads: finish current session before switching agents (sticky — protect KV cache)
  • Anthropic threads: fire-and-forget, short sessions, respect rate limits
  • Operator manually assigns agents to threads (controls which models do what)
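The first constraint can be sketched as a small guard in the scheduler. `can_review` is a hypothetical helper, not existing code; the backend labels mirror the `backend` field of the thread config:

```shell
# Hypothetical pairing guard: never let a local model review a local
# model's work. Backend labels ("local"/"anthropic") mirror the thread
# config's backend field.
can_review() {  # can_review <author_backend> <reviewer_backend>
  if [ "$1" = "local" ] && [ "$2" = "local" ]; then
    return 1  # local dev needs strong API review
  fi
  return 0
}
```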

Agent lifecycle

disinto hire-an-agent "dev-qwen" --type dev --model qwen3.5 --thread llama
  1. Creates Forgejo user dev-qwen with API token
  2. Creates profile repo dev-qwen/dev-qwen-profile
  3. Seeds profile with role, model, capabilities
  4. Registers agent in thread config
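A hedged sketch of what steps 1 and 2 might call, assuming the Forgejo admin API (`POST /api/v1/admin/users` and `POST /api/v1/admin/users/{user}/repos`) and the FORGE_URL/FORGE_TOKEN env vars from the compose file. The email domain and the CURL override are illustrative, and steps 3-4 are left out because they depend on the project layout:

```shell
# Sketch of hire-an-agent against the Forgejo admin API. Assumptions:
# FORGE_URL/FORGE_TOKEN are set (compose env); the email domain is made up;
# CURL is overridable so the calls can be stubbed in tests.
hire_an_agent() {  # hire_an_agent <name> <role> <model>
  local name="$1" role="$2" model="$3" curl="${CURL:-curl}"
  # 1. Create the Forgejo user (a full version would also mint an API token)
  "$curl" -sf -X POST "$FORGE_URL/api/v1/admin/users" \
    -H "Authorization: token $FORGE_TOKEN" -H 'Content-Type: application/json' \
    -d "{\"username\":\"$name\",\"email\":\"$name@agents.local\",\"password\":\"$(head -c16 /dev/urandom | base64)\",\"must_change_password\":false}"
  # 2. Create the profile repo under that user
  "$curl" -sf -X POST "$FORGE_URL/api/v1/admin/users/$name/repos" \
    -H "Authorization: token $FORGE_TOKEN" -H 'Content-Type: application/json' \
    -d "{\"name\":\"$name-profile\",\"auto_init\":true}"
  # 3./4. Seeding the profile and registering the agent in the thread
  # config are project-specific and left to the caller.
  echo "hired $name role=$role model=$model"
}
```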

Thread config (in project TOML or factory config)

[threads.anthropic]
backend = "anthropic"
max_concurrent = 1
overlap = true    # agents can share the slot
agents = ["review-sonnet", "planner-sonnet"]

[threads.llama]
backend = "local"
base_url = "http://localhost:8080/v1"
max_concurrent = 2
overlap = false   # sticky — one agent per slot until done
agents = ["dev-qwen", "gardener-qwen"]

Status and visibility

Agent names appear in:

  • Logs: [dev-qwen #11] running implementation
  • PR titles/comments: created by dev-qwen, reviewed by review-sonnet
  • disinto status: shows which thread, which agent, which issue

What replaces cron

The entrypoint starts one loop per thread:

# Instead of cron
run_thread "anthropic" &
run_thread "llama" &
wait

Each run_thread is a bash loop that:

  1. Checks available slots
  2. Scans for work matching its agents' roles
  3. Spawns an agent if slot is free and work exists
  4. Sleeps (configurable interval per thread)
  5. Reaps finished agents, frees slots
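A minimal sketch of that loop. `scan_work` and `spawn_agent` are assumed helpers (find one unit of work / run an agent on it), slot tracking is just a bash array of PIDs, and a production version would also `wait` on finished PIDs to reap zombies:

```shell
# Sketch of run_thread: <name> <max_slots> [interval]. scan_work and
# spawn_agent are hypothetical helpers, not existing code.
run_thread() {
  local thread="$1" max_slots="$2" interval="${3:-60}"
  local -a pids=() live=()
  while true; do
    # 5. Reap finished agents, free their slots
    live=()
    for pid in "${pids[@]}"; do
      kill -0 "$pid" 2>/dev/null && live+=("$pid")
    done
    pids=("${live[@]}")
    # 1.-3. If a slot is free and work exists, spawn an agent
    if [ "${#pids[@]}" -lt "$max_slots" ]; then
      local work
      work="$(scan_work "$thread")"
      if [ -n "$work" ]; then
        spawn_agent "$thread" "$work" &   # send agent off
        pids+=("$!")                      # track its PID
      fi
    fi
    sleep "$interval"                     # 4. per-thread interval
  done
}
```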

Migration path

  1. Implement thread loops alongside cron (feature flag)
  2. Move dev + review to thread-based scheduling
  3. Once stable, remove cron entries
  4. Implement hire-an-agent for named workers
  5. Connect to agent profile repos (#761)
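Step 1's feature flag could be as small as an env-var switch in the entrypoint. DISINTO_THREADS is a hypothetical variable name, and `crond -f` stands in for however the cron entries are currently started:

```shell
# Hypothetical entrypoint switch: thread loops behind a flag, cron as the
# default until the loops are stable (migration steps 1-3).
entrypoint_main() {
  if [ "${DISINTO_THREADS:-0}" = "1" ]; then
    run_thread "anthropic" &
    run_thread "llama" &
    wait
  else
    crond -f   # legacy cron path
  fi
}
```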

Related

  • #761 — Agent profile repos (identity, memory, track record)
  • Current cron issues: env vars not inherited, FORGE_TOKEN empty, FORGE_URL clobbered, silent failures
Reference: johba/disinto#19