design: thread-based agent scheduling with named workers #19

Closed
opened 2026-03-28 14:36:42 +00:00 by dev-bot · 0 comments
Collaborator

Context

The current cron-based scheduler runs each agent type independently every 5 minutes. This doesn't model the real constraints:

  • Anthropic API: stateless, fast context, but OAuth concurrency limits and rate limits
  • Local llama: stateful KV cache, slow context load, unlimited tokens. Context switches between agents thrash the cache — a 100k token dev session gets evicted when a reviewer loads, then must reload from scratch.
  • Cron environment: doesn't inherit Docker compose env vars (FORGE_TOKEN, FORGE_URL), causing silent failures that are hard to diagnose

Design

Three concepts

Thread — an inference slot tied to a backend with its own concurrency rules.

  • Anthropic thread: 1 slot, agents can overlap (stateless, fast context switch)
  • Llama thread: N slots (matching -parallel), agents within a slot can NOT overlap (sticky sessions, protect KV cache)
  • Each thread runs its own scheduling loop (started from entrypoint, inherits full container env — no more cron env issues)

Agent — a named worker with a Forgejo account and a profile repo (#761).

  • Has identity: name, model, capabilities, role (dev, review, gardener, etc.)
  • Created via disinto hire-an-agent (creates Forgejo user, seeds profile repo)
  • Profile repo holds persistent memory, learned patterns, track record
  • Agents are assigned to threads by the operator

Thread loop — the scheduler per inference source.

  • Replaces cron entries with a while sleep loop started from entrypoint
  • Inherits all compose env vars naturally (solves the cron env problem)
  • Scans for work (backlog issues, unreviewed PRs, gardener schedule)
  • Picks a free agent assigned to this thread
  • Respects thread concurrency rules (sticky for llama, overlapping for API)
  • Sends agent off, tracks PID, reaps on completion

Constraints

  • Never pair local dev with local review (quality too low). Local dev needs strong API review.
  • Llama threads: finish current session before switching agents (sticky — protect KV cache)
  • Anthropic threads: fire-and-forget, short sessions, respect rate limits
  • Operator manually assigns agents to threads (controls which models do what)
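The first constraint can be sketched as a small guard in the scheduler. `can_review` is a hypothetical helper, not existing code; the backend labels mirror the `backend` field of the thread config:

```shell
# Hypothetical pairing guard: never let a local model review a local
# model's work. Backend labels ("local"/"anthropic") mirror the thread
# config's backend field.
can_review() {  # can_review <author_backend> <reviewer_backend>
  if [ "$1" = "local" ] && [ "$2" = "local" ]; then
    return 1  # local dev needs strong API review
  fi
  return 0
}
```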

Agent lifecycle

disinto hire-an-agent "dev-qwen" --type dev --model qwen3.5 --thread llama
  1. Creates Forgejo user dev-qwen with API token
  2. Creates profile repo dev-qwen/dev-qwen-profile
  3. Seeds profile with role, model, capabilities
  4. Registers agent in thread config
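A hedged sketch of what steps 1 and 2 might call, assuming the Forgejo admin API (`POST /api/v1/admin/users` and `POST /api/v1/admin/users/{user}/repos`) and the FORGE_URL/FORGE_TOKEN env vars from the compose file. The email domain and the CURL override are illustrative, and steps 3-4 are left out because they depend on the project layout:

```shell
# Sketch of hire-an-agent against the Forgejo admin API. Assumptions:
# FORGE_URL/FORGE_TOKEN are set (compose env); the email domain is made up;
# CURL is overridable so the calls can be stubbed in tests.
hire_an_agent() {  # hire_an_agent <name> <role> <model>
  local name="$1" role="$2" model="$3" curl="${CURL:-curl}"
  # 1. Create the Forgejo user (a full version would also mint an API token)
  "$curl" -sf -X POST "$FORGE_URL/api/v1/admin/users" \
    -H "Authorization: token $FORGE_TOKEN" -H 'Content-Type: application/json' \
    -d "{\"username\":\"$name\",\"email\":\"$name@agents.local\",\"password\":\"$(head -c16 /dev/urandom | base64)\",\"must_change_password\":false}"
  # 2. Create the profile repo under that user
  "$curl" -sf -X POST "$FORGE_URL/api/v1/admin/users/$name/repos" \
    -H "Authorization: token $FORGE_TOKEN" -H 'Content-Type: application/json' \
    -d "{\"name\":\"$name-profile\",\"auto_init\":true}"
  # 3./4. Seeding the profile and registering the agent in the thread
  # config are project-specific and left to the caller.
  echo "hired $name role=$role model=$model"
}
```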

Thread config (in project TOML or factory config)

[threads.anthropic]
backend = "anthropic"
max_concurrent = 1
overlap = true    # agents can share the slot
agents = ["review-sonnet", "planner-sonnet"]

[threads.llama]
backend = "local"
base_url = "http://localhost:8080/v1"
max_concurrent = 2
overlap = false   # sticky — one agent per slot until done
agents = ["dev-qwen", "gardener-qwen"]

Status and visibility

Agent names appear in:

  • Logs: [dev-qwen #11] running implementation
  • PR titles/comments: created by dev-qwen, reviewed by review-sonnet
  • disinto status: shows which thread, which agent, which issue

What replaces cron

The entrypoint starts one loop per thread:

# Instead of cron
run_thread "anthropic" &
run_thread "llama" &
wait

Each run_thread is a bash loop that:

  1. Checks available slots
  2. Scans for work matching its agents' roles
  3. Spawns an agent if slot is free and work exists
  4. Sleeps (configurable interval per thread)
  5. Reaps finished agents, frees slots
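A minimal sketch of that loop. `scan_work` and `spawn_agent` are assumed helpers (find one unit of work / run an agent on it), slot tracking is just a bash array of PIDs, and a production version would also `wait` on finished PIDs to reap zombies:

```shell
# Sketch of run_thread: <name> <max_slots> [interval]. scan_work and
# spawn_agent are hypothetical helpers, not existing code.
run_thread() {
  local thread="$1" max_slots="$2" interval="${3:-60}"
  local -a pids=() live=()
  while true; do
    # 5. Reap finished agents, free their slots
    live=()
    for pid in "${pids[@]}"; do
      kill -0 "$pid" 2>/dev/null && live+=("$pid")
    done
    pids=("${live[@]}")
    # 1.-3. If a slot is free and work exists, spawn an agent
    if [ "${#pids[@]}" -lt "$max_slots" ]; then
      local work
      work="$(scan_work "$thread")"
      if [ -n "$work" ]; then
        spawn_agent "$thread" "$work" &   # send agent off
        pids+=("$!")                      # track its PID
      fi
    fi
    sleep "$interval"                     # 4. per-thread interval
  done
}
```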

Migration path

  1. Implement thread loops alongside cron (feature flag)
  2. Move dev + review to thread-based scheduling
  3. Once stable, remove cron entries
  4. Implement hire-an-agent for named workers
  5. Connect to agent profile repos (#761)
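Step 1's feature flag could be as small as an env-var switch in the entrypoint. DISINTO_THREADS is a hypothetical variable name, and `crond -f` stands in for however the cron entries are currently started:

```shell
# Hypothetical entrypoint switch: thread loops behind a flag, cron as the
# default until the loops are stable (migration steps 1-3).
entrypoint_main() {
  if [ "${DISINTO_THREADS:-0}" = "1" ]; then
    run_thread "anthropic" &
    run_thread "llama" &
    wait
  else
    crond -f   # legacy cron path
  fi
}
```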

Related

  • #761 — Agent profile repos (identity, memory, track record)
  • Current cron issues: env vars not inherited, FORGE_TOKEN empty, FORGE_URL clobbered, silent failures
Reference: johba/disinto#19