Multi-Agent Coordination

Build agent swarms with real-time state sync, resource locking, and task handoffs. Coordinate multiple AI agents working on complex, interdependent tasks.

Plan Requirements

Multi-Agent Coordination features vary by plan. Task queues require Pro+; real-time events require Team+.

Overview

Multi-Agent Coordination enables multiple AI agents to work together on complex tasks. This includes:

  • Swarms — Groups of agents working toward a shared goal
  • Resource Locking — Prevent conflicts when modifying shared resources
  • Shared State — Synchronized key-value store across agents
  • Task Queues — Distributed work queues with claim/complete semantics
  • Task Monitoring — Stats, pagination, and incremental event updates
  • Real-time Events — Broadcast messages to other agents

Swarms

A swarm is a coordinated group of agents working together. Each swarm has a unique ID, shared state, and can coordinate via resource locks and task queues.

Creating a Swarm

rlm_swarm_create({
  name: "refactor-auth",
  goal: "Refactor authentication system to use JWT",
  maxAgents: 5
})
// Response:
{
  "swarmId": "swarm_abc123",
  "joinCode": "XKCD-1234"
}

Joining a Swarm

Other agents can join using the swarm ID or join code:

rlm_swarm_join({
  swarmId: "swarm_abc123",
  role: "worker"  // Optional: "coordinator", "worker", or "observer"
})

Swarm Roles

Role         Description              Typical Use
coordinator  Plans and assigns tasks  Senior agent that decomposes work
worker       Executes assigned tasks  Agents that write code, review, or test
observer     Read-only access         Monitoring, logging

Listing Swarm Members

Inspect who is currently in the swarm, what role they hold, and which agents are available for more work.

const members = await rlm_swarm_members({
  swarmId: "swarm_abc123"
})

Leaving or Removing an Agent

Remove an agent when work is done or when a crashed worker needs cleanup. Leaving a swarm releases held claims and unassigns that agent's queued work.

await rlm_swarm_leave({
  swarmId: "swarm_abc123",
  agentId: "worker-3"
})

Resource Locking

Prevent multiple agents from modifying the same resource simultaneously using claims:

Claiming a Resource

// Agent wants to modify a file
const claim = await rlm_claim({
  resource: "file:src/auth/login.ts",
  swarmId: "swarm_abc123",
  ttl: 300  // 5 minute lock
})
if (claim.acquired) {
  try {
    // Safe to modify the file
    // ... do work ...
  } finally {
    // Release even if the work throws
    await rlm_release({ claimId: claim.claimId })
  }
} else {
  // Another agent has this resource
  console.log("Held by:", claim.heldBy)
}

Claim Parameters

Parameter    Type     Description
resource     string   Resource identifier (e.g., "file:path", "db:table")
swarmId      string   Swarm context for the lock
ttl          number   Lock timeout in seconds (max 3600)
wait         boolean  Wait for lock vs immediate fail
waitTimeout  number   Max wait time in seconds
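
When a claim comes back with acquired set to false, one option is to retry a few times before moving on. The sketch below illustrates that idea only: claimWithRetry, its attempts and delayMs options, and the api object (assumed to expose rlm_claim as an async function returning the response shape shown above) are hypothetical names, not part of the documented tools.

```javascript
// Hypothetical helper: retries a non-blocking claim a few times before giving up.
// `api.rlm_claim` is assumed to resolve to { acquired, claimId, heldBy }.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function claimWithRetry(api, claimParams, { attempts = 3, delayMs = 1000 } = {}) {
  let claim = { acquired: false };
  for (let attempt = 1; attempt <= attempts; attempt++) {
    claim = await api.rlm_claim(claimParams);
    if (claim.acquired) return claim;
    // Back off a little longer after each failed attempt.
    if (attempt < attempts) await sleep(delayMs * attempt);
  }
  return claim; // last response, with acquired === false
}
```

The wait and waitTimeout parameters listed above may make a client-side loop like this unnecessary; this sketch only covers the case where the claim is made without waiting.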

Shared State

Agents in a swarm can read and write shared key-value state:

// Set state
await rlm_state_set({
  swarmId: "swarm_abc123",
  key: "current_phase",
  value: "implementation"
})
// Get state
const state = await rlm_state_get({
  swarmId: "swarm_abc123",
  key: "current_phase"
})
// state.value === "implementation"

Atomic Updates

For counters and other values that need atomic updates:

// Increment a counter atomically
await rlm_state_set({
  swarmId: "swarm_abc123",
  key: "files_processed",
  operation: "increment",
  delta: 1
})
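
To see why an atomic increment matters, compare it with a read-then-write sequence run by two agents at once. The sketch below uses a small in-memory stand-in for the shared store; FakeStore, readThenWrite, and demo are hypothetical and only mimic the asynchronous timing of remote calls, not the real service.

```javascript
// In-memory stand-in for shared state; every method is async like a remote call.
class FakeStore {
  constructor() {
    this.data = new Map();
  }
  async get(key) {
    return this.data.get(key) ?? 0;
  }
  async set(key, value) {
    this.data.set(key, value);
  }
  // The read and the write happen in one step, so no other caller can interleave.
  async increment(key, delta) {
    const next = (this.data.get(key) ?? 0) + delta;
    this.data.set(key, next);
    return next;
  }
}

// Non-atomic: another agent can write between our get and our set.
async function readThenWrite(store, key) {
  const current = await store.get(key);
  await store.set(key, current + 1);
}

async function demo() {
  const racy = new FakeStore();
  await Promise.all([
    readThenWrite(racy, "files_processed"),
    readThenWrite(racy, "files_processed"),
  ]);

  const atomic = new FakeStore();
  await Promise.all([
    atomic.increment("files_processed", 1),
    atomic.increment("files_processed", 1),
  ]);

  return {
    racy: await racy.get("files_processed"),
    atomic: await atomic.get("files_processed"),
  };
}
```

In this simulation both read-then-write callers read 0 and then write 1, so one update is lost: demo() resolves to { racy: 1, atomic: 2 }.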

Polling Multiple State Keys

Poll several shared-state keys at once and receive only the values that changed since your last known versions.

const changed = await rlm_state_poll({
  swarmId: "swarm_abc123",
  keys: ["current_phase", "files_processed", "blocked_count"],
  lastVersions: { current_phase: 2, files_processed: 18 }
})
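
A polling loop has to carry the latest versions forward into the next call. The response shape of rlm_state_poll is not shown here, so the sketch below assumes a hypothetical shape of { changes: { key: { value, version } } }; applyPollResult is likewise an illustrative name. Adapt the field names to whatever the tool actually returns.

```javascript
// Assumed (hypothetical) response shape: { changes: { key: { value, version } } }.
// Merges a poll result into the locally known values and versions without
// mutating the previous snapshot.
function applyPollResult(known, pollResult) {
  const values = { ...known.values };
  const lastVersions = { ...known.lastVersions };
  for (const [key, entry] of Object.entries(pollResult.changes ?? {})) {
    values[key] = entry.value;
    lastVersions[key] = entry.version;
  }
  return { values, lastVersions };
}

// Example: only files_processed changed since the last poll.
const before = {
  values: { current_phase: "implementation", files_processed: 18 },
  lastVersions: { current_phase: 2, files_processed: 18 },
};
const after = applyPollResult(before, {
  changes: { files_processed: { value: 19, version: 19 } },
});
// after.lastVersions can be passed as lastVersions on the next poll.
```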

Task Queues

Distribute work across agents using task queues. Tasks are claimed, processed, and completed:

Creating Tasks

// Coordinator creates tasks
await rlm_task_create({
  swarmId: "swarm_abc123",
  type: "implement",
  title: "Add JWT token generation",
  description: "Implement JWT signing in src/auth/jwt.ts",
  priority: "high",
  dependencies: []  // IDs of tasks that must complete first
})
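
The dependencies field implies that a task cannot be claimed until every task it lists has completed. A minimal sketch of that rule follows, assuming each task is a plain object with id, status, and dependencies. The readyTasks function and this object shape are illustrative, not the service's implementation.

```javascript
// Returns the pending tasks whose dependencies have all completed.
function readyTasks(tasks) {
  const completed = new Set(
    tasks.filter((t) => t.status === "completed").map((t) => t.id)
  );
  return tasks.filter(
    (t) =>
      t.status === "pending" &&
      (t.dependencies ?? []).every((id) => completed.has(id))
  );
}

// Hypothetical task IDs for illustration.
const exampleTasks = [
  { id: "t1", status: "completed", dependencies: [] },
  { id: "t2", status: "pending", dependencies: ["t1"] }, // ready: t1 is done
  { id: "t3", status: "pending", dependencies: ["t2"] }, // blocked: waits on t2
];
const readyIds = readyTasks(exampleTasks).map((t) => t.id);
```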

Creating Tasks in Bulk

Seed a backlog quickly by creating many tasks in one call. This is useful after planning, decomposition, or importing a queue from another system.

await rlm_task_bulk_create({
  swarmId: "swarm_abc123",
  tasks: [
    { title: "Add JWT utilities", priority: 2 },
    { title: "Write auth migration tests", priority: 1 },
    { title: "Update rollout checklist", priority: 0 }
  ]
})

Claiming and Completing Tasks

// Worker claims next available task
const task = await rlm_task_claim({
  swarmId: "swarm_abc123",
  types: ["implement", "fix"]  // Task types this agent handles
})
if (task) {
  // Do the work...
  // Mark complete with result
  await rlm_task_complete({
    taskId: task.id,
    status: "completed",
    result: { filesModified: ["src/auth/jwt.ts"] }
  })
}

Task States

State      Description
pending    Task created, waiting to be claimed
claimed    Agent is working on the task
completed  Task finished successfully
failed     Task failed (can be retried)
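
A worker can move a claimed task into completed or failed depending on whether its handler throws. The sketch below is one possible wrapper. It assumes rlm_task_complete also accepts status "failed", which the example above does not confirm. runTask and the api object are hypothetical names.

```javascript
// Hypothetical wrapper: runs `handler` for a claimed task and reports the outcome.
// Assumption: rlm_task_complete accepts status "failed" as well as "completed".
async function runTask(api, task, handler) {
  let outcome;
  try {
    outcome = { status: "completed", result: await handler(task) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    outcome = { status: "failed", result: { error: message } };
  }
  // Report exactly once, whichever way the handler ended.
  await api.rlm_task_complete({ taskId: task.id, ...outcome });
  return outcome.status;
}
```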

Unclaiming a Stuck Task

If an agent crashes or abandons a task after claiming it, return the task to pending so another worker can pick it up.

await rlm_task_unclaim({
  swarmId: "swarm_abc123",
  taskId: "task_dead_worker",
  reason: "worker process terminated before completion"
})

Recovering Stale Tasks in Batch

Use batch recovery to scan a swarm for claimed or in-progress tasks that have gone stale. Start with dryRun: true to preview what would be recovered.

const preview = await rlm_task_recover({
  swarmId: "swarm_abc123",
  stuckThresholdMinutes: 30,
  dryRun: true
})
await rlm_task_recover({
  swarmId: "swarm_abc123",
  stuckThresholdMinutes: 30,
  dryRun: false
})

Task Statistics

Get aggregated task counts for dashboards and progress tracking. Distinguishes between blocked (waiting on dependencies) and pending (ready to claim).

rlm_task_stats({ swarmId: "swarm_abc123" })
// Response:
{
  "done": 15,        // Completed
  "in_progress": 3,  // Currently claimed
  "blocked": 2,      // Waiting on dependencies
  "pending": 5,      // Ready to claim
  "failed": 1,
  "total": 26
}
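
For a dashboard, these counts can be turned into a completion percentage. The sketch below uses the field names from the response above; completionPercent is a hypothetical helper name.

```javascript
// Share of tasks that are done, rounded to one decimal place.
function completionPercent(stats) {
  if (!stats.total) return 0; // avoid dividing by zero on an empty queue
  return Math.round((stats.done / stats.total) * 1000) / 10;
}

const stats = { done: 15, in_progress: 3, blocked: 2, pending: 5, failed: 1, total: 26 };
console.log(completionPercent(stats)); // 57.7
```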

Task List with Pagination

List tasks with cursor-based pagination for large task queues. Each returned task includes its owner (the agent that claimed or completed it) and an updated_at timestamp.

const page1 = await rlm_task_list({
  swarmId: "swarm_abc123",
  status: "completed",  // optional filter
  limit: 20
})
// Response: { tasks: [...], has_more: true, next_cursor: "..." }
// Get next page
const page2 = await rlm_task_list({
  swarmId: "swarm_abc123",
  cursor: page1.next_cursor
})
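
To walk an entire queue, the two calls above generalize into a loop that follows next_cursor until has_more is false. The sketch below is illustrative: listAllTasks, the maxPages guard, and the api object are hypothetical, and resending the original filters alongside the cursor is an assumption.

```javascript
// Collects every page of tasks by following next_cursor until has_more is false.
// Assumption: it is acceptable to resend the original filters with the cursor.
async function listAllTasks(api, params, maxPages = 100) {
  const all = [];
  let cursor;
  for (let page = 0; page < maxPages; page++) {
    const res = await api.rlm_task_list(cursor ? { ...params, cursor } : params);
    all.push(...res.tasks);
    if (!res.has_more) break;
    cursor = res.next_cursor;
  }
  return all;
}
```

The maxPages cap is a safety net so that a misbehaving cursor cannot keep the loop running indefinitely.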

Snapshot Task Listing

Use rlm_tasks when you want a simpler filtered list without cursor-based iteration. It's a good fit for quick inspections and lightweight dashboards.

const openTasks = await rlm_tasks({
  swarmId: "swarm_abc123",
  status: "pending",
  assignedTo: "worker-2",
  limit: 25
})

Task Events (Incremental Updates)

Get task status change events since a timestamp. Use for incremental progress reports ("5 tasks completed since last check").

// Get events from last 15 minutes
const since = new Date(Date.now() - 15*60*1000).toISOString()
const events = await rlm_task_events({
  swarmId: "swarm_abc123",
  since: since
})
// Response:
// { events: [{ event_type: "task_completed", task_id: "...", ... }], total: 5 }

Tool                  Use Case
rlm_task_bulk_create  Seed a swarm backlog in one call after planning or decomposition
rlm_tasks             Quick snapshot of tasks filtered by status or assigned agent
rlm_task_stats        Dashboard metrics, completion %, health checks
rlm_task_list         Build task tables, export reports, iterate all tasks
rlm_task_events       Incremental updates, "X tasks closed since last sync"
rlm_task_unclaim      Manually re-open a task after a worker crash or abandoned claim
rlm_task_recover      Find and recover stale claimed tasks in bulk with a dry-run option
rlm_task_update       Admin-level title, description, priority, or status changes
rlm_task_reassign     Move work to another agent when load balancing or recovering from failure
rlm_task_delete       Remove obsolete, test, or cancelled tasks from the queue
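
For the incremental reports that rlm_task_events supports, the returned events can be counted by type. The sketch below relies only on the event_type field shown in the task events response; summarizeEvents is a hypothetical helper, and "task_failed" is an assumed event type used purely for illustration.

```javascript
// Counts events per event_type, e.g. to report "2 tasks completed since last check".
function summarizeEvents(events) {
  const counts = {};
  for (const event of events) {
    counts[event.event_type] = (counts[event.event_type] ?? 0) + 1;
  }
  return counts;
}

// Sample data; "task_failed" is an assumed event type.
const sampleEvents = [
  { event_type: "task_completed", task_id: "t1" },
  { event_type: "task_completed", task_id: "t2" },
  { event_type: "task_failed", task_id: "t3" },
];
const summary = summarizeEvents(sampleEvents);
// summary.task_completed is 2, summary.task_failed is 1
```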

Updating a Task

Update queue metadata when the task itself changes. This is typically reserved for admins or coordinators.

await rlm_task_update({
  swarmId: "swarm_abc123",
  taskId: "task_auth_jwt",
  title: "Add JWT signing + verification",
  priority: 3
})

Reassigning a Task

Reassign a task when the current owner is overloaded, unavailable, or no longer the best fit. Use force only when you intentionally override an in-progress assignment.

await rlm_task_reassign({
  swarmId: "swarm_abc123",
  taskId: "task_auth_jwt",
  newAgentId: "worker-security",
  force: false
})

Deleting a Task

Delete obsolete or erroneous tasks from the queue. Keep this for cleanup and cancelled work rather than normal completion flow.

await rlm_task_delete({
  swarmId: "swarm_abc123",
  taskId: "task_bad_fixture",
  force: false
})

Real-time Events (Team+)

Broadcast messages to other agents in the swarm for real-time coordination:

// Broadcast event to swarm
await rlm_broadcast({
  swarmId: "swarm_abc123",
  event: "phase_complete",
  data: {
    phase: "planning",
    tasksCreated: 12
  }
})

Event Types

Event              Description              Use Case
task_available     New task added to queue  Wake idle workers
phase_complete     Major milestone reached  Coordinate phase transitions
resource_released  Lock released            Allow waiting agents to proceed
error              Critical error occurred  Alert other agents
custom             Application-specific     Any coordination need

Querying Swarm Events

Use rlm_swarm_events to inspect prior broadcast activity, filter by sender or event type, and build lightweight event timelines.

const recentEvents = await rlm_swarm_events({
  swarmId: "swarm_abc123",
  eventType: "phase_complete",
  since: new Date(Date.now() - 60 * 60 * 1000).toISOString(),
  limit: 20
})

Example: Coordinated Refactoring

Here's a complete example of multiple agents working together to refactor a codebase:

Coordinator Agent

// Create swarm and plan work
const swarm = await rlm_swarm_create({
  name: "auth-refactor",
  goal: "Migrate from sessions to JWT"
})
// Create tasks for workers
await rlm_task_create({
  swarmId: swarm.swarmId,
  type: "implement",
  title: "Create JWT utilities",
  description: "Add sign/verify functions"
})
// ... more tasks ...
await rlm_broadcast({
  swarmId: swarm.swarmId,
  event: "tasks_ready",
  data: { count: 5 }
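})

Worker Agent

// Join swarm and work
await rlm_swarm_join({
  swarmId: "swarm_abc123",
  role: "worker"
})
while (true) {
  const task = await rlm_task_claim({
    swarmId: "swarm_abc123",
    types: ["implement"]
  })
  if (!task) break  // No more tasks
  // Claim resource lock before editing; wait up to 60 seconds for it
  const lock = await rlm_claim({
    resource: `file:${task.targetFile}`,
    swarmId: "swarm_abc123",
    ttl: 300,
    wait: true,
    waitTimeout: 60
  })
  if (!lock.acquired) {
    // Another agent still holds the file: return the task to the queue
    await rlm_task_unclaim({
      swarmId: "swarm_abc123",
      taskId: task.id,
      reason: `resource held by ${lock.heldBy}`
    })
    continue
  }
  try {
    // Do the implementation...
  } finally {
    // Release the lock even if the implementation throws
    await rlm_release({ claimId: lock.claimId })
  }
  await rlm_task_complete({ taskId: task.id, status: "completed" })
}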

Plan Limits

Feature           Starter  Pro  Team  Enterprise
Swarms            1        5    20    Unlimited
Agents/Swarm      2        5    15    50
Resource Locks    10       100  500   Unlimited
Task Queue        No       Yes  Yes   Yes
Real-time Events  No       No   Yes   Yes

Best Practices

  • Start simple — Use shared state before adding task queues
  • Always release locks — Use try/finally to ensure release
  • Set appropriate TTLs — Prevent deadlocks from crashed agents
  • Use task dependencies — Ensure correct execution order
  • Monitor swarm state — Check swarm status via the dashboard or API
  • Handle claim failures gracefully — Retry or work on other tasks
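
The advice to always release locks can be packaged as a small wrapper so that individual call sites cannot forget the finally block. The sketch below is one way to do it, not a documented tool: withClaim and the api object (assumed to expose rlm_claim and rlm_release as async functions with the shapes used in the Resource Locking section) are hypothetical.

```javascript
// Hypothetical helper: runs `work` only while holding the claim and always
// releases the claim afterwards, whether `work` succeeds or throws.
async function withClaim(api, claimParams, work) {
  const claim = await api.rlm_claim(claimParams);
  if (!claim.acquired) {
    // Someone else holds the resource; report who instead of throwing.
    return { ran: false, heldBy: claim.heldBy };
  }
  try {
    return { ran: true, result: await work() };
  } finally {
    // Runs on success and on error, so the lock never outlives the work.
    await api.rlm_release({ claimId: claim.claimId });
  }
}
```

A caller would pass the same parameters it would give rlm_claim, plus an async function containing the edit. When ran is false, the caller can retry later or pick up a different task.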
