Designed & built by Vladimir Kamenev · burademirung@gmail.com · (512) 336-9618 · github.com/burademirung/serval
Deployed on Cloudflare · serval-orchestrator.burademirung.workers.dev

A multi-agent control plane for IT service management.

A supervisor agent decomposes each request and dispatches scoped specialist agents — Triage, Access-Review, Onboarding — that operate Serval's system of record over the Model Context Protocol. Every agent is a Cloudflare Durable Object. Watch it orchestrate, live.

4 coordinated agents · 5 Durable Objects · 12 Serval MCP tools · 30 best practices applied · tiered Claude models (opus · sonnet · haiku)
01

The orchestration, streaming live

Pick a scenario. The supervisor Durable Object plans, delegates over RPC, and the chosen specialists run their own Claude tool-loops against the Serval MCP backend — every decision and tool call streamed to your browser over Server-Sent Events.

Agents: Supervisor (opus-4-8) · Triage (haiku-4-5) · Access-Review (sonnet-4-6) · Onboarding (sonnet-4-6)

Live runs call Claude through Cloudflare AI Gateway. The backend is a faithful in-Worker mock of Serval (12 tools); flip one env var to point at the real Serval MCP server.
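The mock-to-live switch could look roughly like the sketch below. Only SERVAL_MODE is named on this page; the values "mock" and "live", the SERVAL_MCP_URL variable, and the resolveBackend helper are assumptions made for illustration, not the project's actual configuration.

```typescript
// Hypothetical env shape: only SERVAL_MODE appears on this page.
interface Env {
  SERVAL_MODE?: string;    // assumed values: "mock" | "live"
  SERVAL_MCP_URL?: string; // assumed: URL of the real Serval MCP server
}

interface Backend {
  kind: "mock" | "live";
  url: string;
}

// Resolve which MCP endpoint the specialists should connect to.
function resolveBackend(env: Env, workerOrigin: string): Backend {
  if (env.SERVAL_MODE === "live") {
    if (!env.SERVAL_MCP_URL) {
      throw new Error("SERVAL_MODE=live requires SERVAL_MCP_URL");
    }
    return { kind: "live", url: env.SERVAL_MCP_URL };
  }
  // Anything else falls back to the in-Worker mock served at /mcp.
  return { kind: "mock", url: new URL("/mcp", workerOrigin).toString() };
}
```

Defaulting to the mock means a missing or misspelled mode never sends demo traffic to a real system of record.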

02

One supervisor, three specialists, a shared system of record

The hard part of a multi-agent system is coordination and context. Here, each agent is an independently-addressable Durable Object, so context isolation is enforced by the runtime — not by prompt discipline. The supervisor holds only references and distilled findings; specialists never leak their transcripts upward.

Supervisor Agent · Durable Object
plans · simplicity-gates · cheap routing via haiku · delegates via getAgentByName() RPC · synthesizes with opus-4-8
resilient parallel fan-out · Promise.allSettled · per-specialist 45 s timeout
Triage · haiku-4-5 · classify, prioritize, reply
Access-Review · sonnet-4-6 · evaluate JIT access
Onboarding · sonnet-4-6 · tickets, access, workflow
each runs its own Claude tool-loop · scoped to a least-privilege slice of tools · via this.mcp RPC transport
ServalMCP · McpAgent
12 tools over Streamable HTTP /mcp · stateful seeds · mock now ⇄ live Serval by env

Deployed on Cloudflare

One Worker · five Durable Objects · Claude via AI Gateway — no servers to manage, scales at the edge.

[Architecture diagram]
Nodes: Browser (console · EventSource); Worker (fetch · routing, static assets (SPA)); Access / Rate-limit (Cloudflare Access · WAF, recommended for prod); Supervisor (Durable Object · opus-4-8); Specialist Agents (Triage · Access-Review · Onboarding; 3 Durable Objects · haiku · sonnet); ServalMCP (McpAgent · Durable Object; 12 tools · public /mcp); AI Gateway (cache · retries · analytics); Claude (opus · sonnet · haiku).
Edge labels: HTTPS · guard · /api/run · fan-out · this.mcp · route · synthesize · tool-loop · HTTPS.
Legend: solid = request path; dashed = async / optional. Backend: mock ⇄ live via SERVAL_MODE.

Engagement

System of engagement

A Worker routes /api/run to a Supervisor Durable Object that streams the run back as SSE. Static assets serve this page from the edge.

Reasoning

Tiered models

Opus 4.8 supervises and synthesizes; Sonnet 4.6 and Haiku 4.5 do the specialist work. Strong where it matters, cheap on the leaves.

Record

System of record

The mock Serval backend is a real remote MCP server — point MCP Inspector or Claude Desktop at /mcp and call its 12 tools directly.

03

The agent registry

Least privilege is real, not cosmetic: each specialist is handed only its slice of the Serval toolset before any tool reaches Claude. A specialist literally cannot see — and therefore cannot call — a tool outside its scope.

Agent | Model | Role | Scoped tools
Supervisor | claude-opus-4-8 | Plan · delegate (RPC) · synthesize the final answer | Agent / delegate only
Triage | claude-haiku-4-5 | Classify & prioritize tickets, draft replies | list_tickets, get_ticket, update_ticket, post_message
Access-Review | claude-sonnet-4-6 | Evaluate JIT access against deterministic policy | list_access_requests, get_access_request, get_user, review_access_request
Onboarding | claude-sonnet-4-6 | New-hire flow: tickets, baseline access, workflow | get_user, create_ticket, create_access_request, list_workflows, run_workflow

Tools that mutate state are marked in red on the live page; they are flagged destructiveHint in MCP and traced on every call.
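A minimal sketch of the allowlist filter described in this section. The tool names come from the registry above; the ToolDef shape, the agent keys, and the scopeTools helper are hypothetical and may differ from the real code.

```typescript
// Hypothetical minimal tool descriptor; real MCP tools also carry schemas.
interface ToolDef {
  name: string;
  description: string;
}

// Allowlists copied from the registry above (agent keys are assumed).
const ALLOWLIST: Record<string, readonly string[]> = {
  triage: ["list_tickets", "get_ticket", "update_ticket", "post_message"],
  "access-review": [
    "list_access_requests", "get_access_request", "get_user", "review_access_request",
  ],
  onboarding: [
    "get_user", "create_ticket", "create_access_request", "list_workflows", "run_workflow",
  ],
};

// Return only the tools this agent may see. Unknown agents get nothing.
function scopeTools(agent: string, all: readonly ToolDef[]): ToolDef[] {
  const allowed = new Set(ALLOWLIST[agent] ?? []);
  return all.filter((tool) => allowed.has(tool.name));
}
```

Because the filter runs before the tool list is handed to the model, an out-of-scope tool is absent from the request rather than merely discouraged by the prompt.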

04

Everything used to build it

A deliberately current stack — the June-2026 frontier of agents, protocol, and edge platform, with every choice justified.

runtime
Cloudflare

Agents SDK on Durable Objects

Supervisor + specialists + the MCP backend are all Durable Objects. RPC, SQLite state, hibernation, and edge streaming come for free.

protocol
MCP · 2025-11-25

McpAgent backend

The Serval mock speaks the latest MCP — structuredContent results, readOnlyHint/destructiveHint annotations, and isError results the model can recover from.

model
Anthropic

Claude via AI Gateway

@anthropic-ai/sdk routed through Cloudflare AI Gateway: caching, retries, cost tracking, and reconnect buffering — with a hand-rolled, fully-traced tool loop.

orchestration
Pattern

Supervisor / workers

Dynamic decomposition, a 4-field delegation contract, a simplicity gate, cheap routing via Haiku for single-domain requests, and resilient fan-out via getAgentByName() + Promise.allSettled — one specialist failing never kills the run.
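The resilient fan-out can be sketched as follows. Each Specialist function here stands in for an RPC call to a specialist Durable Object; the Finding shape and the withTimeout and fanOut helpers are illustrative assumptions, not the project's code.

```typescript
interface Finding {
  agent: string;
  summary: string;
}

// Stand-in for a delegated specialist call (assumed signature).
type Specialist = (task: string) => Promise<Finding>;

// Reject if the promise has not settled within `ms` milliseconds.
// Note: this stops waiting; it does not cancel the underlying work.
function withTimeout<T>(p: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
}

// Run all specialists in parallel; collect successes and surface failures.
async function fanOut(
  specialists: Record<string, Specialist>,
  task: string,
  timeoutMs = 45_000,
): Promise<{ findings: Finding[]; errors: string[] }> {
  const names = Object.keys(specialists);
  const settled = await Promise.allSettled(
    names.map((name) => withTimeout(specialists[name](task), timeoutMs, name)),
  );
  const findings: Finding[] = [];
  const errors: string[] = [];
  settled.forEach((result, i) => {
    if (result.status === "fulfilled") {
      findings.push(result.value);
    } else {
      const reason = result.reason instanceof Error ? result.reason.message : String(result.reason);
      errors.push(`${names[i]}: ${reason}`);
    }
  });
  return { findings, errors };
}
```

Promise.allSettled never rejects, so a thrown error or a timeout in one specialist becomes an entry in errors while the other findings still reach synthesis.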

interface
Edge SSE

Streaming over the edge

The supervisor returns a ReadableStream of trace events, rendered live in this console. Workers cap CPU time rather than wall-clock duration, so the stream can stay open for as long as the client remains connected.
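One way to turn trace events into a Server-Sent Events response is sketched below. The TraceEvent shape and the helper names are assumptions; the sketch relies only on the standard ReadableStream, TextEncoder, and Response globals.

```typescript
// Hypothetical trace event shape.
interface TraceEvent {
  type: string;
  agent: string;
  detail: string;
}

// SSE wire format: a "data:" line followed by a blank line.
function formatSse(event: TraceEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Pull events from an async source and emit them as encoded SSE frames.
function traceStream(events: AsyncIterable<TraceEvent>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  const iterator = events[Symbol.asyncIterator]();
  return new ReadableStream<Uint8Array>({
    async pull(controller) {
      const { value, done } = await iterator.next();
      if (done) controller.close();
      else controller.enqueue(encoder.encode(formatSse(value)));
    },
    // Stop the producer if the browser disconnects.
    async cancel() {
      await iterator.return?.();
    },
  });
}

// Wrap the stream in a response the browser's EventSource can consume.
function sseResponse(events: AsyncIterable<TraceEvent>): Response {
  return new Response(traceStream(events), {
    headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" },
  });
}
```

Using pull rather than pushing eagerly lets backpressure from a slow client pause the producer instead of buffering the whole run in memory.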

quality
TDD

Typed & tested

Zod contracts at every boundary, pure tool operations unit-tested, scenario routing guarded, and a deterministic access policy with its own suite.

05

The best practices, and how each is implemented

Thirty researched practices from Anthropic's agent & context-engineering guidance, the MCP spec, OpenAI/LangChain operations, and the Cloudflare platform — every one mapped to a concrete mechanism in the code, not just an aspiration.

A Orchestration

Orchestrator–workers pattern: A supervisor dynamically decomposes each request and delegates to worker agents over RPC.
4-field delegation contract: Every task spec carries objective, output format, tool guidance, and explicit boundaries, the #1 fix for agents duplicating or missing work.
Simplicity gate + cheap routing: A Haiku model (MODEL_ROUTER) classifies intent cheaply; single-domain requests skip fan-out entirely. Full fan-out runs only when work genuinely spans domains.
Effort scaled to complexity: Per-agent step caps; a strong supervisor with cheaper specialist leaves.
Verbatim forwarding: Specialist findings pass through with fidelity where it matters, with no telephone-game distortion.
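The 4-field delegation contract might be expressed as below. The four fields are the ones named above; the property names, the hand-rolled validator (standing in for the Zod contracts the page mentions), and the prompt layout are assumptions for illustration.

```typescript
// The four fields of the delegation contract (property names assumed).
interface TaskSpec {
  objective: string;
  outputFormat: string;
  toolGuidance: string;
  boundaries: string;
}

const FIELDS = ["objective", "outputFormat", "toolGuidance", "boundaries"] as const;

// Validate untrusted input; reject specs with missing or empty fields.
function parseTaskSpec(input: unknown): TaskSpec {
  if (typeof input !== "object" || input === null) {
    throw new Error("task spec must be an object");
  }
  const record = input as Record<string, unknown>;
  const missing = FIELDS.filter((field) => {
    const value = record[field];
    return typeof value !== "string" || value.trim() === "";
  });
  if (missing.length > 0) {
    throw new Error(`task spec missing: ${missing.join(", ")}`);
  }
  return {
    objective: record.objective as string,
    outputFormat: record.outputFormat as string,
    toolGuidance: record.toolGuidance as string,
    boundaries: record.boundaries as string,
  };
}

// Render the spec as the task text handed to a specialist.
function renderTaskPrompt(spec: TaskSpec): string {
  return [
    `Objective: ${spec.objective}`,
    `Output format: ${spec.outputFormat}`,
    `Tool guidance: ${spec.toolGuidance}`,
    `Boundaries: ${spec.boundaries}`,
  ].join("\n");
}
```

Rejecting a spec without boundaries at the supervisor side keeps an under-specified task from ever reaching a specialist.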

B Context engineering

Runtime context isolation: Each specialist is a separate Durable Object, so isolation is enforced by the platform, not by prompting.
Lean supervisor: It holds only the plan and distilled findings, never a specialist's raw transcript or tool output.
Distilled returns: Specialists return a one-paragraph finding plus references, not payloads.
Context-rot defenses: Critical instructions at prompt edges; least-privilege tools keep each window small and high-signal.
Just-in-time data: Agents load Serval data through tools at runtime rather than pre-stuffing context.

C Tools & protocol

Faithful mock: Identical tool names, shapes, and error modes to real Serval; a true stand-in, swappable by env.
Structured tool output: structuredContent results per MCP 2025-11-25, validated client-side with Zod.
Tool annotations: readOnlyHint / destructiveHint on every tool so consumers show accurate intent.
Errors as results, not throws: isError results feed back so the agent self-corrects instead of crashing.
Curated, non-overlapping tools: Minimal per-specialist surface, with no ambiguity for the model to trip on.
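The "errors as results" practice can be sketched as a wrapper around a pure tool operation. The result fields (content, structuredContent, isError) follow the MCP tool-result shape; the toToolResult helper and the in-memory ticket lookup are hypothetical.

```typescript
// Subset of the MCP tool-result shape used in this sketch.
interface ToolResult {
  content: { type: "text"; text: string }[];
  structuredContent?: Record<string, unknown>;
  isError?: boolean;
}

// Run a tool operation; convert a thrown error into an isError result
// so the model sees the failure and can try something else.
function toToolResult(operation: () => Record<string, unknown>): ToolResult {
  try {
    const data = operation();
    return {
      content: [{ type: "text", text: JSON.stringify(data) }],
      structuredContent: data,
    };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { content: [{ type: "text", text: `Error: ${message}` }], isError: true };
  }
}

// Hypothetical in-memory data standing in for the mock backend.
const tickets = new Map<string, { id: string; title: string }>([
  ["T-1", { id: "T-1", title: "VPN not connecting" }],
]);

function getTicket(id: string): ToolResult {
  return toToolResult(() => {
    const ticket = tickets.get(id);
    if (!ticket) throw new Error(`ticket ${id} not found`);
    return ticket;
  });
}
```

A lookup for an unknown id returns a normal result with isError set, which goes back into the tool loop as context instead of aborting the run.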

D Safety & security

Deterministic policy boundary: decideAccess() is enforced server-side in the tool, so an agent can never be more permissive than policy.
Least-privilege scoping: Tools filtered to each specialist's allowlist before reaching the model.
Idempotent writes: Idempotency keys make every create safe under retry.
Secrets handling: Keys live in Wrangler secrets / .dev.vars, never in code, logs, or the bundle.
Annotations treated as untrusted: Tool metadata never drives privileged behavior.
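A minimal sketch of an idempotent create, assuming the caller supplies an idempotency key with each request. The TicketStore class, the id format, and the in-memory map are illustrative; the real backend presumably persists keys in Durable Object storage.

```typescript
interface Ticket {
  id: string;
  title: string;
}

// In-memory store keyed by idempotency key (hypothetical).
class TicketStore {
  private readonly byKey = new Map<string, Ticket>();
  private nextId = 1;

  // Create a ticket once per key; a retry with the same key
  // returns the original ticket instead of creating a duplicate.
  create(idempotencyKey: string, title: string): Ticket {
    const existing = this.byKey.get(idempotencyKey);
    if (existing) return existing;
    const ticket: Ticket = { id: `TCK-${this.nextId++}`, title };
    this.byKey.set(idempotencyKey, ticket);
    return ticket;
  }

  get count(): number {
    return this.byKey.size;
  }
}
```

With this in place, a model call that is retried at the gateway or a tool call that is repeated by the agent cannot open two tickets for the same new hire.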

E Observability & eval

OTel-shaped tracing: One trace per run (agent, tool, and result events), the very stream this console renders.
Structured output validation: Zod-validated findings and results at every boundary, with graceful fallback.
End-state evaluation: Scenarios graded on the final Serval state, not on a brittle exact trajectory.
Routing contract tests: Cheap structural evals guard which specialists each scenario engages.
AI Gateway analytics: Every model call is observable, cached, and retried at the gateway.

F Model & platform controls

Tiered model selection: Opus supervisor, Sonnet/Haiku specialists; model IDs in env, bumpable without code.
Graceful degradation: Fan-out uses Promise.allSettled with per-specialist 45 s timeouts; a failing specialist's error is surfaced and skipped, and the run continues. Synthesis falls back to a deterministic merge if the model call fails.
Mock-now / real-ready: A single SERVAL_MODE flip points the MCP client at real Serval.
Per-run isolation: Fresh specialist instances per run keep each demo clean and reproducible.
Zero-build edge deploy: Wrangler bundles and ships the whole stack: Worker, Durable Objects, and this page.
06

The safety boundary is code, not a prompt

The headline claim of an agentic ITSM tool is that it grants access safely. We don't trust the model to honor that. The Access-Review agent's decisions pass through a deterministic policy inside the tool itself — so even a jailbroken model cannot approve production or admin access.

decideAccess() · enforced in review_access_request
● approve: low-risk read, active user
● escalate: admin or production grant
● deny: inactive requester
// An agent decision can never be more permissive than policy allows.
const verdict = decideAccess({ resource, scope, requesterActive, isProduction, isAdmin });
const enforced = RANK[agentDecision] > RANK[verdict.decision];
record(enforced ? verdict.decision : agentDecision);  // policy wins, always
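A self-contained sketch of the clamp shown above. The three rules come from the list in this section; the RANK ordering (deny below escalate below approve), the "read" scope check, and the fallback to escalate for uncovered cases are assumptions, not the project's actual policy.

```typescript
type Decision = "deny" | "escalate" | "approve";

// Assumed ordering by permissiveness: higher means more permissive.
const RANK: Record<Decision, number> = { deny: 0, escalate: 1, approve: 2 };

interface AccessInput {
  resource: string;
  scope: string;
  requesterActive: boolean;
  isProduction: boolean;
  isAdmin: boolean;
}

// Deterministic policy: no model involved.
function decideAccess(input: AccessInput): { decision: Decision; reason: string } {
  if (!input.requesterActive) return { decision: "deny", reason: "inactive requester" };
  if (input.isAdmin || input.isProduction) {
    return { decision: "escalate", reason: "admin or production grant" };
  }
  if (input.scope === "read") {
    return { decision: "approve", reason: "low-risk read, active user" };
  }
  // Assumption: anything not explicitly covered goes to a human.
  return { decision: "escalate", reason: "not covered by auto-approve rule" };
}

// Clamp the agent's decision so it is never more permissive than policy.
function enforceDecision(agentDecision: Decision, input: AccessInput): Decision {
  const verdict = decideAccess(input);
  return RANK[agentDecision] > RANK[verdict.decision] ? verdict.decision : agentDecision;
}
```

The clamp is one-directional: an agent may be stricter than policy (deny where policy would approve) but never looser.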
07

How it was built

From a single URL to a deployed multi-agent system — research-first, spec-driven, and verified in the real app.

Research

Deep, fact-checked research

The product, the MCP API, the June-2026 agent & context-engineering frontier, and the Cloudflare platform — adversarially verified, then distilled into a design.

Design

Spec, then re-spec for Cloudflare

A full design spec — re-platformed when the Claude Agent SDK proved incompatible with Workers, onto the Cloudflare Agents SDK, McpAgent, and AI Gateway.

Build

Subagent-driven TDD

A phased plan executed task-by-task with two-stage review — every SDK shape verified against the installed packages, every boundary typed and tested.

Verify

Caught in the real app

A live run revealed the MCP client tags tools with a generated connection id, not the server name — silently hiding all 12 tools. Only running it for real surfaced it. Fixed, redeployed, verified.
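The failure mode can be illustrated with the sketch below. The TaggedTool shape, the serverId field name, and both helpers are hypothetical stand-ins rather than the SDK's actual types; the point is only that selecting by the connection id, plus a loud count check, prevents an empty tool list from passing silently.

```typescript
// Hypothetical shape: a tool tagged with the id of the connection
// it was discovered on (a generated value, not the server's name).
interface TaggedTool {
  name: string;
  serverId: string;
}

// Select tools by the id returned when the connection was opened.
function toolsForConnection(tools: readonly TaggedTool[], connectionId: string): TaggedTool[] {
  return tools.filter((tool) => tool.serverId === connectionId);
}

// Fail loudly if the expected number of tools is not visible.
function requireToolCount(tools: TaggedTool[], expected: number): TaggedTool[] {
  if (tools.length !== expected) {
    throw new Error(`expected ${expected} tools, found ${tools.length}`);
  }
  return tools;
}
```

Filtering the same list by a server name such as "serval" would return an empty array without any error, which is how all 12 tools disappeared.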