You're on hold.
Let the agents entertain you.
Cat-Herding AI is a multi-agent chat backend I built to explore what happens when an LLM “customer-service queue” isn't elevator music — it's a rotation of specialist AI agents whose job is to keep you entertained until the thing you were actually waiting for arrives. Jokes. GIFs. YouTube picks. A round of 20 Questions. A bedtime story if it's that kind of night.
Sign in through the bubble in the corner and the hold-flow bootstraps. The agents pass you around between themselves, each staying in character, each aware of the shared conversation — a small but honest multi-agent system with handoffs, goal-seeking, and rich tool use. The sign-in itself runs against my Rust OAuth2 Server over OAuth2 + PKCE, entirely in the browser.
The scenario: 20 minutes on hold
Pretend you called support, or you're queued for a long-running AI task, or a batch job kicks off at 3am. You have dead time. The classic product answer is “show a spinner.” The better product answer is “give people something to do.” This demo is me asking: what if the hold experience were itself the product?
Concretely, when you sign in on this page the widget goes into mode: 'demo' and the backend runs its hold-flow bootstrap: a welcome, an introduction of the agents on shift, and a proactive opener from whichever agent matches the moment. Every other page on this site mounts the same widget in mode: 'lean', so it sits quietly in the corner without pinging you until you click it.
The agents on shift
Clean-ish one-liners, callbacks to earlier jokes in the conversation, won't run out.
Tool-driven: picks a curated video based on your vibe, embeds it inline via youtube-nocookie.
Runs 20 Questions, Would You Rather, trivia. Keeps score. Routes from ‘play a game’ intents.
Short interactive fiction. Will not hallucinate images — that was a bug fix (PR #172, if you're curious).
Reacts with a curated GIF when the vibe calls for one. Attachment flows through as an inline image.
Picks which agent speaks next, issues `handoff_event` messages, keeps the conversation coherent across personas.
Under the hood these are separate prompt+tool bundles behind a router. Streaming runs over Socket.IO; each token arrives individually, so you see the response typed out. Handoffs surface as handoff_event frames and show up as a small “Transferring you to Agent Name…” message in the transcript.
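As a rough sketch, a client could fold those frames into a transcript like this. Only the handoff_event name comes from the description above; the token frame type and the field names (agent, text, to) are assumptions for illustration, not the backend's actual wire format.

```javascript
// Hypothetical frame shapes (assumed, not the real wire format):
//   { type: 'token', agent: 'Name', text: 'Hel' }
//   { type: 'handoff_event', to: 'Name' }
function applyFrame(transcript, frame) {
  if (frame.type === 'handoff_event') {
    // A handoff becomes a small system line in the transcript.
    return [
      ...transcript,
      { role: 'system', text: `Transferring you to ${frame.to}…` },
    ];
  }
  if (frame.type === 'token') {
    const last = transcript[transcript.length - 1];
    // Same agent still speaking: append the token to its current message.
    if (last && last.role === 'agent' && last.agent === frame.agent) {
      return [
        ...transcript.slice(0, -1),
        { ...last, text: last.text + frame.text },
      ];
    }
    // Otherwise start a new message for this agent.
    return [
      ...transcript,
      { role: 'agent', agent: frame.agent, text: frame.text },
    ];
  }
  // Unknown frame types leave the transcript unchanged.
  return transcript;
}
```

Because the function returns a new array instead of mutating, it can be called from whatever event handlers the Socket.IO client registers and handed straight to a render step.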
Goal-seeking without a monolith
Each agent has its own lane — jokes, games, videos, stories — but the system as a whole is goal-seeking: keep the user engaged, and track any explicit goal they mention (“I just need a status update”, “I actually want to book a meeting”) so the right escalation path is always one intent away. The router and the per-agent tools make it possible to add a new persona without teaching every other agent about it.
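One way to picture "add a persona without teaching every other agent about it" is a registry that the router consults. The sketch below is an assumption-heavy stand-in: the agent names, the intents field, and the keyword matching are all hypothetical, and the real router uses an LLM, not substring checks.

```javascript
// Hypothetical registry: each agent declares the intents it handles.
const agents = new Map();

function registerAgent(name, { intents, tools = [] }) {
  agents.set(name, { name, intents, tools });
}

// Stand-in router: returns the first registered agent whose intent
// keywords appear in the message, or the fallback when none match.
function route(message, fallback) {
  const text = message.toLowerCase();
  for (const agent of agents.values()) {
    if (agent.intents.some((keyword) => text.includes(keyword))) {
      return agent.name;
    }
  }
  return fallback;
}

registerAgent('jokes', { intents: ['joke', 'funny'] });
registerAgent('games', { intents: ['play a game', 'trivia'], tools: ['keepScore'] });

// Adding a persona is one more registration; existing agents are untouched.
registerAgent('stories', { intents: ['story', 'bedtime'] });
```

The point of the shape is that route only reads the registry, so no agent needs to know which other agents exist.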
I wrote about the broader pattern — small, cooperating, goal-aware agents instead of one giant prompt — in Goal-Seeking AI Architecture. This widget is one of the reference implementations.
What's going on under the hood
The widget runs the full Authorization Code + PKCE flow in the browser: a popup opens roauth2.cat-herding.net, the user signs in, the popup posts the code back, and a same-origin proxy on the chat backend exchanges it for a JWT. No client secret ships in the bundle.
Claude (via Azure AI Foundry) + OpenAI for routing, tools, and rich media. Each agent is a prompt + tool bundle; the orchestrator picks who speaks next and issues handoffs.
Rust OAuth2 server and chat backend both run in AKS behind Istio — managed certs, JWT-aware AuthorizationPolicies, RequestAuthentication binding the token audience to chat-backend.
The sign-in request path
1. You click the floating chat bubble, then Sign in. The widget generates a PKCE verifier and challenge in the browser and opens a popup.
2. The popup lands on roauth2.cat-herding.net/oauth/authorize. You authenticate (optionally via GitHub or Google federation) and approve the scopes.
3. The auth server redirects the popup to chat.cat-herding.net/embed/callback.html with an authorization code. The callback page sends the code back to the widget via postMessage and closes.
4. The widget POSTs {code, code_verifier} to a same-origin proxy on the chat backend, which forwards the request to the issuer's /oauth/token and strips refresh_token before handing the JWT back.
5. The widget stores the access token in sessionStorage and reconnects its Socket.IO client with the token in the handshake. Istio RequestAuthentication validates the JWT against the in-cluster JWKS before forwarding to the chat backend, and the hold flow starts.
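Step 4 is the one place the backend touches the tokens, so here is a hedged sketch of what such a proxy might do. The parameter names are the standard ones from RFC 6749 and RFC 7636; the function names and the exact set of fields forwarded are assumptions, since the backend's code is not shown here.

```javascript
// Build the form body a proxy would forward to the issuer's /oauth/token.
// Parameter names are the standard RFC 6749 / RFC 7636 ones.
function buildTokenRequest({ code, codeVerifier, clientId, redirectUri }) {
  return new URLSearchParams({
    grant_type: 'authorization_code',
    code,
    code_verifier: codeVerifier,
    client_id: clientId,
    redirect_uri: redirectUri,
  }).toString();
}

// Drop refresh_token so it never reaches the browser; every other
// field of the token response passes through unchanged.
function stripRefreshToken(tokenResponse) {
  const { refresh_token, ...safe } = tokenResponse;
  return safe;
}
```

Keeping the strip step as its own function makes the guarantee easy to check in isolation: whatever the issuer returns, the object handed to the widget has no refresh_token key.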
Embed it anywhere
The widget is a ~20 KB gzipped IIFE that mounts inside a Shadow DOM so host CSS can't bleed in. The same bundle is running on every page of this site right now — in lean mode elsewhere, demo mode here. Drop it into any site:
<script src="https://chat.cat-herding.net/embed/cat-herding-chat.js" defer></script>
<script>
  window.addEventListener('load', () => {
    window.CatHerdingChat.init({
      apiUrl: 'https://chat.cat-herding.net',
      // 'lean' stays quiet in the corner until clicked; 'demo' runs the hold flow after sign-in.
      mode: 'lean',
      auth: {
        type: 'oauth2',
        issuer: 'https://roauth2.cat-herding.net',
        clientId: 'cat-herding-chat-embed',
        scopes: 'openid profile email',
      },
    });
  });
</script>