Live demo

You're on hold.
Let the agents entertain you.

Cat-Herding AI is a multi-agent chat backend I built to explore what happens when an LLM “customer-service queue” isn't elevator music — it's a rotation of specialist AI agents whose job is to keep you entertained until the thing you were actually waiting for arrives. Jokes. GIFs. YouTube picks. A round of 20 Questions. A bedtime story if it's that kind of night.

Sign in through the bubble in the corner and the hold flow starts. The agents pass you around between themselves, each staying in character and each aware of the shared conversation: a small but honest multi-agent system with handoffs, goal-seeking, and rich tool use. The sign-in itself runs against my Rust OAuth2 Server over OAuth2 + PKCE, initiated entirely from the browser.

The scenario: 20 minutes on hold

Pretend you called support, or you're queued for a long-running AI task, or a batch job kicks off at 3am. You have dead time. The classic product answer is “show a spinner.” The better product answer is “give people something to do.” This demo is me asking: what if the hold experience were itself the product?

Concretely, when you sign in on this page the widget goes into mode: 'demo' and the backend runs its hold-flow bootstrap: a welcome, an introduction of the agents on shift, and a proactive opener from whichever agent matches the moment. Every other page on this site mounts the same widget in mode: 'lean', so it sits quietly in the corner without pinging you until you click it.
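As a rough illustration of that bootstrap, here is a minimal sketch assuming a hypothetical `holdFlowBootstrap(mode, context)` function, a trimmed roster, and a made-up time-of-day rule standing in for "matches the moment". The real backend's message shapes and selection logic aren't described here; the point is the ordering (welcome, roster, opener) and that lean mode sends nothing proactively.

```javascript
// Hypothetical, trimmed roster: `matches` stands in for whatever signal the
// real backend uses to pick an agent that fits the moment (time of day here).
const ROSTER = [
  { name: 'Story Teller', matches: ({ hour }) => hour >= 21 || hour < 6 },
  { name: 'Joke Teller', matches: () => true }, // fallback opener
];

// Returns the proactive messages to send right after sign-in.
function holdFlowBootstrap(mode, context) {
  // Lean mode stays quiet until the user opens the widget and speaks first.
  if (mode !== 'demo') return [];

  const opener = ROSTER.find((agent) => agent.matches(context));
  return [
    { from: 'Orchestrator', text: `Welcome, ${context.userName}. You're on hold, so let's make it fun.` },
    { from: 'Orchestrator', text: `On shift right now: ${ROSTER.map((a) => a.name).join(', ')}.` },
    { from: opener.name, kind: 'opener' }, // the chosen agent writes its own opener
  ];
}

console.log(holdFlowBootstrap('demo', { userName: 'Ada', hour: 23 }).map((m) => m.from));
```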

The agents on shift

Joke Teller

Clean-ish one-liners, callbacks to earlier jokes in the conversation, won't run out.

YouTube Guru

Tool-driven: picks a curated video based on your vibe, embeds it inline via youtube-nocookie.

Game Host

Runs 20 Questions, Would You Rather, trivia. Keeps score. Routes from ‘play a game’ intents.

Story Teller

Short interactive fiction. Will not hallucinate images — that was a bug fix (PR #172, if you're curious).

GIF Buddy

Reacts with a curated GIF when the vibe calls for one. Attachment flows through as an inline image.

Orchestrator

Picks which agent speaks next, issues `handoff_event` messages, keeps the conversation coherent across personas.
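For the YouTube Guru's inline embed, turning a picked video ID into a privacy-enhanced player might look like the sketch below. The helper names are hypothetical, and the 11-character ID check reflects the usual shape of YouTube IDs rather than a documented guarantee; only the youtube-nocookie host comes from the description above.

```javascript
// Assumes the common 11-character YouTube video ID format.
const VIDEO_ID = /^[A-Za-z0-9_-]{11}$/;

// Escape text before interpolating it into an HTML attribute.
function escapeAttr(value) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(value).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Hypothetical helper: build an inline player on the privacy-enhanced embed host.
function toNoCookieEmbed(videoId, title) {
  if (!VIDEO_ID.test(videoId)) throw new Error(`Unexpected video ID: ${videoId}`);
  const src = `https://www.youtube-nocookie.com/embed/${videoId}`;
  return `<iframe src="${src}" title="${escapeAttr(title)}" width="560" height="315" loading="lazy" allowfullscreen></iframe>`;
}
```

Validating the ID before building the URL keeps a model-generated string from smuggling arbitrary markup into the transcript.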

Under the hood these are separate prompt+tool bundles behind a router. Streaming runs over Socket.IO, and each token arrives individually, so you see the response typed out. Handoffs arrive as handoff_event frames and show up in the transcript as a short “Transferring you to Agent Name…” message.
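One way to picture the client side of that stream is a small reducer that folds incoming frames into a transcript. Only the `handoff_event` name comes from the text; the `token` and `message_end` frame types, their fields, and the `reduceTranscript` function are assumptions made for this sketch.

```javascript
// Hypothetical frame shapes (only `handoff_event` is named in the text):
//   { type: 'token', agent, text }    one streamed token
//   { type: 'message_end' }           the current agent finished its turn
//   { type: 'handoff_event', to }     the orchestrator moved the user on

// Mark the last message as finished, if one is still streaming.
function closeOpenMessage(transcript) {
  const last = transcript[transcript.length - 1];
  if (last && last.kind === 'message' && !last.done) {
    return [...transcript.slice(0, -1), { ...last, done: true }];
  }
  return transcript;
}

function reduceTranscript(transcript, frame) {
  const last = transcript[transcript.length - 1];
  switch (frame.type) {
    case 'token':
      // Append to the open message so the reply appears to type itself out.
      if (last && last.kind === 'message' && !last.done && last.agent === frame.agent) {
        return [...transcript.slice(0, -1), { ...last, text: last.text + frame.text }];
      }
      return [...closeOpenMessage(transcript), { kind: 'message', agent: frame.agent, text: frame.text, done: false }];
    case 'message_end':
      return closeOpenMessage(transcript);
    case 'handoff_event':
      return [...closeOpenMessage(transcript), { kind: 'system', text: `Transferring you to ${frame.to}…` }];
    default:
      return transcript; // ignore frame types this sketch doesn't know about
  }
}
```

Each Socket.IO listener would pass its payload through `reduceTranscript` and re-render. Keeping it a pure function makes the typing effect and the handoff lines testable without a live socket.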

Goal-seeking without a monolith

Each agent has its own lane — jokes, games, videos, stories — but the system as a whole is goal-seeking: keep the user engaged, and track any explicit goal they mention (“I just need a status update”, “I actually want to book a meeting”) so the right escalation path is always one intent away. The router and the per-agent tools make it possible to add a new persona without teaching every other agent about it.
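A minimal sketch of that routing idea, assuming keyword-based intents (a production router would more likely ask a model to classify the turn). `Router`, `GOAL_PATTERNS`, and the `escalate` flag are hypothetical names; the two example goals are the ones quoted above. Registering a persona touches only the router, and a stated goal is remembered across later turns.

```javascript
// Explicit goals a user might state; the patterns are illustrative only.
const GOAL_PATTERNS = [
  { goal: 'status_update', pattern: /\bstatus update\b/ },
  { goal: 'book_meeting', pattern: /\bbook (a )?meeting\b/ },
];

class Router {
  constructor() {
    this.agents = []; // { name, intents } per persona
    this.goal = null; // last explicit goal the user stated
  }

  // Adding a persona only updates this registry; other agents are untouched.
  register(name, intents) {
    this.agents.push({ name, intents });
    return this;
  }

  route(utterance) {
    const text = utterance.toLowerCase();
    const stated = GOAL_PATTERNS.find((g) => g.pattern.test(text));
    if (stated) {
      // Remember the goal and hand the turn to the orchestrator to escalate.
      this.goal = stated.goal;
      return { agent: 'Orchestrator', goal: this.goal, escalate: true };
    }
    const match = this.agents.find((a) => a.intents.some((intent) => text.includes(intent)));
    return { agent: match ? match.name : 'Orchestrator', goal: this.goal, escalate: false };
  }
}
```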

I wrote about the broader pattern — small, cooperating, goal-aware agents instead of one giant prompt — in Goal-Seeking AI Architecture. This widget is one of the reference implementations.

What's going on under the hood

OAuth2 + PKCE, all client-side

The widget runs the full Authorization Code + PKCE flow from the browser: a popup opens roauth2.cat-herding.net, the user signs in, the popup posts the code back, and a same-origin proxy on the chat backend exchanges it for a JWT. No client secret ships in the bundle.

Multi-LLM backend

Claude (via Azure AI Foundry) + OpenAI for routing, tools, and rich media. Each agent is a prompt + tool bundle; the orchestrator picks who speaks next and issues handoffs.

Kubernetes-native

The Rust OAuth2 server and the chat backend both run in AKS behind Istio, with managed certs, JWT-aware AuthorizationPolicies, and a RequestAuthentication that binds the token audience to chat-backend.
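To make the audience binding concrete: the mesh only lets a request through if the JWT came from the expected issuer and names chat-backend in its `aud` claim. The sketch below mirrors just those claim checks (plus expiry). It deliberately skips signature verification, which Istio's RequestAuthentication performs against the issuer's JWKS, so treat it as an illustration, not a gate to put in front of a service. The issuer string is assumed to match the `issuer` in the embed config.

```javascript
// Claims-only illustration. It does NOT verify the signature; in the real
// deployment Istio checks that against the issuer's JWKS before forwarding.
const EXPECTED_ISSUER = 'https://roauth2.cat-herding.net'; // assumed to equal the token's `iss`
const EXPECTED_AUDIENCE = 'chat-backend';

function decodeJwtPayload(token) {
  const parts = token.split('.');
  if (parts.length !== 3) throw new Error('Not a compact JWT');
  const b64 = parts[1].replace(/-/g, '+').replace(/_/g, '/');
  const padded = b64 + '='.repeat((4 - (b64.length % 4)) % 4);
  const bytes = Uint8Array.from(atob(padded), (c) => c.charCodeAt(0));
  return JSON.parse(new TextDecoder().decode(bytes));
}

function checkClaims(token, nowSeconds = Math.floor(Date.now() / 1000)) {
  const claims = decodeJwtPayload(token);
  // `aud` may be a single string or an array of strings (RFC 7519).
  const audiences = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  if (claims.iss !== EXPECTED_ISSUER) return { ok: false, reason: 'issuer' };
  if (!audiences.includes(EXPECTED_AUDIENCE)) return { ok: false, reason: 'audience' };
  if (typeof claims.exp !== 'number' || claims.exp <= nowSeconds) return { ok: false, reason: 'expired' };
  return { ok: true, subject: claims.sub };
}
```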

The sign-in request path

  1. You click the floating chat bubble, then Sign in. The widget generates a PKCE verifier and challenge in the browser and opens a popup.
  2. Popup lands on roauth2.cat-herding.net/oauth/authorize. You authenticate (optionally via GitHub or Google federation) and approve the scopes.
  3. The auth server redirects the popup to chat.cat-herding.net/embed/callback.html with an authorization code. Callback postMessages the code back to the widget and closes.
  4. Widget POSTs {code, code_verifier} to a same-origin proxy on the chat backend that forwards to the issuer's /oauth/token and strips refresh_token before handing the JWT back.
  5. Widget stores the access token in sessionStorage, reconnects its Socket.IO client with the token in the handshake, and Istio RequestAuthentication validates the JWT against the in-cluster JWKS before forwarding to the chat backend. Hold flow starts.
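Steps 2 through 4 can be sketched as three small functions. The endpoints, client ID, and scopes come from the steps and the embed config; the `{ type, code, state }` message shape posted by callback.html and the function names are hypothetical. The `state` round-trip is standard OAuth practice rather than something the steps spell out.

```javascript
const AUTHORIZE_URL = 'https://roauth2.cat-herding.net/oauth/authorize';
const REDIRECT_URI = 'https://chat.cat-herding.net/embed/callback.html';
const CALLBACK_ORIGIN = new URL(REDIRECT_URI).origin;

// Step 2: the URL the popup opens (standard RFC 6749 + RFC 7636 parameters).
function buildAuthorizeUrl({ clientId, scopes, state, codeChallenge }) {
  const url = new URL(AUTHORIZE_URL);
  url.search = new URLSearchParams({
    response_type: 'code',
    client_id: clientId,
    redirect_uri: REDIRECT_URI,
    scope: scopes,
    state,
    code_challenge: codeChallenge,
    code_challenge_method: 'S256',
  }).toString();
  return url.toString();
}

// Step 3: accept a code only from the callback origin and with the state we sent.
function readCallbackMessage(event, expectedState) {
  if (event.origin !== CALLBACK_ORIGIN) return null;
  const data = event.data || {};
  if (data.type !== 'oauth_callback' || data.state !== expectedState) return null;
  return typeof data.code === 'string' ? data.code : null;
}

// Step 4 (proxy side): drop the refresh token so it never reaches the browser.
function stripRefreshToken(tokenResponse) {
  const { refresh_token: _discarded, ...forBrowser } = tokenResponse;
  return forBrowser;
}
```

Checking `event.origin` matters because the widget runs on arbitrary host pages, where any frame could post a message to it.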

Embed it anywhere

The widget is a ~20 KB gzipped IIFE that mounts inside a Shadow DOM so host CSS can't bleed in. The same bundle is running on every page of this site right now — in lean mode elsewhere, demo mode here. Drop it into any site:

<script src="https://chat.cat-herding.net/embed/cat-herding-chat.js" defer></script>
<script>
  window.addEventListener('load', () => {
    window.CatHerdingChat.init({
      apiUrl: 'https://chat.cat-herding.net',
      mode: 'lean',
      auth: {
        type: 'oauth2',
        issuer: 'https://roauth2.cat-herding.net',
        clientId: 'cat-herding-chat-embed',
        scopes: 'openid profile email',
      },
    });
  });
</script>