Support Triage Agent — Architecture Review

Agent Topology

        User Request
              │
              ▼
     ┌─────────────────┐
     │  Orchestrator   │  Route + Coordinate
     └────────┬────────┘
              │
         ┌────┼────┐
         ▼    ▼    ▼
 ┌──────┐ ┌──────┐ ┌──────┐
 │Class-│ │  KB  │ │Draft-│
 │ifier │ │Fetch │ │  er  │
 └──────┘ └──────┘ └──────┘
    │         │        │
    └────┬────┘        │
         ▼             ▼
    [Category]  [Draft → HITL gate]

Pattern	Value
Architecture type	Orchestrator-Workers
Agent count	4 (1 orchestrator + 3 workers)
Orchestration style	Sequential fan-out with HITL gate at send
State management	External session store (Redis)

Autonomy Tier

Dimension	Classification
Action reversibility	Correctable before send; irreversible once sent
Human approval gate	Required before send
Failure blast radius	Low — draft only until approved
Recommended rollout	Shadow mode → gated (10%) → progressive
Overall autonomy level	L2 — Assisted
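
The gated (10%) stage of the rollout needs a stable way to decide which tickets enter the gated path. A minimal sketch follows, assuming the 10% refers to a share of tickets and that each ticket has a string ID; the function name `in_rollout` and the hashing scheme are illustrative, not part of the reviewed design.

```python
import hashlib


def in_rollout(ticket_id: str, percent: int) -> bool:
    """Deterministically place a ticket in the first `percent` of 100 buckets."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Hash the ID so the same ticket always lands in the same bucket.
    digest = hashlib.sha256(ticket_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


# Tickets outside the rollout stay in shadow mode (assumed behaviour).
mode = "gated" if in_rollout("ticket-123", 10) else "shadow"
```

Hashing the ticket ID rather than sampling randomly keeps a ticket in the same cohort across retries, which makes shadow and gated results easier to compare.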

Tool Boundary Map

Tool / Integration	Purpose	Scope	Risk
`classify_ticket`	Assign category and priority to inbound ticket	Read-only	Low
`search_kb`	Retrieve relevant knowledge base articles by category	Read-only	Low
`get_article`	Fetch full text of a specific KB article by ID	Read-only	Low
`draft_response`	Generate candidate reply from KB context and ticket	Read-only	Low
`send_response`	Deliver approved draft to customer via support channel	Write — irreversible	Medium — HITL gate required
`escalate_ticket`	Route ticket to human agent when classifier confidence is low	Write — correctable	Low
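
The boundary map above can be enforced in code rather than by convention. The sketch below mirrors the table as a policy registry and refuses a gated call that carries no approver. The registry layout, the `authorize` helper, and the `approved_by` parameter are assumptions for illustration; only the tool names and scopes come from the table.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Scope(Enum):
    READ_ONLY = "read-only"
    WRITE_CORRECTABLE = "write-correctable"
    WRITE_IRREVERSIBLE = "write-irreversible"


@dataclass(frozen=True)
class ToolPolicy:
    scope: Scope
    requires_hitl: bool = False


# One entry per row of the Tool Boundary Map.
TOOL_POLICIES = {
    "classify_ticket": ToolPolicy(Scope.READ_ONLY),
    "search_kb": ToolPolicy(Scope.READ_ONLY),
    "get_article": ToolPolicy(Scope.READ_ONLY),
    "draft_response": ToolPolicy(Scope.READ_ONLY),
    "send_response": ToolPolicy(Scope.WRITE_IRREVERSIBLE, requires_hitl=True),
    "escalate_ticket": ToolPolicy(Scope.WRITE_CORRECTABLE),
}


class ApprovalRequired(Exception):
    """Raised when a gated tool is called without human approval."""


def authorize(tool_name: str, approved_by: Optional[str] = None) -> ToolPolicy:
    """Return the tool's policy, or raise if the call is not allowed."""
    policy = TOOL_POLICIES.get(tool_name)
    if policy is None:
        raise KeyError(f"unknown tool: {tool_name}")
    if policy.requires_hitl and not approved_by:
        raise ApprovalRequired(f"{tool_name} needs human approval")
    return policy
```

Calling `authorize("send_response")` without an approver raises `ApprovalRequired`, so the gate holds even if the orchestrator's prompt logic is wrong.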

Design Decision Rationale

Decision	Choice made	Alternatives considered	Tradeoff
Architecture pattern	Orchestrator-Workers	Single agent with all tools, parallel fan-out	3 workers reduce per-agent tool count from 6 to 2; context stays focused. Adds coordination overhead.
KB retrieval strategy	Category-first search	Full-text semantic search across all articles	Category filter reduces retrieval noise by 60%; requires accurate classification first. Cascading dependency.
Response send gate	HITL approval before send	Automatic send with post-send review, no gate	Prevents sending incorrect responses; adds ~2 min latency. Acceptable for support SLA of 4 hours.
State persistence	External Redis session store	In-context state, filesystem state	Enables long-lived sessions across human review gaps; requires Redis ops; in-context state would overflow on complex tickets.

Risk Register

High

Classifier-to-KB dependency creates cascading failure path

If the classifier assigns the wrong category, KB search retrieves irrelevant articles, the drafter produces a hallucinated response, and the HITL gate becomes the only safety net. A miscalibrated classifier therefore silently degrades the quality of every downstream step.

Add classifier confidence threshold: below 0.7, escalate directly rather than proceeding to KB fetch. Instrument classifier accuracy in shadow mode before full rollout.
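
The threshold check can be a single routing function in the orchestrator. This is a sketch only: the function name `next_step` is hypothetical, and 0.7 is the value proposed above, not a calibrated figure.

```python
ESCALATION_THRESHOLD = 0.7  # proposed in this review; calibrate in shadow mode


def next_step(confidence: float, threshold: float = ESCALATION_THRESHOLD) -> str:
    """Pick the tool the orchestrator should call after classification."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    # Below the threshold, skip KB fetch and hand the ticket to a human.
    return "escalate_ticket" if confidence < threshold else "search_kb"
```

Keeping the threshold as a parameter lets shadow-mode analysis replay the same tickets at several cut-offs before one is fixed.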

Medium

No observability spine before shadow mode

The current design has no trace instrumentation. Shadow mode data will be uninterpretable without per-span telemetry covering classifier output, KB retrieval relevance, and draft quality signal.

Instrument with OpenTelemetry spans before shadow launch. Minimum: session span, classifier span (category + confidence), KB span (docs returned + relevance scores), drafter span (draft text).
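
To make the minimum span set concrete, the sketch below records the four spans with the attributes listed above. It uses a small in-memory recorder as a stand-in so the example runs without dependencies; it is not the OpenTelemetry API. The attribute names and placeholder values are assumptions.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List


@dataclass
class Span:
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)
    duration_s: float = 0.0


class TraceRecorder:
    """In-memory stand-in for a tracer; a real deployment would use an OpenTelemetry tracer."""

    def __init__(self) -> None:
        self.spans: List[Span] = []

    @contextmanager
    def span(self, name: str) -> Iterator[Span]:
        current = Span(name)
        start = time.perf_counter()
        try:
            yield current
        finally:
            # Spans are stored when they end, so children precede their parent.
            current.duration_s = time.perf_counter() - start
            self.spans.append(current)


def trace_ticket(recorder: TraceRecorder, ticket_id: str) -> None:
    """Emit the four minimum spans with placeholder values."""
    with recorder.span("session") as session:
        session.attributes["ticket_id"] = ticket_id
        with recorder.span("classifier") as classifier:
            classifier.attributes.update({"category": "billing", "confidence": 0.91})
        with recorder.span("kb_retrieval") as kb:
            kb.attributes.update({"docs_returned": 2, "relevance_scores": [0.8, 0.6]})
        with recorder.span("drafter") as drafter:
            drafter.attributes["draft_text"] = "placeholder draft"
```

Recording category and confidence on the classifier span is what later allows a bad draft to be traced back to a misclassification rather than to the drafter.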

Medium

HITL gate UI is unspecified

The architecture specifies that `send_response` requires HITL approval but does not specify the approval interface. If the interface defaults to email-based approval, reviewer latency is likely to exceed the 4-hour SLA on high-volume days.

Define the HITL interface explicitly: inline approval widget in the support platform is preferred. Batch approval mode for low-risk draft categories reduces per-ticket friction.

Architecture Recommendation

The Orchestrator-Workers pattern is the correct choice for this workload: 6 tools across 3 functional roles would create context pollution in a single-agent design, and sequential fan-out maps cleanly to the classify → retrieve → draft pipeline. The L2 autonomy tier (HITL gate on send) is appropriate because a response cannot be recalled once it has been sent to a customer.

Before proceeding to shadow mode, resolve three gaps: define the handoff contract between the classifier and the KB worker (currently a shared data dependency with no formal schema), specify the HITL approval interface rather than leaving it as a placeholder, and add the observability spine. None of these requires a redesign; they are implementation completeness items that can be addressed in the current sprint.

Open Questions

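What is the classifier confidence threshold below which the agent should escalate rather than retrieve? The design does not specify one; the 0.7 proposed in the Risk Register is a starting value to validate in shadow mode. The threshold determines the proportion of tickets that bypass the full pipeline.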

Is the KB article corpus versioned? If KB articles are updated after a draft is generated but before the HITL reviewer approves it, the draft may reference outdated information.
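
If the corpus is versioned, staleness can be checked at approval time. The sketch below assumes each article carries an integer version and that the draft records the versions it was built from; the `Draft` fields and the `stale_articles` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Draft:
    ticket_id: str
    text: str
    # Article ID -> version the drafter actually read (hypothetical field).
    article_versions: Dict[str, int]


def stale_articles(draft: Draft, current_versions: Dict[str, int]) -> List[str]:
    """Return IDs of cited articles that changed or disappeared since drafting."""
    return sorted(
        article_id
        for article_id, seen in draft.article_versions.items()
        if current_versions.get(article_id) != seen
    )


draft = Draft("ticket-1", "placeholder draft", {"kb-10": 3, "kb-22": 1})
changed = stale_articles(draft, {"kb-10": 4, "kb-22": 1})  # kb-10 was updated
```

A non-empty result could be surfaced to the reviewer or used to trigger a redraft; which of the two is appropriate is a product decision this review does not settle.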

What is the fallback when no KB articles match the classified category? The current architecture has no fallback path — the drafter would proceed with empty context and likely hallucinate.

Recommended Next Steps

1. Define classifier-to-KB handoff contract: specify the output schema (category, confidence, top-3 subcategories) and the confidence threshold for escalation vs. proceed.
2. Add observability spine before shadow mode launch: session span, classifier span, KB retrieval span, drafter span — minimum viable telemetry.
3. Specify the HITL approval interface: inline widget in support platform preferred; document batch approval mode for low-risk categories.
4. Add no-match fallback to KB worker: if zero articles match, route to `escalate_ticket` rather than proceeding to the drafter with empty context.
5. Run shadow mode for a minimum of 500 tickets before enabling HITL-gated send in any production segment.
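
Steps 1 and 4 can be sketched together: a typed classifier output that validates itself, and a post-retrieval check that escalates when the KB returns nothing. The class name `ClassifierOutput`, the helper `step_after_kb`, and the validation rules are assumptions; only the fields (category, confidence, top-3 subcategories) come from step 1.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ClassifierOutput:
    category: str
    confidence: float
    subcategories: Sequence[str] = ()

    def __post_init__(self) -> None:
        # Reject malformed output at the boundary instead of deep in the KB worker.
        if not self.category:
            raise ValueError("category must be non-empty")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        if len(self.subcategories) > 3:
            raise ValueError("at most 3 subcategories")


def step_after_kb(result: ClassifierOutput, article_ids: Sequence[str]) -> str:
    """Choose the next tool once KB search has returned for `result.category`."""
    if not article_ids:
        # No-match fallback: never draft from empty context.
        return "escalate_ticket"
    return "draft_response"
```

Validating in the constructor means a malformed classifier result fails before any KB call is made, which keeps the cascading dependency visible in traces.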