U3 Architecture Design: Semantic Router
Status: APPROVED. Design follows the existing CostAwareRouter layer pattern.
Date: 2026-06-14
Unit: U3 of P0 Production Hardening Plan
1. Design Goals
- Zero LLM cost for confident matches: When semantic similarity > 0.85, skip LLM classification (_classify_merged() or Layer 2) entirely
- Reduce LLM tokens for medium matches: When similarity is 0.6-0.85, pass a skill hint to _classify_merged() or Layer 2, reducing classification tokens
- Chinese-first: Embedding model must handle Chinese+English mixed text well
- Pre-computed skill embeddings: Compute at skill registration time, not query time
- Graceful degradation: If embedder fails, fall through to existing Layer 1/2 flow
2. Insertion Point Analysis
Current CostAwareRouter.route() flow:
Layer 0: Rule-based (zero cost)
  → explicit_skill / greeting / chat_mode / identity → return
Layer 1: Complexity classification
  → low (<0.3) → DIRECT_CHAT → return
  → medium (0.3-0.7) → _classify_merged() or IntentRouter → return
  → high (>0.7) → Layer 2
Layer 2: Capability matching / Auction
  → return
Semantic Router insertion: After Layer 1 complexity classification, at the start of both the medium and high branches
Layer 1: Complexity classification → complexity score
  → low (<0.3) → DIRECT_CHAT → return
  → medium (0.3-0.7):
      ┌─ Layer 1.5: Semantic Router (NEW) ──────────────────┐
      │ embed query → compare with skill embeddings         │
      │ sim > 0.85   → SKILL_REACT with matched skill       │
      │ sim 0.6-0.85 → pass skill_hint to _classify_merged  │
      │ sim < 0.6    → proceed to _classify_merged normally │
      └─────────────────────────────────────────────────────┘
  → high (>0.7):
      ┌─ Layer 1.5: Semantic Router (NEW) ──────────────────┐
      │ sim > 0.85   → SKILL_REACT with matched skill       │
      │ sim 0.6-0.85 → pass skill_hint to Layer 2           │
      │ sim < 0.6    → proceed to Layer 2 normally          │
      └─────────────────────────────────────────────────────┘
Why both medium AND high complexity? The plan says "when Layer 1 returns medium complexity (0.3-0.7), try semantic routing first." But semantic routing is also valuable for high complexity — if we can confidently match a skill at zero cost, we should. The cost saving is even greater for high complexity (which would use more expensive Layer 2 LLM calls).
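The branching described above can be sketched as a pure function of the two scores. This is an illustrative sketch only: `next_step` and the string labels it returns are hypothetical names, not part of CostAwareRouter, and a `similarity` of `None` is assumed to mean the semantic router is disabled or failed.

```python
from __future__ import annotations


def next_step(complexity: float, similarity: float | None) -> str:
    """Return a label for the step that handles the query after Layer 1.

    Thresholds follow the design: complexity 0.3 / 0.7, similarity 0.6 / 0.85.
    """
    if complexity < 0.3:
        # Low complexity never reaches the semantic router.
        return "DIRECT_CHAT"
    # Medium complexity falls back to merged classification, high to Layer 2.
    fallback = "_classify_merged" if complexity <= 0.7 else "layer2"
    if similarity is None or similarity < 0.6:
        return fallback
    if similarity > 0.85:
        return "SKILL_REACT"
    # 0.6 <= similarity <= 0.85: keep the LLM step but give it a hint.
    return f"{fallback}+skill_hint"
```

Writing the decision as one function makes the boundary behaviour explicit: a similarity of exactly 0.85 or exactly 0.6 lands in the medium band, which matches the "> 0.85" and "< 0.6" conditions in the diagram.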
3. Component Design
3.1 SkillEmbeddingIndex
class SkillEmbeddingIndex:
    """Pre-computed embedding index for registered skills."""

    def __init__(self, embedder: Embedder):
        self._embedder = embedder
        self._index: dict[str, tuple[list[float], str]] = {}  # skill_name → (embedding, source_text)

    async def build(self, skill_registry) -> None:
        """Build index from all registered skills."""
        ...

    async def update_skill(self, skill_name: str, skill) -> None:
        """Re-embed a single skill (on registration/update)."""
        ...

    def remove_skill(self, skill_name: str) -> None:
        """Remove a skill from the index."""
        ...

    async def search(self, query: str, top_k: int = 5) -> list[tuple[str, float]]:
        """Search for skills matching the query. Returns [(skill_name, similarity)]."""
        ...
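One way the index could work is a plain in-memory dictionary scanned with cosine similarity. The sketch below is a simplified stand-in, not the planned implementation: the `Embedder` protocol with an async `embed(text)` method is an assumed interface, `InMemorySkillIndex` takes source text directly instead of a skill object, and `KeywordEmbedder` is a toy used only to make the example runnable.

```python
from __future__ import annotations

import asyncio
import math
from typing import Protocol


class Embedder(Protocol):
    # Assumed interface; the real Embedder API may differ.
    async def embed(self, text: str) -> list[float]: ...


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemorySkillIndex:
    def __init__(self, embedder: Embedder):
        self._embedder = embedder
        self._index: dict[str, tuple[list[float], str]] = {}

    async def update_skill(self, skill_name: str, source_text: str) -> None:
        # Embed once at registration time and keep the vector in memory.
        vector = await self._embedder.embed(source_text)
        self._index[skill_name] = (vector, source_text)

    def remove_skill(self, skill_name: str) -> None:
        self._index.pop(skill_name, None)

    async def search(self, query: str, top_k: int = 5) -> list[tuple[str, float]]:
        if not self._index:
            return []  # nothing registered, so no embedding call is needed
        q = await self._embedder.embed(query)
        scored = [(name, cosine(q, vec)) for name, (vec, _) in self._index.items()]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_k]


class KeywordEmbedder:
    """Toy embedder for the demo: counts occurrences of three fixed words."""

    VOCAB = ("weather", "email", "calendar")

    async def embed(self, text: str) -> list[float]:
        lowered = text.lower()
        return [float(lowered.count(word)) for word in self.VOCAB]


async def demo() -> list[tuple[str, float]]:
    index = InMemorySkillIndex(KeywordEmbedder())
    await index.update_skill("weather_skill", "weather forecast | weather")
    await index.update_skill("mail_skill", "send email | email")
    return await index.search("what is the weather today", top_k=1)
```

Running `asyncio.run(demo())` returns the weather skill as the top match. A linear scan is likely sufficient while the number of registered skills is small; a vector index would only matter at much larger scale.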
3.2 SemanticRouter
from __future__ import annotations  # lets route() reference SemanticRouteResult before it is defined

from dataclasses import dataclass


class SemanticRouter:
    """Embedding-based semantic routing as Layer 1.5."""

    def __init__(
        self,
        embedder: Embedder,
        similarity_high: float = 0.85,
        similarity_low: float = 0.6,
    ):
        self._index = SkillEmbeddingIndex(embedder)
        self._similarity_high = similarity_high
        self._similarity_low = similarity_low
        self._enabled = True

    async def route(self, query: str) -> SemanticRouteResult:
        """Route a query using semantic similarity.

        Returns:
            SemanticRouteResult with:
            - confidence: "high" | "medium" | "low"
            - skill_name: matched skill name (None if low confidence)
            - similarity: cosine similarity score
        """
        ...


@dataclass
class SemanticRouteResult:
    confidence: str  # "high" | "medium" | "low"
    skill_name: str | None
    similarity: float
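A possible shape for the body of `route()` is shown below as a free function. It is a sketch under assumptions: `search` is a hypothetical stand-in for `SkillEmbeddingIndex.search`, `RouteResult` mirrors `SemanticRouteResult`, and returning a similarity of 0.0 on failure or an empty index is a choice made here, not something the design specifies.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

# Hypothetical stand-in for SkillEmbeddingIndex.search.
SearchFn = Callable[[str], Awaitable[list[tuple[str, float]]]]


@dataclass
class RouteResult:
    confidence: str  # "high" | "medium" | "low"
    skill_name: str | None
    similarity: float


async def route(query: str, search: SearchFn,
                high: float = 0.85, low: float = 0.6) -> RouteResult:
    try:
        matches = await search(query)
    except Exception:
        # Embedder or index failure: report low confidence so the caller
        # continues with the existing flow.
        return RouteResult("low", None, 0.0)
    if not matches:
        # No skills registered.
        return RouteResult("low", None, 0.0)
    name, sim = max(matches, key=lambda m: m[1])
    if sim > high:
        return RouteResult("high", name, sim)
    if sim >= low:
        return RouteResult("medium", name, sim)
    # Below the low threshold the match is not reported as a skill.
    return RouteResult("low", None, sim)
```

Keeping the failure handling inside `route()` means callers only ever see a `RouteResult`, so the low-confidence path and the error path behave identically downstream.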
4. Skill Embedding Source Text
Design Decision: What text to embed for each skill?
source_text = f"{skill.description} | {' '.join(skill.intent.keywords)} | {' '.join(cap.tag for cap in skill.capabilities)}"
Why this combination?
- description: Captures the semantic meaning of what the skill does
- intent.keywords: Captures the trigger phrases users might use
- capability tags: Captures the functional categories
Chinese consideration: Skill descriptions and keywords are often in Chinese. The embedding model must handle this well. bge-m3 is the default for this reason.
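The source-text formula can be wrapped in a small helper. In this sketch the `Skill`, `Intent`, and `Capability` dataclasses are hypothetical stand-ins that only mirror the attribute names used in the formula, and substituting `skill.name` for a missing description is one reading of the fallback listed under Edge Cases.

```python
from dataclasses import dataclass, field


@dataclass
class Capability:
    tag: str


@dataclass
class Intent:
    keywords: list[str] = field(default_factory=list)


@dataclass
class Skill:
    name: str
    description: str = ""
    intent: Intent = field(default_factory=Intent)
    capabilities: list[Capability] = field(default_factory=list)


def build_source_text(skill: Skill) -> str:
    """Build the text that is embedded for a skill."""
    # Assumption: an empty or whitespace-only description falls back to the name.
    description = skill.description.strip() or skill.name
    keywords = " ".join(skill.intent.keywords)
    tags = " ".join(cap.tag for cap in skill.capabilities)
    return f"{description} | {keywords} | {tags}"


weather = Skill(
    name="weather",
    description="查询天气",
    intent=Intent(keywords=["天气", "weather"]),
    capabilities=[Capability(tag="info")],
)
weather_text = build_source_text(weather)  # "查询天气 | 天气 weather | info"
```

Mixed Chinese and English text passes through unchanged, so the embedding model sees the same phrasing that users are likely to type.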
5. Integration into CostAwareRouter
5.1 Constructor Change
class CostAwareRouter:
    def __init__(self, ..., semantic_router: SemanticRouter | None = None):
        self._semantic_router = semantic_router
        ...
5.2 Route Method Modification
The key change is in route(), after Layer 1 complexity classification:
# After complexity is determined (medium or high)
skill_hint: str | None = None  # stays None unless a medium-confidence match is found
if self._semantic_router is not None and complexity >= 0.3:
    try:
        semantic_result = await self._semantic_router.route(clean_content)
        if semantic_result.confidence == "high":
            # Direct skill match: skip Layer 2
            result = await resolve_skill_routing(
                content=content,
                skill_registry=skill_registry,
                intent_router=intent_router,
                ...,
                force_skill=semantic_result.skill_name,  # NEW parameter
            )
            result.match_method = "semantic_high"
            result.match_confidence = semantic_result.similarity
            result.execution_mode = ExecutionMode.SKILL_REACT
            return result
        elif semantic_result.confidence == "medium":
            # Pass skill hint to _classify_merged() (Layer 1) or Layer 2
            skill_hint = semantic_result.skill_name
    except Exception as e:
        # Graceful degradation: continue with the existing Layer 1/2 flow
        logger.warning("Semantic routing failed, falling through: %s", e)
5.3 Skill Hint Propagation
For medium confidence matches, the skill hint is passed to _classify_merged() or _route_layer2() via a new skill_hint parameter. The hint gives the LLM a strong prior on the likely skill, which is intended to reduce the tokens spent on classification.
Implementation: Add skill_hint: str | None = None parameter to _classify_merged() and _route_layer2(). When provided, include it in the LLM prompt: "Based on semantic analysis, the query may relate to skill '{skill_hint}'. Please confirm or override."
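The hint injection can be sketched as a small prompt helper. `build_classification_prompt` and `base_prompt` are hypothetical names for illustration; only the hint wording comes from the design, and appending it after a blank line is an assumption about where it sits in the prompt.

```python
from __future__ import annotations

# Hint wording taken from the design text above.
HINT_TEMPLATE = (
    "Based on semantic analysis, the query may relate to skill '{skill_hint}'. "
    "Please confirm or override."
)


def build_classification_prompt(base_prompt: str, skill_hint: str | None = None) -> str:
    """Append the semantic hint to the prompt when one is available."""
    if not skill_hint:
        # No hint: the prompt is unchanged, so existing behaviour is preserved.
        return base_prompt
    return f"{base_prompt}\n\n{HINT_TEMPLATE.format(skill_hint=skill_hint)}"
```

Because the default is `None`, existing callers of the classification step would not need to change.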
6. Embedding Caching
Skill embeddings are pre-computed and cached in SkillEmbeddingIndex. Query embeddings are computed per-request but can be cached using the existing EmbeddingCache from agentkit.memory.embedder.
Design: The SemanticRouter uses an OpenAIEmbedder with EmbeddingCache for query embeddings. Skill embeddings are stored in SkillEmbeddingIndex and only re-computed on skill registration/update.
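To illustrate the effect of caching query embeddings, the sketch below wraps an embedder in a small LRU cache. It does not use or reproduce the API of the existing EmbeddingCache; `CachedQueryEmbedder`, `CountingEmbedder`, and the async `embed(text)` method are all hypothetical, chosen only to show that a repeated query costs one embedding call.

```python
from __future__ import annotations

import asyncio
from collections import OrderedDict


class CountingEmbedder:
    """Toy embedder that records how many times it was called."""

    def __init__(self) -> None:
        self.calls = 0

    async def embed(self, text: str) -> list[float]:
        self.calls += 1
        return [float(len(text)), 1.0]


class CachedQueryEmbedder:
    """LRU cache in front of an embedder, keyed by the stripped query text."""

    def __init__(self, inner, max_entries: int = 1024) -> None:
        self._inner = inner
        self._max_entries = max_entries
        self._cache: OrderedDict[str, list[float]] = OrderedDict()

    async def embed(self, text: str) -> list[float]:
        key = text.strip()
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        vector = await self._inner.embed(key)
        self._cache[key] = vector
        if len(self._cache) > self._max_entries:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return vector


async def demo() -> int:
    inner = CountingEmbedder()
    cached = CachedQueryEmbedder(inner)
    await cached.embed("查一下天气")
    await cached.embed("查一下天气 ")  # same key after stripping whitespace
    return inner.calls
```

Here `asyncio.run(demo())` reports a single underlying call for the two requests.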
7. Edge Cases
| Edge Case | Behavior |
|---|---|
| No skills registered | SkillEmbeddingIndex is empty, route() returns low confidence |
| Embedder API fails | Catch exception, return low confidence, fall through to existing flow |
| Skill has no description | Use skill.name as fallback source text |
| Chinese query, English skill description | bge-m3 handles cross-lingual matching |
| Multiple skills with similar embeddings | Return top match; if top_k > 1, could return alternatives (deferred) |
| Semantic router disabled (None) | Existing flow unchanged, zero overhead |
8. Test Strategy
- test_semantic_high_confidence: Query matches skill with sim > 0.85 → SKILL_REACT returned
- test_semantic_medium_confidence: Query matches skill with sim 0.6-0.85 → skill_hint passed
- test_semantic_low_confidence: Query has sim < 0.6 → normal routing proceeds
- test_semantic_router_disabled: No semantic_router → existing flow unchanged
- test_embedder_failure: Embedder throws error → falls through gracefully
- test_skill_registration_updates_index: New skill added → embedding computed
- test_chinese_query: Chinese query matches Chinese skill description
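The threshold tests above need queries with a known similarity to a skill. One way to get that without a real model is a fake embedder that returns 2-D unit vectors at a chosen angle, as sketched below. `FakeEmbedder`, `vector_with_similarity`, and the async `embed(text)` method are hypothetical test helpers, not existing project code.

```python
import asyncio
import math


def vector_with_similarity(target: float) -> list[float]:
    """Return a 2-D unit vector whose cosine similarity with [1.0, 0.0] is target."""
    if not -1.0 <= target <= 1.0:
        raise ValueError("cosine similarity must lie in [-1, 1]")
    return [target, math.sqrt(1.0 - target * target)]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class FakeEmbedder:
    """Returns canned vectors; raises for unknown text to simulate embedder failure."""

    def __init__(self, vectors: dict) -> None:
        self._vectors = vectors

    async def embed(self, text: str) -> list[float]:
        if text not in self._vectors:
            raise RuntimeError(f"no canned embedding for {text!r}")
        return self._vectors[text]


SKILL_VECTOR = [1.0, 0.0]
FAKE = FakeEmbedder({
    "skill text": SKILL_VECTOR,
    "high query": vector_with_similarity(0.95),    # above 0.85
    "medium query": vector_with_similarity(0.70),  # between 0.6 and 0.85
    "low query": vector_with_similarity(0.30),     # below 0.6
})


def similarity_for(query: str) -> float:
    """Similarity between a canned query and the single canned skill."""
    return cosine(asyncio.run(FAKE.embed(query)), SKILL_VECTOR)
```

The same fake covers test_embedder_failure, since any query without a canned vector raises.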
9. Argumentation Summary
| Design Choice | Alternatives Considered | Why This Choice |
|---|---|---|
| Layer 1.5 for both medium AND high | Only medium | High complexity benefits even more from zero-cost skill match |
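| Pre-computed skill embeddings | Compute per query | Embedding every skill at query time costs ~100ms × n_skills; with pre-computed embeddings each query needs one embedding call plus an in-memory similarity scan |
| bge-m3 default | text-embedding-3-small | Skill text mixes Chinese and English; bge-m3 is a multilingual embedding model with strong Chinese support |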
| Skill hint for medium confidence | Direct match for medium | Medium confidence isn't reliable enough for direct match; hint reduces LLM tokens without risking wrong routing |
| Separate SemanticRouter class | Inline in CostAwareRouter | Separation of concerns; testable independently; can be disabled without touching router |