# U3 Architecture Design: Semantic Router
> Status: APPROVED — Design follows existing CostAwareRouter layer pattern
> Date: 2026-06-14
> Unit: U3 of P0 Production Hardening Plan
---
## 1. Design Goals
1. **Zero LLM cost for confident matches**: When semantic similarity > 0.85, skip Layer 2 LLM classification entirely
2. **Reduce LLM tokens for medium matches**: When similarity 0.6-0.85, pass skill hint to Layer 2, reducing classification tokens
3. **Chinese-first**: Embedding model must handle Chinese+English mixed text well
4. **Pre-computed skill embeddings**: Compute at skill registration time, not query time
5. **Graceful degradation**: If embedder fails, fall through to existing Layer 1/2 flow
---
## 2. Insertion Point Analysis
### Current `CostAwareRouter.route()` flow:
```
Layer 0: Rule-based (zero cost)
  → explicit_skill / greeting / chat_mode / identity → return
Layer 1: Complexity classification
  → low (<0.3) → DIRECT_CHAT → return
  → medium (0.3-0.7) → _classify_merged() or IntentRouter → return
  → high (>0.7) → Layer 2
Layer 2: Capability matching / Auction
  → return
```
### Semantic Router insertion: **Between Layer 1 complexity classification and the medium/high branching**
```
Layer 1: Complexity classification → complexity score
  → low (<0.3) → DIRECT_CHAT → return
  → medium (0.3-0.7):
      ┌─ Layer 1.5: Semantic Router (NEW) ──────────────────┐
      │ embed query → compare with skill embeddings         │
      │ sim > 0.85   → SKILL_REACT with matched skill       │
      │ sim 0.6-0.85 → pass skill_hint to _classify_merged  │
      │ sim < 0.6    → proceed to _classify_merged normally │
      └─────────────────────────────────────────────────────┘
  → high (>0.7):
      ┌─ Layer 1.5: Semantic Router (NEW) ──────────────────┐
      │ sim > 0.85   → SKILL_REACT with matched skill       │
      │ sim 0.6-0.85 → pass skill_hint to Layer 2           │
      │ sim < 0.6    → proceed to Layer 2 normally          │
      └─────────────────────────────────────────────────────┘
```
**Why both medium AND high complexity?** The plan says "when Layer 1 returns medium complexity (0.3-0.7), try semantic routing first." But semantic routing is also valuable for high complexity — if we can confidently match a skill at zero cost, we should. The cost saving is even greater for high complexity (which would use more expensive Layer 2 LLM calls).
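
The branching in the diagrams above can be sketched as a pure function. This is an illustration only: `plan_route` and the string labels it returns are hypothetical names, not part of `CostAwareRouter`, and the boundary handling (0.85 and 0.6 both fall in the medium band) is an assumption based on the thresholds shown.

```python
from __future__ import annotations


def plan_route(
    complexity: float,
    similarity: float | None,
    semantic_enabled: bool = True,
) -> str:
    """Return a label for the routing path (hypothetical labels)."""
    if complexity < 0.3:
        return "DIRECT_CHAT"  # low complexity never reaches Layer 1.5

    # Medium complexity falls back to merged classification, high to Layer 2.
    downstream = "_classify_merged" if complexity <= 0.7 else "layer2"

    if not semantic_enabled or similarity is None or similarity < 0.6:
        return downstream  # low similarity: existing flow, unchanged
    if similarity > 0.85:
        return "SKILL_REACT"  # confident match: skip the LLM step
    return f"{downstream}+skill_hint"  # medium similarity: pass a hint along


print(plan_route(0.5, 0.9))
print(plan_route(0.9, 0.7))
```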
---
## 3. Component Design
### 3.1 SkillEmbeddingIndex
```python
class SkillEmbeddingIndex:
    """Pre-computed embedding index for registered skills."""

    def __init__(self, embedder: Embedder):
        self._embedder = embedder
        # skill_name → (embedding, source_text)
        self._index: dict[str, tuple[list[float], str]] = {}

    async def build(self, skill_registry) -> None:
        """Build index from all registered skills."""
        ...

    async def update_skill(self, skill_name: str, skill) -> None:
        """Re-embed a single skill (on registration/update)."""
        ...

    def remove_skill(self, skill_name: str) -> None:
        """Remove a skill from the index."""
        ...

    async def search(self, query: str, top_k: int = 5) -> list[tuple[str, float]]:
        """Search for skills matching the query. Returns [(skill_name, similarity)]."""
        ...
```
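
The body of `search()` is left open above. One possible shape, assuming the query has already been embedded and that similarity means cosine similarity (as the `SemanticRouteResult` docstring suggests), is a linear scan over the index. `cosine_similarity` and `search_index` are hypothetical helper names, and the two-dimensional vectors are toy data.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_index(
    query_embedding: list[float],
    index: dict[str, tuple[list[float], str]],
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Rank skills by similarity to the query, best first."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, (embedding, _source_text) in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy index: skill_name → (embedding, source_text)
index = {
    "weather": ([1.0, 0.0], "weather forecast"),
    "translate": ([0.0, 1.0], "translate text"),
}
results = search_index([1.0, 0.0], index, top_k=1)
print(results)
```

A linear scan is likely adequate while the number of registered skills is small; a vector index would only matter at a much larger scale.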
### 3.2 SemanticRouter
```python
from dataclasses import dataclass


# Defined before SemanticRouter so the return annotation on route() resolves.
@dataclass
class SemanticRouteResult:
    confidence: str  # "high" | "medium" | "low"
    skill_name: str | None
    similarity: float


class SemanticRouter:
    """Embedding-based semantic routing as Layer 1.5."""

    def __init__(
        self,
        embedder: Embedder,
        similarity_high: float = 0.85,
        similarity_low: float = 0.6,
    ):
        self._index = SkillEmbeddingIndex(embedder)
        self._similarity_high = similarity_high
        self._similarity_low = similarity_low
        self._enabled = True

    async def route(self, query: str) -> SemanticRouteResult:
        """Route a query using semantic similarity.

        Returns:
            SemanticRouteResult with:
            - confidence: "high" | "medium" | "low"
            - skill_name: matched skill name (None if low confidence)
            - similarity: cosine similarity score
        """
        ...
```
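
As a sketch of how `route()` might turn search results into a `SemanticRouteResult`, the function below maps the top match onto the three confidence bands. `to_route_result` is a hypothetical helper, and treating exactly 0.85 and 0.6 as medium is an assumption; the design only states `> 0.85`, `0.6-0.85`, and `< 0.6`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SemanticRouteResult:
    confidence: str  # "high" | "medium" | "low"
    skill_name: str | None
    similarity: float


def to_route_result(
    matches: list[tuple[str, float]],
    similarity_high: float = 0.85,
    similarity_low: float = 0.6,
) -> SemanticRouteResult:
    """Classify the best match; `matches` is sorted best first."""
    if not matches:
        # Empty index (no skills registered): always low confidence.
        return SemanticRouteResult("low", None, 0.0)

    skill_name, similarity = matches[0]
    if similarity > similarity_high:
        return SemanticRouteResult("high", skill_name, similarity)
    if similarity >= similarity_low:
        return SemanticRouteResult("medium", skill_name, similarity)
    # Low confidence: report the score but do not name a skill.
    return SemanticRouteResult("low", None, similarity)


print(to_route_result([("weather", 0.91), ("translate", 0.40)]))
```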
---
## 4. Skill Embedding Source Text
**Design Decision**: What text to embed for each skill?
```python
source_text = f"{skill.description} | {' '.join(skill.intent.keywords)} | {' '.join(cap.tag for cap in skill.capabilities)}"
```
**Why this combination?**
- `description`: Captures the semantic meaning of what the skill does
- `intent.keywords`: Captures the trigger phrases users might use
- `capability tags`: Captures the functional categories
**Chinese consideration**: Skill descriptions and keywords are often in Chinese. The embedding model must handle this well. `bge-m3` is the default for this reason.
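
A small sketch of the source-text construction, combined with the "no description" fallback to `skill.name` listed under Edge Cases. The `Skill`, `Intent`, and `Capability` dataclasses here are stand-ins that only mirror the attribute names used in the f-string above; the real skill types are not shown in this document.

```python
from dataclasses import dataclass, field


@dataclass
class Capability:  # stand-in type (assumption)
    tag: str


@dataclass
class Intent:  # stand-in type (assumption)
    keywords: list[str] = field(default_factory=list)


@dataclass
class Skill:  # stand-in type (assumption)
    name: str
    description: str = ""
    intent: Intent = field(default_factory=Intent)
    capabilities: list[Capability] = field(default_factory=list)


def build_source_text(skill: Skill) -> str:
    """Join description, keywords, and capability tags into one string to embed."""
    # Fall back to the skill name when the description is missing or blank.
    description = skill.description.strip() or skill.name
    keywords = " ".join(skill.intent.keywords)
    tags = " ".join(cap.tag for cap in skill.capabilities)
    return f"{description} | {keywords} | {tags}"


weather = Skill(
    name="weather",
    description="查询天气预报",
    intent=Intent(keywords=["天气", "forecast"]),
    capabilities=[Capability(tag="weather")],
)
print(build_source_text(weather))
```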
---
## 5. Integration into CostAwareRouter
### 5.1 Constructor Change
```python
class CostAwareRouter:
    def __init__(self, ..., semantic_router: SemanticRouter | None = None):
        self._semantic_router = semantic_router
        ...
```
### 5.2 Route Method Modification
The key change is in `route()`, after Layer 1 complexity classification:
```python
# After complexity is determined (medium or high)
skill_hint = None  # stays None unless the semantic match is medium confidence
if self._semantic_router is not None and complexity >= 0.3:
    try:
        semantic_result = await self._semantic_router.route(clean_content)
        if semantic_result.confidence == "high":
            # Direct skill match: skip Layer 2
            result = await resolve_skill_routing(
                content=content,
                skill_registry=skill_registry,
                intent_router=intent_router,
                ...,
                force_skill=semantic_result.skill_name,  # NEW parameter
            )
            result.match_method = "semantic_high"
            result.match_confidence = semantic_result.similarity
            result.execution_mode = ExecutionMode.SKILL_REACT
            return result
        elif semantic_result.confidence == "medium":
            # Pass skill hint to _classify_merged() or Layer 2
            skill_hint = semantic_result.skill_name
    except Exception as e:
        # Graceful degradation: continue with the existing Layer 1/2 flow
        logger.warning(f"Semantic routing failed, falling through: {e}")
```
### 5.3 Skill Hint Propagation
For medium confidence matches, the skill hint is passed to `_classify_merged()` or `_route_layer2()` via a new `skill_hint` parameter. The hint gives the LLM a strong prior, so the classification step only has to confirm or override one candidate, which is expected to reduce classification tokens.
**Implementation**: Add `skill_hint: str | None = None` parameter to `_classify_merged()` and `_route_layer2()`. When provided, include it in the LLM prompt: "Based on semantic analysis, the query may relate to skill '{skill_hint}'. Please confirm or override."
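
A minimal sketch of the prompt change, assuming the classification prompt is assembled as plain text. `build_classification_prompt` and the base instruction line are hypothetical placeholders; only the hint sentence is taken from the text above.

```python
from __future__ import annotations


def build_classification_prompt(query: str, skill_hint: str | None = None) -> str:
    """Assemble a classification prompt, appending the semantic hint if present."""
    lines = [
        "Classify the user query into one of the registered skills.",  # placeholder
        f"Query: {query}",
    ]
    if skill_hint:
        lines.append(
            f"Based on semantic analysis, the query may relate to skill "
            f"'{skill_hint}'. Please confirm or override."
        )
    return "\n".join(lines)


print(build_classification_prompt("明天上海天气怎么样", skill_hint="weather"))
```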
---
## 6. Embedding Caching
Skill embeddings are pre-computed and cached in `SkillEmbeddingIndex`. Query embeddings are computed per-request but can be cached using the existing `EmbeddingCache` from `agentkit.memory.embedder`.
**Design**: The `SemanticRouter` uses an `OpenAIEmbedder` with `EmbeddingCache` for query embeddings. Skill embeddings are stored in `SkillEmbeddingIndex` and only re-computed on skill registration/update.
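
To illustrate the idea of caching query embeddings, here is a small LRU wrapper around an async embed function. It is not the `EmbeddingCache` from `agentkit.memory.embedder`, whose interface is not shown in this document; `QueryEmbeddingCache` and `fake_embed` are hypothetical names used only for this sketch.

```python
import asyncio
from collections import OrderedDict
from typing import Awaitable, Callable


class QueryEmbeddingCache:
    """LRU cache keyed by query text (illustrative, not the agentkit class)."""

    def __init__(
        self,
        embed_fn: Callable[[str], Awaitable[list[float]]],
        max_size: int = 1024,
    ):
        self._embed_fn = embed_fn
        self._max_size = max_size
        self._cache: OrderedDict[str, list[float]] = OrderedDict()

    async def embed(self, text: str) -> list[float]:
        if text in self._cache:
            self._cache.move_to_end(text)  # mark as most recently used
            return self._cache[text]
        vector = await self._embed_fn(text)
        self._cache[text] = vector
        if len(self._cache) > self._max_size:
            self._cache.popitem(last=False)  # evict least recently used
        return vector


calls: list[str] = []


async def fake_embed(text: str) -> list[float]:
    """Stand-in embedder that records each call."""
    calls.append(text)
    return [float(len(text)), 1.0]


async def demo() -> tuple[list[float], list[float]]:
    cache = QueryEmbeddingCache(fake_embed, max_size=2)
    first = await cache.embed("查天气")
    second = await cache.embed("查天气")  # served from cache
    return first, second


first, second = asyncio.run(demo())
print(first == second, len(calls))
```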
---
## 7. Edge Cases
| Edge Case | Behavior |
|-----------|----------|
| No skills registered | `SkillEmbeddingIndex` is empty, `route()` returns low confidence |
| Embedder API fails | Catch exception, return low confidence, fall through to existing flow |
| Skill has no description | Use `skill.name` as fallback source text |
| Chinese query, English skill description | `bge-m3` handles cross-lingual matching |
| Multiple skills with similar embeddings | Return top match; if top_k > 1, could return alternatives (deferred) |
| Semantic router disabled (None) | Existing flow unchanged, zero overhead |
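
The "Embedder API fails" row can be sketched as a wrapper that converts any exception into a low-confidence result, so the caller continues with the existing flow. `route_or_low`, `broken_route`, and the dict-shaped result are hypothetical; the real code would return a `SemanticRouteResult`.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable

logger = logging.getLogger("semantic_router_sketch")

# Dict stand-in for a low-confidence SemanticRouteResult (assumption).
LOW_CONFIDENCE = {"confidence": "low", "skill_name": None, "similarity": 0.0}


async def route_or_low(
    route_fn: Callable[[str], Awaitable[dict[str, Any]]],
    query: str,
) -> dict[str, Any]:
    """Call the semantic route; on any failure, degrade to low confidence."""
    try:
        return await route_fn(query)
    except Exception as exc:
        logger.warning("Semantic routing failed, falling through: %s", exc)
        return dict(LOW_CONFIDENCE)


async def broken_route(query: str) -> dict[str, Any]:
    """Simulates an embedder outage."""
    raise ConnectionError("embedder unavailable")


fallback = asyncio.run(route_or_low(broken_route, "查天气"))
print(fallback)
```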
---
## 8. Test Strategy
1. **test_semantic_high_confidence**: Query matches skill with sim > 0.85 → SKILL_REACT returned
2. **test_semantic_medium_confidence**: Query matches skill with sim 0.6-0.85 → skill_hint passed
3. **test_semantic_low_confidence**: Query has sim < 0.6 → normal routing proceeds
4. **test_semantic_router_disabled**: No semantic_router → existing flow unchanged
5. **test_embedder_failure**: Embedder throws error → falls through gracefully
6. **test_skill_registration_updates_index**: New skill added → embedding computed
7. **test_chinese_query**: Chinese query matches Chinese skill description
---
## 9. Argumentation Summary
| Design Choice | Alternatives Considered | Why This Choice |
|--------------|------------------------|----------------|
| Layer 1.5 for both medium AND high | Only medium | High complexity benefits even more from zero-cost skill match |
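| Pre-computed skill embeddings | Compute per query | Embedding every skill at query time costs about 100 ms × n_skills; with a pre-computed index each query needs only one query embedding plus in-memory similarity comparisons |
| bge-m3 default | text-embedding-3-small | Queries mix Chinese and English text; bge-m3 is a multilingual model chosen for that mix |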
| Skill hint for medium confidence | Direct match for medium | Medium confidence isn't reliable enough for direct match; hint reduces LLM tokens without risking wrong routing |
| Separate SemanticRouter class | Inline in CostAwareRouter | Separation of concerns; testable independently; can be disabled without touching router |