---
title: "feat: AgentKit RAG Pipeline Optimization"
status: active
created: 2026-06-06
plan-type: feat
origin: "RAG scenario problem analysis (6 issues: P0×2, P1×3, P2×1)"
---

# feat: AgentKit RAG Pipeline Optimization

## Summary

Optimize the AgentKit RAG pipeline to improve retrieval quality and LLM answer accuracy. The current pipeline passes raw user queries directly to the knowledge base, lacks reranking, injects context without source attribution, and has no mechanism for iterative retrieval during ReAct reasoning. This plan addresses 6 identified issues across 5 implementation units.

## Problem Frame

AgentKit's RAG integration works end-to-end but has critical quality gaps:

1. **Query quality** — Raw user queries (often vague or conversational) are sent directly to the knowledge base, resulting in poor recall
2. **Retrieval quality** — The `/search` endpoint bypasses GEO's EnhancedRAG (rerank + compression), returning unranked results
3. **Context injection** — Knowledge base results are injected as a flat text block without source attribution, making it hard for the LLM to assess credibility
4. **Iterative retrieval** — Only one retrieval happens before the ReAct loop; the LLM cannot request more information mid-reasoning
5. **Configurability** — `top_k` and `token_budget` are hardcoded in `ReActEngine.execute()`
6. **Source differentiation** — All knowledge bases are treated equally regardless of authority or recency

## Requirements

| ID | Requirement | Priority |
|----|-------------|----------|
| R1 | Query rewriting: transform vague user queries into structured retrieval queries before searching | P0 |
| R2 | Enhanced retrieval: call GEO's `/bases/{kb_id}/retrieve` endpoint with rerank+compression support | P0 |
| R3 | Structured context injection: format RAG results with source attribution (title, score, kb type) | P1 |
| R4 | Iterative retrieval: register `retrieve_knowledge` as a built-in Tool for mid-reasoning search | P1 |
| R5 | Configurable retrieval parameters: `top_k`, `token_budget`, `retrieval_strategy` from config | P1 |
| R6 | Per-knowledge-base weight differentiation: industry vs enterprise weights | P2 |

## Key Technical Decisions

### KTD-1: Query rewriting via LLM vs rule-based

**Decision**: LLM-based query rewriting with a lightweight prompt, falling back to rule-based when no LLM gateway is available.

**Rationale**: Rule-based rewriting (keyword extraction, synonym expansion) is fast but limited. LLM rewriting can decompose complex queries, infer intent, and generate multiple sub-queries. The cost is one additional LLM call per task, which is acceptable given the retrieval quality improvement. The fallback ensures the system works without an LLM gateway.

**Alternative considered**: Pure rule-based rewriting. Rejected because it cannot handle the diverse query patterns in the GEO/SEO domain (e.g., "Help me analyze a competitor's SEO strategy" needs decomposition into "competitor SEO strategy analysis" + "industry SEO best practices").

### KTD-2: Enhanced retrieval via new endpoint vs extending existing

**Decision**: Add `enhanced_search()` method to `HttpRAGService` that calls GEO's `/bases/{kb_id}/retrieve` endpoint, keeping the existing `search()` method for backward compatibility.

**Rationale**: The GEO backend already has `EnhancedRAG.retrieve_with_rerank()` exposed at `POST /bases/{kb_id}/retrieve`. Adding a new method avoids breaking existing consumers while enabling rerank+compression. The config controls which method is used.

### KTD-3: RAG Tool as built-in vs skill-defined

**Decision**: Register `retrieve_knowledge` as a built-in Tool in `MemoryRetriever`, auto-registered when semantic memory is configured.

**Rationale**: Making RAG retrieval a Tool (rather than only a pre-execution step) lets the LLM trigger additional searches during ReAct reasoning. Auto-registration when semantic memory is configured means zero-config for the common case. The Tool is created by `MemoryRetriever` and injected into the agent's tool list.

### KTD-4: Context injection format

**Decision**: Use structured markdown with source blocks instead of flat text.

**Rationale**: The current `## Relevant Past Experience\n{raw_text}` format gives the LLM no way to distinguish high-quality knowledge base results from episodic memories, or to cite sources. Structured blocks with `[Source: industry KB | Confidence: 0.92 | Document: industry report]` headers let the LLM assess credibility and cite appropriately.

### KTD-5: Per-knowledge-base weight via filters

**Decision**: Extend `MemoryRetriever` weights to support per-knowledge-base multipliers keyed by `kb_id`, configured via `memory.semantic.kb_weights` in the YAML config.

**Rationale**: Industry knowledge bases (curated, authoritative) should have higher weight than enterprise-specific ones (narrow, potentially outdated). A simple multiplier per kb_id is sufficient — no need for complex authority scoring.

---

## Implementation Units

### U1. QueryTransformer: Query Rewriting and Expansion

**Goal**: Transform raw user queries into structured retrieval queries before searching the knowledge base, with a target of raising recall from an estimated ~30% to ~70% or more.

**Requirements**: R1

**Dependencies**: None

**Files**:
- `src/agentkit/memory/query_transformer.py` (create)
- `tests/unit/test_query_transformer.py` (create)

**Approach**:
- Create `QueryTransformer` class with two strategies:
  - `LLMQueryTransformer`: Uses LLM gateway to rewrite queries. Prompt instructs the LLM to: (a) extract core intent, (b) decompose complex queries into 1-3 sub-queries, (c) add domain-specific terms. Returns a `TransformedQuery` with `main_query` and `sub_queries`.
  - `RuleQueryTransformer`: Fallback that applies rule-based transformations — strip filler words, extract noun phrases, add domain synonyms from a configurable map.
- `TransformedQuery` dataclass: `main_query: str`, `sub_queries: list[str]`, `original_query: str`.
- `QueryTransformer` is called by `MemoryRetriever.retrieve()` before dispatching to memory layers.
- Config: `memory.query_transform.enabled: bool`, `memory.query_transform.strategy: "llm" | "rule"`, `memory.query_transform.max_sub_queries: int = 3`.
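
The approach above can be sketched roughly as follows. This is a minimal illustration, not the planned implementation: the `transform()` method name, the constructor arguments, and the filler/synonym data are assumptions made for the example, and only the rule-based strategy is shown because it needs no LLM gateway.

```python
from __future__ import annotations

import re
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class TransformedQuery:
    main_query: str
    sub_queries: list[str]
    original_query: str


class QueryTransformer(ABC):
    """Abstract base; concrete strategies implement transform() (assumed name)."""

    @abstractmethod
    def transform(self, query: str) -> TransformedQuery:
        ...


class RuleQueryTransformer(QueryTransformer):
    """Rule-based fallback: strip filler phrases, then expand domain synonyms."""

    def __init__(self, filler_phrases: list[str], synonyms: dict[str, list[str]],
                 max_sub_queries: int = 3) -> None:
        self._fillers = filler_phrases
        self._synonyms = synonyms
        self._max_sub_queries = max_sub_queries

    def transform(self, query: str) -> TransformedQuery:
        cleaned = query
        for phrase in self._fillers:
            # \b word boundaries suit space-delimited text; CJK input would need a different rule.
            cleaned = re.sub(rf"\b{re.escape(phrase)}\b", " ", cleaned, flags=re.IGNORECASE)
        # Collapse whitespace; if everything was stripped, keep the original query.
        cleaned = " ".join(cleaned.split()) or query.strip()

        sub_queries: list[str] = []
        lowered = cleaned.lower()
        for term, expansions in self._synonyms.items():
            if term.lower() in lowered:
                sub_queries.extend(f"{cleaned} {expansion}" for expansion in expansions)

        return TransformedQuery(
            main_query=cleaned,
            sub_queries=sub_queries[: self._max_sub_queries],
            original_query=query,
        )


# Hypothetical filler list and synonym map, for illustration only.
transformer = RuleQueryTransformer(
    filler_phrases=["please", "help me"],
    synonyms={"seo": ["search engine optimization"]},
)
result = transformer.transform("Please help me analyze competitor SEO strategy")
```

An LLM-backed strategy would subclass the same base and fall back to this one when the gateway call fails, which keeps `MemoryRetriever.retrieve()` unaware of which strategy is active.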

**Patterns to follow**: `agentkit/memory/embedder.py` — abstract base + concrete implementations pattern.

**Test scenarios**:
- LLM transformer: mock LLM gateway, verify prompt construction and response parsing
- LLM transformer: verify fallback to original query on LLM error
- Rule transformer: verify filler word removal and synonym expansion
- Rule transformer: verify no-op when query is already well-formed
- Integration: verify `MemoryRetriever.retrieve()` calls transformer before search
- Integration: verify sub-queries are searched in parallel and results merged

**Verification**: All tests pass. With query transform enabled, `MemoryRetriever` issues rewritten search queries (main query plus sub-queries) instead of the raw user query.

---

### U2. HttpRAGService Enhanced Search: Enhanced Retrieval Endpoint

**Goal**: Enable AgentKit to call GEO's EnhancedRAG endpoint with rerank and compression, with a target of raising retrieval precision from an estimated ~50% to ~80% or more.

**Requirements**: R2

**Dependencies**: None

**Files**:
- `src/agentkit/memory/http_rag.py` (modify)
- `src/agentkit/memory/semantic.py` (modify)
- `src/agentkit/server/config.py` (modify)
- `tests/unit/test_http_rag_service.py` (modify)

**Approach**:
- Add `enhanced_search()` method to `HttpRAGService`:
  - Calls `POST /bases/{kb_id}/retrieve` for each configured knowledge base
  - Passes `use_rerank` and `use_compression` parameters
  - Merges results from multiple KBs, re-scores by reranked relevance
- Add `search_mode: "standard" | "enhanced"` parameter to `SemanticMemory.search()`:
  - `"standard"`: calls `rag_service.search()` (current behavior, backward compatible)
  - `"enhanced"`: calls `rag_service.enhanced_search()` with rerank+compression
- Config additions under `memory.semantic`:
  - `search_mode: "enhanced"` (default: `"standard"`)
  - `use_rerank: true` (default: true when enhanced)
  - `use_compression: false` (default: false)
- `SemanticMemory.search()` passes `filters` through to `HttpRAGService` to allow per-query override.
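
A rough sketch of the per-KB call and merge step described above. The HTTP layer is replaced by an injected `post` callable so the example runs without a backend. The `query` and `top_k` payload fields and the `score` key in each hit are assumptions about the GEO response shape, not confirmed API details. Calls are sequential here for brevity; the plan's test scenarios expect them to run in parallel.

```python
from __future__ import annotations

from typing import Any, Callable


def enhanced_search(
    post: Callable[[str, dict[str, Any]], list[dict[str, Any]]],
    kb_ids: list[str],
    query: str,
    top_k: int = 5,
    use_rerank: bool = True,
    use_compression: bool = False,
) -> list[dict[str, Any]]:
    """Query every configured KB, tag hits with their kb_id, and merge by score."""
    merged: list[dict[str, Any]] = []
    for kb_id in kb_ids:
        payload = {
            "query": query,          # assumed field name
            "top_k": top_k,          # assumed field name
            "use_rerank": use_rerank,
            "use_compression": use_compression,
        }
        try:
            hits = post(f"/bases/{kb_id}/retrieve", payload)
        except Exception:
            # One failing KB should not sink the whole search; skip it.
            continue
        merged.extend({**hit, "kb_id": kb_id} for hit in hits)

    # Highest reranked score first; hits without a score sort last.
    merged.sort(key=lambda hit: hit.get("score", 0.0), reverse=True)
    return merged[:top_k]


def fake_post(path: str, payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Stand-in for the HTTP client, returning canned hits per endpoint."""
    canned = {
        "/bases/kb-a/retrieve": [{"text": "alpha", "score": 0.4}],
        "/bases/kb-b/retrieve": [{"text": "beta", "score": 0.9}],
    }
    if path not in canned:
        raise RuntimeError("backend unavailable")
    return canned[path]


results = enhanced_search(fake_post, ["kb-a", "kb-b", "kb-missing"], "seo strategy")
```

Tagging each hit with its `kb_id` at merge time is also what would let per-KB weights be applied later without another lookup.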

**Patterns to follow**: Existing `search()` method in `http_rag.py` — same HTTP client pattern, same error handling, same response normalization.

**Test scenarios**:
- `enhanced_search()` with rerank enabled: verify correct endpoint and payload
- `enhanced_search()` with compression enabled: verify payload includes `use_compression: true`
- `enhanced_search()` with multiple KBs: verify parallel calls and result merging
- `enhanced_search()` HTTP error: verify graceful fallback to empty results
- `SemanticMemory.search()` with `search_mode="enhanced"`: verify delegation to `enhanced_search()`
- `SemanticMemory.search()` with `search_mode="standard"`: verify existing behavior unchanged
- Config parsing: verify `search_mode`, `use_rerank`, `use_compression` from YAML

**Verification**: All tests pass. `enhanced_search()` returns reranked results when GEO backend supports it.

---

### U3. Structured Context Injection

**Goal**: Format RAG results with source attribution so the LLM can assess credibility and cite sources.

**Requirements**: R3

**Dependencies**: U1 (query transformer affects what results are returned)

**Files**:
- `src/agentkit/memory/retriever.py` (modify)
- `src/agentkit/core/react.py` (modify)
- `tests/unit/test_memory_integration.py` (modify)

**Approach**:
- Replace `MemoryRetriever.get_context_string()` with `get_context_messages()` that returns structured context:
  ```
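  ### Knowledge base reference [Source: industry KB | Relevance: 0.92 | Document: AI Industry Trends Report]
  The AI industry shows three major trends in 2025...

  ### Past experience [Source: episodic memory | Task type: seo_analysis]
  The last analysis of competitor SEO strategy found...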
  ```
- Each `MemoryItem` is rendered with its metadata: `source` (rag/graph/episodic/working), `score`, `document_title`, `kb_type`.
- `ReActEngine.execute()` calls `get_context_messages()` instead of `get_context_string()`.
- The injection heading changes from `## Relevant Past Experience` to `## 参考信息` ("Reference Information"; bilingual-friendly).
- Add `context_template: "structured" | "flat"` config option (default: `"structured"`).
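
The rendering step might look roughly like the sketch below. `MemoryItem` here is a simplified stand-in whose field layout is assumed, and `render_context()` is a hypothetical helper name. For brevity this version drops whole items once the budget is exhausted rather than truncating long ones, and it uses the existing `len(text) // 4` estimate.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Heading string from the plan ("Reference Information").
CONTEXT_HEADING = "## 参考信息"


@dataclass
class MemoryItem:
    """Simplified stand-in for the real MemoryItem (field layout assumed)."""
    content: str
    source: str  # "rag", "graph", "episodic", or "working"
    score: float = 0.0
    metadata: dict = field(default_factory=dict)


def estimate_tokens(text: str) -> int:
    return len(text) // 4


def render_item(item: MemoryItem, template: str) -> str:
    if template == "flat":
        return item.content
    meta = item.metadata
    if item.source == "rag":
        header = (
            f"### Knowledge base reference [Source: {meta.get('kb_type', 'unknown')} | "
            f"Relevance: {item.score:.2f} | Document: {meta.get('document_title', 'untitled')}]"
        )
    else:
        header = (
            f"### Past experience [Source: {item.source} | "
            f"Task type: {meta.get('task_type', 'unknown')}]"
        )
    return f"{header}\n{item.content}"


def render_context(items: list[MemoryItem], token_budget: int = 2000,
                   template: str = "structured") -> str:
    blocks: list[str] = []
    used = 0
    # Highest-scoring items claim the budget first.
    for item in sorted(items, key=lambda i: i.score, reverse=True):
        block = render_item(item, template)
        cost = estimate_tokens(block)
        if used + cost > token_budget:
            break
        blocks.append(block)
        used += cost
    if not blocks:
        return ""  # no context section when nothing fits or nothing was found
    return CONTEXT_HEADING + "\n\n" + "\n\n".join(blocks)


items = [
    MemoryItem("The AI industry shows three major trends...", "rag", 0.92,
               {"kb_type": "industry", "document_title": "AI Industry Trends Report"}),
    MemoryItem("The last competitor analysis found...", "episodic", 0.50,
               {"task_type": "seo_analysis"}),
]
context = render_context(items)
```

Returning an empty string for empty results lets the caller skip the context section entirely instead of injecting a bare heading.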

**Patterns to follow**: Current `get_context_string()` in `retriever.py` — same token budget logic, same parallel retrieval.

**Test scenarios**:
- Structured format: verify each result has source header with metadata
- Flat format: verify backward-compatible plain text output
- Token budget: verify long results are truncated within budget
- Mixed sources: verify RAG results and episodic memories are formatted differently
- ReActEngine integration: verify system_prompt contains structured context
- Empty results: verify no context section added when no results found

**Verification**: LLM receives structured context with source attribution. Backward compatible with `context_template: "flat"`.

---

### U4. RetrieveKnowledge Tool: Follow-Up Retrieval Inside the ReAct Loop

**Goal**: Enable the LLM to trigger additional knowledge base searches during ReAct reasoning by registering `retrieve_knowledge` as a built-in Tool.

**Requirements**: R4

**Dependencies**: U1, U3

**Files**:
- `src/agentkit/memory/retriever.py` (modify)
- `src/agentkit/core/config_driven.py` (modify)
- `src/agentkit/server/app.py` (modify)
- `tests/unit/test_retrieve_knowledge_tool.py` (create)

**Approach**:
- Create `RetrieveKnowledgeTool(Tool)` inner class within `MemoryRetriever`:
  - `name: "retrieve_knowledge"`
  - `description: "Search the knowledge base for additional information. Use when you need more context or facts."`
  - `input_schema: {"type": "object", "properties": {"query": {"type": "string", "description": "Search query"}}, "required": ["query"]}`
  - `execute(query)`: calls `self._retriever.retrieve(query)` and returns formatted results
- Add `create_retrieve_tool() -> Tool | None` method to `MemoryRetriever`:
  - Returns `RetrieveKnowledgeTool` instance if semantic memory is configured
  - Returns `None` if no semantic memory (tool not available)
- Auto-register the tool in `ConfigDrivenAgent.__init__()` and `app.py` when `memory_retriever` is created:
  - `if memory_retriever and (tool := memory_retriever.create_retrieve_tool()): agent.use_tool(tool)`
- The tool uses the same `MemoryRetriever.retrieve()` pipeline, so query transformation (U1) and structured formatting (U3) apply automatically.
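
A minimal sketch of the tool and its conditional registration. `Tool` and `MemoryRetriever` below are simplified stand-ins written for this example (the real base class lives in `agentkit/tools/base.py` and may be async), the tool is a top-level class rather than an inner one, and `register_retrieve_tool()` is a hypothetical helper standing in for the `agent.use_tool(tool)` call.

```python
from __future__ import annotations

from typing import Callable, Optional


class Tool:
    """Stand-in for the real Tool base class (interface assumed)."""
    name: str = ""
    description: str = ""
    input_schema: dict = {}

    def execute(self, **kwargs) -> str:
        raise NotImplementedError


class RetrieveKnowledgeTool(Tool):
    name = "retrieve_knowledge"
    description = ("Search the knowledge base for additional information. "
                   "Use when you need more context or facts.")
    input_schema = {
        "type": "object",
        "properties": {"query": {"type": "string", "description": "Search query"}},
        "required": ["query"],
    }

    def __init__(self, retriever: "MemoryRetriever") -> None:
        self._retriever = retriever

    def execute(self, query: str) -> str:
        results = self._retriever.retrieve(query)
        return "\n\n".join(results) if results else "No relevant knowledge found."


class MemoryRetriever:
    """Stand-in retriever; `semantic` is any callable that returns text snippets."""

    def __init__(self, semantic: Optional[Callable[[str], list[str]]] = None) -> None:
        self._semantic = semantic

    def retrieve(self, query: str) -> list[str]:
        return self._semantic(query) if self._semantic else []

    def create_retrieve_tool(self) -> Optional[Tool]:
        # The tool only exists when semantic memory is configured.
        return RetrieveKnowledgeTool(self) if self._semantic is not None else None


def register_retrieve_tool(tools: dict[str, Tool],
                           memory_retriever: Optional[MemoryRetriever]) -> None:
    tool = memory_retriever.create_retrieve_tool() if memory_retriever else None
    if tool is not None:
        tools[tool.name] = tool


with_kb: dict[str, Tool] = {}
register_retrieve_tool(with_kb, MemoryRetriever(semantic=lambda q: [f"fact about {q}"]))

without_kb: dict[str, Tool] = {}
register_retrieve_tool(without_kb, MemoryRetriever())
```

Creating the tool once and reusing the returned instance avoids the double call that a check-then-create pattern would make.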

**Patterns to follow**: `agentkit/tools/base.py` — Tool subclass pattern with `execute()` and `safe_execute()`.

**Test scenarios**:
- Tool creation: verify `create_retrieve_tool()` returns a Tool when semantic memory is configured
- Tool creation: verify `create_retrieve_tool()` returns None when no semantic memory
- Tool execution: verify `execute(query="AI trends")` calls `MemoryRetriever.retrieve()` with the query
- Tool execution: verify results are formatted as structured text
- Tool schema: verify `input_schema` has `query` field
- Auto-registration: verify ConfigDrivenAgent with semantic memory has `retrieve_knowledge` in its tool list
- Auto-registration: verify agent without semantic memory does NOT have the tool
- ReAct integration: verify LLM can call `retrieve_knowledge` during ReAct loop

**Verification**: Agent with semantic memory has `retrieve_knowledge` tool. LLM can call it during reasoning. Results are formatted with source attribution.

---

### U5. Configurable Retrieval + Per-KB Weights: Configurable Parameters and Differentiated Weights

**Goal**: Make retrieval parameters configurable and support per-knowledge-base weight differentiation.

**Requirements**: R5, R6

**Dependencies**: U2, U3

**Files**:
- `src/agentkit/core/react.py` (modify)
- `src/agentkit/memory/retriever.py` (modify)
- `src/agentkit/server/config.py` (modify)
- `src/agentkit/core/config_driven.py` (modify)
- `tests/unit/test_memory_integration.py` (modify)

**Approach**:
- **Configurable retrieval parameters**:
  - Add `retrieval` sub-section to `memory` config:
    ```yaml
    memory:
      retrieval:
        top_k: 5
        token_budget: 2000
        context_template: "structured"
    ```
  - `ReActEngine.execute()` reads these from `SkillConfig.memory.retrieval` or falls back to defaults.
  - Pass `retrieval_config` through `ConfigDrivenAgent._handle_react()` to `ReActEngine.execute()`.
- **Per-KB weights**:
  - Add `kb_weights` to `memory.semantic` config:
    ```yaml
    memory:
      semantic:
        kb_weights:
          "industry-kb-id": 1.2    # industry KB gets a higher weight
          "enterprise-kb-id": 0.8  # enterprise KB gets a lower weight
    ```
  - `SemanticMemory.search()` applies kb_weights as score multipliers after retrieval.
  - `MemoryRetriever` passes kb_weights through `filters` to `SemanticMemory.search()`.
- **Token estimation improvement**:
  - Replace `len(text) // 4` with a slightly better heuristic for mixed Chinese/English content: `max(len(text) // 3, len(text.split()))`. It is still a rough estimate, but it undercounts CJK text less than the current one.
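
The weighting and token-estimation steps above are small enough to sketch directly. The hit shape (a dict with `kb_id` and `score`) and the function names are assumptions for illustration. The estimator is exactly the heuristic quoted in the plan.

```python
from __future__ import annotations

from typing import Any


def apply_kb_weights(hits: list[dict[str, Any]],
                     kb_weights: dict[str, float]) -> list[dict[str, Any]]:
    """Multiply each hit's score by its KB weight (1.0 when unconfigured), then re-rank."""
    weighted = [
        {**hit, "score": hit["score"] * kb_weights.get(hit["kb_id"], 1.0)}
        for hit in hits
    ]
    return sorted(weighted, key=lambda hit: hit["score"], reverse=True)


def estimate_tokens(text: str) -> int:
    """Heuristic from the plan: characters // 3, but never below the word count."""
    return max(len(text) // 3, len(text.split()))


kb_weights = {"industry-kb-id": 1.2, "enterprise-kb-id": 0.8}
hits = [
    {"kb_id": "enterprise-kb-id", "score": 0.8},
    {"kb_id": "industry-kb-id", "score": 0.7},
    {"kb_id": "other-kb-id", "score": 0.5},
]
ranked = apply_kb_weights(hits, kb_weights)
```

In this example the industry hit overtakes the enterprise hit after weighting (0.7 × 1.2 versus 0.8 × 0.8), while the unweighted KB keeps its original score.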

**Patterns to follow**: Existing config pattern in `ServerConfig.from_dict()` — same dict-based config with env var resolution.

**Test scenarios**:
- Config parsing: verify `retrieval.top_k`, `retrieval.token_budget`, `retrieval.context_template` from YAML
- Config parsing: verify `semantic.kb_weights` from YAML
- ReActEngine: verify configurable `top_k` and `token_budget` are used instead of hardcoded values
- Per-KB weights: verify industry KB results get higher scores than enterprise KB results
- Per-KB weights: verify unweighted KBs get default score (1.0 multiplier)
- Token estimation: verify improved heuristic for Chinese text
- Backward compatibility: verify defaults match current hardcoded values when config is absent

**Verification**: Retrieval parameters are configurable via YAML. Per-KB weights are applied. No behavior change when config is absent.

---

## Scope Boundaries

### In Scope
- Query rewriting (LLM + rule-based)
- Enhanced retrieval with rerank/compression
- Structured context injection with source attribution
- `retrieve_knowledge` Tool for iterative retrieval
- Configurable retrieval parameters
- Per-knowledge-base weight differentiation

### Deferred to Follow-Up Work
- Cross-encoder reranking model (GEO currently uses LLM-based reranking, which is sufficient)
- Full-text search upgrade (GEO's `ILIKE` → `tsvector` is a backend-only change)
- Semantic memory protocol formalization (ABC for rag_service)
- Caching layer for frequent queries
- Multi-hop retrieval (retrieval → extraction → retrieval chains)
- Retrieval metrics and observability (hit rate, latency tracking)

---

## Risks and Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| LLM query rewriting adds latency (~500ms per task) | Medium | Async execution; fallback to rule-based when LLM unavailable; configurable on/off |
| Enhanced retrieval endpoint may not exist on all backends | Low | `search_mode: "standard"` is default; `enhanced_search()` falls back to `search()` on 404 |
| `retrieve_knowledge` tool may cause infinite retrieval loops | Medium | ReAct `max_steps` already limits total iterations; add `max_retrieval_calls` config (default: 3) |
| Per-KB weights require knowing KB IDs at config time | Low | Weights are optional; unweighted KBs use default multiplier (1.0) |
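
One way the `max_retrieval_calls` mitigation could work is a small per-task counter wrapped around the retrieval function. This is a sketch under assumptions: the `RetrievalBudget` class, its `wrap()` method, and the limit message are hypothetical, and the plan does not specify where the counter would live.

```python
from __future__ import annotations

from typing import Callable

LIMIT_MESSAGE = "Retrieval limit reached; answer with the information already gathered."


class RetrievalBudget:
    """Caps how many times a retrieval function may run within one task."""

    def __init__(self, max_retrieval_calls: int = 3) -> None:
        self.max_retrieval_calls = max_retrieval_calls
        self.calls = 0

    def wrap(self, retrieve: Callable[[str], str]) -> Callable[[str], str]:
        def guarded(query: str) -> str:
            if self.calls >= self.max_retrieval_calls:
                # Tell the LLM to stop searching instead of raising an error.
                return LIMIT_MESSAGE
            self.calls += 1
            return retrieve(query)

        return guarded


budget = RetrievalBudget(max_retrieval_calls=2)
guarded_retrieve = budget.wrap(lambda query: f"results for {query}")
outputs = [guarded_retrieve(q) for q in ("first", "second", "third")]
```

Returning a plain message rather than raising keeps the ReAct loop running, so the model can still produce an answer from what it has already retrieved.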

---

## System-Wide Impact

- **ReActEngine**: New parameters for configurable retrieval; context injection format change
- **MemoryRetriever**: Query transformation pipeline; structured context output; tool creation
- **HttpRAGService**: New `enhanced_search()` method
- **SemanticMemory**: `search_mode` parameter; kb_weights support
- **ConfigDrivenAgent**: Auto-registration of `retrieve_knowledge` tool; config-driven retrieval parameters
- **ServerConfig**: New config sections for `memory.retrieval` and `memory.semantic.kb_weights`
- **GEO backend**: No changes required — `EnhancedRAG` endpoints already exist

---

## Phased Delivery

| Phase | Units | Focus |
|-------|-------|-------|
| Phase A: Query Quality | U1, U2 | Query rewriting + enhanced retrieval |
| Phase B: Context Quality | U3, U4 | Structured injection + iterative retrieval |
| Phase C: Configurability | U5 | Configurable parameters + per-KB weights |