title	status	created	plan-type	origin
feat: AgentKit RAG Pipeline Optimization	active	2026-06-06	feat	RAG scenario problem analysis (6 issues: P0×2, P1×3, P2×1)

feat: AgentKit RAG Pipeline Optimization

Summary

Optimize the AgentKit RAG pipeline to improve retrieval quality and LLM answer accuracy. The current pipeline passes raw user queries directly to the knowledge base, lacks reranking, injects context without source attribution, and has no mechanism for iterative retrieval during ReAct reasoning. This plan addresses 6 identified issues across 5 implementation units.

Problem Frame

AgentKit's RAG integration works end-to-end but has critical quality gaps:

Query quality — Raw user queries (often vague or conversational) are sent directly to the knowledge base, resulting in poor recall
Retrieval quality — The /search endpoint bypasses GEO's EnhancedRAG (rerank + compression), returning unranked results
Context injection — Knowledge base results are injected as a flat text block without source attribution, making it hard for the LLM to assess credibility
Iterative retrieval — Only one retrieval happens before the ReAct loop; the LLM cannot request more information mid-reasoning
Configurability — top_k and token_budget are hardcoded in ReActEngine.execute()
Source differentiation — All knowledge bases are treated equally regardless of authority or recency

Requirements

ID	Requirement	Priority
R1	Query rewriting: transform vague user queries into structured retrieval queries before searching	P0
R2	Enhanced retrieval: call GEO's `/bases/{kb_id}/retrieve` endpoint with rerank+compression support	P0
R3	Structured context injection: format RAG results with source attribution (title, score, kb type)	P1
R4	Iterative retrieval: register `retrieve_knowledge` as a built-in Tool for mid-reasoning search	P1
R5	Configurable retrieval parameters: `top_k`, `token_budget`, `retrieval_strategy` from config	P1
R6	Per-knowledge-base weight differentiation: industry vs enterprise weights	P2

Key Technical Decisions

KTD-1: Query rewriting via LLM vs rule-based

Decision: LLM-based query rewriting with a lightweight prompt, falling back to rule-based when no LLM gateway is available.

Rationale: Rule-based rewriting (keyword extraction, synonym expansion) is fast but limited. LLM rewriting can decompose complex queries, infer intent, and generate multiple sub-queries. The cost is one additional LLM call per task, which is acceptable given the retrieval quality improvement. The fallback ensures the system works without an LLM gateway.

Alternative considered: pure rule-based rewriting. Rejected because it cannot handle the diverse query patterns in the GEO/SEO domain (e.g., "Help me analyze our competitors' SEO strategy" needs decomposition into "competitor SEO strategy analysis" + "industry SEO best practices").
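
The LLM-first, rule-based-fallback decision can be sketched as follows. This is a minimal illustration, not AgentKit's implementation: the `LLMCall` signature, the prompt wording, the JSON reply shape, and `rule_rewrite` are all assumptions made for the example.

```python
import asyncio
import json
from typing import Awaitable, Callable, Optional

# Hypothetical gateway signature: takes a prompt, returns the raw completion text.
LLMCall = Callable[[str], Awaitable[str]]

# Assumed prompt and reply format; the real prompt is not specified in this plan.
REWRITE_PROMPT = (
    "Rewrite the user query for knowledge-base retrieval. "
    'Reply with JSON: {{"main_query": str, "sub_queries": [str]}}.\n'
    "Query: {query}"
)


def rule_rewrite(query: str) -> dict:
    """Stand-in for the rule-based strategy: it only trims whitespace here."""
    return {"main_query": query.strip(), "sub_queries": []}


async def rewrite_query(
    query: str, llm: Optional[LLMCall], max_sub_queries: int = 3
) -> dict:
    """Try the LLM first; fall back to rules if it is missing or misbehaves."""
    if llm is None:
        return rule_rewrite(query)
    try:
        raw = await llm(REWRITE_PROMPT.format(query=query))
        data = json.loads(raw)
        main = str(data["main_query"]).strip()
        subs = [str(s).strip() for s in data.get("sub_queries", []) if str(s).strip()]
        if not main:
            raise ValueError("empty main_query")
        return {"main_query": main, "sub_queries": subs[:max_sub_queries]}
    except Exception:
        # Any gateway or parsing failure degrades to the rule-based path.
        return rule_rewrite(query)
```

Catching every exception is deliberate in this sketch: a failed rewrite should cost retrieval quality, never the task itself.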

KTD-2: Enhanced retrieval via new endpoint vs extending existing

Decision: Add enhanced_search() method to HttpRAGService that calls GEO's /bases/{kb_id}/retrieve endpoint, keeping the existing search() method for backward compatibility.

Rationale: The GEO backend already has EnhancedRAG.retrieve_with_rerank() exposed at POST /bases/{kb_id}/retrieve. Adding a new method avoids breaking existing consumers while enabling rerank+compression. The config controls which method is used.

KTD-3: RAG Tool as built-in vs skill-defined

Decision: Register retrieve_knowledge as a built-in Tool in MemoryRetriever, auto-registered when semantic memory is configured.

Rationale: Making RAG retrieval a Tool (rather than only a pre-execution step) lets the LLM trigger additional searches during ReAct reasoning. Auto-registration when semantic memory is configured means zero-config for the common case. The Tool is created by MemoryRetriever and injected into the agent's tool list.

KTD-4: Context injection format

Decision: Use structured markdown with source blocks instead of flat text.

Rationale: The current ## Relevant Past Experience\n{raw_text} format gives the LLM no way to distinguish high-quality knowledge base results from episodic memories, or to cite sources. Structured blocks with [Source: industry KB | Confidence: 0.92 | Document: industry report] headers let the LLM assess credibility and cite appropriately.

KTD-5: Per-knowledge-base weight via filters

Decision: Extend MemoryRetriever weights to support per-source-type multipliers, configured via memory.semantic.kb_weights in the YAML config.

Rationale: Industry knowledge bases (curated, authoritative) should have higher weight than enterprise-specific ones (narrow, potentially outdated). A simple multiplier per kb_id is sufficient — no need for complex authority scoring.
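
A per-KB multiplier could look like the sketch below. It assumes each hit is a dict carrying `kb_id` and `score` keys, which is an illustrative shape rather than AgentKit's actual result type.

```python
def apply_kb_weights(hits: list[dict], kb_weights: dict[str, float]) -> list[dict]:
    """Scale each hit's score by its KB's multiplier (1.0 when the KB is unlisted)."""
    weighted = [
        {**hit, "score": hit["score"] * kb_weights.get(hit.get("kb_id", ""), 1.0)}
        for hit in hits
    ]
    # Re-sort because multipliers can change the relative order of hits.
    return sorted(weighted, key=lambda h: h["score"], reverse=True)
```

Note that a multiplier above 1.0 can push a score past 1.0, so any downstream code that treats scores as probabilities would need to account for that.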

Implementation Units

U1. QueryTransformer: Query Rewriting and Expansion

Goal: Transform raw user queries into structured retrieval queries before searching the knowledge base, improving recall from ~30% to ~70%+.

Requirements: R1

Dependencies: None

Files:

src/agentkit/memory/query_transformer.py (create)
tests/unit/test_query_transformer.py (create)

Approach:

Create QueryTransformer class with two strategies:
- LLMQueryTransformer: Uses LLM gateway to rewrite queries. Prompt instructs the LLM to: (a) extract core intent, (b) decompose complex queries into 1-3 sub-queries, (c) add domain-specific terms. Returns a TransformedQuery with main_query and sub_queries.
- RuleQueryTransformer: Fallback that applies rule-based transformations — strip filler words, extract noun phrases, add domain synonyms from a configurable map.
TransformedQuery dataclass: main_query: str, sub_queries: list[str], original_query: str.
QueryTransformer is called by MemoryRetriever.retrieve() before dispatching to memory layers.
Config: memory.query_transform.enabled: bool, memory.query_transform.strategy: "llm" | "rule", memory.query_transform.max_sub_queries: int = 3.
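
As a rough sketch of the data shape and the rule-based fallback described above: the field names follow the TransformedQuery definition in this plan, while the constructor arguments and the whitespace tokenization are assumptions. Whitespace splitting does not segment Chinese text, so a real implementation would need a proper tokenizer, and noun-phrase extraction is omitted here.

```python
from dataclasses import dataclass


@dataclass
class TransformedQuery:
    main_query: str
    sub_queries: list[str]
    original_query: str


class RuleQueryTransformer:
    """Rule-based fallback: drop filler words, then append mapped synonyms."""

    def __init__(self, filler_words: set[str], synonyms: dict[str, list[str]]):
        self._filler = {w.lower() for w in filler_words}
        self._synonyms = {k.lower(): v for k, v in synonyms.items()}

    def transform(self, query: str) -> TransformedQuery:
        # Keep every word that is not a configured filler word.
        kept = [w for w in query.split() if w.lower() not in self._filler]
        extra: list[str] = []
        for word in kept:
            for syn in self._synonyms.get(word.lower(), []):
                if syn not in kept and syn not in extra:
                    extra.append(syn)
        # Fall back to the raw query if everything was filtered out.
        main = " ".join(kept + extra) or query.strip()
        return TransformedQuery(main_query=main, sub_queries=[], original_query=query)
```

A well-formed query with no filler words and no mapped synonyms passes through unchanged, which matches the no-op test scenario listed for this unit.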

Patterns to follow: agentkit/memory/embedder.py — abstract base + concrete implementations pattern.

Test scenarios:

LLM transformer: mock LLM gateway, verify prompt construction and response parsing
LLM transformer: verify fallback to original query on LLM error
Rule transformer: verify filler word removal and synonym expansion
Rule transformer: verify no-op when query is already well-formed
Integration: verify MemoryRetriever.retrieve() calls transformer before search
Integration: verify sub-queries are searched in parallel and results merged

Verification: All tests pass. With query transformation enabled, MemoryRetriever issues rewritten search queries (main query plus sub-queries) instead of the raw user query.
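
The "searched in parallel and results merged" scenario might be implemented along these lines. The `SearchFn` signature and the hit shape (`id`, `score`, `content`) are hypothetical, and deduplicating by best score is one possible merge policy, not one the plan prescribes.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical search signature: query -> list of {"id", "score", "content"} dicts.
SearchFn = Callable[[str], Awaitable[list[dict]]]


async def search_all(queries: list[str], search: SearchFn, top_k: int = 5) -> list[dict]:
    """Run every query concurrently, then merge hits by id keeping the best score."""
    batches = await asyncio.gather(
        *(search(q) for q in queries), return_exceptions=True
    )
    best: dict[str, dict] = {}
    for batch in batches:
        if isinstance(batch, BaseException):
            continue  # one failed sub-query should not sink the others
        for hit in batch:
            seen = best.get(hit["id"])
            if seen is None or hit["score"] > seen["score"]:
                best[hit["id"]] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)[:top_k]
```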

U2. HttpRAGService Enhanced Search: Enhanced Retrieval Endpoint

Goal: Enable AgentKit to call GEO's EnhancedRAG endpoint with rerank and compression, improving retrieval precision from ~50% to ~80%+.

Requirements: R2

Dependencies: None

Files:

src/agentkit/memory/http_rag.py (modify)
src/agentkit/memory/semantic.py (modify)
src/agentkit/server/config.py (modify)
tests/unit/test_http_rag_service.py (modify)

Approach:

Add enhanced_search() method to HttpRAGService:
- Calls POST /bases/{kb_id}/retrieve for each configured knowledge base
- Passes use_rerank and use_compression parameters
- Merges results from multiple KBs, re-scores by reranked relevance
Add search_mode: "standard" | "enhanced" parameter to SemanticMemory.search():
- "standard": calls rag_service.search() (current behavior, backward compatible)
- "enhanced": calls rag_service.enhanced_search() with rerank+compression
Config additions under memory.semantic:
- search_mode: "enhanced" (default: "standard")
- use_rerank: true (default: true when enhanced)
- use_compression: false (default: false)
SemanticMemory.search() passes filters through to HttpRAGService to allow per-query override.
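
A possible shape for enhanced_search() is sketched below. To stay self-contained it takes an injected `post` coroutine instead of a real HTTP client. The `query` and `top_k` payload fields and the `{"results": [...]}` response shape are assumptions; only the endpoint path and the use_rerank / use_compression flags come from this plan.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical transport: (path, json_payload) -> decoded JSON response.
PostFn = Callable[[str, dict[str, Any]], Awaitable[dict[str, Any]]]


async def enhanced_search(
    post: PostFn,
    kb_ids: list[str],
    query: str,
    top_k: int = 5,
    use_rerank: bool = True,
    use_compression: bool = False,
) -> list[dict[str, Any]]:
    """Query each KB's retrieve endpoint concurrently and merge by score."""
    payload = {  # "query" and "top_k" field names are assumed
        "query": query,
        "top_k": top_k,
        "use_rerank": use_rerank,
        "use_compression": use_compression,
    }

    async def one(kb_id: str) -> list[dict[str, Any]]:
        try:
            body = await post(f"/bases/{kb_id}/retrieve", payload)
        except Exception:
            return []  # a failing KB contributes no results
        # Tag each hit with its KB so later stages can attribute or weight it.
        return [{**hit, "kb_id": kb_id} for hit in body.get("results", [])]

    batches = await asyncio.gather(*(one(kb) for kb in kb_ids))
    merged = [hit for batch in batches for hit in batch]
    merged.sort(key=lambda h: h.get("score", 0.0), reverse=True)
    return merged[:top_k]
```

Tagging hits with `kb_id` at this layer is a design choice of the sketch: it keeps source attribution and per-KB weighting possible without another round trip.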

Patterns to follow: Existing search() method in http_rag.py — same HTTP client pattern, same error handling, same response normalization.

Test scenarios:

enhanced_search() with rerank enabled: verify correct endpoint and payload
enhanced_search() with compression enabled: verify payload includes use_compression: true
enhanced_search() with multiple KBs: verify parallel calls and result merging
enhanced_search() HTTP error: verify graceful fallback to empty results
SemanticMemory.search() with search_mode="enhanced": verify delegation to enhanced_search()
SemanticMemory.search() with search_mode="standard": verify existing behavior unchanged
Config parsing: verify search_mode, use_rerank, use_compression from YAML

Verification: All tests pass. enhanced_search() returns reranked results when GEO backend supports it.

U3. Structured Context Injection: Structured Context Injection with Source Attribution

Goal: Format RAG results with source attribution so the LLM can assess credibility and cite sources.

Requirements: R3

Dependencies: U1 (query transformer affects what results are returned)

Files:

src/agentkit/memory/retriever.py (modify)
src/agentkit/core/react.py (modify)
tests/unit/test_memory_integration.py (modify)

Approach:

Replace MemoryRetriever.get_context_string() with get_context_messages() that returns structured context:

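### Knowledge Base Reference [Source: industry KB | Relevance: 0.92 | Document: AI Industry Trends Report]
The AI industry showed three major trends in 2025...

### Past Experience [Source: episodic memory | Task type: seo_analysis]
The last analysis of competitor SEO strategy found...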

Each MemoryItem is rendered with its metadata: source (rag/graph/episodic/working), score, document_title, kb_type.
ReActEngine.execute() calls get_context_messages() instead of get_context_string().
The injection heading changes from ## Relevant Past Experience to ## 参考信息 ("Reference Information"; bilingual-friendly).
Add context_template: "structured" | "flat" config option (default: "structured").
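
The rendering step could be sketched as below. `MemoryItem` here is a simplified stand-in for AgentKit's class, the metadata keys (`kb_type`, `document_title`, `task_type`) are taken from the description above, and the English labels are illustrative. Token-budget truncation is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MemoryItem:  # simplified stand-in, not AgentKit's actual MemoryItem
    content: str
    source: str  # "rag", "graph", "episodic", or "working"
    score: Optional[float] = None
    metadata: dict = field(default_factory=dict)


def render_item(item: MemoryItem) -> str:
    """Render one item as a heading line with attribution plus its content."""
    if item.source == "rag":
        title = "Knowledge Base Reference"
        parts = [f"Source: {item.metadata.get('kb_type', 'unknown')}"]
        if item.score is not None:
            parts.append(f"Relevance: {item.score:.2f}")
        if "document_title" in item.metadata:
            parts.append(f"Document: {item.metadata['document_title']}")
    else:
        title = "Past Experience"
        parts = [f"Source: {item.source}"]
        if "task_type" in item.metadata:
            parts.append(f"Task type: {item.metadata['task_type']}")
    return f"### {title} [{' | '.join(parts)}]\n{item.content}"


def render_context(items: list[MemoryItem]) -> str:
    """Join rendered items; return an empty string when nothing was retrieved."""
    return "\n\n".join(render_item(i) for i in items)
```

Returning an empty string for no results lets the caller skip the context section entirely, in line with the "Empty results" test scenario.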

Patterns to follow: Current get_context_string() in retriever.py — same token budget logic, same parallel retrieval.

Test scenarios:

Structured format: verify each result has source header with metadata
Flat format: verify backward-compatible plain text output
Token budget: verify long results are truncated within budget
Mixed sources: verify RAG results and episodic memories are formatted differently
ReActEngine integration: verify system_prompt contains structured context
Empty results: verify no context section added when no results found

Verification: LLM receives structured context with source attribution. Backward compatible with context_template: "flat".

U4. RetrieveKnowledge Tool: Follow-Up Retrieval Inside the ReAct Loop

Goal: Enable the LLM to trigger additional knowledge base searches during ReAct reasoning by registering retrieve_knowledge as a built-in Tool.

Requirements: R4

Dependencies: U1, U3

Files:

src/agentkit/memory/retriever.py (modify)
src/agentkit/core/config_driven.py (modify)
src/agentkit/server/app.py (modify)
tests/unit/test_retrieve_knowledge_tool.py (create)

Approach:

Create RetrieveKnowledgeTool(Tool) inner class within MemoryRetriever:
- name: "retrieve_knowledge"
- description: "Search the knowledge base for additional information. Use when you need more context or facts."
- input_schema: {"type": "object", "properties": {"query": {"type": "string", "description": "Search query"}}, "required": ["query"]}
- execute(query): calls self._retriever.retrieve(query) and returns formatted results
Add create_retrieve_tool() -> Tool | None method to MemoryRetriever:
- Returns RetrieveKnowledgeTool instance if semantic memory is configured
- Returns None if no semantic memory (tool not available)
Auto-register the tool in ConfigDrivenAgent.__init__() and app.py when memory_retriever is created:
- tool = memory_retriever.create_retrieve_tool() if memory_retriever else None; if tool is not None: agent.use_tool(tool)
The tool uses the same MemoryRetriever.retrieve() pipeline, so query transformation (U1) and structured formatting (U3) apply automatically.
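
A minimal sketch of the tool and its factory follows. The real base class lives in agentkit/tools/base.py and is not reproduced here, so `RetrieveKnowledgeTool` is written as a plain class. The `RetrieveFn` signature and the `has_semantic_memory` flag are assumptions standing in for MemoryRetriever's internals; the name, description, and input schema are copied from the plan.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical retriever signature: query -> already formatted context string.
RetrieveFn = Callable[[str], Awaitable[str]]


class RetrieveKnowledgeTool:
    """Stand-in for a Tool subclass wrapping the retrieval pipeline."""

    name = "retrieve_knowledge"
    description = (
        "Search the knowledge base for additional information. "
        "Use when you need more context or facts."
    )
    input_schema = {
        "type": "object",
        "properties": {"query": {"type": "string", "description": "Search query"}},
        "required": ["query"],
    }

    def __init__(self, retrieve: RetrieveFn):
        self._retrieve = retrieve

    async def execute(self, query: str) -> str:
        text = await self._retrieve(query)
        # Give the LLM an explicit signal instead of an empty observation.
        return text or "No relevant knowledge found."


def create_retrieve_tool(
    retrieve: RetrieveFn, has_semantic_memory: bool
) -> Optional[RetrieveKnowledgeTool]:
    """Return the tool only when semantic memory is configured."""
    return RetrieveKnowledgeTool(retrieve) if has_semantic_memory else None
```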

Patterns to follow: agentkit/tools/base.py — Tool subclass pattern with execute() and safe_execute().

Test scenarios:

Tool creation: verify create_retrieve_tool() returns a Tool when semantic memory is configured
Tool creation: verify create_retrieve_tool() returns None when no semantic memory
Tool execution: verify execute(query="AI trends") calls MemoryRetriever.retrieve() with the query
Tool execution: verify results are formatted as structured text
Tool schema: verify input_schema has query field
Auto-registration: verify ConfigDrivenAgent with semantic memory has retrieve_knowledge in its tool list
Auto-registration: verify agent without semantic memory does NOT have the tool
ReAct integration: verify LLM can call retrieve_knowledge during ReAct loop

Verification: Agent with semantic memory has retrieve_knowledge tool. LLM can call it during reasoning. Results are formatted with source attribution.

U5. Configurable Retrieval + Per-KB Weights: Configurable Parameters and Differentiated Weights

Goal: Make retrieval parameters configurable and support per-knowledge-base weight differentiation.

Requirements: R5, R6

Dependencies: U2, U3

Files:

src/agentkit/core/react.py (modify)
src/agentkit/memory/retriever.py (modify)
src/agentkit/server/config.py (modify)
src/agentkit/core/config_driven.py (modify)
tests/unit/test_memory_integration.py (modify)

Approach:

Configurable retrieval parameters:
- Add retrieval sub-section to memory config:
```
memory:
  retrieval:
    top_k: 5
    token_budget: 2000
    context_template: "structured"
```
- ReActEngine.execute() reads these from SkillConfig.memory.retrieval or falls back to defaults.
- Pass retrieval_config through ConfigDrivenAgent._handle_react() to ReActEngine.execute().
Per-KB weights:
- Add kb_weights to memory.semantic config:
```
memory:
  semantic:
    kb_weights:
      "industry-kb-id": 1.2    # industry KB gets a higher weight
      "enterprise-kb-id": 0.8  # enterprise KB gets a lower weight
```
- SemanticMemory.search() applies kb_weights as score multipliers after retrieval.
- MemoryRetriever passes kb_weights through filters to SemanticMemory.search().
Token estimation improvement:
- Replace len(text) // 4 with a slightly better heuristic for mixed Chinese/English content: max(len(text) // 3, len(text.split())). This remains a rough estimate, but it undercounts CJK text less than the current heuristic does.
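
The two heuristics can be compared directly. The function names are illustrative; exact token counts depend on the model's tokenizer, so both functions are estimates only.

```python
def estimate_tokens_old(text: str) -> int:
    """Current heuristic: roughly four characters per token."""
    return len(text) // 4


def estimate_tokens(text: str) -> int:
    """Proposed heuristic: the larger of chars/3 and the whitespace word count."""
    return max(len(text) // 3, len(text.split()))


# A 12-character Chinese string: the proposed estimate is higher than the old one.
cjk = "人工智能行业趋势分析报告"
print(estimate_tokens_old(cjk), estimate_tokens(cjk))

# Short space-separated words: the word-count branch of max() takes over.
print(estimate_tokens_old("a b c d e f"), estimate_tokens("a b c d e f"))
```

For typical English prose the `len(text) // 3` branch usually wins, since average word length plus a space exceeds three characters; the word-count branch mainly protects against text made of very short tokens.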

Patterns to follow: Existing config pattern in ServerConfig.from_dict() — same dict-based config with env var resolution.
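
Dict-based parsing with defaults might look like this sketch. `RetrievalConfig` is a hypothetical name, and its defaults simply mirror the YAML example in this unit; the values currently hardcoded in ReActEngine.execute() are not stated in this plan and may differ. Env var resolution is omitted.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class RetrievalConfig:
    # Assumed defaults copied from the YAML example, not from the codebase.
    top_k: int = 5
    token_budget: int = 2000
    context_template: str = "structured"

    @classmethod
    def from_dict(cls, config: Optional[dict[str, Any]]) -> "RetrievalConfig":
        """Read memory.retrieval from a parsed config dict, tolerating absence."""
        section = ((config or {}).get("memory") or {}).get("retrieval") or {}
        template = section.get("context_template", cls.context_template)
        if template not in ("structured", "flat"):
            raise ValueError(f"unknown context_template: {template!r}")
        return cls(
            top_k=int(section.get("top_k", cls.top_k)),
            token_budget=int(section.get("token_budget", cls.token_budget)),
            context_template=template,
        )
```

Because a missing section yields the defaults, an existing config file with no `memory.retrieval` block keeps its current behavior as long as the defaults match the hardcoded values.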

Test scenarios:

Config parsing: verify retrieval.top_k, retrieval.token_budget, retrieval.context_template from YAML
Config parsing: verify semantic.kb_weights from YAML
ReActEngine: verify configurable top_k and token_budget are used instead of hardcoded values
Per-KB weights: verify industry KB results get higher scores than enterprise KB results
Per-KB weights: verify unweighted KBs get default score (1.0 multiplier)
Token estimation: verify improved heuristic for Chinese text
Backward compatibility: verify defaults match current hardcoded values when config is absent

Verification: Retrieval parameters are configurable via YAML. Per-KB weights are applied. No behavior change when config is absent.

Scope Boundaries

In Scope

Query rewriting (LLM + rule-based)
Enhanced retrieval with rerank/compression
Structured context injection with source attribution
retrieve_knowledge Tool for iterative retrieval
Configurable retrieval parameters
Per-knowledge-base weight differentiation

Deferred to Follow-Up Work

Cross-encoder reranking model (GEO currently uses LLM-based reranking, which is sufficient)
Full-text search upgrade (moving GEO from ILIKE to tsvector-based search is a backend-only change)
Semantic memory protocol formalization (ABC for rag_service)
Caching layer for frequent queries
Multi-hop retrieval (retrieval → extraction → retrieval chains)
Retrieval metrics and observability (hit rate, latency tracking)

Risks and Mitigations

Risk	Impact	Mitigation
LLM query rewriting adds latency (~500ms per task)	Medium	Async execution; fallback to rule-based when LLM unavailable; configurable on/off
Enhanced retrieval endpoint may not exist on all backends	Low	`search_mode: "standard"` is default; `enhanced_search()` falls back to `search()` on 404
`retrieve_knowledge` tool may cause infinite retrieval loops	Medium	ReAct `max_steps` already limits total iterations; add `max_retrieval_calls` config (default: 3)
Per-KB weights require knowing KB IDs at config time	Low	Weights are optional; unweighted KBs use default multiplier (1.0)
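
The `max_retrieval_calls` mitigation could be enforced with a small per-task counter, sketched below. `RetrievalBudget` and `guarded_retrieve` are hypothetical names, and the message returned when the budget is exhausted is an assumption.

```python
from typing import Callable


class RetrievalBudget:
    """Counts retrieve_knowledge calls within one task and refuses extras."""

    def __init__(self, max_retrieval_calls: int = 3):
        self._max = max_retrieval_calls
        self._used = 0

    def try_acquire(self) -> bool:
        if self._used >= self._max:
            return False
        self._used += 1
        return True


def guarded_retrieve(
    budget: RetrievalBudget, retrieve: Callable[[str], str], query: str
) -> str:
    """Call retrieve(query) while budget remains; otherwise tell the LLM to move on."""
    if not budget.try_acquire():
        return "Retrieval limit reached; answer with the information already gathered."
    return retrieve(query)
```

Returning an instruction rather than raising keeps the ReAct loop running, so the model can still produce an answer from what it has already retrieved.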

System-Wide Impact

ReActEngine: New parameters for configurable retrieval; context injection format change
MemoryRetriever: Query transformation pipeline; structured context output; tool creation
HttpRAGService: New enhanced_search() method
SemanticMemory: search_mode parameter; kb_weights support
ConfigDrivenAgent: Auto-registration of retrieve_knowledge tool; config-driven retrieval parameters
ServerConfig: New config sections for memory.retrieval and memory.semantic.kb_weights
GEO backend: No changes required — EnhancedRAG endpoints already exist

Phased Delivery

Phase	Units	Focus
Phase A: Query Quality	U1, U2	Query rewriting + enhanced retrieval
Phase B: Context Quality	U3, U4	Structured injection + iterative retrieval
Phase C: Configurability	U5	Configurable parameters + per-KB weights
