fischer-agentkit/docs/plans/2026-06-06-009-feat-agentki...

| title | status | created | plan-type | origin |
| --- | --- | --- | --- | --- |
| feat: AgentKit RAG Pipeline Optimization | active | 2026-06-06 | feat | RAG scenario problem analysis: 6 issues (P0×2, P1×3, P2×1) |

feat: AgentKit RAG Pipeline Optimization

Summary

Optimize the AgentKit RAG pipeline to improve retrieval quality and LLM answer accuracy. The current pipeline passes raw user queries directly to the knowledge base, lacks reranking, injects context without source attribution, and has no mechanism for iterative retrieval during ReAct reasoning. This plan addresses 6 identified issues across 5 implementation units.

Problem Frame

AgentKit's RAG integration works end-to-end but has critical quality gaps:

  1. Query quality: Raw user queries (often vague or conversational) are sent directly to the knowledge base, resulting in poor recall
  2. Retrieval quality: The /search endpoint bypasses GEO's EnhancedRAG (rerank + compression), returning unranked results
  3. Context injection: Knowledge base results are injected as a flat text block without source attribution, making it hard for the LLM to assess credibility
  4. Iterative retrieval: Only one retrieval happens before the ReAct loop; the LLM cannot request more information mid-reasoning
  5. Configurability: top_k and token_budget are hardcoded in ReActEngine.execute()
  6. Source differentiation: All knowledge bases are treated equally regardless of authority or recency

Requirements

| ID | Requirement | Priority |
| --- | --- | --- |
| R1 | Query rewriting: transform vague user queries into structured retrieval queries before searching | P0 |
| R2 | Enhanced retrieval: call GEO's /bases/{kb_id}/retrieve endpoint with rerank+compression support | P0 |
| R3 | Structured context injection: format RAG results with source attribution (title, score, kb type) | P1 |
| R4 | Iterative retrieval: register retrieve_knowledge as a built-in Tool for mid-reasoning search | P1 |
| R5 | Configurable retrieval parameters: top_k, token_budget, retrieval_strategy from config | P1 |
| R6 | Per-knowledge-base weight differentiation: industry vs enterprise weights | P2 |

Key Technical Decisions

KTD-1: Query rewriting via LLM vs rule-based

Decision: LLM-based query rewriting with a lightweight prompt, falling back to rule-based when no LLM gateway is available.

Rationale: Rule-based rewriting (keyword extraction, synonym expansion) is fast but limited. LLM rewriting can decompose complex queries, infer intent, and generate multiple sub-queries. The cost is one additional LLM call per task, which is acceptable given the retrieval quality improvement. The fallback ensures the system works without an LLM gateway.

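Alternative considered: Pure rule-based rewriting. It was rejected because it cannot handle the diverse query patterns in the GEO/SEO domain (e.g., the Chinese-language query "help me analyze my competitors' SEO strategy" needs to be decomposed into "competitor SEO strategy analysis" + "industry SEO best practices").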

KTD-2: Enhanced retrieval via new endpoint vs extending existing

Decision: Add enhanced_search() method to HttpRAGService that calls GEO's /bases/{kb_id}/retrieve endpoint, keeping the existing search() method for backward compatibility.

Rationale: The GEO backend already has EnhancedRAG.retrieve_with_rerank() exposed at POST /bases/{kb_id}/retrieve. Adding a new method avoids breaking existing consumers while enabling rerank+compression. The config controls which method is used.

KTD-3: RAG Tool as built-in vs skill-defined

Decision: Register retrieve_knowledge as a built-in Tool in MemoryRetriever, auto-registered when semantic memory is configured.

Rationale: Making RAG retrieval a Tool (rather than only a pre-execution step) lets the LLM trigger additional searches during ReAct reasoning. Auto-registration when semantic memory is configured means zero-config for the common case. The Tool is created by MemoryRetriever and injected into the agent's tool list.

KTD-4: Context injection format

Decision: Use structured markdown with source blocks instead of flat text.

Rationale: The current ## Relevant Past Experience\n{raw_text} format gives the LLM no way to distinguish high-quality knowledge base results from episodic memories, or to cite sources. Structured blocks with headers such as [来源: 行业库 | 置信度: 0.92 | 文档: 行业报告] (source: industry KB | confidence: 0.92 | document: industry report) let the LLM assess credibility and cite appropriately.

KTD-5: Per-knowledge-base weight via filters

Decision: Extend MemoryRetriever weights to support per-knowledge-base multipliers, keyed by kb_id and configured via memory.semantic.kb_weights in the YAML config.

Rationale: Industry knowledge bases (curated, authoritative) should have higher weight than enterprise-specific ones (narrow, potentially outdated). A simple multiplier per kb_id is sufficient — no need for complex authority scoring.


Implementation Units

U1. QueryTransformer: Query Rewriting and Expansion

Goal: Transform raw user queries into structured retrieval queries before searching the knowledge base, improving recall from ~30% to ~70%+.

Requirements: R1

Dependencies: None

Files:

  • src/agentkit/memory/query_transformer.py (create)
  • tests/unit/test_query_transformer.py (create)

Approach:

  • Create QueryTransformer class with two strategies:
    • LLMQueryTransformer: Uses LLM gateway to rewrite queries. Prompt instructs the LLM to: (a) extract core intent, (b) decompose complex queries into 1-3 sub-queries, (c) add domain-specific terms. Returns a TransformedQuery with main_query and sub_queries.
    • RuleQueryTransformer: Fallback that applies rule-based transformations — strip filler words, extract noun phrases, add domain synonyms from a configurable map.
  • TransformedQuery dataclass: main_query: str, sub_queries: list[str], original_query: str.
  • QueryTransformer is called by MemoryRetriever.retrieve() before dispatching to memory layers.
  • Config: memory.query_transform.enabled: bool, memory.query_transform.strategy: "llm" | "rule", memory.query_transform.max_sub_queries: int = 3.
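The two strategies above can be sketched roughly as follows. This is a minimal illustration, not the planned implementation: the `complete` callable standing in for the LLM gateway, the JSON reply format, the prompt wording, and the default filler-word list are all assumptions, and the sketch is synchronous for brevity.

```python
import json
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class TransformedQuery:
    main_query: str
    sub_queries: list[str] = field(default_factory=list)
    original_query: str = ""


class QueryTransformer(ABC):
    @abstractmethod
    def transform(self, query: str) -> TransformedQuery:
        """Turn a raw user query into a retrieval-ready query."""


class RuleQueryTransformer(QueryTransformer):
    # The default filler words are hypothetical; the plan only says the
    # synonym map is configurable.
    def __init__(self, filler_words=("please", "help me", "can you"),
                 synonyms: Optional[dict[str, list[str]]] = None,
                 max_sub_queries: int = 3):
        self.filler_words = list(filler_words)
        self.synonyms = synonyms or {}
        self.max_sub_queries = max_sub_queries

    def transform(self, query: str) -> TransformedQuery:
        text = query
        for filler in self.filler_words:
            text = re.sub(rf"\b{re.escape(filler)}\b", " ", text, flags=re.IGNORECASE)
        # Collapse whitespace; keep the original if everything was stripped.
        main = " ".join(text.split()) or query.strip()
        subs: list[str] = []
        for term, alternatives in self.synonyms.items():
            pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
            if pattern.search(main):
                # One sub-query per synonym, with the term swapped out.
                subs.extend(pattern.sub(lambda _m, alt=alt: alt, main) for alt in alternatives)
        return TransformedQuery(main, subs[: self.max_sub_queries], query)


# Assumed prompt and reply format; the real prompt is not specified in the plan.
PROMPT = (
    "Rewrite the user query for knowledge-base retrieval. Extract the core intent, "
    "split complex requests into at most {max_sub} sub-queries, and add domain terms. "
    'Reply with JSON: {{"main_query": "...", "sub_queries": ["..."]}}\n'
    "Query: {query}"
)


class LLMQueryTransformer(QueryTransformer):
    def __init__(self, complete: Callable[[str], str], max_sub_queries: int = 3):
        self.complete = complete  # hypothetical gateway call: prompt in, text out
        self.max_sub_queries = max_sub_queries

    def transform(self, query: str) -> TransformedQuery:
        prompt = PROMPT.format(max_sub=self.max_sub_queries, query=query)
        try:
            data = json.loads(self.complete(prompt))
            main = str(data["main_query"]).strip() or query
            subs = [str(s).strip() for s in data.get("sub_queries", []) if str(s).strip()]
        except Exception:
            # On any gateway or parsing error, fall back to the original query.
            return TransformedQuery(main_query=query, original_query=query)
        return TransformedQuery(main, subs[: self.max_sub_queries], query)
```

Falling back to the untouched query on error keeps retrieval working when the gateway misbehaves, which matches the fallback test scenario listed for this unit.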

Patterns to follow: agentkit/memory/embedder.py — abstract base + concrete implementations pattern.

Test scenarios:

  • LLM transformer: mock LLM gateway, verify prompt construction and response parsing
  • LLM transformer: verify fallback to original query on LLM error
  • Rule transformer: verify filler word removal and synonym expansion
  • Rule transformer: verify no-op when query is already well-formed
  • Integration: verify MemoryRetriever.retrieve() calls transformer before search
  • Integration: verify sub-queries are searched in parallel and results merged
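The last scenario, searching sub-queries in parallel and merging the results, could look something like the sketch below. The `search` callable and the `id`/`score` keys on each hit are assumptions made for illustration; the actual result shape returned by the memory layers is not described in this plan.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical search signature: (query, top_k) -> list of hits with "id" and "score".
SearchFn = Callable[[str, int], Awaitable[list[dict]]]


async def search_all(queries: list[str], search: SearchFn, top_k: int = 5) -> list[dict]:
    # Drop empty and duplicate queries while keeping their order.
    unique = list(dict.fromkeys(q for q in queries if q))
    batches = await asyncio.gather(
        *(search(q, top_k) for q in unique), return_exceptions=True
    )
    best: dict = {}
    for batch in batches:
        if isinstance(batch, BaseException):
            continue  # one failed sub-query should not sink the whole retrieval
        for hit in batch:
            key = hit["id"]
            # Keep the highest-scoring copy of a document found by several queries.
            if key not in best or hit["score"] > best[key]["score"]:
                best[key] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)[:top_k]
```

Deduplicating by document and keeping the best score is only one possible merge policy; summing or averaging scores across sub-queries would reward documents matched by several of them.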

Verification: All tests pass. With query transform enabled, MemoryRetriever sends rewritten queries (main query plus sub-queries) to the search layer instead of the raw user query.


U2. HttpRAGService Enhanced Search: Enhanced Retrieval Endpoint

Goal: Enable AgentKit to call GEO's EnhancedRAG endpoint with rerank and compression, improving retrieval precision from ~50% to ~80%+.

Requirements: R2

Dependencies: None

Files:

  • src/agentkit/memory/http_rag.py (modify)
  • src/agentkit/memory/semantic.py (modify)
  • src/agentkit/server/config.py (modify)
  • tests/unit/test_http_rag_service.py (modify)

Approach:

  • Add enhanced_search() method to HttpRAGService:
    • Calls POST /bases/{kb_id}/retrieve for each configured knowledge base
    • Passes use_rerank and use_compression parameters
    • Merges results from multiple KBs, re-scores by reranked relevance
  • Add search_mode: "standard" | "enhanced" parameter to SemanticMemory.search():
    • "standard": calls rag_service.search() (current behavior, backward compatible)
    • "enhanced": calls rag_service.enhanced_search() with rerank+compression
  • Config additions under memory.semantic:
    • search_mode: "enhanced" (default: "standard")
    • use_rerank: true (default: true when enhanced)
    • use_compression: false (default: false)
  • SemanticMemory.search() passes filters through to HttpRAGService to allow per-query override.
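A rough sketch of the enhanced_search() flow described above, kept transport-agnostic so it runs without a real backend. Only the endpoint path and the use_rerank / use_compression flags come from this plan; the `post_json` callable, the `query` and `top_k` payload fields, and the `{"results": [...]}` response shape are assumptions.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical transport: (path, json_payload) -> decoded JSON response.
PostJson = Callable[[str, dict], Awaitable[dict]]


class EnhancedSearchClient:
    def __init__(self, post_json: PostJson, kb_ids: list[str],
                 use_rerank: bool = True, use_compression: bool = False):
        self.post_json = post_json
        self.kb_ids = kb_ids
        self.use_rerank = use_rerank
        self.use_compression = use_compression

    def build_payload(self, query: str, top_k: int) -> dict:
        # "query" and "top_k" are assumed field names.
        return {
            "query": query,
            "top_k": top_k,
            "use_rerank": self.use_rerank,
            "use_compression": self.use_compression,
        }

    async def _retrieve_one(self, kb_id: str, query: str, top_k: int) -> list[dict]:
        try:
            body = await self.post_json(
                f"/bases/{kb_id}/retrieve", self.build_payload(query, top_k)
            )
        except Exception:
            return []  # degrade to empty results for this knowledge base
        # Tag each hit with its knowledge base so later stages can attribute it.
        return [{**hit, "kb_id": kb_id} for hit in body.get("results", [])]

    async def enhanced_search(self, query: str, top_k: int = 5) -> list[dict]:
        batches = await asyncio.gather(
            *(self._retrieve_one(kb_id, query, top_k) for kb_id in self.kb_ids)
        )
        merged = [hit for batch in batches for hit in batch]
        merged.sort(key=lambda h: h.get("score", 0.0), reverse=True)
        return merged[:top_k]
```

Tagging hits with kb_id at this stage is a design choice of the sketch; it makes source attribution and per-knowledge-base weighting possible downstream without another lookup.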

Patterns to follow: Existing search() method in http_rag.py — same HTTP client pattern, same error handling, same response normalization.

Test scenarios:

  • enhanced_search() with rerank enabled: verify correct endpoint and payload
  • enhanced_search() with compression enabled: verify payload includes use_compression: true
  • enhanced_search() with multiple KBs: verify parallel calls and result merging
  • enhanced_search() HTTP error: verify graceful fallback to empty results
  • SemanticMemory.search() with search_mode="enhanced": verify delegation to enhanced_search()
  • SemanticMemory.search() with search_mode="standard": verify existing behavior unchanged
  • Config parsing: verify search_mode, use_rerank, use_compression from YAML

Verification: All tests pass. enhanced_search() returns reranked results when GEO backend supports it.


U3. Structured Context Injection

Goal: Format RAG results with source attribution so the LLM can assess credibility and cite sources.

Requirements: R3

Dependencies: U1 (query transformer affects what results are returned)

Files:

  • src/agentkit/memory/retriever.py (modify)
  • src/agentkit/core/react.py (modify)
  • tests/unit/test_memory_integration.py (modify)

Approach:

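  • Replace MemoryRetriever.get_context_string() with get_context_messages(), which returns structured context. The plan's example uses Chinese labels (来源 = source, 相关度 = relevance, 文档 = document, 任务类型 = task type); translated, it reads:
    ### Knowledge Base Reference [Source: Industry KB | Relevance: 0.92 | Document: AI Industry Trends Report]
    The AI industry shows three major trends in 2025...
    
    ### Past Experience [Source: Episodic Memory | Task type: seo_analysis]
    The last analysis of competitor SEO strategy found...
    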
  • Each MemoryItem is rendered with its metadata: source (rag/graph/episodic/working), score, document_title, kb_type.
  • ReActEngine.execute() calls get_context_messages() instead of get_context_string().
  • The injection heading changes from ## Relevant Past Experience to ## 参考信息 ("Reference Information"), which suits both Chinese and English content.
  • Add context_template: "structured" | "flat" config option (default: "structured").
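As an illustration of the rendering step, here is a minimal formatter sketch. The `MemoryItem` fields, the metadata keys, and the English header labels are stand-ins chosen for this example (the plan's own example labels are in Chinese), and the sketch drops whole items that exceed the budget rather than truncating them.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    # Hypothetical shape; the real MemoryItem lives in agentkit and may differ.
    content: str
    source: str  # "rag" | "graph" | "episodic" | "working"
    score: float = 0.0
    metadata: dict = field(default_factory=dict)


def estimate_tokens(text: str) -> int:
    # Rough heuristic, not a tokenizer.
    return max(len(text) // 3, len(text.split()))


def render_item(item: MemoryItem) -> str:
    meta = item.metadata
    if item.source == "rag":
        header = (
            f"### Knowledge Base Reference [Source: {meta.get('kb_type', 'unknown')} | "
            f"Relevance: {item.score:.2f} | Document: {meta.get('document_title', 'untitled')}]"
        )
    else:
        header = (
            f"### Past Experience [Source: {item.source} | "
            f"Task type: {meta.get('task_type', 'n/a')}]"
        )
    return f"{header}\n{item.content}"


def format_context(items: list[MemoryItem], token_budget: int = 2000,
                   template: str = "structured") -> str:
    blocks: list[str] = []
    used = 0
    # Highest-scoring items claim the budget first.
    for item in sorted(items, key=lambda i: i.score, reverse=True):
        block = item.content if template == "flat" else render_item(item)
        cost = estimate_tokens(block)
        if used + cost > token_budget:
            break
        blocks.append(block)
        used += cost
    if not blocks:
        return ""  # no context section at all when nothing fits or nothing was found
    heading = "## Relevant Past Experience" if template == "flat" else "## 参考信息"
    return heading + "\n\n" + "\n\n".join(blocks)
```

Returning an empty string when there is nothing to inject lets the caller skip the context section entirely, in line with the "empty results" scenario for this unit.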

Patterns to follow: Current get_context_string() in retriever.py — same token budget logic, same parallel retrieval.

Test scenarios:

  • Structured format: verify each result has source header with metadata
  • Flat format: verify backward-compatible plain text output
  • Token budget: verify long results are truncated within budget
  • Mixed sources: verify RAG results and episodic memories are formatted differently
  • ReActEngine integration: verify system_prompt contains structured context
  • Empty results: verify no context section added when no results found

Verification: LLM receives structured context with source attribution. Backward compatible with context_template: "flat".


U4. RetrieveKnowledge Tool: Follow-Up Retrieval Inside the ReAct Loop

Goal: Enable the LLM to trigger additional knowledge base searches during ReAct reasoning by registering retrieve_knowledge as a built-in Tool.

Requirements: R4

Dependencies: U1, U3

Files:

  • src/agentkit/memory/retriever.py (modify)
  • src/agentkit/core/config_driven.py (modify)
  • src/agentkit/server/app.py (modify)
  • tests/unit/test_retrieve_knowledge_tool.py (create)

Approach:

  • Create RetrieveKnowledgeTool(Tool) inner class within MemoryRetriever:
    • name: "retrieve_knowledge"
    • description: "Search the knowledge base for additional information. Use when you need more context or facts."
    • input_schema: {"type": "object", "properties": {"query": {"type": "string", "description": "Search query"}}, "required": ["query"]}
    • execute(query): calls self._retriever.retrieve(query) and returns formatted results
  • Add create_retrieve_tool() -> Tool | None method to MemoryRetriever:
    • Returns RetrieveKnowledgeTool instance if semantic memory is configured
    • Returns None if no semantic memory (tool not available)
  • Auto-register the tool in ConfigDrivenAgent.__init__() and app.py when memory_retriever is created:
    • tool = memory_retriever.create_retrieve_tool() if memory_retriever else None; if tool is not None: agent.use_tool(tool)
  • The tool uses the same MemoryRetriever.retrieve() pipeline, so query transformation (U1) and structured formatting (U3) apply automatically.
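The tool wiring described above might be sketched as follows. The `Tool` base class here is a bare stand-in for agentkit/tools/base.py, whose real interface is not shown in this plan; the async `execute`, the `semantic.search(query, top_k)` call, and the result keys are likewise assumptions.

```python
from typing import Optional


class Tool:
    # Stand-in for the real base class in agentkit/tools/base.py.
    name = ""
    description = ""
    input_schema: dict = {}

    async def execute(self, **kwargs) -> str:
        raise NotImplementedError


class MemoryRetriever:
    def __init__(self, semantic=None):
        self.semantic = semantic  # None means no semantic memory is configured

    async def retrieve(self, query: str, top_k: int = 5) -> list[dict]:
        if self.semantic is None:
            return []
        return await self.semantic.search(query, top_k)

    class RetrieveKnowledgeTool(Tool):
        name = "retrieve_knowledge"
        description = (
            "Search the knowledge base for additional information. "
            "Use when you need more context or facts."
        )
        input_schema = {
            "type": "object",
            "properties": {"query": {"type": "string", "description": "Search query"}},
            "required": ["query"],
        }

        def __init__(self, retriever: "MemoryRetriever"):
            self._retriever = retriever

        async def execute(self, query: str) -> str:
            hits = await self._retriever.retrieve(query)
            if not hits:
                return "No relevant knowledge found."
            return "\n\n".join(
                f"[{h.get('title', 'untitled')} | score {h.get('score', 0.0):.2f}]\n{h['content']}"
                for h in hits
            )

    def create_retrieve_tool(self) -> Optional[Tool]:
        if self.semantic is None:
            return None
        return self.RetrieveKnowledgeTool(self)


def register_retrieve_tool(agent, retriever: Optional[MemoryRetriever]) -> bool:
    # Create the tool once, then register it only if it exists.
    tool = retriever.create_retrieve_tool() if retriever else None
    if tool is not None:
        agent.use_tool(tool)
        return True
    return False
```

Because the tool delegates to the same retrieve() path as the pre-execution step, any query transformation or formatting added there would apply to tool calls as well, which is the property the plan relies on.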

Patterns to follow: agentkit/tools/base.py — Tool subclass pattern with execute() and safe_execute().

Test scenarios:

  • Tool creation: verify create_retrieve_tool() returns a Tool when semantic memory is configured
  • Tool creation: verify create_retrieve_tool() returns None when no semantic memory
  • Tool execution: verify execute(query="AI趋势") (query text: "AI trends") calls MemoryRetriever.retrieve() with the query
  • Tool execution: verify results are formatted as structured text
  • Tool schema: verify input_schema has query field
  • Auto-registration: verify ConfigDrivenAgent with semantic memory has retrieve_knowledge in its tool list
  • Auto-registration: verify agent without semantic memory does NOT have the tool
  • ReAct integration: verify LLM can call retrieve_knowledge during ReAct loop

Verification: Agent with semantic memory has retrieve_knowledge tool. LLM can call it during reasoning. Results are formatted with source attribution.


U5. Configurable Retrieval + Per-KB Weights

Goal: Make retrieval parameters configurable and support per-knowledge-base weight differentiation.

Requirements: R5, R6

Dependencies: U2, U3

Files:

  • src/agentkit/core/react.py (modify)
  • src/agentkit/memory/retriever.py (modify)
  • src/agentkit/server/config.py (modify)
  • src/agentkit/core/config_driven.py (modify)
  • tests/unit/test_memory_integration.py (modify)

Approach:

  • Configurable retrieval parameters:
    • Add retrieval sub-section to memory config:
      memory:
        retrieval:
          top_k: 5
          token_budget: 2000
          context_template: "structured"
      
    • ReActEngine.execute() reads these from SkillConfig.memory.retrieval or falls back to defaults.
    • Pass retrieval_config through ConfigDrivenAgent._handle_react() to ReActEngine.execute().
  • Per-KB weights:
    • Add kb_weights to memory.semantic config:
      memory:
        semantic:
          kb_weights:
            "industry-kb-id": 1.2    # industry KB: weighted higher
            "enterprise-kb-id": 0.8  # enterprise KB: weighted lower
      
    • SemanticMemory.search() applies kb_weights as score multipliers after retrieval.
    • MemoryRetriever passes kb_weights through filters to SemanticMemory.search().
  • Token estimation improvement:
    • Replace len(text) // 4 with a slightly better heuristic for mixed Chinese/English content: max(len(text) // 3, len(text.split())). It remains an approximation, but it raises the estimate for CJK text, which len(text) // 4 undercounts.
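The weighting and token-estimation steps are small enough to sketch directly. The result shape (dicts carrying `kb_id` and `score`) is an assumption for illustration; the heuristic is the one proposed in the plan.

```python
def apply_kb_weights(results: list[dict], kb_weights: dict[str, float],
                     default_weight: float = 1.0) -> list[dict]:
    """Multiply each hit's score by its knowledge base's weight, then re-sort."""
    weighted = [
        {**hit, "score": hit["score"] * kb_weights.get(hit.get("kb_id"), default_weight)}
        for hit in results
    ]
    return sorted(weighted, key=lambda h: h["score"], reverse=True)


def estimate_tokens(text: str) -> int:
    # Character-based estimate with a word-count floor for space-separated text.
    return max(len(text) // 3, len(text.split()))
```

With weights of 1.2 and 0.8, an industry hit scored 0.80 is lifted to about 0.96 while an enterprise hit scored 0.90 drops to about 0.72, so the industry result ranks first after weighting. Hits from knowledge bases without a configured weight keep their original score.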

Patterns to follow: Existing config pattern in ServerConfig.from_dict() — same dict-based config with env var resolution.

Test scenarios:

  • Config parsing: verify retrieval.top_k, retrieval.token_budget, retrieval.context_template from YAML
  • Config parsing: verify semantic.kb_weights from YAML
  • ReActEngine: verify configurable top_k and token_budget are used instead of hardcoded values
  • Per-KB weights: verify industry KB results get higher scores than enterprise KB results
  • Per-KB weights: verify unweighted KBs get default score (1.0 multiplier)
  • Token estimation: verify improved heuristic for Chinese text
  • Backward compatibility: verify defaults match current hardcoded values when config is absent
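One way to get the backward-compatibility behavior in the last scenario is to parse the retrieval section into a small config object whose defaults apply whenever keys are missing. In the sketch below, the class name and the default values (taken from the YAML example in this unit) are assumptions; the plan does not state the currently hardcoded values, so they would need to be checked against ReActEngine.execute().

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetrievalConfig:
    # Assumed defaults; they must match the existing hardcoded values in practice.
    top_k: int = 5
    token_budget: int = 2000
    context_template: str = "structured"

    @classmethod
    def from_dict(cls, config: Optional[dict]) -> "RetrievalConfig":
        # Tolerate a missing config, a missing "memory" key, or a missing "retrieval" key.
        section = ((config or {}).get("memory") or {}).get("retrieval") or {}
        defaults = cls()
        parsed = cls(
            top_k=int(section.get("top_k", defaults.top_k)),
            token_budget=int(section.get("token_budget", defaults.token_budget)),
            context_template=str(section.get("context_template", defaults.context_template)),
        )
        if parsed.top_k <= 0 or parsed.token_budget <= 0:
            raise ValueError("top_k and token_budget must be positive")
        if parsed.context_template not in ("structured", "flat"):
            raise ValueError(f"unknown context_template: {parsed.context_template!r}")
        return parsed
```

Validating at parse time surfaces a typo such as context_template: "structred" when the server starts rather than during a task.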

Verification: Retrieval parameters are configurable via YAML. Per-KB weights are applied. No behavior change when config is absent.


Scope Boundaries

In Scope

  • Query rewriting (LLM + rule-based)
  • Enhanced retrieval with rerank/compression
  • Structured context injection with source attribution
  • retrieve_knowledge Tool for iterative retrieval
  • Configurable retrieval parameters
  • Per-knowledge-base weight differentiation

Deferred to Follow-Up Work

  • Cross-encoder reranking model (GEO currently uses LLM-based reranking, which is sufficient)
  • Full-text search upgrade (GEO's ILIKE → tsvector is a backend-only change)
  • Semantic memory protocol formalization (ABC for rag_service)
  • Caching layer for frequent queries
  • Multi-hop retrieval (retrieval → extraction → retrieval chains)
  • Retrieval metrics and observability (hit rate, latency tracking)

Risks and Mitigations

| Risk | Impact | Mitigation |
| --- | --- | --- |
| LLM query rewriting adds latency (~500ms per task) | Medium | Async execution; fallback to rule-based when LLM unavailable; configurable on/off |
| Enhanced retrieval endpoint may not exist on all backends | Low | search_mode: "standard" is default; enhanced_search() falls back to search() on 404 |
| retrieve_knowledge tool may cause infinite retrieval loops | Medium | ReAct max_steps already limits total iterations; add max_retrieval_calls config (default: 3) |
| Per-KB weights require knowing KB IDs at config time | Low | Weights are optional; unweighted KBs use default multiplier (1.0) |
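The max_retrieval_calls mitigation could be enforced with a small per-task wrapper around the retrieval call. This is only a sketch: the class name, the synchronous `retrieve` callable, and the wording of the limit message are hypothetical.

```python
from typing import Callable


class BudgetedRetrieval:
    """Caps how many times retrieval may run within one task."""

    def __init__(self, retrieve: Callable[[str], str], max_retrieval_calls: int = 3):
        self._retrieve = retrieve
        self.max_retrieval_calls = max_retrieval_calls
        self.calls = 0

    def __call__(self, query: str) -> str:
        if self.calls >= self.max_retrieval_calls:
            # Answer the model instead of raising, so the ReAct loop can wrap up.
            return "Retrieval limit reached; answer with the information already gathered."
        self.calls += 1
        return self._retrieve(query)
```

Returning a plain message rather than an error keeps the loop running and nudges the LLM toward a final answer. A fresh wrapper would be created for each task so the counter does not leak across tasks.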

System-Wide Impact

  • ReActEngine: New parameters for configurable retrieval; context injection format change
  • MemoryRetriever: Query transformation pipeline; structured context output; tool creation
  • HttpRAGService: New enhanced_search() method
  • SemanticMemory: search_mode parameter; kb_weights support
  • ConfigDrivenAgent: Auto-registration of retrieve_knowledge tool; config-driven retrieval parameters
  • ServerConfig: New config sections for memory.retrieval and memory.semantic.kb_weights
  • GEO backend: No changes required — EnhancedRAG endpoints already exist

Phased Delivery

| Phase | Units | Focus |
| --- | --- | --- |
| Phase A: Query Quality | U1, U2 | Query rewriting + enhanced retrieval |
| Phase B: Context Quality | U3, U4 | Structured injection + iterative retrieval |
| Phase C: Configurability | U5 | Configurable parameters + per-KB weights |