fischer-agentkit/docs/plans/2026-06-23-002-feat-documen...

---
date: 2026-06-23
status: active
origin: docs/brainstorms/2026-06-23-document-processing-capability-requirements.md
---

# feat: Document Processing Capability

## Summary

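Add document processing to AgentKit. v1 focuses on creating and reading three formats (Word, Excel, PDF), plus Word template filling. An in-house `DocumentService` wraps all document operations (python-docx, openpyxl, reportlab, python-docx-template), so the Agent tool and the frontend REST API share one body of business logic. Generated documents are saved on the server with persisted metadata. The conversation returns a file card, and a right-side panel lists the documents for the current conversation.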

## Problem Frame

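The current Agent toolset cannot produce formatted documents. When users need reports, contracts, spreadsheets, or similar documents, the Agent can only create plain-text files through the shell.

The original plan was to integrate MCP Document Tools, but feature validation found three blockers: version 0.1.0 is unverified and not recommended for production, it does not support template filling (a core requirement), and its Office→PDF conversion is limited to docx. The plan is therefore fully in-house, built on the mature python-docx, openpyxl, reportlab, and python-docx-template libraries. This keeps the implementation under our control and avoids the risk of depending on an unverified external tool.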

## Requirements

Traceability to origin requirements doc (R-IDs preserved):

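- R1-R4: Document processing (Word/Excel/PDF creation and reading)
- R5-R8: Agent tool integration
- R9-R10: Frontend UI
- R11-R16: File storage and lifecycle
- R17-R18: Document display in conversations
- R19-R22: Right-side document/attachment panel
- R23-R25: Template filling (Word only)
- R26-R28: Security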

## Key Technical Decisions

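- **In-house rather than MCP integration**: MCP Document Tools 0.1.0 is unverified, does not support template filling, and is not recommended for production. Use mature, production-grade libraries instead: python-docx (Word), openpyxl (Excel), reportlab (PDF), python-docx-template (Word template filling). MCP Document Tools is downgraded to an optional enhancement and is out of v1 scope.
- **DocumentService as the single wrapper**: DocumentService is the only business-logic layer; the Agent tool and the frontend REST API are both thin wrappers around it. Internally it dispatches by format to the matching renderer module.
- **Agent produces Markdown, the service maps formats**: the Agent generates structured content as Markdown, and DocumentService contains Markdown→Word/Excel/PDF renderers that map the Markdown structure to the target format. The Agent never manipulates Office XML directly.
- **Database via bare aiosqlite connections**: follow the project's existing pattern (`aiosqlite.connect` in auth.py) and do not introduce SQLAlchemy session dependency injection. The document metadata table is created with raw SQL.
- **Sandboxed Jinja2**: template filling uses `jinja2.sandbox.SandboxedEnvironment` to guard against SSTI attacks.
- **File storage reuses data/uploads/**: reuse the existing upload directory and the `_sanitize_filename` function, but add authentication to the download API.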

---

## Implementation Units

### U1. DocumentService Core Architecture + Database Model

**Goal:** Establish the DocumentService skeleton and the persistence foundation for document metadata.

**Requirements:** R11, R13, R14, R15, R16

**Dependencies:** None

**Files:**
- `src/agentkit/documents/__init__.py` (new)
- `src/agentkit/documents/service.py` (new)
- `src/agentkit/documents/models.py` (new)
- `src/agentkit/documents/db.py` (new)
- `pyproject.toml` (modify: add python-docx, openpyxl, reportlab, docxtpl, jinja2 dependencies)

**Approach:**
- `DocumentService` class: `create_document(format, content, conversation_id, template_path?) -> DocumentMeta`, `get_conversation_documents(conversation_id) -> list[DocumentMeta]`, `get_download_path(doc_id) -> Path`
- `DocumentMeta` dataclass: `id, filename, stored_name, format, size, conversation_id, created_at, download_url`
- Database table `documents`: id (UUID), filename, stored_name, format, size, conversation_id, created_at. Use a bare aiosqlite connection; `init_documents_db()` creates the table.
- File storage: the stored name is UUID + extension, saved under `data/uploads/`, reusing `_sanitize_filename`.
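
The metadata model and table described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the project's code: it uses the standard-library `sqlite3` module as a synchronous stand-in for the `aiosqlite` connection the plan calls for, and `new_stored_name`, `insert_document`, and `list_documents` are hypothetical helper names introduced only for illustration.

```python
import sqlite3
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path

# Column list mirrors the `documents` table described in the plan.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id TEXT PRIMARY KEY,
    filename TEXT NOT NULL,
    stored_name TEXT NOT NULL,
    format TEXT NOT NULL,
    size INTEGER NOT NULL,
    conversation_id TEXT NOT NULL,
    created_at TEXT NOT NULL
)
"""


@dataclass
class DocumentMeta:
    id: str
    filename: str
    stored_name: str
    format: str
    size: int
    conversation_id: str
    created_at: str
    download_url: str  # derived at read time, not stored in the table


def init_documents_db(conn: sqlite3.Connection) -> None:
    # IF NOT EXISTS makes repeated calls a no-op (idempotent).
    conn.execute(SCHEMA)
    conn.commit()


def new_stored_name(filename: str) -> tuple[str, str]:
    # Hypothetical helper: the on-disk name is UUID + original extension only,
    # so user-supplied path components never reach the filesystem.
    doc_id = str(uuid.uuid4())
    return doc_id, doc_id + Path(filename).suffix.lower()


def insert_document(conn, filename, fmt, size, conversation_id) -> DocumentMeta:
    doc_id, stored = new_stored_name(filename)
    created = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO documents VALUES (?, ?, ?, ?, ?, ?, ?)",
        (doc_id, filename, stored, fmt, size, conversation_id, created),
    )
    conn.commit()
    # Assumed URL shape, modelled on the download endpoint listed under U7.
    url = f"/api/v1/documents/download/{doc_id}"
    return DocumentMeta(doc_id, filename, stored, fmt, size, conversation_id, created, url)


def list_documents(conn, conversation_id) -> list[DocumentMeta]:
    rows = conn.execute(
        "SELECT id, filename, stored_name, format, size, conversation_id, created_at "
        "FROM documents WHERE conversation_id = ? ORDER BY created_at",
        (conversation_id,),
    ).fetchall()
    return [DocumentMeta(*r, download_url=f"/api/v1/documents/download/{r[0]}") for r in rows]
```

Keeping `download_url` out of the table means the URL scheme can change without a data migration. An aiosqlite version would have the same SQL with `await` on each call.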

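**Patterns to follow:** `src/agentkit/server/auth/models.py` (aiosqlite pattern) and the `_sanitize_filename` function in `src/agentkit/server/routes/chat.py`.

**Test scenarios:**
- Happy path: create a document metadata record; the query returns the correct data
- Edge case: a nonexistent conversation_id returns an empty list
- Edge case: a filename containing path-traversal characters (../) is sanitized
- Integration: init_documents_db is idempotent (repeated calls do not raise)

**Verification:** Run `pytest tests/documents/test_db.py` and confirm that metadata CRUD and file storage work.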

---

### U2. Word Document Creation (python-docx + Markdown→Word Mapping)

**Goal:** Implement the Markdown→Word format mapping: the Agent generates Markdown content and DocumentService produces the .docx file.

**Requirements:** R1

**Dependencies:** U1

**Files:**
- `src/agentkit/documents/renderers/__init__.py` (new)
- `src/agentkit/documents/renderers/word_renderer.py` (new)
- `tests/documents/test_word_renderer.py` (new)

**Approach:**
- `WordRenderer.render(markdown_content: str, output_path: Path) -> Path`
- Markdown parsing: parse with the `markdown` library, then walk the resulting tree and map nodes to python-docx objects. Python-Markdown renders to HTML rather than exposing a standalone AST, so the tree may have to come from its ElementTree output or from a token-based parser; this needs confirming during implementation.
  - `# Heading` → `doc.add_heading(text, level=1)`
  - `## Second-level heading` → `doc.add_heading(text, level=2)`
  - Paragraph → `doc.add_paragraph(text)`
  - `- list item` → `doc.add_paragraph(text, style='List Bullet')`
  - `1. ordered list` → `doc.add_paragraph(text, style='List Number')`
  - Markdown table → `doc.add_table(rows, cols)` + cell filling
  - `**bold**` → run with `bold=True`
  - `*italic*` → run with `italic=True`
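
The inline part of this mapping (bold and italic to runs) can be sketched with the standard library alone. The function name `split_runs` is hypothetical, and the sketch deliberately handles only flat `***both***`, `**bold**`, and `*italic*` spans; nested markers such as `**bold *italic* bold**` would need a real parser.

```python
import re

# Matches ***both***, **bold**, or *italic*; the longest marker is tried first.
_INLINE = re.compile(r"(\*\*\*(.+?)\*\*\*|\*\*(.+?)\*\*|\*(.+?)\*)")


def split_runs(text: str) -> list[tuple[str, bool, bool]]:
    """Split Markdown inline text into (text, bold, italic) runs."""
    runs = []
    pos = 0
    for m in _INLINE.finditer(text):
        if m.start() > pos:
            # Plain text between two formatted spans.
            runs.append((text[pos:m.start()], False, False))
        if m.group(2) is not None:
            runs.append((m.group(2), True, True))
        elif m.group(3) is not None:
            runs.append((m.group(3), True, False))
        else:
            runs.append((m.group(4), False, True))
        pos = m.end()
    if pos < len(text):
        runs.append((text[pos:], False, False))
    return runs
```

Each tuple would then become one python-docx run: `run = paragraph.add_run(text)`, followed by `run.bold = bold` and `run.italic = italic`.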

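**Patterns to follow:** Basic usage from the official python-docx documentation.

**Test scenarios:**
- Happy path: Markdown containing headings, paragraphs, lists, and tables produces a correct .docx
- Edge case: empty Markdown produces an empty document (heading only, or completely empty)
- Edge case: nested formatting (mixed bold and italic) renders correctly
- Error path: invalid Markdown does not crash and is treated as plain text

**Verification:** Run `pytest tests/documents/test_word_renderer.py`, then open the generated .docx to confirm the formatting.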

---

### U3. Excel Document Creation (openpyxl + Markdown Table→Excel Mapping)

**Goal:** Implement the Markdown table/JSON→Excel format mapping.

**Requirements:** R2

**Dependencies:** U1

**Files:**
- `src/agentkit/documents/renderers/excel_renderer.py` (new)
- `tests/documents/test_excel_renderer.py` (new)

**Approach:**
- `ExcelRenderer.render(markdown_content: str, output_path: Path) -> Path`
- Parse the tables in the Markdown (`| col1 | col2 |` format) and map each table to one worksheet
- Non-table text (headings, paragraphs) goes into comment rows or a separate "Summary" sheet
- Support JSON input: `{"Sheet1": [["A1","B1"],["A2","B2"]]}` (when content is valid JSON, take the JSON path)
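
The JSON-or-Markdown dispatch described above could be sketched as follows. The function name `parse_sheets` and the `Table1`, `Table2` sheet names are assumptions made for this example, and the handling of non-table text (the "Summary" sheet) is left out.

```python
import json


def parse_sheets(content: str) -> dict[str, list[list[object]]]:
    """Return {sheet_name: rows}. A JSON object takes the JSON path;
    anything else is scanned for Markdown pipe tables."""
    try:
        data = json.loads(content)
        if isinstance(data, dict):
            return {str(name): [list(row) for row in rows]
                    for name, rows in data.items()}
    except (json.JSONDecodeError, TypeError):
        pass  # not usable JSON: fall through to Markdown parsing

    sheets: dict[str, list[list[object]]] = {}
    current: list[list[object]] = []
    for line in content.splitlines() + [""]:  # sentinel flushes the last table
        stripped = line.strip()
        if len(stripped) > 1 and stripped.startswith("|") and stripped.endswith("|"):
            cells = [c.strip() for c in stripped[1:-1].split("|")]
            # Skip the |---|---| separator row under the header.
            if all(c and set(c) <= set("-:") for c in cells):
                continue
            current.append(cells)
        elif current:
            # A non-table line ends the current table.
            sheets[f"Table{len(sheets) + 1}"] = current
            current = []
    return sheets
```

With openpyxl, each entry would then map to `ws = wb.create_sheet(title=name)` followed by `ws.append(row)` for every row.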

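**Patterns to follow:** Basic usage from the official openpyxl documentation.

**Test scenarios:**
- Happy path: a Markdown table produces a correct .xlsx with aligned data
- Happy path: JSON input produces a multi-sheet workbook
- Edge case: Markdown without tables produces a single sheet of plain text
- Edge case: multiple tables produce multiple sheets

**Verification:** Run `pytest tests/documents/test_excel_renderer.py`, then open the generated .xlsx to confirm the data.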

---

### U4. PDF Document Creation (reportlab + Markdown→PDF Mapping)

**Goal:** Implement the Markdown→PDF format mapping, generating the PDF with reportlab.

**Requirements:** R3

**Dependencies:** U1

**Files:**
- `src/agentkit/documents/renderers/pdf_renderer.py` (new)
- `tests/documents/test_pdf_renderer.py` (new)

**Approach:**
- `PDFRenderer.render(markdown_content: str, output_path: Path) -> Path`
- Use reportlab's `SimpleDocTemplate` + `Paragraph` + `Table` + `ListFlowable`
- Markdown parsing is the same as in U2, mapped to reportlab flowables:
  - `# Heading` → `Paragraph(text, Heading1 style)`
  - Paragraph → `Paragraph(text, Normal style)`
  - List → `ListFlowable([ListItem(...)])`
  - Table → `Table(data)` + basic styling
  - `**bold**` → `<b>text</b>` (reportlab's Paragraph accepts inline markup tags)
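
Because `Paragraph` parses its text as XML-like markup, literal `&`, `<`, and `>` in the content have to be escaped before the `<b>`/`<i>` tags are introduced. A minimal sketch of that step, with the hypothetical name `to_paragraph_markup`; it covers flat `**bold**` and `*italic*` only and does not handle combined markers such as `***text***`.

```python
import re
from xml.sax.saxutils import escape


def to_paragraph_markup(text: str) -> str:
    """Convert Markdown bold/italic to the <b>/<i> tags Paragraph accepts."""
    # Escape &, <, > first so literal characters in the content are not
    # mistaken for markup by the paragraph parser.
    out = escape(text)
    # Bold must be replaced before italic, otherwise ** is read as two *.
    out = re.sub(r"\*\*(.+?)\*\*", r"<b>\1</b>", out)
    out = re.sub(r"\*(.+?)\*", r"<i>\1</i>", out)
    return out
```

The returned string would be passed as the first argument to `Paragraph(markup, style)`.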

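**Patterns to follow:** The official reportlab documentation.

**Test scenarios:**
- Happy path: Markdown containing headings, paragraphs, lists, and tables produces a correct PDF
- Edge case: empty Markdown produces a blank PDF
- Edge case: Chinese characters render correctly (requires registering a Chinese font)
- Error path: invalid Markdown does not crash

**Verification:** Run `pytest tests/documents/test_pdf_renderer.py`, then open the generated PDF to confirm the formatting and Chinese rendering.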

---

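### U5. Word Template Filling (python-docx-template + Jinja2 Sandbox)

**Goal:** Implement Word template filling: the user uploads a .docx template, the Agent supplies the data, and the Jinja2 placeholders are filled in.

**Requirements:** R23, R24, R25, R26

**Dependencies:** U1, U2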

**Files:**
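- `src/agentkit/documents/renderers/template_renderer.py` (new)
- `tests/documents/test_template_renderer.py` (new)

**Approach:**
- `TemplateRenderer.render(template_path: Path, data: dict, output_path: Path) -> Path`
- Load the template with `docxtpl.DocxTemplate(template_path)`
- Create a sandboxed environment with `jinja2.sandbox.SandboxedEnvironment`
- Fill in the data with `template.render(data, jinja_env=sandbox_env)`, passing the sandboxed environment explicitly so that docxtpl does not fall back to its default environment, then write the result with `template.save(output_path)`
- Support `{{variable}}`, `{% if %}`, and `{% for %}` as the basic control structures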

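**Patterns to follow:** The official python-docx-template documentation.

**Test scenarios:**
- Happy path: the template contains `{{name}}` and data=`{"name":"张三"}`; "张三" replaces the placeholder in the output document
- Happy path: a `{% for item in items %}` loop expands correctly
- Happy path: `{% if condition %}` renders conditionally as expected
- Security: an SSTI payload (`{{config.__class__}}`) is blocked by the sandbox
- Edge case: a template without placeholders is output unchanged
- Error path: when data lacks a variable, the placeholder is left as-is or rendered empty (no crash)

**Verification:** Run `pytest tests/documents/test_template_renderer.py` and confirm both the filling and the sandbox protection.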

---

### U6. Agent Tool Wrapper (DocumentTool)

**Goal:** Create the Agent tool so that the LLM can trigger document creation through function calling.

**Requirements:** R5, R6, R7, R8

**Dependencies:** U1, U2, U3, U4, U5

**Files:**
- `src/agentkit/tools/document_tool.py` (new)
- `src/agentkit/server/app.py` (modify: register DocumentTool)
- `tests/tools/test_document_tool.py` (new)

**Approach:**
- `DocumentTool(service: DocumentService)` inherits from `Tool`
- `name = "document"`, `description = "Create formatted documents (Word/Excel/PDF) or fill a Word template"`
- `input_schema`:
  ```json
  {
    "type": "object",
    "properties": {
      "format": {"type": "string", "enum": ["word", "excel", "pdf"]},
      "content": {"type": "string", "description": "Document content in Markdown format"},
      "template": {"type": "string", "description": "Template file path (optional, word only)"},
      "template_data": {"type": "object", "description": "Template fill data (optional)"}
    },
    "required": ["format", "content"]
  }
  ```
- `execute()` calls `service.create_document()` and returns `{"success": True, "filename": ..., "download_url": ..., "size": ...}`
- Register in `app.py`: `tool_registry.register(DocumentTool(service=document_service))`
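
The validation and return envelope implied by the schema above might be sketched like this. The `Tool` base class is not reproduced here, so the sketch uses plain functions; `validate_document_args` and `run_document_tool` are hypothetical names, and `create_document` is a stand-in callable for `DocumentService.create_document`, assumed to return an object with `filename`, `download_url`, and `size` attributes.

```python
from typing import Any, Callable, Optional

ALLOWED_FORMATS = {"word", "excel", "pdf"}


def validate_document_args(args: dict[str, Any]) -> Optional[str]:
    """Return an error message, or None when the arguments are acceptable."""
    fmt = args.get("format")
    if not isinstance(fmt, str) or fmt not in ALLOWED_FORMATS:
        return f"unsupported format: {fmt!r}"
    content = args.get("content")
    if not isinstance(content, str) or not content.strip():
        return "content must be a non-empty string"
    if args.get("template") and fmt != "word":
        return "template filling is only supported for word"
    return None


def run_document_tool(args: dict[str, Any], create_document: Callable) -> dict[str, Any]:
    # Validation failures are reported in the envelope instead of raising,
    # so the LLM receives a message it can act on.
    error = validate_document_args(args)
    if error:
        return {"success": False, "error": error}
    meta = create_document(args["format"], args["content"])
    return {
        "success": True,
        "filename": meta.filename,
        "download_url": meta.download_url,
        "size": meta.size,
    }
```

Returning `success=False` with an error message rather than raising matches the error-path scenarios listed for this unit.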

**Patterns to follow:** `src/agentkit/tools/memory_tool.py` (Tool base class pattern, input_schema, execute return format).

**Test scenarios:**
- Happy path: format=word, content="# Title\nParagraph" → returns success + download_url
- Happy path: format=pdf, content="..." → returns success + download_url
- Happy path: format=word + template + template_data → template filling succeeds
- Error path: invalid format → returns success=False + error message
- Error path: empty content → returns success=False + error message
- Integration: after registration, `agent._tool_registry.get("document")` returns the tool

**Verification:** Run `pytest tests/tools/test_document_tool.py` and confirm that tool registration and invocation work.

---

### U7. REST API Routes

**Goal:** Provide the frontend with a REST API for document processing.

**Requirements:** R9, R10, R12, R27, R28

**Dependencies:** U1, U2, U3, U4, U5

**Files:**
- `src/agentkit/server/routes/documents.py` (new)
- `src/agentkit/server/app.py` (modify: register the documents router)
- `tests/routes/test_documents.py` (new)

**Approach:**
- `router = APIRouter(prefix="/documents", tags=["documents"])`
- Endpoints:
  - `POST /api/v1/documents/create`: create a document (body: format, content, conversation_id, template?)
  - `POST /api/v1/documents/upload-template`: upload a template file (authenticated)
  - `GET /api/v1/documents/conversation/{conversation_id}`: list the documents of a conversation
  - `GET /api/v1/documents/download/{doc_id}`: download a document (authenticated)
- Authentication: reuse the `Depends(_verify_api_key)` pattern
- File size limit: 50MB
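
The status codes these endpoints are expected to return (413 over the size limit, 404 for an unknown document) can be sketched independently of the web framework. Everything here is illustrative: `upload_status` and `resolve_download` are hypothetical names, the limit is assumed to mean 50 × 1024 × 1024 bytes, and the containment check is an extra precaution that this plan does not itself specify.

```python
from pathlib import Path
from typing import Optional

MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # assumed reading of the 50MB limit


def upload_status(size_bytes: int) -> int:
    """HTTP status for an upload of the given size: 413 when over the limit."""
    return 413 if size_bytes > MAX_UPLOAD_BYTES else 200


def resolve_download(uploads_dir: Path, stored_name: Optional[str]) -> tuple[int, Optional[Path]]:
    """Map a stored file name to (status, path) without leaving uploads_dir."""
    if not stored_name:
        return 404, None  # unknown doc_id: no metadata row was found
    root = uploads_dir.resolve()
    candidate = (root / stored_name).resolve()
    # Reject anything that resolves outside the uploads directory
    # or that does not exist as a regular file.
    if root not in candidate.parents or not candidate.is_file():
        return 404, None
    return 200, candidate
```

In a route handler, a 200 result would be turned into a file response and any other status into the matching HTTP error.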

**Patterns to follow:** `src/agentkit/server/routes/chat.py` (APIRouter pattern, file upload/download), `src/agentkit/server/routes/kb_management.py` (authentication pattern).

**Test scenarios:**
- Happy path: POST /create format=word → 200 + file metadata
- Happy path: GET /conversation/{id} → 200 + document list
- Happy path: GET /download/{doc_id} → 200 + file stream
- Security: unauthenticated request → 401
- Edge case: nonexistent doc_id → 404
- Edge case: file larger than 50MB → 413

**Verification:** Run `pytest tests/routes/test_documents.py` and verify the endpoints with curl.

---

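### U8. Frontend File Card + Right-Side Document Panel

**Goal:** Render file cards in the conversation and show the current conversation's document list in a right-side panel.

**Requirements:** R17, R18, R19, R20, R21, R22

**Dependencies:** U7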

**Files:**
- `src/agentkit/server/frontend/src/components/chat/messages/DocumentCard.vue` (new)
- `src/agentkit/server/frontend/src/components/chat/DocumentPanel.vue` (new, right-side panel)
- `src/agentkit/server/frontend/src/stores/documents.ts` (new, Pinia store)
- `src/agentkit/server/frontend/src/api/documents.ts` (new, API client)
- `src/agentkit/server/frontend/src/views/ChatView.vue` (modify: integrate the right-side panel)
- `src/agentkit/server/frontend/src/stores/chat.ts` (modify: detect file metadata in token events and update the documents store)

**Approach:**
- `DocumentCard.vue`: reuses the design of `FileAttachment.vue` and shows the file name, format icon, size, and a download button. It is added as a new message render type.
- `DocumentPanel.vue`: a collapsible right-side panel listing the current conversation's documents. Each item shows the file name, format icon, creation time, and a download link.
- `stores/documents.ts`: `documentsByConversation: ref<Map<string, DocumentMeta[]>>`, `fetchDocuments(convId)`, `addDocument(convId, doc)`.
- `api/documents.ts`: `createDocument()`, `getConversationDocuments()`, `getDownloadUrl()`.
- ChatView integration: add DocumentPanel to the right of the chat area and load the document list for the current conversationId.
- chat store integration: when the Agent tool returns file metadata, update the documents store automatically.

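**Patterns to follow:** `src/agentkit/server/frontend/src/components/chat/messages/FileAttachment.vue` (component pattern), `src/agentkit/server/frontend/src/stores/chat.ts` (Pinia store pattern), `src/agentkit/server/frontend/src/api/client.ts` (API client pattern).

**Test scenarios:**
- Happy path: after the Agent generates a document, a file card appears in the conversation
- Happy path: the right-side panel updates automatically and shows the new document
- Happy path: clicking the download button makes the browser download the file
- Happy path: switching conversations makes the panel show that conversation's document list
- UI: the panel can be collapsed and expanded
- Edge case: when the conversation has no documents, the panel shows an empty state

**Verification:** Start the frontend dev server and manually test file card rendering and the right-side panel interactions.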

---

### U9. Document Reading (Reusing DocumentLoader)

**Goal:** Let the Agent read the content of Word/Excel/PDF documents uploaded by the user.

**Requirements:** R4

**Dependencies:** U1

**Files:**
- `src/agentkit/tools/document_tool.py` (modify: add the read operation)
- `src/agentkit/memory/document_loader.py` (modify: ensure openpyxl read support, or add Excel reading)

**Approach:**
- Add an `action` parameter to DocumentTool's input_schema: `"create"` | `"read"`
- When `action="read"`, call `DocumentLoader.load(path)` to read the document content
- DocumentLoader already supports PDF (PyMuPDF/pdfplumber) and DOCX (python-docx); Excel reading (openpyxl) has to be added
- Return `{"success": True, "content": "extracted text content"}`
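
For the new Excel path, the worksheet rows have to be flattened into the text that ends up in `content`. The sketch below shows one possible flattening and is an assumption throughout: `rows_to_text` is a hypothetical helper, the tab-separated layout is not taken from the existing DocumentLoader, and the rows are expected in the shape that openpyxl's `ws.iter_rows(values_only=True)` yields (tuples with `None` for empty cells).

```python
from typing import Iterable, Optional, Sequence


def rows_to_text(sheet_name: str, rows: Iterable[Sequence[Optional[object]]]) -> str:
    """Render one worksheet's rows as tab-separated text under a sheet header."""
    lines = [f"## {sheet_name}"]
    for row in rows:
        # Empty cells become empty strings so the columns stay aligned.
        cells = ["" if v is None else str(v) for v in row]
        if any(cells):  # drop fully empty rows
            lines.append("\t".join(cells))
    return "\n".join(lines)
```

Prefixing each sheet with its name keeps multi-sheet workbooks distinguishable once everything is joined into one string.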

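**Patterns to follow:** `src/agentkit/memory/document_loader.py` (existing parsing pattern).

**Test scenarios:**
- Happy path: reading a .docx file returns its text content
- Happy path: reading a .xlsx file returns its table content
- Happy path: reading a .pdf file returns its text content
- Edge case: an empty file returns an empty string
- Error path: a nonexistent file returns success=False

**Verification:** Run `pytest tests/tools/test_document_tool.py` and confirm that reading works.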

---

## Scope Boundaries

### Deferred to Follow-Up Work

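- PPT creation (.pptx): v2
- Format conversion (Office→PDF): v2, may require LibreOffice
- PDF merging and splitting: v2
- Excel/PPT template filling: v2
- Document editing: v2
- MCP Document Tools integration (optional enhancement): v2
- Scheduled job for expired-document cleanup: v2 (v1 relies on manual or lazy cleanup)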

### Outside this product's identity

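- OCR / scanned-document recognition
- Collaborative document editing
- Document version control
- Cloud storage integration
- Document watermarking / encryption / digital signatures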

---

## Risks & Dependencies

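- **Completeness of the Markdown→Office mapping**: Markdown cannot express every Office feature (for example merged cells or embedded images). v1 supports only basic formatting (headings, paragraphs, lists, tables); complex formatting is deferred.
- **Chinese font rendering in PDF**: reportlab does not support Chinese out of the box, so a Chinese font (such as SimSun or NotoSansCJK) must be registered. Confirm that the server has a Chinese font file available.
- **Jinja2 syntax limits in python-docx-template**: Jinja2 syntax may be constrained inside Office XML structures (for example loops inside tables). Complex templates need testing.
- **Layout impact of the right-side panel**: the existing ChatView layout may need adjusting to fit the panel. Confirm that the current chat UI is not broken.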

---

## Sources & Research

- Requirements doc: `docs/brainstorms/2026-06-23-document-processing-capability-requirements.md`
- Tool base class: `src/agentkit/tools/base.py`, `src/agentkit/tools/memory_tool.py`
- ToolRegistry: `src/agentkit/tools/registry.py`, `src/agentkit/server/app.py` (lines 239-269)
- Route patterns: `src/agentkit/server/routes/chat.py`, `src/agentkit/server/routes/kb_management.py`
- Database pattern: `src/agentkit/server/auth/models.py` (bare aiosqlite connection pattern)
- Frontend component: `src/agentkit/server/frontend/src/components/chat/messages/FileAttachment.vue`
- Frontend store: `src/agentkit/server/frontend/src/stores/chat.ts`
- Document parsing: `src/agentkit/memory/document_loader.py`
- MCP Document Tools validation report: version 0.1.0, unverified, not recommended for production, no template filling support