feat(bitable): add bitable companion service with full P0-P2 fixes

Bitable is a multi-dimensional table companion service that runs alongside
the main AgentKit server. It provides structured data storage with formula
fields, views, and ingestion pipelines.

Major components:
- Domain models (Pydantic v2): Table, Field, Record, View, RecalcTask
- SQLAlchemy 2 async ORM with independent bitable PostgreSQL schema
- Formula engine: AST parser, DAG, Kahn topological sort, safe eval
- RecalcWorker: atomic task claiming (FOR UPDATE SKIP LOCKED), topo-order
  processing, stale-threshold reaper for crash recovery
- REST API (/api/v1/bitable): tables, fields, records, views, files
- BitableTool: agent-facing tool with batch chunking (500/batch)
- CLI: agentkit bitable subcommands (create, list, import-excel, etc.)
- Frontend: Vue 3 + vxe-table grid with field management, views, filters
- Ingestion: Excel (openpyxl), database reflection, API collector

Security fixes (ce-code-review P0 + ce-debug P1):
- SQL injection prevention (field_id validation, parameterized queries)
- IDOR protection (_check_table_ownership on all table-level endpoints)
- SSRF prevention (URL scheme + private IP validation in parse_excel_url)
- OOM prevention (streaming file upload, batch delete, batch insert)
- Atomic recalc task claiming (FOR UPDATE SKIP LOCKED)
- Formula engine cache invalidation on field changes
- Composite cursor pagination for non-id sort orders
- Batch upsert (eliminates N+1 queries)
- Sync I/O offloaded to thread pool in async contexts
- Internal token auth (X-Internal-Token, hmac.compare_digest)
- PK unique index enforcement

Test coverage: 88 unit tests (95 skipped without Docker)
chiguyong 2026-06-25 01:09:59 +08:00
parent 567cbc9c9b
commit bbbf9cd40a
64 changed files with 13433 additions and 238 deletions

5
.gitignore vendored

@ -47,3 +47,8 @@ src/agentkit/server/static/
# Runtime data (auth DB, conversation DB, etc.) # Runtime data (auth DB, conversation DB, etc.)
data/ data/
# Agent skill tooling (local-only, not project code)
.agents/
.trae/
.aider*

305
AGENTS.md

@ -1,203 +1,206 @@
# Fischer AgentKit — Project Context # Fischer AgentKit — 项目上下文
## Rules ## 规则
- Python >= 3.11, type hints required, `pydantic>=2.0` for all data models - Python >= 3.11,必须使用类型注解,所有数据模型使用 `pydantic>=2.0`
- Ruff for lint + format: `ruff check src/ && ruff format src/` (target py311, line-length 100) - 使用 Ruff 进行 lint 和格式化:`ruff check src/ && ruff format src/`(目标 py311行宽 100
- Tests: `pytest` (asyncio_mode=auto), markers: `integration`, `redis`, `postgres` - 测试:`pytest`asyncio\_mode=auto标记`integration`、`redis`、`postgres`
- Never use `any` type — use proper Pydantic models or `Unknown` - 禁止使用 `any` 类型 — 使用合适的 Pydantic 模型或 `Unknown`
- API key comparison must use `hmac.compare_digest` (constant-time) - API Key 比较必须使用 `hmac.compare_digest`(恒定时间比较)
- Expert names validated with `_EXPERT_NAME_RE = re.compile(r"^[a-zA-Z0-9_-]{1,64}$")` - 专家名称使用 `_EXPERT_NAME_RE = re.compile(r"^[a-zA-Z0-9_-]{1,64}$")` 校验
- HandoffTransport queues bounded (`maxsize=1024`), close uses sentinel pattern - HandoffTransport 队列有界(`maxsize=1024`),关闭使用 sentinel 模式
- Frontend: Vue 3 + TypeScript + Ant Design Vue, Pinia stores, no `require()` calls - 前端Vue 3 + TypeScript + Ant Design VuePinia stores禁止 `require()` 调用
- **Async generator safety**: Never use early `return` before the first `yield` in `async def` — use `return; yield` pattern instead (see `.trae/rules/project_rules.md`) - **异步生成器安全**:在 `async def` 中禁止在第一个 `yield` 之前使用 `return` — 改用 `return; yield` 模式(见 `.trae/rules/project_rules.md`
- All replies must be in Chinese
## Tech Stack ## 技术栈
- **Backend**: Python 3.11+, FastAPI, Uvicorn, Pydantic v2, SQLAlchemy 2 (async) - **后端**Python 3.11+、FastAPI、Uvicorn、Pydantic v2、SQLAlchemy 2async
- **Frontend**: Vue 3, TypeScript, Vite 5, Ant Design Vue 4, Pinia, Vue Router 4 - **前端**Vue 3、TypeScript、Vite 5、Ant Design Vue 4、Pinia、Vue Router 4
- **Desktop**: Tauri 2.x (Rust shell + Python sidecar) - **桌面端**Tauri 2.xRust 外壳 + Python sidecar
- **Infra**: Redis (bus/cache/state), PostgreSQL + pgvector (episodic memory) - **基础设施**Redis总线/缓存/状态、PostgreSQL + pgvector情景记忆
- **CLI**: Typer + Rich - **CLI**Typer + Rich
- **Exact versions**: see `pyproject.toml` (Python), `package.json` (Node) - **精确版本**:见 `pyproject.toml`Python、`package.json`Node
## Commands ## 命令
```bash ```bash
# Backend # 后端
pip install -e ".[dev]" # Install with dev deps pip install -e ".[dev]" # 安装开发依赖
agentkit gui --port 8002 # Web GUI (frontend + API) agentkit gui --port 8002 # Web GUI(前端 + API
agentkit serve --port 8001 # API-only server agentkit serve --port 8001 # 仅 API 服务
agentkit chat # CLI interactive chat agentkit chat # CLI 交互式聊天
agentkit init # Generate agentkit.yaml agentkit init # 生成 agentkit.yaml
agentkit version / doctor / usage # Utility commands agentkit version / doctor / usage # 工具命令
agentkit task submit/status/list/cancel # Task management agentkit task submit/status/list/cancel # 任务管理
agentkit skill list/load/info # Skill management agentkit skill list/load/info # 技能管理
agentkit pair --name X # Generate API key for external system agentkit pair --name X # 为外部系统生成 API Key
pytest # Run all tests pytest # 运行所有测试
pytest -m "not integration" # Unit tests only pytest -m "not integration" # 仅单元测试
ruff check src/ && ruff format src/ # Lint + format ruff check src/ && ruff format src/ # Lint + 格式化
# Frontend # 前端
cd src/agentkit/server/frontend cd src/agentkit/server/frontend
npm install # Install deps npm install # 安装依赖
npm run dev # Vite dev server (proxy /api -> :8000) npm run dev # Vite 开发服务器(代理 /api -> :8000
npm run build:frontend # Production build -> ../static npm run build:frontend # 生产构建 -> ../static
npm run typecheck # TypeScript check npm run typecheck # TypeScript 检查
# Desktop # 桌面端
cd src/agentkit/server/frontend cd src/agentkit/server/frontend
npm run tauri dev # Tauri dev mode npm run tauri dev # Tauri 开发模式
npm run tauri build # Tauri production build npm run tauri build # Tauri 生产构建
# Docker # Docker
docker-compose up -d # AgentKit + Redis + PostgreSQL docker-compose up -d # AgentKit + Redis + PostgreSQL
``` ```
## Architecture ## 架构
### Request Flow ### 请求流程
``` ```
User Input 用户输入
├─ @board prefix -> BoardRouter (experts/board_router.py) -> BoardOrchestrator (multi-round discussion) ├─ @board 前缀 -> BoardRouter (experts/board_router.py) -> BoardOrchestrator多轮讨论
├─ @team prefix -> ExpertTeamRouter (experts/router.py) -> TeamOrchestrator (pipeline collaboration) ├─ @team 前缀 -> ExpertTeamRouter (experts/router.py) -> TeamOrchestrator流水线协作
└─ otherwise -> RequestPreprocessor (chat/request_preprocessor.py) └─ 其他 -> RequestPreprocessor (chat/request_preprocessor.py)
Layer 0: @skill:xxx prefix -> explicit skill selection (SKILL_REACT or skill's configured mode) Layer 0: @skill:xxx 前缀 -> 显式技能选择SKILL_REACT 或技能配置的模式)
Layer 1: Trivial-input regex (~0ms, 0 tokens) -> DIRECT_CHAT Layer 1: 琐碎输入正则(~0ms0 tokens-> DIRECT_CHAT
(greetings, identity, factual Q&A, math, translation; guarded by _TOOL_CONTEXT_RE) (问候、身份、事实问答、数学、翻译;由 _TOOL_CONTEXT_RE 守护)
Default: -> REACT (LLM decides tool usage autonomously in the agent loop) 默认: -> REACTLLM 在 agent 循环中自主决定工具使用)
-> ExecutionMode: DIRECT_CHAT / REACT / SKILL_REACT / REWOO / REFLEXION / PLAN_EXEC / TEAM_COLLAB -> ExecutionMode: DIRECT_CHAT / REACT / SKILL_REACT / REWOO / REFLEXION / PLAN_EXEC / TEAM_COLLAB
(chat handler currently supports DIRECT_CHAT, REACT, SKILL_REACT; others raise "not yet supported") chat handler 当前支持 DIRECT_CHAT、REACT、SKILL_REACT其余抛出 "not yet supported"
``` ```
**Note**: The old 3-layer `CostAwareRouter` (with `RegexRules` / `HeuristicClassifier` / `SemanticRouter` / `Vickrey Auction`) has been replaced by `RequestPreprocessor`. The `IntentRouter` (`router/intent.py`) exists but is not wired into the chat flow. `AuctionHouse` with Vickrey auction lives in `marketplace/auction.py` (marketplace subsystem, not routing). **注意**:旧的 3 层 `CostAwareRouter`(含 `RegexRules` / `HeuristicClassifier` / `SemanticRouter` / `Vickrey Auction`)已被 `RequestPreprocessor` 替换。`IntentRouter``router/intent.py`)存在但未接入 chat 流程。`AuctionHouse`Vickrey 拍卖)位于 `marketplace/auction.py`(属于 marketplace 子系统,非路由)。
### Agent Hierarchy ### Agent 层级
``` ```
BaseAgent (core/base.py) — abstract, execute() is final BaseAgent (core/base.py) — 抽象基类execute() 是 final 方法
+-- ConfigDrivenAgent (core/config_driven.py) — YAML-driven, 3 task modes +-- ConfigDrivenAgent (core/config_driven.py) — YAML 驱动3 种任务模式
+-- ReActEngine (core/react.py) — Think->Act->Observe +-- ReActEngine (core/react.py) — Think->Act->Observe
+-- ReflexionAgent (core/reflexion.py) — reflection-driven +-- ReflexionAgent (core/reflexion.py) — 反思驱动
+-- ReWOOAgent (core/rewoo.py) — plan-without-observation +-- ReWOOAgent (core/rewoo.py) — 无观察规划
+-- StandaloneAgent (core/standalone.py) — standalone runner +-- StandaloneAgent (core/standalone.py) — 独立运行器
``` ```
### Expert Team Mode (Pipeline) ### 专家团队模式(流水线)
``` ```
ExpertConfig (extends AgentConfig) -> Expert (wraps ConfigDrivenAgent via AgentPool) ExpertConfig(继承 AgentConfig-> Expert通过 AgentPool 包装 ConfigDrivenAgent
ExpertTeam: manages experts, shared workspace, team status (FORMING→PLANNING→EXECUTING→SYNTHESIZING→COMPLETED) ExpertTeam管理专家、共享工作区、团队状态FORMING→PLANNING→EXECUTING→SYNTHESIZING→COMPLETED
TeamOrchestrator: pipeline execution — Lead decomposes task into PlanPhase with depends_on, topological sort, parallel layers TeamOrchestrator:流水线执行 — Lead 将任务分解为带 depends_on 的 PlanPhase拓扑排序并行分层
PlanPhase: id, name, assigned_expert, task_description, depends_on, status (PENDING/RUNNING/COMPLETED/FAILED) PlanPhaseid、name、assigned_expert、task_description、depends_on、statusPENDING/RUNNING/COMPLETED/FAILED
TeamPlan: phases with dependencies, topological_sort() returns execution layers (Kahn's algorithm) TeamPlan带依赖的阶段topological_sort() 返回执行层Kahn 算法)
ExpertTeamRouter: @team prefix routing, @team:dev_team template expansion, name validation, MAX_EXPERTS=10 ExpertTeamRouter@team 前缀路由、@team:dev_team 模板展开、名称校验、MAX_EXPERTS=10
HandoffTransport: InProcess (asyncio.Queue) + Redis Pub/Sub — used for event broadcasting only HandoffTransportInProcessasyncio.Queue+ Redis Pub/Sub — 仅用于事件广播
``` ```
**Pipeline Flow**: **流水线流程**
1. `@team` prefix triggers team mode (or `@team:dev_team` for template, `@team:expert1,expert2` for explicit)
2. `ExpertTeam.create_team()` sets status to PLANNING
3. Lead Expert decomposes task into phases via LLM (fallback to single phase on failure)
4. `topological_sort()` arranges phases into layers (same-layer parallel, inter-layer serial)
5. Each phase creates an isolated `ConfigDrivenAgent` via `AgentPool.create_agent` (context isolation, KTD3)
6. Phase outputs passed via `SharedWorkspace` (`{plan_id}/phase/{phase_id}/output`)
7. Lead synthesizes results (BEST strategy)
8. On all-phases-fail: fallback to single agent mode
**Event Sequence**: `team_formed``plan_update``phase_started``expert_step``expert_result``phase_completed``team_synthesis``team_dissolved` 1. `@team` 前缀触发团队模式(或 `@team:dev_team` 用模板,`@team:expert1,expert2` 显式指定)
2. `ExpertTeam.create_team()` 将状态置为 PLANNING
3. Lead Expert 通过 LLM 将任务分解为阶段(失败时回退为单阶段)
4. `topological_sort()` 将阶段排成层(同层并行,层间串行)
5. 每个阶段通过 `AgentPool.create_agent` 创建隔离的 `ConfigDrivenAgent`上下文隔离KTD3
6. 阶段输出通过 `SharedWorkspace` 传递(`{plan_id}/phase/{phase_id}/output`
7. Lead 综合结果BEST 策略)
8. 所有阶段都失败时:回退到单 agent 模式
**Team Templates**: `configs/experts/dev_team.yaml` stores member list in `bound_skills` field (tech_lead, frontend_engineer, backend_engineer, qa_engineer, code_reviewer) **事件序列**`team_formed` → `plan_update``phase_started``expert_step``expert_result``phase_completed``team_synthesis``team_dissolved`
Lifecycle: FORMING -> PLANNING -> EXECUTING -> SYNTHESIZING -> COMPLETED -> DISSOLVED **团队模板**`configs/experts/dev_team.yaml` 在 `bound_skills` 字段存储成员列表tech\_lead、frontend\_engineer、backend\_engineer、qa\_engineer、code\_reviewer
On failure: fallback to single-agent mode (lead or first active expert).
### Module Map 生命周期FORMING -> PLANNING -> EXECUTING -> SYNTHESIZING -> COMPLETED -> DISSOLVED
失败时:回退到单 agent 模式lead 或第一个活跃专家)。
| Layer | Modules | Purpose | ### 模块映射
|-------|---------|---------|
| API | `server/`, `cli/` | FastAPI routes + Typer CLI |
| Auth | `server/auth/` | JWT + RBAC + terminal security (6-layer whitelist) |
| Service | `core/`, `chat/`, `skills/`, `experts/` | Agent engine, routing, skills, expert teams |
| Data | `memory/`, `session/`, `bus/` | Persistence, sessions, messaging |
| Utility | `llm/`, `tools/`, `evolution/`, `quality/`, `mcp/` | LLM gateway, tools, self-evolution, quality, MCP |
| Client | `client/` | ConfigSync, RemoteLLMProvider integration |
### Key Subsystems | 层级 | 模块 | 用途 |
| --- | ---------------------------------------------- | ------------------------------- |
| API | `server/`、`cli/` | FastAPI 路由 + Typer CLI |
| 认证 | `server/auth/` | JWT + RBAC + 终端安全6 层白名单) |
| 服务 | `core/`、`chat/`、`skills/`、`experts/` | Agent 引擎、路由、技能、专家团队 |
| 数据 | `memory/`、`session/`、`bus/` | 持久化、会话、消息 |
| 工具 | `llm/`、`tools/`、`evolution/`、`quality/`、`mcp/` | LLM 网关、工具、自进化、质量、MCP |
| 客户端 | `client/` | ConfigSync、RemoteLLMProvider 集成 |
- **LLM Gateway** (`llm/`): 6 providers (OpenAI/Anthropic/Gemini/Doubao/Wenxin/Yuanbao), fallback, semantic cache, usage tracking, RemoteLLMProvider (client→server proxy with 401 refresh retry) ### 关键子系统
- **Memory** (`memory/`): 4-layer (SOUL/USER/MEMORY/DAILY), WorkingMemory (Redis), EpisodicMemory (PG+pgvector), SemanticMemory (HTTP RAG)
- **Evolution** (`evolution/`): Reflector, PromptOptimizer (genetic), PitfallDetector, ABTester
- **Tools** (`tools/`): 21 built-in + MCP extension, composition (SequentialChain/ParallelFanOut/DynamicSelector)
- **Pipeline** (`orchestrator/`): PipelineEngine, SagaOrchestrator, DynamicPipeline, HandoffManager
- **Bus** (`bus/`): MemoryBus (in-process), RedisBus (distributed)
- **Auth** (`server/auth/`): JWT (access 15min + refresh 7d, HS256), API Key (constant-time compare), 3-level RBAC (member/operator/admin + permission bits), 6-layer terminal security (blocklist→shell-ops→builtin→global→user→session→danger), bcrypt password hashing (rounds=12)
### Server Routes (22 modules) - **LLM 网关**`llm/`6 个 providerOpenAI/Anthropic/Gemini/Doubao/Wenxin/Yuanbao、fallback、语义缓存、用量追踪、RemoteLLMProviderclient→server 代理,带 401 刷新重试)
- **记忆**`memory/`4 层SOUL/USER/MEMORY/DAILY、WorkingMemoryRedis、EpisodicMemoryPG+pgvector、SemanticMemoryHTTP RAG
- **进化**`evolution/`Reflector、PromptOptimizer遗传算法、PitfallDetector、ABTester
- **工具**`tools/`21 个内置 + MCP 扩展组合SequentialChain/ParallelFanOut/DynamicSelector
- **流水线**`orchestrator/`PipelineEngine、SagaOrchestrator、DynamicPipeline、HandoffManager
- **总线**`bus/`MemoryBus进程内、RedisBus分布式
- **认证**`server/auth/`JWTaccess 15min + refresh 7dHS256、API Key恒定时间比较、3 级 RBACmember/operator/admin + 权限位、6 层终端安全blocklist→shell-ops→builtin→global→user→session→danger、bcrypt 密码哈希rounds=12
| Prefix | Module | Purpose | ### 服务端路由22 个模块)
|--------|--------|---------|
| `/api/v1/agents` | agents.py | Agent CRUD |
| `/api/v1/tasks` | tasks.py | Task submit/query/cancel |
| `/api/v1/skills` | skills.py | Skill register/list |
| `/api/v1/chat` | chat.py | Chat REST + WebSocket |
| `/api/v1/ws` | ws.py | WebSocket channel |
| `/api/v1/llm` | llm.py | LLM usage |
| `/api/v1/llm/chat` | llm_gateway.py | LLM gateway proxy (JWT auth, SSE streaming) |
| `/api/v1/health` | health.py | Health check |
| `/api/v1/metrics` | metrics.py | Metrics |
| `/api/v1/evolution` | evolution.py + evolution_dashboard.py | Self-evolution API |
| `/api/v1/memory` | memory.py | Memory management |
| `/api/v1/portal` | portal.py | Portal |
| `/api/v1/kb` | kb_management.py | Knowledge base |
| `/api/v1/skill-mgmt` | skill_management.py | Skill management |
| `/api/v1/workflows` | workflows.py | Workflows |
| `/api/v1/terminal` | terminal.py | Local terminal (client sidecar PTY) |
| `/api/v1/terminal/server` | terminal_server.py | Server terminal (server PTY + admin approval) |
| `/api/v1/terminal` | terminal_whitelist.py | Whitelist/blocklist/audit-log management |
| `/api/v1/settings` | settings.py | Settings |
| `/api/v1/auth` | auth.py | Login/refresh/logout/me |
| `/api/v1/system` | system.py | System resources (SYSTEM_CONFIG permission) |
| `/api/v1/config` | config_sync.py | Config version + sync (polling) |
### WebSocket Chat Protocol | 前缀 | 模块 | 用途 |
| ------------------------- | -------------------------------------- | ------------------------- |
| `/api/v1/agents` | agents.py | Agent CRUD |
| `/api/v1/tasks` | tasks.py | 任务提交/查询/取消 |
| `/api/v1/skills` | skills.py | 技能注册/列表 |
| `/api/v1/chat` | chat.py | Chat REST + WebSocket |
| `/api/v1/ws` | ws.py | WebSocket 通道 |
| `/api/v1/llm` | llm.py | LLM 用量 |
| `/api/v1/llm/chat` | llm\_gateway.py | LLM 网关代理JWT 认证SSE 流式) |
| `/api/v1/health` | health.py | 健康检查 |
| `/api/v1/metrics` | metrics.py | 指标 |
| `/api/v1/evolution` | evolution.py + evolution\_dashboard.py | 自进化 API |
| `/api/v1/memory` | memory.py | 记忆管理 |
| `/api/v1/portal` | portal.py | Portal |
| `/api/v1/kb` | kb\_management.py | 知识库 |
| `/api/v1/skill-mgmt` | skill\_management.py | 技能管理 |
| `/api/v1/workflows` | workflows.py | 工作流 |
| `/api/v1/terminal` | terminal.py | 本地终端client sidecar PTY |
| `/api/v1/terminal/server` | terminal\_server.py | 服务端终端server PTY + 管理员审批) |
| `/api/v1/terminal` | terminal\_whitelist.py | 白名单/黑名单/审计日志管理 |
| `/api/v1/settings` | settings.py | 设置 |
| `/api/v1/auth` | auth.py | 登录/刷新/登出/me |
| `/api/v1/system` | system.py | 系统资源(需 SYSTEM\_CONFIG 权限) |
| `/api/v1/config` | config\_sync.py | 配置版本 + 同步(轮询) |
Client -> Server: `message`, `reply`, `confirmation_reply`, `cancel`, `ping` ### WebSocket Chat 协议
Server -> Client: `connected`, `token`, `thinking`, `step`, `final_answer`, `skill_match`, `confirmation_request`, `confirmation_result`, `ask_human`, `error`, `pong`
Expert Team events: `team_formed`, `expert_step`, `expert_result`, `plan_update`, `phase_started`, `phase_completed`, `phase_failed`, `team_synthesis`, `team_dissolved`
### Frontend Pages Client -> Server`message`、`reply`、`confirmation_reply`、`cancel`、`ping`
Server -> Client`connected`、`token`、`thinking`、`step`、`final_answer`、`skill_match`、`confirmation_request`、`confirmation_result`、`ask_human`、`error`、`pong`
专家团队事件:`team_formed`、`expert_step`、`expert_result`、`plan_update`、`phase_started`、`phase_completed`、`phase_failed`、`team_synthesis`、`team_dissolved`
- `/agent/chat` — Chat with Expert Team view ### 前端页面
- `/agent/code` — Code/workflow
- `/agent/monitor` — Evolution dashboard
- `/computer-use` — Desktop control
- `/login` — Login page (JWT auth)
- Terminal panel — Local + server terminal with whitelist manager
### Configuration Priority - `/agent/chat` — 专家团队聊天视图
- `/agent/code` — 代码/工作流
- `/agent/monitor` — 进化看板
- `/computer-use` — 桌面控制
- `/login` — 登录页JWT 认证)
- 终端面板 — 本地 + 服务端终端,含白名单管理器
CLI args > `agentkit.yaml` > env vars (`${VAR:-default}`) > `.env` > hardcoded defaults ### 配置优先级
Config search: `--config` path > `./agentkit.yaml` > `~/.agentkit/agentkit.yaml` CLI 参数 > `agentkit.yaml` > 环境变量(`${VAR:-default}`> `.env` > 硬编码默认值
## Conventions 配置查找:`--config` 路径 > `./agentkit.yaml` > `~/.agentkit/agentkit.yaml`
- Skill configs: `configs/skills/*.yaml` (16 presets, unified as `SkillConfig`) ## 约定
- Skill categories: `agent_template` (execution engines: react/direct/rewoo/reflexion/plan_exec/goal_driven) vs `business_skill` (domain skills). Classified via `_ENGINE_TEMPLATE_NAMES` in `server/routes/skill_management.py`. Frontend groups by `category` field — `SkillsView` two-column layout, `SkillCard`/`SkillsTab` show type tags (引擎/技能) and category-based icons
- LLM configs: `agentkit.yaml` llm section (unified with server config)
- Pipeline configs: `configs/pipelines/*.yaml`
- Expert templates: `configs/experts/*.yaml` (5 programming experts + dev_team team template), registered via `ExpertTemplateRegistry`
- Team templates: `bound_skills` field stores member list (e.g., `dev_team.yaml` lists tech_lead, frontend_engineer, backend_engineer, qa_engineer, code_reviewer)
- All Pydantic models use `model_config = ConfigDict(...)` not `class Config`
- Test files: `tests/unit/` and `tests/integration/`
- Frontend stores: Pinia, one per domain (chat, team, settings)
- Frontend components: `src/agentkit/server/frontend/src/components/`
## Boundaries - 技能配置:`configs/skills/*.yaml`16 个预设,统一为 `SkillConfig`
- 技能分类:`agent_template`执行引擎react/direct/rewoo/reflexion/plan\_exec/goal\_drivenvs `business_skill`(领域技能)。通过 `server/routes/skill_management.py` 中的 `_ENGINE_TEMPLATE_NAMES` 分类。前端按 `category` 字段分组 — `SkillsView` 双栏布局,`SkillCard`/`SkillsTab` 显示类型标签(引擎/技能)和基于分类的图标
- LLM 配置:`agentkit.yaml` llm 段(与服务端配置统一)
- 流水线配置:`configs/pipelines/*.yaml`
- 专家模板:`configs/experts/*.yaml`5 个编程专家 + dev\_team 团队模板),通过 `ExpertTemplateRegistry` 注册
- 团队模板:`bound_skills` 字段存储成员列表(如 `dev_team.yaml` 列出 tech\_lead、frontend\_engineer、backend\_engineer、qa\_engineer、code\_reviewer
- 所有 Pydantic 模型使用 `model_config = ConfigDict(...)` 而非 `class Config`
- 测试文件:`tests/unit/` 和 `tests/integration/`
- 前端 storesPinia每个领域一个chat、team、settings
- 前端组件:`src/agentkit/server/frontend/src/components/`
## 边界
- 未经明确请求不得修改 `pyproject.toml` 版本
- 禁止直接推送到 main — 使用 feature 分支
- 集成测试需要 DockerRedis + PostgreSQL
- 桌面端构建需要 Rust 工具链 + PyInstaller
- Never modify `pyproject.toml` version without explicit request
- Never push to main directly — use feature branches
- Integration tests require Docker (Redis + PostgreSQL)
- Desktop builds require Rust toolchain + PyInstaller


@ -1 +0,0 @@
AGENTS.md


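@@ -1,16 +1,20 @@
 # Fischer AgentKit
-A unified AI Agent development framework -- assembles LLMs, Tools, and Prompts into executable Skills that complete tasks autonomously through the ReAct reasoning engine, with memory persistence, self-evolution, Pipeline orchestration, and a desktop client.
+An enterprise-grade unified AI Agent portal platform -- for enterprise users and developers, it assembles LLMs, Tools, and Prompts into executable Skills that complete tasks autonomously through the ReAct reasoning engine, with memory persistence, self-evolution, Pipeline orchestration, expert team collaboration, and a desktop client.
 ## Project Overview
-The core problem AgentKit solves: **going from 150 lines of Agent code to 10-20 lines of YAML configuration**.
-Traditionally, every new Agent requires writing a subclass, handling LLM calls, managing tool bindings, and validating output quality. AgentKit standardizes these capabilities into composable modules; developers only write YAML configuration to define a complete Skill (Prompt + Tool + quality gate), and the framework automatically handles the ReAct reasoning loop, model routing and fallback, output quality checks, and standardized output.
+AgentKit is an enterprise-grade unified AI Agent portal platform whose target users are **enterprise users** and **developers**:
+- **Enterprise users**: ready to use through the Web GUI / desktop client; configure Skills, expert teams, and knowledge bases with zero code, and directly get multi-expert collaboration, document generation, desktop control, and more
+- **Developers**: integrate deeply through the Python library / CLI / HTTP API; 150 lines of Agent code shrink to 10-20 lines of YAML configuration, and the framework automatically handles the ReAct reasoning loop, model routing and fallback, output quality checks, and standardized output
+AgentKit standardizes LLMs, Tools, and Prompts into composable modules. Developers only write YAML configuration to define a complete Skill (Prompt + Tool + quality gate); enterprise users orchestrate expert teams, monitor self-evolution, and manage knowledge bases and terminal security through the portal UI.
 Core positioning:
-- **Configuration-driven** -- define Skills in YAML, no Agent subclass required
+- **Portal platform** -- a single entry point that aggregates Skills, expert teams, knowledge bases, terminals, self-evolution, and more; ready to use for enterprise users
+- **Configuration-driven** -- define Skills in YAML; developers do not need to write Agent subclasses
 - **Production-ready** -- built-in quality gates, model fallback, usage statistics, cascade detection, state persistence
 - **Four ways to use it** -- Python library, CLI chat, Web GUI, desktop client
 - **Expert teams** -- Expert Team Mode: multiple experts collaborate on complex tasks, presented in the frontend as a multi-role conversation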


@ -0,0 +1,214 @@
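# Bitable (Multi-dimensional Table) Companion Service: Requirements
- **Date**: 2026-06-24
- **Status**: aligned, pending planning
- **Scope tier**: Deep (feature)
- **Next**: hand off to `/ce-plan` for implementation planning
---
## 1. Problem and Opportunity
AgentKit currently lacks a **unified, persistent landing place for structured data**. There is nowhere suitable to hold the data when any of the following needs arise:
- Aggregating data from multiple systems (structured data from several sources must be merged in one place)
- Persisting a local Excel file after upload (today Excel can only be exported one-way or parsed into text for RAG; it cannot be kept as an editable structured table)
- External data collection (results from crawlers or API calls need to land, field by field, in a table that can be queried, viewed, and analyzed)
Current state: Excel export is one-way (`src/agentkit/documents/renderers/excel_renderer.py`); Excel parsing only converts content to text for the knowledge base (`src/agentkit/memory/document_loader.py`); `MultiSourceRetriever` only does read-side multi-source retrieval; `SharedWorkspace` is a temporary KV store with a TTL. **No module can persist structured data from heterogeneous sources as an editable, viewable, computable multi-dimensional table.**
Opportunity: introduce a bitable companion service as the unified landing place for AgentKit's heterogeneous data. The Agent is the primary author of the data (it collects and writes); users refine the landed tables, configure views, and run analysis. This closes the structured-data persistence gap and gives the Agent the strategic capability of a "data orchestrator".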
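## 2. Primary Users and Core Value
| Dimension | Decision |
|------|------|
| **Form** | Hybrid mode: the Agent collects, the user refines |
| **Agent role** | Data author: runs three kinds of collection (Excel / database / crawler or API) and writes into the bitable field by field |
| **User role** | Data refiner and analyst: edits user columns, configures views, and analyzes the landed tables |
| **Core value** | A unified persistent landing place for heterogeneous data sources, on top of which views, analysis, formulas, and references are built |
**Three collection scenarios** (all Agent-driven):
1. Upload an Excel file or provide the URL of an online Excel file → read the content → write it into the bitable field by field
2. Point at a database → from its tables → generate multiple bitable tables
3. Run data collection on instruction (a crawler or a specific API) → write the collected data into the bitable field by field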
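
Scenario 1 implies that the service fetches a URL supplied by a caller, and the commit message lists "URL scheme + private IP validation in parse_excel_url" as the SSRF safeguard. The following is only a minimal sketch of that kind of check, not the project's implementation: the function name `is_safe_excel_url` and the injectable `resolve` parameter are assumptions made for illustration.

```python
import ipaddress
import socket
from typing import Callable, Iterable
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def _default_resolve(host: str) -> list[str]:
    """Resolve a host name to the IP addresses it currently points at."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_safe_excel_url(
    url: str,
    resolve: Callable[[str], Iterable[str]] = _default_resolve,  # hypothetical hook
) -> bool:
    """Return True only for http(s) URLs whose host resolves to public IPs."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    try:
        addresses = list(resolve(parts.hostname))
    except OSError:
        return False  # unresolvable hosts are rejected
    if not addresses:
        return False
    for raw in addresses:
        try:
            ip = ipaddress.ip_address(raw)
        except ValueError:
            return False
        # Reject loopback, private, link-local, and other non-public ranges.
        if not ip.is_global:
            return False
    return True
```

A check like this is only effective if the later download connects to the same address that was validated; otherwise a DNS answer can change between the check and the fetch.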
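## 3. Service Architecture
**The bitable is a companion service to AgentKit**:
- **Logically independent**: its own API/CLI, its own domain model (tables / fields / records / views / formulas), and its own storage boundary
- **Currently co-deployed**: physically in the same process or deployment unit as AgentKit, with UI-level integration into the AgentKit frontend
- **Call boundary**: AgentKit ↔ bitable goes through the API/CLI; **no tight in-process coupling**
- **Future evolution**: can be split out into a standalone service at zero cost; only the deployment changes, not the code
- **Storage boundary**: currently shares AgentKit's PostgreSQL, isolated in an **independent schema**; data is migrated when the service is split out
> Design implication: every cross-service interaction is designed with a "remote call" mindset, even though calls are local today. The field ownership model, upsert semantics, and formula engine are all **built into the bitable service itself** rather than bolted onto an external tool.
## 4. Key Product Decisions
### 4.1 Write semantics: upsert by primary key + field ownership model
When the Agent collects the same data again, it upserts by the table's primary-key field:
- **Matched records**: update the "data columns" (columns managed by the Agent) and keep the "user columns"
- **Unmatched records**: insert
- **User columns** are never overwritten by the Agent
**Field ownership** (each column is marked as a "data column" or a "user column"):
- **Automatic inference**: formula columns, reference columns, and manually annotated columns → user columns; columns written by Agent collection → data columns
- **Agent declaration**: the Agent may explicitly declare column ownership at collection time, overriding the automatic inference
- Formula columns are inherently user columns (derived, never overwritten)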
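
The write semantics above can be illustrated with a small in-memory sketch. It is not the service's code: the `Table` class, the `"data"`/`"user"` ownership labels, and the choice to silently drop columns with no declared ownership are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Table:
    """Hypothetical in-memory stand-in for a bitable table."""

    primary_key: str
    # Column name -> "data" (Agent-managed) or "user" (never overwritten by the Agent).
    ownership: dict[str, str]
    records: dict[Any, dict[str, Any]] = field(default_factory=dict)


def upsert(table: Table, rows: list[dict[str, Any]]) -> tuple[int, int]:
    """Upsert rows by primary key and return (inserted, updated) counts."""
    inserted = updated = 0
    for row in rows:
        key = row[table.primary_key]
        # Only data columns may be written by the Agent; anything else is dropped.
        data_part = {c: v for c, v in row.items() if table.ownership.get(c) == "data"}
        existing = table.records.get(key)
        if existing is None:
            table.records[key] = {table.primary_key: key, **data_part}
            inserted += 1
        else:
            existing.update(data_part)  # user columns are left untouched
            updated += 1
    return inserted, updated
```

In this sketch a repeated collection that also carries a value for a user column changes nothing in that column, which is the behaviour the ownership model is meant to guarantee.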
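### 4.2 Formulas and references: core to the identity, depth delivered in phases
Formula columns and reference columns are the core of what makes this a "multi-dimensional table". They **must exist from v1**, with depth extended in phases:
- **v1**: basic formulas (arithmetic, strings, simple aggregates such as SUM/AVG/COUNT) + basic references (lookup of a field in another table)
- **v2+**: advanced formulas (dates, conditionals, cross-table rollup) + function library extensions
**Formula recalculation strategy**: asynchronous recalculation plus a "calculating" status flag. After the Agent writes to a data column, the formula columns that depend on it enter the "calculating" state and are updated by a background asynchronous recalculation pipeline. This keeps synchronous recalculation from blocking writes, at the cost of a brief inconsistency window (during which users see the "calculating" flag).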
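
Deciding which formula columns enter the "calculating" state, and in what order they can be recomputed, is a dependency-graph problem; the commit message mentions a DAG with Kahn's topological sort. The sketch below shows one way this could look. The function name `recalc_order` and the shape of `deps` (formula column mapped to the columns it reads) are assumptions for illustration only.

```python
from collections import deque


def recalc_order(deps: dict[str, set[str]], changed: set[str]) -> list[str]:
    """Return the formula columns affected by `changed`, in a safe evaluation order.

    `deps` maps each formula column to the set of columns it reads.
    """
    # Reverse edges: column -> formula columns that read it.
    readers: dict[str, set[str]] = {}
    for formula, inputs in deps.items():
        for col in inputs:
            readers.setdefault(col, set()).add(formula)

    # Collect every formula column reachable from the changed columns.
    affected: set[str] = set()
    stack = list(changed)
    while stack:
        col = stack.pop()
        for formula in readers.get(col, ()):
            if formula not in affected:
                affected.add(formula)
                stack.append(formula)

    # Kahn's algorithm restricted to the affected sub-graph.
    indegree = {f: len(deps[f] & affected) for f in affected}
    queue = deque(sorted(f for f, d in indegree.items() if d == 0))
    order: list[str] = []
    while queue:
        f = queue.popleft()
        order.append(f)
        for reader in sorted(readers.get(f, ())):
            if reader in indegree:
                indegree[reader] -= 1
                if indegree[reader] == 0:
                    queue.append(reader)
    if len(order) != len(affected):
        raise ValueError("circular formula dependency")
    return order
```

Under this sketch, every column in the returned list would be flagged "calculating" when the write commits, and the flag would be cleared column by column as a background worker walks the list.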
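### 4.3 Scale and storage: built to evolve
- **Starting scale**: fewer than 100,000 rows per table and fewer than 1,000 tables in total (department level)
- **Architecture goal**: support future growth to large scale (100,000+ rows)
- **Storage choice**: normalized storage (field definitions, records, and cells kept separate) + indexes + pagination; do **not** stuff a whole table into a single JSONB row
- **Path to large scale** (v3): columnar storage, partitioning, materialized views, asynchronous recalculation pipeline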
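
To make "normalized storage + indexes + pagination" concrete, here is a small sketch. It uses the standard-library `sqlite3` module purely so the example is self-contained; the service itself targets PostgreSQL in an independent schema, and every table, column, and function name here is hypothetical. The pagination uses a composite cursor (sort value plus record id), in the spirit of the "composite cursor pagination for non-id sort orders" item in the commit message.

```python
import sqlite3

# Field definitions, records, and cells live in separate tables.
SCHEMA = """
CREATE TABLE fields  (id INTEGER PRIMARY KEY, table_id INTEGER, name TEXT, owner TEXT);
CREATE TABLE records (id INTEGER PRIMARY KEY, table_id INTEGER);
CREATE TABLE cells   (record_id INTEGER, field_id INTEGER, value TEXT,
                      PRIMARY KEY (record_id, field_id));
CREATE INDEX idx_cells_field_value ON cells (field_id, value, record_id);
"""


def page_by_field(conn, field_id, limit, cursor=None):
    """Return (rows, next_cursor), sorted by (value, record_id), via keyset pagination."""
    sql = "SELECT value, record_id FROM cells WHERE field_id = ?"
    params = [field_id]
    if cursor is not None:
        last_value, last_id = cursor
        # Composite cursor: the record id breaks ties between equal values.
        sql += " AND (value > ? OR (value = ? AND record_id > ?))"
        params += [last_value, last_value, last_id]
    sql += " ORDER BY value, record_id LIMIT ?"
    params.append(limit)
    rows = conn.execute(sql, params).fetchall()
    # A short page means there is nothing left to fetch.
    next_cursor = rows[-1] if len(rows) == limit else None
    return rows, next_cursor
```

The sketch ignores NULL cell values, which a real implementation would need to order explicitly.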
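## 5. Capability Scope and Phasing
The six requested capabilities are split into three phases by complexity and dependency:
| Capability | Complexity | v1 | v2 | v3 |
|------|--------|----|----|----|
| ① Module scaffolding (service skeleton + domain model + API) | Basic | ✅ | - | - |
| ② Data collection and landing (Excel / DB / crawler or API) | Medium | ✅ | - | - |
| ③ Multi-view display | Medium | Grid view | Kanban / Gantt / gallery | Form |
| ④ Analysis and computation | Medium-high | - | Grouping / pivot | Advanced aggregation |
| ⑤ Formula columns + reference columns | High | Basic formulas + lookup | Advanced formulas + rollup | Function library extensions |
| ⑥ Images + attachments | Low-medium | ✅ | - | - |
### v1: validate the core loop
Validate the core loop "Agent collects → data is persisted → user views and refines":
- Service skeleton: domain model (tables / fields / records / views), API/CLI, independent-schema storage
- Field ownership model + upsert-by-primary-key semantics
- Three kinds of collection (Excel upload or URL, database import, crawler/API collection)
- Grid view (table view with sorting, filtering, pagination, and cell editing)
- Basic formula columns (arithmetic, strings, simple aggregates such as SUM/AVG/COUNT)
- Basic reference columns (lookup of a field in another table)
- Image/attachment field types (reusing the existing file upload capability)
- Asynchronous formula recalculation + "calculating" state
### v2: multiple views and analysis
- Kanban view (one column per value of the grouping field)
- Gantt view (timeline laid out by date fields)
- Gallery view (cards with images/attachments as the main visual)
- Advanced formulas (date functions, conditional functions, cross-table rollup)
- Analysis (grouped aggregation, pivot tables)
### v3: scale and collaboration
- Form view (collect data through a form and write it into a table)
- Function library extensions
- Large-scale optimization (columnar storage, partitioning, materialized views, upgraded asynchronous recalculation pipeline)
- Real-time multi-user collaboration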
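## 6. Options Explored and Recommendation
### Option 1: build our own bitable engine (phased delivery)
Build a native bitable subsystem inside AgentKit: normalized storage, a natively built-in field ownership model, and a self-built formula engine delivered in phases. The Agent writes through the new bitable API/CLI.
- **Pros**: full control; fits the existing stack (PG/Redis/Vue/FastAPI); field ownership is native; upsert semantics are frictionless
- **Cons**: the largest build effort; the formula engine is the hard part; long-term maintenance is required
- **Risks**: formula engine scope creep; recalculation performance at scale
### Option 2: integrate an open-source bitable (APITable/NocoDB) as a sub-service
Deploy an open-source bitable as the companion service. AgentKit lets the Agent write through its API, users edit in the bitable's native UI, and the upsert-while-preserving-user-columns logic is layered on top.
- **Pros**: views, formulas, and attachments come for free; mature; the fastest route to full functionality
- **Cons**: the AGPL license carries copyleft risk (if commercialized); upsert-while-preserving-user-columns has to be forced on (external bitables have no concept of field ownership); integration through the API is loose
- **Risks**: license conflict; the external model drifting away from the requirements
### Option 3 (challenger): Agent structured-data substrate first, UI as an overlay
Invert the priorities: the bitable is first of all the Agent's persistent structured working memory and output substrate, and the user-facing multi-view UI is an overlay that reads and writes that substrate.
- **Pros**: maximizes synergy with the Agent; strategic differentiation; the substrate can drive capabilities beyond the table UI
- **Cons**: user-initiated table creation becomes secondary; the substrate abstraction needs more architectural thought
### Recommendation: Option 1 (self-built) with the Option 3 substrate mindset
**Rationale**:
1. All three collection scenarios are Agent-driven, which is essentially "the Agent as data author" and fits the Option 3 mindset naturally; but users also need to refine data, which requires the full engine of Option 1
2. The field ownership model behind upsert-while-preserving-user-columns is custom: external bitables lack the concept and forcing it on is painful; AGPL is a real risk for a product that may be commercialized
3. The existing infrastructure is complete (PG + pgvector + Redis + SQLAlchemy 2 + Vue3 + Ant Design Vue), so the marginal cost of building in-house is manageable
4. The companion-service architecture already requires an API/CLI boundary, so Option 1 actually fits best, because the ownership model lives inside the service itself
5. Phasing controls risk: v1 validates the core loop first
**Blending with Option 3**: architecturally, design the domain model with the mindset of "the Agent's persistent structured-data substrate", so that the bitable is not merely "a table feature" but the carrier of the Agent's data orchestration capability. This lets the substrate later drive capabilities beyond the table UI (dashboards, reports, Agent memory).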
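## 7. Scope Boundaries
### In scope (v1)
- Bitable companion service skeleton (domain model, API/CLI, independent-schema storage)
- Field ownership model + upsert-by-primary-key semantics
- Three kinds of collection (Excel / DB / crawler or API)
- Grid view
- Basic formula columns + basic reference columns (lookup)
- Image/attachment fields
- Asynchronous formula recalculation
### Deferred (v2/v3)
- Kanban / Gantt / gallery / form views
- Advanced formulas (dates / conditionals / cross-table rollup) + function library extensions
- Analysis (grouping / pivot)
- Large-scale optimization (columnar storage / partitioning / materialized views)
- Real-time multi-user collaboration
### Outside this product's identity
- Not a general-purpose spreadsheet (a field-based record model, not free-form cell editing)
- Not an ETL / data pipeline platform (collection is Agent-driven and on demand, not a scheduled pipeline)
- Not a BI dashboard product (analysis serves in-table aggregation, not standalone BI)
- Not a replacement for knowledge-base RAG (the bitable carries structured data; it is not unstructured document retrieval)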
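## 8. Assumptions and Dependencies
- **Assumption**: the Agent can already execute collection tasks (crawler/API calls); the bitable only handles the "write" step and is not responsible for running the collection itself
- **Assumption**: shared PostgreSQL performance is sufficient for v1/v2 scale; a dedicated database or columnar storage is re-evaluated for v3 at large scale
- **Dependency**: the existing file upload capability (the upload endpoint in `src/agentkit/server/routes/chat.py` and the `data/uploads/` storage) can be reused for attachment fields
- **Dependency**: the Agent tool system (the `execute() -> dict` contract in `src/agentkit/tools/base.py`) can be extended with a new bitable write tool
- **Assumption**: the "calculating" window of asynchronous formula recalculation (on the order of seconds) is acceptable to users
## 9. Success Criteria
**Signs that v1 is validated**:
1. The Agent can write the data of an uploaded Excel file into a bitable field by field, and the data can be viewed in the grid view
2. The Agent can point at a database table and generate the corresponding bitable
3. The Agent can run one API collection and write the returned data into a bitable field by field
4. Users can edit cells in the grid view, add a formula column (such as `=SUM(数据列)`, the sum of a data column), and see the asynchronous recalculation result
5. When the Agent collects into the same table again, it upserts data columns by primary key while the user's formula columns and manual edits remain unchanged
6. AgentKit calls the bitable service through the API/CLI, with no tight in-process coupling
## 10. Next Steps
This requirements document is handed to `/ce-plan` for implementation planning, with a focus on:
- v1 domain model design (entity relationships among tables / fields / records / views / formulas)
- Storage design for the independent schema (normalized table structure, indexes, pagination)
- API/CLI design (CRUD + collection writes + upsert + formula recalculation trigger)
- Implementation mechanism for the field ownership model (automatic inference + Agent declaration)
- Asynchronous recalculation pipeline design
- Agent bitable write tool design
- Frontend grid view component selection and integration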


@ -0,0 +1,182 @@
---
date: 2026-06-24
topic: portal-platform-evolution
---
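# AgentKit Portal Platform: Overall Evolution Roadmap
## Summary
Evolve the AgentKit portal platform serially by priority: first build a standalone RAG platform that reaches feature parity with MaxKB, then extend multi-channel access and the MCP Server, and finally replace commodity components with ecosystem libraries to cut cost (MCP/Celery/LiteLLM). There are no hard deadlines; progress is driven by completeness.
## Problem Frame
AgentKit is positioned as an enterprise-grade unified AI Agent portal platform for enterprise users and developers. Benchmarking against MaxKB (an open-source enterprise agent platform with 19k+ GitHub stars) shows gaps in several directions of the current capability stack:
- **Industrial-grade RAG pipeline**: AgentKit's existing `memory/` module is a developer-level component library (basic chunking + pgvector semantic retrieval + time_decay reranking), while MaxKB offers industrial-grade product features (dual-index retrieval + smart segmentation + question generation + termbase + hit-handling modes). RAG is a baseline capability for a portal platform that serves enterprise knowledge-base scenarios.
- **Platform reach**: AgentKit has only three RAG adapters (Feishu / Confluence / generic HTTP), while MaxKB natively supports multi-channel access through WeCom, DingTalk, Feishu, and Slack. A portal platform needs to reach the collaboration tools enterprises already use.
- **MCP Server**: AgentKit already has basic implementations of an MCP Client (`mcp/client.py`) and an MCP Server (`mcp/server.py`), but does not yet publish Skills or expert teams as MCP tools. The portal platform should complete the MCP Server's ability to publish Skills and expert teams.
- **In-house vs. ecosystem**: AgentKit builds a great deal in-house (Agent engine / LLM gateway / workflow canvas / MCP client / memory system / message bus), so the commodity layer is costly to maintain.
This evolution is **preventive evolution plus filling in must-have features**, not firefighting. The goal is to fill in the capabilities a portal platform should have, so that AgentKit is fully competitive in the enterprise AI Agent platform space.
## Key Decisions
**Serial evolution strategy (Option A).** Proceed serially by priority, fully delivering each direction before starting the next. Rationale: the user asks for "feature parity with MaxKB", which an MVP-driven approach is unlikely to reach in one pass; the absence of hard deadlines suits a serial cadence; and preventive evolution carries no urgency, which protects delivery quality.
**The RAG platform is parallel and independent.** Create a new standalone RAG platform module and keep the existing `memory/` for Agent memory (WorkingMemory/EpisodicMemory/SemanticMemory). Rationale: separation of responsibilities avoids coupling RAG with Agent memory; the RAG platform is portal infrastructure serving enterprise knowledge-base scenarios, while Agent memory serves the Agent runtime.
**Open to ecosystem dependencies.** The commodity layer prefers ecosystem solutions to reduce maintenance cost. Constraints: open-source license compliance, and backward compatibility for features that are finished or in progress. The differentiating layer (Agent engine / expert teams / self-evolution / terminal security) stays in-house.
**Keep the existing workflow.** FlowCanvas is not replaced with LogicFlow; the existing workflow canvas stays in-house. Rationale: avoid breaking existing node types (SkillNode/ApprovalNode/ConditionNode/ParallelNode) and workflows.
**Feature parity with MaxKB.** The success criterion for the industrial-grade RAG pipeline is feature parity: dual-index retrieval, smart segmentation, question generation, termbase, hit-handling modes, and rerank must all be present.
## Requirements
### Industrial-grade RAG pipeline (priority 1)
R1. Create a new standalone RAG platform module whose responsibilities are separate from the existing `memory/` module; the existing `memory/` is kept for Agent memory. The `LocalRAGService` in `memory/local_rag.py` (pgvector + chunking + embedding + semantic retrieval) needs an explicit migration strategy: absorb/extend it into the new platform, extract it into a new module, or build anew and deprecate LocalRAGService.
R2. Support dual-index retrieval: pgvector semantic retrieval + PostgreSQL full-text search (`search_vector`), with three retrieval modes: `embedding` (semantic), `keywords` (full-text), and `blend` (hybrid). Enterprise users configure the default retrieval mode per knowledge base; the Agent may override it at runtime based on query characteristics.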
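
R2 names the three modes but does not say how `blend` combines the two indexes. The sketch below fills that gap with reciprocal rank fusion purely as an assumption, to show what a mode switch could look like; the function name `rank_chunks`, the constant `k=60`, and the tie-breaking rule are all illustrative choices, not part of the requirement.

```python
from typing import Literal, Sequence

Mode = Literal["embedding", "keywords", "blend"]


def rank_chunks(
    mode: Mode,
    semantic: Sequence[str],  # chunk ids from the vector index, best first
    fulltext: Sequence[str],  # chunk ids from the full-text index, best first
    k: int = 60,              # RRF damping constant (assumed value)
) -> list[str]:
    """Return chunk ids ordered for the requested retrieval mode."""
    if mode == "embedding":
        return list(semantic)
    if mode == "keywords":
        return list(fulltext)
    if mode != "blend":
        raise ValueError(f"unknown retrieval mode: {mode!r}")
    # Reciprocal rank fusion: each list contributes 1 / (k + rank) per chunk.
    scores: dict[str, float] = {}
    for ranking in (semantic, fulltext):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; chunk id breaks ties deterministically.
    return sorted(scores, key=lambda c: (-scores[c], c))
```

Rank-based fusion avoids having to normalize cosine distances against full-text scores, which is one reason it is a common default for hybrid search; a weighted score blend would be an equally valid reading of the requirement.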
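R3. Implement smart segmentation and advanced segmentation in the RAG platform module (the chunking basics in the existing `memory/chunking.py` can serve as a reference), and provide segment preview so that enterprise users can inspect the segmentation result before vectorization.
R4. Support automatic question generation: generate related questions/phrasings for document passages automatically to improve retrieval recall.
R5. Support a termbase: improve retrieval accuracy in Chinese-language scenarios by enhancing full-text search tokenization.
R6. Support hit-handling modes: a model-optimization mode (the LLM generates an answer from the retrieved results) and a direct-answer mode (matching passages are returned as-is). The default mode is configured per knowledge base; enterprise users choose it in the knowledge base settings, and the Agent may override it per query scenario.
R7. Support rerank: retrieval results are reordered by a rerank model before being returned, improving relevance ranking.
R8. Extend the existing `KnowledgeBaseView`/`DocumentUpload`/`SearchTest` components to provide visual document management (document upload / segment preview / retrieval testing), so that enterprise users can manage knowledge bases through the frontend. Knowledge bases must enforce per-KB access control (owner/authorized-users), and Agent retrieval must be limited to the knowledge bases the calling user is authorized for. Document upload must validate file types against a whitelist, enforce a size limit, and sanitize parsed content before indexing (markdown sanitize, safe PDF parsing).
### Platform reach expansion (priority 2)
R9. Support multi-channel message access: message adapters for WeCom, DingTalk, Feishu, and Slack, so that enterprise users can use AgentKit from their existing collaboration tools. Each platform adapter must verify the request signature/token provided by the platform (Feishu encrypt_key, DingTalk token, WeCom EncodingAESKey) before processing a message, and must reject unauthenticated requests. All third-party platform credentials must be stored in a secrets store (not in plaintext configuration), with a defined rotation policy and access auditing.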
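R10. Complete the existing MCP Server (`mcp/server.py`): support publishing Skills and expert teams as MCP tools that external AI systems can call. MCP tool calls must require authentication and authorization (reusing the existing JWT+RBAC or API Key mechanism), and publishing a Skill or expert team as an MCP tool requires administrator-level authorization.
### Ecosystem replacement to cut cost (priority 3)
**Goal**: migrate the commodity layer (MCP client / asynchronous tasks / LLM provider adaptation) to ecosystem solutions, reducing in-house maintenance cost so that the team can focus on differentiating capabilities. Success criterion: after replacement, existing functional behavior is unchanged and the amount of maintained code decreases.
R11. Replace the MCP client with `langchain-mcp-adapters`: keep up with the evolving industry protocol and reduce the cost of maintaining three in-house transports (Stdio/HTTP/SSE).
R12. Introduce Celery for asynchronous tasks: it coexists with the existing native asyncio code and takes over document vectorization and batch jobs (using the existing Redis as the broker, with no new infrastructure), providing task persistence, retries, and scheduling. Provide visibility into asynchronous tasks: progress display, failure notification and retry, and task history.
R13. Replace the underlying LLM provider layer with LiteLLM: the upper gateway logic (fallback / caching / usage tracking) stays in-house, while provider adaptation goes through LiteLLM's unified interface.
## Actors
A1. **Enterprise users**: manage knowledge bases through the frontend (upload documents / configure segmentation / test retrieval), configure multi-channel access, and publish MCP tools.
A2. **Developers**: integrate AgentKit capabilities into external systems through the API / MCP Server.
A3. **Agent**: calls the RAG platform at runtime to retrieve knowledge-base content in support of question answering and decision making.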
## Key Flows
F1. RAG 文档处理流程
- **Trigger:** 企业用户上传文档到知识库。
- **Actors:** A1, A3
- **Steps:** 文档解析 → 分段(智能/高级)→ 分段预览 → 向量化 → 全文索引建立 → 问题自动生成 → 可用。
- **Outcome:** 文档进入知识库,可被 Agent 检索。
F2. RAG 检索流程
- **Trigger:** Agent 需要检索知识库回答用户问题。
- **Actors:** A3
- **Steps:** 查询接收 → 检索模式选择embedding/keywords/blend→ 双索引检索 → rerank 重排 → 命中处理(模型优化/直接回答)→ 返回结果。
- **Outcome:** Agent 获得相关知识库内容。
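
The mode selection step in F2 can be sketched as follows. The document does not say how `blend` combines the two result lists, so this sketch assumes reciprocal rank fusion (RRF); `blend_rrf`, `retrieve`, and the constant `k=60` are illustrative names and values, not part of the requirements.

```python
from collections import defaultdict


def blend_rrf(semantic_ids, keyword_ids, k=60):
    """Fuse two ranked lists of paragraph ids with reciprocal rank fusion.

    Assumption: RRF is only one possible way to implement the `blend` mode.
    """
    scores = defaultdict(float)
    for ranking in (semantic_ids, keyword_ids):
        for rank, paragraph_id in enumerate(ranking, start=1):
            scores[paragraph_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a stable order.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


def retrieve(mode, semantic_ids, keyword_ids):
    """Dispatch on the three retrieval modes named in R2."""
    if mode == "embedding":
        return list(semantic_ids)
    if mode == "keywords":
        return list(keyword_ids)
    if mode == "blend":
        return blend_rrf(semantic_ids, keyword_ids)
    raise ValueError(f"unknown retrieval mode: {mode}")
```

A paragraph that appears in both rankings accumulates two contributions, so it tends to rise above paragraphs found by only one index before the rerank step runs.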
F3. 多端消息接入流程
- **Trigger:** 企业用户通过企微/钉钉/飞书/Slack 发送消息。
- **Actors:** A1, A3
- **Steps:** 消息适配器接收 → 转换为 AgentKit 标准格式 → Agent 处理 → 响应转换为目标平台格式 → 返回。
- **Outcome:** 企业用户通过协作工具获得 Agent 响应。
F4. MCP Server 发布流程
- **Trigger:** 企业用户或开发者将 Skill/专家团队发布为 MCP 工具。
- **Actors:** A1, A2
- **Steps:** 选择 Skill/专家团队 → 配置 MCP endpoint → 发布 → 外部 AI 系统可通过 MCP 协议调用。
- **Outcome:** AgentKit 能力通过 MCP 协议对外输出。
## Scope Boundaries
### Deferred for later
- 本地模型支持Ollama——后续迭代服务企业私有化部署场景。
- 现有 `memory/` 模块重构——保留给 Agent 记忆用,不在本次演进范围。
### Outside this product's identity
- FlowCanvas→LogicFlow 替换——保留现有工作流,不替换。
- Agent 引擎ReActEngine/ReWOO/Reflexion/PlanExec——保持自研是核心差异化能力。
- 专家团队编排(流水线 + 私董会)——保持自研,生态无对应方案。
- 自进化系统16 组件)——保持自研,独有能力。
- 终端安全6 层白名单)——保持自研,安全必须自主可控。
## Dependencies / Assumptions
- **开源协议合规**生态替换涉及的依赖协议需宽松可商用。Celery (BSD-3)、LiteLLM (MIT)、langchain-mcp-adapters 需确认协议。LogicFlow 已因保留 FlowCanvas 决策排除非协议原因——Apache-2.0 本身可商用)
- **现有特性向后兼容**生态替换MCP 客户端/Celery/LiteLLM不能破坏现有功能需提供迁移路径。
- **pgvector 基础设施**RAG 平台与现有 EpisodicMemory 共用 pgvector 基础设施,但数据模型独立。
- **前端组件复用**:现有 KnowledgeBaseView/DocumentUpload/SearchTest 组件可能需要重构以支撑 RAG 平台可视化文档管理。
## Outstanding Questions
### Resolve Before Planning
- **[P0 安全] MCP Server 端点缺少认证/授权决策**R10 暴露 Skills/Expert Teams 为 MCP 工具F4 描述发布流程但未提及认证、授权、限流或访问控制。未认证的 MCP 端点允许任何可达客户端调用 Skills、读取 Expert Team 输出,若工具具备文件系统或 shell 访问权限则构成远程代码执行面。需在规划前明确认证方案API Key / JWT 复用 / OAuth / mTLS、授权模型按 skill / team / tenant、限流策略。ce-doc-review 延期security-lens置信度 100
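- **[P0 Security] Multi-platform message adapters lack input validation**: R9 covers multi-platform message ingestion (Feishu/DingTalk/WeCom), but F3 does not mention signature verification, origin authentication, or rate limiting. Messages from external platforms are an untrusted input boundary; without validation, messages can be forged, malicious content injected, and unauthorized Skill execution triggered. To settle before planning: each platform's signing mechanism (Feishu encrypt_key, DingTalk token, WeCom EncodingAESKey), message format validation, and replay protection. (ce-doc-review deferred, security-lens, confidence 100)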
- **[P0 安全] 文档上传缺少内容净化**R1/R7 涉及用户上传文档PDF/Word/Markdown/TXT需求未提及内容净化、恶意文件检测或大小限制。上传文档可能包含恶意脚本XSS via markdown、超大文件导致 OOM、嵌入恶意宏。需在规划前明确文件类型白名单、大小限制、内容扫描、markdown sanitize、PDF 解析安全。ce-doc-review 延期security-lens置信度 100
### Deferred to Planning
- 飞书消息接入适配器与现有飞书 RAG 适配器(`memory/adapters/feishu.py`)的复用程度——后续确认。
- RAG 平台数据模型Knowledge/Document/Paragraph/Problem/Embedding的具体设计——ce-plan 决策。
- Celery 与现有 asyncio 原生的共存策略——ce-plan 决策。
- LiteLLM 与现有自研 Provider 的迁移路径——ce-plan 决策。
- **[P1 战略] 对标 MaxKB ≠ 竞争力**:功能对等是底线而非差异化。需明确 AgentKit 相对 MaxKB 的差异化定位(专家团队/自进化/终端安全等独有能力如何与 RAG 平台协同ce-doc-review 延期product-lens置信度 75
- **[P1 前提] 演进前提缺少用户痛点证据**:文档以"对标 MaxKB 发现差距"为前提,但未提供用户痛点证据(用户反馈/流失原因/竞品丢失原因)。需确认:是否有用户因缺少 RAG/多端/MCP Server 而流失或抱怨ce-doc-review 延期product-lens置信度 75ROOT
- **[P1 前提] 未评估"不做"基线**未评估不执行此演进计划的后果。需确认若不演进AgentKit 的实际损失是什么ce-doc-review 延期adversarial置信度 75DEPENDENT
- **[P1 战略] 同质化追赶 vs 差异化构建**RAG/多端/MCP Server 是 commodity 能力MaxKB 已有。需确认:投入大量资源追赶 commodity 是否优于强化差异化能力ce-doc-review 延期product-lens置信度 75
- **[P1 战略] 重建 MaxKB 缺乏正当性论证**RAG 平台并行独立意味着重建 MaxKB 已有的工业级管道。需确认:为何不直接集成 MaxKB 或 forkce-doc-review 延期adversarial置信度 75ROOT
- **[P1 基础设施] 并行 RAG 平台导致基础设施翻倍**:独立 RAG 平台与现有 `memory/` 共用 pgvector 但数据模型独立,可能导致维护两套 RAG 基础设施。需确认长期是否合并ce-doc-review 延期product-lens置信度 75DEPENDENT
- **[P1 设计] RAG 平台信息架构未定义**RAG 平台的前端信息架构(知识库列表/文档管理/检索测试/配置页)未定义。归 ce-plan 设计。ce-doc-review 延期design-lens置信度 75
- **[P1 设计] 多端配置流程缺失**R9 多端消息接入缺少配置流程(如何添加平台/配置凭证/测试连通性)。归 ce-plan 设计。ce-doc-review 延期design-lens置信度 75
- **[P1 安全] 知识库访问控制未指定**R8 可视化文档管理未指定访问控制(谁可查看/编辑/删除知识库)。需明确 RBAC 模型。ce-doc-review 延期security-lens置信度 75
- **[P1 安全] 适配器凭证管理未定义**R9 多端适配器需要管理各平台 API 凭证app_id/app_secret/token需求未提及凭证存储与轮换。归 ce-plan 设计。ce-doc-review 延期security-lens置信度 75
- **[P1 安全] MCP 发布授权未指定**R10 MCP Server 发布流程未指定谁有权发布 MCP 工具。需明确发布权限模型。ce-doc-review 延期security-lens置信度 75
- **[P1 技术] PG 全文检索对中文不适用**R2 依赖 PostgreSQL 全文检索,但 PG 原生全文检索对中文支持差(缺中文分词)。需确认:是否使用 pg_jieba/zhparser 扩展或外部搜索引擎ce-doc-review 延期feasibility置信度 75
- **[P1 优先级] R12 优先级阻断 P1 交付**R12Celery在 P3但 R1-R8RAG 管道的文档向量化需要异步任务能力。需确认P1 是否依赖 R12ce-doc-review 延期scope-guardian置信度 75
- **[P2 证据] Celery 替换缺少必要性证据**R12 引入 Celery 但未提供 asyncio 原生不足的证据(具体场景/性能瓶颈/故障案例。需补充必要性论证。ce-doc-review 延期feasibility置信度 75
- **[P2 量化] LiteLLM 节省未量化**R13 引入 LiteLLM 但未量化节省(维护代码量/开发效率/兼容 provider 数。需补充量化数据。ce-doc-review 延期feasibility置信度 75
- **[P2 设计] MCP 配置流程未指定**R10 MCP Server 发布流程缺少配置细节endpoint 路径/工具命名/参数定义)。归 ce-plan 设计。ce-doc-review 延期design-lens置信度 75
- **[P2 设计] 分块预览未定义**R3 分段预览的交互模式未定义(预览界面/编辑能力/重新分段)。归 ce-plan 设计。ce-doc-review 延期design-lens置信度 75
- **[P2 技术] 消息总线替换未处理**:现有 `bus/`MemoryBus/RedisBus在生态替换中未提及。需确认是否保留自研ce-doc-review 延期feasibility置信度 75
- **[P2 技术] Rerank 模型未处理**R7 rerank 未指定模型(本地/API/开源)。归 ce-plan 决策。ce-doc-review 延期feasibility置信度 75
- **[P1 战略] MaxKB 对等框架解决错误问题**MaxKB 是 RAG 知识库产品AgentKit 是 Agent 平台。对标不同产品类别的功能对等可能构建不服务于实际用户的能力。需重构为用户结果导向的成功标准。ce-doc-review 第 2 轮延期product-lens+adversarial置信度 100
- **[P1 战略] 串行策略饿死差异化投入**串行执行意味着差异化能力Agent 引擎/专家团队/自进化)在 P1/P2 完成前零投入。需考虑预留差异化并行轨道。ce-doc-review 第 2 轮延期product-lens置信度 75
- **[P1 战略] Build-vs-buy 未评估**R1-R8 从零构建工业级 RAG 管道,未评估集成 MaxKB 或采用 RAG 框架LlamaIndex/Haystack。需补充 build-vs-buy 评估。ce-doc-review 第 2 轮延期product-lens+adversarial置信度 100
- **[P1 前提] 核心替换向后兼容性是假设非验证**R11/R12/R13 替换三个核心组件但假设"现有功能行为不变"。现有 LLM 网关有 6 provider + fallback + 语义缓存 + RemoteLLMProvider 代理。需将向后兼容从假设转为验证前提。ce-doc-review 第 2 轮延期adversarial置信度 75
- **[P1 设计] 文档处理失败状态缺失**F1 未定义解析失败/不支持格式/向量化错误的用户可见状态。归 ce-plan 设计。ce-doc-review 第 2 轮延期design-lens置信度 75
- **[P1 设计] 分段预览交互模式未定义**R3 "查看分段结果"是只读还是可编辑(合并/拆分/重新分段)未定义。归 ce-plan 设计。ce-doc-review 第 2 轮延期design-lens置信度 75
- **[P1 设计] 多端配置与认证流程缺失**F3 缺少多端 onboarding 流程webhook 配置/OAuth/app 凭证/连通性测试)。归 ce-plan 设计。ce-doc-review 第 2 轮延期design-lens置信度 75
- **[P2 战略] 维护成本痛点延期至 P3**Problem Frame 声明"commodity 层维护成本高"但解决方案在最低优先级 P3。需考虑将高杠杆替换如 LiteLLM提前并行。ce-doc-review 第 2 轮延期product-lens置信度 75
- **[P2 战略] R11-R13 是技术债非产品需求**R11-R13 成功标准是"现有功能行为不变"(零用户可见影响),无 Actor 受益。需考虑移至独立工程债轨道。ce-doc-review 第 2 轮延期product-lens置信度 75
- **[P2 战略] 门户触达(P2)反转门户价值主张**:门户平台的核心价值是触达,但多端接入在 P2。需考虑在 P1 并行交付至少一个高价值渠道。ce-doc-review 第 2 轮延期product-lens置信度 75
- **[P2 技术] Celery 缺乏必要性论证**R12 引入 Celery 但未论证 asyncio 不足。文档向量化是 CPU-bound可用 ProcessPoolExecutor批量任务是 I/O-boundasyncio 强项。需补充具体场景。ce-doc-review 第 2 轮延期scope-guardian+adversarial置信度 75
- **[P2 技术] LiteLLM 替换覆盖缺口**R13 未评估 LiteLLM 对 6 个现有 provider尤其 Doubao/Wenxin/Yuanbao的覆盖以及语义缓存/用量追踪/RemoteLLMProvider 代理等网关特性。需补充 feature-gap 分析。ce-doc-review 第 2 轮延期scope-guardian+adversarial置信度 75
- **[P2 战略] 串行策略阻断 MCP Server**R10 完善现有 MCP 基础设施,无依赖 RAG 或多端。串行策略将其阻断在两个无关工作流之后。需考虑解耦 R10。ce-doc-review 第 2 轮延期adversarial置信度 75
- **[P2 设计] MCP 发布配置细节未定义**F4 "配置 MCP endpoint"未定义配置字段(工具名称/描述/输入 schema/鉴权方式/发布前测试)。归 ce-plan 设计。ce-doc-review 第 2 轮延期design-lens置信度 75
- **[P2 设计] 新 RAG 平台门户 IA 未定义**R1 新建独立 RAG 平台模块,但未定义其在门户导航中的位置(顶级 section 还是扩展现有知识库管理区)。归 ce-plan 设计。ce-doc-review 第 2 轮延期design-lens置信度 75
## Sources / Research
- MaxKB 系统架构https://maxkb.cn/docs/v1/system_arch/
- MaxKB 技术解析https://juejin.cn/post/7650428235188420651
- MaxKB GitHubhttps://github.com/1Panel-dev/MaxKB
- AgentKit 代码库:`memory/`RAG 基础组件)、`server/routes/`22 路由模块)、`src/agentkit/server/frontend/src/components/`(前端组件)
- AgentKit 项目规则:`AGENTS.md`、`CLAUDE.md`


@ -0,0 +1,868 @@
---
title: "feat: 多维表格Bitable伴生服务 v1"
status: active
date: 2026-06-24
deepened: 2026-06-24
type: feat
origin: docs/brainstorms/2026-06-24-bitable-module-requirements.md
---
# 多维表格Bitable伴生服务 v1 实现规划
## Summary
为 AgentKit 引入多维表格伴生服务作为异构数据源Excel/数据库/爬虫API的统一持久化落地载体。Agent 是数据的主要作者采集写入用户在落地后的表上精修、配视图、做分析。v1 验证"采集→落地→网格视图→基础公式→附件"核心闭环。
本服务逻辑独立(自有 API/CLI/领域模型/存储当前共部署、UI 级集成,未来可零成本抽离。
## Problem Frame
AgentKit 缺少统一的持久化结构化数据落地载体。Excel 导出是单向的(`src/agentkit/documents/renderers/excel_renderer.py`Excel 解析只转文本进 RAG`src/agentkit/memory/document_loader.py``SharedWorkspace` 是带 TTL 的临时 KV。没有模块能把异构数据源的结构化数据持久化为可编辑、可视图、可计算的多维表格。
详见 origin: `docs/brainstorms/2026-06-24-bitable-module-requirements.md`
## Requirements
源自需求文档 v1 范围:
| ID | 需求 | 来源 |
|----|------|------|
| R1 | 服务骨架:领域模型(表/字段/记录/视图、API/CLI、独立 schema 存储 | 需求文档 §5 v1 |
| R2 | 字段所有权模型 + 按主键 upsert 语义(数据列归 Agent用户列保留 | 需求文档 §4.1 |
| R3 | 三类采集落地Excel 上传/URL、数据库导入、爬虫/API 采集) | 需求文档 §2 |
| R4 | 网格视图(排序/筛选/分页/单元格编辑) | 需求文档 §5 v1 |
| R5 | 基础公式列(算术/字符串/SUM/AVG/COUNT+ 基础引用列lookup | 需求文档 §4.2 |
| R6 | 图片/附件字段类型(复用现有文件上传能力) | 需求文档 §5 v1 |
| R7 | 异步公式重算 + "计算中"状态标记 | 需求文档 §4.2 |
| R8 | 伴生服务架构API/CLI 调用边界,不做进程内紧耦合 | 需求文档 §3 |
**成功标准**(源自需求文档 §9Agent 能把 Excel/DB/API 数据写入多维表格;用户能编辑单元格、新增公式列、看到异步重算结果;重复采集时按主键 upsert 保留用户列;服务通过 API/CLI 被调用。
---
## Key Technical Decisions
### KTD1: 存储选用 PostgreSQL非 SQLite跟随 evolution/memory 模式
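Existing companion subsystems (calendar/documents/auth) use SQLite with standalone `.db` files. bitable **deviates from this pattern** and uses PostgreSQL with its own schema, following the PostgreSQL pattern of `src/agentkit/evolution/pg_store.py` and `src/agentkit/memory/models.py`.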
**理由**:需求文档要求可演进到单表 10万+行 + 并发写入Agent 采集 + 用户编辑同时。SQLite 的并发写锁和单文件规模是硬瓶颈。PostgreSQL 的 JSONB 查询能力、行级并发、索引支持是 bitable 的刚需。
**代价**bitable 要求部署环境配置 PostgreSQL不像 calendar/documents 开箱即用 SQLite。这是可接受的——需求文档已明确"共享 PG + 独立 schema"。
**Pattern references**: `src/agentkit/evolution/pg_store.py` (standalone PGBase + lazy initialization + lock against concurrent init) and `src/agentkit/memory/models.py` (SQLAlchemy 2 declarative + JSONB + pgvector).
### KTD2: 存储模型——字段定义表 + 记录表JSONB 存值)
不用 EAV一行一单元格100k×20=200万行太慢不用动态列加列要 DDL。采用
- **字段定义表** `bitable_fields`:每行一个字段定义(名称、类型、配置、所有权)
- **记录表** `bitable_records`:每行一条记录,`values` 列为 JSONB`{field_id: value}`
这是 Airtable/飞书多维表格的标准模式。JSONB 支持 GIN 索引和 `->>` 查询,兼顾灵活性与查询性能。加列/删列只改字段定义表,不动记录表结构。
### KTD3: 公式引擎——自研 Python 轻量引擎
不引入 HyperFormula商业付费、pycelGPL 传染风险、formulasEUPL 边界模糊)。自研,因为 v1 函数集小10-50 个)。
**架构**`ast`/`pyparsing` 解析公式为 AST → 构建 DAG字段依赖关系→ Kahn 算法拓扑排序 → DFS 检测循环引用 → 增量重算(仅重算受影响下游)。
**重算策略**:数据列写入 → 标记依赖该列的公式列为"计算中" → 异步队列按拓扑序重算 → 结果写回记录 JSONB → 状态置"完成"。
`ponytail:` 自研引擎的 O(V+E) 拓扑重算在万级公式单元格下足够;若未来公式量到十万级或需 Excel 100% 兼容,升级路径为迁移到 Univer 引擎Apache-2.0,免费商用)。
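
A minimal sketch of the ordering step in this KTD, assuming field dependencies have already been extracted from the formula ASTs into a plain dict. `topo_order` is an illustrative name; `CircularReferenceError` follows the error name used later in this plan. The sketch detects cycles from Kahn's leftover nodes rather than with the separate DFS pass mentioned above.

```python
from collections import deque


class CircularReferenceError(Exception):
    """Raised when formula fields depend on each other in a cycle."""


def topo_order(deps):
    """Return field ids in recalculation order.

    `deps` maps a field id to the set of field ids it reads from.
    """
    nodes = set(deps)
    for sources in deps.values():
        nodes |= set(sources)

    indegree = {node: 0 for node in nodes}
    downstream = {node: [] for node in nodes}
    for field, sources in deps.items():
        for source in set(sources):
            indegree[field] += 1
            downstream[source].append(field)

    # Start from fields with no dependencies (plain data columns).
    ready = deque(sorted(node for node in nodes if indegree[node] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for dependent in sorted(downstream[node]):
            indegree[dependent] -= 1
            if indegree[dependent] == 0:
                ready.append(dependent)

    # Any node never released still has an unmet dependency: a cycle.
    if len(order) != len(nodes):
        raise CircularReferenceError(sorted(nodes - set(order)))
    return order
```

For incremental recalculation, the same `downstream` map can be walked from the changed field to collect only the affected formula fields before ordering them.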
### KTD4: 网格视图组件——vxe-tableMIT
不选 Handsontable商业付费、ag-grid Enterprise付费功能、a-table 裸用10k+ 行无虚拟滚动)。选 vxe-tableVue 3 原生 + TS、MIT、横向+纵向虚拟滚动、可编辑 CRUD、自定义渲染器插槽实现附件/图片/公式列)。
公式列由后端计算后回填值,前端只渲染(不前端算公式)。
### KTD5: 服务边界——REST API 即使共部署也走 HTTP
需求文档要求"API/CLI 调用边界,不做进程内紧耦合"。即使 bitable 与 AgentKit 共进程部署Agent 调用 bitable 也走 localhost REST API`/api/v1/bitable/*`),而非直接 import service 类。
**理由**:满足伴生服务契约,未来抽离为零成本。代价是本地 HTTP 往返开销(可忽略)。
**例外**CLI 命令(`agentkit bitable ...`)可直接调用 service 层CLI 是运维工具,不是运行时调用路径)。
### KTD6: 字段所有权——field 元数据 `owner` 字段 + 自动推断
`bitable_fields` 表增加 `owner` 列(`agent` | `user`)。自动推断规则:公式列/引用列/手动标注列 → `user`Agent 采集写入的列 → `agent`。Agent 采集时可显式声明覆盖推断。
upsert 时只更新 `owner=agent` 的字段值,`owner=user` 的字段值原样保留。
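
The inference rule and the upsert guarantee can be sketched in pure Python as below. `infer_owner` and `merge_upsert` are hypothetical helper names, and the `written_by_agent` flag is an assumed way to express "the column was written by Agent ingestion".

```python
def infer_owner(field_type, written_by_agent, declared=None):
    """Infer the `owner` of a field per KTD6.

    An explicit declaration wins; formula/lookup fields belong to the user;
    otherwise columns written by Agent ingestion belong to the agent.
    """
    if declared in ("agent", "user"):
        return declared
    if field_type in ("formula", "lookup"):
        return "user"
    return "agent" if written_by_agent else "user"


def merge_upsert(existing_values, incoming_values, owners):
    """Apply an upsert payload, touching only fields owned by the agent."""
    merged = dict(existing_values)
    for field_id, value in incoming_values.items():
        if owners.get(field_id) == "agent":
            merged[field_id] = value
    return merged
```

Fields missing from `owners` are treated as not agent-owned here, so an unknown field id in the payload is ignored rather than written.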
### KTD7: 公式引擎安全约束——受限 AST walker + 白名单节点
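After `ast.parse`, **calling `eval()` directly is forbidden**. A restricted AST walker must be implemented that allows only whitelisted node types: `Expression`, `BinOp`, `UnaryOp`, `BoolOp`, `Compare`, `Call` (registered functions only), `Name` (field references only), `Constant`, `IfExp`.
**Banned nodes**: `Attribute` (blocks dunder attribute access such as `__class__`), `Subscript`, `Lambda`, `Import`/`ImportFrom`, `Assign`/`AugAssign`, `For`/`While`, `FunctionDef`/`ClassDef`, `Await`, `Yield`. Encountering a banned node raises `FormulaSecurityError` immediately.
**Rationale**: formula strings come from user input and Agent output, so they cross a trust boundary. Compiling the result of `ast.parse(..., mode="eval")` and passing it to `eval()` still allows access to `__builtins__`. A restricted walker is the only safe option.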
**模式参考**Python `ast` 模块的 `NodeVisitor` + 白名单校验,类似 bandit 的 AST 检查模式。
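
A minimal sketch of such a whitelist check, assuming field references have already been rewritten from the `{field_id}` syntax into bare identifiers (in plain Python `{f1}` would parse as a set literal). `validate_formula` and `REGISTERED_FUNCTIONS` are illustrative names; operator and context nodes are whitelisted too because `ast.walk` visits them.

```python
import ast


class FormulaSecurityError(Exception):
    """Raised when a formula contains a node outside the whitelist."""


# Hypothetical registry; the real function list lives in functions.py.
REGISTERED_FUNCTIONS = {"SUM", "AVG", "COUNT", "ABS", "ROUND"}

_ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.BoolOp, ast.Compare,
    ast.Call, ast.Name, ast.Constant, ast.IfExp,
    # Helper nodes that appear as children of the nodes above.
    ast.operator, ast.unaryop, ast.boolop, ast.cmpop, ast.Load,
)


def validate_formula(expr, field_names):
    """Parse `expr` and reject anything outside the whitelist."""
    tree = ast.parse(expr, mode="eval")
    call_targets = set()
    # ast.walk is breadth-first, so a Call is seen before its func Name.
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED_NODES):
            raise FormulaSecurityError(f"node {type(node).__name__} is not allowed")
        if isinstance(node, ast.Call):
            func = node.func
            if not isinstance(func, ast.Name) or func.id not in REGISTERED_FUNCTIONS:
                raise FormulaSecurityError("call target is not a registered function")
            if node.keywords:
                raise FormulaSecurityError("keyword arguments are not allowed")
            call_targets.add(id(func))
        elif isinstance(node, ast.Name):
            if id(node) not in call_targets and node.id not in field_names:
                raise FormulaSecurityError(f"unknown name: {node.id}")
    return tree
```

Validation only proves the tree is safe to interpret; evaluation should still walk the tree with its own interpreter instead of handing it to `eval()`.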
### KTD8: Upsert 用 `jsonb_set` 逐字段合并,禁止整行替换
upsert 更新 agent 列时,**禁止** `UPDATE ... SET values = :new_values`(整行替换会覆盖 user 列)。必须用 `jsonb_set` 逐字段合并:
```sql
-- ponytail: per-field jsonb_set is O(number of fields) per record; acceptable for batch upserts in the tens of thousands
UPDATE bitable_records
SET values = jsonb_set(values, :field_path, :field_value, true)
WHERE id = :record_id
```
对每条记录的每个 agent 列执行一次 `jsonb_set`,或在单条 SQL 中嵌套多个 `jsonb_set`。user 列(`owner=user`)的值绝不出现在 UPDATE 语句中。
**理由**:整行替换是 upsert 语义破坏的最常见实现错误。`jsonb_set` 逐字段合并是唯一能保证"只更新 agent 列、保留 user 列"的正确实现。
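
The "nested `jsonb_set` in a single statement" variant can be sketched as a small SQL builder. `build_agent_update` is a hypothetical helper; the field-id pattern is an assumption, and the `:name` placeholders follow the style of the SQL above. Field ids are inlined into the path literal only after validation, while values always travel as bind parameters.

```python
import json
import re

# Assumed shape of a field id; anything else is rejected before it reaches SQL.
_SAFE_FIELD_ID = re.compile(r"^[A-Za-z0-9_]+$")


def build_agent_update(record_id, agent_values):
    """Build one UPDATE that merges only the given agent-owned fields."""
    if not agent_values:
        raise ValueError("nothing to update")
    expr = "values"
    params = {"record_id": record_id}
    for i, (field_id, value) in enumerate(sorted(agent_values.items())):
        if not _SAFE_FIELD_ID.match(field_id):
            raise ValueError(f"unsafe field id: {field_id!r}")
        # Each value is bound as JSON text and cast to jsonb on the server.
        params[f"val_{i}"] = json.dumps(value)
        expr = f"jsonb_set({expr}, '{{{field_id}}}', CAST(:val_{i} AS jsonb), true)"
    sql = f"UPDATE bitable_records SET values = {expr} WHERE id = :record_id"
    return sql, params
```

Because the caller passes only agent-owned fields, user-owned keys never appear in the generated statement, which is the property KTD8 asks for.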
### KTD9: 记录分页用 cursor-based非 offset-based
`GET /tables/{id}/records` 分页用 cursor`?cursor=...&limit=50`),非 `?offset=0&limit=50`
**理由**offset 分页在 100k 行时深翻页慢(`OFFSET 50000` 仍扫描前 5 万行。cursor 分页用 `WHERE id > :cursor ORDER BY id LIMIT :limit`恒定性能。代价是不支持随机跳页v1 不需要——网格视图是连续滚动)。
`ponytail:` cursor 分页不支持跳页;未来若需"跳到第 N 页",升级路径为 keyset + 估算偏移或预计算页索引。
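
The keyset behaviour can be illustrated with an in-memory stand-in for `WHERE id > :cursor ORDER BY id LIMIT :limit`. `fetch_page` is an illustrative name, and the list of dicts stands in for the records table.

```python
def fetch_page(records, limit, cursor=None):
    """Return (items, next_cursor) using id-based keyset pagination."""
    ordered = sorted(records, key=lambda record: record["id"])
    if cursor is not None:
        # Equivalent of WHERE id > :cursor.
        ordered = [record for record in ordered if record["id"] > cursor]
    items = ordered[:limit]
    # Only hand out a cursor when more rows remain after this page.
    next_cursor = items[-1]["id"] if len(ordered) > limit else None
    return items, next_cursor
```

A client keeps calling with the returned `next_cursor` until it receives `None`, which matches the continuous-scroll usage described above.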
### KTD10: vxe-table 与 Ant Design Vue CSS 隔离
vxe-table 引入全局 CSS`.vxe-*` 前缀),可能与 Ant Design Vue 的 `.ant-*` 样式冲突。隔离策略:
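1. Importing the vxe-table styles locally into `BitableGrid.vue` via `@import` inside `<style scoped>` is not workable (vxe-table uses global classes); instead, `import 'vxe-table/lib/style.css'` in `main.ts` and **only ensure it is loaded when the bitable route component mounts**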
2. bitable 网格容器用 `.bitable-grid-scope` 包裹vxe-table 的样式覆盖限定在该 scope 下(`.bitable-grid-scope .vxe-table { ... }`
3. 字体/颜色变量对齐 Ant Design Vue 的 token`var(--ant-primary-color)` 等),避免视觉割裂
**理由**vxe-table 和 Ant Design Vue 都是全局样式注入型组件库,不隔离会导致样式互相污染。
### KTD11: 内部 Agent→bitable HTTP 服务间认证
Agent 通过 BitableTool 调用 bitable REST API 时,不走用户 JWT 认证Agent 无用户会话)。改用**内部服务令牌**
- `agentkit.yaml` 配置 `bitable.internal_token`(启动时生成或手动配置)
- BitableTool 请求头携带 `X-Internal-Token: <token>`
- bitable 路由的 `require_authenticated` 依赖增加内部令牌分支:`Authorization: Bearer <jwt>` **或** `X-Internal-Token: <token>` 二选一
- 内部令牌仅授权 bitable 端点,不授权其他 API
**理由**:伴生服务架构要求 REST API 边界KTD5但 Agent 无用户会话。内部令牌是服务间认证的标准模式,比禁用认证安全,比共享 JWT 简单。
`ponytail:` 内部令牌是静态共享密钥,适合单实例部署;未来多实例/独立部署时升级为 mTLS 或 OAuth2 client credentials。
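
The "JWT or internal token" branch might look like the sketch below. `is_authorized` and the injected `verify_jwt` callable are hypothetical; headers are modelled as a plain dict, and the comparison uses `hmac.compare_digest` so that timing does not leak how much of the token matched.

```python
import hmac


def is_authorized(headers, internal_token, verify_jwt):
    """Accept either a valid internal token or a valid bearer JWT."""
    supplied = headers.get("X-Internal-Token")
    if supplied is not None:
        if not internal_token:
            # No token configured: the internal branch is disabled.
            return False
        return hmac.compare_digest(
            supplied.encode("utf-8"), internal_token.encode("utf-8")
        )
    authorization = headers.get("Authorization", "")
    if authorization.startswith("Bearer "):
        return bool(verify_jwt(authorization[len("Bearer "):]))
    return False
```

In this sketch a request that presents a wrong internal token is rejected outright rather than falling back to the JWT branch; the plan leaves that choice open.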
---
## High-Level Technical Design
### 组件架构
```mermaid
flowchart LR
subgraph AgentKit
Agent[Agent Loop] -->|HTTP API| BitableRoutes[Bitable Routes]
CLI[agentkit bitable CLI] -->|direct call| BitableService
BitableRoutes --> BitableService[BitableService]
BitableService --> BitableRepo[Repository]
BitableService --> FormulaEngine[Formula Engine]
BitableService --> RecalcQueue[Recalc Queue]
RecalcWorker[Recalc Worker] -->|consume| RecalcQueue
RecalcWorker --> FormulaEngine
RecalcWorker --> BitableRepo
BitableRepo --> PG[(PostgreSQL\nbitable schema)]
end
subgraph Ingestion
Agent -->|Excel/DB/Crawler| BitableTool[BitableTool]
BitableTool -->|HTTP| BitableRoutes
end
subgraph Frontend
GridView[Grid View\nvxe-table] -->|HTTP| BitableRoutes
GridStore[Pinia Store] --> GridView
end
```
### 数据模型 ERD
```mermaid
erDiagram
bitable_tables ||--o{ bitable_fields : has
bitable_tables ||--o{ bitable_records : has
bitable_tables ||--o{ bitable_views : has
bitable_fields ||--o{ bitable_recalc_queue : triggers
bitable_tables {
string id PK
string name
string description
string primary_key_field_id FK
string owner_user_id
timestamp created_at
timestamp updated_at
}
bitable_fields {
string id PK
string table_id FK
string name
string field_type "text/number/date/select/attachment/image/formula/lookup"
jsonb config "options, formula_expr, lookup_target"
string owner "agent|user"
timestamp created_at
}
bitable_records {
string id PK
string table_id FK
jsonb values "{field_id: value}"
timestamp created_at
timestamp updated_at
}
bitable_views {
string id PK
string table_id FK
string name
string view_type "grid|kanban|gantt|gallery|form"
jsonb config "filters, sorts, groupings, hidden_fields"
timestamp created_at
}
bitable_recalc_queue {
string id PK
string table_id FK
string record_id FK
string field_id FK
string status "pending|calculating|done|error"
string error_message
timestamp queued_at
timestamp completed_at
}
```
### 公式异步重算流程
```mermaid
sequenceDiagram
participant Agent
participant API
participant Service
participant Queue as Recalc Queue
participant Worker as Recalc Worker
participant Engine as Formula Engine
participant DB
Agent->>API: POST /records (upsert, data columns)
API->>Service: upsert_records(table_id, records, pk)
Service->>DB: upsert (update agent-owned columns only)
Service->>Service: detect affected formula fields (DAG lookup)
Service->>DB: mark formula cells "calculating"
Service->>Queue: enqueue recalc tasks (record_id, field_id)
Service-->>API: 202 Accepted (records saved, formulas calculating)
API-->>Agent: 202 + recalc pending count
loop async
Worker->>Queue: dequeue task
Worker->>DB: load source values + formula expr
Worker->>Engine: evaluate(formula_expr, source_values)
Engine-->>Worker: result | error
Worker->>DB: write result to record values JSONB
Worker->>DB: mark cell "done" | "error"
end
Note over Agent,API: Frontend polls or gets WS update for "done" status
```
---
## Scope Boundaries
### 本次范围内v1
- bitable 伴生服务骨架领域模型、PostgreSQL schema、REST API、CLI
- 字段所有权模型 + 按主键 upsert 语义
- 三类采集落地Excel/DB/爬虫API通过 BitableTool
- 网格视图vxe-table排序/筛选/分页/编辑)
- 基础公式列(算术/字符串/SUM/AVG/COUNT+ 基础引用列lookup
- 图片/附件字段类型
- 异步公式重算 + "计算中"状态
### 延后v2/v3见需求文档 §5
- 看板/甘特/画廊/表单视图
- 高级公式(日期/条件/跨表 rollup+ 函数库扩展
- 分析能力(分组/透视)
- 大规模优化(列式/分区/物化视图)
- 多人实时协作
### 本产品身份之外
- 不做通用电子表格(字段化记录模型,非单元格自由编辑)
- 不做 ETL/数据管道平台(采集是 Agent 按需执行,非定时调度)
- 不做 BI 仪表盘产品
- 不替代知识库 RAG
### Deferred to Follow-Up Work
- bitable 数据导出为 Excel/CSV现有 `excel_renderer.py` 可后续适配)
- bitable 记录的语义检索pgvector 索引,类似 episodic memory
- 多维表格与 Agent 记忆系统的联动bitable 作为 episodic memory 的结构化补充)
- WebSocket 实时推送公式重算完成事件v1 用轮询)
---
## Implementation Units
### U1. 领域模型 + PostgreSQL Schema + 服务骨架
**Goal:** 搭建 bitable 子系统的领域模型、数据库 schema 和服务骨架,为后续所有单元提供基础。
**Requirements:** R1, R8
**Dependencies:** 无(基础单元)
**Files:**
- `src/agentkit/bitable/__init__.py`(新建)
- `src/agentkit/bitable/models.py`新建Pydantic v2 数据模型)
- `src/agentkit/bitable/db.py`新建PostgreSQL schema + init 函数 + 迁移机制)
- `src/agentkit/bitable/repository.py`(新建,数据访问层)
- `src/agentkit/bitable/service.py`(新建,业务逻辑层骨架)
- `src/agentkit/server/app.py`修改lifespan 中初始化 bitable
- `tests/unit/bitable/test_models.py`(新建)
- `tests/unit/bitable/test_db.py`(新建)
**Approach:**
- Pydantic 模型:`Table`、`Field`(含 `FieldType` 枚举text/number/date/select/multiselect/attachment/image/formula/lookup、`Record`、`View`、`RecalcTask`。所有模型用 `model_config = ConfigDict(...)`
- PostgreSQL schema5 张表(`bitable_tables`、`bitable_fields`、`bitable_records`、`bitable_views`、`bitable_recalc_queue`+ 1 张元数据表(`bitable_meta`),置于独立 schema `bitable`。`bitable_records.values` 用 JSONB + GIN 索引。
- **主键唯一约束**`bitable_tables` 的 `primary_key_field_id` 指定的字段在 `bitable_records.values` 中对应值必须唯一。通过在 `bitable_records` 上建函数索引 `CREATE UNIQUE INDEX ... ON bitable_records (table_id, (values->>'{pk_field_id}')) WHERE values ? '{pk_field_id}'` 实现。upsert 按此索引匹配。
- **Recalc queue 索引**`bitable_recalc_queue` 在 `(status, queued_at)` 上建索引worker 按 status=pending + queued_at 排序消费);在 `(record_id, field_id)` 上建唯一索引防重复入队。
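- **Schema migration mechanism**: follows the `_SCHEMA_VERSION` pattern in `src/agentkit/server/auth/models.py`. The `bitable_meta` table stores `schema_version`; `init_bitable_db()` reads the current version and runs migration scripts in version order (`migrations/v1__init.sql`, `migrations/v2__add_index.sql`, and so on). The first creation sets version 1; every later init checks the version and applies any migrations not yet applied.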
- init 函数 `init_bitable_db()`:参考 `src/agentkit/evolution/pg_store.py` 的延迟初始化 + 锁防并发模式。`CREATE SCHEMA IF NOT EXISTS bitable` + 按版本执行迁移。
- service 骨架:`BitableService` 类,注入 repository暴露后续单元将实现的方法签名。
- app.py lifespan`try/except` 包裹初始化,失败时 `logger.exception` 不崩溃(参考 calendar 子系统初始化模式,`src/agentkit/server/app.py` 第 406-428 行)。
**Patterns to follow:**
- PostgreSQL 模式:`src/agentkit/evolution/pg_store.py`PGBase 独立、延迟初始化、锁防并发)
- SQLAlchemy 2 declarative`src/agentkit/server/auth/models.py``DeclarativeBase` + `Mapped` + `mapped_column` + `_SCHEMA_VERSION` 迁移机制)
- JSONB 元数据:`src/agentkit/memory/models.py``metadata_` 字段用 JSONB
- 表名安全校验:`src/agentkit/evolution/experience_store.py``_SAFE_TABLE_NAME_PATTERN` 防 SQL 注入)
**Test scenarios:**
- Happy path: `init_bitable_db()` 创建 schema 和 6 张表(含 `bitable_meta`),幂等重复调用不报错
- Happy path: Pydantic 模型序列化/反序列化 round-trip 正确Table/Field/Record/View
- Edge case: the `config` JSONB of `Field` has the right structure for each field_type (formula has `formula_expr`, lookup has `lookup_target`, select has `options`)
- Edge case: `Record.values` JSONB 为空 `{}` 时合法(新记录无值)
- Covers 迁移: `bitable_meta.schema_version` 初始为 1模拟 v2 迁移脚本存在时,第二次 init 执行 v2 迁移并更新版本号
- Covers 主键约束: 设置主键字段后,插入两条相同主键值的记录触发唯一约束冲突
- Covers queue dedup: the `(record_id, field_id)` unique index on `bitable_recalc_queue` prevents duplicate enqueueing
- Error path: PostgreSQL 不可用时 `init_bitable_db()` 抛出明确异常app.py lifespan 捕获后 bitable 降级
- Integration: app.py lifespan 初始化后 `app.state.bitable_service` 存在
**Verification:** `init_bitable_db()` 成功创建 schema + 6 张表 + 迁移版本记录主键唯一约束生效Pydantic 模型可正确序列化app 启动后 bitable service 可用(或降级记日志)。
---
### U2. CRUD API + 字段所有权 + Upsert 语义
**Goal:** 实现 bitable 的 REST API表/字段/记录/视图 CRUD含字段所有权模型和按主键 upsert 语义。
**Requirements:** R1, R2, R8
**Dependencies:** U1
**Files:**
- `src/agentkit/server/routes/bitable.py`新建FastAPI 路由)
- `src/agentkit/bitable/service.py`(修改,实现 CRUD + upsert 逻辑)
- `src/agentkit/bitable/repository.py`(修改,实现数据访问)
- `src/agentkit/server/app.py`(修改,注册路由 `app.include_router(bitable_routes.router, prefix="/api/v1")`
- `tests/unit/bitable/test_service.py`(新建)
- `tests/unit/bitable/test_routes.py`(新建)
**Approach:**
- 路由:`router = APIRouter(prefix="/bitable", tags=["bitable"])`,最终前缀 `/api/v1/bitable`。参考 `src/agentkit/server/routes/calendar.py`
- 端点:
- 表:`POST /tables`、`GET /tables`、`GET /tables/{id}`、`PATCH /tables/{id}`、`DELETE /tables/{id}`
- 字段:`POST /tables/{id}/fields`、`GET /tables/{id}/fields`、`PATCH /fields/{id}`、`DELETE /fields/{id}`
- 记录:`POST /tables/{id}/records`(批量插入)、`GET /tables/{id}/records`cursor 分页+筛选+排序,见 KTD9、`PATCH /records/{id}`、`DELETE /tables/{id}/records`(批量删除)
- 视图:`POST /tables/{id}/views`、`GET /tables/{id}/views`、`PATCH /views/{id}`
- 字段所有权:创建字段时 `owner` 默认 `user`Agent 通过 BitableTool 写入时声明 `owner=agent`。upsert 时只更新 `owner=agent` 的字段值。
- **Upsert 实现KTD8**upsert 更新阶段**禁止** `UPDATE ... SET values = :new_values`。必须用 `jsonb_set` 逐字段合并——对每条记录的每个 agent 列执行 `jsonb_set(values, '{field_id}', :value, true)`。user 列值绝不出现在 UPDATE 语句中。批量 upsert 用单事务包裹,失败回滚。
- **字段删除依赖检查**:删除字段前检查:(1) 是否有公式字段引用该字段DAG 反向查找);(2) 是否是表的主键字段;(3) 是否有视图的 filter/sort 配置引用该字段。有依赖时返回 409 Conflict + 依赖列表,不直接删除。强制删除时(`?force=true`)级联清理:公式字段标记为 error、视图配置移除该字段引用、记录 JSONB 中移除该字段 key`values - '{field_id}'`)。
- **视图过滤翻译**:视图的 `config.filters`(如 `[{field_id, op, value}]`)在查询记录时翻译为 JSONB 查询条件。`op` 支持 `eq`/`ne`/`contains`/`gt`/`lt`/`is_empty`。翻译为 `WHERE values->>'{field_id}' {op_sql} :value`。排序翻译为 `ORDER BY values->>'{field_id}'`。注意JSONB `->>` 返回 textnumber/date 比较需 cast`CAST(values->>'{field_id}' AS NUMERIC)`)。
- 认证:`require_authenticated` 依赖(参考 `src/agentkit/server/auth/dependencies.py`+ 内部令牌分支KTD11
- 服务访问:路由通过 `request.app.state.bitable_service` 获取 service参考 calendar.py 模式)。
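
The filter translation described above can be sketched as a pure function that returns a WHERE fragment plus bind parameters. `translate_filters`, the `numeric_fields` argument, and the field-id pattern are assumptions made for illustration; only the `op` names come from the plan.

```python
import re

_SAFE_FIELD_ID = re.compile(r"^[A-Za-z0-9_]+$")
_COMPARE_OPS = {"eq": "=", "ne": "<>", "gt": ">", "lt": "<"}


def translate_filters(filters, numeric_fields=frozenset()):
    """Translate view filters into (where_sql, params)."""
    clauses, params = [], {}
    for i, item in enumerate(filters):
        field_id, op = item["field_id"], item["op"]
        if not _SAFE_FIELD_ID.match(field_id):
            raise ValueError(f"unsafe field id: {field_id!r}")
        column = f"values->>'{field_id}'"
        if op == "is_empty":
            clauses.append(f"({column} IS NULL OR {column} = '')")
        elif op == "contains":
            # Escape LIKE wildcards so user text is matched literally.
            text = str(item["value"])
            text = text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
            clauses.append(f"{column} LIKE :p{i}")
            params[f"p{i}"] = f"%{text}%"
        elif op in _COMPARE_OPS:
            if field_id in numeric_fields:
                # ->> yields text, so numeric comparisons need a cast.
                column = f"CAST({column} AS NUMERIC)"
            clauses.append(f"{column} {_COMPARE_OPS[op]} :p{i}")
            params[f"p{i}"] = item["value"]
        else:
            raise ValueError(f"unsupported op: {op}")
    return " AND ".join(clauses), params
```

Date fields would need an analogous cast; the sketch only covers the numeric case called out in the note above.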
**Patterns to follow:**
- 路由模块:`src/agentkit/server/routes/calendar.py``APIRouter` + `app.state` 服务访问 + 503 降级)
- 认证依赖:`src/agentkit/server/auth/dependencies.py``require_authenticated`、`require_permission`
- 表名/字段名校验:`_SAFE_TABLE_NAME_PATTERN` 模式防注入
- JSONB 查询:`src/agentkit/memory/models.py` 的 JSONB 操作模式
**Test scenarios:**
- Happy path: 创建表 → 添加字段 → 插入记录 → 查询记录cursor 分页)完整流程
- Happy path: upsert 模式——首次插入 3 条记录,第二次 upsert 同主键不同数据列值,记录数不变、数据列更新
- Covers R2 + KTD8: upsert 时用户列(`owner=user`)值不被覆盖——先手动设置 user 列值,再 upsertuser 列值不变。验证 SQL 层面用 `jsonb_set` 而非整行替换
- Covers 字段删除: 删除被公式引用的字段返回 409 + 依赖列表;`?force=true` 后级联清理公式字段状态
- Covers 字段删除: 删除主键字段返回 409
- Covers 视图过滤: 视图配置 `filter: [{field_id, op: "gt", value: 100}]` 查询时正确过滤 number 字段CAST 为 NUMERIC
- Edge case: 主键字段未设置时 upsert 报 400
- Edge case: 批量插入空数组返回成功且 count=0
- Edge case: cursor 分页——第一页返回 next_cursor第二页用该 cursor 获取后续记录,无更多数据时 next_cursor 为 null
- Covers 并发: 两个并发 upsert 同主键不同 agent 列——一个成功一个等待,最终两列都更新(行级锁)
- Error path: 表不存在时所有操作返回 404
- Error path: 字段类型不匹配(往 number 字段写非数字)的校验行为
- Integration: 字段所有权自动推断——Agent 声明 `owner=agent` 的字段upsert 后该字段更新;`owner=user` 的字段不更新
**Verification:** API 端点可 CRUD 表/字段/记录/视图upsert 用 `jsonb_set` 正确保留用户列;字段删除有依赖检查;视图过滤正确翻译为 JSONB 查询cursor 分页正确。
---
### U3. 公式引擎 + 异步重算管道 + 基础引用列
**Goal:** 实现自研 Python 公式引擎(解析、依赖图、重算)和异步重算管道,支持基础公式(算术/字符串/SUM/AVG/COUNT和基础引用列lookup
**Requirements:** R5, R7
**Dependencies:** U1, U2
**Files:**
- `src/agentkit/bitable/formula/__init__.py`(新建)
- `src/agentkit/bitable/formula/parser.py`(新建,公式解析为 AST
- `src/agentkit/bitable/formula/engine.py`新建DAG + 拓扑排序 + 求值)
- `src/agentkit/bitable/formula/functions.py`(新建,内置函数库)
- `src/agentkit/bitable/recalc_worker.py`(新建,异步重算 worker
- `src/agentkit/bitable/service.py`(修改,写入时触发重算入队)
- `src/agentkit/server/app.py`修改lifespan 启动 recalc worker
- `tests/unit/bitable/test_formula_parser.py`(新建)
- `tests/unit/bitable/test_formula_engine.py`(新建)
- `tests/unit/bitable/test_recalc.py`(新建)
**Approach:**
- **解析器**`parser.py`):用 `ast` 模块解析公式字符串(如 `=SUM({field_abc}) + {field_xyz} * 2`)为 AST。字段引用用 `{field_id}` 语法。支持算术运算符、字符串拼接、函数调用。
- **AST 安全约束KTD7**`ast.parse` 后实现受限 `NodeVisitor`,仅允许白名单节点(`Expression`/`BinOp`/`UnaryOp`/`BoolOp`/`Compare`/`Call`仅已注册函数/`Name`仅字段引用/`Constant`/`IfExp`)。禁用 `Attribute`/`Subscript`/`Lambda`/`Import`/`Assign`/`For`/`While`/`FunctionDef`/`ClassDef`/`Await`/`Yield`。遇到禁用节点抛 `FormulaSecurityError`。**禁止** `eval()`/`exec()`。
- **引擎**`engine.py`
- 构建 DAG遍历 AST 提取字段依赖,建立字段间依赖图
- 拓扑排序Kahn 算法,确定重算顺序
- 循环检测DFS 检测循环引用,抛出 `CircularReferenceError`
- 求值:按拓扑序遍历 AST解析字段引用从 record values 读取),调用函数库
- **函数库**`functions.py`v1 实现 `SUM`、`AVG`、`COUNT`、`MIN`、`MAX`、`CONCAT`、`ABS`、`ROUND`、`IF`、`LEN`。每个函数注册到 `FUNCTION_REGISTRY`
- **聚合函数语义边界**`SUM({field_id})` 中 `{field_id}` 引用整列(聚合上下文),返回标量;`{field_id} + 1` 中 `{field_id}` 引用当前记录的值(行上下文),返回标量。**区分规则**:聚合函数的参数为列引用时聚合整列,否则按行求值。`SUM({f1} + {f2})` = 对每行 `f1+f2` 求和(聚合);`{f1} + SUM({f2})` = 当前行 f1 + f2 列总和(混合)。解析器在 AST 层面标记聚合上下文,引擎按标记决定取列值还是取行值。
- **引用列lookup**lookup 字段的 `config.lookup_target = {table_id, field_id, filter_field_id, filter_value}`。求值时从目标表查询匹配记录的指定字段值。复用引擎的字段引用解析机制。lookup 是只读引用,不参与 DAG 环检测(不会形成环)。
- **异步重算管道**`recalc_worker.py`
- 数据列写入后service 检测受影响的公式字段DAG 反向查找下游)
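- Insert tasks into `bitable_recalc_queue` as `pending` and set the formula field value in the record JSONB to `{__status: "calculating"}`. Enqueueing relies on the `(record_id, field_id)` unique index for deduplication (an existing pending task is skipped via `ON CONFLICT DO NOTHING`)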
- worker 从队列消费任务,按拓扑序重算,结果写回 JSONB状态置 `done`/`error`
- **事务边界**:每个 recalc task 的"读取源值→求值→写回结果→标记完成"在单事务中完成,避免读到半更新状态。写回用 `jsonb_set` 只更新公式字段值,不影响其他字段。
- **Worker 崩溃恢复**worker 启动时扫描 `status='calculating'` 的任务(上次崩溃残留),重置为 `pending` 重新入队。app.py lifespan 中 worker 作为 asyncio task 启动,关闭时发 sentinel 优雅停止。
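- **Reaper**: a periodic job (every 5 minutes) scans tasks stuck in `status='calculating'` whose `queued_at` is more than 10 minutes old and resets them to `pending` so they are picked up again (guards against a hung worker).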
- worker 在 app.py lifespan 中作为 asyncio task 启动,关闭时优雅停止
- **异步生成器安全**:若 worker 用 `async def` + `yield` 消费队列,遵守 `return; yield` 模式(见 `.trae/rules/project_rules.md`)。
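
The row-context versus aggregate-context rule from the approach above can be sketched with a tiny interpreter. It assumes `{field_id}` references were already rewritten to bare identifiers and that the expression passed the whitelist check; `evaluate` is an illustrative name, only three aggregates are wired in, and empty-value handling is left out.

```python
import ast
import operator

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_AGGREGATES = {
    "SUM": sum,
    "COUNT": len,
    "AVG": lambda values: sum(values) / len(values),
}


def evaluate(expr, row, rows):
    """Evaluate `expr` for `row`; aggregates range over all `rows`."""
    return _eval(ast.parse(expr, mode="eval").body, row, rows)


def _eval(node, row, rows):
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        # Row context: a bare field reference reads the current record.
        return row[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left = _eval(node.left, row, rows)
        right = _eval(node.right, row, rows)
        return _BINARY_OPS[type(node.op)](left, right)
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in _AGGREGATES
        and len(node.args) == 1
        and not node.keywords
    ):
        # Aggregate context: evaluate the argument once per record.
        per_row = [_eval(node.args[0], other, rows) for other in rows]
        return _AGGREGATES[node.func.id](per_row)
    raise ValueError(f"unsupported node: {type(node).__name__}")
```

With this shape, `SUM(f1 + f2)` sums a per-row expression while `f1 + SUM(f2)` mixes the current row with a column total, matching the two cases distinguished above.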
**Execution note:** 公式引擎核心逻辑(解析 + AST 安全 + DAG + 求值)建议测试先行——先写解析器、安全约束和循环检测的测试,再实现。
**Patterns to follow:**
- 异步任务生命周期:`src/agentkit/server/app.py` lifespan 中 calendar/evolution 子系统的启动/关闭模式
- 异步生成器安全:`.trae/rules/project_rules.md``return; yield` 模式)
- 队列消费模式:`src/agentkit/experts/team.py` 的 `HandoffTransport`bounded queue + sentinel 关闭)
- AST 安全检查Python `ast.NodeVisitor` 白名单模式
**Test scenarios:**
- Happy path: `=1+2*3` 解析为 AST 并求值得 7
- Happy path: `=SUM({f1})` 其中 f1 列值为 [1,2,3] 求值得 6聚合上下文
- Happy path: `=CONCAT({f1}, "-", {f2})` 求值得 "a-b"(行上下文)
- Happy path: `={f1} + SUM({f2})` 混合上下文——当前行 f1 + f2 列总和
- Happy path: lookup 字段从目标表查询匹配记录的值
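- Covers KTD7 security: the formula `=__import__('os')` raises `FormulaSecurityError` (the call target `__import__` is not in the function registry; a chained `.system(...)` would additionally hit the banned `Attribute` node)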
- Covers KTD7 安全: 公式 `=(lambda: 1)()` 抛出 `FormulaSecurityError`Lambda 节点被禁)
- Covers KTD7 安全: 公式 `=eval('1+1')` 抛出 `FormulaSecurityError`Call 节点函数名不在注册表)
- Covers 聚合语义: `SUM({f1})` 聚合整列 vs `{f1} + 1` 引用当前行——两者语义不同,测试验证
- Edge case: 公式引用不存在的 field_id 报错
- Edge case: 空值参与运算的语义SUM 忽略空值,算术遇空值报错)
- Covers R7: 数据列写入后,公式字段进入 "calculating" 状态worker 重算后变为 "done" 且值正确
- Covers 崩溃恢复: 模拟 worker 在 `calculating` 状态崩溃 → 重启后任务重置为 `pending` 并重算成功
- Covers 去重: 同一 (record_id, field_id) 并发入队两次 → 队列中只有一个任务(唯一索引)
- Covers 事务: recalc task 执行期间另一请求更新源字段 → task 在自己的事务中读到一致快照
- Error path: 循环引用 `f1 = f2 + 1`、`f2 = f1 + 1` 抛出 `CircularReferenceError`
- Error path: 公式语法错误(括号不匹配)抛出 `FormulaParseError`
- Error path: 函数不存在(`=UNKNOWN()`)抛出 `UnknownFunctionError`
- Integration: 多条记录批量 upsert 后,所有受影响公式字段被入队重算,最终值正确
**Verification:** 公式可解析、求值AST 安全约束阻止注入聚合函数语义正确循环引用被检测异步重算管道正确更新公式值且支持崩溃恢复lookup 引用列正确跨表取值。
---
### U4. Agent BitableTool + 三类采集落地
**Goal:** 实现 `BitableTool`Agent 工具支持三类数据采集写入Excel 上传/URL、数据库导入、爬虫/API 采集。
**Requirements:** R3, R8
**Dependencies:** U2
**Files:**
- `src/agentkit/tools/bitable_tool.py`(新建,`BitableTool(Tool)`
- `src/agentkit/bitable/ingestion/__init__.py`(新建)
- `src/agentkit/bitable/ingestion/excel.py`新建Excel 解析+写入)
- `src/agentkit/bitable/ingestion/database.py`(新建,数据库导入)
- `src/agentkit/bitable/ingestion/api_collector.py`新建API/爬虫采集)
- `src/agentkit/server/app.py`(修改,注册 BitableTool 到 tool_registry
- `tests/unit/bitable/test_bitable_tool.py`(新建)
- `tests/unit/bitable/test_ingestion_excel.py`(新建)
**Approach:**
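- **BitableTool**: subclasses `Tool` (`src/agentkit/tools/base.py`); `input_schema` uses an `action` enum to distinguish operations. It calls the bitable REST API over HTTP (KTD5: HTTP even when co-deployed).
- actions: `create_table`, `import_excel`, `import_database`, `collect_api`, `upsert_records`, `query_records`
- Tool descriptions are written in English (for LLM function calling)
- **Internal auth (KTD11)**: HTTP requests carry the `X-Internal-Token: <token>` header; the token is read from `bitable.internal_token` in `agentkit.yaml` and injected when BitableTool is initialized.
- **Batch chunking**: when `upsert_records` and `import_excel`/`import_database` write data, a single HTTP request carries at most 500 records (`BATCH_SIZE=500`). Larger inputs are sent in chunks, one HTTP request per chunk. On failure the tool returns the number of chunks that succeeded plus details of the failed chunk, and supports resuming (the caller passes `resume_from` to skip chunks that already succeeded).
- **Excel import** (`excel.py`): reuses the `_parse_xlsx` logic (openpyxl) from `src/agentkit/memory/document_loader.py`, but returns structured data (`{sheet_name: [{col: val}]}`) instead of Markdown text. Accepts both file upload and URL input. Creates one table per sheet, treats the first row as field names, and infers field types automatically.
- **Database import** (`database.py`): takes a connection string and a list of table names, reads table structure via SQLAlchemy reflection, creates a bitable table plus fields per source table, and bulk-imports the data. Field type mapping: DB int→number, varchar→text, timestamp→date.
- **API collection** (`api_collector.py`): the Agent already has crawler/API calling capability (the tool system), so this module is only responsible for writing the structured data the Agent collected into bitable field by field. It accepts `{records: [...], field_mapping: {...}}` and writes through the upsert API.
- **Registration**: `app.state.tool_registry.register(BitableTool(...))` in the app.py lifespan, following the calendar_tool registration pattern (`src/agentkit/server/app.py`, line 422).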
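
The batch chunking and resume behaviour can be sketched independently of HTTP. `send_in_batches` and the injected `send` callable are hypothetical; `resume_from` is treated as a record offset aligned to a chunk boundary, which is an assumption consistent with the `resume_from=500` test scenario below.

```python
BATCH_SIZE = 500


def chunked(records, size=BATCH_SIZE):
    """Yield (start_offset, chunk) pairs of at most `size` records."""
    for start in range(0, len(records), size):
        yield start, records[start:start + size]


def send_in_batches(records, send, resume_from=0, size=BATCH_SIZE):
    """Send records chunk by chunk, stopping at the first failure.

    `send` is whatever performs one HTTP request for one chunk.
    """
    sent = resume_from
    for start, chunk in chunked(records, size):
        if start < resume_from:
            continue  # Already written by an earlier attempt.
        try:
            send(chunk)
        except Exception as exc:
            return {"sent": sent, "failed_at": start, "error": str(exc)}
        sent = start + len(chunk)
    return {"sent": sent, "failed_at": None, "error": None}
```

Because each chunk is an upsert keyed on the primary key, re-sending a chunk after an ambiguous failure should be safe; the caller simply passes the reported `failed_at` back as `resume_from`.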
**Patterns to follow:**
- 工具基类:`src/agentkit/tools/base.py``Tool` 抽象类 + `safe_execute`
- 工具实现:`src/agentkit/tools/calendar_tool.py``action` 枚举 + service 注入 + input_schema
- Excel 解析:`src/agentkit/memory/document_loader.py` 的 `_parse_xlsx`openpyxl + `MAX_ROWS_PER_SHEET`
- 工具注册:`src/agentkit/server/app.py` lifespan`tool_registry.register`
**Test scenarios:**
- Happy path: `import_excel` 上传 .xlsx 文件 → 创建表 + 字段 + 记录,数据正确
- Happy path: `import_excel` 提供 URL → 下载 + 解析 + 写入
- Happy path: `import_database` 指定表名 → 生成 bitable 表,字段类型映射正确
- Happy path: `collect_api` 接收 records + field_mapping → upsert 写入
- Covers KTD11 认证: BitableTool 请求头携带 `X-Internal-Token`,无 token 时 bitable API 返回 401
- Covers 批量分块: 1200 条记录的 upsert → 分 3 次 HTTP 请求500+500+200全部成功
- Covers 批量分块: 1200 条记录,第 2 块失败 → 返回已成功 500 条 + 失败详情,`resume_from=500` 续传
- Covers R3: 三类采集场景各自端到端写入成功
- Edge case: Excel 空表(只有表头无数据行)→ 创建表+字段0 条记录
- Edge case: Excel 合并单元格——仅左上角有值(已知局限,参考 `document_loader.py`
- Edge case: 数据库表无主键——bitable 表自动生成 id 字段作为主键
- Error path: Excel 文件格式损坏 → 明确错误信息
- Error path: 数据库连接失败 → 错误返回,不创建 bitable 表
- Error path: bitable REST API 不可用503→ 工具返回明确错误
- Integration: Agent 通过 BitableTool 创建表并写入数据后,前端网格视图可查看到数据
**Verification:** Agent 能通过 BitableTool 执行三类采集写入;内部认证正确;批量分块写入可靠;数据正确落地到 bitable 表。
---
### U5a. 前端网格视图 + Store + API 客户端 + 公式轮询
**Goal:** 实现前端网格视图vxe-table支持排序/筛选/cursor 分页/单元格编辑,配套 Pinia store、API 客户端和公式重算状态轮询。
**Requirements:** R4, R7前端侧
**Dependencies:** U2
**Files:**
- `src/agentkit/server/frontend/src/api/bitable.ts`(新建,`BitableApiClient extends BaseApiClient` + 类型定义)
- `src/agentkit/server/frontend/src/stores/bitable.ts`(新建,`defineStore('bitable', ...)`
- `src/agentkit/server/frontend/src/views/BitableView.vue`(新建,主视图,全屏布局)
- `src/agentkit/server/frontend/src/components/bitable/BitableGrid.vue`新建vxe-table 网格组件)
- `src/agentkit/server/frontend/src/components/bitable/TableViewList.vue`(新建,表列表侧栏)
- `src/agentkit/server/frontend/src/router/index.ts`(修改,添加 `/agent/bitable` 路由,全屏布局)
- `src/agentkit/server/frontend/src/main.ts`(修改,`import 'vxe-table/lib/style.css'`
- `src/agentkit/server/frontend/package.json`(修改,添加 vxe-table 依赖)
**Approach:**
- **API client** (`bitable.ts`): `BitableApiClient extends BaseApiClient` (`src/agentkit/server/frontend/src/api/base.ts`), with `API_BASE = '/api/v1/bitable'`. Type definitions live in the same file (`IBitableTable`, `IBitableField`, `IBitableRecord`, `IBitableView`), following the pattern in `src/agentkit/server/frontend/src/api/calendar.ts`.
- **Pinia store** (`bitable.ts`): Composition API style. State: tables, currentTable, fields, records, views, loading, recalcPendingCount. Actions: loadTables, selectTable, loadRecords, updateCell, addField, pollRecalcStatus. See `src/agentkit/server/frontend/src/stores/calendar.ts`.
- **网格组件**`BitableGrid.vue`vxe-table v4配置
- 虚拟滚动:`virtualXConfig` + `virtualYConfig`(支持 10k+ 行)
- 可编辑单元格:`edit-config`按字段类型渲染编辑器text/number/date/select
- 自定义列渲染器:附件/图片列用插槽渲染U6 实现),公式列显示值或"计算中"标记
- 排序/筛选vxe-table 内置 + 服务端 cursor 分页
- **CSS 隔离KTD10**:容器用 `.bitable-grid-scope` 包裹vxe-table 样式覆盖限定在该 scope 下
- **公式轮询策略**store 中 `pollRecalcStatus` action——当记录中存在 `__status: "calculating"` 的公式字段时,每 2s 轮询 `GET /tables/{id}/records?fields=calculating_only`。所有公式字段变为 `done`/`error` 后停止轮询。切换表或组件卸载时清理定时器。
- **主视图**`BitableView.vue`):左侧表列表 + 右侧网格。**全屏布局**(非 AgentLayout 的 55% 象限bitable 需要完整宽度展示网格。
- **路由**`/agent` children 中添加 `{ path: 'bitable', name: 'agent-bitable', meta: { title: '多维表格', panel: 'full' }, component: () => import('@/views/BitableView.vue') }`。`meta.panel = 'full'` 表示全屏(需在 AgentLayout 中支持 full panel 类型,或直接用独立 layout
- **依赖**`npm install vxe-table`MIT
**Patterns to follow:**
- API 客户端:`src/agentkit/server/frontend/src/api/calendar.ts``BaseApiClient` 继承 + 类型同文件 + `isXxx` 类型守卫)
- Pinia store`src/agentkit/server/frontend/src/stores/calendar.ts`Composition API + 错误中文化 + notification
- 路由注册:`src/agentkit/server/frontend/src/router/index.ts``/agent` children + `meta.panel`
- 组件结构:`src/agentkit/server/frontend/src/components/calendar/`(主视图 + 子组件分层)
**Test scenarios:**
- Happy path: 打开 `/agent/bitable` → 显示表列表 → 选择表 → 网格加载数据cursor 分页)
- Happy path: 双击单元格 → 编辑 → 保存 → 数据持久化到后端
- Happy path: 点击列头排序 → 服务端排序 → 数据刷新
- Happy path: 滚动到底部 → cursor 分页加载下一页
- Covers R7 轮询: 公式列显示"计算中"标记 → 2s 后轮询 → 重算完成更新为值 → 轮询停止
- Covers KTD10 隔离: vxe-table 样式不污染 Ant Design Vue 组件(视觉检查)
- Edge case: 10k 行数据虚拟滚动流畅(无卡顿)
- Edge case: 空表0 条记录)显示空状态
- Edge case: 切换表时轮询定时器被清理(无内存泄漏)
- Error path: 后端 503bitable 未初始化)→ 显示降级提示
- Integration: Agent 通过 BitableTool 写入数据后,前端刷新可见
**Verification:** 前端可查看/编辑 bitable 数据10k 行虚拟滚动流畅公式列正确显示计算状态并轮询更新CSS 隔离无污染。
---
### U5b. 表/字段管理 UI
**Goal:** 实现表创建、字段管理(新增/编辑/删除/类型配置)的前端 UI使用户能自主建表和配置字段不依赖 Agent 采集)。
**Requirements:** R1, R2
**Dependencies:** U5a
**Files:**
- `src/agentkit/server/frontend/src/components/bitable/TableCreateModal.vue`(新建,创建表对话框)
- `src/agentkit/server/frontend/src/components/bitable/FieldManagePanel.vue`(新建,字段管理面板)
- `src/agentkit/server/frontend/src/components/bitable/FieldConfigForm.vue`(新建,字段配置表单——按类型动态渲染)
- `src/agentkit/server/frontend/src/stores/bitable.ts`(修改,添加 createTable、addField、updateField、deleteField actions
- `src/agentkit/server/frontend/src/api/bitable.ts`(修改,补充表/字段 CRUD 方法)
**Approach:**
- **创建表对话框**`TableCreateModal.vue``a-modal` + `a-form`,输入表名、描述、主键字段名。提交后调用 `POST /tables`
- **字段管理面板**`FieldManagePanel.vue`):侧滑面板(`a-drawer`列出当前表所有字段名称、类型、owner 标签),支持新增/编辑/删除。删除时调用 API若返回 409有依赖则显示依赖列表确认框。
- **字段配置表单**`FieldConfigForm.vue`):按 `field_type` 动态渲染配置项:
- text/number无额外配置
- date日期格式
- select/multiselect选项列表可增删
- formula公式表达式输入框 + 实时语法校验(调用后端 `POST /bitable/fields/validate-formula`
- lookup目标表+字段+过滤条件选择器
- attachment/image无额外配置
- **owner 标签**:字段列表中 `owner=agent` 显示蓝色"Agent"标签,`owner=user` 显示绿色"用户"标签。
**Patterns to follow:**
- 对话框/抽屉Ant Design Vue 的 `a-modal`/`a-drawer` 模式
- 动态表单:`a-form` + `v-if` 按 type 渲染
- 表单校验:`src/agentkit/server/frontend/src/components/calendar/` 中的表单校验模式
**Test scenarios:**
- Happy path: 点击"新建表" → 填写表名+主键字段 → 提交 → 表列表刷新
- Happy path: 打开字段管理 → 新增 formula 字段 → 输入公式 → 语法校验通过 → 保存
- Happy path: 编辑 select 字段 → 增删选项 → 保存 → 网格中该列下拉选项更新
- Covers 字段删除: 删除被公式引用的字段 → 显示 409 依赖列表 → 确认强制删除 → 公式字段标记 error
- Edge case: 公式语法错误 → 实时校验显示错误提示,保存按钮禁用
- Edge case: lookup 字段配置——选择目标表后加载该表字段列表
- Edge case: owner 标签正确显示agent 蓝色 / user 绿色)
- Error path: 网络错误 → notification 提示
**Verification:** 用户可通过 UI 创建表、管理字段(含公式语法校验)、删除字段(含依赖检查);字段 owner 标签正确。
---
### U5c. 视图配置 UI
**Goal:** 实现视图管理(创建/切换/配置筛选排序)的前端 UI使用户能保存和切换不同的数据查看视角。
**Requirements:** R4
**Dependencies:** U5a
**Files:**
- `src/agentkit/server/frontend/src/components/bitable/ViewSwitcher.vue`(新建,视图切换器)
- `src/agentkit/server/frontend/src/components/bitable/ViewConfigPanel.vue`(新建,视图配置面板——筛选/排序/隐藏字段)
- `src/agentkit/server/frontend/src/components/bitable/FilterBuilder.vue`(新建,筛选条件构建器)
- `src/agentkit/server/frontend/src/stores/bitable.ts`(修改,添加 createView、updateView、switchView actions
- `src/agentkit/server/frontend/src/api/bitable.ts`(修改,补充视图 CRUD 方法)
**Approach:**
- **视图切换器**`ViewSwitcher.vue`):网格顶部 tab 栏,列出当前表的所有视图,点击切换。"+"按钮创建新视图(输入名称+类型v1 仅 grid
- **视图配置面板**`ViewConfigPanel.vue``a-drawer`,三个 tab筛选、排序、隐藏字段。
- 筛选:`FilterBuilder` 组件支持增删筛选条件field + op + valueop 选项按字段类型动态变化
- 排序:字段选择 + 升序/降序
- 隐藏字段:字段列表 checkbox勾选隐藏
- **FilterBuilder**`FilterBuilder.vue`):条件列表,每行 `a-select`(字段)+ `a-select`(操作符)+ `a-input`/`a-date-picker`。op 选项text→eq/ne/contains/is_emptynumber→eq/ne/gt/lt/gte/ltedate→eq/gt/lt/between。
- 配置变更实时保存到视图 `config``PATCH /views/{id}`),网格立即重新查询。
**Patterns to follow:**
- Tab 栏Ant Design Vue `a-tabs`
- 条件构建器:动态表单行 + `a-select` 联动
**Test scenarios:**
- Happy path: 创建视图"高价订单" → 配置筛选 `amount > 1000` → 网格只显示匹配记录
- Happy path: 切换视图 → 网格数据按视图配置刷新
- Happy path: 配置排序 `created_at desc` → 网格按创建时间倒序
- Happy path: 隐藏字段 → 网格中该列消失
- Edge case: 筛选 number 字段 → op 选项为 eq/ne/gt/lt非 contains
- Edge case: 筛选 date 字段 → 值输入为 `a-date-picker`
- Edge case: 多个筛选条件 AND 组合
- Edge case: 视图配置变更实时保存 + 网格刷新
- Error path: 筛选值类型不匹配 → 校验提示
**Verification:** 用户可创建/切换视图;筛选/排序/隐藏字段配置生效并实时保存;不同字段类型的筛选操作符正确。
---
### U6. 图片/附件字段类型
**Goal:** 实现图片和附件字段类型,复用现有文件上传能力,支持上传、存储、预览。
**Requirements:** R6
**Dependencies:** U2, U5a
**Files:**
- `src/agentkit/server/routes/bitable.py`(修改,添加附件上传端点)
- `src/agentkit/bitable/service.py`(修改,附件存储逻辑 + 记录删除时附件清理)
- `src/agentkit/server/frontend/src/components/bitable/AttachmentCell.vue`(新建,附件单元格渲染)
- `src/agentkit/server/frontend/src/components/bitable/ImageCell.vue`(新建,图片单元格渲染,懒加载)
- `tests/unit/bitable/test_attachment.py`(新建)
**Approach:**
- **后端**:复用现有上传端点模式(`src/agentkit/server/routes/chat.py` 第 1240-1276 行的 `POST /api/v1/chat/upload`),新建 `POST /api/v1/bitable/tables/{id}/upload`,存储到 `data/uploads/bitable/`。附件值在记录 JSONB 中存为 `[{filename, stored_name, url, size, mime_type}]` 数组。
- **记录删除时附件清理**`DELETE /tables/{id}/records` 和 `DELETE /records/{id}`service 层先读取待删记录的 attachment/image 字段值,删除对应物理文件,再删除记录。用 `try/except` 包裹文件删除(文件丢失不阻断记录删除,只记日志)。
- **前端**`AttachmentCell.vue` 渲染附件列表(文件名+下载链接),`ImageCell.vue` 渲染图片缩略图(点击预览)。作为 vxe-table 的自定义渲染器插槽。
- **图片懒加载**`ImageCell.vue` 用 `IntersectionObserver` 或 vxe-table 的虚拟滚动可见性回调——仅视口内的图片加载 `src`,视口外的用占位图。缩略图用 `?thumbnail=true` 参数请求后端生成缩略图(或前端 CSS 缩放)。
- **字段类型**`attachment`(通用文件)和 `image`(仅图片,前端限制 accept
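The cleanup order described above (read the attachment values, delete the files on a best-effort basis, then delete the record) could be sketched as follows. The function name `delete_attachment_files`, the `upload_dir` argument and the choice of `stored_name` as the on-disk file name are assumptions made for illustration; the real service layer may be organised differently.

```python
import logging
from pathlib import Path

logger = logging.getLogger("bitable.attachments")


def delete_attachment_files(record_values: dict, attachment_fields: list,
                            upload_dir: Path) -> int:
    """Best-effort removal of files referenced by attachment/image fields.

    Returns the number of files actually removed. A missing or undeletable
    file is logged and skipped, so the record delete that follows is never blocked.
    """
    removed = 0
    for field in attachment_fields:
        for item in record_values.get(field) or []:
            # Assumption: stored_name is a bare file name inside upload_dir.
            # Taking .name drops any directory part, which guards against traversal.
            path = upload_dir / Path(item["stored_name"]).name
            try:
                path.unlink()
                removed += 1
            except OSError as exc:  # includes FileNotFoundError
                logger.warning("attachment cleanup skipped for %s: %s", path, exc)
    return removed
```

The caller would run this before issuing the record `DELETE`, and ignore the return value except for logging or metrics.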
**Patterns to follow:**
- 文件上传:`src/agentkit/server/routes/chat.py``MAX_UPLOAD_SIZE` + `data/uploads/` 存储 + 下载端点)
- 前端上传组件:`src/agentkit/server/frontend/src/components/kb/DocumentUpload.vue``a-upload-dragger`
- 文件展示:`src/agentkit/server/frontend/src/components/chat/messages/FileAttachment.vue`(文件类型识别)
**Test scenarios:**
- Happy path: 上传图片 → 记录 image 字段存为 `[{filename, url, ...}]` → 前端显示缩略图
- Happy path: 上传 PDF 附件 → attachment 字段存值 → 前端显示文件名+下载链接
- Covers 附件清理: 删除含附件的记录 → 物理文件被删除 → 记录被删除
- Covers 附件清理: 删除记录时文件已丢失 → 记录仍被删除(不阻断),日志记录
- Covers 懒加载: 10k 行含图片记录 → 仅视口内图片加载,滚动流畅
- Edge case: 上传超过大小限制的文件 → 413 错误
- Edge case: image 字段上传非图片文件 → 校验拒绝
- Edge case: 一条记录的 attachment 字段上传多个文件 → 数组存储多个
- Error path: 上传时磁盘满 → 明确错误
- Integration: Agent 通过 BitableTool 写入含附件的记录 → 前端正确渲染
**Verification:** 图片/附件可上传、存储、预览、下载;记录删除时附件文件被清理;图片懒加载保证大表滚动流畅;字段值正确存为 JSONB 数组。
---
### U7. CLI 子命令
**Goal:** 实现 `agentkit bitable` CLI 子命令组,供运维和脚本化操作。
**Requirements:** R1, R8
**Dependencies:** U2
**Files:**
- `src/agentkit/cli/bitable.py`(新建,`bitable_app = typer.Typer(...)`
- `src/agentkit/cli/main.py`(修改,`app.add_typer(bitable_app, name="bitable")`
- `tests/unit/bitable/test_cli.py`(新建)
**Approach:**
- 子命令:`list-tables`、`create-table`、`import-excel`、`export-excel`(复用 `excel_renderer.py`)、`query`。
- CLI 直接调用 `BitableService`KTD5 例外CLI 是运维工具,非运行时调用路径)。
- 参考 `src/agentkit/cli/task.py``task_app` 模式。
**Patterns to follow:**
- CLI 子命令组:`src/agentkit/cli/task.py``typer.Typer` + `@app.command`
- 注册:`src/agentkit/cli/main.py``app.add_typer`
- 输出格式Typer + Rich项目 CLI 已用 Rich
**Test scenarios:**
- Happy path: `agentkit bitable list-tables` 列出所有表
- Happy path: `agentkit bitable create-table --name "测试"` 创建表
- Happy path: `agentkit bitable import-excel --file data.xlsx --table "导入表"` 导入 Excel
- Edge case: 表不存在时 `query` 报错
- Error path: bitable 未初始化时 CLI 报明确错误
**Verification:** CLI 子命令可执行表管理和数据导入操作。
---
## Test Infrastructure
### PG Schema Fixture
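All tests that touch PostgreSQL are marked `@pytest.mark.postgres`. `conftest.py` provides a `bitable_db` fixture that:
- runs `DROP SCHEMA bitable CASCADE` followed by a fresh `init_bitable_db()` before each test function, so tests stay isolated
- cleans up automatically after the test (teardown)
- calls `pytest.skip("PostgreSQL not available")` instead of failing when PG is unreachable
**File**: `tests/unit/bitable/conftest.py` (new; `bitable_db` fixture + `bitable_service` fixture)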
### 测试标记
| 标记 | 用途 | 命令 |
|------|------|------|
| `@pytest.mark.postgres` | 需要 PG 的测试 | `pytest -m postgres` |
| `@pytest.mark.integration` | 端到端集成测试 | `pytest -m integration` |
| 无标记 | 纯单元测试公式解析、Pydantic 模型等) | `pytest -m "not postgres and not integration"` |
### HTTP Mock 策略
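BitableTool tests (U4) make no real HTTP calls. They mock the bitable REST API responses with `httpx.MockTransport` or `unittest.mock.patch`:
- Happy path: the mock returns 202 plus normal data
- Error path: the mock returns 503 / 401 / 500
- Batch chunking: the mock verifies that requests are split into the expected chunks
**Not mocked**: U2 route tests call the routes for real through FastAPI `TestClient` (the service layer may still mock the repository).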
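The batch-chunking check can be exercised without any HTTP layer by handing the chunking logic a stub sender. In this sketch `chunked`, `write_records` and the `send` callable are hypothetical names; only the batch size of 500 comes from the plan.

```python
from typing import Callable, Iterator

BATCH_SIZE = 500  # batch size stated for BitableTool


def chunked(items: list, size: int = BATCH_SIZE) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def write_records(records: list, send: Callable[[list], None]) -> int:
    """Send records in batches through `send`; return the number of requests made."""
    calls = 0
    for batch in chunked(records):
        send(batch)
        calls += 1
    return calls


# Stub sender standing in for the mocked REST call: it only records batch sizes.
seen_sizes = []
request_count = write_records([{"n": i} for i in range(1201)],
                              lambda batch: seen_sizes.append(len(batch)))
```

With 1201 records the stub sees two full batches and one remainder batch, which is the property a `MockTransport` handler would assert on as well.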
### 工厂 Fixture
`conftest.py` 提供工厂函数快速创建测试数据:
- `make_table(name="test_table", ...)` → 创建表 + 返回 table_id
- `make_field(table_id, name="f1", field_type="text", owner="agent", ...)` → 创建字段 + 返回 field_id
- `make_record(table_id, values={"f1": "value1", ...})` → 创建记录 + 返回 record_id
- `make_formula_field(table_id, name="calc", formula_expr="=SUM({f1})", ...)` → 创建公式字段
### 并发测试
U2/U3 的并发场景(并发 upsert、并发入队`asyncio.gather` 并发执行多个操作,验证:
- 并发 upsert 同主键不同列 → 行级锁,最终两列都更新
- 并发入队同 (record_id, field_id) → 唯一索引去重,队列中只有一个任务
### 异步生成器安全测试
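If the recalc worker is written as an async generator (`async def` + `yield`), tests verify the `return; yield` pattern required by the project rules:
- when the queue is empty the worker hits `return`, and iterating it does not raise the `'async for' requires __aiter__` error
- when there is work, the worker yields each task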
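The reason the pattern matters: an `async def` function with no `yield` anywhere in its body is a plain coroutine function, so `async for` over its result fails. Keeping a `yield` in the body, even an unreachable one, makes it an async generator that can legitimately produce zero items. A minimal sketch, with `drain`, `no_work` and `collect` as illustrative names rather than the worker's real API:

```python
import asyncio
from typing import AsyncIterator


async def drain(queue: list) -> AsyncIterator[str]:
    """Yield queued task ids; yield nothing when the queue is empty."""
    if not queue:
        return
    for task_id in queue:
        yield task_id


async def no_work() -> AsyncIterator[str]:
    """An async generator that never produces an item."""
    return
    yield  # unreachable, but its presence makes this an async generator


async def collect(gen: AsyncIterator[str]) -> list:
    """Consume an async iterator into a list."""
    return [item async for item in gen]


empty = asyncio.run(collect(no_work()))
drained = asyncio.run(collect(drain(["t1", "t2"])))
```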
---
## Risks & Dependencies
### 风险
| 风险 | 影响 | 缓解 |
|------|------|------|
| 公式引擎循环引用/跨表引用边界 case 多 | 重算错误或死循环 | DFS 循环检测 + 超时保护 + v1 限制跨表引用仅 lookup只读引用不形成环 |
| 公式 AST 安全约束遗漏 | 代码注入(`__import__`/`eval` | KTD7 白名单 AST walker + 安全测试覆盖(`=__import__`/`=eval`/`=lambda` 用例) |
| vxe-table 虚拟滚动 + 可编辑 + 自定义列三者集成复杂 | 前端开发周期拉长 | U5a 前先做最小原型验证(单表 + 1万行 + 编辑) |
| vxe-table 与 Ant Design Vue CSS 冲突 | 样式互相污染 | KTD10 CSS 隔离策略(`.bitable-grid-scope` 包裹 + token 对齐) |
| PostgreSQL 部署依赖 | 无 PG 环境无法用 bitable | lifespan 降级处理(参考 calendar 模式CLI doctor 检查 PG 可用性 |
| Agent 通过 HTTP 调用 bitable 的延迟 | 采集写入慢 | 本地 HTTP 延迟可忽略;批量写入用批量端点 + 分块U4 BATCH_SIZE=500 |
| JSONB 记录存储在 100k 行时的查询性能 | 分页/筛选慢 | GIN 索引 + cursor 分页KTD9+ v3 评估列式存储 |
| Recalc worker 崩溃后任务卡在 calculating | 公式永远不更新 | U3 崩溃恢复(启动时重置 calculating→pending+ reaper 定时清理 |
| 内部令牌泄露 | 未授权访问 bitable API | 令牌存 `agentkit.yaml`(不进 git未来升级 mTLSKTD11 ponytail 注明) |
| Schema 迁移失败 | bitable 不可用 | U1 采用 `_SCHEMA_VERSION` 模式,迁移在事务中执行,失败回滚 |
### 依赖
- PostgreSQL 已配置且可连接bitable 不像 calendar/documents 有 SQLite 回退)
- 现有文件上传基础设施(`data/uploads/` + 上传端点模式)可复用
- Agent 工具系统(`src/agentkit/tools/base.py`)可扩展
- vxe-table npm 包可安装MIT无许可证风险
---
## System-Wide Impact
- **部署**bitable 要求 PostgreSQL与 calendar/documents 的 SQLite 零依赖不同)。部署文档需注明 PG 是 bitable 的硬依赖。`agentkit.yaml` 需新增 `bitable.internal_token` 配置。
- **Agent 能力**Agent 获得"结构化数据落地"能力,可通过 BitableTool 把采集结果持久化为可编辑表格。这改变了 Agent 输出的形态(从纯文本/文件 → 持久化结构化数据)。
- **前端**:新增 `/agent/bitable` 路由(全屏布局,非象限),引入 vxe-table 依赖(与现有 Ant Design Vue 并存KTD10 CSS 隔离)。前端工作量从 1 个单元扩展到 3 个单元U5a/U5b/U5c
- **CLI**:新增 `agentkit bitable` 子命令组。
- **数据库**:新增 `bitable` schema6 张表含 `bitable_meta`),不影响现有 schema。采用 `_SCHEMA_VERSION` 迁移机制。
---
## Open Questions
- ~~**公式列的"计算中"状态前端感知方式**~~**已解决**——U5a 采用 2s 间隔轮询,公式字段全 done/error 后停止。WebSocket 推送延后到 v2。
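- **Completeness of the field-type mapping for database imports**: v1 covers the common types (int/varchar/text/timestamp/bool/decimal). How should complex types (json/array/enum) be handled? Extend as needed during implementation.
- **Concurrent edits to the same record**: v1 has no real-time multi-user collaboration (deferred to v3), but an Agent write and a user edit can still happen at the same time. Should v1 use optimistic locking (with `updated_at` as the version) or last-write-wins? Leaning toward last-write-wins, because upsert semantics already guarantee that user-owned columns are not overwritten, so the conflict surface is small. To be confirmed during implementation.
- **Layout for the full-screen bitable route**: does `meta.panel = 'full'` require a new full panel type in AgentLayout, or should bitable use a standalone layout outside AgentLayout? To be confirmed during implementation; leaning toward a standalone layout (bitable is a companion service, so UI independence is a better fit).
- **Formula syntax validation endpoint**: U5b refers to `POST /bitable/fields/validate-formula` for live validation in the frontend. Should it be implemented in U2 or U3? Leaning toward U3: the formula engine lives in U3, and U2 only covers CRUD.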
---
## Sources & Research
- **本地模式研究**`src/agentkit/server/app.py`(路由注册 + lifespan、`src/agentkit/server/auth/models.py`SQLAlchemy 2 模式)、`src/agentkit/evolution/pg_store.py`PostgreSQL 模式)、`src/agentkit/cli/task.py`CLI 模式)、`src/agentkit/server/frontend/src/api/calendar.ts`API 客户端模式)、`src/agentkit/server/frontend/src/stores/calendar.ts`Pinia store 模式)、`src/agentkit/tools/calendar_tool.py`(工具实现模式)
- **公式引擎选型**HyperformulaGPLv3/商业双授权商业需付费、formulasEUPL 1.1、pycelGPL-3.0不推荐、UniverApache-2.0,升级路径)。结论:自研避免许可证风险。
- **网格组件选型**vxe-tableMITVue 3 原生首选、ag-grid CommunityMIT但分组/透视需 Enterprise 付费、Handsontable商业付费、UniverApache-2.0完整套件备选。结论vxe-table。
- **公式重算算法**:业界标准为"标记脏 cell → 拓扑序增量重算"Luckysheet/Univer/HyperFormula 均采用此模式。


@ -0,0 +1,521 @@
---
title: "feat: 长程任务可靠性优化 — 中间件管道、循环检测、并发限制、检查点、状态卸载"
status: active
date: 2026-06-24
type: feat
origin: "DeerFlow 2.0 SuperAgent Harness 架构对比分析2026-06-24 对话)"
---
## Summary
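Based on an architectural comparison with DeerFlow 2.0 (ByteDance's open-source SuperAgent Harness, 74K+ stars), this plan closes 7 key gaps in AgentKit's reliability for long-running tasks (lasting minutes to hours):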
1. **循环检测LoopDetection** — ReAct 循环内滑动窗口 hash 检测重复工具调用,消除 ~30% 的 token 浪费。
2. **子代理并发限制SubagentLimit** — Expert Team 同层并行阶段加 Semaphore避免 LLM 限流洪峰。
3. **主动压缩触发Headroom 压缩)** — 基于 token 用量预测主动触发压缩,避免单次请求超限。
4. **SharedWorkspace Redis 化 + 状态卸载** — 阶段输出主动卸载到 Redis/磁盘,上下文只保留摘要,长程任务 token 降 50%+。
5. **中间件管道架构** — 统一中间件协议(洋葱模型),将散落的横切关注点(压缩/计量/安全/循环检测)集中化,并行接入验证后移除旧路径。
6. **Pipeline 检查点与断点续跑** — 阶段级 checkpoint崩溃后可从最后完成阶段恢复。
7. **技能加载审计** — 确认 disclosure_level 运行时是否真正按需加载,补齐渐进式加载缺口(若存在)。
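Explicitly **not** in scope: a wholesale migration to LangGraph (the in-house architecture stays, for flexibility), rewriting the existing orchestration logic, introducing a Docker sandbox by default (only the boundary is documented), native IM integration (marked as an optional plugin), and ACP integration (already covered by MCP).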
## Problem Frame
DeerFlow 2.0 的核心能力是让 Agent 自主执行持续数小时的复杂工作流而不"迷失"。其工程基础是 14 层有序中间件管道(洋葱模型)+ 文件系统优先的状态管理 + 沙箱隔离 + 循环检测 + 子代理并发限制。
对照 AgentKit 现状:
| DeerFlow 能力 | AgentKit 现状 | 差距 |
|---|---|---|
| 14 层有序中间件管道 | 横切关注点硬编码在 `ReActEngine._execute_loop``execute_stream` 内,无统一管道 | **缺失** |
| LoopDetectionMiddleware滑动窗口 hash | ReAct 循环无循环检测,只有 `max_steps` 兜底 | **缺失** |
| SubagentLimitMiddlewaremax 3/轮) | `asyncio.gather` 同层全部并行,无 Semaphore | **缺失** |
| SummarizationMiddleware主动触发 | 压缩在 ReAct 内被动调用,阈值 8000 tokens 硬编码 | **部分**(被动 vs 主动) |
| 文件系统优先状态管理 | SharedWorkspace 默认内存 dict不持久化 | **缺失** |
| Checkpoint节点级断点续跑 | PipelineStateManager 只写不读,无 resume 逻辑 | **缺失** |
| 技能渐进式加载(先元数据,触发时加载完整) | disclosure_level 字段存在但运行时不使用,启动时一次性全量加载 | **部分**(字段有,逻辑无) |
关键洞察AgentKit 的自研编排(拓扑排序 + Board 辩论 + 4 层记忆 + 自演化)比 DeerFlow 更丰富,但**长程任务护栏**(循环检测、并发限制、统一中间件、检查点)是明显短板。补齐这些护栏是让 AgentKit 也能支撑"数小时自主执行"的关键。
## Requirements
- **R1**ReAct 循环(非流式 `_execute_loop` + 流式 `execute_stream`)内加入循环检测,滑动窗口检测最近 N 步的 `(tool_name, arguments_hash)` 重复,触发时注入"你正在重复调用 X请改变策略"系统消息而非直接中断。
- **R2**`TeamOrchestrator` 同层并行阶段(`asyncio.gather`)加 `asyncio.Semaphore`,默认 max 3 并发,可配置;辩论阶段并行同样加限制。
- **R3**`ContextCompressor` 包装为基于 headroom 预测的主动压缩——当 token 用量 / 模型上限 > 80% 时主动触发压缩,而非固定阈值 8000。
- **R4**`SharedWorkspace` 在 ExpertTeam 创建时传入 Redis client复用 `app.state.working_redis_client`),阶段输出主动写入 Redis上下文中只保留摘要 + 引用路径。
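- **R5**: Add `core/middleware.py`, defining a `Middleware` protocol (`before`/`after`) and a `MiddlewareChain` (onion model, ordered execution). Wrap the existing ContextCompressor, terminal security and loop detection as middlewares and wire them in alongside the current code paths (old and new coexist); remove the old paths once the new ones are verified.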
- **R6**:新增 `orchestrator/checkpoint.py`,实现 `PipelineCheckpoint`阶段级Redis 存储,键 `agentkit:pipeline:checkpoint:{plan_id}:{phase_id}`TTL 7 天);`TeamOrchestrator` 在阶段完成后 save checkpoint新增 `resume(plan_id)` 方法从 checkpoint 恢复。
- **R7**:审计 `SkillRegistry` 运行时加载路径,确认 `disclosure_level` 是否真正按需加载;若非,在 `build_skill_system_prompt` 中实现 Level 0 概要注入 + ReAct 循环内 `skill_search` 工具按需加载完整内容。
- **R8**:所有优化项配置化,新增 `agentkit.yaml``pipeline` 节(`checkpoint.enabled`、`max_concurrent_phases`、`loop_detection.window_size` 等),遵循 `ServerConfig.from_dict` 模式。
- **R9**每个优化项附最小自检测试ponytail 规则),参考 `test_pipeline_state.py``TestPipelineStateRedis` 类模式AsyncMock + 直接注入 mock
## Key Technical Decisions
### KTD1中间件管道并行接入不直接替换现有请求流
**决策**:中间件管道与现有 `ReActEngine._execute_loop` / `execute_stream` 内的横切逻辑并行运行,通过 feature flag 控制(`pipeline.middleware.enabled`),验证稳定后再移除旧路径。
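**Rationale**: replacing the existing flow directly is high risk. `_execute_loop` (lines 281-831) and `execute_stream` (lines 833-1504) each carry 500+ lines of deeply interleaved cross-cutting logic (compression, trace, memory, telemetry). Wiring the pipeline in alongside them allows a gradual rollout, and a failure in the new path does not affect existing functionality.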
**代价**:短期内代码重复(新旧路径共存),工作量翻倍。可接受——长程任务可靠性是核心能力,不值得为省工作量冒回归风险。
### KTD2循环检测不直接中断注入纠正消息
**决策**:循环检测触发时,不抛异常或 break 循环,而是向 conversation 注入系统消息"你正在重复调用 {tool_name},请改变策略",给 LLM 一次自我纠正机会。连续 2 次检测仍重复则强制中断。
**理由**DeerFlow 的 LoopDetectionMiddleware 也是先警告后中断。直接中断会丢失已有上下文,且 LLM 可能只是"卡顿"而非真正死循环(如等待外部资源)。纠正消息是最低成本的恢复机制。
**代价**:多消耗 1-2 轮 LLM 调用。可接受——比无限循环烧 token 好。
### KTD3检查点粒度为阶段级Phase-level不做节点级
**决策**`PipelineCheckpoint` 只在 `PlanPhase` 完成后保存(`_execute_execution_phase` 行 633 后),保存 `plan_id`、`phase_id`、`phase.result`、`plan.topological_sort()` 当前进度。不做 ReAct 循环内的节点级 checkpoint。
**理由**:节点级 checkpoint类似 LangGraph需要状态模式强约束与 AgentKit 的 `dict[str, Any]` 上下文模型冲突,实现复杂度高。阶段级已满足"崩溃后从最后完成阶段恢复"的核心需求,覆盖 90% 的长程任务恢复场景。
**代价**:崩溃在阶段中间时,该阶段需重跑。可接受——单阶段通常 1-5 分钟,重跑成本低于全量重跑。
### KTD4SharedWorkspace Redis 化,但不引入磁盘文件系统
**决策**`ExpertTeam` 创建时传入 Redis client复用 `app.state.working_redis_client`),阶段输出写入 Redis。不引入 DeerFlow 式的磁盘文件系统(`/mnt/skills/`、`/workspace/`)。
**理由**AgentKit 已有 Redis 基础设施(`PipelineStateRedis` 的 `_safe_redis_call` + Memory fallback 模式成熟),复用成本最低。磁盘文件系统需要容器化部署配合,违反 ponytail"不引入新依赖")。
**代价**Redis 存储有 TTL 限制7 天),不适合永久归档。可接受——阶段输出是中间产物,永久归档应由 EpisodicMemoryPG+pgvector负责。
### KTD5技能渐进式加载——先审计按需实现
**决策**U5 首先是审计单元——确认 `disclosure_level` 在运行时是否被使用。若已按需加载,该单元无代码产出(仅审计报告)。若未按需加载,在 `build_skill_system_prompt` 实现 Level 0 概要注入 + ReAct 循环内新增 `skill_search` 工具。
**理由**:研究显示 `disclosure_level` 字段存在但运行时默认为 1全量加载疑似未真正按需加载。但需审计确认后再决定是否实现避免过度工程。
**代价**U5 可能无代码产出。可接受——审计结论本身有价值(确认差距是否存在)。
---
## Implementation Units
### U1. 循环检测LoopDetection
**Goal**:在 ReAct 循环内加入滑动窗口 hash 检测,识别重复工具调用模式,触发纠正消息。
**Requirements**R1
**Dependencies**:无(独立优化项)
**Files**
- `src/agentkit/core/react.py`(修改:`_execute_loop` 行 388-392 + `execute_stream` 行 948-952 插入检测逻辑)
- `tests/unit/test_react_engine.py`(修改:新增 `TestLoopDetection` 类)
**Approach**
- 在 `ReActEngine.__init__` 增加 `_loop_window: deque = deque(maxlen=5)``_loop_threshold: int = 2`
- 检测逻辑:每步工具调用后,计算 `(tool_name, json.dumps(arguments, sort_keys=True))` 的 hash存入 `_loop_window`
- 若窗口内同一 hash 出现次数 ≥ `_loop_threshold`,向 conversation 注入系统消息
- 连续 2 次检测仍重复,抛 `LoopDetectedError`(新增异常,继承 `TaskCancelledError`
- 插入点:`react.py` 行 388`step += 1` 后)、行 948流式对应位置
- 配置化:`loop_detection.window_size`、`loop_detection.threshold`agentkit.yaml `pipeline` 节)
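The detection steps above can be sketched as a small standalone class. This is an illustration under assumptions, not the planned `ReActEngine` code: `LoopDetector`, its `record` method and the `"ok"`/`"warn"`/`"abort"` return values are hypothetical. The intent is that `"warn"` maps to injecting the correction system message and `"abort"` maps to raising `LoopDetectedError`.

```python
import hashlib
import json
from collections import deque


class LoopDetector:
    """Sliding-window detector for repeated (tool_name, arguments) calls."""

    def __init__(self, window_size: int = 5, threshold: int = 2, max_warnings: int = 2):
        self._window = deque(maxlen=window_size)  # hashes of the most recent calls
        self._threshold = threshold               # repeats within the window that count as a loop
        self._max_warnings = max_warnings         # consecutive detections before aborting
        self._consecutive_hits = 0

    def record(self, tool_name: str, arguments: dict) -> str:
        """Record one tool call and return "ok", "warn" or "abort"."""
        # sort_keys makes the hash independent of argument order
        payload = json.dumps([tool_name, arguments], sort_keys=True, default=str)
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        self._window.append(digest)
        if self._window.count(digest) < self._threshold:
            self._consecutive_hits = 0
            return "ok"
        self._consecutive_hits += 1
        return "abort" if self._consecutive_hits >= self._max_warnings else "warn"
```

With the defaults, three identical calls in a row produce `ok`, `warn`, `abort`, which matches the edge case listed in the test scenarios; the same tool with different arguments never leaves `ok`.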
**Patterns to follow**`cancellation_token.check()`(行 391的协作式检查模式——检测但不阻塞给 LLM 纠正机会。
**Test scenarios**
- **Happy path**正常工具调用不触发检测5 步内不同工具调用,无系统消息注入
- **Edge case**:连续 3 次相同工具 + 相同参数 → 第 2 次注入纠正消息,第 3 次抛 `LoopDetectedError`
- **Edge case**:相同工具不同参数 → 不触发检测(参数 hash 不同)
- **Error path**`LoopDetectedError` 被 `_execute_loop` 的 try/except 捕获,返回 `ReActResult``error` 字段
- **Integration**:流式路径 `execute_stream` 同样触发检测,通过 SSE 事件 `step` 广播纠正消息
**Verification**:运行 `python3 -m pytest tests/unit/test_react_engine.py::TestLoopDetection -x -q` 全部通过;模拟重复调用场景验证检测生效。
---
### U2. 子代理并发限制SubagentLimit
**Goal**Expert Team 同层并行阶段加 Semaphore避免 LLM 限流洪峰。
**Requirements**R2
**Dependencies**:无(独立优化项)
**Files**
- `src/agentkit/experts/orchestrator.py`(修改:`__init__` 行 75-84 + `execute` 行 205-208 + `_execute_debate_phase` 行 949-955
- `tests/unit/experts/test_team_orchestrator.py`(修改:新增 `TestConcurrencyLimit` 类)
**Approach**
- `TeamOrchestrator.__init__` 增加 `max_concurrent_phases: int = 3`,创建 `self._phase_semaphore = asyncio.Semaphore(max_concurrent_phases)`
- 行 205-208 的 `asyncio.gather` 改为包裹 `_bounded_execute(phase)`
```
async def _bounded_execute(phase):
    async with self._phase_semaphore:
        return await self._execute_phase(phase, plan)
```
- 辩论阶段行 949-955 同样包裹 `_bounded_debate_arg(expert)`
- 配置化:`pipeline.max_concurrent_phases`agentkit.yaml默认 3
- `MAX_PHASES = 10`(行 67保持不变作为阶段总数上限
**Patterns to follow**`chat.py` 行 761 `_MAX_CONCURRENT_TASKS = 4` 的 per-session 并发限制模式。
**Test scenarios**
- **Happy path**3 个阶段同层全部并行执行semaphore 不阻塞)
- **Edge case**5 个阶段同层,验证最多 3 个同时执行(用 `asyncio.Event` 同步验证并发数)
- **Edge case**`max_concurrent_phases=1` 时退化为串行执行
- **Error path**:某阶段失败不影响 semaphore 释放(`async with` 保证释放)
- **Integration**:辩论阶段 4 个专家并行,验证 semaphore 限制同样生效
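One way to check the "at most 3 at once" property is to track peak concurrency with a counter inside the guarded section. The sketch below uses a counter rather than the `asyncio.Event` approach mentioned in the scenarios, and `run_bounded` and `fake_phase` are stand-ins for the orchestrator's real phase execution.

```python
import asyncio


async def run_bounded(n_phases: int, limit: int) -> int:
    """Run n_phases fake phases under a semaphore; return the peak concurrency seen."""
    semaphore = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def fake_phase() -> None:
        nonlocal running, peak
        async with semaphore:
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for the real phase work
            running -= 1

    await asyncio.gather(*(fake_phase() for _ in range(n_phases)))
    return peak


peak_with_limit_3 = asyncio.run(run_bounded(5, 3))
peak_serial = asyncio.run(run_bounded(4, 1))
```

Because `async with` releases the semaphore even when the body raises, the same harness can be extended with a failing phase to cover the error-path scenario.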
**Verification**:运行 `python3 -m pytest tests/unit/experts/test_team_orchestrator.py::TestConcurrencyLimit -x -q` 全部通过。
---
### U3. 主动压缩触发Headroom 压缩)
**Goal**:基于 token 用量预测主动触发压缩,避免单次请求超限。
**Requirements**R3
**Dependencies**:无(独立优化项)
**Files**
- `src/agentkit/core/compressor.py`(修改:新增 `HeadroomCompressionMiddleware` 类)
- `src/agentkit/core/react.py`(修改:`_should_compress` 行 1652-1663 改为基于 headroom 预测)
- `tests/unit/test_react_engine.py`(修改:新增 `TestHeadroomCompression` 类)
**Approach**
- `ContextCompressor` 新增 `model_context_limit: int` 参数(默认 128000从 LLMConfig 读取)
- `_should_compress` 改为:`estimate_tokens(conversation) / model_context_limit > 0.8`
- 保留固定阈值 8000 作为下限(小模型场景)
- 压缩触发后,将压缩前的 conversation checkpoint 到 SharedWorkspace与 U4 协同)
- 配置化:`pipeline.compression.headroom_threshold`(默认 0.8)、`pipeline.compression.min_tokens`(默认 8000
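The ratio check itself is a one-liner; the sketch below covers only that part. How the `min_tokens` floor combines with small context windows is not modelled here, and `headroom_ratio` and `should_compress` are illustrative names rather than the existing `_should_compress` signature.

```python
def headroom_ratio(used_tokens: int, model_context_limit: int) -> float:
    """Fraction of the model's context window already in use."""
    if model_context_limit <= 0:
        raise ValueError("model_context_limit must be positive")
    return used_tokens / model_context_limit


def should_compress(used_tokens: int, model_context_limit: int = 128_000,
                    headroom_threshold: float = 0.8) -> bool:
    """True when usage exceeds the configured share of the context window."""
    return headroom_ratio(used_tokens, model_context_limit) > headroom_threshold
```

For a 128K window, 110K tokens gives a ratio of about 0.86 and triggers compression, while 100K gives about 0.78 and does not.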
**Patterns to follow**:现有 `_should_compress`(行 1652-1663的阈值判断模式只是阈值来源从硬编码改为 headroom 比例。
**Test scenarios**
- **Happy path**: a 110K-token conversation with model_limit 128K → compression triggers (110K/128K ≈ 0.86 > 0.8); a 100K-token conversation does not trigger it (100K/128K ≈ 0.78 < 0.8)
- **Edge case**conversation 5K tokensmodel_limit 128K → 不触发(低于 min_tokens 8000
- **Edge case**model_limit 8K小模型conversation 7K → 触发7K/8K = 0.875 > 0.8
- **Error path**:压缩器不可用(`is_available()=False`)→ 跳过压缩,记录 warning
**Verification**:运行 `python3 -m pytest tests/unit/test_react_engine.py::TestHeadroomCompression -x -q` 全部通过。
---
### U4. SharedWorkspace Redis 化 + 状态卸载
**Goal**ExpertTeam 创建时传入 Redis client阶段输出主动写入 Redis上下文只保留摘要。
**Requirements**R4
**Dependencies**:无(独立优化项,但与 U6 中间件管道协同)
**Files**
- `src/agentkit/experts/team.py`(修改:`__init__` 行 67 接收 `redis_client` 参数)
- `src/agentkit/server/routes/chat.py`(修改:`_execute_team_collab` 行 407-410 传入 `app.state.working_redis_client`
- `src/agentkit/experts/orchestrator.py`(修改:`_execute_execution_phase` 行 633 后增加上下文摘要替换)
- `tests/unit/test_orchestrator.py`(修改:新增 `TestSharedWorkspaceRedis` 类)
**Approach**
- `ExpertTeam.__init__` 增加 `redis_client: Any = None` 参数,传给 `SharedWorkspace(redis_client=redis_client)`
- `chat.py` 行 407 `ExpertTeam(...)` 传入 `redis_client=app_state.working_redis_client`
- `orchestrator.py` 行 633 `workspace.write` 后,将 `phase.result` 的完整内容写入 Redis上下文中只保留摘要 + 引用路径:
```
summary = result.get("content", "")[:500] + "..."
ref_key = f"{plan.id}/phase/{phase.id}/output"
# 后续阶段读取依赖输出时,从 Redis 读取完整内容
```
- `orchestrator.py` 行 496-501 读取依赖输出的逻辑改为:先从内存 `plan.phases` 读,若不存在从 Redis `workspace.read`
**Patterns to follow**`PipelineStateRedis._safe_redis_call`(行 183-223的 Redis + Memory fallback 模式——Redis 失败时降级到内存 dict不阻断执行。
**Test scenarios**
- **Happy path**阶段完成后Redis 中存在 `{plan_id}/phase/{phase_id}/output` 键,值为完整 result
- **Edge case**Redis 不可用 → 降级到内存 dict执行不中断`_safe_redis_call` 模式)
- **Edge case**:后续阶段读取依赖输出,内存无(模拟崩溃恢复)→ 从 Redis 读取成功
- **Integration**5 阶段流水线,验证上下文中只保留摘要,完整输出在 Redis
**Verification**:运行 `python3 -m pytest tests/unit/test_orchestrator.py::TestSharedWorkspaceRedis -x -q` 全部通过;手动验证 Redis 中存在阶段输出键。
---
### U5. 技能加载审计与渐进式加载
**Goal**:审计 `disclosure_level` 运行时使用情况;若未按需加载,实现 Level 0 概要注入 + `skill_search` 工具。
**Requirements**R7
**Dependencies**:无(审计单元,可能无代码产出)
**Files**
- `src/agentkit/skills/registry.py`(审计:`get` 方法 + 运行时调用路径)
- `src/agentkit/chat/skill_routing.py`(审计 + 可能修改:`build_skill_system_prompt` 行 101-126
- `src/agentkit/core/react.py`(可能修改:新增 `skill_search` 工具,参考 `_maybe_add_tool_search` 行 1600-1623
- `tests/unit/test_skill_routing.py`(可能新增:`TestProgressiveSkillLoading` 类)
**Approach**
- **审计阶段**
1. 确认 `SkillLoader.load_from_skill_md``loader.py` 行 89`disclosure_level` 默认值
2. 确认 `build_skill_system_prompt``skill_routing.py` 行 101-126是否根据 `disclosure_level` 动态选择 sections
3. 确认 ReAct 循环内是否有 `skill_search` 工具(类似 `tool_search`
4. 输出审计报告:`disclosure_level` 是否真正按需加载
- **实现阶段(若审计确认未按需加载)**
1. `build_skill_system_prompt` 根据 `disclosure_level=0` 只注入 name + description
2. ReAct 循环内新增 `skill_search(query: str)` 工具,从 SkillRegistry 检索匹配 skill 的完整内容
3. LLM 调用 `skill_search` 后,完整 skill 内容注入到 conversation
- **若审计确认已按需加载**:该单元仅输出审计报告,无代码改动
**Patterns to follow**`_maybe_add_tool_search``react.py` 行 1600-1623`tool_search` 工具实现模式——动态注入工具 schemaLLM 按需调用。
**Test scenarios**(若实现阶段触发):
- **Happy path**`disclosure_level=0` 时system_prompt 只含 skill name + description
- **Happy path**LLM 调用 `skill_search("research")` → 返回 research skill 的完整 instructions
- **Edge case**`skill_search` 查询无匹配 → 返回"无匹配 skill"提示
- **Edge case**`disclosure_level=1`(默认)→ 行为与现状一致(全量加载),向后兼容
**Verification**:审计报告完成;若实现,运行 `python3 -m pytest tests/unit/test_skill_routing.py::TestProgressiveSkillLoading -x -q` 全部通过。
---
### U6. 中间件管道架构
**Goal**:定义统一中间件协议(洋葱模型),将横切关注点集中化,并行接入验证后移除旧路径。
**Requirements**R5
**Dependencies**U1循环检测迁移为中间件、U3压缩迁移为中间件
**Files**
- `src/agentkit/core/middleware.py`(新建:`Middleware` 协议 + `MiddlewareChain` + `RequestContext`
- `src/agentkit/core/react.py`(修改:`execute` / `execute_stream` 入口接入中间件链feature flag 控制)
- `src/agentkit/server/routes/chat.py`修改ReActEngine 创建时注入中间件链)
- `tests/unit/test_middleware.py`(新建:`TestMiddlewareChain` 类)
**Approach**
- **新建 `core/middleware.py`**
```
from typing import Any, Callable, Protocol

class RequestContext:  # request context carried through the whole middleware chain
    # conversation, tools, system_prompt, trajectory, step, token_usage, ...
    ...

class Middleware(Protocol):
    async def before(self, ctx: RequestContext) -> RequestContext: ...
    async def after(self, ctx: RequestContext, result: Any) -> Any: ...

class MiddlewareChain:
    def __init__(self, middlewares: list[Middleware]): ...
    async def execute(self, ctx: RequestContext, handler: Callable) -> Any:
        # onion model: before runs outside-in, after runs inside-out
        ...
```
- **初始中间件集**精简版6 层,非 DeerFlow 的 14 层):
1. `ThreadDataMiddleware` — 初始化 RequestContext
2. `SummarizationMiddleware` — 包装现有 ContextCompressorU3 的 headroom 压缩)
3. `TokenUsageMiddleware` — token 计量
4. `LoopDetectionMiddleware` — 包装 U1 的循环检测
5. `SubagentLimitMiddleware` — 包装 U2 的并发限制(作用于 Expert Team
6. `TerminalSecurityMiddleware` — 包装现有 6 层终端安全
- **并行接入**
- `ReActEngine.__init__` 增加 `middleware_chain: MiddlewareChain | None = None`
- `execute` / `execute_stream` 入口:若 `middleware_chain` 存在且 `pipeline.middleware.enabled=True`,走中间件路径;否则走现有路径
- `chat.py` 行 1066-1068 ReActEngine 创建时注入中间件链
- **迁移顺序**(验证后逐步移除旧路径):
1. 先接入 SummarizationMiddleware + TokenUsageMiddleware低风险
2. 再接入 LoopDetectionMiddleware + SubagentLimitMiddleware中风险
3. 最后接入 TerminalSecurityMiddleware高风险需充分回归
- 配置化:`pipeline.middleware.enabled`(默认 False灰度开启
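A runnable toy version of the chain makes the onion ordering concrete. It is a sketch under assumptions: the context is a plain `dict` instead of `RequestContext`, `Recorder` is a made-up middleware that only logs, and error handling is the simplest possible (an exception in any `before` hook propagates, so later hooks and the whole `after` pass are skipped).

```python
import asyncio
from typing import Any, Awaitable, Callable


class Recorder:
    """Toy middleware that logs when each of its hooks runs."""

    def __init__(self, name: str, log: list):
        self.name, self.log = name, log

    async def before(self, ctx: dict) -> dict:
        self.log.append(f"{self.name}.before")
        return ctx

    async def after(self, ctx: dict, result: Any) -> Any:
        self.log.append(f"{self.name}.after")
        return result


class MiddlewareChain:
    def __init__(self, middlewares: list):
        self._middlewares = list(middlewares)

    async def execute(self, ctx: dict, handler: Callable[[dict], Awaitable[Any]]) -> Any:
        # before hooks run outermost-first
        for mw in self._middlewares:
            ctx = await mw.before(ctx)
        result = await handler(ctx)
        # after hooks run innermost-first
        for mw in reversed(self._middlewares):
            result = await mw.after(ctx, result)
        return result


log = []


async def handler(ctx: dict) -> str:
    log.append("handler")
    return "done"


chain = MiddlewareChain([Recorder(name, log) for name in "ABC"])
result = asyncio.run(chain.execute({}, handler))
```

After the run, `log` holds the hooks in the order A, B, C before the handler and C, B, A after it.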
**Patterns to follow**DeerFlow 的洋葱模型——`before` 由外到内、`after` 由内到外,顺序依赖通过 `@Next`/`@Prev` 装饰器声明(首版可用固定顺序列表简化)。
**Test scenarios**
- **Happy path**3 个中间件按序执行,`before` 顺序 A→B→C`after` 顺序 C→B→A
- **Edge case**:某中间件 `before` 抛异常 → 后续中间件不执行,`after` 链不触发
- **Edge case**`middleware_chain=None` → 走现有路径,行为不变(向后兼容)
- **Integration**`SummarizationMiddleware` 触发压缩后,`LoopDetectionMiddleware` 仍能检测(中间件间状态通过 RequestContext 传递)
- **Integration**feature flag 开关切换,新旧路径行为一致
**Verification**:运行 `python3 -m pytest tests/unit/test_middleware.py -x -q` 全部通过;灰度开启中间件路径,运行现有 ReAct 测试套件验证无回归。
---
### U7. Pipeline 检查点与断点续跑
**Goal**:阶段级 checkpoint崩溃后可从最后完成阶段恢复。
**Requirements**R6
**Dependencies**U4SharedWorkspace Redis 化提供存储基础)
**Files**
- `src/agentkit/orchestrator/checkpoint.py`(新建:`PipelineCheckpoint` 类)
- `src/agentkit/experts/orchestrator.py`(修改:`__init__` 注入 checkpoint + `_execute_execution_phase` 行 633 后 save + 新增 `resume` 方法)
- `src/agentkit/server/routes/tasks.py`(修改:新增 `POST /api/v1/tasks/{id}/resume` 端点)
- `tests/unit/test_pipeline_checkpoint.py`(新建:`TestPipelineCheckpoint` 类)
**Approach**
- **新建 `orchestrator/checkpoint.py`**
```
class PipelineCheckpoint:
    def __init__(self, redis_client, prefix="agentkit:pipeline:checkpoint"): ...

    async def save(self, plan_id: str, phase: PlanPhase, plan_status: str) -> None:
        # key: {prefix}:{plan_id}:{phase_id}
        # value: JSON {phase_id, phase_name, phase_result, phase_status, plan_status, saved_at}
        # TTL: 7 days (same as PipelineStateRedis._TTL_SECONDS)
        ...

    async def load(self, plan_id: str) -> CheckpointData | None:
        # returns the checkpoint of the last COMPLETED phase
        ...

    async def list_checkpoints(self, plan_id: str) -> list[CheckpointData]: ...
    async def clear(self, plan_id: str) -> None: ...
```
- **复用 `PipelineStateRedis._safe_redis_call` 模式**Redis 失败降级到内存 dict不阻断执行
- **TeamOrchestrator 接入**
- `__init__` 增加 `checkpoint: PipelineCheckpoint | None = None`
- `_execute_execution_phase` 行 633phase COMPLETED 后)调用 `checkpoint.save(plan.id, phase, plan.status)`
- 新增 `async def resume(self, plan_id: str) -> OrchestrationResult`
1. `checkpoint.load(plan_id)` 获取最后完成阶段
2. 重建 `TeamPlan`,标记已完成阶段为 COMPLETED
3. 从下一未完成阶段继续执行
- **API 端点**`POST /api/v1/tasks/{id}/resume` → 调用 `TeamOrchestrator.resume(plan_id)`
- 配置化:`pipeline.checkpoint.enabled`(默认 False、`pipeline.checkpoint.ttl_seconds`(默认 604800
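The save/load contract can be prototyped against a dict before any Redis is involved, which is also roughly what the memory fallback has to do. In this sketch `InMemoryCheckpoint` is a hypothetical stand-in: it takes a `phase_id` string instead of a `PlanPhase`, ignores TTL, and orders checkpoints by an internal sequence number instead of `saved_at`.

```python
import asyncio
import json
from typing import Optional

PREFIX = "agentkit:pipeline:checkpoint"


class InMemoryCheckpoint:
    """Dict-backed stand-in for the Redis-backed PipelineCheckpoint."""

    def __init__(self) -> None:
        self._store = {}  # key -> JSON string, mirroring what Redis would hold
        self._seq = 0     # monotonically increasing save order

    async def save(self, plan_id: str, phase_id: str, result: dict, status: str) -> None:
        self._seq += 1
        key = f"{PREFIX}:{plan_id}:{phase_id}"
        self._store[key] = json.dumps({"phase_id": phase_id, "phase_result": result,
                                       "phase_status": status, "seq": self._seq})

    async def load(self, plan_id: str) -> Optional[dict]:
        """Return the most recently saved COMPLETED checkpoint, or None."""
        prefix = f"{PREFIX}:{plan_id}:"  # trailing colon keeps "p1" from matching "p10"
        rows = [json.loads(v) for k, v in self._store.items() if k.startswith(prefix)]
        done = [r for r in rows if r["phase_status"] == "COMPLETED"]
        return max(done, key=lambda r: r["seq"]) if done else None


async def demo():
    cp = InMemoryCheckpoint()
    for phase_id in ("plan", "build", "review"):
        await cp.save("p1", phase_id, {"content": phase_id}, "COMPLETED")
    await cp.save("p1", "ship", {}, "RUNNING")  # unfinished phase is ignored by load
    return await cp.load("p1"), await cp.load("missing")


latest, missing = asyncio.run(demo())
```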
**Patterns to follow**`PipelineStateRedis``pipeline_state.py` 行 178-314的 Redis + Memory fallback + 键命名 + TTL + JSON 序列化模式。`test_pipeline_state.py` 的 `TestPipelineStateRedis` 类的测试模式AsyncMock + 直接注入 mock
**Test scenarios**
- **Happy path**`save(plan_id, phase, "executing")` 后 `load(plan_id)` 返回该 phase 的 checkpoint 数据
- **Happy path**3 个阶段完成,`load` 返回第 3 个阶段的 checkpoint最后一个 COMPLETED
- **Edge case**`load` 不存在的 plan_id → 返回 None
- **Edge case**Redis 不可用 → 降级到内存 dictsave/load 仍工作
- **Error path**`save` 时 Redis 异常 → 记录 warning不阻断阶段执行`_safe_redis_call` 模式)
- **Integration**`resume(plan_id)` 从 checkpoint 恢复,跳过已完成阶段,执行未完成阶段
- **Integration**checkpoint TTL 过期后 `load` 返回 None
**Verification**:运行 `python3 -m pytest tests/unit/test_pipeline_checkpoint.py -x -q` 全部通过;手动模拟崩溃恢复场景验证 resume 生效。
---
## Scope Boundaries
### In Scope
- ReAct 循环内循环检测U1
- Expert Team 并发限制U2
- 主动压缩触发U3
- SharedWorkspace Redis 化 + 状态卸载U4
- 技能加载审计与渐进式加载U5
- 中间件管道架构U6并行接入
- Pipeline 检查点与断点续跑U7阶段级
- 所有优化项的配置化agentkit.yaml `pipeline` 节)
- 每个优化项的最小自检测试
### Out of Scope
- 全盘迁移到 LangGraph明确不建议自研架构保持灵活
- 重写现有编排逻辑拓扑排序、Board 辩论、4 层记忆等保持不变)
- Docker 沙箱默认引入仅文档化命令级安全的边界Docker 沙箱作为可选插件未来考虑)
- IM 原生集成Telegram/Slack/Feishu标记为可选插件非本期
- ACP 集成MCP 已覆盖更通用的协议)
- 节点级 checkpointKTD3 决策,阶段级已满足核心需求)
- DeerFlow 式磁盘文件系统KTD4 决策,复用 Redis
### Deferred to Follow-Up Work
- 中间件管道的 `@Next`/`@Prev` 装饰器声明顺序依赖(首版用固定顺序列表)
- 全局 LLM 并发限制(`LLMGateway` 内 semaphore本期只做 Expert Team 层)
- ReAct conversation 的 checkpoint本期只做 Pipeline 阶段级)
- 中间件路径移除旧路径(需灰度验证 1-2 周后单独执行)
- `skill_search` 工具的语义检索(本期若实现,用关键词匹配,语义检索延后)
---
## High-Level Technical Design
### 中间件管道洋葱模型
```
Request enters ReActEngine.execute()
  ↓
MiddlewareChain.execute(ctx, handler)
┌─ ThreadDataMiddleware.before ────────────────────────────┐
│ ┌─ SummarizationMiddleware.before ─────────────────────┐ │
│ │ ┌─ TokenUsageMiddleware.before ────────────────────┐ │ │
│ │ │ ┌─ LoopDetectionMiddleware.before ─────────────┐ │ │ │
│ │ │ │ ┌─ TerminalSecurityMiddleware.before ──────┐ │ │ │ │
│ │ │ │ │ handler(ctx)  ← ReAct loop               │ │ │ │ │
│ │ │ │ └─ TerminalSecurityMiddleware.after ───────┘ │ │ │ │
│ │ │ └─ LoopDetectionMiddleware.after ──────────────┘ │ │ │
│ │ └─ TokenUsageMiddleware.after ─────────────────────┘ │ │
│ └─ SummarizationMiddleware.after ──────────────────────┘ │
└─ ThreadDataMiddleware.after ─────────────────────────────┘
  ↓
Result returned
```
### 检查点恢复流程
```
TeamOrchestrator.execute(plan)
  after each phase completes: checkpoint.save(plan_id, phase, plan.status)
  ↓
[crash]
  ↓
user calls POST /api/v1/tasks/{id}/resume
  ↓
TeamOrchestrator.resume(plan_id)
  ├─ checkpoint.load(plan_id) → get the last COMPLETED phase
  ├─ rebuild TeamPlan, marking completed phases
  ├─ topological_sort() → find the next unfinished layer
  └─ continue from that layer (same parallel + serial logic as execute)
```
---
## Risks & Dependencies
### Risks
| 风险 | 概率 | 影响 | 缓解 |
|---|---|---|---|
| 中间件管道重构引入回归 | 中 | 高ReAct 循环是核心路径) | 并行接入 + feature flag 灰度KTD1 |
| 循环检测误判(合法重复调用被中断) | 低 | 中(用户体验) | 先警告后中断KTD2阈值可配置 |
| SharedWorkspace Redis 化后性能下降 | 低 | 中Redis 延迟 vs 内存 dict | `_safe_redis_call` fallback 到内存,异步写入 |
| 检查点序列化大对象导致 Redis 膨胀 | 中 | 低TTL 7 天自动清理) | phase.result 只存摘要 + 引用,完整内容已在 SharedWorkspace |
| U5 审计确认无需改动,单元"无产出" | 中 | 低(审计结论有价值) | KTD5 已明确接受此可能性 |
### Dependencies
- **Redis 基础设施**U4、U7 依赖 Redis 可用(已有 `app.state.working_redis_client`U4 接入)
- **现有测试套件**U6 中间件管道接入后需运行全量 ReAct/Team 测试验证无回归
- **agentkit.yaml configuration**: U1-U7 all add entries to the new `pipeline` config section (handled together under R8)
---
## System-Wide Impact
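### Affected Parties
- **End users**: long-running tasks (@team mode) are more stable and no longer crash on loops or rate limits; they can be resumed after a crash
- **Developers**: new middleware extension points make cross-cutting logic pluggable; a `pipeline` config section is added
- **Operations**: Redis usage increases (checkpoints + SharedWorkspace), so memory needs monitoring
- **API consumers**: new `POST /api/v1/tasks/{id}/resume` endpoint
### Compatibility
- All optimizations are off by default (feature flags) and do not affect existing behavior
- `agentkit.yaml` gains a `pipeline` section; older configs without it fall back to default values
- After `SharedWorkspace` moves to Redis, in-memory mode remains available (fallback)
- When `middleware_chain=None`, the existing path is used instead of the middleware pipeline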
---
## Acceptance Examples
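- **AE1**: A user starts an @team task (5 phases) and phase 3 fails because of LLM rate limiting → the system retries automatically (the U2 concurrency limit avoids request bursts) and the task completes
- **AE2**: A user starts a ReAct task and the LLM calls the same tool with the same arguments 3 times in a row → a corrective message is injected on the 2nd call, the LLM changes strategy, and the task completes (U1 loop detection)
- **AE3**: A user starts an @team task (10 phases) and the service crashes during phase 7 → the user calls `/resume`, execution continues from phase 7, and the task completes (U7 checkpoint)
- **AE4**: A user holds a long conversation (50 turns) and token usage approaches the model limit → the system proactively compresses the history and the conversation continues without interruption (U3 headroom compression)
- **AE5**: A developer adds an "audit log" cross-cutting concern → they implement the `Middleware` protocol and inject it into `MiddlewareChain` without modifying ReActEngine (U6 middleware pipeline)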
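The warn-then-interrupt behavior in AE2 can be sketched as a small detector. This is a sketch under assumptions: the `LoopDetector` class and its return values are hypothetical, and the interrupt threshold of 3 is an illustrative default (the plan only states that the threshold is configurable).

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LoopDetector:
    warn_at: int = 2       # inject a corrective message on the 2nd identical call
    interrupt_at: int = 3  # assumed default: stop on the 3rd identical call
    _last_key: Optional[str] = field(default=None, init=False)
    _streak: int = field(default=0, init=False)

    def observe(self, tool: str, args: dict) -> str:
        """Record one tool call and return "ok", "warn", or "interrupt"."""
        # Canonical JSON makes argument order irrelevant to the comparison.
        key = json.dumps([tool, args], sort_keys=True)
        if key == self._last_key:
            self._streak += 1
        else:
            self._last_key, self._streak = key, 1
        if self._streak >= self.interrupt_at:
            return "interrupt"
        if self._streak >= self.warn_at:
            return "warn"
        return "ok"
```

Only consecutive identical calls count toward the streak here, so a legitimate repeat after a different call starts again at "ok".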
---
## Sources & Research
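- **DeerFlow 2.0 architecture analysis** (conversation, 2026-06-24): 14-layer middleware pipeline, Lead Agent + Sub-Agent orchestration, sandbox isolation, long-term memory, progressive skill loading
- **DeerFlow 2.0 source walkthrough** (CSDN, 2026-05-10): strictly ordered 14-layer Middleware, concurrent Sub-Agent orchestration, structured memory
- **DeerFlow 2.0 Deep Dive** (guancyxx.cn, 2026-05-07): Lead Agent + Subagent pattern, SubagentLimitMiddleware, LoopDetectionMiddleware
- **Existing AgentKit architecture**: the ReAct loop in `react.py`, Team orchestration in `orchestrator.py`, state persistence in `pipeline_state.py`, the shared workspace in `shared_workspace.py`
- **Is external research load-bearing?** Yes: DeerFlow's middleware pipeline design directly shaped KTD1 (parallel rollout strategy), KTD2 (loop detection without immediate interruption), and U6 (middleware protocol design)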


@ -0,0 +1,508 @@
---
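title: "AgentKit Portal Platform Overall Evolution Roadmap"
type: feat
date: 2026-06-24
origin: docs/brainstorms/2026-06-24-portal-platform-evolution-requirements.md
---
# AgentKit Portal Platform Overall Evolution Roadmap
## Summary
Advance the AgentKit portal platform serially by priority: P1 builds an industrial-grade RAG pipeline with LlamaIndex plus a TaskIQ async-task foundation; P2 extends multi-channel message access and authenticated MCP Server publishing; P3 replaces the commodity layer with LiteLLM/langchain-mcp-adapters to cut cost. Differentiating capabilities (Agent engine, expert teams, self-evolution, terminal security) stay self-built and are outside the scope of this plan.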
## Problem Frame
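AgentKit is positioned as an enterprise-grade unified AI Agent portal platform. Benchmarking against MaxKB revealed gaps in four areas: the RAG pipeline is a developer-grade component rather than an industrial-grade product (the upload endpoint never calls vectorization, and KB storage is in-memory only); platform reach has only RAG data-source adapters and no message adapters; the MCP Server has zero authentication and does not support publishing Skills/expert teams; and much of the commodity layer is self-built, with high maintenance cost.
This evolution combines preventive work with filling in must-have features. The goal is to give AgentKit the capabilities a portal platform should have, so that it is fully competitive in the enterprise AI Agent platform space.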
## Requirements
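### Industrial-grade RAG pipeline (P1)
R1. Enterprise users can upload documents to a knowledge base; documents become retrievable after passing through the LlamaIndex pipeline (parse → segment → preview → vectorize → index). The existing `LocalRAGService` in `memory/local_rag.py` is deprecated for KB scenarios; Agent memory (WorkingMemory/EpisodicMemory/SemanticMemory) is left untouched.
R2. Knowledge bases support dual-index retrieval: pgvector semantic retrieval + PostgreSQL full-text retrieval (jieba Chinese word segmentation), offering three modes: embedding/keywords/blend. Enterprise users configure the default retrieval mode per knowledge base; the Agent may override it at runtime based on query characteristics.
R3. Document segmentation supports smart and advanced segmentation (LlamaIndex IngestionPipeline); enterprise users can preview segmentation results before vectorization (read-only preview; editing is deferred).
R4. The system automatically generates related questions for document chunks (LLM-based, following the ContextualChunker pattern in the existing `memory/contextual_retrieval.py`) to improve retrieval recall.
R5. The system supports a termbase: a jieba custom dictionary enhances Chinese word segmentation and improves retrieval accuracy for domain terms.
R6. Knowledge bases support two hit-processing modes: model-optimization mode (the LLM generates an answer from the retrieval results) and direct-answer mode (matched passages are returned directly). The default mode is configured per KB; the Agent may override it per query scenario.
R7. Retrieval results are reordered by a rerank model before being returned (API-based reranker, configurable as Cohere Rerank or BGE-Reranker).
R8. Knowledge bases enforce per-KB access control (owner/authorized-users); document upload validates a file-type whitelist, enforces a size limit, and sanitizes parsed content before indexing (markdown sanitize, safe PDF parsing). Agent retrieval is restricted to the knowledge bases the calling user is authorized for.
R9. KB metadata is persisted to PostgreSQL (KB sources, document records, ACLs) and survives restarts. The existing in-memory `KnowledgeSourceStore` is replaced with a PG backend.
R10. Document vectorization runs asynchronously via TaskIQ (reusing the existing Redis as broker), with progress display, failure notification and retry, and task history. Document status model: pending → parsing → segmenting → vectorizing → indexed | failed (with error_message).
### Platform reach expansion (P2)
R11. The system supports message access from WeCom/DingTalk/Feishu/Slack. Each adapter verifies the platform signature/token (Feishu encrypt_key, DingTalk token, WeCom EncodingAESKey) before processing a message and rejects unauthenticated requests.
R12. Platform credentials are stored in encrypted DB columns (master key from an environment variable), with a defined rotation policy and access auditing. The existing plaintext `ProviderConfig.api_key` is migrated at the same time.
R13. The MCP Server is merged into the main FastAPI app (`/api/v1/mcp/` routes); all endpoints require authentication and authorization (reusing `require_permission` + API Key). The existing standalone `mcp/server.py` is refactored into a router factory.
R14. Enterprise users/developers can publish Skills/expert teams as MCP tools (configuration: tool name, description, input schema, auth method, rate limit); publishing requires admin-level authorization. External AI systems make authenticated calls over the MCP protocol.
### Ecosystem replacement for cost reduction (P3)
R15. The LLM Provider layer is replaced with LiteLLM underneath: the 6 direct API providers (OpenAI/Anthropic/Gemini/Doubao/Wenxin/Yuanbao) go through LiteLLM's unified interface. The upper gateway logic (fallback/usage tracking/department-level quotas) stays self-built. `RemoteLLMProvider` (client → server proxy) is left untouched.
R16. The MCP client is replaced with `langchain-mcp-adapters`, which tracks the industry protocol's evolution and reduces the maintenance cost of the 3 self-built transports (Stdio/HTTP/SSE).
R17. The self-built semantic cache is replaced with LiteLLM's built-in Redis Semantic Cache. The threshold needs tuning (default 0.87, roughly 13% false-hit risk). The existing `llm/cache.py` is deprecated.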
## Key Technical Decisions
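**KTD1: LlamaIndex as the RAG pipeline framework.** External research confirms that LlamaIndex (2026) natively covers every required capability (dual index/smart segmentation/rerank/question generation) with first-class pgvector support. Compared with building from scratch it avoids reinventing the wheel; compared with integrating MaxKB as a standalone service it avoids the operational complexity of Django + a separate PG. Risk: LlamaIndex ships frequent breaking changes → mitigated by version pinning + integration tests.
**KTD2: TaskIQ instead of Celery as the async task queue.** External research confirms that Celery is over-engineered for an asyncio-native stack (it introduces broker + worker + beat = 3 new operational components). TaskIQ offers first-class FastAPI integration, Redis broker support, and native asyncio. The async-task requirement (R10) is pulled forward into P1 to resolve the priority/dependency conflict caused by document vectorization blocking the event loop. ARQ is deprecated and not adopted.
**KTD3: KB metadata persisted to PostgreSQL.** Follows the EpisodicMemory pattern in `memory/episodic.py` (SQLAlchemy async session + PG). KB sources, document records, and ACLs go into relational tables; embeddings go into pgvector. This replaces the existing in-memory `KnowledgeSourceStore`.
**KTD4: MCP Server merged into the main FastAPI app.** The existing standalone app in `mcp/server.py` has zero authentication and is an RCE surface (terminal tools exist). It is merged as the `/api/v1/mcp/` sub-router, reusing `require_permission` + `APIKeyHeader` authentication. The standalone `MCPServer` class is refactored into a router factory.
**KTD5: Per-KB ACL via a new `kb_acl` table.** Following the existing `filter_kb_sources_by_department` pattern, a `kb_acl` table is added (kb_id, user_id, role: owner/viewer). Agent retrieval filters through `filter_kb_by_user_acl()`, alongside the department-level filter.
**KTD6: Application-layer jieba segmentation for Chinese full-text search.** PostgreSQL's built-in `tsvector` does not support Chinese word segmentation. Text is segmented with jieba in the Python layer and then written to `tsvector`, which avoids the operational complexity of installing a PG extension (pg_jieba/zhparser) and also avoids introducing Elasticsearch as an external search engine. The termbase is implemented via a jieba custom dictionary.
**KTD7: LiteLLM replaces the direct providers; RemoteLLMProvider stays.** LiteLLM natively supports Volcengine/Doubao; Wenxin/Yuanbao go through OpenAI-compatible endpoints. `RemoteLLMProvider` is a client → server proxy (architecturally different from a direct API provider) and cannot be replaced by LiteLLM. Semantic caching/fallback/usage tracking remain self-built upper-layer logic, while the underlying provider adaptation goes through LiteLLM.
**KTD8: Encrypted DB columns for platform credentials.** For single-deployment enterprise scenarios, encrypted DB columns + an environment-variable master key are sufficient. This avoids an external dependency on HashiCorp Vault. Credential rotation is triggered via API and re-encrypts the stored values. Access audits are written to the existing audit log.
**KTD9: New top-level module `src/agentkit/rag_platform/`.** Responsibilities are separated from `memory/`: `rag_platform/` serves enterprise knowledge-base scenarios, `memory/` serves Agent runtime memory. `LocalRAGService` is deprecated for KB scenarios; Agent memory (EpisodicMemory) keeps using it.
**KTD10: Success criteria reframed from "MaxKB feature parity" to "user-outcome oriented".** MaxKB is a RAG knowledge-base product; AgentKit is an Agent platform. Chasing feature parity with a different product category may build capabilities that serve no real user. The success criteria become: enterprise users can upload documents, configure retrieval, test recall, and have Agents retrieve from knowledge bases across multiple channels.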
## High-Level Technical Design
```mermaid
flowchart TB
    subgraph P1["P1: Industrial-grade RAG pipeline + async tasks"]
        U1[U1 RAG platform skeleton] --> U2[U2 KB persistence + ACL]
        U2 --> U3[U3 Document processing pipeline]
        U3 --> U4[U4 Dual-index retrieval]
        U4 --> U5[U5 Rerank/question generation/termbase]
        U5 --> U6[U6 Hit processing + KB settings]
        U2 --> U7[U7 Upload security + sanitization]
        U3 --> U8[U8 TaskIQ async tasks]
        U6 --> U9[U9 Frontend KB management]
        U8 --> U9
    end
    subgraph P2["P2: Platform reach expansion"]
        U10[U10 Adapter skeleton + secrets] --> U11[U11 Feishu IM]
        U11 --> U12[U12 DingTalk/WeCom/Slack]
        U13[U13 MCP auth + merge]
        U13 --> U14[U14 Skill/team MCP publishing]
    end
    subgraph P3["P3: Ecosystem replacement for cost reduction"]
        U15[U15 LiteLLM Provider]
        U16[U16 langchain-mcp-adapters]
        U17[U17 LiteLLM semantic cache]
    end
    P1 --> P2 --> P3
```
### Document Processing Pipeline State Machine
```mermaid
stateDiagram-v2
    [*] --> pending: upload
    pending --> parsing: TaskIQ picks up
    parsing --> segmenting: parse succeeded
    parsing --> failed: parse failed
    segmenting --> vectorizing: preview confirmed/automatic
    segmenting --> failed: segmentation failed
    vectorizing --> indexed: vectorization + indexing succeeded
    vectorizing --> failed: vectorization failed
    failed --> pending: user retry
    indexed --> [*]
```
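The state machine above maps directly onto a transition table. The following is a minimal sketch: the state names come from the diagram, while the `DocStatus` enum, `ALLOWED` table, and `transition` helper are hypothetical names introduced for illustration.

```python
from enum import Enum


class DocStatus(str, Enum):
    PENDING = "pending"
    PARSING = "parsing"
    SEGMENTING = "segmenting"
    VECTORIZING = "vectorizing"
    INDEXED = "indexed"
    FAILED = "failed"


# Each state lists the states it may move to, mirroring the diagram edges.
ALLOWED = {
    DocStatus.PENDING: {DocStatus.PARSING},
    DocStatus.PARSING: {DocStatus.SEGMENTING, DocStatus.FAILED},
    DocStatus.SEGMENTING: {DocStatus.VECTORIZING, DocStatus.FAILED},
    DocStatus.VECTORIZING: {DocStatus.INDEXED, DocStatus.FAILED},
    DocStatus.FAILED: {DocStatus.PENDING},  # user retry
    DocStatus.INDEXED: set(),               # terminal state
}


def transition(current: DocStatus, target: DocStatus) -> DocStatus:
    """Return the new status, or raise if the edge is not in the diagram."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the table in one place means a worker cannot, for example, mark a document `indexed` straight from `pending` without the check failing loudly.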
### Retrieval Flow
```mermaid
flowchart LR
    Q[Agent query] --> ACL{Per-KB ACL filter}
    ACL -->|authorized| Mode{Retrieval mode}
    Mode -->|embedding| Sem[pgvector semantic]
    Mode -->|keywords| FT[PG full-text jieba]
    Mode -->|blend| Both[Dual-index merge]
    Sem --> Rerank[rerank model]
    FT --> Rerank
    Both --> Rerank
    Rerank --> Hit{Hit processing}
    Hit -->|model_opt| LLM[LLM generates answer]
    Hit -->|direct| Return[Return matched passages]
    LLM --> Result[Retrieval result]
    Return --> Result
```
## Implementation Units
### P1: Industrial-grade RAG pipeline + async task foundation
---
### U1. RAG platform module skeleton + LlamaIndex integration
- **Goal:** Create the `src/agentkit/rag_platform/` module, integrate LlamaIndex as the pipeline framework, and connect it to the existing pgvector.
- **Files:**
  - `src/agentkit/rag_platform/__init__.py`: module entry point
  - `src/agentkit/rag_platform/models.py`: Pydantic data models (KB, Document, Chunk, QueryResult)
  - `src/agentkit/rag_platform/pipeline.py`: LlamaIndex IngestionPipeline wrapper
  - `src/agentkit/rag_platform/indexing.py`: pgvector index management (LlamaIndex PGVectorStore)
- **Patterns:** Follow the existing module structure (cf. `memory/__init__.py`, `mcp/__init__.py`); LlamaIndex `PGVectorStore` connects to the existing PostgreSQL.
- **Test scenarios:**
  - LlamaIndex PGVectorStore connects to the existing pgvector extension
  - Basic ingest (document → chunk → embedding → pgvector INSERT) works end to end
  - Basic query (query → embedding → pgvector cosine retrieval) returns results
- **Verification:** `pytest tests/unit/rag_platform/test_pipeline.py`
---
### U2. KB persistent storage + per-KB access control
- **Goal:** Replace the in-memory `KnowledgeSourceStore` with PostgreSQL persistence and implement per-KB ACL.
- **Files:**
  - `src/agentkit/rag_platform/store.py`: KB/Document persistence (SQLAlchemy async)
  - `src/agentkit/rag_platform/acl.py`: per-KB ACL logic
  - `src/agentkit/server/auth/models.py`: add the `kb_acl` table model
  - `src/agentkit/server/routes/kb_management.py`: replace `KnowledgeSourceStore` calls
- **Patterns:** Follow the EpisodicMemory PG pattern in `memory/episodic.py` (async session); follow the `filter_kb_sources_by_department` pattern to implement `filter_kb_by_user_acl()`.
- **Test scenarios:**
  - KB metadata is written to PG and still exists after a restart
  - An owner can query their own KB
  - A non-authorized user querying a KB is rejected
  - ACL filtering takes effect during Agent retrieval (only results from authorized KBs are returned)
- **Verification:** `pytest tests/unit/rag_platform/test_store.py tests/unit/rag_platform/test_acl.py`
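The ACL filtering scenario can be sketched in memory before any database is involved. Only the function name `filter_kb_by_user_acl` and the `kb_acl` columns (kb_id, user_id, role) come from the plan; the signature, the `KbAclEntry` dataclass, and the list-based storage are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KbAclEntry:
    """One row of the planned kb_acl table (hypothetical in-memory form)."""

    kb_id: str
    user_id: str
    role: str  # "owner" or "viewer"


def filter_kb_by_user_acl(kb_ids: list, user_id: str, acl: list) -> list:
    """Keep only the KBs the user may read, preserving the input order."""
    readable_roles = {"owner", "viewer"}
    allowed = {
        entry.kb_id
        for entry in acl
        if entry.user_id == user_id and entry.role in readable_roles
    }
    return [kb_id for kb_id in kb_ids if kb_id in allowed]
```

In the real store this would presumably become a SQL join against `kb_acl`, applied before retrieval so that unauthorized chunks never reach the reranker.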
---
### U3. Document processing pipeline (LlamaIndex segmentation + preview + vectorization)
- **Goal:** Connect the upload → parse → segment → preview → vectorize → index pipeline using LlamaIndex IngestionPipeline.
- **Files:**
  - `src/agentkit/rag_platform/document_processor.py`: document processing pipeline
  - `src/agentkit/rag_platform/preview.py`: segmentation preview API
  - `src/agentkit/server/routes/kb_management.py`: rewrite the `upload_document()` endpoint
- **Patterns:** LlamaIndex `IngestionPipeline` (SentenceSplitter + MetadataExtractor); the existing `DocumentLoader` in `memory/document_loader.py` is used for parsing; document status model (pending→parsing→segmenting→vectorizing→indexed|failed).
- **Test scenarios:**
  - Upload PDF → parse → segment → return preview (read-only)
  - Confirm preview → vectorize → index → retrievable
  - Parse failure → status failed + error_message
  - Vectorization failure → status failed + error_message
  - Re-uploading the same document → rejected or updated (no duplicate is created)
- **Verification:** `pytest tests/unit/rag_platform/test_document_processor.py`
---
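### U4. Dual-index retrieval (pgvector semantic + PG full-text search with jieba)
- **Goal:** Implement dual-index retrieval supporting three modes: embedding/keywords/blend.
- **Files:**
  - `src/agentkit/rag_platform/retrieval.py`: retrieval logic (three modes)
  - `src/agentkit/rag_platform/fulltext.py`: jieba segmentation + tsvector write/query
- **Patterns:** LlamaIndex hybrid retriever (VectorStoreRetriever + NLSQLRetriever, or custom); jieba segments text in the Python layer before it is written to the `search_vector` column.
- **Test scenarios:**
  - embedding mode: semantic retrieval returns relevant results
  - keywords mode: Chinese full-text retrieval returns results containing the keywords
  - blend mode: semantic + full-text results are merged, deduplicated, and sorted
  - A query with no results returns an empty list (not an error)
- **Verification:** `pytest tests/unit/rag_platform/test_retrieval.py tests/unit/rag_platform/test_fulltext.py`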
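The plan does not say how blend mode merges the two result lists. One common option is reciprocal rank fusion (RRF), sketched below; this is a swapped-in technique chosen for illustration, not a confirmed design decision, and the `blend_results` name, the `k=60` constant, and the use of chunk ids as strings are all assumptions.

```python
def blend_results(semantic: list, fulltext: list, k: int = 60, top_n: int = 10) -> list:
    """Merge two ranked lists of chunk ids with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per chunk; chunks found by both
    indexes accumulate both contributions, which also deduplicates them.
    """
    scores = {}
    for ranking in (semantic, fulltext):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id for deterministic output.
    ordered = sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))
    return ordered[:top_n]
```

RRF only needs ranks, so it sidesteps the problem that cosine similarity and full-text rank scores live on different scales. Two empty inputs yield an empty list, matching the "no results is not an error" scenario.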
---
### U5. Rerank + question generation + termbase
- **Goal:** Add rerank-model reordering, automatic question generation, and termbase support.
- **Files:**
  - `src/agentkit/rag_platform/rerank.py`: rerank model integration (Cohere/BGE-Reranker, configurable)
  - `src/agentkit/rag_platform/question_gen.py`: LLM-based question generation
  - `src/agentkit/rag_platform/termbase.py`: termbase management + jieba custom dictionary
- **Patterns:** LlamaIndex rerankers (`CohereRerank` or `SentenceTransformerRerank`); question generation follows the ContextualChunker pattern in `memory/contextual_retrieval.py`; jieba `load_userdict()` loads the termbase.
- **Test scenarios:**
  - Relevance ordering of results improves after rerank
  - Question generation produces questions related to the chunk content
  - Domain terms in the termbase are segmented correctly
  - Retrieval recall improves with the termbase enabled
- **Verification:** `pytest tests/unit/rag_platform/test_rerank.py tests/unit/rag_platform/test_question_gen.py tests/unit/rag_platform/test_termbase.py`
---
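### U6. Hit-processing modes + KB settings
- **Goal:** Implement the hit-processing modes (model optimization/direct answer) and KB-level settings.
- **Files:**
  - `src/agentkit/rag_platform/hit_processing.py`: hit-processing logic
  - `src/agentkit/rag_platform/settings.py`: KB settings model (default retrieval mode/default hit processing/authorized users)
  - `src/agentkit/server/routes/kb_management.py`: KB settings endpoints
- **Patterns:** Model-optimization mode calls the existing LLM Gateway; direct-answer mode returns matched passages; KB settings are stored in PG.
- **Test scenarios:**
  - model_opt mode: the LLM generates an answer from the retrieval results
  - direct mode: matched passages are returned directly
  - The KB's default mode takes effect
  - The Agent overrides the default mode at runtime
- **Verification:** `pytest tests/unit/rag_platform/test_hit_processing.py tests/unit/rag_platform/test_settings.py`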
---
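### U7. File upload security + content sanitization
- **Goal:** Backend file-type whitelist + content sanitization.
- **Files:**
  - `src/agentkit/rag_platform/sanitize.py`: content sanitization (markdown sanitize + PDF safety)
  - `src/agentkit/server/routes/kb_management.py`: add whitelist validation to the upload endpoint
- **Patterns:** The `DocumentLoader._detect_format()` mapping serves as the whitelist source; the `bleach` or `markdown` library sanitizes HTML/markdown; PDF parsing limits page count/size.
- **Test scenarios:**
  - File types outside the whitelist are rejected (.exe, .sh, etc.)
  - Files over the size limit are rejected
  - `<script>` tags in markdown are sanitized
  - PDF parsing does not trigger known CVEs (parser behavior is restricted)
- **Verification:** `pytest tests/unit/rag_platform/test_sanitize.py`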
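The first two test scenarios (type whitelist and size limit) can be sketched as a pure validation function. The extension set, the 50 MB limit, and the names `validate_upload` and `UploadRejected` are illustrative assumptions; the plan derives the real whitelist from `DocumentLoader._detect_format()`.

```python
from pathlib import PurePath

# Assumed values for illustration only.
ALLOWED_EXTENSIONS = {".pdf", ".md", ".txt", ".docx"}
MAX_UPLOAD_BYTES = 50 * 1024 * 1024


class UploadRejected(ValueError):
    """Raised when an upload fails whitelist or size validation."""


def validate_upload(filename: str, size_bytes: int) -> str:
    """Return the normalized extension, or raise UploadRejected."""
    # Only the final suffix counts, so "report.pdf.exe" is treated as ".exe".
    suffix = PurePath(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise UploadRejected(f"file type not allowed: {suffix or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise UploadRejected(f"file too large: {size_bytes} bytes")
    return suffix
```

An extension check is only a first gate: it says nothing about the actual file content, which is why the plan pairs it with sanitization of the parsed output.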
---
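### U8. TaskIQ async task integration
- **Goal:** Integrate TaskIQ for asynchronous document vectorization and batch tasks.
- **Files:**
  - `src/agentkit/rag_platform/tasks.py`: TaskIQ task definitions (vectorization, batch indexing)
  - `src/agentkit/server/app.py`: TaskIQ startup/shutdown
  - `src/agentkit/server/task_store.py`: extended status tracking (reusing the existing TaskStore pattern)
- **Patterns:** TaskIQ FastAPI integration (`TaskiqMiddleware`); the existing Redis as broker; the `TaskStore` status model (PENDING/RUNNING/COMPLETED/FAILED).
- **Test scenarios:**
  - Vectorizing a large document (50MB PDF) runs in the background without blocking the event loop
  - Task status can be queried (pending→running→completed)
  - Failed tasks are retried automatically (configurable retry count)
  - Task history can be queried
- **Verification:** `pytest tests/unit/rag_platform/test_tasks.py`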
---
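### U9. Frontend KB management extension
- **Goal:** Extend the KnowledgeBaseView/DocumentUpload/SearchTest components with segmentation preview, status display, and KB settings.
- **Files:**
  - `src/agentkit/server/frontend/src/components/knowledge/KnowledgeBaseView.vue`: extended
  - `src/agentkit/server/frontend/src/components/knowledge/DocumentUpload.vue`: add status display
  - `src/agentkit/server/frontend/src/components/knowledge/SearchTest.vue`: extended retrieval testing
  - `src/agentkit/server/frontend/src/components/knowledge/SegmentPreview.vue`: new segmentation preview component
  - `src/agentkit/server/frontend/src/components/knowledge/KBSettings.vue`: new KB settings component
- **Patterns:** Existing Ant Design Vue component patterns; WebSocket pushes document-processing status (reusing the existing ws protocol).
- **Test scenarios:**
  - Processing progress is shown after upload (pending→parsing→...→indexed)
  - Segmentation preview shows the segmentation result (read-only)
  - Retrieval testing supports switching between the three modes
  - KB settings can configure default retrieval mode/default hit processing/authorized users
- **Verification:** `npm run typecheck` + manual verification
### P2: Platform reach expansion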
---
### U10. Multi-channel message adapter skeleton + secrets store
- **Goal:** Create the MessageAdapter protocol + secrets store infrastructure.
- **Files:**
  - `src/agentkit/channels/__init__.py`: module entry point
  - `src/agentkit/channels/base.py`: MessageAdapter protocol (receive_message/send_message/verify_signature)
  - `src/agentkit/channels/secrets.py`: encrypted-DB-column secrets store
  - `src/agentkit/server/routes/channels.py`: channel management endpoints
- **Patterns:** `MessageAdapter` protocol (cf. KBAdapter in `memory/adapters/base.py`); encrypted DB columns (AES-256, master key from an environment variable).
- **Test scenarios:**
  - Secrets are stored encrypted after being written (not plaintext)
  - Secrets are decrypted on read
  - The MessageAdapter protocol is implemented by all adapters
  - Channel management endpoint CRUD works
- **Verification:** `pytest tests/unit/channels/test_secrets.py tests/unit/channels/test_base.py`
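A minimal sketch of the adapter protocol follows. The three method names come from the plan; their signatures, the `HmacDemoAdapter` class, and the generic HMAC-SHA256 scheme are assumptions. Real platforms (Feishu, DingTalk, WeCom, Slack) each define their own signature format, so this is not any platform's actual verification logic.

```python
import hashlib
import hmac
from typing import Optional, Protocol


class MessageAdapter(Protocol):
    def verify_signature(self, body: bytes, signature: str) -> bool: ...
    def receive_message(self, body: bytes) -> dict: ...
    def send_message(self, chat_id: str, text: str) -> None: ...


class HmacDemoAdapter:
    """Hypothetical adapter using a generic HMAC-SHA256 body signature."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        self.sent = []

    def sign(self, body: bytes) -> str:
        return hmac.new(self._secret, body, hashlib.sha256).hexdigest()

    def verify_signature(self, body: bytes, signature: str) -> bool:
        # compare_digest avoids leaking match length through timing.
        return hmac.compare_digest(self.sign(body), signature)

    def receive_message(self, body: bytes) -> dict:
        return {"text": body.decode("utf-8")}

    def send_message(self, chat_id: str, text: str) -> None:
        self.sent.append((chat_id, text))


def handle_webhook(adapter: MessageAdapter, body: bytes, signature: str) -> Optional[dict]:
    """Reject unauthenticated requests before any message parsing happens."""
    if not adapter.verify_signature(body, signature):
        return None
    return adapter.receive_message(body)
```

Verifying before parsing matches R11: an unauthenticated request never reaches message handling.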
---
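### U11. Feishu IM adapter (end to end)
- **Goal:** Implement the Feishu IM message adapter end to end.
- **Files:**
  - `src/agentkit/channels/feishu.py`: Feishu IM adapter
  - `src/agentkit/server/routes/channels.py`: Feishu webhook endpoint
- **Patterns:** Feishu webhook signature verification (encrypt_key + AES decryption); message format conversion (Feishu event → AgentKit standard message); integration with the existing chat handler (`RequestPreprocessor` → `ExecutionMode`).
- **Test scenarios:**
  - Feishu message → webhook → signature verification → Agent processing → response returned to Feishu
  - Requests with an invalid signature are rejected
  - Text messages are converted correctly
  - Agent responses are formatted correctly and returned to Feishu
- **Verification:** `pytest tests/unit/channels/test_feishu.py`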
---
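### U12. DingTalk/WeCom/Slack adapters
- **Goal:** Implement the remaining platform adapters following the Feishu pattern.
- **Files:**
  - `src/agentkit/channels/dingtalk.py`: DingTalk adapter
  - `src/agentkit/channels/wecom.py`: WeCom adapter
  - `src/agentkit/channels/slack.py`: Slack adapter
- **Patterns:** Follow the U11 Feishu pattern; platform-specific signature verification (DingTalk token, WeCom EncodingAESKey, Slack signing secret).
- **Test scenarios:**
  - End-to-end message flow for each platform
  - Signature verification rejects invalid requests on each platform
- **Verification:** `pytest tests/unit/channels/test_dingtalk.py tests/unit/channels/test_wecom.py tests/unit/channels/test_slack.py`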
---
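### U13. MCP Server authentication + merge into the main app
- **Goal:** Merge the MCP Server into the main FastAPI app and add authentication.
- **Files:**
  - `src/agentkit/mcp/server.py`: refactored into a router factory (`create_mcp_router()`)
  - `src/agentkit/server/app.py`: mount the MCP router at `/api/v1/mcp/`
- **Patterns:** `require_permission` dependency injection; `APIKeyHeader` authentication; the existing `ToolRegistry.list_tools()` is exposed as MCP tools.
- **Test scenarios:**
  - An unauthenticated call to `/api/v1/mcp/tools/list` is rejected (401)
  - A call with a valid API Key returns the tool list
  - A call with a valid JWT + permission returns the tool list
  - Existing tools can still be called over the MCP protocol
- **Verification:** `pytest tests/unit/mcp/test_server_auth.py`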
---
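### U14. Skill/expert team MCP publishing
- **Goal:** Support publishing Skills/expert teams as MCP tools.
- **Files:**
  - `src/agentkit/mcp/publisher.py`: Skill/Team → MCP Tool adapter
  - `src/agentkit/server/routes/mcp_publish.py`: publishing management endpoints
  - `src/agentkit/mcp/server.py`: extend the tool list to include published Skills/Teams
- **Patterns:** Tool adapter wrapping (Skill/Team → `Tool` interface); admin-level authorization (`Permission.ADMIN`); configuration fields (tool name/description/input schema/auth method/rate limit).
- **Test scenarios:**
  - An admin publishes a Skill as an MCP tool
  - Publishing by a non-admin is rejected
  - An external system calls a published Skill with MCP authentication
  - An expert team is published as an MCP tool
  - Published tools are visible in `/api/v1/mcp/tools/list`
- **Verification:** `pytest tests/unit/mcp/test_publisher.py`
### P3: Ecosystem replacement for cost reduction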
---
### U15. LiteLLM Provider replacement
- **Goal:** Replace the 6 direct API provider adapters with LiteLLM.
- **Files:**
  - `src/agentkit/llm/providers.py`: rewritten on LiteLLM's unified interface
  - `src/agentkit/llm/gateway.py`: adapt the upper gateway logic
  - `src/agentkit/llm/config.py`: provider configuration updates
- **Patterns:** LiteLLM `completion()` / `acompletion()` unified interface; keep the self-built fallback chain/usage tracking/department-level quotas; `RemoteLLMProvider` is left untouched.
- **Test scenarios:**
  - All 6 providers are called successfully through LiteLLM (OpenAI/Anthropic/Gemini/Doubao/Wenxin/Yuanbao)
  - The fallback chain switches over when a provider fails
  - Usage tracking records are correct
  - Department-level quotas take effect
  - `RemoteLLMProvider` still works (unaffected)
  - Streaming responses work
- **Verification:** `pytest tests/unit/llm/test_providers.py tests/unit/llm/test_gateway.py`
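The retained fallback chain can be sketched independently of any provider SDK. Everything here is hypothetical: providers are modeled as plain async callables, and `complete_with_fallback` is an illustrative name, not the gateway's actual API or a LiteLLM function.

```python
import asyncio
from typing import Awaitable, Callable

Provider = Callable[[str], Awaitable[str]]


async def complete_with_fallback(prompt: str, providers: list) -> tuple:
    """Try each (name, provider) pair in order; return (name, response).

    A provider that raises is skipped and its error is collected, so the
    final exception explains why every provider in the chain failed.
    """
    errors = []
    for name, call in providers:
        try:
            return name, await call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


async def flaky(prompt: str) -> str:
    raise TimeoutError("rate limited")


async def healthy(prompt: str) -> str:
    return f"echo:{prompt}"
```

In the planned design each callable would presumably wrap a LiteLLM call, while the ordering, usage tracking, and quota checks stay in the self-built gateway.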
---
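### U16. Replace the MCP client with langchain-mcp-adapters
- **Goal:** Replace the self-built MCP client transport layer with langchain-mcp-adapters.
- **Files:**
  - `src/agentkit/mcp/client.py`: rewritten as a langchain-mcp-adapters wrapper
  - `src/agentkit/mcp/transport.py`: deprecated (Stdio/HTTP/SSE transports are provided by langchain-mcp-adapters)
- **Patterns:** `ClientSession` + transports from `langchain-mcp-adapters`; the existing `MCPTool` wrapper is kept (it exposes remote MCP tools as local `Tool` objects).
- **Test scenarios:**
  - Existing MCP tool calls work through the new client
  - Stdio transport connects successfully
  - HTTP transport connects successfully
  - SSE transport connects successfully
  - The `MCPTool` wrapper still works
- **Verification:** `pytest tests/unit/mcp/test_client.py`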
---
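### U17. LiteLLM semantic cache integration
- **Goal:** Replace the self-built semantic cache with LiteLLM's built-in Redis Semantic Cache.
- **Files:**
  - `src/agentkit/llm/cache.py`: rewritten as LiteLLM cache configuration (or deprecated, with the gateway configuring it directly)
  - `src/agentkit/llm/gateway.py`: integrate the LiteLLM caching configuration
- **Patterns:** LiteLLM `RedisSemanticCache`; threshold tuning (default 0.87); the cache key includes system prompt + temperature.
- **Test scenarios:**
  - Semantically similar queries hit the cache
  - Different system prompts do not hit the cache
  - The cache hit rate can be measured
  - Threshold tuning takes effect
- **Verification:** `pytest tests/unit/llm/test_cache.py`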
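The two properties the tests care about (a similarity threshold, and scoping by system prompt + temperature) can be illustrated with a toy in-memory cache. This is not LiteLLM's implementation or API; `TinySemanticCache` and its methods are hypothetical, and embeddings are passed in as plain float lists.

```python
import math
from typing import Optional


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinySemanticCache:
    """Toy cache: a hit needs the same scope and similarity >= threshold."""

    def __init__(self, threshold: float = 0.87) -> None:
        self.threshold = threshold
        self._entries = []  # list of (scope, embedding, response)

    def put(self, system_prompt: str, temperature: float, embedding: list, response: str) -> None:
        self._entries.append(((system_prompt, temperature), embedding, response))

    def get(self, system_prompt: str, temperature: float, embedding: list) -> Optional[str]:
        scope = (system_prompt, temperature)
        best, best_sim = None, self.threshold
        for entry_scope, entry_embedding, response in self._entries:
            if entry_scope != scope:
                continue  # a different system prompt or temperature never matches
            sim = cosine(entry_embedding, embedding)
            if sim >= best_sim:
                best, best_sim = response, sim
        return best
```

Raising the threshold trades hit rate for fewer false hits, which is the tuning knob R17 refers to.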
## Scope Boundaries
### In scope
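- P1: industrial-grade RAG pipeline (LlamaIndex integration + dual index + rerank + question generation + termbase + hit processing + per-KB ACL + upload security + TaskIQ async + frontend extension)
- P2: multi-channel message access (Feishu/DingTalk/WeCom/Slack) + secrets store + MCP Server authentication + Skill/team MCP publishing
- P3: LiteLLM provider replacement + langchain-mcp-adapters client replacement + LiteLLM semantic cache replacement
### Deferred for later
- Editing in the segmentation preview (merge/split/re-segment): P1 only delivers a read-only preview
- External secrets manager (HashiCorp Vault / cloud KMS): P2 uses encrypted DB columns, which is enough for a single deployment
- Advanced interaction modes for the chunk preview: left to ce-work design
- Refining the RAG platform portal IA (top-level section vs. extending the existing one): the P1 frontend extension first reuses the existing KnowledgeBaseView
### Outside this product's identity
- Agent engine/expert teams/self-evolution/terminal security: differentiating capabilities, kept self-built, outside the scope of this plan
- FlowCanvas workflow canvas: not replaced with LogicFlow, kept self-built
- Message bus (`bus/`): tightly coupled to the Agent event system, not commodity, kept self-built
- Deep MaxKB integration (shared DB/embedding model): integrating MaxKB as a standalone service is only a fallback option and is not implemented in this plan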
## System-Wide Impact
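- **Data lifecycle:** New KB/Document/Chunk/kb_acl tables are added to PostgreSQL; pgvector indexes are extended (existing Agent memory tables are untouched); a `search_vector` column is added to the KB chunk table.
- **Authentication boundary:** The MCP Server goes from zero authentication to requiring JWT/API Key; KB retrieval gains a per-KB ACL layer; platform credentials move from plaintext to encrypted storage.
- **Performance posture:** Document vectorization moves from synchronous (blocking the event loop) to asynchronous via TaskIQ; retrieval moves from a single index to dual index + rerank (more latency, better relevance).
- **Shared infrastructure:** Redis takes on the TaskIQ broker role (coexisting with the existing bus/cache); PostgreSQL gains the RAG platform schema.
- **Agent/tool parity:** The Agent runtime retrieves KB content through the RAG platform; external AI systems call published Skills/teams through MCP.
- **Backward compatibility:** `LocalRAGService` is deprecated for KB scenarios but kept (Agent memory still uses it); `RemoteLLMProvider` is left untouched; existing `ToolRegistry` tools remain accessible through MCP.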
## Risks & Dependencies
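- **LlamaIndex breaking changes:** LlamaIndex ships breaking changes frequently. Mitigation: version pinning (pin the major version in `pyproject.toml`) + integration tests covering the core pipeline.
- **jieba Chinese segmentation quality:** The default jieba dictionary may not cover domain terms. Mitigation: the termbase extends it via `jieba.load_userdict()`; retrieval tests cover Chinese scenarios.
- **TaskIQ maturity:** The TaskIQ community is small (~2k★). Mitigation: the API is simple, and if necessary it can be swapped for SAQ or rolled back to ProcessPoolExecutor.
- **LiteLLM coverage of Chinese providers:** Wenxin/Yuanbao go through OpenAI-compatible endpoints and may lack provider-specific features. Mitigation: a feature-gap analysis is performed when U15 is implemented; a custom handler is kept as a fallback.
- **MCP Server merge breaks existing integrations:** Merging into the main app may affect existing MCP client calls. Mitigation: keep the `/api/v1/mcp/` path compatible with the existing MCP protocol; run migration tests.
- **Backward-compatibility verification:** R15-R17 replace core components, and "existing behavior unchanged" is an assumption rather than something verified. Mitigation: every replacement unit must have feature-parity tests (existing behavior compared against the new implementation's behavior).
- **GPL v3 compliance boundary:** MaxKB, as a fallback integration option, is licensed under GPL v3. Mitigation: this plan does not implement MaxKB integration (fallback only); if it is integrated in the future, it is called as a standalone service over its REST API (MaxKB code is neither modified nor distributed).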
## Open Questions
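1. **Portal reach (P2) inverts the portal value proposition:** The core value of a portal platform is reach, yet multi-channel access sits in P2. Should at least one high-value channel (e.g. Feishu IM) be delivered in parallel during P1? Default assumption: no parallel delivery; P1 focuses on the RAG pipeline and multi-channel work follows in P2.
2. **R15-R17 (formerly R11-R13) are technical debt, not product requirements:** Their success criterion is "existing behavior unchanged" (zero user-visible impact), and no Actor benefits. Should they move to a separate engineering-debt track? Default assumption: keep them in P3 as commodity-layer cost reduction.
3. **Rerank model choice:** The U5 rerank model is unspecified (Cohere Rerank vs. BGE-Reranker vs. others). Default assumption: API-based (Cohere Rerank, or BGE-Reranker via Xinference), configurable.
4. **Multi-channel onboarding flow details:** The admin configuration flow for U10-U12 (webhook URL generation, app credential configuration, connectivity test) needs to be detailed during implementation. Default assumption: the admin navigates to channel configuration → selects a platform → enters credentials → the system generates a webhook URL → the admin configures it on the platform → connectivity test.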
## Sources / Research
- **LlamaIndex 2026:** 14 index types, sparse+dense hybrid retrieval, auto-rerank, pgvector first-class support, LlamaParse for complex PDFs. Pitfall: frequent breaking changes. — https://blog.csdn.net/yanxilou/article/details/162178538
- **LlamaIndex vs Haystack 2026:** Architecture and decision matrix. — https://myengineeringpath.dev/tools/llamaindex-vs-haystack/
- **LiteLLM v1.89.x:** 157 providers, 2784 models. Volcengine/Doubao native; Wenxin/Yuanbao via OpenAI-compatible endpoint. Redis Semantic Cache, fallback chains, virtual keys, spend tracking built-in. P95 proxy overhead 8ms @ 1k RPS. — https://docs.litellm.ai/docs/providers
- **LiteLLM semantic caching:** Threshold tuning critical (0.87 ≈ 13% false-positive collision risk). Cache key excludes system prompt/temperature by default. — https://theneuralbase.com/litellm/learn/intermediate/semantic-caching-similar-prompts/
- **TaskIQ:** Modern async task queue with first-class FastAPI integration. ARQ deprecated — TaskIQ is spiritual successor. — https://markaicode.com/alternatives/rq-alternatives/
- **Celery vs asyncio:** Celery overkill for asyncio-native stack. Adds broker + worker + beat = 3 new operational components. — https://theneuralbase.com/celery-for-ml/learn/advanced/vs-dramatiq/
- **MaxKB v2.7.0:** 21.4k★, GPLv3, Django + Vue, ships Celery internally. OpenAI-compatible chat API. Cannot share PG/pgvector — brings its own. GPL v3 not a blocker for SaaS-style REST integration (confirmed by FSF FAQ, Chinese judicial precedent 不乱买案). — https://maxkb.cn/docs/v2/user_manual/chat_to_API/
- **GPL v3 commercial use:** Internal-use loophole, distribution trigger. SaaS/network-service via REST does not trigger copyleft. — https://legalclarity.org/can-you-use-gplv3-in-a-commercial-application/
- **AgentKit codebase:** `memory/local_rag.py` (LocalRAGService: pgvector + chunking + embedding + semantic retrieval), `memory/chunking.py` (TextChunker/StructuralChunker), `memory/contextual_retrieval.py` (ContextualChunker: LLM-generated context prefix), `mcp/server.py` (zero-authentication MCP Server), `mcp/client.py` (MCP client with 3 transports), `llm/gateway.py` (6 providers + fallback + semantic cache + usage tracking), `llm/cache.py` (self-built semantic cache), `server/routes/kb_management.py` (KB management endpoints; upload does not call vectorization), `server/auth/` (JWT + RBAC + API Key + department-level filtering), `server/task_store.py` (task status storage; status only, no execution), `bus/` (MemoryBus/RedisBus)
- **Requirements document:** `docs/brainstorms/2026-06-24-portal-platform-evolution-requirements.md` (after two rounds of ce-doc-review, 36 items deferred to planning)


@ -1,12 +1,174 @@
{
"version": 1,
"skills": {
"ce-brainstorm": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-brainstorm/SKILL.md",
"computedHash": "ee435337c207aac2eb75ff3b1d2f689d3e0f9a0b834e70a18b292c4419f2fa82"
},
"ce-code-review": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-code-review/SKILL.md",
"computedHash": "ed4ea6717d72837b6364b7ba8ad0bbd26c1ef1d62f592fd0832ce80d0b74fb1e"
},
"ce-commit": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-commit/SKILL.md",
"computedHash": "8217b388a1c6e206423313faabbb85b779dfd8b9cc28893ac7022e73b12d686e"
},
"ce-commit-push-pr": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-commit-push-pr/SKILL.md",
"computedHash": "fc0477ae0527b917676f291ae4b72f74909848746845d7e1e291cdc250636c18"
},
"ce-compound": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-compound/SKILL.md",
"computedHash": "8d5c2024f3ff700d16a8144efcba296b2f540e218e700a8e2cafb76fc6a57be3"
},
"ce-compound-refresh": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-compound-refresh/SKILL.md",
"computedHash": "210ab636264392d81ed9eebb1ad109225163a954d25374a58d9e7c4824df6af1"
},
"ce-debug": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-debug/SKILL.md",
"computedHash": "f69714bb7069d733de0601c801b8b1f9f696b9e2813420f002164d07f2aec4e4"
},
"ce-doc-review": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-doc-review/SKILL.md",
"computedHash": "59b054eb7e8a25ba44d1a0364b872a469e30272b1b4fb6b9b0d7062330769870"
},
"ce-dogfood-beta": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-dogfood-beta/SKILL.md",
"computedHash": "ef164695bb3a5779dfa66dd5bcb8032255eb8912b875360d38716631f1c73870"
},
"ce-ideate": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-ideate/SKILL.md",
"computedHash": "3d91386c84afcb25e5cddd8d82b96374a416f34a8c9d739e2d85674869d496b8"
},
"ce-optimize": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-optimize/SKILL.md",
"computedHash": "7d1021e840bfa6c37147f5bea3bc9add8dd7f24cb6350d67fb5bd2df7b25e4cc"
},
"ce-plan": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-plan/SKILL.md",
"computedHash": "4a35c906bf49f09134dbc0422dcfe9cdef7f9036c067a7bf266fe51ce40c354e"
},
"ce-polish": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-polish/SKILL.md",
"computedHash": "520e3d5347c3523c3c8810a92ab6a3f4dca552514c2e7c18ac81ea03def5f180"
},
"ce-product-pulse": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-product-pulse/SKILL.md",
"computedHash": "bf15852134296d1a8bf331d41a9b8c0e5411372767e5b01ba11a1b04e26b753b"
},
"ce-promote": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-promote/SKILL.md",
"computedHash": "280f6003ad0cdde596f5c77edfcc63823f9d5066e800d4ee9e05828609aa352f"
},
"ce-proof": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-proof/SKILL.md",
"computedHash": "fc9c51d14ac15fce8db15e5c20469aa1a83632c68a58a4e2ba8ac5e17d2f0d8b"
},
"ce-resolve-pr-feedback": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-resolve-pr-feedback/SKILL.md",
"computedHash": "3c100452baa3e70d9be374ebacb6dc347627c59ad9d692ba31dcdb581aa2380c"
},
"ce-riffrec-feedback-analysis": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-riffrec-feedback-analysis/SKILL.md",
"computedHash": "83f98244f23ef3c13e34497da979281e16957c66cb459fba53a86ee12eebc138"
},
"ce-setup": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-setup/SKILL.md",
"computedHash": "e5e4c6452730ac27e42ee0a13a724898693cced7c50edccb0becba3d99a0e4b6"
},
"ce-simplify-code": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-simplify-code/SKILL.md",
"computedHash": "82e0e4af40d1022ba6867a81bb5e6bfedaf8f238dfb62a9543dc57dfc6412383"
},
"ce-strategy": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-strategy/SKILL.md",
"computedHash": "511506bd9bb029b37d8e0ef7dd0e52a75d8d535d4f181219b561080d8cd8c0f7"
},
"ce-test-browser": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-test-browser/SKILL.md",
"computedHash": "7d95df13ca6a901720e9f41b38b3e0fdf81a32b199e3e522dbe5f08cc903cbda"
},
"ce-test-xcode": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-test-xcode/SKILL.md",
"computedHash": "d11af303f77f3fd2493fcee09afded96a1035587971a0214b6d8814a72329b64"
},
"ce-work": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-work/SKILL.md",
"computedHash": "a77908268a8638f0971fc869128cee50ece7cc152952e10157b84f3ac0730053"
},
"ce-work-beta": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-work-beta/SKILL.md",
"computedHash": "9bf08db9c23dcf4a50ab61cf3d116aae829fa8a66f4ac2937f08c18e265fba1d"
},
"ce-worktree": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/ce-worktree/SKILL.md",
"computedHash": "ed83a34f43ab07252dcb8812b8b0c15ebc57c3760694e997cd8c021c90052875"
},
"find-skills": {
"source": "vercel-labs/skills",
"sourceType": "github",
"skillPath": "skills/find-skills/SKILL.md",
"computedHash": "9e1c8b3103f92fa8092568a44fe64858de7c5c9dc65ce4bea8f168080e889cfd"
},
"lfg": {
"source": "EveryInc/compound-engineering-plugin",
"sourceType": "github",
"skillPath": "skills/lfg/SKILL.md",
"computedHash": "802083dbce8f8cbf3dc1d7cafabf20bdc6211d022da8eeb9a1e4dc13ec4b92c0"
},
"open-code-review": {
"source": "alibaba/open-code-review",
"sourceType": "github",


@ -0,0 +1,30 @@
"""Bitable companion service: multi-dimensional table for AgentKit.

Provides structured data persistence with field ownership, upsert semantics,
formula columns, and grid views. Logically independent (own API/CLI/models/
storage), currently co-deployed, UI-level integrated.
"""

from agentkit.bitable.models import (
    Field,
    FieldOwner,
    FieldType,
    Record,
    RecalcStatus,
    RecalcTask,
    Table,
    View,
    ViewType,
)

__all__ = [
    "Field",
    "FieldOwner",
    "FieldType",
    "Record",
    "RecalcStatus",
    "RecalcTask",
    "Table",
    "View",
    "ViewType",
]

src/agentkit/bitable/db.py (new file, 321 lines)

@ -0,0 +1,321 @@
"""PostgreSQL schema and initialization for the bitable subsystem.

Uses an independent ``bitable`` schema within the shared PostgreSQL instance.
Follows the lazy-init + lock pattern from ``evolution/pg_store.py`` and the
``_SCHEMA_VERSION`` migration pattern from ``server/auth/models.py``.

Schema versioning
-----------------
:data:`_SCHEMA_VERSION` tracks the current bitable DB schema. The
``bitable_meta`` table stores the version so subsequent restarts skip
already-applied migrations.
"""

from __future__ import annotations

import asyncio
import logging
import os
import uuid as _uuid
from datetime import datetime, timezone
from typing import Any

from sqlalchemy import (
    Column,
    DateTime,
    Index,
    String,
    Text,
    UniqueConstraint,
    text,
)
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import DeclarativeBase

logger = logging.getLogger(__name__)

# Current schema version; bump when adding migrations.
_SCHEMA_VERSION = 1
_META_SCHEMA_VERSION_KEY = "schema_version"


def _utcnow() -> datetime:
    return datetime.now(timezone.utc)


def _uuid_str() -> str:
    return str(_uuid.uuid4())
class BitableBase(DeclarativeBase):
    """Declarative base for bitable ORM models (independent schema)."""


class TableModel(BitableBase):
    """ORM model for ``bitable.bitable_tables``."""

    __tablename__ = "bitable_tables"
    __table_args__ = {"schema": "bitable"}

    id = Column(String, primary_key=True, default=_uuid_str)
    name = Column(String, nullable=False)
    description = Column(Text, default="")
    primary_key_field_id = Column(String, nullable=True)
    owner_user_id = Column(String, nullable=True)
    created_at = Column(DateTime(timezone=True), default=_utcnow)
    updated_at = Column(DateTime(timezone=True), default=_utcnow, onupdate=_utcnow)


class FieldModel(BitableBase):
    """ORM model for ``bitable.bitable_fields``."""

    __tablename__ = "bitable_fields"
    __table_args__ = (
        Index("ix_bitable_fields_table_id", "table_id"),
        {"schema": "bitable"},
    )

    id = Column(String, primary_key=True, default=_uuid_str)
    table_id = Column(String, nullable=False)
    name = Column(String, nullable=False)
    field_type = Column(String, nullable=False)
    config = Column(JSONB, default=dict)
    owner = Column(String, default="user")
    created_at = Column(DateTime(timezone=True), default=_utcnow)
class RecordModel(BitableBase):
    """ORM model for ``bitable.bitable_records``.

    ``values`` is JSONB mapping ``{field_id: value}``. A GIN index supports
    efficient JSONB key existence checks. A unique expression index on
    ``(table_id, values->>pk_field_id)`` enforces primary-key uniqueness
    (created dynamically in ``_apply_v1_schema`` because the pk field id is
    per-table, not a fixed column).
    """

    __tablename__ = "bitable_records"
    __table_args__ = (
        Index("ix_bitable_records_table_id", "table_id"),
        Index("ix_bitable_records_values_gin", "values", postgresql_using="gin"),
        {"schema": "bitable"},
    )

    id = Column(String, primary_key=True, default=_uuid_str)
    table_id = Column(String, nullable=False)
    values = Column(JSONB, default=dict)
    created_at = Column(DateTime(timezone=True), default=_utcnow)
    updated_at = Column(DateTime(timezone=True), default=_utcnow, onupdate=_utcnow)


class ViewModel(BitableBase):
    """ORM model for ``bitable.bitable_views``."""

    __tablename__ = "bitable_views"
    __table_args__ = (
        Index("ix_bitable_views_table_id", "table_id"),
        {"schema": "bitable"},
    )

    id = Column(String, primary_key=True, default=_uuid_str)
    table_id = Column(String, nullable=False)
    name = Column(String, nullable=False)
    view_type = Column(String, default="grid")
    config = Column(JSONB, default=dict)
    created_at = Column(DateTime(timezone=True), default=_utcnow)
class RecalcQueueModel(BitableBase):
"""ORM model for ``bitable.bitable_recalc_queue``.
The ``(record_id, field_id)`` unique index prevents duplicate enqueues.
The ``(status, queued_at)`` index supports efficient worker consumption.
"""
__tablename__ = "bitable_recalc_queue"
__table_args__ = (
UniqueConstraint("record_id", "field_id", name="uq_recalc_record_field"),
Index("ix_recalc_status_queued", "status", "queued_at"),
{"schema": "bitable"},
)
id = Column(String, primary_key=True, default=_uuid_str)
table_id = Column(String, nullable=False)
record_id = Column(String, nullable=False)
field_id = Column(String, nullable=False)
status = Column(String, default="pending")
error_message = Column(Text, nullable=True)
queued_at = Column(DateTime(timezone=True), default=_utcnow)
completed_at = Column(DateTime(timezone=True), nullable=True)
class MetaModel(BitableBase):
"""ORM model for ``bitable.bitable_meta`` — schema version tracking."""
__tablename__ = "bitable_meta"
__table_args__ = {"schema": "bitable"}
key = Column(String, primary_key=True)
value = Column(String, nullable=False)
updated_at = Column(DateTime(timezone=True), default=_utcnow, onupdate=_utcnow)
# ---------------------------------------------------------------------------
# Schema is created via BitableBase.metadata.create_all (see BitableDB.init).
# The ORM models above are the single source of truth for table/index DDL.
# ---------------------------------------------------------------------------
def _resolve_database_url(database_url: str | None = None) -> str | None:
"""Resolve PostgreSQL connection URL.
Priority: explicit arg > env ``DATABASE_URL`` > ``AGENTKIT_DATABASE_URL``.
"""
if database_url:
return database_url
return os.environ.get("DATABASE_URL") or os.environ.get("AGENTKIT_DATABASE_URL")
class BitableDB:
"""PostgreSQL connection manager for bitable (lazy init + lock).
Usage::
db = BitableDB(database_url="postgresql+asyncpg://...")
await db.init()
# ... use db.engine / db.session_factory ...
await db.close()
"""
def __init__(self, database_url: str | None = None) -> None:
self._database_url = database_url or _resolve_database_url()
self._engine: Any = None
self._session_factory: Any = None
self._initialized = False
self._init_lock = asyncio.Lock()
@property
def database_url(self) -> str | None:
return self._database_url
@property
def engine(self) -> Any:
return self._engine
@property
def session_factory(self) -> Any:
return self._session_factory
    async def _ensure_initialized(self) -> None:
        """Lazy-init async engine and session factory (with lock)."""
        if self._initialized:
            return
        async with self._init_lock:
            # Re-check under the lock: another task may have finished init
            # while this one was waiting.
            if self._initialized:
                return
            if not self._database_url:
                raise RuntimeError(
                    "No database URL configured for bitable. "
                    "Set DATABASE_URL or AGENTKIT_DATABASE_URL env var."
                )
            from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine

            self._engine = create_async_engine(self._database_url, echo=False)
            # async_sessionmaker is the SQLAlchemy 2.0 factory for AsyncSession.
            self._session_factory = async_sessionmaker(self._engine, expire_on_commit=False)
            self._initialized = True
async def init(self) -> None:
"""Initialize engine + create schema and tables (idempotent).
Runs schema migrations based on ``_SCHEMA_VERSION`` stored in
``bitable_meta``. Safe to call on every startup.
"""
await self._ensure_initialized()
async with self._engine.begin() as conn:
# 1. Create the bitable schema (idempotent) — must precede
# metadata.create_all since tables live in this schema.
await conn.execute(text("CREATE SCHEMA IF NOT EXISTS bitable"))
# 2. Create all tables/indexes/constraints from ORM metadata.
# metadata.create_all is a synchronous API, so it runs through
# run_sync on the async connection. It replaces a single
# multi-statement text() script, which asyncpg cannot execute:
# create_all emits one DDL statement per object and handles
# schema-qualified names + GIN indexes.
await conn.run_sync(BitableBase.metadata.create_all)
# 3. Check current schema version
result = await conn.execute(
text("SELECT value FROM bitable.bitable_meta WHERE key = :key"),
{"key": _META_SCHEMA_VERSION_KEY},
)
row = result.fetchone()
current_version = int(row[0]) if row else 0
# 4. Apply migrations if needed (future versions add elif blocks here)
if current_version < _SCHEMA_VERSION:
# V1: initial schema (already created above)
# Future: if current_version < 2: await _apply_v2_migration(conn)
await conn.execute(
text(
"INSERT INTO bitable.bitable_meta (key, value, updated_at) "
"VALUES (:key, :value, NOW()) "
"ON CONFLICT (key) DO UPDATE SET value = :value, updated_at = NOW()"
),
{"key": _META_SCHEMA_VERSION_KEY, "value": str(_SCHEMA_VERSION)},
)
logger.info("bitable schema migrated: v%d → v%d", current_version, _SCHEMA_VERSION)
async def close(self) -> None:
"""Close engine, release all connections."""
if self._engine is not None:
await self._engine.dispose()
self._engine = None
self._session_factory = None
self._initialized = False
async def __aenter__(self) -> "BitableDB":
await self.init()
return self
async def __aexit__(self, exc_type: Any, exc_val: Any, exc_tb: Any) -> None:
await self.close()
# Module-level singleton (initialized by app lifespan)
_db: BitableDB | None = None
async def init_bitable_db(database_url: str | None = None) -> BitableDB:
    """Initialize the bitable database (module-level singleton).

    Called from ``app.py`` lifespan. Raises on failure; the caller should
    catch the error and degrade gracefully (bitable API returns 503).
    """
    global _db
    # BitableDB exposes no ``is_initialized`` attribute; a live engine is
    # the public signal that init() has completed and close() has not run.
    if _db is not None and _db.engine is not None:
        return _db
    _db = BitableDB(database_url=database_url)
    await _db.init()
    logger.info("bitable DB initialized (schema version %d)", _SCHEMA_VERSION)
    return _db
def get_bitable_db() -> BitableDB | None:
"""Return the module-level bitable DB singleton (or None if not init'd)."""
return _db
async def close_bitable_db() -> None:
"""Close the module-level bitable DB singleton."""
global _db
if _db is not None:
await _db.close()
_db = None
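The lazy initialization in `BitableDB._ensure_initialized` relies on a double-checked guard around an `asyncio.Lock`. The following is a minimal sketch of that pattern only, assuming a hypothetical `LazyResource` class and a simulated setup step (`asyncio.sleep(0)`); neither is part of this commit.

```python
import asyncio


class LazyResource:
    """Hypothetical stand-in for a connection manager with lazy init."""

    def __init__(self) -> None:
        self._initialized = False
        self._lock = asyncio.Lock()
        self.init_count = 0  # counts how often the expensive setup ran

    async def ensure_initialized(self) -> None:
        # Fast path: no lock needed once setup has completed.
        if self._initialized:
            return
        async with self._lock:
            # Second check: another task may have finished setup while
            # this one was waiting for the lock.
            if self._initialized:
                return
            await asyncio.sleep(0)  # simulated async setup work
            self.init_count += 1
            self._initialized = True


async def _demo() -> int:
    res = LazyResource()
    # Ten concurrent callers race to initialize the same resource.
    await asyncio.gather(*(res.ensure_initialized() for _ in range(10)))
    return res.init_count


init_count = asyncio.run(_demo())
```

Without the second check inside the lock, every task that queued up behind the first caller would repeat the setup once it acquired the lock.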

@ -0,0 +1,33 @@
"""Formula engine for the bitable subsystem.
Self-contained Python formula engine (KTD3) with no external dependencies.
Submodules:
- :mod:`agentkit.bitable.formula.parser`: parse formula strings to AST
- :mod:`agentkit.bitable.formula.engine`: DAG, topological sort, evaluation
- :mod:`agentkit.bitable.formula.functions`: built-in function registry
"""
from __future__ import annotations
from agentkit.bitable.formula.engine import (
CircularReferenceError,
FormulaEngine,
)
from agentkit.bitable.formula.functions import FUNCTION_REGISTRY
from agentkit.bitable.formula.parser import (
FormulaParseError,
FormulaSecurityError,
UnknownFunctionError,
parse_formula,
)
__all__ = [
"CircularReferenceError",
"FormulaEngine",
"FormulaParseError",
"FormulaSecurityError",
"FUNCTION_REGISTRY",
"UnknownFunctionError",
"parse_formula",
]

@ -0,0 +1,293 @@
"""Formula engine — DAG, topological sort, and evaluation.
Builds a dependency graph of formula fields, detects circular references,
and evaluates formulas in topological order.
Aggregate context (KTD3):
- ``SUM({f1})`` → f1 is an aggregate reference (entire column)
- ``{f1} + 1`` → f1 is a row reference (current record's value)
- ``{f1} + SUM({f2})`` → mixed: row value of f1 plus the column sum of f2
The engine distinguishes these by checking whether a field reference appears
as a direct argument to an aggregate function.
"""
from __future__ import annotations
import ast
from collections import deque
from typing import Any
from agentkit.bitable.formula.functions import AGGREGATE_FUNCTIONS, FUNCTION_REGISTRY
from agentkit.bitable.formula.parser import (
FormulaParseError,
evaluate_ast,
parse_formula,
)
class CircularReferenceError(Exception):
"""Raised when formula fields form a circular dependency."""
class FormulaEngine:
"""Formula engine: parse, build DAG, detect cycles, evaluate.
    Usage::

        engine = FormulaEngine()
        # Register a formula field; referenced fields are read from the formula.
        engine.add_formula(field_id="calc", formula="=SUM({src})")
        # Evaluate for a specific record; aggregate refs come from column_values.
        result = engine.evaluate(
            field_id="calc",
            row_values={},
            column_values={"src": [1, 2, 3]},
        )
"""
def __init__(self) -> None:
# field_id → (ast_tree, field_mapping, aggregate_refs, row_refs)
self._formulas: dict[str, _FormulaEntry] = {}
# DAG: field_id → set of field_ids it depends on
self._dag: dict[str, set[str]] = {}
    def add_formula(self, field_id: str, formula: str) -> None:
        """Register a formula for a field, replacing any existing one.

        Raises:
            FormulaParseError: Syntax error.
            FormulaSecurityError: Disallowed AST node.
            UnknownFunctionError: Unregistered function.
            CircularReferenceError: Adding this formula creates a cycle.
        """
        tree, field_mapping = parse_formula(formula, set(FUNCTION_REGISTRY.keys()))
        # Classify field refs into aggregate vs row context
        aggregate_refs, row_refs = _classify_refs(tree, field_mapping)
        entry = _FormulaEntry(
            tree=tree,
            field_mapping=field_mapping,
            aggregate_refs=aggregate_refs,
            row_refs=row_refs,
            formula=formula,
        )
        # Keep the previous registration so a rejected update can be restored.
        previous_entry = self._formulas.get(field_id)
        previous_deps = self._dag.get(field_id)
        self._formulas[field_id] = entry
        # Update DAG: this field depends on all referenced fields
        self._dag[field_id] = aggregate_refs | row_refs
        # Check for cycles
        cycle = _detect_cycle(self._dag)
        if cycle:
            # Roll back to the state before this call
            if previous_entry is None or previous_deps is None:
                del self._formulas[field_id]
                del self._dag[field_id]
            else:
                self._formulas[field_id] = previous_entry
                self._dag[field_id] = previous_deps
            raise CircularReferenceError(f"Circular reference detected: {' → '.join(cycle)}")
def remove_formula(self, field_id: str) -> None:
"""Remove a formula from the engine."""
self._formulas.pop(field_id, None)
self._dag.pop(field_id, None)
# Remove edges pointing to this field
for deps in self._dag.values():
deps.discard(field_id)
def get_dependencies(self, field_id: str) -> set[str]:
"""Get the set of field IDs that ``field_id`` depends on."""
return self._dag.get(field_id, set()).copy()
def get_dependents(self, field_id: str) -> set[str]:
"""Get the set of formula field IDs that depend on ``field_id``."""
return {fid for fid, deps in self._dag.items() if field_id in deps}
def topological_order(self) -> list[str]:
"""Return all formula field IDs in topological order (Kahn's algorithm)."""
return _topological_sort(self._dag)
def evaluate(
self,
field_id: str,
row_values: dict[str, Any],
column_values: dict[str, list[Any]] | None = None,
) -> Any:
"""Evaluate a formula field for a specific record.
Args:
field_id: The formula field to evaluate.
row_values: Field ID → value for the current record (row context).
column_values: Field ID → list of all values in that column
(aggregate context). Required for aggregate references.
Returns:
The computed value.
Raises:
KeyError: Field ID not registered.
FormulaParseError: Field reference not found in values.
"""
if field_id not in self._formulas:
raise KeyError(f"Formula not registered: {field_id}")
entry = self._formulas[field_id]
column_values = column_values or {}
# Build the field_values dict for the evaluator
# Aggregate refs get column values (lists), row refs get row values (scalars)
eval_values: dict[str, Any] = {}
# Map real field IDs to safe names
for safe_name, real_id in entry.field_mapping.items():
if real_id in entry.aggregate_refs:
eval_values[safe_name] = column_values.get(real_id, [])
else:
eval_values[safe_name] = row_values.get(real_id)
return evaluate_ast(entry.tree, eval_values, FUNCTION_REGISTRY)
def evaluate_all_for_record(
self,
row_values: dict[str, Any],
column_values: dict[str, list[Any]] | None = None,
) -> dict[str, Any]:
"""Evaluate all registered formulas for a record.
Returns a dict of field_id → computed value.
Formulas are evaluated in topological order so that formula-to-formula
dependencies are resolved correctly.
"""
results: dict[str, Any] = {}
column_values = column_values or {}
for field_id in self.topological_order():
# Include already-computed formula results in row_values
merged_row = {**row_values, **results}
try:
results[field_id] = self.evaluate(field_id, merged_row, column_values)
except (FormulaParseError, ZeroDivisionError, TypeError) as e:
results[field_id] = {"__error": str(e)}
return results
# ── Internal data structures ──────────────────────────────
class _FormulaEntry:
"""Parsed formula metadata."""
__slots__ = ("tree", "field_mapping", "aggregate_refs", "row_refs", "formula")
def __init__(
self,
tree: ast.Expression,
field_mapping: dict[str, str],
aggregate_refs: set[str],
row_refs: set[str],
formula: str,
) -> None:
self.tree = tree
self.field_mapping = field_mapping
self.aggregate_refs = aggregate_refs
self.row_refs = row_refs
self.formula = formula
# ── DAG utilities ─────────────────────────────────────────
def _detect_cycle(dag: dict[str, set[str]]) -> list[str] | None:
"""Detect a cycle in the DAG using DFS. Returns the cycle path or None."""
WHITE, GRAY, BLACK = 0, 1, 2
color: dict[str, int] = {node: WHITE for node in dag}
parent: dict[str, str | None] = {node: None for node in dag}
def _dfs(node: str) -> list[str] | None:
color[node] = GRAY
for neighbor in dag.get(node, set()):
if neighbor not in color:
color[neighbor] = WHITE
parent[neighbor] = None
if color[neighbor] == GRAY:
# Found cycle — reconstruct path
cycle = [neighbor]
current = node
while current is not None and current != neighbor:
cycle.append(current)
current = parent.get(current)
cycle.append(neighbor)
cycle.reverse()
return cycle
if color[neighbor] == WHITE:
parent[neighbor] = node
result = _dfs(neighbor)
if result is not None:
return result
color[node] = BLACK
return None
for node in dag:
if color.get(node, WHITE) == WHITE:
result = _dfs(node)
if result is not None:
return result
return None
def _topological_sort(dag: dict[str, set[str]]) -> list[str]:
"""Kahn's algorithm for topological sort."""
# Build in-degree map
in_degree: dict[str, int] = {node: 0 for node in dag}
for node, deps in dag.items():
for dep in deps:
if dep in in_degree:
in_degree[node] += 1
# Start with nodes that have no dependencies
queue = deque(node for node, degree in in_degree.items() if degree == 0)
result: list[str] = []
while queue:
node = queue.popleft()
result.append(node)
# Find all nodes that depend on this node
for other_node, deps in dag.items():
if node in deps:
in_degree[other_node] -= 1
if in_degree[other_node] == 0 and other_node not in result:
queue.append(other_node)
return result
def _classify_refs(
tree: ast.Expression, field_mapping: dict[str, str]
) -> tuple[set[str], set[str]]:
"""Classify field references into aggregate (column) and row context.
A field reference is aggregate if it appears as a direct argument to
an aggregate function (SUM/AVG/COUNT/MIN/MAX). Otherwise it's row context.
"""
aggregate_refs: set[str] = set()
row_refs: set[str] = set()
# Get all safe names → real field IDs
safe_to_real = field_mapping
class _Classifier(ast.NodeVisitor):
def visit_Call(self, node: ast.Call) -> None:
if isinstance(node.func, ast.Name) and node.func.id in AGGREGATE_FUNCTIONS:
# Direct arguments to aggregate functions are column refs
for arg in node.args:
if isinstance(arg, ast.Name) and arg.id in safe_to_real:
aggregate_refs.add(safe_to_real[arg.id])
else:
# Non-field args (e.g., literals) — visit normally
self.visit(arg)
else:
self.generic_visit(node)
def visit_Name(self, node: ast.Name) -> None:
if node.id in safe_to_real:
real_id = safe_to_real[node.id]
if real_id not in aggregate_refs:
row_refs.add(real_id)
_Classifier().visit(tree)
return aggregate_refs, row_refs
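`_topological_sort` above rescans every DAG entry each time it dequeues a node. One common variant of Kahn's algorithm precomputes a reverse adjacency map so each edge is visited once. The sketch below illustrates that variant under the same `field_id → dependencies` convention; `topo_sort` and the sample field names are hypothetical and not part of this commit.

```python
from collections import deque


def topo_sort(dag: dict[str, set[str]]) -> list[str]:
    """Kahn's algorithm over a {node: dependencies} mapping."""
    in_degree = {node: 0 for node in dag}
    dependents: dict[str, list[str]] = {node: [] for node in dag}
    for node, deps in dag.items():
        for dep in deps:
            # Dependencies outside the DAG are plain (non-formula) fields
            # and never block evaluation.
            if dep in dag:
                in_degree[node] += 1
                dependents[dep].append(node)

    queue = deque(node for node, degree in in_degree.items() if degree == 0)
    order: list[str] = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in dependents[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                queue.append(child)
    # If a cycle exists, its nodes never reach in-degree 0 and are omitted.
    return order


dag = {
    "total": {"subtotal", "tax"},
    "subtotal": {"price"},  # "price" is a plain field, not a formula
    "tax": {"subtotal"},
}
order = topo_sort(dag)
```

A result shorter than `len(dag)` indicates a cycle, which is a cheap cross-check alongside the DFS-based `_detect_cycle`.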

@ -0,0 +1,101 @@
"""Built-in formula functions for the bitable formula engine.
v1 implements: SUM, AVG, COUNT, MIN, MAX, CONCAT, ABS, ROUND, IF, LEN.
Aggregate functions (SUM/AVG/COUNT/MIN/MAX) accept a list of values
(the entire column) and return a scalar. Non-aggregate functions
(ABS/ROUND/IF/LEN/CONCAT) operate on scalars.
The engine determines whether to pass a column (list) or scalar (row value)
based on the calling context; see :mod:`agentkit.bitable.formula.engine`.
"""
from __future__ import annotations
from typing import Any, Callable
# ── Aggregate functions (operate on lists) ────────────────
def _sum(values: list[Any]) -> float | int:
"""Sum of numeric values, ignoring None/empty."""
total = 0
for v in values:
if v is None or v == "":
continue
total += v
return total
def _avg(values: list[Any]) -> float:
"""Average of numeric values, ignoring None/empty."""
nums = [v for v in values if v is not None and v != ""]
if not nums:
return 0.0
return sum(nums) / len(nums)
def _count(values: list[Any]) -> int:
"""Count of non-empty values."""
return sum(1 for v in values if v is not None and v != "")
def _min(values: list[Any]) -> Any:
"""Minimum of numeric values, ignoring None/empty."""
nums = [v for v in values if v is not None and v != ""]
if not nums:
return 0
return min(nums)
def _max(values: list[Any]) -> Any:
"""Maximum of numeric values, ignoring None/empty."""
nums = [v for v in values if v is not None and v != ""]
if not nums:
return 0
return max(nums)
# ── Scalar functions ──────────────────────────────────────
def _abs(value: Any) -> Any:
return abs(value)
def _round(value: Any, digits: int = 0) -> float:
return round(value, digits)
def _if(condition: Any, true_val: Any, false_val: Any = None) -> Any:
return true_val if condition else false_val
def _len(value: Any) -> int:
if value is None:
return 0
return len(str(value))
def _concat(*args: Any) -> str:
"""Concatenate all arguments as strings."""
return "".join(str(a) for a in args if a is not None)
# ── Registry ──────────────────────────────────────────────
# Functions that aggregate a column (receive a list of all column values)
AGGREGATE_FUNCTIONS: frozenset[str] = frozenset({"SUM", "AVG", "COUNT", "MIN", "MAX"})
FUNCTION_REGISTRY: dict[str, Callable[..., Any]] = {
"SUM": _sum,
"AVG": _avg,
"COUNT": _count,
"MIN": _min,
"MAX": _max,
"ABS": _abs,
"ROUND": _round,
"IF": _if,
"LEN": _len,
"CONCAT": _concat,
}

@ -0,0 +1,311 @@
"""Formula parser — converts formula strings to safe Python AST.
KTD7 security: uses ``ast.parse`` then a restricted ``NodeVisitor`` that
only allows whitelist nodes. **Never** uses ``eval()`` / ``exec()``.
Formula syntax:
- Starts with ``=`` (stripped before parsing).
- Field references: ``{field_id}``, converted to ``Name(field_id)`` nodes.
- Arithmetic: ``+``, ``-``, ``*``, ``/``, ``%``, ``**``.
- Comparison: ``==``, ``!=``, ``<``, ``>``, ``<=``, ``>=``.
- Boolean: ``and``, ``or``, ``not``.
- String concat: ``+`` on strings, or ``CONCAT(...)``.
- Function calls: ``SUM(...)``, ``AVG(...)``, etc. (registered functions only).
- Conditional: ``IF(cond, a, b)`` or Python ``a if cond else b``.
- Literals: numbers, strings (single/double quotes).
Examples::
    =1+2*3              → 7
    =SUM({f1})          → aggregate sum of column f1
    ={f1} + {f2}        → row-level sum of fields f1 and f2
    =CONCAT({f1}, "-")  → string concatenation
"""
from __future__ import annotations
import ast
import re
from typing import Any
from agentkit.bitable.formula.functions import FUNCTION_REGISTRY
# ── Exceptions ────────────────────────────────────────────
class FormulaParseError(Exception):
"""Raised when a formula string cannot be parsed."""
class FormulaSecurityError(Exception):
"""Raised when a formula AST contains a disallowed node (KTD7)."""
class UnknownFunctionError(Exception):
"""Raised when a formula calls a function not in the registry."""
# ── Field reference substitution ──────────────────────────
# Match {field_id} — field IDs are UUIDs or alphanumeric.
_FIELD_REF_RE = re.compile(r"\{([a-zA-Z0-9_-]+)\}")
def _substitute_field_refs(formula: str) -> tuple[str, dict[str, str]]:
"""Replace ``{field_id}`` with ``_f_<safe_name>`` (a Python Name node).
Field IDs are UUIDs that may start with a digit, which is invalid in Python
identifiers. We prefix with ``_f_`` and replace hyphens with underscores.
A reverse mapping is returned so the engine can map back to real field IDs.
"""
mapping: dict[str, str] = {}
def _replace(match: re.Match[str]) -> str:
field_id = match.group(1)
# Convert UUID-style field_id to a valid Python identifier
safe_name = "_f_" + field_id.replace("-", "_")
mapping[safe_name] = field_id
return safe_name
result = _FIELD_REF_RE.sub(_replace, formula)
return result, mapping
# ── AST whitelist (KTD7) ──────────────────────────────────
_ALLOWED_NODES: frozenset[type[ast.AST]] = frozenset(
{
ast.Expression,
ast.BinOp,
ast.UnaryOp,
ast.BoolOp,
ast.Compare,
ast.Call,
ast.Name,
ast.Constant,
ast.IfExp,
ast.Load,
ast.Add,
ast.Sub,
ast.Mult,
ast.Div,
ast.Mod,
ast.Pow,
ast.USub,
ast.UAdd,
ast.Not,
ast.And,
ast.Or,
ast.Eq,
ast.NotEq,
ast.Lt,
ast.Gt,
ast.LtE,
ast.GtE,
}
)
class _SecurityVisitor(ast.NodeVisitor):
"""Visit AST nodes, reject any not in the whitelist (KTD7)."""
def __init__(self, allowed_functions: set[str]) -> None:
self._allowed_functions = allowed_functions
def generic_visit(self, node: ast.AST) -> None: # noqa: D401
node_type = type(node)
if node_type not in _ALLOWED_NODES:
raise FormulaSecurityError(
f"Disallowed AST node: {node_type.__name__}. Formula contains unsafe constructs."
)
super().generic_visit(node)
def visit_Call(self, node: ast.Call) -> None:
# Check that the function being called is a registered Name
if not isinstance(node.func, ast.Name):
raise FormulaSecurityError(
"Only direct function calls by name are allowed. "
"Method calls and attribute access are forbidden."
)
if node.func.id not in self._allowed_functions:
raise UnknownFunctionError(
f"Unknown function: '{node.func.id}'. Allowed: {sorted(self._allowed_functions)}"
)
self.generic_visit(node)
# ── Public API ────────────────────────────────────────────
def parse_formula(
formula: str, allowed_functions: set[str] | None = None
) -> tuple[ast.Expression, dict[str, str]]:
"""Parse a formula string into a safe AST.
Args:
formula: Formula string, optionally starting with ``=``.
allowed_functions: Set of registered function names. If None,
defaults to every name in ``FUNCTION_REGISTRY``.
Returns:
Tuple of (AST expression, field_ref_mapping) where
field_ref_mapping maps safe Python identifiers to original field IDs.
Raises:
FormulaParseError: Syntax error in formula.
FormulaSecurityError: Formula contains disallowed AST nodes.
UnknownFunctionError: Formula calls an unregistered function.
"""
expr = formula.strip()
if expr.startswith("="):
expr = expr[1:]
if not expr:
raise FormulaParseError("Empty formula")
# Substitute field references {field_id} → safe_name
substituted, field_mapping = _substitute_field_refs(expr)
try:
tree = ast.parse(substituted, mode="eval")
except SyntaxError as e:
raise FormulaParseError(f"Syntax error in formula: {e}") from e
# Security check
# When allowed_functions is None, use all registered functions (syntax-only validation)
if allowed_functions is None:
allowed = set(FUNCTION_REGISTRY.keys())
else:
allowed = allowed_functions
visitor = _SecurityVisitor(allowed)
visitor.visit(tree)
return tree, field_mapping # type: ignore[return-value]
def evaluate_ast(
tree: ast.Expression,
field_values: dict[str, Any],
functions: dict[str, Any],
) -> Any:
"""Evaluate a parsed formula AST against field values and functions.
This is NOT ``eval()``; it is a manual AST walker that only processes
whitelisted nodes. Field references (Name nodes) are resolved from
``field_values``; function calls are resolved from ``functions``.
Args:
tree: Parsed AST from :func:`parse_formula`.
field_values: Mapping of field safe-name → value (scalar or list for aggregates).
functions: Mapping of function name → callable.
Returns:
The computed value.
"""
return _eval_node(tree.body, field_values, functions)
def _eval_node(node: ast.AST, fields: dict[str, Any], functions: dict[str, Any]) -> Any:
"""Recursively evaluate an AST node."""
if isinstance(node, ast.Constant):
return node.value
if isinstance(node, ast.Name):
if node.id not in fields:
raise FormulaParseError(f"Unknown field reference: {node.id}")
return fields[node.id]
if isinstance(node, ast.BinOp):
left = _eval_node(node.left, fields, functions)
right = _eval_node(node.right, fields, functions)
return _apply_binop(node.op, left, right)
if isinstance(node, ast.UnaryOp):
operand = _eval_node(node.operand, fields, functions)
if isinstance(node.op, ast.USub):
return -operand
if isinstance(node.op, ast.UAdd):
return +operand
if isinstance(node.op, ast.Not):
return not operand
raise FormulaSecurityError(f"Disallowed unary op: {type(node.op).__name__}")
    if isinstance(node, ast.BoolOp):
        # Evaluate operands lazily so ``and`` / ``or`` short-circuit as in Python.
        if isinstance(node.op, ast.And):
            result = True
            for operand in node.values:
                result = _eval_node(operand, fields, functions)
                if not result:
                    # First falsy operand decides an ``and`` chain.
                    return result
            return result
        if isinstance(node.op, ast.Or):
            result = False
            for operand in node.values:
                result = _eval_node(operand, fields, functions)
                if result:
                    # First truthy operand decides an ``or`` chain.
                    return result
            return result
        raise FormulaSecurityError(f"Disallowed bool op: {type(node.op).__name__}")
if isinstance(node, ast.Compare):
left = _eval_node(node.left, fields, functions)
for op, comparator in zip(node.ops, node.comparators):
right = _eval_node(comparator, fields, functions)
if not _apply_compare(op, left, right):
return False
left = right
return True
if isinstance(node, ast.IfExp):
test = _eval_node(node.test, fields, functions)
if test:
return _eval_node(node.body, fields, functions)
return _eval_node(node.orelse, fields, functions)
if isinstance(node, ast.Call):
if not isinstance(node.func, ast.Name):
raise FormulaSecurityError("Only named function calls allowed")
func_name = node.func.id
if func_name not in functions:
raise UnknownFunctionError(f"Unknown function: {func_name}")
args = [_eval_node(a, fields, functions) for a in node.args]
return functions[func_name](*args)
raise FormulaSecurityError(f"Disallowed node during evaluation: {type(node).__name__}")
def _apply_binop(op: ast.AST, left: Any, right: Any) -> Any:
"""Apply a binary operator."""
if isinstance(op, ast.Add):
# String concat or numeric addition
if isinstance(left, str) or isinstance(right, str):
return f"{left}{right}"
return left + right
if isinstance(op, ast.Sub):
return left - right
if isinstance(op, ast.Mult):
return left * right
if isinstance(op, ast.Div):
return left / right
if isinstance(op, ast.Mod):
return left % right
if isinstance(op, ast.Pow):
return left**right
raise FormulaSecurityError(f"Disallowed binary op: {type(op).__name__}")
def _apply_compare(op: ast.AST, left: Any, right: Any) -> bool:
"""Apply a comparison operator."""
if isinstance(op, ast.Eq):
return left == right
if isinstance(op, ast.NotEq):
return left != right
if isinstance(op, ast.Lt):
return left < right
if isinstance(op, ast.Gt):
return left > right
if isinstance(op, ast.LtE):
return left <= right
if isinstance(op, ast.GtE):
return left >= right
raise FormulaSecurityError(f"Disallowed compare op: {type(op).__name__}")
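The evaluation approach in this file (parse with `ast.parse`, then walk only whitelisted node types instead of calling `eval()`) can be illustrated in a much smaller form. The sketch below is not the parser above: `safe_eval`, `_walk`, and the reduced operator table are hypothetical, and it supports only constants, names, four arithmetic operators, and short-circuiting `and` / `or`.

```python
import ast
import operator
from typing import Any

# Whitelisted binary operators; anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_eval(expr: str, names: dict[str, Any]) -> Any:
    """Evaluate an expression by walking its AST, never via eval()."""
    tree = ast.parse(expr, mode="eval")
    return _walk(tree.body, names)


def _walk(node: ast.AST, names: dict[str, Any]) -> Any:
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return names[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        left = _walk(node.left, names)
        right = _walk(node.right, names)
        return _BIN_OPS[type(node.op)](left, right)
    if isinstance(node, ast.BoolOp):
        is_and = isinstance(node.op, ast.And)
        result: Any = is_and
        for operand in node.values:
            result = _walk(operand, names)
            # ``and`` stops at the first falsy value, ``or`` at the first truthy one.
            if bool(result) != is_and:
                break
        return result
    # Calls, attribute access, subscripts, lambdas, etc. all end up here.
    raise ValueError(f"Disallowed node: {type(node).__name__}")


arithmetic = safe_eval("1 + 2 * 3", {})
fallback = safe_eval("x or y", {"x": 0, "y": 5})
guarded = safe_eval("x and 1 / x", {"x": 0})  # division is never evaluated
```

Because operands are evaluated one at a time, a guard such as `x and 1 / x` returns the falsy `x` without ever reaching the division.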

@ -0,0 +1,32 @@
"""Data ingestion modules for the bitable subsystem.
Pure data transformation, no HTTP calls. The BitableTool orchestrates
HTTP writes to the bitable REST API; these modules only parse/transform
external data into bitable-ready field definitions and record lists.
"""
from __future__ import annotations
from agentkit.bitable.ingestion.api_collector import transform_records
from agentkit.bitable.ingestion.database import (
DB_TYPE_MAP,
import_table,
infer_field_type,
)
from agentkit.bitable.ingestion.excel import (
ParsedSheet,
parse_excel,
parse_excel_bytes,
parse_excel_url,
)
__all__ = [
"DB_TYPE_MAP",
"ParsedSheet",
"import_table",
"infer_field_type",
"parse_excel",
"parse_excel_bytes",
"parse_excel_url",
"transform_records",
]

@ -0,0 +1,51 @@
"""API collector — transform Agent-collected structured data for bitable upsert.
The Agent already has crawl/API tools (web_crawl, web_search, etc.). This
module only handles the "shape" transformation: map arbitrary JSON records
to bitable field IDs via a ``field_mapping`` dict, then return records
ready for the upsert API.
Usage::
transformed = transform_records(
records=[{"name": "Alice", "age": 30}],
field_mapping={"name": "fld_abc", "age": "fld_def"},
)
# → [{"fld_abc": "Alice", "fld_def": 30}]
"""
from __future__ import annotations
from typing import Any
def transform_records(
records: list[dict[str, Any]],
field_mapping: dict[str, str],
) -> list[dict[str, Any]]:
"""Map source record keys to bitable field IDs via field_mapping.
Keys not in ``field_mapping`` are dropped, and a record with no mapped keys
is skipped entirely. Values are passed through as-is (the bitable upsert API
handles type coercion).
Args:
records: List of source records (arbitrary keys).
field_mapping: ``{source_key: bitable_field_id}``.
Returns:
List of records with bitable field IDs as keys.
"""
if not records:
return []
if not field_mapping:
return []
transformed: list[dict[str, Any]] = []
for rec in records:
out: dict[str, Any] = {}
for src_key, field_id in field_mapping.items():
if src_key in rec:
out[field_id] = rec[src_key]
if out:
transformed.append(out)
return transformed

@ -0,0 +1,171 @@
"""Database ingestion — reflect external DB tables into bitable-ready data.
Uses SQLAlchemy reflection to read table structure and rows. The caller
(BitableTool) then creates a bitable table + fields and upserts the rows
via the bitable REST API.
Type mapping (KTD: DB → bitable):
    INTEGER / BIGINT / SMALLINT / NUMERIC / FLOAT / DECIMAL → number
    VARCHAR / TEXT / CHAR / UUID → text
    TIMESTAMP / DATETIME / DATE → date
    BOOLEAN → text (v1: no bool type)
    JSON / JSONB → text
    fallback → text
"""
from __future__ import annotations
import logging
from typing import Any
from sqlalchemy import (
BigInteger,
Boolean,
Date,
DateTime,
Float,
Integer,
Numeric,
SmallInteger,
String,
Text,
create_engine,
inspect,
select,
)
from sqlalchemy.engine import Engine
logger = logging.getLogger(__name__)
# ponytail: Static mapping covers all common SQL types. Unknown types fall
# back to text — safe but lossy. Upgrade path: add entries as needed.
DB_TYPE_MAP: dict[type, str] = {
Integer: "number",
BigInteger: "number",
SmallInteger: "number",
Numeric: "number",
Float: "number",
String: "text",
Text: "text",
DateTime: "date",
Date: "date",
Boolean: "text",
}
# Batch size for reading rows from the source DB
READ_BATCH = 1000
def infer_field_type(sqla_type: Any) -> str:
"""Map a SQLAlchemy column type instance or class to a bitable field type.
Handles both type instances (``Integer()``) and type classes (``Integer``).
Falls back to ``"text"`` for unknown types.
"""
for sqla_cls, bitable_type in DB_TYPE_MAP.items():
if isinstance(sqla_type, sqla_cls):
return bitable_type
# If sqla_type is a class (not instance), check subclass relationship
if isinstance(sqla_type, type):
for sqla_cls, bitable_type in DB_TYPE_MAP.items():
if issubclass(sqla_type, sqla_cls):
return bitable_type
return "text"
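def import_table(
    connection_string: str,
    table_name: str,
    *,
    max_rows: int = 50_000,
) -> dict[str, Any]:
    """Reflect a single table from an external DB.

    Returns ``{"table_name": str, "fields": [...], "records": [...],
    "primary_key": str | None, "row_count": int}``.

    Raises ``ConnectionError`` if the engine cannot be created or the DB is
    unreachable, and ``ValueError`` if the table does not exist.
    """
    from sqlalchemy.exc import OperationalError

    try:
        engine = create_engine(connection_string)
    except Exception as e:
        raise ConnectionError(f"Failed to create engine for connection string: {e}") from e
    try:
        return _reflect_and_read(engine, table_name, max_rows)
    except OperationalError as e:
        # create_engine() is lazy: an unreachable DB only fails on first connect.
        raise ConnectionError(f"Failed to connect to database: {e}") from e
    finally:
        engine.dispose()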
def _reflect_and_read(engine: Engine, table_name: str, max_rows: int) -> dict[str, Any]:
    """Reflect one table and read up to ``max_rows`` of its rows."""
    from sqlalchemy import MetaData, Table

    insp = inspect(engine)
    # Validate that the table exists before reflecting it.
    if table_name not in insp.get_table_names():
        raise ValueError(f"Table {table_name!r} not found in database")

    metadata = MetaData()
    table = Table(table_name, metadata, autoload_with=engine)

    # Build field definitions. For a composite primary key, only the first
    # column is treated as the bitable primary key.
    fields: list[dict[str, Any]] = []
    pk_columns = list(table.primary_key.columns)
    pk_name = pk_columns[0].name if pk_columns else None
    for col in table.columns:
        field_type = infer_field_type(col.type)
        fields.append(
            {
                "name": col.name,
                "field_type": field_type,
                "is_primary_key": col.name == pk_name,
            }
        )

    # If the source table has no PK, declare a synthetic text "id" field.
    if pk_name is None:
        fields.insert(0, {"name": "id", "field_type": "text", "is_primary_key": True})
        pk_name = "id"

    # Read rows, stopping once max_rows have been collected.
    records: list[dict[str, Any]] = []
    with engine.connect() as conn:
        result = conn.execute(select(table))
        for i, row in enumerate(result):
            if i >= max_rows:
                logger.warning("Table %r truncated at %d rows during import", table_name, max_rows)
                break
            rec: dict[str, Any] = {}
            for col in table.columns:
                # Look up by Column object: attribute access on a Row can
                # collide with Row methods such as ``count`` or ``index``.
                val = row._mapping[col]
                if val is not None:
                    val = _serialize(val)
                rec[col.name] = val
            records.append(rec)

    return {
        "table_name": table_name,
        "fields": fields,
        "records": records,
        "primary_key": pk_name,
        "row_count": len(records),
    }
def _serialize(val: Any) -> Any:
"""Serialize a DB value to JSON-safe form."""
from datetime import date, datetime
from decimal import Decimal
if isinstance(val, datetime):
return val.isoformat()
if isinstance(val, date):
return val.isoformat()
if isinstance(val, Decimal):
return float(val)
if isinstance(val, bytes):
return val.decode("utf-8", errors="replace")
return val
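
A minimal standalone sketch of the serialization step above, using only the standard library. The helper name `to_json_safe` and the sample row are illustrative assumptions, not part of the module; the sketch mirrors the branches of `_serialize` and shows that the result can be passed to `json.dumps`.

```python
import json
from datetime import date, datetime
from decimal import Decimal
from typing import Any


def to_json_safe(val: Any) -> Any:
    """Hypothetical stand-in for _serialize: make one DB value JSON-safe."""
    # datetime is a subclass of date, so one check covers both types.
    if isinstance(val, date):
        return val.isoformat()
    if isinstance(val, Decimal):
        return float(val)  # lossy for high-precision decimals
    if isinstance(val, bytes):
        return val.decode("utf-8", errors="replace")
    return val


# Illustrative row as a driver might return it (values are made up).
row = {
    "id": 7,
    "price": Decimal("19.90"),
    "created": datetime(2026, 6, 25, 1, 9, 59),
    "day": date(2026, 6, 25),
    "blob": b"caf\xc3\xa9",
    "note": None,
}
record = {key: to_json_safe(value) for key, value in row.items()}
encoded = json.dumps(record)  # would raise TypeError on the raw row
```

Converting `Decimal` to `float` keeps the payload simple but can lose precision; a string representation would be one alternative if exact values matter.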

@ -0,0 +1,249 @@
"""Excel ingestion — parse .xlsx into structured sheets for bitable import.
Reuses openpyxl (already a dependency via ``agentkit.memory.document_loader``).
Returns structured data ``{sheet_name: [{col: val}, ...]}`` rather than
Markdown text. First row is treated as field names; types are auto-inferred
from column values.
Known limitations (same as ``document_loader._parse_xlsx``):
- ``data_only=True`` returns ``None`` for formulas never opened in Excel.
- Merged cells: only the top-left cell has a value; others are ``None``.
"""
from __future__ import annotations
import io
import ipaddress
import logging
import socket
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any
from urllib.parse import urlparse
import httpx
logger = logging.getLogger(__name__)
MAX_ROWS_PER_SHEET = 10_000
MAX_CELL_CHARS = 10_000
@dataclass
class ParsedSheet:
"""One parsed Excel sheet ready for bitable import."""
name: str
columns: list[str] = field(default_factory=list)
field_types: list[str] = field(default_factory=list) # "text" | "number" | "date"
records: list[dict[str, Any]] = field(default_factory=list)
def parse_excel(file_path: str | Path) -> list[ParsedSheet]:
"""Parse an .xlsx file from disk into structured sheets."""
path = Path(file_path)
if not path.exists():
raise FileNotFoundError(f"Excel file not found: {path}")
content = path.read_bytes()
return parse_excel_bytes(content)
def parse_excel_url(url: str, *, timeout: float = 30.0) -> list[ParsedSheet]:
    """Download an .xlsx from a URL and parse it.

    Validates the URL to prevent SSRF: only http/https schemes are allowed,
    and the resolved host must not be a private, loopback, link-local, or
    reserved IP address. Hostnames are resolved before the request, so names
    that point at internal IPs are rejected. The check does not cover DNS
    rebinding between the check and the request (see ``_assert_safe_host``).
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"Disallowed URL scheme: {parsed.scheme!r} (only http/https)")
    if not parsed.hostname:
        raise ValueError("URL has no hostname")
    _assert_safe_host(parsed.hostname)
    resp = httpx.get(url, timeout=timeout, follow_redirects=False)
    # Follow redirects manually, re-validating each Location (KTD: SSRF guard).
    seen_redirects = 0
    while resp.is_redirect and seen_redirects < 5:
        seen_redirects += 1
        next_url = httpx.URL(url).join(resp.headers["location"])
        if next_url.scheme not in ("http", "https") or not next_url.host:
            raise ValueError(f"Unsafe redirect target: {next_url}")
        _assert_safe_host(next_url.host)
        resp = httpx.get(next_url, timeout=timeout, follow_redirects=False)
        url = str(next_url)
    # A response that is still a redirect after 5 hops fails here.
    resp.raise_for_status()
    return parse_excel_bytes(resp.content)
def _assert_safe_host(host: str) -> None:
"""Raise ``ValueError`` if ``host`` resolves to a private/loopback/reserved IP.
Accepts IPv4/IPv6 literals and DNS names. DNS names are resolved and every
returned address is checked; any private/loopback/link-local/reserved
address blocks the request.
"""
# ponytail: blocks RFC1918, loopback, link-local, and reserved ranges.
# Ceiling: does not defend against DNS rebinding after the check (TOCTOU);
# upgrade path is to pin resolved IP in the httpx transport.
try:
addr = ipaddress.ip_address(host)
except ValueError:
# Hostname — resolve and check all A/AAAA records.
try:
infos = socket.getaddrinfo(host, None)
except socket.gaierror as e:
raise ValueError(f"Cannot resolve host {host!r}: {e}") from e
for info in infos:
sockaddr = info[4]
ip_str = sockaddr[0]
try:
addr = ipaddress.ip_address(ip_str)
except ValueError:
continue
if _is_unsafe_ip(addr):
raise ValueError(f"Host {host!r} resolves to private/reserved IP {addr}")
return
if _is_unsafe_ip(addr):
raise ValueError(f"Host {host!r} is a private/loopback/reserved IP: {addr}")
def _is_unsafe_ip(addr: ipaddress.IPv4Address | ipaddress.IPv6Address) -> bool:
    """True if the address is private, loopback, link-local, reserved, multicast, or unspecified."""
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )
def parse_excel_bytes(content: bytes) -> list[ParsedSheet]:
"""Parse Excel content from bytes. Raises ``ValueError`` on corrupt files."""
try:
from openpyxl import load_workbook
except ImportError as e:
raise ImportError("openpyxl is required for Excel ingestion") from e
try:
wb = load_workbook(io.BytesIO(content), data_only=True, read_only=True)
except Exception as e:
raise ValueError(f"Failed to parse Excel file: {e}") from e
sheets: list[ParsedSheet] = []
try:
for ws in wb.worksheets:
sheet = _parse_worksheet(ws)
if sheet is not None:
sheets.append(sheet)
finally:
wb.close()
return sheets
def _parse_worksheet(ws) -> ParsedSheet | None:
"""Parse a single worksheet. Returns ``None`` for completely empty sheets."""
rows_iter = ws.iter_rows(values_only=True)
# First row = headers
try:
header_row = next(rows_iter)
except StopIteration:
return None # empty sheet
headers = [str(v).strip() if v is not None else f"col_{i}" for i, v in enumerate(header_row)]
# Deduplicate headers
seen: dict[str, int] = {}
clean_headers: list[str] = []
for h in headers:
if h in seen:
seen[h] += 1
clean_headers.append(f"{h}_{seen[h]}")
else:
seen[h] = 0
clean_headers.append(h)
# Collect data rows
data_rows: list[tuple] = []
for row in rows_iter:
if len(data_rows) >= MAX_ROWS_PER_SHEET:
logger.warning("Sheet %r truncated at %d rows", ws.title, MAX_ROWS_PER_SHEET)
break
data_rows.append(row)
# Infer field types and build records
col_count = len(clean_headers)
field_types = _infer_column_types(data_rows, col_count)
records: list[dict[str, Any]] = []
for row in data_rows:
rec: dict[str, Any] = {}
for i, col_name in enumerate(clean_headers):
val = row[i] if i < len(row) else None
if val is not None:
val = _coerce_value(val, field_types[i])
rec[col_name] = val
records.append(rec)
return ParsedSheet(
name=ws.title,
columns=clean_headers,
field_types=field_types,
records=records,
)
def _infer_column_types(rows: list[tuple], col_count: int) -> list[str]:
"""Infer bitable field type per column: 'number', 'date', or 'text'."""
from datetime import date, datetime
types: list[str] = []
for col_idx in range(col_count):
is_number = True
is_date = True
has_value = False
for row in rows:
if col_idx >= len(row):
continue
val = row[col_idx]
if val is None or val == "":
continue
has_value = True
if isinstance(val, bool):
is_number = False
is_date = False
break
if not isinstance(val, (int, float)):
is_number = False
if not isinstance(val, (datetime, date)):
is_date = False
if not is_number and not is_date:
break
if not has_value:
types.append("text")
elif is_number:
types.append("number")
elif is_date:
types.append("date")
else:
types.append("text")
return types
def _coerce_value(val: Any, field_type: str) -> Any:
    """Coerce a cell value to the inferred field type. Truncate long strings."""
    if field_type == "date":
        from datetime import date

        # datetime is a subclass of date, so this covers both. Plain dates
        # would otherwise reach the caller unserialized.
        if isinstance(val, date):
            return val.isoformat()
    if isinstance(val, str) and len(val) > MAX_CELL_CHARS:
        return val[:MAX_CELL_CHARS]
    return val
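
The address classification behind the SSRF guard can be tried in isolation. The sketch below uses only the standard-library `ipaddress` module; the function name `is_unsafe_ip` and the sample addresses are assumptions chosen for illustration, and it takes string literals rather than resolved hostnames.

```python
import ipaddress


def is_unsafe_ip(ip_str: str) -> bool:
    """Illustrative check: True for addresses a fetcher should refuse."""
    addr = ipaddress.ip_address(ip_str)
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )


# Sample literals: loopback, RFC 1918, cloud metadata (link-local),
# IPv6 loopback, unspecified, and one public resolver.
samples = ["127.0.0.1", "10.0.0.5", "169.254.169.254", "::1", "0.0.0.0", "8.8.8.8"]
verdicts = {ip: is_unsafe_ip(ip) for ip in samples}
blocked = [ip for ip, unsafe in verdicts.items() if unsafe]
```

Only the public address passes. As the comment in `_assert_safe_host` notes, a check like this does not pin the resolved IP, so it cannot rule out a different answer at connect time.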

@ -0,0 +1,132 @@
"""Pydantic v2 data models for the bitable subsystem.
All models use ``model_config = ConfigDict(...)`` per project convention.
Field configs are stored as JSONB in PostgreSQL; the Pydantic models here
define the application-layer shape.
"""
from __future__ import annotations
from datetime import datetime, timezone
from enum import Enum
from typing import Any
from pydantic import BaseModel, ConfigDict, Field as PydanticField
def _utcnow() -> datetime:
return datetime.now(timezone.utc)
class FieldType(str, Enum):
"""Supported field types in bitable tables."""
text = "text"
number = "number"
date = "date"
select = "select"
multiselect = "multiselect"
attachment = "attachment"
image = "image"
formula = "formula"
lookup = "lookup"
class FieldOwner(str, Enum):
"""Who owns a field's data — determines upsert merge behavior."""
agent = "agent"
user = "user"
class ViewType(str, Enum):
"""Supported view types (v1: grid only)."""
grid = "grid"
kanban = "kanban"
gantt = "gantt"
gallery = "gallery"
form = "form"
class RecalcStatus(str, Enum):
"""Status of an asynchronous formula recalculation task."""
pending = "pending"
calculating = "calculating"
done = "done"
error = "error"
class Table(BaseModel):
"""A bitable table — collection of fields and records."""
model_config = ConfigDict(from_attributes=True)
id: str
name: str
description: str = ""
primary_key_field_id: str | None = None
owner_user_id: str | None = None
created_at: datetime = PydanticField(default_factory=_utcnow)
updated_at: datetime = PydanticField(default_factory=_utcnow)
class Field(BaseModel):
"""A column definition in a bitable table.
``config`` varies by ``field_type``:
- select/multiselect: ``{"options": [{"label": "...", "value": "..."}]}``
- formula: ``{"formula_expr": "=SUM({field_abc})"}``
- lookup: ``{"lookup_target": {"table_id": "...", "field_id": "...", "filter_field_id": "...", "filter_value": "..."}}``
"""
model_config = ConfigDict(from_attributes=True)
id: str
table_id: str
name: str
field_type: FieldType
config: dict[str, Any] = PydanticField(default_factory=dict)
owner: FieldOwner = FieldOwner.user
created_at: datetime = PydanticField(default_factory=_utcnow)
class Record(BaseModel):
"""A row in a bitable table. ``values`` maps field_id → value."""
model_config = ConfigDict(from_attributes=True)
id: str
table_id: str
values: dict[str, Any] = PydanticField(default_factory=dict)
created_at: datetime = PydanticField(default_factory=_utcnow)
updated_at: datetime = PydanticField(default_factory=_utcnow)
class View(BaseModel):
"""A saved view of a table (filters, sorts, hidden fields)."""
model_config = ConfigDict(from_attributes=True)
id: str
table_id: str
name: str
view_type: ViewType = ViewType.grid
config: dict[str, Any] = PydanticField(default_factory=dict)
created_at: datetime = PydanticField(default_factory=_utcnow)
class RecalcTask(BaseModel):
"""An asynchronous formula recalculation task."""
model_config = ConfigDict(from_attributes=True)
id: str
table_id: str
record_id: str
field_id: str
status: RecalcStatus = RecalcStatus.pending
error_message: str | None = None
queued_at: datetime = PydanticField(default_factory=_utcnow)
completed_at: datetime | None = None

@ -0,0 +1,266 @@
"""Async recalc worker for formula fields.
Consumes recalc tasks from the queue, evaluates formulas, and writes results
back to records. Supports crash recovery (resets stale ``calculating`` tasks
on startup) and graceful shutdown.
Lifecycle (managed by app.py lifespan):
worker = RecalcWorker(db, service)
await worker.start() # starts background task + crash recovery
...
await worker.stop() # waits for shutdown
The worker runs as an asyncio task, polling the queue every ``poll_interval``
seconds. Each task is processed in its own transaction.
"""
from __future__ import annotations
import asyncio
import logging
from typing import Any
from agentkit.bitable.db import BitableDB
from agentkit.bitable.formula.engine import FormulaEngine
from agentkit.bitable.models import FieldType, RecalcStatus
from agentkit.bitable.repository import BitableRepository
from agentkit.bitable.service import BitableService
logger = logging.getLogger(__name__)
_DEFAULT_POLL_INTERVAL = 0.5 # seconds between queue polls
_DEFAULT_REAPER_INTERVAL = 300 # 5 minutes
_DEFAULT_STALE_THRESHOLD = 600 # 10 minutes
class RecalcWorker:
"""Background worker that processes formula recalc tasks.
Usage::
worker = RecalcWorker(db, service)
await worker.start()
# ... worker runs in background ...
await worker.stop()
"""
def __init__(
self,
db: BitableDB,
service: BitableService,
poll_interval: float = _DEFAULT_POLL_INTERVAL,
reaper_interval: float = _DEFAULT_REAPER_INTERVAL,
stale_threshold: float = _DEFAULT_STALE_THRESHOLD,
) -> None:
self._db = db
self._service = service
self._repo = BitableRepository(db)
self._poll_interval = poll_interval
self._reaper_interval = reaper_interval
self._stale_threshold = stale_threshold
self._task: asyncio.Task[None] | None = None
self._reaper_task: asyncio.Task[None] | None = None
self._stop_event = asyncio.Event()
# Per-table formula engines (cached, rebuilt when fields change)
self._engines: dict[str, FormulaEngine] = {}
async def start(self) -> None:
"""Start the worker. Performs crash recovery first."""
# Crash recovery: reset stale 'calculating' tasks to 'pending'.
# On startup, all calculating tasks are stale (worker was down).
# Use threshold=0 to reset all calculating tasks immediately.
reset_count = await self._repo.reset_stale_recalc_tasks(stale_threshold=0.0)
if reset_count > 0:
logger.info("RecalcWorker: reset %d stale tasks to pending", reset_count)
self._stop_event.clear()
self._task = asyncio.create_task(self._run(), name="recalc-worker")
self._reaper_task = asyncio.create_task(self._run_reaper(), name="recalc-reaper")
logger.info("RecalcWorker started")
async def stop(self) -> None:
"""Gracefully stop the worker."""
self._stop_event.set()
if self._task is not None:
self._task.cancel()
try:
await self._task
except asyncio.CancelledError:
pass
self._task = None
if self._reaper_task is not None:
self._reaper_task.cancel()
try:
await self._reaper_task
except asyncio.CancelledError:
pass
self._reaper_task = None
logger.info("RecalcWorker stopped")
async def _run(self) -> None:
"""Main worker loop — poll queue, process tasks."""
while not self._stop_event.is_set():
try:
# Atomic claim (P1 #6): tasks are marked 'calculating' in the
# same transaction, so concurrent workers never grab the same task.
tasks = await self._repo.claim_recalc_tasks(limit=10)
if not tasks:
await asyncio.sleep(self._poll_interval)
continue
# P1 #7: sort tasks by topological order so formula-to-formula
# dependencies resolve correctly within a batch. Tasks for the
# same record are ordered so that if B depends on A, A is
# processed first (A's result is written back before B reads it).
tasks = await self._sort_by_topological_order(tasks)
for task in tasks:
if self._stop_event.is_set():
break
await self.process_task(task)
except asyncio.CancelledError:
break
except Exception:
logger.exception("RecalcWorker error in main loop")
await asyncio.sleep(self._poll_interval)
async def _sort_by_topological_order(self, tasks: list[Any]) -> list[Any]:
"""Sort claimed tasks so dependencies are processed first (P1 #7).
Groups tasks by table_id, builds (or reuses) the engine to get the
topological order, and assigns each task a sort key of
``(table_id, record_id, topo_index)``. Tasks for fields not in the
DAG get a large sentinel index (``1 << 30``), so they are processed last.
"""
if len(tasks) <= 1:
return tasks
# Build topo index per table: {table_id: {field_id: position}}
topo_index: dict[str, dict[str, int]] = {}
table_ids = {t.table_id for t in tasks}
for tid in table_ids:
engine = await self._get_or_build_engine(tid)
if engine is None:
topo_index[tid] = {}
continue
order = engine.topological_order()
topo_index[tid] = {fid: i for i, fid in enumerate(order)}
def _key(t: Any) -> tuple[str, str, int]:
idx = topo_index.get(t.table_id, {}).get(t.field_id, 1 << 30)
return (t.table_id, t.record_id, idx)
return sorted(tasks, key=_key)
async def _run_reaper(self) -> None:
"""Reaper loop — reset stale calculating tasks periodically.
Only resets tasks older than ``stale_threshold`` (P1 #10), so active
tasks being processed by a live worker are not interrupted.
"""
while not self._stop_event.is_set():
try:
await asyncio.sleep(self._reaper_interval)
count = await self._repo.reset_stale_recalc_tasks(
stale_threshold=self._stale_threshold
)
if count > 0:
logger.info(
"RecalcWorker reaper: reset %d stale tasks (threshold=%ss)",
count,
self._stale_threshold,
)
except asyncio.CancelledError:
break
except Exception:
logger.exception("RecalcWorker reaper error")
async def process_task(self, task: Any) -> None:
"""Process a single recalc task: evaluate formula → write result.
The task is expected to already be in ``calculating`` status when
called from the worker loop (atomic claim sets it). When called
synchronously via ``service.process_recalc_task``, this method
marks it calculating first (idempotent re-marking is harmless).
"""
# Idempotent: mark calculating (no-op if already calculating via claim).
await self._repo.update_recalc_status(task.id, RecalcStatus.calculating)
try:
field = await self._repo.get_field(task.field_id)
if field is None or field.field_type != FieldType.formula:
await self._repo.update_recalc_status(
task.id, RecalcStatus.error, "Field not found or not a formula"
)
return
formula_expr = field.config.get("formula_expr", "")
if not formula_expr:
await self._repo.update_recalc_status(
task.id, RecalcStatus.error, "No formula_expr in field config"
)
return
engine = await self._get_or_build_engine(task.table_id)
if engine is None:
await self._repo.update_recalc_status(
task.id, RecalcStatus.error, "No formula fields in table"
)
return
record = await self._repo.get_record(task.record_id)
if record is None:
await self._repo.update_recalc_status(
task.id, RecalcStatus.error, "Record not found"
)
return
deps = engine.get_dependencies(task.field_id)
column_values: dict[str, list[Any]] = {}
for dep_field_id in deps:
column_values[dep_field_id] = await self._repo.get_column_values(
task.table_id, dep_field_id
)
result = engine.evaluate(
task.field_id,
row_values=record.values,
column_values=column_values,
)
await self._repo.set_formula_value(task.record_id, task.field_id, result)
await self._repo.update_recalc_status(task.id, RecalcStatus.done)
except Exception as e:
logger.exception("RecalcWorker: error processing task %s", task.id)
await self._repo.update_recalc_status(task.id, RecalcStatus.error, str(e)[:500])
async def _get_or_build_engine(self, table_id: str) -> FormulaEngine | None:
"""Get or build a FormulaEngine for a table.
Returns None if the table has no formula fields.
"""
if table_id in self._engines:
return self._engines[table_id]
fields = await self._repo.list_fields(table_id)
formula_fields = [f for f in fields if f.field_type == FieldType.formula]
if not formula_fields:
return None
engine = FormulaEngine()
for f in formula_fields:
formula_expr = f.config.get("formula_expr", "")
if formula_expr:
try:
engine.add_formula(f.id, formula_expr)
except Exception:
logger.exception("RecalcWorker: failed to register formula for field %s", f.id)
self._engines[table_id] = engine
return engine
def invalidate_engine(self, table_id: str) -> None:
"""Invalidate the cached formula engine for a table (call when fields change)."""
self._engines.pop(table_id, None)
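
The ordering used by `_sort_by_topological_order` can be sketched without the engine. This example assumes a single table and a made-up dependency graph, and it substitutes the standard-library `graphlib.TopologicalSorter` for the project's own DAG code; the `Task` tuple and field names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import NamedTuple


class Task(NamedTuple):
    """Hypothetical stand-in for a claimed recalc task."""

    table_id: str
    record_id: str
    field_id: str


# Assumed formula dependencies: field -> fields it reads.
deps = {"subtotal": set(), "tax": {"subtotal"}, "total": {"subtotal", "tax"}}
order = list(TopologicalSorter(deps).static_order())
position = {field_id: i for i, field_id in enumerate(order)}

UNKNOWN = 1 << 30  # sentinel for fields that are not in the DAG


def sort_key(task: Task) -> tuple[str, str, int]:
    # Group by table and record, then order fields by dependency depth.
    return (task.table_id, task.record_id, position.get(task.field_id, UNKNOWN))


tasks = [
    Task("t1", "r1", "total"),
    Task("t1", "r1", "orphan"),
    Task("t1", "r1", "subtotal"),
    Task("t1", "r2", "tax"),
    Task("t1", "r1", "tax"),
]
ordered = sorted(tasks, key=sort_key)
```

Within record `r1` the fields come out as `subtotal`, `tax`, `total`, then the unknown `orphan`, so each formula is evaluated after the fields it reads.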

@ -0,0 +1,803 @@
"""Data access layer for the bitable subsystem.
All PostgreSQL operations go through this repository. The service layer
(:mod:`agentkit.bitable.service`) calls these methods; routes never
access the repository directly.
"""
from __future__ import annotations
import logging
import re
from datetime import datetime, timedelta, timezone
from typing import Any
from sqlalchemy import delete, func, insert, select, text, update
from sqlalchemy.dialects.postgresql import insert as pg_insert
from agentkit.bitable.db import (
BitableDB,
FieldModel,
RecordModel,
RecalcQueueModel,
TableModel,
ViewModel,
_uuid_str,
)
from agentkit.bitable.models import (
Field,
FieldOwner,
FieldType,
Record,
RecalcStatus,
RecalcTask,
Table,
View,
ViewType,
)
logger = logging.getLogger(__name__)
class BitableRepository:
"""Async repository for bitable CRUD operations.
Usage::
repo = BitableRepository(db)
table = await repo.create_table(name="My Table")
"""
def __init__(self, db: BitableDB) -> None:
self._db = db
@property
def _session_factory(self):
return self._db.session_factory
# ── Tables ──────────────────────────────────────────────
async def create_table(
self,
name: str,
description: str = "",
primary_key_field_id: str | None = None,
owner_user_id: str | None = None,
) -> Table:
"""Create a new bitable table."""
async with self._session_factory() as session:
stmt = (
insert(TableModel)
.values(
id=_uuid_str(),
name=name,
description=description,
primary_key_field_id=primary_key_field_id,
owner_user_id=owner_user_id,
)
.returning(TableModel)
)
result = await session.execute(stmt)
entity = result.scalar_one()
await session.commit()
return Table.model_validate(entity)
async def get_table(self, table_id: str) -> Table | None:
"""Get a table by ID."""
async with self._session_factory() as session:
stmt = select(TableModel).where(TableModel.id == table_id)
result = await session.execute(stmt)
entity = result.scalars().first()
return Table.model_validate(entity) if entity else None
async def list_tables(self, owner_user_id: str | None = None) -> list[Table]:
"""List all tables (optionally filtered by owner)."""
async with self._session_factory() as session:
stmt = select(TableModel)
if owner_user_id:
stmt = stmt.where(TableModel.owner_user_id == owner_user_id)
stmt = stmt.order_by(TableModel.created_at.desc())
result = await session.execute(stmt)
return [Table.model_validate(e) for e in result.scalars().all()]
async def update_table(self, table_id: str, **kwargs: Any) -> Table | None:
"""Update a table's attributes."""
async with self._session_factory() as session:
stmt = (
update(TableModel)
.where(TableModel.id == table_id)
.values(**kwargs)
.returning(TableModel)
)
result = await session.execute(stmt)
entity = result.scalars().first()
await session.commit()
return Table.model_validate(entity) if entity else None
    async def create_pk_unique_index(self, table_id: str, pk_field_id: str) -> None:
        """Create a unique expression index on the PK field for this table.

        Enforces that ``values->>pk_field_id`` is unique per table. Required
        for correct upsert semantics (KTD8): without it, duplicate PK values
        can be silently inserted and ``find_record_by_pk`` does full scans.
        """
        # ponytail: both IDs are system UUIDs. Validate them here because they
        # are interpolated into the DDL (index name, expression, predicate).
        # fullmatch avoids the trailing-newline loophole of ``$`` with match.
        uuid_pattern = r"[a-f0-9-]{36}"
        if not re.fullmatch(uuid_pattern, pk_field_id):
            raise ValueError(f"Invalid pk_field_id format: {pk_field_id}")
        if not re.fullmatch(uuid_pattern, table_id):
            raise ValueError(f"Invalid table_id format: {table_id}")
        index_name = f"ix_bitable_records_pk_{table_id.replace('-', '_')}"
        # PostgreSQL generally rejects bind parameters in utility statements
        # such as CREATE INDEX, so the validated pk_field_id is inlined as a
        # literal in the partial-index predicate.
        sql = text(
            f"CREATE UNIQUE INDEX IF NOT EXISTS {index_name} "
            f"ON bitable.bitable_records (table_id, (values->>'{pk_field_id}')) "
            f"WHERE values ? '{pk_field_id}'"
        )
        async with self._session_factory() as session:
            await session.execute(sql)
            await session.commit()
async def delete_table(self, table_id: str) -> bool:
"""Delete a table and all its fields, records, views, and recalc tasks."""
async with self._session_factory() as session:
await session.execute(delete(FieldModel).where(FieldModel.table_id == table_id))
await session.execute(delete(RecordModel).where(RecordModel.table_id == table_id))
await session.execute(delete(ViewModel).where(ViewModel.table_id == table_id))
await session.execute(
delete(RecalcQueueModel).where(RecalcQueueModel.table_id == table_id)
)
result = await session.execute(delete(TableModel).where(TableModel.id == table_id))
await session.commit()
return result.rowcount > 0
# ── Fields ──────────────────────────────────────────────
async def create_field(
self,
table_id: str,
name: str,
field_type: FieldType,
config: dict[str, Any] | None = None,
owner: FieldOwner = FieldOwner.user,
) -> Field:
"""Create a new field in a table."""
async with self._session_factory() as session:
stmt = (
insert(FieldModel)
.values(
id=_uuid_str(),
table_id=table_id,
name=name,
field_type=field_type.value,
config=config or {},
owner=owner.value,
)
.returning(FieldModel)
)
result = await session.execute(stmt)
entity = result.scalar_one()
await session.commit()
return Field.model_validate(entity)
async def get_field(self, field_id: str) -> Field | None:
"""Get a field by ID."""
async with self._session_factory() as session:
stmt = select(FieldModel).where(FieldModel.id == field_id)
result = await session.execute(stmt)
entity = result.scalars().first()
return Field.model_validate(entity) if entity else None
async def list_fields(self, table_id: str) -> list[Field]:
"""List all fields in a table."""
async with self._session_factory() as session:
stmt = (
select(FieldModel)
.where(FieldModel.table_id == table_id)
.order_by(FieldModel.created_at)
)
result = await session.execute(stmt)
return [Field.model_validate(e) for e in result.scalars().all()]
async def update_field(self, field_id: str, **kwargs: Any) -> Field | None:
"""Update a field's attributes."""
async with self._session_factory() as session:
stmt = (
update(FieldModel)
.where(FieldModel.id == field_id)
.values(**kwargs)
.returning(FieldModel)
)
result = await session.execute(stmt)
entity = result.scalars().first()
await session.commit()
return Field.model_validate(entity) if entity else None
async def delete_field(self, field_id: str) -> bool:
"""Delete a field."""
async with self._session_factory() as session:
result = await session.execute(delete(FieldModel).where(FieldModel.id == field_id))
await session.commit()
return result.rowcount > 0
# ── Records ─────────────────────────────────────────────
async def create_record(self, table_id: str, values: dict[str, Any] | None = None) -> Record:
"""Create a new record."""
async with self._session_factory() as session:
stmt = (
insert(RecordModel)
.values(
id=_uuid_str(),
table_id=table_id,
values=values or {},
)
.returning(RecordModel)
)
result = await session.execute(stmt)
entity = result.scalar_one()
await session.commit()
return Record.model_validate(entity)
async def create_records_batch(
self, table_id: str, records_values: list[dict[str, Any]]
) -> list[Record]:
"""Batch-insert multiple records (P2 #19: eliminates per-record INSERT).
Returns the created records in insertion order.
"""
if not records_values:
return []
async with self._session_factory() as session:
rows = [
{"id": _uuid_str(), "table_id": table_id, "values": vals} for vals in records_values
]
stmt = insert(RecordModel).values(rows).returning(RecordModel)
result = await session.execute(stmt)
entities = result.scalars().all()
await session.commit()
return [Record.model_validate(e) for e in entities]
async def get_record(self, record_id: str) -> Record | None:
"""Get a record by ID."""
async with self._session_factory() as session:
stmt = select(RecordModel).where(RecordModel.id == record_id)
result = await session.execute(stmt)
entity = result.scalars().first()
return Record.model_validate(entity) if entity else None
async def list_records(
self,
table_id: str,
cursor: str | None = None,
limit: int = 50,
) -> tuple[list[Record], str | None]:
"""List records in a table with cursor-based pagination.
Returns ``(records, next_cursor)``. ``next_cursor`` is ``None`` when
there are no more records.
"""
async with self._session_factory() as session:
stmt = (
select(RecordModel)
.where(RecordModel.table_id == table_id)
.order_by(RecordModel.id)
.limit(limit + 1) # fetch one extra to check if there's a next page
)
if cursor:
stmt = stmt.where(RecordModel.id > cursor)
result = await session.execute(stmt)
entities = result.scalars().all()
if len(entities) > limit:
next_cursor = entities[limit - 1].id
entities = entities[:limit]
else:
next_cursor = None
return [Record.model_validate(e) for e in entities], next_cursor
async def update_record_values(self, record_id: str, values: dict[str, Any]) -> Record | None:
"""Update a record's values (full replace)."""
async with self._session_factory() as session:
stmt = (
update(RecordModel)
.where(RecordModel.id == record_id)
.values(values=values)
.returning(RecordModel)
)
result = await session.execute(stmt)
entity = result.scalars().first()
await session.commit()
return Record.model_validate(entity) if entity else None
async def delete_record(self, record_id: str) -> bool:
"""Delete a record."""
async with self._session_factory() as session:
result = await session.execute(delete(RecordModel).where(RecordModel.id == record_id))
await session.commit()
return result.rowcount > 0
async def delete_records_by_table(self, table_id: str) -> int:
"""Delete all records in a table. Returns count deleted."""
async with self._session_factory() as session:
result = await session.execute(
delete(RecordModel).where(RecordModel.table_id == table_id)
)
await session.commit()
return result.rowcount
# ── Views ───────────────────────────────────────────────
async def create_view(
self,
table_id: str,
name: str,
view_type: ViewType = ViewType.grid,
config: dict[str, Any] | None = None,
) -> View:
"""Create a new view."""
async with self._session_factory() as session:
stmt = (
insert(ViewModel)
.values(
id=_uuid_str(),
table_id=table_id,
name=name,
view_type=view_type.value,
config=config or {},
)
.returning(ViewModel)
)
result = await session.execute(stmt)
entity = result.scalar_one()
await session.commit()
return View.model_validate(entity)
async def list_views(self, table_id: str) -> list[View]:
"""List all views in a table."""
async with self._session_factory() as session:
stmt = (
select(ViewModel)
.where(ViewModel.table_id == table_id)
.order_by(ViewModel.created_at)
)
result = await session.execute(stmt)
return [View.model_validate(e) for e in result.scalars().all()]
async def update_view(self, view_id: str, **kwargs: Any) -> View | None:
"""Update a view's attributes."""
async with self._session_factory() as session:
stmt = (
update(ViewModel)
.where(ViewModel.id == view_id)
.values(**kwargs)
.returning(ViewModel)
)
result = await session.execute(stmt)
entity = result.scalars().first()
await session.commit()
return View.model_validate(entity) if entity else None
# ── Recalc Queue ────────────────────────────────────────
async def enqueue_recalc(
self, table_id: str, record_id: str, field_id: str
) -> RecalcTask | None:
"""Enqueue a recalc task. Returns None if duplicate (ON CONFLICT DO NOTHING)."""
async with self._session_factory() as session:
stmt = (
pg_insert(RecalcQueueModel)
.values(
id=_uuid_str(),
table_id=table_id,
record_id=record_id,
field_id=field_id,
status=RecalcStatus.pending.value,
)
.on_conflict_do_nothing(constraint="uq_recalc_record_field")
.returning(RecalcQueueModel)
)
result = await session.execute(stmt)
entity = result.scalars().first()
await session.commit()
return RecalcTask.model_validate(entity) if entity else None
async def get_pending_recalc_tasks(self, limit: int = 50) -> list[RecalcTask]:
"""Get pending recalc tasks ordered by queue time.
Deprecated for concurrent workers: use :meth:`claim_recalc_tasks`
for atomic claim. Kept for introspection/tests.
"""
async with self._session_factory() as session:
stmt = (
select(RecalcQueueModel)
.where(RecalcQueueModel.status == RecalcStatus.pending.value)
.order_by(RecalcQueueModel.queued_at)
.limit(limit)
)
result = await session.execute(stmt)
return [RecalcTask.model_validate(e) for e in result.scalars().all()]
async def claim_recalc_tasks(self, limit: int = 10) -> list[RecalcTask]:
"""Atomically claim up to ``limit`` pending tasks (P1 #6).
Uses ``FOR UPDATE SKIP LOCKED`` so multiple workers never grab the
same task. Each claimed task is set to ``calculating`` in the same
transaction and returned. Callers process the returned tasks and
then call :meth:`update_recalc_status` with ``done``/``error``.
"""
# ponytail: PostgreSQL-specific FOR UPDATE SKIP LOCKED. Ceiling: this
# binds the repository to PG; upgrade path is a dialect check.
async with self._session_factory() as session:
# Subselect pending tasks with row-level locks, skipping locked rows.
subq = (
select(RecalcQueueModel.id)
.where(RecalcQueueModel.status == RecalcStatus.pending.value)
.order_by(RecalcQueueModel.queued_at)
.limit(limit)
.with_for_update(skip_locked=True)
).subquery()
stmt = (
update(RecalcQueueModel)
.where(RecalcQueueModel.id.in_(select(subq.c.id)))
.values(status=RecalcStatus.calculating.value)
.returning(RecalcQueueModel)
)
result = await session.execute(stmt)
await session.commit()
return [RecalcTask.model_validate(e) for e in result.scalars().all()]
async def update_recalc_status(
self,
task_id: str,
status: RecalcStatus,
error_message: str | None = None,
) -> None:
"""Update a recalc task's status."""
async with self._session_factory() as session:
kwargs: dict[str, Any] = {"status": status.value}
if error_message is not None:
kwargs["error_message"] = error_message
if status in (RecalcStatus.done, RecalcStatus.error):
kwargs["completed_at"] = func.now()
stmt = update(RecalcQueueModel).where(RecalcQueueModel.id == task_id).values(**kwargs)
await session.execute(stmt)
await session.commit()
async def reset_stale_recalc_tasks(self, stale_threshold: float = 600.0) -> int:
"""Reset 'calculating' tasks older than ``stale_threshold`` seconds back to 'pending'.
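Only tasks whose ``queued_at`` is older than ``now - stale_threshold``
are reset, which is meant to avoid resetting tasks that a live worker is
currently processing (crash recovery, not active-task interruption).
Note that the cutoff is compared against ``queued_at``, not a claim time,
so a task that waited longer than the threshold before being claimed is
eligible for reset immediately. Returns the number of rows reset.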
"""
# ponytail: stale_threshold is the max age a calculating task should
# reach. 600s default = 10min; a healthy worker finishes in milliseconds.
cutoff = datetime.now(timezone.utc) - timedelta(seconds=stale_threshold)
async with self._session_factory() as session:
stmt = (
update(RecalcQueueModel)
.where(RecalcQueueModel.status == RecalcStatus.calculating.value)
.where(RecalcQueueModel.queued_at < cutoff)
.values(status=RecalcStatus.pending.value)
)
result = await session.execute(stmt)
await session.commit()
return result.rowcount
# ── Upsert (KTD8: jsonb_set per agent field) ───────────
async def find_record_by_pk(
self, table_id: str, pk_field_id: str, pk_value: str
) -> RecordModel | None:
"""Find a record by primary key value (JSONB key lookup)."""
async with self._session_factory() as session:
# ponytail: pk_field_id and pk_value are both bound parameters here;
# nothing is interpolated into the SQL string.
sql = text(
"SELECT * FROM bitable.bitable_records "
"WHERE table_id = :table_id AND values->>:pk_path = :pk_value LIMIT 1"
)
result = await session.execute(
sql, {"table_id": table_id, "pk_path": pk_field_id, "pk_value": pk_value}
)
row = result.fetchone()
return RecordModel(**row._mapping) if row else None
async def find_records_by_pk_batch(
self, table_id: str, pk_field_id: str, pk_values: list[str]
) -> dict[str, RecordModel]:
"""Batch-find records by PK values (P1 #14: eliminates N+1 queries).
Returns a dict mapping pk_value (str) -> RecordModel.
"""
if not pk_values:
return {}
async with self._session_factory() as session:
# ponytail: pk_field_id is bound as :pk_path (not interpolated).
# pk_values are parameterized via ANY(:pk_values).
sql = text(
"SELECT * FROM bitable.bitable_records "
"WHERE table_id = :table_id AND values->>:pk_path = ANY(:pk_values)"
)
result = await session.execute(
sql,
{
"table_id": table_id,
"pk_path": pk_field_id,
"pk_values": pk_values,
},
)
rows = result.fetchall()
result_map: dict[str, RecordModel] = {}
for r in rows:
values = r._mapping["values"]
if isinstance(values, str):
import json as _json
values = _json.loads(values)
pk_val = values.get(pk_field_id) if isinstance(values, dict) else None
if pk_val is not None:
result_map[str(pk_val)] = RecordModel(**r._mapping)
return result_map
async def upsert_record_agent_fields(
self, record_id: str, agent_field_values: dict[str, Any]
) -> None:
"""Update agent-owned fields using jsonb_set (KTD8).
Chains jsonb_set calls so user-owned fields are never touched.
"""
if not agent_field_values:
return
import json
async with self._session_factory() as session:
# Build nested jsonb_set: jsonb_set(jsonb_set(values, '{f1}', CAST(:v0 AS jsonb), true), ...)
# ponytail: field_ids are system UUIDs, safe to interpolate into path literals.
# Use CAST(:param AS jsonb) instead of :param::jsonb — asyncpg dialect
# misparses the `::` as part of the param name.
inner = "values"
params: dict[str, Any] = {"record_id": record_id}
for i, (field_id, value) in enumerate(agent_field_values.items()):
param_key = f"v{i}"
inner = f"jsonb_set({inner}, '{{{field_id}}}', CAST(:{param_key} AS jsonb), true)"
params[param_key] = json.dumps(value)
sql = text(f"UPDATE bitable.bitable_records SET values = {inner} WHERE id = :record_id")
await session.execute(sql, params)
await session.commit()
# ── View-filtered record listing (KTD9) ────────────────
async def list_records_filtered(
self,
table_id: str,
filters: list[dict[str, Any]] | None = None,
sorts: list[dict[str, Any]] | None = None,
cursor: str | None = None,
limit: int = 50,
) -> tuple[list[Record], str | None]:
"""List records with view filters/sorts + cursor pagination.
filters: [{"field_id": "...", "op": "eq|ne|gt|lt|gte|lte|contains|is_empty", "value": ...}]
sorts: [{"field_id": "...", "direction": "asc|desc"}]
Cursor is a base64-encoded JSON of ``{"id": ..., "sv": [sort_val, ...]}``.
When no sorts are given, the cursor degrades to id-only pagination.
With sorts, a row-value comparison ``(sort_cols..., id) > (cursor_vals...)``
ensures stable pagination across non-id orderings (P1 #11).
"""
import base64
import json as _json
async with self._session_factory() as session:
# Build raw SQL with JSONB filter/sort translation.
# ponytail: field_ids in filters/sorts are system UUIDs (validated by service layer).
where_clauses = ["table_id = :table_id"]
params: dict[str, Any] = {"table_id": table_id}
if filters:
for i, f in enumerate(filters):
fid = f["field_id"]
op = f["op"]
val = f.get("value")
param_name = f"fv{i}"
if op == "is_empty":
# Parenthesize the whole OR so it cannot escape the AND-joined WHERE clause.
where_clauses.append(f"(values->>'{fid}' IS NULL OR values->>'{fid}' = '')")
elif op == "eq":
where_clauses.append(f"values->>'{fid}' = :{param_name}")
params[param_name] = str(val)
elif op == "ne":
where_clauses.append(f"values->>'{fid}' != :{param_name}")
params[param_name] = str(val)
elif op == "contains":
where_clauses.append(f"values->>'{fid}' LIKE :{param_name}")
params[param_name] = f"%{val}%"
elif op in ("gt", "lt", "gte", "lte"):
op_map = {"gt": ">", "lt": "<", "gte": ">=", "lte": "<="}
where_clauses.append(
f"CAST(values->>'{fid}' AS NUMERIC) {op_map[op]} :{param_name}"
)
params[param_name] = val
# Build sort expressions and cursor (P1 #11: composite cursor).
sort_exprs: list[str] = [] # SQL expressions for ORDER BY
sort_dirs: list[str] = [] # "ASC" or "DESC" per sort column
if sorts:
for s in sorts:
fid = s["field_id"]
direction = "ASC" if s.get("direction", "asc") == "asc" else "DESC"
sort_exprs.append(f"values->>'{fid}'")
sort_dirs.append(direction)
# Always append id as the final tiebreaker for stable ordering.
sort_exprs.append("id")
sort_dirs.append(
"ASC" if not sorts or sorts[0].get("direction", "asc") == "asc" else "DESC"
)
order_by = ", ".join(f"{expr} {dir_}" for expr, dir_ in zip(sort_exprs, sort_dirs))
# Cursor: composite row-value comparison.
if cursor:
try:
decoded = _json.loads(base64.b64decode(cursor).decode("utf-8"))
cursor_id = decoded["id"]
cursor_sort_vals = decoded.get("sv", [])
except (ValueError, KeyError, TypeError) as e:
raise ValueError(f"Invalid cursor: {e}") from e
if not cursor_sort_vals:
# No sorts — simple id comparison.
where_clauses.append("id > :cursor_id")
params["cursor_id"] = cursor_id
else:
# Row-value comparison: (col1, col2, ..., id) > (v1, v2, ..., cursor_id)
# PostgreSQL row values: (a,b) > (x,y) means a>x OR (a=x AND b>y).
# For DESC we want a<x OR (a=x AND b<y), so we use < instead of >;
# the ORDER BY already has DESC.
# Build the LHS and RHS of the row comparison.
lhs_parts: list[str] = []
rhs_params: list[str] = []
# sort_exprs ends with the id tiebreaker, which has no entry in
# cursor_sort_vals, so skip it here and add it exactly once below.
for i, expr in enumerate(sort_exprs[:-1]):
lhs_parts.append(expr)
param_name = f"csv{i}"
rhs_params.append(f":{param_name}")
params[param_name] = cursor_sort_vals[i]
# Last element is id.
lhs_parts.append("id")
params["cursor_id"] = cursor_id
rhs_params.append(":cursor_id")
# Determine comparison direction: if first sort is DESC, use <.
# Mixed-direction sorts are rare; for correctness with mixed
# directions, we'd need per-column operators. For now, use the
# first sort's direction (covers the common single-sort case).
# ponytail: ceiling — mixed ASC/DESC sorts use first direction only.
comp_op = "<" if sort_dirs[0] == "DESC" else ">"
lhs = f"({', '.join(lhs_parts)})"
rhs = f"({', '.join(rhs_params)})"
where_clauses.append(f"{lhs} {comp_op} {rhs}")
where_sql = " AND ".join(where_clauses)
sql = text(
f"SELECT * FROM bitable.bitable_records WHERE {where_sql} "
f"ORDER BY {order_by} LIMIT :limit"
)
params["limit"] = limit + 1
result = await session.execute(sql, params)
rows = result.fetchall()
if len(rows) > limit:
last_row = rows[limit - 1]
last_mapping = last_row._mapping
# Build composite cursor from sort values + id.
# Sort values are extracted as text to match `values->>'fid'` expressions.
sv: list[Any] = []
last_values = last_mapping.get("values")
if isinstance(last_values, str):
# asyncpg may return JSONB as str in raw text() queries.
try:
last_values = _json.loads(last_values)
except (ValueError, TypeError):
last_values = {}
if sorts and isinstance(last_values, dict):
for s in sorts:
fid = s["field_id"]
val = last_values.get(fid)
sv.append(str(val) if val is not None else None)
cursor_data = {"id": last_mapping["id"], "sv": sv}
next_cursor = base64.b64encode(_json.dumps(cursor_data).encode("utf-8")).decode(
"utf-8"
)
rows = rows[:limit]
else:
next_cursor = None
return [Record.model_validate(RecordModel(**r._mapping)) for r in rows], next_cursor
# ── Field deletion dependency queries ──────────────────
async def find_formula_fields_referencing(
self, table_id: str, target_field_id: str
) -> list[Field]:
"""Find formula fields that reference target_field_id in their formula_expr."""
async with self._session_factory() as session:
# JSONB config->>'formula_expr' LIKE '%{target_field_id}%'
sql = text(
"SELECT * FROM bitable.bitable_fields "
"WHERE table_id = :table_id AND field_type = 'formula' "
"AND config->>'formula_expr' LIKE :pattern"
)
result = await session.execute(
sql,
{"table_id": table_id, "pattern": f"%{target_field_id}%"},
)
return [Field.model_validate(FieldModel(**r._mapping)) for r in result.fetchall()]
async def find_views_referencing_field(self, table_id: str, field_id: str) -> list[View]:
"""Find views whose config (filters/sorts/hidden_fields) references field_id."""
async with self._session_factory() as session:
# Check if field_id appears anywhere in config JSONB
sql = text(
"SELECT * FROM bitable.bitable_views "
"WHERE table_id = :table_id AND config::text LIKE :pattern"
)
result = await session.execute(
sql,
{"table_id": table_id, "pattern": f"%{field_id}%"},
)
return [View.model_validate(ViewModel(**r._mapping)) for r in result.fetchall()]
async def remove_field_from_records(self, table_id: str, field_id: str) -> int:
"""Remove a field key from all records' JSONB values (force delete cleanup)."""
async with self._session_factory() as session:
sql = text(
"UPDATE bitable.bitable_records SET values = values - :field_path "
"WHERE table_id = :table_id"
)
result = await session.execute(sql, {"field_path": field_id, "table_id": table_id})
await session.commit()
return result.rowcount
# ── Recalc support (U3) ────────────────────────────────
async def get_column_values(self, table_id: str, field_id: str) -> list[Any]:
"""Get all values for a field across all records in a table (for aggregates).
Returns a list of values (preserving order by record id). Missing values
are included as None so aggregate functions can skip them.
"""
async with self._session_factory() as session:
sql = text(
"SELECT values->:field_id AS val FROM bitable.bitable_records "
"WHERE table_id = :table_id ORDER BY id"
)
result = await session.execute(sql, {"field_id": field_id, "table_id": table_id})
return [row[0] for row in result.fetchall()]
async def set_formula_value(self, record_id: str, field_id: str, value: Any) -> None:
"""Set a single formula field value in a record's JSONB (jsonb_set)."""
import json
async with self._session_factory() as session:
# ponytail: field_id is a system UUID, safe to interpolate into path literal.
# Use CAST AS jsonb instead of ::jsonb (asyncpg dialect misparses ::).
sql = text(
f"UPDATE bitable.bitable_records "
f"SET values = jsonb_set(values, '{{{field_id}}}', CAST(:val AS jsonb), true) "
f"WHERE id = :record_id"
)
await session.execute(sql, {"val": json.dumps(value), "record_id": record_id})
await session.commit()
async def get_all_records(self, table_id: str) -> list[Record]:
"""Get all records in a table (for column value extraction in recalc)."""
async with self._session_factory() as session:
stmt = (
select(RecordModel).where(RecordModel.table_id == table_id).order_by(RecordModel.id)
)
result = await session.execute(stmt)
return [Record.model_validate(e) for e in result.scalars().all()]
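The composite cursor used by `list_records_filtered` can be illustrated in isolation. The following is a minimal sketch, not code from this commit: `encode_cursor`, `decode_cursor`, and `keyset_predicate` are hypothetical helper names. It shows the base64/JSON round trip of `{"id": ..., "sv": [...]}` and one possible way to lift the "first direction only" ceiling by expanding the row comparison into per-column operators. It assumes sort values are non-NULL; NULL handling would need extra care.

```python
import base64
import json
from typing import Any


def encode_cursor(record_id: str, sort_values: list[Any]) -> str:
    """Pack the last row's id and sort values into an opaque cursor string."""
    payload = {"id": record_id, "sv": sort_values}
    return base64.b64encode(json.dumps(payload).encode("utf-8")).decode("utf-8")


def decode_cursor(cursor: str) -> tuple[str, list[Any]]:
    """Inverse of encode_cursor; raises ValueError on malformed input."""
    try:
        decoded = json.loads(base64.b64decode(cursor).decode("utf-8"))
        return decoded["id"], decoded.get("sv", [])
    except (ValueError, KeyError, TypeError, AttributeError) as exc:
        raise ValueError(f"Invalid cursor: {exc}") from exc


def keyset_predicate(exprs: list[str], dirs: list[str], params: list[str]) -> str:
    """Expand a keyset comparison so each column uses its own operator.

    exprs, dirs, and params are parallel lists; the last entry is the id
    tiebreaker. Produces: (c1 op v1) OR (c1 = v1 AND c2 op v2) OR ...
    """
    branches = []
    for i, (expr, direction) in enumerate(zip(exprs, dirs)):
        op = ">" if direction == "ASC" else "<"
        # All earlier columns must tie before this column decides the order.
        equal_prefix = [f"{e} = {p}" for e, p in zip(exprs[:i], params[:i])]
        branches.append("(" + " AND ".join(equal_prefix + [f"{expr} {op} {params[i]}"]) + ")")
    return "(" + " OR ".join(branches) + ")"


if __name__ == "__main__":
    cur = encode_cursor("rec-2", ["10", None])
    print(decode_cursor(cur))
    print(keyset_predicate(["a", "id"], ["DESC", "ASC"], [":p0", ":cid"]))
```

The expanded predicate is longer than a single row-value comparison, but it stays correct when sort directions are mixed, at the cost of being harder for the planner to match to a single index.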


@ -0,0 +1,477 @@
"""Business logic layer for the bitable subsystem.
The service wraps :class:`BitableRepository` and adds business rules:
field ownership, upsert semantics, formula recalc triggering, etc.
Routes and CLI call this layer, never the repository directly.
"""
from __future__ import annotations
import logging
import os
from pathlib import Path
from typing import Any
from agentkit.bitable.db import BitableDB
from agentkit.bitable.models import (
Field,
FieldOwner,
FieldType,
Record,
RecalcTask,
Table,
View,
ViewType,
)
from agentkit.bitable.repository import BitableRepository
logger = logging.getLogger(__name__)
class FieldDependencyError(Exception):
"""Raised when deleting a field that has dependencies (formula refs, PK, views)."""
def __init__(self, message: str, dependencies: dict[str, Any]) -> None:
super().__init__(message)
self.dependencies = dependencies
class BitableService:
"""Bitable business logic service.
Usage::
service = BitableService(db)
table = await service.create_table(name="Orders")
"""
def __init__(self, db: BitableDB) -> None:
self._db = db
self._repo = BitableRepository(db)
self._recalc_worker: Any = None # RecalcWorker, set via set_recalc_worker
@property
def repo(self) -> BitableRepository:
return self._repo
def set_recalc_worker(self, worker: Any) -> None:
"""Register the long-lived RecalcWorker so field changes can invalidate its engine cache.
Called after both service and worker are constructed (breaks the
service<->worker construction cycle). When unset, field mutations
simply skip cache invalidation; the next worker start rebuilds engines.
"""
self._recalc_worker = worker
def _invalidate_engine_cache(self, table_id: str) -> None:
"""Invalidate the worker's cached formula engine for a table (P1 #5)."""
if self._recalc_worker is not None:
self._recalc_worker.invalidate_engine(table_id)
# ── Tables ──────────────────────────────────────────────
async def create_table(
self,
name: str,
description: str = "",
primary_key_field_id: str | None = None,
owner_user_id: str | None = None,
) -> Table:
"""Create a new bitable table. Creates PK unique index if PK is set."""
table = await self._repo.create_table(
name=name,
description=description,
primary_key_field_id=primary_key_field_id,
owner_user_id=owner_user_id,
)
if primary_key_field_id:
await self._repo.create_pk_unique_index(table.id, primary_key_field_id)
return table
async def get_table(self, table_id: str) -> Table | None:
return await self._repo.get_table(table_id)
async def list_tables(self, owner_user_id: str | None = None) -> list[Table]:
return await self._repo.list_tables(owner_user_id=owner_user_id)
async def update_table(self, table_id: str, **kwargs: Any) -> Table | None:
"""Update table attrs. Creates PK unique index if primary_key_field_id is set."""
table = await self._repo.update_table(table_id, **kwargs)
if table and kwargs.get("primary_key_field_id"):
await self._repo.create_pk_unique_index(table_id, kwargs["primary_key_field_id"])
return table
async def delete_table(self, table_id: str) -> bool:
return await self._repo.delete_table(table_id)
# ── Fields ──────────────────────────────────────────────
async def create_field(
self,
table_id: str,
name: str,
field_type: FieldType,
config: dict[str, Any] | None = None,
owner: FieldOwner = FieldOwner.user,
) -> Field:
"""Create a new field. U2 will add formula validation and DAG updates."""
field = await self._repo.create_field(
table_id=table_id,
name=name,
field_type=field_type,
config=config or {},
owner=owner,
)
# New formula field changes the table's DAG — invalidate cached engine (P1 #5).
if field_type == FieldType.formula:
self._invalidate_engine_cache(table_id)
return field
async def get_field(self, field_id: str) -> Field | None:
return await self._repo.get_field(field_id)
async def list_fields(self, table_id: str) -> list[Field]:
return await self._repo.list_fields(table_id)
async def update_field(self, field_id: str, **kwargs: Any) -> Field | None:
"""Update a field. U2 will add dependency checking."""
field = await self._repo.update_field(field_id, **kwargs)
if field is not None:
# Any field update (name, config, formula_expr) can affect the DAG (P1 #5).
self._invalidate_engine_cache(field.table_id)
return field
async def delete_field(self, field_id: str, force: bool = False) -> bool:
"""Delete a field with dependency checking.
Returns True if deleted. Raises :class:`FieldDependencyError` with
dependency info if blocked (use force=True to override with cascade cleanup).
"""
field = await self._repo.get_field(field_id)
if not field:
return False
# Check dependencies
deps: dict[str, Any] = {}
# 1. Is it a primary key field?
table = await self._repo.get_table(field.table_id)
if table and table.primary_key_field_id == field_id:
deps["is_primary_key"] = True
# 2. Formula fields referencing this field?
formula_deps = await self._repo.find_formula_fields_referencing(field.table_id, field_id)
if formula_deps:
deps["formula_fields"] = [{"id": f.id, "name": f.name} for f in formula_deps]
# 3. Views referencing this field?
view_deps = await self._repo.find_views_referencing_field(field.table_id, field_id)
if view_deps:
deps["views"] = [{"id": v.id, "name": v.name} for v in view_deps]
if deps and not force:
raise FieldDependencyError(
f"Cannot delete field '{field.name}': has dependencies", deps
)
# Force delete: cascade cleanup
if deps and force:
# Mark formula fields as error
for f in formula_deps:
await self._repo.update_field(
f.id,
config={**f.config, "error": "referenced field deleted"},
)
# Remove field from all records' JSONB
await self._repo.remove_field_from_records(field.table_id, field_id)
deleted = await self._repo.delete_field(field_id)
if deleted:
# Removing a field changes the table's DAG — invalidate cached engine (P1 #5).
self._invalidate_engine_cache(field.table_id)
return deleted
# ── Records ─────────────────────────────────────────────
async def create_record(self, table_id: str, values: dict[str, Any] | None = None) -> Record:
"""Create a new record. Triggers recalc for affected formula fields."""
record = await self._repo.create_record(table_id, values)
await self._trigger_recalc_for_affected_fields(table_id, record.id)
return record
async def create_records_batch(
self, table_id: str, records_values: list[dict[str, Any]]
) -> list[Record]:
"""Batch-create records (P2 #19). Triggers recalc for each record.
Processes in chunks of 500 to keep memory bounded.
"""
batch_size = 500
all_records: list[Record] = []
for i in range(0, len(records_values), batch_size):
chunk = records_values[i : i + batch_size]
records = await self._repo.create_records_batch(table_id, chunk)
for rec in records:
await self._trigger_recalc_for_affected_fields(table_id, rec.id)
all_records.extend(records)
return all_records
async def get_record(self, record_id: str) -> Record | None:
return await self._repo.get_record(record_id)
async def list_records(
self, table_id: str, cursor: str | None = None, limit: int = 50
) -> tuple[list[Record], str | None]:
"""List records with cursor-based pagination (KTD9)."""
return await self._repo.list_records(table_id, cursor=cursor, limit=limit)
async def list_records_filtered(
self,
table_id: str,
filters: list[dict[str, Any]] | None = None,
sorts: list[dict[str, Any]] | None = None,
cursor: str | None = None,
limit: int = 50,
) -> tuple[list[Record], str | None]:
"""List records with view filters/sorts + cursor pagination.
Validates every ``field_id`` in filters/sorts against the table's
actual fields before passing them to the repository. Field IDs are
interpolated into SQL path literals and must be known system UUIDs.
"""
if filters or sorts:
fields = await self._repo.list_fields(table_id)
valid_field_ids = {f.id for f in fields}
for f in filters or []:
fid = f.get("field_id", "")
if fid not in valid_field_ids:
raise ValueError(f"Unknown field_id in filter: {fid}")
for s in sorts or []:
fid = s.get("field_id", "")
if fid not in valid_field_ids:
raise ValueError(f"Unknown field_id in sort: {fid}")
return await self._repo.list_records_filtered(
table_id, filters=filters, sorts=sorts, cursor=cursor, limit=limit
)
async def update_record_values(self, record_id: str, values: dict[str, Any]) -> Record | None:
"""Update a record's values (full replace). Triggers recalc for affected formulas."""
record = await self._repo.update_record_values(record_id, values)
if record is not None:
await self._trigger_recalc_for_affected_fields(record.table_id, record.id)
return record
async def delete_record(self, record_id: str) -> bool:
"""Delete a record and clean up any attachment/image files."""
record = await self._repo.get_record(record_id)
if record is None:
return False
await self._cleanup_attachment_files(record.table_id, [record])
return await self._repo.delete_record(record_id)
async def delete_records_by_table(self, table_id: str) -> int:
"""Delete all records in a table and clean up attachment/image files.
P2 #18: only fetches records when the table has attachment/image fields,
and processes them in batches to avoid OOM on large tables.
"""
# Check if cleanup is needed at all before loading any records.
fields = await self._repo.list_fields(table_id)
has_attachments = any(
f.field_type in (FieldType.attachment, FieldType.image) for f in fields
)
if has_attachments:
# Batch-fetch records to clean up physical files (P2 #18: avoid OOM).
batch_size = 500
cursor: str | None = None
while True:
batch, next_cursor = await self._repo.list_records(
table_id, cursor=cursor, limit=batch_size
)
if batch:
await self._cleanup_attachment_files(table_id, batch)
if next_cursor is None:
break
cursor = next_cursor
return await self._repo.delete_records_by_table(table_id)
async def _cleanup_attachment_files(
self,
table_id: str,
records: list[Record],
) -> None:
"""Delete physical files for attachment/image fields on the given records.
Best-effort: file deletion failures are logged but do not block
record deletion (the file may already be gone).
"""
fields = await self._repo.list_fields(table_id)
attachment_field_ids = {
f.id for f in fields if f.field_type in (FieldType.attachment, FieldType.image)
}
if not attachment_field_ids:
return
upload_dir = Path(os.environ.get("AGENTKIT_BITABLE_UPLOAD_DIR", "data/uploads/bitable"))
for record in records:
for field_id in attachment_field_ids:
value = record.values.get(field_id)
if not value or not isinstance(value, list):
continue
for file_meta in value:
if not isinstance(file_meta, dict):
continue
stored_name = file_meta.get("stored_name")
if not stored_name:
continue
file_path = upload_dir / str(stored_name)
try:
if file_path.exists():
file_path.unlink()
except OSError as exc:
# File lost or permission issue — don't block record deletion
logger.warning("Failed to delete attachment file %s: %s", file_path, exc)
# ── Upsert (KTD8: jsonb_set per agent field) ───────────
async def upsert_records(
self,
table_id: str,
records: list[dict[str, Any]],
primary_key_field_id: str,
) -> dict[str, Any]:
"""Upsert records by primary key using jsonb_set (KTD8).
For each record:
- If a record with the same PK value exists: update agent-owned fields
via jsonb_set (user-owned fields are never touched).
- If not: insert a new record.
Returns {"inserted": N, "updated": N, "skipped": N}.
P1 #14: batch-queries existing records by PK in a single SELECT,
eliminating the N+1 query pattern.
"""
if not primary_key_field_id:
raise ValueError("primary_key_field_id is required for upsert")
# Get agent-owned field IDs for this table
fields = await self._repo.list_fields(table_id)
agent_field_ids = {f.id for f in fields if f.owner == FieldOwner.agent}
# Partition records into insert vs update lists, collecting PK values.
to_insert: list[dict[str, Any]] = []
to_update: list[tuple[dict[str, Any], str]] = [] # (values, existing_record_id)
skipped = 0
# Collect all non-None PK values for batch lookup.
pk_values_by_str: dict[str, dict[str, Any]] = {}
for rec_values in records:
pk_value = rec_values.get(primary_key_field_id)
if pk_value is None:
skipped += 1
continue
pk_values_by_str[str(pk_value)] = rec_values
# Batch SELECT all existing records by PK (P1 #14: 1 query, not N).
existing_map = await self._repo.find_records_by_pk_batch(
table_id, primary_key_field_id, list(pk_values_by_str.keys())
)
for pk_str, rec_values in pk_values_by_str.items():
existing = existing_map.get(pk_str)
if existing is None:
to_insert.append(rec_values)
else:
to_update.append((rec_values, existing.id))
# Insert new records one at a time, enqueuing recalc after each.
inserted = 0
for rec_values in to_insert:
record = await self._repo.create_record(table_id, values=rec_values)
await self._trigger_recalc_for_affected_fields(table_id, record.id)
inserted += 1
# Update existing records (agent-owned fields only via jsonb_set).
updated = 0
for rec_values, existing_id in to_update:
agent_values = {fid: val for fid, val in rec_values.items() if fid in agent_field_ids}
if agent_values:
await self._repo.upsert_record_agent_fields(existing_id, agent_values)
await self._trigger_recalc_for_affected_fields(table_id, existing_id)
updated += 1
return {"inserted": inserted, "updated": updated, "skipped": skipped}
# ── Views ───────────────────────────────────────────────
async def create_view(
self,
table_id: str,
name: str,
view_type: ViewType = ViewType.grid,
config: dict[str, Any] | None = None,
) -> View:
return await self._repo.create_view(
table_id=table_id,
name=name,
view_type=view_type,
config=config,
)
async def list_views(self, table_id: str) -> list[View]:
return await self._repo.list_views(table_id)
async def update_view(self, view_id: str, **kwargs: Any) -> View | None:
return await self._repo.update_view(view_id, **kwargs)
# ── Recalc (U3: formula recalc pipeline) ────────────────
async def _trigger_recalc_for_affected_fields(self, table_id: str, record_id: str) -> None:
"""Detect formula fields affected by a record write and enqueue recalc.
Finds all formula fields in the table, checks which ones depend on
the fields that were just written, and enqueues recalc tasks.
For simplicity (v1), we enqueue recalc for ALL formula fields in the
table for this record; the worker will evaluate them in topological
order. The ON CONFLICT DO NOTHING constraint deduplicates.
"""
fields = await self._repo.list_fields(table_id)
formula_fields = [f for f in fields if f.field_type == FieldType.formula]
for f in formula_fields:
await self._repo.enqueue_recalc(table_id, record_id, f.id)
async def trigger_recalc(
self, table_id: str, record_id: str, field_id: str
) -> RecalcTask | None:
"""Enqueue a formula recalc task for a specific field."""
return await self._repo.enqueue_recalc(table_id, record_id, field_id)
async def get_pending_recalc_tasks(self, limit: int = 50) -> list[RecalcTask]:
return await self._repo.get_pending_recalc_tasks(limit=limit)
async def reset_stale_recalc_tasks(self, stale_threshold: float = 600.0) -> int:
"""Reset 'calculating' tasks older than threshold back to 'pending' (crash recovery)."""
return await self._repo.reset_stale_recalc_tasks(stale_threshold=stale_threshold)
async def process_recalc_task(self, task: RecalcTask) -> None:
"""Process a single recalc task (used by the worker or synchronously).
Marks the task as calculating, evaluates the formula, writes the result,
and marks as done/error.
"""
from agentkit.bitable.recalc_worker import RecalcWorker
# ponytail: We reuse RecalcWorker.process_task via a temporary worker
# instance. This is a known ceiling — the worker caches engines per table,
# and creating a new worker each call loses the cache. For synchronous
# processing in tests this is fine. For production, use the long-lived
# worker started in app.py lifespan.
worker = RecalcWorker(self._db, self)
await worker.process_task(task)
worker.invalidate_engine(task.table_id)
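The partitioning step inside `upsert_records` can be sketched without a database. This is an illustrative reduction under assumptions, not code from this commit: `partition_upsert` is a hypothetical name, and `existing_ids_by_pk` stands in for the result of `find_records_by_pk_batch` reduced to a plain `pk -> record id` mapping.

```python
from typing import Any


def partition_upsert(
    records: list[dict[str, Any]],
    pk_field_id: str,
    existing_ids_by_pk: dict[str, str],
) -> tuple[list[dict[str, Any]], list[tuple[dict[str, Any], str]], int]:
    """Split incoming rows into inserts, updates, and a skipped count."""
    by_pk: dict[str, dict[str, Any]] = {}
    skipped = 0
    for values in records:
        pk = values.get(pk_field_id)
        if pk is None:
            # Rows without a primary key value cannot be matched.
            skipped += 1
            continue
        # Keyed by str(pk): a later row with the same PK replaces an earlier one.
        by_pk[str(pk)] = values

    to_insert: list[dict[str, Any]] = []
    to_update: list[tuple[dict[str, Any], str]] = []
    for pk_str, values in by_pk.items():
        existing_id = existing_ids_by_pk.get(pk_str)
        if existing_id is None:
            to_insert.append(values)
        else:
            to_update.append((values, existing_id))
    return to_insert, to_update, skipped


if __name__ == "__main__":
    rows = [{"pk": 1, "a": "x"}, {"a": "no pk"}, {"pk": 2, "a": "y"}, {"pk": 1, "a": "z"}]
    print(partition_upsert(rows, "pk", {"2": "rec-2"}))
```

One consequence worth noting: because rows are keyed by `str(pk)`, duplicate PK values within a single batch collapse to the last row, so `inserted + updated + skipped` can be smaller than the number of input rows.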

251
src/agentkit/cli/bitable.py Normal file

@ -0,0 +1,251 @@
"""Bitable CLI subcommands.
Usage::
agentkit bitable list-tables
agentkit bitable create-table --name "Orders"
agentkit bitable import-excel --file data.xlsx --table "Import"
agentkit bitable query --table <id> --limit 20
CLI calls BitableService directly (KTD5 exception: CLI is an ops tool,
not a runtime call path).
"""
from __future__ import annotations
import asyncio
import os
from typing import Optional
import typer
from rich import print as rprint
from rich.table import Table
bitable_app = typer.Typer(
name="bitable",
help="Bitable (multi-dimensional table) management commands",
no_args_is_help=True,
)
def _check_db_url() -> str:
"""Return DATABASE_URL or exit with a clear error."""
db_url = os.environ.get("DATABASE_URL") or os.environ.get("AGENTKIT_DATABASE_URL")
if not db_url:
rprint("[red]Error: DATABASE_URL environment variable is not set.[/red]")
rprint("[dim]Set it to your PostgreSQL connection string, e.g.:[/dim]")
rprint("[dim] export DATABASE_URL=postgresql+asyncpg://user@localhost/db[/dim]")
raise typer.Exit(code=1)
return db_url
async def _run_with_service(coro_factory):
"""Initialize DB, run coro_factory(service), then close — all in one event loop."""
from agentkit.bitable.db import BitableDB
from agentkit.bitable.service import BitableService
db = BitableDB()
try:
await db.init()
service = BitableService(db)
return await coro_factory(service)
finally:
await db.close()
# ---------------------------------------------------------------------------
# list-tables
# ---------------------------------------------------------------------------
@bitable_app.command("list-tables")
def list_tables(
    owner: Optional[str] = typer.Option(None, "--owner", help="Filter by owner user ID"),
):
    """List all bitable tables."""
    _check_db_url()

    async def _run(service):
        tables = await service.list_tables(owner_user_id=owner)
        if not tables:
            rprint("[dim]No tables found.[/dim]")
            return
        tbl = Table(title="Bitable Tables")
        tbl.add_column("ID", style="cyan", no_wrap=True)
        tbl.add_column("Name", style="white")
        tbl.add_column("Description", style="dim")
        tbl.add_column("Created", style="green")
        for t in tables:
            tbl.add_row(
                t.id,
                t.name,
                t.description or "",
                t.created_at.strftime("%Y-%m-%d %H:%M") if t.created_at else "",
            )
        rprint(tbl)

    asyncio.run(_run_with_service(_run))

# ---------------------------------------------------------------------------
# create-table
# ---------------------------------------------------------------------------
@bitable_app.command("create-table")
def create_table(
    name: str = typer.Option(..., "--name", "-n", help="Table name"),
    description: str = typer.Option("", "--description", "-d", help="Table description"),
):
    """Create a new bitable table."""
    _check_db_url()

    async def _run(service):
        table = await service.create_table(name=name, description=description)
        rprint(f"[green]Created table:[/green] {table.name}")
        rprint(f" ID: [cyan]{table.id}[/cyan]")
        if description:
            rprint(f" Description: {description}")

    asyncio.run(_run_with_service(_run))

# ---------------------------------------------------------------------------
# import-excel
# ---------------------------------------------------------------------------
@bitable_app.command("import-excel")
def import_excel(
    file: str = typer.Option(..., "--file", "-f", help="Path to .xlsx file"),
    table_name: Optional[str] = typer.Option(
        None, "--table", "-t", help="Target table name (creates if not exists)"
    ),
    sheet: Optional[str] = typer.Option(None, "--sheet", "-s", help="Sheet name (default: first)"),
):
    """Import an Excel file into a bitable table.

    Creates a new table with inferred field types and inserts all rows.
    """
    _check_db_url()
    from agentkit.bitable.ingestion import parse_excel
    from agentkit.bitable.models import FieldType

    try:
        sheets = parse_excel(file)
    except FileNotFoundError:
        rprint(f"[red]Error: File not found: {file}[/red]")
        raise typer.Exit(code=1)
    except Exception as exc:
        rprint(f"[red]Error reading Excel file: {exc}[/red]")
        raise typer.Exit(code=1)
    if not sheets:
        rprint("[red]Error: No sheets found in Excel file.[/red]")
        raise typer.Exit(code=1)

    # Select sheet
    target = sheets[0]
    if sheet:
        target = next((s for s in sheets if s.name == sheet), None)
        if target is None:
            rprint(
                f"[red]Error: Sheet '{sheet}' not found. "
                f"Available: {[s.name for s in sheets]}[/red]"
            )
            raise typer.Exit(code=1)
    tbl_name = table_name or target.name

    async def _run(service):
        table = await service.create_table(name=tbl_name, description=f"Imported from {file}")
        rprint(f"[green]Created table:[/green] {table.name} (ID: {table.id})")

        # Create fields
        field_ids: list[str] = []
        for col_name, col_type in zip(target.columns, target.field_types):
            try:
                ft = FieldType(col_type)
            except ValueError:
                ft = FieldType.text
            field = await service.create_field(
                table_id=table.id,
                name=col_name,
                field_type=ft,
                config={},
            )
            field_ids.append(field.id)
            rprint(f" Field: [cyan]{col_name}[/cyan] ({ft.value})")

        # Insert records (P2 #19: batch insert instead of per-record)
        if target.records:
            values_list: list[dict] = []
            for rec_values in target.records:
                values_dict = {}
                for idx, fid in enumerate(field_ids):
                    col_name = target.columns[idx]
                    if col_name in rec_values:
                        values_dict[fid] = rec_values[col_name]
                values_list.append(values_dict)
            created = await service.create_records_batch(table.id, values_list)
            rprint(f"[green]Imported {len(created)} records.[/green]")
        else:
            rprint("[yellow]No records found in sheet.[/yellow]")

    asyncio.run(_run_with_service(_run))

# ---------------------------------------------------------------------------
# query
# ---------------------------------------------------------------------------
@bitable_app.command("query")
def query(
    table_id: str = typer.Option(..., "--table", "-t", help="Table ID"),
    limit: int = typer.Option(20, "--limit", "-l", help="Max records to show"),
    cursor: Optional[str] = typer.Option(None, "--cursor", help="Pagination cursor"),
):
    """Query records from a bitable table."""
    _check_db_url()

    async def _run(service):
        table = await service.get_table(table_id)
        if table is None:
            rprint(f"[red]Error: Table '{table_id}' not found.[/red]")
            raise typer.Exit(code=1)
        fields = await service.list_fields(table_id)
        records, next_cursor = await service.list_records_filtered(
            table_id, cursor=cursor, limit=limit
        )
        if not records:
            rprint("[dim]No records found.[/dim]")
            return
        tbl = Table(title=f"{table.name} ({len(records)} records)")
        tbl.add_column("ID", style="cyan", no_wrap=True, width=12)
        for f in fields:
            tbl.add_column(f.name, style="white")
        for rec in records:
            row = [rec.id[:8] + "..."]
            for f in fields:
                val = rec.values.get(f.id)
                if val is None:
                    row.append("[dim]—[/dim]")
                elif isinstance(val, list):
                    row.append(f"[{len(val)} items]")
                elif isinstance(val, dict):
                    row.append("{...}")
                else:
                    row.append(str(val)[:50])
            tbl.add_row(*row)
        rprint(tbl)
        if next_cursor:
            rprint(f"[dim]Next cursor: {next_cursor}[/dim]")

    asyncio.run(_run_with_service(_run))
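
Review note: every command in this file funnels through `_run_with_service`, which opens the DB, runs a callback, and closes the DB inside a single `asyncio.run` call. The sketch below isolates that pattern so the cleanup guarantee is easy to check. `FakeDB`, `run_with_db`, `ok`, and `boom` are hypothetical names used only for illustration; the real `BitableDB` and `BitableService` are not reproduced here.

```python
import asyncio
from typing import Any, Awaitable, Callable


class FakeDB:
    """Hypothetical stand-in for BitableDB; it only records lifecycle calls."""

    def __init__(self) -> None:
        self.events: list[str] = []

    async def init(self) -> None:
        self.events.append("init")

    async def close(self) -> None:
        self.events.append("close")


async def run_with_db(
    db: FakeDB, coro_factory: Callable[[FakeDB], Awaitable[Any]]
) -> Any:
    """Open the DB, await the callback, and always close the DB afterwards."""
    try:
        await db.init()
        return await coro_factory(db)
    finally:
        # Runs on success and on failure, inside the same event loop.
        await db.close()


async def ok(db: FakeDB) -> str:
    db.events.append("work")
    return "done"


async def boom(db: FakeDB) -> str:
    raise RuntimeError("simulated failure")
```

Because `init()` sits inside the `try`, `close()` is also awaited when initialization itself fails, so a real `close()` presumably needs to tolerate a DB that was never fully opened.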


@ -23,6 +23,10 @@ from agentkit.cli.admin import admin_app # noqa: E402
app.add_typer(admin_app, name="admin") app.add_typer(admin_app, name="admin")
from agentkit.cli.bitable import bitable_app # noqa: E402
app.add_typer(bitable_app, name="bitable")
from agentkit.cli.init import init # noqa: E402 from agentkit.cli.init import init # noqa: E402
app.command(name="init")(init) app.command(name="init")(init)


@ -21,6 +21,7 @@ Usage::
from __future__ import annotations from __future__ import annotations
import json
import logging import logging
from dataclasses import dataclass, field from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Protocol, runtime_checkable from typing import Any, Awaitable, Callable, Protocol, runtime_checkable
@ -157,8 +158,9 @@ class TokenUsageMiddleware:
return ctx return ctx
async def after(self, ctx: RequestContext, result: Any) -> Any: async def after(self, ctx: RequestContext, result: Any) -> Any:
# Extract token usage from ReActResult or a similar structure # Extract token usage from ReActResult or a similar structure
usage = getattr(result, "token_usage", None) # ReActResult has a total_tokens attribute (not token_usage)
usage = getattr(result, "total_tokens", None)
if usage is not None: if usage is not None:
ctx.metadata["token_usage_total"] = usage ctx.metadata["token_usage_total"] = usage
return result return result
@ -188,12 +190,20 @@ class LoopDetectionMiddleware:
return result return result
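# Check the final trajectory for repeated tool-call patterns (tail window only) # Check the final trajectory for repeated tool-call patterns (tail window only)
# trajectory stores ReActStep dataclass objects; dicts must be supported as well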
recent = trajectory[-self._window_size :] if trajectory else [] recent = trajectory[-self._window_size :] if trajectory else []
tool_calls = [ tool_calls: list[tuple[str, str]] = []
(step.get("tool_name", ""), step.get("arguments_hash", "")) for step in recent:
for step in recent # Support both dataclass (ReActStep) and dict formats
if isinstance(step, dict) and "tool_name" in step if isinstance(step, dict):
] name = step.get("tool_name", "")
args = step.get("arguments", {})
else:
name = getattr(step, "tool_name", "") or ""
args = getattr(step, "arguments", {}) or {}
if name:
args_str = json.dumps(args, sort_keys=True, default=str) if args else ""
tool_calls.append((name, args_str))
if not tool_calls: if not tool_calls:
return result return result
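
Review note: the new loop accepts both dict steps and dataclass steps and serializes arguments with sorted keys, so two calls with the same arguments in a different key order compare equal. A minimal standalone sketch of that normalization follows. `Step` and `tool_call_signatures` are hypothetical names; the real `ReActStep` may carry more fields.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Step:
    """Hypothetical stand-in for ReActStep with only the fields used here."""

    tool_name: str = ""
    arguments: dict[str, Any] = field(default_factory=dict)


def tool_call_signatures(steps: list[Any]) -> list[tuple[str, str]]:
    """Reduce mixed dict/dataclass steps to comparable (name, args) pairs."""
    signatures: list[tuple[str, str]] = []
    for step in steps:
        if isinstance(step, dict):
            name = step.get("tool_name", "")
            args = step.get("arguments", {})
        else:
            name = getattr(step, "tool_name", "") or ""
            args = getattr(step, "arguments", {}) or {}
        if name:
            # sort_keys makes the string independent of key order;
            # default=str keeps non-JSON values from raising.
            args_str = json.dumps(args, sort_keys=True, default=str) if args else ""
            signatures.append((name, args_str))
    return signatures
```

Steps without a tool name are skipped, which matches the `if name:` guard in the hunk.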


@ -259,6 +259,8 @@ class ReActEngine:
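cancellation_token: Cooperative cancellation token; each loop iteration checks whether it has been cancelled cancellation_token: Cooperative cancellation token; each loop iteration checks whether it has been cancelled
timeout_seconds: Timeout in seconds; 0 means no timeout, None uses default_timeout timeout_seconds: Timeout in seconds; 0 means no timeout, None uses default_timeout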
""" """
# P2 #9: Reset loop detection state so reuse across conversations is clean
self.reset()
effective_compressor = compressor if compressor is not None else self._compressor effective_compressor = compressor if compressor is not None else self._compressor
effective_timeout = ( effective_timeout = (
timeout_seconds if timeout_seconds is not None else self._default_timeout timeout_seconds if timeout_seconds is not None else self._default_timeout
@ -965,6 +967,8 @@ class ReActEngine:
Args: Args:
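compressor: Compression strategy; when None, the instance's default compressor is used compressor: Compression strategy; when None, the instance's default compressor is used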
""" """
# P2 #9: Reset loop detection state so reuse across conversations is clean
self.reset()
effective_compressor = compressor if compressor is not None else self._compressor effective_compressor = compressor if compressor is not None else self._compressor
tools = tools or [] tools = tools or []
if tools: if tools:


@ -306,6 +306,13 @@ class TeamOrchestrator:
{"team_id": self._team.team_id}, {"team_id": self._team.team_id},
) )
# P2 #13: Clean up checkpoints after successful completion
if self._checkpoint is not None:
try:
await self._checkpoint.clear(plan.id)
except Exception as e:
logger.warning(f"Checkpoint clear failed: {e}")
return { return {
"status": "completed", "status": "completed",
"result": final_result, "result": final_result,
@ -363,6 +370,7 @@ class TeamOrchestrator:
checkpoints = await self._checkpoint.list_checkpoints(plan_id) checkpoints = await self._checkpoint.list_checkpoints(plan_id)
phase_results: dict[str, dict[str, Any]] = {} phase_results: dict[str, dict[str, Any]] = {}
completed_phase_ids: set[str] = set() completed_phase_ids: set[str] = set()
failed_phase_ids: set[str] = set()
for cp in checkpoints: for cp in checkpoints:
if cp.phase_status == "completed": if cp.phase_status == "completed":
@ -370,6 +378,9 @@ class TeamOrchestrator:
# Restore phase result from checkpoint # Restore phase result from checkpoint
if cp.phase_result: if cp.phase_result:
phase_results[cp.phase_id] = cp.phase_result phase_results[cp.phase_id] = cp.phase_result
elif cp.phase_status == "failed":
# P2 #11: Restore FAILED status so they aren't re-executed
failed_phase_ids.add(cp.phase_id)
# Apply checkpoint state to plan phases # Apply checkpoint state to plan phases
for ph in plan.phases: for ph in plan.phases:
@ -377,11 +388,19 @@ class TeamOrchestrator:
ph.status = PhaseStatus.COMPLETED ph.status = PhaseStatus.COMPLETED
if ph.id in phase_results and phase_results[ph.id]: if ph.id in phase_results and phase_results[ph.id]:
ph.result = phase_results[ph.id] ph.result = phase_results[ph.id]
elif ph.id in failed_phase_ids:
ph.status = PhaseStatus.FAILED
# PENDING phases remain PENDING — will be executed by _run_pipeline # PENDING phases remain PENDING — will be executed by _run_pipeline
# P2 #8: Restore debate count so MAX_DEBATES limit holds after resume
self._debate_count = sum(
1 for ph in plan.phases if ph.phase_type == PhaseType.DEBATE
)
logger.info( logger.info(
f"Resuming plan {plan_id}: {len(completed_phase_ids)} completed, " f"Resuming plan {plan_id}: {len(completed_phase_ids)} completed, "
f"{len(plan.phases) - len(completed_phase_ids)} pending" f"{len(failed_phase_ids)} failed, "
f"{len(plan.phases) - len(completed_phase_ids) - len(failed_phase_ids)} pending"
) )
# 4. Get lead expert # 4. Get lead expert
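
Review note: the resume hunk above now restores FAILED phases alongside COMPLETED ones and reports all three counts. The sketch below replays that bookkeeping on toy data. `Phase`, `PhaseStatus`, and `apply_checkpoints` are simplified, hypothetical stand-ins, and the checkpoint list is reduced to a `phase_id -> status` mapping.

```python
from dataclasses import dataclass
from enum import Enum


class PhaseStatus(Enum):
    PENDING = "pending"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Phase:
    """Hypothetical, trimmed-down plan phase."""

    id: str
    status: PhaseStatus = PhaseStatus.PENDING


def apply_checkpoints(
    phases: list[Phase], checkpoint_status: dict[str, str]
) -> dict[str, int]:
    """Apply saved statuses to phases and return the counts that get logged."""
    completed = {pid for pid, s in checkpoint_status.items() if s == "completed"}
    failed = {pid for pid, s in checkpoint_status.items() if s == "failed"}
    for ph in phases:
        if ph.id in completed:
            ph.status = PhaseStatus.COMPLETED
        elif ph.id in failed:
            ph.status = PhaseStatus.FAILED
        # Anything else stays PENDING and is picked up by the pipeline.
    return {
        "completed": len(completed),
        "failed": len(failed),
        "pending": len(phases) - len(completed) - len(failed),
    }
```

As in the hunk, the pending count is derived by subtraction, so it is only accurate when every checkpointed phase ID still exists in the plan.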
@ -558,6 +577,9 @@ class TeamOrchestrator:
def _offload_result(self, content: str, ref_key: str) -> dict[str, Any]: def _offload_result(self, content: str, ref_key: str) -> dict[str, Any]:
"""Create an offloaded result: summary in memory, full content in workspace.""" """Create an offloaded result: summary in memory, full content in workspace."""
# P2 #14: Guard against non-string content (dict, None, etc.)
if not isinstance(content, str):
content = str(content) if content is not None else ""
summary = ( summary = (
content[: self._OFFLOAD_SUMMARY_LIMIT] + "..." content[: self._OFFLOAD_SUMMARY_LIMIT] + "..."
if len(content) > self._OFFLOAD_SUMMARY_LIMIT if len(content) > self._OFFLOAD_SUMMARY_LIMIT
@ -1714,6 +1736,12 @@ class TeamOrchestrator:
"debate_inserted": debate.id, "debate_inserted": debate.id,
}, },
) )
# P1 #7: Persist dynamically inserted DEBATE phase so resume sees it
if self._checkpoint is not None:
try:
await self._checkpoint.save_plan(plan)
except Exception as e:
logger.warning(f"Checkpoint save_plan (debate insert) failed: {e}")
# ── U3 end ───────────────────────────────────────────────────────── # ── U3 end ─────────────────────────────────────────────────────────
@ -1816,13 +1844,18 @@ class TeamOrchestrator:
} }
# Build result summaries for LLM evaluation # Build result summaries for LLM evaluation
# P1 #5: Resolve offloaded content: read the full content from SharedWorkspace instead of the truncated summary
summaries = [] summaries = []
for i, ph in enumerate(completed_phases): for i, ph in enumerate(completed_phases):
r = ph.result or {} r = ph.result or {}
content = r.get("content", str(r)) if isinstance(r, dict) else str(r) # U4: If the result was offloaded, read the full content from the workspace
if isinstance(r, dict) and r.get("_offloaded"):
content = await self._read_dependency_output(ph)
else:
content = r.get("content", str(r)) if isinstance(r, dict) else str(r)
summaries.append( summaries.append(
f"Phase {i + 1}: {ph.name} (by {ph.assigned_expert}, task: {ph.task_description[:100]}):\n" f"Phase {i + 1}: {ph.name} (by {ph.assigned_expert}, task: {ph.task_description[:100]}):\n"
f"{content[:500]}" f"{content}"
) )
prompt = ( prompt = (


@ -74,8 +74,9 @@ class PipelineCheckpoint:
self._redis = redis_client self._redis = redis_client
self._prefix = prefix self._prefix = prefix
self._ttl = ttl_seconds self._ttl = ttl_seconds
# In-memory fallback store: plan_id → list of CheckpointData # In-memory fallback store: plan_id → {phase_id → CheckpointData}
self._memory: dict[str, list[CheckpointData]] = {} # P1 #6: Use a dict keyed by phase_id to avoid duplicate appends
self._memory: dict[str, dict[str, CheckpointData]] = {}
# In-memory fallback store: plan_id → (plan_dict, saved_timestamp) # In-memory fallback store: plan_id → (plan_dict, saved_timestamp)
self._memory_plans: dict[str, tuple[dict[str, Any], float]] = {} self._memory_plans: dict[str, tuple[dict[str, Any], float]] = {}
@ -169,8 +170,8 @@ class PipelineCheckpoint:
plan_status=plan_status, plan_status=plan_status,
) )
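# Always write to the in-memory fallback (for consistency) # P1 #6: In-memory fallback uses a dict keyed by phase_id, overwriting duplicate checkpoints
self._memory.setdefault(plan_id, []).append(data) self._memory.setdefault(plan_id, {})[phase_id] = data
# Try writing to Redis # Try writing to Redis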
if self._redis is not None: if self._redis is not None:
@ -211,7 +212,8 @@ class PipelineCheckpoint:
if not phase_ids: if not phase_ids:
# No data in Redis; check memory (filtering out expired entries) # No data in Redis; check memory (filtering out expired entries)
return [ return [
c for c in self._memory.get(plan_id, []) if not self._is_expired(c.saved_at) c for c in self._memory.get(plan_id, {}).values()
if not self._is_expired(c.saved_at)
] ]
# Batch GET (pipeline avoids N+1 round trips) # Batch GET (pipeline avoids N+1 round trips)
@ -233,7 +235,10 @@ class PipelineCheckpoint:
) )
# In-memory fallback (filtering out expired checkpoints) # In-memory fallback (filtering out expired checkpoints)
return [c for c in self._memory.get(plan_id, []) if not self._is_expired(c.saved_at)] return [
c for c in self._memory.get(plan_id, {}).values()
if not self._is_expired(c.saved_at)
]
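
Review note: the P1 #6 change replaces the per-plan list with a dict keyed by `phase_id`, so saving the same phase twice overwrites the earlier entry instead of appending a duplicate. A minimal sketch of the difference, assuming a trimmed-down `Checkpoint` type and a hypothetical `MemoryStore` wrapper (expiry filtering and Redis are left out):

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Hypothetical, trimmed-down stand-in for CheckpointData."""

    phase_id: str
    phase_status: str


class MemoryStore:
    """In-memory fallback keyed as plan_id -> {phase_id -> Checkpoint}."""

    def __init__(self) -> None:
        self._memory: dict[str, dict[str, Checkpoint]] = {}

    def save(self, plan_id: str, cp: Checkpoint) -> None:
        # Assignment by key replaces any earlier checkpoint for the phase.
        self._memory.setdefault(plan_id, {})[cp.phase_id] = cp

    def list_checkpoints(self, plan_id: str) -> list[Checkpoint]:
        # .get with a dict default keeps unknown plans from raising.
        return list(self._memory.get(plan_id, {}).values())
```

Python dicts preserve insertion order, so an overwritten phase keeps its original position in the listing.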
async def clear(self, plan_id: str) -> None: async def clear(self, plan_id: str) -> None:
"""Clear all checkpoints for a plan.""" """Clear all checkpoints for a plan."""


@ -51,6 +51,7 @@ from agentkit.server.routes import (
documents, documents,
admin as admin_routes_module, admin as admin_routes_module,
calendar as calendar_routes, calendar as calendar_routes,
bitable as bitable_routes,
) )
from agentkit.server.auth.jwt_utils import get_jwt_secret from agentkit.server.auth.jwt_utils import get_jwt_secret
from agentkit.server.auth.middleware import AuthMiddleware from agentkit.server.auth.middleware import AuthMiddleware
@ -427,9 +428,28 @@ async def lifespan(app: FastAPI):
except Exception: except Exception:
logger.exception("Failed to initialize calendar subsystem — calendar API unavailable") logger.exception("Failed to initialize calendar subsystem — calendar API unavailable")
# Bitable subsystem: init DB, service, internal token (KTD11).
try:
from agentkit.bitable.db import init_bitable_db
from agentkit.bitable.service import BitableService
bitable_db = await init_bitable_db()
app.state.bitable_service = BitableService(db=bitable_db)
app.state.bitable_internal_token = os.environ.get("AGENTKIT_BITABLE_INTERNAL_TOKEN")
logger.info("Bitable subsystem initialized")
except Exception:
logger.exception("Failed to initialize bitable subsystem")
yield yield
# Shutdown # Shutdown
# Close bitable DB
try:
from agentkit.bitable.db import close_bitable_db
await close_bitable_db()
except Exception:
pass
# Stop MCP servers # Stop MCP servers
if mcp_manager is not None: if mcp_manager is not None:
await mcp_manager.stop_all() await mcp_manager.stop_all()
@ -983,6 +1003,7 @@ def create_app(
app.include_router(admin_routes_module.admin_router, prefix="/api/v1") app.include_router(admin_routes_module.admin_router, prefix="/api/v1")
app.include_router(documents.router, prefix="/api/v1") app.include_router(documents.router, prefix="/api/v1")
app.include_router(calendar_routes.router, prefix="/api/v1") app.include_router(calendar_routes.router, prefix="/api/v1")
app.include_router(bitable_routes.router, prefix="/api/v1")
# Serve GUI when in GUI mode # Serve GUI when in GUI mode
gui_mode = os.environ.get("AGENTKIT_GUI_MODE") gui_mode = os.environ.get("AGENTKIT_GUI_MODE")


@ -0,0 +1,327 @@
/** Bitable API client — thin wrapper over /api/v1/bitable endpoints. */
import { BaseApiClient } from './base'
// ── Domain types (co-located with API client) ──────────────────────────
export type FieldType =
| 'text'
| 'number'
| 'date'
| 'select'
| 'multiselect'
| 'attachment'
| 'image'
| 'formula'
| 'lookup'
export type FieldOwner = 'agent' | 'user'
export type ViewType = 'grid' | 'kanban' | 'gantt' | 'gallery' | 'form'
export type RecalcStatus = 'pending' | 'calculating' | 'done' | 'error'
export interface IBitableTable {
id: string
name: string
description: string
primary_key_field_id: string | null
owner_user_id: string | null
created_at: string
updated_at: string
}
export interface IBitableField {
id: string
table_id: string
name: string
field_type: FieldType
config: Record<string, unknown>
owner: FieldOwner
created_at: string
}
export interface IBitableRecord {
id: string
table_id: string
values: Record<string, unknown>
created_at: string
updated_at: string
}
export interface IBitableView {
id: string
table_id: string
name: string
view_type: ViewType
config: Record<string, unknown>
created_at: string
}
/** File metadata stored in attachment/image field values (JSONB array). */
export interface IAttachmentMeta {
filename: string
stored_name: string
mime_type: string
size: number
url: string
}
// ── Request types ──────────────────────────────────────────────────────
export interface ICreateTableRequest {
name: string
description?: string
primary_key_field_id?: string | null
}
export interface ICreateFieldRequest {
name: string
field_type: FieldType
config?: Record<string, unknown>
owner?: FieldOwner
}
export interface ICreateRecordRequest {
values: Record<string, unknown>
}
export interface IUpdateRecordRequest {
values: Record<string, unknown>
}
export interface IUpsertRequest {
records: Record<string, unknown>[]
primary_key_field_id: string
}
export interface ICreateViewRequest {
name: string
view_type?: ViewType
config?: Record<string, unknown>
}
// ── Response types ─────────────────────────────────────────────────────
export interface IListRecordsResponse {
success: boolean
records: IBitableRecord[]
next_cursor: string | null
}
// ── Runtime type guard ─────────────────────────────────────────────────
export function isBitableTable(value: unknown): value is IBitableTable {
if (typeof value !== 'object' || value === null) return false
const v = value as Record<string, unknown>
return typeof v.id === 'string' && typeof v.name === 'string'
}
// ── API client ─────────────────────────────────────────────────────────
const API_BASE = '/api/v1/bitable'
class BitableApiClient extends BaseApiClient {
constructor(baseUrl: string = API_BASE) {
super(baseUrl)
}
// ── Tables ───────────────────────────────────────────
async listTables(): Promise<{ success: boolean; tables: IBitableTable[] }> {
return this.request('/tables', { method: 'GET' })
}
async createTable(
data: ICreateTableRequest,
): Promise<{ success: boolean; table: IBitableTable }> {
return this.request('/tables', {
method: 'POST',
body: JSON.stringify(data),
})
}
async getTable(
tableId: string,
): Promise<{ success: boolean; table: IBitableTable }> {
return this.request(`/tables/${tableId}`, { method: 'GET' })
}
async updateTable(
tableId: string,
data: Partial<ICreateTableRequest>,
): Promise<{ success: boolean; table: IBitableTable }> {
return this.request(`/tables/${tableId}`, {
method: 'PATCH',
body: JSON.stringify(data),
})
}
async deleteTable(tableId: string): Promise<{ success: boolean }> {
return this.request(`/tables/${tableId}`, { method: 'DELETE' })
}
// ── Fields ───────────────────────────────────────────
async listFields(
tableId: string,
): Promise<{ success: boolean; fields: IBitableField[] }> {
return this.request(`/tables/${tableId}/fields`, { method: 'GET' })
}
async createField(
tableId: string,
data: ICreateFieldRequest,
): Promise<{ success: boolean; field: IBitableField }> {
return this.request(`/tables/${tableId}/fields`, {
method: 'POST',
body: JSON.stringify(data),
})
}
async updateField(
fieldId: string,
data: Partial<ICreateFieldRequest>,
): Promise<{ success: boolean; field: IBitableField }> {
return this.request(`/fields/${fieldId}`, {
method: 'PATCH',
body: JSON.stringify(data),
})
}
async deleteField(
fieldId: string,
force = false,
): Promise<{ success: boolean }> {
const qs = force ? '?force=true' : ''
return this.request(`/fields/${fieldId}${qs}`, { method: 'DELETE' })
}
// ── Formula validation (U5b) ─────────────────────────
async validateFormula(
formula: string,
): Promise<{ valid: boolean; error?: string }> {
return this.request('/fields/validate-formula', {
method: 'POST',
body: JSON.stringify({ formula }),
})
}
// ── Records ──────────────────────────────────────────
async listRecords(
tableId: string,
params?: {
cursor?: string
limit?: number
filters?: string
sorts?: string
},
): Promise<IListRecordsResponse> {
const sp = new URLSearchParams()
if (params?.cursor) sp.set('cursor', params.cursor)
if (params?.limit) sp.set('limit', String(params.limit))
if (params?.filters) sp.set('filters', params.filters)
if (params?.sorts) sp.set('sorts', params.sorts)
const qs = sp.toString()
const path = qs ? `/tables/${tableId}/records?${qs}` : `/tables/${tableId}/records`
return this.request(path, { method: 'GET' })
}
async createRecords(
tableId: string,
records: Record<string, unknown>[],
): Promise<{ success: boolean; count: number; records: IBitableRecord[] }> {
// P2 #20: client-side batch limit matches backend max_length=500.
const BATCH_LIMIT = 500
if (records.length <= BATCH_LIMIT) {
return this.request(`/tables/${tableId}/records`, {
method: 'POST',
body: JSON.stringify({ records }),
})
}
// Chunk large batches to avoid 422 from backend validation.
const allRecords: IBitableRecord[] = []
for (let i = 0; i < records.length; i += BATCH_LIMIT) {
const chunk = records.slice(i, i + BATCH_LIMIT)
const resp = await this.request<{ success: boolean; count: number; records: IBitableRecord[] }>(
`/tables/${tableId}/records`,
{ method: 'POST', body: JSON.stringify({ records: chunk }) },
)
if (resp.records) allRecords.push(...resp.records)
}
return { success: true, count: allRecords.length, records: allRecords }
}
async updateRecord(
recordId: string,
values: Record<string, unknown>,
): Promise<{ success: boolean; record: IBitableRecord }> {
return this.request(`/records/${recordId}`, {
method: 'PATCH',
body: JSON.stringify({ values }),
})
}
async deleteRecord(recordId: string): Promise<{ success: boolean }> {
return this.request(`/records/${recordId}`, { method: 'DELETE' })
}
// ── Upsert (KTD8) ────────────────────────────────────
async upsertRecords(
tableId: string,
data: IUpsertRequest,
): Promise<{ success: boolean; upserted: number; inserted: number; updated: number }> {
return this.request(`/tables/${tableId}/upsert`, {
method: 'POST',
body: JSON.stringify(data),
})
}
// ── Views ────────────────────────────────────────────
async listViews(
tableId: string,
): Promise<{ success: boolean; views: IBitableView[] }> {
return this.request(`/tables/${tableId}/views`, { method: 'GET' })
}
async createView(
tableId: string,
data: ICreateViewRequest,
): Promise<{ success: boolean; view: IBitableView }> {
return this.request(`/tables/${tableId}/views`, {
method: 'POST',
body: JSON.stringify(data),
})
}
async updateView(
viewId: string,
data: Partial<ICreateViewRequest>,
): Promise<{ success: boolean; view: IBitableView }> {
return this.request(`/views/${viewId}`, {
method: 'PATCH',
body: JSON.stringify(data),
})
}
// ── File upload (U6: attachment & image) ──────────────
async uploadFile(
tableId: string,
fieldId: string,
file: File,
): Promise<IAttachmentMeta> {
const formData = new FormData()
formData.append('file', file)
return this.request(
`/tables/${tableId}/upload?field_id=${encodeURIComponent(fieldId)}`,
{ method: 'POST', body: formData },
)
}
}
export const bitableApi = new BitableApiClient()
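
Review note: `createRecords` splits oversized payloads into chunks of 500 to stay under the backend limit, and the commit message describes the same 500-per-batch rule for `BitableTool`. The Python sketch below shows the chunking loop on its own. `create_records_chunked` and the `send` callback are hypothetical; a real caller would pass a function that performs the POST.

```python
from typing import Any, Callable

BATCH_LIMIT = 500  # mirrors the max_length=500 limit mentioned in the diff


def create_records_chunked(
    records: list[dict[str, Any]],
    send: Callable[[list[dict[str, Any]]], list[dict[str, Any]]],
    batch_limit: int = BATCH_LIMIT,
) -> list[dict[str, Any]]:
    """Send records in slices of at most batch_limit and merge the results."""
    created: list[dict[str, Any]] = []
    for start in range(0, len(records), batch_limit):
        chunk = records[start : start + batch_limit]
        created.extend(send(chunk))
    return created
```

One property worth noting in review: chunks are sent sequentially, so a failure partway through leaves the earlier chunks already written.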


@ -0,0 +1,73 @@
<template>
<div class="attachment-cell">
<div
v-for="(file, idx) in files"
:key="idx"
class="attachment-cell__item"
>
<a :href="file.url" target="_blank" class="attachment-cell__link">
<PaperClipOutlined />
<span class="attachment-cell__name">{{ file.filename }}</span>
<span class="attachment-cell__size">{{ formatSize(file.size) }}</span>
</a>
</div>
<span v-if="!files || files.length === 0" class="attachment-cell__empty"></span>
</div>
</template>
<script setup lang="ts">
import { PaperClipOutlined } from '@ant-design/icons-vue'
import type { IAttachmentMeta } from '@/api/bitable'
defineProps<{
files: IAttachmentMeta[] | null | undefined
}>()
function formatSize(bytes: number): string {
if (bytes < 1024) return `${bytes} B`
if (bytes < 1024 * 1024) return `${(bytes / 1024).toFixed(1)} KB`
return `${(bytes / (1024 * 1024)).toFixed(1)} MB`
}
</script>
<style scoped>
.attachment-cell {
display: flex;
flex-direction: column;
gap: 2px;
padding: 2px 0;
}
.attachment-cell__item {
line-height: 1.4;
}
.attachment-cell__link {
display: inline-flex;
align-items: center;
gap: 4px;
color: var(--color-primary, #1677ff);
text-decoration: none;
font-size: 12px;
}
.attachment-cell__link:hover {
text-decoration: underline;
}
.attachment-cell__name {
max-width: 120px;
overflow: hidden;
text-overflow: ellipsis;
white-space: nowrap;
}
.attachment-cell__size {
color: var(--text-secondary, #8c8c8c);
font-size: 11px;
}
.attachment-cell__empty {
color: var(--text-placeholder, #bfbfbf);
}
</style>
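
Review note: `formatSize` picks a unit using 1024-based thresholds and keeps one decimal place for KB and MB. A Python port of the same thresholds is sketched below for checking boundary values by hand. `format_size` is a hypothetical name, and rounding of exact ties may differ slightly from JavaScript's `toFixed`.

```python
def format_size(num_bytes: int) -> str:
    """Render a byte count as B, KB, or MB using 1024-based units."""
    if num_bytes < 1024:
        return f"{num_bytes} B"
    if num_bytes < 1024 * 1024:
        return f"{num_bytes / 1024:.1f} KB"
    return f"{num_bytes / (1024 * 1024):.1f} MB"
```

There is no GB branch, so very large files are shown as a large MB figure.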


@ -0,0 +1,227 @@
<template>
<div class="bitable-grid-scope">
<vxe-grid
ref="gridRef"
:data="rows"
:columns="gridColumns"
:height="height"
:loading="loading"
:row-config="{ keyField: '_recordId' }"
:column-config="{ resizable: true }"
:virtual-y-config="{ enabled: true, gt: 60 }"
:virtual-x-config="{ enabled: true, gt: 20 }"
:edit-config="{
trigger: 'click',
mode: 'cell',
showStatus: true,
autoClear: false,
}"
@edit-closed="onEditClosed"
>
<template #empty>
<a-empty :description="emptyText" />
</template>
<!-- Custom cell renderers for attachment/image fields (U6) -->
<template
v-for="f in attachmentFields"
:key="f.id"
#[`cell_${f.id}`]="{ row }"
>
<AttachmentCell
v-if="f.field_type === 'attachment'"
:files="(row[f.id] as IAttachmentMeta[] | null | undefined)"
/>
<ImageCell
v-else-if="f.field_type === 'image'"
:images="(row[f.id] as IAttachmentMeta[] | null | undefined)"
/>
</template>
</vxe-grid>
</div>
</template>
<script setup lang="ts">
import { computed, ref } from 'vue'
import { VxeGrid } from 'vxe-table'
import { Empty as AEmpty } from 'ant-design-vue'
import type { VxeGridProps, VxeGridEvents } from 'vxe-table'
import type {
IBitableField,
IBitableRecord,
IAttachmentMeta,
FieldType,
} from '@/api/bitable'
import AttachmentCell from './AttachmentCell.vue'
import ImageCell from './ImageCell.vue'
type GridRow = Record<string, unknown> & { _rowId: string; _recordId: string }
type GridColumn = NonNullable<VxeGridProps['columns']>[number]
const props = withDefaults(
defineProps<{
fields: IBitableField[]
records: IBitableRecord[]
loading?: boolean
height?: string | number
emptyText?: string
}>(),
{
loading: false,
height: 'auto',
emptyText: '暂无数据',
},
)
const emit = defineEmits<{
(e: 'edit-cell', payload: { recordId: string; fieldId: string; value: unknown }): void
}>()
const gridRef = ref<InstanceType<typeof VxeGrid> | null>(null)
// Fields that use custom slot renderers (attachment/image)
const attachmentFields = computed(() =>
props.fields.filter(
(f) => f.field_type === 'attachment' || f.field_type === 'image',
),
)
// Map records to grid rows with a stable _rowId (record.id)
const rows = computed<GridRow[]>(() =>
props.records.map((r) => ({
_rowId: r.id,
_recordId: r.id,
...r.values,
})),
)
// Build vxe-grid columns from field definitions
const gridColumns = computed<GridColumn[]>(() => {
const cols: GridColumn[] = [
{
type: 'seq',
width: 56,
fixed: 'left',
title: '#',
},
]
for (const f of props.fields) {
cols.push(buildColumn(f))
}
return cols
})
function buildColumn(f: IBitableField): GridColumn {
const isFormula = f.field_type === 'formula'
const isAttachment = f.field_type === 'attachment' || f.field_type === 'image'
const base: GridColumn = {
field: f.id,
title: f.name,
minWidth: 120,
width: 160,
resizable: true,
showOverflow: 'tooltip',
}
// Formula fields are read-only
if (isFormula) {
return { ...base, editRender: { enabled: false } }
}
// Attachment/image fields use custom slot renderers, read-only
if (isAttachment) {
return {
...base,
editRender: { enabled: false },
slots: { default: `cell_${f.id}` },
}
}
// Editable cells by field type
switch (f.field_type as FieldType) {
case 'number':
return {
...base,
editRender: {
enabled: true,
name: 'VxeNumberInput',
},
}
case 'date':
return {
...base,
editRender: {
enabled: true,
name: 'VxeDatePicker',
},
}
case 'select':
case 'multiselect':
// ponytail: select editor uses text input for v1; options wiring is U5c
return {
...base,
editRender: { enabled: true, name: 'VxeInput' },
}
default:
return {
...base,
editRender: { enabled: true, name: 'VxeInput' },
}
}
}
// Handle the edit-closed event: emit a cell update
const onEditClosed: VxeGridEvents.EditClosed = (params) => {
const { row, column } = params
const field = column.field
if (!field) return
const recordId = (row as GridRow)._recordId
if (!recordId) return
// Only emit if value actually changed
const newValue = (row as Record<string, unknown>)[field]
const original = props.records.find((r) => r.id === recordId)
if (!original) return
const oldValue = original.values[field]
if (newValue === oldValue) return
emit('edit-cell', {
recordId,
fieldId: field,
value: newValue,
})
}
// Expose grid ref for parent (e.g. to refresh)
defineExpose({
refresh: () => gridRef.value?.refreshColumn(),
})
</script>
<style scoped>
.bitable-grid-scope {
width: 100%;
height: 100%;
overflow: hidden;
}
/* KTD10: CSS isolation: all vxe-table style overrides are scoped to
.bitable-grid-scope. Use :deep() to reach vxe-table internals. */
.bitable-grid-scope :deep(.vxe-table) {
font-size: 13px;
}
.bitable-grid-scope :deep(.vxe-header--column) {
background: var(--bg-secondary, #fafafa);
font-weight: 600;
}
.bitable-grid-scope :deep(.vxe-body--column.is--dirty) {
background: var(--bg-tertiary, #fffbe6);
}
.bitable-grid-scope :deep(.vxe-cell--dirty) {
color: var(--color-primary, #1677ff);
}
</style>


@ -0,0 +1,186 @@
<template>
<a-form layout="vertical">
<a-form-item label="Field name">
<a-input v-model:value="localName" :maxlength="100" />
</a-form-item>
<a-form-item label="Field type">
<a-select v-model:value="localType" :disabled="!!field" @change="onTypeChange">
<a-select-option value="text">Text</a-select-option>
<a-select-option value="number">Number</a-select-option>
<a-select-option value="date">Date</a-select-option>
<a-select-option value="select">Single select</a-select-option>
<a-select-option value="multiselect">Multi-select</a-select-option>
<a-select-option value="formula">Formula</a-select-option>
<a-select-option value="attachment">Attachment</a-select-option>
<a-select-option value="image">Image</a-select-option>
</a-select>
</a-form-item>
<!-- Select / Multiselect: options editor -->
<template v-if="localType === 'select' || localType === 'multiselect'">
<a-form-item label="Options">
<div
v-for="(_, idx) in selectOptions"
:key="idx"
class="field-config-form__option-row"
>
<a-input
v-model:value="selectOptions[idx]"
placeholder="Option value"
:maxlength="200"
/>
<a-button type="text" danger :icon="h(DeleteOutlined)" @click="removeOption(idx)" />
</div>
<a-button type="dashed" block :icon="h(PlusOutlined)" @click="addOption">
Add option
</a-button>
</a-form-item>
</template>
<!-- Formula: expression editor with live validation -->
<template v-if="localType === 'formula'">
<a-form-item label="Formula expression">
<a-textarea
v-model:value="formulaExpr"
placeholder="e.g. {field_id_1} + {field_id_2} or SUM({field_id})"
:rows="3"
:maxlength="2000"
/>
<div class="field-config-form__formula-hint">
Use {field_id} to reference other fields. Supported functions: SUM/AVG/COUNT/MIN/MAX/ABS/ROUND/IF/LEN/CONCAT
</div>
</a-form-item>
<a-form-item v-if="formulaExpr" label="Syntax check">
<a-alert
v-if="formulaValid === true"
type="success"
message="Formula syntax is valid"
show-icon
/>
<a-alert
v-else-if="formulaValid === false"
type="error"
:message="formulaError || 'Formula syntax error'"
show-icon
/>
<a-alert v-else type="info" message="Validating..." show-icon />
</a-form-item>
</template>
<!-- Date: format -->
<template v-if="localType === 'date'">
<a-form-item label="Date format">
<a-select v-model:value="dateFormat">
<a-select-option value="YYYY-MM-DD">YYYY-MM-DD</a-select-option>
<a-select-option value="YYYY-MM-DD HH:mm">YYYY-MM-DD HH:mm</a-select-option>
<a-select-option value="YYYY/MM/DD">YYYY/MM/DD</a-select-option>
</a-select>
</a-form-item>
</template>
</a-form>
</template>
<script setup lang="ts">
import { ref, watch, h } from 'vue'
import { Button as AButton, Input as AInput, Select as ASelect, Alert as AAlert } from 'ant-design-vue'
import { DeleteOutlined, PlusOutlined } from '@ant-design/icons-vue'
import type { IBitableField, FieldType } from '@/api/bitable'
const props = defineProps<{
field?: IBitableField | null
}>()
const localName = ref(props.field?.name ?? '')
const localType = ref<FieldType>(props.field?.field_type ?? 'text')
const selectOptions = ref<string[]>(
(props.field?.config?.options as string[]) ?? [],
)
const formulaExpr = ref((props.field?.config?.formula_expr as string) ?? '')
const dateFormat = ref((props.field?.config?.format as string) ?? 'YYYY-MM-DD')
// Formula validation state
const formulaValid = ref<boolean | null>(null)
const formulaError = ref<string | null>(null)
let validateTimer: ReturnType<typeof setTimeout> | null = null
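// Debounced formula validation
watch(formulaExpr, (val) => {
// Cancel any pending validation first so a stale timer cannot overwrite the state
if (validateTimer) clearTimeout(validateTimer)
formulaValid.value = null
formulaError.value = null
if (!val.trim()) return
validateTimer = setTimeout(async () => {
try {
const { bitableApi } = await import('@/api/bitable')
const result = await bitableApi.validateFormula(val)
// Drop the response if the expression changed while the request was in flight
if (formulaExpr.value !== val) return
formulaValid.value = result.valid
formulaError.value = result.error ?? null
} catch (err) {
if (formulaExpr.value !== val) return
formulaValid.value = false
formulaError.value = err instanceof Error ? err.message : String(err)
}
}, 500)
// immediate: validate the expression of an existing formula field on open,
// otherwise isFormulaValid() stays false until the user edits the expression
}, { immediate: true })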
function onTypeChange(): void {
// Reset type-specific config when type changes
selectOptions.value = []
formulaExpr.value = ''
formulaValid.value = null
}
function addOption(): void {
selectOptions.value.push('')
}
function removeOption(idx: number): void {
selectOptions.value.splice(idx, 1)
}
// Build config object from current state
function buildConfig(): Record<string, unknown> {
const config: Record<string, unknown> = {}
if (localType.value === 'select' || localType.value === 'multiselect') {
config.options = selectOptions.value.filter((o) => o.trim())
}
if (localType.value === 'formula') {
config.formula_expr = formulaExpr.value
}
if (localType.value === 'date') {
config.format = dateFormat.value
}
return config
}
// Expose method for parent to get current form data
defineExpose({
getData: () => ({
name: localName.value.trim(),
fieldType: localType.value,
config: buildConfig(),
}),
isFormulaValid: () => {
if (localType.value !== 'formula') return true
if (!formulaExpr.value.trim()) return false
return formulaValid.value === true
},
})
</script>
<style scoped>
.field-config-form__option-row {
display: flex;
gap: 8px;
margin-bottom: 8px;
}
.field-config-form__formula-hint {
margin-top: 4px;
font-size: 12px;
color: var(--text-secondary, #8c8c8c);
}
</style>
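
The formula hint in this form says that other fields are referenced as `{field_id}`. As an illustration of that syntax only, the sketch below collects the referenced IDs from an expression with a regular expression. It assumes references never nest and contain no braces. `extractFieldRefs` is a hypothetical client-side helper; the commit's actual formula parser is the server-side AST engine and may accept or reject inputs differently.

```typescript
// Hypothetical helper: list the distinct {field_id} references in a formula.
function extractFieldRefs(expr: string): string[] {
  const refs: string[] = []
  // Match {something} where the braces contain no nested braces
  const pattern = /\{([^{}]+)\}/g
  let match: RegExpExecArray | null
  while ((match = pattern.exec(expr)) !== null) {
    const id = match[1].trim()
    // Keep first-seen order and skip duplicates
    if (id && refs.indexOf(id) === -1) refs.push(id)
  }
  return refs
}
```

A form could use such a list to warn when an expression mentions a field ID that is not in the current table, before the debounced server validation runs.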

@ -0,0 +1,249 @@
<template>
<a-drawer
:open="open"
title="Manage fields"
placement="right"
:width="480"
@close="handleClose"
>
<!-- Field list -->
<div class="field-manage-panel__list">
<div
v-for="f in fields"
:key="f.id"
class="field-manage-panel__item"
>
<div class="field-manage-panel__item-info">
<span class="field-manage-panel__item-name">{{ f.name }}</span>
<div class="field-manage-panel__item-meta">
<a-tag :color="typeColor(f.field_type)">{{ typeLabel(f.field_type) }}</a-tag>
<a-tag :color="f.owner === 'agent' ? 'blue' : 'green'">
{{ f.owner === 'agent' ? 'Agent' : 'User' }}
</a-tag>
</div>
</div>
<div class="field-manage-panel__item-actions">
<a-button type="text" size="small" @click="handleEdit(f)">Edit</a-button>
<a-button type="text" size="small" danger @click="handleDelete(f)">Delete</a-button>
</div>
</div>
<a-empty v-if="fields.length === 0" description="No fields yet" />
</div>
<!-- Add field button -->
<div class="field-manage-panel__add">
<a-button type="dashed" block :icon="h(PlusOutlined)" @click="handleAdd">
Add field
</a-button>
</div>
<!-- Edit/Add modal -->
<a-modal
:open="editModalOpen"
:title="editingField ? 'Edit field' : 'New field'"
:confirm-loading="saving"
@ok="handleSave"
@cancel="editModalOpen = false"
>
<FieldConfigForm
v-if="editModalOpen"
ref="formRef"
:field="editingField"
/>
</a-modal>
<!-- Delete dependency confirmation -->
<a-modal
:open="deleteConfirmOpen"
title="Confirm deletion"
@ok="handleForceDelete"
@cancel="deleteConfirmOpen = false"
>
<p>This field is referenced by the items below. Force-deleting it will also clean up those references:</p>
<pre class="field-manage-panel__deps">{{ JSON.stringify(deleteDependencies, null, 2) }}</pre>
</a-modal>
</a-drawer>
</template>
<script setup lang="ts">
import { ref, h } from 'vue'
import { Modal as AModal, Drawer as ADrawer, Button as AButton, Tag as ATag, Empty as AEmpty } from 'ant-design-vue'
import { PlusOutlined } from '@ant-design/icons-vue'
import type { IBitableField, FieldType } from '@/api/bitable'
import { useBitableStore } from '@/stores/bitable'
import FieldConfigForm from './FieldConfigForm.vue'
defineProps<{
open: boolean
fields: IBitableField[]
}>()
const emit = defineEmits<{
(e: 'close'): void
}>()
const store = useBitableStore()
const editModalOpen = ref(false)
const editingField = ref<IBitableField | null>(null)
const saving = ref(false)
const formRef = ref<InstanceType<typeof FieldConfigForm> | null>(null)
// Delete confirmation state
const deleteConfirmOpen = ref(false)
const deleteTargetId = ref<string | null>(null)
const deleteDependencies = ref<Record<string, unknown>>({})
function handleClose(): void {
emit('close')
}
function handleAdd(): void {
editingField.value = null
editModalOpen.value = true
}
function handleEdit(field: IBitableField): void {
editingField.value = field
editModalOpen.value = true
}
async function handleSave(): Promise<void> {
const form = formRef.value
if (!form) return
if (!form.isFormulaValid()) {
AModal.warning({
title: 'Formula syntax error',
content: 'Fix the formula expression before saving',
})
return
}
const data = form.getData()
if (!data.name) {
AModal.warning({ title: 'Enter a field name' })
return
}
saving.value = true
try {
// The store actions report failures via notification and resolve to null,
// so close the modal only when the save actually succeeded
const saved = editingField.value
? await store.updateField(editingField.value.id, {
name: data.name,
config: data.config,
})
: await store.addField(data.name, data.fieldType, data.config)
if (saved) editModalOpen.value = false
} finally {
saving.value = false
}
}
async function handleDelete(field: IBitableField): Promise<void> {
const result = await store.deleteField(field.id)
if (!result.success && result.dependencies) {
// Has dependencies: show confirmation
deleteTargetId.value = field.id
deleteDependencies.value = result.dependencies
deleteConfirmOpen.value = true
}
}
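async function handleForceDelete(): Promise<void> {
if (!deleteTargetId.value) return
const result = await store.deleteField(deleteTargetId.value, true)
// Keep the dialog open if the forced delete failed (the store already notified the user)
if (!result.success) return
deleteConfirmOpen.value = false
deleteTargetId.value = null
deleteDependencies.value = {}
}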
// Helpers
function typeLabel(t: FieldType): string {
const labels: Record<FieldType, string> = {
text: 'Text',
number: 'Number',
date: 'Date',
select: 'Single select',
multiselect: 'Multi-select',
attachment: 'Attachment',
image: 'Image',
formula: 'Formula',
lookup: 'Lookup',
}
return labels[t] ?? t
}
function typeColor(t: FieldType): string {
const colors: Partial<Record<FieldType, string>> = {
text: 'default',
number: 'blue',
date: 'purple',
select: 'cyan',
multiselect: 'cyan',
formula: 'orange',
lookup: 'magenta',
attachment: 'geekblue',
image: 'geekblue',
}
return colors[t] ?? 'default'
}
</script>
<style scoped>
.field-manage-panel__list {
display: flex;
flex-direction: column;
gap: 8px;
margin-bottom: 16px;
}
.field-manage-panel__item {
display: flex;
align-items: center;
justify-content: space-between;
padding: 12px;
border: 1px solid var(--border-color, #f0f0f0);
border-radius: 6px;
}
.field-manage-panel__item-info {
display: flex;
flex-direction: column;
gap: 4px;
}
.field-manage-panel__item-name {
font-weight: 500;
font-size: 14px;
}
.field-manage-panel__item-meta {
display: flex;
gap: 4px;
}
.field-manage-panel__item-actions {
display: flex;
gap: 4px;
}
.field-manage-panel__add {
margin-top: 16px;
}
.field-manage-panel__deps {
background: var(--bg-secondary, #fafafa);
padding: 12px;
border-radius: 4px;
font-size: 12px;
max-height: 300px;
overflow: auto;
}
</style>

@ -0,0 +1,179 @@
<template>
<div class="filter-builder">
<div
v-for="(condition, idx) in conditions"
:key="idx"
class="filter-builder__row"
>
<a-select
v-model:value="condition.field_id"
placeholder="Select field"
style="width: 140px"
@change="onFieldChange(idx)"
>
<a-select-option
v-for="f in fields"
:key="f.id"
:value="f.id"
>
{{ f.name }}
</a-select-option>
</a-select>
<a-select
v-model:value="condition.op"
placeholder="Operator"
style="width: 120px"
>
<a-select-option
v-for="op in availableOps(condition.field_id)"
:key="op.value"
:value="op.value"
>
{{ op.label }}
</a-select-option>
</a-select>
<a-input
v-if="getValueInputType(condition.field_id) === 'text'"
v-model:value="condition.value"
placeholder="Value"
style="width: 160px"
/>
<a-input-number
v-else-if="getValueInputType(condition.field_id) === 'number'"
v-model:value="condition.value"
placeholder="Number"
style="width: 160px"
/>
<a-date-picker
v-else-if="getValueInputType(condition.field_id) === 'date'"
:value="condition.value as string | undefined"
@change="(_, dateString) => condition.value = dateString"
style="width: 160px"
/>
<a-button type="text" danger :icon="h(DeleteOutlined)" @click="removeCondition(idx)" />
</div>
<a-button type="dashed" block :icon="h(PlusOutlined)" @click="addCondition">
Add filter condition
</a-button>
</div>
</template>
<script setup lang="ts">
import { ref, watch, h } from 'vue'
import {
Button as AButton,
Select as ASelect,
Input as AInput,
InputNumber as AInputNumber,
DatePicker as ADatePicker,
} from 'ant-design-vue'
import { DeleteOutlined, PlusOutlined } from '@ant-design/icons-vue'
import type { IBitableField, FieldType } from '@/api/bitable'
interface FilterCondition {
field_id: string
op: string
// Value type varies by field type: string for text, number for number, string(ISO) for date
value: string | number | undefined
}
const props = defineProps<{
fields: IBitableField[]
modelValue?: FilterCondition[]
}>()
const emit = defineEmits<{
(e: 'update:modelValue', conditions: FilterCondition[]): void
}>()
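// Copy each condition so v-model edits do not mutate the objects owned by the parent
const conditions = ref<FilterCondition[]>(props.modelValue ? props.modelValue.map((c) => ({ ...c })) : [])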
watch(conditions, (val) => {
emit('update:modelValue', val)
}, { deep: true })
function addCondition(): void {
conditions.value.push({ field_id: '', op: 'eq', value: undefined })
}
function removeCondition(idx: number): void {
conditions.value.splice(idx, 1)
}
function onFieldChange(idx: number): void {
// Reset op to first available for the new field type
const condition = conditions.value[idx]
const ops = availableOps(condition.field_id)
if (ops.length > 0 && !ops.some((o) => o.value === condition.op)) {
condition.op = ops[0].value
}
condition.value = undefined
}
function getField(field_id: string): IBitableField | undefined {
return props.fields.find((f) => f.id === field_id)
}
function availableOps(field_id: string): { value: string; label: string }[] {
const field = getField(field_id)
if (!field) return [{ value: 'eq', label: 'Equals' }]
const type = field.field_type as FieldType
switch (type) {
case 'number':
return [
{ value: 'eq', label: 'Equals' },
{ value: 'ne', label: 'Not equal' },
{ value: 'gt', label: 'Greater than' },
{ value: 'lt', label: 'Less than' },
{ value: 'gte', label: 'Greater than or equal' },
{ value: 'lte', label: 'Less than or equal' },
]
case 'date':
return [
{ value: 'eq', label: 'Equals' },
{ value: 'gt', label: 'After' },
{ value: 'lt', label: 'Before' },
]
case 'text':
default:
return [
{ value: 'eq', label: 'Equals' },
{ value: 'ne', label: 'Not equal' },
{ value: 'contains', label: 'Contains' },
{ value: 'is_empty', label: 'Is empty' },
]
]
}
}
function getValueInputType(field_id: string): 'text' | 'number' | 'date' | null {
const field = getField(field_id)
if (!field) return 'text'
const type = field.field_type as FieldType
switch (type) {
case 'number':
return 'number'
case 'date':
return 'date'
default:
return 'text'
}
}
defineExpose({
getConditions: () => conditions.value,
})
</script>
<style scoped>
.filter-builder__row {
display: flex;
gap: 8px;
margin-bottom: 8px;
align-items: center;
}
</style>
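
`addCondition` in this component pushes a row with an empty `field_id`, and `getConditions` returns the rows as they are, so a half-filled row can reach the saved view config. A minimal sketch of filtering out incomplete rows before saving follows. It assumes `is_empty` is the only operator that needs no value (it is the only such operator in the component) and makes no claim about how the server treats incomplete conditions. `completeConditions` is a hypothetical helper.

```typescript
interface FilterCondition {
  field_id: string
  op: string
  value: string | number | undefined
}

// Assumption: operators listed here need no right-hand value
const VALUELESS_OPS = ['is_empty']

// Hypothetical helper: keep only the rows that are fully specified
function completeConditions(conditions: FilterCondition[]): FilterCondition[] {
  return conditions.filter((c) => {
    if (!c.field_id) return false
    if (VALUELESS_OPS.indexOf(c.op) >= 0) return true
    // 0 is a valid number filter, so compare against undefined and '' only
    return c.value !== undefined && c.value !== ''
  })
}
```

A caller such as `saveFilters` could pass `completeConditions(filterRef.value?.getConditions() ?? [])` instead of the raw list.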

@ -0,0 +1,117 @@
<template>
<div class="image-cell">
<div
v-for="(img, idx) in images"
:key="idx"
ref="containerRefs"
:data-idx="idx"
class="image-cell__thumb"
@click="preview(img)"
>
<img
v-if="loaded[idx]"
:src="img.url"
:alt="img.filename"
class="image-cell__img"
/>
<div v-else class="image-cell__placeholder">
<FileImageOutlined />
</div>
</div>
<span v-if="!images || images.length === 0" class="image-cell__empty"></span>
</div>
</template>
<script setup lang="ts">
import { ref, watch, onMounted, onUnmounted, nextTick } from 'vue'
import { FileImageOutlined } from '@ant-design/icons-vue'
import type { IAttachmentMeta } from '@/api/bitable'
const props = defineProps<{
images: IAttachmentMeta[] | null | undefined
}>()
const containerRefs = ref<HTMLElement[]>([])
const loaded = ref<Record<number, boolean>>({})
let observer: IntersectionObserver | null = null
function setupObserver(): void {
if (observer) {
observer.disconnect()
}
observer = new IntersectionObserver(
(entries) => {
for (const entry of entries) {
if (entry.isIntersecting) {
const idx = Number((entry.target as HTMLElement).dataset.idx)
loaded.value[idx] = true
observer?.unobserve(entry.target)
}
}
},
{ rootMargin: '100px' },
)
nextTick(() => {
containerRefs.value.forEach((el) => {
if (el && observer) observer.observe(el)
})
})
}
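function preview(img: IAttachmentMeta): void {
// noopener/noreferrer: the opened page must not get a handle on window.opener
window.open(img.url, '_blank', 'noopener,noreferrer')
}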
watch(
() => props.images,
() => {
loaded.value = {}
setupObserver()
},
)
onMounted(() => {
setupObserver()
})
onUnmounted(() => {
observer?.disconnect()
})
</script>
<style scoped>
.image-cell {
display: flex;
flex-wrap: wrap;
gap: 4px;
padding: 2px 0;
}
.image-cell__thumb {
width: 40px;
height: 40px;
border-radius: 4px;
overflow: hidden;
cursor: pointer;
background: var(--bg-secondary, #f5f5f5);
display: flex;
align-items: center;
justify-content: center;
border: 1px solid var(--border-color, #f0f0f0);
}
.image-cell__img {
width: 100%;
height: 100%;
object-fit: cover;
}
.image-cell__placeholder {
color: var(--text-placeholder, #bfbfbf);
font-size: 16px;
}
.image-cell__empty {
color: var(--text-placeholder, #bfbfbf);
}
</style>

@ -0,0 +1,103 @@
<template>
<a-modal
:open="open"
title="New table"
:confirm-loading="loading"
@ok="handleOk"
@cancel="handleCancel"
>
<a-form
ref="formRef"
:model="formState"
:rules="rules"
layout="vertical"
>
<a-form-item label="Table name" name="name">
<a-input
v-model:value="formState.name"
placeholder="Enter a table name"
:maxlength="100"
/>
</a-form-item>
<a-form-item label="Description" name="description">
<a-textarea
v-model:value="formState.description"
placeholder="Optional: what the table is for"
:rows="2"
:maxlength="500"
/>
</a-form-item>
</a-form>
</a-modal>
</template>
<script setup lang="ts">
import { ref, reactive, watch } from 'vue'
import type { FormInstance } from 'ant-design-vue'
const props = defineProps<{
open: boolean
}>()
const emit = defineEmits<{
(e: 'success', tableId: string): void
(e: 'cancel'): void
}>()
const formRef = ref<FormInstance | null>(null)
const loading = ref(false)
const formState = reactive({
name: '',
description: '',
})
const rules = {
name: [
{ required: true, message: 'Enter a table name', trigger: 'blur' },
{ min: 1, max: 100, message: 'Table name must be 1-100 characters', trigger: 'blur' },
],
}
// Reset form when modal opens
watch(
() => props.open,
(val) => {
if (val) {
formState.name = ''
formState.description = ''
}
},
)
async function handleOk(): Promise<void> {
try {
await formRef.value?.validate()
} catch {
return
}
loading.value = true
try {
// Lazy import to avoid circular dependency
const { bitableApi } = await import('@/api/bitable')
const resp = await bitableApi.createTable({
name: formState.name.trim(),
description: formState.description.trim() || undefined,
})
emit('success', resp.table.id)
} catch (err) {
const { notification } = await import('ant-design-vue')
notification.error({
message: 'Creation failed',
description: err instanceof Error ? err.message : String(err),
})
} finally {
loading.value = false
}
}
function handleCancel(): void {
emit('cancel')
}
</script>

@ -0,0 +1,121 @@
<template>
<div class="table-view-list">
<div class="table-view-list__header">
<span class="table-view-list__title">Tables</span>
<a-button
size="small"
type="text"
:icon="h(PlusOutlined)"
@click="emit('create')"
/>
</div>
<a-spin :spinning="loading">
<div class="table-view-list__items">
<div
v-for="t in tables"
:key="t.id"
class="table-view-list__item"
:class="{ 'is-active': t.id === activeId }"
@click="emit('select', t.id)"
>
<TableOutlined class="table-view-list__icon" />
<span class="table-view-list__name">{{ t.name }}</span>
</div>
<a-empty
v-if="!loading && tables.length === 0"
:image="Empty.PRESENTED_IMAGE_SIMPLE"
description="No tables yet"
class="table-view-list__empty"
/>
</div>
</a-spin>
</div>
</template>
<script setup lang="ts">
import { h } from 'vue'
import { Button as AButton, Spin as ASpin, Empty } from 'ant-design-vue'
import { PlusOutlined, TableOutlined } from '@ant-design/icons-vue'
import type { IBitableTable } from '@/api/bitable'
defineProps<{
tables: IBitableTable[]
activeId: string | null
loading?: boolean
}>()
const emit = defineEmits<{
(e: 'select', tableId: string): void
(e: 'create'): void
}>()
</script>
<style scoped>
.table-view-list {
width: 100%;
height: 100%;
display: flex;
flex-direction: column;
background: var(--bg-primary, #fff);
border-right: 1px solid var(--border-color, #f0f0f0);
}
.table-view-list__header {
display: flex;
align-items: center;
justify-content: space-between;
padding: 12px 16px;
border-bottom: 1px solid var(--border-color, #f0f0f0);
}
.table-view-list__title {
font-weight: 600;
font-size: 14px;
color: var(--text-primary, #1f1f1f);
}
.table-view-list__items {
flex: 1;
overflow-y: auto;
padding: 4px 0;
}
.table-view-list__item {
display: flex;
align-items: center;
gap: 8px;
padding: 8px 16px;
cursor: pointer;
transition: background 0.15s;
font-size: 13px;
color: var(--text-secondary, #595959);
}
.table-view-list__item:hover {
background: var(--bg-secondary, #fafafa);
}
.table-view-list__item.is-active {
background: var(--color-primary-bg, #e6f4ff);
color: var(--color-primary, #1677ff);
font-weight: 500;
}
.table-view-list__icon {
font-size: 14px;
flex-shrink: 0;
}
.table-view-list__name {
flex: 1;
overflow: hidden;
text-overflow: ellipsis;
white-space: nowrap;
}
.table-view-list__empty {
padding: 24px 0;
}
</style>

@ -0,0 +1,175 @@
<template>
<a-drawer
:open="open"
title="View settings"
placement="right"
:width="520"
@close="handleClose"
>
<a-tabs v-model:activeKey="activeTab">
<!-- Filter tab -->
<a-tab-pane key="filter" tab="Filter">
<FilterBuilder
ref="filterRef"
:fields="fields"
:model-value="currentFilters"
/>
<div class="view-config-panel__actions">
<a-button type="primary" @click="saveFilters">Save filters</a-button>
</div>
</a-tab-pane>
<!-- Sort tab -->
<a-tab-pane key="sort" tab="Sort">
<div class="view-config-panel__sort-row">
<a-select
v-model:value="sortFieldId"
placeholder="Select sort field"
style="width: 200px"
>
<a-select-option
v-for="f in fields"
:key="f.id"
:value="f.id"
>
{{ f.name }}
</a-select-option>
</a-select>
<a-select v-model:value="sortOrder" style="width: 100px">
<a-select-option value="asc">Ascending</a-select-option>
<a-select-option value="desc">Descending</a-select-option>
</a-select>
</div>
<div class="view-config-panel__actions">
<a-button type="primary" @click="saveSort">Save sort</a-button>
</div>
</a-tab-pane>
<!-- Hidden fields tab -->
<a-tab-pane key="hidden" tab="Hidden fields">
<p class="view-config-panel__hint">Check the fields to hide</p>
<a-checkbox-group v-model:value="hiddenFieldIds">
<div
v-for="f in fields"
:key="f.id"
class="view-config-panel__field-check"
>
<a-checkbox :value="f.id">{{ f.name }}</a-checkbox>
</div>
</a-checkbox-group>
<div class="view-config-panel__actions">
<a-button type="primary" @click="saveHidden">Save hidden fields</a-button>
</div>
</a-tab-pane>
</a-tabs>
</a-drawer>
</template>
<script setup lang="ts">
import { ref, watch, computed } from 'vue'
import {
Drawer as ADrawer,
Tabs as ATabs,
Button as AButton,
Select as ASelect,
Checkbox as ACheckbox,
} from 'ant-design-vue'
import type { IBitableField, IBitableView } from '@/api/bitable'
import { useBitableStore } from '@/stores/bitable'
import FilterBuilder from './FilterBuilder.vue'
const props = defineProps<{
open: boolean
fields: IBitableField[]
view: IBitableView | null
}>()
const emit = defineEmits<{
(e: 'close'): void
}>()
const store = useBitableStore()
const activeTab = ref('filter')
const filterRef = ref<InstanceType<typeof FilterBuilder> | null>(null)
// Current config from view
const currentFilters = computed(() => {
const filters = props.view?.config?.filters as unknown
if (!Array.isArray(filters)) return []
return filters as { field_id: string; op: string; value: string | number | undefined }[]
})
const sortFieldId = ref<string>(
(props.view?.config?.sort as { field?: string })?.field ?? '',
)
const sortOrder = ref<string>(
(props.view?.config?.sort as { order?: string })?.order ?? 'asc',
)
const hiddenFieldIds = ref<string[]>(
(props.view?.config?.hidden_fields as string[]) ?? [],
)
// Reset when view changes
watch(
() => props.view?.id,
() => {
sortFieldId.value = (props.view?.config?.sort as { field?: string })?.field ?? ''
sortOrder.value = (props.view?.config?.sort as { order?: string })?.order ?? 'asc'
hiddenFieldIds.value = (props.view?.config?.hidden_fields as string[]) ?? []
},
)
function handleClose(): void {
emit('close')
}
async function saveFilters(): Promise<void> {
if (!props.view) return
const conditions = filterRef.value?.getConditions() ?? []
const config = {
...props.view.config,
filters: conditions,
}
await store.updateView(props.view.id, { config })
}
async function saveSort(): Promise<void> {
if (!props.view) return
const config = {
...props.view.config,
sort: { field: sortFieldId.value, order: sortOrder.value },
}
await store.updateView(props.view.id, { config })
}
async function saveHidden(): Promise<void> {
if (!props.view) return
const config = {
...props.view.config,
hidden_fields: hiddenFieldIds.value,
}
await store.updateView(props.view.id, { config })
}
</script>
<style scoped>
.view-config-panel__sort-row {
display: flex;
gap: 8px;
margin-bottom: 16px;
}
.view-config-panel__actions {
margin-top: 16px;
}
.view-config-panel__hint {
color: var(--text-secondary, #8c8c8c);
margin-bottom: 12px;
}
.view-config-panel__field-check {
margin-bottom: 8px;
}
</style>
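
`saveSort` in this panel always writes `sort: { field, order }`, including when no sort field is selected and `field` is the empty string. One possible alternative, sketched below, removes the `sort` key in that case and leaves the rest of the view config untouched. The config shape (`sort.field`, `sort.order`) is taken from the component; whether the server prefers a missing key over an empty field is an assumption. `withSort` is a hypothetical helper.

```typescript
type ViewConfig = Record<string, unknown>

// Hypothetical helper: return a copy of the view config with the sort applied or removed
function withSort(config: ViewConfig, fieldId: string, order: 'asc' | 'desc'): ViewConfig {
  const next: ViewConfig = { ...config }
  if (fieldId) {
    next.sort = { field: fieldId, order }
  } else {
    // Assumption: a missing `sort` key means "no sorting"
    delete next.sort
  }
  return next
}
```

Because the helper copies the config, the object held in `props.view.config` is not modified before the update request succeeds.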

@ -0,0 +1,83 @@
<template>
<div class="view-switcher">
<a-tabs
v-model:activeKey="activeKey"
type="editable-card"
size="small"
:add-icon="h(PlusOutlined)"
@change="onSwitch"
@edit="onEdit"
>
<a-tab-pane
v-for="v in views"
:key="v.id"
:tab="v.name"
:closable="false"
/>
</a-tabs>
<a-button
v-if="activeKey"
type="text"
size="small"
:icon="h(FilterOutlined)"
@click="emit('config')"
>
Configure
</a-button>
</div>
</template>
<script setup lang="ts">
import { ref, watch, h } from 'vue'
import { Tabs as ATabs, Button as AButton } from 'ant-design-vue'
import { PlusOutlined, FilterOutlined } from '@ant-design/icons-vue'
import type { IBitableView } from '@/api/bitable'
const props = defineProps<{
views: IBitableView[]
activeViewId: string | null
}>()
const emit = defineEmits<{
(e: 'switch', viewId: string): void
(e: 'create'): void
(e: 'config'): void
}>()
// antd Tabs activeKey is string | number | undefined; bridge to/from null
const activeKey = ref<string | undefined>(props.activeViewId ?? undefined)
watch(
() => props.activeViewId,
(val) => {
activeKey.value = val ?? undefined
},
)
function onSwitch(key: string | number): void {
emit('switch', String(key))
}
function onEdit(_targetKey: unknown, action: 'add' | 'remove'): void {
if (action === 'add') {
emit('create')
}
// remove is disabled (closable=false), so this is a no-op
}
</script>
<style scoped>
.view-switcher {
display: flex;
align-items: center;
gap: 8px;
padding: 0 16px;
border-bottom: 1px solid var(--border-color, #f0f0f0);
}
.view-switcher :deep(.ant-tabs) {
flex: 1;
min-width: 0;
}
</style>

@ -0,0 +1,381 @@
/**
* Pinia store for the bitable feature: tables, fields, records, views,
* and formula recalc status polling.
*
* ponytail: Formula recalc polling uses a simple setInterval. The polling
* stops when no records have pending/calculating formula fields.
* Ceiling: one polling timer per store instance; if multiple BitableGrid
* components mount simultaneously they share the same store-level timer.
*/
import { defineStore } from 'pinia'
import { ref, computed } from 'vue'
import { notification } from 'ant-design-vue'
import { bitableApi } from '@/api/bitable'
import type {
IBitableTable,
IBitableField,
IBitableRecord,
IBitableView,
FieldType,
} from '@/api/bitable'
export const useBitableStore = defineStore('bitable', () => {
// --- State ---
const tables = ref<IBitableTable[]>([])
const currentTable = ref<IBitableTable | null>(null)
const fields = ref<IBitableField[]>([])
const records = ref<IBitableRecord[]>([])
const views = ref<IBitableView[]>([])
const currentView = ref<IBitableView | null>(null)
const isLoading = ref(false)
const error = ref<string | null>(null)
const nextCursor = ref<string | null>(null)
const recalcPendingCount = ref(0)
// Polling timer for formula recalc status
let _pollTimer: ReturnType<typeof setInterval> | null = null
const POLL_INTERVAL = 2000 // 2s per plan
// --- Getters ---
const formulaFields = computed(() =>
fields.value.filter((f) => f.field_type === 'formula'),
)
const hasFormulaFields = computed(() => formulaFields.value.length > 0)
// --- Actions ---
/** Load all bitable tables */
async function loadTables(): Promise<void> {
isLoading.value = true
error.value = null
try {
const resp = await bitableApi.listTables()
tables.value = resp.tables || []
} catch (err) {
error.value = err instanceof Error ? err.message : 'Failed to load the table list'
notification.error({ message: 'Load failed', description: error.value })
} finally {
isLoading.value = false
}
}
/** Select a table and load its fields + records */
async function selectTable(tableId: string): Promise<void> {
stopPolling()
records.value = []
nextCursor.value = null
recalcPendingCount.value = 0
views.value = []
currentView.value = null
const table = tables.value.find((t) => t.id === tableId)
currentTable.value = table || null
if (!table) return
try {
const [fieldsResp, recordsResp, viewsResp] = await Promise.all([
bitableApi.listFields(tableId),
bitableApi.listRecords(tableId, { limit: 100 }),
bitableApi.listViews(tableId),
])
fields.value = fieldsResp.fields || []
records.value = recordsResp.records || []
nextCursor.value = recordsResp.next_cursor
views.value = viewsResp.views || []
// Start polling if there are formula fields
if (hasFormulaFields.value) {
startPolling(tableId)
}
} catch (err) {
error.value = err instanceof Error ? err.message : 'Failed to load table data'
notification.error({ message: 'Load failed', description: error.value })
}
}
/** Load more records (cursor pagination) */
async function loadMoreRecords(): Promise<void> {
if (!currentTable.value || !nextCursor.value) return
try {
const resp = await bitableApi.listRecords(currentTable.value.id, {
cursor: nextCursor.value,
limit: 100,
})
records.value.push(...(resp.records || []))
nextCursor.value = resp.next_cursor
} catch (err) {
notification.error({
message: 'Failed to load more records',
description: err instanceof Error ? err.message : String(err),
})
}
}
/** Update a single cell value */
async function updateCell(
recordId: string,
fieldId: string,
value: unknown,
): Promise<void> {
try {
const resp = await bitableApi.updateRecord(recordId, { [fieldId]: value })
// Update local state
const idx = records.value.findIndex((r) => r.id === recordId)
if (idx >= 0) {
records.value[idx] = resp.record
}
} catch (err) {
notification.error({
message: 'Update failed',
description: err instanceof Error ? err.message : String(err),
})
}
}
/** Add a new field */
async function addField(
name: string,
fieldType: FieldType,
config?: Record<string, unknown>,
): Promise<IBitableField | null> {
if (!currentTable.value) return null
try {
const resp = await bitableApi.createField(currentTable.value.id, {
name,
field_type: fieldType,
config,
})
fields.value.push(resp.field)
return resp.field
} catch (err) {
notification.error({
message: 'Failed to create field',
description: err instanceof Error ? err.message : String(err),
})
return null
}
}
/** Create a new table */
async function createTable(
name: string,
description?: string,
): Promise<IBitableTable | null> {
try {
const resp = await bitableApi.createTable({ name, description })
tables.value.push(resp.table)
return resp.table
} catch (err) {
notification.error({
message: 'Failed to create table',
description: err instanceof Error ? err.message : String(err),
})
return null
}
}
/** Update an existing field */
async function updateField(
fieldId: string,
data: { name?: string; config?: Record<string, unknown> },
): Promise<IBitableField | null> {
try {
const resp = await bitableApi.updateField(fieldId, data)
const idx = fields.value.findIndex((f) => f.id === fieldId)
if (idx >= 0) {
fields.value[idx] = resp.field
}
return resp.field
} catch (err) {
notification.error({
message: 'Failed to update field',
description: err instanceof Error ? err.message : String(err),
})
return null
}
}
/** Delete a field; returns dependencies on 409 */
async function deleteField(
fieldId: string,
force = false,
): Promise<{ success: boolean; dependencies?: Record<string, unknown> }> {
try {
await bitableApi.deleteField(fieldId, force)
fields.value = fields.value.filter((f) => f.id !== fieldId)
return { success: true }
} catch (err) {
const apiErr = err as { status?: number; detail?: unknown }
// 409 = has dependencies, return them for UI confirmation
if (apiErr.status === 409 && apiErr.detail) {
const detail = apiErr.detail as Record<string, unknown>
return { success: false, dependencies: detail.dependencies as Record<string, unknown> }
}
notification.error({
message: 'Failed to delete field',
description: err instanceof Error ? err.message : String(err),
})
return { success: false }
}
}
/** Refresh records (e.g. after Agent writes data via BitableTool) */
async function refreshRecords(): Promise<void> {
if (!currentTable.value) return
try {
const filters = currentView.value?.config?.filters as unknown[] | undefined
const resp = await bitableApi.listRecords(currentTable.value.id, {
limit: 100,
filters: filters ? JSON.stringify(filters) : undefined,
})
records.value = resp.records || []
nextCursor.value = resp.next_cursor
} catch (err) {
// Silent fail on refresh — user didn't explicitly request it
console.warn('Failed to refresh records:', err)
}
}
// --- View management (U5c) ---
/** Create a new view for the current table */
async function createView(
name: string,
viewType: IBitableView['view_type'] = 'grid',
config?: Record<string, unknown>,
): Promise<IBitableView | null> {
if (!currentTable.value) return null
try {
const resp = await bitableApi.createView(currentTable.value.id, {
name,
view_type: viewType,
config,
})
views.value.push(resp.view)
currentView.value = resp.view
return resp.view
} catch (err) {
notification.error({
message: 'Failed to create view',
description: err instanceof Error ? err.message : String(err),
})
return null
}
}
/** Update a view's config (filters/sorts/hidden fields) */
async function updateView(
viewId: string,
data: { name?: string; config?: Record<string, unknown> },
): Promise<void> {
try {
const resp = await bitableApi.updateView(viewId, data)
const idx = views.value.findIndex((v) => v.id === viewId)
if (idx >= 0) {
views.value[idx] = resp.view
}
if (currentView.value?.id === viewId) {
currentView.value = resp.view
// Re-query records with updated view config
await refreshRecords()
}
} catch (err) {
notification.error({
message: 'Failed to update view',
description: err instanceof Error ? err.message : String(err),
})
}
}
/** Switch to a view — applies its config to the records query */
async function switchView(viewId: string): Promise<void> {
const view = views.value.find((v) => v.id === viewId)
if (!view) return
currentView.value = view
await refreshRecords()
}
// --- Formula recalc polling (R7) ---
/** Start polling for formula recalc status */
function startPolling(tableId: string): void {
stopPolling()
_pollTimer = setInterval(async () => {
await pollRecalcStatus(tableId)
}, POLL_INTERVAL)
}
/** Stop the polling timer */
function stopPolling(): void {
if (_pollTimer !== null) {
clearInterval(_pollTimer)
_pollTimer = null
}
}
/** Poll recalc status: reload records if any formula fields are still calculating */
async function pollRecalcStatus(tableId: string): Promise<void> {
try {
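// Apply the current view's filters so polling does not replace a filtered list with unfiltered records
const filters = currentView.value?.config?.filters as unknown[] | undefined
const resp = await bitableApi.listRecords(tableId, {
limit: 100,
filters: filters ? JSON.stringify(filters) : undefined,
})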
const newRecords = resp.records || []
// Single traversal: collect pending records (formula field values still null)
const pending = newRecords.filter((rec) =>
formulaFields.value.some((f) => rec.values[f.id] == null),
)
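const stillCalculating = pending.length > 0
const wasCalculating = recalcPendingCount.value > 0
// Update state when the record set changed, or while a recalc is (or was just) in flight.
// Comparing IDs alone would skip the final poll, where formula values go from null to
// computed without any change to the ID list.
const oldIds = records.value.map((r) => r.id).join(',')
const newIds = newRecords.map((r) => r.id).join(',')
if (oldIds !== newIds || stillCalculating || wasCalculating) {
records.value = newRecords
nextCursor.value = resp.next_cursor
}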
if (stillCalculating) {
recalcPendingCount.value = pending.length
} else {
recalcPendingCount.value = 0
stopPolling()
}
} catch (err) {
// Silent fail on poll — don't spam notifications
console.warn('Recalc poll failed:', err)
}
}
return {
// State
tables,
currentTable,
fields,
records,
views,
currentView,
isLoading,
error,
nextCursor,
recalcPendingCount,
// Getters
formulaFields,
hasFormulaFields,
// Actions
loadTables,
selectTable,
loadMoreRecords,
updateCell,
addField,
createTable,
updateField,
deleteField,
refreshRecords,
createView,
updateView,
switchView,
stopPolling,
}
})


@ -0,0 +1,294 @@
<template>
<div class="bitable-view">
<!-- Top bar -->
<div class="bitable-view__topbar">
<div class="bitable-view__topbar-left">
<a-button type="text" :icon="h(ArrowLeftOutlined)" @click="goBack" />
<span class="bitable-view__title">Bitable</span>
</div>
<div class="bitable-view__topbar-right">
<a-tag v-if="store.recalcPendingCount > 0" color="processing">
<LoadingOutlined /> Calculating ({{ store.recalcPendingCount }})
</a-tag>
<a-button
v-if="store.currentTable"
size="small"
:icon="h(SettingOutlined)"
@click="fieldPanelOpen = true"
>
Manage fields
</a-button>
<a-button size="small" :icon="h(ReloadOutlined)" @click="handleRefresh">
Refresh
</a-button>
</div>
</div>
<!-- Body: left sidebar + right grid -->
<div class="bitable-view__body">
<aside class="bitable-view__sidebar">
<TableViewList
:tables="store.tables"
:active-id="store.currentTable?.id ?? null"
:loading="store.isLoading"
@select="handleSelectTable"
@create="createModalOpen = true"
/>
</aside>
<main class="bitable-view__main">
<div v-if="!store.currentTable" class="bitable-view__placeholder">
<TableOutlined style="font-size: 48px; color: var(--text-placeholder)" />
<p>Select a table from the list on the left</p>
</div>
<template v-else>
<div class="bitable-view__grid-header">
<h3 class="bitable-view__table-name">{{ store.currentTable.name }}</h3>
<span class="bitable-view__field-count">
{{ store.fields.length }} fields · {{ store.records.length }} records
</span>
</div>
<!-- View switcher (U5c) -->
<ViewSwitcher
:views="store.views"
:active-view-id="store.currentView?.id ?? null"
@switch="handleSwitchView"
@create="handleCreateView"
@config="viewConfigOpen = true"
/>
<div class="bitable-view__grid-container">
<BitableGrid
:fields="visibleFields"
:records="store.records"
:loading="store.isLoading"
height="100%"
@edit-cell="handleEditCell"
/>
</div>
<!-- Load more (cursor pagination) -->
<div v-if="store.nextCursor" class="bitable-view__load-more">
<a-button @click="store.loadMoreRecords()">Load more</a-button>
</div>
</template>
</main>
</div>
<!-- Table create modal (U5b) -->
<TableCreateModal
:open="createModalOpen"
@success="handleTableCreated"
@cancel="createModalOpen = false"
/>
<!-- Field management panel (U5b) -->
<FieldManagePanel
:open="fieldPanelOpen"
:fields="store.fields"
@close="fieldPanelOpen = false"
/>
<!-- View config panel (U5c) -->
<ViewConfigPanel
:open="viewConfigOpen"
:fields="store.fields"
:view="store.currentView"
@close="viewConfigOpen = false"
/>
</div>
</template>
<script setup lang="ts">
import { ref, computed, h, onMounted, onUnmounted } from 'vue'
import { useRouter } from 'vue-router'
import { Button as AButton, Tag as ATag, Modal as AModal } from 'ant-design-vue'
import {
ArrowLeftOutlined,
ReloadOutlined,
LoadingOutlined,
TableOutlined,
SettingOutlined,
} from '@ant-design/icons-vue'
import { useBitableStore } from '@/stores/bitable'
import TableViewList from '@/components/bitable/TableViewList.vue'
import BitableGrid from '@/components/bitable/BitableGrid.vue'
import TableCreateModal from '@/components/bitable/TableCreateModal.vue'
import FieldManagePanel from '@/components/bitable/FieldManagePanel.vue'
import ViewSwitcher from '@/components/bitable/ViewSwitcher.vue'
import ViewConfigPanel from '@/components/bitable/ViewConfigPanel.vue'
const router = useRouter()
const store = useBitableStore()
const createModalOpen = ref(false)
const fieldPanelOpen = ref(false)
const viewConfigOpen = ref(false)
// Filter out hidden fields based on current view config
const visibleFields = computed(() => {
const hiddenIds = (store.currentView?.config?.hidden_fields as string[]) ?? []
if (hiddenIds.length === 0) return store.fields
return store.fields.filter((f) => !hiddenIds.includes(f.id))
})
onMounted(() => {
store.loadTables()
})
onUnmounted(() => {
store.stopPolling()
})
function goBack(): void {
router.push('/agent/chat')
}
function handleSelectTable(tableId: string): void {
store.selectTable(tableId)
}
function handleRefresh(): void {
store.refreshRecords()
}
async function handleTableCreated(tableId: string): Promise<void> {
createModalOpen.value = false
await store.loadTables()
await store.selectTable(tableId)
}
async function handleEditCell(payload: {
recordId: string
fieldId: string
value: unknown
}): Promise<void> {
await store.updateCell(payload.recordId, payload.fieldId, payload.value)
}
function handleSwitchView(viewId: string): void {
store.switchView(viewId)
}
async function handleCreateView(): Promise<void> {
// ponytail: simple prompt for view name; full create modal is overkill for v1
let name = ''
AModal.confirm({
title: 'New view',
content: h('input', {
class: 'ant-input',
placeholder: 'Enter a view name',
onInput: (e: Event) => {
name = (e.target as HTMLInputElement).value
},
}),
onOk: async () => {
if (!name.trim()) return
await store.createView(name.trim(), 'grid')
},
})
}
</script>
<style scoped>
.bitable-view {
display: flex;
flex-direction: column;
height: 100vh;
width: 100vw;
overflow: hidden;
background: var(--bg-primary, #fff);
}
.bitable-view__topbar {
display: flex;
align-items: center;
justify-content: space-between;
padding: 8px 16px;
border-bottom: 1px solid var(--border-color, #f0f0f0);
flex-shrink: 0;
}
.bitable-view__topbar-left {
display: flex;
align-items: center;
gap: 8px;
}
.bitable-view__topbar-right {
display: flex;
align-items: center;
gap: 8px;
}
.bitable-view__title {
font-weight: 600;
font-size: 16px;
}
.bitable-view__body {
flex: 1;
display: flex;
overflow: hidden;
}
.bitable-view__sidebar {
width: 240px;
flex-shrink: 0;
overflow: hidden;
}
.bitable-view__main {
flex: 1;
display: flex;
flex-direction: column;
overflow: hidden;
min-width: 0;
}
.bitable-view__placeholder {
display: flex;
flex-direction: column;
align-items: center;
justify-content: center;
height: 100%;
gap: 16px;
color: var(--text-placeholder, #bfbfbf);
}
.bitable-view__grid-header {
display: flex;
align-items: baseline;
gap: 12px;
padding: 12px 16px;
border-bottom: 1px solid var(--border-color, #f0f0f0);
flex-shrink: 0;
}
.bitable-view__table-name {
margin: 0;
font-size: 16px;
font-weight: 600;
}
.bitable-view__field-count {
font-size: 12px;
color: var(--text-secondary, #8c8c8c);
}
.bitable-view__grid-container {
flex: 1;
overflow: hidden;
padding: 0;
}
.bitable-view__load-more {
display: flex;
justify-content: center;
padding: 8px;
border-top: 1px solid var(--border-color, #f0f0f0);
flex-shrink: 0;
}
</style>


@ -0,0 +1,605 @@
"""REST API routes for the bitable companion service.
All endpoints are prefixed ``/api/v1/bitable``. Auth via ``require_bitable_auth``
which accepts either a user JWT (``Authorization: Bearer``) or an internal
service token (``X-Internal-Token``) per KTD11. Service is obtained from
``app.state.bitable_service`` (503 if not initialized).
"""
from __future__ import annotations
import hmac
import logging
import os
import uuid
from pathlib import Path
from typing import Any
from fastapi import (
APIRouter,
Depends,
File,
HTTPException,
Query,
Request,
UploadFile,
)
from fastapi.responses import FileResponse
from pydantic import BaseModel, Field
from agentkit.bitable.models import FieldOwner, FieldType, ViewType
from agentkit.bitable.service import BitableService, FieldDependencyError
from agentkit.server.auth.dependencies import get_current_user
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/bitable", tags=["bitable"])
# ── Upload config (U6) ───────────────────────────────────
MAX_UPLOAD_SIZE = 10 * 1024 * 1024 # 10 MB
BITABLE_UPLOAD_DIR = Path(os.environ.get("AGENTKIT_BITABLE_UPLOAD_DIR", "data/uploads/bitable"))
_IMAGE_MIME_PREFIXES = ("image/",)
def _get_service(request: Request) -> BitableService:
"""Get bitable service from app.state, 503 if not initialized."""
service = getattr(request.app.state, "bitable_service", None)
if service is None:
raise HTTPException(
status_code=503,
detail="Bitable service not available. Server may not have initialized it.",
)
return service
async def require_bitable_auth(request: Request) -> dict[str, Any]:
"""Bitable-specific auth: accept JWT (via middleware) OR X-Internal-Token (KTD11).
The internal token is compared in constant time (hmac.compare_digest).
On success with the internal token, a synthetic service user is returned.
"""
# 1. Check internal service token (KTD11)
internal_token = getattr(request.app.state, "bitable_internal_token", None)
if internal_token:
provided = request.headers.get("X-Internal-Token", "")
# Compare as bytes: compare_digest raises TypeError for non-ASCII str input
if provided and hmac.compare_digest(provided.encode(), internal_token.encode()):
return {
"user_id": "__bitable_internal__",
"username": "bitable-internal",
"role": "admin",
"internal": True,
}
# 2. Fall back to JWT auth
user = await get_current_user(request)
if user is None:
raise HTTPException(
status_code=401,
detail="Authentication required (JWT or X-Internal-Token)",
)
return user
async def _check_table_ownership(
service: BitableService, table_id: str, user: dict[str, Any]
) -> None:
"""Verify the user owns the table. Internal service users bypass check.
Raises 404 if table not found, 403 if not owned.
"""
table = await service.get_table(table_id)
if table is None:
raise HTTPException(status_code=404, detail="Table not found")
if user.get("internal"):
return # Internal service token (KTD11) bypasses ownership
if table.owner_user_id and table.owner_user_id != user.get("user_id"):
raise HTTPException(status_code=403, detail="Not authorized to access this table")
# ---------------------------------------------------------------------------
# Request models
# ---------------------------------------------------------------------------
class CreateTableRequest(BaseModel):
name: str
description: str = ""
primary_key_field_id: str | None = None
class UpdateTableRequest(BaseModel):
name: str | None = None
description: str | None = None
primary_key_field_id: str | None = None
class CreateFieldRequest(BaseModel):
name: str
field_type: FieldType
config: dict[str, Any] = Field(default_factory=dict)
owner: FieldOwner = FieldOwner.user
class UpdateFieldRequest(BaseModel):
name: str | None = None
config: dict[str, Any] | None = None
class CreateRecordRequest(BaseModel):
values: dict[str, Any] = Field(default_factory=dict)
class BatchCreateRecordsRequest(BaseModel):
records: list[dict[str, Any]] = Field(default_factory=list, max_length=500)
class UpdateRecordRequest(BaseModel):
values: dict[str, Any] = Field(default_factory=dict)
class UpsertRequest(BaseModel):
records: list[dict[str, Any]]
primary_key_field_id: str
class CreateViewRequest(BaseModel):
name: str
view_type: ViewType = ViewType.grid
config: dict[str, Any] = Field(default_factory=dict)
class UpdateViewRequest(BaseModel):
name: str | None = None
config: dict[str, Any] | None = None
# ---------------------------------------------------------------------------
# Table endpoints
# ---------------------------------------------------------------------------
@router.post("/tables", status_code=201)
async def create_table(
body: CreateTableRequest,
request: Request,
user: dict = Depends(require_bitable_auth),
) -> dict[str, Any]:
service = _get_service(request)
table = await service.create_table(
name=body.name,
description=body.description,
primary_key_field_id=body.primary_key_field_id,
owner_user_id=user.get("user_id"),
)
return {"success": True, "table": table.model_dump(mode="json")}
@router.get("/tables")
async def list_tables(
request: Request,
user: dict = Depends(require_bitable_auth),
) -> dict[str, Any]:
service = _get_service(request)
tables = await service.list_tables(owner_user_id=user.get("user_id"))
return {"success": True, "tables": [t.model_dump(mode="json") for t in tables]}
@router.get("/tables/{table_id}")
async def get_table(
table_id: str,
request: Request,
user: dict = Depends(require_bitable_auth),
) -> dict[str, Any]:
service = _get_service(request)
await _check_table_ownership(service, table_id, user)
table = await service.get_table(table_id)
return {"success": True, "table": table.model_dump(mode="json")}
@router.patch("/tables/{table_id}")
async def update_table(
table_id: str,
body: UpdateTableRequest,
request: Request,
user: dict = Depends(require_bitable_auth),
) -> dict[str, Any]:
service = _get_service(request)
await _check_table_ownership(service, table_id, user)
kwargs = body.model_dump(exclude_none=True)
table = await service.update_table(table_id, **kwargs)
if table is None:
raise HTTPException(status_code=404, detail="Table not found")
return {"success": True, "table": table.model_dump(mode="json")}
@router.delete("/tables/{table_id}")
async def delete_table(
table_id: str,
request: Request,
user: dict = Depends(require_bitable_auth),
) -> dict[str, Any]:
service = _get_service(request)
await _check_table_ownership(service, table_id, user)
deleted = await service.delete_table(table_id)
if not deleted:
raise HTTPException(status_code=404, detail="Table not found")
return {"success": True}
# ---------------------------------------------------------------------------
# Field endpoints
# ---------------------------------------------------------------------------
@router.post("/tables/{table_id}/fields", status_code=201)
async def create_field(
table_id: str,
body: CreateFieldRequest,
request: Request,
user: dict = Depends(require_bitable_auth),
) -> dict[str, Any]:
service = _get_service(request)
await _check_table_ownership(service, table_id, user)
field = await service.create_field(
table_id=table_id,
name=body.name,
field_type=body.field_type,
config=body.config,
owner=body.owner,
)
return {"success": True, "field": field.model_dump(mode="json")}
@router.get("/tables/{table_id}/fields")
async def list_fields(
table_id: str,
request: Request,
user: dict = Depends(require_bitable_auth),
) -> dict[str, Any]:
service = _get_service(request)
await _check_table_ownership(service, table_id, user)
fields = await service.list_fields(table_id)
return {"success": True, "fields": [f.model_dump(mode="json") for f in fields]}
@router.patch("/fields/{field_id}")
async def update_field(
field_id: str,
body: UpdateFieldRequest,
request: Request,
user: dict = Depends(require_bitable_auth),
) -> dict[str, Any]:
# NOTE: no _check_table_ownership call here; only require_bitable_auth guards this route
service = _get_service(request)
kwargs = body.model_dump(exclude_none=True)
field = await service.update_field(field_id, **kwargs)
if field is None:
raise HTTPException(status_code=404, detail="Field not found")
return {"success": True, "field": field.model_dump(mode="json")}
@router.delete("/fields/{field_id}")
async def delete_field(
field_id: str,
request: Request,
force: bool = Query(False, description="Force delete with cascade cleanup"),
user: dict = Depends(require_bitable_auth),
) -> dict[str, Any]:
# NOTE: no _check_table_ownership call here; only require_bitable_auth guards this route
service = _get_service(request)
try:
deleted = await service.delete_field(field_id, force=force)
except FieldDependencyError as e:
raise HTTPException(
status_code=409,
detail={"message": str(e), "dependencies": e.dependencies},
) from e
if not deleted:
raise HTTPException(status_code=404, detail="Field not found")
return {"success": True}
# ---------------------------------------------------------------------------
# Formula validation endpoint (U5b)
# ---------------------------------------------------------------------------
class ValidateFormulaRequest(BaseModel):
"""Request body for formula syntax validation."""
formula: str = Field(..., min_length=1, max_length=2000)
@router.post("/fields/validate-formula")
async def validate_formula(
body: ValidateFormulaRequest,
user: dict = Depends(require_bitable_auth),
) -> dict[str, Any]:
"""Validate formula syntax without saving. Returns valid flag + error detail."""
from agentkit.bitable.formula import (
FormulaParseError,
FormulaSecurityError,
UnknownFunctionError,
parse_formula,
)
try:
parse_formula(body.formula)
except (FormulaParseError, FormulaSecurityError, UnknownFunctionError) as e:
return {"valid": False, "error": str(e)}
except Exception as e: # pragma: no cover — defensive
return {"valid": False, "error": f"Unexpected error: {e}"}
return {"valid": True}
# ---------------------------------------------------------------------------
# Record endpoints
# ---------------------------------------------------------------------------
@router.post("/tables/{table_id}/records", status_code=201)
async def create_records(
table_id: str,
body: BatchCreateRecordsRequest,
request: Request,
user: dict = Depends(require_bitable_auth),
) -> dict[str, Any]:
service = _get_service(request)
await _check_table_ownership(service, table_id, user)
created = []
for rec_values in body.records:
record = await service.create_record(table_id, values=rec_values)
created.append(record.model_dump(mode="json"))
return {"success": True, "count": len(created), "records": created}
@router.get("/tables/{table_id}/records")
async def list_records(
table_id: str,
request: Request,
cursor: str | None = Query(None),
limit: int = Query(50, ge=1, le=200),
filters: str | None = Query(None, description="JSON-encoded filter list"),
sorts: str | None = Query(None, description="JSON-encoded sort list"),
user: dict = Depends(require_bitable_auth),
) -> dict[str, Any]:
import json
service = _get_service(request)
await _check_table_ownership(service, table_id, user)
try:
parsed_filters = json.loads(filters) if filters else None
parsed_sorts = json.loads(sorts) if sorts else None
except json.JSONDecodeError as e:
raise HTTPException(status_code=400, detail=f"Invalid JSON in filters/sorts: {e}") from e
if parsed_filters or parsed_sorts:
records, next_cursor = await service.list_records_filtered(
table_id,
filters=parsed_filters,
sorts=parsed_sorts,
cursor=cursor,
limit=limit,
)
else:
records, next_cursor = await service.list_records(table_id, cursor=cursor, limit=limit)
return {
"success": True,
"records": [r.model_dump(mode="json") for r in records],
"next_cursor": next_cursor,
}
@router.patch("/records/{record_id}")
async def update_record(
record_id: str,
body: UpdateRecordRequest,
request: Request,
user: dict = Depends(require_bitable_auth),
) -> dict[str, Any]:
# NOTE: no _check_table_ownership call here; only require_bitable_auth guards this route
service = _get_service(request)
record = await service.update_record_values(record_id, body.values)
if record is None:
raise HTTPException(status_code=404, detail="Record not found")
return {"success": True, "record": record.model_dump(mode="json")}
@router.delete("/tables/{table_id}/records")
async def delete_records(
table_id: str,
request: Request,
user: dict = Depends(require_bitable_auth),
) -> dict[str, Any]:
service = _get_service(request)
await _check_table_ownership(service, table_id, user)
count = await service.delete_records_by_table(table_id)
return {"success": True, "deleted": count}
@router.delete("/records/{record_id}")
async def delete_single_record(
record_id: str,
request: Request,
user: dict = Depends(require_bitable_auth),
) -> dict[str, Any]:
# NOTE: no _check_table_ownership call here; only require_bitable_auth guards this route
service = _get_service(request)
deleted = await service.delete_record(record_id)
if not deleted:
raise HTTPException(status_code=404, detail="Record not found")
return {"success": True}
# ---------------------------------------------------------------------------
# Upsert endpoint (KTD8)
# ---------------------------------------------------------------------------
@router.post("/tables/{table_id}/upsert", status_code=201)
async def upsert_records(
table_id: str,
body: UpsertRequest,
request: Request,
user: dict = Depends(require_bitable_auth),
) -> dict[str, Any]:
service = _get_service(request)
await _check_table_ownership(service, table_id, user)
if not body.primary_key_field_id:
raise HTTPException(status_code=400, detail="primary_key_field_id is required")
result = await service.upsert_records(table_id, body.records, body.primary_key_field_id)
return {"success": True, **result}
# ---------------------------------------------------------------------------
# View endpoints
# ---------------------------------------------------------------------------
@router.post("/tables/{table_id}/views", status_code=201)
async def create_view(
table_id: str,
body: CreateViewRequest,
request: Request,
user: dict = Depends(require_bitable_auth),
) -> dict[str, Any]:
service = _get_service(request)
await _check_table_ownership(service, table_id, user)
view = await service.create_view(
table_id=table_id,
name=body.name,
view_type=body.view_type,
config=body.config,
)
return {"success": True, "view": view.model_dump(mode="json")}
@router.get("/tables/{table_id}/views")
async def list_views(
table_id: str,
request: Request,
user: dict = Depends(require_bitable_auth),
) -> dict[str, Any]:
service = _get_service(request)
await _check_table_ownership(service, table_id, user)
views = await service.list_views(table_id)
return {"success": True, "views": [v.model_dump(mode="json") for v in views]}
@router.patch("/views/{view_id}")
async def update_view(
view_id: str,
body: UpdateViewRequest,
request: Request,
user: dict = Depends(require_bitable_auth),
) -> dict[str, Any]:
# NOTE: no _check_table_ownership call here; only require_bitable_auth guards this route
service = _get_service(request)
kwargs = body.model_dump(exclude_none=True)
view = await service.update_view(view_id, **kwargs)
if view is None:
raise HTTPException(status_code=404, detail="View not found")
return {"success": True, "view": view.model_dump(mode="json")}
# ---------------------------------------------------------------------------
# File upload / download (U6: attachment & image fields)
# ---------------------------------------------------------------------------
def _sanitize_filename(name: str) -> str:
"""Remove path separators and keep only safe characters."""
name = name.replace("\\", "_").replace("/", "_")
return "".join(c for c in name if c.isalnum() or c in "._-").strip(".")
def _ensure_upload_dir() -> Path:
BITABLE_UPLOAD_DIR.mkdir(parents=True, exist_ok=True)
return BITABLE_UPLOAD_DIR
@router.post("/tables/{table_id}/upload")
async def upload_file(
table_id: str,
request: Request,
file: UploadFile = File(...),
field_id: str = Query(..., description="Target field ID (determines type validation)"),
user: dict = Depends(require_bitable_auth),
) -> dict[str, Any]:
"""Upload a file for an attachment or image field.
Returns file metadata. The frontend then writes this metadata array
to the record's field value via the normal update-record endpoint.
"""
service = _get_service(request)
await _check_table_ownership(service, table_id, user)
# Validate field exists and is attachment/image type
field = await service.get_field(field_id)
if field is None:
raise HTTPException(status_code=404, detail="Field not found")
if field.field_type not in (FieldType.attachment, FieldType.image):
raise HTTPException(
status_code=400,
detail=f"Field '{field.name}' is not an attachment or image field",
)
# Size check (fast reject using the part size recorded by the multipart parser, when available)
if file.size is not None and file.size > MAX_UPLOAD_SIZE:
raise HTTPException(status_code=413, detail="File exceeds 10 MB limit")
# Image type check
mime = file.content_type or "application/octet-stream"
if field.field_type == FieldType.image and not mime.startswith(_IMAGE_MIME_PREFIXES):
raise HTTPException(
status_code=400,
detail=f"Image field requires an image file, got '{mime}'",
)
original_name = file.filename or "unnamed"
safe_name = _sanitize_filename(original_name) or "unnamed"
ext = Path(safe_name).suffix
stored_name = f"{uuid.uuid4().hex}{ext}"
upload_dir = _ensure_upload_dir()
file_path = upload_dir / stored_name
# Stream-read with size check to prevent OOM (P1 #12)
total_size = 0
try:
with open(file_path, "wb") as f:
while True:
chunk = await file.read(64 * 1024) # 64KB chunks
if not chunk:
break
total_size += len(chunk)
if total_size > MAX_UPLOAD_SIZE:
f.close()
file_path.unlink(missing_ok=True)
raise HTTPException(status_code=413, detail="File exceeds 10 MB limit")
f.write(chunk)
except HTTPException:
raise
except Exception as exc:
file_path.unlink(missing_ok=True)
logger.exception("Failed to save uploaded bitable file")
raise HTTPException(status_code=500, detail="Failed to save file") from exc
finally:
await file.close()
return {
"filename": original_name,
"stored_name": stored_name,
"mime_type": mime,
"size": total_size,
"url": f"/api/v1/bitable/files/{stored_name}",
}
@router.get("/files/{filename}")
async def download_file(
filename: str,
user: dict = Depends(require_bitable_auth),
) -> FileResponse:
"""Download a bitable attachment/image file by its stored filename."""
safe_filename = _sanitize_filename(filename)
file_path = BITABLE_UPLOAD_DIR / safe_filename
if not file_path.exists() or not file_path.is_file():
raise HTTPException(status_code=404, detail="File not found")
return FileResponse(file_path, filename=safe_filename)
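
The size-capped streaming write in `upload_file` above can be illustrated in isolation. The following is a minimal sketch only, using synchronous file objects for brevity; `copy_with_limit` and `UploadTooLarge` are hypothetical names that do not appear in the commit, and the real route raises `HTTPException(413)` and deletes the partial file instead.

```python
import io
from typing import BinaryIO

MAX_UPLOAD_SIZE = 10 * 1024 * 1024  # same 10 MB cap as the route above


class UploadTooLarge(Exception):
    """Hypothetical error type; the route raises HTTPException(413) instead."""


def copy_with_limit(
    src: BinaryIO,
    dst: BinaryIO,
    limit: int = MAX_UPLOAD_SIZE,
    chunk_size: int = 64 * 1024,
) -> int:
    """Copy src to dst in chunks, aborting once more than `limit` bytes are read."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > limit:
            # Reject before writing the chunk that crosses the limit.
            raise UploadTooLarge(f"upload exceeds {limit} bytes")
        dst.write(chunk)
    return total


if __name__ == "__main__":
    sink = io.BytesIO()
    written = copy_with_limit(io.BytesIO(b"x" * 1000), sink, limit=4096, chunk_size=256)
    print(written, len(sink.getvalue()))
```

Because the check runs per chunk, memory use stays bounded by `chunk_size` regardless of how large the incoming stream is; the caller is still responsible for removing any partially written destination file.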


@ -1,5 +1,6 @
 """Task submission routes"""
+import asyncio
 import json
 import uuid
 from dataclasses import asdict
@ -210,11 +211,17 @@ async def cancel_task(task_id: str, req: Request):
 @router.post("/tasks/{task_id}/resume")
-async def resume_task(task_id: str, req: Request):
+async def resume_task(task_id: str, req: Request, plan_id: str | None = None):
     """Resume a crashed pipeline from the last completed phase checkpoint.
     Reconstructs the team from the saved plan's expert names, creates a new
     TeamOrchestrator with the checkpoint manager, and calls resume().
+    Args:
+        task_id: Task ID from the URL path.
+        plan_id: Optional plan ID. If not provided, falls back to task_id.
+            Needed because TeamPlan.id is auto-generated and may differ
+            from the task_id.
     """
     from agentkit.experts.orchestrator import TeamOrchestrator
     from agentkit.experts.router import ExpertTeamRouter
@ -223,72 +230,89 @@ async def resume_task(task_id: str, req: Request):
     app_state = req.app.state
-    # 1. Create checkpoint manager
-    checkpoint = PipelineCheckpoint(
-        redis_client=getattr(app_state, "working_redis_client", None)
-    )
+    # Resolve plan_id: explicit param > task_id fallback
+    resolved_plan_id = plan_id or task_id
-    # 2. Load plan to get expert names
-    plan_dict = await checkpoint.load_plan(task_id)
-    if plan_dict is None:
+    # P2 #10: concurrent-resume guard: of concurrent resume requests for the same plan_id, only one may run
+    lock_attr = f"_resume_lock_{resolved_plan_id}"
+    lock = getattr(app_state, lock_attr, None)
+    if lock is None:
+        lock = asyncio.Lock()
+        setattr(app_state, lock_attr, lock)
+    if lock.locked():
         raise HTTPException(
-            status_code=404,
-            detail=f"No checkpoint found for task '{task_id}'",
+            status_code=409,
+            detail=f"Resume already in progress for plan '{resolved_plan_id}'",
+        )
+    async with lock:
+        # 1. Create checkpoint manager
+        checkpoint = PipelineCheckpoint(
+            redis_client=getattr(app_state, "working_redis_client", None)
         )
-    # 3. Extract unique expert names from plan
-    expert_names: list[str] = []
-    lead_name = plan_dict.get("lead_expert", "")
-    if lead_name:
-        expert_names.append(lead_name)
-    for ph in plan_dict.get("phases", []):
-        name = ph.get("assigned_expert", "")
-        if name and name not in expert_names:
-            expert_names.append(name)
-    if not expert_names:
-        raise HTTPException(
-            status_code=400,
-            detail="Cannot resume: no experts found in saved plan",
+        # 2. Load plan to get expert names
+        plan_dict = await checkpoint.load_plan(resolved_plan_id)
+        if plan_dict is None:
+            raise HTTPException(
+                status_code=404,
+                detail=f"No checkpoint found for plan '{resolved_plan_id}'"
+                f" (task_id='{task_id}')",
+            )
+        # 3. Extract unique expert names from plan
+        expert_names: list[str] = []
+        lead_name = plan_dict.get("lead_expert", "")
+        if lead_name:
+            expert_names.append(lead_name)
+        for ph in plan_dict.get("phases", []):
+            name = ph.get("assigned_expert", "")
+            if name and name not in expert_names:
+                expert_names.append(name)
+        if not expert_names:
+            raise HTTPException(
+                status_code=400,
+                detail="Cannot resume: no experts found in saved plan",
+            )
+        # 4. Resolve expert configs via ExpertTeamRouter
+        template_registry = getattr(app_state, "expert_template_registry", None)
+        if template_registry is None:
+            from agentkit.experts.registry import ExpertTemplateRegistry
+            template_registry = ExpertTemplateRegistry()
+        team_router = ExpertTeamRouter(template_registry=template_registry)
+        expert_configs = team_router.resolve_expert_configs(expert_names)
+        if not expert_configs:
+            raise HTTPException(
+                status_code=400,
+                detail="Cannot resume: failed to resolve expert configs",
+            )
+        lead_config = expert_configs[0]
+        member_configs = expert_configs[1:] if len(expert_configs) > 1 else []
+        # 5. Create team + orchestrator
+        team = ExpertTeam(
+            pool=app_state.agent_pool,
+            template_registry=template_registry,
+            redis_client=getattr(app_state, "working_redis_client", None),
         )
-    # 4. Resolve expert configs via ExpertTeamRouter
-    template_registry = getattr(app_state, "expert_template_registry", None)
-    if template_registry is None:
-        from agentkit.experts.registry import ExpertTemplateRegistry
-        template_registry = ExpertTemplateRegistry()
-    team_router = ExpertTeamRouter(template_registry=template_registry)
-    expert_configs = team_router.resolve_expert_configs(expert_names)
-    if not expert_configs:
-        raise HTTPException(
-            status_code=400,
-            detail="Cannot resume: failed to resolve expert configs",
-        )
-    lead_config = expert_configs[0]
-    member_configs = expert_configs[1:] if len(expert_configs) > 1 else []
-    # 5. Create team + orchestrator
-    team = ExpertTeam(
-        pool=app_state.agent_pool,
-        template_registry=template_registry,
-        redis_client=getattr(app_state, "working_redis_client", None),
-    )
-    await team.create_team(lead_config=lead_config, member_configs=member_configs)
-    try:
-        orchestrator = TeamOrchestrator(team=team, checkpoint=checkpoint)
-        result = await orchestrator.resume(task_id)
-    finally:
+        await team.create_team(lead_config=lead_config, member_configs=member_configs)
         try:
-            await team.dissolve()
-        except Exception:
-            pass
+            orchestrator = TeamOrchestrator(team=team, checkpoint=checkpoint)
+            result = await orchestrator.resume(resolved_plan_id)
+        finally:
+            try:
+                await team.dissolve()
+            except Exception:
+                pass
     return {
         "task_id": task_id,
+        "plan_id": resolved_plan_id,
         "status": result.get("status", "unknown"),
         "result": result.get("result"),
         "phase_results": {


@@ -81,7 +81,7 @@ class SkillConfig(AgentConfig):
         evolution: dict[str, Any] | None = None,
         # New in v3: SKILL.md support
         skill_md_path: str | None = None,
-        disclosure_level: int = 0,
+        disclosure_level: int = 1,  # Default: full load (backward compatible); 0 = summary mode, must be set explicitly
         # New in v4: dependency declarations, capability tags
         dependencies: list[dict[str, Any] | DependencyDecl] | None = None,
         capabilities: list[str | dict[str, Any] | CapabilityTag] | None = None,


@@ -0,0 +1,486 @@
"""BitableTool: agent tool for bitable data ingestion and CRUD via HTTP.

Implements KTD5 (REST API boundary even when co-deployed) and KTD11
(internal service token auth). The tool uses ``httpx.AsyncClient`` to call
the bitable REST API; it never imports BitableService directly.

Actions: create_table, import_excel, import_database, collect_api,
upsert_records, query_records.

Batch chunking: upsert and import operations send at most ``BATCH_SIZE``
records per HTTP request. On partial failure, the result includes
``successful_count`` and ``resume_from`` for breakpoint continuation.
"""

from __future__ import annotations

import asyncio
import logging
from typing import Any

import httpx

from agentkit.bitable.ingestion.excel import ParsedSheet, parse_excel, parse_excel_url
from agentkit.bitable.ingestion.database import import_table as import_db_table
from agentkit.bitable.ingestion.api_collector import transform_records
from agentkit.tools.base import Tool

logger = logging.getLogger(__name__)

BATCH_SIZE = 500


class BitableTool(Tool):
    """Agent tool for bitable operations via REST API.

    Args:
        base_url: Bitable API base URL (e.g. ``http://localhost:8001/api/v1/bitable``).
        internal_token: Service token for KTD11 auth. If ``None``, requests
            go unauthenticated (will fail if the server requires auth).
    """

    def __init__(self, base_url: str, internal_token: str | None = None) -> None:
        super().__init__(
            name="bitable",
            description=(
                "Create and manage bitable (multi-dimensional spreadsheet) tables, "
                "ingest data from Excel files, databases, or API responses, and "
                "query records. Actions: create_table, import_excel, "
                "import_database, collect_api, upsert_records, query_records."
            ),
            input_schema={
                "type": "object",
                "properties": {
                    "action": {
                        "type": "string",
                        "enum": [
                            "create_table",
                            "import_excel",
                            "import_database",
                            "collect_api",
                            "upsert_records",
                            "query_records",
                        ],
                        "description": "Bitable operation to perform.",
                    },
                    "table_name": {
                        "type": "string",
                        "description": "Name for the new bitable table (create_table, import_excel, import_database).",
                    },
                    "description": {
                        "type": "string",
                        "description": "Table description (create_table).",
                    },
                    "file_path": {
                        "type": "string",
                        "description": "Path to .xlsx file (import_excel).",
                    },
                    "file_url": {
                        "type": "string",
                        "description": "URL to download .xlsx file (import_excel).",
                    },
                    "connection_string": {
                        "type": "string",
                        "description": "Database connection string (import_database).",
                    },
                    "table_names": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "Source table names to import (import_database).",
                    },
                    "table_id": {
                        "type": "string",
                        "description": "Target bitable table ID (collect_api, upsert_records, query_records).",
                    },
                    "records": {
                        "type": "array",
                        "description": "Records to write (collect_api, upsert_records).",
                    },
                    "field_mapping": {
                        "type": "object",
                        "description": "Mapping {source_key: bitable_field_id} (collect_api).",
                    },
                    "primary_key_field_id": {
                        "type": "string",
                        "description": "Field ID of the primary key (upsert_records, collect_api).",
                    },
                    "resume_from": {
                        "type": "integer",
                        "description": "Skip this many records before resuming a failed batch (upsert_records, collect_api).",
                    },
                    "cursor": {
                        "type": "string",
                        "description": "Pagination cursor (query_records).",
                    },
                    "limit": {
                        "type": "integer",
                        "description": "Max records to return (query_records).",
                    },
                },
                "required": ["action"],
            },
        )
        self._base_url = base_url.rstrip("/")
        self._internal_token = internal_token
        self._client: httpx.AsyncClient | None = None

    async def _get_client(self) -> httpx.AsyncClient:
        # Lazily create the shared client; recreate it if it was closed.
        if self._client is None or self._client.is_closed:
            headers: dict[str, str] = {}
            if self._internal_token:
                headers["X-Internal-Token"] = self._internal_token
            self._client = httpx.AsyncClient(
                base_url=self._base_url,
                headers=headers,
                timeout=60.0,
            )
        return self._client

    async def close(self) -> None:
        if self._client is not None and not self._client.is_closed:
            await self._client.aclose()

    async def execute(self, **kwargs) -> dict[str, Any]:
        action = kwargs.get("action")
        handlers = {
            "create_table": self._create_table,
            "import_excel": self._import_excel,
            "import_database": self._import_database,
            "collect_api": self._collect_api,
            "upsert_records": self._upsert_records,
            "query_records": self._query_records,
        }
        handler = handlers.get(action)
        if handler is None:
            return {"success": False, "error": f"Unknown action: {action!r}"}
        try:
            return await handler(**kwargs)
        except httpx.HTTPStatusError as e:
            return {
                "success": False,
                "error": f"Bitable API error {e.response.status_code}: {e.response.text[:500]}",
            }
        except httpx.ConnectError as e:
            return {"success": False, "error": f"Cannot connect to bitable API: {e}"}
        except Exception as e:
            return {"success": False, "error": f"{action} failed: {e}"}

    # ------------------------------------------------------------------
    # create_table
    # ------------------------------------------------------------------

    async def _create_table(self, **kwargs) -> dict[str, Any]:
        table_name = kwargs.get("table_name")
        if not table_name:
            return {"success": False, "error": "Missing required field: table_name"}
        client = await self._get_client()
        resp = await client.post(
            "/tables",
            json={"name": table_name, "description": kwargs.get("description", "")},
        )
        resp.raise_for_status()
        data = resp.json()
        return {"success": True, "table": data["table"]}

    # ------------------------------------------------------------------
    # import_excel
    # ------------------------------------------------------------------

    async def _import_excel(self, **kwargs) -> dict[str, Any]:
        file_path = kwargs.get("file_path")
        file_url = kwargs.get("file_url")
        if not file_path and not file_url:
            return {"success": False, "error": "Either file_path or file_url is required"}
        # Parse Excel: offload sync I/O to thread pool (P2 #21-23).
        if file_path:
            sheets = await asyncio.to_thread(parse_excel, file_path)
        else:
            sheets = await asyncio.to_thread(parse_excel_url, file_url)
        if not sheets:
            return {"success": False, "error": "Excel file has no sheets with data"}
        results: list[dict[str, Any]] = []
        for sheet in sheets:
            result = await self._import_sheet(sheet)
            results.append(result)
        return {"success": True, "sheets": results}

    async def _import_sheet(self, sheet: ParsedSheet) -> dict[str, Any]:
        """Create a bitable table from a parsed sheet and create all rows in batches."""
        client = await self._get_client()
        # 1. Create table
        resp = await client.post("/tables", json={"name": sheet.name})
        resp.raise_for_status()
        table_id = resp.json()["table"]["id"]
        # 2. Create fields
        field_name_to_id: dict[str, str] = {}
        for col_name, field_type in zip(sheet.columns, sheet.field_types):
            resp = await client.post(
                f"/tables/{table_id}/fields",
                json={"name": col_name, "field_type": field_type, "owner": "agent"},
            )
            resp.raise_for_status()
            field_id = resp.json()["field"]["id"]
            field_name_to_id[col_name] = field_id
        # 3. Map record keys to field IDs and batch create
        mapped_records = [
            {field_name_to_id[k]: v for k, v in rec.items() if k in field_name_to_id}
            for rec in sheet.records
        ]
        if not mapped_records:
            return {
                "table_id": table_id,
                "table_name": sheet.name,
                "field_count": len(field_name_to_id),
                "record_count": 0,
            }
        # import_excel does not set a primary key, so upsert is unavailable;
        # plain batched record creation is used instead.
        create_result = await self._batch_create_records(table_id, mapped_records)
        return {
            "table_id": table_id,
            "table_name": sheet.name,
            "field_count": len(field_name_to_id),
            "record_count": create_result["successful_count"],
            **create_result,
        }

    async def _batch_create_records(
        self, table_id: str, records: list[dict[str, Any]]
    ) -> dict[str, Any]:
        """Create records in batches via POST /tables/{id}/records."""
        client = await self._get_client()
        total = len(records)
        successful = 0
        errors: list[dict[str, Any]] = []
        for start in range(0, total, BATCH_SIZE):
            batch = records[start : start + BATCH_SIZE]
            try:
                resp = await client.post(
                    f"/tables/{table_id}/records",
                    json={"records": batch},
                )
                resp.raise_for_status()
                successful += len(batch)
            except httpx.HTTPStatusError as e:
                errors.append(
                    {
                        "batch_start": start,
                        "batch_size": len(batch),
                        "status": e.response.status_code,
                        "error": e.response.text[:300],
                    }
                )
                break  # stop on first failure
        return {
            "successful_count": successful,
            "total": total,
            "resume_from": successful,
            **({"errors": errors} if errors else {}),
        }

    # ------------------------------------------------------------------
    # import_database
    # ------------------------------------------------------------------

    async def _import_database(self, **kwargs) -> dict[str, Any]:
        conn_str = kwargs.get("connection_string")
        table_names = kwargs.get("table_names")
        if not conn_str:
            return {"success": False, "error": "Missing required field: connection_string"}
        if not table_names:
            return {"success": False, "error": "Missing required field: table_names"}
        results: list[dict[str, Any]] = []
        for src_table in table_names:
            try:
                # Offload sync DB reflection to thread pool (P2 #21-23).
                reflected = await asyncio.to_thread(import_db_table, conn_str, src_table)
                result = await self._import_reflected_table(reflected)
                results.append(result)
            except ConnectionError as e:
                # The source database is unreachable: abort the whole import.
                return {"success": False, "error": str(e), "imported": results}
            except Exception as e:
                # Any other failure affects this table only; continue with the rest.
                results.append({"table_name": src_table, "success": False, "error": str(e)})
        return {"success": True, "tables": results}

    async def _import_reflected_table(self, reflected: dict[str, Any]) -> dict[str, Any]:
        """Create a bitable table from reflected DB data and upsert rows."""
        client = await self._get_client()
        table_name = reflected["table_name"]
        # 1. Create table
        resp = await client.post("/tables", json={"name": table_name})
        resp.raise_for_status()
        table_id = resp.json()["table"]["id"]
        # 2. Create fields
        field_name_to_id: dict[str, str] = {}
        pk_field_id: str | None = None
        for fdef in reflected["fields"]:
            resp = await client.post(
                f"/tables/{table_id}/fields",
                json={
                    "name": fdef["name"],
                    "field_type": fdef["field_type"],
                    "owner": "agent",
                },
            )
            resp.raise_for_status()
            fid = resp.json()["field"]["id"]
            field_name_to_id[fdef["name"]] = fid
            if fdef.get("is_primary_key"):
                pk_field_id = fid
        # 3. Set primary key; fail here rather than in the later upsert
        #    if the server rejects it.
        if pk_field_id:
            resp = await client.patch(
                f"/tables/{table_id}", json={"primary_key_field_id": pk_field_id}
            )
            resp.raise_for_status()
        # 4. Map and upsert records
        mapped = [
            {field_name_to_id[k]: v for k, v in rec.items() if k in field_name_to_id}
            for rec in reflected["records"]
        ]
        if not mapped:
            return {
                "table_id": table_id,
                "table_name": table_name,
                "record_count": 0,
                "success": True,
            }
        if pk_field_id:
            upsert = await self._batch_upsert(table_id, mapped, pk_field_id)
        else:
            upsert = await self._batch_create_records(table_id, mapped)
        return {
            "table_id": table_id,
            "table_name": table_name,
            "record_count": upsert["successful_count"],
            "success": True,
            **upsert,
        }

    # ------------------------------------------------------------------
    # collect_api
    # ------------------------------------------------------------------

    async def _collect_api(self, **kwargs) -> dict[str, Any]:
        table_id = kwargs.get("table_id")
        records = kwargs.get("records")
        field_mapping = kwargs.get("field_mapping")
        pk_field_id = kwargs.get("primary_key_field_id")
        resume_from = kwargs.get("resume_from", 0)
        if not table_id:
            return {"success": False, "error": "Missing required field: table_id"}
        if not records:
            return {"success": False, "error": "Missing required field: records"}
        if not field_mapping:
            return {"success": False, "error": "Missing required field: field_mapping"}
        if not pk_field_id:
            return {"success": False, "error": "Missing required field: primary_key_field_id"}
        transformed = transform_records(records, field_mapping)
        if resume_from > 0:
            transformed = transformed[resume_from:]
        # Note: counts in the result (including ``resume_from``) are relative
        # to the sliced list, not to the original ``records``.
        result = await self._batch_upsert(table_id, transformed, pk_field_id)
        return {"success": True, **result}

    # ------------------------------------------------------------------
    # upsert_records
    # ------------------------------------------------------------------

    async def _upsert_records(self, **kwargs) -> dict[str, Any]:
        table_id = kwargs.get("table_id")
        records = kwargs.get("records")
        pk_field_id = kwargs.get("primary_key_field_id")
        resume_from = kwargs.get("resume_from", 0)
        if not table_id:
            return {"success": False, "error": "Missing required field: table_id"}
        if not records:
            return {"success": False, "error": "Missing required field: records"}
        if not pk_field_id:
            return {"success": False, "error": "Missing required field: primary_key_field_id"}
        batch = records[resume_from:] if resume_from > 0 else records
        # Note: counts in the result (including ``resume_from``) are relative
        # to ``batch``, not to the original ``records``.
        result = await self._batch_upsert(table_id, batch, pk_field_id)
        return {"success": True, **result}

    async def _batch_upsert(
        self, table_id: str, records: list[dict[str, Any]], pk_field_id: str
    ) -> dict[str, Any]:
        """Upsert records in batches of BATCH_SIZE via POST /tables/{id}/upsert."""
        client = await self._get_client()
        total = len(records)
        successful = 0
        errors: list[dict[str, Any]] = []
        for start in range(0, total, BATCH_SIZE):
            batch = records[start : start + BATCH_SIZE]
            try:
                resp = await client.post(
                    f"/tables/{table_id}/upsert",
                    json={
                        "records": batch,
                        "primary_key_field_id": pk_field_id,
                    },
                )
                resp.raise_for_status()
                data = resp.json()
                successful += data.get("inserted", 0) + data.get("updated", 0)
            except httpx.HTTPStatusError as e:
                errors.append(
                    {
                        "batch_start": start,
                        "batch_size": len(batch),
                        "status": e.response.status_code,
                        "error": e.response.text[:300],
                    }
                )
                break  # stop on first failure
        return {
            "successful_count": successful,
            "total": total,
            "resume_from": successful,
            **({"errors": errors} if errors else {}),
        }

    # ------------------------------------------------------------------
    # query_records
    # ------------------------------------------------------------------

    async def _query_records(self, **kwargs) -> dict[str, Any]:
        table_id = kwargs.get("table_id")
        if not table_id:
            return {"success": False, "error": "Missing required field: table_id"}
        client = await self._get_client()
        params: dict[str, Any] = {}
        if kwargs.get("cursor"):
            params["cursor"] = kwargs["cursor"]
        if kwargs.get("limit"):
            params["limit"] = kwargs["limit"]
        resp = await client.get(f"/tables/{table_id}/records", params=params)
        resp.raise_for_status()
        data = resp.json()
        return {
            "success": True,
            "records": data["records"],
            "next_cursor": data.get("next_cursor"),
        }
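
Reviewer note: the batch-and-resume contract used by the tool's batch helpers can be exercised without a server. The sketch below is a simplified, synchronous stand-in and not the code from this commit: `send_in_batches`, the `send` callback, and `demo` are hypothetical names, and `RuntimeError` stands in for an HTTP error. It shows 1200 records splitting into 500+500+200 and a caller resuming from the reported offset after a failure.

```python
from typing import Any, Callable, Dict, List, Optional

BATCH_SIZE = 500  # same value as the constant in the tool


def send_in_batches(
    records: List[Dict[str, Any]],
    send: Callable[[List[Dict[str, Any]]], None],
    batch_size: int = BATCH_SIZE,
) -> Dict[str, Any]:
    """Send records in fixed-size batches and stop at the first failure."""
    successful = 0
    error: Optional[str] = None
    for start in range(0, len(records), batch_size):
        batch = records[start : start + batch_size]
        try:
            send(batch)  # hypothetical stand-in for the HTTP POST
        except RuntimeError as exc:
            error = str(exc)
            break  # later batches are not attempted
        successful += len(batch)
    result: Dict[str, Any] = {
        "successful_count": successful,
        "total": len(records),
        "resume_from": successful,  # offset relative to the list passed in
    }
    if error is not None:
        result["error"] = error
    return result


def demo() -> Dict[str, Any]:
    """Fail the second request once, then resume from the reported offset."""
    records = [{"n": i} for i in range(1200)]
    calls: List[int] = []

    def flaky_send(batch: List[Dict[str, Any]]) -> None:
        if len(calls) == 1:
            calls.append(-1)  # mark the failed attempt
            raise RuntimeError("simulated server error")
        calls.append(len(batch))

    first = send_in_batches(records, flaky_send)
    second = send_in_batches(records[first["resume_from"]:], flaky_send)
    return {"calls": calls, "first": first, "second": second}
```

Because the second call receives a sliced list, its `resume_from` is relative to that slice; a caller that resumes more than once has to add the offsets together.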


@@ -0,0 +1,143 @@
"""Shared fixtures for bitable unit tests.

Provides:
- ``bitable_db``: initialized BitableDB (skips if PG unavailable)
- ``bitable_service``: BitableService backed by bitable_db
- Factory functions: make_table, make_field, make_record, make_formula_field
"""

from __future__ import annotations

import os

import pytest

from agentkit.bitable.models import FieldOwner, FieldType


def _pg_available() -> bool:
    """Return True if a database URL is configured.

    Only checks that DATABASE_URL or AGENTKIT_DATABASE_URL is set; it does
    not open a connection, so an unreachable server is not detected here.
    """
    url = os.environ.get("DATABASE_URL") or os.environ.get("AGENTKIT_DATABASE_URL")
    return bool(url)


@pytest.fixture
async def bitable_db():
    """Initialize a fresh bitable DB for each test (skips if PG unavailable)."""
    if not _pg_available():
        pytest.skip("PostgreSQL not available (set DATABASE_URL)")
    from agentkit.bitable.db import BitableDB

    db = BitableDB()
    try:
        await db.init()
        # Clean slate: drop and recreate bitable schema
        from sqlalchemy import text

        async with db.engine.begin() as conn:
            await conn.execute(text("DROP SCHEMA IF EXISTS bitable CASCADE"))
        await db.init()  # re-create fresh
        yield db
    finally:
        # Cleanup
        from sqlalchemy import text

        if db.engine is not None:
            async with db.engine.begin() as conn:
                await conn.execute(text("DROP SCHEMA IF EXISTS bitable CASCADE"))
        await db.close()


@pytest.fixture
async def bitable_service(bitable_db):
    """BitableService backed by the test bitable_db."""
    from agentkit.bitable.service import BitableService

    yield BitableService(bitable_db)


# ── Factory fixtures ───────────────────────────────────────


@pytest.fixture
def make_table(bitable_service):
    """Factory: create a table and return it."""
    counter = [0]

    async def _make(
        name: str | None = None,
        description: str = "",
        primary_key_field_id: str | None = None,
    ):
        counter[0] += 1
        return await bitable_service.create_table(
            name=name or f"test_table_{counter[0]}",
            description=description,
            primary_key_field_id=primary_key_field_id,
        )

    return _make


@pytest.fixture
def make_field(bitable_service):
    """Factory: create a field and return it."""
    counter = [0]

    async def _make(
        table_id: str,
        name: str | None = None,
        field_type: FieldType = FieldType.text,
        config: dict | None = None,
        owner: FieldOwner = FieldOwner.agent,
    ):
        counter[0] += 1
        return await bitable_service.create_field(
            table_id=table_id,
            name=name or f"field_{counter[0]}",
            field_type=field_type,
            config=config or {},
            owner=owner,
        )

    return _make


@pytest.fixture
def make_record(bitable_service):
    """Factory: create a record and return it."""
    counter = [0]

    async def _make(table_id: str, values: dict | None = None):
        counter[0] += 1
        return await bitable_service.create_record(
            table_id=table_id,
            values=values or {},
        )

    return _make


@pytest.fixture
def make_formula_field(bitable_service):
    """Factory: create a formula field and return it."""
    counter = [0]

    async def _make(
        table_id: str,
        name: str | None = None,
        formula_expr: str = "=1+1",
    ):
        counter[0] += 1
        return await bitable_service.create_field(
            table_id=table_id,
            name=name or f"calc_{counter[0]}",
            field_type=FieldType.formula,
            config={"formula_expr": formula_expr},
            owner=FieldOwner.user,
        )

    return _make
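
Reviewer note: the availability check in this conftest only tests whether an environment variable is set, so tests error out rather than skip when the variable points at a server that is down. One possible stricter probe, offered as a sketch and not as part of this commit, is a short TCP connect to the host and port in the URL. The name `pg_reachable`, the 5432 default port, and the one-second timeout are assumptions.

```python
import socket
from typing import Optional
from urllib.parse import urlsplit


def pg_reachable(url: Optional[str], timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds.

    This only proves that something is listening; it does not authenticate
    or verify that the listener is PostgreSQL.
    """
    if not url:
        return False
    try:
        parts = urlsplit(url)
        host = parts.hostname
        port = parts.port or 5432  # assumed default PostgreSQL port
    except ValueError:
        return False  # malformed netloc or port
    if host is None:
        return False
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A fixture could call this with the configured URL and skip when it returns False, at the cost of one extra connection attempt per test session.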


@@ -0,0 +1,322 @@
"""Tests for U6: attachment & image field upload, download, and cleanup.

Requires PostgreSQL (marked ``postgres``). Uses ``httpx.AsyncClient`` with
``ASGITransport`` (same pattern as test_routes.py).
"""

from __future__ import annotations

import io
from pathlib import Path
from typing import Any

import httpx
import pytest
from fastapi import FastAPI
from httpx import ASGITransport

from agentkit.bitable.service import BitableService
from agentkit.server.routes import bitable as bitable_routes
from agentkit.server.routes.bitable import require_bitable_auth

pytestmark = pytest.mark.postgres

TEST_USER_ID = "test-user-id"


def _make_test_user() -> dict[str, Any]:
    return {"user_id": TEST_USER_ID, "username": "testuser", "role": "member"}


@pytest.fixture
def app(
    bitable_service: BitableService,
    tmp_path: Path,
    monkeypatch: pytest.MonkeyPatch,
) -> FastAPI:
    """Test app with upload dir redirected to tmp_path."""
    upload_dir = tmp_path / "bitable_uploads"
    # Patch both the routes module variable AND the env var (service reads env var)
    monkeypatch.setattr(bitable_routes, "BITABLE_UPLOAD_DIR", upload_dir)
    monkeypatch.setenv("AGENTKIT_BITABLE_UPLOAD_DIR", str(upload_dir))
    app = FastAPI()
    app.state.bitable_service = bitable_service
    app.include_router(bitable_routes.router, prefix="/api/v1")
    app.dependency_overrides[require_bitable_auth] = lambda: _make_test_user()
    return app


@pytest.fixture
async def client(app: FastAPI) -> httpx.AsyncClient:
    transport = ASGITransport(app=app)
    async with httpx.AsyncClient(transport=transport, base_url="http://test") as c:
        yield c


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------


async def _create_table_with_field(
    client: httpx.AsyncClient,
    field_type: str,
    field_name: str = "files",
) -> tuple[str, str]:
    """Create a table + a field, return (table_id, field_id)."""
    table_id = (
        await client.post("/api/v1/bitable/tables", json={"name": "T"})
    ).json()["table"]["id"]
    field_id = (
        await client.post(
            f"/api/v1/bitable/tables/{table_id}/fields",
            json={"name": field_name, "field_type": field_type, "owner": "agent"},
        )
    ).json()["field"]["id"]
    return table_id, field_id


def _make_image_bytes(name: str = "test.png", size: int = 100) -> tuple[bytes, str]:
    """PNG signature followed by zero padding (not a decodable image)."""
    png_header = b"\x89PNG\r\n\x1a\n"
    body = b"\x00" * size
    return png_header + body, name


def _make_pdf_bytes(name: str = "doc.pdf", size: int = 50) -> tuple[bytes, str]:
    """PDF header line followed by zero padding."""
    return b"%PDF-1.4\n" + b"\x00" * size, name


# ---------------------------------------------------------------------------
# Upload tests
# ---------------------------------------------------------------------------


async def test_upload_image_success(client: httpx.AsyncClient, tmp_path: Path) -> None:
    table_id, field_id = await _create_table_with_field(client, "image")
    img_bytes, img_name = _make_image_bytes()
    resp = await client.post(
        f"/api/v1/bitable/tables/{table_id}/upload",
        params={"field_id": field_id},
        files={"file": (img_name, io.BytesIO(img_bytes), "image/png")},
    )
    assert resp.status_code == 200
    data = resp.json()
    assert data["filename"] == img_name
    assert data["mime_type"] == "image/png"
    assert data["size"] == len(img_bytes)
    assert data["stored_name"].endswith(".png")
    assert data["url"].startswith("/api/v1/bitable/files/")
    # File exists on disk
    file_path = bitable_routes.BITABLE_UPLOAD_DIR / data["stored_name"]
    assert file_path.exists()
    assert file_path.read_bytes() == img_bytes


async def test_upload_attachment_pdf(client: httpx.AsyncClient) -> None:
    table_id, field_id = await _create_table_with_field(client, "attachment")
    pdf_bytes, pdf_name = _make_pdf_bytes()
    resp = await client.post(
        f"/api/v1/bitable/tables/{table_id}/upload",
        params={"field_id": field_id},
        files={"file": (pdf_name, io.BytesIO(pdf_bytes), "application/pdf")},
    )
    assert resp.status_code == 200
    data = resp.json()
    assert data["filename"] == pdf_name
    assert data["mime_type"] == "application/pdf"


async def test_upload_image_rejects_non_image(client: httpx.AsyncClient) -> None:
    table_id, field_id = await _create_table_with_field(client, "image")
    pdf_bytes, _ = _make_pdf_bytes()
    resp = await client.post(
        f"/api/v1/bitable/tables/{table_id}/upload",
        params={"field_id": field_id},
        files={"file": ("doc.pdf", io.BytesIO(pdf_bytes), "application/pdf")},
    )
    assert resp.status_code == 400
    assert "image" in resp.json()["detail"].lower()


async def test_upload_rejects_non_attachment_field(client: httpx.AsyncClient) -> None:
    table_id, field_id = await _create_table_with_field(client, "text")
    img_bytes, _ = _make_image_bytes()
    resp = await client.post(
        f"/api/v1/bitable/tables/{table_id}/upload",
        params={"field_id": field_id},
        files={"file": ("test.png", io.BytesIO(img_bytes), "image/png")},
    )
    assert resp.status_code == 400


async def test_upload_404_unknown_field(client: httpx.AsyncClient) -> None:
    table_id = (
        await client.post("/api/v1/bitable/tables", json={"name": "T"})
    ).json()["table"]["id"]
    img_bytes, _ = _make_image_bytes()
    resp = await client.post(
        f"/api/v1/bitable/tables/{table_id}/upload",
        params={"field_id": "nonexistent"},
        files={"file": ("test.png", io.BytesIO(img_bytes), "image/png")},
    )
    assert resp.status_code == 404


async def test_upload_requires_auth(bitable_service: BitableService) -> None:
    """No auth override → 401."""
    app = FastAPI()
    app.state.bitable_service = bitable_service
    app.include_router(bitable_routes.router, prefix="/api/v1")
    transport = ASGITransport(app=app)
    async with httpx.AsyncClient(transport=transport, base_url="http://test") as c:
        resp = await c.post(
            "/api/v1/bitable/tables/x/upload",
            params={"field_id": "y"},
            files={"file": ("t.png", io.BytesIO(b"x"), "image/png")},
        )
        assert resp.status_code == 401


# ---------------------------------------------------------------------------
# Download tests
# ---------------------------------------------------------------------------


async def test_download_file_success(client: httpx.AsyncClient) -> None:
    table_id, field_id = await _create_table_with_field(client, "image")
    img_bytes, img_name = _make_image_bytes()
    upload_resp = await client.post(
        f"/api/v1/bitable/tables/{table_id}/upload",
        params={"field_id": field_id},
        files={"file": (img_name, io.BytesIO(img_bytes), "image/png")},
    )
    stored_name = upload_resp.json()["stored_name"]
    resp = await client.get(f"/api/v1/bitable/files/{stored_name}")
    assert resp.status_code == 200
    assert resp.content == img_bytes


async def test_download_404_missing_file(client: httpx.AsyncClient) -> None:
    resp = await client.get("/api/v1/bitable/files/nonexistent.png")
    assert resp.status_code == 404


# ---------------------------------------------------------------------------
# Attachment cleanup on record deletion
# ---------------------------------------------------------------------------


async def test_delete_record_cleans_up_files(
    client: httpx.AsyncClient,
    bitable_service: BitableService,
) -> None:
    table_id, field_id = await _create_table_with_field(client, "image")
    img_bytes, _ = _make_image_bytes()
    upload_resp = await client.post(
        f"/api/v1/bitable/tables/{table_id}/upload",
        params={"field_id": field_id},
        files={"file": ("pic.png", io.BytesIO(img_bytes), "image/png")},
    )
    file_meta = upload_resp.json()
    stored_name = file_meta["stored_name"]
    # Create a record with the image metadata
    create_resp = await client.post(
        f"/api/v1/bitable/tables/{table_id}/records",
        json={"records": [{field_id: [file_meta]}]},
    )
    record_id = create_resp.json()["records"][0]["id"]
    # Verify file exists
    file_path = bitable_routes.BITABLE_UPLOAD_DIR / stored_name
    assert file_path.exists()
    # Delete the record
    del_resp = await client.delete(f"/api/v1/bitable/records/{record_id}")
    assert del_resp.status_code == 200
    # File should be gone
    assert not file_path.exists()


async def test_delete_records_by_table_cleans_up_files(
    client: httpx.AsyncClient,
) -> None:
    table_id, field_id = await _create_table_with_field(client, "attachment")
    pdf_bytes, _ = _make_pdf_bytes()
    upload_resp = await client.post(
        f"/api/v1/bitable/tables/{table_id}/upload",
        params={"field_id": field_id},
        files={"file": ("doc.pdf", io.BytesIO(pdf_bytes), "application/pdf")},
    )
    file_meta = upload_resp.json()
    stored_name = file_meta["stored_name"]
    await client.post(
        f"/api/v1/bitable/tables/{table_id}/records",
        json={"records": [{field_id: [file_meta]}]},
    )
    file_path = bitable_routes.BITABLE_UPLOAD_DIR / stored_name
    assert file_path.exists()
    # Delete all records
    resp = await client.delete(f"/api/v1/bitable/tables/{table_id}/records")
    assert resp.status_code == 200
    assert not file_path.exists()


async def test_delete_record_when_file_already_missing(
    client: httpx.AsyncClient,
) -> None:
    """Record deletion should succeed even if the physical file is gone."""
    table_id, field_id = await _create_table_with_field(client, "image")
    img_bytes, _ = _make_image_bytes()
    upload_resp = await client.post(
        f"/api/v1/bitable/tables/{table_id}/upload",
        params={"field_id": field_id},
        files={"file": ("pic.png", io.BytesIO(img_bytes), "image/png")},
    )
    file_meta = upload_resp.json()
    stored_name = file_meta["stored_name"]
    create_resp = await client.post(
        f"/api/v1/bitable/tables/{table_id}/records",
        json={"records": [{field_id: [file_meta]}]},
    )
    record_id = create_resp.json()["records"][0]["id"]
    # Manually delete the file before deleting the record
    file_path = bitable_routes.BITABLE_UPLOAD_DIR / stored_name
    file_path.unlink()
    assert not file_path.exists()
    # Record deletion should still succeed
    del_resp = await client.delete(f"/api/v1/bitable/records/{record_id}")
    assert del_resp.status_code == 200


# ---------------------------------------------------------------------------
# Multiple files in one field
# ---------------------------------------------------------------------------


async def test_multiple_files_in_attachment_field(client: httpx.AsyncClient) -> None:
    table_id, field_id = await _create_table_with_field(client, "attachment")
    metas = []
    for name in ("a.pdf", "b.pdf"):
        pdf_bytes, _ = _make_pdf_bytes(name)
        resp = await client.post(
            f"/api/v1/bitable/tables/{table_id}/upload",
            params={"field_id": field_id},
            files={"file": (name, io.BytesIO(pdf_bytes), "application/pdf")},
        )
        metas.append(resp.json())
    # Store all files as an array in one record
    create_resp = await client.post(
        f"/api/v1/bitable/tables/{table_id}/records",
        json={"records": [{field_id: metas}]},
    )
    assert create_resp.status_code == 200
    record = create_resp.json()["records"][0]
    assert len(record["values"][field_id]) == 2
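
Reviewer note: the test payloads above are a PNG signature or a `%PDF-` header followed by zero bytes, and every upload also declares a MIME type. The diff shown here does not reveal whether the upload endpoint checks the declared type, the content, or both. If content sniffing is wanted, a minimal sketch could look like the following; `sniff_mime` is a hypothetical helper, and it recognizes only the two signatures these tests use.

```python
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte PNG file signature
PDF_PREFIX = b"%PDF-"  # PDF files begin with this marker


def sniff_mime(data: bytes) -> Optional[str]:
    """Guess a MIME type from leading bytes; return None when unrecognized."""
    if data.startswith(PNG_SIGNATURE):
        return "image/png"
    if data.startswith(PDF_PREFIX):
        return "application/pdf"
    return None
```

Under such a check the helper payloads in this file would be classified as PNG and PDF respectively, while the one-byte body used in the auth test would be unrecognized.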


@@ -0,0 +1,485 @@
"""Tests for BitableTool (U4).

Tests the full HTTP flow: BitableTool → bitable REST API → BitableService.
Uses ``httpx.AsyncClient`` + ``ASGITransport`` so the tool's HTTP calls
and the bitable DB share one event loop.

Covers:
- KTD11: X-Internal-Token auth (valid token accepted, invalid rejected)
- Batch chunking: 1200 records → 3 HTTP requests (500+500+200)
- Resume from partial failure
- Three ingestion types: Excel, database, API collector
- create_table, upsert_records, query_records
"""

from __future__ import annotations

import io

import httpx
import pytest
from fastapi import FastAPI
from httpx import ASGITransport

from agentkit.bitable.service import BitableService
from agentkit.server.routes import bitable as bitable_routes
from agentkit.server.routes.bitable import require_bitable_auth
from agentkit.tools.bitable_tool import BATCH_SIZE, BitableTool

pytestmark = pytest.mark.postgres

TEST_TOKEN = "test-internal-token-abc123"
TEST_USER = {"user_id": "test-user", "username": "tester", "role": "member"}
# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------
@pytest.fixture
def app(bitable_service: BitableService) -> FastAPI:
"""Test app with bitable_service + internal token on app.state."""
app = FastAPI()
app.state.bitable_service = bitable_service
app.state.bitable_internal_token = TEST_TOKEN
app.include_router(bitable_routes.router, prefix="/api/v1")
# Override auth so JWT path also works (for non-internal-token tests)
app.dependency_overrides[require_bitable_auth] = lambda: TEST_USER
return app
@pytest.fixture
def app_no_override(bitable_service: BitableService) -> FastAPI:
"""App without auth override — tests real X-Internal-Token path."""
app = FastAPI()
app.state.bitable_service = bitable_service
app.state.bitable_internal_token = TEST_TOKEN
app.include_router(bitable_routes.router, prefix="/api/v1")
return app
@pytest.fixture
def app_no_token(bitable_service: BitableService) -> FastAPI:
"""App without internal token configured."""
app = FastAPI()
app.state.bitable_service = bitable_service
app.include_router(bitable_routes.router, prefix="/api/v1")
app.dependency_overrides[require_bitable_auth] = lambda: TEST_USER
return app
def _make_client(app: FastAPI, token: str | None = None) -> httpx.AsyncClient:
    """Create an httpx AsyncClient backed by ASGITransport.

    If token is provided, the X-Internal-Token header is set as a default
    on the client, mirroring how BitableTool._get_client configures it.
    """
    base = "http://test/api/v1/bitable"
    transport = ASGITransport(app=app)
    headers: dict[str, str] = {}
    if token:
        headers["X-Internal-Token"] = token
    return httpx.AsyncClient(transport=transport, base_url=base, headers=headers)
@pytest.fixture
async def tool(app: FastAPI) -> BitableTool:
    """BitableTool pointing at the test app via ASGITransport.

    ponytail: We patch _client to use ASGITransport instead of real
    HTTP; this shares the event loop with the async DB fixtures.
    """
    client = _make_client(app, token=TEST_TOKEN)
    t = BitableTool(base_url="http://test/api/v1/bitable", internal_token=TEST_TOKEN)
    t._client = client
    yield t
    await client.aclose()
@pytest.fixture
async def tool_no_token(app_no_token: FastAPI) -> BitableTool:
"""BitableTool without internal token."""
client = _make_client(app_no_token, token=None)
t = BitableTool(base_url="http://test/api/v1/bitable", internal_token=None)
t._client = client
yield t
await client.aclose()
@pytest.fixture
async def tool_real_auth(app_no_override: FastAPI) -> BitableTool:
"""BitableTool that sends real X-Internal-Token header (no auth override)."""
client = _make_client(app_no_override, token=TEST_TOKEN)
t = BitableTool(base_url="http://test/api/v1/bitable", internal_token=TEST_TOKEN)
t._client = client
yield t
await client.aclose()
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _make_xlsx(sheets: dict[str, list[list]]) -> bytes:
"""Create an in-memory .xlsx file."""
from openpyxl import Workbook
wb = Workbook()
wb.remove(wb.active)
for name, rows in sheets.items():
ws = wb.create_sheet(title=name)
for row in rows:
ws.append(row)
buf = io.BytesIO()
wb.save(buf)
return buf.getvalue()
async def _setup_table_with_pk(tool: BitableTool, name: str = "T") -> tuple[str, str, str]:
"""Create a table with a text PK field and a number data field.
Returns (table_id, pk_field_id, data_field_id).
"""
result = await tool.execute(action="create_table", table_name=name)
assert result["success"], result
table_id = result["table"]["id"]
client = await tool._get_client()
# Create PK field
resp = await client.post(
f"/tables/{table_id}/fields",
json={"name": "id", "field_type": "text", "owner": "agent"},
)
resp.raise_for_status()
pk_field_id = resp.json()["field"]["id"]
# Create data field
resp = await client.post(
f"/tables/{table_id}/fields",
json={"name": "val", "field_type": "number", "owner": "agent"},
)
resp.raise_for_status()
data_field_id = resp.json()["field"]["id"]
# Set PK
resp = await client.patch(f"/tables/{table_id}", json={"primary_key_field_id": pk_field_id})
resp.raise_for_status()
return table_id, pk_field_id, data_field_id
# ---------------------------------------------------------------------------
# create_table
# ---------------------------------------------------------------------------
async def test_create_table(tool: BitableTool) -> None:
"""create_table action creates a bitable table via HTTP."""
result = await tool.execute(action="create_table", table_name="MyTable")
assert result["success"] is True
assert result["table"]["name"] == "MyTable"
async def test_create_table_missing_name(tool: BitableTool) -> None:
"""Missing table_name → error."""
result = await tool.execute(action="create_table")
assert result["success"] is False
assert "table_name" in result["error"]
# ---------------------------------------------------------------------------
# KTD11: Internal token auth
# ---------------------------------------------------------------------------
async def test_internal_token_accepted(tool_real_auth: BitableTool) -> None:
"""Valid X-Internal-Token → request succeeds (no JWT needed)."""
result = await tool_real_auth.execute(action="create_table", table_name="Authed")
assert result["success"] is True
async def test_invalid_token_rejected(app_no_override: FastAPI) -> None:
"""Wrong X-Internal-Token → 401."""
transport = ASGITransport(app=app_no_override)
async with httpx.AsyncClient(transport=transport, base_url="http://test") as client:
resp = await client.post(
"/api/v1/bitable/tables",
json={"name": "X"},
headers={"X-Internal-Token": "wrong-token"},
)
assert resp.status_code == 401
async def test_no_auth_rejected(app_no_override: FastAPI) -> None:
"""No auth at all → 401."""
transport = ASGITransport(app=app_no_override)
async with httpx.AsyncClient(transport=transport, base_url="http://test") as client:
resp = await client.post("/api/v1/bitable/tables", json={"name": "X"})
assert resp.status_code == 401
# ---------------------------------------------------------------------------
# Batch chunking (BATCH_SIZE=500)
# ---------------------------------------------------------------------------
async def test_batch_upsert_1200_records(tool: BitableTool) -> None:
"""1200 records → 3 batches (500+500+200), all succeed."""
table_id, pk_fid, data_fid = await _setup_table_with_pk(tool)
records = [{pk_fid: f"r{i}", data_fid: i * 10} for i in range(1200)]
result = await tool.execute(
action="upsert_records",
table_id=table_id,
records=records,
primary_key_field_id=pk_fid,
)
assert result["success"] is True
assert result["successful_count"] == 1200
assert result["total"] == 1200
assert "errors" not in result
async def test_batch_size_is_500() -> None:
"""Verify BATCH_SIZE constant is 500."""
assert BATCH_SIZE == 500
async def test_resume_from_partial_failure(tool: BitableTool) -> None:
    """resume_from skips already-successful records."""
    table_id, pk_fid, data_fid = await _setup_table_with_pk(tool)
    # First, insert 500 records successfully
    batch1 = [{pk_fid: f"r{i}", data_fid: i} for i in range(500)]
    result1 = await tool.execute(
        action="upsert_records",
        table_id=table_id,
        records=batch1,
        primary_key_field_id=pk_fid,
    )
    assert result1["successful_count"] == 500
    # Then send the remaining 700. The list is pre-sliced, so resume_from
    # stays 0 and this test does not exercise a non-zero offset.
    all_records = [{pk_fid: f"r{i}", data_fid: i} for i in range(1200)]
    remaining = all_records[500:]
    result2 = await tool.execute(
        action="upsert_records",
        table_id=table_id,
        records=remaining,
        primary_key_field_id=pk_fid,
        resume_from=0,  # remaining is already sliced
    )
    assert result2["successful_count"] == 700
# ---------------------------------------------------------------------------
# query_records
# ---------------------------------------------------------------------------
async def test_query_records(tool: BitableTool) -> None:
"""query_records returns records from the table."""
table_id, pk_fid, data_fid = await _setup_table_with_pk(tool)
# Insert some records
await tool.execute(
action="upsert_records",
table_id=table_id,
records=[{pk_fid: "a", data_fid: 1}, {pk_fid: "b", data_fid: 2}],
primary_key_field_id=pk_fid,
)
# Query
result = await tool.execute(action="query_records", table_id=table_id)
assert result["success"] is True
assert len(result["records"]) == 2
async def test_query_records_with_limit(tool: BitableTool) -> None:
"""query_records with limit returns fewer records."""
table_id, pk_fid, data_fid = await _setup_table_with_pk(tool)
await tool.execute(
action="upsert_records",
table_id=table_id,
records=[{pk_fid: f"r{i}", data_fid: i} for i in range(10)],
primary_key_field_id=pk_fid,
)
result = await tool.execute(action="query_records", table_id=table_id, limit=5)
assert result["success"] is True
assert len(result["records"]) == 5
# ---------------------------------------------------------------------------
# import_excel
# ---------------------------------------------------------------------------
async def test_import_excel_file(tool: BitableTool, tmp_path) -> None:
"""import_excel from file path → creates table + fields + records."""
xlsx_bytes = _make_xlsx({"Products": [["name", "price"], ["Widget", 9.99], ["Gadget", 19.99]]})
file_path = tmp_path / "test.xlsx"
file_path.write_bytes(xlsx_bytes)
result = await tool.execute(action="import_excel", file_path=str(file_path))
assert result["success"] is True
sheet_result = result["sheets"][0]
assert sheet_result["record_count"] == 2
assert sheet_result["field_count"] == 2
# Verify data was actually written
table_id = sheet_result["table_id"]
query = await tool.execute(action="query_records", table_id=table_id)
assert len(query["records"]) == 2
async def test_import_excel_empty_sheet(tool: BitableTool, tmp_path) -> None:
"""Excel with only headers (no data rows) → table created, 0 records."""
xlsx_bytes = _make_xlsx({"Empty": [["col1", "col2"]]})
file_path = tmp_path / "empty.xlsx"
file_path.write_bytes(xlsx_bytes)
result = await tool.execute(action="import_excel", file_path=str(file_path))
assert result["success"] is True
assert result["sheets"][0]["record_count"] == 0
assert result["sheets"][0]["field_count"] == 2
async def test_import_excel_missing_path(tool: BitableTool) -> None:
"""No file_path or file_url → error."""
result = await tool.execute(action="import_excel")
assert result["success"] is False
assert "file_path" in result["error"] or "file_url" in result["error"]
# ---------------------------------------------------------------------------
# collect_api
# ---------------------------------------------------------------------------
async def test_collect_api(tool: BitableTool) -> None:
"""collect_api transforms records via field_mapping and upserts."""
table_id, pk_fid, data_fid = await _setup_table_with_pk(tool)
result = await tool.execute(
action="collect_api",
table_id=table_id,
records=[
{"user_id": "u1", "score": 100},
{"user_id": "u2", "score": 200},
],
field_mapping={"user_id": pk_fid, "score": data_fid},
primary_key_field_id=pk_fid,
)
assert result["success"] is True
assert result["successful_count"] == 2
# Verify
query = await tool.execute(action="query_records", table_id=table_id)
assert len(query["records"]) == 2
async def test_collect_api_missing_fields(tool: BitableTool) -> None:
"""Missing required fields → error."""
result = await tool.execute(action="collect_api", records=[])
assert result["success"] is False
# ---------------------------------------------------------------------------
# Error handling
# ---------------------------------------------------------------------------
async def test_unknown_action(tool: BitableTool) -> None:
"""Unknown action → error."""
result = await tool.execute(action="bogus")
assert result["success"] is False
assert "Unknown action" in result["error"]
async def test_query_nonexistent_table(tool: BitableTool) -> None:
"""Querying a non-existent table → error."""
result = await tool.execute(action="query_records", table_id="nonexistent-id")
assert result["success"] is False
# ---------------------------------------------------------------------------
# Database ingestion (type mapping only — no real external DB needed)
# ---------------------------------------------------------------------------
def test_db_type_mapping_integer() -> None:
"""Integer type → 'number'."""
from sqlalchemy import Integer
from agentkit.bitable.ingestion.database import infer_field_type
assert infer_field_type(Integer()) == "number"
assert infer_field_type(Integer) == "number"
def test_db_type_mapping_varchar() -> None:
"""String type → 'text'."""
from sqlalchemy import String
from agentkit.bitable.ingestion.database import infer_field_type
assert infer_field_type(String(255)) == "text"
def test_db_type_mapping_datetime() -> None:
"""DateTime type → 'date'."""
from sqlalchemy import DateTime
from agentkit.bitable.ingestion.database import infer_field_type
assert infer_field_type(DateTime()) == "date"
def test_db_type_mapping_unknown_fallback() -> None:
"""Unknown type → 'text' (safe fallback)."""
from agentkit.bitable.ingestion.database import infer_field_type
class CustomType:
pass
assert infer_field_type(CustomType()) == "text"
# ---------------------------------------------------------------------------
# API collector transform
# ---------------------------------------------------------------------------
def test_transform_records_basic() -> None:
"""transform_records maps source keys to field IDs."""
from agentkit.bitable.ingestion.api_collector import transform_records
result = transform_records(
records=[{"name": "Alice", "age": 30, "extra": "dropped"}],
field_mapping={"name": "fld_abc", "age": "fld_def"},
)
assert result == [{"fld_abc": "Alice", "fld_def": 30}]
def test_transform_records_empty() -> None:
"""Empty records → empty result."""
from agentkit.bitable.ingestion.api_collector import transform_records
assert transform_records([], {"a": "b"}) == []
assert transform_records([{"a": 1}], {}) == []
def test_transform_records_missing_keys() -> None:
"""Source keys not in mapping are silently dropped."""
from agentkit.bitable.ingestion.api_collector import transform_records
result = transform_records(
records=[{"a": 1, "b": 2}],
field_mapping={"a": "fld_a"}, # b is not mapped
)
assert result == [{"fld_a": 1}]
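
The batch chunking exercised by the tests above (1200 records split into 500+500+200, with an optional resume offset) can be sketched as follows. This is a minimal illustration, not BitableTool's actual code: `iter_batches` is a hypothetical helper, and treating `resume_from` as an index into the original record list is an assumption. Only the batch size of 500 comes from the tests.

```python
from __future__ import annotations

from collections.abc import Iterator
from typing import Any

BATCH_SIZE = 500  # same value that test_batch_size_is_500 asserts


def iter_batches(
    records: list[dict[str, Any]],
    batch_size: int = BATCH_SIZE,
    resume_from: int = 0,
) -> Iterator[tuple[int, list[dict[str, Any]]]]:
    """Yield (start_index, batch) pairs, skipping records before resume_from.

    Hypothetical helper: resume_from is assumed to be an index into
    ``records`` so a caller can retry from the first failed batch.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(resume_from, len(records), batch_size):
        yield start, records[start : start + batch_size]


records = [{"id": f"r{i}", "val": i * 10} for i in range(1200)]
# Full run: three batches.
sizes = [len(batch) for _, batch in iter_batches(records)]
# Resuming after the first batch succeeded: two batches remain.
resumed_sizes = [len(batch) for _, batch in iter_batches(records, resume_from=500)]
```

Returning the start index alongside each batch would let a caller report the offset of a failed batch, which is the value a later call could pass back as `resume_from`.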


@ -0,0 +1,205 @@
"""Tests for U7: bitable CLI subcommands.
Requires PostgreSQL (marked ``postgres``). Uses Typer's CliRunner.
"""
from __future__ import annotations
import os
from pathlib import Path
import pytest
from typer.testing import CliRunner
from agentkit.cli.bitable import bitable_app
pytestmark = pytest.mark.postgres
runner = CliRunner()
@pytest.fixture
def db_env(monkeypatch: pytest.MonkeyPatch) -> None:
"""Ensure DATABASE_URL is set for CLI tests."""
url = os.environ.get("DATABASE_URL") or os.environ.get("AGENTKIT_DATABASE_URL")
if not url:
pytest.skip("PostgreSQL not available (set DATABASE_URL)")
monkeypatch.setenv("DATABASE_URL", url)
@pytest.fixture
def clean_schema(monkeypatch: pytest.MonkeyPatch) -> None:
"""Drop and recreate bitable schema before each test."""
import asyncio
url = os.environ.get("DATABASE_URL") or os.environ.get("AGENTKIT_DATABASE_URL")
if not url:
pytest.skip("PostgreSQL not available")
from agentkit.bitable.db import BitableDB
from sqlalchemy import text
async def _clean():
db = BitableDB()
await db.init()
async with db.engine.begin() as conn:
await conn.execute(text("DROP SCHEMA IF EXISTS bitable CASCADE"))
await db.init()
await db.close()
asyncio.run(_clean())
# ---------------------------------------------------------------------------
# list-tables
# ---------------------------------------------------------------------------
def test_list_tables_empty(db_env, clean_schema) -> None:
result = runner.invoke(bitable_app, ["list-tables"])
assert result.exit_code == 0
assert "No tables found" in result.output
def test_list_tables_after_create(db_env, clean_schema) -> None:
# Create a table first
runner.invoke(bitable_app, ["create-table", "--name", "TestTable"])
result = runner.invoke(bitable_app, ["list-tables"])
assert result.exit_code == 0
assert "TestTable" in result.output
# ---------------------------------------------------------------------------
# create-table
# ---------------------------------------------------------------------------
def test_create_table_success(db_env, clean_schema) -> None:
result = runner.invoke(
bitable_app,
["create-table", "--name", "MyTable", "--description", "A test table"],
)
assert result.exit_code == 0
assert "Created table" in result.output
assert "MyTable" in result.output
assert "A test table" in result.output
def test_create_table_minimal(db_env, clean_schema) -> None:
result = runner.invoke(bitable_app, ["create-table", "--name", "Minimal"])
assert result.exit_code == 0
assert "Minimal" in result.output
# ---------------------------------------------------------------------------
# query
# ---------------------------------------------------------------------------
def test_query_table_not_found(db_env, clean_schema) -> None:
result = runner.invoke(bitable_app, ["query", "--table", "nonexistent-id"])
assert result.exit_code == 1
assert "not found" in result.output
def test_query_empty_table(db_env, clean_schema) -> None:
    # Create a table first
    create_result = runner.invoke(bitable_app, ["create-table", "--name", "Empty"])
    assert create_result.exit_code == 0
    # Extract table ID from output; it's on the "ID:" line
    lines = create_result.output.split("\n")
    table_id = None
    for line in lines:
        if "ID:" in line:
            # Extract the cyan-colored ID
            table_id = line.split("ID:")[1].strip().strip("[]").split(" ")[0]
            # Remove rich formatting
            table_id = table_id.replace("[cyan]", "").replace("[/cyan]", "")
            break
    assert table_id is not None, f"Could not extract table ID from: {create_result.output}"
    result = runner.invoke(bitable_app, ["query", "--table", table_id])
    assert result.exit_code == 0
    assert "No records found" in result.output
def test_query_with_records(db_env, clean_schema) -> None:
"""Create table + field + records via service, then query via CLI."""
import asyncio
from agentkit.bitable.db import BitableDB
from agentkit.bitable.models import FieldType
from agentkit.bitable.service import BitableService
async def _setup():
db = BitableDB()
await db.init()
service = BitableService(db)
table = await service.create_table(name="Data")
field = await service.create_field(
table_id=table.id, name="name", field_type=FieldType.text
)
await service.create_record(table_id=table.id, values={field.id: "Alice"})
await service.create_record(table_id=table.id, values={field.id: "Bob"})
await db.close()
return table.id
table_id = asyncio.run(_setup())
result = runner.invoke(bitable_app, ["query", "--table", table_id, "--limit", "10"])
assert result.exit_code == 0
assert "Alice" in result.output
assert "Bob" in result.output
# ---------------------------------------------------------------------------
# import-excel
# ---------------------------------------------------------------------------
def test_import_excel_file_not_found(db_env, clean_schema) -> None:
result = runner.invoke(
bitable_app,
["import-excel", "--file", "/nonexistent/file.xlsx"],
)
assert result.exit_code == 1
assert "not found" in result.output
def test_import_excel_success(db_env, clean_schema, tmp_path: Path) -> None:
"""Create a real xlsx file and import it."""
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws.append(["name", "age"])
ws.append(["Alice", 30])
ws.append(["Bob", 25])
xlsx_path = tmp_path / "test.xlsx"
wb.save(xlsx_path)
result = runner.invoke(
bitable_app,
["import-excel", "--file", str(xlsx_path), "--table", "Imported"],
)
assert result.exit_code == 0
assert "Created table" in result.output
assert "Imported" in result.output
assert "Imported 2 records" in result.output
assert "name" in result.output
assert "age" in result.output
# ---------------------------------------------------------------------------
# Error path: no DATABASE_URL
# ---------------------------------------------------------------------------
def test_no_database_url(monkeypatch: pytest.MonkeyPatch) -> None:
"""CLI should exit with clear error when DATABASE_URL is not set."""
monkeypatch.delenv("DATABASE_URL", raising=False)
monkeypatch.delenv("AGENTKIT_DATABASE_URL", raising=False)
result = runner.invoke(bitable_app, ["list-tables"])
assert result.exit_code == 1
assert "DATABASE_URL" in result.output
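
test_query_empty_table recovers the table ID by splitting the CLI output by hand. A regular expression may be a sturdier alternative. The sketch below assumes the output contains a line such as `ID: [cyan]<id>[/cyan]` or `ID: <id>`, which is inferred from the test's parsing code rather than from the CLI source. `extract_table_id` is a hypothetical helper, and the allowed ID characters (letters, digits, `_`, `-`) are also an assumption.

```python
from __future__ import annotations

import re

# Optional rich markup tag, then the ID itself (assumed character set).
_ID_LINE = re.compile(r"ID:\s*(?:\[cyan\])?([A-Za-z0-9_-]+)")


def extract_table_id(output: str) -> str | None:
    """Return the first table ID found after an "ID:" label, or None."""
    match = _ID_LINE.search(output)
    return match.group(1) if match else None
```

A test could then call `extract_table_id(create_result.output)` and assert the result is not None before passing it to the `query` subcommand.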


@ -0,0 +1,246 @@
"""Tests for bitable DB initialization, schema, and constraints (U1).
Requires PostgreSQL (marked ``postgres``). Skips automatically when
``DATABASE_URL`` / ``AGENTKIT_DATABASE_URL`` is unset (see conftest.py).
"""
from __future__ import annotations
import pytest
pytestmark = pytest.mark.postgres
# ---------------------------------------------------------------------------
# init_bitable_db / BitableDB.init
# ---------------------------------------------------------------------------
async def test_init_creates_schema_and_all_tables(bitable_db) -> None:
"""init creates the bitable schema and all 6 tables."""
from sqlalchemy import text
async with bitable_db.engine.begin() as conn:
# Schema exists
result = await conn.execute(
text(
"SELECT schema_name FROM information_schema.schemata WHERE schema_name = 'bitable'"
)
)
assert result.fetchone() is not None
# All 6 tables present
result = await conn.execute(
text(
"SELECT table_name FROM information_schema.tables "
"WHERE table_schema = 'bitable' ORDER BY table_name"
)
)
tables = {row[0] for row in result.fetchall()}
assert tables == {
"bitable_fields",
"bitable_meta",
"bitable_records",
"bitable_recalc_queue",
"bitable_tables",
"bitable_views",
}
async def test_init_is_idempotent(bitable_db) -> None:
"""Calling init() twice does not raise and keeps schema intact."""
# bitable_db fixture already called init(); call again
await bitable_db.init()
await bitable_db.init() # third time also fine
from sqlalchemy import text
async with bitable_db.engine.begin() as conn:
result = await conn.execute(text("SELECT COUNT(*) FROM bitable.bitable_meta"))
assert result.fetchone()[0] >= 1
async def test_schema_version_recorded_in_meta(bitable_db) -> None:
"""bitable_meta stores the current schema version."""
from agentkit.bitable.db import _META_SCHEMA_VERSION_KEY, _SCHEMA_VERSION
from sqlalchemy import text
async with bitable_db.engine.begin() as conn:
result = await conn.execute(
text("SELECT value FROM bitable.bitable_meta WHERE key = :key"),
{"key": _META_SCHEMA_VERSION_KEY},
)
row = result.fetchone()
assert row is not None
assert int(row[0]) == _SCHEMA_VERSION
# ---------------------------------------------------------------------------
# Constraints
# ---------------------------------------------------------------------------
async def test_recalc_queue_unique_record_field(bitable_db) -> None:
"""Recalc queue enforces (record_id, field_id) uniqueness — dedup."""
from agentkit.bitable.models import FieldType
from agentkit.bitable.repository import BitableRepository
repo = BitableRepository(bitable_db)
table = await repo.create_table(name="T")
field = await repo.create_field(table_id=table.id, name="f", field_type=FieldType.text)
record = await repo.create_record(table_id=table.id)
# First enqueue succeeds
task1 = await repo.enqueue_recalc(table.id, record.id, field.id)
assert task1 is not None
# Second enqueue is a no-op (ON CONFLICT DO NOTHING) — returns None
task2 = await repo.enqueue_recalc(table.id, record.id, field.id)
assert task2 is None
async def test_recalc_queue_status_index_exists(bitable_db) -> None:
"""The (status, queued_at) index exists for worker consumption."""
from sqlalchemy import text
async with bitable_db.engine.begin() as conn:
result = await conn.execute(
text(
"SELECT indexname FROM pg_indexes "
"WHERE schemaname = 'bitable' AND tablename = 'bitable_recalc_queue'"
)
)
indexes = {row[0] for row in result.fetchall()}
assert "ix_recalc_status_queued" in indexes
assert "uq_recalc_record_field" in indexes
async def test_records_values_gin_index_exists(bitable_db) -> None:
"""GIN index on records.values exists for JSONB key lookups."""
from sqlalchemy import text
async with bitable_db.engine.begin() as conn:
result = await conn.execute(
text(
"SELECT indexname FROM pg_indexes "
"WHERE schemaname = 'bitable' AND tablename = 'bitable_records'"
)
)
indexes = {row[0] for row in result.fetchall()}
assert "ix_bitable_records_values_gin" in indexes
# ---------------------------------------------------------------------------
# Repository CRUD smoke (verifies schema is usable end-to-end)
# ---------------------------------------------------------------------------
async def test_repository_crud_round_trip(bitable_db) -> None:
"""Repository can create/get/list/delete across all entities."""
from agentkit.bitable.models import FieldOwner, FieldType, ViewType
from agentkit.bitable.repository import BitableRepository
repo = BitableRepository(bitable_db)
# Table
table = await repo.create_table(name="Orders", description="desc")
assert table.name == "Orders"
fetched = await repo.get_table(table.id)
assert fetched is not None and fetched.id == table.id
# Field
field = await repo.create_field(
table_id=table.id,
name="Amount",
field_type=FieldType.number,
owner=FieldOwner.agent,
)
fields = await repo.list_fields(table.id)
assert len(fields) == 1
assert fields[0].id == field.id
# Record
record = await repo.create_record(table_id=table.id, values={field.id: 42})
fetched_rec = await repo.get_record(record.id)
assert fetched_rec is not None
assert fetched_rec.values[field.id] == 42
# Cursor pagination
rec2 = await repo.create_record(table_id=table.id, values={field.id: 99})
records, next_cursor = await repo.list_records(table.id, limit=1)
assert len(records) == 1
assert next_cursor is not None
records2, next_cursor2 = await repo.list_records(table.id, cursor=next_cursor, limit=1)
assert len(records2) == 1
# The second page should be the other record
assert {records[0].id, records2[0].id} == {record.id, rec2.id}
# View
view = await repo.create_view(table_id=table.id, name="All", view_type=ViewType.grid)
views = await repo.list_views(table.id)
assert len(views) == 1 and views[0].id == view.id
# Delete cascades
deleted = await repo.delete_table(table.id)
assert deleted is True
assert await repo.get_table(table.id) is None
assert await repo.get_field(field.id) is None
assert await repo.get_record(record.id) is None
assert (await repo.list_views(table.id)) == []
# ---------------------------------------------------------------------------
# Crash recovery
# ---------------------------------------------------------------------------
async def test_reset_stale_recalc_tasks(bitable_db) -> None:
"""reset_stale_recalc_tasks flips 'calculating' back to 'pending'."""
from agentkit.bitable.models import FieldType, RecalcStatus
from agentkit.bitable.repository import BitableRepository
repo = BitableRepository(bitable_db)
table = await repo.create_table(name="T")
field = await repo.create_field(table_id=table.id, name="f", field_type=FieldType.text)
record = await repo.create_record(table_id=table.id)
task = await repo.enqueue_recalc(table.id, record.id, field.id)
assert task is not None
# Simulate a worker crash mid-calculation
await repo.update_recalc_status(task.id, RecalcStatus.calculating)
reset_count = await repo.reset_stale_recalc_tasks()
assert reset_count == 1
pending = await repo.get_pending_recalc_tasks()
assert any(t.id == task.id for t in pending)
# ---------------------------------------------------------------------------
# Degradation (no PG)
# ---------------------------------------------------------------------------
async def test_bitable_db_without_url_raises() -> None:
    """BitableDB with no URL raises RuntimeError on init (not silently None)."""
    # Clear env vars for this test so no URL can be resolved from the environment
    import os

    saved = (
        os.environ.pop("DATABASE_URL", None),
        os.environ.pop("AGENTKIT_DATABASE_URL", None),
    )
    try:
        from agentkit.bitable.db import BitableDB

        db = BitableDB(database_url=None)
        # database_url is None: no argument was passed and no env var is set
        assert db.database_url is None
        with pytest.raises(RuntimeError, match="No database URL"):
            await db.init()
    finally:
        # Restore whatever was set before the test ran
        for key, val in zip(("DATABASE_URL", "AGENTKIT_DATABASE_URL"), saved):
            if val is not None:
                os.environ[key] = val
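
test_recalc_queue_unique_record_field relies on a unique (record_id, field_id) constraint combined with ON CONFLICT DO NOTHING, so a duplicate enqueue becomes a no-op. The sketch below reproduces that idea with the standard-library `sqlite3` module as a stand-in for PostgreSQL (it needs SQLite 3.24 or newer for the upsert syntax). The table layout, `make_queue`, and this `enqueue_recalc` are simplified assumptions, not the repository's actual schema or method.

```python
from __future__ import annotations

import sqlite3


def make_queue() -> sqlite3.Connection:
    """Create an in-memory queue table with a unique (record_id, field_id) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE recalc_queue ("
        " id INTEGER PRIMARY KEY,"
        " record_id TEXT NOT NULL,"
        " field_id TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " UNIQUE (record_id, field_id))"
    )
    return conn


def enqueue_recalc(conn: sqlite3.Connection, record_id: str, field_id: str) -> int | None:
    """Insert a task; return its row id, or None if the pair is already queued."""
    cur = conn.execute(
        "INSERT INTO recalc_queue (record_id, field_id) VALUES (?, ?) "
        "ON CONFLICT (record_id, field_id) DO NOTHING",
        (record_id, field_id),
    )
    # rowcount is 0 when the conflict clause suppressed the insert.
    return cur.lastrowid if cur.rowcount == 1 else None


conn = make_queue()
first = enqueue_recalc(conn, "rec1", "fld1")   # new task
second = enqueue_recalc(conn, "rec1", "fld1")  # duplicate, suppressed
other = enqueue_recalc(conn, "rec1", "fld2")   # different field, new task
queued = conn.execute("SELECT COUNT(*) FROM recalc_queue").fetchone()[0]
```

Pushing the deduplication into the database constraint, rather than checking first and inserting second, avoids a race between concurrent writers.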


@ -0,0 +1,211 @@
"""Tests for the formula engine — DAG, cycle detection, evaluation.
Covers: topological sort, circular reference detection, aggregate vs row
context, formula-to-formula dependencies, and the built-in function library.
"""
from __future__ import annotations
import pytest
from agentkit.bitable.formula.engine import (
CircularReferenceError,
FormulaEngine,
)
# ---------------------------------------------------------------------------
# Basic evaluation
# ---------------------------------------------------------------------------
def test_engine_evaluate_simple_arithmetic() -> None:
"""=1+2*3 → 7"""
engine = FormulaEngine()
engine.add_formula("calc", "=1+2*3")
result = engine.evaluate("calc", row_values={})
assert result == 7
def test_engine_evaluate_row_reference() -> None:
"""={f1} + {f2} → row-level sum"""
engine = FormulaEngine()
engine.add_formula("sum", "={f1} + {f2}")
result = engine.evaluate("sum", row_values={"f1": 10, "f2": 20})
assert result == 30
def test_engine_evaluate_aggregate_sum() -> None:
"""=SUM({f1}) → aggregate sum of column"""
engine = FormulaEngine()
engine.add_formula("total", "=SUM({f1})")
result = engine.evaluate("total", row_values={}, column_values={"f1": [1, 2, 3]})
assert result == 6
def test_engine_evaluate_aggregate_avg() -> None:
"""=AVG({f1}) → average of column"""
engine = FormulaEngine()
engine.add_formula("avg", "=AVG({f1})")
result = engine.evaluate("avg", row_values={}, column_values={"f1": [10, 20, 30]})
assert result == 20.0
def test_engine_evaluate_aggregate_count() -> None:
"""=COUNT({f1}) → count of non-empty values"""
engine = FormulaEngine()
engine.add_formula("cnt", "=COUNT({f1})")
result = engine.evaluate("cnt", row_values={}, column_values={"f1": [1, None, 3, "", 5]})
assert result == 3 # None and "" are ignored
def test_engine_evaluate_mixed_aggregate_and_row() -> None:
"""={f1} + SUM({f2}) → row f1 + column f2 sum"""
engine = FormulaEngine()
engine.add_formula("mixed", "={f1} + SUM({f2})")
result = engine.evaluate("mixed", row_values={"f1": 10}, column_values={"f2": [1, 2, 3]})
assert result == 16 # 10 + 6
def test_engine_evaluate_concat() -> None:
"""=CONCAT({f1}, "-", {f2}) → string concat"""
engine = FormulaEngine()
engine.add_formula("label", '=CONCAT({f1}, "-", {f2})')
result = engine.evaluate("label", row_values={"f1": "a", "f2": "b"})
assert result == "a-b"
def test_engine_evaluate_if_function() -> None:
"""=IF({f1} > 5, "big", "small")"""
engine = FormulaEngine()
engine.add_formula("size", '=IF({f1} > 5, "big", "small")')
assert engine.evaluate("size", row_values={"f1": 10}) == "big"
assert engine.evaluate("size", row_values={"f1": 3}) == "small"
def test_engine_evaluate_min_max() -> None:
engine = FormulaEngine()
engine.add_formula("mn", "=MIN({f1})")
engine.add_formula("mx", "=MAX({f1})")
cols = {"f1": [3, 1, 4, 1, 5, 9, 2, 6]}
assert engine.evaluate("mn", {}, cols) == 1
assert engine.evaluate("mx", {}, cols) == 9
# ---------------------------------------------------------------------------
# DAG: dependencies and dependents
# ---------------------------------------------------------------------------
def test_engine_get_dependencies() -> None:
engine = FormulaEngine()
engine.add_formula("c", "={a} + {b}")
assert engine.get_dependencies("c") == {"a", "b"}
assert engine.get_dependents("a") == {"c"}
assert engine.get_dependents("b") == {"c"}
def test_engine_topological_order() -> None:
"""c depends on b, b depends on a → order: a, b, c"""
engine = FormulaEngine()
engine.add_formula("c", "={b} + 1")
engine.add_formula("b", "={a} + 1")
engine.add_formula("a", "=1")
order = engine.topological_order()
assert order.index("a") < order.index("b")
assert order.index("b") < order.index("c")
def test_engine_evaluate_all_for_record() -> None:
"""Formula-to-formula dependency: c = b + 1, b = a + 1, a = 5 → c = 7"""
engine = FormulaEngine()
engine.add_formula("a", "=5")
engine.add_formula("b", "={a} + 1")
engine.add_formula("c", "={b} + 1")
results = engine.evaluate_all_for_record(row_values={})
assert results["a"] == 5
assert results["b"] == 6
assert results["c"] == 7
# ---------------------------------------------------------------------------
# Circular reference detection
# ---------------------------------------------------------------------------
def test_circular_reference_detected() -> None:
"""f1 = f2 + 1, f2 = f1 + 1 → CircularReferenceError"""
engine = FormulaEngine()
engine.add_formula("f1", "={f2} + 1")
with pytest.raises(CircularReferenceError):
engine.add_formula("f2", "={f1} + 1")
def test_circular_reference_rollback() -> None:
"""When cycle is detected, the formula is not added (rollback)."""
engine = FormulaEngine()
engine.add_formula("f1", "={f2} + 1")
with pytest.raises(CircularReferenceError):
engine.add_formula("f2", "={f1} + 1")
# f2 should not be in the engine
assert "f2" not in engine._formulas
assert "f2" not in engine._dag
def test_self_reference_detected() -> None:
"""f1 = f1 + 1 → CircularReferenceError"""
engine = FormulaEngine()
with pytest.raises(CircularReferenceError):
engine.add_formula("f1", "={f1} + 1")
def test_remove_formula_breaks_cycle() -> None:
"""Remove a formula, then the cycle can be broken."""
engine = FormulaEngine()
engine.add_formula("f1", "={f2} + 1")
# Can't add f2 = f1 + 1 (cycle)
with pytest.raises(CircularReferenceError):
engine.add_formula("f2", "={f1} + 1")
# Remove f1, now f2 can be added standalone
engine.remove_formula("f1")
engine.add_formula("f2", "=42")
assert engine.evaluate("f2", {}) == 42
# ---------------------------------------------------------------------------
# Edge cases
# ---------------------------------------------------------------------------
def test_evaluate_missing_field_value_is_none() -> None:
"""Missing field values are None — arithmetic on None raises TypeError."""
engine = FormulaEngine()
engine.add_formula("calc", "={missing_field} + 1")
# The engine passes None for missing fields (row_values.get returns None)
with pytest.raises(TypeError):
engine.evaluate("calc", row_values={})
def test_aggregate_ignores_none_and_empty() -> None:
"""SUM ignores None and empty string values."""
engine = FormulaEngine()
engine.add_formula("total", "=SUM({f1})")
result = engine.evaluate("total", row_values={}, column_values={"f1": [1, None, 2, "", 3]})
assert result == 6
def test_division_by_zero_returns_error_in_evaluate_all() -> None:
"""Division by zero is caught in evaluate_all_for_record, returns error dict."""
engine = FormulaEngine()
engine.add_formula("calc", "={f1} / 0")
results = engine.evaluate_all_for_record(row_values={"f1": 10})
assert "__error" in results["calc"]
def test_engine_with_uuid_field_ids() -> None:
"""Field IDs with hyphens (UUIDs) work correctly."""
fid = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
engine = FormulaEngine()
engine.add_formula("calc", f"={{{fid}}} * 2")
result = engine.evaluate("calc", row_values={fid: 21})
assert result == 42
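
Review note on the DAG and cycle tests above: a minimal sketch of how Kahn's algorithm can produce both the ordering asserted in `test_engine_topological_order` and the rejection asserted in the circular-reference tests. This is not the `FormulaEngine` implementation. The free function `topological_order`, the shape of the `deps` mapping, and `CycleError` are hypothetical names chosen for illustration.

```python
from collections import deque


class CycleError(ValueError):
    """Hypothetical stand-in for the engine's CircularReferenceError."""


def topological_order(deps: dict) -> list:
    """Order formula ids so each one follows the formulas it depends on.

    deps maps a formula id to the set of ids it references. References to
    ids that are not keys of deps (plain data fields) are ignored.
    """
    # In-degree counts only formula-to-formula edges.
    indegree = {node: 0 for node in deps}
    dependents = {node: [] for node in deps}
    for node, refs in deps.items():
        for ref in refs:
            if ref in deps:
                indegree[node] += 1
                dependents[ref].append(node)

    # Start from formulas with no formula dependencies; sorted for determinism.
    queue = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in dependents[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Any node never released from the queue sits on a cycle (or behind one).
    if len(order) != len(deps):
        raise CycleError("circular reference among formulas")
    return order
```

Under this scheme a self-reference is the degenerate case: the node's in-degree never reaches zero, so it is reported the same way as a two-node cycle.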


@ -0,0 +1,199 @@
"""Tests for the formula parser (KTD7 security + parsing).
Test-first per U3 execution note: parser, security constraints, and cycle
detection tests are written before the engine/recalc worker.
"""
from __future__ import annotations
import pytest
from agentkit.bitable.formula.parser import (
FormulaParseError,
FormulaSecurityError,
UnknownFunctionError,
evaluate_ast,
parse_formula,
)
ALLOWED = {"SUM", "AVG", "COUNT", "MIN", "MAX", "ABS", "ROUND", "IF", "LEN", "CONCAT"}
# ---------------------------------------------------------------------------
# Parsing happy paths
# ---------------------------------------------------------------------------
def test_parse_simple_arithmetic() -> None:
tree, mapping = parse_formula("=1+2*3", ALLOWED)
assert mapping == {}
result = evaluate_ast(tree, {}, {})
assert result == 7
def test_parse_strips_equals_prefix() -> None:
tree1, _ = parse_formula("=1+1", ALLOWED)
tree2, _ = parse_formula("1+1", ALLOWED)
assert evaluate_ast(tree1, {}, {}) == evaluate_ast(tree2, {}, {}) == 2
def test_parse_field_reference() -> None:
tree, mapping = parse_formula("={field_abc} + 1", ALLOWED)
assert "field_abc" in mapping.values()
# Safe name is prefixed with _f_
safe_name = next(k for k, v in mapping.items() if v == "field_abc")
result = evaluate_ast(tree, {safe_name: 41}, {})
assert result == 42
def test_parse_uuid_field_reference() -> None:
"""Field IDs are UUIDs with hyphens — must be substituted to safe names."""
fid = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
tree, mapping = parse_formula(f"={{{fid}}} * 2", ALLOWED)
# The mapping should have a safe name → original UUID
assert fid in mapping.values()
# Evaluate using the safe name (prefixed with _f_)
safe_name = next(k for k, v in mapping.items() if v == fid)
assert safe_name.startswith("_f_")
result = evaluate_ast(tree, {safe_name: 21}, {})
assert result == 42
def test_parse_string_concatenation() -> None:
tree, _ = parse_formula('="hello" + " " + "world"', ALLOWED)
assert evaluate_ast(tree, {}, {}) == "hello world"
def test_parse_conditional_ifexp() -> None:
tree, _ = parse_formula("=1 if True else 2", ALLOWED)
assert evaluate_ast(tree, {}, {}) == 1
def test_parse_comparison() -> None:
tree, mapping = parse_formula("={f} > 5", ALLOWED)
safe_name = next(k for k, v in mapping.items() if v == "f")
assert evaluate_ast(tree, {safe_name: 10}, {}) is True
assert evaluate_ast(tree, {safe_name: 3}, {}) is False
def test_parse_boolean_ops() -> None:
tree, _ = parse_formula("=True and False", ALLOWED)
assert evaluate_ast(tree, {}, {}) is False
tree2, _ = parse_formula("=True or False", ALLOWED)
assert evaluate_ast(tree2, {}, {}) is True
# ---------------------------------------------------------------------------
# Function calls
# ---------------------------------------------------------------------------
def test_parse_function_call_sum() -> None:
tree, mapping = parse_formula("=SUM({f1})", ALLOWED)
safe_name = next(k for k, v in mapping.items() if v == "f1")
result = evaluate_ast(tree, {safe_name: [1, 2, 3]}, {"SUM": sum})
assert result == 6
def test_parse_function_call_concat() -> None:
tree, mapping = parse_formula('=CONCAT({f1}, "-", {f2})', ALLOWED)
safe_f1 = next(k for k, v in mapping.items() if v == "f1")
safe_f2 = next(k for k, v in mapping.items() if v == "f2")
result = evaluate_ast(
tree, {safe_f1: "a", safe_f2: "b"}, {"CONCAT": lambda *a: "".join(str(x) for x in a)}
)
assert result == "a-b"
def test_parse_nested_function_calls() -> None:
tree, _ = parse_formula("=ABS(-5) + ROUND(3.7, 0)", ALLOWED)
funcs = {"ABS": abs, "ROUND": round}
result = evaluate_ast(tree, {}, funcs)
assert result == 9 # 5 + 4
# ---------------------------------------------------------------------------
# KTD7 Security — disallowed nodes
# ---------------------------------------------------------------------------
def test_security_rejects_attribute_access() -> None:
"""__import__('os') is rejected — it's a Call to an unregistered function.
(Attribute access like os.system would be caught by the Attribute node check,
but __import__ is caught earlier as an unknown function.)"""
with pytest.raises((FormulaSecurityError, UnknownFunctionError)):
parse_formula("=__import__('os')", ALLOWED)
def test_security_rejects_attribute_chain() -> None:
"""Attribute access like ''.join([]) is rejected by the Attribute node check."""
with pytest.raises(FormulaSecurityError):
parse_formula("=''.join([])", ALLOWED)
def test_security_rejects_lambda() -> None:
with pytest.raises(FormulaSecurityError):
parse_formula("=(lambda: 1)()", ALLOWED)
def test_security_rejects_subscript() -> None:
with pytest.raises(FormulaSecurityError):
parse_formula("=[1,2,3][0]", ALLOWED)
def test_security_rejects_assignment() -> None:
"""Assignment is a statement, not an expression — rejected at parse stage."""
with pytest.raises((FormulaSecurityError, FormulaParseError)):
parse_formula("=x = 1", ALLOWED)
def test_unknown_function_rejected() -> None:
with pytest.raises(UnknownFunctionError):
parse_formula("=UNKNOWN(1)", ALLOWED)
def test_eval_function_rejected_if_not_registered() -> None:
"""eval is not in the registry → UnknownFunctionError."""
with pytest.raises(UnknownFunctionError):
parse_formula("=eval('1+1')", ALLOWED)
# ---------------------------------------------------------------------------
# Error paths
# ---------------------------------------------------------------------------
def test_parse_error_unbalanced_parens() -> None:
with pytest.raises(FormulaParseError):
parse_formula("=(1+2", ALLOWED)
def test_parse_error_empty_formula() -> None:
with pytest.raises(FormulaParseError):
parse_formula("=", ALLOWED)
def test_parse_error_empty_string() -> None:
with pytest.raises(FormulaParseError):
parse_formula("", ALLOWED)
def test_evaluate_unknown_field_ref_raises() -> None:
tree, _ = parse_formula("={nonexistent} + 1", ALLOWED)
with pytest.raises(FormulaParseError, match="Unknown field reference"):
evaluate_ast(tree, {}, {})
# ---------------------------------------------------------------------------
# Mixed aggregate + row context
# ---------------------------------------------------------------------------
def test_mixed_aggregate_and_row_context() -> None:
"""={f1} + SUM({f2}) — row f1 + column f2 sum."""
tree, mapping = parse_formula("={f1} + SUM({f2})", ALLOWED)
safe_f1 = next(k for k, v in mapping.items() if v == "f1")
safe_f2 = next(k for k, v in mapping.items() if v == "f2")
# f1 is a row value (scalar), f2 is a column value (list)
result = evaluate_ast(tree, {safe_f1: 10, safe_f2: [1, 2, 3]}, {"SUM": sum})
assert result == 16 # 10 + 6
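
Review note on the KTD7 tests above: a rough sketch of one way an allowlist over Python's `ast` module can give the behaviour these tests assert (brace references mapped to `_f_`-prefixed safe names, attribute access, subscripts and lambdas rejected, unregistered calls rejected). It is an assumption-laden illustration, not the code in `agentkit.bitable.formula.parser`: `parse_checked`, `evaluate_checked`, `RejectedNodeError` and `UnknownFunctionNameError` are hypothetical names, the safe-name numbering is invented, and evaluation here uses `eval` with empty builtins where the real `evaluate_ast` may well walk the tree itself.

```python
import ast
import re

# Node types the sketch accepts; anything else (Attribute, Subscript, Lambda,
# comprehensions, keyword arguments, ...) is rejected before evaluation.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.BoolOp, ast.Compare,
    ast.IfExp, ast.Call, ast.Name, ast.Constant, ast.expr_context,
    ast.operator, ast.unaryop, ast.boolop, ast.cmpop,
)


class RejectedNodeError(ValueError):
    """Hypothetical counterpart of FormulaSecurityError."""


class UnknownFunctionNameError(ValueError):
    """Hypothetical counterpart of UnknownFunctionError."""


def parse_checked(formula: str, allowed_functions: set) -> tuple:
    """Return (tree, mapping); mapping is safe name -> original field id."""
    source = formula.strip()
    if source.startswith("="):
        source = source[1:]

    mapping = {}  # safe name -> field id
    reverse = {}  # field id -> safe name

    def substitute(match: re.Match) -> str:
        field_id = match.group(1)
        if field_id not in reverse:
            safe = f"_f_{len(reverse)}"
            reverse[field_id] = safe
            mapping[safe] = field_id
        return reverse[field_id]

    # Replace {field_id} with an identifier so hyphenated UUIDs can parse.
    source = re.sub(r"\{([^{}]+)\}", substitute, source).strip()
    tree = ast.parse(source, mode="eval")  # SyntaxError on malformed input

    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise RejectedNodeError(type(node).__name__)
        if isinstance(node, ast.Call):
            if not isinstance(node.func, ast.Name):
                raise RejectedNodeError("call target must be a plain name")
            if node.func.id not in allowed_functions:
                raise UnknownFunctionNameError(node.func.id)
    return tree, mapping


def evaluate_checked(tree: ast.Expression, values: dict, functions: dict):
    """Evaluate a validated tree with no builtins in scope."""
    code = compile(tree, "<formula>", "eval")
    return eval(code, {"__builtins__": {}}, {**values, **functions})
```

The sketch deliberately ignores braces inside string literals and does not bound expression cost (for example large exponents); both would need handling in production code.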


@ -0,0 +1,182 @@
"""Tests for Excel ingestion (U4).
Tests parse_excel_bytes with in-memory .xlsx files created via openpyxl.
No PostgreSQL required; these are pure parsing tests.
"""
from __future__ import annotations
import io
from datetime import datetime
import pytest
from agentkit.bitable.ingestion.excel import parse_excel_bytes
pytestmark = pytest.mark.postgres  # Grouped with the PG tests for consistency, although nothing here touches the database
def _make_xlsx(
sheets: dict[str, list[list]],
) -> bytes:
"""Create an in-memory .xlsx file from sheet data.
Args:
sheets: {sheet_name: [[row1_col1, row1_col2], [row2_col1, ...]]}
"""
from openpyxl import Workbook
wb = Workbook()
# Remove default sheet
wb.remove(wb.active)
for name, rows in sheets.items():
ws = wb.create_sheet(title=name)
for row in rows:
ws.append(row)
buf = io.BytesIO()
wb.save(buf)
return buf.getvalue()
# ---------------------------------------------------------------------------
# Happy path: basic parsing
# ---------------------------------------------------------------------------
def test_parse_simple_sheet() -> None:
"""One sheet with header + 2 data rows → correct columns, types, records."""
xlsx = _make_xlsx(
{
"Sheet1": [
["name", "age", "city"],
["Alice", 30, "NYC"],
["Bob", 25, "LA"],
]
}
)
sheets = parse_excel_bytes(xlsx)
assert len(sheets) == 1
sheet = sheets[0]
assert sheet.name == "Sheet1"
assert sheet.columns == ["name", "age", "city"]
assert sheet.field_types == ["text", "number", "text"]
assert len(sheet.records) == 2
assert sheet.records[0] == {"name": "Alice", "age": 30, "city": "NYC"}
assert sheet.records[1] == {"name": "Bob", "age": 25, "city": "LA"}
def test_parse_multiple_sheets() -> None:
"""Multiple sheets → multiple ParsedSheet objects."""
xlsx = _make_xlsx(
{
"Users": [["id", "name"], [1, "Alice"]],
"Orders": [["order_id", "amount"], [101, 99.9]],
}
)
sheets = parse_excel_bytes(xlsx)
assert len(sheets) == 2
assert sheets[0].name == "Users"
assert sheets[1].name == "Orders"
assert sheets[1].records[0]["amount"] == 99.9
# ---------------------------------------------------------------------------
# Type inference
# ---------------------------------------------------------------------------
def test_type_inference_all_number() -> None:
"""Column with all integers → 'number'."""
xlsx = _make_xlsx({"S": [["val"], [1], [2], [3]]})
sheets = parse_excel_bytes(xlsx)
assert sheets[0].field_types == ["number"]
def test_type_inference_mixed_text_number() -> None:
"""Column with mixed text and number → 'text'."""
xlsx = _make_xlsx({"S": [["val"], [1], ["two"], [3]]})
sheets = parse_excel_bytes(xlsx)
assert sheets[0].field_types == ["text"]
def test_type_inference_date_column() -> None:
"""Column with all datetime values → 'date'."""
xlsx = _make_xlsx({"S": [["when"], [datetime(2024, 1, 1)], [datetime(2024, 6, 15)]]})
sheets = parse_excel_bytes(xlsx)
assert sheets[0].field_types == ["date"]
assert "2024-01-01" in sheets[0].records[0]["when"]
def test_type_inference_empty_column() -> None:
"""Column with no values → 'text' (safe default)."""
xlsx = _make_xlsx({"S": [["a", "b"], [1, None], [2, None]]})
sheets = parse_excel_bytes(xlsx)
assert sheets[0].field_types == ["number", "text"]
# ---------------------------------------------------------------------------
# Edge cases
# ---------------------------------------------------------------------------
def test_empty_sheet_skipped() -> None:
"""Completely empty sheet → not included in results."""
xlsx = _make_xlsx({"Empty": [], "Data": [["x"], [1]]})
sheets = parse_excel_bytes(xlsx)
assert len(sheets) == 1
assert sheets[0].name == "Data"
def test_header_only_no_data_rows() -> None:
"""Sheet with only a header row → 0 records, fields still created."""
xlsx = _make_xlsx({"S": [["name", "age"]]})
sheets = parse_excel_bytes(xlsx)
assert len(sheets) == 1
assert sheets[0].columns == ["name", "age"]
assert len(sheets[0].records) == 0
def test_duplicate_headers_deduplicated() -> None:
"""Duplicate header names → suffixed with _1, _2, etc."""
xlsx = _make_xlsx({"S": [["name", "name"], ["Alice", "Bob"]]})
sheets = parse_excel_bytes(xlsx)
assert sheets[0].columns == ["name", "name_1"]
def test_none_header_replaced() -> None:
"""None header value → auto-generated column name."""
xlsx = _make_xlsx({"S": [[None, "real"], [1, 2]]})
sheets = parse_excel_bytes(xlsx)
assert sheets[0].columns[0] == "col_0"
assert sheets[0].columns[1] == "real"
def test_corrupt_file_raises_value_error() -> None:
"""Non-xlsx bytes → ValueError with clear message."""
with pytest.raises(ValueError, match="Failed to parse"):
parse_excel_bytes(b"not an excel file")
# ---------------------------------------------------------------------------
# Merged cells (known limitation)
# ---------------------------------------------------------------------------
def test_merged_cells_only_top_left_has_value() -> None:
"""Merged cell: only top-left has value, others are None (known limitation)."""
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
ws.title = "Merged"
ws.append(["a", "b", "c"])
ws.append([1, 2, 3])
ws.merge_cells("A2:B2") # merge A2:B2 — only A2 has value
buf = io.BytesIO()
wb.save(buf)
sheets = parse_excel_bytes(buf.getvalue())
rec = sheets[0].records[0]
# A2 has value 1, B2 is None (merged cell limitation)
assert rec["a"] == 1
assert rec["b"] is None
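
Review note on the Excel tests above: a small sketch of header normalisation and column type inference that would satisfy the expectations in `test_duplicate_headers_deduplicated`, `test_none_header_replaced` and the `test_type_inference_*` cases. It is not the code behind `parse_excel_bytes`. `dedupe_headers` and `infer_field_type` are hypothetical helpers, and excluding `bool` from "number" is an assumption the tests do not cover.

```python
from datetime import date, datetime


def dedupe_headers(raw_headers: list) -> list:
    """Name empty headers col_<index> and suffix repeats with _1, _2, ..."""
    counts = {}
    columns = []
    for index, value in enumerate(raw_headers):
        base = f"col_{index}" if value is None else str(value)
        if base in counts:
            counts[base] += 1
            columns.append(f"{base}_{counts[base]}")
        else:
            counts[base] = 0
            columns.append(base)
    return columns


def infer_field_type(values: list) -> str:
    """Infer 'number', 'date' or 'text' from a column's cell values."""
    present = [v for v in values if v is not None and v != ""]
    if not present:
        return "text"  # safe default for an empty column
    # Assumption: bool is a subclass of int in Python, so exclude it here.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "number"
    if all(isinstance(v, (date, datetime)) for v in present):
        return "date"
    return "text"  # mixed or unrecognised content
```

One gap worth noting: a sheet whose headers already contain a literal `name_1` next to two `name` columns would still collide under this simple counter.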


@ -0,0 +1,303 @@
"""Tests for bitable Pydantic v2 data models (U1).
Covers: enum values, round-trip serialization, field config shapes per
field_type, Record.values empty-dict legality, default factories.
"""
from __future__ import annotations
from datetime import datetime, timezone
import pytest
from pydantic import ValidationError
from agentkit.bitable.models import (
Field,
FieldOwner,
FieldType,
Record,
RecalcStatus,
RecalcTask,
Table,
View,
ViewType,
)
# ---------------------------------------------------------------------------
# Enums
# ---------------------------------------------------------------------------
def test_field_type_values() -> None:
"""FieldType has the 9 supported types with correct string values."""
expected = {
"text",
"number",
"date",
"select",
"multiselect",
"attachment",
"image",
"formula",
"lookup",
}
assert {ft.value for ft in FieldType} == expected
def test_field_owner_values() -> None:
"""FieldOwner distinguishes agent vs user (drives upsert merge)."""
assert FieldOwner.agent.value == "agent"
assert FieldOwner.user.value == "user"
def test_view_type_values() -> None:
"""ViewType enumerates the 5 view kinds (v1 only grid is implemented)."""
assert {vt.value for vt in ViewType} == {"grid", "kanban", "gantt", "gallery", "form"}
def test_recalc_status_lifecycle() -> None:
"""RecalcStatus covers the full recalc lifecycle."""
assert {rs.value for rs in RecalcStatus} == {"pending", "calculating", "done", "error"}
# ---------------------------------------------------------------------------
# Table
# ---------------------------------------------------------------------------
def test_table_minimal_construction() -> None:
"""Table requires id + name; other fields have defaults."""
table = Table(id="t1", name="Orders")
assert table.id == "t1"
assert table.name == "Orders"
assert table.description == ""
assert table.primary_key_field_id is None
assert table.owner_user_id is None
assert isinstance(table.created_at, datetime)
assert isinstance(table.updated_at, datetime)
def test_table_round_trip() -> None:
"""Table serializes to dict and re-parses losslessly."""
table = Table(
id="t1",
name="Orders",
description="Customer orders",
primary_key_field_id="f_pk",
owner_user_id="u1",
)
data = table.model_dump(mode="json")
restored = Table.model_validate(data)
assert restored == table
def test_table_requires_id_and_name() -> None:
"""Table requires id and name (non-optional)."""
with pytest.raises(ValidationError):
Table(name="no id") # type: ignore[call-arg]
with pytest.raises(ValidationError):
Table(id="t1") # type: ignore[call-arg]
# ---------------------------------------------------------------------------
# Field
# ---------------------------------------------------------------------------
def test_field_text_default_config() -> None:
"""Text field has empty config by default and user owner."""
field = Field(id="f1", table_id="t1", name="Title", field_type=FieldType.text)
assert field.config == {}
assert field.owner == FieldOwner.user
def test_field_select_config_shape() -> None:
"""Select field config carries options list."""
field = Field(
id="f1",
table_id="t1",
name="Status",
field_type=FieldType.select,
config={"options": [{"label": "Open", "value": "open"}]},
owner=FieldOwner.agent,
)
assert field.config["options"][0]["value"] == "open"
assert field.owner == FieldOwner.agent
def test_field_formula_config_shape() -> None:
"""Formula field config carries formula_expr."""
field = Field(
id="f1",
table_id="t1",
name="Total",
field_type=FieldType.formula,
config={"formula_expr": "=SUM({f_price})"},
)
assert field.config["formula_expr"] == "=SUM({f_price})"
def test_field_lookup_config_shape() -> None:
"""Lookup field config carries lookup_target with table/field ids."""
field = Field(
id="f1",
table_id="t1",
name="Customer Name",
field_type=FieldType.lookup,
config={
"lookup_target": {
"table_id": "t_customers",
"field_id": "f_name",
"filter_field_id": "f_id",
"filter_value": "cust-123",
}
},
)
assert field.config["lookup_target"]["field_id"] == "f_name"
def test_field_round_trip() -> None:
"""Field serializes and re-parses losslessly across types."""
field = Field(
id="f1",
table_id="t1",
name="Score",
field_type=FieldType.number,
config={"precision": 2},
owner=FieldOwner.agent,
)
restored = Field.model_validate(field.model_dump(mode="json"))
assert restored == field
assert restored.field_type == FieldType.number
assert restored.owner == FieldOwner.agent
def test_field_type_accepts_string() -> None:
"""FieldType coerces from string (JSON round-trip scenario)."""
field = Field(id="f1", table_id="t1", name="X", field_type="number") # type: ignore[arg-type]
assert field.field_type == FieldType.number
# ---------------------------------------------------------------------------
# Record
# ---------------------------------------------------------------------------
def test_record_empty_values_allowed() -> None:
"""Record.values defaults to empty dict (new row before data entry)."""
record = Record(id="r1", table_id="t1")
assert record.values == {}
def test_record_values_round_trip() -> None:
"""Record.values (JSONB-shaped dict) round-trips through JSON."""
record = Record(
id="r1",
table_id="t1",
values={"f_name": "Alice", "f_age": 30, "f_tags": ["a", "b"]},
)
restored = Record.model_validate(record.model_dump(mode="json"))
assert restored.values == record.values
assert restored.values["f_tags"] == ["a", "b"]
def test_record_values_with_null() -> None:
"""Record.values can carry None for unset fields."""
record = Record(id="r1", table_id="t1", values={"f_name": None})
assert record.values["f_name"] is None
# ---------------------------------------------------------------------------
# View
# ---------------------------------------------------------------------------
def test_view_defaults_to_grid() -> None:
"""View defaults to grid type with empty config."""
view = View(id="v1", table_id="t1", name="All")
assert view.view_type == ViewType.grid
assert view.config == {}
def test_view_round_trip() -> None:
"""View with filter/sort config round-trips."""
view = View(
id="v1",
table_id="t1",
name="Open only",
view_type=ViewType.grid,
config={
"filters": [{"field_id": "f_status", "op": "eq", "value": "open"}],
"sorts": [{"field_id": "f_created", "direction": "desc"}],
"hidden_fields": ["f_internal"],
},
)
restored = View.model_validate(view.model_dump(mode="json"))
assert restored == view
assert restored.config["filters"][0]["op"] == "eq"
# ---------------------------------------------------------------------------
# RecalcTask
# ---------------------------------------------------------------------------
def test_recalc_task_defaults() -> None:
"""RecalcTask defaults to pending status, no error, no completed_at."""
task = RecalcTask(id="q1", table_id="t1", record_id="r1", field_id="f1")
assert task.status == RecalcStatus.pending
assert task.error_message is None
assert task.completed_at is None
assert isinstance(task.queued_at, datetime)
def test_recalc_task_error_state() -> None:
"""RecalcTask in error state carries message and completed_at."""
task = RecalcTask(
id="q1",
table_id="t1",
record_id="r1",
field_id="f1",
status=RecalcStatus.error,
error_message="division by zero",
completed_at=datetime.now(timezone.utc),
)
assert task.status == RecalcStatus.error
assert task.error_message == "division by zero"
def test_recalc_task_round_trip() -> None:
"""RecalcTask round-trips through JSON."""
task = RecalcTask(
id="q1",
table_id="t1",
record_id="r1",
field_id="f1",
status=RecalcStatus.done,
)
restored = RecalcTask.model_validate(task.model_dump(mode="json"))
assert restored == task
assert restored.status == RecalcStatus.done
# ---------------------------------------------------------------------------
# from_attributes (ORM row compatibility)
# ---------------------------------------------------------------------------
def test_table_from_attributes() -> None:
"""Table.model_validate accepts an ORM-like object (from_attributes)."""
class _Row:
id = "t1"
name = "Orders"
description = "desc"
primary_key_field_id = None
owner_user_id = None
created_at = datetime.now(timezone.utc)
updated_at = datetime.now(timezone.utc)
table = Table.model_validate(_Row())
assert table.id == "t1"
assert table.name == "Orders"


@ -0,0 +1,330 @@
"""Tests for the async recalc pipeline (U3).
Requires PostgreSQL; marked ``postgres``. Tests the full pipeline:
record write → recalc enqueue → worker processing → formula value written.
Also covers: crash recovery, deduplication, and error handling.
"""
from __future__ import annotations
import asyncio
import pytest
from agentkit.bitable.models import FieldOwner, FieldType, RecalcStatus
from agentkit.bitable.recalc_worker import RecalcWorker
from agentkit.bitable.service import BitableService
pytestmark = pytest.mark.postgres
# ---------------------------------------------------------------------------
# Helper: process all pending recalc tasks synchronously
# ---------------------------------------------------------------------------
async def _process_all_pending(service: BitableService) -> None:
"""Process all pending recalc tasks (for testing without background worker)."""
tasks = await service.get_pending_recalc_tasks(limit=100)
for task in tasks:
await service.process_recalc_task(task)
# ---------------------------------------------------------------------------
# Happy path: formula recalc after record write
# ---------------------------------------------------------------------------
async def test_recalc_simple_formula_after_create(bitable_service: BitableService) -> None:
"""Create a record with data → formula field gets recalculated."""
table = await bitable_service.create_table(name="T")
src_field = await bitable_service.create_field(
table_id=table.id, name="src", field_type=FieldType.number, owner=FieldOwner.agent
)
calc_field = await bitable_service.create_field(
table_id=table.id,
name="calc",
field_type=FieldType.formula,
config={"formula_expr": f"={{{src_field.id}}} * 2"},
)
# Create a record — should trigger recalc
record = await bitable_service.create_record(table_id=table.id, values={src_field.id: 21})
# Process pending recalc tasks
await _process_all_pending(bitable_service)
# Verify formula result was written
updated = await bitable_service.get_record(record.id)
assert updated is not None
assert updated.values[calc_field.id] == 42
async def test_recalc_aggregate_formula(bitable_service: BitableService) -> None:
"""SUM aggregate formula recalculates correctly across all records."""
table = await bitable_service.create_table(name="T")
src_field = await bitable_service.create_field(
table_id=table.id, name="amt", field_type=FieldType.number, owner=FieldOwner.agent
)
total_field = await bitable_service.create_field(
table_id=table.id,
name="total",
field_type=FieldType.formula,
config={"formula_expr": f"=SUM({{{src_field.id}}})"},
)
# Create multiple records
for amt in [10, 20, 30]:
await bitable_service.create_record(table_id=table.id, values={src_field.id: amt})
# Process all pending recalc tasks
await _process_all_pending(bitable_service)
# Each record's total field should be 60 (sum of all)
records, _ = await bitable_service.list_records(table.id)
assert len(records) == 3
for rec in records:
assert rec.values[total_field.id] == 60
async def test_recalc_after_upsert(bitable_service: BitableService) -> None:
"""Upsert triggers recalc for affected formula fields."""
table = await bitable_service.create_table(name="T")
pk_field = await bitable_service.create_field(
table_id=table.id, name="id", field_type=FieldType.text, owner=FieldOwner.agent
)
data_field = await bitable_service.create_field(
table_id=table.id, name="data", field_type=FieldType.number, owner=FieldOwner.agent
)
calc_field = await bitable_service.create_field(
table_id=table.id,
name="doubled",
field_type=FieldType.formula,
config={"formula_expr": f"={{{data_field.id}}} * 2"},
)
await bitable_service.update_table(table.id, primary_key_field_id=pk_field.id)
# Upsert a record
await bitable_service.upsert_records(
table.id,
[{pk_field.id: "r1", data_field.id: 15}],
pk_field.id,
)
# Process recalc
await _process_all_pending(bitable_service)
# Verify formula result
records, _ = await bitable_service.list_records(table.id)
assert len(records) == 1
assert records[0].values[calc_field.id] == 30
async def test_recalc_formula_chain(bitable_service: BitableService) -> None:
"""Formula-to-formula dependency: c = b*2, b = a*2 → c = a*4."""
table = await bitable_service.create_table(name="T")
src = await bitable_service.create_field(
table_id=table.id, name="a", field_type=FieldType.number, owner=FieldOwner.agent
)
mid = await bitable_service.create_field(
table_id=table.id,
name="b",
field_type=FieldType.formula,
config={"formula_expr": f"={{{src.id}}} * 2"},
)
top = await bitable_service.create_field(
table_id=table.id,
name="c",
field_type=FieldType.formula,
config={"formula_expr": f"={{{mid.id}}} * 2"},
)
await bitable_service.create_record(table_id=table.id, values={src.id: 5})
# Process recalc — may need multiple passes for formula chains
# ponytail: The current implementation processes tasks in queue order, not
# topological order. For formula chains, we may need to process twice:
# first pass computes b, second pass computes c (which depends on b).
await _process_all_pending(bitable_service)
await _process_all_pending(bitable_service)
records, _ = await bitable_service.list_records(table.id)
assert len(records) == 1
rec = records[0]
assert rec.values[mid.id] == 10 # 5 * 2
assert rec.values[top.id] == 20 # 10 * 2
# ---------------------------------------------------------------------------
# Crash recovery
# ---------------------------------------------------------------------------
async def test_crash_recovery_resets_calculating_tasks(
bitable_service: BitableService,
) -> None:
"""Stale 'calculating' tasks are reset to 'pending' on worker start."""
table = await bitable_service.create_table(name="T")
src = await bitable_service.create_field(
table_id=table.id, name="s", field_type=FieldType.number, owner=FieldOwner.agent
)
calc = await bitable_service.create_field(
table_id=table.id,
name="c",
field_type=FieldType.formula,
config={"formula_expr": f"={{{src.id}}} + 1"},
)
record = await bitable_service.create_record(table_id=table.id, values={src.id: 10})
# create_record already enqueued a recalc task — get it from pending
tasks = await bitable_service.get_pending_recalc_tasks()
assert len(tasks) == 1
task = tasks[0]
from agentkit.bitable.repository import BitableRepository
repo = BitableRepository(bitable_service._db)
await repo.update_recalc_status(task.id, RecalcStatus.calculating)
# Verify it's stuck in calculating
tasks = await bitable_service.get_pending_recalc_tasks()
assert len(tasks) == 0 # not pending, it's calculating
# Crash recovery
reset_count = await bitable_service.reset_stale_recalc_tasks()
assert reset_count == 1
# Now it should be pending again
tasks = await bitable_service.get_pending_recalc_tasks()
assert len(tasks) == 1
# Process it
await _process_all_pending(bitable_service)
# Verify result
rec = await bitable_service.get_record(record.id)
assert rec is not None
assert rec.values[calc.id] == 11
# ---------------------------------------------------------------------------
# Deduplication
# ---------------------------------------------------------------------------
async def test_recalc_deduplication(bitable_service: BitableService) -> None:
"""Same (record_id, field_id) enqueued twice → only one task in queue."""
table = await bitable_service.create_table(name="T")
src = await bitable_service.create_field(
table_id=table.id, name="s", field_type=FieldType.number, owner=FieldOwner.agent
)
calc = await bitable_service.create_field(
table_id=table.id,
name="c",
field_type=FieldType.formula,
config={"formula_expr": f"={{{src.id}}} + 1"},
)
record = await bitable_service.create_record(table_id=table.id, values={src.id: 10})
# The create_record already enqueued one task. Enqueue again manually.
task2 = await bitable_service.trigger_recalc(table.id, record.id, calc.id)
# Should return None (duplicate, ON CONFLICT DO NOTHING)
assert task2 is None
# Only one pending task
tasks = await bitable_service.get_pending_recalc_tasks()
assert len(tasks) == 1
# ---------------------------------------------------------------------------
# Error handling
# ---------------------------------------------------------------------------
async def test_recalc_error_marks_task_as_error(bitable_service: BitableService) -> None:
"""Formula with division by zero marks task as error."""
table = await bitable_service.create_table(name="T")
src = await bitable_service.create_field(
table_id=table.id, name="s", field_type=FieldType.number, owner=FieldOwner.agent
)
calc = await bitable_service.create_field(
table_id=table.id,
name="c",
field_type=FieldType.formula,
config={"formula_expr": f"={{{src.id}}} / 0"},
)
record = await bitable_service.create_record(table_id=table.id, values={src.id: 10})
# Process recalc — should fail with division by zero
await _process_all_pending(bitable_service)
# Verify task is marked as error
from sqlalchemy import text
db = bitable_service._db
async with db.session_factory() as session:
result = await session.execute(
text(
"SELECT status, error_message FROM bitable.bitable_recalc_queue "
"WHERE record_id = :rid AND field_id = :fid"
),
{"rid": record.id, "fid": calc.id},
)
row = result.fetchone()
assert row is not None
assert row[0] == RecalcStatus.error.value
assert "division" in row[1].lower() or "zero" in row[1].lower()
# ---------------------------------------------------------------------------
# Worker lifecycle
# ---------------------------------------------------------------------------
async def test_worker_starts_and_stops(bitable_service: BitableService) -> None:
"""RecalcWorker starts and stops gracefully."""
worker = RecalcWorker(bitable_service._db, bitable_service, poll_interval=0.1)
await worker.start()
assert worker._task is not None
assert worker._reaper_task is not None
# Let it run briefly
await asyncio.sleep(0.2)
await worker.stop()
assert worker._task is None
assert worker._reaper_task is None
async def test_worker_processes_tasks(bitable_service: BitableService) -> None:
"""Background worker picks up and processes recalc tasks."""
table = await bitable_service.create_table(name="T")
src = await bitable_service.create_field(
table_id=table.id, name="s", field_type=FieldType.number, owner=FieldOwner.agent
)
calc = await bitable_service.create_field(
table_id=table.id,
name="c",
field_type=FieldType.formula,
config={"formula_expr": f"={{{src.id}}} + 100"},
)
record = await bitable_service.create_record(table_id=table.id, values={src.id: 5})
# Start worker — it should pick up the pending task
worker = RecalcWorker(bitable_service._db, bitable_service, poll_interval=0.1)
await worker.start()
# Wait for worker to process
await asyncio.sleep(1.0)
await worker.stop()
# Verify formula was computed
rec = await bitable_service.get_record(record.id)
assert rec is not None
assert rec.values[calc.id] == 105
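
The worker test above waits a fixed `asyncio.sleep(1.0)` before checking the result. One possible alternative is to poll for the expected state with a deadline, which tends to make such tests both faster and less timing-sensitive. The sketch below is illustrative only: `wait_until` and `formula_computed` are hypothetical names that do not exist in this commit, and the predicate is a stand-in for a real check such as reading the record back from the service.

```python
from __future__ import annotations

import asyncio
import time
from collections.abc import Awaitable, Callable


async def wait_until(
    predicate: Callable[[], Awaitable[bool]],
    timeout: float = 5.0,
    interval: float = 0.05,
) -> bool:
    """Poll an async predicate until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if await predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(interval)


async def _demo() -> bool:
    # Hypothetical stand-in for "the worker has written the formula value".
    state = {"polls": 0}

    async def formula_computed() -> bool:
        state["polls"] += 1
        return state["polls"] >= 3

    return await wait_until(formula_computed, timeout=1.0, interval=0.01)


demo_result = asyncio.run(_demo())
```

In a test like `test_worker_processes_tasks`, the predicate could re-read the record and compare the formula value, so the test returns as soon as the worker has finished instead of always sleeping for a full second.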


@ -0,0 +1,579 @@
"""Tests for bitable REST API routes (U2).
Requires PostgreSQL; tests are marked ``postgres``. Uses ``httpx.AsyncClient`` with
``ASGITransport`` so the async DB engine and the HTTP client share one event
loop (TestClient runs in a separate thread/loop, which breaks asyncpg's
loop-bound connections).
"""
from __future__ import annotations
import json
from typing import Any
import httpx
import pytest
from fastapi import FastAPI
from httpx import ASGITransport
from agentkit.bitable.service import BitableService
from agentkit.server.routes import bitable as bitable_routes
from agentkit.server.routes.bitable import require_bitable_auth
pytestmark = pytest.mark.postgres
TEST_USER_ID = "test-user-id"
def _make_test_user() -> dict[str, Any]:
return {"user_id": TEST_USER_ID, "username": "testuser", "role": "member"}
# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------
@pytest.fixture
def app(bitable_service: BitableService) -> FastAPI:
"""Test app with bitable_service on app.state and auth bypassed."""
app = FastAPI()
app.state.bitable_service = bitable_service
app.include_router(bitable_routes.router, prefix="/api/v1")
app.dependency_overrides[require_bitable_auth] = lambda: _make_test_user()
return app
@pytest.fixture
def unauth_app(bitable_service: BitableService) -> FastAPI:
"""App without auth override — simulates unauthenticated requests."""
app = FastAPI()
app.state.bitable_service = bitable_service
app.include_router(bitable_routes.router, prefix="/api/v1")
return app
@pytest.fixture
def no_service_app() -> FastAPI:
"""App without bitable_service on state — simulates uninitialized subsystem."""
app = FastAPI()
app.include_router(bitable_routes.router, prefix="/api/v1")
app.dependency_overrides[require_bitable_auth] = lambda: _make_test_user()
return app
@pytest.fixture
async def client(app: FastAPI) -> httpx.AsyncClient:
"""Async HTTP client — shares event loop with async DB fixtures."""
transport = ASGITransport(app=app)
async with httpx.AsyncClient(transport=transport, base_url="http://test") as c:
yield c
@pytest.fixture
async def unauth_client(unauth_app: FastAPI) -> httpx.AsyncClient:
transport = ASGITransport(app=unauth_app)
async with httpx.AsyncClient(transport=transport, base_url="http://test") as c:
yield c
@pytest.fixture
async def no_service_client(no_service_app: FastAPI) -> httpx.AsyncClient:
transport = ASGITransport(app=no_service_app)
async with httpx.AsyncClient(transport=transport, base_url="http://test") as c:
yield c
# ---------------------------------------------------------------------------
# Auth + service availability
# ---------------------------------------------------------------------------
async def test_create_table_requires_auth(unauth_client: httpx.AsyncClient) -> None:
"""No auth → 401."""
resp = await unauth_client.post("/api/v1/bitable/tables", json={"name": "T"})
assert resp.status_code == 401
async def test_endpoint_returns_503_when_service_unavailable(
no_service_client: httpx.AsyncClient,
) -> None:
"""No service on app.state → 503."""
resp = await no_service_client.post("/api/v1/bitable/tables", json={"name": "T"})
assert resp.status_code == 503
# ---------------------------------------------------------------------------
# Tables CRUD
# ---------------------------------------------------------------------------
async def test_create_table_success(client: httpx.AsyncClient) -> None:
resp = await client.post(
"/api/v1/bitable/tables", json={"name": "Orders", "description": "desc"}
)
assert resp.status_code == 200
data = resp.json()
assert data["success"] is True
assert data["table"]["name"] == "Orders"
assert data["table"]["description"] == "desc"
assert "id" in data["table"]
async def test_list_tables_returns_created(client: httpx.AsyncClient) -> None:
for name in ("A", "B", "C"):
await client.post("/api/v1/bitable/tables", json={"name": name})
resp = await client.get("/api/v1/bitable/tables")
assert resp.status_code == 200
data = resp.json()
assert data["success"] is True
assert len(data["tables"]) == 3
names = {t["name"] for t in data["tables"]}
assert names == {"A", "B", "C"}
async def test_get_table_404_when_missing(client: httpx.AsyncClient) -> None:
resp = await client.get("/api/v1/bitable/tables/nonexistent-id")
assert resp.status_code == 404
async def test_update_table_success(client: httpx.AsyncClient) -> None:
create_resp = await client.post("/api/v1/bitable/tables", json={"name": "Old"})
table_id = create_resp.json()["table"]["id"]
resp = await client.patch(f"/api/v1/bitable/tables/{table_id}", json={"name": "New"})
assert resp.status_code == 200
assert resp.json()["table"]["name"] == "New"
async def test_delete_table_success(client: httpx.AsyncClient) -> None:
create_resp = await client.post("/api/v1/bitable/tables", json={"name": "T"})
table_id = create_resp.json()["table"]["id"]
resp = await client.delete(f"/api/v1/bitable/tables/{table_id}")
assert resp.status_code == 200
assert resp.json()["success"] is True
# Verify gone
assert (await client.get(f"/api/v1/bitable/tables/{table_id}")).status_code == 404
# ---------------------------------------------------------------------------
# Fields CRUD + dependency check (409)
# ---------------------------------------------------------------------------
async def test_create_field_success(client: httpx.AsyncClient) -> None:
table_id = (await client.post("/api/v1/bitable/tables", json={"name": "T"})).json()["table"][
"id"
]
resp = await client.post(
f"/api/v1/bitable/tables/{table_id}/fields",
json={"name": "title", "field_type": "text", "owner": "agent"},
)
assert resp.status_code == 200
data = resp.json()
assert data["field"]["name"] == "title"
assert data["field"]["field_type"] == "text"
assert data["field"]["owner"] == "agent"
async def test_list_fields(client: httpx.AsyncClient) -> None:
table_id = (await client.post("/api/v1/bitable/tables", json={"name": "T"})).json()["table"][
"id"
]
for name in ("f1", "f2"):
await client.post(
f"/api/v1/bitable/tables/{table_id}/fields",
json={"name": name, "field_type": "text"},
)
resp = await client.get(f"/api/v1/bitable/tables/{table_id}/fields")
assert resp.status_code == 200
assert len(resp.json()["fields"]) == 2
async def test_delete_field_no_deps(client: httpx.AsyncClient) -> None:
table_id = (await client.post("/api/v1/bitable/tables", json={"name": "T"})).json()["table"][
"id"
]
field_id = (
await client.post(
f"/api/v1/bitable/tables/{table_id}/fields",
json={"name": "f", "field_type": "text"},
)
).json()["field"]["id"]
resp = await client.delete(f"/api/v1/bitable/fields/{field_id}")
assert resp.status_code == 200
async def test_delete_field_returns_409_when_referenced_by_formula(
client: httpx.AsyncClient,
) -> None:
table_id = (await client.post("/api/v1/bitable/tables", json={"name": "T"})).json()["table"][
"id"
]
source_id = (
await client.post(
f"/api/v1/bitable/tables/{table_id}/fields",
json={"name": "src", "field_type": "number"},
)
).json()["field"]["id"]
await client.post(
f"/api/v1/bitable/tables/{table_id}/fields",
json={
"name": "calc",
"field_type": "formula",
"config": {"formula_expr": f"={source_id} * 2"},
},
)
resp = await client.delete(f"/api/v1/bitable/fields/{source_id}")
assert resp.status_code == 409
detail = resp.json()["detail"]
assert "dependencies" in detail
assert "formula_fields" in detail["dependencies"]
async def test_delete_field_force_cascades(client: httpx.AsyncClient) -> None:
table_id = (await client.post("/api/v1/bitable/tables", json={"name": "T"})).json()["table"][
"id"
]
source_id = (
await client.post(
f"/api/v1/bitable/tables/{table_id}/fields",
json={"name": "src", "field_type": "number", "owner": "agent"},
)
).json()["field"]["id"]
# Create a record with the source field
await client.post(
f"/api/v1/bitable/tables/{table_id}/records",
json={"records": [{source_id: 42}]},
)
resp = await client.delete(f"/api/v1/bitable/fields/{source_id}?force=true")
assert resp.status_code == 200
# ---------------------------------------------------------------------------
# Records CRUD + cursor pagination
# ---------------------------------------------------------------------------
async def test_create_records_batch(client: httpx.AsyncClient) -> None:
table_id = (await client.post("/api/v1/bitable/tables", json={"name": "T"})).json()["table"][
"id"
]
resp = await client.post(
f"/api/v1/bitable/tables/{table_id}/records",
json={"records": [{"a": 1}, {"a": 2}, {"a": 3}]},
)
assert resp.status_code == 200
data = resp.json()
assert data["count"] == 3
assert len(data["records"]) == 3
async def test_list_records_cursor_pagination(client: httpx.AsyncClient) -> None:
table_id = (await client.post("/api/v1/bitable/tables", json={"name": "T"})).json()["table"][
"id"
]
await client.post(
f"/api/v1/bitable/tables/{table_id}/records",
json={"records": [{"i": i} for i in range(5)]},
)
# Page 1
resp = await client.get(f"/api/v1/bitable/tables/{table_id}/records?limit=2")
assert resp.status_code == 200
data = resp.json()
assert len(data["records"]) == 2
assert data["next_cursor"] is not None
# Page 2
resp2 = await client.get(
f"/api/v1/bitable/tables/{table_id}/records?limit=2&cursor={data['next_cursor']}"
)
data2 = resp2.json()
assert len(data2["records"]) == 2
# No overlap
ids1 = {r["id"] for r in data["records"]}
ids2 = {r["id"] for r in data2["records"]}
assert ids1.isdisjoint(ids2)
async def test_list_records_with_filters(client: httpx.AsyncClient) -> None:
table_id = (await client.post("/api/v1/bitable/tables", json={"name": "T"})).json()["table"][
"id"
]
num_field_id = (
await client.post(
f"/api/v1/bitable/tables/{table_id}/fields",
json={"name": "amt", "field_type": "number", "owner": "agent"},
)
).json()["field"]["id"]
await client.post(
f"/api/v1/bitable/tables/{table_id}/records",
json={"records": [{num_field_id: 10}, {num_field_id: 50}, {num_field_id: 100}]},
)
filters = json.dumps([{"field_id": num_field_id, "op": "gt", "value": 40}])
resp = await client.get(
f"/api/v1/bitable/tables/{table_id}/records", params={"filters": filters}
)
assert resp.status_code == 200
data = resp.json()
assert len(data["records"]) == 2 # 50 and 100
async def test_update_record(client: httpx.AsyncClient) -> None:
table_id = (await client.post("/api/v1/bitable/tables", json={"name": "T"})).json()["table"][
"id"
]
record_id = (
await client.post(
f"/api/v1/bitable/tables/{table_id}/records", json={"records": [{"a": 1}]}
)
).json()["records"][0]["id"]
resp = await client.patch(f"/api/v1/bitable/records/{record_id}", json={"values": {"a": 99}})
assert resp.status_code == 200
assert resp.json()["record"]["values"]["a"] == 99
async def test_delete_single_record(client: httpx.AsyncClient) -> None:
table_id = (await client.post("/api/v1/bitable/tables", json={"name": "T"})).json()["table"][
"id"
]
record_id = (
await client.post(
f"/api/v1/bitable/tables/{table_id}/records", json={"records": [{"a": 1}]}
)
).json()["records"][0]["id"]
resp = await client.delete(f"/api/v1/bitable/records/{record_id}")
assert resp.status_code == 200
# Verify gone
resp2 = await client.get(f"/api/v1/bitable/tables/{table_id}/records")
assert len(resp2.json()["records"]) == 0
# ---------------------------------------------------------------------------
# Upsert endpoint (KTD8)
# ---------------------------------------------------------------------------
async def test_upsert_inserts_then_updates(client: httpx.AsyncClient) -> None:
table_id = (await client.post("/api/v1/bitable/tables", json={"name": "T"})).json()["table"][
"id"
]
pk_id = (
await client.post(
f"/api/v1/bitable/tables/{table_id}/fields",
json={"name": "id", "field_type": "text", "owner": "agent"},
)
).json()["field"]["id"]
data_id = (
await client.post(
f"/api/v1/bitable/tables/{table_id}/fields",
json={"name": "data", "field_type": "text", "owner": "agent"},
)
).json()["field"]["id"]
await client.patch(f"/api/v1/bitable/tables/{table_id}", json={"primary_key_field_id": pk_id})
# First: insert
resp = await client.post(
f"/api/v1/bitable/tables/{table_id}/upsert",
json={
"records": [{pk_id: "r1", data_id: "v1"}],
"primary_key_field_id": pk_id,
},
)
assert resp.status_code == 200
assert resp.json()["inserted"] == 1
assert resp.json()["updated"] == 0
# Second: update
resp2 = await client.post(
f"/api/v1/bitable/tables/{table_id}/upsert",
json={
"records": [{pk_id: "r1", data_id: "v2"}],
"primary_key_field_id": pk_id,
},
)
assert resp2.status_code == 200
assert resp2.json()["inserted"] == 0
assert resp2.json()["updated"] == 1
# Verify value
records = (await client.get(f"/api/v1/bitable/tables/{table_id}/records")).json()["records"]
assert len(records) == 1
assert records[0]["values"][data_id] == "v2"
async def test_upsert_preserves_user_columns(client: httpx.AsyncClient) -> None:
"""KTD8 via API: upsert updates agent columns, user columns untouched."""
table_id = (await client.post("/api/v1/bitable/tables", json={"name": "T"})).json()["table"][
"id"
]
pk_id = (
await client.post(
f"/api/v1/bitable/tables/{table_id}/fields",
json={"name": "id", "field_type": "text", "owner": "agent"},
)
).json()["field"]["id"]
agent_id = (
await client.post(
f"/api/v1/bitable/tables/{table_id}/fields",
json={"name": "agent_data", "field_type": "text", "owner": "agent"},
)
).json()["field"]["id"]
user_id = (
await client.post(
f"/api/v1/bitable/tables/{table_id}/fields",
json={"name": "user_data", "field_type": "text", "owner": "user"},
)
).json()["field"]["id"]
await client.patch(f"/api/v1/bitable/tables/{table_id}", json={"primary_key_field_id": pk_id})
# Insert with both agent and user values
await client.post(
f"/api/v1/bitable/tables/{table_id}/upsert",
json={
"records": [{pk_id: "r1", agent_id: "a1", user_id: "u1"}],
"primary_key_field_id": pk_id,
},
)
# Manually set user column (simulating user edit via PATCH)
records = (await client.get(f"/api/v1/bitable/tables/{table_id}/records")).json()["records"]
rec_id = records[0]["id"]
await client.patch(
f"/api/v1/bitable/records/{rec_id}",
json={"values": {**records[0]["values"], user_id: "USER_EDITED"}},
)
# Second upsert: tries to change user column — should be ignored
await client.post(
f"/api/v1/bitable/tables/{table_id}/upsert",
json={
"records": [{pk_id: "r1", agent_id: "a2", user_id: "SHOULD_NOT_APPLY"}],
"primary_key_field_id": pk_id,
},
)
records = (await client.get(f"/api/v1/bitable/tables/{table_id}/records")).json()["records"]
assert len(records) == 1
assert records[0]["values"][agent_id] == "a2" # updated
assert records[0]["values"][user_id] == "USER_EDITED" # preserved
# ---------------------------------------------------------------------------
# Views CRUD
# ---------------------------------------------------------------------------
async def test_create_view_success(client: httpx.AsyncClient) -> None:
table_id = (await client.post("/api/v1/bitable/tables", json={"name": "T"})).json()["table"][
"id"
]
resp = await client.post(
f"/api/v1/bitable/tables/{table_id}/views",
json={"name": "Grid View", "view_type": "grid", "config": {}},
)
assert resp.status_code == 200
assert resp.json()["view"]["name"] == "Grid View"
assert resp.json()["view"]["view_type"] == "grid"
async def test_list_views(client: httpx.AsyncClient) -> None:
table_id = (await client.post("/api/v1/bitable/tables", json={"name": "T"})).json()["table"][
"id"
]
for name in ("v1", "v2"):
await client.post(f"/api/v1/bitable/tables/{table_id}/views", json={"name": name})
resp = await client.get(f"/api/v1/bitable/tables/{table_id}/views")
assert resp.status_code == 200
assert len(resp.json()["views"]) == 2
async def test_update_view(client: httpx.AsyncClient) -> None:
table_id = (await client.post("/api/v1/bitable/tables", json={"name": "T"})).json()["table"][
"id"
]
view_id = (
await client.post(f"/api/v1/bitable/tables/{table_id}/views", json={"name": "Old"})
).json()["view"]["id"]
resp = await client.patch(f"/api/v1/bitable/views/{view_id}", json={"name": "New"})
assert resp.status_code == 200
assert resp.json()["view"]["name"] == "New"
# ---------------------------------------------------------------------------
# Formula validation (U5b)
# ---------------------------------------------------------------------------
async def test_validate_formula_valid(client: httpx.AsyncClient) -> None:
"""Valid formula returns valid=true."""
resp = await client.post(
"/api/v1/bitable/fields/validate-formula",
json={"formula": "1 + 2"},
)
assert resp.status_code == 200
data = resp.json()
assert data["valid"] is True
assert "error" not in data
async def test_validate_formula_with_field_ref(client: httpx.AsyncClient) -> None:
"""Formula with field reference is valid syntax."""
resp = await client.post(
"/api/v1/bitable/fields/validate-formula",
json={"formula": "{field_abc} + 1"},
)
assert resp.status_code == 200
assert resp.json()["valid"] is True
async def test_validate_formula_with_function(client: httpx.AsyncClient) -> None:
"""Formula with built-in function is valid."""
resp = await client.post(
"/api/v1/bitable/fields/validate-formula",
json={"formula": "SUM({f1}) + AVG({f2})"},
)
assert resp.status_code == 200
assert resp.json()["valid"] is True
async def test_validate_formula_syntax_error(client: httpx.AsyncClient) -> None:
"""Syntax error returns valid=false with error message."""
resp = await client.post(
"/api/v1/bitable/fields/validate-formula",
json={"formula": "1 +"},
)
assert resp.status_code == 200
data = resp.json()
assert data["valid"] is False
assert "error" in data
async def test_validate_formula_security_error(client: httpx.AsyncClient) -> None:
"""Dangerous constructs (import) are rejected."""
resp = await client.post(
"/api/v1/bitable/fields/validate-formula",
json={"formula": "__import__('os')"},
)
assert resp.status_code == 200
data = resp.json()
assert data["valid"] is False
assert "error" in data
async def test_validate_formula_unknown_function(client: httpx.AsyncClient) -> None:
"""Unknown function is rejected."""
resp = await client.post(
"/api/v1/bitable/fields/validate-formula",
json={"formula": "UNKNOWN_FUNC(1)"},
)
assert resp.status_code == 200
data = resp.json()
assert data["valid"] is False
assert "error" in data
async def test_validate_formula_requires_auth(unauth_client: httpx.AsyncClient) -> None:
"""No auth → 401."""
resp = await unauth_client.post(
"/api/v1/bitable/fields/validate-formula",
json={"formula": "1 + 2"},
)
assert resp.status_code == 401
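
The pagination tests above only check page sizes and that pages do not overlap. For context, here is a minimal in-memory sketch of composite (sort value, id) cursor pagination, the general technique the commit message refers to for non-id sort orders. It is not the service's implementation: `encode_cursor`, `decode_cursor`, `list_page`, and the cursor encoding (base64 of a JSON pair) are all assumptions made for illustration.

```python
from __future__ import annotations

import base64
import json
from typing import Any


def encode_cursor(sort_value: Any, record_id: str) -> str:
    """Pack the last row's (sort value, id) pair into an opaque string."""
    raw = json.dumps([sort_value, record_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> tuple[Any, str]:
    sort_value, record_id = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return sort_value, record_id


def list_page(
    rows: list[dict[str, Any]],
    sort_key: str,
    limit: int,
    cursor: str | None = None,
) -> tuple[list[dict[str, Any]], str | None]:
    # Order by (sort value, id): the id breaks ties between equal sort values,
    # so every row has a unique position in the ordering.
    ordered = sorted(rows, key=lambda r: (r[sort_key], r["id"]))
    if cursor is not None:
        after = decode_cursor(cursor)
        ordered = [r for r in ordered if (r[sort_key], r["id"]) > after]
    page = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = encode_cursor(page[-1][sort_key], page[-1]["id"]) if has_more else None
    return page, next_cursor


# Five rows with duplicate scores, paged two at a time.
rows = [{"id": f"r{i}", "score": s} for i, s in enumerate([10, 10, 20, 20, 20])]
pages: list[list[str]] = []
cursor: str | None = None
while True:
    page, cursor = list_page(rows, "score", limit=2, cursor=cursor)
    pages.append([r["id"] for r in page])
    if cursor is None:
        break
```

A cursor built from the sort value alone would skip or repeat rows whenever several rows share that value; including the id as a tiebreaker avoids this.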


@ -0,0 +1,296 @@
"""Tests for bitable service layer (U2): upsert, field deletion, view filtering.
Requires PostgreSQL; tests are marked ``postgres``.
"""
from __future__ import annotations
import pytest
from agentkit.bitable.models import FieldOwner, FieldType
from agentkit.bitable.service import FieldDependencyError
pytestmark = pytest.mark.postgres
# ---------------------------------------------------------------------------
# Upsert (KTD8: jsonb_set preserves user columns)
# ---------------------------------------------------------------------------
async def test_upsert_inserts_new_records(bitable_service) -> None:
"""First upsert inserts all records."""
table = await bitable_service.create_table(name="T")
pk_field = await bitable_service.create_field(
table_id=table.id, name="id", field_type=FieldType.text, owner=FieldOwner.agent
)
data_field = await bitable_service.create_field(
table_id=table.id, name="data", field_type=FieldType.text, owner=FieldOwner.agent
)
await bitable_service.update_table(table.id, primary_key_field_id=pk_field.id)
result = await bitable_service.upsert_records(
table.id,
[
{pk_field.id: "row1", data_field.id: "hello"},
{pk_field.id: "row2", data_field.id: "world"},
],
pk_field.id,
)
assert result == {"inserted": 2, "updated": 0, "skipped": 0}
records, _ = await bitable_service.list_records(table.id)
assert len(records) == 2
async def test_upsert_updates_existing_preserves_user_columns(bitable_service) -> None:
"""KTD8: upsert updates agent columns via jsonb_set, user columns untouched."""
table = await bitable_service.create_table(name="T")
pk_field = await bitable_service.create_field(
table_id=table.id, name="id", field_type=FieldType.text, owner=FieldOwner.agent
)
agent_field = await bitable_service.create_field(
table_id=table.id, name="agent_data", field_type=FieldType.text, owner=FieldOwner.agent
)
user_field = await bitable_service.create_field(
table_id=table.id, name="user_data", field_type=FieldType.text, owner=FieldOwner.user
)
await bitable_service.update_table(table.id, primary_key_field_id=pk_field.id)
# First: insert with both agent and user values
await bitable_service.upsert_records(
table.id,
[{pk_field.id: "row1", agent_field.id: "agent_v1", user_field.id: "user_v1"}],
pk_field.id,
)
# Manually set user column (simulating user edit)
records, _ = await bitable_service.list_records(table.id)
assert len(records) == 1
rec = records[0]
await bitable_service.update_record_values(rec.id, {**rec.values, user_field.id: "USER_EDITED"})
# Second upsert: only agent column changes
result = await bitable_service.upsert_records(
table.id,
[{pk_field.id: "row1", agent_field.id: "agent_v2", user_field.id: "SHOULD_NOT_APPLY"}],
pk_field.id,
)
assert result == {"inserted": 0, "updated": 1, "skipped": 0}
# Verify: agent column updated, user column preserved
records, _ = await bitable_service.list_records(table.id)
assert len(records) == 1
rec = records[0]
assert rec.values[agent_field.id] == "agent_v2" # updated
assert rec.values[user_field.id] == "USER_EDITED" # preserved (NOT "SHOULD_NOT_APPLY")
async def test_upsert_skips_records_without_pk(bitable_service) -> None:
"""Records without PK value are skipped."""
table = await bitable_service.create_table(name="T")
pk_field = await bitable_service.create_field(
table_id=table.id, name="id", field_type=FieldType.text, owner=FieldOwner.agent
)
await bitable_service.update_table(table.id, primary_key_field_id=pk_field.id)
result = await bitable_service.upsert_records(
table.id,
[{pk_field.id: "row1"}, {}], # second has no PK
pk_field.id,
)
assert result == {"inserted": 1, "updated": 0, "skipped": 1}
async def test_upsert_empty_batch(bitable_service) -> None:
"""Empty batch returns all zeros."""
table = await bitable_service.create_table(name="T")
pk_field = await bitable_service.create_field(
table_id=table.id, name="id", field_type=FieldType.text, owner=FieldOwner.agent
)
result = await bitable_service.upsert_records(table.id, [], pk_field.id)
assert result == {"inserted": 0, "updated": 0, "skipped": 0}
async def test_upsert_without_pk_field_raises(bitable_service) -> None:
"""Upsert without primary_key_field_id raises ValueError."""
table = await bitable_service.create_table(name="T")
with pytest.raises(ValueError, match="primary_key_field_id"):
await bitable_service.upsert_records(table.id, [{}], "")
# ---------------------------------------------------------------------------
# Field deletion with dependency checking
# ---------------------------------------------------------------------------
async def test_delete_field_no_dependencies(bitable_service) -> None:
"""Deleting a field with no dependencies succeeds."""
table = await bitable_service.create_table(name="T")
field = await bitable_service.create_field(
table_id=table.id, name="f", field_type=FieldType.text
)
deleted = await bitable_service.delete_field(field.id)
assert deleted is True
assert await bitable_service.get_field(field.id) is None
async def test_delete_field_referenced_by_formula_returns_deps(bitable_service) -> None:
"""Deleting a field referenced by a formula raises FieldDependencyError."""
table = await bitable_service.create_table(name="T")
source_field = await bitable_service.create_field(
table_id=table.id, name="source", field_type=FieldType.number
)
formula_field = await bitable_service.create_field(
table_id=table.id,
name="calc",
field_type=FieldType.formula,
config={"formula_expr": f"={source_field.id} * 2"},
)
with pytest.raises(FieldDependencyError) as exc_info:
await bitable_service.delete_field(source_field.id)
deps = exc_info.value.dependencies
assert "formula_fields" in deps
assert any(f["id"] == formula_field.id for f in deps["formula_fields"])
async def test_delete_primary_key_field_returns_deps(bitable_service) -> None:
"""Deleting the primary key field raises FieldDependencyError."""
table = await bitable_service.create_table(name="T")
pk_field = await bitable_service.create_field(
table_id=table.id, name="id", field_type=FieldType.text
)
await bitable_service.update_table(table.id, primary_key_field_id=pk_field.id)
with pytest.raises(FieldDependencyError) as exc_info:
await bitable_service.delete_field(pk_field.id)
assert exc_info.value.dependencies.get("is_primary_key") is True
async def test_delete_field_force_cascades_cleanup(bitable_service) -> None:
"""Force delete cascades: removes field from records, marks formula as error."""
table = await bitable_service.create_table(name="T")
source_field = await bitable_service.create_field(
table_id=table.id, name="source", field_type=FieldType.number, owner=FieldOwner.agent
)
formula_field = await bitable_service.create_field(
table_id=table.id,
name="calc",
field_type=FieldType.formula,
config={"formula_expr": f"={source_field.id} * 2"},
)
# Create a record with the source field value
record = await bitable_service.create_record(table_id=table.id, values={source_field.id: 42})
# Force delete
deleted = await bitable_service.delete_field(source_field.id, force=True)
assert deleted is True
# Record should no longer have the source field key
rec = await bitable_service.get_record(record.id)
assert rec is not None
assert source_field.id not in rec.values
# Formula field should have error in config
formula = await bitable_service.get_field(formula_field.id)
assert formula is not None
assert "error" in formula.config
# ---------------------------------------------------------------------------
# View-filtered record listing
# ---------------------------------------------------------------------------
async def test_list_records_filtered_by_number_gt(bitable_service) -> None:
"""View filter with gt op on number field correctly filters (CAST NUMERIC)."""
table = await bitable_service.create_table(name="T")
num_field = await bitable_service.create_field(
table_id=table.id, name="amount", field_type=FieldType.number, owner=FieldOwner.agent
)
# Create records with various amounts
for amt in [10, 50, 100, 200]:
await bitable_service.create_record(table_id=table.id, values={num_field.id: amt})
# Filter: amount > 50
records, _ = await bitable_service.list_records_filtered(
table.id,
filters=[{"field_id": num_field.id, "op": "gt", "value": 50}],
)
amounts = [r.values[num_field.id] for r in records]
assert all(a > 50 for a in amounts)
assert len(records) == 2 # 100 and 200
async def test_list_records_filtered_by_text_eq(bitable_service) -> None:
"""View filter with eq op on text field."""
table = await bitable_service.create_table(name="T")
text_field = await bitable_service.create_field(
table_id=table.id, name="status", field_type=FieldType.text, owner=FieldOwner.agent
)
for status in ["open", "closed", "open", "pending"]:
await bitable_service.create_record(table_id=table.id, values={text_field.id: status})
records, _ = await bitable_service.list_records_filtered(
table.id,
filters=[{"field_id": text_field.id, "op": "eq", "value": "open"}],
)
assert len(records) == 2
assert all(r.values[text_field.id] == "open" for r in records)
async def test_list_records_filtered_with_sort(bitable_service) -> None:
"""View sort by number field descending."""
table = await bitable_service.create_table(name="T")
num_field = await bitable_service.create_field(
table_id=table.id, name="score", field_type=FieldType.number, owner=FieldOwner.agent
)
for score in [30, 10, 50, 20]:
await bitable_service.create_record(table_id=table.id, values={num_field.id: score})
records, _ = await bitable_service.list_records_filtered(
table.id,
sorts=[{"field_id": num_field.id, "direction": "desc"}],
)
# Scores are compared as text, which matches numeric order here because every
# value has two digits. Only the row count is asserted.
assert len(records) == 4
async def test_list_records_filtered_cursor_pagination(bitable_service) -> None:
"""Cursor pagination with filters."""
table = await bitable_service.create_table(name="T")
text_field = await bitable_service.create_field(
table_id=table.id, name="name", field_type=FieldType.text, owner=FieldOwner.agent
)
for i in range(5):
await bitable_service.create_record(table_id=table.id, values={text_field.id: f"item_{i}"})
# First page
records, next_cursor = await bitable_service.list_records_filtered(table.id, limit=2)
assert len(records) == 2
assert next_cursor is not None
# Second page
records2, next_cursor2 = await bitable_service.list_records_filtered(
table.id, cursor=next_cursor, limit=2
)
assert len(records2) == 2
assert next_cursor2 is not None
# Third page
records3, next_cursor3 = await bitable_service.list_records_filtered(
table.id, cursor=next_cursor2, limit=2
)
assert len(records3) == 1
assert next_cursor3 is None
# All records unique
all_ids = {r.id for page in (records, records2, records3) for r in page}
assert len(all_ids) == 5
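
The upsert tests above rely on one rule: on update, agent-owned values are overwritten while user-owned values stay as they are. A minimal in-memory sketch of that rule follows. It is an illustration under assumptions, not the service code: the real implementation works on PostgreSQL JSONB (the tests mention `jsonb_set`), and the `store` dict, the `user_owned` set, and this standalone `upsert_records` signature are hypothetical.

```python
from __future__ import annotations

from typing import Any


def upsert_records(
    store: dict[str, dict[str, Any]],
    records: list[dict[str, Any]],
    pk_field_id: str,
    user_owned: set[str],
) -> dict[str, int]:
    """Insert new rows; on update, overwrite only fields not owned by the user."""
    if not pk_field_id:
        raise ValueError("primary_key_field_id is required")
    result = {"inserted": 0, "updated": 0, "skipped": 0}
    for incoming in records:
        pk = incoming.get(pk_field_id)
        if pk is None:
            # No primary key value: the row cannot be matched, so skip it.
            result["skipped"] += 1
            continue
        existing = store.get(pk)
        if existing is None:
            store[pk] = dict(incoming)
            result["inserted"] += 1
        else:
            for field_id, value in incoming.items():
                if field_id not in user_owned:
                    existing[field_id] = value
            result["updated"] += 1
    return result


store: dict[str, dict[str, Any]] = {}
first = upsert_records(
    store, [{"pk": "r1", "agent": "a1", "user": "u1"}, {}], "pk", {"user"}
)
store["r1"]["user"] = "USER_EDITED"  # simulate a manual user edit
second = upsert_records(
    store, [{"pk": "r1", "agent": "a2", "user": "SHOULD_NOT_APPLY"}], "pk", {"user"}
)
```

The counts mirror the `{"inserted", "updated", "skipped"}` result shape asserted in the tests; whether user-owned values are accepted on the initial insert is an assumption of this sketch.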


@ -437,12 +437,12 @@ class TestTokenUsageMiddleware:
 @pytest.mark.asyncio
 async def test_after_extracts_usage_from_result(self) -> None:
-"""after extracts token_usage from result"""
+"""after extracts total_tokens (the ReActResult attribute name) from result"""
 mw = TokenUsageMiddleware()
 ctx = _make_ctx()
 result = MagicMock()
-result.token_usage = {"total": 100}
+result.total_tokens = {"total": 100}
 await mw.after(ctx, result)


@ -273,7 +273,7 @@ class TestCheckpointTTL:
 from datetime import datetime, timedelta, timezone
 expired_time = (datetime.now(timezone.utc) - timedelta(seconds=10)).isoformat()
-cp._memory["plan_1"][0].saved_at = expired_time
+cp._memory["plan_1"]["p1"].saved_at = expired_time
 # After expiry, load should return None
 loaded = await cp.load("plan_1")
@ -329,7 +329,7 @@ class TestCheckpointTTL:
from datetime import datetime, timedelta, timezone from datetime import datetime, timedelta, timezone
expired_time = (datetime.now(timezone.utc) - timedelta(seconds=10)).isoformat() expired_time = (datetime.now(timezone.utc) - timedelta(seconds=10)).isoformat()
cp._memory["plan_1"][0].saved_at = expired_time cp._memory["plan_1"]["p1"].saved_at = expired_time
# list should filter out the expired checkpoint and return only 1 # list should filter out the expired checkpoint and return only 1
checkpoints = await cp.list_checkpoints("plan_1") checkpoints = await cp.list_checkpoints("plan_1")
@ -363,7 +363,7 @@ class TestCheckpointTTL:
from datetime import datetime, timedelta, timezone from datetime import datetime, timedelta, timezone
expired_time = (datetime.now(timezone.utc) - timedelta(seconds=10)).isoformat() expired_time = (datetime.now(timezone.utc) - timedelta(seconds=10)).isoformat()
cp._memory["plan_1"][0].saved_at = expired_time cp._memory["plan_1"]["p1"].saved_at = expired_time
# In-memory fallback + TTL expired → should return None # In-memory fallback + TTL expired → should return None
loaded = await cp.load("plan_1") loaded = await cp.load("plan_1")


@ -193,7 +193,7 @@ class TestSkillMdToSkillConfig:
assert config.name == "content-generator" assert config.name == "content-generator"
assert config.description != "" assert config.description != ""
assert config.disclosure_level == 0 assert config.disclosure_level == 1
# Level 0: prompt contains only identity (summary information) # Level 0: prompt contains only identity (summary information)
assert config.prompt is not None assert config.prompt is not None
assert "identity" in config.prompt assert "identity" in config.prompt
@ -257,14 +257,14 @@ class TestSkillConfigNewFields:
) )
assert config.skill_md_path is None assert config.skill_md_path is None
def test_default_disclosure_level_is_zero(self): def test_default_disclosure_level_is_one(self):
config = SkillConfig( config = SkillConfig(
name="test", name="test",
agent_type="test", agent_type="test",
task_mode="llm_generate", task_mode="llm_generate",
prompt={"identity": "test"}, prompt={"identity": "test"},
) )
assert config.disclosure_level == 0 assert config.disclosure_level == 1
def test_skill_md_path_set(self): def test_skill_md_path_set(self):
config = SkillConfig( config = SkillConfig(
@ -321,7 +321,7 @@ class TestSkillConfigNewFields:
} }
config = SkillConfig.from_dict(data) config = SkillConfig.from_dict(data)
assert config.skill_md_path is None assert config.skill_md_path is None
assert config.disclosure_level == 0 assert config.disclosure_level == 1
# ── SkillLoader.load_from_skill_md tests ─────────────────── # ── SkillLoader.load_from_skill_md tests ───────────────────