docs: add bitable security/reliability patterns solution doc + CONCEPTS.md

- docs/solutions/architecture-patterns/bitable-companion-service-security-reliability-patterns.md Knowledge-track doc capturing 10 security/reliability patterns from the bitable companion service (SSRF prevention, SQL injection, IDOR, atomic task claiming, cache invalidation, composite cursor, batch ops, async I/O safety, OOM prevention, internal token auth) - CONCEPTS.md Seeded with 3 core domain nouns: Bitable, Field Ownership, Recalc - AGENTS.md Added discoverability tips for docs/solutions/ and CONCEPTS.md
2026-06-25 01:25:06 +08:00 · 2026-06-25 01:25:06 +08:00 · 71eaf8dc7c
parent bbbf9cd40a
commit 71eaf8dc7c
3 changed files with 448 additions and 0 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@ -196,6 +196,8 @@ CLI 参数 > `agentkit.yaml` > 环境变量（`${VAR:-default}`）> `.env` > 硬
 - 测试文件：`tests/unit/` 和 `tests/integration/`
 - 前端 stores：Pinia，每个领域一个（chat、team、settings）
 - 前端组件：`src/agentkit/server/frontend/src/components/`
+- 知识库：`docs/solutions/` — 已记录的问题解决方案（bug 修复、最佳实践、架构模式），按类别组织，YAML frontmatter 可搜索（`module`、`tags`、`problem_type`）。在已记录的领域内实现或调试时相关。
+- 领域词汇：`CONCEPTS.md` — 项目共享领域词汇表（实体、命名流程、状态概念），在熟悉代码库或讨论领域概念时相关。

 ## 边界

--- a/CONCEPTS.md
+++ b/CONCEPTS.md
@ -0,0 +1,14 @@
+# Concepts
+
+Shared domain vocabulary for this project — entities, named processes, and status concepts with project-specific meaning. Seeded with core domain vocabulary, then accretes as ce-compound and ce-compound-refresh process learnings; direct edits are fine. Glossary only, not a spec or catch-all.
+
+## Bitable
+
+### Bitable
+A companion service providing multi-dimensional table storage (structured records with typed fields, views, and formula columns) that runs alongside the main AgentKit server. Owns its own PostgreSQL schema and exposes a REST API at `/api/v1/bitable`. Agents interact with it over HTTP using an internal service token; end-users interact via JWT auth.
+
+### Field Ownership
+The model controlling which actor may modify a field's definition or values. Every field has an owner of either `agent` or `user`. Agent-owned fields are written by the BitableTool during data ingestion; user-owned fields are edited by humans in the UI. Upsert operations update only agent-owned fields via `jsonb_set` — user-owned fields are never overwritten by agent writes.
+
+### Recalc
+The background process that recomputes formula field values after source data changes. When a record's data columns are written, a recalc task is enqueued; a RecalcWorker claims tasks atomically (FOR UPDATE SKIP LOCKED), evaluates the formula DAG in topological order, and writes results back. The reaper resets stale `calculating` tasks to `pending` for crash recovery.
--- a/docs/solutions/architecture-patterns/bitable-companion-service-security-reliability-patterns.md
+++ b/docs/solutions/architecture-patterns/bitable-companion-service-security-reliability-patterns.md
@ -0,0 +1,432 @@
+---
+title: "Bitable Companion Service: Security and Reliability Patterns"
+date: 2026-06-25
+category: docs/solutions/architecture-patterns/
+module: bitable
+problem_type: architecture_pattern
+component: service_object
+severity: high
+applies_when:
+  - Building a companion service that runs alongside a main server with its own schema and REST API boundary
+  - Accepting user-supplied URLs for server-side fetching (SSRF prevention required)
+  - Interpolating dynamic identifiers into SQL (field_id, table_id validation before interpolation)
+  - Multi-tenant resource access where every table/record endpoint must verify ownership (IDOR)
+  - Concurrent background workers claiming tasks from a shared queue (atomic claiming)
+  - Caching computed values (formula engines) that must invalidate on schema changes
+  - Paginating with non-id sort columns requiring stable composite cursors
+  - Bulk data import/export paths that risk OOM without streaming and batching
+  - Mixing synchronous libraries (openpyxl, pandas) into an async server (asyncio.to_thread)
+  - Service-to-service authentication between companion and main server (internal token)
+tags: [security, ssrf, sql-injection, idor, concurrency, cache-invalidation, pagination, batch-operations, async-io, internal-auth, oom-prevention, companion-service, bitable]
+---
+
+# Bitable Companion Service: Security and Reliability Patterns
+
+## Context
+
+The Fischer AgentKit project (Python 3.11+, FastAPI, SQLAlchemy 2 async, Vue 3) introduced a **Bitable** (multi-dimensional table) companion service to give agents and end-users a structured-record store alongside AgentKit's existing chat, memory, and orchestration layers. Unlike a greenfield module, the Bitable service had to slot into a deployed system that already runs an HTTP API, a PostgreSQL database, and a Redis bus — and it had to do so without weakening AgentKit's existing security posture.
+
+During `ce-code-review` and `ce-debug` of the new service, multiple security and reliability issues surfaced. They were not exotic bugs; they were the standard failure modes that show up whenever a CRUD service with URL-based ingestion, dynamic SQL filtering, file uploads, and background workers is bolted onto an existing platform:
+
+- SSRF exposure from a "fetch Excel from URL" feature
+- SQL injection through field IDs interpolated into JSONB path literals
+- IDOR (insecure direct object reference) on every table-level endpoint
+- Race conditions when multiple recalc workers pull from the same queue
+- Stale formula-engine caches after field schema changes
+- Broken pagination when sorting by non-id columns
+- N+1 queries on batch lookups and upserts
+- Event-loop blocking from sync Excel/DB-reflect calls
+- OOM risk from unbounded uploads and bulk deletes
+
+This document captures the architecture patterns and security/reliability practices applied during the fix, so the next companion service (or the next reviewer of this one) starts from a known-good baseline rather than re-deriving each rule from first principles.
+
+## Guidance
+
+The patterns below are organized by the concern they address. Each is shown with the actual code shape used in the Bitable service.
+
+### 1. Companion Service Isolation
+
+A companion service should be **isolated at the data, API, and auth layers**, even when co-deployed with the host system. Sharing tables or in-process calls saves a few lines today and couples failure domains forever.
+
+**Data isolation.** The Bitable service owns its own `bitable` PostgreSQL schema. No AgentKit core table is read or written by Bitable code, and vice versa. A bug in either system cannot corrupt the other's data.
+
+**API boundary.** Even though Bitable ships inside the same FastAPI app, agents and core code call it over HTTP at `/api/v1/bitable`. This keeps the boundary explicit, makes the service trivially extractable later, and forces every caller through the same auth and validation path. The lifespan init in `src/agentkit/server/app.py` mounts the Bitable router and initializes its engine/schema alongside the core app, but the route prefix is the contract.
+
+**Internal token auth.** Agent-to-Bitable calls use a separate `X-Internal-Token` header verified with `hmac.compare_digest` (constant-time, side-channel resistant). This is distinct from the user-facing JWT auth — internal service callers get a bypass on ownership checks (see pattern 4) but never on validation.
+
+```python
+# src/agentkit/server/routes/bitable.py
+async def require_bitable_auth(request: Request) -> dict[str, Any]:
+    internal_token = getattr(request.app.state, "bitable_internal_token", None)
+    if internal_token:
+        provided = request.headers.get("X-Internal-Token", "")
+        if provided and hmac.compare_digest(provided, internal_token):
+            return {"user_id": "__bitable_internal__", "internal": True}
+    user = await get_current_user(request)
+    if user is None:
+        raise HTTPException(status_code=401, detail="Authentication required")
+    return user
+```
+
+### 2. SSRF Prevention in URL-based Ingestion
+
+Any feature that fetches a user-supplied URL is an SSRF vector. The standard defense is: validate scheme, resolve hostname, reject private/loopback/reserved IPs, and **re-validate on every redirect hop**. The Bitable `parse_excel_url` flow implements all four.
+
+```python
+# src/agentkit/bitable/ingestion/excel.py
+def _assert_safe_host(host: str) -> None:
+    try:
+        addr = ipaddress.ip_address(host)
+    except ValueError:
+        infos = socket.getaddrinfo(host, None)
+        for info in infos:
+            ip_str = info[4][0]
+            try:
+                addr = ipaddress.ip_address(ip_str)
+            except ValueError:
+                continue
+            if _is_unsafe_ip(addr):
+                raise ValueError(f"Host {host!r} resolves to private/reserved IP {addr}")
+        return
+    if _is_unsafe_ip(addr):
+        raise ValueError(f"Host {host!r} is a private/loopback/reserved IP: {addr}")
+
+def _is_unsafe_ip(addr: ipaddress._BaseAddress) -> bool:
+    return (
+        addr.is_private or addr.is_loopback or addr.is_link_local
+        or addr.is_reserved or addr.is_multicast or addr.is_unspecified
+    )
+```
+
+Two details matter:
+
+1. **Check every resolved address, not just the first.** DNS can return multiple A records; an attacker may mix a public and a private IP. Reject if *any* resolved address is unsafe.
+2. **Follow redirects manually with re-validation.** `httpx.get(url, follow_redirects=True)` silently follows to `http://169.254.169.254/...`. The ingestion code disables auto-redirect, inspects the `Location` header, and runs `_assert_safe_host` on each hop before fetching.
+
+### 3. SQL Injection Prevention via Field ID Validation
+
+The Bitable repository interpolates field IDs into JSONB path literals (`values->>'field_id'`). Because these are string interpolations into SQL fragments, an unvalidated `field_id` is a direct injection vector. The defense is **allow-list validation at the service layer**: before any filter or sort reaches the repository, the service confirms every `field_id` against the table's actual fields.
+
+```python
+# src/agentkit/bitable/service.py — list_records_filtered
+if filters or sorts:
+    fields = await self._repo.list_fields(table_id)
+    valid_field_ids = {f.id for f in fields}
+    for f in filters or []:
+        fid = f.get("field_id", "")
+        if fid not in valid_field_ids:
+            raise ValueError(f"Unknown field_id in filter: {fid}")
+```
+
+This is stronger than parameterization alone: parameterization prevents injection but still lets a caller reference a field from a *different* table if the ID leaks. The allow-list check enforces that the field belongs to the table being queried.
+
+### 4. IDOR Protection via Ownership Checks
+
+Every table-level endpoint (`GET /tables/{id}`, `POST /tables/{id}/records`, etc.) is an IDOR risk if it accepts an ID and returns data without checking ownership. The Bitable service centralizes this in a `_check_table_ownership` helper called at the top of every table-scoped route.
+
+```python
+# src/agentkit/server/routes/bitable.py
+async def _check_table_ownership(service, table_id, user):
+    table = await service.get_table(table_id)
+    if table is None:
+        raise HTTPException(status_code=404, detail="Table not found")
+    if user.get("internal"):
+        return  # Internal service token bypasses ownership
+    if table.owner_user_id and table.owner_user_id != user.get("user_id"):
+        raise HTTPException(status_code=403, detail="Not authorized")
+```
+
+Two design choices are load-bearing:
+
+- **404 before 403.** Returning 404 for a table the caller doesn't own prevents existence disclosure. An attacker probing IDs cannot distinguish "does not exist" from "exists but not yours."
+- **Internal bypass is explicit and scoped.** Internal service callers (pattern 1) skip ownership but still go through validation. The bypass is a single `if user.get("internal"): return` — never a blanket `if internal: skip_all_checks`.
+
+### 5. Atomic Task Claiming with FOR UPDATE SKIP LOCKED
+
+When multiple recalc workers pull from the same `recalc_queue` table, a naive `SELECT ... WHERE status='pending' LIMIT 1` lets two workers grab the same row. The fix is a single atomic claim query using `FOR UPDATE SKIP LOCKED`:
+
+```python
+# src/agentkit/bitable/repository.py — claim_recalc_tasks
+subq = (
+    select(RecalcQueueModel.id)
+    .where(RecalcQueueModel.status == RecalcStatus.pending.value)
+    .order_by(RecalcQueueModel.queued_at)
+    .limit(limit)
+    .with_for_update(skip_locked=True)
+).subquery()
+stmt = (
+    update(RecalcQueueModel)
+    .where(RecalcQueueModel.id.in_(select(subq.c.id)))
+    .values(status=RecalcStatus.calculating.value)
+    .returning(RecalcQueueModel)
+)
+```
+
+`SKIP LOCKED` means a row already locked by another worker is invisible to this query, so concurrent workers naturally partition the queue without coordination. The `UPDATE ... RETURNING` wraps the claim and the state transition in one statement, so there is no window where a row is "claimed but still pending."
+
+### 6. Formula Engine Cache Invalidation
+
+The `RecalcWorker` caches a `FormulaEngine` per `table_id` (`_engines`). Without invalidation, a field rename or type change leaves the worker computing against a stale schema. The service calls `worker.invalidate_engine(table_id)` on every field create/update/delete.
+
+```python
+# src/agentkit/bitable/service.py — _invalidate_engine_cache
+def _invalidate_engine_cache(self, table_id: str) -> None:
+    if self._recalc_worker is not None:
+        self._recalc_worker.invalidate_engine(table_id)
+```
+
+The construction-order wrinkle: the service creates the worker, but the worker needs a back-reference to the service for recalc triggers. This cycle is broken by `set_recalc_worker()` — the service constructs the worker, then registers itself. Cache invalidation is the service calling *into* the worker; recalc triggering is the worker calling *into* the service. Both directions work because registration happens after construction.
+
+### 7. Composite Cursor Pagination for Non-id Sorts
+
+When a list endpoint sorts by a field value (not the primary key), the naive `WHERE id > cursor` breaks: the cursor is an id, but the sort is by `sort_val`, so the next page is wrong. The fix is a **composite cursor**: base64-encoded JSON carrying both the last row's id and its sort values.
+
+```json
+{"id": "rec_abc", "sv": ["2026-06-20", "high"]}
+```
+
+The repository decodes the cursor and uses a row-value comparison to fetch the next page:
+
+```sql
+-- ASC sort
+WHERE (sort_col1, sort_col2, id) > (cursor_val1, cursor_val2, cursor_id)
+-- DESC sort
+WHERE (sort_col1, sort_col2, id) < (cursor_val1, cursor_val2, cursor_id)
+```
+
+The `id` is always the final tie-breaker so pagination is stable even when sort values collide. This lives in `src/agentkit/bitable/repository.py` (`list_records_filtered`).
+
+### 8. Batch Operations to Eliminate N+1
+
+Three batch patterns eliminate the most common N+1 loops in the service:
+
+- **`find_records_by_pk_batch`**: one `SELECT ... WHERE pk = ANY(:pk_values)` instead of N lookups.
+- **`create_records_batch`**: `insert().values(rows).returning()` for bulk insert with returned rows.
+- **`upsert_records`**: one `SELECT` to load existing records by primary key, then partition the input into insert and update lists — two queries total, not N+1.
+
+```python
+# src/agentkit/bitable/service.py — upsert_records (batch pattern)
+pk_values_by_str = {str(rec_values.get(pk_field_id)): rec_values for rec_values in records}
+existing_map = await self._repo.find_records_by_pk_batch(
+    table_id, primary_key_field_id, list(pk_values_by_str.keys())
+)
+for pk_str, rec_values in pk_values_by_str.items():
+    existing = existing_map.get(pk_str)
+    if existing is None:
+        to_insert.append(rec_values)
+    else:
+        to_update.append((rec_values, existing.id))
+```
+
+### 9. Async I/O Safety
+
+Sync I/O must never run on the event loop thread. The Bitable tool layer wraps every sync call in `asyncio.to_thread()`:
+
+```python
+# src/agentkit/tools/bitable_tool.py
+if file_path:
+    sheets = await asyncio.to_thread(parse_excel, file_path)
+else:
+    sheets = await asyncio.to_thread(parse_excel_url, file_url)
+```
+
+This covers `parse_excel` (openpyxl is sync), `parse_excel_url` (httpx sync client), and `import_db_table` (SQLAlchemy reflection is sync). Without `to_thread`, a single slow Excel parse blocks every other request on the worker.
+
+### 10. OOM Prevention
+
+Three bounds keep memory predictable under load:
+
+- **Streaming file upload.** The upload endpoint reads the body in 64KB chunks and rejects the request the moment the running size exceeds 10MB — no buffering the whole file first.
+- **Batch delete via cursor iteration.** Attachment cleanup on table delete iterates records with cursor pagination and deletes attachments per page, instead of `SELECT *` loading every record into memory.
+- **Batch insert in 500-row chunks.** `create_records_batch` splits large inputs into 500-row chunks to keep statement size and memory bounded.
+
+```python
+# src/agentkit/server/routes/bitable.py — streaming upload
+total_size = 0
+with open(file_path, "wb") as f:
+    while True:
+        chunk = await file.read(64 * 1024)  # 64KB chunks
+        if not chunk:
+            break
+        total_size += len(chunk)
+        if total_size > MAX_UPLOAD_SIZE:
+            f.close()
+            file_path.unlink(missing_ok=True)
+            raise HTTPException(status_code=413, detail="File exceeds 10 MB limit")
+        f.write(chunk)
+```
+
+## Why This Matters
+
+These patterns are not theoretical. Each maps to a concrete failure mode observed in review or debug:
+
+| Pattern | Failure it prevents |
+|---|---|
+| Service isolation | A Bitable bug corrupting AgentKit core tables; an AgentKit outage taking down Bitable |
+| SSRF prevention | Cloud metadata endpoint exfiltration (`169.254.169.254`); internal port scanning |
+| Field ID validation | SQL injection via `values->>'field_id'` path literals |
+| Ownership checks | IDOR — user A reading user B's tables by guessing IDs |
+| `SKIP LOCKED` claiming | Duplicate recalc execution; lost tasks under worker contention |
+| Cache invalidation | Stale formula results after field schema changes — silent data corruption |
+| Composite cursor | Broken pagination — skipped or duplicated rows on non-id sorts |
+| Batch operations | N+1 queries turning a 1000-record import into 1001 round-trips |
+| `asyncio.to_thread` | Event-loop stalls blocking every concurrent request |
+| OOM bounds | Worker killed mid-request on a 500MB upload or a million-row delete |
+
+The cost of skipping any one of these is a security advisory or an outage. The cost of applying them is a few dozen lines per pattern, most of which are reusable across the next companion service.
+
+A second reason this matters: **the patterns compose**. Service isolation (1) is what makes the internal-token bypass (4) safe — the bypass is scoped to one service, not the whole app. Field ID validation (3) is what makes the composite cursor (7) safe — the cursor carries sort values that are already validated. `SKIP LOCKED` (5) is what makes the recalc cache invalidation (6) correct — without atomic claiming, two workers could invalidate and recompute the same engine concurrently. Treating these as independent checklists misses the dependencies.
+
+## When to Apply
+
+Apply these patterns whenever you are:
+
+- **Adding a companion service to an existing platform.** The isolation, internal-token, and ownership-check patterns apply regardless of the service's domain.
+- **Building any feature that fetches a user-supplied URL.** SSRF prevention (scheme validation, IP allow-listing, manual redirect re-validation) is mandatory, not optional — even for "internal-only" features, because internal-only has a way of becoming external.
+- **Interpolating user-controlled identifiers into SQL.** Field IDs, column names, and table names cannot be parameterized; they must be allow-listed against the actual schema.
+- **Running background workers against a shared queue.** `FOR UPDATE SKIP LOCKED` is the correct default for any claim-then-process pattern on PostgreSQL.
+- **Caching derived state tied to a mutable schema.** Any cache keyed by `table_id` (or equivalent) needs an invalidation hook on schema changes, and the hook must be called from the same code path that mutates the schema.
+- **Paginating by a non-unique sort key.** A composite cursor with the primary key as the final tie-breaker is the general solution.
+- **Doing bulk reads, writes, or deletes.** Batch the operation and bound the chunk size.
+- **Calling sync libraries from an async framework.** `asyncio.to_thread` is the floor; an async-native client is better when available.
+
+These patterns do **not** apply when:
+
+- The service is a throwaway prototype with no external callers — isolation overhead is not justified.
+- The URL fetch is hardcoded to a trusted internal endpoint with no user input — SSRF prevention is redundant (but document why).
+- The sort is always by primary key — a simple `WHERE id > cursor` is correct and cheaper than a composite cursor.
+
+## Examples
+
+### Example 1: SSRF — before and after
+
+**Before (vulnerable):**
+
+```python
+def parse_excel_url(url: str) -> Workbook:
+    resp = httpx.get(url, follow_redirects=True)  # follows redirects silently, no IP check
+    return parse_excel_bytes(resp.content)
+```
+
+An attacker submits `http://internal-service.attacker.com/redirect` which 302s to `http://169.254.169.254/latest/meta-data/iam/security-credentials/`. The server happily fetches cloud metadata.
+
+**After (hardened):**
+
+```python
+def parse_excel_url(url: str) -> list[ParsedSheet]:
+    parsed = urlparse(url)
+    if parsed.scheme not in ("http", "https"):
+        raise ValueError(f"Disallowed URL scheme: {parsed.scheme!r}")
+    _assert_safe_host(parsed.hostname)
+
+    resp = httpx.get(url, follow_redirects=False)
+    seen_redirects = 0
+    while resp.is_redirect and seen_redirects < 5:
+        seen_redirects += 1
+        next_url = httpx.URL(url).join(resp.headers["location"])
+        if next_url.scheme not in ("http", "https") or not next_url.host:
+            raise ValueError(f"Unsafe redirect target: {next_url}")
+        _assert_safe_host(next_url.host)
+        resp = httpx.get(next_url, follow_redirects=False)
+        url = str(next_url)
+    return parse_excel_bytes(resp.content)
+```
+
+Every hop is re-validated. A redirect to a private IP raises before the fetch.
+
+### Example 2: IDOR — before and after
+
+**Before (vulnerable):**
+
+```python
+@router.get("/tables/{table_id}/records")
+async def list_records(table_id: str, user=Depends(get_user)):
+    return await service.list_records(table_id)  # no ownership check
+```
+
+User B passes user A's `table_id` and reads all records.
+
+**After (hardened):**
+
+```python
+@router.get("/tables/{table_id}/records")
+async def list_records(table_id: str, user=Depends(require_bitable_auth)):
+    await _check_table_ownership(service, table_id, user)
+    return await service.list_records(table_id)
+```
+
+The ownership helper returns 404 (not 403) for tables the caller doesn't own, so existence is not disclosed.
+
+### Example 3: N+1 — before and after
+
+**Before (N+1):**
+
+```python
+async def upsert_records(table_id, records):
+    for r in records:
+        existing = await repo.find_record_by_pk(table_id, r.pk)  # N queries
+        if existing:
+            await repo.update_record(existing.id, r)  # N queries
+        else:
+            await repo.create_record(table_id, r)  # N queries
+```
+
+1000 records → 2000+ queries.
+
+**After (batched):**
+
+```python
+async def upsert_records(table_id, records, pk_field_id):
+    pk_values = [str(r.get(pk_field_id)) for r in records]
+    existing_map = await repo.find_records_by_pk_batch(table_id, pk_field_id, pk_values)  # 1 query
+    to_insert = [r for r in records if str(r.get(pk_field_id)) not in existing_map]
+    to_update = [(r, existing_map[str(r.get(pk_field_id))].id) for r in records if str(r.get(pk_field_id)) in existing_map]
+    for r in to_insert:
+        await repo.create_record(table_id, r)
+    for r, eid in to_update:
+        await repo.upsert_record_agent_fields(eid, agent_values)
+```
+
+1000 records → 1 SELECT + N INSERT/UPDATE (batch INSERT available via `create_records_batch`).
+
+### Example 4: Composite cursor — before and after
+
+**Before (broken on non-id sort):**
+
+```python
+# sort by "priority" field, paginate by id
+stmt = text("SELECT * FROM bitable_records WHERE table_id = :tid ORDER BY values->>'priority' LIMIT 50")
+if cursor:
+    stmt = stmt.where(text("id > :cursor"))  # WRONG: cursor is id, sort is priority
+```
+
+Page 2 returns rows with `id > last_id` but sorted by priority — rows with high priority but low id are skipped.
+
+**After (composite cursor):**
+
+```python
+# cursor = base64({"id": "rec_abc", "sv": ["high"]})
+if cursor:
+    decoded = json.loads(base64.b64decode(cursor))
+    cursor_id = decoded["id"]
+    cursor_sort_vals = decoded["sv"]
+    # Row-value comparison: (priority, id) > (cursor_val, cursor_id) for ASC
+    where_clauses.append(
+        f"(values->>'priority', id) > (:csv0, :cursor_id)"
+    )
+```
+
+Pagination is stable regardless of sort direction or value collisions.
+
+## Related
+
+- **`docs/plans/2026-06-24-001-feat-bitable-companion-service-plan.md`** — implementation plan with KTD7 (formula AST security), KTD11 (internal token auth), batch chunking, async recalc worker design.
+- **`docs/brainstorms/2026-06-24-bitable-module-requirements.md`** — origin requirements defining companion service architecture, upsert semantics, field ownership model.
+- **`docs/plans/2026-06-14-001-feat-p0-production-hardening-plan.md`** — cross-cutting SSRF protection patterns (KTD4: protocol whitelist + private IP range blocking).
+- **`docs/plans/2026-06-10-018-fix-agentkit-p2-hardening-plan.md`** — SSRF DNS rebinding, `hmac.compare_digest`, concurrency safety patterns.
+- **AgentKit AGENTS.md** — project-level rules on async generator safety, `hmac.compare_digest` for API key comparison, and the `HandoffTransport` bounded-queue sentinel pattern.
+- **PostgreSQL docs — `FOR UPDATE SKIP LOCKED`** — canonical reference for concurrent queue claiming.
+- **OWASP A01:2021 — Broken Access Control** — covers the IDOR pattern addressed by ownership checks.
+- **OWASP A10:2021 — Server-Side Request Forgery** — covers the URL-ingestion SSRF pattern.
+
+Future companion services in this repo (a vector store, a workflow runner, etc.) should be reviewed against this checklist before merge. The patterns are domain-agnostic; only the field names change.