From 71eaf8dc7c7b44ee19ab5162b01d3c9d3ba46f64 Mon Sep 17 00:00:00 2001 From: chiguyong Date: Thu, 25 Jun 2026 01:25:06 +0800 Subject: [PATCH] docs: add bitable security/reliability patterns solution doc + CONCEPTS.md - docs/solutions/architecture-patterns/bitable-companion-service-security-reliability-patterns.md Knowledge-track doc capturing 10 security/reliability patterns from the bitable companion service (SSRF prevention, SQL injection, IDOR, atomic task claiming, cache invalidation, composite cursor, batch ops, async I/O safety, OOM prevention, internal token auth) - CONCEPTS.md Seeded with 3 core domain nouns: Bitable, Field Ownership, Recalc - AGENTS.md Added discoverability tips for docs/solutions/ and CONCEPTS.md --- AGENTS.md | 2 + CONCEPTS.md | 14 + ...n-service-security-reliability-patterns.md | 432 ++++++++++++++++++ 3 files changed, 448 insertions(+) create mode 100644 CONCEPTS.md create mode 100644 docs/solutions/architecture-patterns/bitable-companion-service-security-reliability-patterns.md diff --git a/AGENTS.md b/AGENTS.md index 8cbee10..2ad4c8a 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -196,6 +196,8 @@ CLI 参数 > `agentkit.yaml` > 环境变量(`${VAR:-default}`)> `.env` > 硬 - 测试文件:`tests/unit/` 和 `tests/integration/` - 前端 stores:Pinia,每个领域一个(chat、team、settings) - 前端组件:`src/agentkit/server/frontend/src/components/` +- 知识库:`docs/solutions/` — 已记录的问题解决方案(bug 修复、最佳实践、架构模式),按类别组织,YAML frontmatter 可搜索(`module`、`tags`、`problem_type`)。在已记录的领域内实现或调试时相关。 +- 领域词汇:`CONCEPTS.md` — 项目共享领域词汇表(实体、命名流程、状态概念),在熟悉代码库或讨论领域概念时相关。 ## 边界 diff --git a/CONCEPTS.md b/CONCEPTS.md new file mode 100644 index 0000000..d8b4b07 --- /dev/null +++ b/CONCEPTS.md @@ -0,0 +1,14 @@ +# Concepts + +Shared domain vocabulary for this project — entities, named processes, and status concepts with project-specific meaning. Seeded with core domain vocabulary, then accretes as ce-compound and ce-compound-refresh process learnings; direct edits are fine. Glossary only, not a spec or catch-all. + +## Bitable + +### Bitable +A companion service providing multi-dimensional table storage (structured records with typed fields, views, and formula columns) that runs alongside the main AgentKit server. Owns its own PostgreSQL schema and exposes a REST API at `/api/v1/bitable`. Agents interact with it over HTTP using an internal service token; end-users interact via JWT auth. + +### Field Ownership +The model controlling which actor may modify a field's definition or values. Every field has an owner of either `agent` or `user`. Agent-owned fields are written by the BitableTool during data ingestion; user-owned fields are edited by humans in the UI. Upsert operations update only agent-owned fields via `jsonb_set` — user-owned fields are never overwritten by agent writes. + +### Recalc +The background process that recomputes formula field values after source data changes. When a record's data columns are written, a recalc task is enqueued; a RecalcWorker claims tasks atomically (FOR UPDATE SKIP LOCKED), evaluates the formula DAG in topological order, and writes results back. The reaper resets stale `calculating` tasks to `pending` for crash recovery. diff --git a/docs/solutions/architecture-patterns/bitable-companion-service-security-reliability-patterns.md b/docs/solutions/architecture-patterns/bitable-companion-service-security-reliability-patterns.md new file mode 100644 index 0000000..f8c9063 --- /dev/null +++ b/docs/solutions/architecture-patterns/bitable-companion-service-security-reliability-patterns.md @@ -0,0 +1,432 @@ +--- +title: "Bitable Companion Service: Security and Reliability Patterns" +date: 2026-06-25 +category: docs/solutions/architecture-patterns/ +module: bitable +problem_type: architecture_pattern +component: service_object +severity: high +applies_when: + - Building a companion service that runs alongside a main server with its own schema and REST API boundary + - Accepting user-supplied URLs for server-side fetching (SSRF prevention required) + - Interpolating dynamic identifiers into SQL (field_id, table_id validation before interpolation) + - Multi-tenant resource access where every table/record endpoint must verify ownership (IDOR) + - Concurrent background workers claiming tasks from a shared queue (atomic claiming) + - Caching computed values (formula engines) that must invalidate on schema changes + - Paginating with non-id sort columns requiring stable composite cursors + - Bulk data import/export paths that risk OOM without streaming and batching + - Mixing synchronous libraries (openpyxl, pandas) into an async server (asyncio.to_thread) + - Service-to-service authentication between companion and main server (internal token) +tags: [security, ssrf, sql-injection, idor, concurrency, cache-invalidation, pagination, batch-operations, async-io, internal-auth, oom-prevention, companion-service, bitable] +--- + +# Bitable Companion Service: Security and Reliability Patterns + +## Context + +The Fischer AgentKit project (Python 3.11+, FastAPI, SQLAlchemy 2 async, Vue 3) introduced a **Bitable** (multi-dimensional table) companion service to give agents and end-users a structured-record store alongside AgentKit's existing chat, memory, and orchestration layers. Unlike a greenfield module, the Bitable service had to slot into a deployed system that already runs an HTTP API, a PostgreSQL database, and a Redis bus — and it had to do so without weakening AgentKit's existing security posture. + +During `ce-code-review` and `ce-debug` of the new service, multiple security and reliability issues surfaced. They were not exotic bugs; they were the standard failure modes that show up whenever a CRUD service with URL-based ingestion, dynamic SQL filtering, file uploads, and background workers is bolted onto an existing platform: + +- SSRF exposure from a "fetch Excel from URL" feature +- SQL injection through field IDs interpolated into JSONB path literals +- IDOR (insecure direct object reference) on every table-level endpoint +- Race conditions when multiple recalc workers pull from the same queue +- Stale formula-engine caches after field schema changes +- Broken pagination when sorting by non-id columns +- N+1 queries on batch lookups and upserts +- Event-loop blocking from sync Excel/DB-reflect calls +- OOM risk from unbounded uploads and bulk deletes + +This document captures the architecture patterns and security/reliability practices applied during the fix, so the next companion service (or the next reviewer of this one) starts from a known-good baseline rather than re-deriving each rule from first principles. + +## Guidance + +The patterns below are organized by the concern they address. Each is shown with the actual code shape used in the Bitable service. + +### 1. Companion Service Isolation + +A companion service should be **isolated at the data, API, and auth layers**, even when co-deployed with the host system. Sharing tables or in-process calls saves a few lines today and couples failure domains forever. + +**Data isolation.** The Bitable service owns its own `bitable` PostgreSQL schema. No AgentKit core table is read or written by Bitable code, and vice versa. A bug in either system cannot corrupt the other's data. + +**API boundary.** Even though Bitable ships inside the same FastAPI app, agents and core code call it over HTTP at `/api/v1/bitable`. This keeps the boundary explicit, makes the service trivially extractable later, and forces every caller through the same auth and validation path. The lifespan init in `src/agentkit/server/app.py` mounts the Bitable router and initializes its engine/schema alongside the core app, but the route prefix is the contract. + +**Internal token auth.** Agent-to-Bitable calls use a separate `X-Internal-Token` header verified with `hmac.compare_digest` (constant-time, side-channel resistant). This is distinct from the user-facing JWT auth — internal service callers get a bypass on ownership checks (see pattern 4) but never on validation. + +```python +# src/agentkit/server/routes/bitable.py +async def require_bitable_auth(request: Request) -> dict[str, Any]: + internal_token = getattr(request.app.state, "bitable_internal_token", None) + if internal_token: + provided = request.headers.get("X-Internal-Token", "") + if provided and hmac.compare_digest(provided, internal_token): + return {"user_id": "__bitable_internal__", "internal": True} + user = await get_current_user(request) + if user is None: + raise HTTPException(status_code=401, detail="Authentication required") + return user +``` + +### 2. SSRF Prevention in URL-based Ingestion + +Any feature that fetches a user-supplied URL is an SSRF vector. The standard defense is: validate scheme, resolve hostname, reject private/loopback/reserved IPs, and **re-validate on every redirect hop**. The Bitable `parse_excel_url` flow implements all four. + +```python +# src/agentkit/bitable/ingestion/excel.py +def _assert_safe_host(host: str) -> None: + try: + addr = ipaddress.ip_address(host) + except ValueError: + infos = socket.getaddrinfo(host, None) + for info in infos: + ip_str = info[4][0] + try: + addr = ipaddress.ip_address(ip_str) + except ValueError: + continue + if _is_unsafe_ip(addr): + raise ValueError(f"Host {host!r} resolves to private/reserved IP {addr}") + return + if _is_unsafe_ip(addr): + raise ValueError(f"Host {host!r} is a private/loopback/reserved IP: {addr}") + +def _is_unsafe_ip(addr: ipaddress._BaseAddress) -> bool: + return ( + addr.is_private or addr.is_loopback or addr.is_link_local + or addr.is_reserved or addr.is_multicast or addr.is_unspecified + ) +``` + +Two details matter: + +1. **Check every resolved address, not just the first.** DNS can return multiple A records; an attacker may mix a public and a private IP. Reject if *any* resolved address is unsafe. +2. **Follow redirects manually with re-validation.** `httpx.get(url, follow_redirects=True)` silently follows to `http://169.254.169.254/...`. The ingestion code disables auto-redirect, inspects the `Location` header, and runs `_assert_safe_host` on each hop before fetching. + +### 3. SQL Injection Prevention via Field ID Validation + +The Bitable repository interpolates field IDs into JSONB path literals (`values->>'field_id'`). Because these are string interpolations into SQL fragments, an unvalidated `field_id` is a direct injection vector. The defense is **allow-list validation at the service layer**: before any filter or sort reaches the repository, the service confirms every `field_id` against the table's actual fields. + +```python +# src/agentkit/bitable/service.py — list_records_filtered +if filters or sorts: + fields = await self._repo.list_fields(table_id) + valid_field_ids = {f.id for f in fields} + for f in filters or []: + fid = f.get("field_id", "") + if fid not in valid_field_ids: + raise ValueError(f"Unknown field_id in filter: {fid}") +``` + +This is stronger than parameterization alone: parameterization prevents injection but still lets a caller reference a field from a *different* table if the ID leaks. The allow-list check enforces that the field belongs to the table being queried. + +### 4. IDOR Protection via Ownership Checks + +Every table-level endpoint (`GET /tables/{id}`, `POST /tables/{id}/records`, etc.) is an IDOR risk if it accepts an ID and returns data without checking ownership. The Bitable service centralizes this in a `_check_table_ownership` helper called at the top of every table-scoped route. + +```python +# src/agentkit/server/routes/bitable.py +async def _check_table_ownership(service, table_id, user): + table = await service.get_table(table_id) + if table is None: + raise HTTPException(status_code=404, detail="Table not found") + if user.get("internal"): + return # Internal service token bypasses ownership + if table.owner_user_id and table.owner_user_id != user.get("user_id"): + raise HTTPException(status_code=403, detail="Not authorized") +``` + +Two design choices are load-bearing: + +- **404 before 403.** Returning 404 for a table the caller doesn't own prevents existence disclosure. An attacker probing IDs cannot distinguish "does not exist" from "exists but not yours." +- **Internal bypass is explicit and scoped.** Internal service callers (pattern 1) skip ownership but still go through validation. The bypass is a single `if user.get("internal"): return` — never a blanket `if internal: skip_all_checks`. + +### 5. Atomic Task Claiming with FOR UPDATE SKIP LOCKED + +When multiple recalc workers pull from the same `recalc_queue` table, a naive `SELECT ... WHERE status='pending' LIMIT 1` lets two workers grab the same row. The fix is a single atomic claim query using `FOR UPDATE SKIP LOCKED`: + +```python +# src/agentkit/bitable/repository.py — claim_recalc_tasks +subq = ( + select(RecalcQueueModel.id) + .where(RecalcQueueModel.status == RecalcStatus.pending.value) + .order_by(RecalcQueueModel.queued_at) + .limit(limit) + .with_for_update(skip_locked=True) +).subquery() +stmt = ( + update(RecalcQueueModel) + .where(RecalcQueueModel.id.in_(select(subq.c.id))) + .values(status=RecalcStatus.calculating.value) + .returning(RecalcQueueModel) +) +``` + +`SKIP LOCKED` means a row already locked by another worker is invisible to this query, so concurrent workers naturally partition the queue without coordination. The `UPDATE ... RETURNING` wraps the claim and the state transition in one statement, so there is no window where a row is "claimed but still pending." + +### 6. Formula Engine Cache Invalidation + +The `RecalcWorker` caches a `FormulaEngine` per `table_id` (`_engines`). Without invalidation, a field rename or type change leaves the worker computing against a stale schema. The service calls `worker.invalidate_engine(table_id)` on every field create/update/delete. + +```python +# src/agentkit/bitable/service.py — _invalidate_engine_cache +def _invalidate_engine_cache(self, table_id: str) -> None: + if self._recalc_worker is not None: + self._recalc_worker.invalidate_engine(table_id) +``` + +The construction-order wrinkle: the service creates the worker, but the worker needs a back-reference to the service for recalc triggers. This cycle is broken by `set_recalc_worker()` — the service constructs the worker, then registers itself. Cache invalidation is the service calling *into* the worker; recalc triggering is the worker calling *into* the service. Both directions work because registration happens after construction. + +### 7. Composite Cursor Pagination for Non-id Sorts + +When a list endpoint sorts by a field value (not the primary key), the naive `WHERE id > cursor` breaks: the cursor is an id, but the sort is by `sort_val`, so the next page is wrong. The fix is a **composite cursor**: base64-encoded JSON carrying both the last row's id and its sort values. + +```json +{"id": "rec_abc", "sv": ["2026-06-20", "high"]} +``` + +The repository decodes the cursor and uses a row-value comparison to fetch the next page: + +```sql +-- ASC sort +WHERE (sort_col1, sort_col2, id) > (cursor_val1, cursor_val2, cursor_id) +-- DESC sort +WHERE (sort_col1, sort_col2, id) < (cursor_val1, cursor_val2, cursor_id) +``` + +The `id` is always the final tie-breaker so pagination is stable even when sort values collide. This lives in `src/agentkit/bitable/repository.py` (`list_records_filtered`). + +### 8. Batch Operations to Eliminate N+1 + +Three batch patterns eliminate the most common N+1 loops in the service: + +- **`find_records_by_pk_batch`**: one `SELECT ... WHERE pk = ANY(:pk_values)` instead of N lookups. +- **`create_records_batch`**: `insert().values(rows).returning()` for bulk insert with returned rows. +- **`upsert_records`**: one `SELECT` to load existing records by primary key, then partition the input into insert and update lists — two queries total, not N+1. + +```python +# src/agentkit/bitable/service.py — upsert_records (batch pattern) +pk_values_by_str = {str(rec_values.get(pk_field_id)): rec_values for rec_values in records} +existing_map = await self._repo.find_records_by_pk_batch( + table_id, primary_key_field_id, list(pk_values_by_str.keys()) +) +for pk_str, rec_values in pk_values_by_str.items(): + existing = existing_map.get(pk_str) + if existing is None: + to_insert.append(rec_values) + else: + to_update.append((rec_values, existing.id)) +``` + +### 9. Async I/O Safety + +Sync I/O must never run on the event loop thread. The Bitable tool layer wraps every sync call in `asyncio.to_thread()`: + +```python +# src/agentkit/tools/bitable_tool.py +if file_path: + sheets = await asyncio.to_thread(parse_excel, file_path) +else: + sheets = await asyncio.to_thread(parse_excel_url, file_url) +``` + +This covers `parse_excel` (openpyxl is sync), `parse_excel_url` (httpx sync client), and `import_db_table` (SQLAlchemy reflection is sync). Without `to_thread`, a single slow Excel parse blocks every other request on the worker. + +### 10. OOM Prevention + +Three bounds keep memory predictable under load: + +- **Streaming file upload.** The upload endpoint reads the body in 64KB chunks and rejects the request the moment the running size exceeds 10MB — no buffering the whole file first. +- **Batch delete via cursor iteration.** Attachment cleanup on table delete iterates records with cursor pagination and deletes attachments per page, instead of `SELECT *` loading every record into memory. +- **Batch insert in 500-row chunks.** `create_records_batch` splits large inputs into 500-row chunks to keep statement size and memory bounded. + +```python +# src/agentkit/server/routes/bitable.py — streaming upload +total_size = 0 +with open(file_path, "wb") as f: + while True: + chunk = await file.read(64 * 1024) # 64KB chunks + if not chunk: + break + total_size += len(chunk) + if total_size > MAX_UPLOAD_SIZE: + f.close() + file_path.unlink(missing_ok=True) + raise HTTPException(status_code=413, detail="File exceeds 10 MB limit") + f.write(chunk) +``` + +## Why This Matters + +These patterns are not theoretical. Each maps to a concrete failure mode observed in review or debug: + +| Pattern | Failure it prevents | +|---|---| +| Service isolation | A Bitable bug corrupting AgentKit core tables; an AgentKit outage taking down Bitable | +| SSRF prevention | Cloud metadata endpoint exfiltration (`169.254.169.254`); internal port scanning | +| Field ID validation | SQL injection via `values->>'field_id'` path literals | +| Ownership checks | IDOR — user A reading user B's tables by guessing IDs | +| `SKIP LOCKED` claiming | Duplicate recalc execution; lost tasks under worker contention | +| Cache invalidation | Stale formula results after field schema changes — silent data corruption | +| Composite cursor | Broken pagination — skipped or duplicated rows on non-id sorts | +| Batch operations | N+1 queries turning a 1000-record import into 1001 round-trips | +| `asyncio.to_thread` | Event-loop stalls blocking every concurrent request | +| OOM bounds | Worker killed mid-request on a 500MB upload or a million-row delete | + +The cost of skipping any one of these is a security advisory or an outage. The cost of applying them is a few dozen lines per pattern, most of which are reusable across the next companion service. + +A second reason this matters: **the patterns compose**. Service isolation (1) is what makes the internal-token bypass (4) safe — the bypass is scoped to one service, not the whole app. Field ID validation (3) is what makes the composite cursor (7) safe — the cursor carries sort values that are already validated. `SKIP LOCKED` (5) is what makes the recalc cache invalidation (6) correct — without atomic claiming, two workers could invalidate and recompute the same engine concurrently. Treating these as independent checklists misses the dependencies. + +## When to Apply + +Apply these patterns whenever you are: + +- **Adding a companion service to an existing platform.** The isolation, internal-token, and ownership-check patterns apply regardless of the service's domain. +- **Building any feature that fetches a user-supplied URL.** SSRF prevention (scheme validation, IP allow-listing, manual redirect re-validation) is mandatory, not optional — even for "internal-only" features, because internal-only has a way of becoming external. +- **Interpolating user-controlled identifiers into SQL.** Field IDs, column names, and table names cannot be parameterized; they must be allow-listed against the actual schema. +- **Running background workers against a shared queue.** `FOR UPDATE SKIP LOCKED` is the correct default for any claim-then-process pattern on PostgreSQL. +- **Caching derived state tied to a mutable schema.** Any cache keyed by `table_id` (or equivalent) needs an invalidation hook on schema changes, and the hook must be called from the same code path that mutates the schema. +- **Paginating by a non-unique sort key.** A composite cursor with the primary key as the final tie-breaker is the general solution. +- **Doing bulk reads, writes, or deletes.** Batch the operation and bound the chunk size. +- **Calling sync libraries from an async framework.** `asyncio.to_thread` is the floor; an async-native client is better when available. + +These patterns do **not** apply when: + +- The service is a throwaway prototype with no external callers — isolation overhead is not justified. +- The URL fetch is hardcoded to a trusted internal endpoint with no user input — SSRF prevention is redundant (but document why). +- The sort is always by primary key — a simple `WHERE id > cursor` is correct and cheaper than a composite cursor. + +## Examples + +### Example 1: SSRF — before and after + +**Before (vulnerable):** + +```python +def parse_excel_url(url: str) -> Workbook: + resp = httpx.get(url, follow_redirects=True) # follows redirects silently, no IP check + return parse_excel_bytes(resp.content) +``` + +An attacker submits `http://internal-service.attacker.com/redirect` which 302s to `http://169.254.169.254/latest/meta-data/iam/security-credentials/`. The server happily fetches cloud metadata. + +**After (hardened):** + +```python +def parse_excel_url(url: str) -> list[ParsedSheet]: + parsed = urlparse(url) + if parsed.scheme not in ("http", "https"): + raise ValueError(f"Disallowed URL scheme: {parsed.scheme!r}") + _assert_safe_host(parsed.hostname) + + resp = httpx.get(url, follow_redirects=False) + seen_redirects = 0 + while resp.is_redirect and seen_redirects < 5: + seen_redirects += 1 + next_url = httpx.URL(url).join(resp.headers["location"]) + if next_url.scheme not in ("http", "https") or not next_url.host: + raise ValueError(f"Unsafe redirect target: {next_url}") + _assert_safe_host(next_url.host) + resp = httpx.get(next_url, follow_redirects=False) + url = str(next_url) + return parse_excel_bytes(resp.content) +``` + +Every hop is re-validated. A redirect to a private IP raises before the fetch. + +### Example 2: IDOR — before and after + +**Before (vulnerable):** + +```python +@router.get("/tables/{table_id}/records") +async def list_records(table_id: str, user=Depends(get_user)): + return await service.list_records(table_id) # no ownership check +``` + +User B passes user A's `table_id` and reads all records. + +**After (hardened):** + +```python +@router.get("/tables/{table_id}/records") +async def list_records(table_id: str, user=Depends(require_bitable_auth)): + await _check_table_ownership(service, table_id, user) + return await service.list_records(table_id) +``` + +The ownership helper returns 404 (not 403) for tables the caller doesn't own, so existence is not disclosed. + +### Example 3: N+1 — before and after + +**Before (N+1):** + +```python +async def upsert_records(table_id, records): + for r in records: + existing = await repo.find_record_by_pk(table_id, r.pk) # N queries + if existing: + await repo.update_record(existing.id, r) # N queries + else: + await repo.create_record(table_id, r) # N queries +``` + +1000 records → 2000+ queries. + +**After (batched):** + +```python +async def upsert_records(table_id, records, pk_field_id): + pk_values = [str(r.get(pk_field_id)) for r in records] + existing_map = await repo.find_records_by_pk_batch(table_id, pk_field_id, pk_values) # 1 query + to_insert = [r for r in records if str(r.get(pk_field_id)) not in existing_map] + to_update = [(r, existing_map[str(r.get(pk_field_id))].id) for r in records if str(r.get(pk_field_id)) in existing_map] + for r in to_insert: + await repo.create_record(table_id, r) + for r, eid in to_update: + await repo.upsert_record_agent_fields(eid, agent_values) +``` + +1000 records → 1 SELECT + N INSERT/UPDATE (batch INSERT available via `create_records_batch`). + +### Example 4: Composite cursor — before and after + +**Before (broken on non-id sort):** + +```python +# sort by "priority" field, paginate by id +stmt = text("SELECT * FROM bitable_records WHERE table_id = :tid ORDER BY values->>'priority' LIMIT 50") +if cursor: + stmt = stmt.where(text("id > :cursor")) # WRONG: cursor is id, sort is priority +``` + +Page 2 returns rows with `id > last_id` but sorted by priority — rows with high priority but low id are skipped. + +**After (composite cursor):** + +```python +# cursor = base64({"id": "rec_abc", "sv": ["high"]}) +if cursor: + decoded = json.loads(base64.b64decode(cursor)) + cursor_id = decoded["id"] + cursor_sort_vals = decoded["sv"] + # Row-value comparison: (priority, id) > (cursor_val, cursor_id) for ASC + where_clauses.append( + f"(values->>'priority', id) > (:csv0, :cursor_id)" + ) +``` + +Pagination is stable regardless of sort direction or value collisions. + +## Related + +- **`docs/plans/2026-06-24-001-feat-bitable-companion-service-plan.md`** — implementation plan with KTD7 (formula AST security), KTD11 (internal token auth), batch chunking, async recalc worker design. +- **`docs/brainstorms/2026-06-24-bitable-module-requirements.md`** — origin requirements defining companion service architecture, upsert semantics, field ownership model. +- **`docs/plans/2026-06-14-001-feat-p0-production-hardening-plan.md`** — cross-cutting SSRF protection patterns (KTD4: protocol whitelist + private IP range blocking). +- **`docs/plans/2026-06-10-018-fix-agentkit-p2-hardening-plan.md`** — SSRF DNS rebinding, `hmac.compare_digest`, concurrency safety patterns. +- **AgentKit AGENTS.md** — project-level rules on async generator safety, `hmac.compare_digest` for API key comparison, and the `HandoffTransport` bounded-queue sentinel pattern. +- **PostgreSQL docs — `FOR UPDATE SKIP LOCKED`** — canonical reference for concurrent queue claiming. +- **OWASP A01:2021 — Broken Access Control** — covers the IDOR pattern addressed by ownership checks. +- **OWASP A10:2021 — Server-Side Request Forgery** — covers the URL-ingestion SSRF pattern. + +Future companion services in this repo (a vector store, a workflow runner, etc.) should be reviewed against this checklist before merge. The patterns are domain-agnostic; only the field names change.