
# Fischer AgentKit — Centralized Auth & Token Persistence (Plan)
**Date:** 2026-06-20
**Status:** active
**Branch:** `feat/auth-server-token-persistence` (formerly `feat/centralized-auth-token-persistence`)
**Type:** feat
**Origin:** [docs/brainstorms/2026-06-20-centralized-auth-token-persistence-requirements.md](docs/brainstorms/2026-06-20-centralized-auth-token-persistence-requirements.md)

> **2026-06-20 update**: merged the AuthProvider abstraction-layer scope (origin §5.5); added KTD-10, the U1 `auth_provider` field, the U3/U4 changes, implementation unit U11, Phase 6, and AE-10/AE-11.

---
## Summary
Replace the current minimal JWT + localStorage auth with a production-grade scheme. The scheme has six parts:

- a server-side **session table** that tracks every login and enables forced revocation;
- **Tauri OS Keychain** storage for refresh tokens (encrypted at rest);
- **refresh token rotation**, as a defense against token leakage;
- **pre-emptive token refresh**, so there are no 401 storms;
- a **three-state startup** (valid / invalid / error);
- an **AuthProvider abstraction layer** that decouples routes, the admin API, and the session table from the concrete auth backend (Local today; OIDC / SAML / LDAP later via adapters).

Goals:

- After the first login, a Tauri cold start goes directly to the main app with no login page.
- An admin can see and force-revoke any user's active sessions.
- A password change instantly invalidates all other devices.
- Future enterprise IdP integration requires no rewrite of the routing or admin layer.

---
## Problem Frame
The current auth flow has three structural gaps:
1. **Token at rest in plaintext**: `access_token`, `refresh_token`, and `user` are stored unencrypted in WebView localStorage (`~/Library/WebKit/.../LocalStorage/` on macOS). Any process with file access can read them.
2. **No revocation surface**: the `user_sessions` table only stores a refresh-token hash with `revoked_at`. There is no device fingerprint, no IP, no "kick this session" admin endpoint, and no "change password → kick everywhere" flow. Sessions outlive the user's intent.
3. **No rotation**: the same refresh token can be used for the full 7-day window. If it leaks, the attacker has a week of access with no detection.
The user's primary stated need is "after I log in once, subsequent app opens should go straight to the main app." The current code attempts this via localStorage rehydration, but two failure modes break it: (a) refresh hits `_refreshFailed` and the auth store clears itself; (b) when the access token expires mid-session and refresh fails (server restart, network blip), the store clears and the user is bounced to `/login`. We need both stronger local persistence and server-side session awareness to make this experience reliable.
The secondary stated needs are **centralized group-wide management** and **integration with the group's existing accounts and passwords** (eventual IdP integration).

Without a session table and admin endpoints, an admin cannot:

- see who is logged in;
- force-logout a lost device;
- ensure that a compromised employee is immediately removed from all devices.

The session table is the same data model an IdP would feed. A future IdP integration should not require a routing or admin rewrite. To achieve that, the auth backend must be **pluggable behind an `AuthProvider` Protocol** (see KTD-10 and U11): Local today, OIDC tomorrow, without touching routes or admin code.

---
## Scope Boundaries
### In Scope
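- New `auth_sessions` SQLAlchemy model + table, created through the existing `_SCHEMA_SQL` / `init_auth_db()` bootstrap with a one-time backfill (the codebase does not use Alembic; see U1)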
- JWT payload extended with `sid` (session id) and `jti` (token id); session validation on every request
- Refresh token rotation on every `/auth/refresh` call; old token enters a 30s short-window denylist
- Refresh-token reuse detection → revoke **all** sessions for that user (defense against token theft)
- New endpoints: `GET /auth/whoami`, `GET /auth/sessions`, `DELETE /auth/sessions/{id}`, `POST /auth/logout-others`, `POST /auth/change-password`, `GET /admin/users/{id}/sessions`, `DELETE /admin/users/{id}/sessions/{sid}`
- Active session cap = 10 per user; login that would exceed the cap evicts the oldest non-current session
- "Remember me" login option: refresh TTL = 30 days (vs default 7 days)
- Tauri Rust commands: `store_refresh_token` / `load_refresh_token` / `clear_refresh_token` using the `keyring` crate (macOS Keychain / Windows Credential Manager / Linux Secret Service)
- Frontend `tauri-auth.ts` adapter with localStorage fallback when Keychain is unavailable
- Frontend auth-store: 3-state startup (`valid` / `invalid` / `error`), pre-emptive refresh when access expires in <2 min, no localStorage write of access token
- Frontend "Remember me" checkbox on `LoginView`
- Frontend "Active sessions" management UI in `SettingsView` (list current devices, kick others)
- Admin UI: see any user's active sessions, kick any session
- Backwards-compat for one minor version: old clients without `sid` claim still work via `user_sessions` table fallback
- **AuthProvider abstraction layer** (`auth/providers/base.py` Protocol + `LocalAuthProvider` + `StubOIDCProvider`): routes, admin, and `SessionService` obtain the provider via `Depends(get_auth_provider)`, so switching IdPs does not require rewriting routes
- An `auth_sessions.auth_provider` column records the login source: `local` / `oidc-stub` / future `oidc-keycloak` / `saml` / `ldap`
- A config switch `auth.provider: local | oidc-stub` in `agentkit.yaml`; adding a provider later only requires a new adapter
### Out of Scope (deferred to follow-up work)
- Enterprise IdP / SSO (OIDC / SAML / LDAP / Feishu / DingTalk / WeCom): separate brainstorm
- 2FA / TOTP / WebAuthn / Passkey: separate brainstorm
- Multi-tenant / org isolation: separate brainstorm
- Password strength policy / password expiry / password history: separate IAM brainstorm
- Login failure lockout / sliding-window rate-limit: separate security brainstorm
- Email / SMS notifications for reuse detection: requires a notification service
- Full audit log search / export: separate observability brainstorm
- Per-session device "trust" flag (e.g. "this Mac is trusted for 90 days"): deferred until the IdP work
### Resolved Decisions (locked in from the brainstorm)
| # | Question | Decision |
|---|----------|----------|
| 1 | Remember me TTL | 30 days (vs default 7 days) |
| 2 | Active session cap | 10 per user, evict oldest non-current on overflow |
| 3 | Tauri Keychain unavailable behavior | Silently fall back to localStorage, log warning |
---
## Requirements (carried from origin)
The plan must satisfy all of the following origin IDs (see [requirements doc](docs/brainstorms/2026-06-20-centralized-auth-token-persistence-requirements.md)):
- **F1**: After the first login, a cold start goes directly to the main UI and never shows the login page
- **F2**: "Remember me" toggle selects a 7d or 30d refresh TTL
- **F3**: Tauri: the refresh token is stored in the OS Keychain, never in localStorage on disk
- **F4**: Web: the refresh token lives in localStorage (degraded security, accepted)
- **F5**: Refresh token rotation: every `/auth/refresh` invalidates the old token
- **F6**: Server: every login is recorded with device / IP / time
- **F7**: Admin can see any user's active sessions
- **F8**: Admin or the user themself can kick any session
- **F9**: A password change kicks all other sessions
- **F10**: Pre-emptive refresh when the access token expires in <2 min
- **F11**: Startup distinguishes `valid` / `invalid` / `error`
- **F12**: Multiple Tauri / Web clients can log in as the same user simultaneously (independent sessions)
- **F13**: Pluggable AuthProvider: the `auth.provider` config switches between `local` and `oidc-stub` with zero changes to routes, admin, or the session table
- **F14**: Admin endpoints are decoupled from the auth backend; after a future switch to an IdP, the admin session list and kick functions are unchanged
- **F15**: The audit log records the `auth_provider` field, so the login source is traceable
- **N1**: Token validation P99 < 5ms (Redis cache for session metadata)
- **N5**: All auth code has unit + integration tests
- **N6**: Backwards-compat for old clients (1 minor version)
---
## Key Technical Decisions
### KTD-1: `auth_sessions` table (new) vs extending `user_sessions` (existing)
**Decision**: Create a new `auth_sessions` table; deprecate `user_sessions` over 1 minor version.
**Rationale**: `user_sessions` tracks little beyond `refresh_token_hash`, an opaque `device_info` JSON blob, `expires_at`, and `revoked_at`. The new design needs first-class `device_fingerprint`, `device_label`, `ip`, `user_agent`, `last_active_at`, `revoked_reason`, and `previous_session_id` columns. Adding 7 columns to an existing table breaks its existing semantics (the table is also referenced in production hardening tests). A clean break with a one-time backfill (U1) is safer than schema bloat.
**Trade-off**: Two-table coexistence during the deprecation window. Mitigated by: keep `user_sessions` reads working for clients without `sid` claim (N6).
### KTD-2: Session validation on every request (not just refresh)
**Decision**: `get_current_user` dependency reads `sid` from JWT, queries `auth_sessions` table (with Redis cache, 60s TTL) to confirm `revoked=False` and `expires_at > now`.
**Rationale**: Without this, a kicked-out user keeps their access token for up to 15 min (access TTL). With it, the kicked session is dead on the next request. The cost is +1 DB/cache lookup per request; cache makes this sub-ms.
**Trade-off**: A cache miss adds ~5ms to that request. With the 60s cache TTL, the actual DB query rate is ~1/min per active session.
### KTD-3: Refresh token rotation + 30s denylist
**Decision**: Every successful `/auth/refresh` issues a new refresh token. The old token's hash is added to an in-memory + Redis denylist for 30 seconds. If the old token is reused within that window, the server raises `TokenReuseDetected` and revokes ALL sessions for that user.
**Rationale**: Industry standard (Auth0, Okta, AWS). Closes the window where an attacker who captured the old token can still use it after the legitimate user has refreshed.
**Trade-off**: The 30s window is a small UX cost (concurrent refresh calls from the same client during retry) but acceptable; legitimate retries complete in <1s and don't hit the window.
### KTD-4: Tauri Keychain via `keyring` crate (not `tauri-plugin-stronghold`)
**Decision**: Use the `keyring` crate directly. It provides unified API across macOS Keychain, Windows Credential Manager, and Linux Secret Service with a single dependency.
**Rationale**: `tauri-plugin-stronghold` is a Tauri-team plugin, but its v2 ecosystem is still maturing and the docs lag. `keyring` is a widely used Rust crate for OS credential storage with a small, stable API. Smaller surface, fewer moving parts.
**Trade-off**: We write 3 small Tauri commands (`store_refresh_token` / `load_refresh_token` / `clear_refresh_token`) instead of using a plugin's auto-generated bindings. ~50 lines of Rust.
### KTD-5: Access token in memory only (not persisted)
**Decision**: Access token lives only in the auth store's reactive `ref<string | null>`. Never written to localStorage or Keychain.
**Rationale**: Access tokens are short-lived (15 min). The cost of losing one (re-auth) is low; the security cost of persisting them (broader attack surface) is high. Refresh token is the only thing that needs durable storage.
**Trade-off**: App reload requires one refresh round-trip to get a new access token. Mitigated by the pre-emptive refresh + 3-state startup: by the time the app needs to call an API, the access token is already fresh.
### KTD-6: Redis cache for session metadata (not just in-memory)
**Decision**: Use Redis (when available) to cache `auth_sessions` rows by `sid`. Fallback to in-process LRU (size=1024) when Redis is unavailable.
**Rationale**: The Tauri sidecar may run without Redis (zero-config dev mode). In-process LRU gives the same hit rate for single-process deployments. When Redis IS available (server deployment, multi-instance), it's the right cross-process answer.
**Trade-off**: Two code paths. Mitigated by a small `SessionCache` interface with two impls.
### KTD-7: Session cap eviction strategy = LRU (oldest non-current)
**Decision**: When login would create the 11th session for a user, the oldest non-current session is `revoked` (with `revoked_reason='session_cap_eviction'`) before the new one is created.
**Rationale**: LRU is intuitive ("the device I haven't used in a month should be the first to go"). Kicking "current" is wrong because the user is actively logging in.
**Trade-off**: None meaningful. Cap=10 is generous; the eviction is invisible to all but the user on the kicked device.
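
A minimal Python sketch of the victim selection follows. It assumes "oldest" means the smallest `last_active_at`, which is the LRU reading of the rationale. `ActiveSession` is a hypothetical projection of an `auth_sessions` row. The function returns a list rather than a single sid: if a user is already over the cap (for example after two logins race), enough sessions are evicted to leave room for exactly one new session.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime

SESSION_CAP = 10


@dataclass
class ActiveSession:
    """Hypothetical projection of an active auth_sessions row."""

    sid: str
    last_active_at: datetime


def pick_eviction_victims(active: list[ActiveSession], cap: int = SESSION_CAP) -> list[str]:
    """Return the sids to revoke before inserting a new session.

    Called at login before the new row exists, so every existing session is
    "non-current". Assumes LRU means smallest last_active_at first.
    """
    overflow = len(active) - cap + 1  # how many must go so that one new row fits
    if overflow <= 0:
        return []
    by_age = sorted(active, key=lambda s: s.last_active_at)
    return [s.sid for s in by_age[:overflow]]
```

Each returned sid would then be revoked with `revoked_reason='session_cap_eviction'`.
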
### KTD-8: Pre-emptive refresh in `api/base.ts` interceptor (not in Pinia getter)
**Decision**: A request interceptor in `BaseApiClient` checks `shouldRefresh()` (access exp <2 min) BEFORE sending, and awaits `silentRefresh()` if needed.
**Rationale**: An interceptor guarantees the check runs for every request. A Pinia getter would only fire on `accessToken` access, which is not all requests (e.g. background fetches that don't read the getter).
**Trade-off**: One async function call before each request when expiring; negligible.
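
The interceptor logic is language-neutral, so the Python sketch below mirrors it, in line with the other examples in this plan. The real code lives in the TypeScript `BaseApiClient`. `PreemptiveRefresher`, `before_request`, and the injected `refresh` callable are hypothetical names.

One part of this sketch is our assumption, not the plan's: concurrent requests share a single in-flight refresh. The motivation is KTD-3: two parallel refreshes presenting the same token would look like reuse.

```python
from __future__ import annotations

import asyncio
import time
from typing import Awaitable, Callable

REFRESH_THRESHOLD_S = 120.0  # refresh when the access token expires in < 2 min


def should_refresh(access_exp: float, now: float, threshold: float = REFRESH_THRESHOLD_S) -> bool:
    return access_exp - now < threshold


class PreemptiveRefresher:
    def __init__(self, refresh: Callable[[], Awaitable[float]],
                 clock: Callable[[], float] = time.time) -> None:
        self._refresh = refresh  # performs the refresh, returns the new access exp (epoch s)
        self._clock = clock
        self.access_exp = 0.0
        self._inflight: asyncio.Future[float] | None = None

    async def before_request(self) -> None:
        """Run before every request; refreshes only when close to expiry."""
        if not should_refresh(self.access_exp, self._clock()):
            return
        if self._inflight is None:
            # First caller starts the refresh; concurrent callers await the same future.
            self._inflight = asyncio.ensure_future(self._refresh())
        task = self._inflight
        try:
            self.access_exp = await task
        finally:
            if self._inflight is task:
                self._inflight = None
```
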
### KTD-9: Backwards-compat shim for old clients
**Decision**: `dependencies.py:get_current_user` accepts JWTs with or without the `sid` claim. A token without `sid` falls back to `user_sessions.refresh_token_hash` validation. This path is logged and gated to one minor version.
**Rationale**: Avoids breaking in-flight clients. Lets us roll out gradually.
**Trade-off**: Two validation paths in `get_current_user`. Mitigated by extracting the session-lookup into a helper that both paths share.
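
Below is a hedged sketch of two things: the claim shapes U2 describes, and the branch this shim needs. `build_claims` and `validation_path` are hypothetical helpers. Signing and decoding are left to whatever JWT library `jwt_utils.py` already uses.

```python
from __future__ import annotations

import uuid
from datetime import datetime, timedelta, timezone
from typing import Any


def build_claims(user_id: str, session_id: str, token_type: str, ttl: timedelta,
                 now: datetime | None = None) -> dict[str, Any]:
    """Access claims are {sub, sid, jti, type, exp, iat}; refresh omits jti (per U2)."""
    now = now or datetime.now(timezone.utc)
    claims: dict[str, Any] = {
        "sub": user_id,
        "sid": session_id,
        "type": token_type,
        "iat": int(now.timestamp()),
        "exp": int((now + ttl).timestamp()),
    }
    if token_type == "access":
        claims["jti"] = uuid.uuid4().hex
    return claims


def validation_path(payload: dict[str, Any]) -> str:
    """Choose the session check for an already-verified payload (KTD-9 shim)."""
    return "auth_sessions" if "sid" in payload else "legacy_user_sessions"
```
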
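### KTD-10: AuthProvider abstraction layer (extension point for future IdP integration)
**Decision**: Auth logic goes through the `auth/providers/base.py:AuthProvider` Protocol (`name` / `authenticate` / `get_user_by_id` / `sync_user_attributes` / `revoke_user`).

- The routing layer injects the provider with `Depends(get_auth_provider)`.
- The current default is `LocalAuthProvider`, which wraps SQLite + bcrypt.
- When a future `OidcAuthProvider` takes over, **routes, admin, and the session table need zero changes**.
- `StubOIDCProvider` is a placeholder (`raise NotImplementedError`) used to validate the interface contract ahead of a real IdP.

**Rationale**: The user explicitly stated that the product must eventually integrate with the group's account and password system (OIDC / SAML / LDAP / Feishu / DingTalk / WeCom). Suppose "where users are stored / how passwords are verified" were hard-coded into routes and admin now. A later IdP switch would then mean rewriting the entire routing layer and every admin endpoint.

Abstracting up front avoids that. A future IdP integration only needs one new adapter (~300-500 lines) and does not touch the existing routes, admin, or `SessionService`. The `auth_sessions` table also gains an `auth_provider` column recording the login source, which makes audits traceable.

**Trade-off**:
- Cost: 1 abstraction layer (`auth/providers/base.py` Protocol) + 1 DI factory (`get_auth_provider`) + 1 `StubOIDCProvider` placeholder
- Benefit: future IdP integration does not rewrite the routing layer or admin API; admin kick and session listing behave identically across providers

**Alternatives considered**:
- Reserve no extension point and build only today's `LocalAuthProvider`: a future IdP switch would require rewriting routes + admin + `SessionService`
- Implement OIDC directly now: lengthens this iteration by 2-3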
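
A minimal sketch of the Protocol, the two providers, and a config-driven factory follows. Several names are assumptions for illustration: the method signatures, the `AuthUser` record, and `build_auth_provider`. In FastAPI, `get_auth_provider` would presumably return a cached instance built this way. The in-memory `LocalAuthProvider` compares plaintext passwords only to stay self-contained; the real one wraps SQLite + bcrypt.

```python
from __future__ import annotations

import hmac
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class AuthUser:
    """Hypothetical provider-neutral user record."""

    id: str
    username: str
    attributes: dict[str, str] = field(default_factory=dict)


class AuthProvider(Protocol):
    name: str

    async def authenticate(self, username: str, password: str) -> AuthUser | None: ...
    async def get_user_by_id(self, user_id: str) -> AuthUser | None: ...
    async def sync_user_attributes(self, user_id: str) -> None: ...
    async def revoke_user(self, user_id: str) -> None: ...


class LocalAuthProvider:
    """Toy in-memory stand-in for the SQLite + bcrypt provider."""

    name = "local"

    def __init__(self, users: dict[str, tuple[AuthUser, str]]) -> None:
        self._users = users  # username -> (user, plaintext password); toy only
        self._revoked: set[str] = set()

    async def authenticate(self, username: str, password: str) -> AuthUser | None:
        entry = self._users.get(username)
        if entry is None:
            return None
        user, stored = entry
        if user.id in self._revoked:
            return None
        # Constant-time comparison of the encoded passwords.
        if not hmac.compare_digest(stored.encode(), password.encode()):
            return None
        return user

    async def get_user_by_id(self, user_id: str) -> AuthUser | None:
        return next((u for u, _ in self._users.values() if u.id == user_id), None)

    async def sync_user_attributes(self, user_id: str) -> None:
        return None  # local users have no upstream directory to sync from

    async def revoke_user(self, user_id: str) -> None:
        self._revoked.add(user_id)


class StubOIDCProvider:
    """Placeholder that only pins down the contract."""

    name = "oidc-stub"

    async def authenticate(self, username: str, password: str) -> AuthUser | None:
        raise NotImplementedError

    async def get_user_by_id(self, user_id: str) -> AuthUser | None:
        raise NotImplementedError

    async def sync_user_attributes(self, user_id: str) -> None:
        raise NotImplementedError

    async def revoke_user(self, user_id: str) -> None:
        raise NotImplementedError


def build_auth_provider(provider_name: str,
                        local_users: dict[str, tuple[AuthUser, str]]) -> AuthProvider:
    """Map the auth.provider config value to an implementation."""
    if provider_name == "local":
        return LocalAuthProvider(local_users)
    if provider_name == "oidc-stub":
        return StubOIDCProvider()
    raise ValueError(f"unknown auth.provider: {provider_name!r}")
```

Routes only see the `AuthProvider` type. Swapping `local` for `oidc-stub` therefore changes the factory's return value and nothing else.
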
---
## High-Level Technical Design
### Component Map
```
┌─────────────────────────────────────────────────────────────────────┐
│ Tauri Desktop (macOS / Windows / Linux) │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ WebView (Vue 3 frontend) │ │
│ │ ┌────────────┐ ┌──────────────┐ ┌────────────────┐ │ │
│ │ │ auth store │──│ api/base.ts │──│ Pinia + Router │ │ │
│ │ │ (memory) │ │ interceptor │ │ │ │ │
│ │ └────────────┘ └──────────────┘ └────────────────┘ │ │
│ │ │ │ │ │
│ │ │ │ silentRefresh │ │
│ │ │ ▼ │ │
│ │ │ ┌──────────────────┐ │ │
│ │ │ │ tauri-auth.ts │ invoke() │ │
│ │ │ └──────────────────┘ │ │ │
│ │ │ localStorage fallback ▼ │ │
│ │ │ ┌──────────────────────┐ │ │
│ │ └─────────────────▶│ src-tauri/src/auth.rs│ │ │
│ │ │ keyring::Entry │ │ │
│ │ └──────────────────────┘ │ │
│ │ │ │ │
│ └───────────────────────────────────────┼────────────────┘ │
│ │ HTTP │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ FastAPI server (Python sidecar) │ │
│ │ ┌────────────────────────┐ ┌──────────────────────┐ │ │
│ │ │ routes/auth.py │──▶│ auth/session.py │ │ │
│ │ │ + admin routes │ │ - create / rotate │ │ │
│ │ │ Depends(get_auth_ │ │ - revoke / kick │ │ │
│ │ │ provider) ─────┼──▶│ - reuse detection │ │ │
│ │ └────────────────────────┘ └──────────────────────┘ │ │
│ │ │ │ │ │
│ │ ▼ ▼ │ │
│ │ ┌────────────────────────┐ ┌──────────────────────┐ │ │
│ │ │ auth/providers/ │ │ auth/models.py │ │ │
│ │ │ - base.py (Protocol) │ │ AuthSessionModel │ │ │
│ │ │ - local.py (Local) │ │ + auth_provider col │ │ │
│ │ │ - oidc_stub.py (stub) │ └──────────────────────┘ │ │
│ │ │ get_auth_provider() DI │ │ │
│ │ └────────────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌─────────────────────────────────────────────┐ │ │
│ │ │ auth/cache.py (Redis or in-process LRU) │ │ │
│ │ └─────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────┐ │
│ │ data/auth.db (SQLite) │ │
│ │ + auth_sessions table │ │
│ └──────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
```
### State Machine — Client Auth
```mermaid
stateDiagram-v2
    state "ERROR (UI shows Refresh)" as ERROR
    state "INVALID (UI shows Please log in again)" as INVALID
    [*] --> STARTUP: app start
    STARTUP --> READY: valid refresh
    STARTUP --> INVALID: invalid
    STARTUP --> ERROR: network err
    ERROR --> silentRefresh: retry
    INVALID --> silentRefresh: retry
    READY --> silentRefresh: 401 in flight
    silentRefresh --> READY: ok
    silentRefresh --> STARTUP: fail
    silentRefresh --> INVALID: fail
```
### State Machine — Server Session
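```mermaid
stateDiagram-v2
    state "CREATED / active (sid in JWT)" as Active
    state "REVOKED (user)" as RevokedUser
    state "REVOKED (system)" as RevokedSystem
    [*] --> Active: login
    Active --> Active: refresh ok (rotated)
    Active --> RevokedUser: logout
    Active --> RevokedSystem: admin / password change / reuse detected
```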
### Sequence — Cold Start (Tauri)
```
Window opens
App.vue mounted
bootstrapBackend()
│ start_backend (sidecar)
│ health check
authStore.startupCheck()
├── 1. tauriAuthStorage.getRefreshToken()
│ Keychain (Tauri) → localStorage (Web fallback)
├── 2. GET /api/v1/auth/whoami (Authorization: Bearer <refresh>)
│ (the access token is gone, so we attach the refresh token;
│ the server uses a separate "whoami" code path that accepts
│ either type)
├── 3. response handling
│ 200 → { access_token, user } → state = VALID → /agent
│ 401 → state = INVALID → /login (with "会话已过期", i.e. "Session expired")
│ network err → state = ERROR → /login (with "无法连接", i.e. "Cannot connect")
Router beforeEach
│ state = VALID → next()
│ state != VALID → next('/login')
```
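
The three-way outcome of step 3 can be expressed as a small pure function. It is sketched here in Python for consistency with the other examples; the real check lives in the TypeScript auth store. Two branches are assumptions not stated above:

- a missing stored refresh token maps to `INVALID`;
- any status other than 200/401 maps to `ERROR`.

```python
from __future__ import annotations

from enum import Enum


class StartupState(Enum):
    VALID = "valid"
    INVALID = "invalid"
    ERROR = "error"


def classify_startup(has_refresh_token: bool, status: int | None) -> StartupState:
    """Map the whoami outcome to F11's three states.

    status is the HTTP status of GET /api/v1/auth/whoami, or None when no
    response arrived (network error, sidecar not up yet).
    """
    if not has_refresh_token:
        return StartupState.INVALID  # assumption: nothing stored means "log in"
    if status is None:
        return StartupState.ERROR
    if status == 200:
        return StartupState.VALID
    if status == 401:
        return StartupState.INVALID
    return StartupState.ERROR  # assumption: 5xx etc. is server trouble, not a dead session
```
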
### Data Model — `auth_sessions` Table
```mermaid
erDiagram
auth_sessions {
TEXT id PK "uuid"
TEXT user_id FK
TEXT refresh_token_hash
TEXT device_fingerprint
TEXT device_label
TEXT ip
TEXT user_agent
TEXT created_at
TEXT last_active_at
TEXT expires_at
INTEGER revoked
TEXT revoked_reason
TEXT previous_session_id
TEXT auth_provider "default 'local'"
}
users {
TEXT id PK
TEXT username
TEXT password_hash
%% ... remaining users columns
}
auth_sessions }o--|| users : "user_id"
```
### Sequence — Refresh Token Rotation + Reuse Detection
```
Client Server
│ │
│ POST /auth/refresh │
│ { refresh_token: "old" } │
│ ───────────────────────────▶ │
│ │ decode old → sid
│ │ lookup auth_sessions[sid]
│ │ hash(old) == session.refresh_token_hash? NO
│ │ → denylist check: hash(old) in denylist?
│ │ YES → REUSE DETECTED
│ │ → revoke ALL sessions for this user
│ │ → audit log "reuse_detected"
│ │ ← 401 { error: "token_reuse_detected" }
│ client clears state, │
│ routes to /login │
│ │
│ -- legit refresh -- │
│ POST /auth/refresh │
│ { refresh_token: "valid" } │
│ ───────────────────────────▶ │
│ │ hash(valid) == session.refresh_token_hash? YES
│ │ rotate: session.refresh_token_hash = hash(new)
│ │ add hash(old) to denylist (30s)
│ │ issue new access + new refresh
│ │ ← 200 { access_token, refresh_token }
│ store new refresh in │
│ Keychain, access in memory │
```
---
## Implementation Units
### U1. Schema: AuthSessionModel + extended bootstrap + backfill
**Goal**: Add the `auth_sessions` table with all required fields and indexes, AND backfill existing `user_sessions` rows on first startup.
**Requirements**: F6, F15, N5, N6 (the table backs every session-aware endpoint; backfill prevents forced re-login; `auth_provider` field enables future IdP audit traceability).
**Dependencies**: None.
**Files**:
- `src/agentkit/server/auth/models.py`: add `AuthSessionModel` (SQLAlchemy 2 typed), extend `_SCHEMA_SQL` for direct aiosqlite init, add a `_SCHEMA_VERSION = 2` constant, and extend `init_auth_db()` to run the backfill
- `tests/unit/auth/test_models.py`: model serialization + index smoke + backfill tests
**Approach (schema)**:
- Use UUID strings as PK (matches existing `users.id` style in this codebase)
- `device_info` is a JSON string (reuse pattern from `UserSessionModel.device_info`)
- `expires_at` is ISO-8601 string (matches `UserModel.last_login_at`)
- `revoked` is INTEGER (0/1) for SQLite compatibility
- Add the new `CREATE TABLE auth_sessions` block to `_SCHEMA_SQL` (lines 234-242 hold the current `user_sessions` block; append after it) with these indexes:
  - `idx_auth_sessions_user_id_active` on `(user_id, revoked, expires_at)`: supports the cap-count query and the list-active query
  - `idx_auth_sessions_expires_at` on `(expires_at)`: supports cleanup sweeps
  - `idx_auth_sessions_refresh_token_hash` on `(refresh_token_hash)`: unique
  - `idx_auth_sessions_auth_provider` on `(auth_provider)`: supports the future IdP "list sessions by provider" query
- **Add `auth_provider` column** (NEW per KTD-10): `TEXT NOT NULL DEFAULT 'local'` records which provider created the session. Values: `local` (current) / `oidc-stub` (future stub) / `oidc-keycloak` / `saml` / `ldap` (future real adapters). Backfilled rows get `'local'` via the default.
- Bump `_SCHEMA_VERSION = 2` (currently implicit; the existing `init_auth_db` is idempotent via `CREATE TABLE IF NOT EXISTS` so version is mostly for the backfill gate)
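
Below is a sketch of the `auth_sessions` block that could be appended to `_SCHEMA_SQL`. It runs with the standard-library `sqlite3` module so it can be checked in isolation. Column order follows the data model above. The `NOT NULL` constraints and the `apply_auth_sessions_schema` helper are assumptions.

```python
import sqlite3

# Sketch of the auth_sessions DDL; NOT NULL constraints are assumptions.
AUTH_SESSIONS_SQL = """
CREATE TABLE IF NOT EXISTS auth_sessions (
    id                  TEXT PRIMARY KEY,            -- uuid, used as the JWT sid
    user_id             TEXT NOT NULL REFERENCES users(id),
    refresh_token_hash  TEXT NOT NULL,
    device_fingerprint  TEXT,
    device_label        TEXT,
    ip                  TEXT,
    user_agent          TEXT,
    created_at          TEXT NOT NULL,               -- ISO-8601
    last_active_at      TEXT NOT NULL,
    expires_at          TEXT NOT NULL,
    revoked             INTEGER NOT NULL DEFAULT 0,  -- 0/1 for SQLite
    revoked_reason      TEXT,
    previous_session_id TEXT,
    auth_provider       TEXT NOT NULL DEFAULT 'local'
);
CREATE INDEX IF NOT EXISTS idx_auth_sessions_user_id_active
    ON auth_sessions (user_id, revoked, expires_at);
CREATE INDEX IF NOT EXISTS idx_auth_sessions_expires_at
    ON auth_sessions (expires_at);
CREATE UNIQUE INDEX IF NOT EXISTS idx_auth_sessions_refresh_token_hash
    ON auth_sessions (refresh_token_hash);
CREATE INDEX IF NOT EXISTS idx_auth_sessions_auth_provider
    ON auth_sessions (auth_provider);
"""


def apply_auth_sessions_schema(conn: sqlite3.Connection) -> None:
    """Idempotent: every statement uses IF NOT EXISTS."""
    conn.executescript(AUTH_SESSIONS_SQL)
```
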
**Approach (backfill) — critical, was missing from the original plan**:
The current `routes/auth.py:201-213` writes to `user_sessions` on login. After the new schema lands, the new `SessionService.create_session` writes to `auth_sessions` instead. To prevent forcing every existing user to re-login on the deploy, `init_auth_db()` runs a **one-time backfill** on startup:
```python
import json
import logging

import aiosqlite

logger = logging.getLogger(__name__)


async def _backfill_user_sessions(db: aiosqlite.Connection) -> int:
    """One-time backfill from user_sessions to auth_sessions.

    Runs only when auth_sessions is empty AND user_sessions has rows.
    Idempotent: subsequent restarts are no-ops.
    Expects db.row_factory = aiosqlite.Row so columns can be read by name.
    """
    cursor = await db.execute("SELECT COUNT(*) FROM auth_sessions")
    (count,) = await cursor.fetchone()
    if count > 0:
        return 0  # already backfilled

    cursor = await db.execute(
        "SELECT id, user_id, refresh_token_hash, device_info, created_at, expires_at "
        "FROM user_sessions WHERE revoked_at IS NULL"
    )
    rows = await cursor.fetchall()
    backfilled = 0
    for row in rows:
        device_info = json.loads(row["device_info"]) if row["device_info"] else {}
        # Reuse the existing user_sessions.id as auth_sessions.id so that
        # legacy clients holding the old refresh_token_hash still match a row
        # in the new table (the back-compat path in U10 relies on this).
        # auth_provider is omitted and takes its DEFAULT 'local'.
        insert = await db.execute(
            "INSERT OR IGNORE INTO auth_sessions "
            "(id, user_id, refresh_token_hash, device_fingerprint, device_label, "
            " ip, user_agent, created_at, last_active_at, expires_at, revoked, revoked_reason) "
            "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
            (
                row["id"],  # reuse legacy id for back-compat
                row["user_id"],
                row["refresh_token_hash"],
                device_info.get("fingerprint", "unknown"),
                device_info.get("label", "Unknown device"),
                device_info.get("ip", ""),
                device_info.get("user_agent", ""),
                row["created_at"],
                row["created_at"],  # last_active_at defaults to created_at
                row["expires_at"],
                0,  # not revoked (already filtered)
                None,
            ),
        )
        backfilled += insert.rowcount  # 0 when OR IGNORE skipped the row
    if backfilled:
        await db.commit()
        logger.info("Backfilled %d user_sessions rows to auth_sessions", backfilled)
    return backfilled
```
**Approach (idempotency)**:
- The `INSERT OR IGNORE` on `auth_sessions.id PK` makes the backfill safe to re-run
- The `count > 0` early-exit means after the first backfill, subsequent startups are < 1ms
**Approach (rollback risk)**:
- The backfill does NOT delete `user_sessions` rows. They are kept for 1 minor version as the legacy read path. U10's Phase 5 cleanup drops the table.
**Test scenarios** (test_models.py):
- Create session, query by `sid`, find it
- Create 11 sessions for one user, count = 11 (cap check is in U3)
- Query `WHERE user_id=? AND revoked=0 AND expires_at > now` returns active sessions
- Index `(user_id, revoked, expires_at)` is present (verify via `PRAGMA index_list`)
- Index `idx_auth_sessions_auth_provider` is present
- **`auth_provider` column** tests (NEW per KTD-10):
  - Default value is `'local'` when the column is omitted from INSERT
  - `WHERE auth_provider = 'local'` returns only local-created sessions
  - `WHERE auth_provider = 'oidc-stub'` returns zero rows in current code
- **Backfill tests** (NEW):
  - `init_auth_db` on a DB with `user_sessions` rows but an empty `auth_sessions` backfills all non-revoked rows
  - `init_auth_db` on a DB with existing `auth_sessions` rows does NOT re-backfill (idempotent)
  - Backfilled rows have the original `user_sessions.id` as their `auth_sessions.id`
  - Backfilled rows have `revoked=0`
  - Backfilled rows have their `expires_at` preserved
  - Backfill does NOT touch `user_sessions` rows that are already revoked (`revoked_at IS NOT NULL`)
**Verification**: `pytest tests/unit/auth/test_models.py -v` passes; `init_auth_db` runs cleanly on a copy of the prod DB with the existing `user_sessions` table; the backfill log line appears exactly once per fresh DB.

**Note on Alembic**: This codebase does **not** use Alembic. There is no `alembic.ini`, no `migrations/` directory, and no `alembic` dependency in `pyproject.toml`. The auth DB schema is managed via the `_SCHEMA_SQL` constant + `init_auth_db()` pattern (see `auth/models.py:202-333`). This U1 unit aligns with that pattern; the original plan's Alembic reference was incorrect.

---
### U2. JWT utils: sid + jti claims, dual decode path
**Goal**: Add `sid` and `jti` to issued JWTs; teach `verify_token` to read both old and new claim shapes.
**Requirements**: F5, F12, N6 (rotation + multi-client + backwards compat).
**Dependencies**: U1 (the `sid` references a row in `auth_sessions`).
**Files**:
- `src/agentkit/server/auth/jwt_utils.py`: `create_token_pair(...)` now takes `session_id: str`; `verify_token(...)` returns the decoded payload including `sid` + `jti`. Back-compat: a missing `sid` is logged at DEBUG and accepted (the caller decides what to do)
- `src/agentkit/server/auth/denylist.py`: new module with a `RecentlyRevokedTokens` class backed by an in-memory `OrderedDict`, plus a Redis-backed implementation for cross-process deployments; `add(token_hash, ttl=30)`, `contains(token_hash) -> bool`
- `tests/unit/auth/test_jwt_utils.py`: extend the existing tests with a round-trip with `sid`, decoding a legacy token, and decoding a tampered token
**Approach**:
- `create_token_pair(user_id, session_id, ttl_pair)`: the `access` payload is `{sub, sid, jti, type, exp, iat}`; the `refresh` payload is the same minus `jti` (refresh tokens are long-lived; a `jti` would be regenerated on every rotation, which is wasteful)
- `verify_token(token, expected_type)`: returns the full payload dict; a legacy payload (no `sid`) is preserved as-is, and callers branch on `'sid' in payload`
- `RecentlyRevokedTokens`: single-process `OrderedDict` keyed by SHA-256 hash, max 10k entries; `contains` is O(1); `add` evicts the oldest entry when at capacity
- Redis adapter: one key per token hash, written with `SET <key> 1 EX 30` and checked with `EXISTS`; the in-process impl is the fallback when Redis is unavailable. A single set with `SADD` + `EXPIRE` would not give per-token expiry, because `EXPIRE` applies to the whole set key rather than to individual members.
**Test scenarios**:
- `create_token_pair(...)` produces tokens with `sid` and `jti` (access only)
- `verify_token` on a token without `sid` returns the payload unchanged (caller must handle)
- `verify_token` on an expired token raises `ExpiredSignatureError`
- `RecentlyRevokedTokens.add(hash, ttl)` + `contains(hash)` returns True within 30s, False after
- `RecentlyRevokedTokens` with 10001 entries evicts the oldest (capacity test)
- Redis adapter mock: `SET` (with `EX=30`) and `EXISTS` called with correct args
**Verification**: `pytest tests/unit/auth/test_jwt_utils.py -v` passes; a manual `curl` round-trip works against a running dev server.

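
A minimal sketch of the in-process `RecentlyRevokedTokens` follows. It assumes an injectable monotonic clock for testability; the `clock` parameter is not part of the plan. Every entry gets the same 30s TTL, so insertion order equals expiry order. Evicting from the front of the `OrderedDict` therefore always drops the oldest entry.

```python
from __future__ import annotations

import time
from collections import OrderedDict
from typing import Callable


class RecentlyRevokedTokens:
    """In-process denylist of recently rotated refresh-token hashes."""

    def __init__(self, max_entries: int = 10_000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._entries: OrderedDict[str, float] = OrderedDict()  # hash -> expiry time
        self._max_entries = max_entries
        self._clock = clock

    def add(self, token_hash: str, ttl: float = 30.0) -> None:
        self._entries.pop(token_hash, None)  # re-adding moves the entry to the back
        while len(self._entries) >= self._max_entries:
            self._entries.popitem(last=False)  # evict the oldest insertion
        self._entries[token_hash] = self._clock() + ttl

    def contains(self, token_hash: str) -> bool:
        expiry = self._entries.get(token_hash)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._entries[token_hash]  # lazy expiry on read
            return False
        return True
```
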
---
### U3. Session service: CRUD + rotation + reuse detection
**Goal**: Centralize all session operations behind a `SessionService` class so routes don't duplicate the logic.
**Requirements**: F5, F6, F8, F9, F11, F13, F15 (rotation, recording, kick, password change, three-state validation, provider-pluggability, audit field).
**Dependencies**: U1 (model), U2 (denylist).
**Files**:
- `src/agentkit/server/auth/session.py`: new module with the `SessionService` class
- `src/agentkit/server/auth/cache.py`: new module with the `SessionCache` interface + `RedisSessionCache` and `InProcessLRUSessionCache` impls
- `tests/unit/auth/test_session.py`: full service test suite
**Approach (SessionService methods)**:
- `async create_session(user_id, device_fingerprint, device_label, ip, user_agent, remember_me: bool, auth_provider: str = "local") -> AuthSessionModel`
  - **Cap check first**: count the user's active sessions; if the count is already >= 10, mark the oldest non-current session as `revoked` with `revoked_reason='session_cap_eviction'`
  - Generate a new `sid` (uuid4) and `jti` (uuid4)
  - Compute `expires_at` based on `remember_me` (30d vs 7d)
  - **Store `auth_provider` from the caller** (U4 passes `provider.name`); this enables F15 audit traceability
  - Insert the row, return the model
- `async get_active_session(sid: str) -> AuthSessionModel | None`
  - First check `SessionCache.get(sid)`; on a miss, query the DB and write the result to the cache (60s TTL)
  - Return None if `revoked=True` or `expires_at < now`
- `async rotate_refresh(old_refresh_token: str) -> tuple[AuthSessionModel, TokenPair]`
  - Decode `old_refresh_token`, get the `sid`, look up the session
  - **Reuse detection**: compare `sha256(old_refresh_token)` against `session.refresh_token_hash`. If they differ, this is a reuse: call `revoke_all_for_user(user_id, except_sid=None, reason='reuse_detected')` and raise `TokenReuseDetected`
  - Also check `RecentlyRevokedTokens.contains(sha256(old_refresh_token))`; if present, handle it the same way
  - On legitimate use, generate a new `refresh_token` and update the session:
    - `session.refresh_token_hash` = `sha256(new)`
    - `session.last_active_at` = now
    - `session.expires_at` = now + ttl
    - `session.previous_session_id` = old sid (audit)
    - `auth_provider` is **preserved** (rotation doesn't change the provider)
  - Add `sha256(old_refresh_token)` to the denylist for 30s
  - Issue new access + refresh JWTs (call into jwt_utils)
  - Invalidate the cache entry for this sid
- `async revoke_session(sid: str, reason: str) -> None`
  - Mark `revoked=True`, `revoked_reason=reason`; invalidate the cache
- `async revoke_all_for_user(user_id: str, except_sid: str | None, reason: str) -> int`
  - Bulk update; returns the count of revoked sessions
- `async list_active_for_user(user_id: str) -> list[AuthSessionModel]`
- `async list_all_for_admin(user_id: str) -> list[AuthSessionModel]` (admin endpoint)
- `async list_active_by_provider(auth_provider: str) -> list[AuthSessionModel]` (NEW per KTD-10): supports a future "show me all OIDC sessions" admin view
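
Below is a self-contained sketch of the rotation and reuse-detection rules above. It substitutes simpler pieces for the real ones:

- a dict stands in for the database;
- a plain set stands in for `RecentlyRevokedTokens`, so there is no 30s expiry;
- the `sid` is passed in directly instead of being decoded from the token;
- tokens are opaque random strings instead of JWTs.

`InMemorySessionService` and `Session` are hypothetical names.

```python
from __future__ import annotations

import hashlib
import secrets
from dataclasses import dataclass


class TokenReuseDetected(Exception):
    """A refresh token that is no longer current was presented."""


def token_hash(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


@dataclass
class Session:
    """Hypothetical stand-in for AuthSessionModel."""

    sid: str
    user_id: str
    refresh_token_hash: str
    auth_provider: str = "local"
    revoked: bool = False
    revoked_reason: str | None = None


class InMemorySessionService:
    def __init__(self) -> None:
        self.sessions: dict[str, Session] = {}
        self.denylist: set[str] = set()  # stand-in for RecentlyRevokedTokens (no TTL)

    def revoke_all_for_user(self, user_id: str, except_sid: str | None, reason: str) -> int:
        count = 0
        for s in self.sessions.values():
            if s.user_id == user_id and s.sid != except_sid and not s.revoked:
                s.revoked, s.revoked_reason = True, reason
                count += 1
        return count

    def rotate_refresh(self, sid: str, old_token: str) -> str:
        session = self.sessions.get(sid)
        if session is None or session.revoked:
            raise LookupError(f"no active session {sid!r}")
        old_hash = token_hash(old_token)
        if old_hash in self.denylist or old_hash != session.refresh_token_hash:
            # Reuse: the presented token is not the session's current one.
            self.revoke_all_for_user(session.user_id, except_sid=None, reason="reuse_detected")
            raise TokenReuseDetected(sid)
        new_token = secrets.token_urlsafe(32)
        session.refresh_token_hash = token_hash(new_token)  # auth_provider untouched
        self.denylist.add(old_hash)
        return new_token
```
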
**Approach (SessionCache)**:
```python
from typing import Protocol

class SessionCache(Protocol):
    async def get(self, sid: str) -> "AuthSessionModel | None": ...
    async def set(self, sid: str, session: "AuthSessionModel", ttl: int = 60) -> None: ...
    async def invalidate(self, sid: str) -> None: ...
```
- `InProcessLRUSessionCache`: `OrderedDict[sid, (session, expires_at)]`; cap=1024; lazy eviction on get
- `RedisSessionCache`: `GET` / `SETEX` / `DEL`; pickle the model for storage
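
Below is a sketch of `InProcessLRUSessionCache` that satisfies the `SessionCache` Protocol; it stores arbitrary session objects. The injectable `clock` is an assumption added for testability. The eviction rules are:

- reads refresh an entry's LRU position;
- expired entries are dropped lazily on `get`;
- when `set` exceeds the capacity, the least recently used entry is dropped.

```python
from __future__ import annotations

import time
from collections import OrderedDict
from typing import Any, Callable


class InProcessLRUSessionCache:
    def __init__(self, capacity: int = 1024,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._data: OrderedDict[str, tuple[Any, float]] = OrderedDict()  # sid -> (session, expires_at)
        self._capacity = capacity
        self._clock = clock

    async def get(self, sid: str) -> Any | None:
        item = self._data.get(sid)
        if item is None:
            return None
        session, expires_at = item
        if self._clock() >= expires_at:
            del self._data[sid]  # lazy TTL eviction
            return None
        self._data.move_to_end(sid)  # mark as most recently used
        return session

    async def set(self, sid: str, session: Any, ttl: int = 60) -> None:
        self._data[sid] = (session, self._clock() + ttl)
        self._data.move_to_end(sid)
        while len(self._data) > self._capacity:
            self._data.popitem(last=False)  # drop the least recently used entry

    async def invalidate(self, sid: str) -> None:
        self._data.pop(sid, None)
```
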
**Test scenarios** (test_session.py):
- `create_session` inserts a row with all fields populated
- `create_session` with remember_me=True sets expires_at 30d out, else 7d
- `create_session` for a user with 10 active sessions evicts the oldest non-current one
- `create_session` for a user with 10 active sessions, the new login is one of them, the evicted one is the OLDEST non-new
- **`create_session` with `auth_provider='oidc-stub'`** stores that value in the row (NEW per KTD-10)
- `get_active_session` returns the row when valid
- `get_active_session` returns None when `revoked=True`
- `get_active_session` returns None when `expires_at < now`
- `get_active_session` second call within 60s hits cache (spy on DB call count)
- `rotate_refresh` with the CURRENT token returns new pair
- `rotate_refresh` preserves the original `auth_provider` value (NEW per KTD-10)
- `rotate_refresh` with a REUSED old token (hash differs from the stored current hash) → `TokenReuseDetected` raised + ALL sessions for user revoked
- `rotate_refresh` with a token in the denylist → same handling
- `rotate_refresh` updates `previous_session_id` to the old sid
- `revoke_session` sets `revoked=True`, `revoked_reason`, invalidates cache
- `revoke_all_for_user` except_sid=None revokes everything
- `revoke_all_for_user` except_sid=<current> keeps the current session
- `list_active_for_user` returns only `revoked=False AND expires_at > now`
- `list_all_for_admin` returns all rows including revoked (for audit)
- `list_active_by_provider('local')` returns only local sessions; `('oidc-stub')` returns empty in current code (NEW per KTD-10)
**Verification**: All unit tests pass; `pytest tests/unit/auth/test_session.py -v --cov` (pytest-cov) reports 100% line coverage of `session.py`.
---
### U4. Routes: new auth + admin endpoints
**Goal**: Expose all session operations as HTTP endpoints.
**Requirements**: F1, F2, F5, F6, F7, F8, F9, F10, F11, F13, F14, F15.
**Dependencies**: U3 (the service), **U11 (AuthProvider abstraction layer; must land first or alongside)**.
**Files**:
- `src/agentkit/server/routes/auth.py`: extend `LoginRequest` with `remember_me: bool = False`; add `WhoamiResponse`, `SessionInfoResponse`; add new endpoints; **inject `AuthProvider` via `Depends(get_auth_provider)`** (KTD-10)
- `src/agentkit/server/routes/admin.py`: new module for admin session management endpoints (or extend the existing admin module); **call `provider.revoke_user(user_id)` instead of writing the users table directly** (KTD-10)
- `src/agentkit/server/dependencies.py`: extend `get_current_user` to look up the session via `sid`; back-compat fallback for old tokens
- `src/agentkit/server/auth/password.py`: extend with `change_password(user_id, new_password)` that revokes all other sessions
- `tests/integration/auth/test_auth_routes.py`: full endpoint suite; **add tests that inject a mock provider** (KTD-10)
- `tests/integration/auth/test_admin_routes.py`: admin endpoints
**Approach (new endpoints)**:
| Method | Path | Body / Query | Auth | Behavior |
|--------|------|--------------|------|----------|
| POST | `/auth/login` | `{username, password, remember_me?}` | none | **`provider.authenticate(username, password)`** → `SessionService.create_session(auth_provider=provider.name)` → return `TokenResponse` |
| POST | `/auth/refresh` | `{refresh_token}` | refresh | `SessionService.rotate_refresh` → return new `TokenResponse`; on `TokenReuseDetected` → 401 `{error: "token_reuse_detected"}` |
| POST | `/auth/logout` | `{refresh_token}` | access (optional) | `revoke_session(sid, reason='user_terminated')` |
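| GET | `/auth/whoami` | — | access OR refresh | Returns `{user, access_token, session: {sid, device_label, ip, auth_provider, created_at, last_active_at, expires_at}}`; `session` is `null` for legacy tokens. Accepts a refresh token to support cold start, when no access token is held, and issues a fresh access token so the client needs no separate `/refresh` call. |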
| GET | `/auth/sessions` | — | access | List current user's active sessions (each annotated with `auth_provider`) |
| DELETE | `/auth/sessions/{sid}` | — | access | Revoke that session (if owned by current user) |
| POST | `/auth/logout-others` | — | access | Revoke all sessions except current |
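| POST | `/auth/change-password` | `{old_password, new_password}` | access | Verify the old password via `provider.authenticate`, then `SessionService.revoke_all_for_user(user_id, except_sid=<current>)` invalidates the other sessions (KTD-10: same behavior across providers; `provider.revoke_user` disables the whole account and is not used here) |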
**Approach (admin endpoints)**:
| Method | Path | Auth | Behavior |
|--------|------|------|----------|
| GET | `/admin/users/{user_id}/sessions` | admin | List all that user's sessions (incl revoked) |
| DELETE | `/admin/users/{user_id}/sessions/{sid}` | admin | Force-revoke any session |
**Approach (`/auth/whoami` middleware bypass — critical fix)**:
The current `AuthMiddleware._verify_jwt` (in `src/agentkit/server/auth/middleware.py:80-91`) only accepts `type=access` tokens and 401s on `type=refresh`. The cold-start sequence sends a refresh token (because the access token is gone). To make this work without weakening auth, `/auth/whoami` is added to `AuthMiddleware.WHITELIST_PATHS` and the route does its own auth:
```python
# In auth/middleware.py (AuthMiddleware):
WHITELIST_PATHS = (
    "/api/v1/health",
    "/api/v1/auth/login",
    "/api/v1/auth/refresh",
    "/api/v1/auth/logout",
    "/api/v1/auth/whoami",  # NEW: route does its own auth
    "/docs",
    "/openapi.json",
    "/redoc",
)
```
The `/auth/whoami` route accepts **either** an access token (normal call) **or** a refresh token (cold-start), and the auth check happens inside the route via `verify_token` + session lookup:
```python
import jwt  # PyJWT
from fastapi import HTTPException, Request


@router.get("/whoami")
async def whoami(request: Request) -> WhoamiResponse:
    """Returns the current user + session metadata.

    Accepts either type=access (normal) or type=refresh (cold-start).
    On 401 from this endpoint, the client treats it as 'invalid' state
    (NOT 'error' state) so the router redirects to /login.
    """
    auth_header = request.headers.get("Authorization", "")
    if not auth_header.startswith("Bearer "):
        raise HTTPException(401, "missing bearer token")
    token = auth_header[len("Bearer "):]
    try:
        payload = verify_token(token, expected_type=None)  # accept both types
    except jwt.ExpiredSignatureError:
        raise HTTPException(401, "token expired")
    except jwt.InvalidTokenError:
        raise HTTPException(401, "invalid token")
    sid = payload.get("sid")
    if sid:
        # New-style: validate the session (DB, behind the 60s cache)
        session = await session_service.get_active_session(sid)
        if not session:
            raise HTTPException(401, "session revoked or expired")
        user = await load_user(session.user_id)
        if user is None:  # user deactivated after the session was created
            raise HTTPException(401, "user not found or inactive")
        # Issue a fresh access token so the client doesn't need a separate /refresh
        new_access = create_access_token(user_id=user.id, session_id=session.id)
        return WhoamiResponse(
            user=user_to_response(user),
            access_token=new_access,
            session=session_to_response(session),
        )
    # Legacy token without sid: back-compat path (U10)
    user = await load_user(payload["sub"])
    if user is None:  # load_user already filters out inactive users
        raise HTTPException(401, "user not found or inactive")
    new_access = create_access_token(user_id=user.id, session_id=None)  # legacy
    return WhoamiResponse(
        user=user_to_response(user),
        access_token=new_access,
        session=None,  # no session metadata for legacy
    )
```
**Approach (defined phantom functions)**:
The plan's pseudo-code references several functions that don't exist yet. Define them explicitly:
```python
from fastapi import HTTPException, Request


# In auth/dependencies.py: NEW dependency for current session
async def get_current_session(request: Request) -> AuthSession:
    """Return the active session for the current request.

    Reuses request.state.session if an earlier dependency already loaded it;
    otherwise re-validates the token's sid via SessionService (60s cache, KTD-6).
    Raises 401 for legacy tokens (no sid) or revoked/expired sessions.
    """
    session = getattr(request.state, "session", None)
    if session is not None:
        return session
    sid = getattr(request.state, "sid", None)  # stored by AuthMiddleware (attribute name assumed)
    if sid is None:
        raise HTTPException(401, "no active session (legacy token)")
    session = await session_service.get_active_session(sid)
    if session is None:
        raise HTTPException(401, "session revoked or expired")
    request.state.session = session  # memoize for the rest of this request
    return session


# In auth/dependencies.py: keep existing get_current_user but extend it
async def get_current_user(request: Request) -> User:
    """Return the current authenticated user.

    Strategy:
    - request.state.current_user is set by AuthMiddleware for type=access tokens.
    - Paths that bypass the middleware (e.g. /auth/whoami) set request.state.user
      via their own auth check.
    - Sid-bearing tokens also get the session-revoked check, so every route that
      depends on get_current_user rejects revoked sessions (see U10 tests).
    - Legacy tokens (no sid) only set current_user, not session.
    """
    user = getattr(request.state, "current_user", None)
    if user is None:
        user = getattr(request.state, "user", None)  # set by whoami route
    if user is None:
        raise HTTPException(401, "not authenticated")
    if getattr(request.state, "sid", None) is not None:
        await get_current_session(request)  # raises 401 if revoked or expired
    return user


# In auth/users.py: NEW helper
async def load_user(user_id: str) -> User | None:
    """Load a user by id. Returns None if not found or inactive."""
    async with aiosqlite.connect(str(DEFAULT_AUTH_DB_PATH)) as db:
        cursor = await db.execute(
            "SELECT * FROM users WHERE id = ? AND is_active = 1", (user_id,)
        )
        row = await cursor.fetchone()
    return user_row_to_dict(row) if row else None
```
**Approach (`get_current_user` back-compat with sid validation)**:
The new `get_current_user` is called by routes after `AuthMiddleware` has run. The middleware sets `request.state.current_user` (a dict with `id`, `username`, `role`, etc.) for `type=access` tokens. For the new sid-bearing tokens, the middleware is extended to also record the token's `sid` claim on `request.state`. It cannot load the session itself, because `_verify_jwt` is synchronous and the session lookup is an async DB/cache call:
```python
# In auth/middleware.py: _verify_jwt stays synchronous (signature + expiry only)
def _verify_jwt(self, token: str) -> dict[str, Any] | None:
    # ... existing signature/expiry check that produces `payload` ...
    # A session-revoked check needs an async DB/cache call, which this
    # synchronous method cannot make. For sid-bearing tokens the caller
    # stores payload["sid"] on request.state.sid (attribute name assumed),
    # and get_current_session performs the revocation check per route.
    return payload
```
The session-revoked check is then done lazily in `get_current_session`, which calls `SessionService.get_active_session(sid)`. This is one extra DB-or-cache call per request, mitigated by the 60s Redis cache (KTD-6).
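The per-request lookup through `SessionService.get_active_session` is a cache-aside read: a cached row is reused for up to 60s, but the revoked / expired checks still run on every call, since a row can expire while it sits in the cache. A minimal sketch with a plain dict cache and an injected async loader standing in for the DB query; `SessionRow` and `ActiveSessionLookup` are hypothetical names, not the plan's classes.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Awaitable, Callable


@dataclass
class SessionRow:
    sid: str
    user_id: str
    revoked: bool
    expires_at: datetime  # timezone-aware


class ActiveSessionLookup:
    """Cache-aside get_active_session: cache first, DB on miss, validity re-checked."""

    def __init__(self, load_from_db: Callable[[str], Awaitable[SessionRow | None]], ttl: float = 60.0):
        self._load = load_from_db
        self._ttl = ttl
        self._cache: dict[str, tuple[SessionRow, float]] = {}

    async def get_active_session(self, sid: str) -> SessionRow | None:
        hit = self._cache.get(sid)
        if hit is not None and hit[1] > time.monotonic():
            row = hit[0]
        else:
            row = await self._load(sid)
            if row is not None:
                self._cache[sid] = (row, time.monotonic() + self._ttl)
        # Validity is checked on every call, cached or not
        if row is None or row.revoked or row.expires_at <= datetime.now(timezone.utc):
            return None
        return row

    def invalidate(self, sid: str) -> None:
        self._cache.pop(sid, None)  # called by revoke_session / rotate_refresh
```

Without `invalidate`, a revoked session stays usable from the cache until its TTL runs out. With an in-process cache, `invalidate` only clears the current worker's copy, so a multi-process deployment could still serve a revoked session from another worker for up to 60s; that is one argument for the Redis-backed cache in such setups.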
**Approach (`change_password`)**:
```python
@router.post("/change-password")
async def change_password(
    payload: ChangePasswordRequest,
    user: User = Depends(get_current_user),
    session: AuthSession = Depends(get_current_session),
    provider: AuthProvider = Depends(get_auth_provider),
):
    # Check the old password through the provider (KTD-10), not verify_password directly
    try:
        await provider.authenticate(username=user.username, password=payload.old_password)
    except InvalidCredentials:
        raise HTTPException(400, "old password incorrect")
    new_hash = hash_password(payload.new_password)
    async with aiosqlite.connect(str(DEFAULT_AUTH_DB_PATH)) as db:
        await db.execute(
            "UPDATE users SET password_hash=?, updated_at=? WHERE id=?",
            (new_hash, _now_iso(), user.id),
        )
        await db.commit()
    # Revoke every other session; the caller's own session stays valid
    revoked_count = await session_service.revoke_all_for_user(
        user.id, except_sid=session.id, reason="password_changed"
    )
    logger.info("Password changed for user %s; revoked %d other sessions", user.id, revoked_count)
    return {"ok": True, "revoked_sessions": revoked_count}
```
**Test scenarios** (test_auth_routes.py):
- **Happy path**:
- `POST /auth/login` with valid creds → 200, returns token pair + user
- `POST /auth/login` with `remember_me=true` → refresh token exp 30d
- `POST /auth/login` with `remember_me=false` → refresh token exp 7d
- `POST /auth/refresh` with current token → 200, new pair (different from old)
- `GET /auth/whoami` with access token → 200, returns user + session metadata
- `GET /auth/whoami` with refresh token (cold-start case) → 200
- `GET /auth/sessions` → list of current user's active sessions
- `DELETE /auth/sessions/{sid}` for own session → 200, that session now revoked
- `POST /auth/logout-others` → 200, all other sessions revoked
- `POST /auth/change-password` with correct old → 200, other sessions revoked
- **Error paths**:
- `POST /auth/login` with wrong password → 401 (constant-time)
- `POST /auth/login` with unknown user → 401 (constant-time)
- `POST /auth/login` with inactive user → 403
- `POST /auth/refresh` with reused old token → 401 `{error: "token_reuse_detected"}`
- `POST /auth/refresh` with denylisted token → 401
- `POST /auth/refresh` with tampered token → 401
- `GET /auth/whoami` with no Authorization header → 401
- `GET /auth/whoami` with expired access token → 401
- `DELETE /auth/sessions/{sid}` for someone else's session → 403
- `POST /auth/change-password` with wrong old password → 400
- `POST /auth/change-password` with weak new password (if validation added) → 422
- **Integration**:
- Login from client A, login from client B (different IPs / fingerprints) → both have independent sessions
- Login as user from 11 different fingerprints → 11th login evicts the 1st (oldest non-current)
- Change password → other devices get 401 on next request → bounced to /login
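The `remember_me` lifetime checks in the happy-path list need the refresh token's lifetime. A minimal standard-library sketch, assuming the refresh JWT carries both `iat` and `exp` claims in seconds (the plan does not state this); `unverified_claims` and `refresh_ttl_days` are hypothetical test helpers. It decodes the payload without verifying the signature, which is acceptable only in tests that already trust the server.

```python
import base64
import json


def unverified_claims(token: str) -> dict:
    """Decode a JWT's payload segment WITHOUT checking the signature (tests only)."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # JWTs strip base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def refresh_ttl_days(token: str) -> float:
    """Lifetime in days, assuming the token carries both iat and exp (seconds)."""
    claims = unverified_claims(token)
    return (claims["exp"] - claims["iat"]) / 86_400
```

A route test might then assert `refresh_ttl_days(resp.json()["refresh_token"]) == 30` for `remember_me=true`. If PyJWT is already a test dependency, `jwt.decode(token, options={"verify_signature": False})` returns the same claims.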
**Test scenarios** (test_admin_routes.py):
- `GET /admin/users/{id}/sessions` as admin → returns all sessions (active + revoked)
- `GET /admin/users/{id}/sessions` as non-admin → 403
- `DELETE /admin/users/{id}/sessions/{sid}` as admin → that session revoked
- `DELETE /admin/users/{id}/sessions/{sid}` as non-admin → 403
**Verification**: All integration tests pass; `pytest tests/integration/auth/ -v` shows green.
---
### U5. Tauri: keyring integration + commands
**Goal**: Add three Tauri commands to read/write/clear the refresh token in OS Keychain.
**Requirements**: F3.
**Dependencies**: None on the auth side; only depends on Tauri Cargo config.
**Files**:
- `src/agentkit/server/frontend/src-tauri/Cargo.toml`: add `keyring = { version = "3", features = ["apple-native", "windows-native", "linux-native"] }` (keyring 3 enables no platform credential store by default; without these features only the in-memory mock store is available)
- `src/agentkit/server/frontend/src-tauri/src/auth.rs`: new module with 3 `#[tauri::command]` functions
- `src/agentkit/server/frontend/src-tauri/src/lib.rs`: register the commands in `tauri::Builder::default().invoke_handler(...)`
- `src/agentkit/server/frontend/src-tauri/capabilities/default.json`: add the 3 commands to the `permissions` allowlist
- Rust unit tests in a `#[cfg(test)] mod tests` inside `src-tauri/src/auth.rs` (or an integration test under `src-tauri/tests/`), using the keyring mock store; a file under the repo-level `tests/unit-tauri/` would not be compiled by `cargo test --manifest-path .../src-tauri/Cargo.toml`
**Approach (auth.rs)**:
```rust
// One Keychain entry per OS user: service + account name identify it
const SERVICE: &str = "com.fischer.agentkit";
const USERNAME: &str = "refresh_token";

#[tauri::command]
pub async fn store_refresh_token(token: String) -> Result<(), String> {
    let entry = keyring::Entry::new(SERVICE, USERNAME)
        .map_err(|e| format!("keychain init failed: {e}"))?;
    entry
        .set_password(&token)
        .map_err(|e| format!("keychain write failed: {e}"))
}

#[tauri::command]
pub async fn load_refresh_token() -> Result<Option<String>, String> {
    let entry = keyring::Entry::new(SERVICE, USERNAME)
        .map_err(|e| format!("keychain init failed: {e}"))?;
    match entry.get_password() {
        Ok(t) => Ok(Some(t)),
        // "Not logged in" is a normal state, not an error
        Err(keyring::Error::NoEntry) => Ok(None),
        Err(e) => Err(format!("keychain read failed: {e}")),
    }
}

#[tauri::command]
pub async fn clear_refresh_token() -> Result<(), String> {
    let entry = keyring::Entry::new(SERVICE, USERNAME)
        .map_err(|e| format!("keychain init failed: {e}"))?;
    match entry.delete_credential() {
        Ok(()) => Ok(()),
        // Clearing an absent token is idempotent
        Err(keyring::Error::NoEntry) => Ok(()),
        Err(e) => Err(format!("keychain delete failed: {e}")),
    }
}
```
**Approach (Cargo.toml)**:
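- Add the `keyring` dependency under `[dependencies]` with the platform features listed above; a bare `keyring = "3"` enables no platform store and would silently fall back to the mock store
- macOS: requires the binary to be signed (Keychain access); for unsigned dev builds, the mock store can be installed with `keyring::set_default_credential_builder(keyring::mock::default_credential_builder())` (not needed in this plan; document in README instead)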
**Approach (capabilities/default.json)**:
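- Tauri v2 allows commands registered through `invoke_handler` by default, and `core:` identifiers only cover Tauri's built-in plugins, so `core:default:allow-*` entries would not resolve. If the commands should be capability-gated, declare them in `src-tauri/build.rs` via `tauri_build::Attributes::new().app_manifest(tauri_build::AppManifest::new().commands(&["store_refresh_token", "load_refresh_token", "clear_refresh_token"]))` and add the generated app permissions to the `permissions` array:
- `"allow-store-refresh-token"`
- `"allow-load-refresh-token"`
- `"allow-clear-refresh-token"`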
**Test scenarios** (test_keyring.rs):
- `store_refresh_token("abc")` then `load_refresh_token()` returns `Some("abc")`
- `clear_refresh_token()` then `load_refresh_token()` returns `None`
- `load_refresh_token()` on a fresh keyring returns `None` (not error)
- Use the `keyring::mock` store for CI tests; real platform tests are manual on a macOS dev machine
**Verification**: `cargo test --manifest-path src/agentkit/server/frontend/src-tauri/Cargo.toml` passes; manual smoke: launch Tauri dev, log in, check macOS Keychain Access.app for the entry.
---
### U6. Frontend: tauri-auth.ts adapter
**Goal**: Abstract Keychain (Tauri) / localStorage (Web) access behind a single async API.
**Requirements**: F3, F4.
**Dependencies**: U5 (the Rust commands must exist for invoke() to work).
**Files**:
- `src/agentkit/server/frontend/src/api/tauri-auth.ts` — new module
- `tests/unit/api/tauri-auth.test.ts` — unit tests with mocked `invoke`
**Approach**:
```typescript
// localStorage key, used in Web mode and as the fallback when the Keychain fails
const SERVICE = 'agentkit.refresh_token'

// Exported so tests can exercise environment detection directly
export function isTauri(): boolean {
  return typeof window !== 'undefined' && '__TAURI_INTERNALS__' in window
}

export const tauriAuthStorage = {
  async setRefreshToken(token: string): Promise<void> {
    if (isTauri()) {
      try {
        const { invoke } = await import('@tauri-apps/api/core')
        await invoke('store_refresh_token', { token })
        return
      } catch (e) {
        console.warn('[auth] Keychain write failed, falling back to localStorage', e)
      }
    }
    localStorage.setItem(SERVICE, token)
  },

  async getRefreshToken(): Promise<string | null> {
    if (isTauri()) {
      try {
        const { invoke } = await import('@tauri-apps/api/core')
        return await invoke<string | null>('load_refresh_token')
      } catch (e) {
        console.warn('[auth] Keychain read failed, falling back to localStorage', e)
      }
    }
    return localStorage.getItem(SERVICE)
  },

  async clearRefreshToken(): Promise<void> {
    if (isTauri()) {
      try {
        const { invoke } = await import('@tauri-apps/api/core')
        await invoke('clear_refresh_token')
      } catch (e) {
        console.warn('[auth] Keychain clear failed, falling back to localStorage', e)
      }
    }
    // Always clear localStorage too, in case an earlier write fell back to it
    localStorage.removeItem(SERVICE)
  },
}
```
**Test scenarios** (tauri-auth.test.ts):
- `isTauri()` returns `true` when `__TAURI_INTERNALS__` is in window
- `setRefreshToken` in Tauri mode calls `invoke('store_refresh_token', { token })`
- `setRefreshToken` in Tauri mode falls back to localStorage when invoke throws
- `setRefreshToken` in Web mode (no Tauri) writes to localStorage directly
- `getRefreshToken` in Tauri mode returns the value from `invoke('load_refresh_token')`
- `getRefreshToken` in Tauri mode falls back to localStorage when invoke throws
- `clearRefreshToken` in Tauri mode calls `invoke('clear_refresh_token')`
- `clearRefreshToken` in Web mode removes from localStorage
**Verification**: `npm run test:unit -- tauri-auth.test.ts` passes; manual test: launch Tauri, log in, verify entry in macOS Keychain.
---
### U7. Frontend: auth store refactor (3-state startup, pre-emptive refresh)
**Goal**: Rewrite `stores/auth.ts` to support the new flow.
**Requirements**: F1, F10, F11, F12.
**Dependencies**: U6 (adapter), U4 (server endpoints).
**Files**:
- `src/agentkit/server/frontend/src/stores/auth.ts`: major refactor
- `src/agentkit/server/frontend/src/api/auth.ts`: add `whoami()`, `login(rememberMe)`, `changePassword()`, `listSessions()`, `revokeSession()`, `logoutOthers()`
- `tests/unit/stores/auth.test.ts`: extend existing test file
**Approach (new auth store shape)**:
```typescript
import { computed, ref } from 'vue'
import { defineStore } from 'pinia'
import { tauriAuthStorage } from '../api/tauri-auth'
import { authApi } from '../api/auth' // export name assumed
// Assumed existing helpers/types: readStoredUser, writeStoredUser, decodeJwtExp,
// IAuthUser, ITokenPair

type AuthStartupState = 'valid' | 'invalid' | 'error' | 'pending'

// Narrow an unknown caught error to its HTTP status, if it carries one
function statusOf(err: unknown): number | undefined {
  return (err as { status?: number } | null)?.status
}

export const useAuthStore = defineStore('auth', () => {
  // --- State ---
  const accessToken = ref<string | null>(null) // memory only, never persisted
  const user = ref<IAuthUser | null>(readStoredUser()) // localStorage cache for avatar/role
  const startupState = ref<AuthStartupState>('pending')
  const isLoading = ref(false)
  const error = ref<string | null>(null)
  let refreshInFlight: Promise<void> | null = null

  // Date.now() is not reactive, so a computed that only reads it would never
  // re-evaluate; this ticking ref lets shouldRefresh follow the clock.
  const now = ref(Date.now())
  setInterval(() => { now.value = Date.now() }, 30_000)

  // --- Getters ---
  const isAuthenticated = computed(() => !!accessToken.value && !!user.value)
  const accessTokenExp = computed<number | null>(() => decodeJwtExp(accessToken.value))
  const shouldRefresh = computed(() => {
    if (!accessTokenExp.value) return false
    return accessTokenExp.value * 1000 - now.value < 2 * 60 * 1000 // < 2 min
  })

  // --- Mutators ---
  async function _persistTokenPair(pair: ITokenPair): Promise<void> {
    accessToken.value = pair.access_token
    user.value = pair.user
    writeStoredUser(pair.user) // user goes to localStorage (safe, no secret)
    // refresh token goes to Keychain (Tauri) or localStorage (Web)
    await tauriAuthStorage.setRefreshToken(pair.refresh_token)
  }
  function _clear(): void {
    accessToken.value = null
    // do NOT clear user from localStorage (UI shows cached avatar/role)
    // do NOT call tauriAuthStorage.clear here; caller decides
  }

  // --- Actions ---
  async function login(username: string, password: string, rememberMe = false): Promise<void> {
    const pair = await authApi.login(username, password, rememberMe)
    await _persistTokenPair(pair)
    startupState.value = 'valid'
  }
  async function startupCheck(): Promise<AuthStartupState> {
    const refresh = await tauriAuthStorage.getRefreshToken()
    if (!refresh) {
      startupState.value = 'invalid' // not an error, just no token
      return startupState.value
    }
    try {
      // whoami returns { user, access_token, session }
      const result = await authApi.whoami(refresh)
      accessToken.value = result.access_token
      user.value = result.user
      writeStoredUser(result.user)
      startupState.value = 'valid'
    } catch (err) {
      if (statusOf(err) === 401) {
        await tauriAuthStorage.clearRefreshToken()
        startupState.value = 'invalid'
      } else {
        startupState.value = 'error' // network or server issue; keep the token
      }
    }
    return startupState.value
  }
  async function _doRefresh(): Promise<void> {
    const refresh = await tauriAuthStorage.getRefreshToken()
    if (!refresh) {
      _clear()
      throw new Error('no refresh token')
    }
    try {
      const pair = await authApi.refresh(refresh)
      await _persistTokenPair(pair)
    } catch (err) {
      if (statusOf(err) === 401) {
        // reuse detected or all sessions revoked
        await tauriAuthStorage.clearRefreshToken()
      }
      _clear()
      throw err
    }
  }
  // Single-flight: concurrent callers share one rotation. Two parallel
  // /auth/refresh calls would present the same token twice, which the server
  // treats as reuse and answers by revoking every session of the user.
  function silentRefresh(): Promise<void> {
    if (!refreshInFlight) {
      refreshInFlight = _doRefresh().finally(() => { refreshInFlight = null })
    }
    return refreshInFlight
  }
  async function logout(): Promise<void> {
    const refresh = await tauriAuthStorage.getRefreshToken()
    if (refresh) {
      try { await authApi.logout(refresh) } catch { /* server may be down */ }
    }
    await tauriAuthStorage.clearRefreshToken()
    _clear()
    user.value = null // explicit: logged out means no cached user
  }
  function logoutLocal(): void {
    _clear()
    user.value = null
  }

  return {
    accessToken, user, startupState, isLoading, error,
    isAuthenticated, accessTokenExp, shouldRefresh,
    login, startupCheck, silentRefresh, logout, logoutLocal,
  }
})
```
**Approach (api/auth.ts additions)**:
```typescript
// Methods on the existing API client class
async login(username: string, password: string, rememberMe = false): Promise<ITokenPair> {
  return this.request('/auth/login', {
    method: 'POST',
    body: JSON.stringify({ username, password, remember_me: rememberMe }),
  })
}

async whoami(refreshToken?: string): Promise<{ user: IAuthUser; access_token: string; session: SessionInfo | null }> {
  // whoami accepts either an access token (normal call) or a refresh token (cold start).
  // The base client's auth header injection handles access; for the cold-start case
  // a special path sends the refresh token instead.
  // session is null for legacy tokens without a sid (U10).
  return this.requestWithAuth('/auth/whoami', refreshToken)
}

async listSessions(): Promise<SessionInfo[]> { ... }
async revokeSession(sid: string): Promise<void> { ... }
async logoutOthers(): Promise<void> { ... }
async changePassword(oldPassword: string, newPassword: string): Promise<void> { ... }
```
**Approach (api/base.ts interceptor)**:
```typescript
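this.client.interceptors.request.use(async (config) => {
  const auth = useAuthStore()
  // Skip the pre-emptive check for the refresh call itself; otherwise
  // silentRefresh -> /auth/refresh -> interceptor -> silentRefresh would recurse.
  const isRefreshCall = config.url?.endsWith('/auth/refresh') ?? false
  if (!isRefreshCall && auth.shouldRefresh && auth.accessToken) {
    try {
      await auth.silentRefresh()
    } catch {
      // silent refresh failed; let the request go through and the 401 will trigger routing
    }
  }
  if (auth.accessToken) {
    config.headers.Authorization = `Bearer ${auth.accessToken}`
  }
  return config
})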
```
**Test scenarios** (auth.test.ts):
- `login(...)` calls authApi.login with remember_me param
- `login(...)` persists refresh token via tauriAuthStorage.setRefreshToken
- `startupCheck()` with no refresh token → state='invalid'
- `startupCheck()` with valid refresh → state='valid', user populated
- `startupCheck()` with 401 from whoami → state='invalid', refresh token cleared
- `startupCheck()` with network error → state='error', refresh token retained
- `silentRefresh()` succeeds → new access in memory, new refresh in Keychain
- `silentRefresh()` on 401 reuse → all state cleared, refresh token cleared
- `shouldRefresh` is true when access expires in <2 min
- `shouldRefresh` is false when access expires in >2 min or no access
- `logout()` calls authApi.logout then clears Keychain + state
- `logout()` doesn't fail when server is down (best-effort)
- Access token is NEVER written to localStorage (spy on localStorage.setItem)
**Verification**: `npm run test:unit -- auth.test.ts` passes; manual e2e via `npm run tauri dev`.
---
### U8. Frontend: LoginView "Remember me" + Settings sessions UI
**Goal**: User-facing changes to the login page and a new "Active Sessions" panel in settings.
**Requirements**: F2, F7, F8 (user-side).
**Dependencies**: U7 (store + api), U4 (endpoints).
**Files**:
- `src/agentkit/server/frontend/src/views/LoginView.vue` — add "Remember me" checkbox; pass to store.login
- `src/agentkit/server/frontend/src/views/SettingsView.vue` — new section "Active sessions" (or new route `/settings/sessions`)
- `src/agentkit/server/frontend/src/components/settings/ActiveSessionsPanel.vue` — new component
- `src/agentkit/server/frontend/src/components/settings/ChangePasswordPanel.vue` — new component
- `src/agentkit/server/frontend/src/router/index.ts` — add `/settings/sessions` and `/settings/security` routes
- `tests/unit/views/LoginView.test.ts` — checkbox behavior
- `tests/unit/components/ActiveSessionsPanel.test.ts`
**Approach (LoginView additions)**:
```vue
<!-- Label reads: "Remember me (stay signed in for 30 days)" -->
<a-checkbox v-model:checked="form.rememberMe">
  记住我（30 天内免登录）
</a-checkbox>
```
```typescript
async function handleSubmit() {
  await authStore.login(form.username, form.password, form.rememberMe)
  router.replace(redirectTarget())
}
```
**Approach (ActiveSessionsPanel.vue)**:
- On mount: call `authApi.listSessions()`, render table (Device / Last active / Created / [Revoke] button)
- "Current session" row has a badge; revoke button is disabled for the current row
- "Revoke" calls `authApi.revokeSession(sid)` and removes the row
- "Revoke all others" button at the top → calls `authApi.logoutOthers()` and reloads
**Approach (ChangePasswordPanel.vue)**:
- 3 fields: old password, new password, confirm new password
- Submit: `authApi.changePassword(old, new)`
- On success: show a success message noting "其他设备将自动登出" ("other devices will be signed out automatically")
**Test scenarios**:
- `LoginView` renders the checkbox; submitting with it checked passes `rememberMe=true` to store
- `ActiveSessionsPanel` renders a row per session from the API response
- `ActiveSessionsPanel` "Revoke" button calls `authApi.revokeSession(sid)` and removes the row optimistically
- `ActiveSessionsPanel` "Revoke all others" calls `authApi.logoutOthers()` and reloads the list
- `ActiveSessionsPanel` disables Revoke on the current session row
- `ChangePasswordPanel` shows field-level validation errors (mismatched passwords)
- `ChangePasswordPanel` on success shows toast and clears the form
**Verification**: `npm run test:unit -- LoginView ActiveSessionsPanel ChangePasswordPanel` passes; Playwright e2e for the full settings flow.
---
### U9. Admin UI: user sessions management
**Goal**: Admins can see and revoke any user's active sessions.
**Requirements**: F7, F8 (admin-side).
**Dependencies**: U7, U4 (admin endpoints exist), U8 (reuses ActiveSessionsPanel layout).
**Files**:
- `src/agentkit/server/frontend/src/views/admin/UsersView.vue` (or `UserDetailView.vue`) — add "Sessions" tab
- `src/agentkit/server/frontend/src/components/admin/UserSessionsPanel.vue` — admin variant
- `src/agentkit/server/frontend/src/api/admin.ts` — new file
- `tests/unit/components/UserSessionsPanel.test.ts`
**Approach**:
- Reuse `ActiveSessionsPanel` styling; pass an `adminMode` prop that adds:
- Show username in the table header
- Allow revoke of any session including current
- Show revoked sessions with strikethrough
- API: `adminApi.listUserSessions(userId)`, `adminApi.revokeUserSession(userId, sid)`
**Test scenarios**:
- Admin can see all sessions for a user (active + revoked)
- Admin can revoke any session
- Non-admin attempting to call adminApi endpoints gets a clear 403 error in the UI
**Verification**: `npm run test:unit -- UserSessionsPanel` passes; manual e2e with admin login.
---
### U10. Backwards-compat + rollout shim
**Goal**: Existing in-flight clients (without `sid` claim) keep working for one minor version.
**Requirements**: N6.
**Dependencies**: U4 (the back-compat path in `get_current_user`).
**Files**:
- `src/agentkit/server/dependencies.py`: `get_current_user` accepts both with-sid and without-sid JWTs; logs at DEBUG for legacy
- `src/agentkit/server/auth/jwt_utils.py`: `create_token_pair` gains a `legacy_mode` flag for the migration window; `legacy_mode=True` (clients below 0.5.0) omits `sid`, every other token carries `sid`, and the validator keeps accepting old sid-less tokens
- `docs/migrations/2026-06-20-client-version-rollout.md`: new doc explaining the rollout window (server logs a warning when a legacy JWT is accepted)
**Approach**:
- Add an `X-Client-Version` header to all requests (set in `api/base.ts`)
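- Server middleware reads this header; if it is missing (clients that predate this change never send it) or the version is below `0.5.0` (compared numerically, not as strings), it issues a legacy JWT (no sid) so that client doesn't get a 401 it can't handle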
- New clients always get a `sid`-bearing JWT
- After one minor version (~30 days), remove the legacy path in a separate change
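The `X-Client-Version < 0.5.0` check needs a numeric comparison: as plain strings, `"0.10.0" < "0.5.0"` is true, which would wrongly push newer clients onto the legacy path. A minimal sketch, assuming `MAJOR.MINOR.PATCH` version strings; `parse_client_version`, `wants_legacy_jwt`, and `LEGACY_CUTOFF` are hypothetical names. Pre-release suffixes such as `-beta` are ignored, a simplification relative to full semver ordering.

```python
from __future__ import annotations

import re

LEGACY_CUTOFF = (0, 5, 0)  # clients below this version get sid-less JWTs
_VERSION_RE = re.compile(r"^\s*v?(\d+)\.(\d+)\.(\d+)")


def parse_client_version(header: str | None) -> tuple[int, int, int] | None:
    """Parse 'MAJOR.MINOR.PATCH' (any suffix such as '-beta' is ignored)."""
    if not header:
        return None
    m = _VERSION_RE.match(header)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), int(m.group(3))


def wants_legacy_jwt(header: str | None) -> bool:
    """True if the client should receive a legacy (sid-less) JWT."""
    version = parse_client_version(header)
    # Clients that predate the header never send it, so "missing" means legacy;
    # malformed values are treated the same way as the conservative choice.
    return version is None or version < LEGACY_CUTOFF
```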
**Test scenarios**:
- `get_current_user` with a sid-bearing JWT loads the session, validates it, returns the user
- `get_current_user` with a JWT without sid (legacy) accepts it as long as signature + exp are valid
- `get_current_user` with a sid-bearing JWT where the session is revoked → 401
- `get_current_user` with a sid-bearing JWT where the session doesn't exist → 401
- Legacy middleware path issues tokens without `sid` for clients with `X-Client-Version < 0.5.0`
**Verification**: Backwards-compat test using a hand-crafted legacy JWT; new client flow continues to work; manual test with the previous-version frontend.
---
### U11. AuthProvider abstraction layer (extension point for future IdP integration)
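**Goal**: Encapsulate "where users are stored / how passwords are verified / how attributes are synced" behind a pluggable `AuthProvider` adapter. The current implementation is `LocalAuthProvider` (wrapping SQLite + bcrypt); a placeholder `StubOIDCProvider` (every method raises `NotImplementedError`) serves as the interface-contract reference for the future OIDC implementation. The route layer, admin API, and SessionService obtain the provider via `Depends(get_auth_provider)`, so **switching to an IdP later requires no route changes**.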
**Requirements**: F13, F14, F15.
**Dependencies**: None (U1/U3/U4 reference it; it can land in parallel with, before, or after any of U1-U4; landing it early in Phase 1 is recommended because the U1 schema needs the `auth_provider` field).
**Files**:
- `src/agentkit/server/auth/providers/__init__.py`: new package; exports `AuthProvider`, the `get_auth_provider()` factory, `LocalAuthProvider`, `StubOIDCProvider`
- `src/agentkit/server/auth/providers/base.py`: `AuthProvider` Protocol (`name: str` + 4 async methods: `authenticate` / `get_user_by_id` / `sync_user_attributes` / `revoke_user`)
- `src/agentkit/server/auth/providers/local.py`: `LocalAuthProvider` implementation wrapping the existing `auth/password.py` logic (bcrypt verification + users table lookup)
- `src/agentkit/server/auth/providers/oidc_stub.py`: `StubOIDCProvider` placeholder; every method raises `NotImplementedError`, and the docstring points to the checklist for the next-iteration OIDC implementation
- `src/agentkit/server/config.py`: extend `AuthConfig` with `provider: Literal["local", "oidc-stub"] = "local"` (or add an `auth.provider` field)
- `tests/unit/auth/providers/test_base.py`: Protocol conformance checks (`isinstance` against the `runtime_checkable` Protocol) + mock provider cases
- `tests/unit/auth/providers/test_local.py`: full `LocalAuthProvider` unit tests (reusing the `auth/password.py` test scenarios)
- `tests/unit/auth/providers/test_oidc_stub.py`: unit tests asserting that every `StubOIDCProvider` method raises `NotImplementedError`
**Approach (AuthProvider Protocol)**:
```python
# auth/providers/base.py
from typing import Protocol, runtime_checkable

from ..models import User


@runtime_checkable
class AuthProvider(Protocol):
    """Capabilities every auth backend must implement.

    The route layer calls only these methods and does not know whether the
    implementation is SQLite, OIDC, or LDAP. Adding a new IdP only requires a
    new adapter that implements this Protocol.
    """

    name: str  # identifies the provider; written to session.auth_provider

    async def authenticate(self, *, username: str, password: str) -> User:
        """Verify username + password and return the User. Raises InvalidCredentials on failure."""
        ...

    async def get_user_by_id(self, user_id: int) -> User | None:
        """Look up a user by id (used by admin endpoints, session validation, and whoami)."""
        ...

    async def sync_user_attributes(self, user_id: int) -> None:
        """Sync user attributes (department, email, title, etc.).

        LocalAuthProvider: no-op. OidcAuthProvider: pull the latest profile
        from the IdP and write it back locally.
        """
        ...

    async def revoke_user(self, user_id: int) -> None:
        """Disable a user (departure or lockout).

        LocalAuthProvider: UPDATE users SET is_active=0.
        OidcAuthProvider: call the IdP's disable API (future).
        """
        ...
```
**Approach (LocalAuthProvider)**: Move the password-verification logic in `routes/auth.py:201-213` (SQLite SELECT + bcrypt check + load_user) into `LocalAuthProvider.authenticate`. The route layer no longer calls `verify_password` / `load_user` directly; everything goes through the provider. `revoke_user` runs `UPDATE users SET is_active=0` (admin endpoints call it instead of writing the DB directly).
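The "constant-time" login tests in U4 (unknown user and wrong password both → 401) imply that `authenticate` does the same hashing work whether or not the user exists. A minimal sketch of `LocalAuthProvider` along those lines, with loudly labeled substitutions: PBKDF2 from `hashlib` stands in for bcrypt, a dict stands in for the SQLite users table, and `User` / `InvalidCredentials` are simplified stand-ins for the plan's types. The hash check runs in a worker thread via `asyncio.to_thread` so a slow KDF does not block the event loop.

```python
from __future__ import annotations

import asyncio
import hashlib
import hmac
import os
from dataclasses import dataclass

_ITERATIONS = 100_000  # PBKDF2 stand-in for bcrypt's cost factor


class InvalidCredentials(Exception):
    """Single error for unknown user, wrong password, or inactive user."""


@dataclass
class User:
    id: int
    username: str
    password_hash: bytes  # 16-byte salt + PBKDF2 digest
    is_active: bool = True


def hash_password(password: str, salt: bytes | None = None) -> bytes:
    salt = salt or os.urandom(16)
    return salt + hashlib.pbkdf2_hmac("sha256", password.encode(), salt, _ITERATIONS)


def verify_password(password: str, stored: bytes) -> bool:
    salt, digest = stored[:16], stored[16:]
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, _ITERATIONS)
    return hmac.compare_digest(candidate, digest)


_DUMMY_HASH = hash_password("dummy-password-for-timing")


class LocalAuthProvider:
    name = "local"

    def __init__(self, users: dict[str, User]) -> None:
        self._users = users  # stand-in for the SQLite users table

    async def authenticate(self, *, username: str, password: str) -> User:
        user = self._users.get(username)
        stored = user.password_hash if user is not None else _DUMMY_HASH
        # Always run one hash check so unknown users cost as much as known ones
        ok = await asyncio.to_thread(verify_password, password, stored)
        if user is None or not user.is_active or not ok:
            raise InvalidCredentials("invalid username or password")
        return user

    async def get_user_by_id(self, user_id: int) -> User | None:
        return next((u for u in self._users.values() if u.id == user_id), None)

    async def sync_user_attributes(self, user_id: int) -> None:
        return None  # local users have no external profile to sync

    async def revoke_user(self, user_id: int) -> None:
        user = await self.get_user_by_id(user_id)
        if user is not None:
            user.is_active = False
```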
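**Approach (StubOIDCProvider)**: every method raises `NotImplementedError`; the docstring states:
> Not implemented yet. For the next-iteration OIDC integration, rewriting this class is enough; routes / admin / the session table need no changes. With `auth.provider: oidc-stub` configured, the first auth call (for example a login) raises NotImplementedError. This is by design, to prevent accidentally enabling an unfinished feature.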
**Approach (DI factory)**:
```python
# auth/providers/__init__.py
from functools import lru_cache

from ...config import get_settings
from .base import AuthProvider
from .local import LocalAuthProvider
from .oidc_stub import StubOIDCProvider


@lru_cache
def get_auth_provider() -> AuthProvider:
    settings = get_settings()
    provider_name = settings.auth.provider
    if provider_name == "local":
        # get_auth_db(): the existing aiosqlite connection, which must first be
        # refactored into a module-level singleton (import path not decided yet)
        db = get_auth_db()
        return LocalAuthProvider(db)
    if provider_name == "oidc-stub":
        return StubOIDCProvider()
    raise ValueError(f"unknown auth provider: {provider_name}")
```
**Approach (config extension)**:
```yaml
# agentkit.yaml
auth:
  provider: local  # local | oidc-stub (future: oidc-keycloak, oidc-feishu, ...)
  session:
    table: auth_sessions
    access_ttl_seconds: 900
    refresh_ttl_seconds: 604800
    refresh_ttl_remember_me_seconds: 2592000
  jwt:
    secret_env: AGENTKIT_JWT_SECRET
    algorithm: HS256
```
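
One way to give the `auth:` block typed access, sketched with stdlib dataclasses (the real `get_settings()` may well use a different settings library; that part is an assumption). The `refresh_ttl` helper is a hypothetical name: it shows how "Remember me" picks between the 7-day (604800 s) and 30-day (2592000 s) refresh lifetimes. Provider-name validation is deliberately left to `get_auth_provider()`, which already raises `ValueError`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class SessionSettings:
    table: str = "auth_sessions"
    access_ttl_seconds: int = 900                     # 15 min
    refresh_ttl_seconds: int = 604_800                # 7 days
    refresh_ttl_remember_me_seconds: int = 2_592_000  # 30 days

    def refresh_ttl(self, remember_me: bool) -> int:
        """Refresh-token lifetime for a new session ("Remember me" picks the longer one)."""
        return self.refresh_ttl_remember_me_seconds if remember_me else self.refresh_ttl_seconds


@dataclass(frozen=True)
class JwtSettings:
    secret_env: str = "AGENTKIT_JWT_SECRET"  # name of the env var holding the secret, not the secret
    algorithm: str = "HS256"


@dataclass(frozen=True)
class AuthSettings:
    provider: str = "local"  # validated later by get_auth_provider()
    session: SessionSettings = field(default_factory=SessionSettings)
    jwt: JwtSettings = field(default_factory=JwtSettings)


def parse_auth_settings(raw: dict[str, Any]) -> AuthSettings:
    """Build AuthSettings from the already-parsed `auth:` mapping of agentkit.yaml."""
    return AuthSettings(
        provider=raw.get("provider", "local"),
        session=SessionSettings(**raw.get("session", {})),
        jwt=JwtSettings(**raw.get("jwt", {})),
    )
```

Unknown keys under `session:` or `jwt:` raise `TypeError` from the dataclass constructor, so a typo in the YAML fails at startup instead of silently falling back to a default.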
**Test scenarios** (test_base.py + test_local.py + test_oidc_stub.py):
- `LocalAuthProvider` with valid username+password returns User
- `LocalAuthProvider` with wrong password raises `InvalidCredentials`
- `LocalAuthProvider` with unknown username raises `InvalidCredentials`
- `LocalAuthProvider` with inactive user (`is_active=0`) raises `InvalidCredentials`
- `LocalAuthProvider.get_user_by_id` returns the user or None
- `LocalAuthProvider.sync_user_attributes` is a no-op (returns None)
- `LocalAuthProvider.revoke_user` sets `is_active=0` and subsequent `authenticate` fails
- `LocalAuthProvider.name == "local"`
- `StubOIDCProvider.authenticate` raises `NotImplementedError` with helpful message
- `StubOIDCProvider.get_user_by_id` raises `NotImplementedError`
- `StubOIDCProvider.sync_user_attributes` raises `NotImplementedError`
- `StubOIDCProvider.revoke_user` raises `NotImplementedError`
- `StubOIDCProvider.name == "oidc-stub"`
- `get_auth_provider()` with `auth.provider=local` returns `LocalAuthProvider` instance
- `get_auth_provider()` with `auth.provider=oidc-stub` returns `StubOIDCProvider` instance
- `get_auth_provider()` with `auth.provider=unknown` raises `ValueError`
- `get_auth_provider()` is memoized (lru_cache; second call returns same instance)
- `AuthProvider` is `runtime_checkable`: both Local and Stub pass the `isinstance(prov, AuthProvider)` check
- Protocol violation: a class missing the `authenticate` method does NOT pass the `isinstance` check (negative test)
**Patterns to follow**:
- Protocol + runtime_checkable pattern (Python typing best practice)
- DI factory + `lru_cache` singleton (consistent with the existing `get_settings`)
- The `InvalidCredentials` error type goes in `auth/providers/exceptions.py` (new file)
**Verification**:
- `pytest tests/unit/auth/providers/ -v` passes
- `mypy src/agentkit/server/auth/providers/` reports no errors
- Start the dev server with `auth.provider: oidc-stub` → the first `/auth/login` returns 501 Not Implemented (confirms the stub is wired in)
- Start the dev server with `auth.provider: local` → the existing login flow still works
- Admin kick calls `provider.revoke_user(user_id)`; the user's next `authenticate` fails (cross-check `LocalAuthProvider.revoke_user` behavior)
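
The 501 check above implies the login route translates provider exceptions into HTTP statuses. A minimal sketch of that mapping follows; `auth_error_status` and both error-code strings are hypothetical, and only the 501 for `NotImplementedError` comes from this plan.

```python
class InvalidCredentials(Exception):
    """Local stand-in for the planned auth/providers/exceptions.py error."""


def auth_error_status(exc: Exception) -> tuple[int, str]:
    """Map an exception raised by an AuthProvider call to (HTTP status, error code).

    Only the 501 for NotImplementedError comes from the plan; the 401 and the
    error-code strings are illustrative assumptions.
    """
    if isinstance(exc, InvalidCredentials):
        return 401, "invalid_credentials"
    if isinstance(exc, NotImplementedError):
        return 501, "auth_provider_not_implemented"
    raise exc  # anything else is a real bug: let it surface as a 500
```

In a FastAPI route this would sit in a `try/except (InvalidCredentials, NotImplementedError)` around `provider.authenticate(...)` and re-raise as `HTTPException(status_code=..., detail=...)`. Scoping it to the auth routes, rather than registering a global `NotImplementedError` handler, avoids turning unrelated bugs elsewhere into 501s.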
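**Future IdP integration checklist** (reference for the next iteration):
- [ ] `auth/providers/oidc.py`: implement `OidcAuthProvider` (`authenticate` / `get_user_by_id` / `sync_user_attributes` / `revoke_user`)
- [ ] `auth/oauth_routes.py`: `/auth/oauth/{provider}/redirect` and `/auth/oauth/{provider}/callback` endpoints
- [ ] `auth/state_cache.py`: OAuth `state` parameter for CSRF protection (Redis, TTL 5 min)
- [ ] Local-account creation policy for a user's first IdP login (just-in-time provisioning / reject / invite-only)
- [ ] IdP session sync (when the user logs out at the IdP, revoke the local sessions too)
- [ ] Map corporate department / title attributes onto the local `users` table
This iteration delivers only the Protocol, the Local implementation, the Stub placeholder, the DI factory, and placeholders (interface definitions) for items 1-3 above; the rest goes to a separate brainstorm in the next iteration.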
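
For the `state_cache.py` item, a sketch of the intended semantics using an in-process dict in place of Redis (class and method names are hypothetical). Each `state` is random, bound to one provider, expires after 300 s, and is single-use: `consume` deletes it, so a replayed callback is rejected. With Redis the same shape maps to `SET key value EX 300` on issue and `GETDEL` (Redis 6.2+) on consume.

```python
import secrets
import time
from typing import Callable


class OAuthStateCache:
    """In-process stand-in for the planned Redis-backed OAuth state cache."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._states: dict[str, tuple[str, float]] = {}  # state -> (provider, expires_at)

    def issue(self, provider: str) -> str:
        """Create a state for the /redirect step and remember which provider it is for."""
        state = secrets.token_urlsafe(32)
        self._states[state] = (provider, self._clock() + self._ttl)
        return state

    def consume(self, state: str, provider: str) -> bool:
        """Validate a state in the /callback step; it is removed whether or not it is valid."""
        entry = self._states.pop(state, None)
        if entry is None:
            return False
        expected_provider, expires_at = entry
        return expected_provider == provider and self._clock() < expires_at
```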
---
## System-Wide Impact
| Stakeholder | Impact | Mitigation |
|-------------|--------|------------|
| End users (Tauri) | First login → no more login prompts for 7d (30d if "remember me"). | Pre-emptive refresh + Keychain storage prevent the failure modes that broke the existing flow. |
| End users (Web) | Same as Tauri but refresh in localStorage (degraded security). | Document the trade-off; Keychain is Tauri-only. |
| Admins | New capability: see active sessions, kick any user. | UI in admin pages; surface clearly in the Users view. |
| Developers (auth code) | New session module, denylist, cache, **AuthProvider abstraction layer**. | U3 is the single source of truth; routes don't duplicate logic. U11 is the single source of auth backends; routes don't import password.py directly. |
| **Future corporate IdP integration team** | Switching to OIDC / SAML / LDAP only adds an adapter; routes and admin are not rewritten. | U11 Protocol + LocalAuthProvider ship now; next iteration, `auth/providers/oidc.py` just implements the Protocol. |
| Existing in-flight clients | Unaffected during 30-day window. | U10 shim. |
| Server load | +1 cache lookup per request (cached 60s). | Redis-backed cache makes this sub-ms. |
| DB schema | New `auth_sessions` table (with an `auth_provider` column); existing `user_sessions` deprecated. | Alembic migration; keep `user_sessions` reads working for one version. |
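
A sketch of the in-process fallback for the per-request session check, assuming a 60-second TTL and an async `loader(sid) -> bool` that queries `auth_sessions` (all names here are hypothetical). The catch with any such cache is that a revoked session can stay "active" until its entry expires, so the revoke path must invalidate the entry if admin kicks are to take effect on the very next request.

```python
import time
from typing import Awaitable, Callable


class SessionValidityCache:
    """Caches `is this sid still active?` answers for ttl_seconds."""

    def __init__(self, loader: Callable[[str], Awaitable[bool]], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[bool, float]] = {}  # sid -> (active, expires_at)

    async def is_active(self, sid: str) -> bool:
        now = self._clock()
        cached = self._entries.get(sid)
        if cached is not None and cached[1] > now:
            return cached[0]  # hit: no DB query
        active = await self._loader(sid)
        self._entries[sid] = (active, now + self._ttl)
        return active

    def invalidate(self, sid: str) -> None:
        """Call from revoke/logout so the next request re-reads the DB."""
        self._entries.pop(sid, None)
```

An in-process dict only invalidates the worker that handled the revoke; with several server workers, that is the argument for the Redis-backed cache.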
---
## Risks & Dependencies
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| `keyring` crate compatibility issues on Linux without `gnome-keyring` / `kwallet` | Medium | Low (Tauri dev) | Document `apt install gnome-keyring` in README; fallback to localStorage as per KTD-confirmed decision. |
| Tauri WebView localStorage might be cleared on Tauri upgrade | Low | Medium (forces re-login) | Refresh token is in Keychain, not localStorage, so this is no longer a re-login trigger. Only the cached user (avatar) is lost. |
| Refresh token rotation causes concurrent-request races | Medium | Medium (false-positive reuse detection) | The 30s denylist window catches the case; legitimate retries complete in <1s. Add a metric for reuse detection so we can spot flapping. |
| Migration corrupts existing refresh tokens | Low | High (users locked out) | Test migration on a copy of prod DB; preserve `user_sessions` reads for back-compat. |
| Session cap eviction surprises users (they didn't expect to be kicked) | Low | Low (visible at next login) | Make the cap (10) generous; document it; do not log evicted users out silently. |
| Test mocks diverge from real `keyring` behavior | Medium | Medium (CI passes, manual fails) | Use `keyring::mock` feature in CI; document that real-platform testing is manual. |
| JWT secret rotation in dev mode invalidates all sessions | Low | High (Tauri dev loops) | Document the behavior; provide `agentkit doctor` to check. |
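| **Legacy routes still calling `verify_password` / writing the users table directly when the AuthProvider is switched** (KTD-10) | Medium | Medium (must be cleaned up before switching to an IdP) | After U11 lands, all routes must use `Depends(get_auth_provider)`; add a code-review checklist item forbidding routes from calling password/auth functions directly. |
| **`lru_cache` singleton vs. test isolation** (U11) | Low | Low (flaky tests) | Call `get_auth_provider.cache_clear()` (exposed by `lru_cache`) before and after each test fixture in `conftest.py`. |
| **Residual `LocalAuthProvider` dependencies once an IdP takes over** | Low | Low (fine to keep during migration) | Listed explicitly in the U11 checklist: Local stays available as a local emergency account. After OIDC takes over, Local is not deleted; only the routes' default provider changes. |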
### External Dependencies
| Dependency | Version | Required For |
|------------|---------|--------------|
| `keyring` (Rust crate) | 3.x | Tauri Keychain integration (U5) |
| `pyjwt` (Python) | already in use | JWT signing/verification (U2) |
| `aiosqlite` (Python) | already in use | DB layer (U1, U3) |
| `alembic` (Python) | already in use | Migrations (U1) |
| `redis` (Python) | already in use | Session cache (U3) optional; in-process fallback |
| `@tauri-apps/api` (TS) | 2.x | Tauri command invocation (U6) |
---
## Phased Delivery
This plan has natural phasing based on dependency order. Each phase lands as a single PR.
### Phase 1: Backend foundation (U1, U2, U3)
- `auth_sessions` table + migration
- JWT sid/jti claims
- SessionService with rotation + reuse detection
- Redis/in-process cache
- ~3-4 days of work, no frontend changes
**Rollout gate**: Deploy to dev. All existing clients continue to work (legacy JWT path). New login creates `auth_sessions` rows; old `user_sessions` rows are no longer written.
### Phase 2: New endpoints (U4, U10)
- All new auth + admin endpoints
- Backwards-compat shim
- Admin endpoint tests
- ~2 days of work, frontend still on old flow
**Rollout gate**: Deploy to dev. New endpoints are available; old `/auth/login` and `/auth/refresh` still work (with legacy tokens).
### Phase 3: Tauri Keychain (U5, U6)
- Rust commands + Cargo dep
- Frontend tauri-auth adapter
- ~1-2 days of work
**Rollout gate**: Build a new Tauri release. Verify on macOS (Keychain Access.app shows the entry). On Linux without a keyring daemon, manually test the localStorage fallback.
### Phase 4: Frontend refactor (U7, U8, U9)
- Auth store rewrite (3-state, pre-emptive refresh, no access in localStorage)
- LoginView "Remember me"
- Active Sessions panel in Settings
- Admin user sessions panel
- ~3-4 days of work
**Rollout gate**: Frontend rebuild. End-to-end manual test on Tauri (macOS) + Web. Run Playwright suite.
### Phase 5: Cleanup (after one minor version, ~30 days)
- Remove the legacy JWT back-compat path
- Drop the `user_sessions` table
- Update `X-Client-Version` floor
- ~1 day of work
### Phase 6: AuthProvider abstraction layer (U11 + related changes)
> **Phase added 2026-06-20** (merged AuthProvider scope)
- `auth/providers/base.py`: `AuthProvider` Protocol + `runtime_checkable`
- `auth/providers/local.py`: `LocalAuthProvider`, wrapping the existing password-verification logic in `routes/auth.py:201-213`
- `auth/providers/oidc_stub.py`: `StubOIDCProvider`, a `raise NotImplementedError` placeholder
- `auth/providers/__init__.py`: `get_auth_provider()` DI factory (`lru_cache` singleton)
- `config.py`: new `auth.provider: local | oidc-stub` setting
- U1 schema `auth_provider` column (merged into Phase 1 U1)
- U3 SessionService `create_session` accepts an `auth_provider` parameter (merged into Phase 1 U3)
- U4 routes inject `Depends(get_auth_provider)`; admin endpoints call `provider.revoke_user(user_id)` instead of writing the users table directly (merged into Phase 2 U4)
- ~1.5 days of work (can land in parallel with early Phase 1)
**Rollout gate**:
- `pytest tests/unit/auth/providers/ -v` passes
- Start the dev server with `auth.provider: oidc-stub` → the first `/auth/login` returns 501 Not Implemented
- Start the dev server with `auth.provider: local` → the existing login flow is unaffected
- Admin kick calls `provider.revoke_user(user_id)`, and the behavior is equivalent to the previous direct DB UPDATE
**Future IdP integration entry point**: the next-iteration OIDC integration only adds `auth/providers/oidc.py` + `auth/oauth_routes.py` (see the U11 checklist); routes, admin, and the Session table need zero changes.
---
## Open Questions
These are deferred to implementation and tracked here for visibility:
1. **Q1**: Should "Active Sessions" be a tab in Settings or a separate route (`/settings/sessions`)? Plan defaults to a Settings tab; revisit if UX testing suggests otherwise.
2. **Q2**: Should the admin UI show `revoked_reason` for kicked sessions? Plan defaults to YES (audit value); revisit if it adds too much visual noise.
3. **Q3**: Should the cap-eviction trigger a server-side notification (e.g. an `audit_event`)? Plan defaults to writing a row to a future `auth_audit_log` table; for now, just the `revoked_reason='session_cap_eviction'` field is enough.
4. **Q4**: Should `change_password` rate-limit (e.g. 5 attempts per hour)? Out of scope here but worth a follow-up security brainstorm.
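5. **Q5**: macOS Tauri builds need code-signing for Keychain access. The dev binary is unsigned, so Keychain prompts the user to "Always Allow". Plan documents this; production builds must be signed.
6. **Q6 (added 2026-06-20)**: How do the AuthProvider abstraction and the existing password-verification logic in `routes/auth.py:201-213` coexist? Planned approach: first, U11's `LocalAuthProvider` replicates the existing logic exactly (behavior-equivalent); second, the U4 route changes switch all routes over in one go. When U11 lands, write a "behavior equivalence" test suite confirming identical behavior before and after the switch.
7. **Q7 (added 2026-06-20)**: How is the `get_auth_provider()` `lru_cache` singleton isolated in tests? Planned approach: use the `cache_clear()` method that `lru_cache` exposes, calling `get_auth_provider.cache_clear()` in `conftest.py` before and after each test fixture. Do not use `dependency_overrides`, to avoid polluting FastAPI app state.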
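
A self-contained sketch of the Q7 approach. A dict stands in for `get_settings()` and the provider classes are trivial stand-ins; `isolated_auth_provider` is a hypothetical helper name. It swaps the configured provider and clears the memoized factory on entry and exit, so no test sees another test's cached instance. If `get_settings` is also `lru_cache`d, a real fixture would need to clear that cache as well.

```python
from contextlib import contextmanager
from functools import lru_cache
from typing import Iterator

_CONFIG = {"auth.provider": "local"}  # stand-in for get_settings()


class LocalAuthProvider:
    name = "local"


class StubOIDCProvider:
    name = "oidc-stub"


@lru_cache
def get_auth_provider():
    provider_name = _CONFIG["auth.provider"]
    if provider_name == "local":
        return LocalAuthProvider()
    if provider_name == "oidc-stub":
        return StubOIDCProvider()
    raise ValueError(f"unknown auth provider: {provider_name}")


@contextmanager
def isolated_auth_provider(provider_name: str) -> Iterator[None]:
    """Run a block with a different provider; restore config and clear the cache afterwards."""
    previous = _CONFIG["auth.provider"]
    _CONFIG["auth.provider"] = provider_name
    get_auth_provider.cache_clear()
    try:
        yield
    finally:
        _CONFIG["auth.provider"] = previous
        get_auth_provider.cache_clear()
```

In `conftest.py` the same two `cache_clear()` calls would sit around the `yield` of an `@pytest.fixture(autouse=True)` function.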
---
## Sources & Research
### Codebase references
- [src/agentkit/server/auth/models.py](src/agentkit/server/auth/models.py): current `UserSessionModel` + aiosqlite bootstrap pattern
- [src/agentkit/server/auth/jwt_utils.py](src/agentkit/server/auth/jwt_utils.py): current JWT issuance
- [src/agentkit/server/routes/auth.py](src/agentkit/server/routes/auth.py): current login/refresh/logout/me
- [src/agentkit/server/auth/password.py](src/agentkit/server/auth/password.py): bcrypt cost=12
- [src/agentkit/server/auth/dependencies.py](src/agentkit/server/auth/dependencies.py): `require_authenticated`
- [src/agentkit/server/app.py:928](src/agentkit/server/app.py#L928): router registration
- [src/agentkit/server/frontend/src/stores/auth.ts](src/agentkit/server/frontend/src/stores/auth.ts): current Pinia store
- [src/agentkit/server/frontend/src/router/index.ts:166-189](src/agentkit/server/frontend/src/router/index.ts#L166-L189): route guard
- [src/agentkit/server/frontend/src/views/LoginView.vue](src/agentkit/server/frontend/src/views/LoginView.vue): login page
- [src/agentkit/server/frontend/src/api/auth.ts](src/agentkit/server/frontend/src/api/auth.ts): frontend auth API client
- [src/agentkit/server/frontend/src/api/base.ts](src/agentkit/server/frontend/src/api/base.ts): base API client + interceptor
- [src/agentkit/server/frontend/src-tauri/Cargo.toml](src/agentkit/server/frontend/src-tauri/Cargo.toml): current Tauri deps
- [src/agentkit/server/frontend/src-tauri/src/lib.rs](src/agentkit/server/frontend/src-tauri/src/lib.rs): current Tauri command registration
### External references
- OWASP JWT Security Cheat Sheet: refresh token rotation, denylist patterns
- Auth0 Refresh Token Rotation docs (https://auth0.com/docs/secure/tokens/refresh-tokens/refresh-token-rotation)
- `keyring` crate v3 docs (https://docs.rs/keyring/latest/keyring/): cross-platform credential storage
- Tauri 2.x Capabilities system: command allowlisting (https://v2.tauri.app/security/capabilities/)
### Institutional learnings
- Project context: [AGENTS.md](AGENTS.md) + [.trae/rules/project_rules.md](.trae/rules/project_rules.md): security and async generator safety rules apply
- Existing tests: `tests/unit/auth/` + `tests/integration/auth/`: patterns to follow for new test files
- The current `_refreshFailed` sticky flag in [stores/auth.ts:112](src/agentkit/server/frontend/src/stores/auth.ts#L112) is the root cause of the "logged out for no reason" UX; the rewrite in U7 eliminates it by always retrying the refresh before giving up
---
## Acceptance Examples (for the executor / reviewer)
The following end-to-end flows must work after this plan lands. Each is testable in Playwright or manual e2e.
### AE-1: First login → cold start → main app (Covers F1, F3, F10, F11)
1. Launch Tauri (clean state, no Keychain entry)
2. Log in with valid credentials → land on `/agent`
3. Close Tauri window
4. Re-launch Tauri (cold start)
5. **Expected**: brief splash, then `/agent`. No login page seen. Keychain Access.app shows an entry for `com.fischer.agentkit / refresh_token`.
### AE-2: Token expiry mid-session → silent refresh (Covers F10)
1. Log in; access token exp 15 min
2. Wait 13 minutes (or manually expire the token in DB)
3. Make an API call (e.g. fetch conversations)
4. **Expected**: request succeeds (silent refresh happened before the call); no 401 surfaced to the user.
### AE-3: Refresh token reuse → all sessions revoked (Covers F5, F9)
1. Log in from Tauri (session A)
2. Log in from Web (session B)
3. Copy A's refresh token from Keychain
4. Wait for A to refresh once legitimately (A's old refresh is now in the 30s denylist, and A has a new refresh)
5. Try to use the copied old refresh token
6. **Expected**: 401 with `error: "token_reuse_detected"`. A's session is revoked. B's session is also revoked. Both clients get bounced to /login.
### AE-4: Password change → other device kicked (Covers F9)
1. Log in from Tauri (session A) and Web (session B) as the same user
2. From A, change password
3. From B, make any API call
4. **Expected**: B gets 401 and is bounced to /login. A continues to work.
### AE-5: Admin kicks a session (Covers F7, F8)
1. User logs in from Tauri and Web
2. Admin opens the Users view, selects the user, opens the Sessions tab
3. Admin clicks "Revoke" on the Tauri session
4. **Expected**: Tauri client's next API call returns 401 and the client is bounced to /login. Web session is unaffected.
### AE-6: Remember me toggle (Covers F2)
1. Log in with "Remember me" UNCHECKED
2. **Expected**: refresh token exp is 7 days
3. Log out, log in with "Remember me" CHECKED
4. **Expected**: refresh token exp is 30 days
### AE-7: Session cap eviction (Covers F12 + the cap)
1. Log in 10 times from 10 different simulated clients (use curl with different User-Agent headers)
2. **Expected**: 10 sessions exist, all active
3. Log in an 11th time
4. **Expected**: the oldest non-current session is revoked (visible in DB with `revoked_reason='session_cap_eviction'`); the 10 active sessions are now the 2nd through 10th plus the new 11th
### AE-8: Web fallback to localStorage (Covers F4)
1. Open the app in a browser (not Tauri)
2. Log in
3. **Expected**: `localStorage.getItem('agentkit.refresh_token')` returns the token. DevTools shows the value.
4. (Note: this is the documented degraded security model for Web clients)
### AE-9: Old client still works during migration (Covers N6)
1. Build a previous-version frontend
2. Log in (gets a legacy JWT without sid)
3. Make API calls
4. **Expected**: server validates the legacy JWT via the back-compat path; user is not affected
5. Server log shows DEBUG: "Legacy JWT without sid; using exp-only validation"
### AE-10: AuthProvider switch (local → oidc-stub verifies the interface contract) (Covers F13, F14)
> **Added 2026-06-20** (validates KTD-10 / U11)
1. Set `auth.provider: local` in `agentkit.yaml` and start the dev server
2. `POST /auth/login` with the existing admin account
3. **Expected**: 200 OK, returns a TokenResponse; in the DB, `auth_sessions.auth_provider='local'`
4. Change the config to `auth.provider: oidc-stub` and restart the dev server
5. `POST /auth/login` with the same account
6. **Expected**: 501 Not Implemented (StubOIDCProvider raises NotImplementedError)
7. Call the admin endpoint `/admin/users/{id}/sessions` and check that it still lists the session created in step 3, with `auth_provider='local'`
8. **Expected**: the admin session list is unaffected by the provider switch (the core promise of KTD-10)
9. Check `isinstance(provider_instance, AuthProvider)` for both the Local and the Stub instance
10. **Expected**: both return `True` (the `runtime_checkable` Protocol behaves correctly)
### AE-11: `auth_provider` audit field is written (backfilled and new rows) (Covers F15)
1. After completing AE-1 steps 1-2, call `GET /auth/sessions` to list all active sessions of the current user
2. **Expected**: every session includes `auth_provider: "local"`, including rows backfilled from `user_sessions` (the backfill relies on the column default)
3. As admin, call `GET /admin/users/{id}/sessions` for that user
4. **Expected**: every session carries the `auth_provider` field, and the admin can filter by provider (only `local` exists today; `oidc-*` values will appear once OIDC is integrated)
5. `SessionService.list_active_by_provider('local')` returns all local sessions
6. **Expected**: count equals the total seen in step 2 (on a dev DB where only this user has active sessions)
7. `SessionService.list_active_by_provider('oidc-stub')` returns an empty list under the current implementation
8. **Expected**: count = 0 (the field exists, but no sessions carry that value until OIDC is integrated)
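
The query behind AE-11 steps 5-8 might look like the sketch below. Assumptions: stdlib `sqlite3` instead of aiosqlite, a trimmed `auth_sessions` schema with `expires_at` / `revoked_at` as Unix seconds, and `now` passed in explicitly (the plan's `SessionService.list_active_by_provider` takes only the provider). The `DEFAULT 'local'` on the column is what makes rows backfilled from `user_sessions` report `local` without the backfill setting it.

```python
import sqlite3

# Assumed subset of the U1 auth_sessions schema; only the columns this query needs.
SCHEMA = """
CREATE TABLE auth_sessions (
    sid            TEXT PRIMARY KEY,
    user_id        INTEGER NOT NULL,
    auth_provider  TEXT NOT NULL DEFAULT 'local',  -- backfilled rows fall back to 'local'
    expires_at     INTEGER NOT NULL,               -- unix seconds
    revoked_at     INTEGER                         -- NULL while the session is live
)
"""


def list_active_by_provider(conn: sqlite3.Connection, provider: str, now: int) -> list[str]:
    """Return sids of live (not revoked, not expired) sessions created via `provider`."""
    rows = conn.execute(
        "SELECT sid FROM auth_sessions"
        " WHERE auth_provider = ? AND revoked_at IS NULL AND expires_at > ?"
        " ORDER BY sid",
        (provider, now),
    ).fetchall()
    return [sid for (sid,) in rows]
```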