feat(arcrun): recipe system + resumable workflow + component registry canon
Three new platform capabilities + one component (kbdb_get) to enable
real AI workflow execution through cypher binding YAML.
## Recipe System (container + recipe pattern)
SDD: .agents/specs/recipe-system/
- prompt_recipe schema (Zod): fragments + inputs + assembly + output
- recipe-expander.ts: expand recipe ref → real prompt by fetching KBDB blocks
+ pulling context fields with transforms (pluck_content / extract_field / etc)
- 7 transform whitelist: json_array / to_string / join / markdown_list /
extract_field / first / pluck_content
- graph-executor hooks: detect node.data.recipe → expand → inject into ctx
- output JSON parser (with markdown fence stripping for Claude-wrapped JSON)
- Stored in RECIPES KV under prompt_recipe:{name}
## Resumable Workflow (webhook callback resume)
SDD: .agents/specs/resumable-workflow/
- WorkflowPaused class + paused-runs.ts (persist/load/consume in EXEC_CONTEXT KV, 24h TTL)
- graph-executor: detect {pending:true, task_id} → persist state → throw WorkflowPaused
- cypher-handlers: catch → return {success:true, paused:true, task_id, run_id}
- POST /workflows/resume route: consume KV state → resumeFromPaused()
- Auto-inject callback_url for claude_api nodes (PUBLIC_BASE_URL or default cypher.arcrun.dev)
- claude_api/main.go: forward callback_url to Mira daemon, default timeout 25s→120s
- Idempotent (consume = load+delete)
## Component Registry Canon
SDD: .agents/specs/component-registry-canon/
- Add POST /components/index-only endpoint (metadata-only, no wasm/sandbox)
- Backfill script (mjs): scan registry/components/*/contract.yaml → submit to KV
- register-component.sh: SSOT for local + CI hook (deploy.yml change in next commit)
- Drop R2 dead storage from submitComponent + types + wrangler
- Schema relaxed: category enum + auth/ai/platform; cold_start 50→500ms; size 2→8MB
## kbdb_get component
- registry/components/kbdb_get/: TinyGo WASM, two modes (block_id / page_name list)
- .component-builds/kbdb_get/: WASI shim worker (kbdb-get.arcrun.dev)
End-to-end validation: AI uses MCP execute_workflow with recipe ref →
cypher-executor expands prompt from KBDB schema/skill blocks + drafts →
claude_api calls Mira daemon → daemon callback fires resume route →
workflow continues. Verified with real 2KB+ Karpathy LLM Wiki draft.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -0,0 +1,162 @@
# SDD: arcrun Component Registry Canon

> Created 2026-05-07. Root problem found while dogfooding: the registry is up but its index is empty, so an AI that cannot find a component falls back to writing Python.
> Scope: **make the registry the SSOT for component metadata**, including the u6u → arcrun rebrand.

---

## 1. Problem

### 1.1 Symptoms

- `registry.arcrun.dev/components/search?q=*` always returns 0 results
- MCP `u6u_search_components` finds no components
- `acr parts list` is empty as well

### 1.2 Root cause

More than 30 components under `matrix/arcrun/registry/components/` are already deployed as standalone Workers (kbdb_ingest, claude_api, kbdb_create_block, kbdb_patch_block, http_request, string_ops, ...), but **their contract.yaml files were never submitted to the registry index through `POST /components/submit`**.

Deployment path:

```
registry/components/{name}/main.go      ← component written in TinyGo
  ↓ tinygo build
.component-builds/{name}/component.wasm
  ↓ wrangler deploy
{name}.arcrun.dev (Worker)              ← component is now callable over HTTP

registry index?                         ← this step was never done
```

### 1.3 Impact (dogfooding observations)

- A new AI (Claude / Gemini / Codex) arrives with no idea which components exist → it writes its own Python and hits the API directly
- arcrun's "AI-first self-service" pitch falls apart
- No amount of documentation fixes this: a README can only describe concepts, while the component list has to come from a live API query

---

## 2. Goals

**The registry is the SSOT for component metadata:**

- A component Worker is running ⇔ the registry has a matching entry (two-way binding)
- Through MCP `search_components`, an AI can always find every active component
- The README hardcodes no counts; a dynamic badge reflects the current number
- A third party can find a first usable component within 30 seconds of installing the MCP

---

## 3. Three-layer design

### Layer 1: One-off backfill (Phase 1)

Scan `matrix/arcrun/registry/components/*/component.contract.yaml` and POST each contract into the registry index.

Tool: `matrix/arcrun/registry/scripts/backfill-index.ts`
- Read file → parse YAML → call the registry submit endpoint
- Idempotent: existing entries are not written again (the registry side must support upsert)
- Skips sandbox acceptance (these components are already verified and deployed, so the gherkin tests need not rerun)

### Layer 2: Register on deploy (Phase 2)

Change `.github/workflows/deploy.yml`:
- After the generic scan finds `.component-builds/{name}/wrangler.toml` and the deploy succeeds
- A post-deploy step automatically calls registry submit (the contract is read from `registry/components/{name}/component.contract.yaml`)

Component Worker deployed ⇒ registry updated automatically. "Deployed but unknown to the registry" can no longer happen.

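The register-on-deploy step has to tolerate registry failures (the acceptance criteria call this degraded mode: warn, do not fail). A minimal sketch of that behaviour follows. `registerAfterDeploy`, `ComponentContract` and the injected `SubmitFn` are hypothetical names introduced for illustration; the real hook may be structured quite differently.

```typescript
// Hypothetical shape of a parsed component.contract.yaml; only the field used here.
interface ComponentContract {
  canonical_id: string;
}

// The transport is injected so the sketch runs without a network.
// In CI this would wrap an HTTP POST to the registry (assumption).
type SubmitFn = (contract: ComponentContract) => Promise<{ status: number }>;

interface RegisterOutcome {
  registered: boolean;
  warning?: string;
}

// Tries to register a component after a successful deploy.
// It never throws: every failure is downgraded to a warning so that the
// deploy itself is not blocked by a registry problem.
async function registerAfterDeploy(
  contract: ComponentContract,
  submit: SubmitFn,
): Promise<RegisterOutcome> {
  try {
    const res = await submit(contract);
    if (res.status >= 200 && res.status < 300) {
      return { registered: true };
    }
    return {
      registered: false,
      warning: `registry returned HTTP ${res.status} for ${contract.canonical_id}`,
    };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return {
      registered: false,
      warning: `registry unreachable for ${contract.canonical_id}: ${reason}`,
    };
  }
}
```

A CI wrapper would log `warning` when present and still exit successfully, which keeps the deploy and the index update loosely coupled.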
### Layer 3: Discoverability (Phase 3)

- Remove hardcoded figures such as "21 components" from the README; say "run search to see the current list" instead
- Add a badge endpoint `registry.arcrun.dev/badge/components.svg` that shows the live count
- Open the MCP `get_component_guide` with an iron rule: "search before you build; do not guess"
- Onboarding kit GitHub template: the CLAUDE.md / .cursor/rules / AGENTS.md trio, all enforcing search-first

### Layer 4: u6u → arcrun Rebrand (Phase 4)

Move `matrix/u6u-mcp/` and every `u6u_*` tool name into the arcrun namespace.

Rationale:
- u6u was a placeholder name from before arcrun.dev was registered and is now obsolete
- Mixed naming gets in the way of adoption ("why are the arcrun docs full of u6u_*?")
- Third parties who see u6u cannot tell it is the same product

Scope:
1. Directory: `matrix/u6u-mcp/` → `matrix/arcrun-mcp/`
2. Worker name: `u6u-mcp` → `arcrun-mcp`
3. Tool prefix: `u6u_search_components` → `arcrun_search_components` (14 tools)
4. Hostname: `mcp.finally.click` → `mcp.arcrun.dev` (finally.click keeps redirecting to arcrun.dev during the transition period)
5. Repo / Worker internal ID: u6u-mcp-server → arcrun-mcp-server
6. README, full text: u6u → arcrun
7. Related mentions in user memory (CLAUDE.md / MEMORY.md) are updated as well
8. inkstone-component-registry (old worker) is retired → arcrun-registry is the only active one

**Rebrand principles:**
- Client-side config (claude_desktop_config.json etc.) gets a transition period: both URLs stay up, and the old one returns a deprecation header pointing to the new one
- The tool prefix `u6u_*` → `arcrun_*` gets no transition period (a hard cut, because the prefix is read by the AI, not held in users' muscle memory)
- All references in docs and the repo change immediately

---

## 4. Scope boundaries

**In scope for this SDD:**
- ✅ Phase 1: backfill index
- ✅ Phase 2: register-on-deploy hook
- ✅ Phase 3: README + badge + onboarding kit
- ✅ Phase 4: u6u → arcrun rebrand (directory / worker / hostname / tool prefix / docs)

**Out of scope:**
- New component development (that belongs to the polaris business side)
- Changes to the registry KV schema (the existing structure is used)
- Rebrand of u6u-gui (same monorepo as u6u-mcp, but a separate SDD)
- Phase 5 (R2 upload of user-built components): a separate SDD after Phase 4 completes

**Prerequisites (done):**
- ✅ u6u-mcp Zod 4 → Zod 3 fix (2026-05-07)
- ✅ u6u-mcp service binding repointed to arcrun-registry (2026-05-07)
- ✅ arcrun-registry Worker deployed at registry.arcrun.dev

---

## 5. Acceptance criteria

### Phase 1 acceptance
- `u6u_search_components("kbdb")` returns a non-empty result containing `kbdb_ingest` / `kbdb_create_block` / `kbdb_patch_block`
- The `acr parts list` CLI lists components end to end
- At least 30 entries in the registry KV

### Phase 2 acceptance
- After deploying any existing component, the registry reflects the update within 30 seconds
- Deploying a brand-new component needs no manual publish; the registry picks it up automatically
- The CI workflow does not block a deploy when the registry write fails (degraded mode: log a warning, do not fail)

### Phase 3 acceptance
- The README contains no hardcoded figures such as "21 components" or "30 components"
- The badge SVG renders correctly and its number matches the KV
- Cloning the onboarding kit and following its README lists components within 30 seconds

### Phase 4 acceptance
- `mcp.arcrun.dev/mcp/mcp` responds, and every tool name it returns is `arcrun_*`
- The old `mcp.finally.click/mcp/mcp` still works but returns a deprecation header
- No occurrence of u6u remains in README / docs / GUIDE
- The `matrix/u6u-mcp/` directory no longer exists; it is now `matrix/arcrun-mcp/`
- The arcrun MCP config example in user memory (`~/.claude/.../MEMORY.md`) is updated

---

## 6. Risks and mitigations

| Risk | Mitigation |
|---|---|
| After the backfill loads the contracts, sandbox acceptance overwrites existing data | Add a `skip_acceptance=true` flag to registry submit, used only by the backfill |
| A failed write in the deploy hook blocks the deploy | Hook degraded mode: on failure only warn, do not fail the deploy |
| The rebrand breaks active clients | Transition period: the old hostname and worker coexist for 1 month |
| AI adjustment period after the tool prefix rename | No transition, a hard cut (the prefix falls under system instructions; an AI learns it in one prompt) |
| Existing user configs hardcode finally.click | Advance notice + transition period + automatic redirect / proxy from the old endpoint |

---

## 7. Changelog

| Version | Date | Changes |
|---|---|---|
| v1.0 | 2026-05-07 | Initial version. Dogfooding found the registry empty; the three-layer design (backfill / auto-register / discoverability) and the u6u → arcrun rebrand are included together. |
@@ -0,0 +1,159 @@
# Tasks: Component Registry Canon

> SDD: [design.md](design.md)
> Last updated: 2026-05-07

**Status legend**: `[ ]` to do / `[🔄]` in progress / `[x]` done

---

## Phase 0: Prerequisites (done)

- [x] 0.1 Downgrade u6u-mcp from Zod 4 to Zod 3 to fix the tools/list `_zod undefined` bug (2026-05-07)
- [x] 0.2 u6u-mcp service binding `inkstone-component-registry` → `arcrun-registry` (2026-05-07)
- [x] 0.3 Confirm `mcp.finally.click/mcp/mcp` works end to end and tools/list returns 14 tools (2026-05-07)

---

## Phase 1: Backfill Index (half a day, immediate payoff)

- [x] 1.1 Survey the existing registry endpoints. Findings:
  - The existing `POST /components` requires wasm bytes (multipart or base64), runs sandbox acceptance, and writes to both R2 and KV
  - cypher-executor no longer loads wasm dynamically from R2 (line 32 marks the R2 path as obsolete; components use standalone Worker URLs)
  - Conclusion: R2 is legacy; the registry's real job is to be a metadata index for AI search
  - Decision: **add a new endpoint `POST /components/index-only`** that takes a contract (no wasm, no sandbox), dedicated to the backfill and to "deployed but not indexed" components
  - [x] 1.1.1 Add `src/actions/indexOnlyComponent.ts` (metadata-only KV write, idempotent)
  - [x] 1.1.2 Add the `POST /index-only` route in `src/routes/components.ts`
  - [x] 1.1.3 Deploy + smoke test (contract validation and error handling pass)
- [x] 1.2 Write `matrix/arcrun/registry/scripts/backfill-index.mjs` (zero-build node script using js-yaml)
- [x] 1.3 Dry run confirms all 30 components parse
- [x] 1.4 Run the real backfill (two schema problems were found and fixed along the way):
  - schema enum `category` gains `auth` / `ai` / `platform` (types.ts)
  - `max_cold_start_ms` upper limit relaxed 50 → 500 (auth/ai components include crypto and need it)
  - `no_network_syscall` / `no_filesystem_syscall` made optional
  - `max_size_kb` upper limit relaxed 2048 → 8192
  - The index-only route fills in placeholders for components missing gherkin/description/tags (indexing is not blocked)
- [x] 1.5 Verify: MCP `u6u_search_components("kbdb")` returns 3 components (kbdb_ingest / kbdb_create_block / kbdb_patch_block)
- [ ] 1.6 Verify: the `acr parts list` CLI lists components end to end
- [x] 1.7 Verify: 30 entries in the registry KV (30 created + 30 idx, 60 keys in total)

---

## Phase 1.5: Drop R2 dead storage (before Phase 2, to close the architectural gap)

> Added 2026-05-07. The R2 wasm path has long been dead (cypher-executor does not read from R2); keeping it only misleads AIs.
> "Phase 5: R2 upload of user-built components" in the SDD design.md is retired along with it.

- [x] 1.5.1 Change `submitComponent.ts`: remove the R2 write section, keep the KV write
- [x] 1.5.2 Remove the `[[r2_buckets]] WASM_BUCKET` binding from `wrangler.toml`
- [x] 1.5.3 Remove `WASM_BUCKET: R2Bucket` from Bindings in `types.ts`
- [x] 1.5.4 Keep the existing `wasm_r2_key` field as deprecated (queryComponents still reads legacy records)
- [ ] 1.5.5 Retire the `arcrun-wasm` R2 bucket (after a 30-day observation period → `wrangler r2 bucket delete` after 2026-06-07)
- [x] 1.5.6 Deploy + smoke test: search passes end to end (kbdb finds 3 components)

## Phase 2: Register on deploy (1-2 days)

- [x] 2.1 Choose the approach: a CI step (GitHub Actions) that curls `/index-only` after wrangler deploy
- [x] 2.2 Write `registry/scripts/register-component.sh` (SSOT shared by local and CI; python3 + pyyaml parse the YAML, curl POSTs to the registry)
- [x] 2.3 Change `.github/workflows/deploy.yml`: add a "Register component in registry" step after the tier1 deploy step (degraded mode: a failure only warns)
- [x] 2.4 Verify locally: `bash scripts/register-component.sh kbdb_ingest` → 200 + already_indexed
- [ ] 2.5 Push a genuinely new component to verify the CI hook end to end (has to wait until the next component is added)
- [ ] 2.6 Documentation: `docs/contributing-components.md`, "Standard process for adding a component"
- [ ] 2.7 Retire the "manual publish required" assumption in the `u6u_publish_component` tool (done together with the rebrand)

---

## Phase 3: Discoverability (half a day)

- [ ] 3.1 Update the GitHub `richblack/arcrun` README
  - Remove hardcoded figures such as "21 components"
  - Add "run `acr parts list` or MCP search to see the current list"
  - Add badge: ``
- [ ] 3.2 Add `matrix/arcrun/registry/src/routes/badge.ts`
  - GET `/badge/components.svg` returns a shields.io-style SVG
  - The count is queried live from KV
  - Cache for 1 minute (`Cache-Control: max-age=60`)
- [ ] 3.3 Update the MCP `u6u_get_component_guide` tool (to be renamed `arcrun_*` later)
  - Open with "Iron rule: run search_components before you build; publish only if nothing is found"
- [ ] 3.4 Onboarding kit GitHub template repo (suggested name `arcrun-quickstart`)
  - The trio: CLAUDE.md / `.cursor/rules/arcrun.mdc` / AGENTS.md
  - Enforced: "Before calling Claude or any AI, list the MCP tools first; when the arcrun MCP is connected, **calling the HTTP API directly from Python is forbidden**"
  - Ships with an example hello workflow and component
- [ ] 3.5 Write the onboarding doc: `docs/onboarding-third-party-engineer.md`
  - How a third-party engineer gets an AI using arcrun within 30 seconds

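Task 3.2 describes a badge route that returns an SVG with a one-minute cache. The sketch below shows one way the response could be built once the count is known. `renderComponentsBadge` and `BadgeResponse` are hypothetical names, the pixel widths are rough guesses rather than the actual shields.io template, and the KV count query is left out entirely.

```typescript
// Plain-object response so the sketch does not depend on Worker or DOM types.
interface BadgeResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Renders a flat two-segment badge: a grey "components" label and a green count.
function renderComponentsBadge(count: number): BadgeResponse {
  const label = "components";
  const value = String(count);
  // Approximate 6px per character plus 10px padding (assumption, not shields.io's metrics).
  const labelWidth = 6 * label.length + 10;
  const valueWidth = 6 * value.length + 10;
  const total = labelWidth + valueWidth;

  const body =
    `<svg xmlns="http://www.w3.org/2000/svg" width="${total}" height="20" role="img" aria-label="${label}: ${value}">` +
    `<rect width="${labelWidth}" height="20" fill="#555"/>` +
    `<rect x="${labelWidth}" width="${valueWidth}" height="20" fill="#4c1"/>` +
    `<g fill="#fff" font-family="Verdana,sans-serif" font-size="11" text-anchor="middle">` +
    `<text x="${labelWidth / 2}" y="14">${label}</text>` +
    `<text x="${labelWidth + valueWidth / 2}" y="14">${value}</text>` +
    `</g></svg>`;

  return {
    status: 200,
    headers: {
      "Content-Type": "image/svg+xml",
      // One-minute cache, as specified in task 3.2.
      "Cache-Control": "max-age=60",
    },
    body,
  };
}
```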
---

## Phase 4: u6u → arcrun Rebrand (1 day, done last)

> The plan is to start the rebrand only after Phases 1-3 are verified, to avoid changing and verifying at the same time.

### 4.1 Repo & Worker
- [ ] 4.1.1 `git mv matrix/u6u-mcp matrix/arcrun-mcp` (or cp + rm, depending on git history preference)
- [ ] 4.1.2 Change `matrix/arcrun-mcp/wrangler.toml`:
  - `name = "u6u-mcp"` → `name = "arcrun-mcp"`
  - Add route `mcp.arcrun.dev/*`; keep the old `studio.finally.click/mcp*` for 1 month
- [ ] 4.1.3 Change `package.json`: `@inkstone/u6u-mcp-worker` → `@arcrun/mcp-worker`

### 4.2 Tool prefix rename
- [ ] 4.2.1 Rename the 14 tool files: `u6u_*.ts` → `arcrun_*.ts`
- [ ] 4.2.2 Inside each tool, change `server.tool("u6u_xxx", ...)` to `server.tool("arcrun_xxx", ...)`
- [ ] 4.2.3 Update every import path in `src/tools/registry.ts`
- [ ] 4.2.4 In `src/index.ts`, change `serverInfo.name` from `u6u-mcp-server` to `arcrun-mcp-server`

### 4.3 Docs
- [ ] 4.3.1 README.md, full text: u6u → arcrun
- [ ] 4.3.2 GUIDE.md: same
- [ ] 4.3.3 Add an MCP section to the GitHub `richblack/arcrun` README (not mentioned before)
- [ ] 4.3.4 Update any docs that mention `u6u-mcp` / `mcp.finally.click`

### 4.4 User memory
- [ ] 4.4.1 Add an arcrun MCP entry to `~/.claude/projects/.../memory/MEMORY.md`
  - URL: `https://mcp.arcrun.dev/mcp/mcp`
  - Tool prefix: `arcrun_*`
  - When the finally.click transition period ends
- [ ] 4.4.2 Align the daemon / arcrun / MCP mentions in polaris/mira/CLAUDE.md with the new naming

### 4.5 Transition period (the old endpoint is not removed immediately)
- [ ] 4.5.1 The old `mcp.finally.click/mcp/mcp` adds the response headers `Deprecation: true` + `Link: <https://mcp.arcrun.dev/mcp/mcp>; rel="successor-version"`
- [ ] 4.5.2 The old worker keeps serving for 30 days (until 2026-06-07)
- [ ] 4.5.3 Retirement schedule: after 2026-06-07 the old worker returns 410 Gone with a hint to switch to the new URL

### 4.6 Verification
- [ ] 4.6.1 `mcp.arcrun.dev/mcp/mcp`: initialize + tools/list + one tool call all succeed
- [ ] 4.6.2 Switch my Claude Code config to the new URL and test end to end with `mcp__arcrun__search_components`
- [ ] 4.6.3 grep `u6u` in `matrix/arcrun-mcp/` returns 0 matches (except changelog records)

---

## Risk tracking

- Risk 1: the backfill run reveals contract.yaml files whose format differs from what the registry expects → Mitigation: dry-run first and add contract fields where needed
- Risk 2: user client configs get confused during the Phase 4 rebrand → Mitigation: transition period + Deprecation header
- Risk 3: a failing auto-register hook blocks the deploy → Mitigation: degraded mode (warn, do not fail)

---

## Known Issues (found while dogfooding, recorded for now)

### KI-1: Wrong URL in the u6u-mcp README
- The README says `mcp.finally.click/mcp`; the real URL is `mcp.finally.click/mcp/mcp` (basePath + route)
- Impact: users who follow the README cannot connect after installing
- Fix: correct it during the rebrand

### KI-2: inkstone-component-registry and arcrun-registry coexist
- Both workers are alive, and u6u-mcp was pointing at the wrong one
- inkstone-component-registry is the old version (2026-03-24); arcrun-registry is the active one (2026-04-16)
- Fix: retire the inkstone-component-registry worker once the Phase 1 backfill is complete

### KI-3: Search is not sensitive enough to natural language (first dogfooding finding)
- Symptoms:
  - `search("從 KBDB 讀取或查詢 block")` (a natural-language query meaning "read or query a block from KBDB") → 0 results
  - `search("kbdb")` → 3 results (kbdb_ingest / kbdb_patch_block / kbdb_create_block)
- Root cause: search uses embedding (bge-m3) similarity, but the component list is small (30) and the descriptions are written formally, so the embedding of a full natural-language sentence sits too far from the descriptions
- Impact: **critical**. An AI's first query is always a full natural-language sentence; on 0 results it abandons search and writes Python instead
- Fix (handled in Phase 3):
  1. Add a keyword fallback alongside embedding search (split the query → ILIKE against canonical_id / display_name / tags)
  2. Or lower the threshold (currently SCORE_THRESHOLD = 0.5, possibly too high)
  3. Have MCP get_component_guide teach the AI to "split into keywords and search again when nothing is found"
- Priority: P1 (blocks adoption)

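The first fix proposed for KI-3, a keyword fallback behind the embedding search, can be sketched as follows. Everything here is illustrative: `ComponentEntry`, `keywordSearch` and `searchComponents` are hypothetical names, the embedding scores are passed in rather than computed, and a case-insensitive substring test stands in for the ILIKE query mentioned above.

```typescript
interface ComponentEntry {
  canonical_id: string;
  display_name: string;
  tags: string[];
}

interface ScoredHit {
  entry: ComponentEntry;
  score: number;
}

// Threshold value quoted in KI-3.
const SCORE_THRESHOLD = 0.5;

// Splits a query on whitespace and basic punctuation, lowercases it,
// and drops one-character tokens that would match almost anything.
function tokenize(query: string): string[] {
  return query
    .toLowerCase()
    .split(/[\s,.;:!?()]+/)
    .filter((token) => token.length >= 2);
}

// Case-insensitive substring match over id, display name and tags.
function keywordSearch(query: string, index: ComponentEntry[]): ComponentEntry[] {
  const tokens = tokenize(query);
  return index.filter((entry) => {
    const haystack = [entry.canonical_id, entry.display_name, ...entry.tags]
      .join(" ")
      .toLowerCase();
    return tokens.some((token) => haystack.includes(token));
  });
}

// Embedding hits win when any clears the threshold; otherwise fall back to keywords.
function searchComponents(
  query: string,
  embeddingHits: ScoredHit[],
  index: ComponentEntry[],
): ComponentEntry[] {
  const semantic = embeddingHits
    .filter((hit) => hit.score >= SCORE_THRESHOLD)
    .map((hit) => hit.entry);
  return semantic.length > 0 ? semantic : keywordSearch(query, index);
}
```

With this shape, the failing query from KI-3 no longer returns nothing: its `kbdb` and `block` tokens match the three kbdb components even when no embedding hit clears the threshold.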
@@ -0,0 +1,240 @@
# SDD: arcrun Recipe System (container + recipe pattern)

> Created 2026-05-07. A platform gap hit while dogfooding the wiki synthesis workflow.
> Core principle: **one WASM component = a container; the content (the recipe) lives in a database**.
> n8n writes a separate node for every API; arcrun takes the "container + recipe" route to keep the component count down.

---

## 1. Problem

### 1.1 Where it broke

While writing the mira wiki synthesis workflow (7-B):
- Flow: `kbdb_get(stale)` → foreach → `kbdb_get(drafts)` → `claude_api(synthesis prompt)` → `kbdb_ingest`
- Step three has to assemble the prompt: `schema content + skill template + drafts array + existing_entities`
- The `{{var}}` templating built into cypher binding is too weak (top-level only; no nesting, no array → string)
- There is no `string_template` component and no `array_to_markdown` component
- Writing a dedicated `wiki_prompt_builder` component means going down the n8n road: one such component for every AI workflow

### 1.2 Root cause

**The arcrun recipe system covers only the HTTP and auth layers:**

| Recipe kind | Stored in | Container | Status |
|---|---|---|---|
| auth_recipe | RECIPES KV (`auth_recipe:{service}`) | auth_static_key / auth_oauth2 / ... | ✅ exists |
| api_recipe | RECIPES KV (`rec_{hash}`) | http_request | ✅ exists (hardcoded in cypher-executor, pending cleanup in Phase 1-3) |
| **prompt_recipe** | ❌ does not exist | claude_api (container) | **missing** |

The `claude_api` component currently takes `prompt: string` (an already-assembled string); it has no "recipe mode" that would let an AI call it with an assembly recipe.

### 1.3 Impact

- **Critical**: the first wiki synthesis workflow cannot be written (7-B is blocked)
- **The pitch falls apart**: arcrun's external proposition is "container + recipe, the AI writes no code", but the prompt layer cannot deliver it
- **Every future AI workflow will hit the same problem**: rss-tech-news commentary, the River (河道) AI copilot, ai-comment, article summaries, ... all need prompt assembly

---

## 2. Design

### 2.1 Core: prompt_recipe sits alongside auth_recipe / api_recipe

**Storage**: `RECIPES` KV, key format `prompt_recipe:{name}`

**Structure**:
```yaml
id: prompt_recipe:wiki_synthesis
version: v1
description: "Mira wiki synthesis (extract triplets + write wiki paragraph)"
model: sonnet  # haiku / sonnet / opus (claude_api keeps its existing routing)

# Fragments fetched from KBDB / other sources (fetched and inserted at prompt assembly time)
fragments:
  - var: schema
    source: kbdb_block
    block_id: "7a4e456e-1b0f-406a-8842-5e01d1cf1eef"  # mira-wiki-schema
    field: content
  - var: skill_template
    source: kbdb_block
    block_id: "85e3b81e-dca8-4131-bcdc-990bd0d3a16f"  # source-skill-wiki-synthesis
    field: content

# Values taken from the workflow context (input / upstream node output)
inputs:
  - var: drafts  # array of drafts
    from: "ctx.read_drafts.blocks"
    transform: "json_array"  # convert to a JSON array string
  - var: existing_entities
    from: "ctx.read_entities.blocks"
    transform: "extract_field:page_name"  # take page_name from each array item and join into a list
  - var: entity_name
    from: "ctx.loop.item"  # current element of the foreach loop

# The final prompt is built by applying fragments + inputs to skill_template
prompt_assembly:
  system: "{{schema}}"  # use the schema directly as the system prompt
  user: "{{skill_template}}"  # the skill template contains the {{drafts}} {{existing_entities}} {{entity_name}} variables

# Expected output
output:
  format: json  # claude_api parses it into an object automatically
  schema:  # zod-style; returns success:false when parsing fails
    type: object
    required: [triplets, entities, paragraphs, source_summary]
```

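The `transform` values in the recipe above imply a small named whitelist with an optional `name:arg` suffix. A minimal sketch of such a dispatcher follows. The seven names come from the commit message; their exact semantics are assumptions made for illustration (for example, `extract_field` joining with `", "`), and `applyTransform` is a hypothetical function name, not necessarily what `recipe-transforms.ts` exports.

```typescript
type Transform = (value: unknown, arg?: string) => unknown;

// Treats a non-array value as a one-element array.
const asArray = (value: unknown): unknown[] => (Array.isArray(value) ? value : [value]);

// Reads one property from an object-like item, or undefined otherwise.
const field = (item: unknown, key: string): unknown =>
  typeof item === "object" && item !== null
    ? (item as Record<string, unknown>)[key]
    : undefined;

// Whitelist of the seven transform names; the behaviour of each is assumed.
const TRANSFORMS: Record<string, Transform> = {
  json_array: (v) => JSON.stringify(asArray(v)),
  to_string: (v) => (typeof v === "string" ? v : JSON.stringify(v)),
  join: (v, sep = ", ") => asArray(v).map(String).join(sep),
  markdown_list: (v) => asArray(v).map((x) => `- ${String(x)}`).join("\n"),
  extract_field: (v, key = "") => asArray(v).map((x) => String(field(x, key))).join(", "),
  first: (v) => asArray(v)[0],
  pluck_content: (v) => asArray(v).map((x) => field(x, "content")),
};

// Parses "name" or "name:arg" and applies the matching whitelisted transform.
// Anything outside the whitelist is rejected instead of being interpreted.
function applyTransform(spec: string | undefined, value: unknown): unknown {
  if (spec === undefined) return value;
  const colon = spec.indexOf(":");
  const name = colon === -1 ? spec : spec.slice(0, colon);
  const arg = colon === -1 ? undefined : spec.slice(colon + 1);
  const fn = Object.prototype.hasOwnProperty.call(TRANSFORMS, name)
    ? TRANSFORMS[name]
    : undefined;
  if (!fn) throw new Error(`transform not in whitelist: ${name}`);
  return fn(value, arg);
}
```

Keeping the table closed is what stops the transform syntax from growing into a mini DSL: a need that does not fit one of the names becomes a component rather than a new expression form.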
### 2.2 Recipe resolution happens in cypher-executor (architecture option B)

**Design decision** (2026-05-07): recipe resolution and prompt assembly live **in the cypher-executor TS code**; the existing claude_api WASM is not changed.

Rationale:
1. Recipe resolution is the same kind of work cypher-executor already does for `api_recipe / auth_recipe`
2. The existing claude_api is deployed and tested; leaving it alone keeps the blast radius smallest
3. Transform logic (json_array / extract_field, etc.) is about 10 times simpler to write in TS than in TinyGo
4. It does not violate §1.6: the skill is still a KBDB block, and cypher-executor is only the assembler; no prompt is hardcoded

**Flow:**

```
workflow YAML node config contains `recipe: prompt_recipe:xxx`
        │
        ▼
cypher-executor graph-executor.ts
  before running the node → detect the recipe field → go through the recipe expander
        │
        ▼
recipe expander (new module)
  1. Fetch the `prompt_recipe:xxx` definition from RECIPES KV
  2. Per the fragments rules → fetch block content with the existing KBDB client
  3. Per the inputs rules → read values from context + run transforms
  4. Assemble system prompt + user prompt
  5. Use {prompt, model, mira_token, ...} as the node's actual input
        │
        ▼
loader calls the claude_api container (unaware that recipes exist; still takes the old interface)
        │
        ▼
claude_api container → Mira daemon → returns the LLM result
        │
        ▼
graph-executor takes the result → parses JSON / validates schema per the recipe.output rules
```

**Impact on the claude_api container**: none at all. It still takes `{mira_token, prompt, model}`.

**Experience for the workflow author**:
```yaml
config:
  synthesize:
    component: claude_api
    recipe: "prompt_recipe:wiki_synthesis"  # ← cypher-executor detects this field and resolves it automatically
    mira_token: "{{secret.mira_token}}"
```

Without a recipe, the old path applies:
```yaml
config:
  reply:
    component: claude_api
    prompt: "{{ctx.user_message}}"  # ← no recipe; cypher-executor passes it through as is
    mira_token: "{{secret.mira_token}}"
```

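Steps 3 and 4 of the expander flow (read context values, then fill the `{{var}}` placeholders) can be sketched in a few lines. The function names `resolvePath`, `renderTemplate` and `assemblePrompt` are hypothetical, and KV / KBDB access is left out. The second rendering pass on the user prompt is an assumption drawn from the recipe example, where `user` is `{{skill_template}}` and the template itself contains further placeholders.

```typescript
// Resolves a dotted path such as "read_drafts.blocks" against a context object.
// Returns undefined as soon as a segment is missing.
function resolvePath(ctx: Record<string, unknown>, path: string): unknown {
  let current: unknown = ctx;
  for (const key of path.split(".")) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Replaces each {{name}} with its variable. Unknown names are left in place
// so that a missing variable stays visible in the rendered prompt.
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match: string, name: string) => {
    const value = vars[name];
    return value !== undefined ? value : match;
  });
}

interface PromptAssembly {
  system: string;
  user: string;
}

// Builds the final prompts. The user prompt is rendered twice because the
// first pass may only insert the skill template, which carries its own placeholders.
function assemblePrompt(
  assembly: PromptAssembly,
  vars: Record<string, string>,
): { system: string; prompt: string } {
  const firstPass = renderTemplate(assembly.user, vars);
  return {
    system: renderTemplate(assembly.system, vars),
    prompt: renderTemplate(firstPass, vars),
  };
}
```

For example, `resolvePath({ read_drafts: { blocks: [1] } }, "read_drafts.blocks")` walks two levels down, which is the nested lookup that the built-in top-level `{{var}}` templating lacks.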
### 2.3 Workflow YAML experience

```yaml
name: wiki_synthesis
flow:
  # 完成後 reads as "when done", 對每個 as "for each"
  - "input >> 完成後 >> read_stale"
  - "read_stale >> 對每個 >> read_drafts"
  - "read_drafts >> 完成後 >> synthesize"
  - "synthesize >> 完成後 >> write_wiki"
config:
  read_stale:
    component: kbdb_get
    page_name: "mira-wiki-index-stale"
  read_drafts:
    component: kbdb_get
    page_name: "{{loop.item}}"  # entity name
  synthesize:
    component: claude_api
    recipe: "prompt_recipe:wiki_synthesis"  # ← the key point: reference the recipe, do not write a prompt
    mira_token: "{{secret.mira_token}}"
  write_wiki:
    component: kbdb_ingest
    text: "{{prev.paragraphs}}"
```

**To write this workflow, an AI only needs to:**
1. Know that the three containers `kbdb_get / claude_api / kbdb_ingest` exist (findable through MCP search)
2. Know that the recipe `prompt_recipe:wiki_synthesis` exists (findable through MCP search)
3. It does not need to understand how the prompt is assembled or read the wiki schema text

### 2.4 Is a recipe a KBDB block or a KV entry?

**KV is chosen** (the `RECIPES` namespace), consistent with the existing auth_recipe / api_recipe:
- key: `prompt_recipe:{name}`
- value: YAML/JSON
- CLI and MCP manage it with the existing `recipe push` / `recipe list` tools (no new tooling needed)

**Why not a KBDB block:**
- polaris/mira/CLAUDE.md §1.6 does say "source-skill is stored as a KBDB block"
- But §1.6 is about mira's business-level skill templates (schema / skill templates)
- A recipe is an "assembly recipe" (which blocks it points at + how to combine them) and belongs to the platform layer
- A recipe references KBDB block ids **internally** (fragments.source: kbdb_block), so the two layers stay clearly separated

---

## 3. Scope boundaries

**In scope for this SDD:**
- ✅ Phase 1: prompt_recipe schema + RECIPES KV conventions
- ✅ Phase 2: claude_api accepts recipes (backward compatible with the old prompt parameter)
- ✅ Phase 3: write the first recipe, `prompt_recipe:wiki_synthesis`
- ✅ Phase 4: complete the mira 7-B workflow with this recipe
- ✅ Phase 5: add recipe management tools to MCP (list / get / push / delete prompt_recipe)

**Out of scope:**
- Reworking HTTP api_recipe / auth_recipe (already exist, untouched)
- Multimodal prompts (image input): deferred to P2
- Sandbox acceptance for recipes (a recipe is data, not code, so none is needed)

**Prerequisites (done):**
- ✅ kbdb_get component (5.3)
- ✅ component-registry MCP backfill (component-registry-canon Phase 1)

---

## 4. Why this design matters

| n8n | arcrun |
|---|---|
| Gmail node, Slack node, OpenAI node, Anthropic node, one node per LLM, ... (one node per API) | `http_request` container + an api_recipe per service |
| A new node for every LLM usage (chat / completion / embedding) | `claude_api` container + a prompt_recipe per use case |
| The AI has to learn "how to use the Gmail node", "how to use the Slack node", ... | The AI learns "container + recipe" once |
| Component count explodes (500+) | Containers stay fixed (< 30); recipes grow without limit |
| Recipes buried in code | Recipes in KV; the AI can CRUD them directly |

**For AI adoption**: a third-party AI finds "30 containers + 100 recipes" far easier to grasp than "500 nodes", and because a recipe is text data rather than code, writing a recipe is easier for an AI than writing a node.

---

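## 5. Risks and mitigations

| Risk | Mitigation |
|---|---|
| The recipe structure is too complex for an AI to write | Phase 3 writes the first recipe (wiki_synthesis) as a template for future AIs to copy |
| Backward compatibility gives claude_api two code paths | Use the recipe path internally everywhere; the old prompt parameter is converted to an inline recipe automatically |
| Recipes hardcode KBDB block ids, so a changed id breaks them | Identifying KBDB blocks by `page_name` is more stable than by id; recipes support a `block_page_name` field |
| Frequently written transform logic in KV (json_array, extract_field:x) turns into a mini DSL | Limit the number of transforms (under 10), keep a whitelist, and ask for a component beyond that |

---

## 6. Changelog

| Version | Date | Changes |
|---|---|---|
| v1.0 | 2026-05-07 | Initial version. Dogfooding the wiki synthesis workflow ran into the "prompt assembly gap"; adds a prompt_recipe layer alongside the existing auth_recipe / api_recipe. |
| v1.1 | 2026-05-07 | Architecture option B: recipe resolution in cypher-executor TS (claude_api WASM unchanged). Smaller change surface, unit-testable, same level as the existing api_recipe. |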
@@ -0,0 +1,110 @@
# Tasks: Recipe System (container + recipe pattern)

> SDD: [design.md](design.md)
> Last updated: 2026-05-07

**Status legend**: `[ ]` to do / `[🔄]` in progress / `[x]` done

---

## Phase 1: prompt_recipe schema + KV conventions

- [x] 1.1 Write `cypher-executor/src/lib/prompt-recipe-schema.ts` (85-line Zod schema: fragments / inputs / prompt_assembly / output + a whitelist of 7 transforms)
- [x] 1.2 Confirm the cypher-executor wrangler.toml already has the RECIPES KV binding
- [x] 1.3 Write the recipe loader (`recipe-loader.ts`, 50 lines) + transforms (`recipe-transforms.ts`, 58 lines) + expander (`recipe-expander.ts`, 127 lines)
  - 7 transforms: json_array / to_string / join / markdown_list / extract_field / first / pluck_content
  - expander: fragments (KBDB) + inputs (context + transform) → apply the {{var}} template → {prompt, model, output_*}
  - Type-check passes in full

## Phase 2: cypher-executor recipe expander (architecture option B, claude_api unchanged)

- [x] 2.1 Write `recipe-expander.ts` (127 lines: load → fragments → inputs + transform → apply template → return prompt + model + output_*)
- [x] 2.2 Write `recipe-transforms.ts` (58 lines: 7 transforms)
- [x] 2.3 Change the Component case in `graph-executor.ts`: detect `node.data.recipe` → call expandPromptRecipe → merge into mergedContext
- [x] 2.4 Output parser hook: after execution, if `_recipe_output_format === 'json'`, parse automatically + validate required_fields
- [x] 2.5 Deploy cypher-executor v426b099e
- [x] 2.6 End-to-end verification: curl `/cypher/execute` with a recipe; the trace shows the recipe expanded correctly and claude_api received the assembled prompt (the 522 timeout on the Mira daemon side is a daemon problem, not a recipe system problem)
- [x] 2.7 [Bonus fix] Add 5 mira components (claude_api / kbdb_*) to cypher-executor `WASM_HTTP_RUNNER_IDS`. This is a short-term fix; see KI-13 for the root fix

## Phase 3: The first recipe, wiki_synthesis

- [x] 3.1 Write `polaris/mira/recipes/wiki_synthesis.json` (4 fragments + 4 inputs + system/user template + json output)
- [x] 3.2 Push it to RECIPES KV with `wrangler kv key put --remote` (key: `prompt_recipe:wiki_synthesis`)
- [x] 3.3 Confirm the KV write succeeded (verified with wrangler kv get)
- [ ] 3.4 Not applicable (architecture option B leaves claude_api unchanged; recipes are resolved in cypher-executor)
- [x] 3.5 End-to-end test: running wiki_synthesis through MCP `u6u_execute_workflow` succeeds
  - input: a one-sentence draft (Jensen Huang, GTC 2026, physical AI)
  - output: 3 triplets + 3 entities + 1 wiki paragraph + source_summary
  - Fixed along the way: KI-14 (service binding pointed at the wrong worker), KI-15 (token not forwarded), KI-16 (Claude markdown fence not stripped)

## Phase 4: mira 7-B completes the wiki workflow with the recipe

- [🔄] 4.1 Write `polaris/mira/workflows/wiki_synthesis.yaml` (cypher binding YAML)
  - Reference the recipe with `recipe: prompt_recipe:wiki_synthesis`
  - 4-5 nodes: read_stale → foreach → read_drafts → synthesize → write_wiki + log
- [ ] 4.2 Run it in the sandbox with MCP `u6u_execute_workflow` (try one entity without actually writing to KBDB)
- [ ] 4.3 Deploy to cypher-executor with MCP `u6u_deploy_workflow`
- [ ] 4.4 Trigger the cron manually and verify that the wiki page really appears
- [ ] 4.5 See the first AI-synthesized wiki page in the mira/wiki/ frontend

## Phase 5: MCP recipe management tools

- [ ] 5.1 Add MCP tool `arcrun_list_recipes(prefix?)`: list all prompt_recipe entries
- [ ] 5.2 Add MCP tool `arcrun_get_recipe(name)`: fetch the content of a single recipe
- [ ] 5.3 Add MCP tool `arcrun_push_recipe(name, yaml_content)`: upsert a recipe
- [ ] 5.4 Add MCP tool `arcrun_delete_recipe(name)`
- [ ] 5.5 The same tools also work for the existing auth_recipe / api_recipe (not only prompt_recipe)

---

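## Risk tracking

- Risk 1: the claude_api rework lands at the same time as mira-app changes and may affect the River (河道) AI copilot
  - Mitigation: backward compatible; the old input still works, and the mira River does not switch to recipes yet
- Risk 2: the recipe transform whitelist misses some need
  - Mitigation: add whatever turns out to be missing; the first version prioritizes what the wiki uses (json_array, extract_field, join)
- Risk 3: both KV and KBDB hold configuration, and AIs get confused about "which one to store in"
  - Mitigation: clear layering. Recipes (how containers are combined) go in KV; data (schema text, skill templates) goes in KBDB

---
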
## Known Issues (found while dogfooding, for the record)

### KI-11: MCP `u6u_execute_workflow` does not expose the config field ✅ Fixed (2026-05-07)
- Fixed: the tool schema gains an optional `config: Record<string, Record<string, any>>`
- Deployed: u6u-mcp v11d7e366
- Users must restart their client session to see the new schema

### KI-12: MCP execute routed to `/execute` instead of `/cypher/execute` ✅ Fixed (2026-05-07)
- Fixed: the service binding fetch URL is now `http://cypher-executor/cypher/execute`
- Deployed: u6u-mcp v11d7e366

### KI-14: u6u-mcp service binding pointed at the retired inkstone-cypher-executor ✅ Fixed
- Symptom: when a workflow runs via the MCP path, the trace shows synth turned into Output and config ignored
- Root cause: the services binding in `u6u-mcp/wrangler.toml` was the old worker `inkstone-cypher-executor`, not the active `arcrun-cypher-executor`
- Fix: change the service name + redeploy

### KI-15: u6u-mcp did not forward the partner token to cypher-executor ✅ Fixed
- Symptom: the recipe expander gets a 401 when fetching a KBDB block (no auth)
- Root cause: partnerAuthMiddleware validated the token but only set org_namespace and did not keep the token; the execute_workflow tool's fetch did not send X-Arcrun-API-Key
- Fix: the middleware also sets partner_token; handleMcpRequest + registerAllTools + execute_workflow take an extra partnerToken parameter; the fetch headers include X-Arcrun-API-Key

### KI-16: Recipe JSON output wrapped by Claude in a ```json``` markdown fence ✅ Fixed
- Symptom: JSON.parse fails with "Unexpected token \`"
- Root cause: by default Claude wraps its output as ```json\n{...}\n```
- Fix: cypher-executor strips the fence with a regex before parsing

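The KI-16 fix (strip the markdown fence, then parse and check required fields) can be illustrated with a short sketch. `stripMarkdownFence` and `parseRecipeJson` are hypothetical names and the regex is one plausible choice, not necessarily the one cypher-executor uses. The fence marker is built with `repeat` only so that this example does not itself contain a literal fence.

```typescript
// Three backticks, built programmatically.
const FENCE = "`".repeat(3);

// Matches a reply that is wrapped as a whole in one fenced block,
// with an optional language tag such as "json" after the opening fence.
const FENCED_REPLY = new RegExp(
  "^" + FENCE + "[\\w-]*\\s*\\n([\\s\\S]*?)\\s*" + FENCE + "$",
);

// Returns the fenced content if the reply is wrapped, else the trimmed reply.
function stripMarkdownFence(text: string): string {
  const trimmed = text.trim();
  const match = FENCED_REPLY.exec(trimmed);
  return match ? (match[1] ?? "").trim() : trimmed;
}

type ParseResult =
  | { success: true; data: Record<string, unknown> }
  | { success: false; error: string };

// Parses an LLM reply as a JSON object and checks that required keys exist.
// Failures are reported as values instead of being thrown.
function parseRecipeJson(raw: string, required: string[]): ParseResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stripMarkdownFence(raw));
  } catch (err) {
    return { success: false, error: err instanceof Error ? err.message : String(err) };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { success: false, error: "expected a JSON object" };
  }
  const data = parsed as Record<string, unknown>;
  const missing = required.filter((key) => !(key in data));
  return missing.length > 0
    ? { success: false, error: `missing required fields: ${missing.join(", ")}` }
    : { success: true, data };
}
```

Returning `success: false` rather than throwing mirrors the recipe's `output.schema` comment, which says a parse failure yields `success:false`.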
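### KI-13: cypher-executor `WASM_HTTP_RUNNER_IDS` is a hardcoded whitelist
- Symptom: every new component requires going back to cypher-executor to edit the whitelist and redeploy
- Impact: breaks arcrun's promise of "container + recipe, new components need no platform change"
- Short-term fix: add entries to the whitelist by hand (claude_api / kbdb_* already added)
- Root fix: look up canonical_id dynamically from the component-registry KV
- Priority: P1 (adoption blocker); needs a new SDD, `cypher-executor-dynamic-component-discovery`

---
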
## External promotion (Phase 6+, not executed in this SDD, for the record)

- The README demonstrates "container + recipe = one service" (Gmail / Slack / Claude)
- The onboarding kit GitHub template ships 5 classic recipes as examples
- "Recipe market" idea: users share recipes so that others write fewer prompts
@@ -0,0 +1,285 @@
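# SDD: Resumable Workflow (woken by webhook callback)

> Created 2026-05-07. While dogfooding the wiki synthesis workflow, the Mira daemon switched to async mode for long drafts (>2KB) and returned `{pending, task_id, poll_url}`; cypher-executor did not handle it and passed it straight downstream.
> This SDD addresses that layer: **a workflow that hits a pending task midway → pauses + persists its state → resumes when the external callback arrives**.
> Scope: webhook push between two in-house services (Mira daemon ↔ cypher-executor). The case of external services without webhooks stays on the wishlist, to be solved with polling.

---
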
## 1. 問題
|
||||
|
||||
### 1.1 撞牆現場
|
||||
|
||||
wiki 合成 workflow 第一節點 `claude_api(recipe:wiki_synthesis)`:
|
||||
- 短草稿(< 2KB)→ daemon 同步回 `{success, data: {text}}`,recipe output parser 解 JSON 成功
|
||||
- 長草稿(> 2KB)→ daemon 估 75s,切非同步模式回:
|
||||
|
||||
```json
|
||||
{
|
||||
"success": true,
|
||||
"pending": true,
|
||||
"task_id": "task_14_1778133152480",
|
||||
"poll_url": "https://mira.uncle6.me/mira/execute/task_14_1778133152480",
|
||||
"estimated_seconds": 75
|
||||
}
|
||||
```
|
||||
|
||||
cypher-executor 拿到這個物件就當 result,但裡面沒 `data.text`,下游 recipe output parser 找不到要 parse 的東西,整個 workflow 算「success」但實際上 wiki 還沒生出來。
|
||||
|
||||
### 1.2 現有 toolkit 不夠
|
||||
|
||||
- `wait` 零件:固定 sleep N ms,沒 retry / 條件判斷
|
||||
- `http_request` 零件:通用 HTTP,不認 daemon 的 polling 協議
|
||||
- cypher-executor `visited` Set:擋住節點重訪,沒辦法做迴圈式 poll
|
||||
- Worker CPU 30s 限制:同步 poll 75s 任務不可能

### 1.3 Push vs. pull decision (settled 2026-05-07)

| | Webhook push | Poll pull |
|---|---|---|
| Best suited for | Both sides are in-house | The other side has no callback capability |
| Worker time consumed | Close to 0 | Occupied for the full duration |
| Duration limit | None | Worker CPU 30s |
| Where the engineering lives | Runtime capability (cypher-executor) | Component (poll_task) |

**Decision: webhook push** (in-house services first; poll_task goes on the wishlist).

---

## 2. Design

### 2.1 Three layers of change

**A. Mira daemon side (infra/cloud-cto)**
- `/mira/execute` accepts a new field `callback_url: string` (optional)
- On task completion, POST to `callback_url` with body:
  ```json
  {
    "task_id": "task_14_xxx",
    "success": true,
    "data": { "text": "..." }
  }
  ```
- Failures also trigger a callback; the body then includes an `error` field
- Retry strategy: 3 retries with backoff (1s / 5s / 30s); after the last failure, give up (the task state is kept in the daemon's own KV)

**B. cypher-executor side (resumable runtime)**

New concept: **a workflow run can be paused**.

Design:
1. A new KV namespace (or the existing `EXEC_CONTEXT`) stores the paused run state:
   - key: `paused_run:{task_id}` or `paused_run:{run_id}`
   - value: `{ run_id, graph, paused_node_id, paused_node_pending_result, context, trace_so_far, kv_store_ref, expires_at }`
2. graph-executor detects a node result containing `pending: true` + `task_id` → pauses + writes to KV + returns `{paused: true, task_id, run_id}`
3. New endpoint `POST /workflows/resume`:
   - body: `{ task_id, result }` (`result` is the full payload from the daemon callback)
   - Fetch the paused state from KV → merge `result` into the paused node's output → continue from the next node
4. The claude_api container automatically includes `callback_url` when it calls the daemon:
   - `https://cypher.arcrun.dev/workflows/resume?task_id={pre-assigned task_id}`
   - But the daemon assigns the task_id itself, so cypher-executor does not know it; the URL could only be built after the daemon has assigned the task_id
   - Fix: the daemon returns the task_id first, then starts the actual work and calls back on completion (a two-phase handshake)

Actual flow (two-phase):

```
cypher-executor                  Mira daemon
    │                              │
    │  POST /mira/execute          │
    │  { prompt,                   │
    │    callback_url: "?run_id=R1" }
    ├─────────────────────────────>│
    │                              │ returns task_id immediately (async chosen)
    │<─────────────────────────────┤ { pending, task_id: T9 }
    │                              │
    ├─ sees pending → writes KV    │ starts the actual LLM task
    │  paused_run:T9 = {run R1,    │
    │    paused_node, ctx, ...}    │
    │                              │
    │  returns to client (MCP)     │
    │  immediately:                │
    │  { paused, task_id: T9 }     │
    │                              │
    ⋯⋯⋯⋯⋯ 75s later ⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯
    │                              │ task done
    │  POST /workflows/resume      │
    │  { task_id: T9, result: {...} }
    │<─────────────────────────────┤
    │                              │
    │  load paused_run:T9 from KV  │
    │  → merge result into paused node
    │  → continue from next node   │
    │                              │
    │  run finishes → write trace  │
    │  → notify client (?)         │
    │                              │
```
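
The pause and resume halves of this flow meet at the `paused_run:{task_id}` entry. Below is a minimal sketch of that store, under stated assumptions: a synchronous in-memory `Map` stands in for the `EXEC_CONTEXT` KV binding (the real binding is asynchronous), the state is reduced to a few fields, and the function bodies are illustrative rather than the actual implementation.

```ts
// Reduced, hypothetical shape of the persisted state; field names follow the design above.
interface PausedRunState {
  run_id: string;
  paused_node_id: string;
  context: Record<string, unknown>;
  expires_at: number; // epoch milliseconds
}

const PAUSED_TTL_MS = 24 * 60 * 60 * 1000; // 24h

// In-memory stand-in for the EXEC_CONTEXT KV binding (assumption: the real
// binding is asynchronous and stores strings, hence the JSON round-trip).
const kv = new Map<string, string>();

const pausedKey = (taskId: string): string => `paused_run:${taskId}`;

function persistPausedRun(
  taskId: string,
  state: Omit<PausedRunState, "expires_at">,
  now: number,
): void {
  const full: PausedRunState = { ...state, expires_at: now + PAUSED_TTL_MS };
  kv.set(pausedKey(taskId), JSON.stringify(full));
}

// consume = load + delete, so a duplicate callback finds nothing.
function consumePausedRun(taskId: string, now: number): PausedRunState | null {
  const raw = kv.get(pausedKey(taskId));
  if (raw === undefined) return null;
  kv.delete(pausedKey(taskId));
  const state = JSON.parse(raw) as PausedRunState;
  return state.expires_at > now ? state : null;
}

const t0 = 1000000;
persistPausedRun("T9", { run_id: "R1", paused_node_id: "n1", context: {} }, t0);
console.log(consumePausedRun("T9", t0 + 75000)?.run_id); // "R1"
console.log(consumePausedRun("T9", t0 + 75000));         // null (already consumed)
```

Keying by task_id rather than run_id lets the callback body identify the entry directly, since the daemon only knows the task_id it assigned.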

### 2.2 Scope boundaries

**v1 does:**
- ✅ Single-node pending → resume (the most common case: claude_api receives a pending response from the daemon)
- ✅ callback_url support in the daemon
- ✅ cypher-executor `/workflows/resume` endpoint
- ✅ Run state written to EXEC_CONTEXT KV with a 24h TTL (to keep KV from accumulating)
- ✅ Integration test: run wiki synthesis with a long draft and verify that the run continues once the callback arrives

**v1 does not do:**
- ❌ Nested scenarios where multiple nodes are pending (e.g. claude_api → another claude_api): v2
- ❌ Pending inside foreach (item-level resume): v2
- ❌ Frontend UI showing "progress" while pending: the trace carries a paused marker, so the frontend can poll on its own
- ❌ Retry / DLQ when the pending callback fails: v2; log only for now

**Prerequisites:**
- ✅ recipe-system is deployed (cypher-executor can already resolve recipes)
- ✅ The Mira daemon runs on Hetzner and its code can be changed

### 2.3 Why not Cloudflare Queues / Durable Objects

- **CF Queues**: suited to high-volume fan-out; this is a point-to-point callback, and KV is enough
- **Durable Objects**: stronger than KV for long-lived state, but costlier and more complex
- **EXEC_CONTEXT KV**: an existing binding; the least engineering effort

Upgrade only if we actually hit a KV limit (the per-partner write-rate cap).

---

## 3. Detailed design

### 3.1 Daemon-side callback mechanism

`infra/cloud-cto/index.js` (Mira daemon):

```js
// /mira/execute handler

/**
 * Request body: the existing input fields, plus:
 * @typedef {object} ExecuteInput
 * @property {string} [callback_url] - optional; POSTed to when the task finishes
 */

// Handling logic:
// 1. Start the task (existing logic)
// 2. Estimated time > 30s → switch to async:
//    - immediately return { success: true, pending: true, task_id, poll_url, estimated_seconds }
//    - when the background task finishes:
//        if (callback_url) POST callback_url with { task_id, success, data, error? }
//        (the callback is always sent, whether or not the client polls)
```

Callback failure strategy:
- 3 retries (1s / 5s / 30s)
- If all fail: the task stays in its completed state and waits for the client to poll (poll_url remains valid)
- Tasks not consumed within 24h: garbage-collected by the daemon
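
The retry schedule can be sketched as one immediate attempt followed by a retry after each delay. This is an illustration in TypeScript, not the daemon's actual code: `postWithRetry`, `Send`, and `Sleep` are hypothetical names, and the HTTP call and timer are injected so the sketch runs without a network.

```ts
// Delays before each retry, per the strategy above.
const RETRY_DELAYS_MS = [1000, 5000, 30000];

type Send = () => Promise<boolean>;        // resolves true on a 2xx response
type Sleep = (ms: number) => Promise<void>;

// One immediate attempt, then one retry after each delay. Resolves to the
// number of attempts used on success, or -1 once every attempt has failed.
async function postWithRetry(send: Send, sleep: Sleep): Promise<number> {
  let attempt = 0;
  for (const wait of [0, ...RETRY_DELAYS_MS]) {
    attempt++;
    if (wait > 0) await sleep(wait);
    try {
      if (await send()) return attempt;
    } catch {
      // Network error: count it as a failed attempt and move on to the next retry.
    }
  }
  return -1;
}

// Example: fail twice, then succeed. Sleeps are recorded instead of really waiting.
const slept: number[] = [];
let calls = 0;
postWithRetry(
  async () => ++calls >= 3,
  async (ms) => { slept.push(ms); },
).then((attempts) => console.log(attempts, slept)); // 3 [ 1000, 5000 ]
```

When the result is -1 the task result is still held by the daemon, so the client can fall back to `poll_url`.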

### 3.2 cypher-executor-side resumable runtime

#### 3.2.1 Detecting pending (graph-executor)

In the Component case, after the runner returns:

```ts
result = await runner(mergedContext);

// Detect the pending pattern (the response structure agreed with the daemon)
if (isResumablePending(result)) {
  const task_id = taskIdFromResult(result);
  await persistPausedRun(this.env.EXEC_CONTEXT, task_id, {
    run_id, graph, paused_node_id: node.id, paused_context: context,
    paused_result: result, trace_so_far: trace, expires_at: Date.now() + 24*60*60*1000
  });
  // End this run early and return the paused status
  return { paused: true, task_id, run_id };
}

// ... existing recipe output parsing / kvSetNodeOutput / etc.
```

`isResumablePending(result)` = `result?.pending === true && typeof result?.task_id === 'string'`
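
Written as a TypeScript type guard, the same check also narrows the result type for the code that follows. This is a sketch of one possible form; the `ResumablePending` interface is a hypothetical name introduced here.

```ts
// Shape the guard narrows to: only the two fields the check relies on.
interface ResumablePending {
  pending: true;
  task_id: string;
}

// Type-guard form of the one-line check above.
function isResumablePending(result: unknown): result is ResumablePending {
  if (typeof result !== "object" || result === null) return false;
  const r = result as { pending?: unknown; task_id?: unknown };
  return r.pending === true && typeof r.task_id === "string";
}

console.log(isResumablePending({ success: true, pending: true, task_id: "task_14_1778133152480" })); // true
console.log(isResumablePending({ success: true, data: { text: "..." } })); // false
console.log(isResumablePending({ pending: true }));                        // false: no task_id
console.log(isResumablePending(null));                                     // false
```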

#### 3.2.2 callback URL injection (the layer before claude_api)

Problem: when the claude_api container sends its request to the daemon, it must include `callback_url`. But the daemon assigns the task_id, so the URL can only carry the run_id, and the daemon fills in the task_id when it calls back:

`callback_url = https://cypher.arcrun.dev/workflows/resume?run_id={current_run_id}`

However, cypher-executor looks up the paused state by task_id (one run may have several pending tasks), so the callback URL should be:

`callback_url = https://cypher.arcrun.dev/workflows/resume` (no query string; the task_id goes in the body)

**Where to implement**: before graph-executor calls claude_api, automatically inject `callback_url` into mergedContext:

```ts
if (node.componentId === 'claude_api' && this.env?.PUBLIC_BASE_URL) {
  mergedContext.callback_url = `${this.env.PUBLIC_BASE_URL}/workflows/resume`;
}
```

> Matching on a hard-coded componentId is hacky for now; once the component contract gains a `supports_async_callback: true` flag, this becomes generic.
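
A sketch of what the generic version might look like, assuming the contract exposes the proposed flag. `ComponentContract`, `withCallbackUrl`, and `DEFAULT_BASE_URL` are hypothetical names, and the fallback to cypher.arcrun.dev when no base URL is configured is an assumption based on the default host used elsewhere in this document.

```ts
// Hypothetical contract shape; `supports_async_callback` is the flag proposed
// above and is not part of the contract schema yet.
interface ComponentContract {
  id: string;
  supports_async_callback?: boolean;
}

// Assumed default when PUBLIC_BASE_URL is not configured.
const DEFAULT_BASE_URL = "https://cypher.arcrun.dev";

// Returns a copy of the context with callback_url set when the component
// opts in; otherwise returns the context unchanged.
function withCallbackUrl(
  contract: ComponentContract,
  ctx: Record<string, unknown>,
  publicBaseUrl?: string,
): Record<string, unknown> {
  if (!contract.supports_async_callback) return ctx;
  // Strip trailing slashes so the joined path has exactly one.
  const base = (publicBaseUrl || DEFAULT_BASE_URL).replace(/\/+$/, "");
  return { ...ctx, callback_url: `${base}/workflows/resume` };
}

console.log(withCallbackUrl({ id: "claude_api", supports_async_callback: true }, { prompt: "hi" }));
console.log(withCallbackUrl({ id: "http_request" }, { prompt: "hi" }));
```

Returning a copy rather than mutating the context keeps the pre-execution context, which is what gets persisted on pause, free of the injected field.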

#### 3.2.3 resume endpoint

`POST /workflows/resume`:

```ts
// Request body sent by the daemon callback
interface ResumeRequestBody {
  task_id: string;          // assigned by the daemon
  success: boolean;
  data?: { text: string };  // same structure as a synchronous call
  error?: string;
}
```

Handling:
1. Fetch the state from EXEC_CONTEXT KV at `paused_run:{task_id}`
2. Not found (expired / duplicate callback) → return 200 + log
3. Use the result delivered by the callback as the paused node's output
4. Rebuild the GraphExecutor and continue from the next node
5. Write the full trace when the run finishes
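
The five steps can be sketched end to end. Everything here is simplified and hypothetical: a synchronous `Map` replaces the KV binding, the graph is reduced to a linear `order` of node ids, runners are plain synchronous functions, and only the success path is covered.

```ts
type Ctx = Record<string, unknown>;
type Runner = (ctx: Ctx) => Ctx;

interface PausedState {
  run_id: string;
  paused_node_id: string;
  order: string[]; // hypothetical: node ids in execution order
  context: Ctx;
  trace: string[];
}

interface CallbackBody {
  task_id: string;
  success: boolean;
  data?: { text: string };
  error?: string;
}

type ResumeOutcome =
  | { noop: true; reason: string }
  | { noop: false; run_id: string; context: Ctx; trace: string[] };

function handleResume(
  store: Map<string, PausedState>,
  runners: Record<string, Runner>,
  body: CallbackBody,
): ResumeOutcome {
  // Steps 1-2: load the state; a missing entry means expired or duplicate callback.
  const key = `paused_run:${body.task_id}`;
  const state = store.get(key);
  if (!state) return { noop: true, reason: "state missing or expired" };
  store.delete(key);

  // Step 3: the callback payload becomes the paused node's output.
  let ctx: Ctx = { ...state.context, ...body.data, [state.paused_node_id]: body };
  const trace = [...state.trace, `${state.paused_node_id}:resumed`];

  // Step 4: continue with the nodes after the paused one.
  const start = state.order.indexOf(state.paused_node_id) + 1;
  for (const id of state.order.slice(start)) {
    const run = runners[id];
    if (!run) throw new Error(`no runner for node ${id}`);
    ctx = { ...ctx, ...run(ctx) };
    trace.push(`${id}:done`);
  }

  // Step 5: the caller would persist the full trace here.
  return { noop: false, run_id: state.run_id, context: ctx, trace };
}

const store = new Map<string, PausedState>([
  ["paused_run:T9", { run_id: "R1", paused_node_id: "synth", order: ["synth", "save"], context: { draft: "..." }, trace: ["synth:paused"] }],
]);
const runners: Record<string, Runner> = {
  save: (ctx) => ({ saved: typeof ctx["text"] === "string" }),
};
const body: CallbackBody = { task_id: "T9", success: true, data: { text: "wiki body" } };
console.log(handleResume(store, runners, body)); // resumed run with the full trace
console.log(handleResume(store, runners, body)); // noop on the duplicate callback
```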
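
**Problem: after resume, there is no way to respond to the original client.** The user initially called `/cypher/execute` (synchronous), and the connection ended once they received `{paused, task_id}`; when the resumed run finishes, its result has nowhere to go.

**v1 approach**: after resume completes, write the result to `analytics_kv` or D1; **the user has to query for it**. Simple, but poor UX.

**v2 idea**: after resume completes, send another webhook to the original client (the client supplies final_callback_url at trigger time).

---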

## 4. Scope

**In scope for this SDD:**
- 4.1 callback_url support in the daemon's `/mira/execute`
- 4.2 cypher-executor detects pending + persists the paused state
- 4.3 cypher-executor `/workflows/resume` endpoint
- 4.4 Automatic callback_url injection (claude_api case)
- 4.5 End-to-end test of the wiki synthesis workflow with a long draft

**Out of scope for this SDD:**
- nested pending (v2)
- pending inside foreach (v2)
- final_callback to the original client (v2)
- poll_task component (wishlist)

---

## 5. Acceptance criteria

1. Feed the wiki synthesis workflow a 5KB+ draft; after the run, the wiki page has been written to KBDB (the trace no longer shows a `pending` false success)
2. The trace has a `paused` record that shows the task_id
3. cypher-executor picks up the paused state and continues within 5s of the daemon firing the callback
4. Paused state that receives no callback within 24h expires from KV automatically (check the KV TTL listing)

---
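
## 6. Risks

| Risk | Mitigation |
|---|---|
| cypher-executor restarts just as the daemon callback arrives → the state is still in KV, OK | KV persistence |
| Duplicate callback for the same task_id (network retry) → downstream nodes run twice | The resume endpoint is idempotent: the KV entry is deleted as soon as the state is read, so a duplicate callback finds no state |
| Daemon callback fails (network) | 3 retries on the daemon side + 24h GC; beyond that, manual intervention is needed (accepted for v1) |
| Paused state contains sensitive data (partner key) | KV has a 24h TTL; no plaintext secrets are written (the existing credential injection resolves secrets only right before execution, and the paused state stores the pre-execution context, so secrets are not yet resolved) |

---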

## 7. Change log

| Version | Date | Notes |
|---|---|---|
| v1.0 | 2026-05-07 | Initial version. Dogfooding wiki synthesis hit the daemon's async mode → added a resumable workflow runtime. v1 covers only single-node pending + claude_api callback injection. |
@@ -0,0 +1,61 @@
# Tasks: Resumable Workflow

> Corresponding SDD: [design.md](design.md)
> Last updated: 2026-05-07

**Status legend**: `[ ]` to do / `[🔄]` in progress / `[x]` done

---

## Phase 1: callback support in the Mira daemon

- [x] 1.1 Change `/opt/mira/mira-daemon.js` (Hetzner mira container) so that `/execute` accepts `params.callback_url`
- [x] 1.2 fireCallback function: on task done/failed, POST to callback_url with body = `{task_id, success, data?, error?}`
- [x] 1.3 callback retry: 4 attempts (immediate + 1s/5s/30s backoff); log if all fail
- [x] 1.4 Patch script written at `/tmp/patch-mira-daemon.py` and copied into the container with docker cp (note: rebuilding the image loses the patch; re-patch or commit it properly into the Dockerfile/git repo)
- [x] 1.5 Real end-to-end verification: the daemon log shows `[Mira callback] task=task_2_... POST https://cypher.arcrun.dev/workflows/resume OK 200` (2026-05-07 07:24:04, plus the short task_3 test)

## Phase 2: cypher-executor resumable runtime

- [x] 2.1 Write `paused-runs.ts` (81 lines): persistPausedRun / loadPausedRun / consumePausedRun + the isResumablePending detector; 24h TTL
- [x] 2.2 Change the Component case in `graph-executor.ts`: detect pending → write KV + throw WorkflowPaused
- [x] 2.3 Change `cypher-handlers.ts`: catch WorkflowPaused → return `{success:true, paused:true, task_id, run_id, paused_node_id, trace, graph}`
- [x] 2.4 Automatic callback_url injection: when componentId==='claude_api', mergedContext.callback_url = PUBLIC_BASE_URL (or the default cypher.arcrun.dev) + /workflows/resume

## Phase 3: resume endpoint

- [x] 3.1 Write `routes/resume.ts`: POST /workflows/resume, consumePausedRun → resumeFromPaused
- [x] 3.2 Add a `resumeFromPaused()` method to graph-executor: use callback_result as the paused node's output + spread it into ctx + continue from the downstream nodes
- [x] 3.3 Idempotency verified: a second callback returns `{noop:true, reason:"state 不存在或過期"}` (the reason reads "state missing or expired")
- [x] 3.4 Deploy cypher-executor v0580980b
- [x] 3.5 Mount /workflows/resume in index.ts

## Phase 4: claude_api container passes callback_url through

- [x] 4.1 Change `claude_api/main.go`: add CallbackURL to Input; default timeout changed to 120s
- [x] 4.2 Rebuild the wasm + redeploy claude-api.arcrun.dev (v f926e3dd)
- [x] 4.3 Real end-to-end verification: the daemon receives callback_url → after the task is done, it POSTs to cypher-executor/workflows/resume → 200 OK

## Phase 5: end-to-end integration test

- [ ] 5.1 Run wiki synthesis with a 5KB+ draft via MCP `u6u_execute_workflow`
- [ ] 5.2 The first response should be `{paused, task_id, run_id}`
- [ ] 5.3 Wait for the daemon callback to arrive (the log shows a hit on /workflows/resume)
- [ ] 5.4 Confirm that the wiki page is actually written to KBDB (even though the original MCP call has already disconnected)
- [ ] 5.5 The trace contains the full node record (paused → resumed)

---
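
## Risk tracking

- Risk 1: when the daemon callback arrives, cypher.arcrun.dev is not awake yet (CF Worker cold start) → the first retry catches it (covered by the daemon retry policy)
- Risk 2: v1 has no final_callback to the original client → users have to check the status themselves
  - Accepted: the Mira 河道 UI can periodically refetch the wiki page, or use the existing KBDB trigger mechanism
  - v2 adds final_callback to handle this uniformly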

## Recorded for v2

- nested pending (multiple paused nodes in one run)
- pending inside foreach (item-level resume)
- final_callback to the original client (the client supplies final_callback_url at trigger time)
- poll_task component (for external APIs that have no webhook)