feat(arcrun): recipe system + resumable workflow + component registry canon

Three new platform capabilities + one component (kbdb_get) to enable
real AI workflow execution through cypher binding YAML.

## Recipe System (container + recipe pattern)
SDD: .agents/specs/recipe-system/

- prompt_recipe schema (Zod): fragments + inputs + assembly + output
- recipe-expander.ts: expand a recipe ref into the real prompt by fetching KBDB blocks
  and pulling context fields through transforms (pluck_content / extract_field / etc.)
- Whitelist of 7 transforms: json_array / to_string / join / markdown_list /
  extract_field / first / pluck_content
- graph-executor hooks: detect node.data.recipe → expand → inject into ctx
- output JSON parser (with markdown fence stripping for Claude-wrapped JSON)
- Stored in RECIPES KV under prompt_recipe:{name}
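A minimal TypeScript sketch of the transform whitelist and the fence-stripping output parser. The commit only names the seven transforms, so the behavior given to each one here (for example, `pluck_content` mapping blocks to their `content` field, and `join` defaulting to a newline separator) is an assumption, as are the helper names `applyTransform` and `parseModelJson`.

```typescript
// Hypothetical sketch of the recipe transform whitelist and output parser.
// Only the transform names come from the commit; their behavior is assumed.
type TransformName =
  | "json_array"
  | "to_string"
  | "join"
  | "markdown_list"
  | "extract_field"
  | "first"
  | "pluck_content";

const asArray = (v: unknown): unknown[] => (Array.isArray(v) ? v : [v]);
const stringify = (v: unknown): string => (typeof v === "string" ? v : JSON.stringify(v));

const TRANSFORMS: Record<TransformName, (value: unknown, arg?: string) => unknown> = {
  json_array: (v) => JSON.stringify(asArray(v)),
  to_string: (v) => stringify(v),
  join: (v, sep = "\n") => asArray(v).map(stringify).join(sep),
  markdown_list: (v) => asArray(v).map((x) => `- ${stringify(x)}`).join("\n"),
  extract_field: (v, field = "") => asArray(v).map((x) => (x as Record<string, unknown>)?.[field]),
  first: (v) => asArray(v)[0],
  pluck_content: (v) => asArray(v).map((block) => (block as { content?: unknown })?.content),
};

// Only own keys of TRANSFORMS are accepted, so names like "toString" are rejected.
function applyTransform(name: string, value: unknown, arg?: string): unknown {
  if (!Object.prototype.hasOwnProperty.call(TRANSFORMS, name)) {
    throw new Error(`transform not in whitelist: ${name}`);
  }
  return TRANSFORMS[name as TransformName](value, arg);
}

// Parses model output as JSON, first stripping a surrounding markdown fence
// (three backticks, optionally tagged "json") if the model added one.
function parseModelJson(raw: string): unknown {
  const text = raw.trim();
  const fenced = /^`{3}(?:json)?\s*([\s\S]*?)\s*`{3}$/i.exec(text);
  return JSON.parse(fenced?.[1] ?? text);
}
```

Checking membership with `hasOwnProperty` rather than `name in TRANSFORMS` keeps recipe YAML from reaching inherited properties of the lookup table.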

## Resumable Workflow (webhook callback resume)
SDD: .agents/specs/resumable-workflow/

- WorkflowPaused class + paused-runs.ts (persist/load/consume in EXEC_CONTEXT KV, 24h TTL)
- graph-executor: detect {pending:true, task_id} → persist state → throw WorkflowPaused
- cypher-handlers: catch → return {success:true, paused:true, task_id, run_id}
- POST /workflows/resume route: consume KV state → resumeFromPaused()
- Auto-inject callback_url for claude_api nodes (PUBLIC_BASE_URL or default cypher.arcrun.dev)
- claude_api/main.go: forward callback_url to Mira daemon, default timeout 25s→120s
- Idempotent (consume = load+delete)
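The persist/load/consume trio can be sketched against a minimal KV interface. `KVLike` mirrors the `get`/`put`/`delete` calls of a Workers KV namespace (with `expirationTtl` in seconds); the `paused_run:` key prefix, the `PausedRun` fields, and the in-memory `MemoryKV` are hypothetical.

```typescript
// Minimal sketch of the paused-run store. KVLike mirrors the get/put/delete
// calls of a Workers KV namespace; key format and PausedRun fields are assumed.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, opts?: { expirationTtl?: number }): Promise<void>;
  delete(key: string): Promise<void>;
}

interface PausedRun {
  run_id: string;
  task_id: string;
  paused_node_id: string;
  ctx: Record<string, unknown>;
}

const PAUSED_RUN_TTL_SECONDS = 24 * 60 * 60; // 24h
const pausedKey = (runId: string): string => `paused_run:${runId}`; // hypothetical key

async function persistPausedRun(kv: KVLike, run: PausedRun): Promise<void> {
  await kv.put(pausedKey(run.run_id), JSON.stringify(run), {
    expirationTtl: PAUSED_RUN_TTL_SECONDS,
  });
}

async function loadPausedRun(kv: KVLike, runId: string): Promise<PausedRun | null> {
  const raw = await kv.get(pausedKey(runId));
  return raw === null ? null : (JSON.parse(raw) as PausedRun);
}

// consume = load + delete: a repeated callback for the same run finds nothing.
// KV has no transactions, so two truly concurrent callbacks could both load the
// state before either delete lands; this sketch does not guard against that.
async function consumePausedRun(kv: KVLike, runId: string): Promise<PausedRun | null> {
  const run = await loadPausedRun(kv, runId);
  if (run !== null) await kv.delete(pausedKey(runId));
  return run;
}

// In-memory stand-in for local testing; records the TTL instead of enforcing it.
class MemoryKV implements KVLike {
  private readonly values = new Map<string, string>();
  readonly ttls = new Map<string, number | undefined>();

  async get(key: string): Promise<string | null> {
    return this.values.get(key) ?? null;
  }

  async put(key: string, value: string, opts?: { expirationTtl?: number }): Promise<void> {
    this.values.set(key, value);
    this.ttls.set(key, opts?.expirationTtl);
  }

  async delete(key: string): Promise<void> {
    this.values.delete(key);
  }
}
```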

## Component Registry Canon
SDD: .agents/specs/component-registry-canon/

- Add POST /components/index-only endpoint (metadata-only, no wasm/sandbox)
- Backfill script (mjs): scan registry/components/*/contract.yaml → submit to KV
- register-component.sh: SSOT for local + CI hook (deploy.yml change in next commit)
- Drop R2 dead storage from submitComponent + types + wrangler
- Schema relaxed: category enum gains auth/ai/platform; cold_start limit 50→500ms; size limit 2→8MB
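A sketch of the relaxed limits as a plain validation function. The field names (`cold_start_ms`, `size_bytes`), reading 8MB as 8 MiB, and passing the allowed categories in as a set are assumptions; only the three newly added categories are listed because the rest of the enum is not given here.

```typescript
// Hypothetical contract metadata; the real contract.yaml field names may differ.
interface ContractMeta {
  name: string;
  category: string;
  cold_start_ms: number;
  size_bytes: number;
}

const MAX_COLD_START_MS = 500; // relaxed from 50ms
const MAX_SIZE_BYTES = 8 * 1024 * 1024; // relaxed from 2MB; MiB reading is an assumption

// Categories added by this commit; the pre-existing enum values are not listed here.
const ADDED_CATEGORIES: readonly string[] = ["auth", "ai", "platform"];

// Returns human-readable violations; an empty array means the contract passes.
function validateContract(meta: ContractMeta, allowedCategories: ReadonlySet<string>): string[] {
  const errors: string[] = [];
  if (!allowedCategories.has(meta.category)) {
    errors.push(`${meta.name}: unknown category "${meta.category}"`);
  }
  if (meta.cold_start_ms > MAX_COLD_START_MS) {
    errors.push(`${meta.name}: cold_start ${meta.cold_start_ms}ms exceeds ${MAX_COLD_START_MS}ms`);
  }
  if (meta.size_bytes > MAX_SIZE_BYTES) {
    errors.push(`${meta.name}: size ${meta.size_bytes} bytes exceeds ${MAX_SIZE_BYTES} bytes`);
  }
  return errors;
}
```

A caller would build `allowedCategories` from the full schema enum plus `ADDED_CATEGORIES`, so the same check serves both the backfill script and the index-only endpoint.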

## kbdb_get component
- registry/components/kbdb_get/: TinyGo WASM, two modes (block_id / page_name list)
- .component-builds/kbdb_get/: WASI shim worker (kbdb-get.arcrun.dev)
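The two kbdb_get modes could be resolved from the input as below. The component itself is TinyGo; this TypeScript sketch only illustrates the dispatch, and the `KbdbGetInput` shape and the "exactly one field" rule are assumptions.

```typescript
// Hypothetical input shape for kbdb_get; the real contract.yaml may differ.
interface KbdbGetInput {
  block_id?: string;
  page_name?: string;
}

type KbdbGetMode =
  | { mode: "block"; blockId: string } // fetch a single block
  | { mode: "page_list"; pageName: string }; // list the blocks on a page

// Assumes exactly one of block_id / page_name must be set.
function resolveKbdbGetMode(input: KbdbGetInput): KbdbGetMode {
  if (input.block_id && !input.page_name) {
    return { mode: "block", blockId: input.block_id };
  }
  if (input.page_name && !input.block_id) {
    return { mode: "page_list", pageName: input.page_name };
  }
  throw new Error("kbdb_get: set exactly one of block_id or page_name");
}
```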

End-to-end validation: AI uses MCP execute_workflow with recipe ref →
cypher-executor expands prompt from KBDB schema/skill blocks + drafts →
claude_api calls Mira daemon → daemon callback fires resume route →
workflow continues. Verified with a real 2KB+ Karpathy LLM Wiki draft.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 15:52:19 +08:00
parent e2221161a8
commit 497f92a268
32 changed files with 3562 additions and 36 deletions
@@ -0,0 +1,61 @@
# Tasks — Resumable Workflow
> Corresponding SDD: [design.md](design.md)
> Last updated: 2026-05-07

**Status legend**: `[ ]` to do / `[🔄]` in progress / `[x]` done

---
## Phase 1: Callback support in the Mira daemon
- [x] 1.1 Modify `/opt/mira/mira-daemon.js` (in the Hetzner mira container) so that `/execute` accepts `params.callback_url`
- [x] 1.2 Add a fireCallback function: when a task is done or has failed, POST to callback_url with body = `{task_id, success, data?, error?}`
- [x] 1.3 Callback retry: 4 attempts (immediate, then 1s/5s/30s backoff); if all fail, log the failure
- [x] 1.4 Patch script written as `/tmp/patch-mira-daemon.py` and copied into the container with `docker cp` (caution: rebuilding the image loses the patch; re-apply it or commit it properly into the Dockerfile/git repo)
- [x] 1.5 Real end-to-end verification: the daemon log shows `[Mira callback] task=task_2_... POST https://cypher.arcrun.dev/workflows/resume OK 200` (2026-05-07 07:24:04, plus a short task_3 test)
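The retry policy from 1.3 can be sketched as follows. The daemon is JavaScript, so this TypeScript version is illustrative only; `Post`, `Sleep`, and `RETRY_DELAYS_MS` are hypothetical names, and the delays are read as waits before attempts 2 to 4. Injecting `post` and `sleep` keeps the schedule testable without real timers.

```typescript
// Hypothetical sketch of fireCallback's retry schedule (not the daemon's code).
interface CallbackBody {
  task_id: string;
  success: boolean;
  data?: unknown;
  error?: string;
}

type Post = (url: string, body: string) => Promise<{ ok: boolean; status: number }>;
type Sleep = (ms: number) => Promise<void>;

// Wait before each attempt: immediate, then 1s / 5s / 30s backoff.
const RETRY_DELAYS_MS = [0, 1_000, 5_000, 30_000];

const realSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Returns true on the first 2xx response; a non-ok status or a thrown error
// counts as a failed attempt. Logs and returns false once all attempts fail.
async function fireCallback(
  url: string,
  body: CallbackBody,
  post: Post,
  sleep: Sleep = realSleep,
): Promise<boolean> {
  let attempt = 0;
  for (const delayMs of RETRY_DELAYS_MS) {
    attempt += 1;
    if (delayMs > 0) await sleep(delayMs);
    try {
      const res = await post(url, JSON.stringify(body));
      if (res.ok) return true;
      console.warn(`[callback] task=${body.task_id} attempt ${attempt}: HTTP ${res.status}`);
    } catch (err) {
      console.warn(`[callback] task=${body.task_id} attempt ${attempt}: ${String(err)}`);
    }
  }
  console.error(`[callback] task=${body.task_id}: all ${RETRY_DELAYS_MS.length} attempts failed`);
  return false;
}
```

In a live runtime, `post` could wrap `fetch(url, { method: "POST", headers: { "content-type": "application/json" }, body })` and pass through the response's `ok` and `status`. The 1s retry is what covers a Worker cold start on the receiving side.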
## Phase 2: cypher-executor resumable runtime
- [x] 2.1 Write `paused-runs.ts` (81 lines): persistPausedRun / loadPausedRun / consumePausedRun plus the isResumablePending detector, 24h TTL
- [x] 2.2 Modify the Component case in `graph-executor.ts`: detect pending → write to KV + throw WorkflowPaused
- [x] 2.3 Modify `cypher-handlers.ts`: catch WorkflowPaused → return `{success:true, paused:true, task_id, run_id, paused_node_id, trace, graph}`
- [x] 2.4 Auto-inject callback_url: when componentId==='claude_api', set mergedContext.callback_url from PUBLIC_BASE_URL or the default cypher.arcrun.dev/workflows/resume
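Steps 2.1 to 2.4 can be sketched as a pending detector, the `WorkflowPaused` error, the handler-side mapping, and the callback_url injection. Everything here is hypothetical: `pauseIfPending`, `injectCallbackUrl`, and `toPausedResponse` are illustrative names, the response omits `trace` and `graph`, and `PUBLIC_BASE_URL` is assumed to be an origin to which `/workflows/resume` is appended.

```typescript
// Hypothetical sketch of the pause path in graph-executor / cypher-handlers.
type Ctx = Record<string, unknown>;

interface PendingResult {
  pending: true;
  task_id: string;
}

// A component output counts as resumable when it reports pending:true plus a task_id.
function isResumablePending(output: unknown): output is PendingResult {
  if (typeof output !== "object" || output === null) return false;
  const o = output as Ctx;
  return o.pending === true && typeof o.task_id === "string" && o.task_id !== "";
}

class WorkflowPaused extends Error {
  readonly taskId: string;
  readonly runId: string;
  readonly pausedNodeId: string;

  constructor(taskId: string, runId: string, pausedNodeId: string) {
    super(`workflow paused at node ${pausedNodeId} (task ${taskId})`);
    this.name = "WorkflowPaused";
    this.taskId = taskId;
    this.runId = runId;
    this.pausedNodeId = pausedNodeId;
    Object.setPrototypeOf(this, WorkflowPaused.prototype); // keeps instanceof working on ES5 targets
  }
}

// Persist first, then throw, so a fast callback can already find the state.
async function pauseIfPending(
  output: unknown,
  runId: string,
  nodeId: string,
  persist: (taskId: string) => Promise<void>,
): Promise<void> {
  if (!isResumablePending(output)) return;
  await persist(output.task_id);
  throw new WorkflowPaused(output.task_id, runId, nodeId);
}

// Assumes PUBLIC_BASE_URL is an origin such as "https://cypher.arcrun.dev".
const DEFAULT_BASE_URL = "https://cypher.arcrun.dev";

function injectCallbackUrl(componentId: string, ctx: Ctx, publicBaseUrl?: string): Ctx {
  if (componentId !== "claude_api") return ctx;
  const base = (publicBaseUrl || DEFAULT_BASE_URL).replace(/\/+$/, "");
  return { ...ctx, callback_url: `${base}/workflows/resume` };
}

// Handler side: a pause becomes a successful response; other errors propagate.
function toPausedResponse(err: unknown) {
  if (!(err instanceof WorkflowPaused)) throw err;
  return {
    success: true,
    paused: true,
    task_id: err.taskId,
    run_id: err.runId,
    paused_node_id: err.pausedNodeId,
  };
}
```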
## Phase 3: Resume endpoint
- [x] 3.1 Write `routes/resume.ts`: POST /workflows/resume, consumePausedRun → resumeFromPaused
- [x] 3.2 Add a `resumeFromPaused()` method to graph-executor: use callback_result as the paused_node's output, spread it into ctx, and continue from the downstream nodes
- [x] 3.3 Idempotency verified: a second callback returns `{noop:true, reason:"state 不存在或過期"}` (reason: "state does not exist or has expired")
- [x] 3.4 Deploy cypher-executor v0580980b
- [x] 3.5 Mount /workflows/resume in index.ts
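A sketch of the resume step in 3.2 and the no-op in 3.3, on a simplified graph where each node has a `run` function and a list of downstream ids. The node shape, the trace format, and running downstream nodes in topological order are assumptions; a node that pauses again (nested pending) is out of scope, matching the v2 list.

```typescript
// Hypothetical sketch of resumeFromPaused(); graph and node shapes are simplified.
type Ctx = Record<string, unknown>;

interface GraphNode {
  id: string;
  next: string[]; // ids of downstream nodes
  run: (ctx: Ctx) => Ctx;
}

interface PausedState {
  run_id: string;
  paused_node_id: string;
  ctx: Ctx;
}

type ResumeOutcome =
  | { noop: true; reason: string }
  | { noop: false; ctx: Ctx; outputs: Record<string, Ctx>; trace: string[] };

// Nodes reachable from startId (excluding it) in topological order:
// reversed DFS post-order is a valid topological order for a DAG.
function downstreamOrder(nodes: Map<string, GraphNode>, startId: string): string[] {
  const seen = new Set<string>();
  const postOrder: string[] = [];
  const visit = (id: string): void => {
    if (seen.has(id)) return;
    seen.add(id);
    for (const nextId of nodes.get(id)?.next ?? []) visit(nextId);
    postOrder.push(id);
  };
  for (const nextId of nodes.get(startId)?.next ?? []) visit(nextId);
  return postOrder.reverse();
}

function resumeFromPaused(
  nodes: Map<string, GraphNode>,
  state: PausedState | null,
  callbackResult: Ctx,
): ResumeOutcome {
  // Missing state: already consumed by an earlier callback, or expired.
  if (state === null) return { noop: true, reason: "state 不存在或過期" };

  // The callback result becomes the paused node's output and is spread into ctx.
  const outputs: Record<string, Ctx> = { [state.paused_node_id]: callbackResult };
  let ctx: Ctx = { ...state.ctx, ...callbackResult };
  const trace = [`${state.paused_node_id}:resumed`];

  for (const id of downstreamOrder(nodes, state.paused_node_id)) {
    const node = nodes.get(id);
    if (node === undefined) throw new Error(`unknown node: ${id}`);
    const out = node.run(ctx);
    outputs[id] = out;
    ctx = { ...ctx, ...out };
    trace.push(`${id}:done`);
  }
  return { noop: false, ctx, outputs, trace };
}
```

The topological order matters for join nodes: a node with several upstream paths runs only after every parent that is also downstream of the paused node, while upstream nodes that finished before the pause are not re-run.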
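## Phase 4: claude_api container forwards callback_url
- [x] 4.1 Modify `claude_api/main.go`: add CallbackURL to Input; change the default timeout to 120s
- [x] 4.2 Rebuild the wasm and redeploy claude-api.arcrun.dev (v f926e3dd)
- [x] 4.3 Real end-to-end verification: the daemon receives callback_url → once the task is done it POSTs to cypher-executor/workflows/resume → 200 OK
## Phase 5: End-to-end integration test
- [ ] 5.1 Use MCP `u6u_execute_workflow` to run wiki synthesis on a 5KB+ draft
- [ ] 5.2 The first response should be `{paused, task_id, run_id}`
- [ ] 5.3 Wait for the daemon callback to arrive (the log shows a /workflows/resume hit)
- [ ] 5.4 Confirm the wiki page is actually written to KBDB (even though the original MCP call has already disconnected)
- [ ] 5.5 The trace contains the complete node record (paused → resumed)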
---
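## Risk tracking
- Risk 1: the daemon callback arrives while cypher.arcrun.dev has not woken up yet (CF Worker cold start) → the first retry catches it (covered by the daemon retry policy)
- Risk 2: v1 has no final_callback to the original client → the user has to query the status actively
  - Accepted: the Mira river (河道) UI can refetch the wiki page periodically, or rely on the existing KBDB trigger mechanism
  - v2 adds final_callback to handle this uniformly
## Recorded for v2
- Nested pending (multiple paused nodes in one run)
- Pending inside foreach (item-level resume)
- final_callback to the original client (pass final_callback_url at trigger time)
- poll_task component (for external APIs without webhooks)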