Files
Arcrun/.agents/specs/resumable-workflow/tasks.md
T
Leo 497f92a268 feat(arcrun): recipe system + resumable workflow + component registry canon
Three new platform capabilities + one component (kbdb_get) to enable
real AI workflow execution through cypher binding YAML.

## Recipe System (容器 + Recipe 模式)
SDD: .agents/specs/recipe-system/

- prompt_recipe schema (Zod): fragments + inputs + assembly + output
- recipe-expander.ts: expand recipe ref → real prompt by fetching KBDB blocks
  + pulling context fields with transforms (pluck_content / extract_field / etc)
- 7 transform whitelist: json_array / to_string / join / markdown_list /
  extract_field / first / pluck_content
- graph-executor hooks: detect node.data.recipe → expand → inject into ctx
- output JSON parser (with markdown fence stripping for Claude-wrapped JSON)
- Stored in RECIPES KV under prompt_recipe:{name}

## Resumable Workflow (webhook callback resume)
SDD: .agents/specs/resumable-workflow/

- WorkflowPaused class + paused-runs.ts (persist/load/consume in EXEC_CONTEXT KV, 24h TTL)
- graph-executor: detect {pending:true, task_id} → persist state → throw WorkflowPaused
- cypher-handlers: catch → return {success:true, paused:true, task_id, run_id}
- POST /workflows/resume route: consume KV state → resumeFromPaused()
- Auto-inject callback_url for claude_api nodes (PUBLIC_BASE_URL or default cypher.arcrun.dev)
- claude_api/main.go: forward callback_url to Mira daemon, default timeout 25s→120s
- Idempotent (consume = load+delete)

## Component Registry Canon
SDD: .agents/specs/component-registry-canon/

- Add POST /components/index-only endpoint (metadata-only, no wasm/sandbox)
- Backfill script (mjs): scan registry/components/*/contract.yaml → submit to KV
- register-component.sh: SSOT for local + CI hook (deploy.yml change in next commit)
- Drop R2 dead storage from submitComponent + types + wrangler
- Schema relaxed: category enum + auth/ai/platform; cold_start 50→500ms; size 2→8MB

## kbdb_get component
- registry/components/kbdb_get/: TinyGo WASM, two modes (block_id / page_name list)
- .component-builds/kbdb_get/: WASI shim worker (kbdb-get.arcrun.dev)

End-to-end validation: AI uses MCP execute_workflow with recipe ref →
cypher-executor expands prompt from KBDB schema/skill blocks + drafts →
claude_api calls Mira daemon → daemon callback fires resume route →
workflow continues. Verified with real 2KB+ Karpathy LLM Wiki draft.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 15:52:19 +08:00

62 lines
3.2 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Tasks — Resumable Workflow
> 對應 SDD[design.md](design.md)
> 上次更新:2026-05-07
**狀態 legend**`[ ]` 待辦 / `[🔄]` 進行中 / `[x]` 完成
---
## Phase 1Mira daemon 端 callback 支援
- [x] 1.1 改 `/opt/mira/mira-daemon.js`Hetzner mira container`/execute` 接受 `params.callback_url`
- [x] 1.2 fireCallback functiontask done/failed 時 POST callback_urlbody = `{task_id, success, data?, error?}`
- [x] 1.3 callback retry4 次(立即 + 1s/5s/30s backoff),全失敗 log
- [x] 1.4 patch script 寫好 `/tmp/patch-mira-daemon.py`docker cp 進 container(注意:rebuild image 會丟失,需重 patch 或正式 commit 進 Dockerfile/git repo
- [x] 1.5 真實端對端驗證:daemon log 顯示 `[Mira callback] task=task_2_... POST https://cypher.arcrun.dev/workflows/resume OK 200`2026-05-07 07:24:04 + task_3 短測試)
## Phase 2cypher-executor resumable runtime
- [x] 2.1 寫 `paused-runs.ts`81 行):persistPausedRun / loadPausedRun / consumePausedRun + isResumablePending 偵測器,24h TTL
- [x] 2.2 改 `graph-executor.ts` Component case:偵測 pending → 寫 KV + throw WorkflowPaused
- [x] 2.3 改 `cypher-handlers.ts`catch WorkflowPaused → 回 `{success:true, paused:true, task_id, run_id, paused_node_id, trace, graph}`
- [x] 2.4 callback_url 自動注入:componentId==='claude_api' 時 mergedContext.callback_url = PUBLIC_BASE_URL 或預設 cypher.arcrun.dev/workflows/resume
## Phase 3resume endpoint
- [x] 3.1 寫 `routes/resume.ts`POST /workflows/resumeconsumePausedRun → resumeFromPaused
- [x] 3.2 graph-executor 加 `resumeFromPaused()` 方法:把 callback_result 當 paused_node 輸出 + spread 進 ctx + 從下游節點繼續
- [x] 3.3 idempotent 驗證:第二次 callback 回 `{noop:true, reason:"state 不存在或過期"}`
- [x] 3.4 cypher-executor 部署 v0580980b
- [x] 3.5 mount /workflows/resume 進 index.ts
## Phase 4claude_api 容器透傳 callback_url
- [x] 4.1 改 `claude_api/main.go`Input 加 CallbackURLtimeout 預設改 120s
- [x] 4.2 重 build wasm + redeploy claude-api.arcrun.dev (v f926e3dd)
- [x] 4.3 真實端對端驗證:daemon 收到 callback_url → task done 後 POST cypher-executor/workflows/resume → 200 OK
## Phase 5:端對端整合測試
- [ ] 5.1 用 MCP `u6u_execute_workflow` 跑 wiki 合成 + 5KB+ 草稿
- [ ] 5.2 第一次回應應為 `{paused, task_id, run_id}`
- [ ] 5.3 等 daemon callback 進來(log 看到 /workflows/resume 命中)
- [ ] 5.4 觀察 wiki page 真的寫進 KBDB(即使原 MCP call 已斷線)
- [ ] 5.5 trace 含完整節點紀錄(paused → resumed
---
## 風險追蹤
- 風險 1daemon callback 進來時,cypher.arcrun.dev 還沒醒(CF Worker cold start)→ 第一次 retry 接住(daemon retry policy 涵蓋)
- 風險 2v1 沒 final_callback 給原 client → 用戶要主動查狀態
- 接受:mira 河道 UI 可定期 refetch wiki page,或用既有 KBDB 觸發機制
- v2 加 final_callback 統一處理
## v2 已記錄
- nested pending(一個 run 多個 paused 節點)
- foreach 內 pendingitem-level resume
- final_callback 給原 clienttrigger 時帶 final_callback_url
- poll_task 零件(外部 API 沒 webhook 時用)