arcrun — AI workflow execution engine (clean history)

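Self-hosted open source: WASM components + recipe + cypher-executor, running on your own Cloudflare account. This is the starting point of a rebuilt, clean history (a GCP SA key that was committed by mistake has been removed; the old history is kept in richblack/arcrun and in a local backup branch). Includes:
- acr init --self-hosted installer (creates KV/R2 + pulls precompiled wasm via codeload + wrangler deploy + seeds recipes)
- recipe push gatekeeping (data exfiltration warning + end-to-end connectivity check)
- 19 legitimate components as precompiled wasm (claude_api/km_writer/kbdb_upsert_block excluded: they violate DECISIONS §1)
- CLI / cypher-executor / registry / full SDD

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>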
2026-06-03 15:52:38 +08:00
commit 922a57fe34
485 changed files with 89356 additions and 0 deletions
@@ -0,0 +1,38 @@
+# error-retry
+
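+## What problem this solves
+External APIs returning an occasional 500 or timing out is normal. Hard-coding "call once and give up" is too brittle.
+This pattern provides a standard retry chain: fail → wait 5 seconds → retry once → notify a human only if it still fails.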
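
The retry chain described above can be sketched in plain Python. This is only an analogue of the control flow, not arcrun recipe syntax: `call_with_one_retry`, `call`, `notify`, and the injectable `sleep` are hypothetical names introduced for illustration, and unlike the `wait` component this version blocks while it sleeps.

```python
import time
from typing import Callable, Optional


def call_with_one_retry(
    call: Callable[[], str],
    notify: Callable[[str], None],
    wait_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests do not wait
) -> Optional[str]:
    """fail -> wait -> retry once -> notify a human only if it still fails."""
    try:
        return call()
    except Exception:
        # First failure: back off before the single retry.
        sleep(wait_seconds)
    try:
        return call()
    except Exception as err:
        # Second failure: surface it instead of swallowing it.
        notify(str(err))
        return None
```

Passing `sleep` as a parameter keeps the sketch testable without a real 5-second delay.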
+
+## How to trigger it
+```bash
+curl -X POST https://cypher.arcrun.dev/webhooks/named/error_retry/trigger \
+  -d '{
+    "api_key":"ak_xxx",
+    "target_url":"https://flaky-api.example.com/endpoint",
+    "payload":{"x":1},
+    "workflow_name":"my_workflow"
+  }'
+```
+
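+## Make it your own
+- Change `wait_a_bit.seconds` to adjust the delay (for exponential backoff: 5, 15, 45 seconds)
+- Chain more retry nodes (for generic cases, 3-4 attempts are enough)
+- Swap `final_fail_notify` for email / PagerDuty / Slack, etc.
+- Add an `if_control` that branches on the error type (do not retry 4xx, retry 5xx)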
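
As a rough illustration of the two tuning knobs above, the sketch below computes the 5, 15, 45 second schedule (base 5, multiplied by 3 each attempt) and classifies which failures are worth retrying. `backoff_delays` and `is_retryable` are hypothetical helpers, not arcrun components. Treating a missing status code as a timeout, and therefore as retryable, is an assumption.

```python
from typing import List, Optional


def backoff_delays(base: float = 5, factor: float = 3, attempts: int = 3) -> List[float]:
    """Delay before each retry: base, base*factor, base*factor^2, ..."""
    return [base * factor ** i for i in range(attempts)]


def is_retryable(status: Optional[int]) -> bool:
    """Retry 5xx and timeouts; never retry 4xx, since the request itself is wrong."""
    if status is None:
        # Assumption: no status code means the call timed out.
        return True
    return 500 <= status <= 599
```

In a recipe, the values from `backoff_delays()` would map onto the `seconds` setting of successive wait nodes, and the `is_retryable` rule onto the `if_control` condition.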
+
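+## Why this pattern matters
+- arcrun's `ON_FAIL` edge is declarative error handling, which is more intuitive than writing try/catch
+- The `wait` component consumes no CPU (cypher-executor schedules the sleep and resumes afterwards), which is healthier than setTimeout
+- A failure must eventually reach a human and must not be silently swallowed; the notification itself is the workflow's responsibility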
+
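+## Variants
+- **Circuit breaker**: 3 consecutive failures → write a `circuit:open` flag to KBDB → subsequent triggers are skipped outright
+- **Dead letter queue**: write the failed input to KBDB with type=dlq-input so it can be re-run manually later
+- **Idempotency key**: send the same request_id on every retry so the downstream service does not process the request twice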
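
A minimal sketch of the circuit breaker and dead letter queue variants, assuming a plain dict and list as stand-ins for KBDB. `CircuitBreaker`, `run_trigger`, and the in-memory stores are hypothetical. Resetting the failure count on success is also an assumption, since the variant above only describes how the circuit opens.

```python
from typing import Any, Callable, Dict, List


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; a success resets the count."""

    def __init__(self, store: Dict[str, Any], threshold: int = 3) -> None:
        self.store = store  # hypothetical stand-in for KBDB
        self.threshold = threshold
        self.failures = 0

    def is_open(self) -> bool:
        return self.store.get("circuit:open") is True

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.threshold:
            self.store["circuit:open"] = True


def run_trigger(
    breaker: CircuitBreaker,
    call: Callable[[Dict[str, Any]], Any],
    payload: Dict[str, Any],
    dlq: List[Dict[str, Any]],
) -> str:
    """Skip when the circuit is open; otherwise call and park failed inputs."""
    if breaker.is_open():
        return "skipped"
    try:
        call(payload)
    except Exception:
        breaker.record(ok=False)
        # Keep the failed input so it can be replayed by hand later.
        dlq.append({"type": "dlq-input", "input": payload})
        return "failed"
    breaker.record(ok=True)
    return "ok"
```

With a target that always fails, the first three triggers land in the dead letter list and the fourth is skipped without calling the target.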
+
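+## What you learn
+- `ON_FAIL` edge: which path to take when a node fails
+- `wait` component: a declarative delay that does not block the worker (handed off to paused-resume)
+- `{{node_id.error}}` retrieves the failed node's error message
+- Treat the "final failure notification" as part of the workflow instead of relying on monitoring outside the system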