docs(registry): seed 10 examples + 5 skills (LI SDD M3.1 + M3.3)

Corresponds to .agents/specs/llm-interface/ Milestone 3.1 + 3.3.

registry/examples/: 10 workflow templates that can be pushed as-is:
  starter:  webhook-to-http
  common:   cron-watcher, llm-classify, rag-search-answer, daily-digest
  external: email-summary (gmail+claude+telegram), pdf-to-blocks, github-issue-bot
  advanced: parallel-fanout (trigger_workflow fan-out), error-retry (try_catch+wait pattern)

Each example contains: workflow.yaml (ready to push) + description.md (what problem it solves / how to adapt it / what you learn) + tags.json (for search)

registry/skills/: 5 AI playbooks (markdown):
  build_watcher_workflow: cron + filter + trigger pattern
  debug_paused_workflow: how to trace a workflow paused on a claude_api callback
  migrate_http_to_trigger_workflow: switch from self-fetch to trigger_workflow
  rag_with_arcrun: assemble RAG from KBDB + claude_api
  add_new_wasm_component: writing in TinyGo + the full deployment flow

How the two differ:
  examples = YAML you can copy and adapt directly
  skills   = how to think about problem X + which example to use

Next steps for both: CI will sync them automatically into KBDB (type=workflow-example / type=agent-skill), and the MCP tools arcrun_search_examples / arcrun_list_skills will go through KBDB semantic search. (CI sync is M3.4 work.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 16:33:54 +08:00
parent 989fbeb9ac
commit 388c193ae7
37 changed files with 1324 additions and 0 deletions
@@ -0,0 +1,86 @@
+# Skill: Build Watcher Workflow
+
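+## When to use this skill
+
+The user says something like:
+- "Every X minutes / hours, scan Y and process whatever matches the condition"
+- "Watch a data source and process new data automatically as it arrives"
+- "Check X periodically to see whether there is anything new"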
+
+## Core pattern
+
+```
+cron → list (fetch candidates) → filter (keep unprocessed) → for each → trigger the processing workflow
+```
+
+## 5-step process
+
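+### 1. Confirm the data source
+
+Ask the user (or infer from context):
+- Where does the data live? KBDB / an external API / the file system?
+- Which field distinguishes "processed" from "unprocessed"? Common choices:
+  - a tag (whether `tags_json` contains `"processed"`)
+  - a status field (`status: pending`)
+  - a missing piece of metadata (e.g. no `summary`)
+- Do not decide by timestamp: if a cron run is skipped, the records in that window are missed permanently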
+
+### 2. Find the example and adapt it
+
+`arcrun_search_examples('cron watcher')` → matches the `cron-watcher` example.
+Copy the YAML and change four places:
+- `watch_cron.cron_expr`: the frequency
+- `list_unprocessed`: the query
+- `filter_new.condition`: your definition of "unprocessed"
+- `trigger_processor.workflow_name`: the name of your processing workflow
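+
+A minimal sketch of where those four edits sit. This is not the `cron-watcher` example itself: the node names, `cron_expr`, `condition`, and `workflow_name` come from the list above, while the `query` key and the overall layout are assumptions to check against the real example.
+
+```yaml
+# Illustrative skeleton only; compare with the actual cron-watcher YAML.
+watch_cron:
+  cron_expr: "*/5 * * * *"         # edit 1: frequency (every 5 minutes here)
+list_unprocessed:
+  query: "<your query>"            # edit 2: the `query` key name is assumed
+filter_new:
+  condition: 'tags_json eq "[]"'   # edit 3: your definition of "unprocessed"
+trigger_processor:
+  component: trigger_workflow
+  workflow_name: "your_processor"  # edit 4: name of your processing workflow
+```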
+
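+### 3. The processing workflow must be idempotent
+
+The watcher may run again (a catch-up run after a missed cron tick, or a manual re-trigger). The processing workflow must:
+- check in its first step whether it has already processed this record
+- or mark the record as processed in its last step (add a tag / change the status)
+- fail gracefully (record telemetry, do not crash repeatedly)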
+
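+### 4. Always use `trigger_workflow`, never an `http_request` to yourself
+
+**This is the #1 pitfall.** When cypher-executor uses `http_request` to call its own `cypher.arcrun.dev` or
+`arcrun-cypher-executor.*.workers.dev`, the request is blocked by Cloudflare (CF) self-fetch protection (1042 / 522 errors).
+
+Use the built-in `trigger_workflow` component instead: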
+```yaml
+trigger_processor:
+  component: trigger_workflow
+  workflow_name: "your_processor"
+  api_key: "{{api_key}}"
+  input:
+    api_key: "{{api_key}}"
+    block_id: "{{item.id}}"
+```
+
+### 5. Deploy and verify
+
+```
+arcrun_validate_yaml(yaml) → arcrun_push_workflow(yaml) → wait 5 min → arcrun_list_recent_executions
+```
+
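+After the first cron tick has run, check the executions list to confirm the workflow is working; if nothing shows up, check `arcrun_list_paused_executions` to see whether an execution is stuck.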
+
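+## Common pitfalls
+
+| Symptom | Cause | Fix |
+|---|---|---|
+| watcher runs but processes the same N records every time | records are never marked as processed | add a tag / status change as the last step of the processing workflow |
+| watcher runs but processes nothing | the filter condition is wrong | it passes acr validate but the logic is wrong; trigger one run manually with curl and read the trace |
+| processing workflow stays paused forever | the claude_api callback never came back | health-check the mira daemon; a normal callback returns in 30-60 seconds |
+| high volume overwhelms the worker | too many triggers at once | add a limit to list_unprocessed and spread the work over several cron runs |
+| cron does not fire | the first node is not a cron component | scheduled() only recognizes a cron node in first position: confirm the first line of the YAML flow is `cron_node >> X` |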
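+
+The two config-level fixes from the table can be sketched as follows. The `limit` key and the `flow` block layout are assumptions based on the wording of the table, and `20` is an arbitrary value; only the `cron_node >> X` form is given by this skill.
+
+```yaml
+# Hypothetical fragment; key names are not confirmed by the registry example.
+list_unprocessed:
+  limit: 20   # cap the records handed to trigger_workflow per cron tick
+# The cron node must be the first node of the flow, or scheduled() will not fire.
+flow: |
+  watch_cron >> list_unprocessed
+```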
+
+## Real-world case
+
+`mira_feed_watcher.yaml` (polaris/mira/arcrun/) uses this pattern in production:
+- cron `*/5 * * * *` scans the posts on leo's feed
+- filter `tags_json eq "[]"` picks up the unprocessed ones
+- trigger_workflow triggers `wiki_synthesis`
+- the last step inside wiki_synthesis adds the `wiki-processed` tag to ensure idempotency
+
+See the mira repo for the full YAML.