docs(registry): seed 10 examples + 5 skills (LI SDD M3.1 + M3.3)

Corresponds to .agents/specs/llm-interface/ Milestones 3.1 and 3.3.

registry/examples/: 10 workflow templates that can be pushed as-is:
  starter: webhook-to-http
  common: cron-watcher, llm-classify, rag-search-answer, daily-digest
  external: email-summary (gmail+claude+telegram), pdf-to-blocks, github-issue-bot
  advanced: parallel-fanout (trigger_workflow fan-out), error-retry (try_catch+wait pattern)

Each one contains: workflow.yaml (ready to push) + description.md (what problem it solves / how to adapt it / what you learn) + tags.json (for search)

registry/skills/: 5 AI playbooks (markdown):
  build_watcher_workflow: cron + filter + trigger pattern
  debug_paused_workflow: how to trace a paused claude_api callback
  migrate_http_to_trigger_workflow: switch from self-fetch to trigger_workflow
  rag_with_arcrun: assemble RAG from KBDB + claude_api
  add_new_wasm_component: the full flow of writing a component in TinyGo and deploying it

Difference between the two:
  examples = YAML you can take and modify directly
  skills = how to think about problem X + which example to use

Next step for both: CI syncs them into KBDB automatically (type=workflow-example / type=agent-skill), and MCP arcrun_search_examples / arcrun_list_skills go through KBDB semantic search. (CI sync is M3.4 work.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 16:33:54 +08:00
parent 989fbeb9ac
commit 388c193ae7
37 changed files with 1324 additions and 0 deletions
@@ -0,0 +1,115 @@
+# Skill: RAG with Arcrun
+
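+## When to use this skill
+
+The user says:
+- "I have a pile of X data and want to ask questions and have it answer them"
+- "Claude doesn't know my private data; how do I make it aware of it?"
+- "A customer-support bot that answers from our docs"
+- "Q&A over an internal company knowledge base"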
+
+## Three-step RAG architecture
+
+```
+User asks → search → feed context to the LLM → answer
+```
+
+The arcrun equivalent:
+```yaml
+flow:
+  - "input >> ON_SUCCESS >> search"      # KBDB semantic search
+  - "search >> ON_SUCCESS >> answer"     # claude_api with context
+```
+
+For the full template, see `arcrun_search_examples('rag')` → `rag-search-answer`.
+
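+## 5 key decisions
+
+### 1. How does data get into KBDB?
+
+The source determines quality:
+- **PDF / documents** → use the `pdf-to-blocks` workflow (automatic chunking + embedding)
+- **Logseq / Notion / handwritten notes** → write an ingest script or let the mira platform handle it
+- **Web crawl** → http_request → `kbdb_ingest`
+- **Daily RSS** → cron + kbdb_ingest
+
+Key points:
+- Use the `source` field to distinguish origins (queries can filter by source later)
+- Use `user_id` to separate namespaces (multi-tenant or multi-domain)
+- Chunk size: 500-1000 characters works best (too small loses context, too large dilutes relevance)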
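
The chunk-size guideline above can be sketched in Python as a paragraph packer. This is only an illustration for a hand-written ingest script: the function name `chunk_text` and the default of 800 characters are assumptions, and it does not describe how `pdf-to-blocks` or `kbdb_ingest` actually chunk.

```python
def chunk_text(text: str, max_chars: int = 800) -> list[str]:
    """Pack paragraphs into chunks of at most max_chars characters.

    Hypothetical helper: 800 is simply a value inside the 500-1000 range.
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # Hard-split any single paragraph that is longer than max_chars.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate      # still fits: keep growing this chunk
            else:
                chunks.append(current)   # full: flush and start a new chunk
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Packing whole paragraphs keeps related sentences together, which is one way to avoid the "too small loses context" failure while still capping chunk length.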
+
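+### 2. How should search be configured?
+
+```yaml
+search:
+  component: kbdb_search
+  api_key: "{{api_key}}"
+  query: "{{input.question}}"
+  topK: 5                          # anything from 3 to 10 is reasonable
+  user_id: "{{input.user_id}}"     # restrict to a namespace (required for multi-tenant)
+```
+
+Advanced parameters:
+- `source`: restrict the source (query only "pdf:*" or "wiki:*")
+- `tag`: restrict by tag ("verified", "policy", etc.)
+- Semantic search runs on embeddings, so write the query in natural language; exact keyword matches are not needed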
+
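+### 3. How should the prompt feed in context?
+
+Key: **mark the context boundary explicitly + give the LLM permission to decline to answer**
+
+```
+You are a knowledge-base assistant. Answer the question **using only the information in the context**.
+
+Rules:
+1. If the context does not cover it, say honestly: "I couldn't find that in the knowledge base"
+2. Tag citations with [block_id] so the user can trace the original
+3. Do not extrapolate, do not fabricate
+
+Context:
+{{search.results}}
+
+Question: {{input.question}}
+
+Answer:
+```
+
+If the LLM is not given "permission to decline", it will guess.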
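
For readers assembling the prompt outside a workflow template, the same idea can be sketched in Python. The result shape (`block_id` and `text` keys) and the `<context>` delimiters are assumptions made for this sketch; the actual shape of `{{search.results}}` is not shown in this file.

```python
def build_prompt(question: str, results: list[dict]) -> str:
    """Render retrieved blocks inside explicit delimiters, then append the question.

    Assumes each result is a dict with 'block_id' and 'text' keys (hypothetical).
    """
    if results:
        context = "\n\n".join(f"[{r['block_id']}] {r['text']}" for r in results)
    else:
        # Make the empty case visible so the model can decline instead of guessing.
        context = "(no matching blocks)"
    return (
        "You are a knowledge-base assistant. "
        "Answer using only the information inside <context>.\n"
        "If the context does not cover the question, say you could not find it.\n"
        "Cite sources as [block_id].\n\n"
        f"<context>\n{context}\n</context>\n\n"
        f"Question: {question}\n\nAnswer:"
    )
```

Prefixing each block with its id is what makes rule 2 (citing `[block_id]`) possible, since the model can only cite identifiers it has actually seen.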
+
+### 4. How should citations be displayed?
+
+Advanced: use `_recipe_output_format: json` to have claude return structured output:
+```json
+{
+  "answer": "...",
+  "citations": [{"block_id": "abc-123", "snippet": "..."}],
+  "confidence": "high"
+}
+```
+
+The frontend can render these as clickable citation links.
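
Before rendering links, it can help to check that every cited `block_id` was actually among the retrieved blocks. The following is a minimal sketch under that assumption; `parse_answer` and the `dropped_citations` key are hypothetical names, not part of arcrun.

```python
import json


def parse_answer(raw: str, retrieved_ids: set[str]) -> dict:
    """Parse the structured answer and keep only citations that were retrieved.

    Hypothetical helper: a citation whose block_id was never retrieved is
    treated as unverifiable and moved to 'dropped_citations'.
    """
    data = json.loads(raw)
    citations = data.get("citations", [])
    data["citations"] = [c for c in citations if c.get("block_id") in retrieved_ids]
    data["dropped_citations"] = [
        c.get("block_id") for c in citations if c.get("block_id") not in retrieved_ids
    ]
    return data
```

A non-empty `dropped_citations` list is a cheap signal that the model cited something outside its context, which is worth logging alongside the `confidence` field.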
+
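+### 5. How do you measure accuracy?
+
+`arcrun_search_examples('rag-eval')` has no example yet. To do it manually:
+1. Prepare N "golden QA pairs" (question + expected answer)
+2. Run the workflow N times and compare the results
+3. If accuracy < 70%: first check KBDB chunk quality, then tune topK, and finally tune the prompt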
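
The manual procedure above can be sketched as a small Python harness. The `ask` callable stands in for whatever actually runs the workflow (it is a placeholder, not an arcrun API), and substring matching is a deliberately crude scoring assumption; a real evaluation would likely need a fuzzier comparison.

```python
from typing import Callable


def evaluate(golden: list[tuple[str, str]], ask: Callable[[str], str]) -> float:
    """Return the fraction of golden QA pairs whose expected answer appears in the output.

    'ask' is a hypothetical callable that runs the RAG workflow for one question.
    """
    if not golden:
        raise ValueError("need at least one golden QA pair")
    hits = sum(
        1 for question, expected in golden
        if expected.lower() in ask(question).lower()  # crude: case-insensitive substring
    )
    return hits / len(golden)


def needs_tuning(accuracy: float, threshold: float = 0.70) -> bool:
    """Apply the 70% rule of thumb from the steps above."""
    return accuracy < threshold
```

Keeping the golden set fixed between runs makes it possible to see whether a change to chunking, topK, or the prompt actually moved the score.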
+
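+## Common pitfalls
+
+| Symptom | Cause | Fix |
+|---|---|---|
+| Inaccurate answers | chunks too large / too small | re-ingest with a different chunk size |
+| Fabricated answers | prompt grants no permission to decline | add "if you don't know, say you don't know" to the prompt |
+| Misses what it should find | semantic search does not hit | try query rewriting / raise topK |
+| Answers too long | prompt sets no limit | add "answer < 100 characters" to the prompt |
+| Slow | claude_api timeout | raise timeout_ms or reduce context |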
+
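+## Advanced variants
+
+- **Multi-round query rewriting**: claude first rewrites the question → search → answer
+- **mix sources**: KBDB + web search + DB query → merge
+- **cache**: store answers to identical questions in KBDB; on a later lookup hit, return the stored answer directly (saves an LLM call)
+- **conversational**: pass chat history into the prompt to support follow-ups
+- **filter-then-rerank**: semantic search retrieves 20 → claude reranks and keeps the top 5 → feed those into the final answer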