docs(registry): seed 10 examples + 5 skills (LI SDD M3.1 + M3.3)

Corresponds to .agents/specs/llm-interface/ Milestone 3.1 + 3.3.

registry/examples/: 10 workflow templates that can be pushed as-is:
  starter:  webhook-to-http
  common:   cron-watcher, llm-classify, rag-search-answer, daily-digest
  external: email-summary (gmail+claude+telegram), pdf-to-blocks, github-issue-bot
  advanced: parallel-fanout (trigger_workflow fan-out), error-retry (try_catch+wait pattern)

Each example contains: workflow.yaml (pushable as-is) + description.md (what problem it solves / make it your own / what you learn) + tags.json (for search).

registry/skills/: 5 AI playbooks (markdown):
  build_watcher_workflow: cron + filter + trigger pattern
  debug_paused_workflow: how to trace a workflow paused on a claude_api callback
  migrate_http_to_trigger_workflow: move from self-fetch to trigger_workflow
  rag_with_arcrun: assemble RAG from KBDB + claude_api
  add_new_wasm_component: full flow for writing a component in TinyGo and deploying it

How the two differ:
  examples = YAML you can take and modify directly
  skills = how to think about problem X + which example to use

Next for both: CI will sync them into KBDB automatically (type=workflow-example / type=agent-skill), and MCP arcrun_search_examples / arcrun_list_skills will go through KBDB semantic search. (CI sync is M3.4 work.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 16:33:54 +08:00
parent 989fbeb9ac
commit 388c193ae7
37 changed files with 1324 additions and 0 deletions
@@ -0,0 +1,40 @@
+# pdf-to-blocks
+
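+## What problem it solves
+Research / study: drop in a PDF and it is automatically converted to text, split into chunks, and stored in KBDB, ready for RAG search afterwards.
+Good for: a paper-reading library, contract lookup, RAG over technical docs.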
+
+## How to trigger
+```bash
+curl -X POST https://cypher.arcrun.dev/webhooks/named/pdf_to_blocks/trigger \
+  -H "X-Arcrun-API-Key: ak_xxx" \
+  -d '{
+    "api_key":"ak_xxx",
+    "pdf_url":"https://arxiv.org/pdf/2411.02959.pdf",
+    "title":"HtmlRAG",
+    "user_id":"inkstone_leo_research"
+  }'
+```
+
+## What to do next
+Pair it with the `rag-search-answer` workflow:
+```bash
+curl ... rag_search_answer/trigger \
+  -d '{"question":"What advantages does HtmlRAG have over Markdown?", "user_id":"inkstone_leo_research"}'
+```
+→ claude answers using context found in the PDF chunks you just ingested
+
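+## Make it your own
+- Swap the convert source (cto.finally.click also offers convert and can be used in your own environment)
+- `kbdb_ingest` chunks at ~500 characters by default; change that on the KBDB side
+- `source: "pdf:{url}"` is the idempotency key: re-ingesting the same URL is detected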
+
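+## Variants
+- Attach `claude_api` after ingest to run an auto-tagging flow (extract keyword tags from each chunk)
+- Combine with the `email-summary` pattern: subscribe to the arxiv RSS feed → pull PDFs in automatically
+- Have the ingest result trigger `wiki_synthesis` (mira uses this chain)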
+
+## What you learn
+- KBDB has a `/convert` endpoint that accepts PDF / DOC directly, so you do not have to handle OCR yourself
+- `kbdb_ingest` handles chunking + embedding in one step
+- `source: "{type}:{key}"` is the KBDB idempotency convention
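Reviewer note: the `source: "{type}:{key}"` convention in description.md can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not KBDB's implementation: `make_source` and `DedupeIndex` are hypothetical names, and the in-memory set merely stands in for whatever KBDB uses to detect a repeated source, which this commit does not show.

```python
def make_source(kind: str, key: str) -> str:
    """Build a source string in the "{type}:{key}" shape, e.g. "pdf:<url>"."""
    if not kind or ":" in kind:
        raise ValueError("kind must be non-empty and must not contain ':'")
    return f"{kind}:{key}"


class DedupeIndex:
    """Hypothetical stand-in for the store that remembers ingested sources."""

    def __init__(self) -> None:
        self._seen = set()

    def should_ingest(self, source: str) -> bool:
        """Return True the first time a source is seen, False on repeats."""
        if source in self._seen:
            return False
        self._seen.add(source)
        return True


index = DedupeIndex()
src = make_source("pdf", "https://arxiv.org/pdf/2411.02959.pdf")
first = index.should_ingest(src)   # first ingest of this URL
second = index.should_ingest(src)  # same URL submitted again
```

Because the key is derived from the URL alone, the same PDF served from a different URL would count as a new source in this sketch.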
@@ -0,0 +1 @@
+["pdf", "ingest", "kbdb", "rag-prep", "chunking", "knowledge-base"]
@@ -0,0 +1,25 @@
+name: pdf_to_blocks
+description: Take a PDF URL → convert to text → split into chunks → store in KBDB, one block per chunk
+
+flow:
+  - "input >> ON_SUCCESS >> convert_pdf"
+  - "convert_pdf >> ON_SUCCESS >> ingest_to_kbdb"
+
+config:
+  convert_pdf:
+    component: http_request
+    url: "https://kbdb.finally.click/convert"
+    method: POST
+    body_json:
+      file_url: "{{input.pdf_url}}"
+      format: "text"
+
+  # kbdb_ingest chunks automatically and writes blocks (~500 characters each)
+  # source uses file_url as the dedupe key (re-ingesting the same PDF creates no duplicates)
+  ingest_to_kbdb:
+    component: kbdb_ingest
+    api_key: "{{api_key}}"
+    page_name: "pdf-{{input.title}}"
+    text: "{{convert_pdf.data.text}}"
+    source: "pdf:{{input.pdf_url}}"
+    user_id: "{{input.user_id}}"
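Reviewer note: to make the `{{...}}` placeholders in workflow.yaml easier to follow, here is a minimal Python sketch of how they might resolve. It assumes a placeholder is a plain dotted lookup into a context of step outputs; arcrun's real template engine is not shown in this commit, so `render` and the sample `context` (including the converted text) are hypothetical.

```python
import re

# Matches {{ dotted.path }} with optional inner whitespace.
_PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template: str, context: dict) -> str:
    """Replace each {{a.b.c}} with context["a"]["b"]["c"] (assumed semantics)."""

    def lookup(match):
        value = context
        for part in match.group(1).split("."):
            value = value[part]  # raises KeyError if the path is missing
        return str(value)

    return _PLACEHOLDER.sub(lookup, template)


# Sample context: the webhook input plus a made-up convert_pdf result.
context = {
    "input": {
        "pdf_url": "https://arxiv.org/pdf/2411.02959.pdf",
        "title": "HtmlRAG",
        "user_id": "inkstone_leo_research",
    },
    "convert_pdf": {"data": {"text": "example converted text"}},
}

page_name = render("pdf-{{input.title}}", context)
source = render("pdf:{{input.pdf_url}}", context)
text = render("{{convert_pdf.data.text}}", context)
```

Under these assumptions `page_name` comes from the request's `title`, while `source` depends only on `pdf_url`, which is what lets it serve as the dedupe key.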