docs(registry): seed 10 examples + 5 skills (LI SDD M3.1 + M3.3)

Corresponds to .agents/specs/llm-interface/ Milestone 3.1 + 3.3.

registry/examples/: 10 workflow templates that can be pushed as-is:
  starter:  webhook-to-http
  common:   cron-watcher, llm-classify, rag-search-answer, daily-digest
  external: email-summary (gmail+claude+telegram), pdf-to-blocks,
            github-issue-bot
  advanced: parallel-fanout (trigger_workflow fan-out),
            error-retry (try_catch+wait pattern)
Each one contains: workflow.yaml (ready to push) + description.md (what
problem it solves / make it your own / what you learn) + tags.json (for search)

registry/skills/: 5 AI playbooks (markdown):
  build_watcher_workflow: cron + filter + trigger pattern
  debug_paused_workflow: how to trace a workflow paused on a claude_api callback
  migrate_http_to_trigger_workflow: switching from self-fetch to trigger_workflow
  rag_with_arcrun: assembling RAG from KBDB + claude_api
  add_new_wasm_component: full flow for writing and deploying with TinyGo

How the two differ:
  examples = YAML you can take and modify directly
  skills = how to think about problem X + which example to use

Next step for both: CI syncs them into KBDB automatically
(type=workflow-example / type=agent-skill), and MCP arcrun_search_examples /
arcrun_list_skills go through KBDB semantic search.
(CI sync is M3.4 work)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -0,0 +1,40 @@
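# pdf-to-blocks

## What problem it solves
Research / study: drop in a PDF and it is automatically converted to text, split into chunks, and stored in KBDB, ready for RAG search afterwards.
Good for: a paper-reading library, contract lookup, RAG over technical documents.

## How to trigger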
```bash
curl -X POST https://cypher.arcrun.dev/webhooks/named/pdf_to_blocks/trigger \
  -H "X-Arcrun-API-Key: ak_xxx" \
  -d '{
    "api_key":"ak_xxx",
    "pdf_url":"https://arxiv.org/pdf/2411.02959.pdf",
    "title":"HtmlRAG",
    "user_id":"inkstone_leo_research"
  }'
```
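
For callers that prefer a script to `curl`, the same request can be sketched in Python. This is a minimal sketch using only the standard library: `build_trigger_request` is a hypothetical helper name, the `Content-Type` header is an assumption that the curl command above does not set, and the request is only constructed here, not sent.

```python
import json
import urllib.request

TRIGGER_URL = "https://cypher.arcrun.dev/webhooks/named/pdf_to_blocks/trigger"


def build_trigger_request(api_key, pdf_url, title, user_id):
    """Build (but do not send) the POST request from the curl example."""
    payload = {
        "api_key": api_key,
        "pdf_url": pdf_url,
        "title": title,
        "user_id": user_id,
    }
    return urllib.request.Request(
        TRIGGER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-Arcrun-API-Key": api_key,
            # Assumption: the endpoint accepts an explicit JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_trigger_request(
    api_key="ak_xxx",
    pdf_url="https://arxiv.org/pdf/2411.02959.pdf",
    title="HtmlRAG",
    user_id="inkstone_leo_research",
)
# To actually send it: urllib.request.urlopen(req)
```

Keeping construction separate from sending makes it easy to inspect the payload before spending an API call.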

## Follow-up usage
Pair it with the `rag-search-answer` workflow:
```bash
curl ... rag_search_answer/trigger \
  -d '{"question":"What advantages does HtmlRAG have over Markdown?", "user_id":"inkstone_leo_research"}'
```
→ Claude answers using context found in the PDF chunks you just ingested

## Make it your own
- Swap the convert source (cto.finally.click also has convert, usable in your own environment)
- `kbdb_ingest` chunks at ~500 characters by default; change this on the KBDB side
- `source: "pdf:{url}"` is the idempotency key: re-ingesting the same URL is detected
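
To get a feel for what ~500-character chunks look like before tuning anything on the KBDB side, here is a rough local sketch. It is not KBDB's chunker: `chunk_text` is a hypothetical helper, and packing whole paragraphs up to a size limit is an assumed strategy chosen purely for illustration.

```python
def chunk_text(text, max_chars=500):
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    Paragraphs are separated by blank lines. A paragraph longer than
    max_chars is hard-split into max_chars-sized pieces.
    """
    chunks = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Hard-split oversized paragraphs so every piece fits in one chunk.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = piece if not current else current + "\n\n" + piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                # The current chunk is full: emit it and start a new one.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "a" * 300 + "\n\n" + "b" * 300 + "\n\n" + "c" * 1200
sample_chunks = chunk_text(sample)
```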

## Variants
- Chain `claude_api` after ingest to run an auto-tagging flow (extract keyword tags from each chunk)
- Combine with the `email-summary` pattern: subscribe to the arxiv RSS feed → PDFs are pulled in automatically
- Have the ingest result trigger `wiki_synthesis` (mira uses this chain)

## What you learn
- KBDB has a `/convert` endpoint that takes PDF / DOC directly, so you don't have to handle OCR yourself
- `kbdb_ingest` handles chunking + embedding in one step
- `source: "{type}:{key}"` is the KBDB idempotency convention
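
The `{type}:{key}` convention can be illustrated with a small in-memory sketch. This is only a model of the idea, not KBDB's implementation: `make_source` and `IngestLog` are hypothetical names, and how KBDB actually detects a repeated source is not shown in this example.

```python
def make_source(kind, key):
    """Build a source string following the "{type}:{key}" convention."""
    if not kind or ":" in kind:
        raise ValueError("type must be non-empty and must not contain ':'")
    return f"{kind}:{key}"


class IngestLog:
    """Toy stand-in for a store that ingests each source at most once."""

    def __init__(self):
        self._seen = set()

    def ingest(self, source):
        """Return True if the source is new, False if it was seen before."""
        if source in self._seen:
            return False
        self._seen.add(source)
        return True


log = IngestLog()
src = make_source("pdf", "https://arxiv.org/pdf/2411.02959.pdf")
first = log.ingest(src)   # new source, recorded
second = log.ingest(src)  # same source again, detected as a repeat
```

Because the key itself may contain colons (as a URL does), only the type part is restricted here; a reader can recover the type by splitting on the first colon.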
@@ -0,0 +1 @@
["pdf", "ingest", "kbdb", "rag-prep", "chunking", "knowledge-base"]
@@ -0,0 +1,25 @@
name: pdf_to_blocks
description: Take a PDF URL → convert to text → split into chunks → store in KBDB, one block per chunk

flow:
  - "input >> ON_SUCCESS >> convert_pdf"
  - "convert_pdf >> ON_SUCCESS >> ingest_to_kbdb"

config:
  convert_pdf:
    component: http_request
    url: "https://kbdb.finally.click/convert"
    method: POST
    body_json:
      file_url: "{{input.pdf_url}}"
      format: "text"

  # kbdb_ingest chunks automatically and writes blocks (~500 characters each)
  # source uses file_url as the dedup key (re-ingesting the same PDF does not create duplicates)
  ingest_to_kbdb:
    component: kbdb_ingest
    api_key: "{{api_key}}"
    page_name: "pdf-{{input.title}}"
    text: "{{convert_pdf.data.text}}"
    source: "pdf:{{input.pdf_url}}"
    user_id: "{{input.user_id}}"
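
The `flow` entries above read as `source >> condition >> target` edges. As a way to reason about the resulting graph, a minimal Python sketch could split them like this. `parse_edge` and `linear_order` are hypothetical helpers, the sketch assumes a simple linear chain, and arcrun's real parser and scheduling rules are not shown in this commit.

```python
FLOW = [
    "input >> ON_SUCCESS >> convert_pdf",
    "convert_pdf >> ON_SUCCESS >> ingest_to_kbdb",
]


def parse_edge(edge):
    """Split "a >> COND >> b" into a (source, condition, target) tuple."""
    parts = [p.strip() for p in edge.split(">>")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"malformed edge: {edge!r}")
    return tuple(parts)


def linear_order(edges, start="input"):
    """Follow ON_SUCCESS edges from start and list the steps in order.

    Assumes at most one ON_SUCCESS edge per step (no branching).
    """
    next_step = {
        src: dst
        for src, cond, dst in map(parse_edge, edges)
        if cond == "ON_SUCCESS"
    }
    order = []
    node = start
    # Stop at the end of the chain, or if an edge would revisit a step.
    while node in next_step and next_step[node] not in order:
        node = next_step[node]
        order.append(node)
    return order


steps = linear_order(FLOW)
```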