Leo/Arcrun

Leo 388c193ae7 docs(registry): seed 10 examples + 5 skills (LI SDD M3.1 + M3.3)

Corresponds to .agents/specs/llm-interface/ Milestone 3.1 + 3.3.

registry/examples/ has 10 workflow templates that can be pushed as-is:
  starter:    webhook-to-http
  common:     cron-watcher, llm-classify, rag-search-answer, daily-digest
  external:   email-summary (gmail+claude+telegram), pdf-to-blocks,
              github-issue-bot
  advanced:   parallel-fanout (trigger_workflow fan-out),
              error-retry (try_catch+wait pattern)

  Each one contains: workflow.yaml (ready to push) + description.md (what
  problem it solves / make it your own / what you learn) + tags.json (for search)

registry/skills/ has 5 AI playbooks (markdown):
  build_watcher_workflow:            cron + filter + trigger pattern
  debug_paused_workflow:             how to trace a claude_api callback stuck in paused
  migrate_http_to_trigger_workflow:  switch from self-fetch to trigger_workflow
  rag_with_arcrun:                   assemble RAG from KBDB + claude_api
  add_new_wasm_component:            write in TinyGo + the full deployment flow

How the two differ:
  examples = YAML you can take and modify directly
  skills = how to think about problem X + which example to use

Next steps for both: CI syncs them into KBDB automatically
(type=workflow-example / type=agent-skill), and the MCP tools
arcrun_search_examples / arcrun_list_skills go through KBDB semantic search.
(CI sync is M3.4 work.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-16 16:33:54 +08:00

1.4 KiB

pdf-to-blocks

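What problem it solves

Research / study: drop in a PDF and it is automatically converted to text, split into chunks, and stored in KBDB, so it can be queried with RAG search afterwards. Good for: a paper-reading library, contract lookup, RAG over technical documentation.

How to trigger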

curl -X POST https://cypher.arcrun.dev/webhooks/named/pdf_to_blocks/trigger \
  -H "X-Arcrun-API-Key: ak_xxx" \
  -d '{
    "api_key":"ak_xxx",
    "pdf_url":"https://arxiv.org/pdf/2411.02959.pdf",
    "title":"HtmlRAG",
    "user_id":"inkstone_leo_research"
  }'
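
For callers that prefer a script to the shell, here is a minimal Python sketch of the same request. The URL, header name, and body fields are copied from the curl example above; the helper name `build_trigger_request` is hypothetical and not part of any Arcrun SDK. Like `curl -d`, it sets no explicit Content-Type. The response format is not described here, so the sketch only builds the request and leaves sending as a comment.

```python
import json
import urllib.request

# URL copied from the curl example above.
TRIGGER_URL = "https://cypher.arcrun.dev/webhooks/named/pdf_to_blocks/trigger"


def build_trigger_request(api_key, pdf_url, title, user_id):
    """Build (but do not send) the POST request from the curl example.

    Hypothetical helper for illustration, not an Arcrun API.
    """
    payload = {
        "api_key": api_key,  # the curl example repeats the key in the body
        "pdf_url": pdf_url,
        "title": title,
        "user_id": user_id,
    }
    return urllib.request.Request(
        TRIGGER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Arcrun-API-Key": api_key},
        method="POST",
    )


req = build_trigger_request(
    api_key="ak_xxx",  # placeholder, as in the curl example
    pdf_url="https://arxiv.org/pdf/2411.02959.pdf",
    title="HtmlRAG",
    user_id="inkstone_leo_research",
)

# Sending needs a real key and network access:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```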

How to use it afterwards

Pair it with the rag-search-answer workflow:

curl ... rag_search_answer/trigger \
  -d '{"question":"What advantages does HtmlRAG have over Markdown?", "user_id":"inkstone_leo_research"}'

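→ Claude answers using context retrieved from the PDF chunks you just ingested

Make it your own

Swap the convert source (cto.finally.click also has a convert endpoint, which you can use in your own environment)
kbdb_ingest chunks at roughly 500 characters by default; to change that, configure it on the KBDB side
source: "pdf:{url}" is the idempotency key, so a repeat ingest of the same URL is detected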
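
To give a feel for what chunks of about 500 characters look like, the sketch below splits text into fixed-size pieces. It is an assumption for illustration only: how kbdb_ingest actually chunks (boundaries, overlap, sentence awareness) is not described here, and `chunk_text` is a hypothetical name.

```python
from __future__ import annotations


def chunk_text(text: str, size: int = 500) -> list[str]:
    """Split text into consecutive pieces of at most `size` characters.

    Naive fixed-size split; NOT the real kbdb_ingest algorithm.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


sample = "x" * 1200
chunks = chunk_text(sample)
# Every chunk is full-size except possibly the last one.
chunk_lengths = [len(c) for c in chunks]
```

Joining the chunks back together reproduces the input, which is a handy sanity check when estimating how many chunks a document will produce.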

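Variants

Chain claude_api after ingest to run an "auto-tag" flow (extract keyword tags from each chunk)
Combine with the email-summary pattern: subscribe to the arXiv RSS feed → PDFs are pulled in automatically
Have the ingest result trigger wiki_synthesis (mira uses this chain)

What you learn

KBDB has a /convert endpoint that accepts PDF / DOC directly, so you do not have to handle OCR yourself
kbdb_ingest handles chunking + embedding automatically, end to end
source: "{type}:{key}" is the KBDB idempotency convention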
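
The `{type}:{key}` convention can be sketched as below. The helper names and the in-memory `IngestLog` are hypothetical stand-ins; how KBDB stores and checks the source key on its side is not described here, so the set-based check only illustrates the idea of detecting a repeat ingest.

```python
from __future__ import annotations


def make_source(kind: str, key: str) -> str:
    """Build a source string such as 'pdf:https://example.com/a.pdf'."""
    return f"{kind}:{key}"


def parse_source(source: str) -> tuple[str, str]:
    """Split on the first ':' only, so URLs in the key stay intact."""
    kind, sep, key = source.partition(":")
    if not sep or not kind or not key:
        raise ValueError(f"not a '{{type}}:{{key}}' source: {source!r}")
    return kind, key


class IngestLog:
    """In-memory stand-in for the idempotency check (assumption, not KBDB)."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def ingest(self, source: str) -> bool:
        """Return True for a first-time source, False for a repeat."""
        if source in self._seen:
            return False
        self._seen.add(source)
        return True


source = make_source("pdf", "https://arxiv.org/pdf/2411.02959.pdf")
log = IngestLog()
first = log.ingest(source)   # first ingest of this URL
second = log.ingest(source)  # same URL again is detected as a repeat
```

Splitting on the first colon matters because the key is often a URL that contains colons of its own.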
