

Skill: RAG with Arcrun

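When to use this skill

The user says something like:

"I have a pile of X data and want it to answer my questions"
"Claude doesn't know my private data. How do I make it aware of it?"
"A customer-support bot that answers from our docs"
"Q&A over an internal company knowledge base"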

Three-step RAG architecture

User asks → search → feed context to the LLM → answer

The arcrun equivalent:

flow:
  - "input >> ON_SUCCESS >> search"      # KBDB semantic search
  - "search >> ON_SUCCESS >> answer"     # claude_api with context

For a complete template, see arcrun_search_examples('rag') → rag-search-answer.
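To make the edge notation concrete, here is a minimal Python sketch that reads "A >> CONDITION >> B" strings and walks the ON_SUCCESS chain. It is only an illustration of how the two-edge flow above resolves to three steps; parse_edge and success_path are hypothetical helpers, not arcrun's parser.

```python
def parse_edge(edge: str) -> tuple[str, str, str]:
    """Split 'src >> CONDITION >> dst' into its three parts."""
    parts = [p.strip() for p in edge.split(">>")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"malformed edge: {edge!r}")
    return parts[0], parts[1], parts[2]


def success_path(flow: list[str]) -> list[str]:
    """Return the node order followed when every step succeeds."""
    edges = [parse_edge(e) for e in flow]
    nxt = {src: dst for src, cond, dst in edges if cond == "ON_SUCCESS"}
    targets = set(nxt.values())
    # The entry node is the only source that no edge points to.
    starts = [s for s in nxt if s not in targets]
    if len(starts) != 1:
        raise ValueError("expected exactly one entry node")
    path = [starts[0]]
    node = starts[0]
    while node in nxt:
        node = nxt[node]
        if node in path:
            raise ValueError("cycle in flow")
        path.append(node)
    return path


flow = [
    "input >> ON_SUCCESS >> search",
    "search >> ON_SUCCESS >> answer",
]
print(success_path(flow))
```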

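5 key decisions

1. How does data get into KBDB?

The source determines the quality:

PDFs / documents → use the pdf-to-blocks workflow (automatic chunking + embedding)
Logseq / Notion / personal notes → write an ingest script or let the mira platform handle it
Web crawl → http_request → kbdb_ingest
Daily RSS → cron + kbdb_ingest

Key points:

Use the source field to distinguish origins (queries can filter on source later)
Use user_id to separate namespaces (multi-tenant or multi-domain)
Chunk size: 500-1000 characters works best (too small and a chunk carries no context; too large and relevance is diluted)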
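For a hand-written ingest script, the chunking step might look like the sketch below. It is a minimal sketch under stated assumptions: the record shape (source, user_id, chunk_index, text) and the helper names are hypothetical, and the overlap between neighbouring chunks is an illustrative addition that can be set to 0. Sending the records to kbdb_ingest is left out because its interface is not described here.

```python
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into chunks of at most `size` characters.

    Neighbouring chunks share `overlap` characters so a sentence cut at a
    boundary still appears whole in one of them.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        if piece.strip():
            chunks.append(piece)
        if start + size >= len(text):
            break  # the last chunk already reached the end of the text
    return chunks


def to_records(text: str, source: str, user_id: str) -> list[dict]:
    """Attach the source and user_id fields to every chunk (hypothetical shape)."""
    return [
        {"source": source, "user_id": user_id, "chunk_index": i, "text": chunk}
        for i, chunk in enumerate(chunk_text(text))
    ]


records = to_records("x" * 2000, source="pdf:handbook", user_id="team-a")
print([len(r["text"]) for r in records])
```

The default of 800 sits inside the 500-1000 range; re-ingesting with a different size only requires changing that one argument.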

2. How should search be configured?

search:
  component: kbdb_search
  api_key: "{{api_key}}"
  query: "{{input.question}}"
  topK: 5                          # anything from 3 to 10 is reasonable
  user_id: "{{input.user_id}}"     # restrict to a namespace (required for multi-tenant setups)

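Advanced parameters:

source: restrict to a source (query only "pdf:" or "wiki:")
tag: restrict to a tag ("verified", "policy", etc.)
Semantic search runs on embeddings, so the query can be plain natural language; exact keyword matches are not required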
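How user_id, source, tag and topK interact can be modelled with a toy in-memory search. This sketch assumes a "filter first, then rank by cosine similarity" order and a hypothetical block shape (user_id, source, tags, vec); how kbdb_search is actually implemented is not described here.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(blocks, query_vec, user_id, top_k=5, source_prefix=None, tag=None):
    """Filter by namespace, source prefix and tag, then keep the top_k closest blocks."""
    candidates = [
        b for b in blocks
        if b["user_id"] == user_id
        and (source_prefix is None or b["source"].startswith(source_prefix))
        and (tag is None or tag in b["tags"])
    ]
    ranked = sorted(candidates, key=lambda b: cosine(b["vec"], query_vec), reverse=True)
    return ranked[:top_k]
```

Filtering before ranking means a narrow source or tag can return fewer than topK blocks, which is worth remembering when an answer comes back with little context.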

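3. How should the prompt feed in the context?

Key: mark the context boundary explicitly and give the LLM the right to refuse to answer.

You are a knowledge-base assistant. Answer the question **using only the information in the context**.

Rules:
1. If the context does not cover it, say plainly: "Not found in the knowledge base"
2. When citing, tag the [block_id] so the user can trace the original
3. Do not extrapolate; do not make things up

Context:
{{search.results}}

Question: {{input.question}}

Answer:

If the LLM is not given the "right to refuse", it will guess.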
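When the prompt is assembled in code rather than by template substitution, the same rules can be applied as in this sketch. It assumes each search result is a dict with block_id and text keys (a hypothetical shape), wraps the context in explicit tags to mark the boundary, and caps the context length with an illustrative max_chars budget.

```python
PROMPT_TEMPLATE = """You are a knowledge-base assistant. Answer using only the information inside <context>.

Rules:
1. If the context does not cover it, say "Not found in the knowledge base".
2. Cite the [block_id] you relied on.
3. Do not extrapolate or invent facts.

<context>
{context}
</context>

Question: {question}

Answer:"""


def format_context(results: list[dict], max_chars: int = 6000) -> str:
    """Render results as '[block_id] text' lines, stopping before the budget is exceeded."""
    lines = []
    used = 0
    for r in results:
        entry = f"[{r['block_id']}] {r['text'].strip()}"
        if used + len(entry) > max_chars:
            break
        lines.append(entry)
        used += len(entry)
    # An explicit marker tells the model there is nothing to answer from.
    return "\n".join(lines) if lines else "(no matching blocks)"


def build_prompt(question: str, results: list[dict], max_chars: int = 6000) -> str:
    """Fill the template with the labelled context and the user's question."""
    return PROMPT_TEMPLATE.format(
        context=format_context(results, max_chars),
        question=question.strip(),
    )
```

Prefixing every block with its block_id is what makes rule 2 enforceable: the model can only cite identifiers it has actually seen.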

4. How should citations be displayed?

Advanced: set _recipe_output_format: json so that claude returns structured output:

{
  "answer": "...",
  "citations": [{"block_id": "abc-123", "snippet": "..."}],
  "confidence": "high"
}

The front end can render these as clickable citation links.
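Before rendering, it may be worth checking that every cited block_id was really among the retrieved blocks, since a model can invent identifiers. The sketch below assumes the JSON shape shown above; parse_answer and the "dropped" counter are hypothetical names, not part of arcrun.

```python
import json


def parse_answer(raw: str, retrieved_ids: set[str]) -> dict:
    """Parse the model's JSON reply and keep only citations that point at retrieved blocks."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    answer = data.get("answer")
    if not isinstance(answer, str) or not answer.strip():
        raise ValueError("missing or empty 'answer'")
    citations = data.get("citations", [])
    if not isinstance(citations, list):
        raise ValueError("'citations' must be a list")
    valid = [
        c for c in citations
        if isinstance(c, dict) and c.get("block_id") in retrieved_ids
    ]
    return {
        "answer": answer,
        "citations": valid,
        "dropped": len(citations) - len(valid),  # citations the search never returned
        "confidence": data.get("confidence"),
    }
```

A non-zero "dropped" count is a useful signal to log: it suggests the prompt needs a firmer citation rule.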

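5. How do you measure accuracy?

arcrun_search_examples('rag-eval') has no examples yet. To do it manually:

Prepare N "golden QA pairs" (a question plus the expected answer)
Run the workflow N times and compare the results
If accuracy is < 70%: first check KBDB chunk quality, then tune topK, and tune the prompt last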
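The manual procedure above can be scripted roughly as follows. This is a sketch under assumptions: run_workflow is a hypothetical callable that takes a question and returns the answer text (however the workflow is actually invoked), and the default correctness check is a naive case-insensitive substring match that a real evaluation would likely replace.

```python
def evaluate(golden, run_workflow, is_correct=None, threshold=0.7):
    """Run every golden question through the workflow and report accuracy.

    golden: list of (question, expected_answer) pairs.
    run_workflow: callable question -> answer text (hypothetical stand-in).
    """
    if not golden:
        raise ValueError("need at least one golden QA pair")
    if is_correct is None:
        # Naive default: the expected answer appears somewhere in the reply.
        is_correct = lambda expected, actual: expected.lower() in (actual or "").lower()
    failures = []
    for question, expected in golden:
        actual = run_workflow(question)
        if not is_correct(expected, actual):
            failures.append((question, expected, actual))
    accuracy = 1 - len(failures) / len(golden)
    return {
        "accuracy": accuracy,
        "passed": accuracy >= threshold,
        "failures": failures,
    }
```

Keeping the failed triples makes the tuning order easier to follow: inspect them for chunk problems first, then retry with a different topK, then adjust the prompt.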

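Common pitfalls

Symptom	Cause	Fix
Inaccurate answers	Chunks too large / too small	Re-ingest with a different chunk size
Fabricated answers	Prompt does not grant the right to refuse	Add "if you don't know, say you don't know" to the prompt
Misses blocks it should find	Semantic search does not match	Try query rewriting / increase topK
Answers too long	Prompt sets no limit	Add "answer in < 100 characters" to the prompt
Slow	claude_api timeout	Raise timeout_ms or reduce the context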

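Advanced variants

Multi-round query rewriting: claude rewrites the question first → search → answer
Mix sources: KBDB + web search + DB query → merge
Cache: store the answer to an identical question in KBDB; on the next lookup hit, return it directly (saves an LLM call)
Conversational: pass the chat history into the prompt to support follow-ups
Filter-then-rerank: semantic search retrieves 20 → claude reranks and keeps the top 5 → feed those to the final answer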
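The cache variant can be sketched as below. An in-memory dict stands in for KBDB, and answer_fn is a hypothetical callable wrapping the search + LLM steps; keying on user_id as well as the normalized question is a design assumption so one tenant never receives another tenant's cached answer.

```python
import hashlib


def cache_key(question: str, user_id: str) -> str:
    """Hash the namespace plus the question with case and whitespace normalized."""
    normalized = " ".join(question.lower().split())
    return hashlib.sha256(f"{user_id}\n{normalized}".encode("utf-8")).hexdigest()


class AnswerCache:
    """Exact-match answer cache; the dict is a stand-in for storing answers in KBDB."""

    def __init__(self, answer_fn):
        self._answer_fn = answer_fn  # hypothetical: (question, user_id) -> answer
        self._store = {}
        self.llm_calls = 0

    def ask(self, question: str, user_id: str) -> str:
        key = cache_key(question, user_id)
        if key not in self._store:
            self.llm_calls += 1  # cache miss: pay for one full RAG run
            self._store[key] = self._answer_fn(question, user_id)
        return self._store[key]
```

This only catches questions that are identical after normalization. Paraphrases still miss, and cached answers go stale when the underlying blocks are re-ingested, so some form of expiry would be needed in practice.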
