kbdb-ingest-plugin/docs/3-specs/ingest-pipeline/tasks.md
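Leo 16ad1cb208 feat(ingest): T0.5–T5 pure-feeder pipeline implementation (issue #2)
Full ingest pipeline (harvest first, extract fallback, cross-repo weaving, POST envelope):
- T0.5 skeleton: Hono + zod-openapi, no D1/Vectorize/AI bindings (iron rule: never touch storage)
- T1 SourceAdapter: pull via the GitHub runtime API + per-file sha256 content-hash + `/refresh` receiving endpoint
- T2 harvest (path A, preferred): harvest template 1.8.0+ cards (gloss / entities / typed-edge)
- T3 extract (path B, fallback): LlmCaller with a selectable model + JSON-fail escalation gate + hard endpoint-alignment self-check guardrail; v1 does not embed (it only tags)
- T4 cross-repo weaving (primary job): aggregate multiple repos → detect cross-repo bridges / dissent; does not compute bridge_score (graph's domain)
- T5 output: buildEnvelope strict + explicit forbidden-field self-check; graph-client is pure POST (cherry-picked from _kbdb_client.py and changed so it does not touch base); thin ops CLI (no query MCP)

Envelope aligned with the full contract (embed/id/aliases/predicate_embed); the contract's vectorization fields were promoted in sync.

Gate: vitest 28 passed / tsc clean / wrangler dry-run clean (env-var bindings only).
End-to-end ingest→graph: graph receiver already aligned → pending ingest deployment + GRAPH_BASE_URL → pending deployment verification; not a false green.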

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 20:40:53 +08:00

# ingest pipeline — Tasks
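> Single source of truth for progress. Status: [ ] not started [🔄] in progress [x] done [⏸] blocked
> Cross-project blueprint: InkStoneCo `docs/3-specs/mira-dissolve/`.
> Implementation branch: `claude/ingest-t1-t5-implementation` (vitest 28 passed / tsc clean / dry-run clean).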
## T0 repo skeleton
- [x] 0.1 Create the public repo `uncle6me-web/kbdb-ingest-plugin`
- [x] 0.2 CLAUDE.md (upstream pointers + ingest iron rules) + README + .gitignore
- [x] 0.3 `contracts/ingest-candidate.json` (copied from the top-level SDD; frozen contract)
- [x] 0.4 Three-part SDD skeleton (`docs/3-specs/ingest-pipeline/`)
- [x] 0.5 package.json / tsconfig / wrangler.toml / vitest.config (modeled on kbdb-graph-plugin: Hono + zod-openapi, no D1/Vectorize/AI bindings)
## T1 SourceAdapter (R1): `src/lib/source-adapter.ts`
- [x] 1.1 Pull repos from GitHub (runtime git/trees + contents API, not Actions); GitHubFetcher interface (tests use a mock)
- [x] 1.2 content-hash (per-file sha256); source.uri = github:owner/repo@path (makeSourceUri/parseSourceUri round-trip)
- [x] 1.3 Receiving endpoint for refreshes forwarded by graph's `POST /graph/refresh`: `POST /refresh` (`src/index.ts`; passive forwarding, no scheduling)
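
The `source.uri` round-trip and per-file change detection in 1.2 could look roughly like the sketch below. Only the names `makeSourceUri` / `parseSourceUri` and the `github:owner/repo@path` shape come from the task list; the `SourceRef` type, the `changedPaths` helper, and the decision to pass hashes in as precomputed sha256 hex strings are assumptions for illustration, not the actual contents of `src/lib/source-adapter.ts`.

```typescript
// Hypothetical shape; the real adapter may carry more fields.
interface SourceRef {
  owner: string;
  repo: string;
  path: string;
}

// Build a source URI of the form github:owner/repo@path.
function makeSourceUri(ref: SourceRef): string {
  return `github:${ref.owner}/${ref.repo}@${ref.path}`;
}

// Inverse of makeSourceUri. Owner and repo may not contain "/" or "@";
// the path is everything after the first "@", so paths containing "@" survive.
function parseSourceUri(uri: string): SourceRef | null {
  const m = /^github:([^/@]+)\/([^/@]+)@(.+)$/.exec(uri);
  return m ? { owner: m[1], repo: m[2], path: m[3] } : null;
}

// Assumed helper: given previous and current path -> sha256 maps,
// return the paths that are new or whose content hash changed.
function changedPaths(
  prev: Record<string, string>,
  next: Record<string, string>,
): string[] {
  return Object.keys(next)
    .filter((path) => prev[path] !== next[path])
    .sort();
}
```

Keeping the hash comparison separate from the fetch means a mocked GitHubFetcher can drive it in tests without network access.
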
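## T2 Harvest (R2, path A, preferred): `src/lib/harvest.ts`
- [x] 2.1 Harvest the triplets + gloss already built by local CC (template 1.8.0+ format: frontmatter gloss, `## 實體` (entities), `## 關聯` (relations) typed-edge; card-to-card and in-body endpoints are routed separately)
- [x] 2.2 cherry-pick `_kbdb_client.py` → rework into the pure feeder `src/lib/graph-client.ts` (POST envelope; **does not write to KBDB/base**)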
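
As a rough illustration of the first step of harvesting in 2.1, the sketch below splits a card into its frontmatter `gloss` and the lines under each `## ` heading. It assumes a standard `---`-delimited frontmatter block with a plain `gloss: ...` line; `splitCard` and `CardParts` are hypothetical names, and the typed-edge line syntax of template 1.8.0+ is deliberately not parsed because the task list does not specify it.

```typescript
// Hypothetical result shape for illustration only.
interface CardParts {
  gloss: string | null;
  sections: Record<string, string[]>; // heading text -> non-empty body lines
}

function splitCard(markdown: string): CardParts {
  const lines = markdown.split(/\r?\n/);
  let gloss: string | null = null;
  let i = 0;

  // Frontmatter: a leading block delimited by "---" lines (assumed format).
  if (lines[0] === "---") {
    i = 1;
    while (i < lines.length && lines[i] !== "---") {
      const m = /^gloss:\s*(.*)$/.exec(lines[i]);
      if (m) gloss = m[1].trim(); // quotes, if any, are kept as written
      i++;
    }
    i++; // skip the closing "---"
  }

  // Body: collect non-empty lines under each level-2 heading.
  const sections: Record<string, string[]> = {};
  let current: string | null = null;
  for (; i < lines.length; i++) {
    const heading = /^##\s+(.+?)\s*$/.exec(lines[i]);
    if (heading) {
      current = heading[1];
      sections[current] = [];
    } else if (current !== null && lines[i].trim() !== "") {
      sections[current].push(lines[i].trim());
    }
  }
  return { gloss, sections };
}
```

A caller would then read `sections["實體"]` and `sections["關聯"]` and hand those lines to whatever parser understands the template's edge syntax.
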
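## T3 extract (R3, path B, fallback): `src/lib/extract.ts`
- [x] 3.1 cherry-pick the classify mode of `wiki_synthesis.yaml` → extract prompt (JSON nodes[] + triplets[])
- [x] 3.2 User-selectable model (chosen by intent, not model name; LlmCaller interface; defaults: shallow/Haiku, deep/Claude via CC)
- [ ] 3.3 Model test set (Chinese + human-hint samples, converted into regression tests): **deferred** (run the defaults first; the guardrail and parse already have unit tests)
- [x] 3.4 JSON-fail escalation gate (shallow extraction fails or is too sparse → escalate to deep once)
- [x] 3.5 v1 does not embed (it still **tags** embed/predicate_embed so base can read the tags later; the actual embedding waits on Arcrun #7)
- [x] 3.x Hard endpoint-alignment self-check guardrail (`src/lib/endpoint-check.ts`; leo stress test 14→0; self-check + autoAlign fill-in)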
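
The escalation gate in 3.4 can be sketched as follows, under stated assumptions: the `LlmCaller` signature, the `Extraction` shape, the `minTriplets` sparsity threshold, and all function names are hypothetical stand-ins, and the real `src/lib/extract.ts` may define "too sparse" differently. The point is only the control flow: try shallow, and escalate to deep at most once.

```typescript
type Intent = "shallow" | "deep";

// Assumed minimal shape of the JSON the extract prompt asks for.
interface Extraction {
  nodes: unknown[];
  triplets: unknown[];
}

// Hypothetical caller signature: selects by intent, not by model name.
type LlmCaller = (intent: Intent, prompt: string) => Promise<string>;

// Returns null when the reply is not JSON or lacks nodes[] / triplets[].
function parseExtraction(raw: string): Extraction | null {
  try {
    const data: unknown = JSON.parse(raw);
    if (typeof data !== "object" || data === null) return null;
    const { nodes, triplets } = data as { nodes?: unknown; triplets?: unknown };
    if (!Array.isArray(nodes) || !Array.isArray(triplets)) return null;
    return { nodes, triplets };
  } catch {
    return null;
  }
}

// Gate condition: parse failure, or fewer triplets than the assumed threshold.
function needsEscalation(result: Extraction | null, minTriplets: number): boolean {
  return result === null || result.triplets.length < minTriplets;
}

async function extractWithGate(
  call: LlmCaller,
  prompt: string,
  minTriplets = 1,
): Promise<Extraction | null> {
  const shallow = parseExtraction(await call("shallow", prompt));
  if (!needsEscalation(shallow, minTriplets)) return shallow;
  // Escalate exactly once; there is no retry after the deep attempt.
  const deep = parseExtraction(await call("deep", prompt));
  return deep ?? shallow;
}
```

Falling back to the sparse shallow result when the deep call also fails is a design choice of this sketch; returning `null` instead would be equally defensible.
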
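## T4 Cross-repo weaving (R4, primary job): `src/lib/weave.ts`
- [x] 4.1 Aggregate triplets from multiple repos → detect cross-repo bridges (a same-named node appearing in ≥2 repos) + dissent (same s/o pair, different predicates); **does not compute bridge_score** (graph's domain; forbidden to send)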
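
The two detections in 4.1 reduce to grouping, which the sketch below illustrates. The `RepoTriplet` and `Dissent` types and both function names are assumptions; the sketch also assumes that node identity is exact name equality, that an s/o pair is ordered, and that dissent does not need to span repos. No score is computed, in keeping with the rule that bridge_score belongs to graph.

```typescript
// Hypothetical input shape: a triplet tagged with the repo it came from.
interface RepoTriplet {
  repo: string;
  s: string;
  p: string;
  o: string;
}

interface Dissent {
  s: string;
  o: string;
  predicates: string[];
}

// Bridge: a node name (as subject or object) seen in at least two repos.
function findBridges(triplets: RepoTriplet[]): string[] {
  const reposByNode = new Map<string, Set<string>>();
  for (const t of triplets) {
    for (const name of [t.s, t.o]) {
      const repos = reposByNode.get(name) ?? new Set<string>();
      repos.add(t.repo);
      reposByNode.set(name, repos);
    }
  }
  return Array.from(reposByNode.entries())
    .filter(([, repos]) => repos.size >= 2)
    .map(([name]) => name)
    .sort();
}

// Dissent: the same ordered (s, o) pair linked by two or more distinct predicates.
function findDissent(triplets: RepoTriplet[]): Dissent[] {
  const byPair = new Map<string, { s: string; o: string; predicates: Set<string> }>();
  for (const t of triplets) {
    const key = JSON.stringify([t.s, t.o]);
    const entry = byPair.get(key) ?? { s: t.s, o: t.o, predicates: new Set<string>() };
    entry.predicates.add(t.p);
    byPair.set(key, entry);
  }
  return Array.from(byPair.values())
    .filter((e) => e.predicates.size >= 2)
    .map((e) => ({ s: e.s, o: e.o, predicates: Array.from(e.predicates).sort() }));
}
```
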
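## T5 Output + CLI (R5/R6)
- [x] 5.1 POST the envelope to graph's `POST /triplets/ingest` (strictly conforming to the contract: buildEnvelope strict + an explicit forbidden-field self-check that rejects early). Aligned with the **full contract** (including embed/id/aliases/predicate_embed; per the coordinator's ruling, ingest does not scale back)
- [x] 5.2 Thin ops CLI (`scripts/ingest-cli.mjs`: refresh via the Worker / pull dry-run); **no query MCP**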
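
The "explicit forbidden-field self-check" in 5.1 might be sketched as a recursive scan that runs before anything is POSTed. This is an assumption-laden illustration: `bridge_score` is the only forbidden field this task list names, so the real list in the contract may be longer, and the `buildEnvelope` shown here is a stand-in that only performs the check rather than the project's actual strict builder.

```typescript
// Only bridge_score is named as forbidden in the task list; the real list may differ.
const FORBIDDEN_FIELDS: readonly string[] = ["bridge_score"];

// Walk any JSON-like value and return the paths of forbidden keys found.
function findForbiddenFields(
  value: unknown,
  forbidden: readonly string[],
  path = "$",
): string[] {
  const hits: string[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, i) => {
      hits.push(...findForbiddenFields(item, forbidden, `${path}[${i}]`));
    });
  } else if (typeof value === "object" && value !== null) {
    const record = value as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      const childPath = `${path}.${key}`;
      if (forbidden.indexOf(key) !== -1) {
        hits.push(childPath);
      } else {
        hits.push(...findForbiddenFields(record[key], forbidden, childPath));
      }
    }
  }
  return hits;
}

// Stand-in for a strict builder: refuse to produce an envelope that
// contains a forbidden field, so the error surfaces before the POST.
function buildEnvelope<T extends object>(payload: T): T {
  const hits = findForbiddenFields(payload, FORBIDDEN_FIELDS);
  if (hits.length > 0) {
    throw new Error(`forbidden fields in envelope: ${hits.join(", ")}`);
  }
  return payload;
}
```

Reporting the full path of each offending key makes a rejected envelope easy to trace back to the stage that produced it.
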
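## Blockers / honest markers
1. **End-to-end ingest→graph working**: graph receiver already aligned with the full contract → remaining: ingest deployment + `GRAPH_BASE_URL` configuration → **pending deployment verification**; not a false green.
2. ⏸ embed depends on base vectorize (Arcrun #7). The v1 approach of not embedding (tagging only) is already in place.
3. T3.3 model test set deferred; on the refresh side (extract via Workers AI), v1 only runs harvest, and deep extraction is left to the CLI/CC.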