feat(ingest): implement the T0.5–T5 pure-feeder pipeline (issue #2)
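Full ingest pipeline (harvest first, extract as fallback, cross-repo weaving, POST envelope):
- T0.5 skeleton: Hono + zod-openapi, no D1/Vectorize/AI bindings (iron rule: never touch storage)
- T1 SourceAdapter: pull via the GitHub runtime API + per-file sha256 content hash + /refresh receiving endpoint
- T2 harvest (path A, preferred): harvest template 1.8.0+ cards (gloss / entities / typed edges)
- T3 extract (path B, fallback): LlmCaller with a selectable model + JSON-fail escalation gate + hard endpoint-alignment self-check guardrail; the first version does not embed (it only tags)
- T4 cross-repo weaving (primary job): aggregate multiple repos → detect cross-repo bridges and dissent; does not compute bridge_score (graph's domain)
- T5 output: strict buildEnvelope + explicit forbidden-field self-check; graph-client is pure POST (cherry-picked from _kbdb_client.py and changed so it never touches base); thin ops CLI (no query MCP)

The envelope is aligned with the full contract (embed/id/aliases/predicate_embed); the contract's vectorization fields are promoted in the same change.

Gate: vitest 28 passed / tsc clean / wrangler dry-run clean (env-var bindings only).

End-to-end ingest→graph: the graph receiver has been aligned → still needs ingest deployment + GRAPH_BASE_URL → pending verification after deployment; not falsely marked green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>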
@@ -2,44 +2,47 @@

 > Single source of truth for progress. Status: [ ] not started [🔄] in progress [x] done [⏸] blocked
 > Cross-project blueprint: InkStoneCo `docs/3-specs/mira-dissolve/`.
+> Implementation branch: `claude/ingest-t1-t5-implementation` (vitest 28 passed / tsc clean / dry-run clean).

-## T0 repo skeleton (this round)
+## T0 repo skeleton

 - [x] 0.1 Create the public repo `uncle6me-web/kbdb-ingest-plugin`
 - [x] 0.2 CLAUDE.md (upstream pointers + ingest iron rules) + README + .gitignore
 - [x] 0.3 `contracts/ingest-candidate.json` (copied from the top-level SDD; frozen contract)
-- [x] 0.4 SDD three-document skeleton
-- [ ] 0.5 package.json / tsconfig / wrangler.toml (modeled on kbdb-graph-plugin)
+- [x] 0.4 SDD three-document skeleton (`docs/3-specs/ingest-pipeline/`)
+- [x] 0.5 package.json / tsconfig / wrangler.toml / vitest.config (modeled on kbdb-graph-plugin: Hono + zod-openapi, no D1/Vectorize/AI bindings)

-## T1 SourceAdapter (R1)
+## T1 SourceAdapter (R1): `src/lib/source-adapter.ts`

-- [ ] 1.1 Pull the repo from GitHub (runtime API/clone, not Actions)
-- [ ] 1.2 content-hash (per-file, source.uri = github:owner/repo@path)
-- [ ] 1.3 Interface triggered when the KBDB MCP `refresh` is relayed
+- [x] 1.1 Pull the repo from GitHub (runtime git/trees + contents API, not Actions); GitHubFetcher interface (tests use a mock)
+- [x] 1.2 content-hash (per-file sha256; source.uri = github:owner/repo@path; makeSourceUri/parseSourceUri round-trip)
+- [x] 1.3 Receiving endpoint triggered when graph `POST /graph/refresh` is relayed: `POST /refresh` (`src/index.ts`; passively relayed, no scheduling)
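
Item 1.2 names `makeSourceUri`/`parseSourceUri` and a per-file sha256 hash but the diff does not show them. The following is only a sketch of what such a round trip could look like, under stated assumptions: the `SourceRef` shape, the regex, and `contentHash` are hypothetical, and Node's `node:crypto` is used for brevity (a Worker might prefer Web Crypto instead).

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape; the real types live in src/lib/source-adapter.ts.
interface SourceRef {
  owner: string;
  repo: string;
  path: string;
}

// Builds the `github:owner/repo@path` form quoted in item 1.2.
function makeSourceUri(ref: SourceRef): string {
  return `github:${ref.owner}/${ref.repo}@${ref.path}`;
}

// Splits at the first "@" after owner/repo, so a path may itself contain "@".
function parseSourceUri(uri: string): SourceRef | null {
  const m = /^github:([^/@]+)\/([^/@]+)@(.+)$/.exec(uri);
  if (!m) return null;
  return { owner: m[1], repo: m[2], path: m[3] };
}

// Per-file content hash: hex-encoded sha256 of the UTF-8 file text.
function contentHash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}
```

Comparing the stored hash with a freshly computed one is one way to skip unchanged files on `/refresh`; whether the adapter actually does this is not stated in the checklist.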

-## T2 harvest (R2, path A preferred)
+## T2 harvest (R2, path A preferred): `src/lib/harvest.ts`

-- [ ] 2.1 Pull the triplets + gloss already built by the local CC (for repos that use system-dev-template)
-- [ ] 2.2 cherry-pick `polaris/mira/tools/_kbdb_client.py` → convert to a pure feeder (POST envelope, no writes to KBDB)
+- [x] 2.1 Harvest the triplets + gloss already built by the local CC (template 1.8.0+ format: frontmatter gloss, `## 實體`, `## 關聯` typed-edge; card-to-card vs. in-body endpoints are routed separately)
+- [x] 2.2 cherry-pick `_kbdb_client.py` → convert to the pure feeder `src/lib/graph-client.ts` (POST envelope, **no writes to KBDB/base**)
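
-## T3 extract (R3, path B fallback)
+## T3 extract (R3, path B fallback): `src/lib/extract.ts`

-- [ ] 3.1 cherry-pick the `wiki_synthesis.yaml` classify / two skill blocks
-- [ ] 3.2 User-selectable model + quality-threshold allowlist (default Haiku; deep extraction with Claude via CC)
-- [ ] 3.3 Model test set (Chinese + human-hint samples, converted into regression tests); deferred, run the defaults first
-- [ ] 3.4 JSON-fail escalation gate (a failed shallow extraction escalates to deep)
-- [ ] 3.5 The first version does not embed (embedding waits for base vectorize, InkStoneCo T2.4)
+- [x] 3.1 cherry-pick the `wiki_synthesis.yaml` classify mode → extract prompt (JSON nodes[] + triplets[])
+- [x] 3.2 User-selectable model (chosen by intent, not model name; LlmCaller interface; defaults: shallow/Haiku, deep/Claude via CC)
+- [ ] 3.3 Model test set (Chinese + human-hint samples, converted into regression tests); **deferred** (run the defaults first; the guardrail and parse already have unit tests)
+- [x] 3.4 JSON-fail escalation gate (shallow extraction fails or is too sparse → escalate to deep once)
+- [x] 3.5 The first version does not embed (it still **tags** embed/predicate_embed for base to read later; the embed action waits for Arcrun #7)
+- [x] 3.x Hard endpoint-alignment self-check guardrail (`src/lib/endpoint-check.ts`; leo stress test 14→0; self-check + autoAlign backfill)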
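
Item 3.4 describes the JSON-fail escalation gate in one line. A minimal sketch of that control flow follows, assuming a hypothetical `LlmCaller` signature (the checklist names the interface but not its shape) and a hypothetical `minTriplets` threshold standing in for "too sparse".

```typescript
type Tier = "shallow" | "deep";

interface Extraction {
  nodes: unknown[];
  triplets: unknown[];
}

// Hypothetical signature: the caller picks a model by intent (tier), not by model name.
type LlmCaller = (tier: Tier, text: string) => Promise<string>;

// Returns null when the reply is not JSON or lacks the nodes[]/triplets[] arrays.
function parseExtraction(raw: string): Extraction | null {
  try {
    const data: unknown = JSON.parse(raw);
    if (typeof data !== "object" || data === null) return null;
    const { nodes, triplets } = data as { nodes?: unknown; triplets?: unknown };
    if (!Array.isArray(nodes) || !Array.isArray(triplets)) return null;
    return { nodes, triplets };
  } catch {
    return null;
  }
}

// Tries shallow first; on parse failure or a too-sparse result, escalates to deep exactly once.
async function extractWithEscalation(
  call: LlmCaller,
  text: string,
  minTriplets = 1, // assumed sparsity threshold
): Promise<{ tier: Tier; result: Extraction | null }> {
  const shallow = parseExtraction(await call("shallow", text));
  if (shallow !== null && shallow.triplets.length >= minTriplets) {
    return { tier: "shallow", result: shallow };
  }
  const deep = parseExtraction(await call("deep", text));
  return { tier: "deep", result: deep }; // no second retry, even if deep also fails
}
```

Returning the tier alongside the result lets the caller log how often escalation fires, which may help when the deferred T3.3 test set is eventually built.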
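
-## T4 cross-repo weaving (R4, primary job)
+## T4 cross-repo weaving (R4, primary job): `src/lib/weave.ts`

-- [ ] 4.1 Aggregate triplets from multiple repos
+- [x] 4.1 Aggregate triplets from multiple repos → detect cross-repo bridges (a same-named node appearing in ≥2 repos) + dissent (same s/o pair, different predicates); **does not compute bridge_score** (graph's domain, forbidden to send)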
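
Item 4.1 defines a bridge as a same-named node seen in at least two repos, and dissent as one subject/object pair carrying different predicates. The sketch below illustrates only those two rules; the `RepoTriplet` shape and function names are hypothetical, and whether dissent must also span repos is not stated, so it is not enforced here. No score is computed, in line with the checklist.

```typescript
// Hypothetical triplet shape, tagged with the repo it came from.
interface RepoTriplet {
  repo: string;
  subject: string;
  predicate: string;
  object: string;
}

// Adds `value` to the set stored under `key`, creating the set on first use.
function groupInto(map: Map<string, Set<string>>, key: string, value: string): void {
  const set = map.get(key) ?? new Set<string>();
  set.add(value);
  map.set(key, set);
}

// Keeps only keys that collected two or more distinct values.
function keepMultiValued(map: Map<string, Set<string>>): Record<string, string[]> {
  const out: Record<string, string[]> = {};
  map.forEach((set, key) => {
    if (set.size >= 2) out[key] = Array.from(set).sort();
  });
  return out;
}

// Bridge: a node name (as subject or object) that appears in >= 2 repos.
function detectBridges(triplets: RepoTriplet[]): Record<string, string[]> {
  const reposByNode = new Map<string, Set<string>>();
  for (const t of triplets) {
    groupInto(reposByNode, t.subject, t.repo);
    groupInto(reposByNode, t.object, t.repo);
  }
  return keepMultiValued(reposByNode);
}

// Dissent: the same subject/object pair linked by >= 2 different predicates.
function detectDissent(triplets: RepoTriplet[]): Record<string, string[]> {
  const predicatesByPair = new Map<string, Set<string>>();
  for (const t of triplets) {
    groupInto(predicatesByPair, `${t.subject} -> ${t.object}`, t.predicate);
  }
  return keepMultiValued(predicatesByPair);
}
```

For example, if repo `a` says `Hono uses zod` and repo `b` says `Hono replaces zod`, both `Hono` and `zod` are reported as bridges and the pair `Hono -> zod` is reported as dissent.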

 ## T5 output + CLI (R5/R6)

-- [ ] 5.1 POST the envelope to graph `POST /triplets/ingest` (strictly conforming to the contract) ⏸ waiting for the graph write endpoint (InkStoneCo T3.3)
-- [ ] 5.2 Thin ops CLI (manual re-extraction); no query MCP
+- [x] 5.1 POST the envelope to graph `POST /triplets/ingest` (strictly conforming to the contract; strict buildEnvelope + an explicit forbidden-field self-check that catches violations early). Aligned with the **full contract** (including embed/id/aliases/predicate_embed; the coordinator ruled that ingest does not scale back)
+- [x] 5.2 Thin ops CLI (`scripts/ingest-cli.mjs`: refresh via the Worker / pull dry-run); **no query MCP**
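
Item 5.1 pairs a strict `buildEnvelope` with an explicit forbidden-field self-check. The following sketch shows one way such a check could work in plain TypeScript. It is not the contract: the envelope shape and the allowed-key list are placeholders, and `bridge_score` is the only forbidden field the checklist actually names.

```typescript
// `bridge_score` is the only forbidden field named in the checklist.
const FORBIDDEN_FIELDS = new Set(["bridge_score"]);
// Placeholder allowlist; the real one is defined by contracts/ingest-candidate.json.
const ALLOWED_TRIPLET_KEYS = new Set(["subject", "predicate", "object", "predicate_embed"]);

type Triplet = Record<string, unknown>;

// Hypothetical envelope shape, for illustration only.
interface Envelope {
  source: { uri: string };
  triplets: Triplet[];
}

// Walks arrays and objects recursively and returns the path of every forbidden key.
function findForbidden(value: unknown, path: string): string[] {
  const hits: string[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, i) => hits.push(...findForbidden(item, `${path}[${i}]`)));
  } else if (typeof value === "object" && value !== null) {
    const record = value as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      if (FORBIDDEN_FIELDS.has(key)) hits.push(`${path}.${key}`);
      else hits.push(...findForbidden(record[key], `${path}.${key}`));
    }
  }
  return hits;
}

// Strict build: forbidden fields fail first, then any key outside the allowlist.
function buildEnvelope(uri: string, triplets: Triplet[]): Envelope {
  const forbidden = findForbidden(triplets, "triplets");
  if (forbidden.length > 0) {
    throw new Error(`forbidden fields: ${forbidden.join(", ")}`);
  }
  triplets.forEach((t, i) => {
    for (const key of Object.keys(t)) {
      if (!ALLOWED_TRIPLET_KEYS.has(key)) {
        throw new Error(`unknown field: triplets[${i}].${key}`);
      }
    }
  });
  return { source: { uri }, triplets };
}
```

Checking forbidden fields before the generic allowlist gives a more specific error for the case the checklist cares about, so a stray `bridge_score` is reported by name instead of as an unknown key.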
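
-## Blockers
+## Blockers / honest flags

-1. ⏸ T5.1 depends on graph `POST /triplets/ingest` (InkStoneCo T3; pending implementation in the graph repo).
-2. ⏸ embed depends on base vectorize (InkStoneCo T2.4). The first version does not embed, so work can start.
+1. ⏸ **End-to-end ingest→graph run-through**: the graph receiver has been aligned with the full contract → remaining: ingest deployment + `GRAPH_BASE_URL` configuration → **pending verification after deployment**; not falsely marked green.
+2. ⏸ embed depends on base vectorize (Arcrun #7). The first version, which does not embed (tags only), is already underway.
+3. The T3.3 model test set is deferred; on the refresh side, extract (Workers AI) only runs harvest in the first version, and deep extraction is left to the CLI/CC.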