kbdb-graph-plugin/contracts/ingest-candidate.json
Leo 98b221b435 docs(sdd): establish ingest-contract SDD + move in ingest-candidate contract (T3.1+T3.8)
Addresses issue #1 (top-level mira-dissolve T3).

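- contracts/ingest-candidate.json: the ingest→graph boundary contract (moved in from the top level)
- contracts/README.md: states explicitly that a candidate (input) ≠ a stored record (triplet)
- docs/3-specs/ingest-contract/design.md + tasks.md:
  - ensureTemplate switches to a slot-diff patch (replacing the early return, so no migration script is needed)
  - add KbdbClient.updateRecord (base PATCH /records/:id)
  - ingest flow: validate (422) → idempotency (uri+hash) → append first, then deprecate
  - the triplet template gains source_uri + content_hash slots to carry idempotency
  - the cross-repo coordination point (3.6: merging the graph tools into the KBDB MCP) is explicitly listed as requiring arcrun's cooperation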

The coordinator has signed off on the four design decisions (see the comment on issue #1). Hard rule: zero new tables / zero SQL / zero migrations.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 18:07:12 +08:00
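The ingest flow listed in the commit message (validate and reject with 422, check idempotency on uri + content_hash, then append the new batch before deprecating the old one) can be sketched roughly as below. This is a minimal TypeScript sketch under stated assumptions: `InMemoryTripletStore`, `ingest`, `IngestResult`, and the `noop`/`written` actions are hypothetical names, the store lives in memory rather than in KBDB, and validation is reduced to a trivial check. In the actual plugin the status flip would go through `KbdbClient.updateRecord` (PATCH /records/:id), whose signature is not shown here.

```typescript
// Hypothetical sketch of the commit's ingest flow:
// validate (422) -> idempotency on (uri, content_hash) -> append new batch, then deprecate old.
// The in-memory store stands in for triplet template instances in KBDB (assumption);
// the real status flip would go through KbdbClient.updateRecord (PATCH /records/:id).
type Status = "active" | "deprecated";

interface Triplet { subject: string; predicate: string; object: string; }

interface Envelope {
  source: { uri: string; content_hash: string };
  triplets: Triplet[];
}

interface TripletInstance extends Triplet {
  id: number;
  source_uri: string;   // idempotency slot (per the commit message)
  content_hash: string; // idempotency slot (per the commit message)
  status: Status;
}

type IngestResult =
  | { status: 422; error: string }
  | { status: 200; action: "noop" | "written"; appended: number; deprecated: number };

class InMemoryTripletStore {
  private rows: TripletInstance[] = [];
  private nextId = 1;

  activeFor(uri: string): TripletInstance[] {
    return this.rows.filter((r) => r.source_uri === uri && r.status === "active");
  }

  append(uri: string, hash: string, t: Triplet): void {
    this.rows.push({ ...t, id: this.nextId++, source_uri: uri, content_hash: hash, status: "active" });
  }

  // Flips the status slot instead of deleting, so old batches stay queryable and rollback-able.
  deprecate(id: number): void {
    const row = this.rows.find((r) => r.id === id);
    if (row) row.status = "deprecated";
  }
}

function ingest(store: InMemoryTripletStore, env: Envelope): IngestResult {
  // 1. Validate (only a minimal check here; the full contract is the JSON Schema).
  if (!env.source.uri || !env.source.content_hash || env.triplets.length === 0) {
    return { status: 422, error: "invalid envelope" };
  }
  // 2. Idempotency: same uri with the same content_hash as the active snapshot -> no-op.
  const previous = store.activeFor(env.source.uri);
  if (previous.length > 0 && previous.every((r) => r.content_hash === env.source.content_hash)) {
    return { status: 200, action: "noop", appended: 0, deprecated: 0 };
  }
  // 3. Append the new batch first, then deprecate the previous one (latest-wins).
  for (const t of env.triplets) store.append(env.source.uri, env.source.content_hash, t);
  for (const r of previous) store.deprecate(r.id);
  return { status: 200, action: "written", appended: env.triplets.length, deprecated: previous.length };
}
```

Appending before deprecating means the uri still has an active batch at every point during the update; this is a plausible reading of why the commit orders the steps that way, not something the commit states.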


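{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "IngestCandidateEnvelope",
  "description": "The sole boundary contract between the ingest plugin and the graph plugin. One envelope = the output of one extraction pass over one source file (canonical MD). Once graph accepts it, graph owns normalization, clusters, bridge_score, and embedding, and performs replacement by flipping the status slot on triplet template instances: when the same source.uri arrives with a new content_hash, graph PATCHes the old active instances to deprecated and appends the new batch as active (queryable, rollback-able, purgeable); ingest knows nothing about any of this. Note: this is an input candidate, not a stored triplet (for the latter, see triplet.json).",
  "type": "object",
  "required": ["source", "extractor", "triplets"],
  "additionalProperties": false,
  "properties": {
    "source": {
      "type": "object",
      "description": "Which canonical MD this batch of candidates came from. Serves both as the pointer back to the original text and as the append-only snapshot key.",
      "required": ["uri", "content_hash"],
      "additionalProperties": false,
      "properties": {
        "uri": {
          "type": "string",
          "minLength": 1,
          "description": "Stable identifier of the source = snapshot key + get_source pointer. Format: 'github:<owner>/<repo>@<path>', e.g. 'github:uncle6me-web/LLM-Wiki-for-n8n@.claude/wiki/graph-rag.md'. A later envelope for the same uri replaces the previous batch (latest-wins) rather than accumulating on top of it."
        },
        "content_hash": {
          "type": "string",
          "minLength": 1,
          "description": "Hash of the source file's content (snapshot key). graph compares it with the uri's existing snapshot: same hash → no-op (skipped); different hash → write a new snapshot."
        },
        "anchor": {
          "type": "string",
          "description": "Location within the file (heading slug / block id), so get_source can jump back precisely. Optional."
        },
        "commit": {
          "type": "string",
          "description": "git commit SHA (for traceability). Optional."
        },
        "block_id": {
          "type": "string",
          "description": "Backward compatibility: Logseq Block ID (= source_block_id in the existing triplet.json). Used for non-git sources. Optional."
        }
      }
    },
    "extractor": {
      "type": "object",
      "description": "Provenance of the extraction. Used to monitor the escalation rate and to decide whether to re-extract; does not affect graph structure.",
      "required": ["model", "tier"],
      "additionalProperties": false,
      "properties": {
        "model": {
          "type": "string",
          "minLength": 1,
          "description": "Model that produced this batch, e.g. 'workers-ai/@cf/...' or 'claude-sonnet-4-6'."
        },
        "tier": {
          "type": "string",
          "enum": ["shallow", "deep"],
          "description": "shallow = shallow extraction via Workers AI; deep = deep extraction via the Claude API (escalated to when shallow extraction fails to produce valid JSON or yields too sparse a result)."
        },
        "extracted_at": {
          "type": "integer",
          "description": "Extraction time as a Unix timestamp (seconds). Used to order snapshots. Optional (graph may fill it in on receipt)."
        }
      }
    },
    "nodes": {
      "type": "array",
      "description": "Optional node-level information. entity_type and gloss are node attributes, not edge attributes, so they go here rather than in triplets. graph embeds the gloss (one sentence per node, not the bare term) and uses entity_type for typing.",
      "items": {
        "type": "object",
        "required": ["name"],
        "additionalProperties": false,
        "properties": {
          "name": {
            "type": "string",
            "minLength": 1,
            "description": "Node name (must match, verbatim, the subject or object of some triplet)."
          },
          "gloss": {
            "type": "string",
            "description": "One-sentence description used for embedding, e.g. 'Graph RAG: a RAG variant that retrieves by traversing relations and preserves dissenting views'. Optional (recommended for deep-tier output)."
          },
          "entity_type": {
            "type": "string",
            "enum": ["person", "event", "product", "market", "org"],
            "description": "Node type hint. graph makes the final decision; ingest only suggests. Optional."
          }
        }
      }
    },
    "triplets": {
      "type": "array",
      "minItems": 1,
      "description": "Edges (relations). ingest produces only raw (s, p, o) + confidence.",
      "items": {
        "type": "object",
        "required": ["subject", "predicate", "object"],
        "additionalProperties": false,
        "properties": {
          "subject": { "type": "string", "minLength": 1, "description": "Subject (entity name; if nodes are provided, must match a nodes[].name)" },
          "predicate": { "type": "string", "minLength": 1, "description": "Predicate (the relation)" },
          "object": { "type": "string", "minLength": 1, "description": "Object (target entity or value)" },
          "confidence": { "type": "number", "minimum": 0, "maximum": 1, "default": 1.0, "description": "Extraction confidence. Shallow extraction may attach a self-assessment; graph records it but does not filter on it." }
        }
      }
    }
  },
  "$comment": "Forbidden fields (graph domain; ingest must never send them): id / clusters / bridge_score / created_at / updated_at, plus subject_entity_type|object_entity_type on a triplet (types go only through nodes[]). Sending any of them violates the boundary that ingest is a pure feeder; graph should reject or ignore them."
}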
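Two rules in this schema are not fully enforced by JSON Schema itself: each nodes[].name must literally match some triplet's subject or object, and the graph-owned fields listed in `$comment` are rejected only indirectly, via `additionalProperties: false`. The following is a minimal TypeScript sketch of a hand-written check covering a subset of the schema plus that cross-field rule. `validateEnvelope`, `GRAPH_OWNED`, and the error strings are hypothetical; mapping a non-empty error list to an HTTP 422 response is an assumption based on the commit's "validate (422)" step. In practice one might run the schema through a draft-07 validator and keep only the cross-field check as custom code.

```typescript
// Hypothetical validator: a hand-written subset of the IngestCandidateEnvelope schema,
// plus the nodes[].name <-> triplet subject/object cross-check the schema cannot express.
type Json = null | boolean | number | string | Json[] | { [k: string]: Json };
type Obj = { [k: string]: Json };

// Fields owned by graph (per the schema's $comment); ingest must never send them.
const GRAPH_OWNED = new Set([
  "id", "clusters", "bridge_score", "created_at", "updated_at",
  "subject_entity_type", "object_entity_type",
]);
const ENTITY_TYPES = ["person", "event", "product", "market", "org"];

function isObj(v: Json | undefined): v is Obj {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isNonEmptyStr(v: Json | undefined): boolean {
  return typeof v === "string" && v.length > 0;
}

// Mirrors additionalProperties: false, flagging graph-owned fields with a distinct message.
function checkKeys(o: Obj, allowed: string[], path: string, errors: string[]): void {
  for (const k of Object.keys(o)) {
    if (allowed.includes(k)) continue;
    errors.push(GRAPH_OWNED.has(k) ? `${path}.${k}: graph-owned field` : `${path}.${k}: not allowed`);
  }
}

// Returns a list of errors; an empty list means the envelope passed these checks.
function validateEnvelope(env: Json): string[] {
  const errors: string[] = [];
  if (!isObj(env)) return ["$: must be an object"];
  checkKeys(env, ["source", "extractor", "nodes", "triplets"], "$", errors);

  const source = env["source"];
  if (!isObj(source)) {
    errors.push("$.source: required object");
  } else {
    checkKeys(source, ["uri", "content_hash", "anchor", "commit", "block_id"], "$.source", errors);
    for (const k of ["uri", "content_hash"]) {
      if (!isNonEmptyStr(source[k])) errors.push(`$.source.${k}: non-empty string required`);
    }
  }

  const extractor = env["extractor"];
  if (!isObj(extractor)) {
    errors.push("$.extractor: required object");
  } else {
    checkKeys(extractor, ["model", "tier", "extracted_at"], "$.extractor", errors);
    if (!isNonEmptyStr(extractor["model"])) errors.push("$.extractor.model: non-empty string required");
    const tier = extractor["tier"];
    if (tier !== "shallow" && tier !== "deep") errors.push("$.extractor.tier: must be shallow|deep");
    const at = extractor["extracted_at"];
    if (at !== undefined && !Number.isInteger(at)) errors.push("$.extractor.extracted_at: must be an integer");
  }

  // Collect every subject/object literal so nodes[].name can be cross-checked.
  const names = new Set<string>();
  const triplets = env["triplets"];
  if (!Array.isArray(triplets) || triplets.length === 0) {
    errors.push("$.triplets: non-empty array required");
  } else {
    triplets.forEach((t, i) => {
      const p = `$.triplets[${i}]`;
      if (!isObj(t)) { errors.push(`${p}: must be an object`); return; }
      checkKeys(t, ["subject", "predicate", "object", "confidence"], p, errors);
      for (const k of ["subject", "predicate", "object"]) {
        const v = t[k];
        if (!isNonEmptyStr(v)) errors.push(`${p}.${k}: non-empty string required`);
        else if (k !== "predicate" && typeof v === "string") names.add(v);
      }
      const c = t["confidence"];
      if (c !== undefined && (typeof c !== "number" || c < 0 || c > 1)) {
        errors.push(`${p}.confidence: must be a number in [0, 1]`);
      }
    });
  }

  const nodes = env["nodes"];
  if (nodes !== undefined) {
    if (!Array.isArray(nodes)) {
      errors.push("$.nodes: must be an array");
    } else {
      nodes.forEach((n, i) => {
        const p = `$.nodes[${i}]`;
        if (!isObj(n)) { errors.push(`${p}: must be an object`); return; }
        checkKeys(n, ["name", "gloss", "entity_type"], p, errors);
        const name = n["name"];
        if (typeof name !== "string" || name.length === 0) errors.push(`${p}.name: non-empty string required`);
        else if (!names.has(name)) errors.push(`${p}.name: "${name}" matches no triplet subject/object`);
        const et = n["entity_type"];
        if (et !== undefined && (typeof et !== "string" || !ENTITY_TYPES.includes(et))) {
          errors.push(`${p}.entity_type: not a known type`);
        }
      });
    }
  }
  return errors;
}
```

Note that in draft-07 the `default: 1.0` on confidence is an annotation, not a validation rule; this sketch does not fill it in.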