e7a681a989
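Contract drift fix: T3's strict Zod schema mirrored the old contract, so when ingest sent the vectorization-tagging fields defined by the new contract (promoted in ingest#1), .strict() rejected them with 422. Direction A: explicitly add the legitimate new fields and keep strict.

- Sync the contracts/ingest-candidate.json copy to the top-level single source of truth (mira-dissolve).
- NodeSchema gains id?/aliases?/embed?; EdgeSchema gains predicate_embed?. strict() stays, so forbidden graph-domain fields such as bridge_score/clusters still get 422.
- Landing: predicate_embed is passed through into the triplet slot; node tags (embed/gloss/aliases) are stored in the entity slot for the base/KBDB embed module to read and execute (graph computes no vectors, consistent with the iron rule).
- id serves as the node dedup key: a card pointed to by several edges is stored as a single entity.
- persistNodes is split out into a standalone action (triplet-ingest.ts is back to 95 lines, within the Lego 100-line limit).
- Tests +4: vectorization fields pass, bridge_score/clusters still 422, same-id dedup.

vitest 23 passed. Zero SQL / no D1, Vectorize, or AI bindings / dry-run clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>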
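The strict behaviour described in the commit message can be illustrated with a dependency-free sketch. This is not the repository's Zod code: `NODE_KEYS`, `EDGE_KEYS`, `unknownKeys`, and `checkEnvelopeKeys` are hypothetical names, and the allow-lists are simply copied from the property names in the schema file shown on this page.

```typescript
// Hypothetical stand-in for a strict schema: any key outside the allow-list is rejected.
// The allow-lists mirror the property names in ingest-candidate.json.
const NODE_KEYS: readonly string[] = ["name", "id", "gloss", "aliases", "embed", "entity_type"];
const EDGE_KEYS: readonly string[] = ["subject", "predicate", "object", "predicate_embed", "confidence"];

type Json = Record<string, unknown>;

interface Verdict {
  status: 200 | 422;
  rejected: string[];
}

// Returns the keys of `obj` that are not in `allowed`.
function unknownKeys(obj: Json, allowed: readonly string[]): string[] {
  return Object.keys(obj).filter((key) => allowed.indexOf(key) === -1);
}

// Checks only key names (not value types): 422 if any node or triplet carries an unknown key.
function checkEnvelopeKeys(nodes: Json[], triplets: Json[]): Verdict {
  const rejected: string[] = [];
  for (const node of nodes) {
    rejected.push(...unknownKeys(node, NODE_KEYS));
  }
  for (const triplet of triplets) {
    rejected.push(...unknownKeys(triplet, EDGE_KEYS));
  }
  return { status: rejected.length === 0 ? 200 : 422, rejected };
}
```

Under these assumptions, a node carrying `id` and `embed` passes, while one carrying `bridge_score` is reported in `rejected` and the verdict is 422, which matches the intent of keeping strict while whitelisting the new fields.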
126 lines
7.0 KiB
JSON
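{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "IngestCandidateEnvelope",
  "description": "The sole boundary contract from the ingest plugin to the graph plugin. One envelope = the output of one extraction pass over one source file (canonical MD). After receiving it, graph is responsible for normalization/clusters/bridge_score/embed, and performs replacement by 'flipping the status slot of triplet template instances': when the same source.uri arrives with a new content_hash, graph PATCHes the old active instances to deprecated and appends the new batch as active (queryable, rollback-able, cleanable); ingest knows nothing about any of this. Note: this is an [input candidate], not a [stored triplet] (for the latter see triplet.json).",
  "type": "object",
  "required": ["source", "extractor", "triplets"],
  "additionalProperties": false,
  "properties": {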
    "source": {
      "type": "object",
      "description": "Which canonical MD this batch of candidates came from. Serves both as the 'pointer back to the original text' and as the 'append-only snapshot key'.",
      "required": ["uri", "content_hash"],
      "additionalProperties": false,
      "properties": {
        "uri": {
          "type": "string",
          "minLength": 1,
          "description": "Stable identifier of the source = snapshot key + get_source pointer. Format: 'github:<owner>/<repo>@<path>', e.g. 'github:uncle6me-web/LLM-Wiki-for-n8n@.claude/wiki/graph-rag.md'. A later envelope for the same uri [replaces] (latest-wins) the previous batch rather than stacking on top of it."
        },
        "content_hash": {
          "type": "string",
          "minLength": 1,
          "description": "Hash of the source file content (snapshot key). graph compares: same hash as the existing snapshot for this uri → no-op, skip; different hash → write a new snapshot."
        },
        "anchor": {
          "type": "string",
          "description": "In-file locator (heading slug / block id) so that get_source can jump back precisely. Optional."
        },
        "commit": {
          "type": "string",
          "description": "git commit sha (for traceability). Optional."
        },
        "block_id": {
          "type": "string",
          "description": "Backward compatibility: Logseq Block ID (= source_block_id in the existing triplet.json). Used for non-git sources. Optional."
        }
      }
    },
    "extractor": {
      "type": "object",
      "description": "Extraction provenance. Used to observe the 'escalation rate' and to decide whether to re-extract; does not affect graph structure.",
      "required": ["model", "tier"],
      "additionalProperties": false,
      "properties": {
        "model": {
          "type": "string",
          "minLength": 1,
          "description": "The model that produced this batch, e.g. 'workers-ai/@cf/...' or 'claude-sonnet-4-6'."
        },
        "tier": {
          "type": "string",
          "enum": ["shallow", "deep"],
          "description": "shallow = shallow extraction via Workers AI; deep = deep extraction via the Claude API (the escalation path when shallow extraction fails JSON parsing or is too sparse)."
        },
        "extracted_at": {
          "type": "integer",
          "description": "Unix time of extraction (seconds). Used to order snapshots. Optional (graph may fill it in on receipt)."
        }
      }
    },
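    "nodes": {
      "type": "array",
      "description": "Node-level side information. [Vectorization division of labor (leo 2026-06-26, promoted to contract in ingest#1)] Here ingest [tags] which tokens to vectorize and what to embed; the base/KBDB embed module [reads the tags and executes] the actual embedding; ingest itself computes no vectors. Both kinds of node (entity entries / wikilink cards) go into nodes[]; for the predicate vectorization tag see triplets[].predicate_embed.",
      "items": {
        "type": "object",
        "required": ["name"],
        "additionalProperties": false,
        "properties": {
          "name": {
            "type": "string",
            "minLength": 1,
            "description": "Node name (must correspond to the literal subject/object of some triplet). Entity entry = canonical name; wikilink card = card title."
          },
          "id": {
            "type": "string",
            "description": "Dedup key. Wikilink cards use the [file name], so one card = one node: even when several edges point to it, it is embedded only once rather than once per occurrence. Entity entries use the canonical name. Optional (without it, dedup falls back to name)."
          },
          "gloss": {
            "type": "string",
            "description": "One-sentence description. base embed embeds [name + gloss together] (entity synonyms can differ too much in surface form, so the description pulls them closer). Optional (recommended as deep-tier output)."
          },
          "aliases": {
            "type": "array",
            "items": { "type": "string" },
            "description": "Synonyms (e.g. '黃仁勳' / 'Jensen Huang'). base collapses them into the same node. Optional."
          },
          "embed": {
            "type": "boolean",
            "default": true,
            "description": "[Vectorization tag] Whether this node goes into the vector store. true = base reads the tag and embeds it (name + gloss); false = base ignores it (structural symbols and prose should not be in nodes[] at all; if they do get in, tag them false). Defaults to true (both entity entries and wikilink cards need it).",
            "$comment": "ingest tags, base reads the tag and executes. The embed action belongs to the base embed module; ingest computes no vectors."
          },
          "entity_type": {
            "type": "string",
            "enum": ["person", "event", "product", "market", "org"],
            "description": "Node type hint. graph makes the final decision; ingest only hints. Optional."
          }
        }
      }
    },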
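    "triplets": {
      "type": "array",
      "minItems": 1,
      "description": "Edges (relations). ingest produces only the raw (s,p,o) + confidence + the predicate vectorization tag. Endpoints (s/o) are matched literally against nodes[].name.",
      "items": {
        "type": "object",
        "required": ["subject", "predicate", "object"],
        "additionalProperties": false,
        "properties": {
          "subject": { "type": "string", "minLength": 1, "description": "Subject (entity name; must match a nodes[].name when nodes are provided)" },
          "predicate": { "type": "string", "minLength": 1, "description": "Predicate (relation)" },
          "object": { "type": "string", "minLength": 1, "description": "Object (target entity or value)" },
          "predicate_embed": {
            "type": "boolean",
            "default": true,
            "description": "[Predicate vectorization tag] Whether to embed the predicate. base reads the tag and embeds [the bare predicate word, with no description] (predicate synonyms are already close in surface form, e.g. '參考' / '參照', both roughly 'refer to', so embedding the bare word clusters them automatically), then stores the result in the edge's predicate_vector. Defaults to true in order to support 'relation filter' queries (searching for '參考' does not miss '參照'). The embed action belongs to base; ingest only tags.",
            "$comment": "ingest tags, base reads the tag and executes the embed."
          },
          "confidence": { "type": "number", "minimum": 0, "maximum": 1, "default": 1.0, "description": "Extraction confidence. Shallow extraction may attach a self-assessment; graph does not filter on it, only records it." }
        }
      }
    }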
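  },
  "$comment": "Forbidden fields (graph domain; ingest must never send them): id (the node dedup-key id is the exception: it is a dedup key supplied by ingest, not a record id) / clusters / bridge_score / created_at / updated_at / and, on triplets, subject_entity_type|object_entity_type (types travel only via nodes[]). [Vectorization division of labor] ingest tags (embed/predicate_embed, plus gloss/aliases); the base/KBDB embed module reads the tags and executes the embedding; ingest computes no vectors. Structural symbols (>>/←) and human-readable prose (## 摘要, i.e. summary sections) stay out of the envelope."
}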
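The dedup rule stated for `nodes[].id` (use `id` when present, otherwise fall back to `name`) can be sketched as follows. `CandidateNode` and `dedupeNodes` are hypothetical names, and keeping the first node seen for each key is an assumption; the schema does not say which duplicate wins.

```typescript
// Hypothetical shape covering only the node fields relevant to dedup.
interface CandidateNode {
  name: string;
  id?: string;
  gloss?: string;
  aliases?: string[];
  embed?: boolean;
}

// Keeps one node per dedup key: `id` when present, otherwise `name`.
// Assumption: the first occurrence wins and input order is preserved.
function dedupeNodes(nodes: CandidateNode[]): CandidateNode[] {
  const seen: Record<string, boolean> = {};
  const unique: CandidateNode[] = [];
  for (const node of nodes) {
    const key = node.id !== undefined ? node.id : node.name;
    if (seen[key] !== true) {
      seen[key] = true;
      unique.push(node);
    }
  }
  return unique;
}
```

With this sketch, two wikilink nodes that share the file name `graph-rag.md` as `id` collapse to a single entry even if their `name` values differ, so the card would be embedded once.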
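The latest-wins replacement described for `source.uri` and `content_hash` can be sketched with an in-memory list standing in for the real store. `Snapshot`, `applyEnvelope`, and the returned labels are hypothetical; the actual graph plugin is described as PATCHing the status slot of triplet template instances, which this sketch only imitates.

```typescript
type SnapshotStatus = "active" | "deprecated";

// Hypothetical in-memory record; one entry per (uri, content_hash) batch.
interface Snapshot {
  uri: string;
  content_hash: string;
  status: SnapshotStatus;
}

type ApplyResult = "noop" | "created" | "replaced";

// Same uri and same hash as an active snapshot: no-op.
// Otherwise: mark the old active snapshots deprecated and append the new one as active.
function applyEnvelope(store: Snapshot[], uri: string, contentHash: string): ApplyResult {
  const active = store.filter((s) => s.uri === uri && s.status === "active");
  for (const snapshot of active) {
    if (snapshot.content_hash === contentHash) {
      return "noop";
    }
  }
  for (const snapshot of active) {
    snapshot.status = "deprecated";
  }
  store.push({ uri: uri, content_hash: contentHash, status: "active" });
  return active.length > 0 ? "replaced" : "created";
}
```

Because deprecated entries are kept rather than deleted, the older batch remains available for querying or rollback, which is the property the envelope description calls out.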