feat(cypher-executor): implicit telemetry (LI SDD M1.2)

對應 .agents/specs/llm-interface/ Milestone 1.2。

新增 lib/telemetry.ts:
- hashApiKey(): SHA-256 截 16 hex 字元(不可逆,可聚合)
- recordTelemetry(): fetch fire-and-forget 寫 KBDB type=agent-telemetry block
- 設計:不阻擋主流程,錯誤 console.warn 不 throw
- 用 ctx.waitUntil 確保即使主 request 已回,背景仍會跑完

寫入點 3 處:
1. routes/webhooks-named.ts POST /webhooks/named (deploy) → deploy_success
2. routes/webhooks-named.ts POST /webhooks/named/:name/trigger →
   executeWebhookGraph 帶 ctx + userAgent,內部記 run_success / run_fail
3. routes/validate.ts POST /validate → validation_error (含 schema_failed / edge_node_missing)

executeWebhookGraph 簽名擴張:可選 ctx + userAgent,舊 caller (scheduled /
trigger_workflow / anonymous webhook) 不傳也 OK(telemetry 仍寫但無 ctx 加持)。

paused (workflow 因 claude_api 等等 callback resume) 算 run_success,
不污染 fail metric。

types.ts: 加 PLATFORM_API_KEY env (可選) + re-export ExecutionContext

不違反「業務邏輯走 WASM」鐵律:telemetry 是 orchestrator 觀測自身的能力,
跟 trigger_workflow / scheduled() 同類。

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-05-16 15:33:35 +08:00
parent 56973558e2
commit d04e34ea35
10 changed files with 189 additions and 2 deletions
@@ -1,8 +1,9 @@
import type { Bindings, ExecutionGraph } from '../types';
import type { Bindings, ExecutionGraph, ExecutionContext } from '../types';
import { ExecutionError } from '../types';
import { GraphExecutor } from '../graph-executor';
import { graphSchema } from '../lib/schemas';
import { createComponentLoader } from '../lib/component-loader';
import { recordTelemetry } from '../lib/telemetry';
type WebhookRecord = {
graph: Record<string, unknown>;
@@ -29,6 +30,8 @@ export async function executeWebhookGraph(
triggerContext: Record<string, unknown>,
token: string,
apiKey?: string,
ctx?: ExecutionContext, // 可選 — 用 waitUntil 把 telemetry 推到背景
userAgent?: string, // MCP / SDK client 帶過來
): Promise<{ success: boolean; data?: unknown; error?: string; trace?: unknown; duration_ms: number }> {
const parsed = graphSchema.safeParse(graph);
if (!parsed.success) {
@@ -46,10 +49,30 @@ export async function executeWebhookGraph(
env.EXEC_CONTEXT,
);
const duration_ms = Date.now() - start;
// Implicit telemetry:成功 run(含 paused 也算「成功啟動」由 trigger_workflow 那層分類)
recordTelemetry(env, apiKey, {
event_type: 'run_success',
workflow_name: token,
duration_ms,
agent_user_agent: userAgent,
}, ctx);
return { success: true, data: result.data, duration_ms };
} catch (err) {
const duration_ms = Date.now() - start;
const errMsg = err instanceof Error ? err.message : String(err);
const isPaused = /workflow paused/i.test(errMsg);
// Implicit telemetrypaused 算 run_success;真錯才 run_fail
recordTelemetry(env, apiKey, {
event_type: isPaused ? 'run_success' : 'run_fail',
workflow_name: token,
error_code: isPaused ? 'paused_awaiting_resume' : 'execution_error',
duration_ms,
agent_user_agent: userAgent,
}, ctx);
if (err instanceof ExecutionError) {
const traceFormatted = err.trace.map(s => ({
node: s.nodeId,