@punk/sdk — API reference
The TypeScript client for the Punk gateway. Zero dependencies; works in Bun and Node 18+ (uses global fetch). Source: packages/sdk/. For a guided tour, read ONBOARDING.md.
import { Punk } from "@punk/sdk";
All response types (Run, Pattern, Artifact, SavingsSummary, SomSnapshot, …) are exported from the package. They are local mirrors of the canonical @punk/trace-schema contracts, copied so the published SDK stays dependency-free.
Constructor
new Punk(opts?: PunkOptions)
| Option | Type | Default | Sent as |
|---|---|---|---|
| baseUrl | string | "http://localhost:4100" | n/a (trailing slashes stripped) |
| apiKey | string | none | Authorization: Bearer <apiKey> on every request |
| app | string | "default-app" | X-Punk-App on chat |
| agent | string | none | X-Punk-Agent on chat |
| subject | string | none | X-Punk-Subject on chat; subject field on tool-cache calls |
The client is stateless — construct one per (app, agent, subject) identity. apiKey is only needed when the gateway sets PUNK_API_KEY.
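The option-to-header mapping in the table can be sketched as a small pure function. This is a minimal illustration, not the SDK's code: `PunkOptions` is re-declared locally, and `normalizeBaseUrl` and `identityHeaders` are hypothetical helpers. The `apiKey`, `agent`, and `subject` values below are made up.

```typescript
// Hypothetical mirror of PunkOptions; not an export of @punk/sdk.
interface PunkOptions {
  baseUrl?: string;
  apiKey?: string;
  app?: string;
  agent?: string;
  subject?: string;
}

// Default to the local gateway and strip trailing slashes.
function normalizeBaseUrl(baseUrl: string = "http://localhost:4100"): string {
  return baseUrl.replace(/\/+$/, "");
}

// Authorization is attached to every request; the X-Punk-* identity
// headers are attached only to chat requests.
function identityHeaders(opts: PunkOptions, forChat: boolean): Record<string, string> {
  const headers: Record<string, string> = {};
  if (opts.apiKey) headers["Authorization"] = `Bearer ${opts.apiKey}`;
  if (forChat) {
    headers["X-Punk-App"] = opts.app ?? "default-app";
    if (opts.agent) headers["X-Punk-Agent"] = opts.agent;
    if (opts.subject) headers["X-Punk-Subject"] = opts.subject;
  }
  return headers;
}

// One identity: a support agent acting on behalf of one end user.
const supportHeaders = identityHeaders(
  { apiKey: "pk_test", agent: "support", subject: "user-123" },
  true,
);
```

Because identity is baked into the options, serving a second end user means constructing a second client with a different `subject`. Mutating shared state is never required.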
Chat
chat(params: ChatParams): Promise<ChatResult>
POST /v1/chat/completions (OpenAI-compatible) with the X-Punk-* identity headers. Forces stream: false — for streaming, use any OpenAI client pointed at the gateway instead.
interface ChatParams {
model: string;
messages: Array<{ role: string; content: string }>;
temperature?: number;
response_format?: unknown;
}
interface ChatResult {
content: string; // choices[0].message.content, "" if absent
runId: string; // x-punk-run-id response header, "" if absent
route: string; // x-punk-route response header, "live" if absent
raw: any; // full OpenAI-shaped response body
}
Errors: throws on any non-2xx (including policy blocks, which return the verdict in the body).
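The defaults noted in the `ChatResult` comments, along with the forced `stream: false`, can be sketched as two pure functions. This is an assumption-level reconstruction, not the SDK source: `chatRequestBody` and `toChatResult` are hypothetical names, and `HeaderReader` is a local stand-in for the Fetch API's `Headers` (only `get()` is used).

```typescript
interface ChatParams {
  model: string;
  messages: Array<{ role: string; content: string }>;
  temperature?: number;
  response_format?: unknown;
}

interface ChatResult {
  content: string;
  runId: string;
  route: string;
  raw: any;
}

// Structural stand-in for the Fetch API's Headers (only get() is used).
interface HeaderReader {
  get(name: string): string | null;
}

// The request body always carries stream: false, overriding any caller value.
function chatRequestBody(params: ChatParams): string {
  return JSON.stringify({ ...params, stream: false });
}

// Apply the documented fallbacks when fields are missing from the response.
function toChatResult(body: any, headers: HeaderReader): ChatResult {
  return {
    content: body?.choices?.[0]?.message?.content ?? "",
    runId: headers.get("x-punk-run-id") ?? "",
    route: headers.get("x-punk-route") ?? "live",
    raw: body,
  };
}
```

An empty `runId` therefore means the gateway sent no run id header. Check for it before calling `feedback` or `runDetail`.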
Tool tracing
traceTool<TArgs, TResult>(def: ToolDefinition<TArgs, TResult>): TracedTool<TArgs, TResult>
Wraps a tool function so invocations are traced into a run and read-only results participate in the tool-result cache.
interface ToolDefinition<TArgs, TResult> {
name: string;
sideEffectLevel?: SideEffectLevel; // 0–4; default 3 (conservative)
ttlSeconds?: number; // cacheable only when level <= 1 and ttl > 0
execute: (args: TArgs) => Promise<TResult> | TResult;
}
type TracedTool<TArgs, TResult> =
(args: TArgs, ctx?: { runId?: string }) => Promise<TResult>;
Behavior of the returned function, in order:
- Cache check (only if sideEffectLevel <= 1 and ttlSeconds > 0): POST /api/v1/tool-cache/check with { toolName, subject, args }. On a hit, returns the cached result without executing; if a runId was given, traces tool.completed with cached: true. A network failure degrades to a miss.
- Trace tool.called with { name, args, sideEffectLevel }, only when ctx.runId is provided.
- Trace side_effect.planned with { toolName, level, payload }, only for sideEffectLevel >= 2, before execution, so replay/shadow can suppress it.
- Execute def.execute(args).
- Trace tool.completed with { name, result }.
- Cache store (cacheable tools only): POST /api/v1/tool-cache/store with the result and TTL.
Guarantees: without ctx.runId the tool executes untraced; trace and cache failures are swallowed (telemetry never breaks the tool call); errors thrown by execute propagate to the caller unchanged.
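The ordered steps and guarantees above can be sketched as a standalone wrapper. This is a minimal illustration under stated assumptions, not the SDK implementation. `Transport` is a hypothetical stand-in for the HTTP layer. Internal traces are assumed to go to the same POST /api/v1/trace endpoint that trace() uses. The tool arguments are assumed to serve as the side_effect.planned payload.

```typescript
type SideEffectLevel = 0 | 1 | 2 | 3 | 4;

interface ToolDefinition<TArgs, TResult> {
  name: string;
  sideEffectLevel?: SideEffectLevel;
  ttlSeconds?: number;
  execute: (args: TArgs) => Promise<TResult> | TResult;
}

// Hypothetical transport standing in for the SDK's HTTP layer.
interface Transport {
  post(path: string, body: unknown): Promise<any>;
}

function traceToolSketch<TArgs, TResult>(
  http: Transport,
  subject: string | undefined,
  def: ToolDefinition<TArgs, TResult>,
) {
  const level = def.sideEffectLevel ?? 3; // conservative default
  const ttl = def.ttlSeconds ?? 0;
  const cacheable = level <= 1 && ttl > 0;

  // Best-effort call: any failure is swallowed so telemetry never breaks the tool.
  const quiet = async (path: string, body: unknown): Promise<void> => {
    try {
      await http.post(path, body);
    } catch {
      /* ignored */
    }
  };
  // Traces are only sent when the caller supplied a runId.
  const trace = async (runId: string | undefined, type: string, payload: Record<string, unknown>) => {
    if (runId) await quiet("/api/v1/trace", { runId, type, payload });
  };

  return async (args: TArgs, ctx?: { runId?: string }): Promise<TResult> => {
    const runId = ctx?.runId;

    if (cacheable) {
      let check: { hit?: boolean; result?: unknown } | undefined;
      try {
        check = await http.post("/api/v1/tool-cache/check", { toolName: def.name, subject, args });
      } catch {
        check = undefined; // network failure degrades to a miss
      }
      if (check?.hit) {
        await trace(runId, "tool.completed", { name: def.name, result: check.result, cached: true });
        return check.result as TResult;
      }
    }

    await trace(runId, "tool.called", { name: def.name, args, sideEffectLevel: level });
    if (level >= 2) {
      // Assumption: the tool arguments serve as the planned side effect's payload.
      await trace(runId, "side_effect.planned", { toolName: def.name, level, payload: args });
    }

    const result = await def.execute(args); // errors propagate unchanged

    await trace(runId, "tool.completed", { name: def.name, result });
    if (cacheable) {
      await quiet("/api/v1/tool-cache/store", { toolName: def.name, subject, args, result, ttlSeconds: ttl });
    }
    return result;
  };
}
```

The design choice worth noticing is the asymmetry. Everything around `execute` goes through `quiet`, but `execute` itself does not, so a broken gateway can only cost you telemetry and cache hits, never the tool's own result or error.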
trace(runId: string, type: TraceEventType | string, payload: Record<string, unknown>): Promise<void>
POST /api/v1/trace with { runId, type, payload }. Appends a trace event to a run's ledger. Throws on non-2xx (unlike the internal tracing in traceTool, which is best-effort).
Feedback
feedback(runId: string, rating: 1 | -1, correction?: string): Promise<void>
POST /api/v1/runs/:id/feedback with { type: "rating", rating, correction }. Corrections are the strongest learning signal — they count against pattern stability and artifact confidence. Throws on non-2xx.
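The feedback request shape can be illustrated without a network call. In this hypothetical sketch, `feedbackRequest` is not an SDK export, and percent-encoding the run id is an assumption. It shows one detail of the documented body: JSON.stringify drops `correction` entirely when the caller omits it.

```typescript
type Rating = 1 | -1;

// Hypothetical helper mirroring the documented feedback request.
function feedbackRequest(runId: string, rating: Rating, correction?: string) {
  return {
    method: "POST",
    path: `/api/v1/runs/${encodeURIComponent(runId)}/feedback`,
    // JSON.stringify omits keys whose value is undefined, so a bare rating
    // is sent without a correction field.
    body: JSON.stringify({ type: "rating", rating, correction }),
  };
}

// A thumbs-down with a correction: the strongest learning signal.
const corrected = feedbackRequest("run_42", -1, "The total should be 18.40, not 18.04");
const thumbsUp = feedbackRequest("run_42", 1);
```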
Semantic web (SOM)
fetchSom(url: string, opts?: { bypassCache?: boolean }): Promise<WebFetchResult>
POST /api/v1/web/fetch. Fetches a page and compiles it to a Semantic Object Model — regions and elements with stable ids — instead of raw HTML.
interface WebFetchResult {
som: SomSnapshot; // regions/elements, meta with byte counts
source: string; // "plasmate" | "builtin" | "cache"
cached: boolean; // served from the SOM cache
htmlBytes: number;
somBytes: number;
tokensSavedEstimate: number; // raw-HTML tokens you didn't spend
diff?: SomDiff; // semantic diff vs. previous snapshot (on refetch)
context: string; // compact prompt-ready text rendering
}
bypassCache: true forces a refetch. When a prior snapshot exists, diff reports semantically weighted changes (a pricing change is high-significance; footer noise is low-significance) along with an aggregate driftScore in [0,1]. Throws on non-2xx.
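A typical use of the refetch diff is deciding whether a watched page changed enough to re-prompt a model. This is a minimal sketch that assumes `SomDiff` exposes the aggregate score as a `driftScore` field. `WebFetchResultLike` and `changedMeaningfully` are hypothetical, and the 0.3 threshold is an arbitrary example value, not a gateway default.

```typescript
// Minimal local shapes; assumption: SomDiff carries the aggregate driftScore.
interface SomDiffLike {
  driftScore: number; // in [0, 1]
}

interface WebFetchResultLike {
  cached: boolean;
  tokensSavedEstimate: number;
  diff?: SomDiffLike;
  context: string;
}

// True when a refetch shows drift at or above the threshold.
// Without a diff (no prior snapshot) there is nothing to compare against.
function changedMeaningfully(res: WebFetchResultLike, threshold = 0.3): boolean {
  if (!res.diff) return false;
  return res.diff.driftScore >= threshold;
}
```

Because footer noise is weighted low, a threshold on driftScore tends to ignore cosmetic churn that a raw byte diff would flag.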
Web sessions & actions — punk.web.*
The perception→action loop: open a stateful session, act on SOM element ids, observe the result. Actions are protocol-level (follow links, fill/submit forms — no JS engine, by design) and governed server-side.
punk.web.openSession(url): Promise<WebSessionOpenResult> // POST /api/v1/web/sessions
punk.web.act(sessionId, intent): Promise<WebActResult> // POST /api/v1/web/sessions/:id/act
punk.web.closeSession(sessionId): Promise<{ ok: boolean }> // DELETE /api/v1/web/sessions/:id
punk.web.listSessions() // GET /api/v1/web/sessions
interface WebActionIntent {
action: "click" | "type" | "select" | "submit";
target: string; // SOM element id e_… (or region id r_form… for submit)
value?: string; // for type/select
}
interface WebActResult {
result: WebActionResult; // { ok, action, target, resolved?, navigated?, url, error? }
som: SomSnapshot; // fresh SOM after the action
diff?: SomDiff; // semantic diff vs. the pre-action snapshot
context: string; // prompt-ready rendering of the fresh SOM
}
Governance levels (PRD §17): type/select are level 0 (session-local form state), click is level 1 (read:web), and submit is level 3 — a write:web gated by the same policy engine as chat tools. Policy deny/approval_required on a submit returns 403 with the verdict; observe-mode keys can never submit ("observe-mode keys cannot perform web writes", 403) though their reads run normally. Every action is audited and every navigation destination (session open, link hrefs, form actions) is SSRF-guarded. Idle sessions auto-close after 5 minutes; sessions are tenant-private (another tenant's key sees 404).
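The action-to-level mapping above can be written as a lookup table with an optional client-side preflight. This sketch assumes the gateway stays the authority and returns the 403 itself. `ACTION_LEVEL` and `preflight` are hypothetical, the `observeMode` flag is an illustrative stand-in for knowing the key's mode, and requiring a value for type/select is the sketch's own rule. The element ids follow the documented `e_…` and `r_form…` prefixes but are made up.

```typescript
type WebAction = "click" | "type" | "select" | "submit";

interface WebActionIntent {
  action: WebAction;
  target: string; // SOM element id (or form region id for submit)
  value?: string; // for type/select
}

// Levels as documented: type/select 0, click 1 (read:web), submit 3 (write:web).
const ACTION_LEVEL: Record<WebAction, number> = { type: 0, select: 0, click: 1, submit: 3 };

// Client-side preflight; the gateway still enforces policy and audits every action.
function preflight(intent: WebActionIntent, observeMode: boolean): { allowed: boolean; reason?: string } {
  if ((intent.action === "type" || intent.action === "select") && intent.value === undefined) {
    return { allowed: false, reason: `${intent.action} needs a value` };
  }
  if (observeMode && ACTION_LEVEL[intent.action] >= 3) {
    return { allowed: false, reason: "observe-mode keys cannot perform web writes" };
  }
  return { allowed: true };
}

// A search flow: fill a field (level 0), then submit the form (level 3).
const fill: WebActionIntent = { action: "type", target: "e_query", value: "winter tires" };
const send: WebActionIntent = { action: "submit", target: "r_form_search" };
```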
Read APIs
savings(): Promise<SavingsSummary>
GET /api/v1/savings. Tenant rollup: totalRuns, liveRuns, optimizedRuns, blockedRuns, totalCostUsd, totalSavedUsd, ghostSavedUsd (observe-mode "would have saved" accounting), totalSavedMs, cacheHitRate, artifactHitRate, somTokensSaved.
patterns(): Promise<Pattern[]>
GET /api/v1/patterns, unwraps { patterns } ([] if absent). Each Pattern carries its lifecycle state (observed → candidate → … → promoted, or negative/retired), fingerprints, runCount, cost/latency averages, stabilityScore, and optimizableScore.
artifacts(): Promise<Artifact[]>
GET /api/v1/artifacts, unwraps { artifacts } ([] if absent). Each Artifact carries state, type, the declarative representation (interpreted DSL — never generated code), provenance (source/holdout run ids), Beta-posterior confidence (alpha, beta, confidence), and replay/shadow/live pass-fail counters.
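The "unwraps { key } ([] if absent)" behavior shared by patterns() and artifacts() can be sketched generically. `unwrapList` is a hypothetical helper. Also treating a present but non-array value as empty is an assumption of this sketch.

```typescript
// Hypothetical helper: return body[key] when it is an array, otherwise [].
function unwrapList<T>(body: unknown, key: string): T[] {
  if (body !== null && typeof body === "object") {
    const value = (body as Record<string, unknown>)[key];
    if (Array.isArray(value)) return value as T[];
  }
  return [];
}

// Callers can iterate the result without null checks.
const patternsFromBody = unwrapList<{ id: string }>({ patterns: [{ id: "p1" }, { id: "p2" }] }, "patterns");
const noArtifacts = unwrapList<{ id: string }>({}, "artifacts");
```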
artifactDetail(id: string): Promise<ArtifactDetail>
GET /api/v1/artifacts/:id.
interface ArtifactDetail {
artifact: Artifact;
evaluations: ArtifactEvaluation[]; // replay/shadow/live evidence rows
pattern: Pattern | null; // the source pattern
}
runDetail(id: string): Promise<RunDetail>
GET /api/v1/runs/:id.
interface RunDetail {
run: Run; // includes routeExplanation
events: TraceEvent[]; // the full append-only trace
sideEffects: SideEffectRecord[]; // planned/executed/suppressed/blocked
}
run.routeExplanation is the audit story: route, reason, rejected alternatives, policy verdict, cache/artifact details, estimated savings, fallback.
cacheStats(): Promise<CacheStats>
GET /api/v1/cache/stats → { stats: Array<{ cacheType, entries, hits }> } per tier (exact_response, tool_result, som, negative, …).
Learning lifecycle
learningTick(): Promise<LearningReport>
POST /api/v1/learning/tick. Forces one learning pass (it also runs on a timer inside the gateway). Returns at least:
interface LearningReport {
artifactsSynthesized: number;
promotionsEligible: string[]; // artifact ids that passed the gates
autoPromoted: string[]; // promoted hands-free (PUNK_AUTO_PROMOTE)
synthesisReports?: Array<Record<string, unknown>>;
[key: string]: unknown;
}
promoteArtifact(id: string): Promise<Artifact>
POST /api/v1/artifacts/:id/promote, unwraps { artifact }. The gateway enforces the promotion gate — replay evidence plus shadow agreement; side-effect-bearing artifacts additionally require operator action. Throws on non-2xx, including "gate not satisfied" rejections.
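Combined, learningTick and promoteArtifact support a manual promotion loop. This is a minimal sketch under assumptions. `LearningClient` is a hypothetical structural slice that a Punk instance should satisfy. `pendingPromotions` and `tickAndPromote` are illustrative. Removing autoPromoted ids from promotionsEligible is a defensive step, since the docs do not say whether the two lists overlap.

```typescript
interface LearningReport {
  artifactsSynthesized: number;
  promotionsEligible: string[];
  autoPromoted: string[];
  [key: string]: unknown;
}

// Minimal slice of the client used here.
interface LearningClient {
  learningTick(): Promise<LearningReport>;
  promoteArtifact(id: string): Promise<unknown>;
}

// Ids that passed the gates but were not already promoted hands-free.
function pendingPromotions(report: LearningReport): string[] {
  const done = new Set(report.autoPromoted);
  return report.promotionsEligible.filter((id) => !done.has(id));
}

// Run one learning pass, then try each pending promotion.
// A gate rejection throws; it is recorded and the loop continues.
async function tickAndPromote(client: LearningClient): Promise<{ promoted: string[]; rejected: string[] }> {
  const report = await client.learningTick();
  const promoted: string[] = [];
  const rejected: string[] = [];
  for (const id of pendingPromotions(report)) {
    try {
      await client.promoteArtifact(id);
      promoted.push(id);
    } catch {
      rejected.push(id); // e.g. gate not satisfied, or needs operator action
    }
  }
  return { promoted, rejected };
}
```

Side-effect-bearing artifacts are expected to land in `rejected` until an operator acts, which makes this loop safe to run unattended.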
Tool-result cache (low level)
traceTool calls these for you; they're public for manual integration.
toolCacheCheck(toolName: string, args: unknown): Promise<{ hit: boolean; result?: unknown }>
POST /api/v1/tool-cache/check with { toolName, subject, args }. Never throws — any failure returns { hit: false }.
toolCacheStore(toolName: string, args: unknown, result: unknown, ttlSeconds?: number): Promise<void>
POST /api/v1/tool-cache/store with { toolName, subject, args, result, ttlSeconds }. Never throws — caching is an optimization, not a failure mode.
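The never-throw contract of both low-level calls amounts to wrapping the whole request in try/catch. In this hypothetical sketch, `FetchLike` is a minimal stand-in for the global fetch so the code runs offline. The `{ hit, result }` response shape and the JSON Content-Type header are assumptions, and the `*Sketch` functions are not SDK exports.

```typescript
// Minimal stand-in for the global fetch (only what the sketch uses).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; json(): Promise<any> }>;

const JSON_HEADERS = { "Content-Type": "application/json" };

async function toolCacheCheckSketch(
  fetchFn: FetchLike, baseUrl: string, subject: string | undefined,
  toolName: string, args: unknown,
): Promise<{ hit: boolean; result?: unknown }> {
  try {
    const res = await fetchFn(`${baseUrl}/api/v1/tool-cache/check`, {
      method: "POST",
      headers: JSON_HEADERS,
      body: JSON.stringify({ toolName, subject, args }),
    });
    if (!res.ok) return { hit: false };
    const data = await res.json(); // assumption: { hit, result }
    return data && data.hit === true ? { hit: true, result: data.result } : { hit: false };
  } catch {
    return { hit: false }; // any failure degrades to a miss
  }
}

async function toolCacheStoreSketch(
  fetchFn: FetchLike, baseUrl: string, subject: string | undefined,
  toolName: string, args: unknown, result: unknown, ttlSeconds?: number,
): Promise<void> {
  try {
    await fetchFn(`${baseUrl}/api/v1/tool-cache/store`, {
      method: "POST",
      headers: JSON_HEADERS,
      body: JSON.stringify({ toolName, subject, args, result, ttlSeconds }),
    });
  } catch {
    /* caching is an optimization, not a failure mode */
  }
}
```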
Error behavior summary
| Surface | On failure |
|---|---|
| chat, trace, feedback, fetchSom, all read APIs, learningTick, promoteArtifact | throws Error("Punk API <METHOD> <path> failed: <status> <statusText> — <body, first 500 chars>") |
| Tracing inside traceTool | swallowed; the tool call succeeds untraced |
| toolCacheCheck | degrades to { hit: false } |
| toolCacheStore | swallowed |
| def.execute inside a traced tool | propagates unchanged |
There are no retries in the SDK; the gateway is local-first and the router fails open server-side.
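Thrown errors carry their HTTP status only inside the message string, so a caller that wants to branch (for example, on a 403 policy deny for a web submit) has to parse it. The sketch below rebuilds the documented message format and extracts the status. `apiError` and `statusOf` are hypothetical helpers. The separator is written as the escape "\u2014" (an em dash), matching the format in the table.

```typescript
// Rebuild the documented error message: body truncated to its first 500 chars.
function apiError(method: string, path: string, status: number, statusText: string, body: string): Error {
  return new Error(`Punk API ${method} ${path} failed: ${status} ${statusText} \u2014 ${body.slice(0, 500)}`);
}

// Extract the HTTP status from an SDK-style error, or undefined for other errors.
function statusOf(err: unknown): number | undefined {
  if (!(err instanceof Error)) return undefined;
  const m = /^Punk API \S+ \S+ failed: (\d{3}) /.exec(err.message);
  return m ? Number(m[1]) : undefined;
}

const denied = apiError("POST", "/api/v1/web/sessions/s1/act", 403, "Forbidden", "x".repeat(600));
```

With no retries in the SDK, a status check like `statusOf(err) === 403` lets callers treat policy denials as final, while retrying transient failures in their own code if they choose.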
Properties
punk.baseUrl, punk.app, punk.agent, punk.subject are readable on the instance. The API key is private.