Punk in 30 minutes
Punk sits between your agent and its LLM provider. It observes traffic, caches what's safe to cache, learns deterministic artifacts from repeated work, proves them with replay and shadow evaluation before they ever serve a user, and explains every routing decision. This guide takes you from zero to your first promoted artifact in about 30 minutes — every stage works fully offline, no API keys required.
Hosted reference: cheaperfastersafer.com. Everything below runs locally on http://localhost:4100.
What you need: Bun 1.2+ and a clone of github.com/dbhurley/punk.
0–5 min — Run the gateway and the demo
```bash
bun install
bun run dev    # gateway + dashboard + learning loop on http://localhost:4100
bun run demo   # in a second terminal: the full loop, narrated
```
The demo drives a simulated support-triage agent through the entire runtime loop: live traffic, cache hits, SOM web fetches, artifact synthesis from traces, replay + shadow proof, promotion, and finally serving traffic from the artifact at ~$0 and ~2ms. With no OPENAI_API_KEY set, a deterministic mock provider stands in for the model — the runtime mechanics are identical.
Open http://localhost:4100. Run bun run demo again: Punk remembers what it proved, and the optimized share climbs. That compounding is the product.
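The "optimized share" can be read as the fraction of runs Punk answered without calling the live provider. Below is a minimal sketch of that arithmetic. It assumes the route labels that appear in this guide (`live`, `exact_cache`, `artifact`, `blocked`) and excludes policy-blocked runs from the count. The dashboard's exact definition may differ.

```ts
// Hypothetical route labels taken from this guide; the gateway may emit others.
type Route = "live" | "exact_cache" | "artifact" | "blocked";

// Assumption: "optimized" means answered without calling the live provider.
const OPTIMIZED: ReadonlySet<Route> = new Set<Route>(["exact_cache", "artifact"]);

function optimizedShare(routes: readonly Route[]): number {
  // Assumption: policy-blocked runs count as neither live nor optimized.
  const eligible = routes.filter((r) => r !== "blocked");
  if (eligible.length === 0) return 0;
  return eligible.filter((r) => OPTIMIZED.has(r)).length / eligible.length;
}

// Made-up run histories for illustration, not real demo output.
console.log(optimizedShare(["live", "live", "live", "exact_cache"])); // 0.25
console.log(optimizedShare(["exact_cache", "artifact", "artifact", "live"])); // 0.75
```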
5–10 min — Repoint your existing app (zero code change beyond the constructor)
If your agent uses any OpenAI-compatible client, integration is one line:
```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:4100/v1",         // was: api.openai.com (or unset)
  defaultHeaders: { "X-Punk-App": "my-app" },  // optional but recommended
});

// every existing client.chat.completions.create(...) call now flows through Punk
```
Set OPENAI_API_KEY in the gateway's environment and Punk passes your traffic through to the live provider unchanged — same request, same response shape, streaming included. Nothing else in your codebase changes. See examples/ for Vercel AI SDK and LangChain equivalents; they're the same one-line move.
The headers are how Punk attributes work: X-Punk-App (application), X-Punk-Agent (agent identity), X-Punk-Subject (pseudonymous end user — also a cache-key safety dimension, so one user's cached answer never leaks to another).
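The subject header matters for safety. Here is a minimal sketch of a cache key that folds tenant, app, and subject in alongside the request itself. The `CacheKeyInput` shape, the array-based canonicalization, and the use of SHA-256 are assumptions for illustration. This is not Punk's actual key format.

```ts
import { createHash } from "node:crypto";

// Hypothetical key inputs: the safety dimensions plus the request itself.
interface CacheKeyInput {
  tenant: string;
  app: string;      // from X-Punk-App
  subject?: string; // from X-Punk-Subject (pseudonymous end user)
  model: string;
  messages: { role: string; content: string }[];
}

function cacheKey(input: CacheKeyInput): string {
  // Fixed field order so logically identical requests hash identically.
  const canonical = JSON.stringify([
    input.tenant,
    input.app,
    input.subject ?? "",
    input.model,
    input.messages.map((m) => [m.role, m.content]),
  ]);
  return createHash("sha256").update(canonical).digest("hex");
}

const base = {
  tenant: "t1",
  app: "my-app",
  model: "gpt-4o",
  messages: [{ role: "user", content: "What is my plan tier?" }],
};

// Same question, different end users: different keys, so no cross-user cache hit.
console.log(cacheKey({ ...base, subject: "user-123" }) === cacheKey({ ...base, subject: "user-456" })); // false
```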
10–15 min — Read your first route explanation
Send a request twice, then look at it in the dashboard (http://localhost:4100 → Runs).
Every response carries two headers: x-punk-run-id and x-punk-route. The first request routes live — Punk observed it, fingerprinted it, and stored the trace. The identical second request routes exact_cache. Click the run and read its RouteExplanation: the route taken, the reason, the alternatives that were rejected and why, the policy verdict, and the estimated saved cost/latency.
This is the core mental model: Punk observes first, optimizes only what it can justify, and shows its work. Nothing is served from a cache or artifact without an explanation you can audit, and any optimized route that fails falls back to the live provider.
You can also fetch it raw:
```bash
curl -s http://localhost:4100/api/v1/runs/<runId> | jq .run.routeExplanation
```
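The same lookup can be done from TypeScript, starting from the response headers. Below is a minimal sketch. It assumes the JSON path used by the `jq` filter (`run.routeExplanation`), and it sends a bearer token only when `PUNK_API_KEY` is set on the gateway. The `RouteHeaders` shape and the injected `fetchImpl` parameter are hypothetical conveniences. The explanation body is left untyped, because its schema is documented in docs/API.md, not here.

```ts
// Hypothetical helper shapes; only the header names and the URL path come from this guide.
interface RouteHeaders {
  runId: string | null;
  route: string | null; // e.g. "live" or "exact_cache"
}

function readRouteHeaders(headers: Headers): RouteHeaders {
  return {
    runId: headers.get("x-punk-run-id"),
    route: headers.get("x-punk-route"),
  };
}

type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

async function fetchRouteExplanation(
  runId: string,
  opts: { baseUrl?: string; apiKey?: string; fetchImpl?: FetchLike } = {},
): Promise<unknown> {
  const baseUrl = opts.baseUrl ?? "http://localhost:4100";
  const fetchImpl = opts.fetchImpl ?? fetch;
  const headers: Record<string, string> = {};
  // Only needed when the gateway runs with PUNK_API_KEY set.
  if (opts.apiKey) headers.Authorization = `Bearer ${opts.apiKey}`;
  const res = await fetchImpl(`${baseUrl}/api/v1/runs/${encodeURIComponent(runId)}`, { headers });
  if (!res.ok) throw new Error(`run lookup failed: HTTP ${res.status}`);
  const body = (await res.json()) as { run?: { routeExplanation?: unknown } };
  return body.run?.routeExplanation;
}
```

To get the headers from the official `openai` v4+ client, `.withResponse()` on a `create(...)` call exposes the raw `response`. You can pass `response.headers` to `readRouteHeaders`, then look up the run id.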
15–25 min — Declare your tools, send feedback
The gateway sees model traffic; the SDK lets it see tools. Install @punk/sdk (bun add @punk/sdk) and wrap your tools with a declared side-effect level:
```ts
import { Punk } from "@punk/sdk";

// `crm` and `mailer` stand in for your existing clients.
const punk = new Punk({ app: "my-app", agent: "my-bot", subject: "user-123" });

const lookupAccount = punk.traceTool({
  name: "crm.lookupAccount",
  sideEffectLevel: 1, // read-only
  ttlSeconds: 300,    // participate in the tool-result cache
  execute: async (args: { accountId: string }) => crm.get(args.accountId),
});

const sendEmail = punk.traceTool({
  name: "email.send",
  sideEffectLevel: 3, // user-visible write: never cached, suppressed in replay/shadow
  execute: async (args: { to: string; body: string }) => mailer.send(args),
});
```
The side-effect levels (PRD §17) determine what Punk is allowed to do with a tool:
| Level | Meaning | Punk's behavior |
|---|---|---|
| 0 | Pure computation | cacheable, freely replayable |
| 1 | Read-only external | cacheable with TTL, replayable |
| 2 | Reversible/idempotent write | requires idempotency; gated |
| 3 | User-visible write (email/Slack/tickets) | never cached; suppressed in replay/shadow; policy-gated |
| 4 | High-impact (payments, deletion, permissions) | live + approval by default |
Undeclared tools default to level 3. Punk is conservative by design: it will not optimize what it cannot classify.
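The table maps naturally onto a small decision function. Below is a minimal sketch of those rules, including the level-3 default for undeclared tools. The `handlingFor` function and the `ToolHandling` fields are hypothetical. The sketch encodes only what the table and FAQ state, not Punk's full policy engine. Level 2's "gated" behavior is modeled as "not cacheable, idempotency required", which is an assumption.

```ts
type SideEffectLevel = 0 | 1 | 2 | 3 | 4;

// Hypothetical summary of what Punk may do with a tool at each level.
interface ToolHandling {
  cacheable: boolean;          // may be served from the tool-result cache
  needsTtl: boolean;           // cached entries must expire
  needsIdempotency: boolean;   // write must be idempotent before Punk gates it through
  suppressedInReplay: boolean; // not executed during replay/shadow
  approvalByDefault: boolean;  // runs live behind an approval by default
}

// Undeclared tools (level undefined) are treated as level 3.
function handlingFor(level: SideEffectLevel | undefined): ToolHandling {
  const l = level ?? 3;
  return {
    cacheable: l <= 1,
    needsTtl: l === 1,
    needsIdempotency: l === 2,
    suppressedInReplay: l >= 3,
    approvalByDefault: l === 4,
  };
}

console.log(handlingFor(1).cacheable);         // true
console.log(handlingFor(undefined).cacheable); // false (defaults to level 3)
```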
Then close the loop with feedback — it feeds pattern stability and artifact confidence:
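```ts
const r = await punk.chat({ model: "gpt-4o", messages: [{ role: "user", content: "Which queue is this ticket for?" }] });
await punk.feedback(r.runId, 1); // positive
// or negative, with a correction (corrections matter most):
// await punk.feedback(r.runId, -1, "correct answer here");
```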
25–30 min — Watch the learning report, promote your first artifact
The learning loop runs on a timer; force a pass to see it now:
```ts
const report = await punk.learningTick();
console.log(report); // patterns discovered, artifacts synthesized, promotion eligibility
```
Send the same shaped request (with varying values) ~5+ times so a pattern forms, then check the dashboard's Patterns and Artifacts pages — or punk.patterns() / punk.artifacts(). When the synthesizer induces a candidate artifact from your traces, it must earn promotion:
- Replay — the artifact re-runs against historical traces and must reproduce the recorded outputs.
- Shadow — it runs silently beside live traffic (side effects suppressed) and must agree with the live model.
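The replay gate can be pictured as a pass-rate check over recorded traces. Below is a minimal sketch. It assumes a hypothetical trace shape of recorded input and output, uses exact structural equality as the comparison, and requires a 100% pass rate. Punk's real comparator and thresholds are not specified here. Shadow follows the same idea, except the expected output is the live model's answer to current traffic.

```ts
// Hypothetical shapes: a recorded trace, and a candidate artifact as a pure function.
interface RecordedTrace<I, O> {
  input: I;
  output: O;
}
type Artifact<I, O> = (input: I) => O;

interface ReplayResult {
  total: number;
  passed: number;
  passRate: number;
}

function replay<I, O>(artifact: Artifact<I, O>, traces: RecordedTrace<I, O>[]): ReplayResult {
  let passed = 0;
  for (const t of traces) {
    // Exact structural match stands in for whatever comparator Punk applies.
    if (JSON.stringify(artifact(t.input)) === JSON.stringify(t.output)) passed++;
  }
  return { total: traces.length, passed, passRate: traces.length ? passed / traces.length : 0 };
}

// Assumption: every recorded output must be reproduced.
function passesReplay(result: ReplayResult): boolean {
  return result.total > 0 && result.passed === result.total;
}

// Illustrative triage artifact and traces (made up, not demo data).
type Ticket = { subject: string };
type Triage = { queue: string };
const triage: Artifact<Ticket, Triage> = (t) => ({ queue: /refund/i.test(t.subject) ? "billing" : "general" });

const traces: RecordedTrace<Ticket, Triage>[] = [
  { input: { subject: "Refund for order 1" }, output: { queue: "billing" } },
  { input: { subject: "Refund please" }, output: { queue: "billing" } },
  { input: { subject: "Password reset" }, output: { queue: "general" } },
];

console.log(passesReplay(replay(triage, traces))); // true
```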
Only then does promotion open up:
```ts
await punk.promoteArtifact(artifactId);
```
(Or click Promote in the dashboard. PUNK_AUTO_PROMOTE=true allows hands-free promotion, but only for side-effect-free artifacts that pass every gate.) After promotion, matching requests route artifact: deterministic output, ~zero cost, ~2ms, still policy-checked, still explained, still monitored for drift — degradation quarantines the artifact and traffic returns to live.
That's the loop: observe → cache → learn → prove → promote → compound.
Environment variables
| Variable | Effect |
|---|---|
| OPENAI_API_KEY / OPENAI_BASE_URL | live OpenAI-compatible pass-through provider (gpt-* and default models) |
| ANTHROPIC_API_KEY / ANTHROPIC_BASE_URL | live Anthropic Messages backend (claude-* models route here; gateway also serves the Anthropic-compatible /v1/messages endpoint) |
| PUNK_PROVIDER=mock | force the deterministic offline simulator for all models |
| PUNK_DATABASE_URL | Postgres (Neon-compatible) storage; default is single-file SQLite |
| PUNK_DB_PATH | override the SQLite path (data/punk.db); delete the file for a fresh start |
| PUNK_API_KEY | require bearer auth on the gateway and API |
| PUNK_AUTO_PROMOTE=true | hands-free promotion of side-effect-free artifacts that pass every gate |
| PUNK_RETENTION_DAYS | trace retention sweep window (default 90) |
| PLASMATE_URL=ws://127.0.0.1:9222 | use Punk's Plasmate-compatible AWP adapter for SOM compilation (built-in TS compiler otherwise) |
| MESHGUARD_URL | delegate policy evaluation to a MeshGuard-compatible control plane (embedded engine otherwise) |
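Some of these variables interact. The clearest cases are storage selection (Postgres beats SQLite) and the retention default. Below is a minimal sketch of how a process might resolve them from the environment, using only the defaults stated in the table. The `resolveConfig` function and the `PunkConfig` shape are hypothetical. They are not Punk's own loader.

```ts
// Hypothetical resolved configuration, using only the variables and defaults in the table.
interface PunkConfig {
  storage: { kind: "postgres"; url: string } | { kind: "sqlite"; path: string };
  forceMockProvider: boolean; // PUNK_PROVIDER=mock
  requireAuth: boolean;       // PUNK_API_KEY is set
  autoPromote: boolean;       // PUNK_AUTO_PROMOTE=true
  retentionDays: number;      // PUNK_RETENTION_DAYS, default 90
}

function resolveConfig(env: Record<string, string | undefined>): PunkConfig {
  const retention = Number(env.PUNK_RETENTION_DAYS);
  return {
    // Postgres when PUNK_DATABASE_URL is set; otherwise single-file SQLite.
    storage: env.PUNK_DATABASE_URL
      ? { kind: "postgres", url: env.PUNK_DATABASE_URL }
      : { kind: "sqlite", path: env.PUNK_DB_PATH ?? "data/punk.db" },
    forceMockProvider: env.PUNK_PROVIDER === "mock",
    requireAuth: Boolean(env.PUNK_API_KEY),
    autoPromote: env.PUNK_AUTO_PROMOTE === "true",
    // Fall back to the documented 90-day default when unset or not a positive number.
    retentionDays: Number.isFinite(retention) && retention > 0 ? retention : 90,
  };
}
```

In a Node or Bun process this would be called as `resolveConfig(process.env)`.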
Production notes
- Auth: set `PUNK_API_KEY` so the gateway and API require `Authorization: Bearer`. Pass the same key to SDK clients (`new Punk({ apiKey })`) or as your OpenAI client's `apiKey`.
- Storage: set `PUNK_DATABASE_URL` to a Postgres/Neon URL. The schema is identical to the SQLite adapter's. The trace ledger is append-only, and everything derived from it (patterns, artifacts, route stats) can be rebuilt from it.
- Scale-out: shadow evaluation, learning ticks, and retention sweeps are durable jobs. The gateway embeds one worker. To scale out, start `bun run worker` as N separate processes against the same Postgres. Job claims are atomic, retries back off exponentially, and exhausted jobs dead-letter for inspection (`GET /api/v1/jobs`).
- Policies: `policies/*.yaml` uses the MeshGuard AgentPolicy schema. Policy denials return `blocked` routes, with the verdict recorded in the route explanation and an audit event.
- Canary rollout: enable the tenant setting `canary_enabled` (`PUT /api/v1/settings {"key":"canary_enabled","value":true}`), and promotions no longer jump straight to full traffic.
  - A promoted artifact enters state `canary`, serving 1% of its pattern's requests. Admission is a deterministic hash of the run id, so a given request never flaps.
  - The other 99% are served live and shadow-evaluate the canary, so evidence accrues from 100% of traffic.
  - The learning loop graduates one rung per tick (1% → 10% → 25% → `stable`, full traffic) once a rung accumulates ≥25 clean passes with zero live failures.
  - Any live failure holds the rung. A window fail rate above 10%, or collapsed confidence, demotes the artifact to `degraded`, and traffic returns to live.
  - Every rung is audited and fires the `artifact.promoted` webhook with the `rung` in the payload.
  - The setting defaults to `false`: promotion means full traffic immediately, which keeps the local demo fast. Production tenants should turn it on.
- Evidence APIs:
  - `GET /api/v1/runs/:id/replay-bundle` exports a self-contained JSON document for offline audit. It contains the run, its reconstructed replay bundle, the full trace, and the ids of artifacts evaluated against it.
  - `POST /api/v1/artifacts/:id/replay` (admin) re-proves an artifact against its provenance traces on demand and returns the pass rate. The same action is available as a button on the artifact detail page.
  - `GET /api/v1/runs/:id/integrity` verifies the run's tamper-evident trace hash chain. Every trace event is sha256-chained to its predecessor at write time, so out-of-band edits to the ledger are detectable.
FAQ
Is my data isolated per tenant? Yes. Every entity — runs, traces, caches, patterns, artifacts — carries a tenant id, and cache keys include tenant/app/subject as safety dimensions. There is no cross-tenant learning from raw traces. Artifacts are induced only from your own traffic.
What never gets optimized? Side effects. Tools at level ≥ 2 are never served from cache; level 3–4 actions are suppressed in replay and shadow runs and gated by policy in live ones. Undeclared tools are treated as level 3. Patterns the learner can't verify as stable get negative-cached so Punk stops trying, rather than guessing. And when in doubt, the router fails open to the live provider — the worst case is the cost you were already paying.
How do I roll back? Per artifact: the rollback/quarantine actions in the dashboard or POST /api/v1/artifacts/:id/rollback — traffic returns to live immediately, and the evidence trail stays. Drift detection does this automatically when an artifact starts disagreeing with reality. Whole system: it's a gateway; point your baseURL back at the provider and you're out.
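Rolling back from a script is one POST to the documented route. Below is a minimal sketch. It assumes `http://localhost:4100` as the base URL, sends a bearer token only when `PUNK_API_KEY` is set, sends no request body, and treats any non-2xx status as failure. Check docs/API.md for what the endpoint actually expects and returns. `rollbackArtifact` and its `fetchImpl` parameter are hypothetical; the injection point lets the call be tested without a running gateway.

```ts
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

// Hypothetical wrapper around POST /api/v1/artifacts/:id/rollback.
async function rollbackArtifact(
  artifactId: string,
  opts: { baseUrl?: string; apiKey?: string; fetchImpl?: FetchLike } = {},
): Promise<void> {
  const baseUrl = opts.baseUrl ?? "http://localhost:4100";
  const fetchImpl = opts.fetchImpl ?? fetch;
  const headers: Record<string, string> = {};
  // Only needed when the gateway runs with PUNK_API_KEY set.
  if (opts.apiKey) headers.Authorization = `Bearer ${opts.apiKey}`;
  const res = await fetchImpl(`${baseUrl}/api/v1/artifacts/${encodeURIComponent(artifactId)}/rollback`, {
    method: "POST",
    headers,
  });
  if (!res.ok) throw new Error(`rollback failed: HTTP ${res.status}`);
}
```

For example, `await rollbackArtifact(artifactId, { apiKey: process.env.PUNK_API_KEY })`.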
Do I have to use the SDK? No. The base-URL swap gets you observation, caching, policy, learning, and artifacts for model traffic. The SDK adds tool tracing, tool-result caching, side-effect declaration, feedback, and SOM web fetch. Start with the URL swap; add the SDK where the extra trace fidelity pays for itself.
Next
- docs/README.md — docs hub by reader type and task
- docs/CONCEPTS.md — core runtime model, routes, side effects, SOM, artifacts, approvals
- docs/SDK.md — TypeScript client reference
- docs/API.md — HTTP routes, auth, identity headers, and response conventions
- docs/CONFIGURATION.md — env vars, auth modes, database choices, and tenant settings
- docs/GOVERNANCE.md — policies, trust, approvals, audit, and observe mode
- docs/SEMANTIC-WEB.md — SOM fetch, cache, diffs, and prompt context
- docs/OPERATIONS.md — production shape, workers, backups, retention, security
- docs/TROUBLESHOOTING.md — common symptoms and fixes
- examples/ — OpenAI SDK, Vercel AI SDK, LangChain, native SDK, curl
- docs/PUNKBAR.md — PunkBar, the macOS menu bar app for savings and approvals