
Local walkthrough from gateway startup to promoted artifact.

Punk in 30 minutes

Punk sits between your agent and its LLM provider. It observes traffic, caches what's safe to cache, learns deterministic artifacts from repeated work, proves them with replay and shadow evaluation before they ever serve a user, and explains every routing decision. This guide takes you from zero to your first promoted artifact in about 30 minutes. Every stage works fully offline, with no API keys required.

Hosted reference: cheaperfastersafer.com. Everything below runs locally on http://localhost:4100.

What you need: Bun 1.2+ and a clone of github.com/dbhurley/punk.


0–5 min — Run the gateway and the demo

bun install
bun run dev     # gateway + dashboard + learning loop on http://localhost:4100
bun run demo    # second terminal: the full loop, narrated

The demo drives a simulated support-triage agent through the entire runtime loop: live traffic, cache hits, SOM web fetches, artifact synthesis from traces, replay + shadow proof, promotion, and finally serving traffic from the artifact at ~$0 and ~2ms. With no OPENAI_API_KEY set, a deterministic mock provider stands in for the model — the runtime mechanics are identical.

Open http://localhost:4100. Run bun run demo again: Punk remembers what it proved, and the optimized share climbs. That compounding is the product.

5–10 min — Repoint your existing app (zero code change beyond the constructor)

If your agent uses any OpenAI-compatible client, integration is one line:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:4100/v1",        // was: api.openai.com (or unset)
  defaultHeaders: { "X-Punk-App": "my-app" }, // optional but recommended
});
// every existing client.chat.completions.create(...) call now flows through Punk

Set OPENAI_API_KEY in the gateway's environment and Punk passes your traffic through to the live provider unchanged — same request, same response shape, streaming included. Nothing else in your codebase changes. See examples/ for Vercel AI SDK and LangChain equivalents; they're the same one-line move.

The headers are how Punk attributes work: X-Punk-App (application), X-Punk-Agent (agent identity), X-Punk-Subject (pseudonymous end user — also a cache-key safety dimension, so one user's cached answer never leaks to another).
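
To make the safety role of X-Punk-Subject concrete, here is a minimal sketch of a cache key that carries tenant, app, and subject as dimensions. The `CacheScope` type and `cacheKey` function are hypothetical illustrations and are not taken from Punk's source; they only show why two users sending the same prompt cannot share a cached answer.

```typescript
// Hypothetical sketch: this is not Punk's real key derivation.
interface CacheScope {
  tenant: string;
  app: string;     // value of X-Punk-App
  subject: string; // value of X-Punk-Subject (pseudonymous end user)
}

// The scope is part of the key, so an entry is only visible inside its own scope.
function cacheKey(scope: CacheScope, model: string, prompt: string): string {
  // Serializing an array keeps field boundaries unambiguous.
  return JSON.stringify([scope.tenant, scope.app, scope.subject, model, prompt]);
}

const cache = new Map<string, string>();
const alice: CacheScope = { tenant: "t1", app: "my-app", subject: "user-123" };
const bob: CacheScope = { tenant: "t1", app: "my-app", subject: "user-456" };

cache.set(cacheKey(alice, "gpt-4o", "What is my balance?"), "answer for user-123");

// Same prompt, different subject: a miss, so nothing leaks across users.
const bobHit = cache.get(cacheKey(bob, "gpt-4o", "What is my balance?"));
// Same prompt, same subject: a hit.
const aliceHit = cache.get(cacheKey(alice, "gpt-4o", "What is my balance?"));
```

If the subject header is omitted, every user of an app would fall into the same scope in a scheme like this, which is why sending it is worthwhile for per-user content.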

10–15 min — Read your first route explanation

Send a request twice, then look at it in the dashboard (http://localhost:4100 → Runs).

Every response carries two headers: x-punk-run-id and x-punk-route. The first request routes live — Punk observed it, fingerprinted it, and stored the trace. The identical second request routes exact_cache. Click the run and read its RouteExplanation: the route taken, the reason, the alternatives that were rejected and why, the policy verdict, and the estimated saved cost/latency.
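
The two-request check above could be scripted roughly as follows. This is a sketch under assumptions: the `/v1/chat/completions` path is inferred from the OpenAI-compatible base URL, the prompt text is made up, and `FetchLike`, `sendAndInspect`, and `sendTwice` are hypothetical helper names. Only the header names `x-punk-run-id` and `x-punk-route` come from this guide.

```typescript
// Minimal shapes so the sketch does not depend on DOM or Node typings.
interface ResponseLike {
  headers: { get(name: string): string | null };
}
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<ResponseLike>;

interface RouteInfo {
  runId: string | null;
  route: string | null;
}

// Send one chat completion through the gateway and return Punk's routing headers.
async function sendAndInspect(fetchImpl: FetchLike, baseUrl: string): Promise<RouteInfo> {
  const res = await fetchImpl(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json", "X-Punk-App": "my-app" },
    body: JSON.stringify({
      model: "gpt-4o",
      messages: [{ role: "user", content: "Summarize ticket 4821" }],
    }),
  });
  return {
    runId: res.headers.get("x-punk-run-id"),
    route: res.headers.get("x-punk-route"),
  };
}

// Send the identical request twice, in order, and collect both routes.
async function sendTwice(
  fetchImpl: FetchLike,
  baseUrl = "http://localhost:4100",
): Promise<RouteInfo[]> {
  const first = await sendAndInspect(fetchImpl, baseUrl);
  const second = await sendAndInspect(fetchImpl, baseUrl);
  return [first, second];
}
```

With the gateway running, calling `sendTwice(fetch)` and logging the result should show the first route as live and the second as exact_cache, matching the description above. The run ids it returns can be pasted into the dashboard's Runs page.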

This is the core mental model: Punk observes first, optimizes only what it can justify, and shows its work. Nothing is served from a cache or artifact without an explanation you can audit, and any optimized route that fails falls back to the live provider.

You can also fetch it raw:

curl -s http://localhost:4100/api/v1/runs/<runId> | jq .run.routeExplanation

15–25 min — Declare your tools, send feedback

The gateway sees model traffic; the SDK lets it see tools. Install @punk/sdk (bun add @punk/sdk) and wrap your tools with a declared side-effect level:

import { Punk } from "@punk/sdk";

// `crm` and `mailer` below stand in for your own service clients.
const punk = new Punk({ app: "my-app", agent: "my-bot", subject: "user-123" });

const lookupAccount = punk.traceTool({
  name: "crm.lookupAccount",
  sideEffectLevel: 1,  // read-only
  ttlSeconds: 300,     // participate in the tool-result cache
  execute: async (args: { accountId: string }) => crm.get(args.accountId),
});

const sendEmail = punk.traceTool({
  name: "email.send",
  sideEffectLevel: 3,  // user-visible write: never cached, suppressed in replay/shadow
  execute: async (args: { to: string; body: string }) => mailer.send(args),
});

The side-effect levels (PRD §17) determine what Punk is allowed to do with a tool:

| Level | Meaning | Punk's behavior |
|-------|---------|-----------------|
| 0 | Pure computation | cacheable, freely replayable |
| 1 | Read-only external | cacheable with TTL, replayable |
| 2 | Reversible/idempotent write | requires idempotency; gated |
| 3 | User-visible write (email/Slack/tickets) | never cached; suppressed in replay/shadow; policy-gated |
| 4 | High-impact (payments, deletion, permissions) | live + approval by default |

Undeclared tools default to level 3. Punk is conservative by design: it will not optimize what it cannot classify.
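
One way to read the level table is as a small decision function. The sketch below is an illustration only: `ToolDeclaration`, `ToolHandling`, and `classify` are hypothetical names, and the rule that a level 1 tool is cacheable only when it declares a TTL is an assumption drawn from the `ttlSeconds` comment in the example above, not a documented guarantee.

```typescript
type SideEffectLevel = 0 | 1 | 2 | 3 | 4;

interface ToolDeclaration {
  name: string;
  sideEffectLevel?: SideEffectLevel; // left out means "undeclared"
  ttlSeconds?: number;
}

interface ToolHandling {
  level: SideEffectLevel;
  cacheable: boolean;          // may results come from the tool-result cache?
  suppressedInReplay: boolean; // is real execution skipped in replay/shadow runs?
  needsApproval: boolean;      // does live execution need approval by default?
}

function classify(tool: ToolDeclaration): ToolHandling {
  // Undeclared tools are treated conservatively, as level 3.
  const level: SideEffectLevel = tool.sideEffectLevel ?? 3;
  return {
    level,
    // Level 0 is freely cacheable; level 1 only with a declared TTL (assumption).
    cacheable: level === 0 || (level === 1 && (tool.ttlSeconds ?? 0) > 0),
    // Writes visible to users or with high impact never run for real in replay/shadow.
    suppressedInReplay: level >= 3,
    needsApproval: level === 4,
  };
}

const lookup = classify({ name: "crm.lookupAccount", sideEffectLevel: 1, ttlSeconds: 300 });
const email = classify({ name: "email.send", sideEffectLevel: 3 });
const unknownTool = classify({ name: "mystery.tool" });
```

Here `lookup.cacheable` is true, `email.cacheable` is false, and `unknownTool.level` is 3, which mirrors the conservative default described above.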

Then close the loop with feedback — it feeds pattern stability and artifact confidence:

const r = await punk.chat({ model: "gpt-4o", messages: [...] });
await punk.feedback(r.runId, 1);                        // or
await punk.feedback(r.runId, -1, "correct answer here"); // corrections matter most

25–30 min — Watch the learning report, promote your first artifact

The learning loop runs on a timer; force a pass to see it now:

const report = await punk.learningTick();
console.log(report); // patterns discovered, artifacts synthesized, promotion eligibility

Send the same-shaped request (with varying values) five or more times so a pattern forms, then check the dashboard's Patterns and Artifacts pages, or call punk.patterns() / punk.artifacts(). When the synthesizer induces a candidate artifact from your traces, it must earn promotion:

  1. Replay — the artifact re-runs against historical traces and must reproduce the recorded outputs.
  2. Shadow — it runs silently beside live traffic (side effects suppressed) and must agree with the live model.

Only then does promotion open up:

await punk.promoteArtifact(artifactId);

(Or click Promote in the dashboard. PUNK_AUTO_PROMOTE=true allows hands-free promotion, but only for side-effect-free artifacts that pass every gate.) After promotion, matching requests route artifact: deterministic output, ~zero cost, ~2ms, still policy-checked, still explained, still monitored for drift — degradation quarantines the artifact and traffic returns to live.
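
The replay and shadow gates can be pictured with a toy model. The sketch below is not Punk's implementation: `Trace`, `Artifact`, `replayPassRate`, `shadowCompare`, and `eligibleForPromotion` are hypothetical names, the artifact is reduced to a plain function, and the thresholds (perfect replay, every shadow run in agreement) are assumptions chosen for illustration.

```typescript
interface Trace {
  input: string;
  recordedOutput: string; // what the live model returned at the time
}

// In this toy model an artifact is a deterministic function from input to output.
type Artifact = (input: string) => string;

// Replay: re-run the artifact on historical traces and measure agreement.
function replayPassRate(artifact: Artifact, traces: Trace[]): number {
  if (traces.length === 0) return 0; // no evidence counts as no pass
  const passes = traces.filter((t) => artifact(t.input) === t.recordedOutput).length;
  return passes / traces.length;
}

// Shadow: the live answer is always what gets served; the artifact is only compared.
function shadowCompare(
  artifact: Artifact,
  input: string,
  liveOutput: string,
): { served: string; agreed: boolean } {
  return { served: liveOutput, agreed: artifact(input) === liveOutput };
}

// Assumed gate: perfect replay plus at least one shadow run, all in agreement.
function eligibleForPromotion(replayRate: number, shadowAgreements: boolean[]): boolean {
  return replayRate === 1 && shadowAgreements.length > 0 && shadowAgreements.every(Boolean);
}

// A toy triage artifact that routes tickets by keyword.
const triage: Artifact = (input) =>
  input.toLowerCase().includes("refund") ? "billing" : "general";

const history: Trace[] = [
  { input: "Refund for order 17", recordedOutput: "billing" },
  { input: "How do I reset my password?", recordedOutput: "general" },
];

const rate = replayPassRate(triage, history);
const shadow = shadowCompare(triage, "Please refund order 99", "billing");
const eligible = eligibleForPromotion(rate, [shadow.agreed]);
```

The point of the ordering is that replay costs nothing against live users, and shadow adds evidence from current traffic while users still receive the live answer.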

That's the loop: observe → cache → learn → prove → promote → compound.


Environment variables

| Variable | Effect |
|----------|--------|
| OPENAI_API_KEY / OPENAI_BASE_URL | live OpenAI-compatible pass-through provider (gpt-* and default models) |
| ANTHROPIC_API_KEY / ANTHROPIC_BASE_URL | live Anthropic Messages backend (claude-* models route here; gateway also serves the Anthropic-compatible /v1/messages endpoint) |
| PUNK_PROVIDER=mock | force the deterministic offline simulator for all models |
| PUNK_DATABASE_URL | Postgres (Neon-compatible) storage; default is single-file SQLite |
| PUNK_DB_PATH | override the SQLite path (data/punk.db); delete the file for a fresh start |
| PUNK_API_KEY | require bearer auth on the gateway and API |
| PUNK_AUTO_PROMOTE=true | hands-free promotion of side-effect-free artifacts that pass every gate |
| PUNK_RETENTION_DAYS | trace retention sweep window (default 90) |
| PLASMATE_URL=ws://127.0.0.1:9222 | use Punk's Plasmate-compatible AWP adapter for SOM compilation (built-in TS compiler otherwise) |
| MESHGUARD_URL | delegate policy evaluation to a MeshGuard-compatible control plane (embedded engine otherwise) |

Production notes

  • Auth: set PUNK_API_KEY so the gateway and API require Authorization: Bearer. Pass the same key to SDK clients (new Punk({ apiKey })) or your OpenAI client's apiKey.
  • Storage: set PUNK_DATABASE_URL to a Postgres/Neon URL. The schema is identical to the SQLite adapter; the trace ledger is append-only and everything derived (patterns, artifacts, route stats) is rebuildable from it.
  • Scale-out: shadow evaluation, learning ticks, and retention sweeps are durable jobs. The gateway embeds one worker; run bun run worker N times against the same Postgres to scale out — job claims are atomic, retries back off exponentially, and exhausted jobs dead-letter for inspection (GET /api/v1/jobs).
  • Policies: policies/*.yaml uses the MeshGuard AgentPolicy schema. Policy denials return blocked routes with the verdict in the route explanation and an audit event.
  • Canary rollout: enable the tenant setting canary_enabled (PUT /api/v1/settings {"key":"canary_enabled","value":true}) and promotions stop jumping straight to full traffic. A promoted artifact enters state canary serving 1% of its pattern's requests (admission is a deterministic hash of the run id, so a given request never flaps); the other 99% are served live and shadow-evaluate the canary, so evidence accrues from 100% of traffic. The learning loop graduates one rung per tick — 1% → 10% → 25% → stable (full traffic) — once a rung accumulates ≥25 clean passes with zero live failures; any live failure holds the rung, and a >10% window fail rate or collapsed confidence demotes the artifact to degraded (traffic returns to live). Every rung is audited and fires the artifact.promoted webhook with the rung in the payload. The setting defaults to false (promote = full traffic immediately, which keeps the local demo fast) — production tenants should turn it on.
  • Evidence APIs: GET /api/v1/runs/:id/replay-bundle exports a self-contained JSON document (run, reconstructed replay bundle, full trace, artifact ids evaluated against it) for offline audit; POST /api/v1/artifacts/:id/replay (admin) re-proves an artifact against its provenance traces on demand and returns the pass rate — also a button on the artifact detail page. GET /api/v1/runs/:id/integrity verifies the run's tamper-evident trace hash chain: every trace event is sha256-chained to its predecessor at write time, so out-of-band edits to the ledger are detectable.
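
The canary ladder in the notes above can be sketched as two small pieces: deterministic admission by run id, and a per-tick decision for the current rung. This is an illustration under assumptions. FNV-1a stands in for whatever hash Punk actually uses, the names `admitted`, `decide`, and `nextRungIndex` are hypothetical, and the "collapsed confidence" demotion trigger is left out because the guide does not define it numerically.

```typescript
// Rungs from the text: 1% -> 10% -> 25% -> stable (100%).
const RUNG_PERCENTS = [1, 10, 25, 100];
const MIN_CLEAN_PASSES = 25;
const MAX_WINDOW_FAIL_RATE = 0.1;

// FNV-1a over UTF-16 code units, 32-bit. An assumed stand-in hash.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash >>> 0;
}

// Deterministic admission: a given run id always lands in the same bucket,
// so a request never flaps, and raising the percent only ever adds requests.
function admitted(runId: string, percent: number): boolean {
  return fnv1a(runId) % 100 < percent;
}

interface RungStats {
  passes: number;         // clean shadow/live passes accumulated on this rung
  liveFailures: number;   // failures while serving live canary traffic
  windowFailRate: number; // fail rate over the recent window, 0..1
}

type Decision = "graduate" | "hold" | "demote";

function decide(stats: RungStats): Decision {
  if (stats.windowFailRate > MAX_WINDOW_FAIL_RATE) return "demote";
  if (stats.liveFailures > 0) return "hold";
  return stats.passes >= MIN_CLEAN_PASSES ? "graduate" : "hold";
}

// One rung per tick; the last rung (stable) is terminal.
function nextRungIndex(index: number): number {
  return Math.min(index + 1, RUNG_PERCENTS.length - 1);
}
```

Because admission compares a fixed bucket against the rung's percentage, any run admitted at 1% stays admitted at 10% and 25%, so graduating a rung widens the canary population without reshuffling it.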

FAQ

Is my data isolated per tenant? Yes. Every entity — runs, traces, caches, patterns, artifacts — carries a tenant id, and cache keys include tenant/app/subject as safety dimensions. There is no cross-tenant learning from raw traces. Artifacts are induced only from your own traffic.

What never gets optimized? Side effects. Tools at level ≥ 2 are never served from cache; level 3–4 actions are suppressed in replay and shadow runs and gated by policy in live ones. Undeclared tools are treated as level 3. Patterns the learner can't verify as stable get negative-cached so Punk stops trying, rather than guessing. And when in doubt, the router fails open to the live provider — the worst case is the cost you were already paying.

How do I roll back? Per artifact: the rollback/quarantine actions in the dashboard or POST /api/v1/artifacts/:id/rollback — traffic returns to live immediately, and the evidence trail stays. Drift detection does this automatically when an artifact starts disagreeing with reality. Whole system: it's a gateway; point your baseURL back at the provider and you're out.
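
For scripted rollbacks, a call to the endpoint above might look like the sketch below. The path and the bearer scheme come from this guide; the helper name `rollbackArtifact`, the `PostLike` type, and the decision to return only the HTTP status are assumptions, since the response body is not documented here.

```typescript
// Minimal fetch shape so the sketch does not depend on DOM or Node typings.
type PostLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number }>;

interface RollbackOptions {
  baseUrl?: string; // defaults to the local gateway
  apiKey?: string;  // needed only when PUNK_API_KEY is set on the gateway
}

// POST /api/v1/artifacts/:id/rollback and return the HTTP status on success.
async function rollbackArtifact(
  fetchImpl: PostLike,
  artifactId: string,
  opts: RollbackOptions = {},
): Promise<number> {
  const baseUrl = opts.baseUrl ?? "http://localhost:4100";
  const headers: Record<string, string> = {};
  if (opts.apiKey) headers["Authorization"] = `Bearer ${opts.apiKey}`;

  const url = `${baseUrl}/api/v1/artifacts/${encodeURIComponent(artifactId)}/rollback`;
  const res = await fetchImpl(url, { method: "POST", headers });
  if (!res.ok) throw new Error(`rollback failed with HTTP ${res.status}`);
  return res.status;
}
```

Passing the runtime's global `fetch` as `fetchImpl` is enough for real use; injecting it keeps the helper easy to exercise against a stub.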

Do I have to use the SDK? No. The base-URL swap gets you observation, caching, policy, learning, and artifacts for model traffic. The SDK adds tool tracing, tool-result caching, side-effect declaration, feedback, and SOM web fetch. Start with the URL swap; add the SDK where the extra trace fidelity pays for itself.
