Extended pilot and rollout guide from workflow selection through production operation.

Onboarding Guide

This is the extended guide for taking Punk from evaluation to first production traffic. If you only need a fast local walkthrough, start with Punk in 30 Minutes. Use this guide when you are planning a real pilot, a customer proof of value, or a team rollout.

Punk works best when the work is operational: repeated, evidence-bearing, policy-sensitive, or expensive enough that routing decisions matter. It is not meant as a generic prompt benchmark for one-off creative drafts or trivial questions.

Hosted reference: cheaperfastersafer.com. Local default: http://localhost:4100.

Who This Is For

| Reader | What this guide helps you finish |
| --- | --- |
| Evaluator | Pick the right workflow, run a credible local proof, and know what evidence to inspect. |
| App developer | Route an existing OpenAI- or Anthropic-compatible app through Punk without rewriting the app. |
| Workflow builder | Convert repeated agent work into a workflow with inputs, gates, receipts, and replayable outputs. |
| Operator | Set up auth, storage, workers, provider keys, health checks, retention, and billing posture. |
| Security/GRC reviewer | Understand identity, side effects, approval gates, redaction, audit, and private-network controls. |
| Team admin | Invite users, issue scoped keys, and make the dashboard useful for a pilot team. |

Outcomes

By the end of onboarding, you should have:

  • A local Punk gateway and dashboard.
  • One real workflow candidate selected for evaluation.
  • At least one observed agent/app routed through Punk.
  • Traceable runs with app, agent, and subject identity.
  • A governance posture for read-only, reversible, user-visible, and high-impact actions.
  • An optimization-evidence view that shows whether Punk found a stable pattern.
  • A decision on whether the pilot remains observe-only or starts optimized routing.
  • A production readiness checklist for storage, auth, workers, provider keys, and operations.

Phase 1: Choose The Right Work

Start with a workflow, not a random prompt. Punk proves value when a repeated job has structure that can be observed, governed, measured, and approved for optimization.

Good first candidates:

| Candidate | Why it is a good fit |
| --- | --- |
| Support triage | Repeated classification, structured output, low-risk reads, clear evaluation criteria. |
| Vendor review | Web evidence, invoice or profile data, thresholds, approval gates, reusable scorecards. |
| Pricing monitor | Web reads, structured extraction, snapshots, diffs, repeatable schedule. |
| Lead enrichment | Web and CRM reads, field normalization, policy-controlled writes. |
| Compliance precheck | Evidence collection, deterministic gates, receipts, human approval before action. |
| Internal research brief | Repeatable source policy, citation/evidence burden, reusable templates. |

Poor first candidates:

| Candidate | Why to avoid it for the first proof |
| --- | --- |
| One-off creative writing | Subjective quality dominates; repeatability and route proof are weak. |
| Tiny factual questions | The baseline is already cheap and fast; savings will be uninteresting. |
| Unbounded brainstorming | Hard to define correctness, side effects, or replay evidence. |
| Fully manual workflows | Punk needs agent/app traffic to observe and improve. |
| High-impact writes on day one | Start in observe mode until policy and approval paths are clear. |

Use the public Workflow Diagnostic before a pilot call or scoping session. It compares a standard serial agent loop with Punk workflow mode across repeatability, evidence burden, side-effect risk, governance gates, receipts, review value, cost, and latency. Treat it as a workflow-fit diagnostic, not a generic leaderboard.

Phase 2: Define The Pilot Contract

Before running traffic, write down the pilot contract in plain language.

| Question | Example answer |
| --- | --- |
| What job are we evaluating? | "Review a new vendor and invoice, check the vendor site, flag spend over $5,000, and prepare a scorecard." |
| What input shape repeats? | Vendor URL, invoice PDF or extracted invoice fields, requester, department, spend amount. |
| What output shape matters? | JSON scorecard plus human-readable rationale and approval recommendation. |
| What evidence is required? | Vendor website snapshot, invoice fields, policy threshold, risk flags. |
| What actions are risky? | Emailing finance, creating a ticket, approving spend, storing vendor records. |
| What can be cached or promoted? | Stable extraction, policy threshold logic, scorecard shell, known vendor profile. |
| Who approves promotion? | Pilot operator or workflow owner. |
| What success metric matters? | Lower cost after proof, fewer hidden side effects, receipts for every action, less manual review. |

Use consistent identifiers from the first run:

| Identifier | Recommendation |
| --- | --- |
| Tenant | One tenant per company, team, or pilot customer. |
| App | Product surface or integration name, for example finance-review-app. |
| Agent | Stable actor name, for example vendor-review-agent. |
| Subject | The end user, account, customer, vendor, ticket, or workflow instance being acted on. |

Punk uses these identifiers for trust, audit, policy, cost, pattern discovery, and routing. Missing identity makes the pilot harder to interpret.
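
As a minimal sketch, the app, agent, and subject identifiers can be turned into the x-punk-* request headers that the gateway examples in this guide send. The PunkIdentity type and identityHeaders helper are hypothetical names for illustration, not part of the Punk SDK, and the sketch assumes the tenant is implied by the API key rather than sent as a header.

```typescript
// Hypothetical helper for illustration; not part of the Punk SDK.
interface PunkIdentity {
  app: string;     // product surface, e.g. "finance-review-app"
  agent: string;   // stable actor name, e.g. "vendor-review-agent"
  subject: string; // entity being acted on, e.g. "vendor:acme-123"
}

// Build the identity headers and reject blank values, so a request
// never leaves the app without app, agent, and subject identity.
function identityHeaders(id: PunkIdentity): Record<string, string> {
  const entries: Array<[string, string]> = [
    ["x-punk-app", id.app],
    ["x-punk-agent", id.agent],
    ["x-punk-subject", id.subject],
  ];
  const headers: Record<string, string> = {};
  for (const [headerName, value] of entries) {
    const trimmed = value.trim();
    if (trimmed === "") {
      throw new Error(`missing identity value for ${headerName}`);
    }
    headers[headerName] = trimmed;
  }
  return headers;
}
```

Failing fast on a blank identifier is a design choice for the pilot: a rejected request is easier to notice than a run that shows up in the dashboard with no owner.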

Phase 3: Run Punk Locally

Install dependencies and start the gateway:

bun install
bun run dev

Open http://localhost:4100.

Default local behavior:

| Area | Default |
| --- | --- |
| Port | 4100 |
| Database | data/punk.db |
| Provider | offline mock when no matching live provider key is configured |
| Auth | open dev mode when PUNK_API_KEY is unset |
| Worker | embedded in the API process |
| Learning | background tick enabled |
| Dashboard | served at / |
| Docs | served at /docs |

If the dashboard is blank, use the Getting started panel to seed demo data. For repeatable optimization traffic, keep the gateway running and use another terminal:

bun run demo

Inspect the dashboard after the first demo run:

| Dashboard area | What to verify |
| --- | --- |
| Overview | Recent activity, route mix, spend, savings, health. |
| Runs | Every model request has route, cost, latency, trace, and explanation. |
| Patterns | Repeated request shapes are grouped. |
| Artifacts | Candidate optimized routes show evidence, promotion, and rollback state. |
| Learning | Evidence notes explain what is eligible, blocked, or waiting for more samples. |
| Web | Compact page snapshots show structured page state and token savings. |
| Governance | Policies, users, keys, credentials, MCP servers, audit, approvals. |
| Workflows | Templates, graph editor, run panel, node timelines. |

Local success criteria:

  • You can load /, /docs, and /health.
  • A chat, workflow, or demo run appears in Runs.
  • A repeated request can be recognized as a pattern.
  • The run detail explains route choice and cost.
  • The learning page explains whether a candidate is eligible or why it needs more evidence.
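
The first criterion can be scripted. The sketch below requests /, /docs, and /health and reports which ones load. The smokeCheck function and its result shape are hypothetical; the fetch implementation is passed in so the check can be exercised without a running gateway, and only the HTTP status is inspected because this guide does not describe the /health response body.

```typescript
// Minimal response shape needed here; the global fetch satisfies it.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

interface EndpointResult {
  path: string;
  ok: boolean;
  status: number; // 0 means the request itself failed (e.g. gateway not running)
}

// Request each local endpoint in turn and record whether it loaded.
async function smokeCheck(
  baseUrl: string,
  fetchImpl: FetchLike,
): Promise<EndpointResult[]> {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const results: EndpointResult[] = [];
  for (const path of ["/", "/docs", "/health"]) {
    try {
      const res = await fetchImpl(root + path);
      results.push({ path, ok: res.ok, status: res.status });
    } catch {
      results.push({ path, ok: false, status: 0 });
    }
  }
  return results;
}
```

Against a local gateway this would be called as smokeCheck("http://localhost:4100", fetch); any entry with ok set to false points at the endpoint to investigate first.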

Phase 4: Connect One Real App In Observe Mode

The simplest integration is a base URL swap. Keep your existing OpenAI-compatible request shape, point it at Punk, and send identity headers.

curl http://localhost:4100/v1/chat/completions \
  -H 'content-type: application/json' \
  -H 'x-punk-app: finance-review-app' \
  -H 'x-punk-agent: vendor-review-agent' \
  -H 'x-punk-subject: vendor:acme-123' \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Review this vendor profile and return risk, rationale, and next action."
      }
    ]
  }'

For Anthropic-compatible apps, use the native Messages endpoint:

curl http://localhost:4100/v1/messages \
  -H 'content-type: application/json' \
  -H 'x-punk-app: finance-review-app' \
  -H 'x-punk-agent: vendor-review-agent' \
  -H 'x-punk-subject: vendor:acme-123' \
  -d '{
    "model": "claude-haiku-4-5",
    "max_tokens": 512,
    "messages": [
      {
        "role": "user",
        "content": "Review this vendor profile and return risk, rationale, and next action."
      }
    ]
  }'

If you set PUNK_API_KEY, include bearer auth:

-H 'authorization: Bearer <token>'
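
For app code, the same request can be prepared in TypeScript. The sketch below only mirrors the OpenAI-compatible curl example above; prepareChatRequest and its option names are hypothetical and are not part of the Punk SDK.

```typescript
interface ChatRequestOptions {
  baseUrl: string; // e.g. "http://localhost:4100"
  app: string;
  agent: string;
  subject: string;
  model: string;
  prompt: string;
  apiKey?: string; // only needed when PUNK_API_KEY is set on the gateway
}

interface PreparedChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the URL, headers, and JSON body for /v1/chat/completions.
function prepareChatRequest(opts: ChatRequestOptions): PreparedChatRequest {
  const headers: Record<string, string> = {
    "content-type": "application/json",
    "x-punk-app": opts.app,
    "x-punk-agent": opts.agent,
    "x-punk-subject": opts.subject,
  };
  if (opts.apiKey) {
    headers["authorization"] = `Bearer ${opts.apiKey}`;
  }
  return {
    url: `${opts.baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({
        model: opts.model,
        messages: [{ role: "user", content: opts.prompt }],
      }),
    },
  };
}
```

Passing the result to fetch(req.url, req.init) sends the request. Keeping the builder free of network calls makes it easy to verify in a unit test that identity headers are always present.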

Start in observe mode for consequential work. Observe mode records what policy would do without blocking live work. Use optimize mode only after the trace, governance, and promotion paths are understood.

What to inspect after the first real app run:

| Evidence | Where |
| --- | --- |
| Model provider and key source | Run detail trace and route explanation. |
| Prompt shape and token use | Run detail. |
| App, agent, subject | Run detail and governance records. |
| Policy verdict | Run detail and Governance audit. |
| Cost and latency | Runs table and Overview. |
| Repeated pattern | Patterns and Learning after several similar requests. |

Phase 5: Add Provider Keys Deliberately

With no live provider key, Punk uses the deterministic mock provider for local work. For live calls, configure platform keys or tenant BYOK.

Platform env keys:

| Provider | Env vars |
| --- | --- |
| OpenAI | OPENAI_API_KEY, optional OPENAI_BASE_URL |
| Anthropic | ANTHROPIC_API_KEY, optional ANTHROPIC_BASE_URL |
| OpenRouter | OPENROUTER_API_KEY, optional OPENROUTER_BASE_URL |
| DeepSeek | DEEPSEEK_API_KEY, optional DEEPSEEK_BASE_URL |
| Kimi/Moonshot | MOONSHOT_API_KEY or KIMI_API_KEY, optional base URL |
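
Before a live run it can help to check which platform keys the environment actually provides. The sketch below is a standalone check based only on the table above; configuredProviders and the lowercase provider labels are hypothetical, and Punk's own provider resolution may differ.

```typescript
type ProviderEnv = Record<string, string | undefined>;

// Provider label -> env vars that can supply its key, per the table above.
const PROVIDER_KEY_VARS: Record<string, string[]> = {
  openai: ["OPENAI_API_KEY"],
  anthropic: ["ANTHROPIC_API_KEY"],
  openrouter: ["OPENROUTER_API_KEY"],
  deepseek: ["DEEPSEEK_API_KEY"],
  moonshot: ["MOONSHOT_API_KEY", "KIMI_API_KEY"], // either variable counts
};

// Return the providers that have at least one non-blank key variable.
function configuredProviders(env: ProviderEnv): string[] {
  return Object.keys(PROVIDER_KEY_VARS).filter((provider) =>
    PROVIDER_KEY_VARS[provider].some(
      (varName) => (env[varName] ?? "").trim() !== "",
    ),
  );
}
```

In a Bun or Node process this would be called as configuredProviders(process.env). An empty result means no platform key is set, which is the case where local runs fall back to the mock provider.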

Tenant BYOK stores a tenant-owned provider key in the encrypted credentials vault. Set PUNK_ENCRYPTION_KEY before relying on stored credentials outside local dev.

Example tenant key:

curl -X POST http://localhost:4100/api/v1/credentials \
  -H 'content-type: application/json' \
  -H 'authorization: Bearer <token>' \
  -d '{
    "name": "openai",
    "provider": "openai",
    "secret": { "value": "sk-..." }
  }'

Read Configuration before mixing platform keys and tenant keys in production.

Phase 6: Classify Tools And Side Effects

Punk can govern direct model calls, SDK tool traces, workflow tool nodes, web sessions, and webhook effects. The important first step is classifying side effects.

| Level | Meaning | Pilot posture |
| --- | --- | --- |
| 0 | Pure computation | Safe to observe and optimize early. |
| 1 | Read-only external | Good first pilot scope. |
| 2 | Reversible or idempotent write | Require identity, idempotency, and audit. |
| 3 | User-visible write | Start observe-only; add approval rules before optimize. |
| 4 | High-impact write | Require explicit policy, approval, and rollback plan. |

Undeclared SDK tools default to side-effect level 3. That is intentional: unknown tools are treated like user-visible writes.
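
The classification and its default can be written down as data before any SDK work starts. This is a sketch of the idea only: the tool names and the levelFor helper are hypothetical, and the real declaration mechanism is described in the SDK docs.

```typescript
// Side-effect levels from the table above.
type SideEffectLevel = 0 | 1 | 2 | 3 | 4;

const LEVEL_MEANING: Record<SideEffectLevel, string> = {
  0: "pure computation",
  1: "read-only external",
  2: "reversible or idempotent write",
  3: "user-visible write",
  4: "high-impact write",
};

// Undeclared tools are treated like user-visible writes.
const UNDECLARED_LEVEL: SideEffectLevel = 3;

// Hypothetical tool inventory for a vendor-review pilot.
const declaredTools = new Map<string, SideEffectLevel>([
  ["fetch-vendor-site", 1],
  ["create-ticket", 3],
  ["approve-spend", 4],
]);

// Look up a tool's level, falling back to the level-3 default.
function levelFor(toolName: string): SideEffectLevel {
  return declaredTools.get(toolName) ?? UNDECLARED_LEVEL;
}
```

Writing the inventory down first makes the gaps visible: any tool that resolves to the default is one the team has not yet classified.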

For app code, use the TypeScript SDK when you need tool tracing, feedback, web fetch, or web sessions. Keep the first integration small:

  1. Route model calls through Punk.
  2. Add identity headers.
  3. Add tool tracing around one or two important tools.
  4. Mark tool side-effect levels.
  5. Verify trace and governance events.
  6. Expand only after the first path is observable.

Read SDK, API, and Governance for the exact client and HTTP surfaces.

Phase 7: Build The Workflow Version

Once the repeated job is visible, decide whether it should remain a chat/agent flow or become a workflow.

| Surface | Use when |
| --- | --- |
| Gateway | You need a low-friction base URL swap for an existing agent. |
| Chat | A human is actively testing prompts and route behavior. |
| Agent | One scheduled or on-demand task can be represented as start -> llm -> output. |
| Workflow | The job has multiple steps, branches, tools, web reads, gates, or structured outputs. |
| Chorus | The job needs governed multi-model answers with evidence receipts. |

Workflow design checklist:

  • Inputs are explicit JSON, not hidden in prose.
  • Every web or external read has a named step.
  • Risky actions are separate from reasoning steps.
  • Side effects have declared levels.
  • Gates are stated as policy or workflow conditions.
  • Outputs have a stable schema.
  • Receipts and evidence are preserved.
  • The workflow can be reviewed without firing real side effects.
  • The owner can explain what would be promoted and what must remain live.
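
Two checklist items, explicit JSON inputs and gates stated as conditions, can be sketched for the vendor-review example from Phase 2. The type names, field names, and recommendation values below are hypothetical; only the $5,000 threshold comes from the example pilot contract.

```typescript
// Hypothetical input and output shapes; field names are illustrative.
interface VendorReviewInput {
  vendorUrl: string;
  requester: string;
  department: string;
  spendAmountUsd: number;
}

interface VendorScorecard {
  vendorUrl: string;
  overSpendThreshold: boolean;
  recommendation: "proceed" | "needs-approval";
}

const SPEND_THRESHOLD_USD = 5000; // from the pilot contract example

// Validate explicit JSON input instead of parsing values out of prose.
function parseVendorReviewInput(raw: unknown): VendorReviewInput {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("input must be a JSON object");
  }
  const fields = raw as Record<string, unknown>;
  const text = (key: string): string => {
    const value = fields[key];
    if (typeof value !== "string" || value === "") {
      throw new Error(`${key} must be a non-empty string`);
    }
    return value;
  };
  const spend = fields["spendAmountUsd"];
  if (typeof spend !== "number" || !Number.isFinite(spend) || spend < 0) {
    throw new Error("spendAmountUsd must be a non-negative number");
  }
  return {
    vendorUrl: text("vendorUrl"),
    requester: text("requester"),
    department: text("department"),
    spendAmountUsd: spend,
  };
}

// Deterministic gate, kept separate from any model reasoning step.
function spendGate(input: VendorReviewInput): VendorScorecard {
  const over = input.spendAmountUsd > SPEND_THRESHOLD_USD;
  return {
    vendorUrl: input.vendorUrl,
    overSpendThreshold: over,
    recommendation: over ? "needs-approval" : "proceed",
  };
}
```

Because the gate is a plain function over validated input, it can be reviewed and replayed without calling a model or firing a side effect.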

Start from the dashboard templates:

| Template | First use |
| --- | --- |
| support-triage | Ticket classification and conditional notification. |
| web-research | Web fetch plus model summary. |
| pricing-monitor | Scheduled web reads and structured extraction. |

Run the workflow several times with similar inputs. Then inspect Runs, Patterns, Learning, and Artifacts to see whether Punk found a stable route.

Phase 8: Governance And Security Review

Do this before optimize mode or production exposure.

Access and identity:

  • Set PUNK_API_KEY for protected API and gateway routes.
  • Bootstrap dashboard users with PUNK_ADMIN_EMAIL, PUNK_ADMIN_PASSWORD, and optionally PUNK_REQUIRE_LOGIN=true.
  • Use tenant API keys for apps rather than sharing the bootstrap admin token.
  • Pin keys to app ids when possible.
  • Send X-Punk-App, X-Punk-Agent, and X-Punk-Subject.

Secrets and credentials:

  • Set PUNK_ENCRYPTION_KEY before storing provider keys, workflow credentials, or MCP credentials.
  • Store provider BYOK keys under Governance -> Provider keys or /api/v1/credentials.
  • Do not put secrets in prompts, workflow inputs, or trace-visible metadata.

Network controls:

  • Leave PUNK_ALLOW_PRIVATE_WEB_FETCH=false in authenticated deployments unless private fetches are intended.
  • Leave PUNK_ALLOW_PRIVATE_WEBHOOKS=false unless private webhook targets are intended.
  • Review web session and webhook destinations before enabling writes.

Policy and approvals:

  • Keep policies in PUNK_POLICIES_DIR.
  • Declare allow, deny, and approval-required rules for the pilot app and agent.
  • Require approval for side-effect levels 3 and 4 unless the workflow owner explicitly accepts the risk.
  • Use observe mode first to see what would have been blocked.
  • Review audit events and pending approvals in the dashboard.
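
The approval rule and the observe-first posture can be reasoned about with a small model. This sketch is not Punk's policy engine or policy file format; evaluateAction and its result fields are hypothetical and only encode the two rules stated above: levels 3 and 4 need approval unless the owner accepts the risk, and observe mode records the verdict without blocking.

```typescript
type ActionLevel = 0 | 1 | 2 | 3 | 4;

interface PolicyOutcome {
  verdict: "allow" | "approval-required"; // what policy decides
  blocksLiveWork: boolean;                // whether the action is actually held
}

// Decide the verdict for an action, then apply it only when not observing.
function evaluateAction(
  level: ActionLevel,
  observeOnly: boolean,
  ownerAcceptedRisk: boolean = false,
): PolicyOutcome {
  const needsApproval = level >= 3 && !ownerAcceptedRisk;
  return {
    verdict: needsApproval ? "approval-required" : "allow",
    // In observe mode the verdict is recorded but live work continues.
    blocksLiveWork: needsApproval && !observeOnly,
  };
}
```

Running pilot traffic through a model like this shows, per action, what would have been held for approval before any enforcement is switched on.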

Data controls:

  • Decide on a retention_days value.
  • Enable the tenant setting redaction=true when tool payloads may include sensitive fields.
  • Consider enabling streaming_dlp=true if secrets or regulated identifiers could leave the gateway.
  • Use tripwires for sensitive decoy values that should never appear in prompts or outputs.
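
The tripwire idea is simple to illustrate: plant decoy values that have no legitimate use, then treat any appearance as a leak. The sketch below shows the concept only, using a hypothetical tripwireHits helper with exact substring matching; how Punk matches tripwires is not described in this guide.

```typescript
// Return the decoy values that appear in the text.
// Matching is exact and case-sensitive; blank decoys are ignored.
function tripwireHits(text: string, decoys: string[]): string[] {
  return decoys.filter((decoy) => decoy !== "" && text.includes(decoy));
}
```

For example, tripwireHits(promptText, ["DECOY-ACCT-0001"]) returning a non-empty list would mean a planted value reached a prompt and the run deserves review.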

Security review is done when:

  • A non-admin app key exists.
  • The app key is scoped or pinned where possible.
  • Provider and tool credentials are encrypted.
  • Risky actions are gated.
  • Private-network escape hatches are intentionally set.
  • Retention and redaction are chosen.
  • The operator can find audit records for a run, approval, and policy decision.

Phase 9: Evidence And Promotion

Punk does not promote a cheaper route just because it was cheaper once. The promotion loop is evidence-driven:

  1. Observe repeated request shapes.
  2. Group stable traffic into patterns.
  3. Prepare candidate optimized routes only when the task is stable enough.
  4. Check candidates against relevant history.
  5. Compare candidates against live traffic without firing side effects.
  6. Require policy or human approval when configured.
  7. Route matching future traffic through the cheapest safe proven path.
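
The loop above amounts to a set of gates that must all pass before a candidate is promoted. The sketch below models that idea with hypothetical field names and an illustrative sample threshold; it is not Punk's learning implementation, and the actual eligibility rules are the ones reported in the Learning view.

```typescript
// Hypothetical evidence record for one candidate route.
interface CandidateEvidence {
  sampleCount: number;           // matching requests observed so far
  historyCheckPassed: boolean;   // candidate agreed with relevant history
  liveComparisonPassed: boolean; // candidate agreed with live traffic
  sideEffectsFired: boolean;     // true if comparison triggered real effects
  approvalRequired: boolean;
  approved: boolean;
}

const MIN_SAMPLES = 20; // illustrative threshold, not a Punk default

// List every reason the candidate cannot be promoted yet.
function promotionBlockers(evidence: CandidateEvidence): string[] {
  const blockers: string[] = [];
  if (evidence.sampleCount < MIN_SAMPLES) {
    blockers.push("waiting for more samples");
  }
  if (!evidence.historyCheckPassed) {
    blockers.push("history check not passed");
  }
  if (!evidence.liveComparisonPassed) {
    blockers.push("live comparison not passed");
  }
  if (evidence.sideEffectsFired) {
    blockers.push("comparison fired side effects");
  }
  if (evidence.approvalRequired && !evidence.approved) {
    blockers.push("approval pending");
  }
  return blockers;
}
```

An empty list means the candidate is eligible. Returning reasons rather than a boolean mirrors how evidence notes explain why a candidate is blocked or waiting.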

Force a learning pass during a pilot:

curl -X POST http://localhost:4100/api/v1/learning/tick \
  -H 'authorization: Bearer <token>'

Evidence to look for:

| Evidence | Why it matters |
| --- | --- |
| Pattern confidence | Shows whether Punk sees stable repeated work. |
| Evidence notes | Explain why a candidate is eligible, blocked, or waiting for more samples. |
| History check | Shows how the candidate performed against relevant prior work. |
| Live comparison | Compares candidate behavior against live traffic without firing effects. |
| Artifact receipt | Records what was promoted, by whom, and with what evidence. |
| Route explanation | Shows why a future request used live, cache, semantic cache, or artifact. |

Promotion readiness checklist:

  • The pattern represents real repeated work, not test noise.
  • Outputs have an objective or reviewable contract.
  • History checks passed against enough relevant work.
  • Live comparison did not create hidden side effects.
  • Governance allows the promoted route.
  • A human owner understands rollback.
  • Canary mode is enabled if the first production rollout should be gradual.

Enable canaries:

curl -X PUT http://localhost:4100/api/v1/settings \
  -H 'content-type: application/json' \
  -H 'authorization: Bearer <token>' \
  -d '{ "key": "canary_enabled", "value": true }'
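
The learning tick and canary calls in this phase share one shape: a bearer-authenticated request with an optional JSON body. A small sketch of that shape follows; prepareApiCall, learningTickCall, and enableCanaryCall are hypothetical helpers that mirror the two curl commands above rather than the Punk SDK.

```typescript
interface PreparedApiCall {
  url: string;
  method: "POST" | "PUT";
  headers: Record<string, string>;
  body?: string;
}

// Build an authenticated admin API call; a JSON body is attached only if given.
function prepareApiCall(
  baseUrl: string,
  token: string,
  method: "POST" | "PUT",
  path: string,
  payload?: unknown,
): PreparedApiCall {
  const headers: Record<string, string> = { authorization: `Bearer ${token}` };
  const call: PreparedApiCall = {
    url: baseUrl.replace(/\/+$/, "") + path,
    method,
    headers,
  };
  if (payload !== undefined) {
    headers["content-type"] = "application/json";
    call.body = JSON.stringify(payload);
  }
  return call;
}

// POST /api/v1/learning/tick, as in the curl example.
function learningTickCall(baseUrl: string, token: string): PreparedApiCall {
  return prepareApiCall(baseUrl, token, "POST", "/api/v1/learning/tick");
}

// PUT /api/v1/settings with canary_enabled set to true, as in the curl example.
function enableCanaryCall(baseUrl: string, token: string): PreparedApiCall {
  return prepareApiCall(baseUrl, token, "PUT", "/api/v1/settings", {
    key: "canary_enabled",
    value: true,
  });
}
```

Each prepared call can be sent with fetch(call.url, call), since the method, headers, and body fields line up with the options fetch expects.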

Use the dashboard Learning view for the full evidence trail.

Phase 10: Production Deployment

Production posture depends on whether you run a long-lived server or a serverless deployment.

Core production environment:

| Area | Variables |
| --- | --- |
| Auth | PUNK_API_KEY, PUNK_ADMIN_EMAIL, PUNK_ADMIN_PASSWORD, PUNK_REQUIRE_LOGIN=true |
| Storage | PUNK_DATABASE_URL or DATABASE_URL |
| Secrets | PUNK_ENCRYPTION_KEY |
| Providers | OPENAI_API_KEY, ANTHROPIC_API_KEY, OPENROUTER_API_KEY, or tenant BYOK |
| App URLs | PUNK_APP_BASE_URL, PUNK_APP_HOST, PUNK_MARKETING_HOST, PUNK_MEET_HOST |
| Docs | PUNK_DOCS_DIR if docs are not in the default repo location |
| Workers | PUNK_WORKER_POLL_MS, PUNK_WORKER_CONCURRENCY |
| Serverless cron | PUNK_CRON_SECRET, CRON_SECRET |
| Retention | PUNK_RETENTION_DAYS |
| Email | RESEND_API_KEY, PUNK_EMAIL_FROM |
| Billing | PUNK_BILLING_DISABLED, STRIPE_SECRET_KEY, STRIPE_WEBHOOK_SECRET, STRIPE_PRICE_* |
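
A pre-deploy check can catch a missing variable before the first production request does. The sketch below covers only the auth, storage, and secrets rows of the table; treating exactly these variables as mandatory is an assumption made for illustration, and missingProductionEnv is a hypothetical helper rather than a Punk command.

```typescript
type DeployEnv = Record<string, string | undefined>;

// Each inner list is a set of alternatives: at least one must be set.
const REQUIRED_GROUPS: string[][] = [
  ["PUNK_API_KEY"],
  ["PUNK_ADMIN_EMAIL"],
  ["PUNK_ADMIN_PASSWORD"],
  ["PUNK_DATABASE_URL", "DATABASE_URL"],
  ["PUNK_ENCRYPTION_KEY"],
];

// Return a description of every required setting that is absent.
function missingProductionEnv(env: DeployEnv): string[] {
  const isSet = (varName: string): boolean =>
    (env[varName] ?? "").trim() !== "";
  const problems = REQUIRED_GROUPS
    .filter((group) => !group.some(isSet))
    .map((group) => group.join(" or "));
  if (env["PUNK_REQUIRE_LOGIN"] !== "true") {
    problems.push("PUNK_REQUIRE_LOGIN=true");
  }
  return problems;
}
```

Calling missingProductionEnv(process.env) in a deploy script and failing on a non-empty result keeps an open dev-mode configuration from reaching production by accident.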

Long-lived server:

bun run dev

For a separate worker process:

bun run worker

Serverless or Vercel-style deployment:

  • Configure Postgres or Neon-compatible storage.
  • Configure PUNK_CRON_SECRET and CRON_SECRET.
  • Schedule /api/v1/internal/tick once per minute.
  • Verify the tick endpoint drains learning, workflow, webhook, and retention jobs.
  • Confirm that PUNK_FAILOVER_TO_MOCK is not silently serving simulated content to live customers, unless that is explicitly intended.

Production readiness:

  • /health returns healthy.
  • Dashboard login is required.
  • Docs and health remain public as intended.
  • Gateway routes require the intended auth.
  • Database migrations or schema initialization have run.
  • Worker or cron tick is draining queues.
  • Provider calls use intended platform or tenant keys.
  • Retention sweep is configured.
  • Backups exist for the production database.
  • Billing and quota behavior matches the commercial plan.

Use the Production readiness panel and Configuration for deployment settings.

Phase 11: Team Rollout

Once the first workflow is observable and governed, bring in the pilot team.

Team setup:

  • Create user accounts or enable public signup only if intended.
  • Invite workflow owners, operators, security reviewers, and app developers.
  • Issue tenant API keys per app or integration.
  • Avoid sharing admin tokens.
  • Give each pilot workflow a named owner.

Working agreements:

| Agreement | Why it matters |
| --- | --- |
| Every app sends app, agent, and subject identity | Makes trust, audit, and routing explainable. |
| New tools declare side-effect level | Prevents silent unsafe writes. |
| Risky actions start observe-only | Lets policy review happen before blocking or optimizing. |
| Promotions require evidence review | Keeps cheaper routes from becoming uncontrolled shortcuts. |
| Rollbacks are exercised | Operators know how to recover before real incidents. |

Dashboard rituals:

| Cadence | Review |
| --- | --- |
| Daily during pilot | Failed runs, blocked actions, pending approvals, top spend, learning attempts. |
| Twice weekly | Patterns, optimized routes, evidence, route mix, canary behavior. |
| Weekly | Policy changes, retention/redaction settings, provider key usage, savings report. |
| Before production expansion | Security checklist, rollback drill, customer-facing impact review. |

Anti-Patterns

Avoid these common onboarding mistakes:

  • Judging Punk with a one-off creative prompt.
  • Using a prompt that is already too cheap to optimize meaningfully.
  • Sending all traffic without app, agent, and subject identity.
  • Enabling optimize mode before observing policy and route behavior.
  • Caching or promoting workflows that perform writes without idempotency or approvals.
  • Treating the workflow diagnostic as a model benchmark.
  • Promoting an optimization without enough evidence.
  • Storing provider keys without PUNK_ENCRYPTION_KEY.
  • Exposing open dev mode publicly.
  • Mixing pilot test noise with production-like traffic and then trusting the pattern.
  • Hiding all workflow structure in one huge prompt instead of naming inputs, steps, gates, and outputs.

Definition Of Done

Local evaluation is done when:

  • Punk runs locally.
  • Demo or chat traffic appears in Runs.
  • Repeated work appears in Patterns or an abstention is clearly explained in Learning.
  • The evaluator can explain the route, cost, latency, and policy verdict for a run.

Pilot integration is done when:

  • One real app routes through /v1/chat/completions or /v1/messages.
  • App, agent, and subject identity are present.
  • At least one workflow candidate is documented.
  • Tool side effects are classified.
  • Observe-mode policy results are reviewed.
  • Learning evidence is visible.

Production onboarding is done when:

  • Auth, login, storage, encryption, provider keys, workers, and retention are configured.
  • Governance policy covers the first workflow.
  • Risky actions require approval or are denied.
  • Promotion has enough evidence.
  • Rollback is understood.
  • The team has an operating cadence.
  • The owner can explain what Punk is allowed to optimize and what must remain live.

Troubleshooting During Onboarding

| Symptom | First checks |
| --- | --- |
| The response is simulated | Confirm provider env vars or tenant BYOK; check PUNK_PROVIDER and PUNK_FAILOVER_TO_MOCK. |
| Gateway returns 401 | Add Authorization: Bearer <token> or confirm the tenant API key is valid. |
| Dashboard requires login | Use the bootstrap admin from PUNK_ADMIN_EMAIL and PUNK_ADMIN_PASSWORD, or create a user in open dev mode. |
| No patterns appear | Send several similar requests with stable identity and input shape. |
| No optimization promotes | Check Learning evidence notes, approval settings, and side-effect level. |
| Web fetch fails | Check PUNK_ALLOW_PRIVATE_WEB_FETCH, URL safety, and network access. |
| Webhook or MCP tool is blocked | Check policy, credentials, private-network controls, and side-effect level. |
| Scheduled agents do not run | Confirm worker process or serverless tick is active. |
| Costs do not drop | Confirm the workflow is repeated, stable, and eligible for cache, semantic cache, model substitution, or artifact routing. |
| Route stays live | Review route explanation; Punk may be correctly avoiding an unproven or unsafe shortcut. |

Related Docs

  • Punk in 30 Minutes: the fast local walkthrough.
  • Workflow Diagnostic: public workflow-fit diagnostic.
  • Workflows: workflow templates, scheduling, credentials, MCP tools.
  • Chat & Agents: chat economics, save-as-agent, scheduled task agents.
  • SDK: TypeScript client and tracing helpers.
  • API: HTTP endpoints, auth, identity headers, response conventions.
  • Governance: policies, trust tiers, approvals, audit, observe mode.
  • Configuration: env vars, provider modes, auth, databases, tenant settings.
  • Billing & Usage: plans, quotas, usage metering, Stripe.
  • Troubleshooting: common symptoms and fixes.