Billing & Usage
Punk meters first and charges later. Every run is metered per tenant per day; plans carry quotas; the gateway enforces a monthly run quota and the creation endpoints enforce resource limits. Stripe is optional; the platform is fully functional on the free plan without it.
Metering
The runtime router finalizes every run with a route, a cost, and a savings number. That finalize is the single metering source: it upserts one row per (tenantId, day) (UTC) into usage_daily, incrementing runs, input_tokens, output_tokens, cost_usd, saved_usd, and blocked_runs. One cheap upsert per run, atomic per adapter (same shape as the route-arm counters).
Child gateway runs are the metered unit. A chat reply, a workflow llm node, and an agent run all loop back through the router and finalize as their own runs, so they are metered exactly once. The parent workflow_runs row aggregates cost and savings from its children and is not separately metered, so there is no double count.
A blocked run (policy or quota) still records a runs: 1 row with blocked_runs: 1, so blocks are visible in usage and auditable as route: "blocked" runs.
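As a rough illustration of the per-day upsert, here is a minimal in-memory sketch. The `Map` stands in for the `usage_daily` table, and `FinalizedRun`, `recordRun`, and `utcDay` are hypothetical names introduced only for this example; the real store performs an atomic upsert in its adapter.

```typescript
// Hypothetical in-memory stand-in for the usage_daily table.
interface UsageDailyRow {
  tenantId: string;
  day: string; // UTC date, YYYY-MM-DD
  runs: number;
  input_tokens: number;
  output_tokens: number;
  cost_usd: number;
  saved_usd: number;
  blocked_runs: number;
}

// Hypothetical shape of what the router hands over at finalize time.
interface FinalizedRun {
  tenantId: string;
  finishedAt: Date;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
  savedUsd: number;
  blocked: boolean;
}

const usageDaily = new Map<string, UsageDailyRow>();

// toISOString() is always UTC, so the first 10 characters are the UTC day.
function utcDay(d: Date): string {
  return d.toISOString().slice(0, 10);
}

// One upsert per run: create the (tenantId, day) row if missing, then increment.
function recordRun(run: FinalizedRun): UsageDailyRow {
  const day = utcDay(run.finishedAt);
  const key = `${run.tenantId}|${day}`;
  const row: UsageDailyRow = usageDaily.get(key) ?? {
    tenantId: run.tenantId, day, runs: 0, input_tokens: 0,
    output_tokens: 0, cost_usd: 0, saved_usd: 0, blocked_runs: 0,
  };
  row.runs += 1;
  row.input_tokens += run.inputTokens;
  row.output_tokens += run.outputTokens;
  row.cost_usd += run.costUsd;
  row.saved_usd += run.savedUsd;
  if (run.blocked) row.blocked_runs += 1; // a blocked run still counts as a run
  usageDaily.set(key, row);
  return row;
}
```

A blocked run passes zero tokens and zero cost, so it raises only `runs` and `blocked_runs`.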
Plans
Plans are code-defined (not a table), in apps/api/src/billing.ts. The numbers below are sensible starting points; tune them as real usage arrives. A null limit means unlimited.
| Plan | Price | Runs / month | Workflows | Agents | Seats | MCP servers | Semantic cache serve |
|---|---|---|---|---|---|---|---|
| Free | $0 | 10,000 | 5 | 3 | 2 | 2 | no (shadow only) |
| Pro | $99/mo | 500,000 | 100 | 50 | 10 | 20 | yes |
| Enterprise | contact us | unlimited | unlimited | unlimited | unlimited | unlimited | yes |
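A code-defined catalog matching the table might look roughly like the sketch below. Only `monthlyRuns` and `semanticCache` are field names taken from this page; `PlanId`, `PlanLimits`, `PLANS`, `withinLimit`, and the other field names are assumptions made for illustration, not the actual contents of apps/api/src/billing.ts.

```typescript
type PlanId = "free" | "pro" | "enterprise";

// null means unlimited. Field names other than monthlyRuns and
// semanticCache are hypothetical.
interface PlanLimits {
  monthlyRuns: number | null;
  workflows: number | null;
  agents: number | null;
  seats: number | null;
  mcpServers: number | null;
  semanticCache: boolean; // false: the cache stays in shadow mode
}

const PLANS: Record<PlanId, PlanLimits> = {
  free: { monthlyRuns: 10_000, workflows: 5, agents: 3, seats: 2, mcpServers: 2, semanticCache: false },
  pro: { monthlyRuns: 500_000, workflows: 100, agents: 50, seats: 10, mcpServers: 20, semanticCache: true },
  enterprise: { monthlyRuns: null, workflows: null, agents: null, seats: null, mcpServers: null, semanticCache: true },
};

// True when one more unit still fits under the limit; null never blocks.
function withinLimit(limit: number | null, used: number): boolean {
  return limit === null || used < limit;
}
```

Keeping the catalog in code means a limit change ships as a normal deploy rather than a data migration.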
A tenant's plan lives on its organizations row: plan (default free), plan_status (default active), and the Stripe linkage stripe_customer_id / stripe_subscription_id. Store accessors are getOrgPlan / setOrgPlan.
Enforcement
Gateway run quota. Before routing, serveChat checks the current month's run count against the plan's monthlyRuns. Over quota → the request is not run, a blocked run is recorded, and the caller gets a 402-style error in its wire format:
- OpenAI: `{ "error": { "type": "insufficient_quota", "code": "quota_exceeded", "message": "Monthly run quota reached for the free plan…" } }`
- Anthropic: an `error` object whose `message` carries `insufficient_quota`.
The check is cheap: the month's run count is cached in-memory per tenant (30s TTL) and incremented on each allowed run, so usage_daily is summed at most once per 30 seconds per tenant.
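The cached check could be sketched as follows. `createRunQuota`, `sumMonthRuns`, and the injectable clock are hypothetical names for this example; the sketch assumes the allowed-run counter is incremented in memory and re-read from storage once the 30-second entry expires.

```typescript
const TTL_MS = 30_000;

interface CachedCount {
  count: number;
  expiresAt: number;
}

// sumMonthRuns is a hypothetical callback that sums this month's runs from
// usage_daily; now is injectable so the TTL can be exercised in tests.
function createRunQuota(
  sumMonthRuns: (tenantId: string) => number,
  now: () => number = () => Date.now(),
) {
  const cache = new Map<string, CachedCount>();

  // Returns true and counts the run if the tenant is under quota.
  return function tryConsume(tenantId: string, monthlyRuns: number | null): boolean {
    if (monthlyRuns === null) return true; // unlimited plan
    const t = now();
    let entry = cache.get(tenantId);
    if (!entry || entry.expiresAt <= t) {
      // At most one storage read per TTL window per tenant.
      entry = { count: sumMonthRuns(tenantId), expiresAt: t + TTL_MS };
      cache.set(tenantId, entry);
    }
    if (entry.count >= monthlyRuns) return false; // over quota: block
    entry.count += 1; // allowed run: bump the cached count
    return true;
  };
}
```

The trade-off in this sketch is that, across multiple gateway processes, each process holds its own cache, so a tenant could overshoot slightly within one TTL window.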
Creation limits. POST /api/v1/workflows, /api/v1/agents, /api/v1/mcp/servers, and /api/v1/orgs/active/invites count existing resources against the plan and return 403 { "error": "plan_limit", "message", "limit", "current", "plan" } when over. Seats count members plus pending invites.
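A creation check along these lines would produce the 403 body described above. The function names and the message wording are assumptions; only the response fields (`error`, `message`, `limit`, `current`, `plan`) come from the text.

```typescript
interface PlanLimitBody {
  error: "plan_limit";
  message: string;
  limit: number;
  current: number;
  plan: string;
}

// Returns null when creation may proceed, otherwise the 403 response.
// checkCreationLimit is a hypothetical helper name.
function checkCreationLimit(
  plan: string,
  resource: string,
  limit: number | null,
  current: number,
): { status: 403; body: PlanLimitBody } | null {
  if (limit === null || current < limit) return null;
  return {
    status: 403,
    body: {
      error: "plan_limit",
      message: `The ${plan} plan allows ${limit} ${resource}; ${current} already exist.`,
      limit,
      current,
      plan,
    },
  };
}

// Seats count members plus pending invites.
function seatsInUse(members: number, pendingInvites: number): number {
  return members + pendingInvites;
}
```

For example, a free-plan org with one member and one pending invite already uses both seats, so a further invite would be rejected.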
Semantic-cache gate. A plan without semanticCache caps the tenant's semantic_cache setting at shadow, so evidence still accrues, but the cache never serves until the tenant upgrades.
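The gate amounts to a small clamp. In this sketch the mode names `off` and `serve` are assumptions (only `shadow` is named on this page), as is the function name.

```typescript
// Only "shadow" is named in the docs; "off" and "serve" are assumed values.
type SemanticCacheMode = "off" | "shadow" | "serve";

// A plan without semanticCache can never go beyond shadow mode.
function effectiveSemanticCache(
  setting: SemanticCacheMode,
  planAllowsServe: boolean,
): SemanticCacheMode {
  if (!planAllowsServe && setting === "serve") return "shadow";
  return setting;
}
```

Because the stored setting is capped rather than overwritten, an upgrade would take effect without the tenant having to reconfigure anything.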
Bypass (self-host / open dev)
Quotas are disabled (everything unlimited) when:
- the gateway runs in open dev mode: no PUNK_API_KEY, no users, and PUNK_REQUIRE_LOGIN unset; or
- PUNK_BILLING_DISABLED=true is set explicitly.
The demo and tests run open-dev, so they never hit a quota. The moment login mode turns on (a user exists or PUNK_REQUIRE_LOGIN=true), quotas enforce, unless PUNK_BILLING_DISABLED keeps them off.
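The bypass rule can be read as a single predicate. This is a sketch under stated assumptions: `quotasDisabled` and `BillingEnv` are hypothetical names, and an empty string is treated the same as an unset variable.

```typescript
interface BillingEnv {
  PUNK_API_KEY?: string;
  PUNK_REQUIRE_LOGIN?: string;
  PUNK_BILLING_DISABLED?: string;
}

// True when every quota should be treated as unlimited.
function quotasDisabled(env: BillingEnv, userCount: number): boolean {
  // Explicit self-host switch wins regardless of login mode.
  if (env.PUNK_BILLING_DISABLED === "true") return true;
  // Open dev mode: no API key, no users, login not required.
  // Assumption: empty string counts as unset.
  const openDev = !env.PUNK_API_KEY && userCount === 0 && !env.PUNK_REQUIRE_LOGIN;
  return openDev;
}
```

Creating the first user flips `openDev` to false, which is the point where quotas start enforcing.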
Usage API
- GET /api/v1/usage[?days=30] → { summary, plan, planDetail, quota, billingEnabled, stripe }. summary is the rolling getUsageSummary (totalRuns, totalCostUsd, totalSavedUsd, totalTokens, byDay[]); quota carries per-resource { used, limit, percent, exceeded } for runs, seats, workflows, agents, and MCP servers.
- GET /api/v1/plans → { plans: [...] }, the catalog.
- POST /api/v1/orgs/active/plan { plan } (owner/admin) → sets the org's plan directly when Stripe is off (or for free/enterprise); with Stripe on for a paid plan it returns { checkoutUrl } to redirect to.
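One way a per-resource quota entry could be derived is sketched below. The rounding, the cap at 100 percent, reporting 0 percent for unlimited plans, and treating `used >= limit` as exceeded are all assumptions of this example, as is the `quotaEntry` name.

```typescript
interface QuotaEntry {
  used: number;
  limit: number | null; // null = unlimited
  percent: number;
  exceeded: boolean;
}

// Builds one { used, limit, percent, exceeded } entry for the usage response.
function quotaEntry(used: number, limit: number | null): QuotaEntry {
  if (limit === null) {
    // Assumption: unlimited resources report 0% and are never exceeded.
    return { used, limit, percent: 0, exceeded: false };
  }
  // Assumption: percent is a rounded integer capped at 100.
  const percent = limit === 0 ? 100 : Math.min(100, Math.round((used / limit) * 100));
  return { used, limit, percent, exceeded: used >= limit };
}
```

With these assumptions, 2,500 runs on the free plan's 10,000-run quota would report `percent: 25` and `exceeded: false`.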
The dashboard shows a Billing & usage panel on the Overview (plan badge, this-month runs vs quota, spend, and savings) and a fuller Billing view (under the nav) with quota bars, a 30-day runs sparkline, and the plans comparison with upgrade buttons. Exceeded quotas render red with an upgrade CTA.
Stripe (optional)
Stripe activates only when STRIPE_SECRET_KEY is set. Punk calls the Stripe REST API directly with fetch (no stripe npm dependency), form-encoded.
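A form-encoded Stripe call made with plain fetch might look like the sketch below, which creates a subscription-mode Checkout session. The helper names and option shape are hypothetical; the endpoint and the bracketed parameter names follow Stripe's public REST API, and the price id would come from a variable such as STRIPE_PRICE_PRO.

```typescript
interface CheckoutOptions {
  customerId: string;
  priceId: string;
  successUrl: string;
  cancelUrl: string;
}

// Stripe expects nested params flattened into bracketed form keys.
function checkoutSessionBody(opts: CheckoutOptions): URLSearchParams {
  const body = new URLSearchParams();
  body.set("mode", "subscription");
  body.set("customer", opts.customerId);
  body.set("line_items[0][price]", opts.priceId);
  body.set("line_items[0][quantity]", "1");
  body.set("success_url", opts.successUrl);
  body.set("cancel_url", opts.cancelUrl);
  return body;
}

// Hypothetical wrapper: POSTs the form body with the secret key as a bearer token.
async function createCheckoutSession(secretKey: string, body: URLSearchParams): Promise<unknown> {
  const res = await fetch("https://api.stripe.com/v1/checkout/sessions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${secretKey}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: body.toString(),
  });
  if (!res.ok) throw new Error(`Stripe returned ${res.status}`);
  return res.json();
}
```

Separating body construction from the network call keeps the encoding testable without touching Stripe.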
When Stripe is on, these endpoints register:
- POST /api/v1/billing/checkout { plan } (owner/admin) → { url }, a Checkout session (mode: subscription) for the active org. The org's Stripe customer is created on first use.
- POST /api/v1/billing/portal (owner/admin) → { url }, the Stripe billing portal.
- POST /api/v1/billing/webhook (no bearer auth; verified by the Stripe-Signature HMAC) → on checkout.session.completed and customer.subscription.updated it sets the org's plan + Stripe ids + plan_status; on customer.subscription.deleted it drops the org back to free.
Webhook signatures are verified per Stripe's scheme: the header is t=<timestamp>,v1=<sig> where v1 is the HMAC-SHA256 of "<timestamp>.<payload>" with STRIPE_WEBHOOK_SECRET.
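The verification described above can be sketched with Node's built-in crypto module. `verifyStripeSignature` is a hypothetical name; the sketch checks only the HMAC and leaves out a timestamp tolerance check, which this page does not describe.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// payload must be the raw request body, byte for byte, not re-serialized JSON.
function verifyStripeSignature(payload: string, header: string, secret: string): boolean {
  // Header looks like: t=<timestamp>,v1=<sig>[,v1=<sig>...]
  const parts = header.split(",").map((p) => p.trim().split("="));
  const timestamp = parts.find(([k]) => k === "t")?.[1];
  const signatures = parts.filter(([k]) => k === "v1").map(([, v]) => v);
  if (!timestamp || signatures.length === 0) return false;

  // Expected signature: HMAC-SHA256 over "<timestamp>.<payload>".
  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${payload}`, "utf8")
    .digest();

  // Constant-time comparison; timingSafeEqual requires equal lengths.
  return signatures.some((sig) => {
    const candidate = Buffer.from(sig ?? "", "hex");
    return candidate.length === expected.length && timingSafeEqual(candidate, expected);
  });
}
```

Comparing with `timingSafeEqual` instead of `===` avoids leaking how many leading bytes of a forged signature matched.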
When Stripe is off, POST /api/v1/orgs/active/plan simply flips the org row so the dashboard's upgrade still does something sane (enterprise shows "contact us").
Environment
| Variable | Purpose |
|---|---|
| PUNK_BILLING_DISABLED | true → all quotas unlimited (self-host). Open dev bypasses too |
| STRIPE_SECRET_KEY | enables the Stripe Checkout / portal / webhook endpoints |
| STRIPE_WEBHOOK_SECRET | webhook signature signing secret |
| STRIPE_PRICE_PRO | Stripe price id used as the Pro checkout line item |
| STRIPE_PRICE_ENTERPRISE | Stripe price id for Enterprise (if sold self-serve) |