
Plans, quotas, usage metering, the usage API, and Stripe setup.

Billing & Usage

Punk meters first and charges later. Every run is metered per tenant per day; plans carry quotas; the gateway enforces a monthly run quota and the creation endpoints enforce resource limits. Stripe is optional; the platform is fully functional on the free plan without it.

Metering

The runtime router finalizes every run with a route, a cost, and a savings number. That finalize step is the single metering source: it upserts one row per (tenantId, UTC day) into usage_daily, incrementing runs, input_tokens, output_tokens, cost_usd, saved_usd, and blocked_runs. That is one cheap upsert per run, atomic per adapter (same shape as the route-arm counters).

Child gateway runs are the metered unit. A chat reply, a workflow llm node, and an agent run all loop back through the router and finalize as their own runs, so they are metered exactly once. The parent workflow_runs row aggregates cost and savings from its children and is not separately metered, so there is no double count.

A blocked run (policy or quota) still records a runs: 1 row with blocked_runs: 1, so blocks are visible in usage and auditable as route: "blocked" runs.
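The metering rules above can be sketched with an in-memory map standing in for usage_daily. This is an illustration only: the FinalizedRun shape, the recordRun name, and the map itself are assumptions, and a real adapter would perform the increment as an atomic database upsert.

```typescript
// Hypothetical in-memory stand-in for the usage_daily table.
interface UsageDailyRow {
  runs: number;
  input_tokens: number;
  output_tokens: number;
  cost_usd: number;
  saved_usd: number;
  blocked_runs: number;
}

// Hypothetical shape of a finalized run; only the fields metering needs.
interface FinalizedRun {
  tenantId: string;
  finishedAt: Date;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
  savedUsd: number;
  blocked: boolean;
}

const usageDaily = new Map<string, UsageDailyRow>();

// toISOString() is always UTC, so its first 10 characters are the UTC day.
function utcDay(d: Date): string {
  return d.toISOString().slice(0, 10);
}

// One upsert per run, keyed by (tenantId, UTC day).
function recordRun(run: FinalizedRun): UsageDailyRow {
  const key = `${run.tenantId}|${utcDay(run.finishedAt)}`;
  const row: UsageDailyRow = usageDaily.get(key) ?? {
    runs: 0,
    input_tokens: 0,
    output_tokens: 0,
    cost_usd: 0,
    saved_usd: 0,
    blocked_runs: 0,
  };
  row.runs += 1; // a blocked run still counts as a run
  row.input_tokens += run.inputTokens;
  row.output_tokens += run.outputTokens;
  row.cost_usd += run.costUsd;
  row.saved_usd += run.savedUsd;
  row.blocked_runs += run.blocked ? 1 : 0;
  usageDaily.set(key, row);
  return row;
}
```

Two runs for the same tenant on the same UTC day land in one row; a run just after midnight UTC starts a new row.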

Plans

Plans are code-defined (not a table), in apps/api/src/billing.ts. The numbers below are sensible starting points; tune them as real usage arrives. A null limit means unlimited.

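| Plan | Price | Runs / month | Workflows | Agents | Seats | MCP servers | Semantic cache serve |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Free | $0 | 10,000 | 5 | 3 | 2 | 2 | no (shadow only) |
| Pro | $99/mo | 500,000 | 100 | 50 | 10 | 20 | yes |
| Enterprise | contact us | unlimited | unlimited | unlimited | unlimited | unlimited | yes |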

A tenant's plan lives on its organizations row: plan (default free), plan_status (default active), and the Stripe linkage stripe_customer_id / stripe_subscription_id. Store accessors are getOrgPlan / setOrgPlan.
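As a rough picture of what a code-defined catalog could look like, here is a sketch using the numbers from the table. Only monthlyRuns and semanticCache are field names taken from this page; the other field names, the PLANS constant, and the remaining helper are assumptions, not the contents of apps/api/src/billing.ts.

```typescript
type PlanId = "free" | "pro" | "enterprise";

interface PlanLimits {
  monthlyRuns: number | null; // null means unlimited
  workflows: number | null;   // hypothetical field name
  agents: number | null;      // hypothetical field name
  seats: number | null;       // hypothetical field name
  mcpServers: number | null;  // hypothetical field name
  semanticCache: boolean;     // true when the cache may serve, not just shadow
}

// Values copied from the plans table; the structure is illustrative.
const PLANS: Record<PlanId, PlanLimits> = {
  free: { monthlyRuns: 10_000, workflows: 5, agents: 3, seats: 2, mcpServers: 2, semanticCache: false },
  pro: { monthlyRuns: 500_000, workflows: 100, agents: 50, seats: 10, mcpServers: 20, semanticCache: true },
  enterprise: { monthlyRuns: null, workflows: null, agents: null, seats: null, mcpServers: null, semanticCache: true },
};

// How much of a limit is left; null propagates "unlimited".
function remaining(limit: number | null, used: number): number | null {
  return limit === null ? null : Math.max(0, limit - used);
}
```

Treating null as "unlimited" everywhere keeps the enforcement code to a single null check per limit.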

Enforcement

Gateway run quota. Before routing, serveChat checks the current month's run count against the plan's monthlyRuns. Over quota → the request is not run, a blocked run is recorded, and the caller gets a 402-style error in its wire format:

  • OpenAI: `{ "error": { "type": "insufficient_quota", "code": "quota_exceeded", "message": "Monthly run quota reached for the free plan…" } }`

  • Anthropic: an error object whose message carries insufficient_quota.

The check is cheap: the month's run count is cached in-memory per tenant (30s TTL) and incremented on each allowed run, so usage_daily is summed at most once per 30 seconds per tenant.
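The cached check described above might look roughly like the sketch below. The class name, the synchronous sumMonthRuns callback, and the "block once the count reaches the limit" comparison are assumptions made for illustration; the real check presumably sums usage_daily asynchronously.

```typescript
interface CachedCount {
  count: number;
  expiresAt: number; // epoch milliseconds
}

class MonthlyRunQuota {
  private cache = new Map<string, CachedCount>();
  private sumMonthRuns: (tenantId: string) => number;
  private ttlMs: number;
  private now: () => number;

  constructor(
    sumMonthRuns: (tenantId: string) => number, // stand-in for summing usage_daily
    ttlMs: number = 30_000,
    now: () => number = () => Date.now(),
  ) {
    this.sumMonthRuns = sumMonthRuns;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns true (and counts the run) when under quota, false when blocked.
  tryConsume(tenantId: string, monthlyRuns: number | null): boolean {
    if (monthlyRuns === null) return true; // unlimited plan
    const t = this.now();
    let entry = this.cache.get(tenantId);
    if (!entry || entry.expiresAt <= t) {
      // Cache miss or stale entry: re-sum at most once per TTL window.
      entry = { count: this.sumMonthRuns(tenantId), expiresAt: t + this.ttlMs };
      this.cache.set(tenantId, entry);
    }
    if (entry.count >= monthlyRuns) return false;
    entry.count += 1; // keep the cached count current between refreshes
    return true;
  }
}
```

Incrementing the cached count on each allowed run is what lets the 30-second window stay accurate without touching the database on every request.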

Creation limits. POST /api/v1/workflows, /api/v1/agents, /api/v1/mcp/servers, and /api/v1/orgs/active/invites count existing resources against the plan and return 403 { "error": "plan_limit", "message", "limit", "current", "plan" } when over. Seats count members plus pending invites.
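A creation-limit check along these lines would produce the 403 body shown above. The function names and the message wording are hypothetical, and the sketch assumes a request is rejected once the existing count has reached the limit.

```typescript
interface PlanLimitError {
  error: "plan_limit";
  message: string;
  limit: number;
  current: number;
  plan: string;
}

// Seats count members plus pending invites.
function seatsInUse(members: number, pendingInvites: number): number {
  return members + pendingInvites;
}

// Returns null when creation may proceed, or the 403 response body when not.
function checkCreationLimit(
  plan: string,
  resource: string,
  limit: number | null,
  current: number,
): PlanLimitError | null {
  if (limit === null || current < limit) return null;
  return {
    error: "plan_limit",
    message: `The ${plan} plan allows ${limit} ${resource}.`, // illustrative wording
    limit,
    current,
    plan,
  };
}
```

For example, a Free org with one member and one pending invite has two seats in use, so a further invite would be rejected against the limit of 2.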

Semantic-cache gate. A plan without semanticCache caps the tenant's semantic_cache setting at shadow, so evidence still accrues, but the cache never serves until the tenant upgrades.
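The gate amounts to clamping the tenant's setting. In the sketch below only "shadow" is a value named on this page; the "off" and "serve" mode names and the function name are assumptions.

```typescript
// Mode names other than "shadow" are assumed for illustration.
type SemanticCacheMode = "off" | "shadow" | "serve";

// Caps the tenant's requested mode at "shadow" when the plan lacks semanticCache.
function effectiveSemanticCache(
  requested: SemanticCacheMode,
  planHasSemanticCache: boolean,
): SemanticCacheMode {
  if (requested === "serve" && !planHasSemanticCache) return "shadow";
  return requested;
}
```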

Bypass (self-host / open dev)

Quotas are disabled (everything unlimited) when:

  • the gateway runs in open dev mode: no PUNK_API_KEY, no users, and PUNK_REQUIRE_LOGIN unset; or

  • PUNK_BILLING_DISABLED=true is set explicitly.

The demo and tests run open-dev, so they never hit a quota. The moment login mode turns on (a user exists or PUNK_REQUIRE_LOGIN=true), quotas enforce, unless PUNK_BILLING_DISABLED keeps them off.
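The bypass rules reduce to a small predicate. The sketch below is one reading of them: the quotasEnforced name and the userCount parameter are hypothetical, and it treats any non-empty PUNK_REQUIRE_LOGIN value as "set".

```typescript
interface BillingEnv {
  PUNK_API_KEY?: string;
  PUNK_REQUIRE_LOGIN?: string;
  PUNK_BILLING_DISABLED?: string;
}

// True when quotas should be enforced for this deployment.
function quotasEnforced(env: BillingEnv, userCount: number): boolean {
  // An explicit opt-out wins over everything else.
  if (env.PUNK_BILLING_DISABLED === "true") return false;
  // Open dev mode: no API key, no users, and login not required.
  const openDev = !env.PUNK_API_KEY && userCount === 0 && !env.PUNK_REQUIRE_LOGIN;
  return !openDev;
}
```

In practice the environment would come from process.env and the user count from the store.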

Usage API

  • GET /api/v1/usage[?days=30] → { summary, plan, planDetail, quota, billingEnabled, stripe }. summary is the rolling getUsageSummary (totalRuns, totalCostUsd, totalSavedUsd, totalTokens, byDay[]); quota carries per-resource { used, limit, percent, exceeded } for runs, seats, workflows, agents, and MCP servers.

  • GET /api/v1/plans → { plans: [...] }, the catalog.

  • POST /api/v1/orgs/active/plan { plan } (owner/admin) → sets the org's plan directly when Stripe is off (or for free/enterprise); with Stripe on for a paid plan it returns { checkoutUrl } to redirect to.
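Each per-resource quota entry in the usage response can be derived from a usage count and a plan limit. The sketch below assumes that an unlimited resource reports percent 0, that percent is capped at 100, and that exceeded means used has reached the limit; the actual rounding and comparison may differ.

```typescript
interface QuotaEntry {
  used: number;
  limit: number | null; // null means unlimited
  percent: number;
  exceeded: boolean;
}

// Builds one { used, limit, percent, exceeded } entry for a resource.
function quotaEntry(used: number, limit: number | null): QuotaEntry {
  if (limit === null) {
    return { used, limit, percent: 0, exceeded: false };
  }
  const percent = limit === 0 ? 100 : Math.min(100, Math.round((used / limit) * 100));
  return { used, limit, percent, exceeded: used >= limit };
}
```

A dashboard quota bar can render percent directly and switch to the red upgrade state when exceeded is true.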

The dashboard shows a Billing & usage panel on the Overview (plan badge, this-month runs vs quota, spend, and savings) and a fuller Billing view (under the nav) with quota bars, a 30-day runs sparkline, and the plans comparison with upgrade buttons. Exceeded quotas render red with an upgrade CTA.

Stripe (optional)

Stripe activates only when STRIPE_SECRET_KEY is set. Punk calls the Stripe REST API directly with fetch (no stripe npm dependency), form-encoded.
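To show what a form-encoded call without the stripe package involves, the sketch below builds (but does not send) a Checkout Session request for a subscription. The helper and its input names are hypothetical and the success and cancel URLs are placeholders; the endpoint, bearer auth, and bracketed line_items keys follow Stripe's public REST conventions.

```typescript
interface StripeRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

interface CheckoutInput {
  secretKey: string;  // value of STRIPE_SECRET_KEY
  customerId: string; // the org's Stripe customer id
  priceId: string;    // e.g. the value of STRIPE_PRICE_PRO
  successUrl: string;
  cancelUrl: string;
}

// Builds the request for POST /v1/checkout/sessions as a form-encoded body.
function buildCheckoutRequest(input: CheckoutInput): StripeRequest {
  const form = new URLSearchParams();
  form.set("mode", "subscription");
  form.set("customer", input.customerId);
  // Stripe encodes nested parameters with bracketed keys.
  form.set("line_items[0][price]", input.priceId);
  form.set("line_items[0][quantity]", "1");
  form.set("success_url", input.successUrl);
  form.set("cancel_url", input.cancelUrl);
  return {
    url: "https://api.stripe.com/v1/checkout/sessions",
    method: "POST",
    headers: {
      Authorization: `Bearer ${input.secretKey}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: form.toString(),
  };
}
```

The result can be passed straight to fetch, as in fetch(req.url, { method: req.method, headers: req.headers, body: req.body }).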

When Stripe is on, these endpoints register:

  • POST /api/v1/billing/checkout { plan } (owner/admin) → { url }, a Checkout session (mode: subscription) for the active org. The org's Stripe customer is created on first use.

  • POST /api/v1/billing/portal (owner/admin) → { url }, the Stripe billing portal.

  • POST /api/v1/billing/webhook (no bearer auth; verified by the Stripe-Signature HMAC) → on checkout.session.completed and customer.subscription.updated it sets the org's plan + Stripe ids + plan_status; on customer.subscription.deleted it drops the org back to free.

Webhook signatures are verified per Stripe's scheme: the header is t=<timestamp>,v1=<sig> where v1 is the HMAC-SHA256 of "<timestamp>.<payload>" with STRIPE_WEBHOOK_SECRET.
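A verification routine following that scheme could be sketched with Node's crypto module as below. The function names are hypothetical, and the 300-second timestamp tolerance is an added assumption (a common replay guard) rather than something this page specifies. The payload must be the raw request body, not re-serialized JSON.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Produces a header in the t=<timestamp>,v1=<sig> format (handy for tests).
function signStripePayload(payload: string, secret: string, timestampSec: number): string {
  const sig = createHmac("sha256", secret)
    .update(`${timestampSec}.${payload}`, "utf8")
    .digest("hex");
  return `t=${timestampSec},v1=${sig}`;
}

// Verifies a Stripe-Signature header against the raw request body.
function verifyStripeSignature(
  payload: string,
  header: string,
  secret: string,
  toleranceSec: number = 300, // assumed replay window
  nowSec: number = Math.floor(Date.now() / 1000),
): boolean {
  let timestamp = "";
  const signatures: string[] = [];
  for (const part of header.split(",")) {
    const idx = part.indexOf("=");
    if (idx === -1) continue;
    const key = part.slice(0, idx).trim();
    const value = part.slice(idx + 1).trim();
    if (key === "t") timestamp = value;
    else if (key === "v1") signatures.push(value);
  }
  if (!/^\d+$/.test(timestamp) || signatures.length === 0) return false;
  if (Math.abs(nowSec - Number(timestamp)) > toleranceSec) return false;

  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${payload}`, "utf8")
    .digest();
  // Compare in constant time; a length mismatch is an immediate failure.
  return signatures.some((sig) => {
    const given = Buffer.from(sig, "hex");
    return given.length === expected.length && timingSafeEqual(given, expected);
  });
}
```

The header may carry more than one v1 value, so the sketch accepts the request if any of them matches.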

When Stripe is off, POST /api/v1/orgs/active/plan simply flips the org row so the dashboard's upgrade still does something sane (enterprise shows "contact us").

Environment

| Variable | Purpose |
| --- | --- |
| PUNK_BILLING_DISABLED | true → all quotas unlimited (self-host). Open dev mode bypasses quotas too |
| STRIPE_SECRET_KEY | enables the Stripe Checkout / portal / webhook endpoints |
| STRIPE_WEBHOOK_SECRET | signing secret used to verify webhook signatures |
| STRIPE_PRICE_PRO | Stripe price id used as the Pro checkout line item |
| STRIPE_PRICE_ENTERPRISE | Stripe price id for Enterprise (if sold self-serve) |