Limits | Respan Docs

Limits are guardrails on your gateway usage. They attach at three scopes you control: per API key, your whole organization, and per customer. Per-API-key limits (lifetime, recurring, and expiry) are configured on each key’s detail page. Organization spend caps and per-customer budgets share one form in Settings > Limits.

Limit scopes

A gateway request is checked against every applicable limit before the call is made. Crossing a warning threshold lets the request continue but fires a notification; hitting a block threshold rejects the request in real time. Notifications fire in both cases.

Respan gateway limits flow: a request is checked against per-API-key, organization, and customer limits, then continues to the LLM call (pass / warn) or is rejected (block), with notifications firing on warn and block.
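
The flow above can be read as "the most severe outcome across scopes decides the request". The following is a minimal sketch of that reading, not Respan's implementation: the scope names, the `combine` function, and the rule that the most severe outcome wins are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Ordering used to pick the most severe outcome across scopes (assumed).
SEVERITY = {"pass": 0, "warn": 1, "block": 2}


def combine(outcomes: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return (decision, scopes_to_notify) for one request.

    `outcomes` maps a hypothetical scope name to "pass", "warn", or "block".
    """
    decision = max(outcomes.values(), key=lambda o: SEVERITY[o], default="pass")
    # Notifications fire for every scope that warned or blocked.
    notify = [scope for scope, outcome in outcomes.items() if outcome != "pass"]
    return decision, notify


decision, notify = combine(
    {"api_key": "pass", "organization": "warn", "customer": "block"}
)
print(decision, notify)  # block ['organization', 'customer']
```

A request reaches the LLM only when the combined decision is "pass" or "warn".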

This page leads with the most granular scope: per API key. We then cover the organization and customer scopes.

Per-API-key limits

Limits scoped to a single API key let you cap a specific integration, contractor, or environment key without touching organization-wide spend. Each key can carry its own budget, usage caps, and expiry.

Per-API-key limits are configured on that key’s detail page. Each limit has an on/off toggle, and the page has its own Save button (and a Revoke key action).

Section	Fields	Notes
Lifetime limit	Cost	Cumulative cap over the key’s entire lifetime. Never resets.
Recurring limits	Cost, Requests, Tokens	Reset each period.
Expiry controls	Expiry	`Never` or a specific date.
Notifications	Limit alerts	Notify configured channels when spend reaches an alert or block threshold.

Lifetime vs. recurring. A lifetime limit is cumulative and never resets: for example, a $50 lifetime cost cap on a demo key. A recurring limit resets each period: for example, 10,000 requests per day on a production key.
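
To make the difference concrete, here is a small sketch of two counters for one key. The `KeyUsage` class and its field names are hypothetical and do not reflect how Respan stores usage; the sketch only shows that a period rollover clears the recurring counter while lifetime spend carries over.

```python
from dataclasses import dataclass


@dataclass
class KeyUsage:
    """Hypothetical usage counters for one API key."""

    lifetime_cost: float = 0.0  # cumulative, never reset
    period_requests: int = 0    # reset at the start of each period

    def record(self, cost: float) -> None:
        self.lifetime_cost += cost
        self.period_requests += 1

    def start_new_period(self) -> None:
        # Only the recurring counter resets; lifetime spend is untouched.
        self.period_requests = 0


usage = KeyUsage()
for _ in range(3):
    usage.record(cost=0.5)
usage.start_new_period()
usage.record(cost=0.5)
print(usage.lifetime_cost, usage.period_requests)  # 2.0 1
```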

Open the API key

Navigate to Settings > API Keys and click the key you want to limit.

Set a lifetime limit (optional)

Under Lifetime limit, toggle on Cost to cap total spend over the key’s entire lifetime.

Set recurring limits (optional)

Under Recurring limits, toggle on any of Cost, Requests, or Tokens and set a value. These reset each period.

Set expiry (optional)

Under Expiry controls, set Expiry to a date, or leave it as Never.

Enable alerts (optional)

Toggle Limit alerts to notify your configured channels when spend reaches an alert or block threshold.

Save

Click Save.
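
For planning limits before entering them in the UI, the fields on the key detail page can be written down as a simple record. This is only a sketch: `KeyLimits` and its attribute names are hypothetical, `None` stands in for a toggle that is off (or `Never` for expiry), and treating the key as valid through the end of its expiry date is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class KeyLimits:
    """Hypothetical mirror of the key detail page; None means the toggle is off."""

    lifetime_cost: Optional[float] = None     # Lifetime limit: Cost
    recurring_cost: Optional[float] = None    # Recurring limits: Cost
    recurring_requests: Optional[int] = None  # Recurring limits: Requests
    recurring_tokens: Optional[int] = None    # Recurring limits: Tokens
    expiry: Optional[date] = None             # Expiry controls; None = Never
    limit_alerts: bool = False                # Notifications: Limit alerts

    def is_expired(self, today: date) -> bool:
        # Assumption: the key stops working the day after its expiry date.
        return self.expiry is not None and today > self.expiry


test_key = KeyLimits(recurring_requests=1_000, expiry=date(2030, 1, 31), limit_alerts=True)
print(test_key.is_expired(date(2030, 2, 1)))     # True
print(KeyLimits().is_expired(date(2030, 2, 1)))  # False
```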

Separate keys per environment. Create separate API keys for test and production instead of using one key for both. You can then give each its own limits and expiry — for example, a tight recurring cap on a test key and a higher budget on production.

Warn vs. block

Every limit acts in one of two ways:

Warn: usage crossing the threshold sends a notification, but requests keep flowing. The organization spend cap’s Warning threshold works this way, firing the spend_cap_warning_threshold_reached webhook. This is the pass / warn path: the request still reaches the LLM.

Block: usage hitting the threshold rejects further requests in real time. The organization Spend cap and the per-customer limits work this way. This is the block path: the request is rejected before it reaches the LLM.
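
The two behaviours can be sketched as a single threshold check. The `evaluate` function below is illustrative only; in particular, treating usage equal to a threshold as having reached it (`>=`) is an assumption, and `None` stands in for a threshold that is not set.

```python
from typing import Optional


def evaluate(usage: float, warn_at: Optional[float], block_at: Optional[float]) -> str:
    """Return "block", "warn", or "pass" for one limit."""
    # Check the block threshold first so it takes precedence over warn.
    if block_at is not None and usage >= block_at:
        return "block"
    if warn_at is not None and usage >= warn_at:
        return "warn"
    return "pass"


print(evaluate(40.0, warn_at=75.0, block_at=100.0))   # pass
print(evaluate(80.0, warn_at=75.0, block_at=100.0))   # warn
print(evaluate(100.0, warn_at=75.0, block_at=100.0))  # block
```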

Organization spend cap

A hard limit on total LLM spend across your organization. Proxy requests are blocked in real time once the cap is reached, and a webhook fires when the warning threshold is crossed.

Go to Limits

Navigate to Settings > Limits.

Set the billing period

Under Organization spend cap, choose the Billing period. This is how often the spend cap resets (for example, Monthly).

Set the spend cap

Set Spend cap (USD) to your hard limit. Proxy requests are blocked in real time once the cap is reached. Leave it as Unlimited for no cap.

Set a warning threshold (optional)

Set Warning threshold (USD) to a value below the cap. When crossed, the spend_cap_warning_threshold_reached webhook fires. Leave as Not set to skip warnings.

Save

Click Save. Changes take effect immediately.
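
The behaviour these settings describe can be modelled as a small state machine. The sketch below is a simplified model under stated assumptions, not Respan's enforcement code: the `SpendCap` class is hypothetical, `None` stands in for Unlimited and Not set, and firing the warning once at the moment spend crosses the threshold is an assumed reading of "when crossed".

```python
from typing import List, Optional


class SpendCap:
    """Hypothetical model of an organization spend cap for one billing period."""

    def __init__(self, cap_usd: Optional[float], warning_usd: Optional[float]) -> None:
        self.cap_usd = cap_usd          # None models "Unlimited"
        self.warning_usd = warning_usd  # None models "Not set"
        self.spent = 0.0
        self.events: List[str] = []

    def allows_request(self) -> bool:
        return self.cap_usd is None or self.spent < self.cap_usd

    def record_spend(self, amount: float) -> None:
        before = self.spent
        self.spent += amount
        # Fire only on the request that crosses the warning threshold.
        if self.warning_usd is not None and before < self.warning_usd <= self.spent:
            self.events.append("spend_cap_warning_threshold_reached")

    def reset_period(self) -> None:
        self.spent = 0.0


cap = SpendCap(cap_usd=100.0, warning_usd=80.0)
cap.record_spend(50.0)
cap.record_spend(40.0)       # crosses 80: warning fires, requests continue
print(cap.allows_request())  # True
cap.record_spend(15.0)       # spend is now 105, past the cap
print(cap.allows_request())  # False
cap.reset_period()
print(cap.allows_request())  # True
```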

Customer limits

Limits applied to LLM requests associated with each customer_identifier: a monthly budget and a request rate limit. Both enforce the block path per customer.

Customer monthly budget: the monthly spend allowed per customer. Can be overridden per customer via the API.
Customer rate limit (requests/min): requests are blocked when a customer exceeds this rate.

Set both under Customer limits in Settings > Limits, the same form as the organization spend cap, then click Save.
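
As a rough illustration of how the two customer limits interact, the sketch below checks a monthly budget (with an optional per-customer override) and then a per-minute request rate. Everything here is an assumption for illustration: the `CustomerLimits` class is hypothetical, the rate check uses a 60-second sliding window, and the override is a plain dictionary rather than the real API call.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class CustomerLimits:
    """Hypothetical per-customer budget and rate check."""

    def __init__(self, monthly_budget: float, rate_per_min: int) -> None:
        self.default_budget = monthly_budget
        self.rate_per_min = rate_per_min
        self.budget_overrides: Dict[str, float] = {}  # per-customer overrides
        self.spend: Dict[str, float] = defaultdict(float)
        self.recent: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, customer_identifier: str, now: float) -> bool:
        budget = self.budget_overrides.get(customer_identifier, self.default_budget)
        if self.spend[customer_identifier] >= budget:
            return False  # monthly budget exhausted
        window = self.recent[customer_identifier]
        # Drop request timestamps older than 60 seconds.
        while window and now - window[0] >= 60.0:
            window.popleft()
        if len(window) >= self.rate_per_min:
            return False  # rate limit exceeded
        window.append(now)
        return True


limits = CustomerLimits(monthly_budget=10.0, rate_per_min=2)
print(limits.allow("acme", now=0.0))   # True
print(limits.allow("acme", now=1.0))   # True
print(limits.allow("acme", now=2.0))   # False: third request within a minute
print(limits.allow("acme", now=61.0))  # True: earlier requests have aged out
```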

Notifications

The organization spend cap’s Warning threshold fires the spend_cap_warning_threshold_reached webhook when crossed. Per-API-key Limit alerts notify your configured channels when spend reaches an alert or block threshold.
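
A receiving service might recognize the warning event along these lines. This is a sketch under loud assumptions: the payload format is not documented on this page, so the body being a JSON object and the event name living under an `event` field are both guesses, and `is_spend_warning` is a hypothetical helper. Inspect a real delivery before relying on it.

```python
import json

WARNING_EVENT = "spend_cap_warning_threshold_reached"  # event name from this page


def is_spend_warning(raw_body: bytes, event_field: str = "event") -> bool:
    """Return True if the webhook body announces the spend-cap warning.

    ASSUMPTION: the body is a JSON object and the event name is stored
    under `event_field`. Neither is confirmed by this page.
    """
    try:
        payload = json.loads(raw_body)
    except ValueError:
        # Not valid JSON (or not decodable text): ignore the delivery.
        return False
    return isinstance(payload, dict) and payload.get(event_field) == WARNING_EVENT


print(is_spend_warning(b'{"event": "spend_cap_warning_threshold_reached"}'))  # True
print(is_spend_warning(b"not json"))                                          # False
```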

To learn how alert channels and webhooks are set up, see Monitors & notifications.

Looking for API rate limits? See API rate limits.