Rate limits

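The platform enforces three kinds of limits. The first two reject the request with a 429 and a distinct code; the third is enforced inside the agent runtime and ends the task rather than returning a 429.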

Per-caller RPM

Applies to all of your /v1/* endpoints combined.

Setting	Default	Env var
Requests per minute	60	`RATE_LIMIT_CALLER_RPM`
Burst (consecutive requests over the smooth rate)	10	`RATE_LIMIT_CALLER_BURST`

Implemented as a token bucket per caller key.
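As an illustration of how a token bucket behaves with the defaults above, here is a minimal sketch. It is not the platform's implementation: the class name `TokenBucket` is hypothetical, and it assumes the bucket capacity equals the burst setting and that tokens refill continuously at RPM / 60 per second.

```python
import time


class TokenBucket:
    """Illustrative token bucket (assumed semantics, not the server's code)."""

    def __init__(self, rate_per_min=60, burst=10, now=None):
        self.rate_per_sec = rate_per_min / 60.0  # smooth refill rate
        self.capacity = float(burst)             # assumption: capacity == burst
        self.tokens = float(burst)               # start with a full bucket
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Return True and spend a token if one is available, else False."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        # Refill for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Under these assumptions, a caller that has been idle can send 10 requests back to back, after which requests are admitted at roughly one per second.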

Hitting it returns a 429 with this body:

{ "code": "RATE_LIMITED", "error": "Too many requests" }

Headers:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1700000000
Retry-After: 12
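A client can turn these headers into a wait time. The sketch below is one possible approach, with assumptions stated: `seconds_until_retry` is a hypothetical helper, `Retry-After` is treated as a number of seconds, and `X-RateLimit-Reset` is treated as a Unix timestamp in seconds (which the sample value suggests but this page does not state). Header names are looked up exactly as written above.

```python
import time


def seconds_until_retry(headers, now=None):
    """Return how long to wait before retrying, or None if no hint is present.

    `headers` is a plain dict of response headers (hypothetical shape).
    """
    now = time.time() if now is None else now

    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not a number of seconds; fall back to the reset header

    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        try:
            # Assumption: the value is a Unix epoch timestamp in seconds.
            return max(0.0, float(reset) - now)
        except ValueError:
            pass

    return None
```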

Per-profile concurrency

Caps the number of workspaces running simultaneously for a given profile.

Setting	Default	Env var
Concurrency	5	`RATE_LIMIT_PROFILE_CONCURRENCY`

Hitting it returns a 429 with code "CONCURRENCY_LIMIT". The task is rejected; submit it again later or raise the limit.
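One way to avoid these rejections is to cap in-flight submissions on the client side. The following is a rough sketch only: `ProfileSlots` is a hypothetical helper, and it assumes each submitted task occupies one workspace until it finishes and that the client is the only submitter for the profile.

```python
import threading


class ProfileSlots:
    """Client-side guard that mirrors the per-profile concurrency limit."""

    def __init__(self, limit=5):
        # BoundedSemaphore raises ValueError if released more often than acquired.
        self._sem = threading.BoundedSemaphore(limit)

    def try_acquire(self):
        """Reserve a slot without blocking; False means the limit is reached."""
        return self._sem.acquire(blocking=False)

    def release(self):
        """Free a slot once the task has finished."""
        self._sem.release()
```

A caller would check `try_acquire()` before submitting a task and call `release()` when the task reaches a terminal status.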

Per-task budget / turn caps

Not server-throttled; enforced inside the agent runtime.

max_turns: maximum number of iterations the agent runs in a single task. Default 200 (config: MAX_TURNS).
max_budget_usd: maximum total spend across model and tool costs. Set per-task or per-profile.

Hitting either ends the task with status=completed and a termination_reason (agent_finished_in_limit for turns, budget_cap for budget).
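Because a capped task still reports status=completed, a client has to inspect termination_reason to tell a cap from a normal finish. A small sketch, assuming the task result is available as a dict with `status` and `termination_reason` keys (the dict shape and the `cap_hit` helper are hypothetical; the reason values are the ones listed above):

```python
# Map the termination reasons named in this section to the cap they indicate.
CAP_REASONS = {
    "agent_finished_in_limit": "turn cap (max_turns)",
    "budget_cap": "budget cap (max_budget_usd)",
}


def cap_hit(task):
    """Return a description of the cap that ended the task, or None."""
    if task.get("status") != "completed":
        return None
    return CAP_REASONS.get(task.get("termination_reason"))
```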

What about Anthropic’s rate limits?

Vonzio routes model traffic through your own Anthropic API key (or your local Ollama instance). When Anthropic returns a 429, Vonzio surfaces it as MODEL_ERROR in your task result. The retry handler does not auto-retry rate-limit errors, because retrying would only compound the problem.

If you’re hitting Anthropic’s limits frequently, either upgrade your tier or stagger your playbook schedules.

Backoff strategy

For polling-style clients:

On 429 with Retry-After, sleep that many seconds and retry.
Otherwise, use exponential backoff with jitter, starting at 1s and capped at 60s.
After 5 consecutive 429s, switch to the WebSocket — long-running task watches are essentially free over WS.
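The three rules above can be sketched as a pair of pure functions. This is one possible reading, not a reference client: `next_delay` and `should_switch_to_websocket` are hypothetical names, and "jittered" is interpreted here as full jitter (a random delay between 0 and the exponential value), which is an assumption.

```python
import random


def next_delay(consecutive_429s, retry_after=None, base=1.0, cap=60.0,
               rng=random.random):
    """Seconds to sleep before the next attempt.

    consecutive_429s: number of 429s received in a row, counting this one.
    retry_after: value of the Retry-After header in seconds, if present.
    """
    if retry_after is not None:
        return float(retry_after)  # the server's hint takes priority
    # Exponential growth: 1s, 2s, 4s, ... capped at `cap`.
    exp = min(cap, base * (2 ** max(0, consecutive_429s - 1)))
    # Full jitter (assumed): pick a random point in [0, exp).
    return exp * rng()


def should_switch_to_websocket(consecutive_429s, threshold=5):
    """True once the client has seen `threshold` 429s in a row."""
    return consecutive_429s >= threshold
```

A polling loop would reset its counter to zero after any successful response, call `next_delay` after each 429, and stop polling in favor of the WebSocket once `should_switch_to_websocket` returns True.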

For real-time clients (the dashboard), the WebSocket already handles this — no caller-side backoff needed.