Rate limits
The platform enforces three different limits. The first two return a 429, each with a different code; the third is enforced inside the agent runtime and ends the task instead.
Per-caller RPM
Across all your /v1/* endpoints combined.
| Setting | Default | Env var |
|---|---|---|
| Requests per minute | 60 | RATE_LIMIT_CALLER_RPM |
| Burst (consecutive requests over the smooth rate) | 10 | RATE_LIMIT_CALLER_BURST |
Implemented as a token bucket per caller key.
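As a rough illustration of how a per-caller token bucket behaves with the defaults above, here is a minimal sketch. It is not Vonzio's implementation: the class name, the `check` helper, and the mapping of the burst setting to bucket capacity are assumptions made for the example.

```python
import time
from typing import Optional


class TokenBucket:
    """Hypothetical token bucket: refills at rpm/60 tokens per second, holds at most `burst`."""

    def __init__(self, rpm: float = 60, burst: int = 10, now: Optional[float] = None):
        self.rate = rpm / 60.0        # tokens added per second
        self.capacity = float(burst)  # assumption: the burst setting is the bucket capacity
        self.tokens = float(burst)    # a new caller starts with a full bucket
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token if available; False means the request would get a 429."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# One bucket per caller key (hypothetical in-memory store).
buckets = {}


def check(caller_key: str, now: Optional[float] = None) -> bool:
    """Look up (or create) the caller's bucket and try to spend one token."""
    bucket = buckets.setdefault(caller_key, TokenBucket(now=now))
    return bucket.allow(now)
```

Under these assumptions, a caller can send 10 requests back to back, after which requests are admitted at roughly one per second until the bucket refills.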
Hitting it returns:
    { "code": "RATE_LIMITED", "error": "Too many requests" }

Headers:

    X-RateLimit-Limit: 60
    X-RateLimit-Remaining: 0
    X-RateLimit-Reset: 1700000000
    Retry-After: 12

Per-profile concurrency
Across simultaneously running workspaces for a given profile.
| Setting | Default | Env var |
|---|---|---|
| Concurrency | 5 | RATE_LIMIT_PROFILE_CONCURRENCY |
Hitting it returns 429 with code: "CONCURRENCY_LIMIT". The task is rejected; resubmit it later or raise the limit (RATE_LIMIT_PROFILE_CONCURRENCY).
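Because both limits answer with a 429, a client has to read the `code` field to tell them apart. The sketch below shows one way to do that. The function name and the returned action labels are hypothetical, and it assumes the CONCURRENCY_LIMIT body has the same JSON shape as the RATE_LIMITED example.

```python
import json
from typing import Mapping, Optional, Tuple


def classify_429(body: str, headers: Mapping[str, str]) -> Tuple[str, Optional[float]]:
    """Map a 429 response to a (hypothetical) action label and an optional wait in seconds."""
    code = json.loads(body).get("code")
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    if code == "RATE_LIMITED":
        retry_after = lowered.get("retry-after")
        wait = float(retry_after) if retry_after is not None else None
        return ("wait_and_retry", wait)
    if code == "CONCURRENCY_LIMIT":
        # The task was rejected; the caller decides when to submit it again.
        return ("resubmit_later", None)
    return ("unknown", None)
```

For the RATE_LIMITED example response shown earlier, this returns `("wait_and_retry", 12.0)`.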
Per-task budget / turn caps
These caps are not server-throttled; they are enforced inside the agent runtime.
- max_turns: max iterations the agent runs in a single task. Default 200 (config: MAX_TURNS).
- max_budget_usd: total spend across model + tool costs. Set per-task or per-profile.
Hitting either ends the task with status=completed and a termination_reason (agent_finished_in_limit for turns, budget_cap for budget).
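Since a capped task still reports status=completed, callers that care about the difference need to inspect termination_reason. A minimal sketch, assuming the task result is available as a dictionary with `status` and `termination_reason` keys (the exact result shape and the helper name are assumptions):

```python
from typing import Mapping, Optional

# termination_reason values taken from the text above.
CAP_REASONS = {
    "agent_finished_in_limit": "turns",
    "budget_cap": "budget",
}


def cap_hit(task: Mapping[str, str]) -> Optional[str]:
    """Return "turns" or "budget" if the task ended on a cap, else None."""
    if task.get("status") != "completed":
        return None
    return CAP_REASONS.get(task.get("termination_reason"))
```

A client could use this to flag tasks that stopped on a cap for review, or to resubmit them with a higher max_turns or max_budget_usd.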
What about Anthropic’s rate limits?
Vonzio routes traffic through your own Anthropic API key (or your local Ollama instance). When Anthropic returns a 429, Vonzio surfaces it as MODEL_ERROR in your task result. The retry handler does not auto-retry rate limit errors, because retrying would only compound the problem.
If you’re hitting Anthropic’s limits frequently, either upgrade your tier or stagger your playbook schedules.
Backoff strategy
For polling-style clients:
- On 429 with Retry-After, sleep that many seconds and retry.
- Otherwise exponential backoff starting at 1s, max 60s, jittered.
- After 5 consecutive 429s, switch to the WebSocket — long-running task watches are essentially free over WS.
For real-time clients (the dashboard), the WebSocket already handles this — no caller-side backoff needed.
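For polling-style clients, the delay rules above could be sketched as follows. The function names are hypothetical, Retry-After is assumed to carry whole seconds as in the example response, and "full jitter" (a uniform draw between 0 and the exponential ceiling) is just one reasonable reading of "jittered".

```python
import random
from typing import Callable, Optional


def next_delay(
    consecutive_429s: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Seconds to sleep before the next poll, given how many 429s arrived in a row."""
    if retry_after is not None:
        # The server stated how long to wait; honour it as-is.
        return float(retry_after)
    # Exponential ceiling: 1s, 2s, 4s, ... capped at 60s.
    exponent = max(0, consecutive_429s - 1)
    ceiling = min(cap, base * (2 ** exponent))
    # Full jitter (an assumption): pick a point between 0 and the ceiling.
    return ceiling * rng()


def should_switch_to_websocket(consecutive_429s: int, threshold: int = 5) -> bool:
    """True once the polling client has seen `threshold` 429s in a row."""
    return consecutive_429s >= threshold
```

A polling loop would reset its consecutive-429 counter after any successful response, call `next_delay` after each 429, and stop polling in favour of the WebSocket once `should_switch_to_websocket` returns true.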