# Rate Limits
Aqta enforces rate limits per API key to ensure fair usage and system stability. Limits vary by tier.
## Limits by tier
| Tier | Requests / month | Rate limit | Burst |
|---|---|---|---|
| Free | 500 | 5 / min | 10 |
| Starter | 10,000 | 100 / min | 200 |
| Pro | 100,000 | 1,000 / min | 2,000 |
| Enterprise | Unlimited | Custom | Custom |
### Burst limit
Burst lets you briefly exceed the per-minute rate — useful for spiky workloads. For example, Free tier allows 10 requests in a short burst even though the sustained rate is 5 / min.
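Aqta doesn't document its exact algorithm, but burst-over-sustained-rate behavior like this is commonly modeled as a token bucket: the bucket's capacity is the burst limit, and it refills at the sustained rate. The sketch below simulates the Free tier (capacity 10, refill 5 / min); all names here are illustrative, not part of the Aqta API.

```python
class TokenBucket:
    """Illustrative token bucket: `capacity` = burst, `rate_per_min` = sustained rate."""

    def __init__(self, capacity, rate_per_min):
        self.capacity = capacity
        self.tokens = float(capacity)   # start full: a fresh key can burst immediately
        self.rate = rate_per_min / 60.0  # tokens added per second
        self.last = 0.0

    def allow(self, now):
        # Refill tokens accrued since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=10, rate_per_min=5)
burst = sum(bucket.allow(now=0) for _ in range(12))
print(burst)  # 10 of 12 instantaneous requests succeed; the rest must wait for refill
```

The eleventh and twelfth requests fail because the bucket is empty at `now=0`; they would succeed once enough refill time has passed.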
## Model availability by tier
**Free** — cost-effective models only:
- GPT-4o mini, GPT-3.5 Turbo
- Claude 3 Haiku
- Gemini 1.5 Flash
**Starter, Pro, Enterprise** — all models:
- GPT-4o, GPT-4 Turbo
- Claude 3.5 Sonnet, Claude 3 Opus
- Gemini 1.5 Pro, Gemini 2.0 Flash
- Perplexity Sonar Pro
## How rate limiting works
### Per-minute window (sliding)
Requests are counted in a rolling 60-second window:
```
Time (s):   0    10   20   30   40   50   60
Requests:   3    2    0    0    0    0    0
In window:  3    5    5    5    5    5    5   ← all still within last 60s
```
Once you hit the limit, requests return 429 until the window clears.
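Rather than reacting to 429s, a client can mirror this rolling window locally and hold requests before they would be rejected. A minimal sketch (client-side only, with an injected clock for clarity; not part of the Aqta SDK):

```python
from collections import deque


class SlidingWindowLimiter:
    """Tracks request timestamps in a rolling window, client-side."""

    def __init__(self, limit, window=60.0):
        self.limit = limit    # e.g. 5 for the Free tier
        self.window = window  # seconds
        self.sent = deque()   # timestamps of requests still inside the window

    def try_acquire(self, now):
        # Drop timestamps that have aged out of the window, then check headroom.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False


limiter = SlidingWindowLimiter(limit=5)
results = [limiter.try_acquire(now=t) for t in (0, 10, 10, 10, 10, 20, 61)]
print(results)  # the 6th call is throttled; the 7th succeeds after the oldest ages out
```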
### Monthly limit
Resets on the 1st of each month at 00:00 UTC.
If you exhaust your monthly quota mid-month, all requests return 429 until the next cycle. Upgrading your tier restores access immediately.
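To surface the reset time in your own tooling (a dashboard, an alert, a retry scheduler), you can compute the next first-of-month boundary in UTC. A small sketch; the function name is ours, not Aqta's:

```python
from datetime import datetime, timezone


def next_monthly_reset(now):
    """First day of the next month at 00:00 UTC, when the monthly quota resets."""
    if now.month == 12:
        return datetime(now.year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(now.year, now.month + 1, 1, tzinfo=timezone.utc)


now = datetime(2025, 3, 15, 9, 30, tzinfo=timezone.utc)
print(next_monthly_reset(now))  # 2025-04-01 00:00:00+00:00
```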
## Rate limit headers
Every response includes:
```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 94
X-RateLimit-Reset: 1743465600
```
| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Max requests per minute for your tier |
| `X-RateLimit-Remaining` | Requests left in the current window |
| `X-RateLimit-Reset` | Unix timestamp when the window resets |
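Since `X-RateLimit-Reset` is a Unix timestamp, turning it into a wait duration is a one-liner worth getting right (clamp at zero so a stale header never produces a negative sleep). A sketch, using the header values from the example above:

```python
import time

# Headers as captured from a response (values from the example above).
headers = {
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "94",
    "X-RateLimit-Reset": "1743465600",
}


def seconds_until_reset(headers, now=None):
    """Seconds to wait before the per-minute window resets (never negative)."""
    now = time.time() if now is None else now
    return max(0.0, int(headers["X-RateLimit-Reset"]) - now)


wait = seconds_until_reset(headers, now=1743465570)  # 30 s before the example reset
```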
## Handling 429 errors
### Error response
```json
{
  "error": {
    "message": "Rate limit exceeded. Retry after 30 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "retry_after": 30
  }
}
```
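The `retry_after` field is the server telling you exactly how long to wait, so prefer it over a guessed backoff when it's present. A small sketch that parses it from the error body and falls back to exponential backoff otherwise (the function name is ours):

```python
import json


def parse_retry_after(body, attempt):
    """Prefer the server's retry_after hint; fall back to 2**attempt seconds."""
    try:
        return json.loads(body)["error"]["retry_after"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return 2 ** attempt


body = '{"error": {"code": "rate_limit_exceeded", "retry_after": 30}}'
print(parse_retry_after(body, attempt=0))  # 30 — the server's hint wins
```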
### Retry with backoff (Python)
```python
import time

from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.aqta.ai/v1",
    api_key="sk-aqta-your-key-here",
)

def chat_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
            )
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            wait = 2 ** attempt  # 1s, 2s, 4s
            print(f"Rate limited, retrying in {wait}s...")
            time.sleep(wait)
```
### Retry with backoff (TypeScript)
```typescript
import OpenAI, { RateLimitError } from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.aqta.ai/v1',
  apiKey: 'sk-aqta-your-key-here',
});

async function chatWithRetry(
  messages: OpenAI.ChatCompletionMessageParam[],
  maxRetries = 3,
) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.chat.completions.create({ model: 'gpt-4o', messages });
    } catch (err) {
      if (err instanceof RateLimitError && attempt < maxRetries - 1) {
        const wait = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s
        await new Promise((r) => setTimeout(r, wait));
      } else {
        throw err;
      }
    }
  }
}
```
## Best practices
**Watch the headers** — check `X-RateLimit-Remaining` before it hits zero:

```python
# After each request, inspect remaining quota
response = client.chat.completions.create(...)
# Access via response.headers if using raw HTTP, or monitor in the dashboard
```
**Exponential backoff** — don't hammer the API after a 429. Start at 1s, double each retry.

**Batch when possible** — one request with multiple items is better than many individual requests.

**Cache repeated prompts** — if your application re-sends identical prompts and can reuse an earlier answer, cache at the application layer to avoid redundant calls.
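For deterministic lookups, an in-memory cache keyed on `(model, prompt)` is often enough. A minimal sketch using `functools.lru_cache`, with a stub in place of the real API call so it runs offline; `call_api` and `cached_completion` are illustrative names:

```python
from functools import lru_cache

calls = {"count": 0}


def call_api(model, prompt):
    # Stand-in for a real API request, so the sketch runs offline.
    calls["count"] += 1
    return f"response to {prompt!r} from {model}"


@lru_cache(maxsize=1024)
def cached_completion(model, prompt):
    """Identical (model, prompt) pairs hit the cache instead of the API."""
    return call_api(model, prompt)


for _ in range(3):
    cached_completion("gpt-4o", "Summarize rate limiting in one line.")
print(calls["count"])  # 1 — only one real request despite 3 identical calls
```

Note this only makes sense when a reused answer is acceptable; sampled completions are not guaranteed identical across calls.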
## Limits and edge cases
- Streaming requests count as one request, regardless of stream duration.
- Failed requests (4xx, 5xx) still count toward your rate limit.
- Rate limits are per API key, not per account. Create multiple keys to distribute load across services.
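Because limits are per key, a service with several workers can rotate keys so no single key absorbs all the traffic. A minimal round-robin sketch (the key values are placeholders):

```python
from itertools import cycle

# Placeholder keys; in practice, one key per service or worker.
keys = cycle(["sk-aqta-key-a", "sk-aqta-key-b", "sk-aqta-key-c"])


def next_key():
    """Round-robin over the available API keys."""
    return next(keys)


picked = [next_key() for _ in range(4)]
print(picked)  # wraps back to the first key on the fourth call
```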
## Upgrading
Visit app.aqta.ai/pricing to upgrade. New limits apply immediately — no downtime.
Need a temporary limit increase for a launch or batch job? Email hello@aqta.ai.
## Next steps
- Authentication — get your API key
- API Endpoints — available endpoints
- Pricing — compare tiers
Questions? hello@aqta.ai