# Rate Limits
Aqta enforces rate limits per API key to ensure fair usage and system stability. Limits vary by tier.
## Limits by tier
| Tier | Requests / month | Rate limit | Burst |
|---|---|---|---|
| Free | 500 | 5 / min | 10 |
| Starter | 10,000 | 100 / min | 200 |
| Pro | 100,000 | 1,000 / min | 2,000 |
| Enterprise | Unlimited | Custom | Custom |
### Burst limit
Burst lets you briefly exceed the per-minute rate — useful for spiky workloads. For example, Free tier allows 10 requests in a short burst even though the sustained rate is 5 / min.
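Aqta doesn't document its exact algorithm, but burst-over-sustained-rate behavior like this is commonly modeled as a token bucket: the bucket's capacity is the burst limit, and it refills at the sustained rate. The sketch below simulates the Free tier (capacity 10, refill 5 / min); all names here are illustrative, not part of the Aqta API.

```python
class TokenBucket:
    """Illustrative token bucket: `capacity` = burst, `rate_per_min` = sustained rate."""

    def __init__(self, capacity, rate_per_min):
        self.capacity = capacity
        self.tokens = float(capacity)   # start full: a fresh key can burst immediately
        self.rate = rate_per_min / 60.0  # tokens added per second
        self.last = 0.0

    def allow(self, now):
        # Refill tokens accrued since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=10, rate_per_min=5)
burst = sum(bucket.allow(now=0) for _ in range(12))
print(burst)  # 10 of 12 instantaneous requests succeed; the rest must wait for refill
```

The eleventh and twelfth requests fail because the bucket is empty at `now=0`; they would succeed once enough refill time has passed.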
## Model availability by tier
**Free** — cost-effective models only:
- GPT-4o mini, GPT-3.5 Turbo
- Claude 3 Haiku
- Gemini 1.5 Flash
**Starter, Pro, Enterprise** — all models:
- GPT-4o, GPT-4 Turbo
- Claude 3.5 Sonnet, Claude 3 Opus
- Gemini 1.5 Pro, Gemini 2.0 Flash
- Perplexity Sonar Pro
## How rate limiting works
### Per-minute window (sliding)
Requests are counted in a rolling 60-second window:
```
Time (s):   0    10   20   30   40   50   60
Requests:   3    2    0    0    0    0    0
In window:  3    5    5    5    5    5    5   ← all still within last 60s
```
Once you hit the limit, requests return 429 until the window clears.
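Rather than reacting to 429s, a client can mirror this rolling window locally and hold requests before they would be rejected. A minimal sketch (client-side only, with an injected clock for clarity; not part of the Aqta SDK):

```python
from collections import deque


class SlidingWindowLimiter:
    """Tracks request timestamps in a rolling window, client-side."""

    def __init__(self, limit, window=60.0):
        self.limit = limit    # e.g. 5 for the Free tier
        self.window = window  # seconds
        self.sent = deque()   # timestamps of requests still inside the window

    def try_acquire(self, now):
        # Drop timestamps that have aged out of the window, then check headroom.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False


limiter = SlidingWindowLimiter(limit=5)
results = [limiter.try_acquire(now=t) for t in (0, 10, 10, 10, 10, 20, 61)]
print(results)  # the 6th call is throttled; the 7th succeeds after the oldest ages out
```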
### Monthly limit
Resets on the 1st of each month at 00:00 UTC.
If you exhaust your monthly quota mid-month, all requests return 429 until the next cycle. Upgrading your tier restores access immediately.
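To surface the reset time in your own tooling (a dashboard, an alert, a retry scheduler), you can compute the next first-of-month boundary in UTC. A small sketch; the function name is ours, not Aqta's:

```python
from datetime import datetime, timezone


def next_monthly_reset(now):
    """First day of the next month at 00:00 UTC, when the monthly quota resets."""
    if now.month == 12:
        return datetime(now.year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(now.year, now.month + 1, 1, tzinfo=timezone.utc)


now = datetime(2025, 3, 15, 9, 30, tzinfo=timezone.utc)
print(next_monthly_reset(now))  # 2025-04-01 00:00:00+00:00
```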
## Rate limit headers
Every response includes:
```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 94
X-RateLimit-Reset: 1743465600
```
| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Max requests per minute for your tier |
| `X-RateLimit-Remaining` | Requests left in the current window |
| `X-RateLimit-Reset` | Unix timestamp when the window resets |
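Since `X-RateLimit-Reset` is a Unix timestamp, turning it into a wait duration is a one-liner worth getting right (clamp at zero so a stale header never produces a negative sleep). A sketch, using the header values from the example above:

```python
import time

# Headers as captured from a response (values from the example above).
headers = {
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "94",
    "X-RateLimit-Reset": "1743465600",
}


def seconds_until_reset(headers, now=None):
    """Seconds to wait before the per-minute window resets (never negative)."""
    now = time.time() if now is None else now
    return max(0.0, int(headers["X-RateLimit-Reset"]) - now)


wait = seconds_until_reset(headers, now=1743465570)  # 30 s before the example reset
```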
## Handling 429 errors
### Error response
```json
{
  "error": {
    "message": "Rate limit exceeded. Retry after 30 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded",
    "retry_after": 30
  }
}
```
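The `retry_after` field is the server telling you exactly how long to wait, so prefer it over a guessed backoff when it's present. A small sketch that parses it from the error body and falls back to exponential backoff otherwise (the function name is ours):

```python
import json


def parse_retry_after(body, attempt):
    """Prefer the server's retry_after hint; fall back to 2**attempt seconds."""
    try:
        return json.loads(body)["error"]["retry_after"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return 2 ** attempt


body = '{"error": {"code": "rate_limit_exceeded", "retry_after": 30}}'
print(parse_retry_after(body, attempt=0))  # 30 — the server's hint wins
```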
### Retry with backoff (Python)
```python
import time

from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.aqta.ai/v1",
    api_key="sk-aqta-your-key-here",
)

def chat_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
            )
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            wait = 2 ** attempt  # 1s, 2s, 4s
            print(f"Rate limited, retrying in {wait}s...")
            time.sleep(wait)
```
### Retry with backoff (TypeScript)
```typescript
import OpenAI, { RateLimitError } from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.aqta.ai/v1',
  apiKey: 'sk-aqta-your-key-here',
});

async function chatWithRetry(
  messages: OpenAI.ChatCompletionMessageParam[],
  maxRetries = 3,
) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.chat.completions.create({ model: 'gpt-4o', messages });
    } catch (err) {
      if (err instanceof RateLimitError && attempt < maxRetries - 1) {
        const wait = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s
        await new Promise((r) => setTimeout(r, wait));
      } else {
        throw err;
      }
    }
  }
}
```
## Best practices
**Watch the headers** — check `X-RateLimit-Remaining` before it hits zero:

```python
# After each request, inspect remaining quota
response = client.chat.completions.create(...)
# Access via response.headers if using raw HTTP, or monitor in the dashboard
```
**Exponential backoff** — don't hammer the API after a 429. Start at 1s, double each retry.

**Batch when possible** — one request with multiple items is better than many individual requests.

**Cache repeated prompts** — if your application re-sends identical prompts and can reuse an earlier answer, cache at the application layer to avoid redundant calls.
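For deterministic lookups, an in-memory cache keyed on `(model, prompt)` is often enough. A minimal sketch using `functools.lru_cache`, with a stub in place of the real API call so it runs offline; `call_api` and `cached_completion` are illustrative names:

```python
from functools import lru_cache

calls = {"count": 0}


def call_api(model, prompt):
    # Stand-in for a real API request, so the sketch runs offline.
    calls["count"] += 1
    return f"response to {prompt!r} from {model}"


@lru_cache(maxsize=1024)
def cached_completion(model, prompt):
    """Identical (model, prompt) pairs hit the cache instead of the API."""
    return call_api(model, prompt)


for _ in range(3):
    cached_completion("gpt-4o", "Summarize rate limiting in one line.")
print(calls["count"])  # 1 — only one real request despite 3 identical calls
```

Note this only makes sense when a reused answer is acceptable; sampled completions are not guaranteed identical across calls.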
## Limits and edge cases
- Streaming requests count as one request, regardless of stream duration.
- Failed requests (4xx, 5xx) still count toward your rate limit.
- Rate limits are per API key, not per account. Create multiple keys to distribute load across services.
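Because limits are per key, a service with several workers can rotate keys so no single key absorbs all the traffic. A minimal round-robin sketch (the key values are placeholders):

```python
from itertools import cycle

# Placeholder keys; in practice, one key per service or worker.
keys = cycle(["sk-aqta-key-a", "sk-aqta-key-b", "sk-aqta-key-c"])


def next_key():
    """Round-robin over the available API keys."""
    return next(keys)


picked = [next_key() for _ in range(4)]
print(picked)  # wraps back to the first key on the fourth call
```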
## Upgrading
Visit app.aqta.ai/pricing to upgrade. New limits apply immediately — no downtime.
Need a temporary limit increase for a launch or batch job? Email hello@aqta.ai.
## Next steps
- Authentication — get your API key
- API Endpoints — available endpoints
- Pricing — compare tiers
Questions? hello@aqta.ai