Step-by-Step Fix
1. Read the 429 error response carefully before taking action
When the Sonar API returns a 429 error, the response body contains specific information about which limit you hit and how long to wait.
- Check the Retry-After header in the HTTP response. When present, it tells you how long to wait (usually in seconds) before retrying.
- Read the message field in the JSON response body. It will typically indicate whether you hit the per-minute (RPM) limit or the daily quota.
- If the message references "rate limit" or "too many requests" with a short window, you hit the RPM cap. If it references "daily quota" or "quota exceeded," you hit the 24-hour cap.
- Do not retry immediately. A retry within the rate limit window will return another 429 and may count against your quota depending on implementation.
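These checks can be scripted. The sketch below is a minimal Python example. It assumes that the error text sits at body["error"]["message"] or body["message"], and that the keywords above appear in it. Neither is confirmed here, so compare against a real 429 from your account. Retry-After is parsed in both forms HTTP allows: a number of seconds or an HTTP date.

```python
import email.utils
from datetime import datetime, timezone


def parse_retry_after(value, now=None):
    """Convert a Retry-After header value into seconds to wait, or None if unusable.

    HTTP allows either delay-seconds ("30") or an HTTP date
    ("Wed, 21 Oct 2015 07:28:30 GMT"); both forms are handled.
    """
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # neither seconds nor a parseable date
    if when.tzinfo is None:  # treat zone-less dates as UTC
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def classify_429(body):
    """Guess which limit was hit from the error message (keyword heuristic).

    ASSUMPTION: the message is at body["error"]["message"] or body["message"];
    adjust to the actual shape of the responses you receive.
    """
    message = ""
    if isinstance(body, dict):
        error = body.get("error")
        if isinstance(error, dict):
            message = str(error.get("message", ""))
        else:
            message = str(body.get("message", ""))
    text = message.lower()
    if "daily" in text or "quota" in text:
        return "daily_quota"
    if "rate limit" in text or "too many requests" in text:
        return "rpm"
    return "unknown"
```

If classify_429 returns "daily_quota", retrying within minutes will not help. Pause the job until the daily reset instead.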
2. Implement exponential backoff with jitter in your code
Retrying immediately after a 429 is the single most common mistake. Implement proper backoff logic so your application recovers automatically.
Python example with exponential backoff:
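```python
import os
import random
import time

import requests

# Hypothetical variable name: load the key however your application stores secrets.
API_KEY = os.environ["PERPLEXITY_API_KEY"]

def call_sonar_api(payload, max_retries=5):
    base_delay = 1.0
    for attempt in range(max_retries):
        response = requests.post(
            "https://api.perplexity.ai/chat/completions",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json=payload,
            timeout=120,  # seconds; avoids hanging forever on a stalled connection
        )
        if response.ok:
            return response.json()
        elif response.status_code == 429:
            # Default: exponential backoff (1s, 2s, 4s, 8s, ...)
            delay = base_delay * (2 ** attempt)
            retry_after = response.headers.get("Retry-After")
            if retry_after is not None:
                try:
                    delay = float(retry_after)  # the server's hint in seconds wins
                except ValueError:
                    pass  # HTTP-date form: keep the computed backoff
            jitter = random.uniform(0, 0.5)
            wait = delay + jitter
            print(f"Rate limited. Waiting {wait:.1f}s before retry {attempt + 1}")
            time.sleep(wait)
        else:
            response.raise_for_status()  # any other 4xx/5xx: fail fast
    raise RuntimeError("Max retries exceeded")
```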
Key principles:
- Start with at least 1 second after the first failure.
- Double the wait time after each subsequent failure (1s, 2s, 4s, 8s, 16s...).
- Add random jitter (0 to 500ms) to prevent thundering herd problems when multiple clients retry simultaneously.
- Respect the Retry-After header value when present; it overrides your backoff calculation.
- Set a maximum retry limit (5 to 7 attempts is typical) to avoid infinite loops.
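The principles above can also be separated from any particular HTTP client so the schedule can be tested on its own. This is a sketch, not a Perplexity-provided helper. The send callable and its (status, headers, body) return shape are hypothetical; adapt them to your client. It also skips the pointless sleep before the final failure.

```python
import random
import time


class RateLimitError(Exception):
    """Raised when every attempt was answered with HTTP 429."""


def backoff_delay(attempt, retry_after=None, base=1.0, max_jitter=0.5, rng=random):
    """Seconds to wait before retry number `attempt` (0-based).

    Doubles each attempt (1s, 2s, 4s, 8s, 16s, ...) unless the server sent a
    numeric Retry-After, which overrides the schedule. Jitter in [0, max_jitter]
    is added either way so that many clients do not retry in lockstep.
    """
    delay = base * (2 ** attempt)
    if retry_after is not None:
        try:
            delay = float(retry_after)
        except ValueError:
            pass  # e.g. an HTTP-date value: keep the exponential delay
    return delay + rng.uniform(0, max_jitter)


def retry_on_429(send, max_retries=5, sleep=time.sleep, rng=random):
    """Call `send()` until it returns a non-429 status or attempts run out.

    `send` is a hypothetical zero-argument callable returning
    (status_code, headers_dict, body).
    """
    for attempt in range(max_retries):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_retries - 1:
            break  # no point sleeping before giving up
        sleep(backoff_delay(attempt, headers.get("Retry-After"), rng=rng))
    raise RateLimitError(f"Still rate limited after {max_retries} attempts")
```

Injecting sleep and rng keeps the logic deterministic in tests. In production, the defaults call time.sleep and the random module.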
3. Throttle your request rate proactively to stay under 50 RPM
Instead of letting 429 errors occur and then backing off, design your application to never exceed the rate limit in the first place.
- Calculate your safe rate: 50 RPM = 1 request per 1.2 seconds. Add a small safety margin and target 1 request per 1.5 seconds (40 RPM effective rate).
- Use a rate-limiting library: In Python, ratelimit can cap calls per time period, and tenacity can wrap calls with retry-and-backoff logic. In Node.js, use bottleneck or p-throttle. These handle the timing automatically.
- Queue requests sequentially for batch jobs: If you are processing many items offline, use a simple queue that dispatches one request at a time with a fixed delay rather than sending all requests concurrently.
- Monitor your request timing: Log timestamps of every API call. Review logs to identify burst patterns that are triggering 429s.
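Proactive throttling does not strictly require a library. Below is a minimal Python sketch of fixed-interval pacing, similar in spirit to Bottleneck's minTime setting. MinIntervalThrottle is a hypothetical helper, not part of any package. It blocks each caller until at least `interval` seconds have passed since the previous request started.

```python
import threading
import time


class MinIntervalThrottle:
    """Space out calls so each starts at least `interval` seconds after the previous one.

    With interval=1.5 this caps throughput at 40 requests per minute.
    Callers block inside wait(); the lock serializes concurrent callers.
    """

    def __init__(self, interval=1.5, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock  # injectable so pacing can be tested without real waits
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = None

    def wait(self):
        """Block until the next request may start; return how long we slept."""
        with self._lock:
            now = self._clock()
            if self._next_allowed is None:
                self._next_allowed = now  # first call goes out immediately
            delay = max(0.0, self._next_allowed - now)
            if delay > 0:
                self._sleep(delay)
            # The next slot is one interval after this request's start time.
            self._next_allowed = max(now, self._next_allowed) + self.interval
            return delay
```

Usage: create one shared MinIntervalThrottle(1.5) per API key and call throttle.wait() immediately before each request, for example before each call_sonar_api(payload). The backoff logic then only has to handle the occasional 429 that still slips through.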
Node.js example with Bottleneck:
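```javascript
const Bottleneck = require('bottleneck');
// Example variable name; load the key however your application stores secrets.
const API_KEY = process.env.PERPLEXITY_API_KEY;
const limiter = new Bottleneck({
  minTime: 1500, // 1.5 seconds between requests = ~40 RPM
  maxConcurrent: 1 // never more than one request in flight
});
// Global fetch requires Node.js 18 or later.
const safeApiCall = limiter.wrap(async (payload) => {
  const response = await fetch('https://api.perplexity.ai/chat/completions', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${API_KEY}`, 'Content-Type': 'application/json' },
    body: JSON.stringify(payload)
  });
  // fetch does not reject on HTTP errors, so surface 429s and other failures explicitly.
  if (!response.ok) {
    throw new Error(`Sonar API returned ${response.status}: ${await response.text()}`);
  }
  return response.json();
});
```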
4. Spread batch workloads across the full day to avoid daily quota exhaustion
If your application processes large batches, spread requests at a steady rate throughout the day rather than running everything in one burst.
- Divide your daily request budget by 24 hours to find a safe hourly rate. For a 1,000 requests/day limit: 1,000 ÷ 24 ≈ 41.7 requests per hour, or roughly 1 request every 86 seconds (86,400 seconds ÷ 1,000).
- Schedule batch jobs with a cron job that staggers processing: run smaller batches every few hours rather than one large batch at midnight.
- If your use case allows, cache Sonar API responses for identical or near-identical queries. Serving cached results does not consume API quota.
- For non-time-sensitive batch processing, consider splitting the job across multiple UTC days rather than exhausting the full daily cap in one session.
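The pacing arithmetic and the caching advice above can be sketched in a few lines of Python. The helper names are hypothetical. The cache is a plain dict that treats payloads as identical when their JSON content matches regardless of key order. It does not detect "near-identical" queries, and it never expires entries. Because Sonar answers draw on live web results, a real deployment would likely add a time-to-live.

```python
import hashlib
import json


def seconds_between_requests(daily_limit, hours=24):
    """Even spacing that spends `daily_limit` requests over `hours` hours."""
    return hours * 3600 / daily_limit


def cache_key(payload):
    """Stable key: same payload content -> same key, whatever the dict key order."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cached_call(payload, call_fn, cache):
    """Return the cached response for an identical payload; otherwise call once and store it.

    `call_fn` is whatever function actually hits the API (e.g. call_sonar_api);
    `cache` is any dict-like store.
    """
    key = cache_key(payload)
    if key not in cache:
        cache[key] = call_fn(payload)  # only cache misses consume quota
    return cache[key]
```

With a 1,000-request daily budget, seconds_between_requests(1000) gives 86.4 seconds, matching the figure above.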
5. Check your current usage in the Perplexity developer dashboard
Before assuming the problem is your code, verify your actual usage metrics.
- Log into your Perplexity account and navigate to the API settings or developer dashboard at perplexity.ai/api.
- Look for a usage dashboard that shows requests made today, remaining daily quota, and current RPM tier.
- If you have multiple applications sharing the same API key, the combined traffic from all of them counts toward a single account's limits. Because the limits apply per account, extra keys on the same account do not add capacity; splitting large workloads only helps if your organization has separate accounts.
- Note your current tier's RPM and daily limits so you have a concrete target to design around.
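The dashboard is the authoritative source, but a local counter helps when several jobs share a key. The Python sketch below is a hypothetical helper, not a Perplexity API. It counts requests per UTC day, assuming the daily cap resets at midnight UTC as described later in this article, and flags when usage crosses a threshold. The daily_limit and alert_ratio defaults are placeholders; set them from your tier. It only sees traffic that passes through this process.

```python
from datetime import datetime, timedelta, timezone


class DailyUsageTracker:
    """Count requests per UTC day and report when usage reaches an alert threshold."""

    def __init__(self, daily_limit=1000, alert_ratio=0.8):
        self.daily_limit = daily_limit  # placeholder: use your tier's real limit
        self.alert_ratio = alert_ratio
        self.count = 0
        self._day = None

    def record(self, now=None):
        """Record one request (now must be timezone-aware); return True at/above the threshold."""
        if now is None:
            now = datetime.now(timezone.utc)
        day = now.astimezone(timezone.utc).date()
        if day != self._day:  # a new UTC day: start counting from zero
            self._day = day
            self.count = 0
        self.count += 1
        return self.count >= self.alert_ratio * self.daily_limit

    def remaining(self):
        """Requests left today, as of the most recent record() call."""
        return max(0, self.daily_limit - self.count)


def seconds_until_utc_midnight(now=None):
    """How long until the next UTC midnight, i.e. the assumed quota reset."""
    if now is None:
        now = datetime.now(timezone.utc)
    now = now.astimezone(timezone.utc)
    tomorrow = (now + timedelta(days=1)).date()
    midnight = datetime(tomorrow.year, tomorrow.month, tomorrow.day, tzinfo=timezone.utc)
    return (midnight - now).total_seconds()
```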
6. Request a rate limit increase for production workloads
If your legitimate production use case consistently requires more than 50 RPM or more than 1,000 requests per day, apply for a higher tier.
- In the developer dashboard, look for a "Request limit increase" or "Contact sales" option.
- Prepare a brief description of your use case, expected peak RPM, daily request volume, and the application you are building.
- Include your current API usage metrics as evidence.
- For enterprise or high-volume use cases, contact Perplexity directly through support.perplexity.ai to discuss custom tier arrangements.
- Allow 2 to 5 business days for limit increase requests to be reviewed and approved.
Why This Happens
Perplexity Sonar API rate limits exist because each API call consumes real compute resources — especially Sonar-Pro, Sonar-Reasoning, and Sonar-Deep-Research, which invoke much larger models and more extensive web retrieval pipelines. Without per-account limits, a single high-volume API user could monopolize infrastructure and degrade service quality for all other users.
The 50 RPM default is calibrated for typical developer and small-production workloads. Most interactive applications rarely exceed 5 to 10 RPM in real usage; the limit is primarily relevant for batch processing pipelines, automated content generation systems, or applications that fire multiple simultaneous requests per user action. The daily cap (which resets at midnight UTC) provides an additional safety net against runaway processes that might otherwise exhaust infrastructure capacity in a single burst.
Common Mistakes to Avoid
- Retrying immediately after a 429. A zero-delay retry will hit the same limit and return another 429. Always wait at least the Retry-After duration specified in the response header.
- Sending all batch requests at once. Even if your batch is small enough to fit within the daily quota, sending it all in a few seconds will exceed the 50 RPM limit. Always throttle to 40 RPM or below with a rate-limiting library.
- Sharing one API key across multiple production apps without tracking combined usage. If two applications each think they have 50 RPM available, they will collectively exceed the 50 RPM account limit and trigger 429 errors seemingly at random.
- Not logging API responses during development. Without logs, you cannot tell whether failures are 429s (rate limit) or 5xx errors (service issue). Log status codes and response bodies for every call.
- Ignoring the daily quota until it is exhausted. Monitor your daily usage proactively via the developer dashboard. Build an alert that fires when daily usage exceeds 80% of quota so you can slow down before hitting zero.
- Assuming a 429 means the API is down. A 429 is a 4xx client error: the server is deliberately throttling your account, not failing. The Perplexity Sonar API is functioning normally; your account has simply exceeded its rate limit. Do not report rate limit errors as outages.
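To tell rate limiting apart from genuine service problems in your logs, it helps to tag every response with a coarse category. The Python sketch below uses only the standard logging module. The logger name, the JSON log format, and the 500-character body truncation are illustrative choices, not anything Perplexity prescribes.

```python
import json
import logging

logger = logging.getLogger("sonar_api")  # hypothetical logger name


def classify_status(status_code):
    """Map an HTTP status code to a coarse category for logs and alerts."""
    if status_code == 429:
        return "rate_limited"  # your account hit a limit: back off, not an outage
    if 500 <= status_code < 600:
        return "server_error"  # service-side problem: possibly an outage
    if 400 <= status_code < 500:
        return "client_error"  # bad request, auth problem, etc.
    if 200 <= status_code < 300:
        return "ok"
    return "other"


def log_api_response(status_code, body_text, elapsed_s):
    """Log one API call as a JSON line and return its category."""
    category = classify_status(status_code)
    level = logging.INFO if category == "ok" else logging.WARNING
    logger.log(level, json.dumps({
        "status": status_code,
        "category": category,
        "elapsed_s": round(elapsed_s, 3),
        "body": body_text[:500],  # truncate to keep log lines manageable
    }))
    return category
```

Counting "rate_limited" versus "server_error" entries over time shows quickly whether failures come from your own request pattern or from the service.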