What are the default Perplexity Sonar API rate limits?

The Perplexity Sonar API defaults to 50 requests per minute (RPM) on the base tier. There is also a daily request cap that varies by account tier — base accounts typically see 1,000 requests per day. These limits apply across all Sonar models (sonar, sonar-pro, sonar-reasoning, sonar-deep-research). Your tier and exact limits are visible in the Perplexity developer dashboard at perplexity.ai/api. Higher tiers with increased RPM and daily caps are available by request.

What does a Perplexity Sonar 429 error mean?

A 429 Too Many Requests response means you have exceeded one of two limits: your per-minute request rate (RPM) or your daily request quota. The response body typically includes a message field explaining which limit was hit, and the response headers typically include Retry-After, which gives the number of seconds to wait before retrying. If the Retry-After header is missing, wait at least 60 seconds after hitting the RPM limit, or until the next UTC day after hitting the daily cap.
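The fallback rule for a missing Retry-After header can be sketched as a small helper. This is a minimal illustration, not part of any Perplexity SDK: the function name and the "rpm" / "daily" labels are hypothetical, and it assumes the daily quota resets at midnight UTC as described in this guide.

```python
from datetime import datetime, timedelta, timezone

def fallback_wait_seconds(limit_kind, now=None):
    """Seconds to wait when a 429 arrives without a Retry-After header.

    limit_kind is a hypothetical label: "rpm" or "daily".
    now must be a timezone-aware UTC datetime (defaults to the current time).
    """
    if limit_kind == "rpm":
        # Per-minute limit: one full minute is the conservative wait.
        return 60.0
    if limit_kind == "daily":
        # Daily cap: wait until the next midnight UTC (assumed reset time).
        now = now or datetime.now(timezone.utc)
        next_midnight = (now + timedelta(days=1)).replace(
            hour=0, minute=0, second=0, microsecond=0
        )
        return (next_midnight - now).total_seconds()
    raise ValueError(f"unknown limit kind: {limit_kind!r}")
```

For a daily-cap 429 received at 23:00 UTC, the helper returns 3600 seconds.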

How do I implement exponential backoff for Perplexity Sonar 429 errors?

Exponential backoff means waiting progressively longer between retries after each failed request. A standard implementation starts with a 1-second wait after the first 429, doubles to 2 seconds after the second failure, 4 seconds after the third, and so on up to a maximum wait (typically 60-120 seconds). Add random jitter (a small random value like 0-500ms) to each wait time to prevent multiple clients from retrying in sync. Most HTTP client libraries have built-in retry logic you can configure, or you can implement it manually with a retry loop.
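As a rough sketch of that schedule, the helper below computes the wait before each retry. The function name and the 60-second cap are illustrative choices (the cap is the low end of the 60-120 second range mentioned above), not values mandated by the API.

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0, max_jitter=0.5, rng=random):
    """Delay in seconds before retry number `attempt` (0-based).

    Doubles from `base` on every attempt, never exceeds `cap`,
    then adds up to `max_jitter` seconds of random jitter.
    """
    delay = min(base * (2 ** attempt), cap)
    return delay + rng.uniform(0, max_jitter)

# Deterministic part of the schedule (jitter disabled) for the first 8 retries.
schedule = [backoff_delay(n, max_jitter=0.0) for n in range(8)]
print(schedule)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

With jitter enabled, each value grows by a random amount between 0 and 0.5 seconds, which keeps separate clients from retrying at the same instant.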

How do I request a higher Perplexity Sonar API rate limit?

Go to the Perplexity developer dashboard at perplexity.ai/api, navigate to the rate limits or quota section, and look for a 'Request limit increase' option. You will typically need to describe your use case, expected request volume, and the tier you need. Perplexity evaluates increases case by case. If you need higher limits urgently, contact Perplexity developer support at support.perplexity.ai with your account email, current usage metrics, and target RPM requirements.

Does using a more expensive Sonar model increase my rate limits?

Not automatically. Rate limits on the Sonar API are tied to your account tier, not to the specific model you call. The sonar-pro, sonar-reasoning, and sonar-deep-research models each have higher per-call costs in tokens and dollars than the base sonar model, but they share the same RPM and daily request limits unless you have negotiated a higher tier. Because every call counts equally toward the quota, switching models does not free up requests. If you are consistently hitting limits, still consider whether some use cases can be served by the lighter sonar model: that reduces per-call latency and cost for the requests you can make within quota.

What is the difference between RPM limits and daily request limits on the Sonar API?

RPM (requests per minute) limits control how fast you can send requests in a short burst — exceeding 50 RPM triggers a 429 immediately. The daily request limit is a total quota for a 24-hour period (reset at midnight UTC); once you exhaust it, all further requests fail with 429 until the next day. To avoid both limits, design your application to spread requests evenly across the day (rather than bursting) and implement RPM throttling with a rate-limiting library like `ratelimit` in Python or `bottleneck` in Node.js.
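To make the two limits concrete, here is a minimal local tracker that reports which limit the next request would hit. It is a client-side estimate only: the class name is hypothetical, the 50 and 1,000 defaults come from this guide, and the server's real accounting (for example rolling versus fixed windows) may differ.

```python
import time
from collections import deque

class DualLimitTracker:
    """Tracks a rolling per-minute window and a per-UTC-day counter locally."""

    def __init__(self, rpm_limit=50, daily_limit=1000, clock=time.time):
        self.rpm_limit = rpm_limit
        self.daily_limit = daily_limit
        self.clock = clock
        self.recent = deque()  # timestamps of requests sent in the last 60 s
        self.day = None        # index of the current UTC day
        self.day_count = 0

    def check(self):
        """Return "ok", "rpm", or "daily" without recording a request."""
        now = self.clock()
        day = int(now // 86400)  # Unix time is UTC-based: rolls over at midnight UTC
        if day != self.day:
            self.day, self.day_count = day, 0
        while self.recent and now - self.recent[0] >= 60:
            self.recent.popleft()
        if self.day_count >= self.daily_limit:
            return "daily"
        if len(self.recent) >= self.rpm_limit:
            return "rpm"
        return "ok"

    def record(self):
        """Call once for every request actually sent."""
        self.check()  # refresh the window and the day counter first
        self.recent.append(self.clock())
        self.day_count += 1
```

Call check() before sending: "rpm" means waiting up to a minute is enough, while "daily" means no request will succeed until the quota resets.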

Perplexity Sonar API Rate Limit: Fix 429 Errors and Scale Your Usage

Step-by-Step Fix

1. Read the 429 error response carefully before taking action

When the Sonar API returns a 429 error, the response headers and body together indicate which limit you hit and how long to wait.

Check the Retry-After header in the HTTP response. This tells you exactly how many seconds to wait before retrying.
Read the message field in the JSON response body. It will typically indicate whether you hit the per-minute RPM limit or the daily quota.
If the message references "rate limit" or "too many requests" with a short window, you hit the RPM cap. If it references "daily quota" or "quota exceeded," you hit the 24-hour cap.
Do not retry immediately. A retry within the rate limit window will return another 429 and may count against your quota depending on implementation.
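The checks above can be sketched as one function that inspects a 429 response. This is an illustration under stated assumptions: the function name is hypothetical, the body is assumed to be a parsed JSON object with a top-level message field as described in this guide, and the keyword matching mirrors the phrases listed above rather than a documented error format.

```python
def classify_429(headers, body):
    """Return (limit_kind, wait_seconds) for a 429 response.

    headers: mapping of response headers; body: parsed JSON body (dict).
    limit_kind is "daily", "rpm", or "unknown". wait_seconds is None when
    Retry-After is absent or is not a plain number of seconds.
    """
    message = str(body.get("message", "")).lower()
    if "daily quota" in message or "quota exceeded" in message:
        kind = "daily"
    elif "rate limit" in message or "too many requests" in message:
        kind = "rpm"
    else:
        kind = "unknown"

    wait = None
    raw = headers.get("Retry-After")
    if raw is not None:
        try:
            wait = float(raw)
        except ValueError:
            wait = None  # e.g. an HTTP-date value; not handled in this sketch
    return kind, wait
```

With the requests library, passing response.headers works regardless of header capitalization because that mapping is case-insensitive; a plain dict needs the exact "Retry-After" key.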

2. Implement exponential backoff with jitter in your code

Retrying immediately after a 429 is the single most common mistake. Implement proper backoff logic so your application recovers automatically.

Python example with exponential backoff:

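import os
import random
import time

import requests

API_KEY = os.environ["PERPLEXITY_API_KEY"]  # assumed environment variable name

def call_sonar_api(payload, max_retries=5, base_delay=1.0, max_delay=60.0):
    for attempt in range(max_retries):
        response = requests.post(
            "https://api.perplexity.ai/chat/completions",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json=payload,
            timeout=30,
        )
        if response.status_code == 200:
            return response.json()
        if response.status_code != 429:
            # Not a rate limit: surface the HTTP error instead of retrying.
            response.raise_for_status()
            raise RuntimeError(f"Unexpected status {response.status_code}")
        # Exponential backoff: 1s, 2s, 4s, ... capped at max_delay.
        wait = min(base_delay * (2 ** attempt), max_delay)
        try:
            # A numeric Retry-After header overrides the computed backoff.
            wait = float(response.headers["Retry-After"])
        except (KeyError, ValueError):
            pass  # header missing or not a number of seconds
        wait += random.uniform(0, 0.5)  # jitter to desynchronize clients
        print(f"Rate limited. Waiting {wait:.1f}s before retry {attempt + 1}")
        time.sleep(wait)
    raise RuntimeError("Max retries exceeded")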

Key principles:

Start with at least 1 second after the first failure.
Double the wait time after each subsequent failure (1s, 2s, 4s, 8s, 16s...).
Add random jitter (0 to 500ms) to prevent thundering herd problems when multiple clients retry simultaneously.
Respect the Retry-After header value when present — it overrides your backoff calculation.
Set a maximum retry limit (5 to 7 attempts is typical) to avoid infinite loops.

3. Throttle your request rate proactively to stay under 50 RPM

Instead of letting 429 errors occur and then backing off, design your application to never exceed the rate limit in the first place.

Calculate your safe rate: 50 RPM = 1 request per 1.2 seconds. Add a small safety margin and target 1 request per 1.5 seconds (40 RPM effective rate).
Use a rate-limiting library: In Python, use ratelimit for throttling (tenacity is a retry library, so it suits the backoff logic rather than the throttling). In Node.js, use bottleneck or p-throttle. These handle the timing automatically.
Queue requests sequentially for batch jobs: If you are processing many items offline, use a simple queue that dispatches one request at a time with a fixed delay rather than sending all requests concurrently.
Monitor your request timing: Log timestamps of every API call. Review logs to identify burst patterns that are triggering 429s.

Node.js example with Bottleneck:

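const Bottleneck = require('bottleneck');

const API_KEY = process.env.PERPLEXITY_API_KEY; // assumed environment variable name

// Space requests 1.5 seconds apart (~40 RPM), with one request in flight at a time.
const limiter = new Bottleneck({
  minTime: 1500,
  maxConcurrent: 1
});

const safeApiCall = limiter.wrap(async (payload) => {
  const response = await fetch('https://api.perplexity.ai/chat/completions', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${API_KEY}`, 'Content-Type': 'application/json' },
    body: JSON.stringify(payload)
  });
  if (!response.ok) {
    // Surface 429s and other failures instead of returning an error body as a result.
    throw new Error(`Sonar API request failed with status ${response.status}`);
  }
  return response.json();
});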
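
For Python callers, the same fixed spacing can be sketched without a third-party library. The class below is a hypothetical, minimal throttle (not the API of ratelimit or any other package): each call to wait() blocks until at least min_interval seconds have passed since the previously reserved slot.

```python
import threading
import time

class MinIntervalThrottle:
    """Spaces calls at least `min_interval` seconds apart, across threads."""

    def __init__(self, min_interval=1.5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.lock = threading.Lock()
        self.next_allowed = 0.0  # earliest time the next call may start

    def wait(self):
        """Sleep until the next slot is free, reserve it, and return the delay."""
        with self.lock:
            now = self.clock()
            delay = max(0.0, self.next_allowed - now)
            # Reserve the slot after this one before releasing the lock.
            self.next_allowed = max(now, self.next_allowed) + self.min_interval
        if delay:
            self.sleep(delay)
        return delay
```

Create one throttle per API key, call throttle.wait() immediately before each request, and the effective rate stays at or below 40 RPM with the default 1.5-second interval.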

4. Spread batch workloads across the full day to avoid daily quota exhaustion

If your application processes large batches, spread requests at a steady rate throughout the day rather than running everything in one burst.

Divide your daily request budget by 24 hours to find a safe hourly rate. For a 1,000 requests/day limit: 1,000 ÷ 24 ≈ 41 requests per hour, or roughly 1 request every 86 seconds (86,400 seconds ÷ 1,000 requests).
Schedule batch jobs with a cron that staggers processing: run smaller batches every few hours rather than one large batch at midnight.
If your use case allows, cache Sonar API responses for identical or near-identical queries. Serving cached results does not consume API quota.
For non-time-sensitive batch processing, consider running overnight across multiple UTC days rather than exhausting the full daily cap in one session.
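
The pacing arithmetic and the caching idea above can be sketched together. Everything here is illustrative: the function and class names are hypothetical, the cache is an in-memory dict with no expiry, and only byte-identical payloads (after key sorting) are treated as the same query.

```python
import hashlib
import json

SECONDS_PER_DAY = 24 * 60 * 60

def pacing_interval(daily_limit, reserve=0.0):
    """Seconds between requests that spreads the usable budget over 24 hours.

    reserve is the fraction of the quota held back for ad-hoc traffic.
    """
    usable = daily_limit * (1.0 - reserve)
    if usable <= 0:
        raise ValueError("no usable quota")
    return SECONDS_PER_DAY / usable

def cache_key(payload):
    """Stable key for a payload, so identical queries share one cache entry."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

class CachedClient:
    """Wraps any `send(payload)` callable and reuses responses for repeats."""

    def __init__(self, send):
        self.send = send
        self.cache = {}
        self.api_calls = 0  # requests that actually consumed quota

    def request(self, payload):
        key = cache_key(payload)
        if key not in self.cache:
            self.cache[key] = self.send(payload)
            self.api_calls += 1
        return self.cache[key]
```

pacing_interval(1000) gives 86.4 seconds between requests, and wrapping a request function in CachedClient means a repeated query is served from memory without touching the daily quota. Because Sonar answers draw on live web results, a production cache would likely need an expiry time, which this sketch omits.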

5. Check your current usage in the Perplexity developer dashboard

Before assuming the problem is your code, verify your actual usage metrics.

Log into your Perplexity account and navigate to the API settings or developer dashboard at perplexity.ai/api.
Look for a usage dashboard that shows requests made today, remaining daily quota, and current RPM tier.
If you have multiple applications sharing the same API key, the combined traffic from all of them counts toward a single account's limits. Consider splitting large workloads across multiple API keys if your organization has multiple accounts.
Note your current tier's RPM and daily limits so you have a concrete target to design around.

6. Request a rate limit increase for production workloads

If your legitimate production use case consistently requires more than 50 RPM or more than 1,000 requests per day, apply for a higher tier.

In the developer dashboard, look for a "Request limit increase" or "Contact sales" option.
Prepare a brief description of your use case, expected peak RPM, daily request volume, and the application you are building.
Include your current API usage metrics as evidence.
For enterprise or high-volume use cases, contact Perplexity support directly at support.perplexity.ai to discuss custom tier arrangements.
Allow 2 to 5 business days for limit increase requests to be reviewed and approved.

Why This Happens

Perplexity Sonar API rate limits exist because each API call consumes real compute resources — especially Sonar-Pro, Sonar-Reasoning, and Sonar-Deep-Research, which invoke much larger models and more extensive web retrieval pipelines. Without per-account limits, a single high-volume API user could monopolize infrastructure and degrade service quality for all other users.

The 50 RPM default is calibrated for typical developer and small-production workloads. Most interactive applications rarely exceed 5 to 10 RPM in real usage; the limit is primarily relevant for batch processing pipelines, automated content generation systems, or applications that fire multiple simultaneous requests per user action. The daily cap (which resets at midnight UTC) provides an additional safety net against runaway processes that might otherwise exhaust infrastructure capacity in a single burst.

Common Mistakes to Avoid

Retrying immediately after a 429. A zero-delay retry will hit the same limit and return another 429. Always wait at least the Retry-After duration specified in the response header.
Sending all batch requests at once. Even if your batch fits within the daily quota, sending more than 50 requests within the same minute will exceed the RPM limit. Throttle to about 40 RPM with a rate-limiting library.
Sharing one API key across multiple production apps without tracking combined usage. If two applications each think they have 50 RPM available, they will collectively exceed the 50 RPM account limit and trigger 429 errors seemingly at random.
Not logging API responses during development. Without logs, you cannot tell whether failures are 429s (rate limit) or 5xx errors (service issue). Log status codes and response bodies for every call.
Ignoring the daily quota until it is exhausted. Monitor your daily usage proactively via the developer dashboard. Build an alert that fires when daily usage exceeds 80% of quota so you can slow down before the quota runs out.
Assuming a 429 means the API is down. A 429 is a 4xx client error status that signals throttling, not a server error. The Perplexity Sonar API service is functioning normally; your account has simply exceeded its rate limit. Do not report rate limit errors as outages.
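
The 80% alert suggested in the list above can be sketched as a pure function over a usage counter that your application keeps itself. The function name and message wording are hypothetical; how the warning is delivered (log line, email, pager) is left to the caller.

```python
def quota_alert(used, daily_limit, threshold=0.8):
    """Return a warning string once usage crosses `threshold`, else None."""
    if daily_limit <= 0:
        raise ValueError("daily_limit must be positive")
    fraction = used / daily_limit
    if fraction >= 1.0:
        return f"daily quota exhausted ({used}/{daily_limit})"
    if fraction >= threshold:
        return f"daily quota at {fraction:.0%} ({used}/{daily_limit})"
    return None

print(quota_alert(100, 1000))   # None
print(quota_alert(800, 1000))   # daily quota at 80% (800/1000)
```

Checking this after every request, and slowing the dispatch rate once it returns a warning, leaves headroom for the rest of the UTC day.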
