Perplexity Sonar API Rate Limit: Fix 429 Errors and Scale Your Usage

Quick Answer

Perplexity Sonar API defaults to 50 requests per minute (RPM) and 1,000 requests per day on the base tier. When you exceed either limit, the API returns a 429 Too Many Requests error. Fix it by implementing exponential backoff with a minimum 1-second delay between retries, reducing request concurrency, and applying for a higher-tier limit increase through the Perplexity developer portal.

Step-by-Step Fix

1. Read the 429 error response carefully before taking action

When the Sonar API returns a 429 error, the response headers and body indicate which limit you hit and how long to wait.

  • Check the Retry-After header in the HTTP response. This tells you exactly how many seconds to wait before retrying.
  • Read the message field in the JSON response body. It will typically indicate whether you hit the per-minute RPM limit or the daily quota.
  • If the message references "rate limit" or "too many requests" with a short window, you hit the RPM cap. If it references "daily quota" or "quota exceeded," you hit the 24-hour cap.
  • Do not retry immediately. A retry within the rate limit window will return another 429 and, depending on how rejected requests are counted, may still count against your quota.
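
The triage above can be sketched in Python. This is a minimal sketch under stated assumptions: the names classify_429 and parse_retry_after are hypothetical, the keyword matching mirrors the message wording described in this guide rather than a documented Perplexity error schema, and the Retry-After header is assumed to carry either a number of seconds or an HTTP date.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Return seconds to wait from a Retry-After value, or None if unusable."""
    if value is None:
        return None
    value = value.strip()
    try:
        return max(0.0, float(value))  # form 1: delay in seconds
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)  # form 2: HTTP date
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def classify_429(headers, body, now=None):
    """Return (kind, wait_seconds); kind is 'rpm', 'daily', or 'unknown'."""
    # Assumption: the error text sits in a top-level "message" field.
    message = str(body.get("message", "")).lower() if isinstance(body, dict) else ""
    if "quota" in message or "daily" in message:
        kind = "daily"
    elif "rate limit" in message or "too many requests" in message:
        kind = "rpm"
    else:
        kind = "unknown"
    return kind, parse_retry_after(headers.get("Retry-After"), now)
```

A caller would typically back off and retry on "rpm", and stop the job until the next reset on "daily". If the real error body nests the message differently, only the lookup line needs to change.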

2. Implement exponential backoff with jitter in your code

Retrying immediately after a 429 is the single most common mistake. Implement proper backoff logic so your application recovers automatically.

Python example with exponential backoff:

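import os
import random
import time

import requests

# Assumes the key is stored in an environment variable of this name.
API_KEY = os.environ["PERPLEXITY_API_KEY"]

def call_sonar_api(payload, max_retries=5):
    base_delay = 1.0
    for attempt in range(max_retries):
        response = requests.post(
            "https://api.perplexity.ai/chat/completions",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json=payload,
            timeout=60,
        )
        if response.status_code == 200:
            return response.json()
        if response.status_code != 429:
            # Not a rate limit problem: surface the error instead of retrying.
            response.raise_for_status()
            raise RuntimeError(f"Unexpected status {response.status_code}")
        # Exponential backoff: 1s, 2s, 4s, 8s, 16s
        delay = base_delay * (2 ** attempt)
        try:
            # Prefer the server's Retry-After value when it is a number of seconds.
            delay = float(response.headers["Retry-After"])
        except (KeyError, ValueError):
            pass  # header missing or not numeric: keep the computed backoff
        wait = delay + random.uniform(0, 0.5)  # jitter
        print(f"Rate limited. Waiting {wait:.1f}s before retry {attempt + 1}")
        time.sleep(wait)
    raise RuntimeError("Max retries exceeded")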

Key principles:

  • Start with at least 1 second after the first failure.
  • Double the wait time after each subsequent failure (1s, 2s, 4s, 8s, 16s...).
  • Add random jitter (0 to 500ms) to prevent thundering herd problems when multiple clients retry simultaneously.
  • Respect the Retry-After header value when present — it overrides your backoff calculation.
  • Set a maximum retry limit (5 to 7 attempts is typical) to avoid infinite loops.

3. Throttle your request rate proactively to stay under 50 RPM

Instead of letting 429 errors occur and then backing off, design your application to never exceed the rate limit in the first place.

  • Calculate your safe rate: 50 RPM = 1 request per 1.2 seconds. Add a small safety margin and target 1 request per 1.5 seconds (40 RPM effective rate).
  • Use a rate-limiting library: In Python, use ratelimit or tenacity. In Node.js, use bottleneck or p-throttle. These handle the timing automatically.
  • Queue requests sequentially for batch jobs: If you are processing many items offline, use a simple queue that dispatches one request at a time with a fixed delay rather than sending all requests concurrently.
  • Monitor your request timing: Log timestamps of every API call. Review logs to identify burst patterns that are triggering 429s.

Node.js example with Bottleneck:

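const Bottleneck = require('bottleneck');

// Assumes Node 18+ (global fetch) and a key stored in this environment variable.
const API_KEY = process.env.PERPLEXITY_API_KEY;

const limiter = new Bottleneck({
  minTime: 1500, // 1.5 seconds between requests = ~40 RPM
  maxConcurrent: 1
});

const safeApiCall = limiter.wrap(async (payload) => {
  const response = await fetch('https://api.perplexity.ai/chat/completions', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${API_KEY}`, 'Content-Type': 'application/json' },
    body: JSON.stringify(payload)
  });
  if (!response.ok) {
    // Surface 429s and other failures instead of returning the error body as data.
    throw new Error(`Sonar API request failed with status ${response.status}`);
  }
  return response.json();
});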
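
For Python code, the same fixed-spacing throttle can be sketched without a third-party library. This is an illustrative sketch, not a replacement for a maintained rate-limiting package: the class name MinIntervalLimiter is hypothetical, the 1.5-second default comes from the 40 RPM target above, and the limiter is not thread-safe, so it assumes one request at a time (the equivalent of maxConcurrent: 1).

```python
import time


class MinIntervalLimiter:
    """Block until at least `min_interval` seconds have passed since the last call."""

    def __init__(self, min_interval=1.5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the timing can be tested
        self._sleep = sleep
        self._next_allowed = None  # no request sent yet

    def wait(self):
        """Sleep if needed, then reserve the next slot. Returns seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay
```

Calling limiter.wait() immediately before each request keeps a sequential loop at or below one request per 1.5 seconds, whatever the processing time between calls.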

4. Spread batch workloads across the full day to avoid daily quota exhaustion

If your application processes large batches, spread requests at a steady rate throughout the day rather than running everything in one burst.

  • Divide your daily request budget by 24 hours to find a safe hourly rate. For a 1,000 requests/day limit: 1,000 ÷ 24 ≈ 41.7 requests per hour, or roughly 1 request every 86 seconds.
  • Schedule batch jobs with a cron that staggers processing: run smaller batches every few hours rather than one large batch at midnight.
  • If your use case allows, cache Sonar API responses for identical or near-identical queries. Serving cached results does not consume API quota.
  • For non-time-sensitive batch processing, consider running overnight across multiple UTC days rather than exhausting the full daily cap in one session.
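
The pacing arithmetic and the caching idea above can be sketched as follows. This is a minimal sketch under assumptions: pacing_interval and ResponseCache are hypothetical names, the cache lives in memory only, and treating queries as identical after lowercasing and collapsing whitespace is an illustrative choice that may be too aggressive for case-sensitive queries.

```python
def pacing_interval(daily_limit, hours=24.0):
    """Seconds between requests that spreads `daily_limit` evenly over `hours`."""
    if daily_limit <= 0:
        raise ValueError("daily_limit must be positive")
    return hours * 3600.0 / daily_limit


class ResponseCache:
    """In-memory cache keyed by a normalized query string."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable that performs the real API request
        self._store = {}
        self.api_calls = 0   # number of requests that actually consumed quota

    @staticmethod
    def _key(query):
        # Collapse whitespace and case so trivially different queries share a key.
        return " ".join(query.lower().split())

    def get(self, query):
        key = self._key(query)
        if key not in self._store:
            self._store[key] = self._fetch(query)
            self.api_calls += 1
        return self._store[key]
```

For the 1,000 requests/day limit, pacing_interval(1000) gives 86.4 seconds; passing hours=8 instead shows the spacing needed if the batch may only run during an eight-hour window.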

5. Check your current usage in the Perplexity developer dashboard

Before assuming the problem is your code, verify your actual usage metrics.

  • Log into your Perplexity account and navigate to the API settings or developer dashboard at perplexity.ai/api.
  • Look for a usage dashboard that shows requests made today, remaining daily quota, and current RPM tier.
  • If you have multiple applications sharing the same API key, the combined traffic from all of them counts toward a single account's limits. Consider splitting large workloads across multiple API keys if your organization has multiple accounts.
  • Note your current tier's RPM and daily limits so you have a concrete target to design around.

6. Request a rate limit increase for production workloads

If your legitimate production use case consistently requires more than 50 RPM or more than 1,000 requests per day, apply for a higher tier.

  • In the developer dashboard, look for a "Request limit increase" or "Contact sales" option.
  • Prepare a brief description of your use case, expected peak RPM, daily request volume, and the application you are building.
  • Include your current API usage metrics as evidence.
  • For enterprise or high-volume use cases, email Perplexity at support@perplexity.ai directly to discuss custom tier arrangements.
  • Allow 2 to 5 business days for limit increase requests to be reviewed and approved.

Why This Happens

Perplexity Sonar API rate limits exist because each API call consumes real compute resources — especially Sonar-Pro, Sonar-Reasoning, and Sonar-Deep-Research, which invoke much larger models and more extensive web retrieval pipelines. Without per-account limits, a single high-volume API user could monopolize infrastructure and degrade service quality for all other users.

The 50 RPM default is calibrated for typical developer and small-production workloads. Most interactive applications rarely exceed 5 to 10 RPM in real usage; the limit is primarily relevant for batch processing pipelines, automated content generation systems, or applications that fire multiple simultaneous requests per user action. The daily cap (which resets at midnight UTC) provides an additional safety net against runaway processes that might otherwise exhaust infrastructure capacity in a single burst.


Common Mistakes to Avoid

  • Retrying immediately after a 429. A zero-delay retry will hit the same limit and return another 429. Always wait at least the Retry-After duration specified in the response header.
  • Sending all batch requests at once. Even if your batch is small enough to fit within the daily quota, sending it all in a few seconds will exceed the 50 RPM limit. Throttle to about 40 RPM or lower with a rate-limiting library.
  • Sharing one API key across multiple production apps without tracking combined usage. If two applications each think they have 50 RPM available, they will collectively exceed the 50 RPM account limit and trigger 429 errors seemingly at random.
  • Not logging API responses during development. Without logs, you cannot tell whether failures are 429s (rate limit) or 5xx errors (service issue). Log status codes and response bodies for every call.
  • Ignoring the daily quota until it is exhausted. Monitor your daily usage proactively via the developer dashboard. Build an alerting mechanism when daily usage exceeds 80% of quota so you can slow down before hitting zero.
  • Assuming a 429 means the API is down. A 429 is a client-side throttle, not a server error. The Perplexity Sonar API service is functioning normally; your account has simply exceeded its rate limit. Do not report rate limit errors as outages.
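
The 80% alert mentioned above can be approximated with a local counter. This is a sketch under assumptions: DailyUsageTracker is a hypothetical name, the count is kept in the calling process rather than read from the developer dashboard, and the reset follows the midnight UTC boundary described earlier. If several applications share a key, each would need to report to a shared counter for the number to be meaningful.

```python
from datetime import datetime, timezone


class DailyUsageTracker:
    """Count requests per UTC day and flag when usage nears the daily quota."""

    def __init__(self, daily_limit=1000, alert_ratio=0.8, now=None):
        self.daily_limit = daily_limit
        self.alert_ratio = alert_ratio
        # `now` is injectable for testing; defaults to the current UTC time.
        self._now = now or (lambda: datetime.now(timezone.utc))
        self._day = self._now().date()
        self.count = 0

    def record(self):
        """Count one request; return True once usage reaches the alert threshold."""
        today = self._now().date()
        if today != self._day:  # new UTC day: reset the counter
            self._day = today
            self.count = 0
        self.count += 1
        return self.count >= self.alert_ratio * self.daily_limit
```

Calling record() after every API request and logging a warning or slowing the dispatch rate when it returns True gives the job time to wind down before the quota reaches zero.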

Frequently Asked Questions

What is the Perplexity Sonar API rate limit?

The Perplexity Sonar API defaults to 50 requests per minute (RPM) on the base tier. There is also a daily request cap that varies by account tier; base accounts typically see 1,000 requests per day. These limits apply across all Sonar models (sonar, sonar-pro, sonar-reasoning, sonar-deep-research). Your tier and exact limits are visible in the Perplexity developer dashboard at perplexity.ai/api. Higher tiers with increased RPM and daily caps are available by request.

Related Guides

Continue with nearby guides in the same topic to rule out adjacent causes faster.

Perplexity file upload limits — supported formats, size limits, and weekly caps

Perplexity Pro supports file uploads up to 25 MB per file. Supported formats include PDF (text-based, not scanned), plain text (.txt), and Word documents (.doc and .docx). File upload is a Pro-only feature — free plan users do not have access. If your file exceeds the size limit or is in an unsupported format, compress it or convert it to PDF before uploading. Alternatively, paste the text content directly into the search bar, which works for most analysis tasks.

Perplexity Limit Exceeded: 3 Causes and How to Fix Each

Perplexity's 'limit exceeded' message has three distinct causes: (1) Pro search weekly quota of 200 searches exhausted — wait until Monday 00:00 UTC; (2) Deep Research monthly quota of 20 sessions used up — wait until the 1st of next month; (3) Free plan daily quota of ~5 Pro searches reached — wait until 00:00 UTC tonight. Switch to Standard search to continue immediately in all three cases.

How to avoid Perplexity temporary restrictions and suspicious activity flags

Perplexity temporary restrictions are triggered by 3 main behaviors: submitting more than 20 to 30 queries in a short period, repeatedly switching between VPN server locations during a session, or using browser automation scripts that mimic bot traffic. If you are flagged, stop all activity and wait 1 to 4 hours for the restriction to lift automatically. Do not attempt to bypass the block by creating a new account — this risks a permanent ban. For persistent restrictions, email support@perplexity.ai.