What does 'Too Many Requests' mean on Perplexity?

On the Perplexity Sonar API, 'Too Many Requests' is an HTTP 429 error that means your application sent more than 50 requests within a 60-second window, which is the default rate limit for new API accounts. On the Perplexity web interface, a similar message can appear when your weekly Pro search quota of 200 searches is exhausted. The two situations are distinct: API burst limiting clears within minutes, while Pro quota exhaustion lasts until the following Monday at 00:00 UTC.

How long does the Perplexity 429 Too Many Requests error last?

For the Sonar API, a burst rate limit (sending more than 50 requests per minute) typically clears within 60 seconds because the limit is calculated over a rolling 1-minute window. Wait 60 to 90 seconds and retry. If the 429 persists after waiting, you may have a sustained overuse pattern requiring a longer backoff or a tier upgrade. For web users with an exhausted Pro quota, the 429-equivalent block lasts until Monday 00:00 UTC — potentially up to 7 days.

How do I fix 429 errors in the Perplexity API?

Check the Retry-After header on the 429 response; it specifies how many seconds to wait before retrying. If no header is present, wait 60 seconds before the first retry, then use exponential backoff for any further retries, doubling the delay each time. Keep your sustained throughput below 50 requests per minute (one request every 1.2 seconds). Monitor the X-RateLimit-Remaining header on each response to detect when you are approaching the limit before you hit it.

Does Too Many Requests on Perplexity affect my Standard searches?

No. Standard searches on the Perplexity web interface use the base Sonar model and are unlimited for all users — free, Pro, and even users with exhausted Pro quotas. When you exhaust your weekly 200 Pro searches, the 'Too Many Requests' block applies only to Pro-tier model usage. Switch the model selector to 'Default' or 'Standard' and your searches will continue immediately without restriction. Standard mode is a fully supported alternative, not a workaround — Perplexity designed it to be available at all times.

What is the Perplexity Sonar API rate limit?

The default rate limit for new Perplexity Sonar API accounts is 50 requests per minute (RPM). This translates to a maximum sustained rate of one request every 1.2 seconds. Exceeding this threshold in any 60-second window triggers a 429 Too Many Requests response. The limit is independent of the web interface Pro search quota. Production applications that need more can obtain a higher-tier account with an increased RPM limit by contacting Perplexity through the API dashboard.
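A client can mirror this rolling window with a sliding-window log: keep the timestamps of recent requests and only send a new one when fewer than 50 of them fall inside the last 60 seconds. The sketch below assumes the server counts requests over a rolling window as described; `SlidingWindowLimiter` is a hypothetical helper, not part of any Perplexity SDK, and the injectable `clock` exists only so the logic can be tested without real waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side sliding-window log: at most `limit` requests per `window` seconds.

    Hypothetical helper; assumes the server counts requests over a rolling window.
    """

    def __init__(self, limit=50, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent = deque()  # timestamps of requests still inside the window

    def _evict(self, now):
        # Drop timestamps that have aged out of the rolling window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()

    def delay_needed(self):
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        self._evict(now)
        if len(self.sent) < self.limit:
            return 0.0
        # Window is full: the oldest request must age out first.
        return self.window - (now - self.sent[0])

    def record(self):
        """Call right after sending a request."""
        self.sent.append(self.clock())

    def acquire(self):
        """Block until a request is allowed, then record it."""
        while (delay := self.delay_needed()) > 0:
            time.sleep(delay)
        self.record()
```

Call `limiter.acquire()` right before each API request. Unlike a fixed 1.2-second spacing, this lets short bursts through as long as the 60-second total stays under the limit, so pair it with backoff for any 429 that still slips through (for example, when another process shares the same API key).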

How do I implement rate limiting in my Perplexity API integration?

Add a minimum 1.2-second delay between sequential API requests (60 seconds divided by 50 RPM = 1.2 seconds per request). For parallel requests, limit concurrency to no more than 10 simultaneous calls. On every 429 response, check the Retry-After header and wait exactly that many seconds before retrying. Implement exponential backoff for cases where Retry-After is absent. Read the X-RateLimit-Remaining header on each response to track your remaining quota in real time and slow down proactively when it drops below 10.
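The concurrency cap can be enforced with a semaphore shared by every worker thread. This is a minimal sketch: `send_request` stands in for whatever function performs one API call in your code (it is hypothetical, not a Perplexity SDK function), and the cap of 10 follows the guideline above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# At most 10 API calls in flight at once, shared by every thread that uses it.
MAX_CONCURRENCY = 10
_in_flight = threading.BoundedSemaphore(MAX_CONCURRENCY)


def call_with_concurrency_cap(send_request, payload):
    """Run send_request(payload) while holding one of the concurrency slots.

    send_request is a hypothetical callable that performs one API call
    (for example, a wrapper around requests.post).
    """
    with _in_flight:
        return send_request(payload)


def run_all(send_request, payloads, workers=32):
    # Even with more worker threads than slots, the semaphore keeps in-flight
    # calls at or below MAX_CONCURRENCY. Results keep the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(
            pool.map(lambda p: call_with_concurrency_cap(send_request, p), payloads)
        )
```

A semaphore caps calls in flight, not calls per minute, so combine it with a per-minute limiter such as the 1.2-second spacing described above.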

When should I request a higher API rate limit from Perplexity?

Request a higher API tier if your production application consistently needs more than 50 requests per minute as a sustained throughput — not just occasional bursts. Before requesting, ensure you have already implemented proper exponential backoff and request queuing, and that your 429 errors are genuine capacity limitations rather than a lack of rate limiting in your code. To request an upgrade, go to the Perplexity API dashboard and look for a tier upgrade or contact option, or reach out to Perplexity support with your average daily request volume and application description.

Perplexity Too Many Requests (429): How to Fix It

Step-by-Step Fix

1. Distinguish Between Temporary Burst Limiting and Quota Exhaustion

The "Too Many Requests" error appears in two fundamentally different contexts, and the fix depends entirely on which one you are experiencing:

Temporary Burst Limiting (API and possibly web)

Cause: Sending requests faster than the allowed rate (50 RPM on API)
Duration: Usually clears within 60–90 seconds
Signal: Appears suddenly after a period of normal operation, often when running a script or loop
Fix: Wait 60 seconds, implement backoff, slow your request rate

Quota Exhaustion (web Pro users)

Cause: All 200 weekly Pro searches consumed before Monday's reset
Duration: Lasts until Monday 00:00 UTC — up to 7 days
Signal: Appears consistently whenever you try to use Pro mode, while Standard still works fine
Fix: Switch to Standard search immediately, or wait for the Monday reset

If you are a web user and Standard search still works, you have quota exhaustion. If you are a developer and the error appeared mid-script, you likely have burst limiting.

2. Check the API Response Headers (API Users)

If you are a developer hitting 429 errors in the Sonar API, the response headers contain exactly the information you need:

```
HTTP/1.1 429 Too Many Requests
Retry-After: 45
X-RateLimit-Limit: 50
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1716000060
```

Retry-After: Seconds to wait before retrying — always honor this first
X-RateLimit-Remaining: How many requests you have left in the current window
X-RateLimit-Reset: Unix timestamp when the rate limit window resets
X-RateLimit-Limit: Your account's RPM limit (50 for default tier)

Always parse Retry-After in your application code. Waiting the exact required time prevents extended throttling from premature retries.
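A small parser turns these headers into numbers your code can act on. This sketch assumes the header names and formats shown in the example response (Retry-After in seconds, X-RateLimit-Reset as a Unix timestamp) and also accepts the HTTP-date form of Retry-After that the HTTP standard permits. `RateLimitInfo` and `parse_rate_limit_headers` are hypothetical names; any header that is missing or malformed comes back as `None`.

```python
import time
from dataclasses import dataclass
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    limit: Optional[int]          # X-RateLimit-Limit: requests allowed per window
    remaining: Optional[int]      # X-RateLimit-Remaining: requests left in this window
    reset_in: Optional[float]     # seconds until X-RateLimit-Reset (a Unix timestamp)
    retry_after: Optional[float]  # seconds from Retry-After, if present


def _to_int(value):
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def _retry_after_seconds(value, now):
    if value is None:
        return None
    try:
        return max(0.0, float(value))  # delta-seconds form, e.g. "45"
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when is None:
        return None
    return max(0.0, when.timestamp() - now)


def parse_rate_limit_headers(
    headers: Mapping[str, str], now: Optional[float] = None
) -> RateLimitInfo:
    now = time.time() if now is None else now
    reset_at = _to_int(headers.get("X-RateLimit-Reset"))
    return RateLimitInfo(
        limit=_to_int(headers.get("X-RateLimit-Limit")),
        remaining=_to_int(headers.get("X-RateLimit-Remaining")),
        reset_in=None if reset_at is None else max(0.0, reset_at - now),
        retry_after=_retry_after_seconds(headers.get("Retry-After"), now),
    )
```

With `requests`, pass `response.headers` directly; it is case-insensitive, so differences in header capitalization do not matter. A plain `dict` is matched case-sensitively.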

3. Web Users — Switch to Standard Search Immediately

When Pro search is blocked due to quota exhaustion:

Go to perplexity.ai
Click the model selector — it will show your current model name ("GPT-4o", "Pro", "Claude 3.5 Sonnet")
Select Default or Standard from the dropdown
Continue searching — Standard uses the base Sonar model and has no quota

Standard search is unlimited for all Perplexity users regardless of plan or quota status. It returns results in 2–5 seconds and handles the vast majority of research tasks competently. The only queries where Standard noticeably underperforms Pro are those requiring complex multi-step reasoning or a specific advanced model's output characteristics.

4. Web Users — Calculate Time Until Reset

If you must use Pro mode and cannot use Standard:

Pro quota resets every Monday at 00:00 UTC
Daily free quota resets every day at 00:00 UTC
Check current UTC time at time.is/UTC

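Time zone conversions for the Monday reset (these assume daylight saving time; under standard time, each local reset time is one hour earlier):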

| Time Zone | Reset Time (Local) |
|-----------|-------------------|
| US Eastern (EDT, UTC-4) | Sunday 8:00 PM |
| US Central (CDT, UTC-5) | Sunday 7:00 PM |
| US Pacific (PDT, UTC-7) | Sunday 5:00 PM |
| UK (BST, UTC+1) | Monday 1:00 AM |
| Central Europe (CEST, UTC+2) | Monday 2:00 AM |

If the reset is within a few hours, waiting is often more practical than restructuring your workflow.
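The reset times can also be computed directly, which avoids doing time zone arithmetic by hand. A minimal sketch, assuming the reset schedule stated above (daily at 00:00 UTC, weekly on Monday at 00:00 UTC); both function names are hypothetical. Pass a timezone-aware `datetime`; a naive one is interpreted as local time by `astimezone`.

```python
from datetime import datetime, timedelta, timezone


def next_daily_reset(now: datetime) -> datetime:
    """Next 00:00 UTC strictly after `now` (the daily free-quota reset)."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)


def next_weekly_reset(now: datetime) -> datetime:
    """Next Monday 00:00 UTC strictly after `now` (the weekly Pro reset)."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # weekday() is 0 for Monday; on a Monday, the next reset is 7 days away.
    days_ahead = (7 - now.weekday()) % 7 or 7
    return midnight + timedelta(days=days_ahead)
```

For example, `now = datetime.now(timezone.utc); next_weekly_reset(now) - now` gives the remaining wait as a `timedelta`, and calling `.astimezone(ZoneInfo("America/New_York"))` on the result, with `ZoneInfo` from the standard `zoneinfo` module, converts the reset to local time with daylight saving handled automatically.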

5. API Users — Wait and Retry with Exponential Backoff

For burst-rate-limited API calls, the standard fix is to wait and retry with increasing delays:

```python
import time

import requests

API_KEY = "your-perplexity-api-key"
API_URL = "https://api.perplexity.ai/chat/completions"


def call_perplexity_api(payload, max_retries=5):
    base_delay = 1  # Start with 1 second when Retry-After is absent

    for attempt in range(max_retries):
        response = requests.post(
            API_URL,
            headers={
                "Authorization": f"Bearer {API_KEY}",
                "Content-Type": "application/json",
            },
            json=payload,
            timeout=30,
        )

        if response.status_code != 429:
            # Success returns the parsed body; any other 4xx/5xx raises immediately
            response.raise_for_status()
            return response.json()

        # Honor Retry-After if present and numeric (it may also be an HTTP-date)
        try:
            delay = max(0.0, float(response.headers.get("Retry-After")))
            print(f"Server says wait {delay:.0f}s (attempt {attempt + 1})")
        except (TypeError, ValueError):
            delay = base_delay * (2 ** attempt)  # 1, 2, 4, 8, 16 seconds
            print(f"Backoff: waiting {delay}s (attempt {attempt + 1})")

        if attempt < max_retries - 1:
            time.sleep(delay)  # No point sleeping after the final attempt

    raise RuntimeError(f"API still rate-limited after {max_retries} attempts")
```

This pattern respects the rate limit window and avoids triggering extended throttling from repeated immediate retries.

6. API Users — Throttle Your Request Rate Proactively

Rather than reacting to 429 errors, build rate limiting into your application from the start:

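```python
import threading
import time

import requests

API_KEY = "your-perplexity-api-key"


class RateLimiter:
    """Spaces requests at least 60 / max_rpm seconds apart, across threads."""

    def __init__(self, max_rpm=45):  # 45 RPM = 10% buffer below the 50 RPM limit
        self.min_interval = 60.0 / max_rpm  # seconds between requests
        self.last_request_time = float("-inf")  # first request never waits
        self.lock = threading.Lock()

    def wait(self):
        # Holding the lock while sleeping serializes callers, so concurrent
        # threads also respect the minimum interval.
        with self.lock:
            elapsed = time.monotonic() - self.last_request_time
            if elapsed < self.min_interval:
                time.sleep(self.min_interval - elapsed)
            self.last_request_time = time.monotonic()


rate_limiter = RateLimiter(max_rpm=45)


def query_perplexity(payload):
    rate_limiter.wait()  # Enforce rate limiting before each request
    response = requests.post(
        "https://api.perplexity.ai/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=payload,
        timeout=30,
    )
    return response
```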

Setting your target to 45 RPM instead of 50 provides a 10% safety buffer against burst timing issues.

7. API Users — Request a Higher Rate Limit Tier

If your application genuinely requires more than 50 RPM for sustained throughput:

Log into the Perplexity API dashboard
Navigate to account or billing settings
Look for a "Request tier upgrade" option or equivalent
Submit your request with: average daily request volume, peak RPM requirements, and a description of your application
If no self-service option is available, email Perplexity support with the same details

Higher API tiers exist and are granted for production applications with legitimate high-volume needs. Do not try to work around the limit using multiple API keys — this violates the terms of service and risks account suspension.
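The average daily volume and peak RPM figures can be pulled from your own request logs. A sketch, assuming you record a Unix timestamp for every API call; `usage_summary` is a hypothetical helper, and counting the minutes that exceed the limit is one way to show a sustained need rather than an occasional burst.

```python
from collections import Counter
from datetime import datetime, timezone


def usage_summary(timestamps, limit=50):
    """Summarize a request log given as Unix timestamps (seconds).

    Buckets requests by UTC calendar day and by clock minute. Clock-minute
    buckets can under-count the busiest rolling 60-second window, so treat
    peak_rpm as a lower bound.
    """
    if not timestamps:
        return {"avg_daily": 0.0, "peak_rpm": 0,
                "minutes_over_limit": 0, "share_over_limit": 0.0}
    per_minute = Counter(int(ts // 60) for ts in timestamps)
    per_day = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).date() for ts in timestamps
    )
    over = sum(1 for n in per_minute.values() if n > limit)
    return {
        # Average over days that had any traffic.
        "avg_daily": len(timestamps) / len(per_day),
        "peak_rpm": max(per_minute.values()),
        # A sustained need shows up as many minutes over the limit, not one spike.
        "minutes_over_limit": over,
        "share_over_limit": over / len(per_minute),
    }
```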

8. Verify It Is Not a Platform-Wide Outage

Before spending time debugging rate limits, confirm the issue is specific to your account:

Visit perplexity.ai/status
Check for active incidents
Try Standard search on the web — if Standard is also failing, it is likely a platform issue

On the web, genuine quota exhaustion affects only Pro-tier requests while Standard continues to work. If everything is broken, including Standard search, you are most likely looking at a service disruption unrelated to your quota status.

Why This Happens

Perplexity enforces rate limits at two levels for different reasons.

At the API level, the 50 RPM default exists to prevent any single client from consuming a disproportionate share of shared compute resources. The Sonar API serves many applications simultaneously, and without per-client throttling, a poorly written script could degrade response quality for all other users.

At the web level, Pro search quota exhaustion reflects the underlying cost structure of premium AI models. At high usage volumes, running GPT-4o or Claude 3.5 Sonnet queries through Perplexity's real-time web search pipeline costs more per query than a $20/month subscription brings in. The 200 weekly Pro search limit (reduced from 600 in May 2026) balances access against the economics of running advanced inference at scale.

The daily free quota resets at 00:00 UTC, not at your local midnight. The weekly Pro reset happens every Monday at 00:00 UTC, which is still Sunday for users in US time zones.

Common Mistakes to Avoid

Retrying the API immediately after a 429 response. Each immediate retry consumes quota in the next time window and can trigger progressively longer throttle periods. Always wait — at minimum the value in the Retry-After header, or 60 seconds if the header is absent.
Confusing temporary burst limiting with quota exhaustion. Burst limiting clears in under 2 minutes. Quota exhaustion lasts until the weekly reset. These require different responses, and treating them the same wastes time.
Not implementing proactive rate limiting in API code. Adding a time.sleep(1.2) between sequential requests costs almost nothing in a script context and keeps a single-threaded client under the 50 RPM limit, which prevents burst-rate-limit 429 errors. Preventing errors is faster than recovering from them.
Assuming Standard search has the same limit. Standard (Default) mode on Perplexity is unlimited for all users. It is not a degraded fallback — it is a fully supported search mode that works at any time. Many users hit the Pro limit and believe Perplexity is down, not realizing Standard is available.
Using multiple API keys to bypass the rate limit. This violates Perplexity's terms of service. If you need higher throughput, request a tier upgrade through official channels.
Ignoring X-RateLimit-Remaining. Reading this header on each API response tells you how much quota you have left in the current window. Acting on low remaining values by slowing down prevents most 429 errors instead of requiring recovery from them.
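One way to act on a low X-RateLimit-Remaining value is to spread the remaining requests evenly over the time left until X-RateLimit-Reset. A minimal sketch of that pacing rule, assuming the header meanings given in step 2; `pacing_delay` is hypothetical, the 1.2-second baseline is 60 seconds divided by 50 RPM, and the low-water mark of 10 mirrors the threshold suggested earlier in this guide.

```python
def pacing_delay(remaining, reset_in, min_interval=1.2, low_water=10):
    """Seconds to wait before the next request.

    remaining:    value of X-RateLimit-Remaining (requests left in the window)
    reset_in:     seconds until the window resets (derived from X-RateLimit-Reset)
    min_interval: baseline spacing; 60 s / 50 RPM = 1.2 s
    low_water:    below this many remaining requests, slow down proactively
    """
    if remaining <= 0:
        # Out of budget: wait for the window to reset.
        return max(reset_in, min_interval)
    if remaining < low_water:
        # Spread what is left evenly across the rest of the window.
        return max(min_interval, reset_in / remaining)
    return min_interval
```

For example, with 5 requests left and 30 seconds until reset, the function returns 6.0, pacing the last requests across the remainder of the window instead of exhausting them at once.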
