Perplexity Too Many Requests (429): How to Fix It

Perplexity · Errors & Bugs · Updated May 17, 2026
Quick Answer

A Perplexity 'Too Many Requests' or 429 error means either you sent requests too fast (API: wait 60 seconds, then retry with backoff) or your weekly Pro search quota of 200 searches is exhausted (wait until Monday 00:00 UTC or switch to Standard search). Temporary burst limiting clears in under 2 minutes; quota exhaustion lasts until the weekly reset.

Step-by-Step Fix

1. Distinguish Between Temporary Burst Limiting and Quota Exhaustion

The "Too Many Requests" error appears in two fundamentally different contexts, and the fix depends entirely on which one you are experiencing:

Temporary Burst Limiting (API and possibly web)

  • Cause: Sending requests faster than the allowed rate (50 RPM on API)
  • Duration: Usually clears within 60–90 seconds
  • Signal: Appears suddenly after a period of normal operation, often when running a script or loop
  • Fix: Wait 60 seconds, implement backoff, slow your request rate

Quota Exhaustion (web Pro users)

  • Cause: All 200 weekly Pro searches consumed before Monday's reset
  • Duration: Lasts until Monday 00:00 UTC — up to 7 days
  • Signal: Appears every time you try to use Pro mode, while Standard still works fine
  • Fix: Switch to Standard search immediately, or wait for the Monday reset

If you are a web user and Standard search still works, you have quota exhaustion. If you are a developer and the error appeared mid-script, you likely have burst limiting.
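The rule of thumb above can be written down as a small decision function. This is only a sketch: `classify_429` and its two boolean inputs are hypothetical names introduced for illustration, and the third outcome borrows the outage check described in step 8.

```python
def classify_429(is_api_call: bool, standard_search_works: bool) -> str:
    """Rough triage of a 'Too Many Requests' error (heuristic, not official)."""
    if is_api_call:
        # Errors that appear mid-script are most likely burst limiting
        return "burst_limit"
    if standard_search_works:
        # Pro is blocked but Standard still answers: weekly quota is used up
        return "quota_exhaustion"
    # Nothing works on the web either: check for a platform incident
    return "possible_outage"
```

Treat the result as a starting hypothesis; the response headers in the next step give firmer evidence for API calls.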

2. Check the API Response Headers (API Users)

If you are a developer hitting 429 errors in the Sonar API, the response headers contain exactly the information you need:

HTTP/1.1 429 Too Many Requests
Retry-After: 45
X-RateLimit-Limit: 50
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1716000060

  • Retry-After: Seconds to wait before retrying — always honor this first
  • X-RateLimit-Remaining: How many requests you have left in the current window
  • X-RateLimit-Reset: Unix timestamp when the rate limit window resets
  • X-RateLimit-Limit: Your account's RPM limit (50 for default tier)

Always parse Retry-After in your application code. Waiting the exact required time prevents extended throttling from premature retries.
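One way to turn those headers into a wait time is sketched below. It assumes the header names shown in the example response above; the function name `seconds_to_wait` and the 60-second default are illustrative choices, not part of any Perplexity client library.

```python
import time
from typing import Mapping, Optional

def seconds_to_wait(headers: Mapping[str, str],
                    now: Optional[float] = None,
                    default: float = 60.0) -> float:
    """Return how long to pause after a 429, based on the response headers."""
    now = time.time() if now is None else now

    # 1. Prefer Retry-After when it is a plain number of seconds
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # may be an HTTP date instead; fall through

    # 2. Otherwise wait until the window reset timestamp
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        try:
            return max(0.0, float(reset) - now)
        except ValueError:
            pass

    # 3. No usable header: fall back to a conservative default
    return default
```

With the `requests` library, `response.headers` can be passed in directly because its lookups are case-insensitive; a plain `dict` must use the exact header spelling.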

3. Web Users — Switch to Standard Search Immediately

When Pro search is blocked due to quota exhaustion:

  1. Go to perplexity.ai
  2. Click the model selector, which shows your current model name (for example, "GPT-4o", "Pro", or "Claude 3.5 Sonnet")
  3. Select Default or Standard from the dropdown
  4. Continue searching — Standard uses the base Sonar model and has no quota

Standard search is unlimited for all Perplexity users regardless of plan or quota status. It returns results in 2–5 seconds and handles the vast majority of research tasks competently. The only queries where Standard noticeably underperforms Pro are those requiring complex multi-step reasoning or a specific advanced model's output characteristics.

4. Web Users — Calculate Time Until Reset

If you must use Pro mode and cannot use Standard:

  • Pro quota resets every Monday at 00:00 UTC
  • Daily free quota resets every day at 00:00 UTC
  • Check current UTC time at time.is/UTC

Time zone conversions for the Monday reset:

| Time Zone | Reset Time (Local) |
|-----------|-------------------|
| US Eastern (EDT, UTC-4) | Sunday 8:00 PM |
| US Central (CDT, UTC-5) | Sunday 7:00 PM |
| US Pacific (PDT, UTC-7) | Sunday 5:00 PM |
| UK (BST, UTC+1) | Monday 1:00 AM |
| Central Europe (CEST, UTC+2) | Monday 2:00 AM |

If the reset is within a few hours, waiting is often more practical than restructuring your workflow.
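If you would rather compute the wait than convert time zones by hand, the calculation can be sketched with the standard library. This assumes the Monday 00:00 UTC reset described above; `next_weekly_reset` and `time_until_reset` are illustrative names.

```python
from datetime import datetime, timedelta, timezone

def next_weekly_reset(now: datetime) -> datetime:
    """Return the next Monday 00:00 UTC strictly after `now`.

    `now` must be timezone-aware (for example datetime.now(timezone.utc)).
    """
    now_utc = now.astimezone(timezone.utc)
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    days_ahead = (7 - midnight.weekday()) % 7  # Monday is weekday 0
    if days_ahead == 0:
        days_ahead = 7  # already Monday: the next reset is a week away
    return midnight + timedelta(days=days_ahead)

def time_until_reset(now: datetime) -> timedelta:
    """Time remaining until the next weekly reset."""
    return next_weekly_reset(now) - now
```

Calling `time_until_reset(datetime.now(timezone.utc))` returns a `timedelta` you can print directly or compare against a threshold to decide whether waiting is practical.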

5. API Users — Wait and Retry with Exponential Backoff

For burst-rate-limited API calls, the standard fix is to wait and retry with increasing delays:

import time
import requests

API_KEY = "your-perplexity-api-key"

def call_perplexity_api(payload, max_retries=5):
    base_delay = 1  # Start with 1 second

    for attempt in range(max_retries):
        response = requests.post(
            "https://api.perplexity.ai/chat/completions",
            headers={
                "Authorization": f"Bearer {API_KEY}",
                "Content-Type": "application/json"
            },
            json=payload,
            timeout=30
        )

        if response.status_code == 200:
            return response.json()

        elif response.status_code == 429:
            # Honor Retry-After if present and given as a number of seconds
            retry_after = response.headers.get("Retry-After")
            if retry_after and retry_after.strip().isdigit():
                delay = int(retry_after)
                print(f"Server says wait {delay}s (attempt {attempt + 1})")
            else:
                # Header missing or not a plain number: fall back to backoff
                delay = base_delay * (2 ** attempt)  # 1, 2, 4, 8, 16 seconds
                print(f"Backoff: waiting {delay}s (attempt {attempt + 1})")

            time.sleep(delay)

        else:
            # Non-rate-limit error — raise immediately
            response.raise_for_status()

    raise RuntimeError(f"API failed after {max_retries} retries")

This pattern respects the rate limit window and avoids triggering extended throttling from repeated immediate retries.
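When several workers hit the limit at the same moment, a fixed 1, 2, 4, 8 schedule makes them all retry together. A common variation, not part of the function above, is to add random jitter to the fallback delay. The sketch below uses the "full jitter" approach; `backoff_delay`, its `cap` of 60 seconds, and the `rng` parameter are assumptions made for illustration.

```python
import random
from typing import Optional

def backoff_delay(attempt: int,
                  base: float = 1.0,
                  cap: float = 60.0,
                  rng: Optional[random.Random] = None) -> float:
    """Delay before retry number `attempt` (0-based), with full jitter."""
    rng = rng if rng is not None else random.Random()
    # Exponential ceiling: 1, 2, 4, 8, 16, ... seconds, never above `cap`
    ceiling = min(cap, base * (2 ** attempt))
    # Pick a random point below the ceiling so clients spread out
    return rng.uniform(0, ceiling)
```

This would replace only the fallback branch; when the server sends a `Retry-After` value, that value should still take priority.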

6. API Users — Throttle Your Request Rate Proactively

Rather than reacting to 429 errors, build rate limiting into your application from the start:

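import threading
import time

import requests

API_KEY = "your-perplexity-api-key"

class RateLimiter:
    def __init__(self, max_rpm=45):  # 45 RPM = 10% buffer below 50 RPM limit
        self.min_interval = 60.0 / max_rpm  # seconds between requests
        self.last_request_time = None  # no request sent yet
        self.lock = threading.Lock()

    def wait(self):
        # The lock is held while sleeping so concurrent threads queue up
        with self.lock:
            if self.last_request_time is not None:
                # monotonic() is unaffected by system clock adjustments
                elapsed = time.monotonic() - self.last_request_time
                if elapsed < self.min_interval:
                    time.sleep(self.min_interval - elapsed)
            self.last_request_time = time.monotonic()

rate_limiter = RateLimiter(max_rpm=45)

def query_perplexity(payload):
    rate_limiter.wait()  # Enforce rate limiting before each request
    response = requests.post(
        "https://api.perplexity.ai/chat/completions",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json"
        },
        json=payload,
        timeout=30
    )
    return response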

Setting your target to 45 RPM instead of 50 provides a 10% safety buffer against burst timing issues.
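To see what that buffer costs in practice, the arithmetic can be sketched as follows. The helper names are illustrative, and the estimate ignores the time each request itself takes, so it is a lower bound rather than a prediction.

```python
def min_interval_seconds(max_rpm: float) -> float:
    """Minimum spacing between requests for a given requests-per-minute cap."""
    return 60.0 / max_rpm

def estimated_batch_seconds(n_requests: int, max_rpm: float = 45) -> float:
    """Lower bound on wall-clock time for a throttled batch.

    The first request goes out immediately; each later one waits one interval.
    """
    if n_requests <= 0:
        return 0.0
    return (n_requests - 1) * min_interval_seconds(max_rpm)
```

At 45 RPM the spacing is about 1.33 seconds, so a batch of 451 requests needs at least roughly 10 minutes; at the full 50 RPM the same batch would need at least 9 minutes.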

7. API Users — Request a Higher Rate Limit Tier

If your application genuinely requires more than 50 RPM for sustained throughput:

  1. Log into the Perplexity API dashboard
  2. Navigate to account or billing settings
  3. Look for a "Request tier upgrade" option or equivalent
  4. Submit your request with: average daily request volume, peak RPM requirements, and a description of your application
  5. If no self-service option is available, email Perplexity support with the same details

Higher API tiers exist and are granted for production applications with legitimate high-volume needs. Do not try to work around the limit using multiple API keys — this violates the terms of service and risks account suspension.

8. Verify It Is Not a Platform-Wide Outage

Before spending time debugging rate limits, confirm the issue is specific to your account:

  1. Visit perplexity.ai/status
  2. Check for active incidents
  3. Try Standard search on the web — if Standard is also failing, it is likely a platform issue

On the web, genuine quota exhaustion affects only Pro searches while Standard continues to work. If everything is broken, including Standard search, you are likely looking at a service disruption unrelated to your quota status.


Why This Happens

Perplexity enforces rate limits at two levels for different reasons.

At the API level, the 50 RPM default exists to prevent any single client from consuming a disproportionate share of shared compute resources. The Sonar API serves many applications simultaneously, and without per-client throttling, a poorly written script could degrade response quality for all other users.

At the web level, Pro search quota exhaustion reflects the underlying cost structure of premium AI models. At high volumes, each GPT-4o or Claude 3.5 Sonnet query run through Perplexity's real-time web search pipeline costs more than the $20/month subscription earns per query. The 200 weekly Pro search limit (reduced from 600 in May 2026) balances access against the economics of running advanced inference at scale.

The daily cap resets at 00:00 UTC, not your local midnight. The weekly Pro reset happens every Monday at 00:00 UTC — which may fall on a Sunday evening for users in US time zones.


Common Mistakes to Avoid

  • Retrying the API immediately after a 429 response. Each immediate retry consumes quota in the next time window and can trigger progressively longer throttle periods. Always wait — at minimum the value in the Retry-After header, or 60 seconds if the header is absent.
  • Confusing temporary burst limiting with quota exhaustion. Burst limiting clears in under 2 minutes. Quota exhaustion lasts until the weekly reset. These require different responses, and treating them the same wastes time.
  • Not implementing proactive rate limiting in API code. Adding a time.sleep(1.2) between requests costs almost nothing in a script context and keeps a single-threaded script at or below 50 RPM (60 seconds / 50 requests = 1.2 seconds), which avoids most burst-rate-limit 429 errors. Preventing errors is faster than recovering from them.
  • Assuming Standard search has the same limit. Standard (Default) mode on Perplexity is unlimited for all users. It is not a degraded fallback — it is a fully supported search mode that works at any time. Many users hit the Pro limit and believe Perplexity is down, not realizing Standard is available.
  • Using multiple API keys to bypass the rate limit. This violates Perplexity's terms of service. If you need higher throughput, request a tier upgrade through official channels.
  • Ignoring X-RateLimit-Remaining. Reading this header on each API response tells you how much quota you have left in the current window. Acting on low remaining values — by slowing down — prevents 429 errors entirely rather than requiring recovery from them.
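
The last point can be sketched as a small helper that slows a client down as the window runs out. It assumes the X-RateLimit-Remaining and X-RateLimit-Reset headers described in step 2; `pause_before_next` and the `low_water_mark` threshold of 5 are hypothetical choices for illustration.

```python
from typing import Mapping

def pause_before_next(headers: Mapping[str, str],
                      now: float,
                      low_water_mark: int = 5) -> float:
    """Seconds to pause before the next call, based on remaining quota."""
    try:
        remaining = int(headers["X-RateLimit-Remaining"])
        reset_at = float(headers["X-RateLimit-Reset"])
    except (KeyError, ValueError):
        return 0.0  # headers missing or malformed: no adjustment

    if remaining > low_water_mark:
        return 0.0  # plenty of quota left in this window

    window_left = max(0.0, reset_at - now)
    # Spread the remaining requests evenly over what is left of the window;
    # with zero remaining this waits until the reset itself
    return window_left / (remaining + 1)
```

Pass `time.time()` as `now` after each response and sleep for the returned value before sending the next request.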

Frequently Asked Questions

What does 'Too Many Requests' mean on Perplexity?

On the Perplexity Sonar API, 'Too Many Requests' is an HTTP 429 error that means your application sent more than 50 requests within a 60-second window, which is the default rate limit for new API accounts. On the Perplexity web interface, a similar message can appear when your weekly Pro search quota of 200 searches is exhausted. The two situations are distinct: API burst limiting clears within minutes, while Pro quota exhaustion lasts until the following Monday at 00:00 UTC.
