API Rate Limiting & Throttling

MediaMath APIs implement rate limiting and throttling to ensure platform stability and fair resource allocation across all users. This page explains the limits, how they work, and how to handle rate-limited responses in your integrations.

Overview

The MediaMath API uses a multi-layer protection system to maintain service reliability. When limits are exceeded, the API returns appropriate HTTP status codes with details to help you adjust your request patterns.

Rate Limits

Limit Type              | Recommended Safe Limit | Theoretical Maximum | Description
Per-User Rate Limit     | 40 req/s               | 50 req/s            | Sustained request rate per authenticated user
Per-User Burst Capacity | 200 requests           | 250 requests        | Maximum instant burst after an idle period
Concurrent Write Limit  | 150 requests           | 150 requests        | Maximum concurrent POST/PUT/PATCH/DELETE requests
Per-IP Rate Limit       | 80 req/s               | 100 req/s           | Sustained request rate per IP (unauthenticated requests)

Understanding Limits: The Recommended Safe Limit accounts for load balancing distribution variance (80% of theoretical max) and is the safe target for your implementation. Theoretical Maximum assumes perfect round-robin distribution across 5 production pods. Error responses show per-pod enforcement values (Per-User: 10 req/s, Burst: 50, Concurrent: 30, Per-IP: 20 req/s).

How Burst Capacity Works:

  • Uses a token bucket algorithm with continuous refill (not fixed time windows)
  • Each request consumes 1 token; tokens refill continuously across the cluster
  • After an idle period of ≥5 seconds, full burst capacity (200 tokens cluster-wide) is available
  • Example: after being idle, you can send up to 200 requests instantly, then continue at the sustained rate of 40 req/s
  • A sustained rate at or below the limit never depletes burst capacity
  • The API runs on multiple pods behind a load balancer, which distributes your requests across the cluster
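The refill behavior above can be sketched with a small client-side model. This is illustrative only: the server enforces the real limits, and the rate and burst values here are the cluster-wide numbers from the table.

```python
import time

class TokenBucket:
    """Minimal client-side model of the server's token bucket:
    continuous refill, capped at the burst capacity."""

    def __init__(self, rate=40.0, burst=200):
        self.rate = rate              # tokens added per second (sustained limit)
        self.burst = burst            # maximum tokens (burst capacity)
        self.tokens = float(burst)    # a long idle period leaves the bucket full
        self.last = time.monotonic()

    def try_consume(self):
        now = time.monotonic()
        # Refill continuously since the last check, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=40.0, burst=200)
# After idling, a flood of 250 requests drains the burst; the rest must wait
burst_ok = sum(bucket.try_consume() for _ in range(250))
print(burst_ok)  # typically 200: the burst capacity, with the remainder refused
```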

Error Responses

HTTP 429 Too Many Requests

Returned when rate limit or concurrent write limit is exceeded.

Rate Limit Exceeded:

{
  "meta": {
    "status": "error",
    "uuid": "request-uuid"
  },
  "errors": [{
    "code": "rate-limit-exceeded",
    "message": "Rate limit exceeded, please slow down",
    "details": {
      "limit": 10,
      "burst": 50,
      "window": "1s"
    }
  }]
}

Note: Error responses show per-pod enforcement values (10 req/s sustained, 50 burst capacity). Due to load balancing across 5 production pods, your effective capacity is higher. See the Rate Limits table above for Recommended Safe Limit and Theoretical Maximum.

Concurrent Write Limit Exceeded:

{
  "meta": {
    "status": "error",
    "uuid": "request-uuid"
  },
  "errors": [{
    "code": "too-many-concurrent-writes",
    "message": "Too many concurrent write operations, please retry",
    "details": {
      "limit": 30
    }
  }]
}

Note: Error responses show the per-pod enforcement limit (30 concurrent writes). The effective cluster-wide capacity is higher due to load balancing. See the Rate Limits table above for Recommended Safe Limit and Theoretical Maximum.

Response Headers:

The API returns informational headers to help you manage request patterns:

Rate Limit Headers (on all responses):

  • X-RateLimit-Limit: 10 - Per-pod rate limit in requests per second
  • X-RateLimit-Remaining: 3 - Current number of tokens available in your bucket on this pod
  • X-RateLimit-Reset: 1638360100 - Unix timestamp when the next token becomes available
  • Retry-After: 1 - Suggested wait time in seconds (on 429 errors only, includes jitter to prevent thundering herd)

Concurrency Limit Headers (on write operations: POST/PUT/PATCH/DELETE):

  • X-Concurrency-Limit: 30 - Per-pod concurrent write operation limit
  • X-Concurrency-Remaining: 20 - Current number of available write slots on this pod
  • Retry-After: 1 - Suggested wait time in seconds (on 429 errors only, includes jitter to prevent thundering herd)

Note: Headers show per-pod values. The effective cluster-wide capacity is higher due to load balancing. See the Rate Limits table above for Recommended Safe Limit and Theoretical Maximum.
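These headers can drive client-side pacing. A minimal sketch follows; the helper name `throttle_hint` and the header values are illustrative, not part of the API.

```python
import time

def throttle_hint(headers):
    """Derive a pacing decision from the rate-limit headers on a response.
    Returns the number of seconds to wait before the next request."""
    remaining = int(headers.get('X-RateLimit-Remaining', '1'))
    reset = int(headers.get('X-RateLimit-Reset', '0'))
    if remaining > 0:
        return 0.0                        # tokens left: safe to send immediately
    # Bucket empty on this pod: wait until the next token is due
    return max(0.0, reset - time.time())

# Illustrative header values, as they might appear near the limit
wait = throttle_hint({'X-RateLimit-Limit': '10',
                      'X-RateLimit-Remaining': '0',
                      'X-RateLimit-Reset': '1638360100'})
```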

HTTP 503 Service Unavailable

Returned when the API is under heavy load or experiencing issues.

{
  "meta": {
    "status": "error",
    "uuid": "request-uuid"
  },
  "errors": [{
    "code": "service-overloaded",
    "message": "Service temporarily overloaded, please retry later"
  }]
}

Headers returned:

  • Retry-After: 1 - Suggested wait time in seconds (includes jitter to prevent thundering herd)

Best Practices

  1. Implement exponential backoff:
     • On 429 or 503, wait for the Retry-After header value
     • If the retry also fails, double the wait time (up to a 60-second maximum)
     • Example: 1s → 2s → 4s → 8s → 16s → 32s → 60s
  2. Respect rate limits:
     • Space out requests to stay under 40 req/s per user
     • Batch operations when possible instead of sending many small requests
  3. Handle concurrent write limits:
     • For bulk updates, submit requests sequentially or in small batches
     • Avoid sending 150+ simultaneous write requests
  4. Monitor response headers:
     • X-RateLimit-Limit: Per-pod rate limit in requests per second
     • X-RateLimit-Remaining: Current tokens available in your bucket
     • X-RateLimit-Reset: Unix timestamp when the next token becomes available
     • X-Concurrency-Limit: Per-pod concurrent write operation limit
     • X-Concurrency-Remaining: Available write slots on this pod
     • Retry-After: Seconds to wait before retrying (on errors; includes jitter to prevent a thundering herd)
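One way to stay under the concurrent write limit is to cap in-flight writes with a semaphore. A sketch follows; `guarded_write` and `fake_write` are hypothetical stand-ins for real POST/PUT calls.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Cap in-flight writes well below the 150-request concurrent write limit
MAX_CONCURRENT_WRITES = 20
write_slots = threading.Semaphore(MAX_CONCURRENT_WRITES)

def guarded_write(do_write, payload):
    """Run a write only when a slot is free; blocks until one opens."""
    with write_slots:
        return do_write(payload)

def fake_write(payload):
    # Stand-in for a real write request to the API
    return {'updated': payload}

# Submit many updates without ever exceeding the concurrency cap
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(lambda p: guarded_write(fake_write, p), range(100)))
```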

Code Examples

JavaScript (with fetch)

async function apiRequestWithRetry(url, options = {}, maxRetries = 5) {
  let lastError;

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await fetch(url, {
        ...options,
        headers: {
          'Authorization': `Bearer ${getToken()}`,
          'Content-Type': 'application/json',
          ...options.headers,
        },
      });

      // Success - return response
      if (response.ok) {
        return response;
      }

      // Rate limited or service unavailable - retry with backoff
      if (response.status === 429 || response.status === 503) {
        const retryAfter = parseInt(response.headers.get('Retry-After') || '1', 10);
        const waitTime = Math.min(retryAfter * 1000 * Math.pow(2, attempt), 60000);

        console.log(`Rate limited. Waiting ${waitTime}ms before retry ${attempt + 1}/${maxRetries}`);
        await new Promise(resolve => setTimeout(resolve, waitTime));
        continue;
      }

      // Other error - don't retry
      return response;

    } catch (error) {
      lastError = error;
      // Network error - retry with backoff
      const waitTime = Math.min(1000 * Math.pow(2, attempt), 60000);
      await new Promise(resolve => setTimeout(resolve, waitTime));
    }
  }

  throw lastError || new Error('Max retries exceeded');
}

// Usage
const response = await apiRequestWithRetry('https://api.mediamath.com/api/v3.0/campaigns');
const data = await response.json();

Python (with requests)

import time
import requests
from typing import Optional, Dict, Any

def api_request_with_retry(
    url: str,
    method: str = 'GET',
    headers: Optional[Dict[str, str]] = None,
    data: Optional[Dict[str, Any]] = None,
    max_retries: int = 5,
    token: Optional[str] = None
) -> requests.Response:
    """Make API request with automatic retry on rate limit errors."""

    request_headers = {
        'Authorization': f'Bearer {token}',
        'Content-Type': 'application/json',
    }
    if headers:
        request_headers.update(headers)

    last_error = None

    for attempt in range(max_retries):
        try:
            response = requests.request(
                method=method,
                url=url,
                headers=request_headers,
                json=data,
                timeout=30
            )

            # Success
            if response.ok:
                return response

            # Rate limited or service unavailable - retry with backoff
            if response.status_code in (429, 503):
                retry_after = int(response.headers.get('Retry-After', 1))
                wait_time = min(retry_after * (2 ** attempt), 60)

                print(f"Rate limited. Waiting {wait_time}s before retry {attempt + 1}/{max_retries}")
                time.sleep(wait_time)
                continue

            # Other error - don't retry
            return response

        except requests.exceptions.RequestException as e:
            last_error = e
            wait_time = min(1 * (2 ** attempt), 60)
            time.sleep(wait_time)

    if last_error:
        raise last_error
    raise Exception('Max retries exceeded')


# Usage
response = api_request_with_retry(
    url='https://api.mediamath.com/api/v3.0/campaigns',
    method='GET',
    token='your-api-token'
)
data = response.json()

FAQ

Q: Why do the documented limits differ from the error response values?

The documented limits show recommended cluster-wide targets accounting for load balancing across 5 production pods. Error responses show per-pod enforcement values because each individual pod enforces limits independently. Your effective capacity is higher than per-pod values due to request distribution.

Example:

  • Error shows: "limit": 10 (per-pod)
  • Effective capacity: 40 req/s recommended, 50 req/s theoretical max (5 pods × 10)
  • Recommended target uses 80% safety margin to account for distribution variance

Q: Why am I getting rate limited at low request volumes?

Rate limits are per-user, not per-application. If you have multiple applications or scripts using the same API credentials, they share the same rate limit bucket.

Q: How do I request a higher rate limit?

Contact MediaMath Support to discuss your use case. Higher limits may be available for specific approved integrations.

Q: Are rate limits applied to read and write operations equally?

Rate limits (429 responses from rate limiting) apply to all requests equally. However, concurrent write limits (429 responses from concurrency limiting) only apply to POST, PUT, PATCH, and DELETE requests.

Q: What timezone are rate limit windows based on?

Rate limit windows are rolling and continuous (token bucket algorithm), not calendar-based windows. Each request is evaluated against your current token balance, which refills continuously. The cluster-wide effective rate is approximately 40 tokens per second (distributed across multiple pods).

Q: What should I do if I consistently hit rate limits?

If you're frequently receiving 429 responses:

  1. Implement request batching where possible to reduce total request count
  2. Add delays between requests (target 20-30 req/s sustained rate)
  3. Use exponential backoff retry logic as shown in the code examples above
  4. Review your integration to eliminate unnecessary API calls
  5. Contact MediaMath Support if you have legitimate high-volume needs that cannot be optimized
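A simple client-side pacer is one way to add those delays. The sketch below targets 25 req/s, within the 20-30 req/s range suggested above; the `Pacer` class is illustrative, not part of the API.

```python
import time

class Pacer:
    """Spread requests evenly to stay under a target sustained rate."""

    def __init__(self, target_rps=25.0):
        self.interval = 1.0 / target_rps        # minimum gap between requests
        self.next_allowed = time.monotonic()

    def wait(self):
        # Sleep just long enough to honor the target rate, then reserve
        # the next slot
        now = time.monotonic()
        if now < self.next_allowed:
            time.sleep(self.next_allowed - now)
        self.next_allowed = max(now, self.next_allowed) + self.interval

pacer = Pacer(target_rps=25.0)
start = time.monotonic()
for _ in range(10):
    pacer.wait()      # call this before each API request
elapsed = time.monotonic() - start
```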

Q: Is there a way to check my current rate limit status?

Yes. Monitor the X-RateLimit-Remaining header in API responses to see your current token balance. This shows how many requests you can make before hitting the rate limit. The X-RateLimit-Reset header provides a Unix timestamp indicating when your next token becomes available.

Q: Why am I getting 503 errors instead of 429?

HTTP 503 Service Unavailable indicates system-level overload (not user-specific rate limiting). This is a temporary condition caused by high overall system load. When you receive 503 errors, implement exponential backoff and retry - the system will recover automatically. The Retry-After header indicates how long to wait.