MediaMath APIs implement rate limiting and throttling to ensure platform stability and fair resource allocation across all users. This page explains the limits, how they work, and how to handle rate-limited responses in your integrations.
The MediaMath API uses a multi-layer protection system to maintain service reliability. When limits are exceeded, the API returns appropriate HTTP status codes with details to help you adjust your request patterns.
| Limit Type | Recommended Safe Limit | Theoretical Maximum | Description |
|---|---|---|---|
| Per-User Rate Limit | 40 req/s | 50 req/s | Sustained request rate per authenticated user |
| Per-User Burst Capacity | 200 requests | 250 requests | Maximum instant burst after idle period |
| Concurrent Write Limit | 150 requests | 150 requests | Maximum concurrent POST/PUT/PATCH/DELETE requests |
| Per-IP Rate Limit | 80 req/s | 100 req/s | Sustained request rate per IP (unauthenticated user) |
Understanding Limits: The Recommended Safe Limit accounts for load balancing distribution variance (80% of theoretical max) and is the safe target for your implementation. Theoretical Maximum assumes perfect round-robin distribution across 5 production pods. Error responses show per-pod enforcement values (Per-User: 10 req/s, Burst: 50, Concurrent: 30, Per-IP: 20 req/s).
How Burst Capacity Works:
- Uses token bucket algorithm with continuous refill (not fixed time windows)
- Each request consumes 1 token and tokens refill continuously across the cluster
- After an idle period of ≥5 seconds, full burst capacity is available (200 tokens cluster-wide)
- Example: after being idle, you can send up to 200 requests instantly, then continue at the sustained rate of 40 req/s
- Sustained rate at or below the limit never depletes burst capacity
- The API runs on multiple pods with load balancing, distributing your requests across the cluster
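If you want to pace requests client-side rather than react to 429s, you can mirror this token bucket model in your own code. The following is a minimal sketch, not MediaMath's server implementation; it assumes the cluster-wide recommended values from the table above (40 req/s sustained, 200 burst):

```javascript
// Minimal client-side token bucket mirroring the model described above.
// rate = sustained tokens per second, burst = bucket capacity.
class TokenBucket {
  constructor(rate = 40, burst = 200) {
    this.rate = rate;
    this.burst = burst;
    this.tokens = burst;          // start full, as after an idle period
    this.lastRefill = Date.now();
  }

  refill() {
    const now = Date.now();
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    // Tokens accrue continuously; the bucket never exceeds burst capacity.
    this.tokens = Math.min(this.burst, this.tokens + elapsedSeconds * this.rate);
    this.lastRefill = now;
  }

  // Resolves when a token is available, then consumes it.
  async take() {
    this.refill();
    if (this.tokens < 1) {
      const waitMs = ((1 - this.tokens) / this.rate) * 1000;
      await new Promise(resolve => setTimeout(resolve, waitMs));
      this.refill();
    }
    this.tokens -= 1;
  }
}

// Usage: await the bucket before each request to smooth your send rate.
// const bucket = new TokenBucket();
// await bucket.take();
// const response = await fetch(...);
```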
HTTP 429 Too Many Requests: Returned when a rate limit or the concurrent write limit is exceeded.
Rate Limit Exceeded:
```json
{
  "meta": {
    "status": "error",
    "uuid": "request-uuid"
  },
  "errors": [{
    "code": "rate-limit-exceeded",
    "message": "Rate limit exceeded, please slow down",
    "details": {
      "limit": 10,
      "burst": 50,
      "window": "1s"
    }
  }]
}
```
Note: Error responses show per-pod enforcement values (10 req/s sustained, 50 burst capacity). Due to load balancing across 5 production pods, your effective capacity is higher. See the Rate Limits table above for the Recommended Safe Limit and Theoretical Maximum.
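When handling this error, you can surface the per-pod values from the body and relate them to your effective capacity. A minimal sketch, assuming the response shape above and the documented 5 production pods:

```javascript
// Read per-pod enforcement values from a 429 body and estimate
// the cluster-wide theoretical maximum (5 production pods, per the note above).
async function logRateLimitError(response) {
  const body = await response.json();
  const details = body.errors?.[0]?.details ?? {};
  const pods = 5; // documented production pod count
  console.warn(
    `Per-pod limit ${details.limit} req/s, burst ${details.burst}; ` +
    `theoretical cluster max ≈ ${details.limit * pods} req/s`
  );
}
```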
Concurrent Write Limit Exceeded:
```json
{
  "meta": {
    "status": "error",
    "uuid": "request-uuid"
  },
  "errors": [{
    "code": "too-many-concurrent-writes",
    "message": "Too many concurrent write operations, please retry",
    "details": {
      "limit": 30
    }
  }]
}
```
Note: Error responses show the per-pod enforcement limit (30 concurrent writes). The effective cluster-wide capacity is higher due to load balancing. See the Rate Limits table above for the Recommended Safe Limit and Theoretical Maximum.
Response Headers:
The API returns informational headers to help you manage request patterns:
Rate Limit Headers (on all responses):
- `X-RateLimit-Limit: 10` - Per-pod rate limit in requests per second
- `X-RateLimit-Remaining: 3` - Current number of tokens available in your bucket on this pod
- `X-RateLimit-Reset: 1638360100` - Unix timestamp when the next token becomes available
- `Retry-After: 1` - Suggested wait time in seconds (on 429 errors only; includes jitter to prevent thundering herd)
Concurrency Limit Headers (on write operations: POST/PUT/PATCH/DELETE):
- `X-Concurrency-Limit: 30` - Per-pod concurrent write operation limit
- `X-Concurrency-Remaining: 20` - Current number of available write slots on this pod
- `Retry-After: 1` - Suggested wait time in seconds (on 429 errors only; includes jitter to prevent thundering herd)
Note: Headers show per-pod values. The effective cluster-wide capacity is higher due to load balancing. See the Rate Limits table above for Recommended Safe Limit and Theoretical Maximum.
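You can also use these headers to throttle proactively instead of waiting for a 429. A minimal sketch that pauses when the current pod's bucket is empty (the header names are the documented ones above):

```javascript
// Pause before the next call when the bucket on this pod is nearly empty.
async function throttleFromHeaders(response) {
  const remaining = parseInt(response.headers.get('X-RateLimit-Remaining') || '1', 10);
  if (remaining > 0) return; // tokens still available on this pod

  // X-RateLimit-Reset is a Unix timestamp (seconds) for the next token.
  const reset = parseInt(response.headers.get('X-RateLimit-Reset') || '0', 10);
  const waitMs = Math.max(0, reset * 1000 - Date.now());
  await new Promise(resolve => setTimeout(resolve, waitMs));
}
```

Call it after each response, before issuing the next request, to smooth traffic ahead of the limit.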
HTTP 503 Service Unavailable: Returned when the API is under heavy load or experiencing issues.
```json
{
  "meta": {
    "status": "error",
    "uuid": "request-uuid"
  },
  "errors": [{
    "code": "service-overloaded",
    "message": "Service temporarily overloaded, please retry later"
  }]
}
```
Headers returned:
- `Retry-After: 1` - Suggested wait time in seconds (includes jitter to prevent thundering herd)
- Implement Exponential Backoff:
  - On 429 or 503, wait for the `Retry-After` header value
  - If retrying fails, double the wait time (max 60 seconds)
  - Example: 1s → 2s → 4s → 8s → 16s → 32s → 60s
- Respect Rate Limits:
  - Space out requests to stay under 40 req/s per user
  - Batch operations when possible instead of many small requests
- Handle Concurrent Write Limits:
  - For bulk updates, submit requests sequentially or in small batches (see the sketch after this list)
  - Avoid sending 150+ simultaneous write requests
- Monitor Response Headers:
  - `X-RateLimit-Limit` - Per-pod rate limit in requests per second
  - `X-RateLimit-Remaining` - Current tokens available in your bucket
  - `X-RateLimit-Reset` - Unix timestamp when the next token becomes available
  - `X-Concurrency-Limit` - Per-pod concurrent write operation limit
  - `X-Concurrency-Remaining` - Available write slots on this pod
  - `Retry-After` - Seconds to wait before retrying (on errors; includes jitter to prevent thundering herd)
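For the concurrent-write guidance above, one option is to cap in-flight writes client-side by submitting small sequential batches. A minimal sketch; the batch size of 25 is an illustrative choice, not a documented value, and `apiRequestWithRetry` is the helper defined in the JavaScript example that follows:

```javascript
// Submit write requests as small sequential batches so in-flight
// writes stay far below the 150-request concurrent write limit.
// `requests` is an array of { url, options } objects.
async function submitWritesInBatches(requests, batchSize = 25) {
  const responses = [];
  for (let i = 0; i < requests.length; i += batchSize) {
    const batch = requests.slice(i, i + batchSize);
    // Each batch runs concurrently; batches themselves run sequentially.
    responses.push(...await Promise.all(
      batch.map(({ url, options }) => apiRequestWithRetry(url, options))
    ));
  }
  return responses;
}
```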
JavaScript Example:
```javascript
async function apiRequestWithRetry(url, options = {}, maxRetries = 5) {
  let lastError;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await fetch(url, {
        ...options,
        headers: {
          'Authorization': `Bearer ${getToken()}`,
          'Content-Type': 'application/json',
          ...options.headers,
        },
      });
      // Success - return response
      if (response.ok) {
        return response;
      }
      // Rate limited or service unavailable - retry with backoff
      if (response.status === 429 || response.status === 503) {
        const retryAfter = parseInt(response.headers.get('Retry-After') || '1', 10);
        const waitTime = Math.min(retryAfter * 1000 * Math.pow(2, attempt), 60000);
        console.log(`Rate limited. Waiting ${waitTime}ms before retry ${attempt + 1}/${maxRetries}`);
        await new Promise(resolve => setTimeout(resolve, waitTime));
        continue;
      }
      // Other error - don't retry
      return response;
    } catch (error) {
      lastError = error;
      // Network error - retry with backoff
      const waitTime = Math.min(1000 * Math.pow(2, attempt), 60000);
      await new Promise(resolve => setTimeout(resolve, waitTime));
    }
  }
  throw lastError || new Error('Max retries exceeded');
}

// Usage
const response = await apiRequestWithRetry('https://api.mediamath.com/api/v3.0/campaigns');
const data = await response.json();
```

Python Example:
```python
import time
import requests
from typing import Optional, Dict, Any


def api_request_with_retry(
    url: str,
    method: str = 'GET',
    headers: Optional[Dict[str, str]] = None,
    data: Optional[Dict[str, Any]] = None,
    max_retries: int = 5,
    token: Optional[str] = None,
) -> requests.Response:
    """Make API request with automatic retry on rate limit errors."""
    request_headers = {
        'Authorization': f'Bearer {token}',
        'Content-Type': 'application/json',
    }
    if headers:
        request_headers.update(headers)
    last_error = None
    for attempt in range(max_retries):
        try:
            response = requests.request(
                method=method,
                url=url,
                headers=request_headers,
                json=data,
                timeout=30
            )
            # Success
            if response.ok:
                return response
            # Rate limited or service unavailable - retry with backoff
            if response.status_code in (429, 503):
                retry_after = int(response.headers.get('Retry-After', 1))
                wait_time = min(retry_after * (2 ** attempt), 60)
                print(f"Rate limited. Waiting {wait_time}s before retry {attempt + 1}/{max_retries}")
                time.sleep(wait_time)
                continue
            # Other error - don't retry
            return response
        except requests.exceptions.RequestException as e:
            last_error = e
            wait_time = min(1 * (2 ** attempt), 60)
            time.sleep(wait_time)
    if last_error:
        raise last_error
    raise Exception('Max retries exceeded')


# Usage
response = api_request_with_retry(
    url='https://api.mediamath.com/api/v3.0/campaigns',
    method='GET',
    token='your-api-token'
)
data = response.json()
```

The documented limits show recommended cluster-wide targets, accounting for load balancing across 5 production pods. Error responses show per-pod enforcement values because each individual pod enforces limits independently. Your effective capacity is higher than the per-pod values due to request distribution.
Example:
- Error shows: `"limit": 10` (per-pod)
- Effective capacity: 40 req/s recommended, 50 req/s theoretical max (5 pods × 10)
- The recommended target uses an 80% safety margin to account for distribution variance
Rate limits are per-user, not per-application. If you have multiple applications or scripts using the same API credentials, they share the same rate limit bucket.
Contact MediaMath Support to discuss your use case. Higher limits may be available for specific approved integrations.
Rate limits (429 responses from rate limiting) apply to all requests equally. However, concurrent write limits (429 responses from concurrency limiting) only apply to POST, PUT, PATCH, and DELETE requests.
Rate limit windows are rolling and continuous (token bucket algorithm), not calendar-based windows. Each request is evaluated against your current token balance, which refills continuously. The cluster-wide effective rate is approximately 40 tokens per second (distributed across multiple pods).
If you're frequently receiving 429 responses:
- Implement request batching where possible to reduce total request count
- Add delays between requests (target 20-30 req/s sustained rate)
- Use exponential backoff retry logic as shown in the code examples above
- Review your integration to eliminate unnecessary API calls
- Contact MediaMath Support if you have legitimate high-volume needs that cannot be optimized
Yes. Monitor the `X-RateLimit-Remaining` header in API responses to see your current token balance. This shows how many requests you can make before hitting the rate limit. The `X-RateLimit-Reset` header provides a Unix timestamp indicating when your next token becomes available.
HTTP 503 Service Unavailable indicates system-level overload (not user-specific rate limiting). This is a temporary condition caused by high overall system load. When you receive 503 errors, implement exponential backoff and retry; the system will recover automatically. The `Retry-After` header indicates how long to wait.