MediaMath APIs implement rate limiting and throttling to ensure platform stability and fair resource allocation across all users. This page explains the limits, how they work, and how to handle rate-limited responses in your integrations.
The MediaMath API uses a multi-layer protection system to maintain service reliability. When limits are exceeded, the API returns appropriate HTTP status codes with details to help you adjust your request patterns.
| Limit Type | Guaranteed Safe Limit | Recommended Target | Theoretical Maximum | Notes |
|---|---|---|---|---|
| Per-User Rate Limit* | 10 req/s | 30 req/s | 50 req/s | Sustained request rate per authenticated user (token bucket refill rate) |
| Burst Capacity | 50 requests | 150 requests | 250 requests | Maximum requests that can be sent instantly after idle period (token bucket capacity) |
| Unauthenticated Requests | 50 req/s | 150 req/s | 250 req/s | Global limit for requests without authentication |
\* Per-user rate limiting is currently disabled by default.
How Burst Capacity Works:
- Uses token bucket algorithm with continuous refill (not fixed time windows)
- Each request consumes 1 token; tokens refill at the rate limit (e.g., 10 tokens/second = 1 token per 100ms)
- After an idle period of 5 seconds or more, the bucket refills to full capacity
- Example: User can send 50 requests instantly, then must wait for refill at 10 req/s
- Sustained rate at or below limit never depletes burst capacity
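The behavior above can be pictured with a short token-bucket sketch (illustrative only, not MediaMath's internal implementation), using the guaranteed safe numbers of a 50-token capacity and a 10 tokens/second refill rate:

```javascript
// Illustrative token bucket with continuous refill.
// Numbers mirror the guaranteed safe limits: capacity 50, refill 10 tokens/second.
class TokenBucket {
  constructor(capacity = 50, refillRatePerSec = 10) {
    this.capacity = capacity;
    this.refillRatePerSec = refillRatePerSec;
    this.tokens = capacity;        // starts full after an idle period
    this.lastRefill = Date.now();
  }

  tryConsume() {
    // Continuous refill: add tokens in proportion to elapsed time, capped at capacity.
    const now = Date.now();
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillRatePerSec);
    this.lastRefill = now;

    if (this.tokens >= 1) {
      this.tokens -= 1;            // each request costs one token
      return true;                 // request allowed
    }
    return false;                  // request would be rejected with 429
  }
}
```

With these numbers, a full bucket absorbs 50 back-to-back requests, and the next request is admitted roughly 100 ms later, once one token has refilled.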
Safe Operating Guidelines:
- At or below the recommended targets (30 req/s sustained, 150-request burst capacity): Low risk of throttling under normal variance in request distribution
- Between the recommended target and the theoretical maximum: Possible intermittent throttling when requests are unevenly distributed
- Above theoretical maximum: Will be throttled
Concurrent Write Limits:
- Standard tier: Up to 50 concurrent write requests
- Priority tier: Up to 30 concurrent write requests
During periods of high usage, available concurrency is distributed fairly among active clients to ensure equitable access.
Applies to: POST, PUT, PATCH, and DELETE methods only. GET requests are not subject to the concurrency limit.
429 Too Many Requests: Returned when the rate limit or concurrent write limit is exceeded.
```json
{
  "meta": {
    "status": "error",
    "uuid": "request-uuid"
  },
  "errors": [{
    "code": "rate-limit-exceeded",
    "message": "Rate limit exceeded, please slow down",
    "details": {
      "limit": 10,
      "window": "1s"
    }
  }]
}
```

Or, for the concurrent write limit:
```json
{
  "meta": {
    "status": "error",
    "uuid": "request-uuid"
  },
  "errors": [{
    "code": "too-many-concurrent-writes",
    "message": "Too many concurrent write operations, please retry",
    "details": {
      "limit": 10
    }
  }]
}
```

Headers returned:
- `Retry-After: 1` - Suggested wait time in seconds before retrying
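Because both conditions return a 429, a client can tell them apart by inspecting the error code in the response body. A minimal sketch, assuming the response shapes shown above:

```javascript
// Distinguish the two 429 causes by error code (body shapes as documented above).
async function classify429(response) {
  const body = await response.json();
  const code = body.errors?.[0]?.code;
  if (code === 'rate-limit-exceeded') {
    return 'slow down the sustained request rate';
  }
  if (code === 'too-many-concurrent-writes') {
    return 'reduce the number of in-flight write requests';
  }
  return 'unknown 429 cause';
}
```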
503 Service Unavailable: Returned when the API is under heavy load or experiencing issues.
```json
{
  "meta": {
    "status": "error",
    "uuid": "request-uuid"
  },
  "errors": [{
    "code": "service-overloaded",
    "message": "Service temporarily overloaded, please retry later"
  }]
}
```

Headers returned:
- `Retry-After: 5` or `Retry-After: 10` - Suggested wait time in seconds
Implement Exponential Backoff:
- On 429 or 503, wait for the `Retry-After` header value
- If retrying fails, double the wait time (max 60 seconds)
- Example: 1s → 2s → 4s → 8s → 16s → 32s → 60s
Respect Rate Limits:
- Space out requests to stay under 10 req/s per user (see the pacing sketch below)
- Batch operations when possible instead of many small requests
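One simple way to stay under the per-user limit is to pace calls client-side. The sketch below is illustrative and assumes a single sequential caller; it spaces requests at least 100 ms apart, which keeps the sustained rate at or below 10 req/s:

```javascript
// Pace requests so a single sequential worker stays at or below 10 req/s.
const MIN_INTERVAL_MS = 100; // 1000 ms / 10 req/s
let lastRequestAt = 0;

async function pacedRequest(url, options) {
  const wait = Math.max(0, lastRequestAt + MIN_INTERVAL_MS - Date.now());
  if (wait > 0) {
    await new Promise(resolve => setTimeout(resolve, wait));
  }
  lastRequestAt = Date.now();
  return fetch(url, options);
}
```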
Handle Concurrent Write Limits:
- For bulk updates, submit requests sequentially or in small batches (see the limiter sketch below)
- Don't fire 20+ simultaneous write requests
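A small in-process limiter is one way to cap the number of writes in flight well below the tier limits. This is a sketch, not a feature of any official MediaMath client:

```javascript
// Keep the number of concurrent write requests bounded (here: 5 in flight at once).
class WriteLimiter {
  constructor(maxConcurrent = 5) {
    this.maxConcurrent = maxConcurrent;
    this.active = 0;
    this.queue = [];
  }

  async acquire() {
    if (this.active < this.maxConcurrent) {
      this.active++;
      return;
    }
    // Wait for a running task to hand over its slot in release().
    await new Promise(resolve => this.queue.push(resolve));
  }

  release() {
    const next = this.queue.shift();
    if (next) {
      next();        // hand the slot directly to the next queued task
    } else {
      this.active--; // nobody waiting; free the slot
    }
  }

  async run(task) {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}

// Example: at most 5 write requests in flight at any time.
const writeLimiter = new WriteLimiter(5);
// writeLimiter.run(() => fetch(url, { method: 'POST', headers, body }));
```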
Monitor Response Headers:
- `X-RateLimit-Limit`: The rate limit (when enabled)
- `X-RateLimit-Remaining`: Remaining requests in the current window
- `Retry-After`: Seconds to wait before retrying
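The JavaScript helper below ties these practices together: it retries on 429 and 503, honors the `Retry-After` header, and backs off exponentially up to a 60-second cap.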
```javascript
async function apiRequestWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);
    if (response.status === 429 || response.status === 503) {
      // Honor Retry-After (default 1s) and back off exponentially, capped at 60s.
      const retryAfter = parseInt(response.headers.get('Retry-After') || '1', 10);
      const waitTime = retryAfter * 1000 * Math.pow(2, attempt);
      await new Promise(resolve => setTimeout(resolve, Math.min(waitTime, 60000)));
      continue;
    }
    return response;
  }
  throw new Error('Max retries exceeded');
}
```

The following endpoint is excluded from rate limiting:
- `/healthcheck` - Service health verification
Why am I getting rate limited at low request volumes?
Rate limits are per-user, not per-application. If you have multiple applications or scripts using the same API credentials, they share the same rate limit bucket.
How do I request a higher rate limit?
Contact MediaMath Support to discuss your use case. Higher limits may be available for specific approved integrations.
Are rate limits applied to read and write operations equally?
Rate limits (429 responses) apply to all requests equally. However, write concurrency limits only apply to POST, PUT, PATCH, and DELETE operations.
What timezone are rate limit windows based on?
Rate limit windows are rolling and based on UTC time from the moment of each request, not calendar-based windows.