# API Rate Limiting & Throttling

MediaMath APIs implement rate limiting and throttling to ensure platform stability and fair resource allocation across all users. This page explains the limits, how they work, and how to handle rate-limited responses in your integrations.

## Overview

The MediaMath API uses a multi-layer protection system to maintain service reliability. When limits are exceeded, the API returns appropriate HTTP status codes with details to help you adjust your request patterns.

## Rate Limits

| Limit Type | Limit | Description |
| --- | --- | --- |
| **Per-User Rate Limit** | 40 req/s | Maximum sustained request rate per authenticated user |
| **Burst Capacity** | 200 requests | Maximum requests that can be sent instantly after an idle period |
| **Concurrent Write Operations** | 50 | Maximum concurrent POST/PUT/PATCH/DELETE requests |
| **Unauthenticated Requests** | 200 req/s | Requests without authentication (shared across all unauthenticated traffic) |

> **Note:** Per-user rate limiting is currently **disabled**. When enabled, these limits will be enforced.

**How Burst Capacity Works:**

- Uses a token bucket algorithm with continuous refill (not fixed time windows)
- Each request consumes 1 token; tokens refill at the rate limit (40 tokens/second)
- After an idle period of ≥5 seconds, the bucket fills to capacity (200 tokens)
- Example: After being idle, you can send 200 requests instantly, then continue at 40 req/s
- A sustained rate at or below the limit never depletes burst capacity

## Concurrent Write Operations

**Limit:** 50 concurrent write operations cluster-wide

**Applies to:** POST, PUT, PATCH, DELETE methods only. GET requests are not limited.

During periods of high usage, the system distributes available concurrency fairly among active clients to ensure equitable access.

## Error Responses

### HTTP 429 Too Many Requests

Returned when the rate limit or the concurrent write limit is exceeded.

**Rate Limit Exceeded:**

```json
{
  "meta": { "status": "error", "uuid": "request-uuid" },
  "errors": [{
    "code": "rate-limit-exceeded",
    "message": "Rate limit exceeded, please slow down",
    "details": { "limit": 40, "window": "1s" }
  }]
}
```

**Concurrent Write Limit Exceeded:**

```json
{
  "meta": { "status": "error", "uuid": "request-uuid" },
  "errors": [{
    "code": "too-many-concurrent-writes",
    "message": "Too many concurrent write operations, please retry",
    "details": { "limit": 50 }
  }]
}
```

**Headers returned:**

- `Retry-After: 1` - Suggested wait time in seconds before retrying
- `X-RateLimit-Limit: 40` - Rate limit value (when rate limiting enabled)
- `X-RateLimit-Remaining: 0` - Remaining requests in current window (when rate limiting enabled)

### HTTP 503 Service Unavailable

Returned when the API is under heavy load or experiencing issues.

```json
{
  "meta": { "status": "error", "uuid": "request-uuid" },
  "errors": [{
    "code": "service-overloaded",
    "message": "Service temporarily overloaded, please retry later"
  }]
}
```

**Headers returned:**

- `Retry-After: 5` or `Retry-After: 10` - Suggested wait time in seconds

## Recommended Client Behavior

1. **Implement Exponential Backoff:**
   - On 429 or 503, wait for the `Retry-After` header value
   - If the retry also fails, double the wait time (max 60 seconds)
   - Example: 1s → 2s → 4s → 8s → 16s → 32s → 60s
2. **Respect Rate Limits:**
   - Space out requests to stay under 40 req/s per user
   - Batch operations when possible instead of many small requests
3. **Handle Concurrent Write Limits** (see the sketch after this list):
   - For bulk updates, submit requests sequentially or in small batches
   - Avoid sending 50+ simultaneous write requests
4. **Monitor Response Headers:**
   - `X-RateLimit-Limit`: Rate limit threshold (when enabled)
   - `X-RateLimit-Remaining`: Remaining requests in current window (when enabled)
   - `Retry-After`: Seconds to wait before retrying

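
To apply recommendation 3, cap the number of write requests your client keeps in flight. The sketch below is a minimal illustration and not part of any MediaMath SDK: `MAX_CONCURRENT_WRITES` is a client-side choice (kept well below the cluster-wide limit of 50), and `limitedWrite` simply wraps `fetch`, though the same pattern could wrap the `apiRequestWithRetry` helper shown in the next section.

```javascript
// Minimal client-side limiter for concurrent write operations.
// MAX_CONCURRENT_WRITES is an illustrative client-side value, kept
// well below the cluster-wide limit of 50 to leave headroom for others.
const MAX_CONCURRENT_WRITES = 10;

let activeWrites = 0;
const waiters = [];

// Resolve immediately if a slot is free, otherwise queue until one opens.
function acquireWriteSlot() {
  if (activeWrites < MAX_CONCURRENT_WRITES) {
    activeWrites++;
    return Promise.resolve();
  }
  return new Promise(resolve => waiters.push(resolve));
}

// Hand the slot to the next queued request, or mark it free.
function releaseWriteSlot() {
  const next = waiters.shift();
  if (next) {
    next(); // slot transfers directly; the in-flight count is unchanged
  } else {
    activeWrites--;
  }
}

// Wrap any POST/PUT/PATCH/DELETE call so the cap is always respected.
async function limitedWrite(url, options) {
  await acquireWriteSlot();
  try {
    return await fetch(url, options);
  } finally {
    releaseWriteSlot();
  }
}
```

With this wrapper, a bulk update can still be started with `Promise.all` over many `limitedWrite` calls, and the client will never have more than `MAX_CONCURRENT_WRITES` write operations in flight at once.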
## Example: Retry Logic (JavaScript)

```javascript
async function apiRequestWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);

    if (response.status === 429 || response.status === 503) {
      // Honor the server's Retry-After hint, then back off exponentially,
      // capping the wait at 60 seconds.
      const retryAfter = parseInt(response.headers.get('Retry-After') || '1', 10);
      const waitTime = retryAfter * 1000 * Math.pow(2, attempt);
      await new Promise(resolve => setTimeout(resolve, Math.min(waitTime, 60000)));
      continue;
    }

    return response;
  }
  throw new Error('Max retries exceeded');
}
```

## Endpoint Exceptions

The following endpoints are excluded from rate limiting:

- `/healthcheck` - Service health verification
- `/metrics` - Prometheus metrics endpoint

## FAQ

**Why am I getting rate limited at low request volumes?**

Rate limits are per-user, not per-application. If you have multiple applications or scripts using the same API credentials, they share the same rate limit bucket.

**How do I request a higher rate limit?**

Contact MediaMath Support to discuss your use case. Higher limits may be available for specific approved integrations.

**Are rate limits applied to read and write operations equally?**

Rate limits (429 responses from rate limiting) apply to all requests equally. However, concurrent write limits (429 responses from concurrency limiting) only apply to POST, PUT, PATCH, and DELETE operations.

**What timezone are rate limit windows based on?**

Rate limit windows are rolling and continuous (token bucket algorithm), not calendar-based windows. Each request is evaluated against your current token balance, which refills continuously at 40 tokens per second (a minimal client-side illustration appears at the end of this page).

**When will per-user rate limiting be enabled?**

Per-user rate limiting is currently disabled and will be enabled in a future release. You will be notified in advance before it is activated.
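
## Example: Client-Side Token Bucket (JavaScript)

To pace requests before they ever reach the API, you can mirror the server-side token bucket described under **Rate Limits**. The sketch below is illustrative only and not part of any MediaMath SDK; the constants simply restate the documented limits (40 tokens/second refill, 200-token burst capacity), and `pacedRequest` is a hypothetical wrapper around `fetch`.

```javascript
// Client-side mirror of the server's token bucket:
// capacity 200 tokens, refilled continuously at 40 tokens/second.
const RATE = 40;       // tokens added per second (the per-user rate limit)
const CAPACITY = 200;  // maximum bucket size (the burst capacity)

let tokens = CAPACITY;        // start full, as after an idle period
let lastRefill = Date.now();

// Add tokens for the time elapsed since the last refill, capped at capacity.
function refill() {
  const now = Date.now();
  tokens = Math.min(CAPACITY, tokens + ((now - lastRefill) / 1000) * RATE);
  lastRefill = now;
}

// Consume one token, waiting if the bucket is currently empty.
async function takeToken() {
  refill();
  while (tokens < 1) {
    // Sleep roughly long enough for one token to accumulate (25 ms at 40/s).
    await new Promise(resolve => setTimeout(resolve, 1000 / RATE));
    refill();
  }
  tokens -= 1;
}

// Pace any request: a full bucket allows an instant burst of up to 200
// calls, after which throughput settles at the sustained 40 req/s rate.
async function pacedRequest(url, options) {
  await takeToken();
  return fetch(url, options);
}
```

A client paced this way only bursts when its local bucket is full and otherwise settles at the sustained rate, so it should rarely see `rate-limit-exceeded` responses once per-user rate limiting is enabled.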