What is MediaMath’s Data Platform?

Data Platform is a hosted data service that gives clients and partners an easy way to access raw, log-level data such as impressions or conversions with minimal latency. Unlike traditional data feeds, Data Platform uses a “pull” model that leverages the cloud for scalable, reliable and efficient data transfer in a tool-agnostic way.

What do I need to access Data Platform?

You will need a valid AWS account and an IAM user with sts:AssumeRole permissions. You will also need MediaMath to activate your account and assign a Role ARN. See Accessing Data Platform Using AWS for more details.

What tools can I use to access my data via Data Platform?

Data Platform is designed to give you access to your raw, log-level data in a tool-agnostic way. Any tool that can read data from Amazon S3 and process tab-delimited files will work. Additional information for the following recommended tools is available in the Data Platform guide.

  • Hive
  • Qubole
  • Redshift
  • MySQL

How do I sign up?

Data Platform is currently in closed beta. During the closed beta period, sign-up is by invitation only.

When is data available for processing?

See Data Update Cycle.

How much does Data Platform cost?

Please discuss with your account representative.

Who should I contact if something goes wrong?

If you encounter any issues or have any questions, please submit an Advertising Operations & Support request.

I am working with an attribution vendor who wants access to my data. Can they use Data Platform?

Yes. Data Platform has a flexible permissions model, and your data can be made available to partners of your choosing. Please submit an Advertising Operations & Support request with vendor contact information to get started.

I can’t see the Data Platform S3 buckets when I log in to my AWS console

The AWS console only displays S3 buckets that were created from within your account, not all the buckets that you have access to. You can use the AWS CLI tools to verify access to your buckets. For example:

    $ aws s3 ls
    2014-02-27 12:24:44 mm-prod-platform-attributed-events
    2013-11-26 11:39:52 mm-prod-platform-events
    2013-11-26 11:39:44 mm-prod-platform-impressions

See the Data Platform guide for more information.

I get an error when trying to assume the IAM role from my account

When accessing Data Platform, you will have to assume an IAM role. If you encounter an error such as the following:

<ErrorResponse xmlns="https://sts.amazonaws.com/doc/2011-06-15/">   
           <Message>User: arn:aws:iam::924339635232:user/xxxxxxxxx is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::794878508631:role/DPA-YYYYYYYYY</Message>  

you must ensure that your IAM user has permission to assume the role. You can apply the following policy to your user (or group), replacing the <ROLE_ARN> placeholder with the Role ARN assigned to you by MediaMath:

  {
    "Statement": [{
      "Action": ["sts:AssumeRole"],
      "Effect": "Allow",
      "Resource": ["<ROLE_ARN>"],
      "Sid": "Stmt1389115902000"
    }],
    "Version": "2012-10-17"
  }

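Once the policy is attached, assuming the role from code might look like the minimal Python sketch below. It assumes you use the boto3 library (not mentioned in this FAQ) and that the role ARN is the one assigned to you; the function name and session name are hypothetical.

```python
def assume_data_platform_role(sts_client, role_arn, session_name="data-platform-session"):
    """Call sts:AssumeRole and return the temporary credentials as a plain dict.

    sts_client is expected to be a boto3 STS client (an assumption of this
    sketch); role_arn is the Role ARN assigned to your account.
    """
    response = sts_client.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    creds = response["Credentials"]
    # Keys are named so the dict can be passed straight to boto3.client(...).
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }

# Usage (requires boto3 and valid credentials for your IAM user):
#   import boto3
#   creds = assume_data_platform_role(boto3.client("sts"), "<ROLE_ARN>")
#   s3 = boto3.client("s3", **creds)
```

If the call still fails with the "not authorized to perform: sts:AssumeRole" message, the policy above is most likely not attached to the user or group making the request.
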
I am seeing Unicode errors when loading into Redshift

If Redshift is giving you errors such as "String contains invalid or unsupported UTF8 codepoints. Bad UTF8 hex sequence: 94 (error 3)", you may have entered an invalid Unicode character into TerminalOne when naming a campaign or strategy.

This is an easy fix: ensure that your COPY command replaces invalid characters with another character of your choosing, such as _ or ?.

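    COPY your_table_name FROM 's3://mm-prod-data-platform/data/organization_id=<ORG_ID>'
    CREDENTIALS '<your-credentials>'  -- placeholder: credentials obtained from the assumed role
    DELIMITER '\t'
    NULL AS '\\\\N'
    ACCEPTINVCHARS AS '_';  -- replace each invalid UTF-8 character with '_'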
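
If you load the files with a tool other than Redshift, the same replacement can be done before loading. The Python sketch below is one possible approach, not part of Data Platform itself; the function name is hypothetical. It decodes each raw line as UTF-8 and substitutes a character of your choosing for every byte that cannot be decoded, such as the 0x94 byte from the error above.

```python
def replace_invalid_utf8(raw: bytes, replacement: str = "_") -> str:
    """Decode raw bytes as UTF-8, swapping undecodable bytes for `replacement`."""
    # errors="replace" turns each undecodable byte into U+FFFD.
    text = raw.decode("utf-8", errors="replace")
    # Note: a genuine U+FFFD already present in the data is replaced as well.
    return text.replace("\ufffd", replacement)


# A campaign name containing two stray 0x94 bytes, followed by a tab and an id.
line = b"Spring \x94Sale\x94\t42"
print(replace_invalid_utf8(line))
```

Valid UTF-8 text passes through unchanged, so the function can be applied to every line of a file.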

How can I calculate the timestamp of an impression relative to the user?

Data Platform logs provide a GMT timestamp for each record in each of the three datasets provided (impressions, events and attributed events). For impressions, a timestamp adjusted from GMT to the time zone of the campaign which served the impression is also provided, as this is the timestamp used for reporting purposes in TerminalOne. For attributed events, the timestamp of the event (pixel fire) is provided both in GMT and in the time zone of the campaign to which the event is attributed. Additionally the GMT timestamp of the impression to which the event is attributed in TerminalOne is provided. For event (pixel) logs, only the GMT timestamp of the pixel fire is provided.

On occasion a Data Platform user might be interested in the timestamp of an impression relative to the user who viewed the impression. While this is not provided explicitly, this can be deduced using the other fields in the Data Platform logs. The impression logs contain a field called WEEK_HOUR_PART, an integer between 0 and 671 representing a 15-minute interval in the week. For example:

Day        Start     End       WEEK_HOUR_PART
Sunday     00:00:00  00:14:59  0
Sunday     00:15:00  00:29:59  1
Sunday     00:30:00  00:44:59  2
Sunday     00:45:00  00:59:59  3
Sunday     01:00:00  01:14:59  4
Sunday     02:00:00  02:14:59  8
Sunday     23:45:00  23:59:59  95
Monday     00:00:00  00:14:59  96
Tuesday    00:00:00  00:14:59  192
Thursday   15:30:00  15:44:59  446
Friday     07:45:00  07:59:59  511
Saturday   23:45:00  23:59:59  671
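
The mapping in these examples follows a simple pattern: 96 intervals per day starting on Sunday, 4 per hour, and one per 15 minutes. A minimal Python sketch of that arithmetic, with a hypothetical function name, is:

```python
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]


def week_hour_part(day: str, hour: int, minute: int) -> int:
    """Return the 0-671 index of the 15-minute interval within the week."""
    # 96 intervals per day, 4 per hour, 1 per started quarter hour.
    return DAYS.index(day) * 96 + hour * 4 + minute // 15


print(week_hour_part("Thursday", 15, 30))
print(week_hour_part("Saturday", 23, 45))
```

This reproduces the listed values, for example 446 for Thursday 15:30 and 671 for Saturday 23:45.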

The WEEK_HOUR_PART refers to the 15-minute interval relative to the timezone of the user. Thus, in general, the GMT timestamp of the impression will not match the WEEK_HOUR_PART value provided (the WEEK_HOUR_PART value is used by Brain for optimization purposes and therefore uses the user’s timezone in determining what times of day/week users are particularly responsive or unresponsive to display media). But by calculating the 0-671 value of the given GMT timestamp and comparing to the WEEK_HOUR_PART value of the record, the user’s GMT offset can be determined, and thus the timestamp of the impression relative to the user’s time zone can be determined.

Suppose an impression is shown at five minutes after noon GMT on Tuesday, July 1, 2014 by a TerminalOne campaign with a configured time zone of America/New_York (UTC -04:00). The record for this impression will show a TIMESTAMP_GMT of ‘2014-07-01 12:05:00’ and a REPORT_TIMESTAMP of ‘2014-07-01 08:05:00’. Suppose further that this impression was shown to a user in Berlin, Germany (UTC +02:00), so that the WEEK_HOUR_PART in the log is 248. Transforming the GMT timestamp of the impression yields a 0-671 “15-minute interval” value of 240. Therefore the user who viewed the impression was 8 “intervals” ahead of GMT, and since each interval is 15 minutes, the user was in a time zone two hours ahead of GMT (i.e., UTC +02:00). Adding these two hours to the GMT timestamp of the impression yields a USER_TIMESTAMP of ‘2014-07-01 14:05:00’.
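
The calculation in this example can be sketched in Python as follows. This is an illustration rather than an official implementation: the function names are hypothetical, and it assumes the user's GMT offset is a multiple of 15 minutes and lies within half a week of GMT, so that a difference which wraps around the end of the week can be read as a small positive or negative offset.

```python
from datetime import datetime, timedelta

INTERVALS_PER_WEEK = 672  # 7 days * 96 fifteen-minute intervals


def gmt_week_hour_part(ts: datetime) -> int:
    """Return the 0-671 interval index of a GMT timestamp (Sunday = day 0)."""
    # datetime.weekday() uses Monday=0 .. Sunday=6; shift so that Sunday=0.
    day = (ts.weekday() + 1) % 7
    return day * 96 + ts.hour * 4 + ts.minute // 15


def user_timestamp(timestamp_gmt: datetime, week_hour_part: int) -> datetime:
    """Estimate the impression time in the user's time zone."""
    diff = (week_hour_part - gmt_week_hour_part(timestamp_gmt)) % INTERVALS_PER_WEEK
    # Differences of more than half a week are treated as negative offsets.
    if diff > INTERVALS_PER_WEEK // 2:
        diff -= INTERVALS_PER_WEEK
    return timestamp_gmt + timedelta(minutes=15 * diff)


# The example above: 12:05 GMT on Tuesday, July 1, 2014, WEEK_HOUR_PART 248.
print(user_timestamp(datetime(2014, 7, 1, 12, 5), 248))
```

For the example record, the GMT interval is 240, the difference is 8 intervals (two hours), and the result is 2014-07-01 14:05:00.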