BYOA Model Features

Custom Brain allows the client to use the BYOA API to upload a set of logistic coefficients corresponding to any of the variables currently in use by the MediaMath Brain.

Available Features

We support the features listed below. Please refer to the "Data Platform Schemas" list on MediaMath Support for the variable type and a description of what each feature represents. Only some of the Categorical Types below appear in that list.

Intercept

Feature Name | Type | Corresponding field(s) in LLDS Impressions | Notes
__const | - | - | Represents the intercept weight
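
As a rough illustration of how an intercept and a set of logistic coefficients combine into a prediction, the sketch below applies the standard logistic function to the intercept plus a weighted sum of feature values. This is generic logistic regression, not a description of the Brain's internals; the function name predict, the feature keys, and the weights are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// predict applies the standard logistic function to the intercept (__const)
// plus the weighted sum of the supplied feature values. A categorical feature
// that is present would carry the value 1. Hypothetical helper for
// illustration only.
func predict(weights, values map[string]float64) float64 {
	z := weights["__const"]
	for name, x := range values {
		z += weights[name] * x // a feature without a weight contributes 0
	}
	return 1.0 / (1.0 + math.Exp(-z))
}

func main() {
	// Hypothetical coefficients.
	weights := map[string]float64{"__const": -2.0, "exchange_ctr": 1.5}
	values := map[string]float64{"exchange_ctr": 0.0}
	fmt.Printf("%.4f\n", predict(weights, values)) // prints 0.1192
}
```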

Mapped numerical Types

Feature Name | Type | Corresponding field(s) in LLDS Impressions | Notes
bidder_pixel_frequency | Mapped numerical | overlapped_brain_pixel_selections | See 'Audience Data' section below.
bidder_pixel_recency | Mapped numerical | overlapped_brain_pixel_selections | See 'Audience Data' section below.

Numerical Types

Feature Name | Type | Corresponding field(s) in LLDS Impressions | Notes
exchange_ctr | Numerical | prebid_historical_ctr | Click Through Rate. If it is -1, we populate it with 0.
exchange_vcr | Numerical | prebid_video_completion | Video Completion Rate. If it is -1, we populate it with 0.
exchange_viewability_rate | Numerical | prebid_viewability | Viewability Rate. If it is -1, we populate it with 0.
id_vintage | Numerical | id_vintage | 0 if the primary MM user ID from the bid request is 0 to 1 week old; 1 if it is 1 to 18 weeks old; 2 if it is 18 to 52 weeks old; 3 if it is more than 52 weeks old. Values of 999 or higher: if there is no MM UUID or the incoming request has id_vintage >= 999, the calculation of the response rate uses id_vintage = 0, but LLDS Impressions still logs id_vintage >= 999.
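
The substitutions described in the notes above can be sketched as follows. This is a minimal illustration, assuming the logged values arrive as plain numbers; the function names normalizeRate and modelIDVintage are hypothetical.

```go
package main

import "fmt"

// normalizeRate maps the -1 value to 0, as described for exchange_ctr,
// exchange_vcr, and exchange_viewability_rate. Hypothetical helper name.
func normalizeRate(v float64) float64 {
	if v == -1 {
		return 0
	}
	return v
}

// modelIDVintage returns the id_vintage used when calculating the response
// rate: logged values of 999 or higher are treated as 0.
func modelIDVintage(logged int) int {
	if logged >= 999 {
		return 0
	}
	return logged
}

func main() {
	fmt.Println(normalizeRate(-1), normalizeRate(0.25)) // prints 0 0.25
	fmt.Println(modelIDVintage(2), modelIDVintage(999)) // prints 2 0
}
```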

Hardcoded Interaction Types

Feature Name | Type | Corresponding field(s) in LLDS Impressions | Notes
country_id_cs_region_id | Hardcoded interaction | region_id and country_id | country_id joins region_id by dash
exchange_id_cs_category_id | Hardcoded interaction | N/A | Read exchange_id and category_id from impression table. Combination of exchange_id and category_id
exchange_id_cs_vcr | Hardcoded interaction | N/A | Read exchange_id and prebid_video_completion from impression table. Combination of exchange_id and prebid_video_completion
exchange_id_cs_ctr | Hardcoded interaction | N/A | Read exchange_id and prebid_historical_ctr from impression table. We round prebid_historical_ctr using the same rounding convention as prebid_video_completion above. If there is no record for this section, it will record -1. See the rest of 'Hardcoded Interactions' for more information.
exchange_id_cs_vrate | Hardcoded interaction | N/A | Read exchange_id and prebid_viewability from impression table. Combination of exchange_id and prebid_viewability. We round prebid_viewability down to the nearest multiple of 10. For example, 120, 121, and 129 all become 120. If there is no record for this section, it will record -1. See the rest of 'Hardcoded Interactions' for more information.
exchange_id_cs_site_id | Hardcoded interaction | N/A | Combination of exchange_id and site_id

Categorical Types

Feature Name | Type | Corresponding field(s) in LLDS Impressions | Notes
base_domain | Simple categorical | site_url | Extract the base domain (eTLD+1) from t1.site_url (see 'Base Domain' section below)
browser | Simple categorical | contextual_data | See WURFL features extraction
browser_id | Simple categorical | browser_id | We recommend using the 'browser' and 'browser_version' features instead of this one due to better granularity and accuracy.
browser_language_id | Simple categorical | browser_language_id | If browserLanguage is set to 0 in LLDS Impressions, the browser language ID was not sent to BYOA.
browser_version | Simple categorical | contextual_data | See WURFL features extraction
category_id | Simple categorical | category_id | -
channel_type | Simple categorical | channel_type | 1 = Display, 2 = Video, 3 = Social, 4 = mobile display (web), 5 = mobile video (web), 6 = search, 7 = email, 8 = mobile display (in-app), 9 = mobile video (in-app), 10 = Newsfeed (FBX)
conn_speed | Simple categorical | conn_speed_id | Read LLDS Impressions.conn_speed_id and store it as conn_speed in the model
cookieless | Simple categorical | cross_device_flag | cookieless can be derived from the cross_device_flag field in LLDS Impressions using the following logic: If cross_device_flag = (2 or 3) then cookieless=TRUE else FALSE
creative_id | Simple categorical | creative_id | -
cross_device | Simple categorical | cross_device_flag | cross_device can be derived from the cross_device_flag field in LLDS Impressions using the following logic: If cross_device_flag = (1 or 3) then cross_device=TRUE else FALSE
country_id | Simple categorical | country_id | -
device_id | Simple categorical | device_id | We recommend using the 'device_manufacturer', 'device_model' and 'device_type' features instead of this one due to better granularity and accuracy.
device_manufacturer | Simple categorical | contextual_data | See WURFL features extraction
device_model | Simple categorical | contextual_data | See WURFL features extraction
device_type | Simple categorical | contextual_data | See WURFL features extraction
day_of_week | Simple categorical | timestamp_gmt | 0 = Sunday ... 6 = Saturday
day_part | Simple categorical | timestamp_gmt | The day part is based on when the impression was served and the user's timezone. 0 = 12AM to 5:59AM; 1 = 6AM to 11:59AM; 2 = 12PM to 5:59PM; 3 = 6PM to 11:59PM
deal_id | Simple categorical | deal_id | -
dma_id | Simple categorical | dma_id | -
exchange_id | Simple categorical | exchange_id | -
fold_position | Simple categorical | fold_position | 1 = Above the fold, 2 = Below the fold, 0 = Unknown
hashed_app_id | Simple categorical | app_id (read LLDS Impressions.app_id, then calculate hashed_app_id) | See 'AppID' section below.
isp_id | Simple categorical | isp_id | -
interstitial | Simple categorical | interstitial | -
num_device_ids | Simple categorical | num_device_ids | If t1.num_device_ids is null or 0, the value is 0; if it is 1, the value is 1; if it is greater than 1, the value is 2
os | Simple categorical | contextual_data | See 'Device Data' section below
os_id | Simple categorical | os_id | We recommend using the 'os' and 'os_version' fields instead of this one due to better granularity and accuracy.
os_version | Simple categorical | contextual_data | See WURFL features extraction
pixel | Simple categorical | overlapped_brain_pixel_selections | See 'Audience Data' section below.
region_id | Simple categorical | region_id | -
size | Simple categorical | size | 32-bit encoding of creative size, calculated by making the high 16 bits the width and the low 16 bits the height; see 'Size Encoding/Decoding' section below
site_id | Simple categorical | site_id | -
user_frequency | Simple categorical | user_frequency | Refers to 'Session Frequency' on the T1 Knowledge Base page
video_placement_type | Simple categorical | video_placement_type | Video placement type. 1: In-stream; 2: In-banner; 3: In-article; 4: In-feed; 5: Interstitial/slider/floating
video_skippability | Simple categorical | video_skippability | Flag to identify video skippability. 0: non-skippable; 1: skippable; null: unknown
week_part | Simple categorical | timestamp_gmt | The week part is based on when the impression was served and the user's timezone. 0 = weekday; 1 = weekend
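
A few of the derived categorical features above can be sketched in Go as follows. The sketch reads the cookieless row as setting cookieless (flag 2 or 3) and the cross_device row as setting cross_device (flag 1 or 3). The function names are hypothetical, a nil pointer stands in for a null num_device_ids, and treating negative counts as 0 is an assumption the table does not cover.

```go
package main

import "fmt"

// crossDevice is true when cross_device_flag is 1 or 3.
func crossDevice(flag int) bool { return flag == 1 || flag == 3 }

// cookieless is true when cross_device_flag is 2 or 3.
func cookieless(flag int) bool { return flag == 2 || flag == 3 }

// numDeviceIDsBucket maps num_device_ids to 0, 1, or 2.
// A nil pointer models a null value. Negative inputs are not covered by the
// table; mapping them to 0 is an assumption made for this sketch.
func numDeviceIDsBucket(n *int) int {
	switch {
	case n == nil || *n <= 0:
		return 0
	case *n == 1:
		return 1
	default:
		return 2
	}
}

func main() {
	for flag := 0; flag <= 3; flag++ {
		fmt.Println(flag, crossDevice(flag), cookieless(flag))
	}
	five := 5
	fmt.Println(numDeviceIDsBucket(nil), numDeviceIDsBucket(&five)) // prints 0 2
}
```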

Audience Data

Three audience-based features in our model are derived from the overlapped_brain_pixel_selections field in LLDS Impressions: pixel, bidder_pixel_frequency, and bidder_pixel_recency.

For context, overlapped_brain_pixel_selections is a pipe-delimited list of tuples that contain segment membership information. Each tuple is of the format mm:px1:r1:f1; the components of the tuple are separated by colons and can be interpreted as follows:

"mm" - the namespace of the pixel

"px1" - the pixel_id of the audience segment. In the logistic model, this is converted into a binary field indicating whether the user is in this segment.

"r1" - the recency, or amount of time that has elapsed, since the user was added to px1. In the logistic model, this is converted into a mapped numerical field whose value is equal to 1440.0/r1. By way of background, 1440.0/recency is simply converting the recency value, which is denominated in minutes, to its inverse, measured in days—there are 1,440 minutes in a day.

Input: recencyMinutes

We calculate recencyDays := math.Max(recencyMinutes/1440.0, 1.0)

and cap recencyDays at 200 before taking the inverse:
recencyDaysFn = 1.0 / math.Min(200, recencyDays) // math.Min(200, recencyDays) keeps recencyDaysFn in the range 0.005...1

If the recency is zero (perhaps because recency data is not available for that audience segment), the corresponding map entry for recency is not created; that is, we never divide by zero.

"f1" - the frequency value recorded for the user on px1. In the logistic model, this is converted into a mapped numerical field whose value is simply equal to f1.

If frequency > 200, frequency is set to 200.
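
The rules above can be combined into a short parsing sketch. It assumes every tuple has exactly four colon-separated fields with numeric recency and frequency values, and it treats a negative recency like a zero one; the type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// audienceFeatures holds the three model features derived from one tuple.
type audienceFeatures struct {
	Pixel      string  // pixel ID; its presence means the user is in the segment
	Recency    float64 // bidder_pixel_recency, in the range 0.005..1
	HasRecency bool    // false when recency is zero, so no entry is emitted
	Frequency  float64 // bidder_pixel_frequency, capped at 200
}

// parseSelections splits a pipe-delimited list of mm:px:r:f tuples.
func parseSelections(raw string) ([]audienceFeatures, error) {
	var out []audienceFeatures
	if raw == "" {
		return out, nil
	}
	for _, tuple := range strings.Split(raw, "|") {
		parts := strings.Split(tuple, ":")
		if len(parts) != 4 {
			return nil, fmt.Errorf("malformed tuple %q", tuple)
		}
		recencyMinutes, err := strconv.ParseFloat(parts[2], 64)
		if err != nil {
			return nil, fmt.Errorf("bad recency in %q: %v", tuple, err)
		}
		frequency, err := strconv.ParseFloat(parts[3], 64)
		if err != nil {
			return nil, fmt.Errorf("bad frequency in %q: %v", tuple, err)
		}
		f := audienceFeatures{Pixel: parts[1], Frequency: math.Min(frequency, 200)}
		if recencyMinutes > 0 { // skip zero recency: never divide by zero
			recencyDays := math.Max(recencyMinutes/1440.0, 1.0)
			f.Recency = 1.0 / math.Min(200, recencyDays)
			f.HasRecency = true
		}
		out = append(out, f)
	}
	return out, nil
}

func main() {
	// Hypothetical input: two segments in the "mm" namespace.
	feats, err := parseSelections("mm:1001:2880:3|mm:1002:0:250")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, f := range feats {
		fmt.Printf("%+v\n", f)
	}
}
```

In this example, the first tuple has a recency of 2,880 minutes (2 days), so its recency feature is 0.5; the second has no recency entry and its frequency is capped at 200.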

Device Data

Model features related to a device, including browser, browser_version, os, os_version, device_manufacturer, device_model, and device_type, are derived from the contextual_data field in LLDS Impressions. For example, the following contextual_data value:

{
  "24": {
    "1": {
      "targeted": [],
      "untargeted": [
        "br_Chrome:ve_60.0.3112"
      ]
    }
  },
  "25": {
    "1": {
      "targeted": [],
      "untargeted": [
        "os_Windows:ve_10.0.0"
      ]
    }
  },
  "26": {
    "1": {
      "targeted": [],
      "untargeted": [
        "fo_Desktop"
      ]
    }
  },
  "27": {
    "1": {
      "targeted": [],
      "untargeted": [
        "ma_Desktop Make:mo_Desktop Model"
      ]
    }
  },
  "28": {
    "1": {},
    "2": {},
    "3": {}
  }
}

would be read as:

browser = "br_Chrome"
browser_version = "br_Chrome:ve_60.0.3112"
os = "os_Windows"
os_version = "os_Windows:ve_10.0.0"
device_model = "ma_Desktop Make:mo_Desktop Model"
device_manufacturer = "ma_Desktop Make"
device_type = "fo_Desktop"

We include the browser name in the browser version (i.e. we keep the "br_Chrome" prefix in "br_Chrome:ve_60.0.3112") because two different browsers could have the same version. The same logic holds for os_version and device_model.
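
One way to perform this mapping is sketched below. It identifies each value by its prefix (br_, os_, fo_, ma_) rather than by the numeric keys, whose meaning is not described here, and it reads both the "targeted" and "untargeted" lists; both choices are assumptions, and the function name deviceFeatures is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// entry mirrors the innermost objects of contextual_data.
type entry struct {
	Targeted   []string `json:"targeted"`
	Untargeted []string `json:"untargeted"`
}

// deviceFeatures extracts device-related model features from contextual_data.
func deviceFeatures(contextualData string) (map[string]string, error) {
	var doc map[string]map[string]entry
	if err := json.Unmarshal([]byte(contextualData), &doc); err != nil {
		return nil, err
	}
	features := make(map[string]string)
	for _, group := range doc {
		for _, e := range group {
			values := append(append([]string{}, e.Targeted...), e.Untargeted...)
			for _, v := range values {
				name := strings.SplitN(v, ":", 2)[0] // part before the first colon
				switch {
				case strings.HasPrefix(v, "br_"):
					features["browser"] = name
					features["browser_version"] = v
				case strings.HasPrefix(v, "os_"):
					features["os"] = name
					features["os_version"] = v
				case strings.HasPrefix(v, "ma_"):
					features["device_manufacturer"] = name
					features["device_model"] = v
				case strings.HasPrefix(v, "fo_"):
					features["device_type"] = v
				}
			}
		}
	}
	return features, nil
}

func main() {
	sample := `{"24":{"1":{"targeted":[],"untargeted":["br_Chrome:ve_60.0.3112"]}},
	            "27":{"1":{"targeted":[],"untargeted":["ma_Desktop Make:mo_Desktop Model"]}}}`
	features, err := deviceFeatures(sample)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(features["browser"], "|", features["browser_version"])
	fmt.Println(features["device_manufacturer"], "|", features["device_model"])
}
```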

Hardcoded Interactions

For exchange_id_cs_site_id: the feature value is formed by joining exchange_id and the other half of the feature value with a dash. For example, with exchange_id = 4 and site_id = 100, we look up exchange_id_cs_site_id^4-100 in the feature-to-weight map.
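
The lookup key from this example can be built with a small helper. This is a sketch only: the helper name interactionKey and the weight value are hypothetical, and how a missing key is handled is not specified here.

```go
package main

import "fmt"

// interactionKey builds a lookup key of the form "<feature>^<left>-<right>".
func interactionKey(feature, left, right string) string {
	return feature + "^" + left + "-" + right
}

func main() {
	// Hypothetical feature-to-weight map.
	weights := map[string]float64{"exchange_id_cs_site_id^4-100": 0.37}

	key := interactionKey("exchange_id_cs_site_id", "4", "100")
	weight, found := weights[key]
	fmt.Println(key, weight, found) // prints exchange_id_cs_site_id^4-100 0.37 true
}
```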

For exchange_id_cs_vcr

package main

import (
    "fmt"
    "strconv"
)

//Round is a bit slower but easier to read
func Round(input string, numberDec int) (string, error) {
    flval, err := strconv.ParseFloat(input, 64)
    if err != nil {
        return "", err
    }
    if flval < 0 {
        return "", nil
    }
    if numberDec == 3 && flval > 0 {
        return TrimRight(fmt.Sprintf("%.3f", flval), '0'), nil
    }
    return fmt.Sprintf("%.1f", flval), nil
}

// RoundAndTrim Rounds and Trims
func RoundAndTrim(input string, numberDec int) (string, error) {
    res, err := Round(input, numberDec)
    if err != nil {
        return "", err
    }
    return TrimRight(res, '0'), nil
}

// TrimRight removes zero padding. E.g.,
// 10.000 -> 10.0
// 10.100 -> 10.1
// 10.120 -> 10.12
func TrimRight(input string, cut byte) string {
    count := 0
    for i := len(input) - 1; i > 0; i-- {
        if input[i] == cut && input[i-1] != '.' {
            count++
        } else {
            break
        }
    }

    return input[0 : len(input)-count]
}

func exchange_id_cs_vcr(exchangeID, videoCompletion string) {
    vcRounded, _ := RoundAndTrim(videoCompletion, 1)
    fmt.Println("exchange_id_cs_vcr^" + exchangeID + "-" + vcRounded)
}

func main() {
    exchangeID := "10"
    prebidVideoCompletion := "20"
    exchange_id_cs_vcr(exchangeID, prebidVideoCompletion) // prints exchange_id_cs_vcr^10-20.0
}

For exchange_id_cs_ctr

package main

import (
    "fmt"
    "strconv"
)

//Round is a bit slower but easier to read
func Round(input string, numberDec int) (string, error) {
    flval, err := strconv.ParseFloat(input, 64)
    if err != nil {
        return "", err
    }
    if flval < 0 {
        return "", nil
    }
    if numberDec == 3 && flval > 0 {
        return TrimRight(fmt.Sprintf("%.3f", flval), '0'), nil
    }
    return fmt.Sprintf("%.1f", flval), nil
}

// RoundAndTrim Rounds and Trims
func RoundAndTrim(input string, numberDec int) (string, error) {
    res, err := Round(input, numberDec)
    if err != nil {
        return "", err
    }
    return TrimRight(res, '0'), nil
}

// TrimRight removes zero padding. E.g.,
// 10.000 -> 10.0
// 10.100 -> 10.1
// 10.120 -> 10.12
func TrimRight(input string, cut byte) string {
    count := 0
    for i := len(input) - 1; i > 0; i-- {
        if input[i] == cut && input[i-1] != '.' {
            count++
        } else {
            break
        }
    }
    return input[0 : len(input)-count]
}

func exchange_id_cs_ctr(exchangeID, prebidHistoricalCtr string) {
    ctRounded, _ := RoundAndTrim(prebidHistoricalCtr, 3)
    fmt.Println("exchange_id_cs_ctr^" + exchangeID + "-" + ctRounded)
}

func main() {
    exchangeID := "10"
    prebidHistoricalCtr := "0.002"
    exchange_id_cs_ctr(exchangeID, prebidHistoricalCtr) // prints exchange_id_cs_ctr^10-0.002
}

For exchange_id_cs_vrate

We round prebid_viewability down to the nearest multiple of 10. For example, 120, 121, and 129 all become 120.

package main

import (
    "fmt"
    "math"
    "strconv"
)

func exchange_id_cs_vrate(exchangeID, prebidViewability string) {
    vr, _ := strconv.ParseFloat(prebidViewability, 64)
    finalVal := int64(math.Floor(vr/10)) * 10
    fmt.Println("exchange_id_cs_vrate^" + exchangeID + "-" + strconv.FormatInt(finalVal, 10))
}

func main() {
    exchangeID := "10"
    prebidViewability := "19"
    exchange_id_cs_vrate(exchangeID, prebidViewability) // prints exchange_id_cs_vrate^10-10
}

AppID

The raw bid request sends the hashed_app_id but we log app_id in the impression_log. If the impression.app_id is equal to "N/A" then the hashed_app_id was equal to "0" in the raw bid request. If the impression.app_id is different from "N/A" then the hashed_app_id needs to be calculated manually as per the following pseudo-code.

Use Boost Library 1.58

uint32_t m_HashedAppId = 0;

void setHashedAppId(const char* appid)
{
    if (appid) {
        m_HashedAppId = atoi(appid);
        if (m_HashedAppId == 0) {
            m_HashedAppId = MM::Utils::pstr_ihash()(appid) % INT_MAX;
        }
    }
}

struct pstr_ihash
    : std::unary_function<const char*, std::size_t>
{
    std::size_t operator()(const char* x) const
    {
        std::size_t seed = 0;

        while (*x) {
            boost::hash_combine(seed, ::toupper(*x++));
        }
        return seed;
    }
};

Please use the following code to test calcHashedAppId. It takes an app_id and calculates the hashed_app_id to be used in the model; the test cases let you confirm that your implementation is correct.

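#include <iostream>
#include <string>
#include <vector>      // std::vector
#include <functional>  // std::unary_function
#include <cstdlib>     // atoi
#include <cctype>      // toupper
#include <boost/functional/hash.hpp>
#include <climits>
#include <cassert>
#include <cstring>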

struct pstr_ihash
    : std::unary_function<const char*, std::size_t>
{
    std::size_t operator()(const char* x) const
    {
        std::size_t seed = 0;

        while (*x) {
            boost::hash_combine(seed, ::toupper(*x++));
        }
        return seed;
    }
};

// Pass the app_id from LLDS Impressions.
// Special case: if the App ID is absent, LLDS Impressions logs "N/A" for
// app_id, and the hashed_app_id is 0.
unsigned int calcHashedAppId(const char* appid)
{
    unsigned int m_HashedAppId = 0;
    if (appid) {
        if (std::strcmp(appid, "N/A") == 0) {
            return 0;
        }
        m_HashedAppId = atoi(appid);
        if (m_HashedAppId == 0) {
            m_HashedAppId = pstr_ihash()(appid) % INT_MAX;
        }
    }
    return m_HashedAppId;
}

struct testcase {
    const char* input;
    unsigned int expected_output;
};
  
int main() {
   std::vector<testcase> tests {
      {"com.fivemobile.thescore", 1453566594},
      {"605581486", 605581486},
      {"tunein.player", 1173358324},
      {"com.aws.android", 1903276095},
      {"com.document.pdf.scanner.docscan", 1812585910},
      {"com.apalon.weatherlive.free", 591449217},
      {"com.pandora.android", 1387399900},
      {"com.weather.weather", 447752198},
      {"de.wetteronline.wetterapp", 1107246225},
      {"439873467", 439873467},
      {"N/A", 0},
   };

   for (unsigned int i = 0; i < tests.size(); i++) {
      assert(calcHashedAppId(tests[i].input) == tests[i].expected_output);
   }

   return 0;
}

Base Domain

We recommend using a library from https://publicsuffix.org/learn/ to derive the base_domain from site_url. Below are examples of what we expect.

site_url | base_domain | comment
www.yahoo.com | yahoo.com | -
finance.yahoo.com | yahoo.com | -
https://www.sports.yahoo.com | yahoo.com | -
w.main.welcomescreen.aol.com | aol.com | -
bap.navigator.web.de | web.de | -
www.ebay.co.uk | ebay.co.uk | -
www.u.gg | u.gg | -
https://www.u.gg | u.gg | -
https://u.gg | u.gg | -
http://sqlserverbuilds.blogspot.com | sqlserverbuilds.blogspot.com | -
prebidsetup.an.r.appspot.com | prebidsetup.an.r.appspot.com | *.r.appspot.com is a valid suffix
r.appspot.com | r.appspot.com | appspot.com is a valid suffix
this.is.a.test.readthedocs.io | test.readthedocs.io | readthedocs.io is a valid suffix
pythonguidecn.readthedocs.io | pythonguidecn.readthedocs.io | readthedocs.io is a valid suffix
check-ozmall.global.ssl.fastly.net | check-ozmall.global.ssl.fastly.net | global.ssl.fastly.net is a valid suffix
test.fastly.net | fastly.net | fastly.net is a valid suffix
http://mp2f-m-env.ap-northeast-1.elasticbeanstalk.com | mp2f-m-env.ap-northeast-1.elasticbeanstalk.com | ap-northeast-1.elasticbeanstalk.com is a valid suffix
thisisatest | error: publicsuffix: cannot derive eTLD+1 for domain "thisisatest" | -
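
Because site_url may or may not carry a scheme, it helps to reduce it to a bare host before handing it to a public-suffix library. The sketch below covers only that first step using the Go standard library; the function name hostFromSiteURL is hypothetical, and the eTLD+1 lookup itself is left to whichever library you choose.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hostFromSiteURL returns the host part of a site_url, with or without a
// scheme. The result is what would be passed to a public-suffix library to
// derive the base_domain.
func hostFromSiteURL(siteURL string) (string, error) {
	s := strings.TrimSpace(siteURL)
	if !strings.Contains(s, "://") {
		s = "//" + s // scheme-relative form, so net/url fills in the host
	}
	u, err := url.Parse(s)
	if err != nil {
		return "", err
	}
	return u.Hostname(), nil
}

func main() {
	for _, in := range []string{"https://www.sports.yahoo.com", "finance.yahoo.com"} {
		host, err := hostFromSiteURL(in)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(host)
	}
}
```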

Size Encoding/Decoding

32-bit encoding of creative size, calculated by making the high 16 bits the width and the low 16 bits the height.

// encode_decode_size.go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

func encode(width uint32, height uint32) uint32 {
	var size uint32
	size = (0xFFFF & height)
	size |= (width << 16)
	return size
}

func decode(val string) (int, int, error) {
	size, err := strconv.ParseUint(val, 10, 32)
	if err != nil {
		return 0, 0, errors.New("strconv.ParseUint() failed")
	}
	width := int((size >> 16) & 0xffff)
	height := int(size & 0xffff)

	return width, height, nil
}

func main() {
	fmt.Printf("%d\n", encode(320, 50)) // Will output 20971570

	width, height, err := decode("20971570")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%dx%d\n", width, height) // Will output 320x50
}