The Sigma Mobile Advertising Blog

Experiences in mobile advertising technology

The Curse of Dimensionality

One of the nice things about the star schema (and its close relative, the snowflake schema) is that it forces you to consider all dimensions of your reportable data. Suppose you’re delivering ads over multiple mobile channels and want to report the number of deliveries over time. A first attempt might be to create a record (in star-schema parlance, a row in a fact table) that has the following values:

  • delivery attempts
  • verified deliveries

These summarize groups of individual deliveries according to some selection criteria. These criteria are essentially dimensions. A first attempt at selecting useful dimensions might be the following:

  • date
  • hour
  • delivery channel
  • campaign

Simple enough. This reduces the individual deliveries into hourly records organized by delivery channel and campaign. If we’re dealing with large volumes, this makes reporting easier. Actually, it makes reporting feasible. In some cases, these dimensions might suffice. I seriously doubt it, though. In the real world, both the mobile operator and the ad sales organization will invariably find this scheme simplistic.
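
As a rough illustration of that roll-up, here is a minimal Python sketch. The event tuples, channel names, and campaign names are made up for the example; a real system would do this in the warehouse rather than in application code.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw delivery events: (timestamp, channel, campaign, verified).
events = [
    (datetime(2009, 11, 27, 17, 5), "sms", "campaign-a", True),
    (datetime(2009, 11, 27, 17, 40), "sms", "campaign-a", False),
    (datetime(2009, 11, 27, 17, 59), "wap", "campaign-a", True),
    (datetime(2009, 11, 27, 18, 1), "sms", "campaign-a", True),
]

def summarize(events):
    """Roll events up into fact rows keyed by (date, hour, channel, campaign)."""
    facts = defaultdict(lambda: {"attempts": 0, "verified": 0})
    for ts, channel, campaign, verified in events:
        key = (ts.date(), ts.hour, channel, campaign)
        facts[key]["attempts"] += 1      # every event counts as an attempt
        if verified:
            facts[key]["verified"] += 1  # only confirmed deliveries count here
    return dict(facts)

facts = summarize(events)
for key, measures in sorted(facts.items()):
    print(key, measures)
```

Four events collapse into three fact rows here. With real volumes the reduction is what makes reporting feasible.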

Consider hourly reporting intervals. This may be sufficient for some channels but not for others, especially live or looped video. So, it’s likely that date and hour will need to be replaced with something like:

  • day and time
  • interval duration

where interval duration may be 30 minutes, 15 minutes, or even smaller.
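
One way to picture the variable interval is a function that maps a timestamp to the start of its interval. This is only a sketch: `interval_start` is a hypothetical helper, and it assumes the duration divides evenly into a day.

```python
from datetime import datetime, timedelta

def interval_start(ts, minutes):
    """Return the start of the `minutes`-long interval containing `ts`.

    Assumes `minutes` divides evenly into a day (e.g. 5, 15, 30, 60).
    """
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = int((ts - midnight).total_seconds() // 60)  # whole minutes since midnight
    return midnight + timedelta(minutes=(elapsed // minutes) * minutes)

ts = datetime(2009, 11, 27, 17, 40, 12)
print(interval_start(ts, 60))  # 2009-11-27 17:00:00
print(interval_start(ts, 15))  # 2009-11-27 17:30:00
```

Storing the interval start together with its duration lets channels with different reporting granularities share one fact table.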

Delivery channel is another candidate for decomposition. Here, the decomposition is likely to be different kinds of services. SMS ads can be pushed by themselves to users or appended to existing messages (e.g., peer-to-peer SMS or operator-generated messages), so the simple “delivery channel” turns into:

  • delivery channel
  • operator service

If the delivery channel is content-based (as opposed to a messaging channel), we need to know the context of the delivery, for example:

  • content type
  • ad location with respect to content

Merely summarizing deliveries by campaign is insufficient, too; they need to be organized by all components of a campaign structure, such as:

  • campaign / flight / creative

Finally, what about the location, demographics, and behavior of the subscriber who received the ad? We need to add the following:

  • location
  • demographic attributes (age, sex, etc.)
  • behavioral attributes

So, our initial attempt to summarize deliveries according to four simple dimensions has now gotten extremely ugly. Does that mean we shouldn’t have started with those four dimensions but instead jumped directly into every dimension we could think of? Not necessarily. It does mean, though, that we need to plan ahead for extra dimensions and not be surprised when they become requirements.

By the way, apologies to anyone who found this page by searching for the phrase “curse of dimensionality”. This term is used in machine learning and statistics to convey the exponential growth of a problem space as dimensions are added. Suppose we were looking for patterns in our ad delivery data. If we represent our deliveries as points in n-space, where n is the number of dimensions we’re using to organize it, then most of the space will be empty and we’ll have a hell of a time looking for those patterns. But that’s a topic for another post.
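
To get a feel for how quickly the space empties out, here is a small Python sketch. The cardinalities and the delivery count are invented purely for illustration; only the arithmetic matters.

```python
# Hypothetical number of distinct values per dimension (illustrative only).
cardinalities = {
    "date": 30,
    "hour": 24,
    "delivery channel": 4,
    "campaign": 50,
    "operator service": 5,
    "location": 100,
    "age band": 6,
    "sex": 2,
}
deliveries = 10_000_000  # assumed number of individual delivery records

def occupancy_bounds(cardinalities, points):
    """For each added dimension, return (name, total cells, max fraction occupied).

    Each point occupies at most one cell, so points / cells is an upper
    bound on the fraction of cells that can be non-empty.
    """
    rows = []
    cells = 1
    for name, size in cardinalities.items():
        cells *= size  # the space grows multiplicatively with each dimension
        rows.append((name, cells, min(1.0, points / cells)))
    return rows

rows = occupancy_bounds(cardinalities, deliveries)
for name, cells, bound in rows:
    print(f"+ {name:<17} {cells:>12,} cells, at most {bound:.2%} occupied")
```

With these made-up numbers, the first four dimensions could in principle be fully populated, but by the eighth dimension at most a small fraction of the cells can hold any data at all.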

Written by Mark Westling

November 27, 2009 at 17:15
