The Sigma Mobile Advertising Blog

Experiences in mobile advertising technology

Predicting a Start-Up’s Success

I recently ran across RightSide Capital Management (RSCM), an investment fund that targets seed-stage start-ups. What makes them unusual is that they go for high volume and quick turnaround on investment decisions: they intend to fund about 100-200 start-ups a year, which is a huge number even for the biggest firms. Their web site describes a standardized process for evaluation and investment.

A brief history of consumer credit scoring

This reminds me in many ways of the adoption of consumer credit scoring by lenders. In the “good old days”, loans were made by loan officers who applied experience and judgment to determine who is likely to pay back and who is likely to default.  Loan applications included sometimes lengthy interviews where the loan officer would ask things presumed to be important: living arrangements, income, education, “I knew your father in college”, etc. These answers would then be evaluated with respect to the considerable knowledge that had been accrued over the years (“consider character, capacity, and collateral”) and a decision would be made.

No one questioned this process because until the 1950s, no one had the computational means to evaluate it. (You could also argue that once people were allowed into the private club of loan officers, it was in their interest to protect the sanctity of that club.) But in 1956, Bill Fair and Earl Isaac founded Fair, Isaac and Company, now known as FICO, with the belief that credit decisions could be evaluated and optimized, and created the concept of credit scoring.

A credit score is a number that represents a consumer’s creditworthiness, that is, the likelihood that a person will successfully pay back a loan of a certain type. A credit score is simply the output of a predictive model, called a credit scorecard, that assigns weights to various factors. You could create such a score by hand: add income, subtract 10 times the number of late payments in the last two years, add 5 times the number of years of education, etc. A better model, obviously, would consider only factors that are statistically relevant to the likelihood of paying back a loan and would monotonically increase with such likelihood. It would also fit a certain scale that’s easy to use, be robust over time, and have a number of other “engineering” properties.
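The hand-made score described above fits in a few lines of Python. This is only a sketch: the weights come from the paragraph, but the field names, the income unit, and the two sample applicants are made up for illustration, and a real scorecard is derived from data, not invented like this.

```python
from dataclasses import dataclass

@dataclass
class Applicant:
    income: float           # assumed unit: thousands per year
    late_payments_2y: int   # late payments in the last two years
    years_education: int

def naive_score(a: Applicant) -> float:
    """Add income, subtract 10 per late payment, add 5 per year of education."""
    return a.income - 10 * a.late_payments_2y + 5 * a.years_education

# Two hypothetical applicants with the same income.
steady = Applicant(income=60, late_payments_2y=0, years_education=16)
shaky = Applicant(income=60, late_payments_2y=4, years_education=12)

print(naive_score(steady), naive_score(shaky))
```

Nothing in this score says whether any of the three factors actually predicts repayment, which is exactly the gap a data-derived scorecard is meant to close.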

The only way to create a good scorecard is by studying a large number of historical loans: what factors were known at application time and what was the outcome (i.e., paid back or default). These factors are what credit bureaus supply. The beauty of credit scoring is that it removes folklore and prejudice from the process. You can also argue that it removes human kindness, if you’re willing to say that it’s nice  to lend money to people who are at risk of not repaying debts.

Seed-funding scorecards

Now, back to RSCM. My impression is that they intend to apply a similar systematic decision-making process to seed funding. How successful can this be? It would (I hope) remove age bias from funding decisions. But think about this: age is used as a proxy for energy and innovation, which seem quite reasonable as predictors for success. Now, how do you quantify energy and innovation? There are no credit bureaus that record the number of hours an entrepreneur works each week. Do you ask the applicant? (That’s the equivalent of a credit card application asking, “Do you always make your payments on time?”) You either adopt some kind of proxy, like (yech) age, or you interview the entrepreneur directly. Just like conventional funds do.

We often hear about entrepreneurial characteristics that predict success, but what about those that predict failure? You can’t create a credit scorecard using only applications that resulted in paid-back loans; you also need a healthy supply of defaulted loans so you can see what factors separate the goods from the bads. I’m curious what negative factors are considered in RSCM’s decision model.
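To see why the bads matter, here is a small sketch that computes the default rate for each value of a single factor. The loan history is invented, and a real scorecard would use far more data and proper statistics; the point is only that the same calculation on goods alone tells you nothing.

```python
from collections import defaultdict

# Hypothetical loan history: (late payments known at application time, outcome).
loans = [
    (0, "paid"), (0, "paid"), (0, "paid"), (0, "default"),
    (1, "paid"), (1, "paid"), (1, "default"),
    (3, "paid"), (3, "default"), (3, "default"),
]

def default_rate_by_value(records):
    """Return {factor value: share of loans with that value that defaulted}."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for value, outcome in records:
        totals[value] += 1
        if outcome == "default":
            defaults[value] += 1
    return {value: defaults[value] / totals[value] for value in totals}

all_rates = default_rate_by_value(loans)
goods_only_rates = default_rate_by_value([r for r in loans if r[1] == "paid"])

print(all_rates)         # rates differ by value, so the factor carries information
print(goods_only_rates)  # every rate is zero, so nothing can be learned
```

With both outcomes in the sample, the default rate climbs with the number of late payments. With only paid-back loans, every value looks equally safe.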

This suggests another difference that separates credit scoring from seed-funding: how exactly do you define success? Failure is easy: a default is equivalent to a company that’s folded. But what’s success? Preferred owners getting their money back? Everyone getting rich?  I have no idea.

I’d love to hear comments about this and more insights into RSCM’s approach.

Written by Mark Westling

February 24, 2010 at 00:45

Posted in Uncategorized

We’ve Been Busy

In the past few weeks, we’ve been busy experimenting with new technologies, in particular NoSQL databases. If you’re not familiar with them now, you  will be during the next few years.  In the meantime, take a look at Cassandra, CouchDB, MongoDB, and Redis.

These databases tend to loosen the traditional RDBMS ACID properties in exchange for speed and scalability. You might not want to use them for billable transactions (or maybe you do, if you’re careful about it) but they could be ideal for managing other forms of data. Instead of tables, many NoSQL databases manage “documents” that can be thought of as key-value pairs or hash tables. Instead of defining a schema and forcing all records to fit that schema, you can add keys and values as you go. If you’re willing to forgo ACID transactions, replication becomes much, much easier.
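The “add keys and values as you go” idea can be illustrated without any database at all. The sketch below uses a plain Python dictionary as a stand-in for a document store; the `put` helper, the document IDs, and the field names are all hypothetical and do not correspond to the API of any of the products named above.

```python
# A toy in-memory "document store": each document is a dict looked up by ID.
store = {}

def put(doc_id, **fields):
    """Create the document if needed and merge in whatever fields are given."""
    store.setdefault(doc_id, {}).update(fields)

put("sub:1001", age_band="25-34", city="Hong Kong")
put("sub:1002", city="Singapore")

# Later we decide to track something new. No schema change is needed,
# and documents that lack the new key are left alone.
put("sub:1001", last_ad_clicked="ringtone-promo")

print(store["sub:1001"])
print(store["sub:1002"].get("last_ad_clicked"))
```

The price of this flexibility is that every reader has to cope with keys that may be missing, which is why the second lookup uses `get`.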

If you think about what kinds of data must be managed by an ad platform, you’ll see that this approach fits quite naturally.  Expect to hear more about it from us in the future.

Written by Mark Westling

January 19, 2010 at 11:13

Posted in Uncategorized

The Operator Advantage

Smart phones make it easy for ad networks to bypass the mobile operator. A major bonus for advertisers is that this eliminates the cost of sharing revenue with mobile operators. A major bonus for technology providers is that it eliminates the effort to integrate with a mobile operator’s infrastructure.

As anyone who has worked on mobile telecommunications networks can attest, carrier-grade services can be a huge effort to design, test, and deploy. When a banner ad fails to appear quickly on a web page, a consumer seldom blames his ISP, but when an ad slows the presentation of a WAP or mobile web page, the operator is the first to be blamed. This is why operators enforce “carrier grade” requirements on all services associated with their name.

Given all that, why would anyone want to integrate an advertising platform tightly within a mobile operator network?  Here are some good reasons:

New ad delivery channels. A mobile operator can open up channels that are otherwise inaccessible. Text ads can be appended to network-generated SMSs, such as missed call notification (MCN), billing top-ups, and welcome messages. Audio ads can be inserted in IVR portals or personalized ring-back tone (PRBT) services, such as Turkcell’s Tone & Win.

On-deck applications and portals. Let’s face it: only a small percentage of the world’s mobile subscribers use iPhones and Androids.  The rest rely on traditional handsets, often supplied by the operator, with operator-selected applications and portals. Tomi Ahonen makes a convincing argument that the importance of iPhones and Androids — with respect to the world outside the U.S. —  is often overestimated.

Cross-channel campaigns. Working with the operator makes it much easier to deploy campaigns that span multiple channels. Our system, the Sigma Mobile Advertising Platform, is currently used by a large Asian operator to deliver initial text ads with embedded short codes that link to an audio-response IVR, which then sends follow-up information via push SMS. Many other combinations are possible.

Ad-sponsored services. By integrating with the billing system, an operator can offer ad-sponsored services — for example, a ring-back tone service that is subsidized by occasional ads, or a rebate for calling minutes if the user opts to listen to an audio ad before the call.

Dropping charges for ad delivery. A serious problem with mobile advertising is that a subscriber is often charged for the delivery of the ad and responses to the ad. Any interactive advertising based on SMS ad delivery and interactive SMS responses runs into this problem. The only solution is reversing or dropping the charge and this can only be done with the cooperation of the operator’s billing system.

Demographic data. Operators are the trusted keepers of their subscribers’ demographic data and if privacy protection is in place, this information can be accessed by an ad platform. Privacy protection is the key.

Behavioral data. One of the most interesting and most valuable sources of targeting data is subscriber behavior, and calling behavior is accessible only through the operator. I’ve written previously about the value of calling behavior.

CRM. The best customer of a mobile ad platform can be the mobile operator itself. Some operators have realized that their ad platform can also be a CRM platform. If they can perform a data mining analysis to determine the customers most likely to churn, they can target those customers not with ads but with offers and promotions enticing them to stay.

What’s holding up mobile operators from jumping into ad delivery? One problem is sensitivity to brand perception: a mobile operator that is associated with unwanted ads loses subscribers quickly, and getting and retaining subscribers is what the mobile industry is all about. The message here is that mobile ad delivery must be designed carefully and thoughtfully.

Another problem is simply mobile operator culture. It’s long been a joke that every mobile operator wants to be the second operator in their market to launch a new service. The success of ad networks might change this mindset.

Written by Mark Westling

January 5, 2010 at 00:21

Posted in Advertising, Marketing

Slightly Off Topic: Why Automated Terrorist Detection Is Hard

Sorry for the off-topic discussion, but I think it would be useful to shed a little light on why it’s not as easy to automate the detection of potential terrorists as many people think. Think of this as a combination of demographic, behavioral, and contextual targeting. If your favorite ad platform doesn’t give you a 100% click-through rate, don’t be surprised if the government’s counter-terrorism platform doesn’t either.

I worked on similar problems in the first half of my career so I think I can add a little to the discussion. Comments to this post are welcome but please focus on the computational problem, not the political problem.

First, a little background.  The intelligence community has been trying to automate this sort of thing for decades. Some really, really smart people are involved. If you think it would help to “just throw some Stanford, MIT, and Carnegie Mellon Ph.D.s at the problem”, let me assure you, I’ve been to conferences on this subject that were filled with Stanford, MIT, and Carnegie Mellon Ph.D.s.

Automated intelligence can be organized at two levels: information fusion and indications & warning.

Information Fusion

Data fusion and information fusion are processes for collecting and merging data from multiple sources into coherent “pictures” of a situation. Think of de-duping mailing lists; i.e., recognizing that John Smith, J. Q. Smith, John Q. Smith, Johnny Smith, John Smiht, etc., are the same person. Ever looked at your credit report and seen items you don’t recognize? It’s probably from someone with a similar name and there wasn’t enough information for the credit bureau to determine exactly who it belonged to.
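As a rough illustration of the de-duping problem, here is a sketch that compares name variants using `difflib.SequenceMatcher` from the Python standard library. The 0.7 cut-off and the extra name “Mary Jones” are assumptions of mine; real fusion systems rely on many more attributes than a name and on far more careful matching.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] of how alike two names are, ignoring case and periods."""
    def clean(s: str) -> str:
        return s.lower().replace(".", "").strip()
    return SequenceMatcher(None, clean(a), clean(b)).ratio()

reference = "John Q. Smith"
candidates = ["John Smith", "J. Q. Smith", "Johnny Smith", "John Smiht", "Mary Jones"]

THRESHOLD = 0.7  # assumed cut-off; a real system would tune this on labeled data

for name in candidates:
    score = similarity(reference, name)
    verdict = "possible match" if score >= THRESHOLD else "different"
    print(f"{name:14s} {score:.2f} {verdict}")
```

Set the cut-off too loosely and two different John Smiths get merged, which is how a stranger’s account ends up on your credit report.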

Now remember that the data received by credit bureaus is orders of magnitude more accurate than what can be gotten from overseas and particularly third-world sources. In many places it’s easy to change the name on your identity card or passport; I see people leave Hong Kong and reenter under slightly different names all the time.

Indications & warning

Indications & warning (commonly termed I&W) is the high level process of making sense of fused data. You can think of it as finding a high-level model that fits low-level data, or asking the question, “What’s the meaning of what we’re seeing?” In general, it’s a recognition problem with many analogs such as disease diagnosis, manufacturing fault diagnosis, image understanding, or even natural language understanding (“what’s the meaning behind this collection of characters?”).

How would you go about building an I&W system for counter-terrorism? The obvious approach is to build a model of how to recognize a terrorist and fit data to the model. So, how do you build a model of a potential terrorist? It’s not easy. From what I’ve read, there aren’t any single defining characteristics. How about combining multiple characteristics, e.g., “received warning from father” + “traveled to Yemen”? Now you need to combine and weigh evidence. The simple way to do this in expert systems is by assigning weights, summing them, and applying a threshold. A better way is Bayesian probability, which considers conditional probabilities. But how do you get those probabilities, and how do you know what’s significant and what’s not? Finally, how do you set a threshold? Make it too low and you flag large numbers of innocent people, making them unhappy; make it too high and you miss terrorists, and people end up dead.
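Both ways of combining evidence can be sketched in a few lines. Every number below is invented purely for illustration: the indicator weights, the thresholds, the one-in-a-million prior, and the two conditional probabilities are assumptions, not figures from any real system.

```python
# Hypothetical indicator weights for a toy "sum and threshold" scorer.
WEIGHTS = {"warning_from_father": 5.0, "traveled_to_yemen": 2.0, "one_way_ticket": 1.5}

def weighted_score(indicators):
    """Sum the weights of the indicators that are present."""
    return sum(WEIGHTS[name] for name in indicators)

def flagged(indicators, threshold):
    """Apply the threshold to the summed weights."""
    return weighted_score(indicators) >= threshold

def posterior(prior, p_evidence_given_threat, p_evidence_given_innocent):
    """Bayes' rule: P(threat | evidence)."""
    numerator = p_evidence_given_threat * prior
    denominator = numerator + p_evidence_given_innocent * (1.0 - prior)
    return numerator / denominator

traveler = ["traveled_to_yemen", "one_way_ticket"]
print(weighted_score(traveler))
print(flagged(traveler, threshold=3.0), flagged(traveler, threshold=6.0))

# Assumed: 1 traveler in a million is a threat; the evidence appears for
# 90% of threats and for 1% of innocent travelers.
p = posterior(prior=1e-6, p_evidence_given_threat=0.9, p_evidence_given_innocent=0.01)
print(f"P(threat | evidence) = {p:.6f}")
```

Even with evidence that is 90 times more common among threats than among innocents, the posterior under these assumed numbers stays far below one percent, because the prior is so small. Whatever threshold you pick, nearly everyone it flags will be innocent.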

A better approach might be anomaly detection: instead of trying to model what terrorists are like, model what normal travelers are like and then look for anomalies. This might catch the terrorist who travels internationally on a one-way ticket without luggage. So, terrorists with half a brain (or more likely the people managing them) will do a better job in disguising themselves.
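A bare-bones version of anomaly detection is to count how often each combination of attributes occurs among past travelers and flag the rare ones. The booking history and the 2% cut-off below are made up; this is a sketch of the idea, not of any deployed system.

```python
from collections import Counter

# Hypothetical booking history: (ticket type, number of checked bags).
history = (
    [("round_trip", 1)] * 70
    + [("round_trip", 0)] * 20
    + [("one_way", 1)] * 9
    + [("one_way", 0)] * 1
)

counts = Counter(history)
total = sum(counts.values())

def is_anomalous(record, min_share=0.02):
    """Flag a record whose attribute combination is rare among past travelers."""
    return counts[record] / total < min_share

print(is_anomalous(("round_trip", 1)))
print(is_anomalous(("one_way", 0)))
```

The weakness is visible right in the data: anyone who buys a round-trip ticket and checks a bag looks perfectly normal.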


It’s common for smart people to underestimate the difficulty of problems, especially in hindsight. This cognitive error is known as hindsight bias. According to Wikipedia:

Hindsight bias is the inclination to see events that have occurred as more predictable than they in fact were before they took place. Hindsight bias has been demonstrated experimentally in a variety of settings, including politics, games and medicine. In psychological experiments of hindsight bias, subjects also tend to remember their predictions of future events as having been stronger than they actually were, in those cases where those predictions turn out correct.

Sound familiar?

Here’s a possible solution that focuses on immediate behavior. Performing behavioral analysis at the airport makes sense but it requires TSA staff with a much higher level of skill than what I’ve seen and it also doesn’t help when dealing with flights coming from other countries.

Written by Sigma Limited

January 3, 2010 at 02:46

Posted in Uncategorized

A Great Example of Targeting Mobile Behavior

For a long time I struggled to find good examples of how behavioral targeting could be used effectively in mobile advertising. It’s possible to measure mobile usage in many ways, but how can these measures be turned into something actionable, or at least more interesting to an advertiser than traditional ASL (Age, Sex, Location)? All examples were things like “if a mobile subscriber frequently requests sports content, then send him ads related to sports”. That’s fine, but wouldn’t you be better off simply delivering such ads alongside sports content? That’s contextual targeting, not behavioral targeting.

Then I had a chat with Tomi Ahonen, a mobile strategy author and consultant who writes on a variety of mobile-related topics at Communities Dominate Brands. Tomi reminded me of the concept of connectors, which I first read about in Malcolm Gladwell’s The Tipping Point. Connectors are the people with large social networks who “seem to know everybody”. They are also influencers.

Marketers love connectors because they set trends and influence others, and mobile networks provide a simple means to identify and reach them. How do you identify connectors? Look for those subscribers who are making the largest number of calls to different destinations. Minutes of usage doesn’t help because it doesn’t differentiate someone who speaks to the same person every day for 30 minutes from the person who talks to ten different people for five minutes each day.
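Here is a small sketch of that distinction, using invented call records. The record layout (caller, callee, minutes) and the subscriber names are assumptions; real call detail records are far richer and would be keyed by something other than a first name.

```python
from collections import defaultdict

# Hypothetical call records: (caller, callee, minutes).
calls = [
    ("alice", "bob", 30), ("alice", "bob", 30), ("alice", "bob", 30),
    ("carol", "dave", 5), ("carol", "erin", 5), ("carol", "frank", 5),
    ("carol", "grace", 5), ("carol", "bob", 5),
]

def distinct_destinations(records):
    """Count how many different people each subscriber called."""
    destinations = defaultdict(set)
    for caller, callee, _minutes in records:
        destinations[caller].add(callee)
    return {caller: len(callees) for caller, callees in destinations.items()}

def minutes_of_usage(records):
    """Total minutes per subscriber, for comparison."""
    minutes = defaultdict(int)
    for caller, _callee, duration in records:
        minutes[caller] += duration
    return dict(minutes)

reach = distinct_destinations(calls)
usage = minutes_of_usage(calls)
print("by distinct destinations:", max(reach, key=reach.get))
print("by minutes of usage:", max(usage, key=usage.get))
```

Ranking by minutes puts the subscriber with one long daily call on top; ranking by distinct destinations finds the one who talks to everybody.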

This also demonstrates the value that the mobile operator can bring to the table. It’s not easy working with operators and many mobile marketing/advertising technology companies avoid it. Demographic data can be gleaned from surveys or questionnaires and contextual data can be extracted from ad requests. But only the operator has general traffic data, and traffic is an incredibly rich source of information.

As a sidenote, you can also find connectors through social networks. Here’s a paper on that subject:
GuruMine: a Pattern Mining System for Discovering Leaders and Tribes by Goyal, On, Bonchi, and Lakshmanan.

Written by Mark Westling

December 11, 2009 at 17:19

Posted in Marketing

The Curse of Dimensionality

One of the nice things about the star-schema (and its close relative, the snowflake schema) is that it forces you to consider all dimensions of your reportable data. Suppose you’re delivering ads over multiple mobile channels and want to report the number of deliveries over time. A first attempt might be to create a record (in star-schema parlance, a fact table) that has the following values:

  • delivery attempts
  • verified deliveries

These summarize groups of individual deliveries according to some selection criteria. These criteria are essentially dimensions. A first attempt at selecting useful dimensions might be the following:

  • date
  • hour
  • delivery channel
  • campaign

Simple enough. This reduces the individual deliveries into hourly records organized by delivery channel and campaign. If we’re dealing with large volumes, this makes reporting easier. Actually, it makes reporting feasible. In some cases, these dimensions might suffice. I seriously doubt it, though. In the real world, both the mobile operator and the ad sales organization will invariably find this scheme simplistic.
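The reduction from individual deliveries to hourly records can be sketched as a simple group-and-count. The log layout, channel names, and campaign name below are hypothetical, and a production system would do this in the database or an ETL job, not in a Python loop.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw delivery log: (timestamp, delivery channel, campaign, verified?).
deliveries = [
    (datetime(2009, 11, 27, 9, 5), "sms", "spring_sale", True),
    (datetime(2009, 11, 27, 9, 40), "sms", "spring_sale", False),
    (datetime(2009, 11, 27, 9, 59), "wap", "spring_sale", True),
    (datetime(2009, 11, 27, 10, 1), "sms", "spring_sale", True),
]

def summarize(records):
    """Roll deliveries up into rows keyed by (date, hour, channel, campaign)."""
    facts = defaultdict(lambda: {"attempts": 0, "verified": 0})
    for ts, channel, campaign, verified in records:
        key = (ts.date(), ts.hour, channel, campaign)
        facts[key]["attempts"] += 1
        facts[key]["verified"] += int(verified)
    return dict(facts)

facts = summarize(deliveries)
for key, measures in sorted(facts.items()):
    print(key, measures)
```

Each extra dimension simply becomes another element of the key, which is why adding dimensions later is easy to code but multiplies the number of rows.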

Consider hourly reporting intervals. This may be sufficient for some channels but not for others, especially live or looped video. So, it’s likely that date and hour will need to be replaced with something like:

  • day and time
  • interval duration

where interval duration may be 30 minutes, 15 minutes, or even smaller.
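Assigning a delivery to a variable-length interval is a matter of flooring its timestamp. A minimal sketch follows; the function name is mine, and it assumes intervals are aligned to midnight.

```python
from datetime import datetime, timedelta

def interval_start(ts: datetime, minutes: int) -> datetime:
    """Floor a timestamp to the start of its reporting interval within the day."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = int((ts - midnight).total_seconds() // 60)
    return midnight + timedelta(minutes=(elapsed // minutes) * minutes)

delivered_at = datetime(2009, 11, 27, 9, 40, 12)
for duration in (60, 30, 15):
    print(duration, interval_start(delivered_at, duration))
```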

Delivery channel is another candidate for decomposition. Here, the decomposition is likely to be different kinds of services. SMS ads can be pushed by themselves to users or appended to existing messages (e.g., peer-to-peer SMS or operator-generated messages), so the simple “delivery channel” turns into:

  • delivery channel
  • operator service

If the delivery channel is content-based (as opposed to a messaging channel), we need to know the context of the delivery, for example:

  • content type
  • ad location with respect to content

Merely summarizing deliveries by campaign is insufficient, too; they need to be organized by all components of a campaign structure, such as:

  • campaign / flight / creative

Finally, what about the location, demographics, and behavior of the subscriber who received the ad? We need to add the following:

  • location
  • demographic attributes (age, sex, etc.)
  • behavioral attributes

So, our initial attempt to summarize deliveries according to four simple dimensions has now gotten extremely ugly. Does that mean we shouldn’t have started with those four dimensions but instead jumped directly into every dimension we could think of? Not necessarily. It does mean, though, that we need to plan ahead for extra dimensions and not be surprised when they become requirements.

By the way, apologies to anyone who found this page by searching for the phrase “curse of dimensionality”. This term is used in machine learning and statistics to convey the exponential growth of a problem space as dimensions are added. Suppose we were looking for patterns in our ad delivery data. If we represent our deliveries as points in n-space, where n is the number of dimensions we’re using to organize it, then most of the space will be empty and we’ll have a hell of a time looking for those patterns. But that’s a topic for another post.
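A quick back-of-the-envelope calculation shows how fast the space empties out. The number of distinct values per dimension and the yearly delivery volume below are pure assumptions chosen for illustration.

```python
from math import prod

# Assumed number of distinct values per dimension (illustrative only).
cardinalities = {
    "date": 365, "hour": 24, "delivery_channel": 5, "campaign": 200,
    "location": 50, "age_band": 6, "sex": 2,
}

def cell_count(dimension_names):
    """Number of possible combinations across the given dimensions."""
    return prod(cardinalities[name] for name in dimension_names)

deliveries_per_year = 1_000_000_000  # assumed volume
names = list(cardinalities)

for n in range(1, len(names) + 1):
    cells = cell_count(names[:n])
    occupied = min(1.0, deliveries_per_year / cells)
    print(f"{n} dimensions: {cells:,} cells, at most {occupied:.1%} occupied")
```

Even with a billion deliveries a year, these seven modest dimensions leave most cells empty, and that is before adding behavioral attributes.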

Written by Mark Westling

November 27, 2009 at 17:15

Subscriber Privacy Ain’t Easy

One issue in mobile advertising that’s often treated far too lightly is subscriber privacy, and by this, I mean the exposure, accidental or otherwise, of subscriber phone numbers and personal information.

Mobile ad platforms can integrate at different levels with operator networks:

  • They can share all information. They can, for example, keep local copies of the entire operator subscriber base, with periodic updates provided by the billing system. Ad requests pass visible MSISDNs.
  • They can share information but hide the keys. The platform can maintain local copies of the subscriber base but use encrypted or hashed MSISDNs.
  • They can share no information. The platform receives no subscriber data from the operator. All ad requests contain an encrypted or hashed MSISDN.

The first option is the easy one. Auditing, validation, and operations in general are easy because the data in the ad platform matches the data kept by the operator. (OK, synchronization can be an issue, but it’s a relatively minor one.) The only problem occurs when a disgruntled employee walks off with a database dump, or when a developer decides to work with a copy of the database or log files on his laptop and then loses it.

The second option is harder to implement but less risky. Ad requests and profile updates from the operator don’t use a plaintext MSISDN but rather pass some other kind of key, which might be an encrypted MSISDN, a hashed MSISDN, or something else that doesn’t directly identify the subscriber. This makes operations harder but poses much less risk of data loss. If a developer loses a copy of the database, whoever gets it might find subscriber profiles but won’t be able to match them to individual subscribers. He will, however, be able to tell the number of subscribers for this particular operator who are under 21, have premium services, live in a particular postal code, and so on.
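One way to produce such a key is a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; the choice of HMAC, the placeholder secret, the sample MSISDN, and the profile fields are all my assumptions, and an operator might equally use encryption or a lookup table of random IDs.

```python
import hashlib
import hmac

def pseudonymize(msisdn: str, secret_key: bytes) -> str:
    """Return a stable key for a subscriber that does not reveal the MSISDN."""
    return hmac.new(secret_key, msisdn.encode("ascii"), hashlib.sha256).hexdigest()

# Placeholder secret for illustration; a real key would stay with the operator.
key = b"example-key-held-by-the-operator"

profiles = {}
subscriber_key = pseudonymize("85291234567", key)
profiles[subscriber_key] = {"age_band": "under 21", "premium": True}

# The same MSISDN always maps to the same key, so later requests find the profile.
print(profiles[pseudonymize("85291234567", key)])
```

The secret matters because phone numbers come from a small, guessable space: with a plain unkeyed hash, anyone holding the database could hash every possible number and match them up.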

The third option is the safest. The operator provides no information about the subscriber base. Instead, the ad platform builds its own profiles. It can do this through questionnaires, or by observing behavior, or by other means. We can also assume that the MSISDNs are disguised inside ad requests. Now the operator data are completely protected, and the owners of the ad platform can make the point that they own the profiles that they gather. The biggest drawback is that operator data are usually quite good (home postal codes are verified through credit checks, for example), whereas it’s not easy to ensure that a subscriber doesn’t lie about everything in a questionnaire that he knows is tied to advertising.

So, what’s the right answer? It depends on technical constraints, legal constraints, and the operator’s level of comfort. At one extreme is the ad platform that’s bought outright by the operator and hosted at the operator’s data center. With such an arrangement, an operator is likely to feel comfortable with plaintext MSISDNs and unencrypted profiles. At the other extreme is a mobile ad service that’s located outside the operator’s premises. I’ve never found an operator that will knowingly agree to put their subscribers’ confidential information in the hands of a distant service provider. (This also has major legal implications if the service provider is in another country.) If anyone knows of such an operator, please let me know, so I can avoid it!

Update: I mentioned MSISDN as the key piece of identifying information kept with a profile but obviously name and home address count, too. I didn’t mention them because I can’t imagine why an ad platform would ever want to store that data. Maybe for customizing ad templates? I don’t think I’d feel more inclined to buy a product or service through an ad that starts “Hi Mark” and in fact, if I didn’t like the product, I’d be seriously turned off.

Written by Mark Westling

November 20, 2009 at 06:04