Thursday 3 March 2011

The Goldilocks Formula and the Web of Possibilities

First published on the Hyro blog on May 21, 2009. This is the rambling, epic braindump I referred to in my last post.

Search and Market Research will converge. They will become indistinguishable from one another. We will arrive at a point where many products, services and content are created on a just-in-time basis. Exactly at the moment you realise you need it, the perfect music compilation, job offer, electronic device, documentary video, or dinner date will appear right in front of your face.

Maybe even in 3D. Make sure you have the special glasses ready, just in case.

Let me explain.

Market research is all about finding out what people want, and getting into their heads, so you can become better at building products they want, or at persuading them to want the products you build (AKA advertising). Search is also all about finding out what people want, and getting into their heads, so you can find and deliver what they want. Wait a second, you say: is that last statement really true?

In order to justify my argument, it looks like I’ll need to digress into the recent past and near future of search.

Here goes…

The Next Big Thing in Search

In the mid-1990s, the internet comes along. It’s like the biggest library ever assembled, the store of all human knowledge, profanity, trivia and vanity. A million monkeys with a million typewriters, Shakespeare, the lot. But it is too large for a single human brain or a single human lifetime to comprehend or conquer. It is functionally infinite. It is labyrinthine. It is unruly chaos, with no order, classification or government. It is nearly impossible to locate the Shakespeare within the nonsense of the million monkeys. The only way to handle it is to find a little corner and build a pleasant walled garden in which to while away your days… or… Google.

Google comes along. Well, Google and others, but let’s simplify history. Google comes along and solves the problem. Hurrah. The garden walls are trampled, needles are found in haystacks everywhere, the internet thrives; Google lists on the stock exchange and reaches a market cap of around $200 billion.

Fast forward to today: internet speeds are much, much faster, and there’s a lot more content, of many more types. It’s not just words and pictures and animations. The internet has turned the music industry inside out and is just getting started on the film and television industry. People are accessing the internet on their phones and TVs as well as their computers.

Today, the analogy for the internet is no longer ‘the world’s largest library’. Today, we’re faced with the world’s largest record shop, the department store where the aisles stretch off into infinity, the cable TV network with 99 million channels (and still nothing on).

So we’re back to square one. And Google, as we know it, can’t help us.

The simple reason is that the songs, videos and products we want to find aren’t made of text, nor are they consistently described by the text attached to them (even considering metadata). We can’t rely on the object of desire (be it a song, a movie, or a bargain) to describe itself effectively in words, and the user can’t effectively articulate his or her desire in words either: “I wanna watch a movie that’s really, umm… good”.

So how do we solve this problem?

There are a number of promising approaches, and I’ll go through a few of them in a moment. But at the crux of them all are these two principles:
1. content is best described and classified by the way humans use it and behave around it; and
2. the desires of humans can be predicted by their past behaviour and current state.

By the way, in case you haven’t worked it out yet: ‘search’ and ‘recommendations’ are the same thing. I make no distinction between the two here.

Anyhow, let’s look at a few interesting approaches to the problem.

Collaborative Filtering

Collaborative Filtering is the science behind many of the most successful recommendation engines. The best-known users include Amazon, eBay, iTunes and TiVo.

From Wikipedia:
Collaborative filtering systems usually take two steps:
1. Look for users who share the same rating patterns with the active user (the user whom the prediction is for).
2. Use the ratings from those like-minded users found in step 1 to calculate a prediction for the active user.


Alternatively, item-based collaborative filtering popularized by Amazon.com (users who bought x also bought y) and first proposed in the context of rating-based collaborative filtering by Vucetic and Obradovic in 2000, proceeds in an item-centric manner:
1. Build an item-item matrix determining relationships between pairs of items
2. Using the matrix, and the data on the current user, infer his taste


Amazon and eBay use the second method, which relies on what people bought.
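To make the item-based idea concrete, here is a minimal Python sketch of ‘users who bought x also bought y’. It is not Amazon’s or eBay’s actual system: the baskets and item names are made up, and it scores items by raw co-purchase counts, whereas production systems usually normalise them (for example with cosine similarity between item vectors).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase histories: each set is one customer's basket.
purchases = [
    {"tent", "sleeping_bag", "stove"},
    {"tent", "sleeping_bag"},
    {"tent", "lantern"},
    {"stove", "lantern"},
]

def build_item_matrix(baskets):
    """Step 1: count how often each pair of items was bought by the same customer."""
    co_counts = defaultdict(lambda: defaultdict(int))
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return co_counts

def recommend(co_counts, owned, top_n=2):
    """Step 2: score items the user lacks by co-purchase with items they own."""
    scores = defaultdict(int)
    for item in owned:
        for other, count in co_counts[item].items():
            if other not in owned:
                scores[other] += count
    # Highest score first; ties broken alphabetically so results are stable.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]
```

With these baskets, someone who has just bought a tent is shown the sleeping bag first, because two of the three tent buyers also bought one.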

iTunes and TiVo use the first method, which relies on what people liked.
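The user-based method can be sketched the same way. This minimal Python example uses toy users and ratings on a 1 to 5 scale, all invented, and follows the two Wikipedia steps: it weights other users by the cosine similarity of their ratings on commonly rated items, then takes a similarity-weighted average of their ratings for the target item. Real systems typically mean-centre the ratings (Pearson correlation) and restrict the average to the nearest neighbours; this sketch does neither.

```python
from math import sqrt

# Toy ratings on a 1-to-5 scale; users, items and values are made up for illustration.
ratings = {
    "alice": {"film_a": 5, "film_b": 3, "film_c": 4},
    "bob":   {"film_a": 4, "film_b": 2, "film_c": 5, "film_d": 4},
    "carol": {"film_a": 1, "film_b": 5, "film_d": 2},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict_user_based(ratings, active, item):
    """Step 1: weight other users by similarity to the active user.
    Step 2: similarity-weighted average of their ratings for the item."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == active or item not in their:
            continue
        sim = cosine_similarity(ratings[active], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None  # None: nobody comparable rated the item
```

Here Alice’s predicted rating for film_d lands at about 3.2: closer to Bob’s 4 than to Carol’s 2, because Alice’s past ratings look much more like Bob’s.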

This points to something really interesting which scientists have observed about consumer preference. We used to talk about ‘confirmation bias’ (the effect described here is more precisely called post-purchase rationalisation, or choice-supportive bias). People took a bit of time to decide whether to buy something, and once they got it home and used it, they were more likely than not to convince themselves they’d made the right decision.

Confirmation bias simply does not apply when we’re talking about music and video. The mere fact that someone bought a piece of music, rented a movie, or invested time listening or watching makes no difference to whether they’ll like it. That’s why user ratings are so important to music and video sites.

I predict we’re heading into a world where confirmation bias is less and less prevalent in all sorts of purchases and markets. Therefore, it will be even more crucial to deliver what people want rather than convince people they want what you deliver. I’m going out on a limb here; there’s no real data to back this up yet. But if I’m right, it’s another nail in the coffin of advertising-as-we-know-it. Yes, I’ll happily add my voice to that chorus of doomsayers. Google does not advertise. But Google has a brand and people trust the Google brand. Google got there by giving people really useful stuff.

Anyway, let’s crack on…

Choice Modelling

In terms of practical applications, Choice Modelling is still very much under the radar. Daniel McFadden, whose work on discrete choice analysis laid the foundations of Choice Modelling, shared the Nobel Prize in economics in 2000.

Choice Modelling is highly accurate, much more so than current Collaborative Filtering methods, but operates within certain limitations. It is excellent for predicting individual and segment preferences for things you can describe as a set of attributes. For example, you can describe and compare cars by make, colour, number of doors, fuel economy and price. You could similarly describe and compare mobile phones, holidays, job offers, shares in a company, or web pages.

You couldn’t easily break down music or video into comparable attributes. Nor does Choice Modelling recognise ratings: it’s about a binary decision to purchase or consume, or not. Choice Modelling is used to mine existing sets of data and identify optimum product configurations. It is also used to conduct experiments which accurately predict demand for hypothetical new products and configurations. Choice Modelling is the science behind the Accenture XoS software, which automatically varies and optimises webpage content and configuration in real time.
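The standard workhorse here is the multinomial (conditional) logit model associated with McFadden: each alternative gets a utility equal to a weighted sum of its attributes, and the probability of choosing it is exp(utility) divided by the sum of exp(utility) over all the alternatives on offer. The Python sketch below shows that calculation for the car example; the part-worth weights and car attributes are invented for illustration, not estimated from any real data.

```python
from math import exp

# Hypothetical part-worths (utility per unit of each attribute) for one consumer segment.
part_worths = {"price_k": -0.30, "fuel_l_per_100km": -0.40, "doors": 0.25, "is_red": 0.50}

# Hypothetical alternatives described purely as attributes.
cars = {
    "hatchback": {"price_k": 18, "fuel_l_per_100km": 5.5, "doors": 5, "is_red": 1},
    "sedan":     {"price_k": 24, "fuel_l_per_100km": 7.0, "doors": 4, "is_red": 0},
    "coupe":     {"price_k": 30, "fuel_l_per_100km": 8.5, "doors": 2, "is_red": 1},
}

def utility(attrs, weights):
    """Deterministic utility: weighted sum of the alternative's attributes."""
    return sum(weights[k] * v for k, v in attrs.items())

def choice_probabilities(alternatives, weights):
    """Multinomial logit: P(i) = exp(U_i) / sum over j of exp(U_j)."""
    utils = {name: utility(a, weights) for name, a in alternatives.items()}
    top = max(utils.values())  # subtract the max before exp() for numerical stability
    expu = {name: exp(u - top) for name, u in utils.items()}
    total = sum(expu.values())
    return {name: e / total for name, e in expu.items()}
```

With these made-up weights the cheap, economical hatchback takes roughly 95 percent of the predicted choice share. Changing one attribute (say, dropping the coupe’s price) and recomputing is exactly the kind of ‘hypothetical new product’ experiment described above.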

I can imagine an application whereby consumers invest 10 minutes of their time completing a choice experiment (survey) on a certain type of product, say mobile phones. They learn about features, and accept or reject a number of hypothetical alternatives. The application then builds a choice model for the individual, and trawls the internet to find the best deal for them, based on their unique preference structure. (Anyone who has a bit of money to play with: contact me, let’s build this thing, I know how.)
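Here is one way the ‘10-minute survey’ step might work, as a rough Python sketch. The respondent accepts or rejects hypothetical phones described by price and whether they have a music player; a binary logit model is fitted to those answers by plain gradient ascent on the log-likelihood; and real offers are then ranked by predicted acceptance probability. The attributes, answers and offer names are all hypothetical, and a real choice experiment would use more attributes, a proper experimental design, and usually choices among several alternatives rather than simple accept/reject.

```python
from math import exp

# Hypothetical answers from ONE respondent. Each profile is
# (price in hundreds of dollars, has_music_player) -> (times accepted, times rejected).
profiles = {(1.0, 1): (3, 1), (1.0, 0): (2, 2), (3.0, 1): (2, 2), (3.0, 0): (1, 3)}
responses = [(list(x), y) for x, (acc, rej) in profiles.items()
             for y in [1] * acc + [0] * rej]

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def fit_binary_logit(data, lr=0.5, epochs=2000):
    """Maximum-likelihood part-worths via gradient ascent; w[0] is the intercept."""
    w = [0.0] * (len(data[0][0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            xs = [1.0] + x
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, xs)))
            for k, xk in enumerate(xs):
                grad[k] += (y - p) * xk  # gradient of the log-likelihood
        w = [wi + lr * g / len(data) for wi, g in zip(w, grad)]
    return w

# Hypothetical real deals found on the internet, in the same attribute format.
offers = {"budget_music": [1.5, 1], "mid_music": [2.5, 1], "premium_no_music": [4.0, 0]}

def rank_offers(offers, w):
    """Sort real offers by this individual's predicted probability of accepting."""
    def score(x):
        return sigmoid(sum(wi * xi for wi, xi in zip(w, [1.0] + x)))
    return sorted(offers, key=lambda name: -score(offers[name]))
```

With these made-up answers the fitted weights come out at roughly 0.55 for the intercept, -0.55 per $100 of price and 1.10 for having a music player, so the cheapest phone with music is ranked first.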

Alternatively we could coerce Warren Buffett into completing a choice experiment on stock selection, and build a Warren Buffett ‘bot to buy shares for us and make us rich. Then again, maybe not.

I reckon Choice Modelling will come strongly into play once we move from a universe in which consumers are faced with a functionally infinite range of existing things to choose from, to a universe where consumers are faced with a literally infinite range of ‘possibilities’: options that do not exist yet but could be quickly assembled. More on that later.

Collaborative Indexing


Collaborative Indexing is a term I made up to describe how humans actually operate to solve the problem of functionally infinite choice.

It’s what ‘the kids’ do when they talk about and classify music and subcultures. I have no idea what ‘New School Speed Garage’ means, nor ‘Electroclash’. I’m not exactly sure what an ‘Emo’ is or does. But everyone who needs to know seems to know. A new musical genre or youth subculture is invented, tagged and propagated in the flicker of an eyelid. No-one is in charge, there’s no formal process, and no authoritative lexicon exists. Yet it works, flawlessly and instantly. Every teenager across the globe gets the memo. And they all conspire to keep it from their parents (who rely on the tabloid press for alarming misinformation).

The person who can figure out what’s happening here and put it into an algorithm will be fabulously rich. He or she won’t even need to put it into an algorithm: the kids are doing a perfectly good job as it is. Developing a means to facilitate and accelerate this process online would be powerful enough.

When you’re online, you can easily shift persona without changing your wardrobe or getting a new haircut. You could use multiple different ‘personalities’ to tap into the Collaborative Indexing power of whatever subculture or interest group would be most useful for a certain search or type of content. 

Paying Smart People a Million Dollars (to work it out for you)


This is how Netflix is going about solving the problem. It’s an approach I particularly like, and would recommend to anyone sitting on a million dollars and a set of data potentially worth a million times that.

Netflix have offered a cash prize of one million US dollars to whoever can come up with a recommendation algorithm that predicts user ratings at least 10 percent more accurately (measured by root mean squared error) than their current one, Cinematch. Groups and individuals that wish to participate are given a huge set of real data to play around with. For some participants this data is more of a motivation than the prize: it’s the kind of real data that would cost much more to collect than any academic research budget would allow. Participants are encouraged to collaborate and share details of their approach. The winner of the grand prize, as well as the winners of progress prizes, are required by Netflix to publish a description of their algorithm, which then becomes published science, accessible to all.
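For reference, the Netflix Prize scored entries by root mean squared error (RMSE) between predicted and actual ratings, and the grand prize required a 10 percent improvement over Cinematch. The small Python sketch below shows the metric and the improvement calculation; the ratings in the example are invented.

```python
from math import sqrt

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two equal-length, non-empty lists")
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def improvement(baseline_rmse, new_rmse):
    """Percentage improvement of a new model's RMSE over a baseline's."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse
```

For example, with actual ratings [4, 3, 5, 2], a baseline predicting [3, 3, 4, 3] has an RMSE of about 0.87, and a model predicting [4, 3, 4, 2] has an RMSE of 0.5: an improvement of roughly 42 percent. Because RMSE squares the errors, a few wildly wrong predictions hurt far more than many slightly wrong ones.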

In effect, Netflix will pay a million dollars for a piece of intellectual property that they will not own outright, and that anyone, including a competitor, is free to use. Google’s algorithm is a more closely guarded secret than the Da Vinci Code. Netflix will let anyone who wants to use theirs do so.

But this is not an act of philanthropy, or of idiocy. It highlights an essential feature of the next big thing in search: the data is more important than the algorithm (or at least as important).

In the world of Google-as-we-know-it, the dataset is the zillions of words on the internet that anyone can look at, and the algorithm is the super-clever, highly valuable thing that makes sense of the data. In the world of the next-big-thing, the dataset consists of highly valuable bits of information on what humans buy, like and do, and where they are. The bigger the dataset, the better, and these datasets are precious, closely guarded property. Without access to the data, the new algorithms, no matter how good, are useless.

So, am I saying that Google is dead? (OMG!) No, I am not saying that. Larry Page and Sergey Brin had probably thought this far ahead by the time they were 12. Google has more data, and more relevant data, than anyone else. Do not sell your Google shares. Buy more.

And, by the way, the datasets that Netflix gives away to prize participants are real, but masked. They’re great for testing theories, but you can’t do anything commercially useful with them.

As an aside: if you are in a business that looks anything like Netflix (i.e. if you deliver music, video, books, or anything similar), I strongly advise you to collect data in the same format as Netflix, a rating scale of 1 to 5, for the blindingly obvious reason that you, too, will then be able to use the winning algorithm.

Back to the Point


Well, that was a considerable digression, but I think I’ve explained what I mean when I say that search is (or will be) all about working out what people want, and getting into their heads, so you can find and deliver what they want. In that regard, search is (or will be) similar to market research, which is all about working out what people want, and getting into their heads, so you can become better at building products they want - or at persuading them to want the products you build (AKA advertising).

We’ve already killed off advertising. In the new world, we get this powerful information about what people want instantaneously, and can act on it instantaneously. We’re not sitting on a warehouse full of red widgets. In fact, all known things are just a click away. So, why would we use this information to try to convince someone what they really want is a red widget? If what they want is out there, we get it for them.

And if what they want is not out there, even given the hundreds, or thousands, or millions of options?

Well, if we could make it for them, we would, right? Given this powerful insight into a consumer’s desires, surely we’d take advantage of it…
…as long as we could make some money out of it. So we’d need a matching algorithm to work out if we could make a margin, and how to optimise that margin.

The products and services we offer would then exist as myriad permutations of configuration and price. Some permutations would be profitable for us; some would not. Where the desires of a consumer or a segment of consumers match a profitable permutation, we make that permutation, and deliver it. Depending on what business we’re in, this loop could be completed almost instantaneously.
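In its simplest form, the ‘matching algorithm’ might look like the Python sketch below: enumerate every permutation of a product’s options, estimate what this consumer would pay for each (for example, from a choice model), subtract our cost, and offer only the permutations that clear a minimum margin. All the option names, costs and values are hypothetical, and pricing at the consumer’s full predicted willingness to pay is a simplifying assumption.

```python
from itertools import product

# Hypothetical option menu: option -> (unit cost to us, value to this consumer), in dollars.
memory_options = {"16GB": (20, 30), "32GB": (35, 70), "64GB": (60, 85)}
case_options = {"plastic": (5, 0), "aluminium": (25, 60)}
BASE_COST = 150   # hypothetical cost of the core handset
BASE_VALUE = 180  # hypothetical value this consumer places on the core handset

def profitable_matches(min_margin=20):
    """Enumerate every configuration, price it at the consumer's predicted
    willingness to pay, and keep those that clear our minimum margin."""
    offers = []
    for (mem, (mem_cost, mem_val)), (case, (case_cost, case_val)) in product(
            memory_options.items(), case_options.items()):
        cost = BASE_COST + mem_cost + case_cost
        willingness_to_pay = BASE_VALUE + mem_val + case_val
        margin = willingness_to_pay - cost
        if margin >= min_margin:
            offers.append((margin, mem, case, willingness_to_pay))
    return sorted(offers, reverse=True)  # most profitable match first
```

With these numbers the best match is the 32GB aluminium handset, priced at $310 for a $100 margin; raising min_margin to 70 leaves three permutations worth making. In a real business the enumeration would be far larger, so a search or optimisation method would replace the brute-force loop.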

In any case, we wouldn’t make something unless we were sure there was demand for it.

So, the Next Big Thing in web search is algorithms fed with rich behavioural data. These algorithms then enable the Big Thing After That: the Web of Possibilities.

But this is all a bit abstract. Let me give you a few examples of how it might pan out.

The Big Thing After The Next Big Thing


Your flight has levelled out and the seatbelt light is off. You pull out your laptop. YouTV knows what you like to watch, knows that you’re travelling for pleasure, and knows that the flight time means you’ll have 25 minutes to watch video. During the taxi ride to the airport, YouTV has downloaded three alternative viewing packages based on your preferences, this morning’s zeitgeist, analysis of YouTube sessions, et cetera. All of the packages are automatically created mini-documentary ‘riffs’: related videos strung together. Your options are: heavyweight title-fight mismatches; risqué European TV commercials; and great jazz performances. You buy the jazz. YouTV would normally rate this as a low-probability choice for you, but it knows that you are a single male, and that another of its customers, a female around your age, will be sitting in the adjacent seat. YouTV algorithms accommodate situational shifts in persona. You make a good impression.

The last time you bought a phone on the internet was only a year ago. From your purchasing history, and preferences inferred from the way you drilled down to find more information about certain options presented, and ignored others, YouPhone knows you are most interested in stylish form, low weight, and music features. Their R&D department has just come up with some new case designs, and the means to produce more gigabytes of memory at a lower size, weight and cost. You are pushed an offer to upgrade to a stylish, feature-packed new handset, which has never been made. You buy it. The factory in China starts making it. You have it within a week.

You’re cycling in the park, listening to music on your new phone. YouTunes knows who you are, what you like, what your friends are listening to, where you are, and infers by your velocity that you’re on a bike. YouTunes programs and delivers a playlist perfect for you at that moment. Some of the songs you’ll have purchased licences for already, some will be new songs that YouTunes knows you’ll probably buy, some will be songs you’ll be predicted to like but not buy, one might be a song that your new girlfriend (the one you met on the plane) is listening to at the same time, and has decided to share with you.

You’re looking for a new job, and have decided to invest 15 minutes doing a simple survey on a job site, so that it can create a model of your preferences. YouCorp is looking for someone with your skills, and has posted a vacancy to the same site. The job vacancy is in the form of a series of trade-offs that YouCorp is willing to make in wages, fringe benefits and working hours. The job site crunches the data and makes an automated offer on behalf of YouCorp. The job site knows that you attach a value equivalent to $10,000 of salary to a company dental plan, that you have a similar strength of preference for a location close to the city, and that you couldn’t care less about a company car. You’re offered a job in YouCorp’s city office, with a dental plan. The money seems good (the savings YouCorp makes by not providing a car have topped up the salary). You take the job.
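The arithmetic behind the job example can be sketched in a few lines of Python. The candidate’s valuations follow the story ($10,000 for the dental plan, a similar amount for a city location, nothing for a car). YouCorp’s costs and its $90,000 budget for the role are hypothetical. The site tries every combination of benefits, puts whatever budget is left into salary, and offers the package the candidate values most.

```python
from itertools import product

# Candidate's preferences in dollars of salary-equivalent (from the story; the
# city-office figure is an assumed "similar strength" of $10,000).
candidate_value = {"dental_plan": 10_000, "city_office": 10_000, "company_car": 0}

# Hypothetical: what each benefit costs YouCorp per year, and its total budget for the role.
employer_cost = {"dental_plan": 3_000, "city_office": 5_000, "company_car": 8_000}
BUDGET = 90_000

def best_package():
    """Try every on/off combination of benefits, put the rest of the budget
    into salary, and return (value to candidate, salary, benefits)."""
    benefits = list(employer_cost)
    best = None
    for flags in product([False, True], repeat=len(benefits)):
        chosen = [b for b, on in zip(benefits, flags) if on]
        salary = BUDGET - sum(employer_cost[b] for b in chosen)
        value = salary + sum(candidate_value[b] for b in chosen)
        if best is None or value > best[0]:
            best = (value, salary, chosen)
    return best
```

Here the winning package is a city office plus a dental plan at $82,000 salary, worth $102,000 to this candidate. Adding the car would cut the salary to $74,000 for no gain in value, which is why the car money goes into the pay packet instead.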

This is the Web of Possibilities. Enabled by detailed insight into the behaviour and preferences of groups and individuals, the internet connects humans with a humungous matrix of complex and abstract things, and an even more humungous super-matrix of complex and abstract possibilities - things that might exist, and will exist as soon as they are desired.

The Web of Possibilities can’t encompass everything, or every market. Certain things, like human beings to meet, or holiday locations to visit, will always be limited to a finite set of existing options.

Or will they? Better hang on to those 3D glasses after all.

As an epilogue, my prediction of the death of advertising is probably exaggerated. Advertising in some form will always be around, not only because we’ll always need information about what’s available and what’s new, but because we actually need to be sold the implicit benefits. Reassurance that the blandishments of consumer society are in fact deeply fulfilling is the opiate of the post-industrial masses. Every now and then the buzz wears off and we crave new and better ways to revive it.
