Tuesday I went to a cool talk by Sanjeev Arora on the multiplicative weight method and, because I should be working on a different calculation, I thought I’d write up a description of a basic version of this algorithm (Mmm…I love the smell of procrastination in the morning.)
Suppose that there is a stock exchange where each day you can bet on whether a stock will go up or down that day. Of course you want to bet correctly on this stock every day so as to earn the maximum amount of moolah. But you don’t know much about stocks and are a bit lazy. But you do have access to a group of [tex]$n$[/tex] experts. Everyday you get to learn these experts predictions for that days movement of the stock price. Now of course, since these are experts, you’d like to do as well as the best of these experts. So what should you do? It seems that knowing what the experts predict shouldn’t help you perform as well as the best of these experts, right? I mean how would you know who the best of these experts is?
Well, it turns out that there is a way to guarantee performing almost as well as the best expert in this game. This is the heart of the multiplicative weight algorithm. Begin by assigning a weight to every expert. Since we don’t know anything at the time of our first bet we might as well assign a weight of one to every expert. Now intuitively what we are going to do is to increase the weight of experts when they correctly predict the outcome of a days movement. Thus we will update our weights as follows. If an expert got the prediction correctly, then we do nothing to the weight. If, on the other hand, the expert was wrong then we multiply the experts corresponding weight by [tex]$1-epsilon$[/tex] where [tex]$0< epsilon< frac{1}{2}$[/tex] is a small number which we will discuss later. Okay, so the weights will now begin to favor experts who are right more. Now how should we exploit this in our own playing of the game? Well each expert will have a prediction for the movement of the stock. What we do is we take a weighted majority of these predictions. That is if the weight of the experts predicting the stock will go up is more than half the total weight of all experts, then we will bet on up. If the weight of the experts predicting the stock will go down is more than half the total weight of all experts, then we will bet on down. A pretty simple algorithm, no? As a we play the game, we change the weights of the experts and bet depending on what the weighted majority of them predict. That's among the simplest strategies we could imagine…so how well does it work?
Now here is the cool thing. Suppose that we play the game [tex]$t$[/tex] times. Let [tex]$m(t)$[/tex] denote the number of times we, using the above algorithm, guess incorrectly (our number of mistakes). Further, let [tex]$m_i(t)$[/tex] denote the number of times expert [tex]$i$[/tex] guesses incorrectly. Then we claim that, for all experts the above algorithm satisfies [tex]$m(t) leq {2 log n over epsilon} +2(1+epsilon) m_i(t)$[/tex]. We'll prove this in a second but note that since this holds for all experts (for all [tex]$i$[/tex]), this means that it holds, in particular for the best expert. In other words, the number of mistakes we make is (ignoring for the moment the first term) at most roughly twice that of the best expert! Pretty cool, no?
How do you prove this? Well it's fairly straighforward. Note that the weight of the [tex]$i$[/tex]th expert after playing [tex]$t$[/tex] times is [tex]$w_i(t)=(1-epsilon)^{m_i(t)}$[/tex]. Define the "potential" function [tex]$phi(t)=sum_i w_i(t)$[/tex] (so that [tex]$phi(1)=n$[/tex].) Now what happens to the potential function as a function of time? Well the potential will always decrease as long as there is one expert who is wrong. What we are interested in is how it will decrease when we make a mistake in our guess, i.e. when by following the weighted majority we are wrong. In this case, at least half of the total weight is decreased by [tex]$1-epsilon$[/tex]. Thus in this case the potential will decrease by at least [tex]$frac{1}{2}+frac{1}{2}(1-epsilon)=1-frac{epsilon}{2}$[/tex]. Thus if we have made [tex]$m(t)$[/tex] mistakes at time [tex]$t$[/tex], this means that the potential at that time must satisfy [tex]$phi(t) leq nleft(1-frac{epsilon}{2} right)^{m(t)}$[/tex]. Now since the weights remain postive the potential function is always as great as one of its constituent weights: [tex]$phi(t) geq w_i(t)$[/tex] for all [tex]$i$[/tex] so that [tex]$w_i(t) leq nleft(1-frac{epsilon}{2} right)^{m(t)}$[/tex]. Putting this together with [tex]$w_i(t)=(1-epsilon)^{m_i(t)}$[/tex] we thus find that [tex]$(1-epsilon)^{m_i(t)} leq nleft(1-frac{epsilon}{2} right)^{m(t)}$[/tex]. Taking the log of this expression we obtain [tex]$m_i(t) log(1-epsilon) leq log n + m(t) log left(1-frac{epsilon}{2} right)$[/tex] and using [tex]$- log(1-x) < x+x^2$[/tex] for [tex]$0<x<frac{1}{2}$[/tex] plus throwing away some [tex]$epsilon$[/tex] corrections, we obtain our desired expression [tex]$m(t) leq {2 log n over epsilon} +2(1+epsilon) m_i(t)$[/tex].
So what use is this multiplicative weight algorithm? Well that was the subject of Sanjeev's talk, describing all sorts of cool applications of this algorithm, including boosting in machine learning theory, approximately solving semidefinite programs, approximately solving fraction packing and covering problems, and online convex optimization. These later tasks were the subjecto Sanjeev's talk and the interested reader is directed to this survey paper.
Bayesian => cool.
quantum?
I might have another potential application for this: fantasy football predictions! Seriously, no joke. If you’re simply betting on wins and losses in a standard “pick ’em” format, it’s no different than “up/down.” In this case the experts are the usual prognosticators at ESPN, CNNSI, etc. Now the question is how you could adapt it to a fantasy football league in which you pick players and experts project the rough number of points a given player will get in a given week. Since it’s not a binary-like prediction it doesn’t seem (at least at first glance) that you could use the algorithm in its present form. I would be curious to see if it could be modified to some form useful for this. Sorry, it’s Saturday which means I’ve got football on the brain…
Let me clarify something (now that my head is a little clearer – though football once again looms on the horizon since it is Sunday): this only works if you don’t know who the best expert is? And it seems odd that, in the algorithm, since you’re essentially tracking the “record” of the experts, that you wouldn’t eventually know who the best expert is. And, if you do, just pick what he/she picks and you’ll do better than the algorithm guarantees. Which all means that I must be missing something. What am I missing?
Well, presumably if you are taking the weighted average of a number of experts you can do better than any one expert.
If 3 experts have a probability of each being right 75% of the time, and these numbers are independant, then each should be weighted the same. If you followed the advice of the bes expert you would be wrong 25% of the time, but taking the average would meaan two experts must be wrong for you to make a wrong decision, which happens with probability 15.6%. So you can do better than any one expert.
Sorry, meant to say probability 0.156. So it happens 15.6% of the time.
“this only works if you don’t know who the best expert is?” Yes.
“And it seems odd that, in the algorithm, since you’re essentially tracking the “record†of the experts, that you wouldn’t eventually know who the best expert is”
Well you are “learning” who the best expert is (the one with the higher weight). Of course once you identify the best expert, you could just go with that expert. But the beauty of this algorithm is that you don’t need to do this. This comes in handy when, for example, the best expert starts to go wrong.
That being said, the success guarantee isn’t that great for this algorithm. I’m not sure how it would do for fantasy football, but it would be fun to try. Someone’s got to have tried this out for porfolio management, right?