Yesterday I attended a talk by local CS graduate student Atri Rudra describing work he did with Don Coppersmith (IDA, Princeton) and Lisa Fleischer (IBM, NY) on rankings for weighted tournaments (see here and here). The result is pretty neat, so I thought I’d describe it here.
Suppose you have a tournament where every player plays every other player exactly once (the results are more general than this, but let’s keep it simple) and there are no ties. From this tournament, you’d like to develop a ranking of the players which best respects who beat whom. In other words, you’d like to rank the players from first place to last place in such a way that you minimize the number of times in your ranking that a lower-ranked player beats a higher-ranked player. Now, it turns out that finding the optimal such ranking is NP-hard (this was recently shown by Ailon, Charikar, and Newman), i.e., as our tournaments get bigger and bigger we have an ice cube’s chance in hell of finding this optimal ranking before we die 😉
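To make the objective concrete, here’s a minimal sketch in Python of how you might score a proposed ranking by counting these upsets. The representation (a `beats` table over ordered pairs of players) and the function name are my own, not from the paper:

```python
from itertools import combinations

def count_upsets(ranking, beats):
    """Count the 'failings' of a ranking: pairs where the lower-ranked
    player beat the higher-ranked player in their head-to-head game.

    ranking: list of players, best first.
    beats:   dict mapping an ordered pair (a, b) to True iff a beat b.
    """
    upsets = 0
    for i, j in combinations(range(len(ranking)), 2):
        # ranking[i] sits above ranking[j]; it's an upset if j's player won.
        if beats[(ranking[j], ranking[i])]:
            upsets += 1
    return upsets
```

The optimal ranking is the one minimizing this count over all orderings, which is exactly what’s NP-hard to find.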
So what do you do in the real world? Well, a simple strategy you might take is to rank players simply by the number of times each has won. Thus the first-ranked player is the one with the most wins, the second-ranked player has the second most wins, etc., and we break ties in some arbitrary manner. A natural question to ask, then, is how close to the optimal ranking this simple strategy gets. Let’s measure our rankings by counting the number of times a ranking fails (i.e. the number of times that the lower-ranked player beat a higher-ranked player). Then there is a value for this quantity for the optimal ranking and for a ranking obtained by the simple strategy of ranking by number of wins. What is shown in the paper by Coppersmith, Fleischer, and Rudra is that the simple ranking by number of wins has at most five times as many failings as the optimal ranking. This is rather amazing, to my simple mind: we get within a factor of five (at worst) by simply using a ranking which orders by number of wins (with ties broken arbitrarily). In fact, these authors also show that it doesn’t matter how you break ties: the worst case is always within a factor of five.
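Here’s what that simple strategy looks like in code, continuing the hypothetical `beats` representation from above (again a sketch, not the paper’s presentation):

```python
def rank_by_wins(players, beats):
    """Rank players by number of wins, most wins first.

    Ties are broken arbitrarily (here, by whatever order the sort happens
    to leave equal-win players in); per the paper, the factor-of-five
    bound holds no matter how ties are broken.
    """
    wins = {p: sum(1 for q in players if q != p and beats[(p, q)])
            for p in players}
    return sorted(players, key=lambda p: wins[p], reverse=True)
```

On small instances you could check the bound directly by comparing `count_upsets(rank_by_wins(players, beats), beats)` against the minimum over all orderings by brute force.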
Kind of neat, no? A natural question, of course, is whether there are strategies which can be efficiently computed but which beat this factor-of-five bound. Indeed! It turns out that there are algorithms which achieve a factor of three. These algorithms, however, are more complicated. What is so nice is that the simplest ranking you could come up with has such a nice approximation bound!
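If I recall correctly, one such factor-of-three algorithm is the randomized “KwikSort” of Ailon, Charikar, and Newman: essentially quicksort on the match results, with a random player as pivot. A sketch under the same assumed representation (the guarantee is a factor of three in expectation, over the random pivot choices):

```python
import random

def kwiksort(players, beats):
    """Quicksort-style ranking: pick a random pivot, place everyone who
    beat the pivot above it, everyone the pivot beat below it, and
    recurse on each side."""
    if len(players) <= 1:
        return list(players)
    pivot = random.choice(players)
    above = [p for p in players if p != pivot and beats[(p, pivot)]]
    below = [p for p in players if p != pivot and beats[(pivot, p)]]
    return kwiksort(above, beats) + [pivot] + kwiksort(below, beats)
```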
First link to the work is broken.
What I find interesting is the comment that being off by a factor of five by simply counting wins is considered good. To me, that seems really bad, especially considering that all major pro US sports, at least, decide on who gets to the postseason just by counting wins, even though they would have plenty of time and money to do lots of complex processing on the complete season results. I know pro sports aren’t quite a simple tournament where everyone plays everyone else once, but I can’t imagine that it’s completely different from the idealized case.
I suppose this isn’t _really_ a problem anymore, since in the NHL and NBA, about half the league makes it into the postseason, and the NFL is getting up there (so the best teams are certain to be in the playoffs). But it would be interesting to compute the best teams using a complex near-optimal algorithm on the season results and compare it to who actually made the playoffs in each league. I’d like to see what teams, if any, won a championship but would not have even been in the post-season given a better algorithm.
I was wondering if it is possible that some of these rankings are non-transitive? (I got the idea from your recent post.) I know that rankings can be non-transitive in voting theory, for example.
It wouldn’t be smart for major pro sports to use this sort of ranking all the time: how would you know when your team was close to the wildcard? On the edge of making it to the playoffs? A couple games away? 5x is a small price to pay for the involvement it allows for the fans.
College football gets enough heat about the BCS being so complicated; sometimes I think things would have been better if they’d never admitted it was a computer, and claimed it was decided by committee. Still, the bowls are separate enough that it’s worth computing something.
My CS friends tell me that good greedy algorithms are usually close to optimal in real-world situations. Is this an example of that?
Adam – you could compute the rankings nightly using a supercomputer if you wanted to. You could also do all sorts of projections with different scenarios for your team during the late season. If the rankings programs were open source, die-hard fans could do all sorts of analysis using PC clusters. The NFL tie-breakers are pretty complicated these days — some of the analysis during weeks 15 and 16 must leave a lot of Cowboys fans pretty slackjawed as it is.