A Mathematical Definition of News?

Lately I’ve been thinking about the news. Mostly this involves me shouting obscenities at the radio or the internet for wasting my time with news items the depth of which couldn’t drown an ant and whose factual status makes fairy tales look like rigorous mathematical texts (you know the kind labeled “Introductory X”.) But also (and less violently) I’ve been pondering my favorite type of question, the quantification question: how would one “measure” the news?
Part of the motivation for even suggesting that there is a measure of “news” is that if someone had asked me, back when I was a wee lad, whether there was a measure of “information,” I would have said they were crazy. How could one “measure” something as abstract and multifaceted as “information?” However, there is a nice answer to how to measure information, and this answer is given by the Shannon entropy. Of course this answer doesn’t satisfy everyone, but the nice thing about it is that it is the answer to a well-defined operational question about resources.
Another thought that strikes me is that, of course, Google knows the answer. Or at least there is an algorithm for Google News. Similarly, Twitter has an algorithm for spotting trending topics. And of course there are less well-known examples like Thoora, which seeks to deliver news that is trending in social media. There is probably academic literature out there about these algorithms; the best I could find with some small google-fu is TwitterMonitor: trend detection over the twitter stream. But all of this is very algorithm-centered. The question I want to ask is: what quantity are these services attempting to maximize (and is it even the same quantity)?
The first observation is that news clearly has a very strong temporal component. If I took all of the newspapers, communications, books, letters, etc. that mankind has produced and regarded them without respect to time, you wouldn’t convince many that there is news in this body of raw data (except that there are some monkeys who can type rather well.) It also seems certain that news has a time frame. That is, one could easily imagine a quantity that captures the news of the day, the news of the week, etc.
A second observation is that we can probably define some limits. Suppose that we are examining tweets and that we are looking for news items on a day time scale. We could take the words in each day’s tweets and make a frequency table for all of these words. A situation in which there is a maximum amount of news on the second day is then one where, on the first day, the frequency distribution over words is peaked on one word, while on the second day it is all concentrated on another word. One could probably also argue that, on the day time scale, if both frequency distributions were peaked on the same word, then this would not be (day-scale) news (it might be week-scale news, however.)
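A minimal sketch of the day-scale setup, assuming the tweets are already bucketed by day and that a “word” is just a lowercase, whitespace-separated token (both simplifying assumptions; the function name `word_distribution` and the toy tweets are hypothetical):

```python
from collections import Counter


def word_distribution(tweets):
    """Normalized word-frequency table for one day's tweets.

    Assumption for illustration: words are lowercase, whitespace-separated
    tokens. Real tokenization (punctuation, hashtags, stop words) is messier.
    """
    counts = Counter(word for tweet in tweets for word in tweet.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


# Hypothetical "maximum news" case: each day is peaked on a single word.
day1 = ["volcano", "volcano volcano"]
day2 = ["election", "Election election"]

p = word_distribution(day1)
q = word_distribution(day2)
print(p)  # {'volcano': 1.0}
print(q)  # {'election': 1.0}
```

The two toy days have disjoint support, which is exactly the extreme case described above; if both days were peaked on the same word the two tables would be identical.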
This all suggests that our friend, the news, is nothing more than the total variation distance. For two probability distributions $latex p(x)$ and $latex q(x)$ over a set of outcomes $latex X$ (here, words), the total variation distance between these distributions is $latex d(p,q)=\frac{1}{2} \sum_{x} |p(x)-q(x)|$. This is also equal to $latex \sup_{E \subseteq X} |P(E)-Q(E)|$, where $latex P(E)=\sum_{x \in E} p(x)$ and similarly for $latex Q(E)$. Ah, so perhaps this is not as exciting as I’d hoped 🙂 But at least it gives me a new way to talk about the total variation distance between two probability distributions: it is a measure of the news that we could associate with changing from one probability distribution to the other.
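A small sketch, assuming distributions are stored as Python dicts from words to probabilities (a missing key means probability zero), that computes the total variation distance both ways and checks that they agree. The function names and the toy distributions are hypothetical, and the brute-force supremum enumerates all $latex 2^{|X|}$ events, so it is only meant for tiny examples:

```python
from itertools import chain, combinations


def tv_distance(p, q):
    """d(p,q) = (1/2) * sum_x |p(x) - q(x)| over the joint support."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def tv_distance_sup(p, q):
    """Same quantity as max over events E of |P(E) - Q(E)|.

    Brute force over every subset of the joint support (2**n events),
    so this is only a sanity check for tiny examples.
    """
    support = sorted(set(p) | set(q))
    events = chain.from_iterable(
        combinations(support, k) for k in range(len(support) + 1)
    )
    return max(
        abs(sum(p.get(x, 0.0) for x in E) - sum(q.get(x, 0.0) for x in E))
        for E in events
    )


# Day-scale "news" between two word distributions (toy numbers).
peaked_a = {"volcano": 1.0}
peaked_b = {"election": 1.0}
print(tv_distance(peaked_a, peaked_b))  # 1.0: maximal news
print(tv_distance(peaked_a, peaked_a))  # 0.0: no day-scale news

mixed_a = {"volcano": 0.5, "election": 0.3, "cats": 0.2}
mixed_b = {"volcano": 0.1, "election": 0.6, "cats": 0.3}
print(tv_distance(mixed_a, mixed_b), tv_distance_sup(mixed_a, mixed_b))
```

For the mixed example both functions give 0.4 (up to floating-point rounding). The optimal event is the set of words whose probability dropped ({volcano}), or equivalently its complement.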
Of course this is just one approach to thinking about how to quantify “news.” What are the drawbacks of my method, and what should a real measure have that this one lacks? I mean, what’s the worst that could happen in thinking about this problem? Okay, so maybe you would learn how many holes it takes to fill the Albert Hall.

Quantum Article Parse Failure of the Pontiffical Kind

Two observations from yesterday’s New York Times article about quantum computing (“Moving Toward Quantum Computers”).
First, the drawing accompanying the article (here) is interesting to me.  I wonder where they got the idea for it and whether this idea involved Q*bert, color codes, or topological codes?  Or was it just the same old: we have no idea how to draw a quantum computer, so let’s just make a cool-looking graphic?
Second, I find this sentence fascinating: “D-Wave has built a system with more than 50 quantum bits, but it has been greeted skeptically by many researchers who believe that it has not proved true entanglement.”  Emphasis mine.  Okay, I find it fascinating not because of the debate about the quantum nature of D-Wave’s machine, but because of its language.  If there is “true” entanglement, what is “false” entanglement?  Further, for some reason I can’t quite pin down, the sentence strikes me as awkward.  In particular it feels like it needs to be something more like “that it has not proved that its system possesses real entanglement.” (Yes I understand the sentence, yes I’m not good at reading comprehension, and yes I’m beyond pedantic.)  Am I the only one having a hard time parsing this sentence?

More Fun with the arXiv

Did you know that there is an author known as 26 pages?  Or, meet author A.  Or D and B.  Do you think LaTeX is a first name or a last name?  Author Development Center, Japan, and 210 all appear in one paper.
Among abstracts, there are some fun ones, including verses from the Bible (arXiv:0912.1053):

“And should I not take pity on Nineveh, that great city, with more than a hundred and twenty thousand inhabitants who do not know their right hand from their left, and many beasts besides?” [Jonah 4:11]

Here is an abstract which contains optional and mandatory headings.
Titles?  Well Holey Sheet!  I did know that 6+4=10, but am not sure what to do with the fraction 27/32, even though I know one of the authors.  Among my favorite titles is “Is topological Skyrme Model consistent with the Satandard Model?”
I also enjoy the comments where they admit the paper wasn’t so good: “withdrawn. It was a rediculously stupid notion”.
Some papers also tread in directions I never would have considered.  For example, The Socceral Force is about a strange dream and describes a little-known markup language, Footballer and Football Simulation Markup Language, or FerSML.

Steve Ballmer Talk at UW March 4, 2010

Today Microsoft CEO Steve Ballmer spoke at the University of Washington in the Microsoft Atrium of the Computer Science & Engineering department’s Paul Allen Center. As you can tell from that first sentence UW and Microsoft have long had very tight connections. Indeed, perhaps the smartest thing the UW has ever done was, when they caught two kids using their computers they didn’t call the police, but instead ended up giving them access to those computers. I like to think that all the benefit$ that UW has gotten from Microsoft are a great big karmic kickback for the enlightened sense of justice dished out by the UW.
Todd Bishop from Tech Flash provides good notes on what was in Ballmer’s talk. Ballmer was, as I’d heard, entertaining and loud. Our atrium is six stories high, with walkways overlooking it, and these were all packed: a “hanging room only” crowd, as Ballmer called it. The subject of his talk was “cloud computing,” a phrase that makes about 25 percent of people roll their eyes, 25 percent get excited, and the remaining 50 percent look up at the sky and wonder where the computer is. His view was *ahem* the view of cloud computing from a high altitude: what it can be, could be, and should be. Microsoft, Ballmer claimed, has 70 percent of its 40K+ workforce somehow involved in the cloud, and that number will reach 90 percent soon. This seems crazy high to me, but reading between the lines, what it really said to me is that Microsoft has *ahem* inhaled the cloud and is pushing hard on the model of cloud computing.
But what I found most interesting was the contrast between Ballmer and Larry Ellison. If you haven’t seen Ellison’s rant on cloud computing, here it is:

Ellison belittles cloud computing and rightly points out that in some sense cloud computing has been around for a long time. Ballmer, in his talk, said nearly the same thing. Paraphrasing, he said something like “you could call the original internet back in 1969 the cloud.” He also said something to the effect that “cloud” may only have a short lifespan as a word describing this new technology. But what I found interesting was that Ballmer, while acknowledging the limits of the idea of cloud computing, also argued for a much more expansive view of this model. Indeed, as opposed to Ellison, for whom server farms equal cloud computing, Ballmer essentially argues for a version of “cloud computing” which is far broader than any definition you’ll find on Wikipedia. What I love about this is that it is, in some ways, a great trick to create a brand out of cloud computing. Sure, tech wags everywhere have their view of what is and is not new in the recent round of excitement about cloud computing. But the public doesn’t have any idea what this means. Love them or hate them, Microsoft is clearly pushing to turn the “cloud” into something consumers want, even while they don’t understand one iota of how it works. Because everything Ballmer described, every technology they demoed, was “from the cloud,” Microsoft is essentially pushing a branding of the cloud. (Start snark. The scientist in you will, of course, revolt at such an idea, but fear not, fellow scientist: your lack of ability to live with imprecision and incompleteness is what keeps your little area of expertise safe and sound and completely firewalled from being exploited by the useful outside world. End snark.)
So, while Ellison berates, Ballmer brands. Personally I suspect Ballmer’s got the better approach…even if Larry’s got the bigger yacht. But it will be fun to watch the race, no matter what.

Today I Can't Think of a Decent Blog Post Title

I’m in D.C., attending the sorters meeting for the APS March Meeting. Traveling in early December is always nice, as the planes seem to be empty (*stretch*), and sheesh, it’s downright balmy here in D.C. Now I’ve absconded to a second-rate hotel in the middle of what I can only guess is somewhere near the mythical land of suburbia, since the place is surrounded by office complexes, and I’m watching the civil war (no, not that civil war, that one.)
Things I’ve been thinking about when I’m not obsessing over my latest research:

  • Has anyone ever tried sending a prop to a conference?
  • Because I hate advice columns about graduate school I am happy to point you to Luis von Ahn’s advice on graduate school applications.
  • Next thing you know, xkcd will be drawing Spherical Cows
  • Fafblog contemplates the Pauli paradox.
  • On twitter I was asked “do you think entangled angular momentum states provide any advantage for QKD?” to which I had only FAIL in response. Opinions?
  • A very cool volcano picture.
  • Oh, and happy birthday to Ellen Swallow Richards, even if you did go to the lesser Tech school