API is an abbreviation that stands for “Application Program Interface.” Roughly speaking an API is a specification of a software component in terms of the operations one can perform with that component. For example, a common kind of an API is the set of methods supported by a encapsulated bit of code a.k.a. a library (for example, a library could have the purpose of “drawing pretty stuff on the screen”, the API is then the set of commands like “draw a rectangle”, and specify how you pass parameters to this method, how rectangles overlay on each other, etc.) Importantly the API is supposed to specify how the library functions, but does this in a way that is independent of the inner workings of the library (though this wall is often broken in practice). Another common API is found when a service exposes remote calls that can be made to manipulate and perform operations on that service. For example, Twitter supports an API for reading and writing twitter data. This later example, of a service exposing a set of calls that can manipulate the data stored on a remote server, is particularly powerful, because it allows one to gain access to data through simple access to a communication network. (As an interesting aside, see this rant for why APIs are likely key to some of Amazon’s success.)
As you might guess, (see for example my latest flop Should Papers Have Unit Tests?), I like smooshing together disparate concepts and seeing what comes out the other side. When thinking about APIs then led me to consider the question “What if Papers had APIs”?
In normal settings academic papers are considered to be relatively static objects. Sure papers on the arXiv, for example, have versions (some more than others!) And there are efforts like Living Reviews in Relativity, where review articles are updated by the authors. But in general papers exist, as fixed “complete” works. In programming terms we would say that are “immutable”. So if we consider the question of exposing an API for papers, one might think that this might just be a read only API. And indeed this form of API exists for many journals, and also for the arXiv. These forms of “paper APIs” allow one to read information, mostly metadata, about a paper.
But what about a paper API that allows mutation? At first glance this heresy is rather disturbing: allowing calls from outside of a paper to change the content of the paper seems dangerous. It also isn’t clear what benefit could come from this. With, I think, one exception. Citations are the currency of academia (last I checked they were still, however not fungible with bitcoins). But citations really only go in one direction (with exceptions for simultaneous works): you cite a paper whose work you build upon (or whose work you demonstrate is wrong, etc). What if a paper exposed a reverse citation index. That is, if I put my paper on the arXiv, and then, when you write your paper showing how my paper is horribly wrong, you can make a call to my paper’s api that mutates my paper and adds to it links to your paper. Of course, this seems crazy: what is to stop rampant back spamming of citations, especially by *ahem* cranks? Here it seems that one could implement a simple approval system for the receiving paper. If this were done on some common system, then you could expose the mutated paper either A) with approved mutations or B) with unapproved mutations (or one could go ‘social’ on this problem and allow voting on the changes).
What benefit would such a system confer? In some ways it would make more accessible something that we all use: the “cited by” index of services like Google Scholar. One difference is that it could be possible to be more precise in the reverse citation: for example while Scholar provides a list of relevant papers, if the API could expose the ability to add links to specific locations in a paper, one could arguably get better reverse citations (because, frankly, the weakness of the cited by indices is their lack of specificity).
What else might a paper API expose? I’m not convinced this isn’t an interesting question to ponder. Thanks for reading another wacko mashup episode of the Quantum Pontiff!
Quantum computers can work in principle
Gil Kalai has just posted on his blog a series of videos of his lectures entitled “why quantum computers cannot work.” For those of us that have followed Gil’s position on this issue over the years, the content of the videos is not surprising. The surprising part is the superior production value relative to your typical videotaped lecture (at least for the first overview video).
I think the high gloss on these videos has the potential to sway low-information bystanders into thinking that there really is a debate about whether quantum computing is possible in principle. So let me be clear.
There is no debate! The expert consensus on the evidence is that large-scale quantum computation is possible in principle.
Quoting “expert consensus” like this is an appeal to authority, and my esteemed colleagues will rebuke me for not presenting the evidence. Aram has done an admirable job of presenting the evidence, but the unfortunate debate format distorts perception of the issue by creating the classic “two sides to a story” illusion. I think it’s best to be unequivocal to avoid misunderstanding.
The program that Gil lays forth is a speculative research agenda, devoid of any concrete microscopic physical predictions, and no physicist has investigated any of it because it is currently neither clear enough nor convincing enough. At the same time, it would be extremely interesting if it one day leads to a concrete conjectured model of physics in which quantum computers do not work. To make the ideas more credible, it would help to have a few-qubit model that is at least internally consistent, and even better, one that doesn’t contradict the dozens of on-going experiments. I genuinely hope that Gil or someone else can realize this thrilling possibility someday.
For now, though, the reality is that quantum computation continues to make exciting progress every year, both on theoretical and experimental levels, and we have every reason to believe that this steady progress will continue. Quantum theory firmly predicts (via the fault-tolerance threshold theorem) that large-scale quantum computation should be achievable if noise rates and correlations are low enough, and we are fast approaching the era where the experimentally achievable noise rates begin to touch the most optimistic threshold estimates. In parallel, the field continues to make contributions to other lines of research in high-energy physics, condensed matter, complexity theory, cryptography, signal processing, and many others. It’s an exciting time to be doing quantum physics.
And most importantly, we are open to being wrong. We all know what happens if you try to update your prior by conditioning on an outcome that had zero support. Gil and other quantum computing skeptics like Alicki play a vital role in helping us sharpen our arguments and remove any blind spots in our reasoning. But for now, the arguments against large-scale quantum computation are simply not convincing enough to draw more than an infinitesimal sliver of expert attention, and it’s likely to remain this way unless experimental progress starts to systematically falter or a concrete and consistent competing model of quantum noise is developed.
Two Cultures in One of the Cultures
A long time ago in a mental universe far far away I gave a talk to a theory seminar about quantum algorithms. An excerpt from the abstract:
Quantum computers can outperform their classical brethren at a variety of algorithmic tasks….[yadda yadda yadaa deleted]… This talk will assume no prior knowledge of quantum theory…
The other day I was looking at recent or forthcoming interesting quantum talks and I stumbled upon one by a living pontiff:
In this talk, I’ll describe connections between the unique games conjecture (or more precisely, the closely relatedly problem of small-set expansion) and the quantum separability problem… [amazing stuff deleted]…The talk will not assume any knowledge of quantum mechanics, or for that matter, of the unique games conjecture or the Lasserre hierarchy….
And another for a talk to kick off a program at the Simons institute on Hamiltonian complexity (looks totally fantastic, wish I could be a fly on the wall at that one!):
The title of this talk is the name of a program being hosted this semester at the Simons Institute for the Theory of Computing….[description of field of Hamiltonian complexity deleted…] No prior knowledge of quantum mechanics or quantum computation will be assumed.
Talks are tricky. Tailoring your talk to your audience is probably one of the trickier sub-trickinesses of giving a talk. But remind me again, why are we apologizing to theoretical computer scientists / mathematicians (which are likely the audiences for the three talks I linked to) for their ignorance of quantum theory? Imagine theoretical computer science talks coming along with a disclaimer, “no prior knowledge of the PCP theorem is assumed”, “no prior knowledge of polynomial-time approximation schemes is assumed”, etc. Why is it still considered necessary, decades after Shor’s algorithm and error correction showed that quantum computing is indeed a fascinating and important idea in computer science, to apologize to an audience for a large gap in their basic knowledge of the universe?
As a counter argument, I’d love to hear from a non-quantum computing person who was swayed to attend a talk because it said that no prior knowledge of quantum theory is assumed. Has that ever worked? (Or similar claims of a cross cultural prereq swaying you to bravely go where none of your kind has gone before.)
Error correcting aliens
Happy New Year! To celebrate let’s talk about error correcting codes and….aliens.
The universe, as many have noted, is kind of like a computer. Or at least our best description of the universe is given by equations that prescribe how various quantities change in time, a lot like a computer program describes how data in a computer changes in time. Of course, this ignores one of the fundamental differences between our universe and our computers: our computers tend to be able to persist information over long periods of time. In contrast, the quantities describing our universe tend to have a hard time being recoverable after even a short amount of time: the location (wave function) of an atom, unless carefully controlled, is impacted by an environment that quickly makes it impossible to recover the initial bits (qubits) of the location of the atom.
Computers, then, are special objects in our universe, ones that persist and allow manipulation of long lived bits of information. A lot like life. The bits describing me, the structural stuff of my bones, skin, and muscles, the more concretely information theoretic bits of my grumbly personality and memories, the DNA describing how to build a new version of me, are all pieces of information that persist over what we call a lifetime, despite the constant gnaw of second law armed foes that would transform me into unliving goo. Maintaining my bits in the face of phase transitions, viruses, bowel obstructions, cardiac arrest, car accidents, and bears is what human life is all about, and we increasingly do it well, with life expectancy now approaching 80 years in many parts of the world.
But 80 years is not that long. Our universe is 13.8ish billion years old, or about 170ish million current lucky human’s life expectancies. Most of us would, all things equal, like to live even longer. We’re not big fans of death. So what obstacles are there toward life extension? Yadda yadda biology squishy stuff, yes. Not qualified to go there so I won’t. But since life is like a computer in regards to maintaining information, we can look toward our understanding of what allows computers to preserve information…and extrapolate!
Enter error correction. If bits are subject to processes that flip the values of the bits, then you’ll lose information. If, however, we give up storing information in each individual bit and instead store single bits across multiple individual noisy bits, we can make this spread out bit live much longer. Instead of saying 0, and watching it decay to unknown value, say 000…00, 0 many times, and ask if the majority of these values remain 0. Viola you’ve got an error correcting code. Your smeared out information will be preserved longer, but, and here is the important point, at the cost of using more bits.
Formalizing this a bit, there are a class of beautiful theorems, due originally to von Neumann, classically, and a host of others, quantumly, called the threshold theorems for fault-tolerant computing which tell you, given an model for how errors occur, how fast they occur, and how fast you can compute, whether you can reliably compute. Roughly these theorems all say something like: if your error rate is below some threshold, then if you want to compute while failing at a particular better rate, then you can do this using a complicated larger construction that is larger proportional to a polynomial in the log of inverse of the error rate you wish to achieve. What I’d like to pull out of these theorems for talking about life are two things: 1) there is an overhead to reliably compute, this overhead is both larger, in size, and takes longer, in time, to compute and 2) the scaling of this overhead depends crucially on the error model assumed.
Which leads back to life. If it is a crucial part of life to preserve information, to keep your bits moving down the timeline, then it seems that the threshold theorems will have something, ultimately, to say about extending your lifespan. What are the error models and what are the fundamental error rates of current human life? Is there a threshold theorem for life? I’m not sure we understand biology well enough to pin this down yet, but I do believe we can use the above discussion to extrapolate about our future evolution.
Or, because witnessing evolution of humans out of their present state seemingly requires waiting a really long time, or technology we currently don’t have, let’s apply this to…aliens. 13.8 billion years is a long time. It now looks like there are lots of planets. If intelligent life arose on those planets billions of years ago, then it is likely that it has also had billions of years to evolve past the level of intelligence that infects our current human era. Which is to say it seems like any such hypothetical aliens would have had time to push the boundaries of the threshold theorem for life. They would have manipulated and technologically engineered themselves into beings that live for a long period of time. They would have applied the constructions of the threshold theorem for life to themselves, lengthening their life by apply principles of fault-tolerant computing.
As we’ve witnessed over the last century, intelligent life seems to hit a point in its life where rapid technological change occurs. Supposing that the period of time in which life spends going from reproducing, not intelligent stuff, to megalords of technological magic in which the life can modify itself and apply the constructions of the threshold theorem for life, is fast, then it seems that most life will be found at the two ends of the spectrum, unthinking goo, or creatures who have climbed the threshold theorem for life to extend their lifespans to extremely long lifetimes. Which lets us think about what alien intelligent life would look like: it will be pushing the boundaries of using the threshold theorem for life.
Which lets us make predictions about what this advanced alien life would look life. First, and probably most importantly, it would be slow. We know that our own biology operates at an error rate that ends up being about 80 years. If we want to extend this further, then taking our guidance from the threshold theorems of computation, we will have to use larger constructions and slower constructions in order to extend this lifetime. And, another important point, we have to do this for each new error model which comes to dominate our death rate. That is, today, cardiac arrest kills the highest percentage of people. This is one error model, so to speak. Once you’ve conquered it, you can go down the line, thinking about error models like earthquakes, falls off cliffs, etc. So, likely, if you want to live a long time, you won’t just be slightly slow compared to our current biological life, but instead extremely slow. And large.
And now you can see my resolution to the Fermi paradox. The Fermi paradox is a fancy way of saying “where are the (intelligent) aliens?” Perhaps we have not found intelligent life because the natural fixed point of intelligent evolution is to produce entities for which our 80 year lifespans is not even a fraction of one of their basic clock cycles. Perhaps we don’t see aliens because, unless you catch life in the short transition from unthinking goo to masters of the universe, the aliens are just operating on too slow a timescale. To discover aliens, we must correlate observations over a long timespan, something we have not yet had the tools and time to do. Even more interesting the threshold theorems also have you spread your information out across a large number of individually erring sub-systems. So not only do you have to look at longer time scales, you also need to make correlated observations over larger and larger systems. Individual bits in error correcting codes look as noisy as before, it is only in the aggregate that information is preserved over longer timespans. So not only do we have too look slower, we need to do so over larger chunks of space. We don’t see aliens, dear Fermi, because we are young and impatient.
And about those error models. Our medical technology is valiantly tackling a long list of human concerns. But those advanced aliens, what kind of error models are they most concerned with? Might I suggest that among those error models, on the long list of things that might not have been fixed by their current setup, the things that end up limiting their error rate, might not we be on that list? So quick, up the ladder of threshold theorems for life, before we end up an error model in some more advanced creatures slow intelligent mind!
Why I Left Academia
TLDR: scroll here for the pretty interactive picture.
Over two years ago I abandoned my post at the University of Washington as a assistant research professor studying quantum computing and started a new career as a software developer for Google. Back when I was a denizen of the ivory tower I used to daydream that when I left academia I would write a long “Jerry Maguire”-esque piece about the sordid state of the academic world, of my lot in that world, and how unfair and f**ked up it all is. But maybe with less Tom Cruise. You know the text, the standard rebellious view of all young rebels stuck in the machine (without any mirror.) The song “Mad World” has a lyric that I always thought summed up what I thought it would feel like to leave and write such a blog post: “The dreams in which I’m dying are the best I’ve ever had.”
But I never wrote that post. Partially this was because every time I thought about it, the content of that post seemed so run-of-the-mill boring that I feared my friends who read it would never ever come visit me again after they read it. The story of why I left really is not that exciting. Partially because writing a post about why “you left” is about as “you”-centric as you can get, and yes I realize I have a problem with ego-centric ramblings. Partially because I have been busy learning a new career and writing a lot (omg a lot) of code. Partially also because the notion of “why” is one I—as a card carrying ex-Physicist—cherish and I knew that I could not possibly do justice to giving a decent “why” explanation.
Indeed: what would a “why” explanation for a life decision such as the one I faced look like? For many years when I would think about this I would simply think “well it’s complicated and how can I ever?” There are, of course, the many different components that you think about when considering such decisions. But then what do you do with them? Does it make sense to think about them as probabilities? “I chose to go to Caltech, 50 percent because I liked physics, and 50 percent because it produced a lot Nobel prize winners.” That does not seem very satisfying.
Maybe the way to do it is to phrase the decisions in terms of probabilities that I was assigning before making the decision. “The probability that I’ll be able to contribute something to physics will be 20 percent if I go to Caltech versus 10 percent if I go to MIT.” But despite what some economists would like to believe there ain’t no way I ever made most decisions via explicit calculation of my subjective odds. Thinking about decisions in terms of what an actor feels each decision would do to increase his/her chances of success feels better than just blindly associating probabilities to components in a decision, but it also seems like a lie, attributing math where something else is at play.
So what would a good description of the model be? After pondering this for a while I realized I was an idiot (for about the eighth time that day. It was a good day.) The best way to describe how my brain was working is, of course, nothing short than my brain itself. So here, for your amusement, is my brain (sorry, only tested using Chrome). Yes, it is interactive.
Correcting Untruths
We here at the Quantum Pontiff value truth in all its forms: theorems, lemmas, statistical inference, and hard experimental data, to name just a few. So I feel compelled to highlight the following.
In his column on the New York Times website, Author S. Brisbane states,
I’m looking for reader input on whether and when New York Times news reporters should challenge “facts” that are asserted by newsmakers they write about. …
[An] example: on the campaign trail, Mitt Romney often says President Obama has made speeches “apologizing for America,” a phrase to which Paul Krugman objected…
As an Op-Ed columnist, Mr. Krugman clearly has the freedom to call out what he thinks is a lie. My question for readers is: should news reporters do the same?
What are we teaching journalism students that would lead them to ask this question in ernest? After double checking my calendar to make sure it wasn’t April 1st, I continued reading:
If [reporters should call out lies], then perhaps the next time Mr. Romney says the president has a habit of apologizing for his country, the reporter should insert a paragraph saying, more or less:
“The president has never used the word ‘apologize’ in a speech about U.S. policy or history. Any assertion that he has apologized for U.S. actions rests on a misleading interpretation of the president’s words.”
I’m not sure which is worse… that Mr. Brisbane feels he, a professional journalist, needs to ask his readers for their opinion on how to be a journalist, or that he doesn’t know the answer to this question which looks (to any scientist at least) to be completely obvious.
“Everyone is entitled to his own opinion, but not his own facts,” as a fellow once said. If you don’t know the answer to your question, Mr. Brisbane, then you are a stenographer, not a journalist, and you need to ask yourself why you would bother giving column inches in your newspaper to misinformation and distortions without bothering to correct them. Some things are true; you are not “biasing” anything by printing true statements.
What are the odds?
Let’s multiply together a bunch of numbers which are less than one and see how small they get!
If that sounds like fun, then you’ll love this sleek new infographic (of which the above is just the teaser) posted this morning at BoingBoing. The graphic is based on this blog post by Dr. Ali Binazir, who apparently has an AB (same as a BA) from Harvard, an MD from the UC San Diego School of Medicine, and an M.Phil. from Cambridge.
I’ll save you the effort of clicking through: the good doctor estimates the probability of “your existing as you, today”. His estimate consists of (what else?) multiplying a bunch of raw probability estimates together without conditioning! And I’ll give you a hint as to the conclusion: the odds that you exist are basically zero! Astounding.
I should add that it seems he was forced to add a disclaimer that “It’s all an exercise to get you thinking…” and (obliquely) admit that the calculation is bogus at the end of the post, however.
Is there any branch of mathematics which is abused so extravagantly as probability? I think these sorts of abuses are beyond even the most egregious statistical claims, no?
Look Ma, I'm a Financial Journalist!
In this Saturday’s New York Times, in an article titled The Chasm Between Consumers and the Fed, I found the most amazing chart:
Of course I am not a financial journalist, so I have absolutely no understanding of the gigantic amoeba-like-shaded-area in this chart. But it looks very cool and very much like it represents something about which the article has much to say. Sadly, however, the New York Times does not provide the methodology it used in obtaining the amazing fact that six of the points can be grouped together while those other two points are excluded from the party. What astounding mathematical finance model did the Grey Lady use to come up with this plot (I’ll be it involves Ito calculus)?
Frustrated by the lack of transparency, I decided that it would be best if I tried to come up with my own methods and models for obtaining this graph. My first attempt, after scouring the economics literature and using some advance methods (related to integrating over Banach spaces) was the following
As you can see this model seems to pick out the overall rate of return as the defining characteristic. After much great reflection, and reacquainting myself with some obscure results from the theory of hyperbolic partial differential equations and new deep learning techniques from machine learning, I was able to tweak my model a bit and obtained the following
Now this is a beautiful plot, but it clearly does not reproduce the graph from the New York Times. What exactly, was I missing in order to obtain the giant amoeba of correlation?
But then I remembered…I’m not a financial journalist. I’m a physicist. And so, I took a look at the stats notes I took as a physics major at Caltech, quickly plugged in some numbers, and obtained a new, reality based, version of the plot
Well it’s not the New York Time plot. But I like it a lot.
A Mathematical Definition of News?
Lately I’ve been thinking about the news. Mostly this involves me shouting obscenities at the radio or the internet for wasting my time with news items the depth of which couldn’t drown an ant and whose factual status makes fairy tales look like rigorous mathematical texts (you know the kind labeled “Introductory X”.) But also (and less violently) I’ve been pondering my favorite type of question, the quantification question: how would one “measure” the news?
Part of motivation for even suggesting that there is a measure of “news” is that if someone asked me if there was a measure of “information” back when I was a wee lad, I would have said they were crazy. How could one “measure” something so abstract and multifaceted as “information?” However there is a nice answer to how to measure information and this answer is given by the Shannon entropy. Of course this answer doesn’t satisfy everyone, but the nice thing about it is that it is the answer to a well defined operational question about resources.
Another thought that strikes me is that, of course Google knows the answer. Or at least there is an algorithm for Google News. Similarly Twitter has an algorithm for spotting trending topics. And of course there are less well known examples like Thoora which seeks to deliver news that is trending in social media. And probably there is academic literature out there about these algorithms, the best I could find with some small google-fu is TwitterMonitor: trend detection over the twitter stream. But all of this is very algorithm centered. The question I want to ask is what quantity are these services attempting to maximize (is it even the same quantity?)
The first observation is that clearly news has a very strong temporal component. If I took all of the newspapers, communications, books, letters, etc. that mankind has produced and regarded it without respect to time you wouldn’t convince many that there is news in this body of raw data (except that there are some monkeys who can type rather well.) Certainly also it seems that news has a time-frame. That is one could easily imagine a quantity that discusses the news of the day, the news of the week, etc.
A second observation is that we can probably define some limits. Suppose that we are examining tweets and that we are looking for news items on a day time scale. We could take the words in the different day’s tweets and make a frequency table for all of these words. A situation in which there is a maximum amount of news on the second day is then a situation where on the first day the frequency distribution over words is peeked one one word, while the second day is all concentrated on another word. One could probably also argue that, on the day time scale, if both frequency distributions were peaked on the same word, then this would not be (day scale) news (it might be week scale news, however.)
This all suggests that our friend, the news, is nothing more than the total variation distance. For two probability distributions and , the variation distance between these distribution is . This is also equal to where and similarly for . Ah, so perhaps this is not as exciting as I’d hoped 🙂 But at least it gives me a new way to talk about the variational distance between two probability distributions: this is a measure of the news that we could associate with changing from one probability distribution to another.
Of course this is just one approach to thinking about how to quantify “news.” What are the drawbacks for my method and what should a real measure have that this one lacks? I mean whats the worst that could happen in thinking about this problem. Okay, so maybe you would learn how many holes it takes
to fill the Albert Hall.
Does Not Compute
Okay, last one. But damn I could make these movies all night and it would make me laugh and all of you cry: