Information Causality

Recently I finally got a chance to read the new preprint arXiv:0905.2292 “A new physical principle: Information Causality” by M. Pawlowski, T. Paterek, D. Kaszlikowski, V. Scarani, A. Winter, and M. Zukowski. It’s been a long time since I spent more than a few spare hours thinking about foundational issues in quantum theory. Personally I am very fond of approaches to foundational questions which have an information-theoretic or computational bent (on my desktop I have a pdf of William Wootters’ thesis “The Acquisition of Information From Quantum Measurements”, which I consider a classic in this line of interrogation.) This preprint is very much along these lines and presents a very intriguing result which clearly merits some deeper thinking.
(Update: see also Joe’s post for details of the proof.)

The basic back story is as follows (readers who know what entanglement and Bell’s theorem are can skip the next three paragraphs.)
Quantum theory is the theory we use to predict probabilities of different outcomes in different experimental settings. But, while it predicts probabilities, it has some peculiar properties which separate it from how we normally think about systems which have a probabilistic description. Two of the defining characteristics are contextuality and quantum non-locality. The former tells us that we cannot think about a measurement on a quantum system as revealing a property which is independent of the measurement we chose to make. The latter, and in my view more troubling, describes how measurements made by two separated observers can show strange correlations in quantum theory.
The setup to demonstrate quantum non-locality is as follows. Alice and Bob get together for lunch and bring with them their own quantum systems. Over hamburgers they let their two quantum systems interact. They then put their quantum systems into their backpacks, taking care not to measure them, and go back to their separate workplaces (Alice works at a restaurant and Bob works on a fishing boat.) Later, during their breaks, which happen to occur at exactly the same time (in the reference frame of their home), they both open their backpacks and look at their respective quantum systems. Well, they don’t quite just “look”: they choose to measure their systems in a particular way. Quantum theory allows us to predict the probabilities of the outcomes of these measurements, given which measurement each party chooses to perform. For some ways in which Alice and Bob let their quantum systems interact, namely those that create so-called entangled quantum states, and for some measurements on these states, Alice and Bob will see correlations in the outcomes of their measurements. For example, it may be that whenever Alice sees a measurement result of “0”, Bob also sees “0”, and whenever Alice sees “1”, Bob also sees “1”, with these two cases occurring with equal (fifty percent) probability.
Now one could ask: well, this isn’t so strange! Maybe when they let their systems interact, the properties of their individual quantum systems were set, and all we are doing by measuring them later is identifying the value of this property. For example, above I described how Alice and Bob’s outcomes could be perfectly correlated. But we could easily explain this: if I flip a coin and secretly put either a “0” or a “1” in both of the backpacks, then Alice and Bob will see exactly the above correlation. This, however, is not the mystery of quantum non-locality. The mystery is that while in this case we could explain the correlations between Alice and Bob’s experiments in a purely classical way, via properties of their individual systems, this is not always possible. This result is the famous “Bell theorem.” This theorem says that there are certain setups, of the kind Alice and Bob performed above, whose quantum predictions cannot be explained by believing that the properties being revealed were just stored in the backpacks beforehand. Quantum correlations behave in ways that we cannot explain by what is termed a local hidden variable theory. The “hidden variables” are the properties in the backpack, and “local” refers to the fact that these bags are separated from each other and so not allowed to influence each other. Bell’s theorem tells us that there is a set of inequalities (Bell inequalities) which must be satisfied by any local hidden variable theory. These inequalities look something like (a combination of experimentally measured correlations) ≤ 2, while quantum theory predicts that, for suitably chosen measurements on entangled states, the same combination can reach 2√2, and hence there is a contradiction between the assumption of local hidden variables and quantum theory.
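To make the “≤ 2 versus more than 2” contrast concrete, here is a minimal Python sketch using the CHSH form of a Bell inequality, S = E(0,0) + E(0,1) + E(1,0) − E(1,1), where E(x, y) is the average product of Alice’s and Bob’s ±1 outcomes for settings x and y. The choice of CHSH, of the Bell state (|00⟩ + |11⟩)/√2, and of the particular measurement angles are my assumptions for illustration, not something taken from the preprint. A local hidden variable model is just a mixture of deterministic “answers in the backpack”, so brute-forcing the 16 deterministic strategies gives its maximum.

```python
import itertools
import math


def chsh(E):
    """CHSH combination S = E(0,0) + E(0,1) + E(1,0) - E(1,1)."""
    return E(0, 0) + E(0, 1) + E(1, 0) - E(1, 1)


def best_local_deterministic():
    """Maximum of S over the 16 deterministic local strategies.

    Each outcome depends only on that party's own setting, as if the
    answers were written down and stored in the backpacks beforehand.
    Mixtures of these strategies cannot do better than the best one.
    """
    best = -math.inf
    for a0, a1, b0, b1 in itertools.product((-1, 1), repeat=4):
        A, B = (a0, a1), (b0, b1)
        best = max(best, chsh(lambda x, y: A[x] * B[y]))
    return best


# Bell state (|00> + |11>)/sqrt(2), each party measuring the observable
# cos(t) Z + sin(t) X; the predicted correlator is E = cos(t_A - t_B).
ALICE_ANGLES = (0.0, math.pi / 2)
BOB_ANGLES = (math.pi / 4, -math.pi / 4)


def quantum_E(x, y):
    return math.cos(ALICE_ANGLES[x] - BOB_ANGLES[y])


print(best_local_deterministic())         # 2
print(chsh(quantum_E), 2 * math.sqrt(2))  # both about 2.828
```

Any convex mixture of the deterministic strategies gives a weighted average of their S values, which is why checking the 16 corners is enough for the classical side.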

No Signal

So quantum theory produces these strange correlations that do not have a local hidden variable description. Thus if one wants to “explain” quantum theory using our normal intuitions about probabilities and classical information, one would need to believe that information about one party’s measurement is somehow transferred to the other party during a measurement on an entangled state. (Or one could just accept quantum theory 🙂 ) But here is what is cool about quantum theory: even though measurements on entangled quantum states exhibit these correlations that could not be explained in classical terms without some sort of communication between the parties, there is no way to exploit this to allow Alice and Bob to communicate with each other. Quantum theory allows correlations which can only be explained by communication without allowing communication itself.
This sounds kind of strange but really it’s not too odd. Just because a protocol uses communication does not mean that this communication reveals itself in the output of the protocol. In fact we witness this all the time: when you are using the internet, for example, not all of the communication that goes through your internet cable is revealed on your computer monitor. There is all sorts of behind-the-scenes information being transferred which does not reveal itself by changing any pixels on your monitor. Of course the cool thing about quantum theory is that it never allows for signaling. This is good because if it did, it would be hard to get quantum theory to mesh nicely with special relativity (we could signal faster than light, which causes issues with special relativity.)
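The no-signaling condition can be stated as a simple check on a table of joint probabilities p(a, b | x, y): Alice’s marginal distribution must not depend on Bob’s setting y, and Bob’s must not depend on Alice’s setting x. The Python sketch below applies that check to Bell-state correlations (with the same assumed angles as in a standard CHSH setup) and, for contrast, to a hypothetical “box” in which Bob’s output simply copies Alice’s setting. The function names are my own.

```python
import itertools
import math

ALICE_ANGLES = (0.0, math.pi / 2)
BOB_ANGLES = (math.pi / 4, -math.pi / 4)


def p_quantum(a, b, x, y):
    """p(a, b | x, y) for the Bell state, with outcome bits a and b.

    With correlator E = cos(t_A - t_B), p = (1 + (-1)**(a ^ b) * E) / 4.
    """
    E = math.cos(ALICE_ANGLES[x] - BOB_ANGLES[y])
    return (1 + (-1) ** (a ^ b) * E) / 4


def p_copy_setting(a, b, x, y):
    """Hypothetical signaling box: a is a fair coin, b always equals x."""
    return 0.5 if b == x else 0.0


def is_no_signaling(p, tol=1e-12):
    """True if each party's marginal ignores the other party's setting."""
    for a, x in itertools.product((0, 1), repeat=2):
        m = [sum(p(a, b, x, y) for b in (0, 1)) for y in (0, 1)]
        if abs(m[0] - m[1]) > tol:
            return False
    for b, y in itertools.product((0, 1), repeat=2):
        m = [sum(p(a, b, x, y) for a in (0, 1)) for x in (0, 1)]
        if abs(m[0] - m[1]) > tol:
            return False
    return True


print(is_no_signaling(p_quantum))       # True
print(is_no_signaling(p_copy_setting))  # False
```

The quantum marginals are always exactly 1/2, whatever the other party measures: the correlation lives only in the joint table, which is the sense in which the “communication” never shows up on Bob’s monitor.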
Given that quantum theory doesn’t allow signaling, one can ask more generally about other “possible theories” of the universe that don’t allow signaling. Quantum theory doesn’t allow signaling, but are there other such theories? And is there some way in which quantum theory is special among no-signaling theories? This latter question was asked by Popescu and Rohrlich in a nice form: does quantum theory give the maximum violation of Bell inequalities allowed by no-signaling? The answer, it turns out, is no. Popescu and Rohrlich showed a way to have no-signaling correlations which violate Bell inequalities even more than quantum theory (in fact they achieve the largest violation that is algebraically possible.) Thus the idea that quantum theory was somehow special in the space of no-signaling theories took a hit; indeed there have now been many studies showing how much of what we consider intrinsically quantum appears in many no-signaling theories.
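The Popescu-Rohrlich correlations are easy to write down: on binary inputs x and y the box returns bits a and b that are individually uniformly random but always satisfy a ⊕ b = x·y (XOR on the left, AND on the right). A minimal Python sketch, with a CHSH combination and a no-signaling check defined here so the block stands alone, shows that the box is no-signaling yet reaches S = 4, the largest value the CHSH combination can take, compared with 2 for local hidden variables and 2√2 for quantum theory.

```python
import itertools


def p_pr(a, b, x, y):
    """Popescu-Rohrlich box: a XOR b == x AND y, each valid pair with prob 1/2."""
    return 0.5 if (a ^ b) == (x & y) else 0.0


def correlator(p, x, y):
    """E(x, y) = P(a == b) - P(a != b) for settings x and y."""
    return sum((1 if a == b else -1) * p(a, b, x, y)
               for a, b in itertools.product((0, 1), repeat=2))


def chsh(p):
    return (correlator(p, 0, 0) + correlator(p, 0, 1)
            + correlator(p, 1, 0) - correlator(p, 1, 1))


def is_no_signaling(p, tol=1e-12):
    """True if each party's marginal ignores the other party's setting."""
    for u, s in itertools.product((0, 1), repeat=2):
        alice = [sum(p(u, b, s, y) for b in (0, 1)) for y in (0, 1)]
        bob = [sum(p(a, u, x, s) for a in (0, 1)) for x in (0, 1)]
        if abs(alice[0] - alice[1]) > tol or abs(bob[0] - bob[1]) > tol:
            return False
    return True


print(chsh(p_pr))             # 4.0
print(is_no_signaling(p_pr))  # True
```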
Which is where the preprint arXiv:0905.2292 comes in. Here the authors notice something kind of cool: that there is a different way to talk about no-signaling. This is what they term “information causality.” Here is their definition:

Formulated as a principle, Information Causality states that the transmission of m classical bits can cause an information gain of at most m bits.

At first sight you may say: oh, this doesn’t say anything interesting…of course if you transmit m classical bits then the other side can only gain m classical bits of information. That’s just standard information theory. But recall again the situation above where you make measurements on a quantum system, or more generally where you get correlations from some other no-signaling theory. In this case you have this extra resource and, while it doesn’t allow you to signal, it might be useful. The m=0 case is then just the no-signaling condition: the measurements and outcomes by themselves (with no classical side channel) cannot send more than 0 bits of classical information. But the m=1 case, for example, is different. It says that if you use the correlations plus a single transmitted classical bit, then you can only gain a single bit of information on the other side.
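The preprint makes “information gain” precise with a game; what follows is my reading of it, so treat the details as an assumption. Alice gets N random bits a_0, …, a_{N−1}, Bob gets a random index b, Alice sends Bob m classical bits, and Bob outputs a guess β for a_b. Information causality then says that the sum over K of the mutual information I(a_K : β | b = K) is at most m. The Python sketch below (function names hypothetical) computes that sum for N = 2 and m = 1 with purely classical strategies and no shared correlations: sending a_0 gives exactly 1 bit, while sending the parity a_0 ⊕ a_1 gives 0, so neither exceeds m.

```python
import itertools
import math
from collections import Counter


def mutual_information(pairs):
    """I(X:Y) in bits for a list of equally likely (x, y) outcomes."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def ic_quantity(message, guess, N=2):
    """Sum over K of I(a_K : beta | b = K), with the bits a uniform.

    message(a) -> Alice's classical message; guess(msg, K) -> Bob's beta.
    """
    total = 0.0
    for K in range(N):
        pairs = [(a[K], guess(message(a), K))
                 for a in itertools.product((0, 1), repeat=N)]
        total += mutual_information(pairs)
    return total


# Two classical one-bit strategies (m = 1), with no shared correlations.
send_first = ic_quantity(lambda a: a[0], lambda msg, K: msg)
send_parity = ic_quantity(lambda a: a[0] ^ a[1], lambda msg, K: msg)
print(send_first, send_parity)  # 1.0 0.0
```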
This all seems, at first thought, to be just a small modification. It helps, however, to recall a place where something like this fails. If I try to send classical bits using quantum bits, with no shared entanglement, then I can only send a single classical bit per quantum bit (this is Holevo’s theorem.) But if the two parties share an entangled quantum system, then by sending only one qubit you can transmit two classical bits (this is superdense coding.) Thus if we were to replace the m transmitted classical bits in the definition of information causality by m qubits, the principle would not be true.
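Superdense coding is short enough to simulate directly. In this NumPy sketch (the bit-to-operation labelling is my own convention), Alice and Bob share (|00⟩ + |11⟩)/√2; Alice applies one of I, X, Z or ZX to her qubit alone and sends that single qubit to Bob, who then measures both qubits in the Bell basis. The four encoded states are orthonormal, so Bob recovers both classical bits with certainty.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

# Shared entangled state (|00> + |11>)/sqrt(2); Alice holds the first qubit.
phi_plus = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)

# Two classical bits -> operation Alice applies to her qubit only.
ENCODING = {(0, 0): I2, (0, 1): X, (1, 0): Z, (1, 1): Z @ X}


def encode(bits):
    """State of both qubits after Alice's local operation."""
    return np.kron(ENCODING[bits], I2) @ phi_plus


# The four possible states form the Bell basis that Bob measures in.
BELL_BASIS = {bits: encode(bits) for bits in ENCODING}


def decode(state):
    """Bob's Bell measurement: the outcome with the largest probability."""
    probs = {bits: abs(np.vdot(v, state)) ** 2 for bits, v in BELL_BASIS.items()}
    return max(probs, key=probs.get)


# The encoded states are orthonormal, so decoding is error-free.
gram = np.array([[np.vdot(u, v) for v in BELL_BASIS.values()]
                 for u in BELL_BASIS.values()])
print(np.allclose(gram, np.eye(4)))                   # True
print(all(decode(encode(b)) == b for b in ENCODING))  # True
```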
Great. So it’s a new condition. What can you do with it? Well, here is where the coolness comes in. What the authors of the preprint show is that (a) both classical (local hidden variable) theories and quantum theory respect the principle of information causality, (b) quantum theory achieves the maximal value for a certain class of Bell inequalities, and (c) any no-signaling theory which violates that Bell inequality by more than quantum theory violates information causality. (a) is neat, but not too surprising. (b) and (c) are where all the coolness lies. It says that there is a possibility of thinking about quantum theory as the theory that maximally violates Bell inequalities among no-signaling theories…if you replace no-signaling by the more general principle of information causality. A small modification of the no-signaling condition which at first glance doesn’t feel all that different gives very different results and indeed gives us…quantum theory.
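Point (c) is easiest to see at its extreme. As I understand the preprint, with a perfect Popescu-Rohrlich box Alice and Bob can break information causality in the N = 2, m = 1 game: Alice feeds a_0 ⊕ a_1 into the box and gets A, Bob feeds in his index b and gets B, Alice sends the single bit a_0 ⊕ A, and Bob outputs β = a_0 ⊕ A ⊕ B. Since A ⊕ B = (a_0 ⊕ a_1)·b, Bob’s guess is exactly a_b, whichever index he was given. The Python sketch below models the box with a hidden shared random bit r purely as a simulation device (it reproduces the box’s joint statistics, not how such a box would work) and computes the information causality sum for this protocol: 2 bits from 1 transmitted bit.

```python
import itertools
import math
from collections import Counter


def mutual_information(pairs):
    """I(X:Y) in bits for a list of equally likely (x, y) outcomes."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def pr_box(x, y, r):
    """Outputs (A, B) with A XOR B = x AND y, each uniform via the hidden bit r.

    This only reproduces the box's joint statistics for simulation purposes.
    """
    return r, r ^ (x & y)


def ic_with_pr_box():
    """Information causality sum for the N = 2, m = 1 PR-box protocol."""
    total = 0.0
    for K in (0, 1):  # Bob's index b = K
        pairs = []
        for a0, a1, r in itertools.product((0, 1), repeat=3):
            A, B = pr_box(a0 ^ a1, K, r)
            message = a0 ^ A    # the one classical bit Alice sends
            beta = message ^ B  # Bob's guess for a_K
            pairs.append(((a0, a1)[K], beta))
        total += mutual_information(pairs)
    return total


print(ic_with_pr_box())  # 2.0, more than the m = 1 bit sent
```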
So is it possible to derive quantum theory as the theory which maximally violates Bell inequalities but still respects information causality? This is not known, because the above result only holds for one type of Bell inequality (or at least this is my understanding: they show that the principle of information causality yields Tsirelson’s bound, which does not describe the most general experimental setup for Bell inequality experiments.) In other words, quantum theory is only known to be the maximally violating theory in a particular experimental setup. Another issue is that while it is shown that quantum theory achieves the maximum violation, it is not shown that there aren’t other, non-quantum theories which produce the same correlations. Another important issue is that it’s not clear that the “size of the violation” of a Bell inequality is the right quantity to be considering here. A better way to quantify the strength of a violation of a Bell inequality is probably through a procedure like that described in “The statistical strength of nonlocality proofs” by van Dam, Gill and Grünwald. Any way you slice it, however, this preprint cries out for follow-up work: maybe there is another Bell inequality for which the principle of information causality is not enough. Or maybe there is a way to generalize the argument in the paper to a larger set of Bell inequalities.
This preprint opens up the door to a possible method for “deriving” quantum theory from some basic information-theoretic principles. This would be an awesome achievement, but of course it might not shut the door on the mysteries of quantum theory: such a derivation would raise the question “why is quantum theory the maximally violating theory consistent with the principle of information causality?” What I find particularly nice about this result is that the principle of information causality is a very simple modification of what we mean by no-signaling (something required to mesh quantum theory with special relativity), but one which apparently had not previously been seriously considered. Small tweaks which lead to big results are by far my favorite, and this is a classic along those lines.
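Here is a rough numerical illustration of how Tsirelson’s bound comes out, based on my reconstruction of the preprint’s argument for noisy PR boxes; treat it as a sketch, not the paper’s proof. Suppose each box has a correlator of magnitude E on every pair of settings, so its CHSH value is 4E and a single box gives Bob the right bit with probability (1 + E)/2. Nesting the two-bit protocol n levels deep lets Alice encode N = 2^n bits while still sending one bit, and Bob’s guess passes through n boxes, so it is right with probability (1 + E^n)/2. By Fano’s inequality each term I(a_K : β | b = K) is at least 1 − h((1 + E^n)/2), with h the binary entropy, so the information causality sum is at least 2^n (1 − h((1 + E^n)/2)). For E = 1/√2 (CHSH = 2√2) this lower bound stays below 1, while for any larger E it eventually exceeds 1.

```python
import math


def h(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def ic_lower_bound(E, n):
    """Fano lower bound on the IC sum for the n-level nested protocol.

    Assumes noisy PR boxes with correlator magnitude E, N = 2**n bits and
    m = 1 transmitted bit; Bob is right with probability (1 + E**n) / 2.
    """
    return 2 ** n * (1 - h((1 + E ** n) / 2))


E_tsirelson = 1 / math.sqrt(2)  # CHSH value 4 * E = 2 * sqrt(2)
E_stronger = 0.75               # CHSH value 3.0, beyond Tsirelson

for n in (1, 2, 3, 5, 10, 20):
    print(n, round(ic_lower_bound(E_tsirelson, n), 3),
          round(ic_lower_bound(E_stronger, n), 3))
```

For large n the bound behaves like (2E²)^n / (2 ln 2), which is why the threshold sits exactly at 2E² = 1, that is E = 1/√2.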

15 Replies to “Information Causality”

  1. > Basically, you take the local state spaces to be
    > Hyperspheres of any dimension (so it only coincides with QM
    > for a 3D sphere) and combine them via the maximal
    > no-signalling tensor product.
    I would imagine that this could be further generalized in some category-theoretic manner. Perhaps Howard has already thought of this since he’s done some work in that area. I’d be curious to know what sorts of insights such an approach might show.

  2. Thanks Matt for the long comment.
    I don’t find 1 so troubling, but maybe that’s just me. Do you know of any places where the fact that it is the complex numbers really matters? From an abstract “without the physics” point of view, I would think I could always use the Bernstein-Vazirani trick (or at least that’s the place I learned about it) to simulate complex QM with real QM.
    I like 2….it should be easy to figure out whether it satisfies information causality.

  3. I guess what is interesting is whether there is something that doesn’t involve breaking a system into subsystems which makes real quantum theory different from complex quantum theory. It seems to me that one could argue that it is the physical theory which establishes locality, not quantum theory itself.

  4. @Matt:
    Regarding 2, I don’t think I’ll be betting against you. Based on a quick scan of the proof that QM satisfies information causality, I see that it involves several properties of (quantum) mutual information and entropy, but starts off by considering only Alice’s classical string a, the message x, and Bob’s part of the shared state rho. rho has the same form in both QM and QM with NPT states, so it seems like the proof should/might go through as before.

  5. So is it possible to derive quantum theory as the theory which maximally violates Bell inequalities but still respects information causality?

    I haven’t read the paper and it does sound like a very nice result, but I have to say that the answer to this is almost certainly no. With the caveat that I don’t know the precise mathematical formulation of information causality, here are some theories that I would bet are counterexamples:
    1. Quantum Mechanics with real Hilbert spaces: Saturates Tsirelson just as much as complex QM because you don’t need complex amplitudes in the state or measurement vectors to achieve the bound. Should satisfy information causality for the same reasons as complex QM does. Of course, you may object that this is not a real counter example because it is a type of QM, even though it is not realized in nature. However, I usually take the project of “deriving QM” to include deriving the correct number field of the Hilbert space.
    2. QM with NPT states: For local systems take the usual quantum mechanical state space. However, when combining two systems, allow states that are only locally positive rather than globally positive in the usual sense. This means that if $M_A$ and $M_B$ are positive operators on systems A and B respectively then we require $\mathrm{Tr}[(M_A \otimes M_B)\,\rho_{AB}] \geq 0$, but not $\mathrm{Tr}[M_{AB}\,\rho_{AB}] \geq 0$ for bipartite positive operators $M_{AB}$. The state space is larger than quantum theory because things like the partial transpose of a Bell state are valid states in this theory. It can be made consistent by demanding that all measurements in the theory are local, i.e. POVM elements are of the form $M_A \otimes M_B$, so there are no worries about getting negative probabilities from entangled measurements.
    This theory satisfies the Tsirelson bound and saturates it and I would be willing to bet that it satisfies information causality.
    3. OK, for the final example you have to know a bit about the convex-sets framework that I have done some recent work in with Howard Barnum, Jon Barrett and Alex Wilce (see http://arxiv.org/abs/quant-ph/0611295 for background). Basically, you take the local state spaces to be Hyperspheres of any dimension (so it only coincides with QM for a 3D sphere) and combine them via the maximal no-signalling tensor product. I can prove that this theory satisfies and saturates Tsirelson. I would be very surprised if it violates information causality, although my intuition on this is less strong than for the other two examples.
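    Two of the claims in this comment are easy to check numerically. The NumPy sketch below (my own construction, with the same measurement angles as a standard CHSH setup) first shows that Tsirelson’s bound 2√2 is reached using only real amplitudes and real observables built from Z and X. It then takes the partial transpose of the Bell-state density matrix, the example state of the NPT theory: it has a negative eigenvalue, so an entangled effect could assign it a negative “probability”, yet on random product effects M_A ⊗ M_B it stays non-negative, since Tr[(M_A ⊗ M_B) ρ^{T_B}] = Tr[(M_A ⊗ M_B^T) ρ] and M_B^T is still positive.

```python
import numpy as np

Z = np.array([[1.0, 0.0], [0.0, -1.0]])
X = np.array([[0.0, 1.0], [1.0, 0.0]])

# Real Bell state (|00> + |11>)/sqrt(2) and its density matrix.
phi_plus = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho = np.outer(phi_plus, phi_plus)


def observable(theta):
    """Real +/-1 valued observable cos(theta) Z + sin(theta) X."""
    return np.cos(theta) * Z + np.sin(theta) * X


def E(ta, tb, state):
    return np.trace(np.kron(observable(ta), observable(tb)) @ state)


a0, a1, b0, b1 = 0.0, np.pi / 2, np.pi / 4, -np.pi / 4
S = E(a0, b0, rho) + E(a0, b1, rho) + E(a1, b0, rho) - E(a1, b1, rho)
print(np.isclose(S, 2 * np.sqrt(2)))  # True, with no complex numbers used


def partial_transpose_b(state):
    """Transpose the second qubit: <a b|.|a' b'> -> <a b'|.|a' b>."""
    r = state.reshape(2, 2, 2, 2)
    return r.transpose(0, 3, 2, 1).reshape(4, 4)


sigma = partial_transpose_b(rho)
print(np.linalg.eigvalsh(sigma).min())  # about -0.5: not a quantum state

# Non-negative on product effects M_A (x) M_B with M_A, M_B >= 0.
rng = np.random.default_rng(0)


def random_positive():
    G = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    return G @ G.conj().T


product_values = [np.trace(np.kron(random_positive(), random_positive()) @ sigma).real
                  for _ in range(1000)]
print(min(product_values) >= -1e-12)  # True
```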

  6. Well, one thing about real QM is that it doesn’t satisfy what Jon Barrett calls the “global state assumption” and what D’Ariano calls “local observability”, whereas complex QM does. Basically, this means that you can’t do tomography locally in real QM, i.e. if you have many copies of a bipartite state then there are parameters that can’t be estimated by local measurements and comparison of the resulting correlations. This is a pretty serious breakdown of reductionism in my opinion and may be reason enough for rejecting real QM. For example, it is the reason that the de Finetti theorem fails for real QM.
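    The failure of local tomography in real QM can be seen by a standard parameter count (this framing is mine, not the commenter’s): an unnormalized state on a d-dimensional system needs d² real parameters in complex QM but only d(d+1)/2 in real QM. Local tomography requires the count for a pair of systems to equal the product of the single-system counts. A short Python check for qubits:

```python
def num_parameters(d, field):
    """Real parameters of an unnormalized d x d density matrix.

    Complex Hermitian matrices: d**2. Real symmetric matrices: d*(d+1)//2.
    """
    if field == "complex":
        return d * d
    if field == "real":
        return d * (d + 1) // 2
    raise ValueError(f"unknown field: {field}")


for field in ("complex", "real"):
    local = num_parameters(2, field) ** 2  # product of two qubits' counts
    joint = num_parameters(4, field)       # count for the two-qubit system
    print(field, local, joint)
# complex 16 16: local measurements suffice
# real 9 10: one parameter is invisible to local statistics
```

    In real QM the missing parameter is the coefficient of Y ⊗ Y, which is a real symmetric matrix even though Y itself is not an allowed local observable.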

  7. One has to be careful about using the usual entropic quantities for bipartite states in NPT QM because they don’t have the same operational interpretation as in regular QM and they may not be well defined.

    Or might not have the properties needed in the proof! But my immediate point is that since the proof only needs to consider states of a very specific form (ccq states you might call them, since a and x are classical and rho is quantum), the relevant entropic properties may/should/obviously hold for these states even if they don’t for general states in the new theory. I suspect the required properties of the particular states do hold, so the proof goes through.

    But you raise a larger point—the proof is that a certain information quantity is bounded, but how do we know this means what we think it means (i.e. the operational meaning)? This doesn’t seem to be directly addressed in the paper (my cursory reading of it, anyway). Perhaps they think it is obvious. Perhaps it is obvious. But it might be good to formulate everything strictly operationally from the start. What I have in mind is something along the lines of: If Alice sends fewer than N bits to Bob, the probability of error when he tries to guess the value of the bth bit is large. Or maybe if Alice sends fewer than m bits, the error probability is at least roughly (N-m)/N.
    btw, I am already wishing we were using google wave, and I only watched ~20 minutes of that video. Hand editing your entry to put in my response, bah!

  8. It seems to me that one could argue that it is the physical theory which establishes locality, not quantum theory itself.

    What does this mean? I thought quantum theory was the physical theory.

    Based on a quick scan of the proof that QM satisfies information causality, I see that it involves several properties of (quantum) mutual information and entropy…

    One has to be careful about using the usual entropic quantities for bipartite states in NPT QM because they don’t have the same operational interpretation as in regular QM and they may not be well defined. Perhaps it is possible to avoid entropic arguments altogether.

  9. Sorry, I wasn’t clear. It’s not obvious to me that one needs to have a “subsystem” axiom for quantum theory. Instead one could argue that the subsystems arise from the Hamiltonian in the theory.

  10. > But it might be good to formulate everything strictly
    > operationally from the start.
    This is reminiscent of Pauli’s ‘operationalist’ critique of Weyl’s unified theory, unless I misunderstand your interpretation of ‘operationally.’ In Pauli’s view, a physical theory should be expressed exclusively in terms of observable quantities. Incidentally, there is some historical evidence that this idea was a precursor to the uncertainty relations, though Heisenberg did not participate in the ‘debate.’

  11. I like the paper a lot, but then, I’m a Theorist. My wife liked it too, and she’s an Experimentalist. Our friend Dr. George Hockney (FermiLab to UCLA to JPL Quantum) hasn’t read it closely enough, so we discussed it at length over breakfast.
    What my wife liked was the effort at specifying WHICH theories of QM work, out of that table on Wikipedia of 13 QM theories. I see that as a partial answer to a question I’ve asked before (either here or on the n-Category Café): how do we describe the hyperplane which separates Physical Theories from Nonphysical Theories in the Space of All Possible Theories of Mathematical/Computational Physics?
    Dr. Hockney, whom I’ve never been able to get into the blogosphere or Social Network software, felt that the paper did not adequately describe what it means to transmit a Classical Bit. I feel that his objections were answered by Claude Shannon (in several lengthy conversations I’d had with Shannon).
    Dr. Hockney made a plausible metaphysical distinction between 3 worlds, whose boundaries are fuzzy in the paper.
    (1) The Classical Physics world of Mechanics, EM, SR;
    (2) which, when in QM formalism, gives reversible Hilbert space operators and so forth;
    (3) The Classical Physics world of Thermodynamic entropy and irreversible processes, out of which physical computers are built and fail.
    I pointed out John A. Wheeler’s objection that NO theory advanced so far describes both invariance to Observers, communication between Observers, and the observers themselves (who can have Consciousness).

  12. @Matt: oops, I was too focused on the quantity I(a:x,rho), whose operational interpretation might be problematic. But information causality deals with the mutual information between Alice’s bit and Bob’s guess, and so we can relate it to the probability of error by Fano’s inequality. Doing so, one finds that, surprise!, the probability of error must be larger than (N-m)/N. In other words, if Bob’s guess should be reliable, then Alice is going to have to send him N bits. I wrote up the details in a post over on my blog.
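    The Fano step can be sketched numerically. Assuming, for simplicity, that Bob’s probability of error P_e is the same for every index, Fano’s inequality gives I(a_K : β) ≥ 1 − h(P_e) for a uniformly random bit, with h the binary entropy, so the information causality bound Σ_K I(a_K : β | b = K) ≤ m becomes h(P_e) ≥ (N − m)/N. The Python helper below (a hypothetical name, not from the preprint or the linked post) inverts h by bisection to turn that into an explicit lower bound on P_e.

```python
import math


def h(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def min_error_probability(N, m, tol=1e-12):
    """Smallest P_e in [0, 1/2] with h(P_e) >= (N - m) / N, via bisection.

    h is increasing on [0, 1/2], so this is the Fano lower bound on Bob's
    error probability when Alice sends only m bits about N.
    """
    target = (N - m) / N
    if target <= 0:
        return 0.0
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


for N, m in ((10, 1), (100, 10), (100, 50), (100, 99)):
    print(N, m, round(min_error_probability(N, m), 4))
```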
    Returning to the point about alternative theory number 2, the entropic properties need to be proven for the states we’re considering, under the restriction that global operations or global operations of the wrong kind are not allowed. What are the allowed operations exactly? Just anything that would not eventually lead to negative probabilities? Is there a simple formal characterization of them?

  13. Hi Joe—
    You asked, of the theories that allow only product effects, and all states positive on them:
    What are the allowed operations exactly? Just anything that would not eventually lead to negative probabilities? Is there a simple formal characterization of them?
    That’s a darned good question. I wonder if there is a simple answer that holds for all “maximal tensor products” of (for simplicity) irreducible state spaces (state spaces that don’t decompose as direct sums, i.e. into “superselection sectors”). One might think it’s maybe just the convex hull of products of local operations (plus permutations where allowed by symmetry?), where, locally, all positive maps are allowed. But I don’t think that’s all; see below. The recent result by David Gross, Markus Mueller, Roger Colbeck, and Oscar Dahlsten, “All reversible dynamics in maximally non-local theories are trivial”,
    http://arxiv.org/PS_cache/arxiv/pdf/0910/0910.1840v1.pdf
    might have some useful techniques, since it characterizes some of the most interesting operations (the reversible ones) for a particular case: the maximal tensor product of two identical “boxlets”, where a boxlet, the basic system of a “boxworld”, has M alternative measurements, each with K possible outcomes, and no restrictions on the allowed states of a box beyond probabilities adding up to one on each measurement.
    In fact they deal with tensor products of an arbitrary finite number of identical boxes. The result is that the reversible operations are just permutations of the systems followed by local reversible operations.
    Of course, there are probably some allowed operations that are not convex combinations of local positive maps (and permutations where allowed). For example, you can do an operation that takes all states to a fixed state, say a P/R box in the case of two M=2, K=2 systems (variously called P/R boxlets, boxli, or squits). But I don’t think the operation that does this, when the state it prepares is an entangled state, is a convex combination of local operations. (Such a convex combination could never prepare an entangled state from a separable one.)
    Might be fun to try to work this out sometime. It somehow seems like it shouldn’t be that hard….but who knows.
