Why boycott Elsevier?

Everyone has their own reasons for doing this. There is an interesting debate at Gowers’s blog, including a response from an Elsevier employee. Some people dislike Elsevier’s high prices, their bundling practices, their fake medical journals, their parent company’s (now-former) involvement in the global arms trade, their lobbying for SOPA/PIPA/RWA, or other aspects of their business practice. Indeed, for those who want to reform Elsevier, this is one limitation of the boycott, in that it doesn’t clearly target a particular practice of the company that we want changed. On the other hand, others think Elsevier isn’t evil, but just has a communications problem.
In this post, I want to defend a more radical position, which is that we should try not to reform Elsevier or other publishers of academic journals, but to eliminate them. Until the debate over SOPA, I thought this position was too extreme. I thought we could tolerate a status quo in which journals are used for credentialing, and although it is a little unjust and absurd, the only real cost is bleeding the library budgets a little bit.
But the status quo isn’t stable. Open access and self-archiving are expanding. Soon, someone will successfully mirror JSTOR. Libraries are increasingly complaining about subscription costs.
In the long run, the future looks more like arxiv.org. Their front page boasts (as of this writing):

Open access to 731,335 e-prints in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics.

Just like the walled gardens of Compuserve and AOL would never grow into the Internet, no commercial publisher will ever be able to match the scope and ease of access of arxiv.org. Nor can they match the price. In 2010, there were about 70,000 new papers added to arxiv.org and there were 30 million articles downloaded, while their annual budget was $420,000. This comes to $6 per article uploaded (or 1.4 cents per download). Publishers talk about how much their business costs and how even “open access” isn’t free, but thanks to arxiv.org, we know how low the costs can go.
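For anyone who wants to check the arithmetic, here is a minimal sketch using only the 2010 figures quoted above. The variable names are mine, and the $3000 comparison figure is the Springer open-access fee mentioned in this post.

```python
# Figures quoted in the post for arxiv.org in 2010.
annual_budget_usd = 420_000
new_papers = 70_000
downloads = 30_000_000

cost_per_paper_usd = annual_budget_usd / new_papers
cost_per_download_cents = 100 * annual_budget_usd / downloads

# Comparison against the $3000 open-access fee quoted in the post.
springer_fee_usd = 3000
fee_ratio = springer_fee_usd / cost_per_paper_usd

print(f"${cost_per_paper_usd:.2f} per paper")
print(f"{cost_per_download_cents:.1f} cents per download")
print(f"open-access fee is {fee_ratio:.0f}x the arxiv cost per paper")
```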
By contrast, if you want your article published open access with Springer, it costs $3000. This seems like something we might be able to protest, and convince them to change. We can’t. Elsevier’s outgoing CEO left with a golden parachute worth two million pounds. They’re not going to make that kind of money while running with the efficiency of arxiv.org. So while scientists and the public see the internet as a way of sharing knowledge and driving down costs, publishers like Elsevier see it as a threat. For them, $6/article is a nightmare scenario that has to be stopped.
Some of you might think I’m overreacting. After all, publishers have tolerated self-archiving, citeseer, arxiv.org, etc. so far. This is partly to avoid backlash, and partly because for historical reasons editors of journals like Science and Nature have personally supported the advance of science even over the profits of the companies they work for. But in the long run, we can’t both have everything available for free, and journals continuing to charge extortionate prices. I suspect that a conflict is inevitable, and when it happens, we’ll regret the fact that journals hold all of the copyrights. SOPA was the first sign that publishers are not on the side of advancing knowledge, and if a journal ever goes bankrupt and sells its portfolio of intellectual property, we’ll find out what they’re capable of when they no longer are run by people who place any value on science.
So what can we do about it? A boycott of Elsevier is a good first step. But really we need to change the system so that publishers no longer hold copyright. Their role (and rate of profits) would be like that of the local Kinko’s when they prepare course packs. This would also improve the academic societies, like ACM and APS, by removing the terrible incentive that their publishing gives them to support organizations like the AAP that in turn support SOPA. Instead, they could simply represent communities of scientists, like they were originally designed to do.
I’m not idealistic enough to imagine that arxiv.org is enough. The issue is not so much that it lacks refereeing (which could be remedied easily enough), but that it lacks scarcity. To see what I mean, imagine starting a free online-only virtual journal that simply selects papers from the arxiv. The entire journal archives could be a single html file of less than a megabyte. But without space constraints, it would need to credibly signal that papers accepted into it were high quality. This is nontrivial, and involves convincing authors, readers, referees and hiring committees, all more or less simultaneously. As a community, we need to figure out a way to do this, so that the internet can finally do what it was designed for, and disrupt scientific publishing.
Update: Via John Baez, I came across a proposal for replacing academic journals with overlay boards that seems promising.

referee Hall of Fame/Shame

At the Pontiff, we are big fans of Science 2.0 in all its forms. But even within the traditional journal system, there are ways to improve the peer review system. One tragedy of peer review is how little credit the referees get for doing good work, and how there’s nothing to trouble them if they don’t but their own guilty consciences and a bunch of emails from an editor.
One approach I recently came across is from an economics journal. They publish a list of their associate editors, together with average turnaround time for manuscripts.

    Editorial Board Member                                   Manuscripts
    Name                  Affiliation                        Reviewed   Avg Days
1   Bessembinder, Hank    University of Utah                 8          14
2   DeAngelo, Harry       University of Southern California  4          18
3   Dittmar, Amy          University of Michigan             4          25
4   Duffie, Darrell       Stanford University                2          12
5   Fama, Eugene          University of Chicago              6          2
etc.

This is a nice start, but surely we could do more. When I teach, I get student evaluations. What if the authors of the papers I refereed also gave me 1-5 stars as a reviewer? Before you mention the obvious problem, the way to average the ratings would be by outcome: all the “reject outright” ratings are averaged together, all the “revise and resubmit” ratings are averaged together, etc., before these are all combined into a final score. That way, you couldn’t get high ratings just by always accepting everything.
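To make the averaging concrete, here is a small sketch. The post does not say how the per-outcome averages are combined, so the unweighted mean used below is my assumption, as are the function name and the outcome labels.

```python
from collections import defaultdict

def referee_score(ratings):
    """ratings: list of (outcome, stars) pairs, one per refereed paper.

    Returns (final_score, per_outcome_means). Ratings are first averaged
    within each outcome; the per-outcome means are then combined with an
    unweighted mean (an assumption, the post leaves this step open).
    """
    by_outcome = defaultdict(list)
    for outcome, stars in ratings:
        by_outcome[outcome].append(stars)
    per_outcome = {o: sum(s) / len(s) for o, s in by_outcome.items()}
    final = sum(per_outcome.values()) / len(per_outcome)
    return final, per_outcome

# A referee who accepts almost everything and collects grateful 5-star ratings.
ratings = [("accept", 5), ("accept", 5), ("accept", 5), ("reject outright", 1)]
naive_mean = sum(stars for _, stars in ratings) / len(ratings)
final, per_outcome = referee_score(ratings)
print(naive_mean, final, per_outcome)
```

In this toy input the naive average is 4 stars, while the outcome-stratified score is 3, since the single rejection counts as much as all the acceptances together.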
Clearly this proposal still needs more work. Any ideas?

strike!

In a move that will undoubtedly bring the US Senate to its knees, the Quantum Pontiff is going dark from 8am to 8pm EST on Jan 18 to protest SOPA, PIPA, the Research Works Act and other proposed acts of censorship.
We suggest you use this time to contact your representatives, read a book (or 1201.3387), or go outside.

Why the laity hope Einstein was wrong.

Although reputable news sources pointed out that most scientists think some more mundane explanation will be found for the too-early arrival of CERN-generated neutrinos in Gran Sasso, recently confirmed by a second round of experiments with much briefer pulse durations to exclude the most likely sources of systematic error, the take-home message for most non-scientists seems to have been “Einstein was wrong. Things can go faster than light.” Scientists trying to explain their skepticism often end up sounding closed-minded and arrogant. People say, “Why don’t you take evidence of faster-than-light travel at face value, rather than saying it must be wrong because it disagrees with Einstein.” The macho desire not to be bound by an arbitrary speed limit doubtless also helps explain why warp drives are such a staple of science fiction. At a recent dinner party, as my wife silently reminded me that a lecture on time dilation and Fitzgerald contraction would be inappropriate, the best I could come up with was an analogy to another branch of physics where lay people’s intuition accords better with that of specialists: I told them, without giving them any reason to believe me, that Einstein showed that faster-than-light travel would be about as far-reaching and disruptive in its consequences as an engine that required no fuel.
That was too crude an analogy. Certainly a fuelless engine, if it could be built, would be more disruptive in its practical consequences, whereas faster-than-light neutrinos could be accommodated, without creating any paradoxes of time travel, if there were a preferred reference frame within which neutrinos traveling through rock could go faster than light, while other particles, including neutrinos traveling through empty space, would behave in the usual Lorentz-invariant fashion supported by innumerable experiments and astronomical observations.
But it is wrong to blame mere populist distrust of authority for this disconnect between lay and expert opinion. Rather the fault lies with a failure of science education, leaving the public with a good intuition for Galilean relativity, but little understanding of how it has been superseded by special relativity.  So maybe, after dinner is over and my audience is no longer captive, I should retell the old story of cosmic ray-generated muons, who see the onrushing earth as having an atmosphere only a few feet thick, while terrestrial observers see the muons’ lifetime as having been extended manyfold by time dilation.
It is this difference in appreciation of special relativity that accounts for the fact that for most  people, faster-than-light travel seems far more plausible than time travel, whereas for experts, time travel, via closed timelike curves of general relativistic origin, is more plausible than faster-than-light travel in flat spacetime.

QIP travel funding for American students and postdocs

Want to go to the big quantum conference of the year, but are short of funds? Then apply here by tomorrow (Nov 1, Seattle time) to have up to $675 of travel costs covered. (Hint: I’m mentioning this because not many people have applied so far.)

Required: Since the money comes from the NSF, you need to be a student or postdoc at a US institution. Your citizenship makes no difference.
Criteria: While lots of people are eligible, preference will be given to students and to people presenting papers, and according to other criteria listed on the website. Also, if you apply, please have your advisor say something concrete about your funding situation other than “funds are tight.”

Codes, Geometry and Random structures: Day 3

Codes, Geometry and Random Structures

Pranab Sen,
Johnson-Lindenstrauss dimension reduction using unitary k-designs

Since the abstract isn’t on the website, I’ve reproduced it here:
Abstract:

The celebrated Johnson-Lindenstrauss lemma tells us that a random projection onto k-dimensional Euclidean space almost preserves the $\ell_2$-length of any set of $2^{\Theta(k)}$ vectors with high probability. The lemma has no dependence on the dimension of the ambient space in which the vectors originally lie. One way to implement a random projection is to take a Haar-random unitary operator on the ambient space, apply it to a vector and then project the resulting vector onto its first k coordinates. We show that a unitary poly(k)-design can be used for this purpose instead of a Haar-random unitary. This allows us to perform the dimension reduction quantumly in time polylogarithmic in the dimension of the original space and the number of vectors whose length is to be preserved.
We give two proofs of this result. The first proof digs into the Chernoff-style tail bound of a chi-square distribution, and the second proof is by a finer analysis of the so-called k-moment method originated by Bellare and Rompel and applied previously to unitary designs by Low.
Finally, we present an application of our result to private information retrieval where the sets stored have small size.

Here’s the protocol: Start with a state $|\psi\rangle \in \mathbb{C}^{d_1}$. In some cases, it is helpful to imagine that it is drawn from a known set of K states, but this is not necessary for all the applications.
Apply a random unitary from a t-design, divide into d_1/d_2 blocks, each of size d_2, and measure which block the resulting state is in. This is meant to mimic a classical J-L transform which would send d_1 dimensions to d_2 dimensions. It turns out that t= O(d_2) is necessary to get good enough concentration. (And to figure out what the right moments of the underlying Haar distribution are, it turns out that using Levy’s Lemma as Richard Low does is not enough: Pranab needs some stronger dimension-independent concentration to avoid paying another d_1 cost.)
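Here is a rough numerical sketch of the “which block” measurement. Note the substitution: it draws a Haar-random unitary (via QR of a Gaussian matrix) rather than a unitary from a t-design, so it illustrates only the block structure and not Pranab’s construction. The function names and the dimensions are my own choices.

```python
import numpy as np

def haar_unitary(d, rng):
    """Haar-random d x d unitary: QR of a complex Gaussian matrix, with phases fixed."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # rescales column j of q by phases[j]

def block_probabilities(psi, d2, rng):
    """Apply a random unitary to psi, split the d1 coordinates into d1/d2 blocks
    of size d2, and return the probability of each "which block" outcome."""
    d1 = psi.shape[0]
    assert d1 % d2 == 0, "d2 must divide d1"
    phi = haar_unitary(d1, rng) @ psi
    blocks = phi.reshape(d1 // d2, d2)
    return np.sum(np.abs(blocks) ** 2, axis=1)

rng = np.random.default_rng(0)
d1, d2 = 256, 32
psi = np.zeros(d1, dtype=complex)
psi[0] = 1.0
probs = block_probabilities(psi, d2, rng)
print(probs)  # each entry should be close to d2/d1
```

Each block should receive weight close to d_2/d_1, which is the sense in which the measurement mimics a length-preserving projection, and the outcome itself is random.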
Patrick asked whether this would give a good dimension-reduction scheme if simulated classically. But using e.g. random circuits would require poly(t) gates for a total (classical) cost of $d_1 \,\mathrm{poly}(d_2)$, whereas even a trivial Gaussian rectangular matrix only takes time $d_1 d_2$ to apply (and more sophisticated methods require time more like $d_1 \log(d_1)$).
One application (that Pranab called “toy” because of the parameters achieved) is to private information retrieval. The scenario is as follows:

  • Alice stores a subset $S \subseteq [m]$, $|S| < n$, $n \ll \frac{\log m}{\log\log m}$.
  • Bob has $x \in [m]$, wants to know if $x \in S$.

One solution is for Alice to send her entire database at cost $O(n \log m)$. One can achieve $O(n (\log n + \log\log m))$ using a set system due to Buhrman et al. But there is an $\Omega(n)$ lower bound if Bob doesn’t speak.

The quantum solution is to use J-L. Let $|S\rangle = \frac{1}{\sqrt{|S|}} \sum_{x\in S} |x\rangle$. Given a compressed version of $|S\rangle$, we want to determine whether $\langle x | S\rangle$ is zero or $\geq 1/\sqrt{n}$. This level of accuracy requires $O(n \log m)$ dimensions.
Here is the quantum protocol.

  1. Bob makes $n^2$ projections of x.
  2. He sends $\Theta(n^2)$ block names.
  3. Alice projects $|S\rangle$ onto these blocks and sends the resulting states.
  4. The total communication is $O(n^2(\log n + \log\log m))$.

A recurrent difficulty is that the dimension-reduction procedure requires a “which block” measurement, which gives a random answer. This means that two parties cannot cheaply compare a state by applying this J-L procedure and a swap test, unless they get very lucky and happen to land in the same block. Indeed, this phenomenon in which it is easy to sample from a distribution, but presumed hard to reproduce any individual state, even given its label, is related to the conjectured security of some quantum money schemes. This difficulty was also explored in the related paper I wrote with Ashley Montanaro and Tony Short: 1012.2262. There we found mostly negative results about the difficulty of faithfully compressing quantum states J-L-style; Pranab circumvents some of them by including a large classical register specifying the “which block” information, but for many applications this need to condition on the classical register is deadly.

Beth Ruskai, Some old and new results on quantum marginals and reduced density matrices

The quantum marginal problem concerns the m-partite reduced density matrices of an N-body quantum system. Given a set of reduced density matrices, the N-representability problem asks whether there exists an N-party density matrix consistent with them. If we could decide this efficiently, then we could solve the QMA-complete local Hamiltonian problem efficiently. Despite this, a constant-factor approximation is not ruled out, since the reduction is only known to work for 1/poly(N) accuracy.
Often we restrict to the case of N fermions. In this case, anti-symmetry plays a key role; for example, for m=1, $\rho_1$ is N-representable iff all eigenvalues are $\leq 1/N$. (This is essentially the Pauli exclusion principle.) I said “for example,” but really for m=2, it’s already not fully understood, and it is known that the full criteria do not depend only on eigenvalues.
There are a number of nice technical results, which I did not fully follow in part because I was typing Pranab’s talk still. (Yes, I am aware of the irony.)
For example, if $R_m$ denotes the set of m-partite marginal states, then we know that $\{\rho_N : \rho_N \mapsto R_m \ \mathrm{extreme}\}$ increases with m.
Many important results were proven in a 1972 paper of Erdahl, although one of the key proofs (that every extreme point is exposed) turns out to have a bug in it.
One useful theorem is that for $2m \geq N$, the preimage of an extreme point is unique. This implies that
The intuition behind the theorem is that if a point has a nonunique preimage, then by a suitable averaging over a phase, we can show that this point is a mixture of two different valid m-partite density matrices.
In more detail, write the m-body density matrix as
$\rho_J = \sum_j \mu_j^2 |\chi_j\rangle\langle \chi_j|$
and purify it to
$|\psi\rangle = \sum_j e^{i\omega_j} \mu_j |\chi_j\rangle |\phi_j\rangle$.
If the $\omega_j$ are not unique, then we can write
$|\psi\rangle = x_1 |\psi_1\rangle + e^{i\omega} x_2 |\psi_2\rangle$,
where any $\omega$ gives the same marginal $\rho_J$.
In this case, we can average over $\omega$ and get the mixture
$x_1^2 |\psi_1\rangle\langle \psi_1| + x_2^2 |\psi_2\rangle\langle \psi_2|$, implying that $\rho_J$ is not extreme.
In general can we say that the pre-image of an exposed point is unique? This open question dates back at least to Erdahl’s paper, and the answer is no: QECCs serve as a counter-example. The proof is in 1010.2717. I think this is a nice example of how quantum information tools with operational goals (e.g. fighting decoherence) turn out to have interesting foundational interpretations.

Omar Fawzi, Almost Euclidean sections of L1 and quantum uncertainty relations

Metric uncertainty relation:
Here is the main idea: given a state $|\psi\rangle$, apply $U_k$ where k is drawn randomly from {1,…,t}. (Here t is much smaller than the dimension of the state.) Then we trace out a small subsystem B, leaving a larger quantum state A, say with n qubits. If K is the “key” state, then we hope that the state on AK is approximately maximally mixed.
It turns out that what we need here is a low-distortion embedding of $\ell_2$ into $\ell_1(\ell_2)$. That is, if our state is mapped to $\sum_{k,a,b} \alpha_{k,a,b} |k,a,b\rangle$ then we would like the following inequality to hold:
$\sqrt{\dim AK} \,(1-\epsilon) \leq \sum_{k,a} \sqrt{\sum_b |\alpha_{k,a,b}|^2} \leq \sqrt{\dim AK}$
The $\ell_2$ system is the one we discard, and we would like it to be small.
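To get a feel for the quantity in the middle of that inequality, here is a small numerical sketch. It evaluates the norm on an arbitrary normalized tensor of amplitudes rather than on the output of any particular set of unitaries, so it checks only the Cauchy-Schwarz upper bound. The function names and dimensions are my own.

```python
import numpy as np

def l1_l2_norm(alpha):
    """alpha has shape (t, dim_A, dim_B).
    Returns sum over (k, a) of the l2 norm over b."""
    return float(np.sum(np.sqrt(np.sum(np.abs(alpha) ** 2, axis=2))))

def random_amplitudes(t, dim_a, dim_b, rng):
    """A normalized random complex tensor (not derived from actual unitaries U_k)."""
    shape = (t, dim_a, dim_b)
    alpha = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    return alpha / np.linalg.norm(alpha)

rng = np.random.default_rng(0)
t, dim_a, dim_b = 4, 16, 2
upper = np.sqrt(t * dim_a)  # sqrt(dim AK)

spread = l1_l2_norm(random_amplitudes(t, dim_a, dim_b, rng))

# A state concentrated on a single (k, a) pair is as far from the bound as possible.
peaked = np.zeros((t, dim_a, dim_b), dtype=complex)
peaked[0, 0, 0] = 1.0

print(spread / upper, l1_l2_norm(peaked) / upper)
```

A state whose weight is spread evenly over (k,a) saturates the upper bound, while a state sitting on a single (k,a) has norm 1, so the lower bound is the one that carries the content.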
What can we achieve?
For random unitaries, we get $t = O(\log(1/\epsilon)/\epsilon^2)$ and $\dim B = O(1/\epsilon^2)$.
But what can we achieve explicitly (i.e. efficiently on a quantum computer)? When t=2, MUBs guarantee that the state of AK has entropy of at least n/2 (due to entropic uncertainty relations), but not high enough to guarantee high fidelity with the maximally mixed state. Nor does it work to take more MUBs.
Unitary 2-designs work, but are very costly.
The construction is a variant of one proposed by Indyk. The idea is to first apply a MUB (using O(1) random bits), which guarantees creating entropy at least n/2, and then a permutation extractor, which uses O(log n) bits to extract a constant fraction of this entropy. We put these extracted bits to the side and continue, each time multiplying the number of remaining qubits by a constant strictly less than one (something like 7/8). Eventually we are left with log(n) qubits that we just refer to as the $\ell_2$ part; i.e. the B system.
This has a number of nice consequences. The relations between these different consequences are also nice, and in my opinion, underappreciated. Unfortunately, I won’t do them justice here, but here is a brief sketch:

  • An almost-Euclidean subspace of $\ell_1(\ell_2)$.
  • Metric uncertainty relations based on explicit unitaries
  • The first construction of efficient locking
  • Efficient quantum identification codes using n classical bits and $O(\log^2(n/\epsilon))$ qubits.

Unfortunately, I had to catch an early flight, and missed the rest of the workshop. It’s a pity, since the talks seemed like they’d be pretty exciting.

Codes, Geometry and Random Structures: Day 2

Codes, Geometry and Random Structures

Graeme Smith, Detecting incapacity, based on 1108.1807

A central question in quantum information theory is determining when a channel has nonzero quantum capacity. Pretty much all we know here is that there are a few criteria for proving a channel has zero quantum capacity: PPT channels can’t transmit entanglement (since LOCC can’t change PPT states to non-PPT states) nor can anti-degradable channels (because of no-cloning). These two arguments appear to both be pretty specific. Can we put them on the same footing, and hopefully also derive some new criteria?
That’s what this paper does. The talk was good, but the paper also describes the results clearly.
Here is a sample of the results.
Assume that R is unphysical on set S; e.g. S is the set of quantum states and R is the transpose map. Suppose that for any physical map D, there exists a physical map $D^*$ with $R \circ D = D^* \circ R$. If R is the partial transpose then $D^*$ is simply the operation you get from taking the complex conjugate of all the Kraus operators.
Their theorem then states that if $R \circ N$ is physical, N cannot transmit the states in S. The proof is a simple exercise in composition.
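For what it’s worth, here is how I would write out that exercise. This is my reconstruction, not a quote from the paper, and it assumes that “N transmits the states in S” means there are physical encoding and decoding maps E and D that compose with N to the identity on S.

```latex
% Reconstruction under the stated assumption; E and D are hypothetical
% encoder and decoder maps, not notation from the talk.
Suppose, for contradiction, that $D \circ N \circ E = \mathrm{id}$ on $S$
for some physical maps $E$ and $D$. Then on $S$,
\[
  R \;=\; R \circ D \circ N \circ E
    \;=\; D^* \circ (R \circ N) \circ E .
\]
The right-hand side is a composition of three physical maps
($D^*$ by hypothesis, $R \circ N$ by assumption, and $E$),
so $R$ would be physical on $S$, contradicting the assumption that it is not.
```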
In this case we say that N “physicalizes” R or, equivalently, R “incapacitates” N.
This is not quite enough to get the no-cloning criterion, but a mild generalization will do the job. Graeme gave a nice explanation in terms of how teleporting information involves going backwards in time through the EPR pairs, and if PPT states are used, then the time-traveling information gets confused and doesn’t know whether to go forwards or backwards. However, if this principle is implicit in the proof, then it’s very implicit.

Jean-Pierre Tillich, Quantum turbo codes with unbounded minimum distance and excellent error-reducing performance

LDPC codes are successful classically, but quantumly they suffer many drawbacks: their distance isn’t so good (but see 0903.0566), their Tanner graphs have short cycles (specifically 4-cycles), and iterative decoding doesn’t work. One particular no-go theorem is that there is no convolutional encoder that is both recursive and non-catastrophic (0712.2888).
In this talk, Tillich discusses a catastrophic and recursive encoder. It achieves rate 1/8 and somewhat surprisingly, it achieves minimum distance of $\Omega(\log(n) / \log\log(n))$ with high probability. He conjectures that this should be $\Omega(\log n)$ for the right choice of interleaver.
The resulting code can be thought of not so much as “error-correcting” but “error-reducing.” Error rate p=0.14 becomes $10^{-3}$, and p=0.105 becomes $10^{-4}$. This compares favorably with the toric code threshold of p=0.15. He suspects that the limitation here comes from the iterative decoder.

Jurg Wullschleger, The decoupling theorem

The decoupling theorem is arguably the Book proof of most quantum coding theorems. The encoder applies a random unitary (in some problem-dependent way) and transmits part of the output to the receiver. Treat this part as being traced out, and if she keeps part, then consider it to be controlled by Eve. If the resulting state has the reference system “decoupled” from Eve, then since the remaining parts of the state (controlled by Bob) purify everything, a local unitary on Bob’s side can give him pure entangled states with both the reference, and separately with Eve. This allows the complicated task of transmitting quantum information reliably (which is hard enough that proving that the coherent information was the quantum capacity originally took a lot of difficult technical work) to be reduced to the simpler goal of destroying correlations.
Decoupling theorems were originally developed for the state-merging problem, by Horodecki-Oppenheim-Winter ’05 (where it was “Lemma 5” or something similarly marginal). Then it was further developed by quant-ph/0606225, where it was called a Theorem. Then in , it moved to the title. So it took some time for the community to fully appreciate how useful this tool is.
Some of these tools use smoothed min- and max-entropies, which can be thought of as one-shot variants of von Neumann entropy that are either pessimistic or optimistic, depending on application. Amusingly, the smoothed max-entropy is not defined by taking a smoothed rank, but is defined in order to satisfy a relation that we’d like (which also holds for pure states). This is reminiscent of the speed of light, which is an integer number of meters/second by definition.
For any pure state $\rho^{XYE}$, define the smoothed max entropy to be
$H_{\max}^\epsilon(X|Y)_\rho = - H_{\min}^\epsilon(X|E)_\rho.$
Other definitions are also used, but are pretty close.
In this talk, Jurg described a converse theorem to the decoupling theorem, and explained many scenarios in which it applies. See his paper for the details.

Frédéric Dupuis, Classical coding via decoupling

Decoupling is great for sending quantum messages, and gives simpler proofs than even the original HSW proof of the classical capacity (or any other known proofs). Thus it’s interesting to find a decoupling-based proof of the HSW theorem not only for the sake of the unification of knowledge, but also so we can get a simpler proof. This is essentially what is achieved here, although only when the inputs are uniformly distributed.

Mark Wilde, Polar codes for classical, private, and quantum communication

We are really good at dealing with correlated information, with tools like Slepian-Wolf, side-information-assisted hypothesis testing and multiple-access channel codes. So we can treat inputs to channels in this way. We can perform simple $\mathbb{F}_2$-linear maps on the inputs so that n binary channels, each with capacity C, become roughly nC channels with capacity roughly 1 and n(1-C) channels with capacity roughly 0.
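The polarization phenomenon is easiest to see for the classical binary erasure channel, where the standard recursion from Arıkan’s construction is exact: combining two copies with erasure probability p gives one worse channel with erasure probability 2p - p^2 and one better channel with p^2. The sketch below is my illustration of that special case, not something from Mark’s talk, and the 0.01 thresholds are arbitrary.

```python
def polarize_bec(eps, levels):
    """Erasure probabilities of the 2**levels synthetic channels obtained by
    recursively polarizing a binary erasure channel with erasure probability eps."""
    probs = [eps]
    for _ in range(levels):
        probs = [q for p in probs for q in (2 * p - p * p, p * p)]
    return probs

def fraction_good(probs, threshold=0.01):
    """Fraction of channels that are nearly noiseless."""
    return sum(p < threshold for p in probs) / len(probs)

def fraction_bad(probs, threshold=0.01):
    """Fraction of channels that are nearly useless."""
    return sum(p > 1 - threshold for p in probs) / len(probs)

eps = 0.5  # capacity C = 1 - eps = 0.5
for levels in (4, 8, 12):
    probs = polarize_bec(eps, levels)
    print(levels, fraction_good(probs), fraction_bad(probs))
```

As the number of levels grows, the good fraction should approach C and the bad fraction 1 - C, while the average erasure probability stays equal to eps at every level.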
Quantumly, much of this goes through, but we need the ability to simultaneously apply typical projections. This key lemma is Lemma 3 in Pranab Sen’s paper arXiv:1109.0802.

$1 - \mathrm{tr}\, \Pi_N \cdots \Pi_1 \rho \Pi_1 \cdots \Pi_N \leq 2 \sqrt{\sum_{i=1}^N \mathrm{tr}\, (I-\Pi_i)\rho}$.

Mark calls this a “noncommutative union bound.” Note that using a Winter-style “gentle measurement lemma” puts the square root inside the sum, which for this application cuts the achievable rate in half.
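Here is a quick numerical sanity check of the bound on random instances. The dimension, the choice of rank-(d-1) projectors and the helper names are all my own. It only evaluates the two sides and proves nothing.

```python
import numpy as np

def random_projector(d, rank, rng):
    """Projector onto a random rank-dimensional subspace of C^d."""
    z = rng.standard_normal((d, rank)) + 1j * rng.standard_normal((d, rank))
    q, _ = np.linalg.qr(z)  # q has orthonormal columns, shape (d, rank)
    return q @ q.conj().T

def union_bound_sides(rho, projectors):
    """Return (lhs, rhs) of the noncommutative union bound for the
    projectors Pi_1, ..., Pi_N given in order."""
    d = rho.shape[0]
    prod = np.eye(d, dtype=complex)
    for p in projectors:
        prod = p @ prod  # after the loop, prod = Pi_N ... Pi_1
    lhs = 1 - np.trace(prod @ rho @ prod.conj().T).real
    total = sum(np.trace((np.eye(d) - p) @ rho).real for p in projectors)
    rhs = 2 * np.sqrt(total)
    return lhs, rhs

rng = np.random.default_rng(0)
d = 32
psi = rng.standard_normal(d) + 1j * rng.standard_normal(d)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())
projectors = [random_projector(d, d - 1, rng) for _ in range(3)]
lhs, rhs = union_bound_sides(rho, projectors)
print(lhs, rhs)
```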
For private communication, we can polarize into channels of four types: either good for Bob and good for Eve, bad for both, or good for one and bad for the other. We send random bits into the channels that are good for both, and arbitrary bits into the ones that are bad for both. Information bits go into the ones that are good for Bob and bad for Eve, and shared secret key goes into the ones that are bad for Bob (so he effectively gets the message anyway), and good for Eve.
This generalizes to quantum communication using Devetak’s technique from quant-ph/0304127. Channel inputs are:

Bob     Eve     input
Good    Good    $|+\rangle$
Good    Bad     information
Bad     Good    shared entanglement
Bad     Bad     arbitrary

A big open question is to make the decoding efficient, and to figure out which channels are good.

Joseph M. Renes, Quantum information processing as classical processing of complementary information

The main theme is a CSS-like one: we can often treat quantum information as being classical information about two complementary observables.
For example, if you coherently measure in both the X and Z basis, then you’ve effectively done a swap with the qubit you’re measuring.
This principle is related to uncertainty principles:
$H(X^A|B) + H(Z^A|C) \geq \log 1/c$
$H(X^A|B) + H(Z^A|B) \geq \log 1/c + H(A|B)$
Here c is the largest overlap between eigenvectors of the X and Z operators.
Rearranging the second inequality (and taking $c = 1/d$, as for fully complementary observables on a d-dimensional system), we see
$H(A|B) \leq H(X^A|B) + H(Z^A|B) - \log d.$
Thus, entanglement between A and B corresponds to the ability to predict complementary observables.
Many quantum information protocols can be understood in this way; e.g. entanglement distillation can be thought of as Alice and Bob having some correlated amplitude and phase information, and trying to do information reconciliation on these quantities.
Joe also talked about quantum polar codes, which create “synthetic” channels with suitable uses of CNOTs on the inputs. The idea is that CNOTs act on Z information in one direction and X information in the opposite direction. And a decoder need only separately figure out amplitude and phase information. There are subtleties: this information can be correlated, and it can exist on the same qubits. When amplitude and phase information is found on the same qubit, we use an entanglement assistance.
This gives efficient decoding for Pauli channels and erasure channels. And in numerical experiments, the entanglement assistance appears to be often unnecessary.

Codes, Geometry and Random Structures: Day 1

I now appreciate the difficulty of taking notes in real time! Here is my “liveblogging” of the first day.

Codes, Geometry and Random Structures

The first talk is by Fernando Brandao, who’s talking about our joint paper (also with Michal Horodecki) titled Random quantum circuits are approximate polynomial-designs. “Random quantum circuits” means choosing poly(n) random two-qudit gates between nearest neighbors of a line of n qudits. (The hardest case is qubits, since increasing the local dimension increases the randomizing power of the local gates.) An (approximate) t-design is a distribution over $U(d^n)$ that has (approximately) the same first t moments as the Haar measure on $U(d^n)$. (Technically by t-th moment, we mean polynomials of degree t in entries of $U$ and degree t in $U^*$.)
Exact t-designs are finicky combinatorial objects, and we only know how to construct them efficiently when t is 1 or 2 (Paulis are 1-designs and Cliffords are 2-designs). But for a long time, the only approximate t-designs we could construct were also only for t=1 or 2, and the only progress was to reduce the polynomial cost of these designs, or to connect them with plausible natural models of random circuits. In the last few years, the three of us (together with Richard Low), found a construction of efficient t-designs on n qubits for $t \leq O(n/\log n)$, and found that polynomial-size random circuits give approximate 3-designs.
So how do we get t up to poly(n) in our current paper? There are four technical ingredients.

  1. As with classical random walks, it’s useful to think about quantum random circuits in terms of the spectral gaps of certain Hermitian matrices. The matrices we consider have dimension $d^{2nt}$, and we hope to show that their spectral gap is at least 1/poly(n,t). For more on this, see my earlier work (with Matt Hastings) on tensor product expanders, or the Hamiltonian-based formalism of Znidaric and Brown-Viola.
  2. Using a version of path coupling for the unitary group due to Oliveira, we can show that random circuits of exponential length (i.e. $\mathrm{poly}(d^n)$ gates) are t-designs for all t. In other words, the resulting distribution over the unitary group is approximately uniform in whatever natural distance measure you like (for us, we use Wasserstein (earthmover) distance). This is what we intuitively expect, since constructing an arbitrary unitary requires $\mathrm{poly}(d^n)$ gates, so one might guess that applying a similar number of random gates would give something approximately uniform.
  3. This means that random circuits on O(log n) qudits are rapidly mixing, which translates into a statement about the gaps of some corresponding Hamiltonians. We would like to extend this to a statement about the gaps for n qudits. This can be achieved by a theorem of Nachtergaele.
  4. For this theorem to apply, we need certain projectors to approximately commute. This involves a technical calculation of which the key idea is that the t! permutations of t D-dimensional systems are approximately orthogonal (according to the Hilbert-Schmidt inner product) when $t \ll \sqrt{D}$. Here t comes from the number of moments we are trying to control (i.e. we want a t-design) and D is the dimension of the smaller block that we know we have good convergence on. In this case, the block has O(log n) qudits, so D = poly(n). If we choose the constant in the O(log n) right, then D will dominate t and the overall circuit will be a t-design.

Whenever I talk about t designs, I get a lot of skeptical questions about applications. One that I think is natural is that quantum circuits of size $n^k$ given access to a black box unitary U can’t tell whether U was drawn from the Haar measure on the full unitary group, or from an $n^{O(k)}$-design. (This is described in our paper.) The proof of this is based on another general application, which is that t designs give concentration of measure, similar to what you get from uniformly random unitaries, but with the probability of a quantity being far from its expectation decreasing only as $D^{-t}$, where D is the dimension of the system.
Next, Ashley Montanaro spoke about


A quantum generalisation of Fourier analysis on the boolean cube

based mostly on his nice paper on this topic with Tobias Osborne.
In theoretical computer science, Fourier analysis of boolean functions is a powerful tool. One good place to learn about it is these lecture notes of Ryan O’Donnell. There are also good surveys by Punya Biswal and Ronald de Wolf. The main idea is that a function $f$ from $\{-1,1\}^n$ to $\mathbb{R}$ can be expanded in the Fourier basis for $\mathbb{Z}_2^n$. This is equivalent to expressing $f$ as a multilinear polynomial, or in quantum language, we might think of $f$ as a $2^n$-dimensional vector and apply $H^{\otimes n}$. Written as a multilinear polynomial, $f$ has a degree, given by the size of the largest monomial with a nonzero coefficient. Equivalently, the degree is the maximum Hamming weight of any basis state that has nonzero amplitude after we apply $H^{\otimes n}$.
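
As a small illustration of the expansion just described, the sketch below computes Fourier coefficients by brute force and reads off the degree. The function names and the choice of 3-bit majority as an example are mine; this is meant for tiny n only, since it loops over all $4^n$ pairs of points and characters.

```python
from itertools import product


def fourier_coefficients(f, n):
    """Return {S: coefficient}, where S is a 0/1 tuple marking the variables in the monomial."""
    points = list(product([1, -1], repeat=n))
    coeffs = {}
    for S in product([0, 1], repeat=n):
        total = 0
        for x in points:
            chi = 1  # the character chi_S(x) = product of x_i over i in S
            for xi, si in zip(x, S):
                if si:
                    chi *= xi
            total += f(x) * chi
        coeffs[S] = total / len(points)
    return coeffs


def degree(coeffs, tol=1e-12):
    """Size of the largest monomial with a nonzero coefficient."""
    return max((sum(S) for S, c in coeffs.items() if abs(c) > tol), default=0)


def maj3(x):
    """Majority of three +/-1 bits."""
    return 1 if sum(x) > 0 else -1


maj_coeffs = fourier_coefficients(maj3, 3)
print({S: c for S, c in maj_coeffs.items() if c != 0})
print("degree:", degree(maj_coeffs))
```

For majority on three bits this recovers the multilinear polynomial $(x_1 + x_2 + x_3)/2 - x_1 x_2 x_3/2$, which has degree 3.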
Here is a small sample of the nice things we know classically.

  • KKL Theorem: Any boolean function f has some j for which $I_j(f) = \Omega({\rm Var}(f) \log(n)/n)$, where $I_j(f)$ is the influence of the j-th variable.
    (Spoiler alert: no quantum analogue is known, but proving one is a great open problem.)
  • Hypercontractive bounds: Define a noise operator $D_\rho$ that flips each bit with probability $\frac{1-\rho}{2}$. Define the p-norm
    $$\|f\|_p = \left( \frac{1}{2^n} \sum_{x \in \{-1,1\}^n} |f(x)|^p \right)^{1/p}.$$
    Then the hypercontractive inequality states that
    $$\|D_\rho(f)\|_q \leq \|f\|_p$$
    if $1 \leq p \leq q$ and $\rho \leq \sqrt{\frac{p-1}{q-1}}$.
  • One application is that degree-d functions satisfy
    $$\|f\|_q \leq (q-1)^{d/2} \|f\|_2.$$
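
The hypercontractive inequality in the list above is easy to check numerically on a few bits. The following is a sketch under my own choices (n = 4, an arbitrary test function, p = 2 and q = 4), applying the noise operator directly from its definition as an average over randomly flipped inputs.

```python
from itertools import product


def p_norm(values, p):
    """Normalized p-norm: (average over the cube of |f(x)|^p)^(1/p)."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)


def apply_noise(f_values, points, rho):
    """(D_rho f)(x) = E[f(y)], where each y_i differs from x_i with probability (1 - rho)/2."""
    out = []
    for x in points:
        total = 0.0
        for y, fy in zip(points, f_values):
            weight = 1.0
            for xi, yi in zip(x, y):
                # (1 + rho)/2 if the bit is kept, (1 - rho)/2 if it is flipped
                weight *= (1 + rho * xi * yi) / 2
            total += weight * fy
        out.append(total)
    return out


n = 4
points = list(product([1, -1], repeat=n))
# An arbitrary real-valued test function on the cube (an assumption for illustration).
f_values = [x[0] * x[1] + 0.5 * x[2] - 2.0 * x[0] * x[2] * x[3] + 0.3 for x in points]

p, q = 2, 4
rho = ((p - 1) / (q - 1)) ** 0.5  # largest noise parameter the inequality allows
lhs = p_norm(apply_noise(f_values, points, rho), q)
rhs = p_norm(f_values, p)
print(lhs, "<=", rhs)
```

Without noise the 4-norm of a function is at least its 2-norm, so the content of the inequality is that smoothing by $D_\rho$ is strong enough to reverse that ordering.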

What about quantum versions?
Boolean functions are replaced by Hermitian matrices of dimension 2^n. The Fourier expansion is replaced by a Pauli expansion. The noise operator is replaced by a depolarizing channel acting on every qubit.
With these replacements, a hypercontractive inequality can still be proven, albeit with the restriction that $p \leq 2 \leq q$. The classical argument does not entirely go through, since at one point it assumes that $p \rightarrow q$ norms are multiplicative, which is true for matrices but not superoperators. Instead they use results by King that appear to be specialized to qubits.
As a result, there are some nice applications to k-local Hamiltonians. For example, if $\|H\|_2 = 1$ and $t \geq (2e)^{k/2}$, then the fraction of eigenvalues of H greater than t is less than $e^{-k t^{2/k}/(2e)}$. And if H is nonzero then it has rank $\geq 2^n e^{-2k}$.
The FKN theorem also carries over, implying that if the weight of Fourier coefficients on (2+)-local terms is no more than $\epsilon$ then the operator must be $O(\epsilon)$-close to a single-qubit operator in 2-norm distance.
There are a number of great open questions raised in this work. Ashley didn’t mention this, but one of those open questions led to our joint work arXiv:1001.0017, which has been a lot of fun to work on. A quantum KKL theorem is another one. Here is another. Suppose $H^2 = I$, and that H acts nontrivially on each qubit. Then does it hold that the degree of H is always at least $\Omega(\log n)$?
<Interlude>
Live-blogging is hard work! Even semi-live blogging is.
You’ll notice the level of details of my reports diminishes over time.
The speakers were all excellent; it’s only your reporter that started to fade.
</Interlude>

Marius Junge, Exponential rates via Banach space tools

The key quantum information theory question addressed today is:
Why is $C_E = 2 Q_E$? And no, the answer is not “super-dense coding and teleportation.” (At least not in this talk.) Note that the classic quantum papers on entanglement-assisted channel coding are quant-ph/0106052 and quant-ph/0106075.
First, Marius gave a nice review of classical information theory. In the spirit of Shannon, I will not repeat it here.
Then he reinterpreted it all in operator algebra language! For example, classical capacity can be interpreted as a commutative diagram!
$$\begin{array}{ccc} L_1(A^{\otimes n}) & \overset{{\cal N}^{\otimes n}}{\longrightarrow} & L_1(B^{\otimes n}) \\ \uparrow {\cal E} & & \downarrow {\cal D} \\ \ell_1^N & \overset{\approx {\rm Id}}{\longrightarrow} & \ell_1^N \end{array}$$
For the quantum capacity, we simply replace $\ell_1^N$ with $L_1(M_d)$.
To mix quantum and classical, we can define $C = M_{n_1} \oplus \cdots \oplus M_{n_k}$ in the spirit of quant-ph/0203105.
The resulting commutative diagram is:
$$\begin{array}{ccc} L_1(A^{\otimes n}) & \overset{{\cal N}^{\otimes n}}{\longrightarrow} & L_1(B^{\otimes n}) \\ \uparrow ({\rm id} \otimes {\cal E}) & & \downarrow {\cal D} \\ L_1(C) & \overset{\approx {\rm Id}}{\longrightarrow} & L_1(C) \end{array}$$
There is also an entanglement-assisted version that I won’t write down, but hopefully you can imagine.
Next, he introduced p-summing operators. The underlying principle is the following.

In a finite-dimensional Banach space, every unconditionally convergent series (i.e. one that converges even if arbitrarily permuted) is absolutely convergent. But in general, this is not the case.

More formally,
$T: X \rightarrow Y$ is p-summing if
$$\left( \sum_k \|T x_k\|_Y^p \right)^{\frac{1}{p}} \leq \pi_p(T) \sup_{\|(\alpha_k)\|_{p'} \leq 1} \left\| \sum_k \alpha_k x_k \right\|$$
where $1/p + 1/p' = 1$.
$\pi_p(T)$ is the optimal constant possible in this expression.
e.g. if p=1,
$$\sum_k \|T x_k\| \leq \pi_1(T) \sup_{\epsilon_k = \pm 1} \left\| \sum_k \epsilon_k x_k \right\|.$$
Why is this nice?
One use is the following factorization theorem due to Grothendieck and Pietsch.

If $T: \ell_\infty^m \rightarrow X$ is absolutely p-summing, then there exists a probability distribution $\lambda$ such that
$$\|T(x)\|_X \leq \pi_p(T) \left( \sum_{i=1}^m \lambda_i |x_i|^p \right)^{1/p}$$

Here is the connection to (classical) information theory. A noisy channel can be written as $\Phi: \ell_1^m \rightarrow \ell_1^m$. The dual map is $\Phi^*: \ell_\infty^m \rightarrow \ell_\infty^m$. And the capacity is given by this amazing formula!

$$C(\Phi) = \lim_{p \rightarrow \infty} p \ln \pi_p(\Phi^*)$$

The following innocuous observation will have powerful consequences: $\pi_p(T^{\otimes n}) = \pi_p(T)^n$.
One consequence is a strong converse: Let $\alpha > C(\Phi)$ and $N \geq e^{\alpha n}$. Then $P_{\rm succ} \leq e^{-\epsilon n}$.
To prove this we also need the fact that $\pi_1(S: \ell_\infty^N \rightarrow X) \leq N^{1-1/p} \pi_p(S)$, which implies that the success probability is $\leq \pi_p(T)^n / N^{1/p} \leq e^{\alpha n/p} / N^{1/p}$.
Next, Marius talked about how their formalism applies to the entanglement-assisted capacity of quantum channels. Again there is a simple formula

$$C_E({\cal N}) = \lim_{p \rightarrow \infty} \pi_p^o({\cal N}^*)$$

What is $\pi_p^o$?
Let ${\cal N}^*$ map from $M_m$ to $M_k$. Then $\pi_p^o(T) = \inf \|a\|_p \, |||S|||$, where the inf is over all a, S satisfying $T(x) = S(a^* x a)$.
There is another expression also for limited entanglement assistance, which was considered operationally in quant-ph/0402129.
Ultimately, there is an answer for the question at the start of the talk. The classical capacity is twice as big because $\dim M_d = d^2$ and $\dim \ell_1^d = d$. Obviously! 🙂
There is also the promise of an intriguing new additivity violation in the limited-entanglement setting, although I admit that the exact details eluded me.

Yi-Kai Liu, Universal low-rank matrix recovery from Pauli measurements, based on 1103.2816

Previous work on compressed tomography established that for any rank-r density matrix $\rho$, $O(r d \log^2 d)$ random Paulis suffice to reconstruct $\rho$ with high probability.
This work establishes that $O(r d \log^6 d)$ random Paulis work simultaneously for all $\rho$. This also gives better error bounds for noisy behavior.
As a result, one can obtain bounds on the sample complexity, instead of just the number of measurement settings.
The technical statement that we need is that random Paulis satisfy the following restricted isometry property (RIP):

For all X with rank $\leq r$,
$$(1-\delta) \|X\|_{S_2} \leq \|R(X)\|_{\ell_2} \leq (1+\delta) \|X\|_{S_2}$$

Gaussian matrices are known to have this property [Recht, Fazel, Parrilo 2007].
More concretely, let $\Omega$ be a random set of $m = O(r d \log^6 d)$ Pauli matrices.
Define $R(\rho) = ({\rm tr}\, P\rho)_{P \in \Omega}$.
Measurements produce $b \approx R(X)$.
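
To make the sampling operator R concrete, here is a pure-Python sketch for two qubits. The helper names and the example state |00⟩⟨00| are my own, and I omit any overall normalization of R (the definition above does not specify one). With all $d^2$ Paulis the map is a scaled isometry, since $\sum_P ({\rm tr}\, P\rho)^2 = d \, {\rm tr}\, \rho^2$; the RIP asks how much of this survives when only a random subset $\Omega$ is kept.

```python
import random
from itertools import product

SINGLE = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}


def kron(a, b):
    """Kronecker product of two matrices stored as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def pauli(label):
    """Tensor product of single-qubit Paulis, e.g. pauli('XZ') = X (x) Z."""
    mat = [[1]]
    for ch in label:
        mat = kron(mat, SINGLE[ch])
    return mat


def trace_product(a, b):
    """tr(a b), computed without forming the matrix product."""
    d = len(a)
    return sum(a[i][j] * b[j][i] for i in range(d) for j in range(d))


def sample_operator(rho, labels):
    """R(rho) = (tr P rho) for P in the given list; the values are real for Hermitian rho."""
    return [complex(trace_product(pauli(lbl), rho)).real for lbl in labels]


d = 4
rho = [[1 if (i == 0 and j == 0) else 0 for j in range(d)] for i in range(d)]  # |00><00|
all_labels = ["".join(p) for p in product("IXYZ", repeat=2)]

full = sample_operator(rho, all_labels)
print(sum(v * v for v in full) / d)  # equals tr(rho^2) when all Paulis are used

omega = random.Random(0).sample(all_labels, 6)  # a random subset, as in the talk
print(omega, sample_operator(rho, omega))
```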
The reconstruction algorithm is to solve:

$${\rm argmin}_{X \geq 0} \|R(X) - b\|_2^2 + \mu \, {\rm tr} |X|$$

Why does this work?
The set of X with ${\rm tr}\, X \leq 1$ is a ball, and low-rank states lie on its exposed faces.
So when we intersect with some generic hyperplane R(X)=b, we’re likely to have a unique solution.
More formally, let $\rho$ be the true state and $S = X - \rho$. Note that $R(S) = 0$. We want to show $S = 0$. Decompose $S = S_0 + S_c$, where $S_0$ has rank $\leq 2r$ and $S_c$ has no overlap with the row or column spaces of $\rho$.
If X has minimum trace, then $\|S_c\|_1 \leq \|S_0\|_1$.
Then we can use RIP to show that S_0 and S_c are both small, using a clever telescoping technique due originally to Candes and Recht.
Ok, so how do we prove the RIP? The idea is that R should be well conditioned, and be “incoherent” so that its operator norm is much less than its 2-norm.
Recht et al ’07 used a union bound over (a net for) rank-r matrices. This works because Gaussians have great concentration. But Paulis are pricklier.
This work: Use generic chaining (a la Rudelson and Vershynin). This requires proving bounds on covering numbers, which will be done using entropy duality (cf. Guedon et al 2008).
Here’s a little more detail. If T is a self-adjoint linear map from $M_d$ to $M_d$, then
define $\|T\|_{(r)} = \sup \{ |{\rm tr}\, X^\dagger T(X)| : X \in U \}$, where
$U = \{ X \in M_d : \|X\|_2 \leq 1, {\rm rank}(X) \leq r \}$.
The goal is to show $\|R^*R - I\|_{(r)} \leq 2\delta - \delta^2$, where $\delta$ comes from the RIP condition.
The main tool is Dudley’s inequality:
$$\mathbb{E}[\sup G(X) : X \in U] \leq {\rm const} \int {\rm d}\epsilon \sqrt{\log N(U, d_G, \epsilon)}$$
Here G is a Gaussian process with $d_G(X,Y) = \sqrt{\mathbb{E}((G(X)-G(Y))^2)}$ and $N(U, d_G, \epsilon)$ is the number of radius-$\epsilon$ balls in metric $d_G$ needed to cover U.
We can upper bound N using the trace norm. Let B_1 denote the trace-norm ball.
Define $\|M\|_X = \max_{P \in \Omega} |{\rm tr}\, P^\dagger M|$.
There are two estimates of N. The easy one is that
$$N(B_1, \|\cdot\|_X, \epsilon) \leq {\rm poly}(1/\epsilon) \exp(d^2)$$
The harder one is that
$$N(B_1, \|\cdot\|_X, \epsilon) \leq \exp(\log^2(d) / \epsilon^2).$$
This is obtained using entropy duality, with arguments that are somewhat specific to the spaces in question, using techniques of Maurey. See paper (and references 🙂 ) for details.

Matthias Christandl, Reliable quantum state tomography, based on 1108.5329

The central question:

How do you reconstruct a density matrix, with error bars, from measurements?

One framework is “measure and predict.”
We have n+k systems, and measure n (like a training set) and then predict the results of measurement outcomes from the next k (like a testing set).
The method:
First compute the likelihood function
$$\mu_{B^n}(\sigma) = \frac{1}{c_{B^n}} {\rm tr}[\sigma^{\otimes n} B^n]$$
From this we can compute a confidence region, point estimates, error bars, and what have you.
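
As a toy instance of the likelihood function, consider a qubit measured n times in the Z basis, with the candidate states restricted to the z-axis of the Bloch ball. Both restrictions are my simplifying assumptions, not part of the talk, and I leave out the normalization constant. For product outcomes the trace factorizes, so the unnormalized likelihood is just $p_0^{\#0} (1-p_0)^{\#1}$ with $p_0 = (1+z)/2$.

```python
from fractions import Fraction


def likelihood_z(z, zeros, ones):
    """Unnormalized likelihood of a diagonal qubit state with Bloch coordinate z.

    Assumes independent Z-basis measurements that gave `zeros` outcomes 0
    and `ones` outcomes 1.
    """
    p0 = (1 + z) / 2  # probability of outcome 0 for this state
    return p0 ** zeros * (1 - p0) ** ones


# Evaluate on an exact grid of candidate states z in [-1, 1].
grid = [Fraction(i, 100) for i in range(-100, 101)]
zeros, ones = 7, 3
best = max(grid, key=lambda z: likelihood_z(z, zeros, ones))
print("maximum-likelihood z:", best)
```

The maximizer is only a point estimate; the confidence region discussed next is built from the set of states carrying most of the weight of this function.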
How to compute a $1-\epsilon$ confidence region?
Let $\epsilon' = \epsilon/{\rm poly}(n)$, and let $\Gamma$ have measure $1-\epsilon'$. Then add an extra distance $\delta = \sqrt{(2/n)(\ln(2/\epsilon) + O(\ln n))}$ to obtain $\Gamma^\delta$.
Then the main result is that the state is in $\Gamma^\delta$ with probability $\geq 1-\epsilon$. Crucially, the probability is taken over measurement outcomes, and it is important to realize that the confidence region that is output is itself a random variable. So one cannot say that conditioned on the measurement outcome, we have a high probability of the state being in the confidence region. Without a prior distribution over states, such statements are impossible.
A key lemma in the proof (due to 0806.1091 and applied to QKD in 0809.3019) is that for any $\sigma$, $\sigma^{\otimes n} \leq n^{O(d^2)} \int {\rm d}\rho \, \rho^{\otimes n}$, where ${\rm d}\rho$ is the Hilbert-Schmidt measure.

John Preskill, Protected gates for superconducting qubits

This talk was an award lecture for the award that goes by the charmingly Frenglish name of “chaire Aisenstadt chair”. John was introduced by a poem by Patrick!
I enjoyed John’s two opening questions that he imagined the audience would have after seeing the title: “What does this have to do with this conference?” and “Will I understand anything?”
His response was that we shouldn’t worry, since this is a theorist’s conception of superconducting qubits.
Unfortunately, my note-taking quality suffered during this talk, since there was a high density of equations, figures and ideas. So my summary will be breezier. This may become the norm for the remaining days of the conference as well. However, here is an older version of this talk.
As we all know, it’d be great to have a quantum computer. Instead of concatenated FTQC, which has lousy constants, what about physically motivated QECCs? One example is the braiding of nonabelian anyons. Here’s another, less studied one: encoding qubits in harmonic oscillators [“Continuous variable quantum codes”, Gottesman-Kitaev-Preskill 2000].
The rest of the talk was about a particular variant of this idea, called the 0-pi qubit (due to
I have more detailed notes, but they are really rough, and I am saving my energy for the upcoming sessions. In other words, the margin is big enough for the proof, but my glucose level is not.