Codes, Geometry and Random Structures: Day 1

I now appreciate the difficulty of taking notes in real time! Here is my “liveblogging” of the first day.

Codes, Geometry and Random Structures

The first talk is by Fernando Brandao, who’s talking about our joint paper (also with Michal Horodecki) titled Random quantum circuits are approximate polynomial-designs. “Random quantum circuits” here means applying poly(n) random two-qudit gates between nearest neighbors of a line of n qudits. (The hardest case is qubits, since increasing the local dimension increases the randomizing power of the local gates.) An (approximate) t-design is a distribution over $U(d^n)$ that has (approximately) the same first t moments as the Haar measure on $U(d^n)$. (Technically, by the t-th moment we mean polynomials of degree t in the entries of U and degree t in the entries of $U^*$.)
Exact t-designs are finicky combinatorial objects, and we only know how to construct them efficiently when t is 1 or 2 (Paulis are 1-designs and Cliffords are 2-designs). But for a long time, the only approximate t-designs we could construct were also only for t=1 or 2, and the only progress was to reduce the polynomial cost of these designs, or to connect them with plausible natural models of random circuits. In the last few years, the three of us (together with Richard Low) found a construction of efficient t-designs on n qubits for $t \leq O(n/\log n)$, and found that polynomial-size random circuits give approximate 3-designs.
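Since it’s only a few lines, here is a quick numerical sanity check of the simplest fact above, that the single-qubit Paulis form an exact 1-design. This is my own illustration, not something from the talk: averaging $P\rho P^\dagger$ over the four Paulis reproduces the Haar first moment, i.e. the maximally mixed state.
```python
import numpy as np

# The four single-qubit Paulis.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

rng = np.random.default_rng(0)
G = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = G @ G.conj().T
rho /= np.trace(rho).real              # a random density matrix

# First moment of the Pauli group: (1/4) sum_P P rho P^dagger ...
avg = sum(P @ rho @ P.conj().T for P in (I, X, Y, Z)) / 4
# ... equals the Haar average E_U[U rho U^dagger] = tr(rho) I/2.
assert np.allclose(avg, np.eye(2) / 2)
```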
So how do we get t up to poly(n) in our current paper? There are four technical ingredients.

  1. As with classical random walks, it’s useful to think about quantum random circuits in terms of the spectral gaps of certain Hermitian matrices. The matrices we consider have dimension $d^{2nt}$, and we hope to show that their spectral gap is at least $1/\mathrm{poly}(n,t)$. For more on this, see my earlier work (with Matt Hastings) on tensor product expanders, or the Hamiltonian-based formalism of Znidaric and Brown-Viola.
  2. Using a version of path coupling for the unitary group due to Oliveira, we can show that random circuits of exponential length (i.e. $\mathrm{poly}(d^n)$ gates) are t-designs for all t. In other words, the resulting distribution over the unitary group is approximately uniform in whatever natural distance measure you like (for us, Wasserstein (earthmover) distance). This is what we intuitively expect: constructing an arbitrary unitary requires $\mathrm{poly}(d^n)$ gates, so one might guess that applying a similar number of random gates would give something approximately uniform.
  3. This means that random circuits on O(log n) qudits are rapidly mixing, which translates into a statement about the gaps of some corresponding Hamiltonians. We would like to extend this to a statement about the gaps for n qudits. This can be achieved by a theorem of Nachtergaele.
  4. For this theorem to apply, we need certain projectors to approximately commute. This involves a technical calculation whose key idea is that the $t!$ permutation operators on $t$ copies of a $D$-dimensional system are approximately orthogonal (in the Hilbert-Schmidt inner product) when $t \ll \sqrt{D}$. Here t is the number of moments we are trying to control (i.e. we want a t-design) and D is the dimension of the smaller block on which we know we have good convergence. In this case, the block has $O(\log n)$ qudits, so $D = \mathrm{poly}(n)$. If we choose the constant in the $O(\log n)$ right, then D will dominate t and the overall circuit will be a t-design. (A small numerical check of the near-orthogonality is below.)
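The near-orthogonality in item 4 is easy to see numerically. Here is my own toy check (not from the paper): the normalized Hilbert-Schmidt Gram matrix of the $t!$ permutation operators has off-diagonal entries of size at most $1/D$, since $\mathrm{tr}(P_\pi^\dagger P_\sigma) = D^{\#\mathrm{cycles}(\pi^{-1}\sigma)}$.
```python
import numpy as np
from itertools import permutations, product

def perm_operator(pi, D):
    """Operator on (C^D)^{otimes t} that permutes the t tensor factors by pi."""
    t = len(pi)
    P = np.zeros((D**t, D**t))
    for idx in product(range(D), repeat=t):
        col = sum(i * D**k for k, i in enumerate(idx))
        row = sum(idx[pi[k]] * D**k for k in range(t))
        P[row, col] = 1
    return P

t = 3
for D in (2, 4, 8):
    ops = [perm_operator(pi, D) for pi in permutations(range(t))]
    # Normalized overlaps tr(A^T B)/D^t; since A, B are 0/1 matrices,
    # tr(A^T B) is just the elementwise sum of A*B.
    gram = np.array([[np.sum(A * B) / D**t for B in ops] for A in ops])
    off = np.abs(gram - np.eye(len(ops))).max()
    print(f"D={D}: largest off-diagonal overlap = {off:.4f}")  # shrinks like 1/D
```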

Whenever I talk about t-designs, I get a lot of skeptical questions about applications. One that I think is natural is that quantum circuits of size $n^k$ given access to a black-box unitary U can’t tell whether U was drawn from the Haar measure on the full unitary group or from an $n^{O(k)}$-design. (This is described in our paper.) The proof of this is based on another general application, which is that t-designs give concentration of measure, similar to what you get from uniformly random unitaries, but with the probability of a quantity being far from its expectation decreasing only as $D^{-t}$, where D is the dimension of the system.
Next, Ashley Montanaro spoke about


A quantum generalisation of Fourier analysis on the boolean cube

based mostly on his nice paper on this topic with Tobias Osborne.
In theoretical computer science, Fourier analysis of boolean functions is a powerful tool. One good place to learn about it is these lecture notes of Ryan O’Donnell. There are also good surveys by Punya Biswal and Ronald de Wolf. The main idea is that a function $f: \{-1,1\}^n \rightarrow \mathbb{R}$ can be expanded in the Fourier basis for $\mathbb{Z}_2^n$. This is equivalent to expressing f as a multilinear polynomial, or in quantum language, we might think of f as a $2^n$-dimensional vector and apply $H^{\otimes n}$. If f is a multilinear polynomial, then it has a degree, given by the size of the largest monomial with a nonzero coefficient. Alternatively, we can ask for the maximum Hamming weight of any state that has nonzero amplitude after we apply $H^{\otimes n}$.
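To make the quantum-language description concrete, here is a small sketch (my own, using the standard conventions) that computes the Fourier expansion of the 3-bit majority function by applying $H^{\otimes n}$ to its truth table, and reads off the degree as the largest Hamming weight in the support.
```python
import numpy as np
from itertools import product

n = 3
H1 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Hn = H1
for _ in range(n - 1):
    Hn = np.kron(Hn, H1)

# Truth table of 3-bit majority, with bit b in {0,1} encoding the point (-1)^b.
xs = list(product([0, 1], repeat=n))
f = np.array([np.sign(sum(1 - 2 * b for b in x)) for x in xs], dtype=float)

Fhat = Hn @ f                 # Fourier coefficients, up to overall normalization
support = [x for x, c in zip(xs, Fhat) if abs(c) > 1e-9]
print("Fourier support (as subset indicators):", support)
print("degree:", max(sum(x) for x in support))   # majority of 3 has degree 3
```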
Here is a small sample of the nice things we know classically.

  • KKL Theorem: Any boolean function f has some coordinate j whose influence satisfies $I_j(f) = \Omega(\mathrm{Var}(f)\log(n)/n)$.
    (Spoiler alert: no quantum analogue is known, but proving one is a great open problem.)
  • Hypercontractive bounds: Define a noise operator $D_\rho$ that flips each bit with probability $\frac{1-\rho}{2}$. Define the p-norm
    $\|f\|_p = \left( \frac{1}{2^n} \sum_{x\in\{-1,1\}^n} |f(x)|^p \right)^{1/p}$
    Then the hypercontractive inequality states that
    $\|D_\rho(f)\|_q \leq \|f\|_p$
    if $1 \leq p \leq q$ and $\rho \leq \sqrt{\frac{p-1}{q-1}}$.
  • One application is that degree-d functions satisfy
    $\|f\|_q \leq (q-1)^{d/2} \|f\|_2$ (a numerical sanity check follows this list).
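Here is the promised numerical sanity check (my own sketch): for a random multilinear f of degree 2 on 8 bits, we verify $\|f\|_4 \leq (4-1)^{d/2}\|f\|_2$.
```python
import numpy as np
from itertools import combinations, product

rng = np.random.default_rng(2)
n, deg, q = 8, 2, 4

# Random multilinear f of degree <= deg, specified by its Fourier coefficients.
coeffs = {S: rng.normal() for k in range(deg + 1) for S in combinations(range(n), k)}
xs = np.array(list(product([-1, 1], repeat=n)))
f = np.array([sum(c * np.prod(x[list(S)]) for S, c in coeffs.items()) for x in xs])

# Norms with respect to the uniform measure on {-1,1}^n.
norm = lambda v, p: np.mean(np.abs(v) ** p) ** (1 / p)
lhs, rhs = norm(f, q), (q - 1) ** (deg / 2) * norm(f, 2)
print(f"||f||_4 = {lhs:.3f} <= {rhs:.3f} = (q-1)^(d/2) ||f||_2")
assert lhs <= rhs
```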

What about quantum versions?
Boolean functions are replaced by Hermitian matrices of dimension $2^n$. The Fourier expansion is replaced by a Pauli expansion. The noise operator is replaced by a depolarizing channel acting on every qubit.
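To make the dictionary concrete, here is a toy sketch (mine, not from the talk): expand a random two-qubit Hermitian matrix in the Pauli basis, and note that qubitwise depolarizing noise simply damps the coefficient of each Pauli string by $\rho^{|s|}$, where $|s|$ is the number of non-identity tensor factors.
```python
import numpy as np
from itertools import product

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [I, X, Y, Z]

n, d, rho = 2, 4, 0.7                       # two qubits; noise parameter rho
rng = np.random.default_rng(3)
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
A = (G + G.conj().T) / 2                    # a random Hermitian "observable"

recon = np.zeros((d, d), dtype=complex)     # reconstruction from the Pauli expansion
noisy = np.zeros((d, d), dtype=complex)     # after the noise operator D_rho
for s in product(range(4), repeat=n):
    P = np.kron(paulis[s[0]], paulis[s[1]])
    c = np.trace(P @ A) / d                 # Pauli ("Fourier") coefficient of A
    w = sum(1 for i in s if i != 0)         # weight = number of non-identity factors
    recon += c * P
    noisy += rho**w * c * P                 # depolarizing each qubit damps c by rho^w

assert np.allclose(recon, A)                # the Pauli basis is complete
print("noise contracts the 2-norm:", np.linalg.norm(noisy) <= np.linalg.norm(A) + 1e-12)
```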
With these replacements, a hypercontractive inequality can still be proven, albeit with the restriction that $p \leq 2 \leq q$. The classical argument does not entirely go through, since at one point it assumes that $p \rightarrow q$ norms are multiplicative under tensor products, which is true for matrices but not for superoperators. Instead they use results of King that appear to be specialized to qubits.
As a result, there are some nice applications to k-local Hamiltonians. For example, if $\|H\|_2 = 1$ and $t \geq (2e)^{k/2}$, then the fraction of eigenvalues of H greater than t is less than $e^{-kt^{2/k}/2e}$. And if H is nonzero then it has rank $\geq 2^n e^{-2k}$.
The FKN theorem also carries over, implying that if the weight of Fourier coefficients on (2+)-local terms is no more than $\epsilon$, then the operator must be $O(\epsilon)$-close to a single-qubit operator in 2-norm distance.
There are a number of great open questions raised in this work. Ashley didn’t mention this, but one of those open questions led to our joint work arXiv:1001.0017, which has been a lot of fun to work on. A quantum KKL theorem is another one. Here is another: suppose $H^2 = I$ and H acts nontrivially on each qubit. Does it then hold that the degree of H is always at least $\Omega(\log n)$?
<Interlude>
Live-blogging is hard work! Even semi-live blogging is.
You’ll notice the level of detail of my reports diminishes over time.
The speakers were all excellent; it’s only your reporter that started to fade.
</Interlude>

Marius Junge, Exponential rates via Banach space tools

The key quantum information theory question addressed today is:
Why is $C_E = 2Q_E$? And no, the answer is not “super-dense coding and teleportation.” (At least not in this talk.) Note that the classic quantum papers on entanglement-assisted channel coding are quant-ph/0106052 and quant-ph/0106075.
First, Marius gave a nice review of classical information theory. In the spirit of Shannon, I will not repeat it here.
Then he reinterpreted it all in operator algebra language! For example, classical capacity can be interpreted as a commutative diagram!
$\begin{array}{ccc} L_1(A^{\otimes n}) & \overset{\mathcal{N}^{\otimes n}}{\longrightarrow} & L_1(B^{\otimes n}) \\ \uparrow \mathcal{E} & & \downarrow \mathcal{D} \\ \ell_1^N & \overset{\approx \mathrm{Id}}{\longrightarrow} & \ell_1^N \end{array}$
For the quantum capacity, we simply replace $\ell_1^N$ with $L_1(M_d)$.
To mix quantum and classical, we can define $C = M_{n_1} \oplus \cdots \oplus M_{n_k}$ in the spirit of quant-ph/0203105.
The resulting commuting diagram is:
$\begin{array}{ccc} L_1(A^{\otimes n}) & \overset{\mathcal{N}^{\otimes n}}{\longrightarrow} & L_1(B^{\otimes n}) \\ \uparrow (\mathrm{id} \otimes \mathcal{E}) & & \downarrow \mathcal{D} \\ L_1(C) & \overset{\approx \mathrm{Id}}{\longrightarrow} & L_1(C) \end{array}$
There is also an entanglement-assisted version that I won’t write down, but hopefully you can imagine.
Next, he introduced p-summing operators. The underlying principle is the following.

In a finite-dimensional Banach space, every unconditionally convergent series (i.e. one that converges even if its terms are arbitrarily permuted) is absolutely convergent. But in general (i.e. in infinite dimensions), this is not the case.

More formally,
$T: X \rightarrow Y$ is p-summing if
$\left( \sum_k \|T x_k\|_Y^p \right)^{\frac{1}{p}} \leq \pi_p(T) \sup_{\|(\alpha_k)\|_{p'} \leq 1} \left\| \sum_k \alpha_k x_k \right\|$
where $1/p + 1/p' = 1$.
$\pi_p(T)$ is the optimal (smallest) constant possible in this expression.
e.g. if $p=1$,
$\sum_k \|T x_k\| \leq \pi_1(T) \sup_{\epsilon_k = \pm 1} \left\| \sum_k \epsilon_k x_k \right\|.$
Why is this nice?
One use is the following factorization theorem due to Grothendieck and Pietsch.

If $T: \ell_\infty^m \rightarrow X$ is absolutely p-summing, then there exists a probability distribution $\lambda$ such that
$\|T(x)\|_X \leq \pi_p(T) \left( \sum_{i=1}^m \lambda_i |x_i|^p \right)^{1/p}$

Here is the connection to (classical) information theory. A noisy channel can be written as
$\Phi: \ell_1^m \rightarrow \ell_1^m$. The dual map is $\Phi^*: \ell_\infty^m \rightarrow \ell_\infty^m$. And the capacity is given by this amazing formula!

$C(\Phi) = \lim_{p\rightarrow\infty} p \ln \pi_p(\Phi^*)$

The following innocuous observation will have powerful consequences: $\pi_p(T^{\otimes n}) = \pi_p(T)^n$.
One consequence is a strong converse: let $\alpha > C(\Phi)$ and $N \geq e^{\alpha n}$.
Then $P_{\rm succ} \leq e^{-\epsilon n}$ for some $\epsilon > 0$.
To prove this we also need the fact that $\pi_1(S: \ell_\infty^N \rightarrow X) \leq N^{1-1/p} \pi_p(S)$, which implies that the success probability is
$\leq \pi_p(\Phi^*)^n / N^{1/p} \approx e^{C(\Phi) n/p} / N^{1/p} \leq e^{-(\alpha - C(\Phi)) n/p},$
using the capacity formula above and $N \geq e^{\alpha n}$.
Next, Marius talked about how their formalism applies to the entanglement-assisted capacity of quantum channels. Again there is a simple formula

$C_E(\mathcal{N}) = \lim_{p\rightarrow\infty} \pi_p^o(\mathcal{N}^*)$

What is $\pi_p^o$?
Let $\mathcal{N}^*$ map from $M_m$ to $M_k$. Then $\pi_p^o(T) = \inf \|a\|_p\, |||S|||$, where the inf is over all $a, S$ satisfying $T(x) = S(a^* x a)$.
There is another expression also for limited entanglement assistance, which was considered operationally in quant-ph/0402129.
Ultimately, there is an answer for the question at the start of the talk. The classical capacity is twice as big because $\dim M_d = d^2$ while $\dim \ell_1^d = d$. Obviously! 🙂
There is also the promise of an intriguing new additivity violation in the limited-entanglement setting, although I admit that the exact details eluded me.

Yi-Kai Liu, Universal low-rank matrix recovery from Pauli measurements, based on 1103.2816

Previous work on compressed tomography established that
for any rank-r density matrix $\rho$, $O(r d \log^2 d)$ random Paulis suffice to reconstruct $\rho$ with high probability.
This work establishes that $O(r d \log^6 d)$ random Paulis work simultaneously for all $\rho$. This also gives better error bounds in the noisy case.
As a result, one can obtain bounds on the sample complexity, instead of just the number of measurement settings.
The technical statement that we need is that random Paulis satisfy the following restricted isometry property (RIP):

For all X with rank $\leq r$,
$(1-\delta) \|X\|_{S_2} \leq \|R(X)\|_{\ell_2} \leq (1+\delta) \|X\|_{S_2}$

Gaussian matrices are known to have this property [Recht, Fazel, Parrilo 2007].
More concretely, let $\Omega$ be a random set of $m = O(r d \log^6 d)$ Pauli matrices.
Define $R(\rho) = (\mathrm{tr}\, P\rho)_{P \in \Omega}$.
Measurements produce $b \approx R(\rho)$.
The reconstruction algorithm is to solve:

$\mathrm{argmin}_{X \geq 0}\ \|R(X) - b\|_2^2 + \mu\, \mathrm{tr}|X|$
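Here is a minimal numerical sketch of this reconstruction (my own toy version, using a generic proximal-gradient solver on noiseless data; the paper of course proves much more). Since distinct Pauli matrices are Hilbert-Schmidt orthogonal, $R^*R$ is $d$ times a projector, which makes the step size easy to choose.
```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 4, 1
d = 2**n
m = 120                                      # toy number of settings (m << d^2 = 256)

paulis = [np.eye(2, dtype=complex),
          np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]

def pauli_string(k):                         # k-th n-qubit Pauli, via base-4 digits
    P = np.array([[1.0 + 0j]])
    for _ in range(n):
        P = np.kron(P, paulis[k % 4]); k //= 4
    return P

settings = [pauli_string(k) for k in rng.choice(4**n, size=m, replace=False)]

G = rng.normal(size=(d, r)) + 1j * rng.normal(size=(d, r))
true = G @ G.conj().T; true /= np.trace(true).real   # random rank-r density matrix

R  = lambda X: np.array([np.trace(P @ X).real for P in settings])  # sampling map
Rt = lambda v: sum(vi * P for vi, P in zip(v, settings))           # its adjoint
b  = R(true)                                                       # noiseless data

# Proximal gradient on ||R(X)-b||_2^2 + mu tr(X) over X >= 0.  Distinct Paulis
# are orthogonal, so R*R = d * (projector) and eta = 1/(2d) is a safe step.
X, eta, mu = np.zeros((d, d), dtype=complex), 1 / (2 * d), 1e-4
for _ in range(300):
    W = X - eta * Rt(R(X) - b)
    w, V = np.linalg.eigh(W)
    X = (V * np.maximum(w - eta * mu, 0)) @ V.conj().T  # prox: PSD eigenvalue shrinkage
print("recovery error:", np.linalg.norm(X - true))      # should be small at these sizes
```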

Why does this work?
The set of X with $\mathrm{tr}\, X \leq 1$ is a ball, and low-rank states sit at its exposed points.
So when we intersect with some generic hyperplane $R(X) = b$, we’re likely to have a unique solution.
More formally, let $\rho$ be the true state and $S = X - \rho$. Note that $R(S) = 0$. We want to show $S = 0$. Decompose $S = S_0 + S_c$, where $S_0$ has rank $\leq 2r$ and $S_c$ has no overlap with the row or column spaces of $\rho$.
If X has minimum trace, then $\|S_c\|_1 \leq \|S_0\|_1$.
Then we can use the RIP to show that $S_0$ and $S_c$ are both small, using a clever telescoping technique due originally to Candes and Recht.
Ok, so how do we prove the RIP? The idea is that R should be well-conditioned, and be “incoherent” so that its operator norm is much less than its 2-norm.
Recht et al ’07 used a union bound over (a net for) rank-r matrices. This works because Gaussians have great concentration. But Paulis are pricklier.
This work: use generic chaining (a la Rudelson and Vershynin). This requires proving bounds on covering numbers, which is done using entropy duality (cf. Guedon et al 2008).
Here’s a little more detail. If T is a self-adjoint linear map from $M_d$ to $M_d$, then define
$\|T\|_{(r)} = \sup\{ |\mathrm{tr}\, X^\dagger T(X)| : X \in U \}$, where
$U = \{ X \in M_d : \|X\|_2 \leq 1,\ \mathrm{rank}(X) \leq r \}$
The goal is to show $\|R^*R - I\|_{(r)} \leq 2\delta - \delta^2$, where $\delta$ comes from the RIP condition.
The main tool is Dudley’s inequality:
$\mathbb{E}[\sup\{ G(X) : X \in U \}] \leq \mathrm{const} \int_0^\infty \mathrm{d}\epsilon\, \sqrt{\log N(U, d_G, \epsilon)}$
Here G is a Gaussian process with $d_G(X,Y) = \sqrt{\mathbb{E}[(G(X)-G(Y))^2]}$ and $N(U, d_G, \epsilon)$ is the number of radius-$\epsilon$ balls in the metric $d_G$ needed to cover U.
We can upper bound N using the trace norm. Let $B_1$ denote the trace-norm ball.
Define $\|M\|_X = \max_{P \in \Omega} |\mathrm{tr}\, P^\dagger M|$.
There are two estimates of N. The easy one is that
$N(B_1, \|\cdot\|_X, \epsilon) \leq \mathrm{poly}(1/\epsilon) \exp(d^2)$
The harder one is that
$N(B_1, \|\cdot\|_X, \epsilon) \leq \exp(\log^2(d) / \epsilon^2)$.
This is obtained using entropy duality, with arguments that are somewhat specific to the spaces in question, using techniques of Maurey. See paper (and references 🙂 ) for details.

Matthias Christandl, Reliable quantum state tomography, based on 1108.5329

The central question:

How do you reconstruct a density matrix, with error bars, from measurements?

One framework is “measure and predict.”
We have n+k systems; we measure n of them (like a training set) and then predict the outcomes of measurements on the remaining k (like a testing set).
The method:
First compute the likelihood function
$\mu_{B^n}(\sigma) = \frac{1}{c_{B^n}} \mathrm{tr}[\sigma^{\otimes n} B^n]$
From this we can compute a confidence region, point estimates, error bars, and what have you.
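As a toy illustration of this pipeline (my own sketch, not the construction from 1108.5329, which additionally enlarges the region by $\delta$ as described next), here is a likelihood computed over a grid of single-qubit Bloch vectors from i.i.d. random Pauli measurements.
```python
import numpy as np

rng = np.random.default_rng(1)
r_true = np.array([0.3, 0.0, 0.5])   # unknown qubit state, as a Bloch vector
n = 200

# Each copy: measure a random Pauli axis; outcome +1 with probability (1 + r_a)/2.
axes = rng.integers(3, size=n)
signs = np.where(rng.random(n) < (1 + r_true[axes]) / 2, 1, -1)

# Log-likelihood of the record: sum_i log[(1 + s_i r_{a_i})/2], evaluated on a
# grid of Bloch vectors sigma = (I + r . vec_sigma)/2 inside the Bloch ball.
g = np.linspace(-0.95, 0.95, 39)
pts, lls = [], []
for rx in g:
    for ry in g:
        for rz in g:
            r = np.array([rx, ry, rz])
            if r @ r >= 1:
                continue
            lls.append(np.sum(np.log((1 + signs * r[axes]) / 2)))
            pts.append(r)
lls = np.array(lls)

w = np.exp(lls - lls.max()); w /= w.sum()    # normalized likelihood on the grid
order = np.argsort(-w)
k = int(np.searchsorted(np.cumsum(w[order]), 0.99)) + 1
print("max-likelihood grid point:", pts[order[0]])
print("grid points holding 99% of the likelihood mass:", k)
```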
How do we compute a $1-\epsilon$ confidence region?
Let $\epsilon' = \epsilon/\mathrm{poly}(n)$, and let $\Gamma$ have measure $1-\epsilon'$. Then add an extra distance $\delta = \sqrt{\frac{2}{n}\left(\ln\frac{2}{\epsilon} + O(\ln n)\right)}$ to obtain $\Gamma^\delta$.
Then the main result is that the state is in $\Gamma^\delta$ with probability $\geq 1-\epsilon$. Crucially, the probability is taken over measurement outcomes, and it is important to realize that the outputted confidence region is itself a random variable. So one cannot say that, conditioned on the measurement outcome, we have a high probability of the state being in the confidence region. Without a prior distribution over states, such statements are impossible.
A key lemma in the proof (due to 0806.1091, and applied to QKD in 0809.3019) is that for any $\sigma$, $\sigma^{\otimes n} \leq n^{O(d^2)} \int \mathrm{d}\rho\, \rho^{\otimes n}$, where $\mathrm{d}\rho$ is the Hilbert-Schmidt measure.
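Out of curiosity, here is a quick Monte Carlo sanity check of this lemma for qubits (my own toy check with n=3, d=2, far from the asymptotic regime): the smallest constant c with $\sigma^{\otimes n} \leq c \int \mathrm{d}\rho\, \rho^{\otimes n}$ comes out modest for pure states, consistent with an $n^{O(d^2)}$ prefactor.
```python
import numpy as np

rng = np.random.default_rng(4)
d, n, samples = 2, 3, 20000

def hs_random_state():
    # Hilbert-Schmidt-random density matrix: G G^dagger / tr, with G square Ginibre.
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    W = G @ G.conj().T
    return W / np.trace(W).real

def kron_power(A, n):
    out = A
    for _ in range(n - 1):
        out = np.kron(out, A)
    return out

# Monte Carlo estimate of M = integral of rho^{otimes n} over the HS measure.
M = sum(kron_power(hs_random_state(), n) for _ in range(samples)) / samples

# For pure sigma = |psi><psi|, the smallest c with sigma^{otimes n} <= c M
# is <psi^{otimes n}| M^{-1} |psi^{otimes n}>.
for _ in range(3):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    v /= np.linalg.norm(v)
    vn = kron_power(v, n)
    c = (vn.conj() @ np.linalg.solve(M, vn)).real
    print("smallest valid c for a random pure state:", round(c, 1))
```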

John Preskill, Protected gates for superconducting qubits

This talk was an award lecture for the award that goes by the charmingly Frenglish name of “chaire Aisenstadt chair”. John was introduced by a poem by Patrick!
I enjoyed John’s two opening questions that he imagined the audience would have after seeing the title: “What does this have to do with this conference?” and “Will I understand anything?”
His response was that we shouldn’t worry, since this is a theorist’s conception of superconducting qubits.
Unfortunately, my note-taking quality suffered during this talk, since there was a high density of equations, figures and ideas. So my summary will be breezier. This may become the norm for the remaining days of the conference as well. However, here is an older version of this talk.
As we all know, it’d be great to have a quantum computer. Instead of concatenated FTQC, which has lousy constants, what about physically motivated QECCs? One example is the braiding of nonabelian anyons. Here’s another, less studied one: encoding qubits in harmonic oscillators [“Continuous variable quantum codes”, Gottesman-Kitaev-Preskill 2000].
The rest of the talk was about a particular variant of this idea, called the 0-pi qubit (due to Kitaev). I have more detailed notes, but they are really rough, and I am saving my energy for the upcoming sessions. In other words, the margin is big enough for the proof, but my glucose level is not.

4 Replies to “Codes, Geometry and Random Structures: Day 1”

  1. Regarding the (outstandingly interesting) work of Preskill and colleagues on oscillator-based systems for quantum computing, fans of technological history may enjoy von Neumann’s description of oscillator-based computation in his patent Non-linear capacitance or inductance switching, amplifying and memory organs (US 2815488, filed 1954 – issued 1957).
    Von Neumann’s patent description begins by surveying the practical disadvantages of vacuum tube circuits, and although the patent recognizes that the new-fangled “crystal triodes” (transistors) have some advantageous attributes, it proposes instead computational elements that are explicitly Hamiltonian, thus microscopically time-reversible, and thereby inherently more efficient than dissipative transistor-based computing. Moreover, at the same time as von Neumann, Eiichi Goto in Japan was developing a model of (essentially reversible) computational dynamics along very similar lines.
    To trace the subsequent history of the seminal von Neumann/Goto ideas, starting with the Japanese “parametron” computers of the 1960s, up to and including today’s D-WAVE devices and nonlinear optical computing experiments, would be a talk in itself, and a very interesting talk at that.
    It is regrettable that von Neumann’s untimely death interrupted further development of these ideas (the patent issued in the same year as his death), as rather plausibly the key ideas of quantum computing might have been conceived decades earlier had this work continued.

  2. Thank you for this.
    “One that I think is natural is that quantum circuits of size $n^k$ given access to a black-box unitary U can’t tell whether U was drawn from the Haar measure on the full unitary group or from an $n^{O(k)}$-design.”
    Can you compare this to classical results for pseudorandom generators fooling different classes of functions?
    “The rest of the talk was about a particular variant of this idea, called the 0-pi qubit”
    This is not a new idea, and superconducting qubits seem well advanced. Why has it not been implemented yet? Is 3-d the problem?

  3. Anon #1: Classically we have similar results about k-wise-independent permutations fooling circuits of size << k. They also fool quantum circuits of that size, in the sense that they are indistinguishable from uniformly random permutations. So the significance of unitary t-designs is that they're indistinguishable from uniformly random unitaries.
    And yes, I totally neglected to describe the substance of John's talk, which was about how to perform protected gates on these superconducting qubits.
    And Devoret _has_ implemented a 0-pi qubit, with 43 Josephson junctions. The difficulty is that you need to add even more junctions to get a high enough inductance for good error suppression, but then you run into fabrication errors, and capacitance creeps up (and error suppression goes like $\exp(-\sqrt{L/C})$).
