# Codes, Geometry and Random Structures: Day 1

I now appreciate the difficulty of taking notes in real time! Here is my “liveblogging” of the first day.

Codes, Geometry and Random Structures

The first talk is by Fernando Brandao, who’s talking about our joint paper (also with Michal Horodecki) titled Random quantum circuits are approximate polynomial-designs. “Random quantum circuits” means choosing poly(n) random two-qudit gates between nearest neighbors of a line of n qudits. (The hardest case is qubits, since increasing the local dimension increases the randomizing power of the local gates.) An (approximate) t-design is a distribution over U(dn) that has (approximately) the same first t moments as the Haar measure on U(dn). (Technically by tth moment, we mean polynomials of degree t in entries of U and degree t in U*.)

Exact t-designs are finicky combinatorial objects, and we only know how to construct them efficiently when t is 1 or 2 (Paulis are 1-designs and Cliffords are 2-designs). But for a long time, the only approximate t-designs we could construct were also only for t=1 or 2, and the only progress was to reduce the polynomial cost of these designs, or to connect them with plausible natural models of random circuits. In the last few years, the three of us (together with Richard Low), found a construction of efficient t-designs on n qubits for $t\leq O(n/\log n)$, and found that polynomial-size random circuits give approximate 3-designs.

So how do we get t up to poly(n) in our current paper? There are four technical ingredients.

1. As with classical random walks, it’s useful to think about quantum random circuits in terms of the spectral gaps of certain Hermitian matrices. The matrices we consider have dimension d2nt, and we hope to show that their spectral gap is at least 1/poly(n,t). For more on this, see my earlier work (with Matt Hastings) on tensor product expanders, or the Hamiltonian-based formalism of Znidaric and Brown-Viola
2. Using a version of path coupling for the unitary group due to Oliveira, we can show that random circuits of exponential length (i.e. poly(dn) gates) are t-designs for all t. In other words, the resulting distribution over the unitary group is approximately uniform in whatever natural distance measure you like (for us, we use Wasserstein (earthmover) distance). This is what we intuitively expect, since constructing an arbitrary unitary requires poly(dn)gates, so one might guess that applying a similar number of random gates would give something approximately uniform.
3. This means that random circuits on $O(\log n)$ qudits are rapidly mixing, which translates into a statement about the gaps of some corresponding Hamiltonians. We would like to extend this to a statement about the gaps for n qudits. This can be achieved by a theorem of Nachtergaele.
4. For this theorem to apply, we need the certain projectors to approximately commute. This involves a technical calculation of which the key idea is that the t! permutations of t D-dimensional systems are approximately orthogonal (according to the Hilbert-Schmidt inner product) when $t \ll \sqrt{D}$. Here t comes from the number of moments we are trying to control (i.e. we want a t-design) and D is the dimension of the smaller block that we know we have good convergence on. In this case, the block has O(log n) qudits, so D = poly(n). If we choose the constant in the O(log n) right, then D will dominate t and the overall circuit will be a t-design.

Whenever I talk about t designs, I get a lot of skeptical questions about applications. One that I think is natural is that quantum circuits of size nk given access to a black box unitary U can’t tell whether U was drawn from the Haar measure on the full unitary group, or from a
nO(k) design. (This is described in our paper.) The proof of this is based on another general application, which is that t designs give concentration of measure, similar to what you get from uniformly random unitaries, but with the probability of a quantity being far from its expectation decreasing only as $D^{-t}$, where D is the dimension of the system.

A quantum generalisation of Fourier analysis on the boolean cube

based mostly on his nice paper on this topic with Tobias Osborne.

In theoretical computer science, Fourier analysis of boolean functions is a powerful tool. One good place to learn about this are these lecture notes of Ryan O’Donnell. There are also good surveys by Punya Biswal and Ronald De Wolf. The main idea is that a function f from $\{-1,1\}^n$ to $\mathbb{R}$ can be expanded in the Fourier basis for $\mathbb{Z}_2^n$. This is equivalent to expressing f as a multilinear form, or in quantum language, we might think of f as a 2n-dimensional vector and apply $H^{\otimes n}$. If f is a multilinear function, then it has a degree, given by the size of the largest monomial with a nonzero coefficient. Alternatively, we can ask for the maximum Hamming weight of any state that has nonzero amplitude after we apply $H^{\otimes n}$.

Here is a small sample of the nice things we know classically.

• KKL Theorem: Any boolean function f has some j for which $I_j(f) = \Omega({\rm Var}(f)\log(n)/n)$.
(Spoiler alert: no quantum analogue is known, but proving one is a great open problem.)
• Hypercontractive bounds: Define a noise operator $D_\rho$ that flips each bit with probability $\frac{1-\rho}{2}$. Define the p-norm
$\|f\|_p = \left( \frac{1}{2^n} \sum_{x\in\{-1,1\}^n} |f(x)|^p\right)^{1/p}$
Then the hypercontractive inequality states that
$\|D_\rho(f)\|_q \leq \|f\|_p$
if $1 \leq p \leq q$ and $\rho \leq \sqrt{\frac{p-1}{q-1}}$
• One application is that degree-d functions satisfy
$\|f\|_q \leq (q-1)^{d/2} \|f\|_2$.

Boolean functions are replaced by Hermitian matrices of dimension $2^n$. The Fourier expansion is replaced by a Pauli expansion. The noise operator is replaced by a depolarizing channel acting on every qubit.

With these replacements, a hypercontractive inequality can still be proven, albeit with the restriction that $p\leq 2\leq q$. The classical argument does not entirely go through, since at one point it assumes that $p\rightarrow q$ norms are multiplicative, which is true for matrices but not superoperators . Instead they use results by King that appear to be specialized to qubits.

As a result, there are some nice applications to k-local Hamiltonians. For example, if $\|H\|_2=1$ and $t\geq (2e)^{k/2}$, then the fraction of eigenvalues of H greater than t is less than $e^{-kt^{2/k}/2e}$. And if H is nonzero then it has rank $\geq 2^n e^{-2k}$.

The FKN theorem also carries over, implying that if the weight of Fourier coefficients on (2+)-local terms is no more than $\epsilon$ then the operator must be $O(\epsilon)$-close to a single-qubit operator in 2-norm distance.

There are a number of great open questions raised in this work. Ashley didn’t mention this, but one of those open questions led to our joint work arXiv:1001.0017, which has been a lot of fun to work on. A quantum KKL theorem is another one. Here is another. Suppose $H^2=I$, and that H acts nontrivially on each qubit. Then does it hold that the degree of H is always at least $\Omega(\log n)$?

<Interlude>
Live-blogging is hard work! Even semi-live blogging is.
You’ll notice the level of details of my reports diminishes over time.
The speakers were all excellent; it’s only your reporter that started to fade.
</interlude>

Marius Junge, Exponential rates via Banach space tools

The key quantum information theory question addressed today is:
Why $C_E = 2Q_E$? And no, the answer is not “super-dense coding and teleportation.” (At least not in this talk.) Note that the classic quantum papers on entanglement-assisted channel coding are quant-ph/0106052 and quant-ph/0106075.

First, Marius gave a nice review of classical information theory. In the spirit of Shannon, I will not repeat it here.

Then he reinterpreted it all in operator algebra language! For example, classical capacity can be interpreted as a commutative diagram!
$\begin{array}{ccc} L_1(A^{\otimes n}) & \overset{{{\cal N}^{\otimes n}}}{\longrightarrow} & L_1(B^{\otimes n}) \\ \uparrow {\cal E} & &\downarrow {\cal D} \\ \ell_1^N & \overset {\approx {\rm Id}}{\longrightarrow} & \ell_1^N \end{array}$

For the quantum capacity, we simply replace $\ell_1^N$ with $L_1(M_d)$.

To mix quantum and classical, we can define $C = M_{n_1} \oplus \cdots \oplus M_{n_k}$ in the spirit of quant-ph/0203105.

The resulting commuting diagram is:
$\begin{array}{ccc}L_1(A^{\otimes n}) &\overset{{{\cal N}^{\otimes n}}}{\longrightarrow}& L_1(B^{\otimes n}) \\ \uparrow ({\rm id} \otimes {\cal E}) & &\downarrow {\cal D} \\ L_1(C) & \overset {\approx {\rm Id}}{\longrightarrow} & L_1(C) \end{array}$

There is also an entanglement-assisted version that I won’t write down, but hopefully you can imagine.

Next, he introduced p-summing operators. The underlying principle is the following.

In a finite-dimensional Banach space, every unconditionally convergent sequence (i.e. converges even if arbitrarily permuted) is absolutely summing. But in general, this is not the case.

More formally,
$T:X \rightarrow Y$ is p-summing if
$\left ( \sum_k \|T x_k\|_Y^p \right)^{\frac{1}{p}} \leq \pi_p(T)\sup_{\|(\alpha_k)\|_{p'}\leq 1} \left\| \sum_k \alpha_k x_k \right\|$
where $1/p + 1/p' = 1$.

$\pi_p(T)$ is the optimal constant possible in this expression.

e.g. if p=1,
$\sum_k \|T x_k\| \leq \pi_1(T) \sup_{\epsilon_k = \pm 1} \|\sum_k \epsilon_k x_k\|$.

Why is this nice?

One use is the following factorization theorem due to Grothendieck and Pietch.

If $T: \ell_\infty^m \rightarrow X$ is absolutely p-summing, then there exists a probability distribution $\lambda$ such that
$\|T(x)\|_X \leq \pi_p(T) \left(\sum_{i=1}^m \lambda_i |x_i|_p \right)^{1/p}$

Here is the connection to (classical) information theory. A noisy channel can be written as
$\Phi: \ell_1^m \rightarrow \ell_1^m$. The dual map is $\Phi^*: \ell_\infty^m \rightarrow \ell_\infty^m$. And the capacity is given by this amazing formula!

$C(\Phi) = \lim_{p\rightarrow\infty} p \ln \pi_p(\Phi^*)$

The following innocuous observation will have powerful consequences: $\pi_p(T^{\otimes n}) = \pi_p(T)^n$.

One consequence is a strong converse: Let $\alpha > C(\Phi), N \geq e^{\alpha n}$.
Then $P_{\rm succ} \leq e^{-\epsilon n}$.

To prove this we also need the fact that $pi_1(S: \ell_\infty^N \rightarrow X) \leq N^{1-1/p} \pi_p(S)$, which implies that the success probability is
$\leq \pi_p(T)^n / N^{1/p} \leq e^{\alpha n /p} / N^{1/p}$.

Next, Marius talked about how their formalism applies to the
entanglement-assisted capacity of quantum channels. Again there is a simple formula

$C_E({\cal N}) = \lim_{p\rightarrow \infty} \pi_p^o ({\cal N}^*)$

What is $\pi_p^o$?
Let ${\cal N}^*$ map from $M_m$ to $M_k$. Then $\pi_p^o(T) = \inf \|a\|_p |||S|||$, where the inf is over all a, S satisfying $T(x) = S(a^* x a)$.

There is another expression also for limited entanglement assistance, which was considered operationally in quant-ph/0402129.

Ultimately, there is an answer for the question at the start of the talk. The classical capacity is twice as big because $\dim M_d = d^2$ and $\dim \ell_1^d = d$. Obviously!

There is also the promise of an intriguing new additivity violation in the limited-entanglement setting, although I admit that the exact details eluded me.

Yi-Kai Liu,
Universal low-rank matrix recovery from Pauli measurements, based on 1103.2816

Previous work on compressed tomography established that
for any rank-r density matrix $\rho$, $O(rd \log^2 d)$ random Paulis suffice to reconstruct $\rho$ with high probability.

This work establishes that $O(r d \log^6(d))$ random Paulis work simultaneously for all $\rho$. This also gives better error bounds for noisy behavior.
As a result, one can obtain bounds on the sample complexity, instead of just the number of measurement settings.

The technical statement that we need is that random Paulis satisfy the following restricted isometry property (RIP):

For all X with rank $\leq r$,
$(1-\delta) \|X\|_{S_2} \leq \|R(X)\|_{\ell_2} \leq (1+\delta) \|X\|_2$

Gaussian matrices are known to have this property [Recht, Fazel, Parillo 2007].

More concretely, let $\Omega$ be a random set of $m = O(r d \log^6 d)$ random Pauli matrices.
Define $R(\rho) = ({\rm tr} P\rho)_{P\in \Omega}$.
Measurements produce $b\approx R(X)$

The reconstruction algorithm is to solve:

${\rm argmin}_{X\geq 0} \|R(X)-b\|_2^2 + \mu {\rm tr} |X|$

Why does this work?
The set of X with ${\rm tr} X \leq 1$ is a ball, and low-rank states are on exposed.
So when we intersect with some generic hyperplane R(X)=b, we’re likely to have a unique solution.

More formally, let $\rho$ be the true state and $S = X-\rho$. Note that R(S)=0. We want to show S=0. Decompose $S = S_0 + S_c$, where $S_0$ has rank $\leq 2r$ and $S_c$ has no overlap with row or column spaces of $\rho$.
If X has minimum trace, then $\|S_c\|_1 \leq \|S_0\|_1$.

Then we can use RIP to show that $S_0$ and $S_c$ are both small, using a clever telescoping technique due originally to Candes and Recht.

Ok, so how do we prove the RIP? The idea is that R should be well conditioned, and be “incoherent” so that it’s operator norm is much less than its 2-norm.

Recht et al ’07 used a union bound over (a net for) rank-r matrices. This works because Gaussians have great concentration. But Paulis are pricklier.
This work: Use generic chaining (a la Rudelson and Vershynin). This requires proving bounds on covering numbers, which will be done using entropy duality (c.f. Guedon et al 2008).

Here’s a little more detail. If T is a self-adjoint linear map from $M_d$ to $M_d$, then
define $\|T\|_{(r)} = \sup \{|tr X^\dag T(X)| : X \in U\}$, where
$U = \{X in M_d : \|X\|_2\leq 1, {\rm rank}(X) \leq r\}$

The goal is to show $\|R*R-I\|_{(r)} \leq 2 \delta - \delta^2$, where $\delta$ comes from the RIP condition.

The main tool is Dudley’s inequality:
$\mathbb{E}[\sup G(X) : X \in U] \leq {\rm const} \int {\rm d}\epsilon \sqrt{\log(N(U,d_G, \epsilon))}$

Here G is a Gaussian process with $d_G(X,Y) = \sqrt{\mathbb{E}((G(X)-G(Y))^2)}$ and $N(U,d_G,\epsilon)$ is the # of radius-$\epsilon$ balls in metric $d_G$ needed to cover U.

We can upper bound N using the trace norm. Let $B_1$ denote the trace-norm ball.
Define $\|M\|_X = \max_{P \in \Omega} |tr P^\dag M|$.

There are two estimates of N. The easy one is that
$N(B_1, \|\cdot\|_X, \epsilon) \leq poly(1/\epsilon) \exp(d^2)$
The harder one is that
$N(B_1, \|\cdot\|_X, \epsilon) \leq \exp(\log^2(d) / \epsilon^2)$.

This is obtained using entropy duality, with arguments that are somewhat specific to the spaces in question, using techniques of Maurey. See paper (and references ) for details.

Matthias Christandl
Reliable quantum state tomography, based on 1108.5329

The central question:

How do you reconstruct a density matrix, with error bars, from measurements?

One framework is “measure and predict.”
We have n+k systems, and measure n (like a training set) and then predict the results of measurement outcomes from the next k (like a testing set).

The method:
First compute the likelihood function
$\mu_{B^n}(\sigma) = \frac{1}{c_{B^n}} {\rm tr}[\sigma^{\otimes n}B^n]$
From this we can compute a confidence region, point estimates, error bars, and what have you.

How to compute a $1-\epsilon$ confidence region?
Let $\epsilon' = \epsilon/poly(n)$, and let $\Gamma$ have measure $1-\epsilon'$. Then add an extra distance $\delta = \sqrt{(2/n) (\ln 2/\epsilon + O(\ln n))}$ to obtain $\Gamma^\delta.$

Then the main result is the state is in $\Gamma^\delta$ with probability $\geq 1-\epsilon$. Crucially the probability is taken over measurement outcomes, and it is important to realize that the outputted confidence interval is itself a random variable. So one cannot say that conditioned on the measurement outcome, we have a high probability of the state being in the confidence region. Without a prior distribution over states, such statements are impossible.

A key lemma in the proof (due to 0806.1091 and applied to QKD in 0809.3019) is that for any $\sigma$, $\sigma^{\otimes n}\leq n^{O(d^2)} \int {\rm d}\rho\, \rho^{\otimes n}$, where ${\rm d}\rho$ is the Hilbert-Schmidt measure.

John Preskill, Protected gates for superconducting qubits

This talk was an award lecture for the award that goes by the charmingly Frenglish name of “chaire Aisenstadt chair”. John was introduced by a poem by Patrick!

I enjoyed John’s two opening questions that he imagined the audience would have after seeing the title: “What does this have to do with this conference?” and “Will I understand anything?”

His response was that we shouldn’t worry, since this is a theorist’s conception of superconducting qubits.

Unfortunately, my note-taking quality suffered during this talk, since there was a high density of equations, figures and ideas. So my summary will be breezier. This may become the norm for the remaining days of the conference as well. However, here is an older version of this talk.

As we all know, it’d be great to have a quantum computer. Instead of concatenated FTQC, which has lousy constants, what about physically motivated QECCs? One example is the braiding of nonabelian anyons. Here’s another less studied one. Encoding qubits in harmonic oscillators ["Continuous variable quantum codes", Gottesman-Kitaev-Preskill 2000].

The rest of the talk was about a particular variant of this idea, called the 0-pi qubit (due to

I have more detailed notes, but they are really rough, and I am saving my energy for the upcoming sessions. In other words, the margin is big enough for the proof, but my glucose level is not.

This entry was posted in General. Bookmark the permalink.

### 4 Responses to Codes, Geometry and Random Structures: Day 1

1. Anonymous says:

Thank you for this.

“One that I think is natural is that quantum circuits of size nk given access to a black box unitary U can’t tell whether U was drawn from the Haar measure on the full unitary group, or from a
nO(k) design.”

Can you compare this to classical results for pseudorandom generators fooling different classes of functions?

“The rest of the talk was about a particular variant of this idea, called the 0-pi qubit”

This is not a new idea, and superconducting qubits seem well advanced. Why has it not been implemented yet? Is 3-d the problem?

Like or Dislike: 0  0

2. John Sidles says:

Regarding the (outstandingly interesting) work of Preskill and colleagues on oscillator-based systems for quantum computing, fans of technological history may enjoy von Neumann’s description of oscillator-based computation in his patent Non-linear capacitance or inductance switching, amplifying and memory organs (US 2815488, filed 1954 – issued 1957).

Von Neumann’s patent description begins by surveying the practical disadvantages of vacuum tube circuits, and although the patent recognizes that the new-fangled “crystal triodes” (transistors) have some advantageous attributes, it proposes instead computational elements that are explicitly Hamiltonian, thus microscopically time-reversible, and thereby inherently more efficient than dissipative transistor-based computing. Moreover, at the same time as von Neumann, Eiichi Goto in Japan was developing a model of (essentially reversible) computational dynamics along very similar lines.

To trace the subsequent history of the seminal von Neumann/Goto ideas, starting with the Japanese “parametron” computers of the 1960s, up-to and including today’s D-WAVE devices and nonlinear optical computing experiments, would be a talk in itself — a very interesting talk.

It is regrettable that von Neumann’s untimely death interrupted further development of these ideas (the patent issued in the same year as his death), as rather plausibly the key ideas of quantum computing might have been conceived decades earlier had this work continued.

Like or Dislike: 0  0

3. aram says:

Anon #1: Classically we have similar results about k-wise-independent permutations fooling circuits of size << k. They also fool quantum circuits of that size, in the sense that they are indistinguishable from uniformly random permutations. So the significance of unitary t-designs is that they’re indistinguishable from uniformly random unitaries.

And yes, I totally neglected to describe the substance of John’s talk, which was about how to perform protected gates on these superconducting qubits.
And Devoret _has_ implemented a 0-pi qubit, with 43 Josephson junctions. The difficulty is that you need to add even more to get high enough inductance to have good error suppression, but then you run into fabrication errors, and capacitance creeps up (and error suppression goes like exp(-sqrt(L/C))).

Like or Dislike: 0  0

4. 1. Will Patrick post the poem?

2. A commutative diagram during a talk. Holy shit.

Like or Dislike: 1  0