Should Papers Have Unit Tests?

Perhaps the greatest shock I’ve had in moving from the hallowed halls of academia to the workmanlike depths of everyday software development is the amount of testing that is done when writing code. Likely I’ve written more test code than non-test code over the last three plus years at Google. The most common type of test I write is a “unit test”, in which a small portion of code is tested for correctness (hey Class, do you do what you say?). The second most common type is an “integration test”, which attempts to test that the units working together are functioning properly (hey Server, do you really do what you say?). Testing has many benefits: correctness of code, of course, but it is also important for ease of changing code (refactoring), supporting decoupled and simplified design (untestable code is often a sign that your units are too complicated, or that your units are too tightly coupled), and more.
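To make the distinction concrete, here is a toy sketch in Python (the `Adder` and `Calculator` names are invented for illustration, not anything from a real codebase):

```python
import unittest

class Adder:
    """A small 'unit' with a simple contract: it adds numbers."""
    def add(self, a, b):
        return a + b

class Calculator:
    """A larger assembly that composes the Adder unit."""
    def __init__(self):
        self.adder = Adder()

    def sum_all(self, values):
        total = 0
        for v in values:
            total = self.adder.add(total, v)
        return total

class AdderTest(unittest.TestCase):
    def test_add(self):
        # Unit test: hey Class, do you do what you say?
        self.assertEqual(Adder().add(2, 3), 5)

class CalculatorTest(unittest.TestCase):
    def test_sum_all(self):
        # Integration test: hey Server, do the units really work together?
        self.assertEqual(Calculator().sum_all([1, 2, 3]), 6)

if __name__ == "__main__":
    unittest.main(argv=["calculator_test"], exit=False, verbosity=0)
```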
Over the holiday break, I’ve been working on a paper (old habit, I know) with lots of details that I’d like to make sure I get correct. Throughout the entire paper writing process, one spends a lot of time checking and rechecking the correctness of the arguments. And so the thought came to my mind while writing this paper, “boy it sure would be easier to write this paper if I could write tests to verify my arguments.”
In a larger sense, all papers are a series of tests: small arguments convincing the reader of the veracity or likelihood of the claims at hand. Testing in a programming environment has one vital distinction, though: the tests are automated, with the added benefit that you can run them often as you change code and gain confidence that the contracts enforced by the tests have not been broken. But perhaps there would be a benefit to writing a separate section with “unit tests” for different portions of the main argument in a paper. Such unit test sections could be small, self-contained, and serve as supplemental reading to help a reader gain confidence in the claims of the main text.
I think some of the benefits of having a section of “unit tests” in a paper would be:

  • Documenting limit tests A common trick of the trade in physics papers is to take a parameter to a limiting value to see how the equations behave. Often one can recover known results in such limits, or show that certain relations hold under suitable rescalings. These types of arguments give you confidence in a result, but are often left out of papers. This is sort of kin to the edge case testing done by programmers.
  • Small examples When a paper gets abstract, one often spends a lot of time trying to ground oneself by working with small examples (unless you are Grothendieck, of course.) Often one writes a paper by interjecting these examples in the main flow of the paper, but these sorts of examples would fit more naturally in a unit testing section.
  • Alternative explanation testing When you read an experimental physics paper, you often wonder: am I really supposed to believe the effect they are talking about? Often large portions of the paper are devoted to trying to settle such arguments, but when you listen to experimentalists grill each other you find that there is an even further depth to these arguments. “Did you consider that your laser is actually exciting X, and all you’re seeing is Y?” The amount of this that goes on is huge, and sadly, not documented for the greater community.
  • Combinatorial or property checks Often one finds oneself checking that a result works by doing something like counting instances to check that they sum to a total, or that a property holds before and after a transformation (an invariant). While these are useful for providing evidence that an argument is correct, they can often feel a bit out of place in a main argument.

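A couple of these bullets can already be made executable for simple cases. Here is a sketch of a “limit test” and a “combinatorial check” in runnable form, using relativistic kinetic energy and binomial coefficients as stand-in results (my own illustrative choices, not from any particular paper):

```python
import math

def relativistic_ke(m, v, c=1.0):
    """Relativistic kinetic energy m*c^2*(gamma - 1)."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return m * c**2 * (gamma - 1.0)

# Limit test: as v -> 0 we should recover Newton's (1/2) m v^2.
m, v = 1.0, 1e-4
assert math.isclose(relativistic_ke(m, v), 0.5 * m * v**2, rel_tol=1e-6)

# Combinatorial check: the binomial coefficients in row n sum to 2^n.
n = 10
assert sum(math.comb(n, k) for k in range(n + 1)) == 2**n
```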
Of course it would be wonderful if there were a way that these little “units” could be automatically executed. But the best path I can think of right now towards getting to that starts with the construction of an artificial mind. (Yeah, I think perhaps I’ve been at Google too long.)

Goodbye Professor Tombrello

This morning I awoke to the horrible news that Caltech Physics Professor Tom Tombrello had passed away. Professor Tombrello was my undergraduate advisor, my research advisor, a mentor, and, most importantly a friend. His impact on me, from my career to the way I try to live my life, was profound.
Because life is surreal, just a few days ago I wrote this post that describes the event that led Professor Tombrello and me down entwined paths: my enrollment in his class Physics 11. Physics 11 was a class about how to create value in the world, disguised as a class about how to do “physics” research as an undergraduate. Indeed, in my own life, Professor Tombrello’s role was to make me think really really hard about what it meant to create. Sometimes this creation was in research, trying to figure out a new approach or even a new problem. Sometimes this creation was in a new career, moving to Google to be given the opportunity to build high impact creations. I might even say that this creation extends into the far reaches of Washington state, where we helped bring about the creation of a house most unusual.
There are many stories I remember about Professor Tombrello. From the slightly amusing, like the time after the Northridge earthquake when an aftershock shook our class while he was practicing his own special brand of teaching, and we all just sort of sat still until we heard his assistant, Michelle, shout out “That’s it! I’m outta here!” and go storming out. To the time I talked with him following the loss of one of his family members, and could see the profound sadness even in a man who pushed optimistically forward at full speed.
Some portraits:

After one visit to Professor Tombrello, I actually recorded my thoughts on our conversation:

This blog post is for me, not for you. Brought to you by a trip down memory lane visiting my adviser at Caltech.
Do something new. Do something exciting. Excel. Whether the path follows your momentum is not relevant.
Don’t dwell. Don’t get stuck. Don’t put blinders on.
Consider how the problem will be solved, not how you are going to solve it.
Remember Feynman: solve problems.
Nothing is not interesting, but some things are boring.
Dyson’s driving lesson: forced intense conversation to learn what the other has to say.
Avoid confirmatory sources of news, except as a reminder of the base. Keep your ear close to the brains: their hushed obsessions are the next big news.
Learn something new everyday but also remember to forget the things not worth knowing.
Technically they can do it or they can’t, but you can sure help them do it better when they can.
Create. Create. Create.
Write a book, listen to Sandra Tsing Loh, investigate Willow Garage, and watch Jeff Bezos to understand how to be a merchant.
Create. Create. Create.

So tonight, I’ll have a glass of red wine to remember my professor, think of his family, and the students to whom he meant so much. And tomorrow I’ll pick myself up, and try to figure out just what I can create next.

Sailing Stones: Mystery No More

My first research project, my first research paper, was on a perplexing phenomenon: the sliding rocks of Death Valley’s Racetrack playa. Racetrack playa is a large desolate dry lake bed that has one distinguishing feature above and beyond its amazing flatness. At the south end of the playa are a large number of large rocks (man-sized and smaller), and behind these rocks, if you visit in the summer, are long tracks caked into the dried earth of the playa. Apparently these rocks, during the winter months, move and leave these long tracks. I say apparently because, for many many years, no one had ever seen these rocks move. Until now! The following video makes me extremely happy:

This is a shot of one of the playa stones actually moving! This is the end result of a large study that sought to understand the mechanism behind the sliding stones, published recently in PLOS ONE:

In 1993, fresh out of Yreka High School, I found myself surrounded by 200+ geniuses taking Caltech’s first year physics class, Physics 1 (med schools sometimes ask students at Caltech to verify that they know calculus because the transcripts have just these low numerical course indicators on them, and of course Physics 1 couldn’t actually be physics with calculus, could it?) It would be a lie to say that this wasn’t intimidating: some of the TAs in the class were full physics professors! I remember a test where the average score was 0.5 out of 10. Perhaps it didn’t help that my roommate had studied with a Nobel prize winner as a high school student. Or that another freshman in my class was just finishing a paper with his parents on black holes (or that his dad is one of the founders of the theory of inflation!) At times I considered transferring, because that is what all Caltech students do when they realize how hard Caltech is going to be, and also because it wasn’t clear to me what being a physics major got you.

One day in Physics 1 it was announced that there was a class that you could gain entrance to that was structured to teach you not physics, but how to do creative research. Creativity: now this was something I truly valued! It was called Physics 11 and it was run by one Professor Tom Tombrello (I’d later see his schedule on the whiteboard with the abbreviation T2). The only catch was that you had to get accepted into the class, and to do this you had to do your best at solving a toy research problem, what the class termed a “hurdle”. The students from the previous class then helped select the new Physics 11 students based upon their performance on the hurdles. The first hurdle also caught my eye: it was a problem based upon the old song Mairzy Doats, which my father sang weekly while showering in the morning. So I set about working on the problem. I don’t remember much of my solution, except that it was long and involved lots of differential equations of increasing complexity. Did I mention that it was long? Really long. I handed in the hurdle, then promptly ran out of time to work on the second hurdle.

Because I’d not handed in the second hurdle, I sort of expected that I’d not get selected into the class. Plus I wasn’t even in the advanced section of Physics 1 (the one TAed by the professors; now those kids were well prepared and smart!) But one late night I went to my mailbox, opened it, and found…nothing. I closed it, and then, for some strange reason, thought: hey, maybe there is something stuck in there. So I returned and opened the box, dug deep, and pulled out an invitation to join Physics 11! This story doesn’t mean much to you, but I can still smell, feel, and hear Caltech when I think of this event. Also I’ve always been under the impression that being accepted to this class was a mistake and really the invitation I got was meant for another student in a mailbox next to mine. But that’s a story for another session on the couch.

So I enrolled in Physics 11. It’s not much of a stretch to say that it was the inspiration for me to go to graduate school, to do a postdoc, and to become a pseudo-professor. Creative research is an amazing drug, and also, I believe, one of the great endeavors of humanity. My small contribution to the racetrack playa story was published in the Journal of Geology:

The basic mystery was what caused these rocks to move. Was it the wind? It seemed hard to get enough force to move the rocks. Was it ice? When you placed stakes around the rocks, some of the rocks moved out of the stakes and some did not. In the above paper we pointed out that a moving layer of water would mean more wind down low than one would normally get, because the boundary layer was moving. We also looked for the effect of said boundary layer on the rocks’ motion and found a small effect.

The answer, however, as to why the rocks moved, turned out to be even more wonderful: ice sheets dislodging and bashing the rocks forward. A sort of combination of the two competing previous hypotheses! This short documentary explains it nicely.

So, another mystery solved! We know more about how the world works, not on a level of fundamental physics, but on a level of “because it is interesting” and “because it is fun”, and isn’t that enough? Arthur C. Clarke, who famously gave airtime to these rocks, would, I think, have been very pleased with this turn of events.

Two Cultures in One of the Cultures

A long time ago in a mental universe far far away I gave a talk to a theory seminar about quantum algorithms. An excerpt from the abstract:

Quantum computers can outperform their classical brethren at a variety of algorithmic tasks….[yadda yadda yadaa deleted]… This talk will assume no prior knowledge of quantum theory…

The other day I was looking at recent or forthcoming interesting quantum talks and I stumbled upon one by a living pontiff:

In this talk, I’ll describe connections between the unique games conjecture (or more precisely, the closely related problem of small-set expansion) and the quantum separability problem… [amazing stuff deleted]…The talk will not assume any knowledge of quantum mechanics, or for that matter, of the unique games conjecture or the Lasserre hierarchy….

And another for a talk to kick off a program at the Simons institute on Hamiltonian complexity (looks totally fantastic, wish I could be a fly on the wall at that one!):

The title of this talk is the name of a program being hosted this semester at the Simons Institute for the Theory of Computing….[description of field of Hamiltonian complexity deleted…] No prior knowledge of quantum mechanics or quantum computation will be assumed.

Talks are tricky. Tailoring your talk to your audience is probably one of the trickier sub-trickinesses of giving a talk. But remind me again, why are we apologizing to theoretical computer scientists / mathematicians (which are likely the audiences for the three talks I linked to) for their ignorance of quantum theory? Imagine theoretical computer science talks coming along with a disclaimer, “no prior knowledge of the PCP theorem is assumed”, “no prior knowledge of polynomial-time approximation schemes is assumed”, etc. Why is it still considered necessary, decades after Shor’s algorithm and error correction showed that quantum computing is indeed a fascinating and important idea in computer science, to apologize to an audience for a large gap in their basic knowledge of the universe?
As a counter argument, I’d love to hear from a non-quantum computing person who was swayed to attend a talk because it said that no prior knowledge of quantum theory is assumed. Has that ever worked? (Or similar claims of a cross cultural prereq swaying you to bravely go where none of your kind has gone before.)

Error correcting aliens

Happy New Year!  To celebrate let’s talk about error correcting codes and….aliens.
The universe, as many have noted, is kind of like a computer.  Or at least our best description of the universe is given by equations that prescribe how various quantities change in time, a lot like a computer program describes how data in a computer changes in time.  Of course, this ignores one of the fundamental differences between our universe and our computers: our computers tend to be able to persist information over long periods of time.  In contrast, the quantities describing our universe tend to have a hard time being recoverable after even a short amount of time: the location (wave function) of an atom, unless carefully controlled, is impacted by an environment that quickly makes it impossible to recover the initial bits (qubits) of the location of the atom. 
Computers, then, are special objects in our universe, ones that persist and allow manipulation of long lived bits of information.  A lot like life.  The bits describing me, the structural stuff of my bones, skin, and muscles, the more concretely information theoretic bits of my grumbly personality and memories, the DNA describing how to build a new version of me, are all pieces of information that persist over what we call a lifetime, despite the constant gnaw of second law armed foes that would transform me into unliving goo.  Maintaining my bits in the face of phase transitions, viruses, bowel obstructions, cardiac arrest, car accidents, and bears is what human life is all about, and we increasingly do it well, with life expectancy now approaching 80 years in many parts of the world.
But 80 years is not that long.  Our universe is 13.8ish billion years old, or about 170ish million current lucky humans’ life expectancies.  Most of us would, all things being equal, like to live even longer.  We’re not big fans of death.  So what obstacles are there toward life extension?  Yadda yadda biology squishy stuff, yes.  Not qualified to go there so I won’t.  But since life is like a computer in regards to maintaining information, we can look toward our understanding of what allows computers to preserve information…and extrapolate!
Enter error correction.  If bits are subject to processes that flip their values, then you’ll lose information.  If, however, we give up storing information in each individual bit and instead store single bits across multiple individual noisy bits, we can make this spread-out bit live much longer.  Instead of saying 0, and watching it decay to an unknown value, say 000…00, 0 many times, and ask whether the majority of these values remain 0.  Voilà, you’ve got an error correcting code.  Your smeared out information will be preserved longer, but, and here is the important point, at the cost of using more bits.
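The repetition idea is simple enough to write down in full. A toy sketch (the choice of 101 bits and the independent-flip noise model are just illustrative):

```python
import random

def encode(bit, n=101):
    """Smear one logical bit out across n physical bits."""
    return [bit] * n

def apply_noise(bits, p, rng):
    """Flip each physical bit independently with probability p."""
    return [b ^ 1 if rng.random() < p else b for b in bits]

def decode(bits):
    """Majority vote: the logical bit survives unless more than half flipped."""
    return int(sum(bits) * 2 > len(bits))

# Store a 0, hit every physical bit with 10% noise, and read it back.
rng = random.Random(2014)
assert decode(apply_noise(encode(0), p=0.1, rng=rng)) == 0
```

With 101 bits and a 10% flip rate, a majority flip is astronomically unlikely, which is exactly the smearing-out trade: more bits spent, longer-lived information.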
Formalizing this a bit: there is a class of beautiful theorems, due originally to von Neumann classically, and to a host of others quantumly, called the threshold theorems for fault-tolerant computing. These tell you, given a model for how errors occur, how fast they occur, and how fast you can compute, whether you can reliably compute. Roughly, these theorems all say something like: if your error rate is below some threshold, then if you want to compute while failing at some better rate, you can do so using a more complicated, larger construction, whose size grows only polynomially in the logarithm of the inverse of the error rate you wish to achieve. What I’d like to pull out of these theorems for talking about life are two things: 1) there is an overhead to reliably compute, and this overhead is both larger in size and slower in time, and 2) the scaling of this overhead depends crucially on the error model assumed.
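Point (1) can be made concrete for the simplest possible case, the majority-vote repetition idea described above (a toy stand-in for the full fault-tolerance constructions, not the theorems themselves):

```python
from math import comb

def logical_error_rate(n, p):
    """Probability that more than half of n independent bits flip,
    i.e. that majority-vote decoding of an n-bit repetition code fails."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

# Below threshold (here p < 1/2), each extra chunk of redundancy buys an
# exponential improvement, so reaching a target failure rate eps costs
# only n ~ O(log(1/eps)) bits of overhead.
p = 0.1
for n in (1, 11, 21, 31):
    print(n, logical_error_rate(n, p))
```

The printed rates fall by orders of magnitude with each step up in n, which is the overhead-versus-reliability trade the threshold theorems formalize.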
Which leads back to life. If it is a crucial part of life to preserve information, to keep your bits moving down the timeline, then it seems that the threshold theorems will have something, ultimately, to say about extending your lifespan. What are the error models and what are the fundamental error rates of current human life? Is there a threshold theorem for life? I’m not sure we understand biology well enough to pin this down yet, but I do believe we can use the above discussion to extrapolate about our future evolution.
Or, because witnessing the evolution of humans out of their present state seemingly requires waiting a really long time, or technology we currently don’t have, let’s apply this to…aliens. 13.8 billion years is a long time. It now looks like there are lots of planets. If intelligent life arose on those planets billions of years ago, then it is likely that it has also had billions of years to evolve past the level of intelligence that infects our current human era. Which is to say, it seems that any such hypothetical aliens would have had time to push the boundaries of the threshold theorem for life. They would have manipulated and technologically engineered themselves into beings that live for a long period of time. They would have applied the constructions of the threshold theorem for life to themselves, lengthening their lives by applying principles of fault-tolerant computing.
As we’ve witnessed over the last century, intelligent life seems to hit a point where rapid technological change occurs. Supposing that the period life spends going from reproducing, unintelligent stuff to megalords of technological magic, able to modify themselves and apply the constructions of the threshold theorem for life, is fast, then most life should be found at the two ends of the spectrum: unthinking goo, or creatures who have climbed the threshold theorem for life to extremely long lifetimes. Which lets us think about what intelligent alien life would look like: it will be pushing the boundaries of the threshold theorem for life.
Which lets us make predictions about what this advanced alien life would look like. First, and probably most importantly, it would be slow. We know that our own biology operates at an error rate that ends up giving us about 80 years. If we want to extend this further, then, taking our guidance from the threshold theorems of computation, we will have to use larger and slower constructions in order to extend this lifetime. And, another important point, we have to do this for each new error model that comes to dominate our death rate. Today, for instance, cardiac arrest kills the highest percentage of people: one error model, so to speak. Once you’ve conquered it, you can go down the line, thinking about error models like earthquakes, falls off cliffs, etc. So, likely, if you want to live a long time, you won’t just be slightly slow compared to our current biological life, but extremely slow. And large.
And now you can see my resolution to the Fermi paradox. The Fermi paradox is a fancy way of saying “where are the (intelligent) aliens?” Perhaps we have not found intelligent life because the natural fixed point of intelligent evolution is to produce entities for which our 80 year lifespans are not even a fraction of one of their basic clock cycles. Perhaps we don’t see aliens because, unless you catch life in the short transition from unthinking goo to masters of the universe, the aliens are just operating on too slow a timescale. To discover aliens, we must correlate observations over a long timespan, something we have not yet had the tools and time to do. Even more interestingly, the threshold theorems also have you spread your information out across a large number of individually erring sub-systems. So not only do you have to look at longer time scales, you also need to make correlated observations over larger and larger systems. Individual bits in error correcting codes look as noisy as before; it is only in the aggregate that information is preserved over longer timespans. So not only do we have to look slower, we need to do so over larger chunks of space. We don’t see aliens, dear Fermi, because we are young and impatient.
And about those error models. Our medical technology is valiantly tackling a long list of human concerns. But those advanced aliens: what kind of error models are they most concerned with? Might I suggest that among those error models, on the long list of things that might not have been fixed by their current setup, the things that end up limiting their error rate, might not we be on that list? So quick, up the ladder of threshold theorems for life, before we end up as an error model in some more advanced creature’s slow intelligent mind!

Portrait of an Academic at a Midlife Crisis

Citations are the currency of academia. But the currency of your heart is another thing altogether. With apologies to my co-authors, here is a plot of my paper citations versus my own subjective rating of the paper. Hover over the circles to see the paper title, citations per year, and score. Click to see the actual paper. (I’ve only included papers that appear on the arxiv.)

If I were an economist I suppose at this point I would fit a sloping line through the data and claim victory. But being a lowly software developer, it’s more interesting to me to give a more anecdotal treatment of the data.

  • The paper that I love the most has, as of today, exactly zero citations! Why do I love that paper? Not because it’s practical (far from it.) Not because it proves things to an absolute T (just ask the beloved referees of that paper.) But I love it because it says there is the possibility that there exists a material that quantum computes in a most peculiar manner. In particular the paper argues that it is possible to have devices where: quantum information starts on one side of the device, you turn on a field over the device, and “bam!” the quantum information is now on the other side of the material with a quantum circuit applied to it. How f’in cool is that! I think it’s just wonderful, and had I stuck around the hallowed halls, I probably would still be yelling about how cool it is, much to the dismay of my friends and colleagues (especially those for whom the use of the word adiabatic causes their brain to go spazo!)
  • Three papers I was lucky enough to be involved in as a graduate student, wherein we showed how exchange interactions alone could quantum compute, have generated lots of citations. But the citations correlate poorly with my score. Why? Because it’s my score! Haha! In particular, the paper I love the most out of this series is not the best written, the deepest, the most practical, or the most highly cited. It’s the paper where we first showed that exchange alone was universal for quantum computing. The construction in the paper has large warts on it, but it was the paper where I first felt like we knew something about the possibilities of building a quantum computer that others had not quite exactly thought of. And that feeling is wonderful, and is why that paper has such a high subjective score.
  • It’s hard not to look at this decade’s worth of theory papers and be dismayed about how far they are from real implementation. I think that is why I like Coherence-preserving quantum bit and Adiabatic Quantum Teleportation. Both of these are super simple and always felt like if I could just get an experimentalist drunk enough, er, excited enough, they might try to implement that damn thing. The first shows a way to make a qubit that should be more robust to errors because its ground state is in an error detecting state. The second shows a way to get quantum information to move between three qubits using a simple adiabatic procedure related to quantum teleportation. I still hope someday to see these executed on a functioning quantum computer, and I wonder how I’d feel about them should that happen.

When I Was Young, I Thought It Would Be Different….

When I was in graduate school (back before the earth cooled) I remember thinking the following thoughts:

  1. Quantum computing is a new field filled with two types of people: young people dumb enough to not know they weren’t supposed to be studying quantum computing, and old, tenured people who understood that tenure meant that they could work on what interested them, even when their colleagues thought they were crazy.
  2. Younger people are less likely to have overt biases against women.  By this kind of bias I mean things like the math professor at Caltech who told one of my friends that women were bad at spatial reasoning (a.k.a. jerks).  Maybe these youngsters even had less hidden bias?
  3. Maybe, then, because the field was new, quantum computing would be a discipline in which the proportion of women was higher than the typical rates of its parent disciplines, physics and computer science?

In retrospect, like most of the things I have thought in my life, this line of reasoning was naive.
Reading Why Are There So Few Women In Science in the New York Times reminded me about these thoughts of my halcyon youth, and made me dig through the last few QIP conferences to get one snapshot (note that I just say one, internet comment troll) of the state of women in the quantum computing (theory) world:

Year Speakers Women Speakers Percent
2013 41 1 2.4
2012 43 2 4.7
2011 40 3 7.5
2010 39 4 10.3
2009 40 1 2.5

Personally, it’s hard to read these numbers and not feel a little disheartened.

Why I Left Academia

TLDR: scroll here for the pretty interactive picture.
Over two years ago I abandoned my post at the University of Washington as an assistant research professor studying quantum computing and started a new career as a software developer for Google. Back when I was a denizen of the ivory tower I used to daydream that when I left academia I would write a long “Jerry Maguire”-esque piece about the sordid state of the academic world, of my lot in that world, and how unfair and f**ked up it all is. But maybe with less Tom Cruise. You know the text, the standard rebellious view of all young rebels stuck in the machine (without any mirror.) The song “Mad World” has a lyric that I always thought summed up what I thought it would feel like to leave and write such a blog post: “The dreams in which I’m dying are the best I’ve ever had.”
But I never wrote that post. Partially this was because every time I thought about it, the content of that post seemed so run-of-the-mill boring that I feared my friends who read it would never ever come visit me again after they read it. The story of why I left really is not that exciting. Partially because writing a post about why “you left” is about as “you”-centric as you can get, and yes I realize I have a problem with ego-centric ramblings. Partially because I have been busy learning a new career and writing a lot (omg a lot) of code. Partially also because the notion of “why” is one I—as a card carrying ex-Physicist—cherish and I knew that I could not possibly do justice to giving a decent “why” explanation.
Indeed: what would a “why” explanation for a life decision such as the one I faced look like? For many years when I would think about this I would simply think “well it’s complicated and how can I ever?” There are, of course, the many different components that you think about when considering such decisions. But then what do you do with them? Does it make sense to think about them as probabilities? “I chose to go to Caltech 50 percent because I liked physics, and 50 percent because it produced a lot of Nobel prize winners.” That does not seem very satisfying.
Maybe the way to do it is to phrase the decisions in terms of probabilities that I was assigning before making the decision. “The probability that I’ll be able to contribute something to physics will be 20 percent if I go to Caltech versus 10 percent if I go to MIT.” But despite what some economists would like to believe there ain’t no way I ever made most decisions via explicit calculation of my subjective odds. Thinking about decisions in terms of what an actor feels each decision would do to increase his/her chances of success feels better than just blindly associating probabilities to components in a decision, but it also seems like a lie, attributing math where something else is at play.
So what would a good description of the model be? After pondering this for a while I realized I was an idiot (for about the eighth time that day. It was a good day.) The best way to describe how my brain was working is, of course, nothing short of my brain itself. So here, for your amusement, is my brain (sorry, only tested using Chrome). Yes, it is interactive.

A Paradox of Toom's Rule?

Science is slow.  You can do things like continue a conversation with yourself (and a few commenters) that started in 2005.  Which is what I’m now going to do 🙂  The below is probably a trivial observation for one of the cardinals, but I find it kind of interesting.
Let’s begin by recalling the setup.  Toom’s rule is a cellular automaton rule for a two dimensional cellular automaton on a square grid.  Put +1’s and -1’s on the vertices of a square grid, and then use the following update rule at each step: “Update the value with the majority vote of your own state, the state of your neighbor to the north, and the state of your neighbor to the east.”  A few steps of the rule are shown here with +1 as white and -1 as blue:
[Figure: a few steps of Toom’s rule]
As you can see, Toom’s rule “shrinks” islands of “different” states (taking away such different cells from the north and east sides of such an island).  It is this property which gives Toom’s rule some cool properties in the presence of noise.
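The update rule is easy to sketch in a few lines. Here is a minimal Python version (periodic boundary conditions are my assumption; the post doesn’t specify them, but they don’t matter for an island in a uniform sea) showing an island of -1’s being eaten away from its north and east sides:

```python
import numpy as np

def toom_step(grid):
    """One step of Toom's rule: each cell becomes the majority vote of
    itself, its neighbor to the north, and its neighbor to the east.
    Periodic boundaries (an assumption of this sketch)."""
    north = np.roll(grid, 1, axis=0)   # north[i, j] = grid[i-1, j], the row above
    east = np.roll(grid, -1, axis=1)   # east[i, j]  = grid[i, j+1], the cell to the right
    # The sum of three +/-1 values is odd, so np.sign never sees a zero.
    return np.sign(grid + north + east)

# A 3x3 island of -1's in a sea of +1's: the rule erodes it,
# starting from the island's north-east corner.
grid = np.ones((8, 8), dtype=int)
grid[2:5, 2:5] = -1
for _ in range(10):
    grid = toom_step(grid)
# By now the island is gone and the grid is all +1 again.
```

(A 3x3 island actually disappears in five steps; ten is just a comfortable margin.)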
So now consider Toom’s rule, but with noise.  Replace Toom’s update rule with Toom’s rule followed by a noise process applied to each and every cell.  For example, this noise could put the cell into the +1 state with probability p and the -1 state with probability q.  Suppose now you are trying to store information in the cellular automaton.  You start out at time zero, say, in the all +1 state.  Then let Toom’s rule with noise run.  If p=q and these values are below a threshold, then if you start in the +1 state you will remain in a state with majority +1, with a probability that goes to one exponentially as a function of the system size.  Similarly if you start in the all -1 state.  The cool thing about Toom’s rule is that this works not just for p=q, but also for some values of p not equal to q (see here for a picture of the phase diagram).  That is, there are two stable states in this model, even for biased noise.
Contrast Toom’s rule with a two-dimensional Ising model which is in the process of equilibrating to temperature T.  If this model has no external field applied, then, like Toom’s rule, there is a phase where the mostly +1 and the mostly -1 states are stable and coexist.  This phase extends from zero temperature (no dynamics) up to a threshold temperature, the critical temperature of the Ising model.  But, unlike in Toom’s rule, if you now add an external field, which corresponds to a dynamics where there is a greater probability of flipping cell values to one particular value (p not equal to q above), then the Ising model no longer has two stable phases.
In fact there is a general argument that if you look at a phase diagram as a function of a bunch of parameters (say temperature and applied magnetic field strength in this case), then the region where two stable phases can coexist has to be a surface with one fewer dimension than your parameter space.  This is known as Gibbs’ phase rule.  Toom’s rule violates this.  It’s an example of a nonequilibrium system.
So here is what is puzzling me.  Consider a three-dimensional cubic lattice with +1 and -1 spins on its vertices.  Define an energy function that is a sum over terms that act on the spins at locations (i,j,k), (i+1,j,k), (i,j+1,k), (i,j,k+1), such that E = 0 if the spin at (i,j,k+1) is in the correct state for Toom’s rule applied to the spins at (i,j,k), (i+1,j,k), and (i,j+1,k), and E = J otherwise.  In other words, the terms enforce that the ground state locally obeys Toom’s rule, if we imagine rolling out Toom’s rule into the time dimension (here the z direction).  At zero temperature, the ground state of this system will be two-fold degenerate, corresponding to the all +1 and all -1 states.  At finite temperature this model will behave as a symmetric-noise Toom’s rule model (see here for why).  So even at finite temperature this will preserve information, like the d>2 Ising model and Toom’s CA rule.
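The energy function above can be sketched directly in code (the mapping of the i and j axes to “north” and “east”, and the periodic boundaries, are my assumptions for illustration):

```python
import numpy as np

def toom_energy(spins, J=1.0):
    """Energy of a 3D spin configuration: each term contributes J unless
    the spin at (i,j,k+1) matches the Toom-rule majority of the spins at
    (i,j,k), (i+1,j,k), and (i,j+1,k).  Periodic boundaries in all
    directions (an assumption of this sketch)."""
    north = np.roll(spins, -1, axis=0)   # spin at (i+1, j, k)
    east = np.roll(spins, -1, axis=1)    # spin at (i, j+1, k)
    above = np.roll(spins, -1, axis=2)   # spin at (i, j, k+1)
    voted = np.sign(spins + north + east)
    return J * np.sum(above != voted)

# Both uniform configurations are zero-energy ground states,
# reflecting the two-fold degeneracy described above.
all_up = np.ones((6, 6, 6), dtype=int)
all_down = -all_up
e_up, e_down = toom_energy(all_up), toom_energy(all_down)

# Flipping a single spin violates at least one term, costing energy.
flipped = all_up.copy()
flipped[2, 2, 2] = -1
e_flipped = toom_energy(flipped)
```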
But since this behaves like Toom’s rule, it seems to me that if you add an external field, then this system is in a bit of a paradox.  On the one hand, we know from Gibbs’ phase rule that this should not be able to exhibit two stable phases over a range of external fields.  On the other hand, this thing is just Toom’s rule, laid out spatially.  So it would seem that one could apply the arguments about why Toom’s rule is robust at finite field.  But these contradict each other.  So which is it?