Elsevier again, and collective action

We all know about higher education being squeezed financially. Government support is falling and tuition is going up. We see academic jobs getting scarcer, and more temporary. The pressure for research to focus on the short term is going up. Some of these changes may be fair, since society always has to balance its immediate priorities against its long-term progress. At other times, like when comparing the NSF’s $7.6 billion FY2014 budget request to the ongoing travesty that is military procurement, it does feel as though we are eating our seed corn for not very wise reasons.
Against this backdrop, the travesty that is scientific publishing may feel like small potatoes. But now we are starting to find out just how many potatoes. Tim Gowers has been doing an impressive job of digging up exactly how much various British universities pay for their Elsevier subscriptions. Here is his current list. Just to pick one random example, the University of Bristol (my former employer), currently pays Elsevier a little over 800,000 pounds (currently $1.35M) for a year’s access to their journals. Presumably almost all research universities pay comparable amounts.
To put this number in perspective, let’s compare it not to the F-35, but to something that delivers similar value: arxiv.org. Its total budget for 2014 is about 750,000 US dollars (depending on how you count overhead), and of course this includes access for the entire world, not only the University of Bristol. To be fair, ScienceDirect has about 12 times as many articles and the median quality is probably higher. But overall it is clearly vastly more expensive for society to have its researchers communicate in this way.
Another way to view the £800,000 price tag is in terms of the salaries of about 40 lecturers (roughly equivalent to assistant professors), or some equivalent mix of administrators, lecturers and full professors. The problem is that these are not substitutes. If Bristol hired 40 lecturers, they would not each spend one month per year building nearly-free open-access platforms and convincing the world to use them; they would go about getting grants, recruiting grad students and publishing in the usual venues. There are problems of collective action, of the path dependence that comes with a reputation economy and of the diffuse costs and concentrated benefits of the current system.
I wish I could end with some more positive things to say. I think at least for now it is worth getting across the idea that there is a crisis, and that we should all do what we can to help with it, especially when we can do so without personal cost. In this way, we can hopefully create new social norms. For example, it is happily unconventional now to not post work on arxiv.org, and I hope that it comes to be seen also as unethical. In the past, it was common to debate whether QIP should have published proceedings. Now major CS conferences are cutting themselves loose from parasitic professional societies (see in particular the 3% vote in favor of the status quo) and QIP has begun debating whether to require all submissions be accompanied by arxiv posts (although this is of course not at all clear-cut). If we cannot have a revolution, hopefully we can at least figure out an evolutionary change towards a better scientific publishing system. And then we can try to improve military procurement.

TQC 2014!

While many of us are just recovering from QIP, I want to mention that the submission deadline is looming for the conference TQC, which perhaps should be called TQCCC because its full name is Theory of Quantum Computation, Communication and Cryptography. Perhaps this isn’t done because it would make the conference seem too classical? But TQQQC wouldn’t work so well either. I digress.
The key thing I want to mention is the imminent 15 Feb submission deadline.
I also want to mention that TQC is continuing to stay ahead of the curve with its open-access author-keeps-copyright proceedings, and this year with some limited open reviewing (details here). I recently spoke to a doctor who complained that despite even her Harvard Medical affiliation, she couldn’t access many relevant journals online. While results of taxpayer-funded research on drug efficacy, new treatments and risk factors remain locked up, at least our community is ensuring that anyone wanting to work on the PPT bound entanglement conjecture will be able to catch up to the research frontier without having to pay $39.95 per article.
One nice feature about these proceedings is that if you later want to publish a longer version of your submission in a journal, then you will not face any barriers from TQC. I also want to explicitly address one concern that some have raised about TQC, which is that the published proceedings will prevent authors from publishing their work elsewhere. For many, the open access proceedings will be a welcome departure from the usual exploitative policies of not only commercial publishers like Elsevier, but also the academic societies like ACM and IEEE. But I know that others will say “I’m happy to sign your petitions, but at the end of the day, I still want to submit my result to PRL” and who am I to argue with this?
So I want to stress that submitting to TQC does not prevent submitting your results elsewhere, e.g. to PRL. If you publish one version in TQC and a substantially different version (i.e. with substantial new material) in PRL, then not only is TQC fine with it, but it is compatible with APS policy which I am quoting here:

Similar exceptions [to the prohibition against double publishing] are generally made for work disclosed earlier in abbreviated or preliminary form in published conference proceedings. In all such cases, however, authors should be certain that the paper submitted to the archival journal does not contain verbatim text, identical figures or tables, or other copyrighted materials which were part of the earlier publications, without providing a copy of written permission from the copyright holder. [ed: TQC doesn’t require copyright transfer, because it’s not run by people who want to exploit you, so you’re all set here] The paper must also contain a substantial body of new material that was not included in the prior disclosure. Earlier relevant published material should, of course, always be clearly referenced in the new submission.

I cannot help but mention that even this document (the “APS Policy on Prior Disclosure”) is behind a paywall and will cost you $25 if your library doesn’t subscribe. But if you really want to support this machine and submit to PRL or anywhere else (and enjoy another round of refereeing), TQC will not get in your way.
Part of what makes this easy is TQC’s civilized copyright policy (i.e. you keep it). By contrast, Thomas and Umesh had a more difficult, though eventually resolved, situation when combining STOC/FOCS with Nature.

Are articles in high-impact journals more like designer handbags, or monarch butterflies?

Monarch butterfly handbag (src)

US biologist Randy Schekman, who shared this year’s physiology and medicine Nobel prize, has made prompt use of his new bully pulpit. In “How journals like Nature, Cell and Science are damaging science: The incentives offered by top journals distort science, just as big bonuses distort banking,” he singled out these “luxury” journals as a particularly harmful part of the current milieu in which “the biggest rewards follow the flashiest work, not the best,” and he vowed no longer to publish in them. An accompanying Guardian article includes defensive quotes from representatives of Science and Nature, especially in response to Schekman’s assertions that the journals favor controversial articles over boring but scientifically more important ones like replication studies, and that they deliberately seek to boost their impact factors by restricting the number of articles published, “like fashion designers who create limited-edition handbags or suits.” Focusing on journals, his main concrete suggestion is to increase the role of open-access online journals like his eLife, supported by philanthropic foundations rather than subscriptions. But Schekman acknowledges that blame extends to funding organizations and universities, which use publication in high-impact-factor journals as a flawed proxy for quality, and to scientists who succumb to the perverse incentives to put career advancement ahead of good science. Similar points were made last year in Serge Haroche’s thoughtful piece on why it’s harder to do good science now than in his youth. This, and Nature’s recent story on Brazilian journals’ manipulation of impact factor statistics, illustrate how prestige journals are part of the solution as well as the problem.
Weary of people and institutions competing for the moral high ground in a complex terrain, I sought a less value-laden approach, in which scientists, universities, and journals would be viewed merely as interacting IGUSes (information gathering and utilizing systems), operating with incomplete information about one another. In such an environment, reliance on proxies is inevitable, and the evolution of false advertising is a phenomenon to be studied rather than disparaged. A review article on biological mimicry introduced me to some of the refreshingly blunt standard terminology of that field. Mimicry, it said, involves three roles: a model, i.e., a living or material agent emitting perceptible signals, a mimic that plagiarizes the model, and a dupe whose senses are receptive to the model’s signal and which is thus deceived by the mimic’s similar signals. As in human affairs, it is not uncommon for a single player to perform several of these roles simultaneously.

El Naschie works on entanglement now

El Naschie (top), shown photoshopped in with three Nobel laureates.

The Journal of Quantum Information Science will not be getting any of my papers starting today, because today is when I learned that they recently published the following gem: A Resolution of Cosmic Dark Energy via a Quantum Entanglement Relativity Theory, by M. El Naschie.
Upon closer inspection, it isn’t hard to see why they published this paper. It’s because  “El Naschie is very highly regarded in the community” and is “always spoken of as a possible Nobel prize candidate”. And as the great man himself has said, “Senior people are above this childish, vain practice of peer review”, so there was no need for that.
Oh, but despite the apparent lack of peer review, they do have a $600 article processing charge for open access. I wonder what costs these charges are meant to offset if the “submit” button just puts the article straight into the publication? Hmmm, I hope that the journal didn’t simply accept money in exchange for publishing the paper under the pretense of “open access”! Golly, that would be unethical.

Science Code Manifesto

Recently, one of the students here at U. Sydney and I had the frustrating experience of trying to reproduce a numerical result from a paper, but it just wasn’t working. The code used by the authors was regrettably not made publicly available, so once we were fairly sure that our code was correct, we didn’t know how to resolve the discrepancy. Luckily, in our small community, I knew the authors personally and we were able to figure out why the results didn’t match up. But as code becomes a larger and larger part of scientific projects, these sorts of problems will increase in frequency and severity.
What can we do about it?
A team of very smart computer scientists have come together and written the science code manifesto. It is short and sweet; the whole thing boils down to five simple principles of publishing code:

Code: All source code written specifically to process data for a published paper must be available to the reviewers and readers of the paper.
Copyright: The copyright ownership and license of any released source code must be clearly stated.
Citation: Researchers who use or adapt science source code in their research must credit the code’s creators in resulting publications.
Credit: Software contributions must be included in systems of scientific assessment, credit, and recognition.
Curation: Source code must remain available, linked to related materials, for the useful lifetime of the publication.

If you support this, and you want to help contribute to the solution, then please go and endorse the manifesto. Even more importantly, practice the five C’s the next time you publish a paper!

Funding boost for the arXiv

This is fantastic news: starting this January, the Simons Foundation will provide the Cornell University Library with up to US $300k per year (for the next five years) of matching funds to help ensure the continued sustainability of arXiv.org. The funds are matched to donations by about 120 institutions in a dozen countries that are well funded and are heavy downloaders of articles from the arXiv. It is also providing an unconditional gift of $50k per year. Here’s the press release from the CUL.
I think it is pretty remarkable how an institution like the arXiv, which every reader of this blog will agree is absolutely indispensable for research, has struggled to make ends meet. This is especially true given that the amount of money it takes to keep it going is really just a drop in the bucket compared to other spending. Look at some of the numbers: in the last year alone, there were more than 50 million downloads worldwide and more than 76,000 articles submitted. To have open access to that kind of information for a total cost of about $1m per year? Priceless.

The Nine Circles of LaTeX Hell

This guy had an overfull hbox
Poorly written LaTeX is like a rash. No, you won’t die from it, but it needlessly complicates your life and makes it difficult to focus on pertinent matters. The victims of this unfortunate blight can be both the readers, as in the case of bad typography, or yourself and your coauthors, as in the case of bad coding practice.
Today, in an effort to combat this particular scourge (and in keeping with the theme of this blog’s title), I will be your Virgil on a tour through the nine circles of LaTeX hell. My intention is not to shock or frighten you, dear Reader. I hope, like Dante before me, to promote a more virtuous lifestyle by displaying the wages of sin. However, unlike Dante I will refrain from pointing out all the famous people that reside in these various levels of hell. (I’m guessing Dante already had tenure when he wrote The Inferno.)
The infractions will be minor at first, becoming progressively more heinous, until we reach utter depravity at the ninth level. Let us now descend the steep and savage path.

1) Using {\it ...} and {\bf ...}, etc.

Admittedly, this point is a quibble at best, but let me tell you why to use \textit{...} and \textbf{...} instead. First, \it and \bf don’t perform correctly under composition (in fact, they reset all font attributes), so {\it {\bf ...}} does not produce bold italics as you might expect. Second, they fail to correct the spacing after italicized letters. Compare {\it test}text to \textit{test}text and notice how crowded the former is.
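To see both effects side by side, here is a minimal sketch, assuming the standard article class (which still defines the old \it and \bf switches); the sample words are just placeholders.

```latex
\documentclass{article}
\begin{document}
% Old-style switches: \bf resets the font, so the italic is lost.
{\it {\bf bold only, not bold italic}}

% LaTeX2e commands compose, giving bold italic.
\textit{\textbf{bold italic}}

% \textit inserts the italic correction before upright text; {\it ...} does not.
{\it test}text versus \textit{test}text
\end{document}
```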

2) Using \def

\def is a plain TeX command that defines macros without first checking whether a macro of that name already exists. Hence it will overwrite something without producing an error message. This one can be dangerous if you have coauthors: maybe you use \E to mean \mathbb{E}, while your coauthor uses it to mean \mathcal{E}. If you are writing different sections of the paper, then you might introduce some very confusing typos. Use \newcommand or \renewcommand instead.
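A minimal sketch of the difference, reusing the \E example above and assuming the amssymb package for \mathbb. The commented-out lines show what would happen; they are not meant to be run as-is.

```latex
\documentclass{article}
\usepackage{amssymb} % provides \mathbb
\begin{document}
\newcommand{\E}{\mathbb{E}}       % fine: \E was not defined before
% \newcommand{\E}{\mathcal{E}}    % would stop with an "already defined" error
\renewcommand{\E}{\mathcal{E}}    % an explicit, deliberate redefinition
% \def\E{\mathbb{E}}              % would silently overwrite whatever \E was
$\E[X]$
\end{document}
```

The point is that the clash between you and your coauthor surfaces as an error at compile time rather than as a wrong symbol in the PDF.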

3) Using $$...$$

This one is another plain TeX command. It messes up vertical spacing within formulas, making them inconsistent, and it causes fleqn to stop working. Moreover, it is syntactically harder to parse since you can’t detect an unmatched pair as easily. Using \[...\] avoids these issues.
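The fleqn problem is the easiest one to check for yourself. A minimal sketch, assuming the article class with the fleqn option; the formula is an arbitrary placeholder.

```latex
\documentclass[fleqn]{article}
\begin{document}
% Plain TeX display math: ignores fleqn, so this one stays centered.
$$ a^2 + b^2 = c^2 $$

% LaTeX display math: respects fleqn, so this one is set flush left.
\[ a^2 + b^2 = c^2 \]
\end{document}
```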

4) Using the eqnarray environment

If you don’t believe me that eqnarray is bad news, then ask Lars Madsen, the author of “Avoid eqnarray!”, a 10-page treatise on the subject. It handles spacing in an inconsistent manner and will overwrite the equation numbers for long equations. You should use the amsmath package with the align environment instead.
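For anyone making the switch, a minimal sketch of the replacement; the equations are placeholders. Note that align takes a single & per line where eqnarray wanted &=&.

```latex
\documentclass{article}
\usepackage{amsmath} % provides the align environment
\begin{document}
% Instead of \begin{eqnarray} a &=& b + c \\ ... \end{eqnarray}:
\begin{align}
  a &= b + c \\
    &= d + e + f
\end{align}
\end{document}
```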
Now we begin reaching the inner circles of LaTeX hell, where the crimes become noticeably more sinister.

5) Using standard size parentheses around display size fractions

Consider the following abomination: (\displaystyle \frac{1+x}{2})^n (\frac{x^k}{1+x^2})^m = (\int_{-\infty}^{\infty} \mathrm{e}^{-u^2}\mathrm{d}u )^2.
Go on, stare at this for one minute and see if you don’t want to tear your eyes out. Now you know how your reader feels when you are too lazy to use \left and \right.
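For contrast, here is a sketch of the same expression with the delimiters sized properly and set as a display equation. It is one reasonable way to write it, not the only one.

```latex
\documentclass{article}
\begin{document}
\[
  \left( \frac{1+x}{2} \right)^n
  \left( \frac{x^k}{1+x^2} \right)^m
  = \left( \int_{-\infty}^{\infty} \mathrm{e}^{-u^2} \, \mathrm{d}u \right)^2
\]
\end{document}
```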

6) Not using bibtex

Manually writing your bibliography makes it more likely that you will make a mistake and adds a huge unnecessary workload to yourself and your coauthors. If you don’t already use bibtex, then make the switch today.
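If you have never set it up, a minimal sketch looks something like the following. The file name refs.bib, the citation key and the bibliography entry are all made-up placeholders, and the filecontents environment is used only to keep the example in a single file.

```latex
% Writes a placeholder refs.bib next to this file (all fields are made up).
\begin{filecontents}{refs.bib}
@article{placeholder2014,
  author  = {Author, Ann},
  title   = {A Placeholder Title},
  journal = {Journal of Placeholders},
  year    = {2014}
}
\end{filecontents}
\documentclass{article}
\begin{document}
Some claim that needs a citation~\cite{placeholder2014}.
\bibliographystyle{plain}
\bibliography{refs} % reads refs.bib
\end{document}
```

Run latex, then bibtex, then latex twice more to resolve the citations.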

7) Using text in math mode

Writing H_{effective} is horrendous, but even H_{eff} is an affront. The broken ligature makes these examples particularly bad. There are lots of ways to avoid this, like using \text or \mathrm, which lead to the much more elegant and legible H_{\text{eff}}. Don’t use \mbox, though, because it doesn’t get the font size right in a subscript: H_{\mbox{eff}}.
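A minimal sketch comparing the four variants, assuming amsmath for \text:

```latex
\documentclass{article}
\usepackage{amsmath} % provides \text
\begin{document}
$H_{eff}$            % italic math letters, no "ff" ligature
$H_{\text{eff}}$     % text font at subscript size, with the ligature
$H_{\mathrm{eff}}$   % upright roman at subscript size
$H_{\mbox{eff}}$     % text font at full size, too large for a subscript
\end{document}
```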

8) Using a greater-than sign for a ket

At this level of hell in Dante’s Inferno, some of the accursed are being whipped by demons for all eternity. This seems to be about the right level of punishment for people who use the obscenity |\psi>.
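To stay out of this circle, something like the following sketch will do. The macro definition here is my own illustration; the braket package also provides a \ket command if you would rather not roll your own.

```latex
\documentclass{article}
\begin{document}
% A hand-rolled helper; this particular definition is an assumption.
\newcommand{\ket}[1]{\left| #1 \right\rangle}
$|\psi>$ versus $\ket{\psi}$
\end{document}
```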

9) Not using TeX or LaTeX

This one is so bad, it tops Scott’s list of signs that a claimed mathematical breakthrough is wrong. If you are typing up your results in Microsoft Word using Comic Sans font, then perhaps you should be filling out TPS reports instead of writing scientific papers.

A Federal Mandate for Open Science

Witness the birth of the Federal Research Public Access Act:

“The Federal Research Public Access Act will encourage broader collaboration among scholars in the scientific community by permitting widespread dissemination of research findings.  Promoting greater collaboration will inevitably lead to more innovative research outcomes and more effective solutions in the fields of biomedicine, energy, education, quantum information theory and health care.”

[Correction: it didn’t really mention quantum information theory—SF.]

You can read the full text of FRPAA here.
The bill states that any federal agency which budgets more than $100 million per year for funding external research must make that research available in a public online repository for free download no later than 6 months after the research has been published in a peer-reviewed journal.
This looks to me like a big step in the right direction for open science. Of course, it’s still just a bill, and needs to successfully navigate the Straits of the Republican-controlled House, through the Labyrinth of Committees and the Forest of Filibuster, and run the Gauntlet of Presidential Vetoes. How can you help it survive this harrowing journey? Write your senators and your congresscritter today, and tell them that you support FRPAA and open science!
Hat tip to Robin Blume-Kohout.

Why boycott Elsevier?

Everyone has their own reasons for doing this. There is an interesting debate at Gowers’s blog, including a response from an Elsevier employee. Some people dislike Elsevier’s high prices, their bundling practices, their fake medical journals, their parent company’s (now-former) involvement in the global arms trade, their lobbying for SOPA/PIPA/RWA, or other aspects of their business practice. Indeed, for those who want to reform Elsevier, this is one limitation of the boycott, in that it doesn’t clearly target a particular practice of the company that we want changed. On the other hand, others think Elsevier isn’t evil, but just has a communications problem.
In this post, I want to defend a more radical position, which is that we should try not to reform Elsevier or other publishers of academic journals, but to eliminate them. Until the debate over SOPA, I thought this position was too extreme. I thought we could tolerate a status quo in which journals are used for credentialing, and although it is a little unjust and absurd, the only real cost is bleeding the library budgets a little bit.
But the status quo isn’t stable. Open access and self-archiving are expanding. Soon, someone will successfully mirror JSTOR. Libraries are increasingly complaining about subscription costs.
In the long run, the future looks more like arxiv.org. Their front page boasts (as of this writing):

Open access to 731,335 e-prints in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics.

Just like the walled gardens of Compuserve and AOL would never grow into the Internet, no commercial publisher will ever be able to match the scope and ease of access of arxiv.org. Nor can they match the price. In 2010, there were about 70,000 new papers added to arxiv.org and there were 30 million articles downloaded, while their annual budget was $420,000. This comes to $6 per article uploaded (or 1.4 cents per download). Publishers talk about how much their business costs and how even “open access” isn’t free, but thanks to arxiv.org, we know how low the costs can go.
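For anyone who wants to check the arithmetic, the two unit costs follow directly from the 2010 figures quoted above (the \text command assumes amsmath):

```latex
\[
  \frac{\$420{,}000}{70{,}000\ \text{articles}} = \$6\ \text{per article},
  \qquad
  \frac{\$420{,}000}{30{,}000{,}000\ \text{downloads}}
    = \$0.014 = 1.4\ \text{cents per download}.
\]
```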
By contrast, if you want your article published open access with Springer, it costs $3000. This seems like something we might be able to protest, and convince them to change. We can’t. Elsevier’s outgoing CEO left with a golden parachute worth two million pounds. They’re not going to make that kind of money while running with the efficiency of arxiv.org. So while scientists and the public see the internet as a way of sharing knowledge and driving down costs, publishers like Elsevier see it as a threat. For them, $6/article is a nightmare scenario that has to be stopped.
Some of you might think I’m overreacting. After all, publishers have tolerated self-archiving, citeseer, arxiv.org, etc. so far. This is partly to avoid backlash, and partly because for historical reasons editors of journals like Science and Nature have personally supported the advance of science even over the profits of the companies they work for. But in the long run, we can’t both have everything available for free, and journals continuing to charge extortionate prices. I suspect that a conflict is inevitable, and when it happens, we’ll regret the fact that journals hold all of the copyrights. SOPA was the first sign that publishers are not on the side of advancing knowledge, and if a journal ever goes bankrupt and sells its portfolio of intellectual property, we’ll find out what they’re capable of when they no longer are run by people who place any value on science.
So what can we do about it? A boycott of Elsevier is a good first step. But really we need to change the system so that publishers no longer hold copyright. Their role (and rate of profits) would be like that of the local Kinko’s when they prepare course packs. This would also improve the academic societies, like ACM and APS, by removing the terrible incentive that their publishing gives them to support organizations like the AAP that in turn support SOPA. Instead, they could simply represent communities of scientists, like they were originally designed to do.
I’m not idealistic enough to imagine that arxiv.org is enough. The issue is not so much that it lacks refereeing (which could be remedied easily enough), but that it lacks scarcity. To see what I mean, imagine starting a free online-only virtual journal that simply selects papers from the arxiv. The entire journal archives could be a single html file of less than a megabyte. But without space constraints, it would need to credibly signal that papers accepted into it were high quality. This is nontrivial, and involves convincing authors, readers, referees and hiring committees, all more or less simultaneously. As a community, we need to figure out a way to do this, so that the internet can finally do what it was designed for, and disrupt scientific publishing.
Update: Via John Baez, I came across a proposal for replacing academic journals with overlay boards that seems promising.


In a move that will undoubtedly bring the US Senate to its knees, the Quantum Pontiff is going dark from 8am to 8pm EST on Jan 18 to protest SOPA, PIPA, the Research Works Act and other proposed acts of censorship.
We suggest you use this time to contact your representatives, read a book (or 1201.3387), or go outside.