What If Papers Had APIs?

API is an abbreviation for “Application Programming Interface.” Roughly speaking, an API is a specification of a software component in terms of the operations one can perform with it. For example, one common kind of API is the set of methods supported by an encapsulated bit of code, a.k.a. a library. A library might exist to “draw pretty stuff on the screen”; its API is then the set of commands like “draw a rectangle,” together with a specification of how you pass parameters to each method, how rectangles overlay one another, and so on. Importantly, the API is supposed to specify how the library functions in a way that is independent of the library’s inner workings (though this wall is often breached in practice). Another common kind of API arises when a service exposes remote calls that manipulate and perform operations on that service. For example, Twitter supports an API for reading and writing Twitter data. This latter kind, a service exposing a set of calls that can manipulate data stored on a remote server, is particularly powerful, because it gives you access to that data through nothing more than a communication network. (As an interesting aside, see this rant for why APIs are likely key to some of Amazon’s success.)
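To make the library case concrete, here is a minimal sketch in Python (all names invented for illustration) of an API separated from its implementation: the abstract `Canvas` class is the API, and callers never need to know what sits behind it.

```python
from abc import ABC, abstractmethod

class Canvas(ABC):
    """The API: what callers may do, independent of how it is done."""

    @abstractmethod
    def draw_rectangle(self, x: int, y: int, width: int, height: int) -> None:
        """Draw a rectangle at (x, y) with the given size."""

class RecordingCanvas(Canvas):
    """One possible implementation behind the API; callers need not care."""

    def __init__(self):
        self.calls = []

    def draw_rectangle(self, x, y, width, height):
        # A "real" implementation might rasterize pixels; this one just
        # records the call, and callers can't tell the difference.
        self.calls.append(("rect", x, y, width, height))

canvas = RecordingCanvas()
canvas.draw_rectangle(0, 0, 10, 5)  # callers see only the API
```

Swapping in a different `Canvas` subclass changes the inner workings without changing any caller, which is exactly the wall the API is supposed to maintain.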
As you might guess (see, for example, my latest flop, Should Papers Have Unit Tests?), I like smooshing together disparate concepts and seeing what comes out the other side. Thinking about APIs led me to the question: “What if papers had APIs?”
In normal settings academic papers are considered to be relatively static objects. Sure, papers on the arXiv have versions (some more than others!), and there are efforts like Living Reviews in Relativity, where review articles are updated by their authors. But in general papers exist as fixed, “complete” works; in programming terms, we would say they are “immutable.” So if we consider exposing an API for papers, one might think it would just be a read-only API. Indeed, this form of API already exists for many journals, and also for the arXiv. These “paper APIs” let one read information, mostly metadata, about a paper.
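For instance, the arXiv’s public read-only API lives at export.arxiv.org and returns a paper’s metadata as Atom XML. The sketch below only builds the query URL for a single paper; fetching and parsing the response is left out.

```python
from urllib.parse import urlencode

def arxiv_query_url(arxiv_id: str,
                    base: str = "http://export.arxiv.org/api/query") -> str:
    """Build a read-only arXiv API request for one paper's metadata.

    A GET on the returned URL yields an Atom XML feed with the paper's
    title, authors, abstract, and so on.
    """
    return base + "?" + urlencode({"id_list": arxiv_id, "max_results": 1})

url = arxiv_query_url("quant-ph/9705052")
```

Note that nothing in this interface lets a caller change the paper; it is reading, not writing.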
But what about a paper API that allows mutation? At first glance this heresy is rather disturbing: allowing calls from outside a paper to change its content seems dangerous, and it isn’t clear what benefit could come from it. With, I think, one exception. Citations are the currency of academia (last I checked they were still not fungible with bitcoins). But citations really only go in one direction (with exceptions for simultaneous works): you cite a paper whose work you build upon (or whose work you demonstrate is wrong, etc.). What if a paper exposed a reverse citation index? That is, if I put my paper on the arXiv, then when you write your paper showing how mine is horribly wrong, you could make a call to my paper’s API that mutates my paper, adding links to yours. Of course, this seems crazy: what is to stop rampant citation spamming, especially by *ahem* cranks? Here one could implement a simple approval system for the receiving paper. If this were done on some common system, you could then expose the mutated paper either A) with only the approved mutations or B) with the unapproved ones as well (or one could go ‘social’ on this problem and allow voting on the changes).
What benefit would such a system confer? In some ways it would make more accessible something we all use: the “cited by” index of services like Google Scholar. One difference is that the reverse citations could be more precise: while Scholar provides a list of relevant papers, an API that exposed the ability to add links to specific locations in a paper would arguably give better reverse citations (because, frankly, the weakness of the cited-by indices is their lack of specificity).
What else might a paper API expose? I’m not convinced this isn’t an interesting question to ponder. Thanks for reading another wacko mashup episode of the Quantum Pontiff!

Elsevier again, and collective action

We all know about higher education being squeezed financially. Government support is falling and tuition is going up. We see academic jobs getting scarcer, and more temporary. The pressure for research to focus on the short term is going up. Some of these changes may be fair, since society always has to balance its immediate priorities against its long-term progress. At other times, like when comparing the NSF’s $7.6 billion FY2014 budget request to the ongoing travesty that is military procurement, it does feel as though we are eating our seed corn for not very wise reasons.
Against this backdrop, the travesty that is scientific publishing may feel like small potatoes. But now we are starting to find out just how many potatoes. Tim Gowers has been doing an impressive job of digging up exactly how much various British universities pay for their Elsevier subscriptions. Here is his current list. Just to pick one random example, the University of Bristol (my former employer), currently pays Elsevier a little over 800,000 pounds (currently $1.35M) for a year’s access to their journals. Presumably almost all research universities pay comparable amounts.
To put this number in perspective, let’s compare it not to the F-35, but to something that delivers similar value: arxiv.org. Its total budget for 2014 is about 750,000 US dollars (depending on how you count overhead), and of course this includes access for the entire world, not only the University of Bristol. To be fair, ScienceDirect has about 12 times as many articles and the median quality is probably higher. But overall it is clearly vastly more expensive for society to have its researchers communicate in this way.
Another way to view the £800,000 price tag is in terms of the salaries of about 40 lecturers (≈ assistant professors), or some equivalent mix of administrators, lecturers and full professors. The problem is that these are not substitutes. If Bristol hired 40 lecturers, they would not each spend one month per year building nearly-free open-access platforms and convincing the world to use them; they would go about getting grants, recruiting grad students and publishing in the usual venues. There are problems of collective action, of the path dependence that comes with a reputation economy, and of the diffuse costs and concentrated benefits of the current system.
I wish I could end with some more positive things to say. I think at least for now it is worth getting across the idea that there is a crisis, and that we should all do what we can to help with it, especially when we can do so without personal cost. In this way, we can hopefully create new social norms. For example, it is happily unconventional now to not post work on arxiv.org, and I hope that it comes to be seen also as unethical. In the past, it was common to debate whether QIP should have published proceedings. Now major CS conferences are cutting themselves loose from parasitic professional societies (see in particular the 3% vote in favor of the status quo) and QIP has begun debating whether to require all submissions be accompanied by arxiv posts (although this is of course not at all clear-cut). If we cannot have a revolution, hopefully we can at least figure out an evolutionary change towards a better scientific publishing system. And then we can try to improve military procurement.

TQC 2014!

While many of us are just recovering from QIP, I want to mention that the submission deadline is looming for the conference TQC, which perhaps should be called TQCCC because its full name is Theory of Quantum Computation, Communication and Cryptography. Perhaps this isn’t done because it would make the conference seem too classical? But TQQQC wouldn’t work so well either. I digress.
The key thing I want to mention is the imminent 15 Feb submission deadline.
I also want to mention that TQC is continuing to stay ahead of the curve with its open-access author-keeps-copyright proceedings, and this year with some limited open reviewing (details here). I recently spoke to a doctor who complained that despite even her Harvard Medical affiliation, she couldn’t access many relevant journals online. While results of taxpayer-funded research on drug efficacy, new treatments and risk factors remain locked up, at least our community is ensuring that anyone wanting to work on the PPT bound entanglement conjecture will be able to catch up to the research frontier without having to pay $39.95 per article.
One nice feature about these proceedings is that if you later want to publish a longer version of your submission in a journal, then you will not face any barriers from TQC. I also want to explicitly address one concern that some have raised about TQC, which is that the published proceedings will prevent authors from publishing their work elsewhere. For many, the open access proceedings will be a welcome departure from the usual exploitative policies of not only commercial publishers like Elsevier, but also the academic societies like ACM and IEEE. But I know that others will say “I’m happy to sign your petitions, but at the end of the day, I still want to submit my result to PRL” and who am I to argue with this?
So I want to stress that submitting to TQC does not prevent submitting your results elsewhere, e.g. to PRL. If you publish one version in TQC and a substantially different version (i.e. with substantial new material) in PRL, then not only is TQC fine with it, but it is compatible with APS policy which I am quoting here:

Similar exceptions [to the prohibition against double publishing] are generally made for work disclosed earlier in abbreviated or preliminary form in published conference proceedings. In all such cases, however, authors should be certain that the paper submitted to the archival journal does not contain verbatim text, identical figures or tables, or other copyrighted materials which were part of the earlier publications, without providing a copy of written permission from the copyright holder. [ed: TQC doesn’t require copyright transfer, because it’s not run by people who want to exploit you, so you’re all set here] The paper must also contain a substantial body of new material that was not included in the prior disclosure. Earlier relevant published material should, of course, always be clearly referenced in the new submission.

I cannot help but mention that even this document (the “APS Policy on Prior Disclosure”) is behind a paywall and will cost you $25 if your library doesn’t subscribe. But if you really want to support this machine and submit to PRL or anywhere else (and enjoy another round of refereeing), TQC will not get in your way.
Part of what makes this easy is TQC’s civilized copyright policy (i.e. you keep it). By contrast, Thomas and Umesh had a more difficult, though eventually resolved, situation when combining STOC/FOCS with Nature.

A Federal Mandate for Open Science

Witness the birth of the Federal Research Public Access Act:

“The Federal Research Public Access Act will encourage broader collaboration among scholars in the scientific community by permitting widespread dissemination of research findings.  Promoting greater collaboration will inevitably lead to more innovative research outcomes and more effective solutions in the fields of biomedicine, energy, education, quantum information theory and health care.”

[Correction: it didn’t really mention quantum information theory—SF.]

You can read the full text of FRPAA here.
The bill states that any federal agency that budgets more than $100 million per year for funding external research must make that research available in a public online repository for free download no later than 6 months after the research has been published in a peer-reviewed journal.
This looks to me like a big step in the right direction for open science. Of course, it’s still just a bill, and needs to successfully navigate the Straits of the Republican-controlled House, through the Labyrinth of Committees and the Forest of Filibuster, and run the Gauntlet of Presidential Vetoes. How can you help it survive this harrowing journey? Write your senators and your congresscritter today, and tell them that you support FRPAA and open science!
Hat tip to Robin Blume-Kohout.

Why boycott Elsevier?

Everyone has their own reasons for doing this. There is an interesting debate at Gowers’s blog, including a response from an Elsevier employee. Some people dislike Elsevier’s high prices, their bundling practices, their fake medical journals, their parent company’s (now-former) involvement in the global arms trade, their lobbying for SOPA/PIPA/RWA, or other aspects of their business practice. Indeed, for those who want to reform Elsevier, this is one limitation of the boycott: it doesn’t clearly target a particular practice of the company that we want changed. On the other hand, others think Elsevier isn’t evil, but just has a communications problem.
In this post, I want to defend a more radical position, which is that we should try not to reform Elsevier or other publishers of academic journals, but to eliminate them. Until the debate over SOPA, I thought this position was too extreme. I thought we could tolerate a status quo in which journals are used for credentialing, and although it is a little unjust and absurd, the only real cost is bleeding the library budgets a little bit.
But the status quo isn’t stable. Open access and self-archiving are expanding. Soon, someone will successfully mirror JSTOR. Libraries are increasingly complaining about subscription costs.
In the long run, the future looks more like arxiv.org. Their front page boasts (as of this writing):

Open access to 731,335 e-prints in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics.

Just like the walled gardens of Compuserve and AOL would never grow into the Internet, no commercial publisher will ever be able to match the scope and ease of access of arxiv.org. Nor can they match the price. In 2010, there were about 70,000 new papers added to arxiv.org and there were 30 million articles downloaded, while their annual budget was $420,000. This comes to $6 per article uploaded (or 1.4 cents per download). Publishers talk about how much their business costs and how even “open access” isn’t free, but thanks to arxiv.org, we know how low the costs can go.
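Checking the arithmetic on those arXiv figures:

```python
# arXiv's 2010 figures, as quoted above
budget = 420_000        # annual budget, USD
new_papers = 70_000     # new papers added in 2010
downloads = 30_000_000  # article downloads in 2010

cost_per_upload = budget / new_papers    # dollars per article uploaded
cost_per_download = budget / downloads   # dollars per download
```

That is $6 per uploaded article and 1.4 cents per download, which is the number publishers would rather you didn’t see.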
By contrast, if you want your article published open access with Springer, it costs $3000. This seems like something we might be able to protest, and convince them to change. We can’t. Elsevier’s outgoing CEO left with a golden parachute worth two million pounds. They’re not going to make that kind of money while running with the efficiency of arxiv.org. So while scientists and the public see the internet as a way of sharing knowledge and driving down costs, publishers like Elsevier see it as a threat. For them, $6/article is a nightmare scenario that has to be stopped.
Some of you might think I’m overreacting. After all, publishers have tolerated self-archiving, citeseer, arxiv.org, etc. so far. This is partly to avoid backlash, and partly because for historical reasons editors of journals like Science and Nature have personally supported the advance of science even over the profits of the companies they work for. But in the long run, we can’t both have everything available for free, and journals continuing to charge extortionate prices. I suspect that a conflict is inevitable, and when it happens, we’ll regret the fact that journals hold all of the copyrights. SOPA was the first sign that publishers are not on the side of advancing knowledge, and if a journal ever goes bankrupt and sells its portfolio of intellectual property, we’ll find out what they’re capable of when they no longer are run by people who place any value on science.
So what can we do about it? A boycott of Elsevier is a good first step. But really we need to change the system so that publishers no longer hold copyright. Their role (and rate of profits) would be like that of the local Kinko’s when they prepare course packs. This would also improve the academic societies, like ACM and APS, by removing the terrible incentive that their publishing gives them to support organizations like the AAP that in turn support SOPA. Instead, they could simply represent communities of scientists, like they were originally designed to do.
I’m not idealistic enough to imagine that arxiv.org is enough. The issue is not so much that it lacks refereeing (which could be remedied easily enough), but that it lacks scarcity. To see what I mean, imagine starting a free online-only virtual journal that simply selects papers from the arxiv. The entire journal archives could be a single html file of less than a megabyte. But without space constraints, it would need to credibly signal that papers accepted into it were high quality. This is nontrivial, and involves convincing authors, readers, referees and hiring committees, all more or less simultaneously. As a community, we need to figure out a way to do this, so that the internet can finally do what it was designed for, and disrupt scientific publishing.
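To see just how little infrastructure such a virtual journal would need, here is a toy overlay journal in Python. The selected papers and titles are made up, and of course this solves none of the hard signalling problems described above; it only shows that the technical side really is a sub-megabyte HTML file.

```python
# A toy "overlay journal": the journal is nothing but a curated list of
# arXiv IDs rendered as one small HTML file. All entries are invented.

selected = [
    ("quant-ph/0001001", "An excellent paper"),
    ("quant-ph/0002002", "Another excellent paper"),
]

def render_journal(papers):
    """Render the whole journal archive as a single HTML page."""
    items = "\n".join(
        f'<li><a href="https://arxiv.org/abs/{pid}">{title}</a></li>'
        for pid, title in papers
    )
    return ("<html><body><h1>Overlay Journal</h1><ul>\n"
            f"{items}\n</ul></body></html>")

html = render_journal(selected)
```

The entire “publishing operation” here is selection plus a hyperlink; everything expensive about the current system lies elsewhere.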
Update: Via John Baez, I came across a proposal for replacing academic journals with overlay boards that seems promising.

Why medicine needs scirate.com

Defenders of the traditional publishing model for medicine say that health-related claims need to be vetted by a referee process. But there are heavy costs. In quantum information, one might know the proof of a theorem (e.g. the Quantum Reverse Shannon Theorem) for years without publishing it. But one would rarely publish using data that is itself secret. Unfortunately, this is the norm in public health. It’s ironic that the solution to the 100-year-old Poincaré conjecture was posted on arxiv.org and rapidly verified, while research on fast-moving epidemics like H5N1 (bird flu) is delayed so that scientists who control grants can establish priority.
All this is old news. But what I hadn’t realized is that the rest of science needs not only arxiv.org, but also scirate.com. Here is a recent and amazing, but disturbingly common, example of scientific fraud. A series of papers were published with seemingly impressive results, huge and expensive clinical trials were planned based on these papers, while other researchers were privately having trouble replicating the results, or even making sense of the plots. But when they raised their concerns, here’s what happened (emphasis added):

In light of all this, the NCI expressed its concern about what was going on to Duke University’s administrators. In October 2009, officials from the university arranged for an external review of the work of Dr Potti and Dr Nevins, and temporarily halted the three trials. The review committee, however, had access only to material supplied by the researchers themselves, and was not presented with either the NCI’s exact concerns or the problems discovered by the team at the Anderson centre. The committee found no problems, and the three trials began enrolling patients again in February 2010.

As with the Schön affair, there were almost comically many lies, including a fake “Rhodes scholarship in Australia” (which you haven’t heard of because it doesn’t exist) on one of the researcher’s CVs. But what if they lied only slightly more cautiously?
By contrast, with scirate.com, refutations of mistaken papers can be quickly crowdsourced. If you know non-quantum scientists, go forth and spread the open-science gospel!

Two Interesting LaTeX Online Editors

One is a Google Docs app, LaTeX Lab (thanks to Daniel for pointing this out). Another, with https support, is Verbosus. I couldn’t get the latter to compile and display in-browser with Firefox, but it did work in Safari. Verbosus has an Android app, but strangely no desktop version.
Recently I’ve been mostly using Dropbox for LaTeX collaborations. Every once in a while there are conflicts when two people edit at the same time, but with only a few collaborators this seems to work really well.

In Search of Plugins

Speaking of blogging technology: here are plugins that I haven’t seen for WordPress but that would be fun (i.e. things I’d love to build if I were king of infinite time).

  • A plugin to allow for comments INSIDE of a blog post.  The reader would be able to click at, say the end of a paragraph, and have their comment inserted there.  The comment would display collapsed (as a small icon or such) but when clicked on would expand for an inline comment.  Comment replies could be posted there as well.  An option to display all comments (inline and after the post) should be available as well.  Of course one could apply this recursively 🙂
  • Automagic arXiv linker.  I’m amazed this doesn’t exist.  I should be able to cut and paste an arXiv reference and have it automatically turned into a link (including maybe the option to add [pdf] after the paper for a direct link to the pdf).
  • A plugin to list the tweets that have been made about a post.  I have tweetmeme installed, which gives a link to the tweets, but if there is a plugin that lists tweets inline, I haven’t found it.
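The arXiv-linker bullet is concrete enough to sketch. Here is a rough regex approach in Python (a real WordPress plugin would of course be PHP, and arXiv identifier formats have edge cases this pattern surely misses):

```python
import re

# Matches new-style IDs (1401.1234, optionally with a version suffix)
# and old-style IDs (quant-ph/9705052). Deliberately rough.
ARXIV_ID = re.compile(
    r"\b(\d{4}\.\d{4,5}(?:v\d+)?|[a-z-]+(?:\.[A-Z]{2})?/\d{7})\b"
)

def linkify(text: str, add_pdf: bool = False) -> str:
    """Wrap arXiv IDs found in text with links to their abstract pages,
    optionally appending a [pdf] link as described in the wishlist."""
    def repl(m):
        aid = m.group(1)
        link = f'<a href="https://arxiv.org/abs/{aid}">{aid}</a>'
        if add_pdf:
            link += f' [<a href="https://arxiv.org/pdf/{aid}">pdf</a>]'
        return link
    return ARXIV_ID.sub(repl, text)

out = linkify("See 1401.1234 and quant-ph/9705052.")
```

Running this over a post body before rendering would give the “automagic” behavior; the hard part in practice is avoiding false positives in ordinary text.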