What If Papers Had APIs?

API is an abbreviation for “Application Programming Interface.” Roughly speaking, an API is a specification of a software component in terms of the operations one can perform with that component. For example, a common kind of API is the set of methods supported by an encapsulated bit of code, a.k.a. a library (a library could have the purpose of “drawing pretty stuff on the screen”; its API is then the set of commands like “draw a rectangle,” along with a specification of how you pass parameters to each method, how rectangles overlay on each other, etc.). Importantly, the API is supposed to specify how the library functions, but it does so in a way that is independent of the inner workings of the library (though this wall is often broken in practice). Another common kind of API arises when a service exposes remote calls that can be made to manipulate and perform operations on that service. For example, Twitter supports an API for reading and writing Twitter data. This latter example, of a service exposing a set of calls that can manipulate the data stored on a remote server, is particularly powerful, because it allows one to gain access to data through simple access to a communication network. (As an interesting aside, see this rant for why APIs are likely key to some of Amazon’s success.)
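The library-API idea above can be sketched in a few lines of Python. This is a toy, not a real library: the names `Canvas`, `draw_rectangle`, and `shape_count` are invented for illustration. The point is that the public methods form the API, while the internal list of shapes is an implementation detail callers never touch.

```python
class Canvas:
    """A toy drawing library; its public methods are its API."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self._shapes = []  # internal representation: not part of the API

    def draw_rectangle(self, x, y, w, h):
        """Part of the API: the spec says how parameters are passed."""
        self._shapes.append(("rect", x, y, w, h))

    def shape_count(self):
        """Also part of the API; the list used internally is not."""
        return len(self._shapes)


canvas = Canvas(640, 480)
canvas.draw_rectangle(10, 10, 100, 50)
print(canvas.shape_count())  # → 1
```

Callers depend only on the method signatures, so the internal list could be swapped for, say, a spatial index without breaking any caller.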
As you might guess (see, for example, my latest flop Should Papers Have Unit Tests?), I like smooshing together disparate concepts and seeing what comes out the other side. Thinking about APIs then led me to consider the question “What if Papers had APIs?”
In normal settings academic papers are considered to be relatively static objects. Sure, papers on the arXiv, for example, have versions (some more than others!). And there are efforts like Living Reviews in Relativity, where review articles are updated by the authors. But in general papers exist as fixed, “complete” works. In programming terms we would say that they are “immutable.” So if we consider the question of exposing an API for papers, one might think it would just be a read-only API. And indeed this form of API exists for many journals, and also for the arXiv. These forms of “paper APIs” allow one to read information, mostly metadata, about a paper.
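The arXiv's read-only API really does exist (queries go to `export.arxiv.org/api/query` and return an Atom feed of metadata). A minimal sketch, which only constructs the query URL and leaves the fetching and feed-parsing to the reader:

```python
from urllib.parse import urlencode

# The arXiv's read-only metadata endpoint.
ARXIV_API = "http://export.arxiv.org/api/query"


def arxiv_metadata_url(arxiv_id, version=None):
    """Build a query URL for one paper, optionally pinned to a version."""
    ident = f"{arxiv_id}v{version}" if version else arxiv_id
    return f"{ARXIV_API}?{urlencode({'id_list': ident})}"


print(arxiv_metadata_url("quant-ph/9705052", version=2))
```

Fetching that URL returns title, authors, abstract, and links as an Atom feed, i.e., exactly the read-only, mostly-metadata access described above.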
But what about a paper API that allows mutation? At first glance this heresy is rather disturbing: allowing calls from outside of a paper to change the content of the paper seems dangerous. It also isn’t clear what benefit could come from this. With, I think, one exception. Citations are the currency of academia (last I checked they were still, however, not fungible with bitcoins). But citations really only go in one direction (with exceptions for simultaneous works): you cite a paper whose work you build upon (or whose work you demonstrate is wrong, etc.). What if a paper exposed a reverse citation index? That is, I put my paper on the arXiv, and then, when you write your paper showing how my paper is horribly wrong, you can make a call to my paper’s API that mutates my paper and adds links to your paper. Of course, this seems crazy: what is to stop rampant spamming of back citations, especially by *ahem* cranks? Here it seems that one could implement a simple approval system for the receiving paper. If this were done on some common system, then you could expose the mutated paper either (a) with only approved mutations or (b) with unapproved mutations included (or one could go ‘social’ on this problem and allow voting on the changes).
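A minimal sketch of the mutation-plus-approval idea, with all class and method names invented for illustration: anyone may submit a reverse citation, the author moderates, and readers choose view (a) or view (b).

```python
from dataclasses import dataclass, field


@dataclass
class ReverseCitation:
    citing_paper: str        # e.g. the arXiv id of the paper citing mine
    approved: bool = False   # starts unapproved, pending moderation


@dataclass
class Paper:
    arxiv_id: str
    _incoming: list = field(default_factory=list)

    def add_reverse_citation(self, citing_paper):
        """The mutating API call: anyone may submit a reverse citation."""
        self._incoming.append(ReverseCitation(citing_paper))

    def approve(self, citing_paper):
        """Author-side moderation, to keep out crank spam."""
        for rc in self._incoming:
            if rc.citing_paper == citing_paper:
                rc.approved = True

    def reverse_citations(self, include_unapproved=False):
        """View (a): approved only; view (b): everything submitted."""
        return [rc.citing_paper for rc in self._incoming
                if rc.approved or include_unapproved]
```

A submitted citation is invisible in the default view until the author approves it; a “social” variant would replace `approve` with vote tallies.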
What benefit would such a system confer? In some ways it would make more accessible something that we all use: the “cited by” index of services like Google Scholar. One difference is that it could be more precise in the reverse citation: for example, while Scholar provides a list of relevant papers, if the API exposed the ability to add links to specific locations in a paper, one could arguably get better reverse citations (because, frankly, the weakness of “cited by” indices is their lack of specificity).
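The location-specific idea could be sketched as a small schema (all field names hypothetical): each incoming citation carries an anchor into the cited paper, and grouping by anchor gives a view far more specific than a flat “cited by” list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AnchoredCitation:
    """A reverse citation pointing at a specific location in the cited
    paper rather than at the paper as a whole (hypothetical schema)."""
    citing_paper: str   # who is citing
    section: str        # where in my paper, e.g. "Sec. 3.2" or "Lemma 4"
    note: str           # why, e.g. "gives a counterexample"


def by_location(citations):
    """Group incoming citations by the spot they point at."""
    grouped = defaultdict(list)
    for c in citations:
        grouped[c.section].append(c.citing_paper)
    return dict(grouped)


refs = [
    AnchoredCitation("1502.99999", "Lemma 4", "gives a counterexample"),
    AnchoredCitation("1503.12345", "Lemma 4", "simpler proof"),
]
print(by_location(refs))  # → {'Lemma 4': ['1502.99999', '1503.12345']}
```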
What else might a paper API expose? I’m not convinced this isn’t an interesting question to ponder. Thanks for reading another wacko mashup episode of the Quantum Pontiff!

13 Replies to “What If Papers Had APIs?”

  1. Some thoughts come to mind.
    – micro-comments – I have the ability to tag a paragraph of your paper with a comment, much like what we can do now on certain blog posts (like with Medium)
    – proof substitutions: fix bugs in a proof, or link to a simpler proof of a claim.
    – code invocations: embedded code in my paper can be accessed via an API call, or even better, you could run my code with your data.
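    The three ideas above could be sketched as routes of a hypothetical paper API; every endpoint name here is invented for illustration, and no such service exists.

```python
# Hypothetical HTTP routes a paper API might expose for the three ideas
# above; all endpoint names are invented.
PAPER_API_ROUTES = {
    # micro-comments: tag a specific paragraph with a comment
    "POST /papers/{id}/paragraphs/{n}/comments": "attach a micro-comment",
    # proof substitutions: fix a bug or link a simpler proof of a claim
    "POST /papers/{id}/claims/{label}/proofs": "submit an alternative proof",
    # code invocations: run the paper's embedded code, even on your data
    "POST /papers/{id}/code/{name}/invoke": "run embedded code remotely",
}

for route, purpose in sorted(PAPER_API_ROUTES.items()):
    print(f"{route}  ->  {purpose}")
```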

  2. Notebook environments like iPython and iHaskell make it feasible to provide as “Supplemental Information” both a symbolic derivation of the main results (so less guesswork regarding “it can be shown”) and a replicable numerical computation of figures and tables (so less guesswork regarding “numerical computations indicate”).
    It’s unfortunate (as it seems to me) that iPython and iHaskell are evolving rapidly … a great virtue of TeX is that it evolves not at all, and of LaTeX that it evolves with glacial slowness.
    Also, thanks for sustaining The Quantum Pontiffs as a wonderful forum.

    1. N.B.  Some Quantum Pontiff readers may be interested too in the Knuth-style command-line nuweb system for Literate Programming.
      Two attractive features of nuweb are: (1) it enables literate programming in any text-file programming language, and (2) its source code is sufficiently simple and well documented that you can modify it yourself as needed.
      E.g., I’ve had pleasant experiences with nuweb/MATLAB literate programming. Needless to say, Knuth-style literate programming requires a *lot* of additional work in *any* API; neither nuweb nor any other literate programming API (that is known to me or anyone else) constitutes a magic bullet for eliminating this work.

  3. I think this is superseded by the idea of having papers essentially as code repositories (say as git repos on github, or similar). Then if there’s a horrible flaw, you just fix it, or open a bug report.
    I think the “opt-in” form of citations will eventually die out anyway, to be replaced with programmatic-style ones where my “paper” builds on yours because it literally *requires* it to “compile”, so I don’t see any point in trying to make the “old” citation system marginally better.
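    The “compile”-style citation idea in this comment can be sketched as a dependency check, like a build tool runs. Everything here is invented for illustration: the registry contents, the paper names, and the manifest format.

```python
# Sketch of "programmatic" citations: a paper declares the results it
# builds on as dependencies and "compiles" only if they all resolve.
# The registry contents and paper names are invented for illustration.

REGISTRY = {
    "alice/spectral-gap-2014": {"exports": ["thm-1", "lemma-3"]},
}


def compile_paper(manifest):
    """Return a list of errors; an empty list means the paper 'compiles'."""
    errors = []
    for dep, results in manifest["requires"].items():
        exported = REGISTRY.get(dep, {}).get("exports", [])
        for r in results:
            if r not in exported:
                errors.append(f"{dep} does not export {r}")
    return errors


my_paper = {"requires": {"alice/spectral-gap-2014": ["thm-1"]}}
print(compile_paper(my_paper))  # → []
```

    If the cited paper retracts `thm-1`, every downstream paper fails to “compile” instead of silently resting on a withdrawn result.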

  4. @misc{Avigad:2014aa, Author = {Jeremy Avigad
    and Steve Awodey and Robert Harper and Daniel
    Licata and Michael Shulman and Vladimir
    Voevodsky and Andrej Bauer and Thierry Coquand
    and Nicola Gambino and David Spivak},
    Howpublished = {{MURI} proposal: on-line
    \url{https://hottheory.files.wordpress.com/
    2014/04/ proposalpublic2.pdf}}, Month = {April
    29}, Title = {Homotopy type theory: unified
    foundations of mathematics and computation},
    Year = {2014}}
    @article{McLarty:1990aa, Author = {Colin
    McLarty}, Journal = {British Journal of the
    Philosophy of Science}, Pages = {351-375},
    Title = {The uses and abuses of the history of
    topos theory}, Volume = {41}, Year = {1990}}
    @article{Lawvere:2014aa, Author = {F. William
    Lawvere}, Journal = {Reprints in Theory and
    Applications of Categories,}, Pages = {1-22},
    Title = {Comments on the development of topos
    theory}, Volume = {24}, Year = {2014}}
    @article{cite-key, Author = {Richard Rorty},
    Journal = {Wilson Quarterly}, Pages = {30},
    Title = {Against Unity}, Volume = {(Winter)},
    Year = {1998}}
    @incollection{Foucault:2000aa, Author =
    {Michel Foucault}, Booktitle = {Ethics:
    Subjectivity and Truth}, Editor = {Michel
    Foucault and Paul Rabinow}, Pages = {111--119},
    Publisher = {Lane, The Penguin Press}, Title =
    {Polemics, Politics and Problematizations},
    Volume = {1}, Year = {2000}}

  5. Noon Silk predicts  “the ‘opt-in’ form of citations will eventually die out anyway, to be replaced with programmatic style ones where my ‘paper’ builds on yours because it literally requires it to ‘compile’.”

    Works that consider (implicitly or explicitly) the features of a compiled STEM culture — features that are in some regards positive, but mainly are negative — include the following:
    (1)  Homotopy Type Theory: Unified Foundations of Mathematics and Computation (Avigad et al., MURI Proposal, 2014)  A STEM literature that ‘compiled’ in 20th-century ZFC might not ‘compile’ in 21st-century HoTT.
    (2)  The uses and abuses of the history of topos theory (Colin McLarty, 1990)  It is advantageous for students to regard present-day ZFC dominance as a more-or-less accidental historical contingency.
    (3)  Comments on the development of topos theory (F. William Lawvere, 2014)  “One should not get drunk on the idea that everything is general. Category theorists should get back to the original goal: applying general results to particularities and to making connections between different areas of mathematics.”
    (4)  Against Unity (Richard Rorty, 1998)  Theorems natural in one language may be non-natural in another language.
    (5)  Polemics, Politics and Problematizations (Michel Foucault, 2000)  summarizes broad cultural implications of discourse presented solely as logic grounded in postulates that are nominated as axioms (by whom?) …

    I like discussions, and when I am asked questions, I try to answer them. It’s true that I don’t like to get involved in polemics. […] I don’t belong to the world of people who do things that way. I insist on this difference as something essential: a whole morality is at stake, the one that concerns the search for truth and the relation to the other.
    In the serious play of questions and answers, in the work of reciprocal elucidation, the rights of each person are in some sense immanent in the discussion. They depend only on the dialogue situation. The person asking the questions is merely exercising the right that has been given him: to remain unconvinced, to perceive a contradiction, to require more information, to emphasize different postulates, to point out faulty reasoning, and so on. As for the person answering the questions, he too exercises a right that does not go beyond the discussion itself; by the logic of his own discourse, he is tied to what he has said earlier, and by the acceptance of dialogue he is tied to the questioning of other. Questions and answers depend on a game — a game that is at once pleasant and difficult — in which each of the two partners takes pains to use only the rights given him by the other and by the accepted form of dialogue.

    Conclusion Each generation of researchers (including QIT researchers) has found ample reason to be glad that the previous generation of researchers did not insist upon rigid ‘compile-time’ axioms.


    1. N.B.  One more amusing reflection is provided by (historian) Jonathan Israel …

      Categories are terribly important, not just for philosophers, but for everyone.
      Historians sometimes forget this and try to operate without categories, but I don’t think that’s a very good way of pursuing historical studies [because] one is always in danger of thinking about something, making up your mind about it, and then not being critical enough in your thinking on that topic subsequently.

      Conclusion Israel’s categoric considerations read naturally (and amusingly) as applying broadly to STEM discourse, particularly to the Harrow/Kalai debate.

      @misc{Israel:2014ab, Author =
      {Jonathan Israel}, Howpublished =
      {\emph{Five Books} on-line interviews
      (\url{http://fivebooks.com/interviews
      /jonathan-israel-on-enlightenment})},
      Month = {Nov 28}, Title = {An
      interview with {J}onathan {I}srael on
      the {E}nlightenment}, Year = {2014}}

  6. Another API-related reference  Per recent API-related discussion on Gödel’s Lost Letter, the attention of Quantum Pontiff readers is directed to the recent Bulletin of the AMS article by Pelayo and Warren, “Homotopy type theory and Voevodsky’s univalent foundations” (2014). The concluding paragraph of Pelayo and Warren’s article is an explicit roadmap for an API that encompasses mathematical, scientific, and engineering publications:

    Ultimately, we hope that it will be possible to formalize large amounts of modern mathematics in the univalent setting, and that doing so will give rise to both new theoretical insights and good numerical algorithms (extracted from code in a proof assistant like Coq) which can be applied to real-world problems by applied mathematicians.

    Needless to say, plenty of mathematicians are expressing concern. On Michael Harris’ weblog Mathematics Without Apologies (which is a *very* fun weblog), a comment by Richard Séguin speaks for many folks (including me):

    Another concern that I have [is] the language du jour problem. There have been many fads over the years in computer programming languages, CS folks seem to love inventing new ones*, and backwards compatibility with anything is generally not a central concern.
    In mathematics, there has always been, for example, drift in notation and invention of new words, but it generally happens slowly enough that there isn’t much of a problem reading 50 year old papers.
    Similarly, TeX, mathematic’s typesetting language, evolves slowly, generally has backward compatibility, and is the universally accepted standard.
    In contrast, I suspect that we will see a proliferation of different “foundations,” proof checkers, and proof generators driven by CS folks*, and it will never settle down, mirroring the situation with programming languages. I see the chaos of a Tower of Babel. Tell me it ain’t so.
    *especially if research money starts to flow

    Conclusion  At present the STEAM community suffers not from too-few too-old research APIs, but rather too many research APIs that are too new.

    @article{Pelayo:2014aa, Author = {{\'A}lvaro
    Pelayo and Michael A. Warren}, Journal = {Bull.
    Amer. Math. Soc.}, Pages = {597-648}, Title =
    {Homotopy type theory and Voevodsky's univalent
    foundations}, Volume = {51}, Year = {2014}}

  7. Here’s news of broad interest to readers of weblogs like The Quantum Pontiffs, Shtetl Optimized, Gödel’s Lost Letter, and Quantum Frontiers:

    SYNOPSIS  The Intelligence Advanced Research Projects Activity (IARPA) will host a Proposers’ Day on 19 May 2015 at the University of Maryland Stamp Student Union to provide information to potential proposers on the objectives of an anticipated Broad Agency Announcement (BAA) for the Logical Qubits (LogiQ) program.
    PROGRAM OBJECTIVE AND DESCRIPTION  The LogiQ program in IARPA’s Safe and Secure Operations (SSO) Office is seeking creative technical solutions to the challenge of encoding imperfect physical qubits into a logical qubit that protects against decoherence, gate errors, and deleterious environmental influences.
    While quantum information processing has witnessed tremendous advances in high-fidelity qubit operations and an increase in the size and complexity of controlled quantum computing systems, it still suffers from physical-qubit gate and measurement fidelities that fall short of desired thresholds, multi-qubit systems whose overall performance is inferior to that of isolated qubits, and non-extensible architectures, all of which hinder the path toward fault tolerance.
    Underpinning the program’s strategy to build a logical qubit is a push for higher fidelity in multi-qubit operations, the pursuit of dynamically controlled experiments in multi-qubit systems to remove entropy from the system during computation, and characterization and mitigation of environmental noise and correlated errors.

    There’s plenty more material on the LogiQ Program Proposer’s Day announcement page.
    Readers of Michael Harris’ book and/or weblog Mathematics Without Apologies (as they are both titled) will appreciate that IARPA is designating stable logical qubits as an avatar (in Harris’ phrase) for catalyzing advances in our general understanding of noise, decoherence, and entropy.
    Conclusion  The physical process of removing von Neumann entropy from systems of qubits/qudits can be appreciated as a mathematical avatar — in Michael Harris’ phrase — for the computational algorithms that (heuristically) are so marvelously effective in removing Boltzmann entropy from atoms/molecules in large-scale quantum simulation software … sufficient to compose (F)foundations for the $120M/5yr investment by the Swiss-based pharmaceutical corporation Sanofi in the Portland-based quantum simulation corporation Schrödinger.
    Hopefully at least some East Coast quantum cognoscenti will attempt/report on this fine workshop!

  8. Further high-value discussion of mathematical APIs — by luminaries like Lurie, Tao, Donaldson, etc. — is summarized by Michael Harris in this week’s Mathematics Without Apologies post Univalent Foundations: “No Comment.” (May 13, 2015).
    Terry Tao’s view:

    “Some day we may actually write our papers, not in LaTeX … but in language that some smart software could convert into a formal language, and every so often you get a compilation error: ‘THE COMPUTER DOES NOT UNDERSTAND HOW YOU DERIVE THIS STEP’.”

    Note: Harris provides a link to an on-line video of this discussion.

  9. Assuming I understand what you’re suggesting…
    Having just recently read about controversial papers, this came to mind. Suppose someone publishes research that is considered “politically incorrect” or “politically sensitive”, whatever field it may be. Allowing revisions of papers effectively allows “changing the past”, which, in my opinion, is the kiss of death for the preservation of truth. Yes, some fields might benefit from it, such as computer science, which, at the moment, has no serious political or social issues I’m aware of that would be negatively affected by this… at least until super AI is invented. That too may change when culture decides that AI should have a prominent role in society – for better or for worse – and like-minded new generations of scientists decide to go back and “change the books” to fit cultural trends.
    The fact that scientific papers are set in stone, so to speak, allows us to go back and reference things of the past without worry that they will be modified to fit current views. For example, challenges have been made to the traditional Copenhagen interpretation of quantum mechanics. Notably, those views were stomped on for a long time but are coming back in recent years. The papers about them dating back to the 70s haven’t changed. No one went back to “change the books” because that’s not what we do in science.
    Wrong or right, the paper needs to stay the same.
    Furthermore, consider all of those papers that reference papers that are wrong. Such papers may explicitly call attention to the wrongs. But if those wrongs are corrected, readers of the new papers won’t know the problem (in the old papers) that the new papers are referring to. The new papers may say “Previously, we thought PQR [1][2][3], but recent studies show XYZ [4][5][6]”, and such.
    Unchanging scientific papers show us where we’ve been and give us a standard to compare our current work against. Let’s leave them alone please.

  10. N Anderson: I think this is easily addressed by having the system keep a history of changes, very similar to how arxiv.org keeps a history of all papers that have been submitted (so if you find a withdrawn one you can find the previous version). Exposing this history, I think, would be a problem, but by preserving it you won’t be able to “rewrite the past”.
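    The append-only history this reply describes can be sketched in a few lines; the class and method names are invented for illustration, and real systems (arXiv included) layer timestamps and diffs on top of the same idea.

```python
# Sketch of an append-only version history: every mutation adds a new
# version, so earlier versions can never be rewritten.

class VersionedPaper:
    def __init__(self, body):
        self._versions = [body]          # version 1 is the original

    def mutate(self, new_body):
        self._versions.append(new_body)  # append-only: never overwrite

    def latest(self):
        return self._versions[-1]

    def version(self, n):
        """Retrieve any historical version, arXiv-style (1-indexed)."""
        return self._versions[n - 1]


p = VersionedPaper("original text")
p.mutate("original text + reverse citation")
print(p.version(1))  # → original text
```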
