New Caelifera



IDE Tools for Reading Research Papers?

January 15, 2012 | 7 Comments

It’s been over six months since I made the jump from quantum computing theory professor at the University of Washington to software engineer at Google. When I hear from my friends back in quantum computing, the second question they ask is, “What’s it like?” (The first question is whether Google wants to build a quantum computer.)  There are lots of answers to this question, but what I think is really interesting is not how I feel about the ins and outs of this new career, but what I found most surprising about the similarities between my new job and my old job.  And, for me, hands down, the most surprising similarity is between reading papers and reading other people’s code.

Reading a paper in a new subject area was definitely one of my favorite parts of my old job (and a central skill for being a good researcher).  You’d start out, often, with only a small clue about the subject of the paper.  Often you were led to the paper by a search that hit a few keywords, and an abstract that seemed interesting.  But after a few pages it often became clear that there were all sorts of terms and ideas that you just hadn’t seen before.  And so you’d often have to spend some time reading other papers that contained the terms you didn’t understand, to see if they helped.  Mostly they didn’t, but sometimes they did, and then you could backtrack and figure out some of what the first paper said.  This sort of jumping around, at least for me, occurred quite a bit as I’d try to parse a paper, interspersed with periods of logical thought and pen-and-paper verification of calculations.

Reading other people’s code is very similar.  At first you start looking at some class, say, and you have some vague idea what it does.  Documentation, and implicit documentation through naming, gives you some idea of what is going on, but quickly you see the code start calling other code whose workings and exact purpose you don’t know, and so you have to go track down that other class, figure it out, and then backtrack.  Of course this is often interspersed with bits of following the logic of the code.  Today, with modern IDE tools, this sort of jumping back and forth becomes a quick habit and makes the process of figuring out someone else’s code significantly easier.

Which got me thinking.  Why aren’t there modern tools for reading research documents that provide some of the functionality found in IDEs such as Eclipse?  Certainly some authors are gracious enough to compile their LaTeX such that their citation data is a link, but this is a long way from having PDFs where you can click on a citation in the text and get immediately transported to the other paper, maybe even to the particular location in that paper that is relevant.  I think the technical challenge here is providing hooks between the documents: how do I make a citation that is more than just citing the full paper? (Wouldn’t it be nice if you specified the set of ranges of relevant lines in the paper?)  There are certainly very cool tools out there now for storing and parsing your scientific papers, but while the implicit linking between these papers is complex, most of this complexity is buried in the [12] citation pattern.  But maybe solutions for this are already out there?  Thoughts?
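To make the "ranges of relevant lines" idea a bit more concrete, here is a minimal Python sketch of what such a hook might carry. Everything in it is an assumption for illustration only: the names `LineRange`, `CitationHook`, and `merge_ranges`, the `target_id` field, and the choice of 1-indexed line numbers are all made up, not part of any existing tool or format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class LineRange:
    """A span of lines in the cited paper (1-indexed, inclusive). Hypothetical."""
    start: int
    end: int

    def __post_init__(self):
        if self.start < 1 or self.end < self.start:
            raise ValueError(f"invalid line range: {self.start}-{self.end}")


@dataclass
class CitationHook:
    """A citation that points at parts of a paper, not just the whole paper."""
    key: str        # the label shown in the text, e.g. "12" for [12]
    target_id: str  # hypothetical identifier of the cited paper
    ranges: List[LineRange] = field(default_factory=list)

    def covers(self, line: int) -> bool:
        """True if the given line of the cited paper is marked as relevant."""
        return any(r.start <= line <= r.end for r in self.ranges)


def merge_ranges(ranges: List[LineRange]) -> List[LineRange]:
    """Merge overlapping or directly adjacent ranges into a sorted list."""
    merged: List[LineRange] = []
    for r in sorted(ranges, key=lambda r: (r.start, r.end)):
        if merged and r.start <= merged[-1].end + 1:
            last = merged[-1]
            merged[-1] = LineRange(last.start, max(last.end, r.end))
        else:
            merged.append(r)
    return merged
```

A reader tool could then ask a hook like `CitationHook("12", "some-paper-id", [LineRange(10, 20)])` whether a line is relevant via `covers(15)`, and jump straight to the first merged range instead of the top of the paper.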

7 people are talking about “IDE Tools for Reading Research Papers?”

  1. Sounds like you just found a 20% project. :) It should be integrated with Google Scholar. 🙂

    Citations (obviously) should be linked to the relevant paper.

    Perhaps a tree of referenced papers, including each paper’s relevancy score (as determined by how central a reference each is). The tree is (naturally) virtually infinitely deep, as recursive references are gathered in real time.

    Allow me to highlight and add notes as I read the papers. (As implemented in the Kindle. You obviously need to buy a Kindle on Google’s dime as research.) Then use common highlights as an indicator of important parts of the paper. (Again, as implemented by the Kindle. If they’ve patented that, you may have to license it.)

    Rollover text for referenced papers should include the most critical snippet of the abstract.

    “Most critical” snippet will change if I have selected a word or phrase in the currently open paper. Then the rollover text of referenced papers will include snippets which include the selected word/phrase (if it exists).

    Make it happen Dave! Shouldn’t take more than a few weeks.

    • Oh, and what is Google planning on using their new quantum computers for? That is, what is the coolest new task they want to accomplish with it?

  2. I sometimes use \cite[Theorem~X]{PaperZ} or \cite[pp.~X–Y]{PaperZ}, which does give valuable hints (I believe). It might be possible to create hooks based on this information. The main problem is that often I don’t give precise references. Of course it’s not always possible to pinpoint a single page or a theorem, but even when it is I sometimes don’t bother. I guess it may be because the document still compiles (and/or gets accepted) without these precise references(?)

  3. It’s an interesting idea I suppose, and I suppose also that it’d be pretty trivial to at least specify some more direct mapping in LaTeX and then encode that into the output in some fashion. That is to say, perhaps the encoding could be a link to an authorized “Segment” website that will open the relevant part of the document specified by some globally unique identifier (DOI?). What may be preferable, though, is for it to link to actual PDFs or entries in our relevant BibTeX libraries, which may be possible if you were to, say, use a custom URI (ref://doi..?pg=2), and then JabRef or whoever could register to handle that. Note, though, that there’d be issues with different versions of a paper having the same identifier (i.e. updates).

    I do, though, somewhat wonder how necessary it is.

    Programs tend to have significantly more unfamiliar terms; for example, a lot of papers talk about Grover’s algorithm, and you wouldn’t want to keep seeing specific links back to the paper about that; you’d maybe like a quick review the first few times you encounter it. So, then, I suggest that perhaps just a much better search, in existing reference management software, is appropriate.

    I do agree, though, that perhaps the world needs some awesome reference management tool on par with vim for programming 🙂 (that is to say, a plugin for vim 🙂)

  4. Why not a social network? Or a “smart paper citation network”? Just add some social elements and it sounds like a great startup idea! Every paper has a profile, users manage their papers, easy citations into other papers, etc. You could make tools to more easily create papers and cite papers. I’ve wanted a web/cloud-based LaTeX solution for the longest time (LaTeX is the only thing besides straight-up coding I need to do on a personal machine).
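Two of the suggestions in the comments (pulling locators out of the optional argument of \cite, and handing them to a reference manager through a custom URI) can be sketched together in a few lines of Python. This is only a rough illustration under stated assumptions: the `ref://<doi>?pg=<page>` layout is a hypothetical format loosely inspired by the comment above, not an existing scheme, the DOI used in the usage note is a made-up placeholder, and the function names are invented here. The regular expressions only handle the simple `\cite[locator]{key}` form.

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit

# Matches \cite[locator]{key}; plain \cite{key} without a locator is ignored.
CITE_RE = re.compile(r"\\cite\[([^\]]*)\]\{([^}]*)\}")

# Matches the first page number in locators such as "p.~7" or "pp.~3--5".
PAGE_RE = re.compile(r"pp?\.~?\s*(\d+)")


def extract_locators(tex):
    r"""Return (key, locator) pairs for each \cite[locator]{key} in the source."""
    return [(m.group(2).strip(), m.group(1).strip()) for m in CITE_RE.finditer(tex)]


def to_ref_uri(doi, locator):
    """Build a hypothetical ref:// URI, adding ?pg=N when the locator names a page."""
    m = PAGE_RE.search(locator)
    if m:
        return f"ref://{doi}?" + urlencode({"pg": m.group(1)})
    return f"ref://{doi}"


def parse_ref_uri(uri):
    """Split a hypothetical ref:// URI back into (doi, page or None)."""
    parts = urlsplit(uri)
    if parts.scheme != "ref":
        raise ValueError(f"not a ref:// URI: {uri}")
    doi = parts.netloc + parts.path
    pages = parse_qs(parts.query).get("pg")
    return doi, (int(pages[0]) if pages else None)
```

For instance, `extract_locators` applied to a source containing `\cite[pp.~3--5]{PaperZ}` yields the pair `("PaperZ", "pp.~3--5")`, and `to_ref_uri("10.1000/xyz", "pp.~3--5")` builds a URI that `parse_ref_uri` splits back into the DOI and page 3. Locators like `Theorem~2` carry no page, so the sketch falls back to a URI for the whole paper; mapping theorem numbers to positions would need information from the cited document itself, and the versioning problem raised in the comment is not addressed here.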
