The Secret Order of the ArXiv

The astro/physics blogosphere is all atwitter about papers and the Nature embargo policy (see Julianne, “If a paper is submitted to Nature does it still make a sound”; the cat herder, “Hear a paper, see a paper, speak no paper”; and he of less than certain principles, “Unhealthy obsessions of academia.” He of uncertain principles loses the catchy title contest 🙂)

In this discussion, the uncertain principal brings up an interesting effect for arXiv postings:

There’s an obsession in science with the order of publication that I don’t think is really healthy, and I think it’s only gotten worse. At the Science21 meeting last fall, Paul Ginsparg talked about how there’s a huge spike in arxiv submissions just after 4pm, because the daily update email puts papers in the order in which they were submitted, starting at 4pm. He said they can see scripts hitting the server to check the time, and then dumping papers in just as soon as the clock has ticked over. Apparently, the position of a paper in that email has a fairly significant effect on the number of views and citations that paper receives in the future.

Now I myself have been known to try to exploit this effect, but what I don’t understand is why, given that the arXiv crew knows about it, they don’t fix it. I know it probably would be a bit of a hassle to rewrite the code, but really it shouldn’t be impossible to make the order of papers appearing in a day’s listing random.
Actually, come to think of it, it should be rather easy to fix this. Instead of ordering by date, one can just order by some hidden function of, say, the title of the paper, the time submitted, and the author list. Of course that would just mean that we could spend some time cracking the arXiv’s hiding function 🙂
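To make the idea concrete, here is a minimal sketch of what such a hidden ordering function might look like. This is purely illustrative and not how the arXiv actually works: the secret key, the field names (title, submitted, authors), the example papers, and the function names are all made up for the example. It uses a keyed hash (HMAC-SHA256), so the order is deterministic for the server but hard to predict for anyone who doesn't know the key.

```python
import hashlib
import hmac

# Hypothetical secret known only to the server (an assumption for this sketch).
# Changing it every day would reshuffle the ordering from one listing to the next.
SECRET_KEY = b"not-the-real-arxiv-secret"


def hidden_sort_key(paper, secret=SECRET_KEY):
    """Keyed hash of the title, submission time, and author list."""
    message = "\n".join(
        [paper["title"], paper["submitted"], ", ".join(paper["authors"])]
    )
    return hmac.new(secret, message.encode("utf-8"), hashlib.sha256).hexdigest()


def order_listing(papers, secret=SECRET_KEY):
    """Sort a day's papers by the hidden key instead of by submission time."""
    return sorted(papers, key=lambda p: hidden_sort_key(p, secret))


# Made-up example papers, listed here in submission order.
papers = [
    {"title": "A Paper Submitted at 4:00:01pm",
     "submitted": "2010-01-11T16:00:01", "authors": ["A. Author"]},
    {"title": "A Paper Submitted After Lunch",
     "submitted": "2010-01-11T13:12:45", "authors": ["B. Author", "C. Author"]},
    {"title": "A Paper Submitted at the Last Minute",
     "submitted": "2010-01-12T15:59:58", "authors": ["D. Author"]},
]

for paper in order_listing(papers):
    print(paper["title"])
```

Since the submission time is only one input to the hash, racing the clock at 4pm no longer buys you the top slot. Without the key, the only way to game the order would be to guess the secret.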
On a related note, I just submitted a new version of arXiview to Apple (which means it will appear in a few days’ time) which has some new features, including… ordering the search and posting results by submitted time/date.

3 Replies to “The Secret Order of the ArXiv”

  1. A simple salted hash of the title should provide an unbreakable deterministic, but unpredictable, ordering function.

  2. Wouldn’t it be simpler just to strip the date and time from the listings and then vary the exact time at which the email information was collated?
