Via the arXiv API Google group, I see that the arXiv has now made processed PDFs available for bulk download from Amazon’s Simple Storage System. I haven’t had a chance to play around with it, but according to this webpage, the cost is about 15 cents per gigabyte downloaded and the complete set of PDFs is about 200 gigabytes. Cool, all of physics (and some math and CS :)) for 30 bucks. (It would be nice to have the source as well as the PDFs, but this is a good change from their prior policy of zero bulk access to the entire corpus of PDFs.) Has anyone had a chance to play with this?
Doh! Fixing…
I use the full text search for lexical analysis. E.g. “naturality” turns out to be commonplace in mathematics, regular in physics, occasional in computer science, accidental in nonlinear sciences, and unknown in biology, statistics, and finance (here I’m using standard Audubon Society terminology to report sighting of this lexical bird).
Usage gradients like this drive trans-discipline lexical flow. And don’t get me started on adjectives modifying “reduction”.
Um…that’s “Simple Storage System”, dear Pontiff! Or was that an intentional falsehood?
Thirty bucks seems a fair price for twenty years of intellectual masturbation.
Just kidding! (sort of)