Science Code Manifesto

Recently, one of the students here at U. Sydney and I had the frustrating experience of trying to reproduce a numerical result from a paper, but it just wasn’t working. The code used by the authors was regrettably not made publicly available, so once we were fairly sure that our code was correct, we didn’t know how to resolve the discrepancy. Luckily, in our small community, I knew the authors personally and we were able to figure out why the results didn’t match up. But as code becomes a larger and larger part of scientific projects, these sorts of problems will increase in frequency and severity.
What can we do about it?
A team of very smart computer scientists has come together and written the Science Code Manifesto. It is short and sweet; the whole thing boils down to five simple principles of publishing code:

Code
All source code written specifically to process data for a published paper must be available to the reviewers and readers of the paper.
Copyright
The copyright ownership and license of any released source code must be clearly stated.
Citation
Researchers who use or adapt science source code in their research must credit the code’s creators in resulting publications.
Credit
Software contributions must be included in systems of scientific assessment, credit, and recognition.
Curation
Source code must remain available, linked to related materials, for the useful lifetime of the publication.

If you support this, and you want to help contribute to the solution, then please go and endorse the manifesto. Even more importantly, practice the five C’s the next time you publish a paper!

strike!

In a move that will undoubtedly bring the US Senate to its knees, the Quantum Pontiff is going dark from 8am to 8pm EST on Jan 18 to protest SOPA, PIPA, the Research Works Act and other proposed acts of censorship.
We suggest you use this time to contact your representatives, read a book (or 1201.3387), or go outside.

Could Elsevier shut down arxiv.org?


They haven’t yet, but they are supporting SOPA, a bill that attempts to roll back Web 2.0 by making it easy to shut down entire sites like Wikipedia and Craigslist if they contain any user-submitted infringing material. (Here is a hypothetical airline-oriented version of SOPA, with only a little hyperbole about planes in the air.)
I think that appealing to Elsevier’s love of open scientific discourse is misguided. Individual employees there might be civic-minded, but ultimately they have $10 billion worth of reasons not to let the internet drive the costs of scientific publishing down to zero. Fortunately, their business model relies on the help of governments and academics. We can do our part to stop them by not publishing in, or refereeing for, their journals (the link describes other unethical Elsevier practices). Of course, this is easy to say in physics, harder in computer science, and a lot harder in fields like medicine.
There is another concrete way to stand up for open access. The White House Office of Science and Technology Policy has requested comments on the question of public access to federally-funded scientific research. Comments should be from “non-Federal stakeholders, including the public, universities, nonprofit and for-profit publishers, libraries, federally funded and non-federally funded research scientists, and other organizations and institutions with a stake in long-term preservation and access to the results of federally funded research.” That’s us!
But don’t procrastinate. The deadline for comments is January 2.
Here is more information, with instructions on how to comment.
Here is also the official government Request For Information with more details.

Hippy Software Licenses

One of my favorite software licenses is the Beerware license, here in a version due to Poul-Henning Kamp:

/*
* --------------------------------------------------------------
* "THE BEER-WARE LICENSE" (Revision 42):
* <> wrote this file. As long as you retain
* this notice you can do whatever you want with this stuff.
* If we meet some day, and you think this stuff is worth it,
* you can buy me a beer in return Poul-Henning Kamp
* --------------------------------------------------------------
*/

Recently I came across a license of a form I’d never seen before, this one for one of the top graph isomorphism software programs, nauty:

Copyright (1984-2010) Brendan McKay. All rights reserved. Permission is hereby given for use and/or distribution with the exception of sale for profit or application with nontrivial military significance. You must not remove this copyright notice, and you must document any changes that you make to this program. This software is subject to this copyright only, irrespective of any copyright attached to any package of which this is a part.

Just as there are socially conscious mutual funds, it also appears that there are socially conscious software licenses! Who knew?

Watson

Exciting times for machine learning and artificial intelligence fans: IBM’s Watson computer versus two of the flesh-bound, starting tonight, Feb 14, 2011, on the game show Jeopardy. And not just any humans, I guess, but Ken Jennings and Brad Rutter (I suppose the latter is less known to non-Jeopardy fanatics). Too bad they aren’t competing against Seth Lloyd (does someone have a link to the game show Seth won?). Here is a nice Nova special on the machine and the challenge. I sure hope Arnold Schwarzenegger is in the audience so he can tell us whether Watson is a good Terminator or a bad Terminator.
Update: Here is an amusing curmudgeonly discussion of Watson by Noam Chomsky. Don’t worry, Noam, I’m sure Watson won’t feel sad about all your mean words 🙂

Steve Ballmer Talk at UW March 4, 2010

Today Microsoft CEO Steve Ballmer spoke at the University of Washington in the Microsoft Atrium of the Computer Science & Engineering department’s Paul Allen Center. As you can tell from that first sentence, UW and Microsoft have long had very tight connections. Indeed, perhaps the smartest thing the UW has ever done was that, when they caught two kids using their computers, they didn’t call the police but instead ended up giving them access to those computers. I like to think that all the benefit$ that UW has gotten from Microsoft are a great big karmic kickback for the enlightened sense of justice dished out by the UW.
Todd Bishop from Tech Flash provides good notes on what was in Ballmer’s talk. Ballmer was as I’ve heard: entertaining and loud. Our atrium is six stories high with walkways overlooking it which were all packed: “a hanging room only” crowd as it was called by Ballmer. The subject of his talk was “cloud computing” which makes about 25 percent of people roll their eyes, 25 percent get excited, and the remaining 50 percent look up in the sky and wonder where the computer is. His view was *ahem* the view of cloud computing from a high altitude: what it can be, could be, and should be. Microsoft, Ballmer claimed, has 70 percent of its 40K+ workforce somehow involved in the cloud and that number will reach 90 percent soon. This seems crazy high to me, but reading between the lines what it really said to me is that Microsoft has *ahem* inhaled the cloud and is pushing hard on the model of cloud computing.
But what I found most interesting was the contrast between Ballmer and Larry Ellison. If you haven’t seen Ellison’s rant on cloud computing here it is

Ellison belittles cloud computing, and rightly points out that in some sense cloud computing has been around for a long time. Ballmer, in his talk, says nearly the same thing. Paraphrasing, he said something like “you could call the original internet back in 1969 the cloud.” He also said something to the effect that the word “cloud” may only have a short lifespan as a word describing this new technology. But what I found interesting was that Ballmer, while acknowledging the limits of the idea of cloud computing, also argued for a much more expansive view of this model. Indeed, as opposed to Ellison, for whom server farms equal cloud computing, Ballmer essentially argues for a version of “cloud computing” which is far broader than any definition you’ll find on Wikipedia. What I love about this is that it is, in some ways, a great trick to create a brand out of cloud computing. Sure, tech wags everywhere have their view of what is and is not new in the recent round of excitement about cloud computing. But the public doesn’t have any idea what this means. Love them or hate them, Microsoft clearly is pushing to move the “cloud” into an idea that consumers, while not understanding one iota of how it works, want. Because everything Ballmer described, every technology they demoed, was “from the cloud”, Microsoft is pushing, essentially, a branding of the cloud. (Start snark. The scientist in you will, of course, revolt at such an idea, but fear not, fellow scientist: your lack of ability to live with imprecision and incompleteness is what keeps your little area of expertise safe and sound and completely firewalled from being exploited in the useful outside world. End snark.)
So, while Ellison berates, Ballmer brands. Personally I suspect Ballmer’s got a better approach…even if Larry’s got the bigger yacht. But it will be fun to watch the race, no matter what.

Help Rod With His Summer Reading

Rod Van Meter is in search of some summer reading:

I’m feeling the need to recharge my store of ideas, and I have the
nagging feeling that my lack of currency in a bunch of fields is
causing me to miss some connections I could use in my own research.
So, I’m looking for a reading list of, say, the one hundred most
important papers of the decade. It doesn’t have to be an even
hundred, but I’m looking for a good summer’s reading. (Given that
it’s mid-2009, now would be a good time to start composing such a list
anyway, depending on where you want to place the “decade” boundary.)
I want these papers to cover *ALL* fields of computer science and
engineering; I am by nature catholic in my reading.

Head over to his site and help out with your favorite gem of CS/engineering!

Original McEliece Cracked

Shor’s algorithm is an algorithm for quantum computers which allows for efficient factoring of numbers. This in turn allows Shor’s algorithm to break the RSA public key cryptosystem. Further variations on Shor’s algorithm break a plethora of other public key cryptosystems, including those based on elliptic curves. The McEliece cryptosystem is one of the few public key cryptosystems where variations on Shor’s algorithm do not break the cryptosystem. Thus it has been suggested that the McEliece cryptosystem might be a suitable cryptosystem in the “post quantum world”, i.e. for a world where a quantum computer is built (and if you’re a commenter who wishes to simply post that quantum computers are like string theory, please…save your keystrokes).
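For anyone wondering why fast factoring is enough to sink RSA, here is a minimal toy sketch. It is not Shor’s algorithm: plain classical trial division stands in for the quantum factoring step, the key is a made-up textbook-sized one built from the primes 61 and 53, and the function names are hypothetical ones chosen just for this illustration. The point is only that once n is factored, the private exponent falls out of the public key with a single modular inverse.

```python
# Toy illustration only: trial division is a classical stand-in for the
# factoring step that Shor's algorithm would perform efficiently, and the
# key below is far too small to be secure.

def factor_semiprime(n):
    """Return (p, q) with p * q == n, found by classical trial division."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p, n // p
        p += 1
    raise ValueError("n has no nontrivial factor")


def recover_private_exponent(n, e):
    """Rebuild the RSA private exponent d from the public key (n, e)."""
    p, q = factor_semiprime(n)
    phi = (p - 1) * (q - 1)
    return pow(e, -1, phi)  # modular inverse of e mod phi; needs Python 3.8+


# Hypothetical public key (assumption for this example): n = 61 * 53.
n, e = 3233, 17
message = 65
ciphertext = pow(message, e, n)  # anyone can encrypt with the public key

# The attacker sees only n, e and the ciphertext.
d = recover_private_exponent(n, e)
recovered = pow(ciphertext, d, n)
print(d, recovered)
```

With a real 2048-bit modulus the trial-division loop would never finish, which is exactly the gap a quantum computer running Shor’s algorithm would close.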

Graceful Error Handling?

Running a process to fix the UTF-8 support on scirate.com using the Unix “screen” command, I got the following crash:

Suddenly the Dungeon collapses!! - You die...

Doh!