[latexpage]

An interesting paper in PNAS from a few years back that I missed, “Strong profiling is not mathematically optimal for discovering rare malfeasors” by William H. Press.

Suppose you have a large population of people and a single bad guy who you want to catch. You could look through the entire population to find the bad guy, or you could set up checkpoints (err I mean airline security screening areas) to look for people, sampling only some small fraction of the population that goes through the checkpoint. Now, if you don’t know anything about the population you’re sampling, you might as well just sample them randomly until you find your baddie, since you don’t have any other information that could help you.

But suppose that you are able to come up with some prior probabilities for different people to be the bad guy. I.e. you’ve developed a model for what makes someone more likely to be a bad guy, and further assume that this model is really pretty accurate. To each person you can assign a probability $p_i$ that the person is indeed the bad guy. Now you could continue to sample uniformly, but you have this great model that you want to use to help catch the bad guy. What do you do?

It turns out that the **wrong** thing to do is to sample in proportion to the probability $p_i$. To figure out the correct strategy, suppose that you sample from the population with probability $q_i$. Then if $k$ is your man, the probability that you get him in one sample is $q_k$. Or, another way to say it is that the mean number of screenings you’ll need to find the baddie is $1/q_k$. Assuming your model is correct, the mean number of people you will have to sample is $sum_i p_i/q_i$. So now to calculate the optimum we need to minimize this expression for $q_i$ subject to the constraint that $sum_i q_i =1$.

To calculate this optimum, you use a Lagrange multiplier

$${partial over partial q_j} left[ sum_i {p_i over q_i} + lambda (sum_i q_i -1)right]=0$$

or

$$ – {p_j over q_j^2} + lambda = 0$$

Which, in order to satisfy our contraints (and also positive probabilities) gives us the answer for the optimum of

$$ q_j = { sqrt{p_j} over sum_i sqrt{p_i}}$$

Or, in other words, you should sample proportional to the square root of the probabilities. Pretty cool, a nice easy, yet surprising answer.

Even more awesome is that we got some square roots of probabilities in there. Quantum probability amplitudes are, of course, like square roots of probabilities. Now if only we could massage this into insight into quantum theory. Do it. Or the terrorist win.