Peter Chng

How accurate is your test?

Suppose we have a test that can detect whether someone has some disease or not, and that this test has the following performance:

  • If you have the disease, there is a 99% chance the test will return positive.
    This is known as the sensitivity or true positive rate (TPR) of the test.
  • If you do not have the disease, there is a 99% chance the test will return negative.
    This is known as the specificity or true negative rate (TNR) of the test.

Let’s say we go out and use this test on a number of people, and get back a number of positive results. What fraction of the positive results will be false positives?
(A false positive is where the test returned positive but the person did not actually have the disease.)

  • A: 0%
  • B: 1%
  • C: There is not enough information to answer this question.

Even if you don’t know the answer to this question, you probably guessed (C), since otherwise I would not have anything else left to write.

(C) is indeed the answer: The fraction of false positives out of all positives of the test depends on the underlying prevalence (or base rate) of the disease in the population we used the test on.

99% might not be good enough

A test that has 99% sensitivity (true positive rate) and 99% specificity (true negative rate) does not mean that 99% of all positives will be true positives. This seems paradoxical, but is easily illustrated with this example:

  • We run the test on 10,000 people, but none of them actually have the disease.
  • Because there is a 99% chance the test will return negative if the person doesn’t have the disease, there is a 1% chance it will return positive for such a person. In other words, the false positive rate (FPR) is 1%.

From this, the number of false positives (FP) is:

$$ \text{FP} = (\text{Number of negatives} \times \text{FPR}) = 10000 \times 0.01 = 100 $$

The fraction of all positives that are false positives is called the false discovery rate (FDR), and it’s defined as follows:

$$ \text{FDR} = {\text{FP}\over{\text{FP + TP}}} $$

  • FP: Number of false positives
  • TP: Number of true positives

Since there are no true positives (because we said no one actually had the disease), the FDR reduces to just 1.0, meaning all positives are false positives. This means we have a false discovery rate of 100% from a test with 99% sensitivity and 99% specificity!
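
To make the arithmetic concrete, here is a minimal sketch in Python that computes the expected counts and the FDR for the example above (the variable names are my own, chosen for illustration):

```python
# Expected outcomes for the example above: 10,000 people tested,
# none of whom actually have the disease.
num_tested = 10_000
num_with_disease = 0

sensitivity = 0.99  # P(test positive | has disease)
specificity = 0.99  # P(test negative | does not have disease)

num_without_disease = num_tested - num_with_disease

tp = num_with_disease * sensitivity           # true positives:  0 * 0.99 = 0
fp = num_without_disease * (1 - specificity)  # false positives: 10000 * 0.01 = 100

fdr = fp / (fp + tp)  # 100 / (100 + 0) = 1.0
print(f"FP = {fp:.0f}, TP = {tp:.0f}, FDR = {fdr:.0%}")  # FP = 100, TP = 0, FDR = 100%
```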

This is an example of the base rate fallacy. Here, it means that the false discovery rate depends on the actual rate of the disease (the base rate) in the population we are testing.

Prior information matters

If the base rate is too low relative to the false positive rate of the test, the number of false positives can quickly overwhelm the number of true positives, causing the false discovery rate to increase. When the false discovery rate increases, the informative value of a positive test diminishes. This means that getting back a “positive” test result doesn’t tell you much: you might have the disease, or you might not.

Without an estimation of the base rate, you cannot estimate the false discovery rate. Thus, to understand how informative a test is, we need to estimate not only the test’s accuracy but also the base rate of the disease among the population being tested.

We can think of the base rate as prior information about how prevalent the disease is among the population. In this way, we can formulate the calculation of the false discovery rate using Bayes' rule. Recall that the false discovery rate is the fraction of people who test positive who are actually negative, i.e. do not have the disease. This can be thought of as a conditional probability of the form:

$$ P(\text{negative|test positive}) $$

This reads as “The conditional probability of being negative given a positive test”. Using Bayes' rule, we can calculate this value as follows:

$$ P(\text{negative|test positive}) = {P(\text{test positive|negative}) P(\text{negative})\over{P(\text{test positive})}} $$

We can determine the three terms on the right hand side as follows:

\(P(\text{test positive|negative})\) is the probability of a person testing positive given they are actually negative. This is the false positive rate, or \(1 - \text{Specificity}\).

\(P(\text{negative})\) is simply the proportion of individuals tested who actually do not have the disease. Usually this is an estimate based on some prior information.

\(P(\text{test positive})\) is the probability of testing positive. It is the weighted sum of positives from those actually positive, and the positives from those actually negative:

  • \(P(\text{test positive|positive})P(\text{positive})\): This is the fraction of positive tests from those actually positive.
  • \(P(\text{test positive|negative})P(\text{negative})\): This is the fraction of positive tests from those actually negative.

So:

$$ P(\text{test positive}) = P(\text{test positive|positive}) P(\text{positive}) + P(\text{test positive|negative}) P(\text{negative}) $$

Putting it all together, we have:

$$ P(\text{negative|test positive}) = {P(\text{test positive|negative}) P(\text{negative})\over{P(\text{test positive|positive}) P(\text{positive}) + P(\text{test positive|negative}) P(\text{negative})}} $$

Looking at this equation, you’ll notice that if there are no actual positives, i.e. \(P(\text{positive}) = 0\), then the first term in the denominator drops out, and the right-hand side reduces to \(1\), meaning the probability of a positive test being a false positive is 100%. This is consistent with our earlier calculation using actual counts instead of probabilities.

Writing out the equation this way also makes it clear that the false discovery rate depends on the base rate, which enters through the terms \(P(\text{positive})\) and \(P(\text{negative})\). Since there are only two conditions, each one is the complement of the other, i.e. \(P(\text{positive}) = 1 - P(\text{negative})\), so we really only need to know one of them. Usually this is \(P(\text{positive})\), which in this example is the base rate of the disease among the population we will test.
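
The whole calculation fits in a few lines of Python. Here’s a minimal sketch (the function name `false_discovery_rate` and its signature are my own, for illustration):

```python
def false_discovery_rate(sensitivity: float, specificity: float, base_rate: float) -> float:
    """P(negative | test positive), computed via Bayes' rule.

    base_rate is P(positive): the prevalence of the disease
    in the population being tested.
    """
    p_positive = base_rate
    p_negative = 1 - base_rate
    fpr = 1 - specificity  # P(test positive | negative)

    # Law of total probability: P(test positive)
    p_test_positive = sensitivity * p_positive + fpr * p_negative
    return (fpr * p_negative) / p_test_positive


print(false_discovery_rate(0.99, 0.99, 0.0))   # 1.0: every positive is a false positive
print(false_discovery_rate(0.99, 0.99, 0.01))  # ~0.5: half of all positives are false positives
```

The first call reproduces our earlier worked example, where no one tested actually has the disease.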

How the base rate affects the FDR

We saw that when the base rate is 0%, the false discovery rate (FDR) or the fraction of positives that are false positives will be 100%. Conversely, if the base rate is 100%, then the FDR will be 0%, and all positives will be true positives. This is because if everyone that we test has the disease, then all positive results will be true positives and there simply cannot be any false positives. So the FDR reduces to 0:

$$ \text{FDR} = {\text{FP}\over{\text{FP + TP}}} = {0\over{\text{TP}}} = 0 $$

Intuitively then, the FDR should vary between 1 and 0 as the base rate varies from 0 to 1. Here’s a graph of what that looks like for our test with 99% sensitivity and 99% specificity:

FDR for a 99% sensitive, 99% specific test as a function of the base rate

You can see that the FDR or percentage of false positives out of all positives stays below 10% as long as the base rate (x-axis) of the disease is > 10%. Once the base rate drops below 10%, the FDR starts to increase quickly.
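
If you want to reproduce this curve yourself, here’s a sketch using numpy and matplotlib (assuming both are installed); it just evaluates the Bayes’ rule expression derived above over a sweep of base rates:

```python
import numpy as np
import matplotlib.pyplot as plt

sensitivity, specificity = 0.99, 0.99
fpr = 1 - specificity  # false positive rate

# Sweep the base rate P(positive) from 0 to 1.
base_rate = np.linspace(0, 1, 500)

# FDR = P(negative | test positive), via the Bayes' rule expression above.
fdr = (fpr * (1 - base_rate)) / (sensitivity * base_rate + fpr * (1 - base_rate))

plt.plot(base_rate, fdr)
plt.xlabel("Base rate, P(positive)")
plt.ylabel("False discovery rate (FDR)")
plt.title("FDR for a 99% sensitive, 99% specific test")
plt.show()
```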

Another way to think about the FDR is to look at its complement, which is called the Precision or the Positive Predictive Value (PPV) of a test. This is the fraction of all positives that are true positives, and it is \(1 - \text{FDR}\), or:

$$ \text{PPV} = {\text{TP}\over{\text{FP + TP}}} = 1 - \text{FDR} $$

The precision tells you how likely it is you actually have the disease if you test positive. The alternate name, Positive Predictive Value, tells us this is a measure of how useful (or predictive) a positive test result is. If the PPV is too low, this means a positive test result doesn’t tell you much. Let’s take a look at a graph of the precision vs. the base rate:

Precision for a 99% sensitive, 99% specific test as a function of the base rate

This is just the complement of the first graph of the FDR. But again, we can see that the precision drops off quickly as the base rate falls below 10%.
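
For a concrete point on these curves, take a base rate of 1%. Plugging into the formulas above:

$$ \text{PPV} = {0.99 \times 0.01\over{0.99 \times 0.01 + 0.01 \times 0.99}} = {0.0099\over{0.0198}} = 0.5 $$

So at a 1% base rate, only half of all positive results are true positives, even with a test that is 99% sensitive and 99% specific.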

Practical Implications for Testing

Because the FDR (or its complement, the precision/PPV) can vary so much depending on the base rate, this means we cannot ignore the base rate when administering a test to a population. If the base rate is too low, the number of false positives can quickly overwhelm the true positives and cause the FDR to quickly rise (or equivalently, the precision to plummet), decreasing the informative value of a positive test result.

Thus, before a test is administered to a population, one should have an estimate of the base rate. This can be problematic, since you are often administering the test in order to discover how prevalent some condition is, and you may not have a good idea of the actual base rate.

This is why medical tests that look for serious disease are typically only done when the patient actually displays symptoms. For example, an EKG can be done to check for indications of serious heart disease, but because there is a non-zero false positive rate (i.e. the specificity is not 100%), using an EKG to screen the general population can produce too many false positives. This would be a waste of resources and upsetting to those who got a false positive.

Instead, an EKG is typically only done when a patient displays symptoms, such as chest pain or discomfort. The presence of these symptoms in the patients tested increases the base rate. In other words, you’re more likely to have heart disease if you experience these symptoms, and thus running an EKG will provide an informative test. This seems like common sense, but we’ve shown above with some simple calculations why this is the case.

Additionally, multiple different tests can be done in order to confirm previous test results. This can also help reduce the number of false positives, especially since in practice you are never sure of the exact sensitivity and specificity, but instead have estimates of them with associated confidence intervals.
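
To see how repeated testing helps, suppose the base rate is 1%, so that (as computed above) a first positive result means only a 50% chance of actually having the disease. If we assume a second test of the same accuracy is independent of the first (an idealization), that 50% becomes the effective base rate for the second test:

$$ P(\text{positive|second test positive}) = {0.99 \times 0.5\over{0.99 \times 0.5 + 0.01 \times 0.5}} = 0.99 $$

A second positive result raises the probability of actually having the disease from 50% to 99%.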

Aside: FDR vs. conditional probabilities

As a side note, I’ve used the terms false discovery rate (FDR) and \(P(\text{negative|test positive})\) somewhat interchangeably. Strictly speaking, they do not necessarily refer to the same thing.

Recall that the false discovery rate is defined as follows:

$$ \text{FDR} = {\text{FP}\over{\text{FP + TP}}} $$

This is a purely frequentist definition that relies on the actual long-run counts (or frequencies) of true positives (TP) and false positives (FP) produced by the test.

By contrast, \(P(\text{negative|test positive})\) is a conditional probability. Whether this is the same thing as the FDR depends upon your interpretation of probability.

If we define probability as the long-run frequency of an event (i.e. how many times it occurs over many trials), then the above conditional probability is equivalent to the FDR. However, if we think of probability as the chance of a given outcome for a single trial, then the conditional probability and the FDR are not the same.

As an example, assume that \(P(\text{negative|test positive}) = 0.05\). Suppose you test positive and want to answer the question, “Do I really have the disease or not?” Can you say that there is only a 5% chance that you don’t have the disease? (And a 95% chance that you actually do have it?)

  • A strict frequentist interpretation would say that you either have the disease, or you don’t. It’s meaningless to speak of “probability” here, because a frequentist probability is only defined over the long-run, i.e. over many trials. Since the test has only been done once on you, speaking of the FDR as it applies to your single test is meaningless. The FDR only applies in the long-run, over many tests.
  • A Bayesian-like interpretation of the probability would say yes: We can assign probabilities to individual events, and these probabilities represent our degree of belief in an event. This is probably closer to how a layperson would interpret probability.

Thus, it’s probably best not to conflate these two terms.