SNOUT & SPIN
Almost no one is good at evaluating medical tests. Not politicians. Not medical journalists. Not the general public. And often, not even doctors.
That’s one lesson I’ve been reminded of during the last few weeks of this Covid19 pandemic - information about testing is confusing. There are diagnostic tests for the virus (the nasopharyngeal swab tests where we look for the antigen and can declare someone infected). There are newer blood tests looking for our body’s reaction to the virus (in the form of specific IgG antibodies). These blood tests can tell epidemiologists how many of us in the population have been previously exposed and have developed an immune response. It's less clear what these tests mean for an individual; some evidence suggests a prior infection may not protect you (give you immunity) from a second infection.
Pathologists are the ones to interpret test data and they typically are cautious about implementing bad tests - bad information can be worse than none at all. If after a medical test you have no more information than you had to start out with, that was a waste of time and resources. (As a surgeon once told me: never order a test when you don’t care what it shows). So let’s avoid the bad information. As you sift through the reports and literature, I thought a brief primer about test statistics might come in handy.
I’ll divide this up into two broad categories:
Is it a good test? (most people stop here)
How does it work for patients?
Is it a good test?
Before we start with any test, we presume a reference standard - someone, somewhere authoritatively has the disease or not (this used to be called the gold standard, although ironically even that term has been abandoned). All stats about the test refer back to that reference standard.
Assuming a yes or no answer, medical tests take raw materials and give us results: Positive or Negative. But not all positive test results really are positive and not all negative test results really are negative. We call these mistakes false positives and false negatives; they occur with every test.
What we all want for any test is for someone with the disease to reliably get a positive result (true positive), and for someone without the disease to get a negative result (true negative). We express this using sensitivity and specificity:
Sensitivity. How good is the test at detecting the presence of disease?
Specificity. How good is the test at detecting the absence of disease?
In general a highly sensitive test means that a negative result is truly negative.
And conversely, a highly specific test means that a positive result is really positive.
These truisms have been expressed by medical students in a helpful mnemonic:
SNOUT (Sensitive, Negative, Rule Out)
SPIN (Specific, Positive, Rule In).
All of my words about how to determine if it is a good test can be expressed in a familiar diagram.
If all you know is sensitivity and specificity, you understand the test. But you are missing the most important question: the patient.
How does the test work for patients?
On the right hand side of the table is what actually matters for patient care. This is the question physicians face when dealing with a test result: “A patient has a positive result; how likely is this patient to actually have the disease?” In other words, what is the probability that someone with a positive screening test result does indeed have the condition? What is the probability that someone with a negative result really is ruled out? These are the post-test probabilities, expressed by the positive and negative predictive values (PPV and NPV).
If you prefer math to express ideas, here are the formulas from the table.
(Doctors: can you challenge yourself to write these out after seeing the table?)
Using the standard 2x2 table cells (a = true positives, b = false positives, c = false negatives, d = true negatives):
Sensitivity = [a/(a+c)] x 100
Specificity = [d/(b+d)] x 100
Positive predictive value = [a/(a+b)] x 100
Negative predictive value = [d/(c+d)] x 100
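The four formulas above can be sketched as small functions. This is a minimal sketch in Python; the counts at the bottom are hypothetical numbers chosen for illustration, not data from any real test:

```python
# The standard 2x2 table cells: a = true positives, b = false positives,
# c = false negatives, d = true negatives.

def sensitivity(a, c):
    """Of everyone WITH the disease (a + c), what fraction tests positive?"""
    return a / (a + c)

def specificity(b, d):
    """Of everyone WITHOUT the disease (b + d), what fraction tests negative?"""
    return d / (b + d)

def ppv(a, b):
    """Of all positive results (a + b), what fraction truly has the disease?"""
    return a / (a + b)

def npv(c, d):
    """Of all negative results (c + d), what fraction is truly disease-free?"""
    return d / (c + d)

# Hypothetical counts: 90 true positives, 40 false positives,
# 10 false negatives, 860 true negatives.
a, b, c, d = 90, 40, 10, 860
print(f"Sensitivity: {sensitivity(a, c):.0%}")  # 90%
print(f"Specificity: {specificity(b, d):.0%}")  # 96%
print(f"PPV:         {ppv(a, b):.1%}")
print(f"NPV:         {npv(c, d):.1%}")
```

Note that even with 90% sensitivity and roughly 96% specificity, the PPV here is only about 69%: nearly a third of the positive results in this hypothetical table are false alarms.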
The missing ingredient from these questions about the test itself is the prevalence of the disease: the proportion of the population that has the disease*. When we look at the results of a test, we need to know: how likely is it that someone with the disease will be encountered in the specified population? A test worth doing is one that significantly increases the difference between the pre-test and post-test probabilities for the condition being tested**.
My shorthand for pre-test probability is ‘Garbage in, Garbage out’; the results from tests you order are only as good as the indications for testing. There are many statistical ways to evaluate disease prevalence, beyond the scope of this post and beyond how I evaluate test information.*** But for an example of how this works, let's look at a well-known study**** that examined physicians’ understanding of clinical laboratory data. Researchers asked a set of physicians in 1978, and again in 2014, to interpret this question about a laboratory test:
“If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5%, what is the chance that a person found to have a positive result actually has the disease (PPV), assuming you know nothing about the person's symptoms or signs?”
In both the 1978 and 2014 studies, approximately 75% of physicians answered the question incorrectly as 95%. Even physicians who knew they needed to calculate the Positive Predictive Value (who knew what the PPV was) could not incorporate prevalence into their calculations.
A rough and ugly math approach shows why they were wrong:
A false positive rate of 5% means that out of 1000 people tested, about 50 will get false positive results.
The disease presence in the population (the prevalence) is only 1 person in 1000 (1:1000), so only about 1 of those 1000 people will have a true positive test.
That gives 51 positive tests in total, and only one of them is a true positive.
Thus the PPV is roughly 1/51, or approximately 2%, nowhere near 95%.
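The rough math above can be written out explicitly. This sketch makes the same simplifying assumption the question's standard answer does, namely a perfectly sensitive test:

```python
# Reproducing the 1/1000-prevalence, 5%-false-positive-rate question.
# Assumes perfect sensitivity: the one diseased person always tests positive.

population = 1000
prevalence = 1 / 1000        # 1 person in 1000 has the disease
false_positive_rate = 0.05   # 5% of disease-free people test positive

true_positives = population * prevalence                                # 1
false_positives = (population - true_positives) * false_positive_rate  # ~50

ppv = true_positives / (true_positives + false_positives)
print(f"Positive tests: {true_positives + false_positives:.0f}")  # 51
print(f"PPV: {ppv:.1%}")  # 2.0%, not 95%
```

Changing `prevalence` to something like 0.3 in this sketch shows why indications matter: the same test applied to a high pre-test-probability population produces a dramatically better PPV.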
There is a lot of information coming at us about testing for Covid19 - tests that work, tests that don’t work, tests that promise to open up society, and tests that claim we’ll be declared free or isolated based on results. When you read about the promises of these tests, remember: it's not all about the performance of the test itself. Any test result is dependent on the characteristics of the population being tested.
People routinely mix up Incidence and Prevalence when talking about Covid19. But they are different: Incidence is the new additions to the reservoir; Prevalence is the total in the reservoir; and cures and deaths drain the reservoir.
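The reservoir analogy can be sketched in a few lines; every number here is hypothetical:

```python
# Toy illustration of the reservoir analogy: incidence flows in,
# recoveries and deaths flow out, prevalence is what remains in the pool.

cases = 5000                   # current cases: the reservoir
new_cases_today = 300          # incidence: additions to the reservoir
resolved_today = 450           # recoveries + deaths: outflow

cases = cases + new_cases_today - resolved_today
population = 1_000_000
prevalence = cases / population

print(f"Cases: {cases}")
print(f"Prevalence: {prevalence:.3%}")
```

The point of the distinction: a place can have falling incidence (fewer new cases each day) while prevalence stays high, and test interpretation depends on the prevalence, not the incidence.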
An example is the fecal occult blood test for colon cancer. A sample patient might have a 3% (0.03) pre-test probability of having colon cancer, based on age, family history, and medical history. After a positive fecal occult blood test, that probability might increase to 18.6% (0.186). This difference (3% → 18.6%) is large enough to justify testing.
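The update from pre-test to post-test probability is just Bayes' theorem. This sketch uses the 3% pre-test probability above, with a hypothetical 50% sensitivity and 93% specificity chosen purely to illustrate the calculation (they are not the measured performance of any specific test), and lands close to the 18.6% figure cited:

```python
# Bayes' theorem: probability of disease given a positive result.
# Sensitivity and specificity values below are hypothetical illustrations.

def post_test_probability(pretest, sensitivity, specificity):
    """Among people who test positive, what fraction has the disease?"""
    true_pos = pretest * sensitivity
    false_pos = (1 - pretest) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

pretest = 0.03  # 3% pre-test probability of colon cancer
post = post_test_probability(pretest, sensitivity=0.50, specificity=0.93)
print(f"Post-test probability after a positive result: {post:.1%}")  # ~18%
```

Run the same function with `pretest = 0.001` and the post-test probability collapses to well under 1%: the identical test result means something entirely different in a low-prevalence population.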
I am not an epidemiologist but I am doing more reading about pre-test probability. When I interpret a biopsy, the physician assumed a certain pretest probability; presumably high enough to warrant the biopsy. But sometimes their reasons for performing the test aren’t worthwhile, and this affects my results.
Original study from 1978:
2014 Update, full article: