Friday, December 23, 2016. Written by Ellen Jo Baron, Ph.D., D(ABMM), Prof. Emerita, Stanford University; Director of Medical Affairs, Cepheid

Dealing with the Complexity of a Living Target

CONCEPTS OF SENSITIVITY AND CONFIDENCE INTERVALS OF A NEW TEST RELATED TO COMPARATIVE (REFERENCE) ASSAYS

Example: Clostridium difficile tests

In their interpretation of laboratory test results, potentially risky practice patterns have been observed among some clinicians. If a laboratory test yields a positive result, clinicians tend to believe it and make treatment decisions based on that result. However, if the test result is negative but does not match their initial clinical impression, clinicians tend to discount the test result and continue their empiric therapy regimen or repeat the test to “prove” their initial diagnosis. Microbiologists often observe this phenomenon by examining physician ordering behavior with laboratory tests for Clostridium difficile.

Many sites report large numbers of repeat tests ordered, particularly if the initial test was an enzyme immunoassay (EIA); EIAs are now reported to be less than 50% accurate compared with toxigenic culture with broth enrichment, recognized today as the gold standard.1-4 In fact, an article by Peterson and Robicsek contains an excellent statistical model of the declining positive predictive value (PPV) of a test with 73% sensitivity in a population with a disease prevalence of 10%. From 75% the first time the test is performed, the PPV quickly drops to 45% if a second test is performed on another sample from a patient whose first test was negative. As a real-world example, a report published last year described a pseudo-outbreak of C. difficile infection (CDI) in which repeated testing with a test of reduced specificity was implicated in producing false-positive results.5 Some commonly used EIA tests for C. difficile have sensitivities even lower than 73%, as shown in a recent comparison study.6 Thus, an important role for microbiologists and all laboratory scientists who perform tests and report results to caregivers is to educate physicians about the pretest probabilities, the predictive values, and the sensitivities and specificities of the tests that they are using, particularly in their own unique patient populations.
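
The repeat-testing effect described above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the sensitivity (73%) and prevalence (10%) come from the text, but the specificity of 97.3% is an assumed value, chosen here because it gives a first-test PPV of about 75%, and the two tests are assumed to be independent. The function names are hypothetical.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value from Bayes' rule."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


def prevalence_after_negative(prevalence, sensitivity, specificity):
    """Probability of disease among patients whose test was negative."""
    false_neg = prevalence * (1.0 - sensitivity)
    true_neg = (1.0 - prevalence) * specificity
    return false_neg / (false_neg + true_neg)


SENSITIVITY = 0.73   # from the text
PREVALENCE = 0.10    # from the text
SPECIFICITY = 0.973  # assumption: not stated in the text

first_ppv = ppv(PREVALENCE, SENSITIVITY, SPECIFICITY)
# Patients who are retested are those whose first result was negative,
# so the pretest probability for the second test is much lower.
retest_prevalence = prevalence_after_negative(PREVALENCE, SENSITIVITY, SPECIFICITY)
second_ppv = ppv(retest_prevalence, SENSITIVITY, SPECIFICITY)

print(f"PPV of first test: {first_ppv:.0%}")
print(f"Prevalence among first-test negatives: {retest_prevalence:.1%}")
print(f"PPV of repeat test: {second_ppv:.0%}")
```

Under these assumptions the PPV comes out at roughly 75% for the first test and roughly 45% for the repeat test, in line with the figures quoted above. The drop is driven almost entirely by the lower pretest probability in patients who have already tested negative once.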

The sites chosen for assay comparisons, the numbers and types of samples included, and the choice of the reference assay all influence the performance figures obtained in comparative trials. A new diagnostic assay must be cleared (registered) by the U.S. Food and Drug Administration (FDA) Office of In-Vitro Diagnostics before it can be sold in the United States. In the past, some manufacturers gathering data on a newly developed assay for submission to the FDA chose to generate the best possible results for inclusion in the package insert by comparing their assay with a conventional assay that had good, but not the highest possible, sensitivity. Cepheid, on the other hand, chooses its reference assay based on the advice of prominent experts in the field. Cepheid then always seeks concurrence with the FDA on its study design, including the comparator assay, before clinical trials begin. This is called the pre-IDE (investigational device exemption) process. Through this process, the FDA tries to maintain a level playing field when possible by suggesting that all submitting manufacturers use the same comparative assays. In some cases, however, new information relevant to the science of the assay leads the FDA and the manufacturer to upgrade reference methods for clinical trials of new devices.

Because molecular tests are often found to be more sensitive than previously used cultures or phenotypic characteristic-based assays (previously considered to be “gold standards” but now often referred to as “tarnished gold standards”), changes in clinical trial design are expected to occur more frequently as assays using new technologies proliferate. This happened with molecular tests for C. difficile: laboratories that evaluated an FDA-cleared assay in the real-world setting after its release obtained results that differed from those presented in the package insert, always in the direction of lower sensitivity. For example, the data presented in the package insert for the BD GeneOhm™ Cdif PCR assay used cell culture cytotoxin neutralization, the test considered to be the “gold standard” for diagnosis of CDI until the last few years, as the standard for comparison. The product insert states overall sensitivity (percent positive agreement) to be 93.8% with a 95% confidence interval of 86.2%–98.0%. In practical terms, a laboratory testing unformed stools from a patient population similar to that of the validation studies could reasonably expect the assay to give a positive result for a truly positive CDI patient anywhere from 86.2% to 98.0% of the time. In any given group of patients with diarrhea, a laboratory’s sensitivity (ability to detect all true positive patients) versus cell culture cytotoxin neutralization could be as low as 86% (i.e., the test will be falsely negative for 14% of patients with CDI) or as high as 98%.

When Stamper and colleagues from Johns Hopkins University Laboratory performed a comparison of this commercial assay against cell culture cytotoxin neutralization in their laboratory, they reported the sensitivity of the assay to be 90.9% with a 95% confidence interval of 82.4%–99.4%, very similar to the data shown in the package insert.4 However, when they compared the commercial PCR assay against toxigenic culture (the current “gold standard”), they reported sensitivity of 83.6% with a 95% confidence interval of 74.3%–92.9%. These percent positive agreements differ from those stated in the package insert. Results from another FDA-cleared commercial nucleic acid amplification C. difficile assay, Prodesse proGASTRO,™ were also compared to the results of cell culture cytotoxin neutralization assays. The Prodesse package insert reports sensitivity of 91.7% with a 95% confidence interval of 83.0%–96.1%. The Johns Hopkins group also evaluated this assay versus cell culture cytotoxin and reported a slightly lower sensitivity of 83.3% (95% CI of 70.0% – 96.7%), but when compared against toxigenic culture with broth enrichment, the sensitivity of the Prodesse assay dropped to 77.3% (95% CI of 64.9%–90.0%).7

By comparison, in their clinical trials, the results of the Cepheid Xpert® C. difficile assay were compared to the results of toxigenic culture with broth enrichment.8 The package insert reports that the assay showed sensitivity of 93.5% (95% CI of 90.3% – 95.9%) versus the toxigenic reference culture. In studies conducted outside of the FDA clinical trial, Novak-Weekley and others reported that the Xpert C. difficile assay showed 94.4% sensitivity when compared to enrichment toxigenic culture.9 These data are very similar to the clinical trial data included in the package insert. Having results of an independent study showing >90% positive agreement with a recognized gold standard suggests that the assay is robust and reproducible. Additional characteristics of the test performance, such as rapid turnaround time or a moderate complexity rating, make the test even more attractive to clinical microbiologists. Using a rigorous “gold standard” is critical to avoiding the sorts of surprises seen lately, such as the lack of reliability of the C. difficile EIA tests and, more recently, Influenza A EIA tests.10
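
One simple way to read the comparisons above is to ask whether each independent sensitivity estimate falls inside the 95% confidence interval printed in the corresponding package insert. The sketch below does only that, using the figures quoted in the text. The data layout and function name are hypothetical, and the check is a rough reading aid, not a formal statistical test: the studies used different sample sets, and in two cases a different reference method than the insert.

```python
# Figures as quoted in the text (percent). "insert_ci" is the package insert's
# 95% confidence interval; "independent" is the sensitivity reported by an
# independent study against toxigenic culture.
ASSAYS = {
    "BD GeneOhm Cdif": {"insert_ci": (86.2, 98.0), "independent": 83.6},
    "Prodesse proGASTRO": {"insert_ci": (83.0, 96.1), "independent": 77.3},
    "Cepheid Xpert C. difficile": {"insert_ci": (90.3, 95.9), "independent": 94.4},
}


def within_interval(value, interval):
    """Return True if value lies inside the closed interval (low, high)."""
    low, high = interval
    return low <= value <= high


results = {
    name: within_interval(data["independent"], data["insert_ci"])
    for name, data in ASSAYS.items()
}

for name, inside in results.items():
    verdict = "within" if inside else "outside"
    print(f"{name}: independent estimate is {verdict} the insert's 95% CI")
```

Only the assay whose insert was built on toxigenic culture has an independent estimate that lands inside its published interval, which is the point the comparison is making about the choice of reference method.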

NO TEST IS PERFECT

Issues regarding the GeneXpert® MRSA/SA Blood Culture assay

The concept of “confidence intervals” is important with respect to recently recognized problems with the detection of some strains of staphylococci in the Xpert® MRSA/SA Blood Culture (BC) test. The package insert for this assay is based on clinical trials conducted on 249 growth-positive blood cultures that showed gram-positive cocci in clusters on the initial Gram stain. It states that the test had a sensitivity of 100% for methicillin-resistant Staphylococcus aureus (MRSA) with a 95% confidence interval of 93.3%–100%. The lower limit of the 95% confidence interval is based on statistical analysis of the sample set, which included 53 cultures that grew MRSA. If more samples had been tested with perfect agreement to culture, as the first 53 were, then the lower limit of the confidence interval would have been higher than 93.3%.
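
The lower limit quoted for a perfect result can be checked by hand. Assuming the package insert used an exact (Clopper-Pearson) two-sided 95% interval, which the text does not state, the lower limit for n positives out of n tested reduces to (α/2)^(1/n) with α = 0.05. The sketch below, with a hypothetical function name, shows how that limit rises as more samples agree perfectly.

```python
def exact_lower_limit_all_positive(n, alpha=0.05):
    """Lower limit of the two-sided exact (Clopper-Pearson) interval
    when all n truly positive samples test positive.

    For x = n successes the limit p solves p**n = alpha / 2.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return (alpha / 2.0) ** (1.0 / n)


# 53 and 77 are the sample counts given in the text for MRSA and MSSA;
# 200 and 500 are hypothetical and only show the trend.
for n in (53, 77, 200, 500):
    lower = exact_lower_limit_all_positive(n)
    print(f"{n:>4} of {n} detected: 95% CI lower limit = {lower:.1%}")
```

For n = 53 this gives about 93.3%, and for n = 77 about 95.3%, matching the package insert values quoted in this section. The interval narrows only because more truly positive samples were tested.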

However, the chances that the test would continue to perform perfectly would of course decrease as more clinical samples were tested. Another commercial product developed for the same purpose also failed to detect certain strains of staphylococci once use of the test expanded into additional areas of the world.11 In this case, additional variants of one of the target gene sequences that had not been included in the test design explained some of the errors, but at least one failure to detect an isolate of MSSA was not explained.

For methicillin-susceptible S. aureus (MSSA) isolates, the GeneXpert clinical trial data yielded a sensitivity of 100% for 77 positive samples, and here the 95% confidence interval was tighter (95.3%–100%) because there were more positive samples. However, again, no knowledgeable microbiologist would expect the test to continue to yield 100% positive agreement as more and more samples were tested. In some instances, the specific genetic sequences of the target genes for this assay, although chosen to be highly conserved, are altered by random mutations, insertions, or deletions. S. aureus undergoes continuous selection to maximize its infectious potential.
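
To see why continued perfection becomes unlikely, suppose the true sensitivity of an assay is slightly below 100%. If each truly positive sample is detected independently with probability s, the chance that all n are detected is s^n. The sketch below uses a hypothetical true sensitivity of 99%, a value chosen for illustration and not taken from any package insert.

```python
def prob_all_detected(true_sensitivity, n_positive_samples):
    """Probability that every one of n truly positive samples tests positive,
    assuming independent samples with the same detection probability."""
    if not 0.0 <= true_sensitivity <= 1.0:
        raise ValueError("true_sensitivity must be between 0 and 1")
    return true_sensitivity ** n_positive_samples


ASSUMED_SENSITIVITY = 0.99  # hypothetical, for illustration only

for n in (53, 77, 500, 1000):
    p = prob_all_detected(ASSUMED_SENSITIVITY, n)
    print(f"{n:>5} positive samples: P(no missed detection) = {p:.1%}")
```

Under this assumption, a run of 53 perfect results is more likely than not, while a run of 500 perfect results is very unlikely. A flawless clinical trial is therefore fully compatible with occasional misses once the assay is in wide use.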

Another reason to expect some variability of results between the genotype of the molecular test and the phenotype of the culture and susceptibility tests is greater diversity of strains encountered as more and more laboratories around the world adopt the technology. The statistical evaluation of the clinical trial data accounts for this variation in the real world, informing consumers that in their hands, the percent positive agreement (to the gold standard of culture and susceptibility) of the test could be as low as 93.3% for MRSA and 95.3% for MSSA. A third reason to expect that the GeneXpert® assay will not always yield 100% accurate results is the same reason why cultures, instrument-derived susceptibility results, and any other laboratory test results do not yield 100% accurate results: human error and random chance.

Sources of error in laboratory testing

Sources of incorrect results in the laboratory can often be attributed to human error: putting the wrong label on the sample, improper transcribing of results, or using off-label specimens, swabs, or reagents. For example, a recent request for technical support with the Xpert® MRSA/SA BC assay revealed that the laboratory with a problem had used the Xpert to confirm a staphylococcus isolated from a urine sample. Other problems have arisen when positive blood culture broth samples are placed into cartridges without first confirming that the organism is gram-positive cocci in clusters by Gram stain. Such uses are certainly not within the parameters specified in the package insert and often lead to testing problems.

Automated instruments used in other areas of microbiology are also imperfect. For instance, major errors in antibiotic susceptibility testing have been observed for several automated antimicrobial susceptibility testing (AST) systems. However, physicians rarely question a laboratory report based on phenotypic results, and laboratories do not routinely report confidence intervals with such results. A paper by the Pseudomonas Antimicrobial Susceptibility Test Study Group published several years ago documented very major error rates as high as 22% when testing piperacillin-tazobactam against Pseudomonas aeruginosa with one automated instrument. Yet many laboratories continue to use that system for their standard test, and physicians probably continue to use that antimicrobial agent to treat patients when the laboratory reports the organism as susceptible, never questioning whether that result could be incorrect.12 The more complex the testing process, the greater the chance that problems will arise and lead to an incorrect final answer.

In some cases, the testing system does not even have a method to detect an error or a test system failure. One example of such a system in which an error could have major consequences for the patient is that of many of the current molecular assays for detection of Chlamydia trachomatis and Neisseria gonorrhoeae (CT/NG) from genital sites.

PCR relies on the activity of a temperature-stable polymerase enzyme to amplify the target DNA sequence multiple times during the reaction. Many assays include an internal control target with the same characteristics as the assay target, but with a different profile after the sequence has been amplified (maybe a different-colored fluorophore marker or a different-size product, depending on the detection method) so the operator can verify that the polymerase enzyme was able to do its job, even if the result for the pathogen target was negative. An alternative but slightly less rigorous method, often used with LightCycler® and similar assays, is to amplify the patient sample in two separate reaction tubes, one with the external control target spiked into the reaction vial and one with only the patient sample. If the patient sample is negative after PCR, the result is valid only if the companion sample shows amplification of the control.13

Without such an internal control, there is the possibility of a false-negative result in a patient sample containing the pathogen due to inhibitors in the reaction that prevent target amplification or due to failure of the polymerase activity for some other reason (temperature issues, degradation of reagents, etc.). Some commercial CT/NG assays do not have an internal control to detect inhibition, and so potential false-negative results will not be detected and results will simply be reported as negative. In some cases, there is a control used in a separate tube, but some laboratories have opted not to utilize that system due to high contamination rates. At least one study examined the ability of the inhibition control of several CT/NG assays to actually detect all inhibitory factors, and the commercial systems turned out to be very insensitive.14 Overall rates of inhibition for selected molecular CT/NG assays studied by Chernesky and colleagues varied from 2% to 27% depending on the sample (urine vs. secretions).14 The test result of the inhibited sample would simply appear as "negative."
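
The role of the amplification control can be summarized as a small decision rule. The sketch below is a generic illustration, not the logic of any particular commercial assay; the function and result names are hypothetical.

```python
from enum import Enum


class Result(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    INVALID = "invalid"


def interpret(target_amplified, control_amplified):
    """Interpret one PCR run.

    control_amplified is True or False for an assay with an amplification
    control, or None for an assay that has no control at all.
    """
    if target_amplified:
        # Amplification of the target itself shows the reaction worked.
        return Result.POSITIVE
    if control_amplified is None:
        # No control: an inhibited reaction cannot be told apart from a
        # true negative, so it is simply reported as negative.
        return Result.NEGATIVE
    if control_amplified:
        # The polymerase worked, so the negative result is trustworthy.
        return Result.NEGATIVE
    # Neither target nor control amplified: inhibition or reagent failure.
    return Result.INVALID


print(interpret(False, True).value)   # negative
print(interpret(False, False).value)  # invalid
print(interpret(False, None).value)   # negative
```

The last two calls describe the same inhibited sample run with and without a control: with the control the run is flagged as invalid, and without it the run is reported as negative.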

The GeneXpert® cartridge

The GeneXpert assays all contain an internal control that is added to the reaction at the beginning of the process and whose amplification must be detected at the end of the process if none of the pathogen targets were amplified. In fact, for assays targeting bacterial organisms, Cepheid chose the most stringent control it could identify, a Bacillus spore. If the sample processing (including sonication) and PCR can release and amplify the DNA from within the spore, they can almost certainly do the same for the toughest other targets, including spores of C. difficile, S. aureus with thick cell walls, and hard-to-break cells of Mycobacterium tuberculosis. This assures the user that the reaction worked. Of course, a patient sample could still yield a false-negative result if the target DNA was below the limit of detection of the assay.

The GeneXpert cartridge is a marvel of engineering and molecular know-how. The cartridge contains 11 reservoirs or chambers that can be used to hold and transfer sample, diluents, and reagents (Figures 1 and 2). In the center of the cartridge is a cylinder into which fits a plunger, controlled robotically from above by the software program unique to each assay. The liquids in the chambers are moved among the chambers either by positive or negative pressure via small openings at the base of the reservoirs. The cartridge can rotate around the central cylinder so that the liquid within can be discharged into other chambers via the small opening in the cylinder. The enzymes, DNA building blocks, and other reagents for each type of reaction are maintained as freeze-dried beads stored partway up in the appropriate chambers. When the volume of liquid in that chamber rises to a level above the beads, they dissolve to create the reaction mixture. An important internal quality control check on the proper progression of the reaction is that the level of liquid was high enough to dissolve the beads.

Measuring the liquid level is also built into the system. Another aspect of the complexity of the assay cartridge is the viscosity of the material being moved from chamber to chamber. The software that drives the plunger measures the pressure needed to push and pull the liquids in the cartridge. Sometimes a sample may have too much patient material (for example, a stool swab with too much stool on it or a nasal swab with too much mucus), and the resulting suspension is too viscous to push easily through the openings. In that case the GeneXpert software alerts the user with an error, and no result is reported. If there were a leak in the tight seal at the top of the cartridge that keeps all the liquids in their proper chambers, the pressure on the plunger would yield a different signal, but this would also generate an error result. One can envision instances when the final suspension pushed out into the reaction vial for amplification and detection of products is somehow not perfect, with bubbles in the vial or some contaminant in the mixture that interferes with fluorescence detection. Maybe the liquid after the first reaction in a nested PCR process doesn’t get fully removed from the reaction vial. These potential problems may never happen or may happen so rarely that they are statistically insignificant, but the potential for many different causes of failure is there. It is remarkable that the millions of tests that have now been performed in GeneXperts around the world have resulted in so few problems and errors. The overall message is that the GeneXpert cartridge is a highly complex and beautifully engineered multi-environment mini-molecular laboratory in a compact, portable package.

A very large number of assays to detect numerous types of analytes (not just DNA) can be developed to work in the same system, with the same outer instrument moving the parts of the cartridge through their paces and heating and cooling the reaction vial precisely as needed for each assay, and the cartridge will continue to deliver highly accurate, exceedingly reliable, and rapid results (Figure 3). However, as with any laboratory test, whether based on phenotypic traits or subjective human observation, sometimes the results are not perfect. As leaders in the field of molecular diagnostics, we at Cepheid want to hear from you should we fail to meet your expectations.

REFERENCES

1. Peterson, L. R. & A. Robicsek. 2009. Does my patient have Clostridium difficile infection? Ann Intern Med. 151: 176-179.
2. Cardona, D. M. & K. H. Rand. 2008. Evaluation of repeat Clostridium difficile enzyme immunoassay testing. J Clin Microbiol. 46: 3686-3689.
3. Aichinger, E., et al. 2008. Nonutility of repeat laboratory testing for detection of Clostridium difficile by use of PCR or enzyme immunoassay. J Clin Microbiol. 46: 3795-3797.
4. Stamper, P. D., et al. 2009. Comparison of a commercial real-time PCR assay for tcdB detection to a cell culture cytotoxicity assay and toxigenic culture for direct detection of toxin-producing Clostridium difficile in clinical samples. J Clin Microbiol. 47: 373-378.
5. Litvin, M., et al. 2009. Identification of a pseudo-outbreak of Clostridium difficile infection (CDI) and the effect of repeated testing, sensitivity, and specificity on perceived prevalence of CDI. Infect Control Hosp Epidemiol. 30: 1166-1171.
6. Eastwood, K., et al. 2009. Comparison of nine commercially available Clostridium difficile toxin detection assays, a real-time PCR assay for C. difficile tcdB, and a glutamate dehydrogenase detection assay to cytotoxin testing and cytotoxigenic culture methods. J Clin Microbiol. 47: 3211-3217.
7. Stamper, P. D., et al. 2009. Evaluation of a new commercial TaqMan PCR assay for direct detection of the Clostridium difficile toxin B gene in clinical stool specimens. J Clin Microbiol. 47: 3846-3850.
8. Sloan, L. M., et al. 2008. Comparison of real-time PCR for detection of the tcdC gene with four toxin immunoassays and culture in diagnosis of Clostridium difficile infection. J Clin Microbiol. 46: 1996-2001.
9. Novak-Weekley, S. M., et al. 2010. Clostridium difficile testing in the clinical laboratory by use of multiple testing algorithms. J Clin Microbiol. 48: 889-893.
10. CDC. 2009. Evaluation of rapid influenza diagnostic tests for detection of novel influenza A (H1N1) virus - United States, 2009. MMWR Morb Mortal Wkly Rep. 58: 826-829.
11. Snyder, J. W., et al. 2009. Failure of the BD GeneOhm StaphSR assay for direct detection of methicillin-resistant and methicillin-susceptible Staphylococcus aureus isolates in positive blood cultures collected in the United States. J Clin Microbiol. 47: 3747-3748.
12. Juretschko, S., et al. 2007. Accuracies of β-lactam susceptibility test results for Pseudomonas aeruginosa with four automated systems (BD Phoenix, MicroScan WalkAway, Vitek, and Vitek 2). J Clin Microbiol. 45: 1339-1342.
13. Chan, E. L., et al. 2000. Performance characteristics of the Becton Dickinson ProbeTec System for direct detection of Chlamydia trachomatis and Neisseria gonorrhoeae in male and female urine specimens in comparison with the Roche Cobas System. Arch Pathol Lab Med. 124: 1649-1652.
14. Chernesky, M., et al. 2006. High analytical sensitivity and low rates of inhibition may contribute to detection of Chlamydia trachomatis in significantly more women by the APTIMA Combo 2 assay. J Clin Microbiol. 44: 400-405.