Friday, December 23, 2016 Written by Donna M. Wolk, MHA, Ph.D., D(ABMM) and Daniel Olson, MPH*

Verification of Qualitative Real-time PCR Methods

MEASURE TWICE, CUT ONCE: ACCURACY.
Clinical laboratories rely on the FDA clearance/approval process to ensure that commercially available molecular tests have been evaluated and found to be accurate and medically useful. However, FDA approval is only the first step toward ensuring diagnostic accuracy in your laboratory. The process of in-house verification is governed by the Clinical Laboratory Improvement Amendments (CLIA) regulations (5) and begins with the laboratory undertaking studies to reproduce the manufacturer's claimed performance characteristics (4, 5, 9). Verification of an FDA-cleared assay by a laboratory performing the test exactly as described in the product insert is less stringent than the validation process required for a modified test. If a test is not performed according to the published manufacturer guidelines (for instance, if the sample transport matrix differs from that in the package insert), then a more extensive method validation (not simply a verification) must occur, and the test is considered "off-label use" or a laboratory-developed test (LDT), depending on the extent of deviation from the approved protocol. In that case, the user is legally responsible for complete validation of the assay (5). This article focuses on the process for implementing an FDA-cleared assay in your laboratory. The parameters of a commercial test that a laboratory must document in its own performance of the test are accuracy (analytical sensitivity and specificity), precision (reproducibility), reference range (for qualitative assays), and reportable range (for quantitative assays).

The first step of the in-house verification process is to reproduce the analytical sensitivity (% positive for spiked specimens) and limit of detection (LOD) reported in the package insert, which addresses the reference range requirement. The LOD is the lowest density of the target organism that can be detected, i.e., statistically distinguished from a blank; it typically lies in the region where the analytical signal can reliably be distinguished from "analytical noise," the signal produced in the absence of analyte (2). The 95% LOD is the density at which the target microorganism is detected 95% of the time. Limits of detection are matrix, method, and analyte specific. There are two common ways to confirm these values: replicate testing and probit analysis.

1) Replicate Testing: In replicate testing, multiple aliquots of the genetic target are prepared and tested. It is best to use whole organisms spiked into the same matrix as the clinical samples to determine the LOD, to ensure adequacy of the extraction process prior to real-time PCR. Plasmid or genomic DNA of a known organism load (available from commercial sources, such as Zeptometrix, Acrometrix, ATCC, and others) may be used in cases when it is difficult or impossible to grow the microorganism in question. Many laboratories are able to grow bacteria and make dilutions in a liquid for spiking into the matrix, but they may need to purchase viral DNA or RNA. One way to prepare the sample matrix for spiking experiments is to pool negative specimens to a sufficient volume to allow aliquoting of at least 20 replicate samples for testing. To begin, one first prepares a fresh microbial suspension at the appropriate density (colony forming units [cfu]/ml). Log-phase cultures are optimal to avoid including too many dead bacteria. The suspension, when added to the negative matrix and mixed to homogeneity, should yield an organism load at the LOD stated in the product insert. You can then aliquot the appropriate number of samples to be tested (Table 1). It is a good idea to confirm the organism concentration (cfu/ml) by direct plate counts. To determine an accurate measurement of the 95% LOD, it is optimal to test ≥ 20 replicates. If only one of 20 samples is a false negative, then you have re-verified the assay's stated 95% LOD. If no false negatives are observed, you can document a 100% LOD; two negative results correspond to a 90% LOD, and so on. Consider asking several technologists to perform the replicate testing stage of this assay over a series of days; this way, you can complete training, reproducibility studies, and verification of the LOD at the same time.
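The bookkeeping above reduces to tallying replicate results into a detected percentage. Here is a minimal Python sketch of that tally; the function and variable names are our own, not part of any assay protocol:

```python
def detection_rate(results):
    """Summarize replicate LOD testing. results is a list of booleans:
    True = target detected in that spiked replicate."""
    n = len(results)
    detected = sum(results)
    return detected, n, 100.0 * detected / n

# Hypothetical example: 20 replicates spiked at the package-insert LOD,
# with a single false negative.
replicates = [True] * 19 + [False]
detected, total, pct = detection_rate(replicates)
print(f"{detected}/{total} detected = {pct:.0f}% LOD verified")
```

Under this bookkeeping, 19/20 detected re-verifies the stated 95% LOD, 20/20 would support a 100% LOD, and 18/20 a 90% LOD.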

2) Probit analysis: This process, i.e., determining "probability units" (or "probits"), is only necessary for qualitative tests, for which there is a binary (positive or negative; yes or no) result. Samples are prepared in a similar manner as for replicate testing, except that several dilutions of the organism suspension (organism counts/ml) are prepared, much like the process of creating a standard curve with 3-4 targets of different density. One can think of the basis of the process as a dose-response curve, since the probit was originally created to assess dose-response (3). Commonly, 3-12 replicates are prepared for each density, avoiding densities that yield 0% or 100% positive results, as they will skew the slope of your regression line. The suspensions can encompass low (3-4 x LOD), medium (5-6 x LOD), and high densities (7-8 x LOD). Following this process enables the user to plot the graph and calculate the 95% LOD. The probit analysis transforms the proportions of positive responses detected into a "probability unit" (or "probit"). A table or software program is used to convert the proportion of positive responses (response, y axis) to a probit, which is then plotted against the logarithm of the density (dose, x axis), thereby yielding a series of probabilities of positive responses associated with different concentrations. If you have access to cycle threshold (Ct) values from a real-time PCR assay, the assignment of a cutoff value for the assay can be based on the 95% confidence interval (CI) of the Ct observed at the density associated with the 95% LOD value obtained from the probit analysis. Note that for an LDT, the assay cutoff value must be revalidated periodically in accordance with clinical guidelines; every six months is a commonly used interval.
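The probit transform and regression described above can be sketched in Python with the standard library's NormalDist, using the classical convention of adding 5 to the z-score. This is a minimal illustration under invented dilution data; a real verification would use validated statistical software, and all names here are our own:

```python
import math
from statistics import NormalDist

_nd = NormalDist()

def probit(p):
    """Classical probit: inverse normal CDF plus 5 (keeps units positive)."""
    return _nd.inv_cdf(p) + 5.0

def probit_lod(densities, proportions, target=0.95):
    """Fit probit(proportion positive) vs. log10(density) by least squares,
    then solve for the density detected at the target rate (e.g., 95% LOD).
    Proportions of exactly 0 or 1 must be excluded, as noted in the text."""
    xs = [math.log10(d) for d in densities]
    ys = [probit(p) for p in proportions]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    x_target = (probit(target) - intercept) / slope
    return 10 ** x_target  # back-transform to density (e.g., cfu/ml)

# Invented dilution series (cfu/ml) with observed fractions positive:
lod95 = probit_lod([3.2, 10, 32, 100], [0.088, 0.361, 0.741, 0.95])
print(f"Estimated 95% LOD: {lod95:.0f} cfu/ml")
```

With these invented inputs the fitted 95% LOD comes out near 100 cfu/ml, the highest density in the series, consistent with 95% of replicates being positive there.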

Next, analytical specificity (% accuracy for negative specimens) is determined by challenging the assay with genetically similar or closely related organisms to assess cross-reactivity. Spiked specimens containing organisms commonly found in the sample type can be used for this assessment; these should include organisms that are genetically similar to those targeted by the assay. You can enhance your specificity assessment by using a variety of known negative patient samples, such as those used for your % agreement assessment (see below). Reviewing the results from these samples may identify other isolated microorganisms, providing additional evidence that no cross-reaction with non-target microorganisms occurred during testing. Finally, if the primer and probe sequences are known (which is uncommon for commercial assays), it is prudent to perform virtual specificity studies (i.e., using computer-aided technology) to compare the oligonucleotide sequences to all known genetic sequences, checking for any potential cross-reactivity. This process is done by manufacturers of commercial assays and checked during the FDA clearance process, so laboratories are not obligated to repeat the exercise. Although this approach can identify potential cross-reactions, it cannot predict the evolution of microorganisms or the emergence of newly identified microorganisms that could cross-react with primers and probes in the future. For these reasons, continual assessment of specificity is essential throughout the life of any molecular assay.
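In-silico cross-reactivity screening is normally done with dedicated alignment tools such as BLAST, but the underlying idea, scanning a non-target sequence for near-matches to a primer, can be illustrated with a toy Python sketch. The sequences and names below are invented for illustration and this is not a substitute for a real alignment tool:

```python
def best_primer_match(primer, sequence):
    """Slide the primer along the sequence and report the minimum number
    of mismatches over all ungapped alignments (toy model, not BLAST)."""
    best = len(primer)
    for i in range(len(sequence) - len(primer) + 1):
        window = sequence[i:i + len(primer)]
        mismatches = sum(a != b for a, b in zip(primer, window))
        best = min(best, mismatches)
    return best

# Invented example: a primer screened against a non-target sequence
# that happens to contain a one-mismatch near-match (ACGTACGA).
primer = "ACGTACGT"
off_target = "TTTACGTACGAAAA"
print(best_primer_match(primer, off_target))
```

A low mismatch count against a non-target organism would flag a candidate cross-reaction worth testing in the wet lab.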

Another, and often parallel, step in the verification process is documentation of assay reproducibility (assessed as variability) in order to determine acceptable value ranges for the important assay parameters. For qualitative assays, reliability of positive and negative results over time with different operators is sufficient. For quantitative assays, intra-assay variability (within-assay) can be obtained by using the same 12-20 repetitions performed to determine the LOD. One can calculate the mean Ct values for both target and internal controls, along with the standard deviation (SD), coefficient of variation (% CV), and other variance measures. Reproducibility should be defined at several analyte densities (i.e., low, medium, and high), and multiple operators should be included in the testing. Although it is a commonly used metric, the % CV is not an optimal measure of variability because it varies with organism density; therefore, the % CV must be calculated at each density. Inter-assay variability (between assays) is determined by calculating the same variance measures as for intra-assay variability; however, the data used are those obtained by replicate testing across many days. The mean and three standard deviations (3SD) can be used to monitor lot-to-lot performance and operator competency via trend analysis, the assessment of Ct values across time, reagent lots, shifts, and operators. Alternatively, the 95% confidence interval may be used for trend analysis; however, 3SD is more commonly used. To obtain the most accurate measurement, 28 measurements are optimal; however, 6-12 replicates will often provide a reasonable mean and SD from which you can launch your assay and begin trend analysis. The mean and 3SD can then be adjusted after obtaining 20-28 runs, at which point the upper and lower confidence limits are determined.
Performing and characterizing 28 separate test runs provides the clinical laboratory scientist with sound results against which new lots can be assessed over time (9). Often, results obtained from the external positive controls are charted over time and graphed against the mean in order to describe the inter-assay variability. The Levey-Jennings plot (6) is a useful way to depict these results (Figure 1).
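The variance measures described above (mean Ct, SD, %CV, and mean ± 3SD control limits) amount to a few lines of arithmetic. This Python sketch uses invented Ct values, and the names are our own:

```python
from statistics import mean, stdev

def ct_summary(cts):
    """Mean, sample SD, %CV, and mean +/- 3SD control limits for Ct values."""
    m, sd = mean(cts), stdev(cts)
    return {"mean": m, "sd": sd, "cv_pct": 100.0 * sd / m,
            "lower_3sd": m - 3 * sd, "upper_3sd": m + 3 * sd}

# Invented replicate Ct values at one density:
stats = ct_summary([24.0, 25.0, 26.0, 25.0, 24.5, 25.5])
print(stats)
```

Because %CV depends on the mean, this summary would be computed separately at each density (low, medium, and high), as noted above.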

Trend analysis is part of qualitative assay characterization that can be used as an ongoing process to provide quality assurance for the verified assay. Some investigators have applied Westgard rules to trend analysis and have adapted those rules, or created their own, for monitoring trend results; examples are listed in Table 2 (7, 8). For example, if the percentage of positive patient results appears to drift upward over time and the patient characteristics have not changed, you may need to investigate whether something has changed with the test. In this case, the Ct values should be graphed to spot trends.
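Adapted control rules of this kind vary by laboratory. As a hypothetical illustration (the rules and names below are ours, not taken from Table 2), this Python sketch flags a control Ct falling beyond 3SD of the established mean, in the spirit of a 1-3s rule, and a run of eight consecutive points on the same side of the mean, a simple drift rule:

```python
def control_flags(cts, mean_ct, sd_ct, run_len=8):
    """Return (index, reason) pairs for points violating two example rules:
    a value beyond mean +/- 3SD, or the last point of run_len consecutive
    values on the same side of the mean (possible drift)."""
    flags = []
    side_run, last_side = 0, 0
    for i, ct in enumerate(cts):
        if abs(ct - mean_ct) > 3 * sd_ct:
            flags.append((i, "beyond 3SD"))
        side = 1 if ct > mean_ct else (-1 if ct < mean_ct else 0)
        side_run = side_run + 1 if (side != 0 and side == last_side) else (1 if side else 0)
        last_side = side
        if side_run >= run_len:
            flags.append((i, "drift"))
    return flags

# Invented data: established mean Ct 25.0, SD 0.5, one gross outlier.
print(control_flags([25.1, 24.9, 28.0, 25.0], 25.0, 0.5))
```

In practice the established mean and SD come from the 20-28 characterization runs described earlier, and flagged runs trigger an investigation rather than automatic rejection.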

Clinical percent agreement or qualitative accuracy.
These two parameters can be established by comparing the performance of your new assay to a reference ("gold standard") method. Ideally, this is accomplished by using known positive specimens over a known range of target densities (low-, medium-, and high-signal positives), including enough weakly positive specimens to establish accuracy at the lower end of the spectrum. To obtain enough statistical power to compare assay performances and give the laboratory director sufficient information on whether or not to replace an existing assay, a power calculation should be performed; however, for the sake of cost and expediency, the ASM Cumitech (1) recommends a minimum of 50 positives and 100 negatives for testing. Depending on the performance of the assay and the criticality of test results, more specimens may need to be tested. Laboratories should randomize testing of the specimens in order to imitate a real testing environment; that is, do not test all the positives in one run and all the negatives in another. Assessment of the % agreement, as well as performance parameters such as clinical sensitivity, clinical specificity, positive predictive value (PPV), and negative predictive value (NPV), allows laboratorians and clinicians to calculate test performance relevant to their particular locale and disease prevalence. Clinical sensitivity and specificity should be calculated in a scenario as close as possible to the real-life population the laboratory will serve, since PPV and NPV vary with disease prevalence. The PPV is the proportion of persons with a positive test result who are truly diseased. The NPV is the proportion of persons with a negative test result who are truly non-diseased. Often a 2x2 table is used to determine the clinical performance characteristics of the assay (Figures 2-3).
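The 2x2 calculations reduce to four ratios. This Python sketch (the function name and example counts are invented) computes them from the cells of the table versus the reference standard:

```python
def performance(tp, fp, fn, tn):
    """Clinical performance from a 2x2 table: tp/fp/fn/tn are the counts of
    true positives, false positives, false negatives, and true negatives
    of the new assay judged against the reference standard."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased persons correctly positive
        "specificity": tn / (tn + fp),  # non-diseased persons correctly negative
        "ppv": tp / (tp + fp),          # positives that are truly diseased
        "npv": tn / (tn + fn),          # negatives that are truly non-diseased
    }

# Invented example: 50 diseased and 100 non-diseased specimens,
# echoing the Cumitech minimum specimen counts.
print(performance(tp=45, fp=5, fn=5, tn=95))
```

Note that sensitivity and specificity are properties of the assay, while PPV and NPV shift with the disease prevalence of the tested population, which is why the comparison population should mirror the one the laboratory serves.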
With newer molecular assays, it is possible that the previous "gold standard" is less accurate than the new test. This can lead to discrepant results that may require a third method for resolution. In some cases, clinical information from symptomatic patients may be used; in other cases, a different molecular method, or sequencing of the amplified product of the new assay, must be performed. The results of the comparator methods, sometimes combined into a final disease assessment, are considered the clinical "truth" (Figure 3).

Summarizing Results:
Only after you perform in-house method verification can you begin patient testing. Typically, a summary of the entire method verification is created and signed by the Medical Director prior to testing and bundled with raw data and other information in a verification folder or binder. Examples of the Table of Contents headings and subheadings used in the University of Arizona Medical Center verification binder are shown in Figure 4.

*Donna M. Wolk, MHA, Ph.D., D(ABMM), Infectious Disease Research Core, BIO5 Institute, University of Arizona, Tucson, AZ, and University of Arizona Health Network, Tucson, AZ; and Daniel Olson, MPH, Infectious Disease Research Core, BIO5 Institute, University of Arizona, Tucson, AZ

REFERENCES

1. American Society for Microbiology. 2009. Cumitech 31A: Verification and Validation of Procedures in the Clinical Microbiology Laboratory. ASM Press, Washington, D.C.
2. Armbruster, D. A. and T. Pry. 2008. Limit of blank, limit of detection and limit of quantitation. Clin. Biochem.Rev. 29 Suppl 1:S49-S52.
3. Bliss, C. I. 1934. The Method of Probits. Science 79:38-39.
4. Burd, E. M. 2010. Validation of laboratory-developed molecular assays for infectious diseases. Clin. Microbiol. Rev. 23:550-576. doi:10.1128/CMR.00074-09.
5. Centers for Medicare & Medicaid Services, Department of Health and Human Services. Clinical Laboratory Improvement Amendments, Subpart K, 42 CFR 493.1253. 7-7-2004.
6. Levey, S. and E. R. Jennings. 1992. The use of control charts in the clinical laboratory. 1950. Arch. Pathol. Lab Med. 116:791-798.
7. Liang, S. L., M. T. Lin, M. J. Hafez, C. D. Gocke, K. M. Murphy, L. J. Sokoll, and J. R. Eshleman. 2008. Application of traditional clinical pathology quality control techniques to molecular pathology. J. Mol. Diagn. 10:142-146. doi:10.2353/jmoldx.2008.070123.
8. Westgard, J. O., P. L. Barry, M. R. Hunt, and T. Groth. 1981. A multi-rule Shewhart chart for quality control in clinical chemistry. Clin. Chem. 27:493-501.
9. Wolk, D. M. and E. M. Marlowe. 2011. Molecular Method Verification, p. 861-884. In: D. P. Persing (ed.), Molecular Microbiology, Diagnostic Principles and Practice. 2 ed. ASM Press, Washington, DC.