The use of systematic literature review to inform evidence-based practice in diagnostics is rapidly expanding. Although the primary diagnostic literature is extensive, studies are often of low methodological quality or poorly reported. There has been no rigorously evaluated, evidence-based tool to assess the methodological quality of diagnostic studies. The primary objective of this study was to determine the extent to which variations in the quality of primary studies impact the results of a diagnostic meta-analysis and whether this differs with diagnostic test type.
A secondary objective was to contribute to the evaluation of QUADAS, an evidence-based tool for the assessment of quality in diagnostic accuracy studies. This study was conducted as part of a large systematic review of tests used in the diagnosis and further investigation of urinary tract infection (UTI) in children.
All studies included in this review were assessed using QUADAS, an evidence-based tool for the assessment of quality in systematic reviews of diagnostic accuracy studies. The impact of individual components of QUADAS on a summary measure of diagnostic accuracy was investigated using regression analysis.
The review divided the diagnosis and further investigation of UTI into the following three clinical stages: diagnosis of UTI, localisation of infection, and further investigation of the UTI. Each stage used different types of diagnostic test, which were considered to involve different quality concerns.
Many of the studies included in our review were poorly reported. However, as might be expected, the individual items fulfilled differed between the three clinical stages.
Regression analysis found that different items showed a strong association with test performance for the different tests evaluated. These differences were observed both within and between the three clinical stages assessed by the review. The results of regression analyses were also affected by whether or not a weighting by sample size was applied. Our analysis was severely limited by the completeness of reporting and the differences between the index tests evaluated and the reference standards used to confirm diagnoses in the primary studies.
Few tests were evaluated by sufficient studies to allow meaningful use of meta-analytic pooling and investigation of heterogeneity. This meant that further analysis to investigate heterogeneity could only be undertaken using a subset of studies, and that the findings are open to various interpretations. Further work is needed to investigate the influence of methodological quality on the results of diagnostic meta-analyses.
Large data sets of well-reported primary studies are needed to address this question. Without significant improvements in the completeness of reporting of primary studies, progress in this area will be limited. The use of systematic literature review to inform evidence-based practice in diagnostics is rapidly expanding.
Although the primary diagnostic literature is extensive, there remain a number of problems for systematic reviews of diagnostic tests. Appropriate methods for rigorous evaluation of diagnostic technologies have been well established [1-5]. However, available studies have generally been poorly designed and reported [6-8]. Similarly, although a number of quality checklists for diagnostic accuracy studies have been proposed [9] and there is growing evidence on the effects of bias in such studies [10], there has been no rigorously evaluated, evidence-based quality assessment tool for diagnostic studies.
The objective of this study was to investigate the impact of quality on the results of a diagnostic meta-analysis, using regression analysis. A large diagnostic systematic review was required to enable the use of regression analysis to investigate the impact of components of quality upon results.
We have recently completed a systematic review, which aimed to determine the most appropriate pathway for the diagnosis and further investigation of UTI in children [11]. It included an assessment of the accuracy of tests for three different clinical stages of UTI: the diagnosis of UTI, localisation of infection, and further investigation of patients with confirmed UTI.
The nature of the tests included in these three clinical sections of this review differed. Tests used to diagnose UTI were generally laboratory-based or near-patient methods, with relatively objective interpretation of results (e.g. dipstick and microscopy techniques).
By contrast, tests used to investigate confirmed UTI mainly utilised imaging technologies which are largely subjective in their interpretation, and where diagnostic thresholds are difficult to define. Tests used to localise infection spanned both categories. We hypothesised that the components of methodological quality affecting results were likely to differ between the three sections of the review.
Such potential differences may indicate a need for topic-specific checklists for the assessment of quality in diagnostic studies. A secondary aim of this study was to contribute to the evaluation of QUADAS, an evidence-based tool for the assessment of the quality of diagnostic accuracy studies that was specifically developed for use in systematic reviews of diagnostic tests [12], by investigating the importance of specific QUADAS items. Items were rated as 'yes', 'no', or 'unclear'.
We analysed results grouped by clinical stage. Within these groups, we pooled studies of similar tests or test combinations where sufficient data were available and where pooling was clinically meaningful. This choice was made based on published guidance [13,14]. We estimated summary receiver operating characteristic (SROC) curves using the regression equation given in [15].
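The equation itself did not survive in this copy of the text. Assuming reference [15] is the standard Moses-Littenberg SROC model (the usual choice when the DOR is the accuracy measure, as here), the regression takes the form:

```latex
% Moses--Littenberg SROC regression (assumed form of the missing equation)
% D: log diagnostic odds ratio; S: a proxy for the positivity threshold
D = a + bS, \qquad
D = \operatorname{logit}(\mathrm{TPR}) - \operatorname{logit}(\mathrm{FPR}) = \ln(\mathrm{DOR}), \qquad
S = \operatorname{logit}(\mathrm{TPR}) + \operatorname{logit}(\mathrm{FPR})
```

Here the intercept $a$ summarises overall accuracy and the slope $b$ captures how accuracy varies with the positivity threshold; when $b = 0$ the curve is symmetric and $\ln(\mathrm{DOR}) = a$ is constant across thresholds.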
We used both weighted and unweighted models. For the weighted model we weighted on sample size. We chose to weight on sample size rather than inverse variance, a method sometimes used in this type of analysis, as we believe that weighting on the inverse variance can produce biased results.
The reason for this bias is that the DOR is associated with its variance, so large DORs will inevitably have large variances, which will be reflected in the weightings. We assessed between-study heterogeneity through visual examination of forest plots and statistically using the Q statistic [16].
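As a concrete illustration of the two quantities just described, the sketch below computes each study's log DOR from its 2x2 table and Cochran's Q for between-study heterogeneity. Note that Q conventionally uses inverse-variance weights, whereas the authors weighted their SROC models by sample size; this is therefore an illustration of the statistic, not a reconstruction of their exact analysis. The 0.5 continuity correction and the Woolf variance formula are standard choices, not taken from the paper.

```python
import math

def log_dor(tp, fp, fn, tn):
    """Log diagnostic odds ratio from a 2x2 table, with the usual 0.5
    continuity correction, and its approximate (Woolf) variance."""
    tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    ldor = math.log((tp * tn) / (fp * fn))
    var = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return ldor, var

def cochran_q(studies):
    """Cochran's Q for heterogeneity of the log DOR across studies.
    studies: list of (tp, fp, fn, tn) tuples.  Q is compared against a
    chi-square distribution with k - 1 degrees of freedom."""
    stats = [log_dor(*s) for s in studies]
    weights = [1.0 / v for _, v in stats]   # conventional inverse-variance weights
    pooled = sum(w * d for w, (d, _) in zip(weights, stats)) / sum(weights)
    q = sum(w * (d - pooled) ** 2 for w, (d, _) in zip(weights, stats))
    return q, len(studies) - 1

# Three hypothetical studies (tp, fp, fn, tn); a large Q relative to the
# chi-square critical value for df = 2 would indicate heterogeneity.
studies = [(90, 10, 10, 90), (45, 15, 5, 55), (80, 40, 20, 60)]
q, df = cochran_q(studies)
```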
Where sufficient data were available, we used regression analysis to investigate whether individual QUADAS items, and additional variables thought likely to be associated with diagnostic accuracy, were associated with the DOR, and hence whether differences in these items between the studies accounted for some of the observed heterogeneity. Where data were available, the following additional variables were investigated:
For microscopy for pyuria and bacteriuria a variable on whether the sample was centrifuged was included, and for microscopy for bacteriuria a variable for Gram stain was included. For ultrasound for the detection of reflux a variable for whether or not the ultrasound involved a contrast agent was included.
This allowed us to make some distinction between associations of aspects of methodological quality with test performance and associations of completeness of reporting with test performance. These items were therefore included as dichotomous variables. A multivariate linear regression analysis was conducted. Initially, we performed univariate analysis with all items included separately in the model.
All items found to show moderate evidence of an association in the univariate models were entered into the multivariate model, then dropped in a step-wise fashion, with the item with the weakest evidence of an association (largest p-value) dropped first. For covariates with more than one level, evidence of an association of one indicator variable with test performance was considered sufficient for inclusion in the model.
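The drop-the-weakest loop described above can be sketched as follows. This is a generic backward-elimination routine, not the authors' code: it judges each covariate by the F statistic for dropping it from the current model (a stand-in for the p-value criterion in the text, with `f_crit=4.0` as a rough 5% cutoff), fits by ordinary least squares via the normal equations, and never drops the intercept. It assumes more observations than covariates.

```python
def fit_rss(X, y):
    """Fit ordinary least squares via the normal equations (Gaussian
    elimination with partial pivoting) and return the residual sum of squares."""
    n, k = len(X), len(X[0])
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for j in range(c, k + 1):
                A[r][j] -= f * A[c][j]
    b = [0.0] * k
    for r in range(k - 1, -1, -1):
        b[r] = (A[r][k] - sum(A[r][j] * b[j] for j in range(r + 1, k))) / A[r][r]
    return sum((y[i] - sum(X[i][j] * b[j] for j in range(k))) ** 2 for i in range(n))

def backward_eliminate(names, cols, y, f_crit=4.0):
    """Repeatedly drop the covariate with the weakest evidence of association
    (smallest drop-one F statistic) until all remaining covariates exceed
    f_crit.  The intercept is handled separately and never dropped."""
    keep = list(range(len(names)))
    while keep:
        X = [[1.0] + [cols[j][i] for j in keep] for i in range(len(y))]
        rss_full = fit_rss(X, y)
        df = len(y) - (len(keep) + 1)
        fstats = []
        for drop in range(len(keep)):
            sub = keep[:drop] + keep[drop + 1:]
            Xr = [[1.0] + [cols[j][i] for j in sub] for i in range(len(y))]
            fstats.append((fit_rss(Xr, y) - rss_full) / (rss_full / df))
        weakest = min(range(len(keep)), key=lambda d: fstats[d])
        if fstats[weakest] >= f_crit:
            break  # every remaining covariate is (roughly) significant
        keep.pop(weakest)
    return [names[j] for j in keep]

# Synthetic check: y depends strongly on x1 and not at all on x2,
# so backward elimination should retain only x1.
x1 = [0, 1, 2, 3, 4, 5, 6, 7]
x2 = [1, 0, 1, 0, 1, 0, 1, 0]
eps = [0.01, 0.02, -0.01, 0.00, 0.01, -0.02, 0.00, -0.01]
y = [2 * x1[i] + eps[i] for i in range(8)]
kept = backward_eliminate(["x1", "x2"], [x1, x2], y)   # -> ['x1']
```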
Interaction terms were not included. The DOR is used as an overall measure of diagnostic accuracy. It is calculated as the odds of positivity among diseased persons, divided by the odds of positivity among non-diseased.
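A small worked example of this definition, using illustrative numbers rather than data from the review:

```python
def dor(tp, fp, fn, tn):
    """Diagnostic odds ratio: odds of a positive result among diseased
    persons divided by the odds of a positive result among non-diseased."""
    return (tp / fn) / (fp / tn)

# Sensitivity and specificity both 0.9 (100 diseased, 100 healthy children):
informative = dor(tp=90, fp=10, fn=10, tn=90)   # (90/10)/(10/90) = 81
# Positivity rate identical in both groups: no diagnostic evidence, DOR = 1
useless = dor(tp=50, fp=50, fn=50, tn=50)       # (50/50)/(50/50) = 1
```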
When a test provides no diagnostic evidence then the DOR is 1. It therefore provides an indicator of the overall impact on diagnostic accuracy of the presence of a given covariate. The proportion of QUADAS items fulfilled by studies included in our systematic review was similar for each of the three clinical stages assessed in the review. Studies evaluating tests to diagnose UTI fulfilled a median of 8 items (range 5-13), those evaluating tests used to localise infection also fulfilled a median of 8 items (range 3-13), and those evaluating further investigations fulfilled a median of 7.
The use of an inappropriate spectrum of patients and inadequate reporting of inclusion criteria were problematic for studies in this category. The majority of studies provided insufficient details on how the reference standard was performed. Studies failed to report sufficient details on clinical review bias, diagnostic review bias and test review bias to judge whether these were avoided.
Study withdrawals and handling of uninterpretable results were also poorly reported. The time delay between the index test and reference standard was more of a problem with these studies than with those on the diagnosis of UTI. The use of an appropriate reference standard was also an issue in some of these studies.
Spectrum composition and reporting of details of how children were selected for inclusion in the study were better in these studies than in the studies of the diagnosis of UTI. Only around half of studies provided sufficient details of how the index test and reference standard were performed to allow replication of these tests. As with studies of the diagnosis of UTI, reporting of clinical review bias, handling of uninterpretable results, and withdrawals from the study was poor. As with studies of the diagnosis of UTI, spectrum composition and reporting of inclusion criteria were poor in this group.
The time delay between the index test and reference standard was also an issue in many of these studies. Around half of studies reported that diagnostic and test review bias had been avoided; the remaining studies did not report whether the index test and reference standard were interpreted blind to the results of each other.
This was similar to the situation seen for studies on the localisation of infection. Reporting of the reference standard was poor. As in all previous groups, studies also provided very little information on whether appropriate clinical information was available when test results were interpreted, how uninterpretable results were handled, and whether there were any withdrawals from the study and if so whether all withdrawals were accounted for.
Tests involving dipstick or microscopy techniques were the only categories where enough studies were available to enable regression analysis.
For dipstick tests to detect urinary nitrite (23 studies [20,26,28,34,36,40,41,43,52,54-57,60,63,66,72,74,84,88,93-95]), the weighted analysis found that studies reporting that clinical review bias had been avoided had a DOR 4. This is what would be expected, as the DOR is likely to be higher when those interpreting test results have access to appropriate clinical information similar to that which would be available in practice.
No studies reported the presence of clinical review bias. This was the only item investigated to show strong evidence of an association with test performance in the weighted multivariate analysis, although age and geographic region did show moderate evidence of an association in the univariate analysis. The unweighted analysis showed slightly different results. The same three items were found to show at least moderate evidence of an association in the univariate analysis.
For dipsticks measuring urinary leukocyte esterase (14 studies [20,28,34,36,43,56,57,60,63,66,72,84,94,95]) and for dipsticks for the presence of either nitrite or leukocyte esterase (15 studies [19-21,28,34,56,60,63,66,84-86,92,94-96]), no items showed strong evidence of an association with the DOR in the weighted analysis.
However, for urinary leukocyte esterase, the unweighted analysis found strong evidence of an association between patient age and the DOR. In studies evaluating microscopy to detect pyuria (28 studies [19-23,28,29,34,35,41,43,46,47,49,50,58,59,63,67,70,75,77,80,81,83,85,92-94]), three items showed a strong association with test performance in the weighted analysis.
The DOR was 1. All of these items, with the exception of centrifugation, relate to the completeness of reporting. The association for centrifugation is counter-intuitive, as we would expect centrifugation of the sample to lead to improved test accuracy. Two items showed strong evidence of an association with the DOR in the weighted analysis of studies evaluating microscopy to detect bacteriuria (22 studies [20,21,23,28,34,35,41,47,50,61-64,67,70,76,77,80,85,90,91,94]).
The DOR was 3. We would expect both Gram staining and the presence of incorporation bias to increase test performance, as found in the analysis. The unweighted analysis found very similar results. Only the evaluation of ultrasound for the localisation of infection provided sufficient data to enable the conduct of regression analysis (20 studies [48,69,97,99,…]).
How does study quality affect the results of a diagnostic meta-analysis?