A systematic review of the literature examining the diagnostic efficacy of measurement of fractionated plasma free metanephrines in the biochemical diagnosis of pheochromocytoma

Background
Fractionated plasma metanephrine measurements are commonly used in biochemical testing in search of pheochromocytoma.
Methods
We aimed to critically appraise the diagnostic efficacy of fractionated plasma free metanephrine measurements in detecting pheochromocytoma. Nine electronic databases, meeting abstracts, and the Science Citation Index were searched and supplemented with previously unpublished data. Methodologic and reporting quality was independently assessed by two endocrinologists using a checklist developed by the Standards for Reporting of Diagnostic Studies Accuracy Group, and data were independently abstracted.
Results
Limitations in methodologic quality were noted in all studies. In all subjects (including those with genetic predisposition), the sensitivities for detection of pheochromocytoma were 96%–100% (95% CI ranged from 82% to 100%), whereas the specificities were 85%–100% (95% CI ranged from 78% to 100%). Statistical heterogeneity was noted upon pooling positive likelihood ratios when those with predisposition to disease were included (p < 0.001). However, upon pooling the positive or negative likelihood ratios for patients with sporadic pheochromocytoma (n = 191) or those at risk for sporadic pheochromocytoma (n = 718), no statistical heterogeneity was noted (p = 0.4). For sporadic subjects, the pooled positive likelihood ratio was 5.77 (95% CI = 4.90, 6.81) and the pooled negative likelihood ratio was 0.02 (95% CI = 0.01, 0.07).
Conclusion
Negative plasma fractionated free metanephrine measurements are effective in ruling out pheochromocytoma. However, a positive test result only moderately increases suspicion of disease, particularly when screening for sporadic pheochromocytoma.

The biochemical screening test used for diagnosing pheochromocytoma is institution- and laboratory-dependent, with variable performance, and an "ideal" test for pheochromocytoma has been sought over the years, as false or uninterpretable results are not uncommon with some traditional tests [1,. Recently, measurement of fractionated plasma free metanephrines by high performance liquid chromatography with electrochemical detection has been endorsed by investigators at the National Institutes of Health (NIH) as the single best test for biochemical screening for pheochromocytoma [48][49][50][51][52][53][54][55][56][57][58][59][60][61]. The objective of the current study is to systematically review the literature to determine the diagnostic efficacy of measurements of fractionated plasma metanephrines in detection of pheochromocytoma.

Study selection
We included studies of adults who underwent measurement of fractionated plasma free metanephrines for the purpose of diagnostic testing. All studies had a "methods" section and included at least 10 subjects with pheochromocytoma (or paraganglioma) and at least 10 subjects without the diagnosis. Studies in which more than a third of subjects were below the age of 18 years, or which focussed on patients with end-stage renal disease, were excluded. In the case of multiple concurrent publications from the same research group, only the article describing the largest number of subjects tested was included. Updated unpublished data obtained from authors were included. The term pheochromocytoma refers to adrenal pheochromocytomas and extra-adrenal paragangliomas. The method of Lenders was used for measurement of fractionated plasma metanephrines [49]. Plasma metanephrine measurements in the setting of clonidine suppression or glucagon stimulation were excluded.
CANCERLIT (1975-2002), Healthstar (1975 to December 2002), and CINAHL (1982 to week 1 of February 2003). The search strategy incorporated the MeSH heading "metanephrine" or the textword roots of "metanephrine" or "normetanephrine", as well as the textword "plasma", and the MeSH headings of "paraganglioma", "pheochromocytoma", or textword roots of "paraganglioma", "pheochromocytoma", or "phaeochromocytoma", and the MeSH headings "sensitivity and specificity", or "diagnosis", or the textword roots of "sensitiv", "specific", or "diagnos", or the textword "likelihood ratio". We also searched the Web of Science for articles citing the methodologic study of Lenders [49] and hand-searched the abstract books of the 82nd to 84th annual meetings of the Endocrine Society (2000-2002). Two endocrinologists independently screened the titles and abstracts obtained through the electronic search, and all full-text articles deemed potentially relevant by either of the reviewers were obtained for formal review.
After reviewing the full-text articles, both reviewers agreed on which articles would be included in the systematic review.

Assessment of methodologic quality and quality of reporting of included studies and data abstraction
Each of the two reviewers independently assessed the quality of methodology and reporting of the included studies, using a 25-item checklist developed by the Standards for Reporting of Diagnostic Studies Accuracy Group (STARD) [62] (Table 1). The two reviewers also independently abstracted the data from published studies in duplicate, and consensus was reached on the final data presented. In the case of updated unpublished data from the Mayo Clinic, Rochester, ethics board approval and signed consent were obtained for chart review.

Statistical analyses
A kappa statistic was calculated to measure agreement between the two reviewers in assessment of methodologic and reporting quality [63]. For sensitivities, specificities, and likelihood ratios, 95% confidence intervals were calculated using Wilson's method [64]. The Score Method was used for calculation of 95% CI of likelihood ratios when a zero cell was noted [64]. Likelihood ratios (LRs) predicting the presence of pheochromocytoma given a positive test result (sensitivity/(1 - specificity)) and a negative test result ((1 - sensitivity)/specificity) were calculated for each included study. Of note, a positive LR above 10 and a negative LR below 0.1 have been noted to generate large changes from pre-test to post-test probability of disease, often resulting in a large change in patient management, whereas positive LRs between 5 and 10 and negative LRs between 0.1 and 0.2 are considered to generate moderate shifts in pre-test to post-test probability of disease [65][66][67][68].
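The definitions above can be illustrated with a minimal sketch. It computes sensitivity, specificity, and both likelihood ratios from a 2x2 table, plus a Wilson score interval for a proportion. The function names and the counts are hypothetical and are not taken from any included study; confidence intervals for the likelihood ratios themselves are not reproduced here.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def likelihood_ratios(tp, fn, fp, tn):
    """Return (sensitivity, specificity, LR+, LR-) from 2x2 counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    # LR+ is unbounded when there are no false positives (a zero cell).
    lr_pos = sens / (1 - spec) if fp > 0 else float("inf")
    lr_neg = (1 - sens) / spec
    return sens, spec, lr_pos, lr_neg


# Hypothetical counts, chosen only for illustration.
tp, fn, fp, tn = 48, 2, 15, 85
sens, spec, lr_pos, lr_neg = likelihood_ratios(tp, fn, fp, tn)
sens_ci = wilson_interval(tp, tp + fn)
spec_ci = wilson_interval(tn, tn + fp)
print(f"sensitivity {sens:.2f} (95% CI {sens_ci[0]:.2f} to {sens_ci[1]:.2f})")
print(f"specificity {spec:.2f} (95% CI {spec_ci[0]:.2f} to {spec_ci[1]:.2f})")
print(f"LR+ {lr_pos:.2f}, LR- {lr_neg:.3f}")
```

With these illustrative counts the positive LR falls in the "moderate shift" band (between 5 and 10), while the negative LR falls below 0.1.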
Likelihood ratios were pooled if each laboratory used the assay technique of Lenders [49], with the upper limit of a population-based 95% reference range as the threshold for test positivity. For a test to be considered positive, either the free metanephrine or the free normetanephrine fraction had to be above the reference range cut-off [69]. A chi-squared test of homogeneity (Q-statistic) was performed for pooled studies [70,71]. A random effects model was used for pooling of likelihood ratios using Review Manager 4.1 [66,70,72]. A funnel plot was constructed to visually assess for publication bias of pooled studies [73]. Separate analyses were performed for all pheochromocytomas and for sporadic pheochromocytomas (those without known genetic predisposition to the disease).
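As a rough illustration of the pooling step, the sketch below combines log positive likelihood ratios with a random effects model and reports the Q-statistic. It assumes the DerSimonian-Laird estimator of between-study variance and a delta-method variance for the log ratio; the text does not state which estimator was used, and this is not a reproduction of Review Manager output. The 2x2 counts are hypothetical, and zero cells (which would need a continuity correction) are not handled.

```python
from math import exp, log, sqrt


def log_lr_pos(tp, fn, fp, tn):
    """Log positive LR and its approximate variance (requires tp, fp > 0)."""
    sens = tp / (tp + fn)
    false_pos_rate = fp / (fp + tn)
    effect = log(sens / false_pos_rate)
    variance = 1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn)
    return effect, variance


def pool_random_effects(effects, variances):
    """DerSimonian-Laird pooling; returns (pooled, se, Q, tau^2)."""
    weights = [1 / v for v in variances]
    total_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total_w
    # Q-statistic: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = sqrt(1 / sum(re_weights))
    return pooled, se, q, tau2


# Hypothetical (tp, fn, fp, tn) tables for three studies.
studies = [(50, 2, 20, 180), (30, 1, 10, 90), (17, 1, 2, 40)]
pairs = [log_lr_pos(*s) for s in studies]
effects = [e for e, _ in pairs]
variances = [v for _, v in pairs]
pooled, se, q, tau2 = pool_random_effects(effects, variances)
pooled_lr = exp(pooled)
ci_low, ci_high = exp(pooled - 1.96 * se), exp(pooled + 1.96 * se)
print(f"pooled LR+ {pooled_lr:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
print(f"Q = {q:.2f} on {len(studies) - 1} df, tau^2 = {tau2:.3f}")
```

Pooling is done on the log scale because ratios are approximately normally distributed only after log transformation; the result is exponentiated back for reporting.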

Summary of methodologic and reporting quality of included studies
The methodologic and reporting quality of the three included studies was evaluated independently by two endocrinologists using a 25-item checklist developed by the STARD steering committee (Table 1) [112]. The kappa statistic for measuring agreement between the two reviewers in assessing the STARD items addressed in each study was 0.82 for the Mayo study, 0.65 for the NIH study, and 0.60 for the Vienna study.
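For readers unfamiliar with the kappa statistic, the following sketch shows how chance-corrected agreement between two raters can be computed. It assumes the unweighted Cohen's kappa and a yes/no judgement per checklist item; the ratings are hypothetical and do not correspond to any of the three studies.

```python
def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    # Agreement expected by chance from each rater's marginal frequencies.
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for a 25-item checklist (True = item addressed).
reviewer_1 = [True] * 15 + [False] * 10
reviewer_2 = [True] * 14 + [False] + [True] + [False] * 9
kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

In this example the reviewers agree on 23 of 25 items (92%), but because 52% agreement would be expected by chance, kappa is about 0.83.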
Specific threats to internal validity of the studies were appraised. In all of the studies, subjects who had signs, symptoms, or imaging characteristics that warranted testing were included (as opposed to asymptomatic controls). However, blinded adjudication of test interpretation and diagnoses was not performed in any of the studies. In terms of limitation of selection bias, consecutive patient recruitment was noted only in the Mayo study [18]. Data were collected prospectively in the NIH study and retrospectively in the Mayo study, and the method of data collection was unclear in the Vienna study [18,51,59].
In terms of limiting verification bias, only the NIH investigators stated that the results of plasma metanephrine measurements were not used in guiding further evaluation. In the Mayo and Vienna studies, all pheochromocytoma patients had histologic confirmation, whereas in the NIH study, either histologic confirmation or evidence of inoperable metastatic pheochromocytoma on imaging was deemed adequate for diagnosis. In subjects without pheochromocytoma, different criteria were used to define a negative diagnosis in all three studies: alternative diagnosis after subspecialty evaluation in the Mayo study, alternative adrenal histology in the Vienna study, and, in the NIH study, either lack of radiological evidence of a tumor on imaging or pathologic examination of a nonpheochromocytoma adrenal mass, or patient follow-up of 2 years or more. Thus, only in the Vienna study [59] was a histologic gold standard applied to all patients, regardless of disease status.
Overall, the fewest STARD methodologic and reporting criteria were addressed in the Vienna study, which had a case-control design [59]. The Vienna study was also the smallest, comparing 17 patients with pheochromocytoma to 14 subjects without pheochromocytoma, and showed the highest diagnostic accuracy (sensitivity and specificity each 100 percent). Of note, the dichotomous nature of case-control designs may overestimate the accuracy of diagnostic tests [113].

Diagnostic efficacy of measurements of fractionated plasma metanephrines in diagnosis of pheochromocytoma
The cut-off values for positivity as well as the conditions of measurement of fractionated plasma metanephrines were slightly different in the Mayo study compared to the NIH and Vienna studies. In the NIH study, the criterion for test positivity was a metanephrine fraction of 0.3 nmol/L and/or a normetanephrine fraction of 0.6 nmol/L, based on a laboratory reference range [51]; and the same criterion was used in the Vienna study [59]. In the Mayo study, the criterion for positivity was a metanephrine fraction of 0.5 nmol/L or a normetanephrine fraction of 0.9 nmol/L, based on a 95% reference range of Mayo Medical Laboratories [18]. Acetaminophen was generally avoided prior to measurements of plasma free metanephrines in all studies. Furthermore, subjects were supine for at least 20 minutes with an indwelling intravenous cannula in both the NIH and Vienna studies, but not the Mayo study.
Demographics of patients in the included studies were examined ( , as well as another 158 subjects (including 23 with pheochromocytoma) who were recruited between November 28, 2000 and November 9, 2001. The newly added 158 subjects were consecutive patients seen at the Mayo Clinic Rochester who did not have a known familial predisposition to pheochromocytoma, were not tested during the first series, and had complete measurements for fractionated plasma metanephrines as well as 24-hour urinary total metanephrines and catecholamines. In both series, patients without pheochromocytoma were screened in clinical practice for one or more of the following reasons: hypertension, spells (such as episodes of anxiety, sweating, palpitations, or headache), adrenal abnormality on imaging, previous history of surgically resected pheochromocytoma, or known familial predisposition to pheochromocytoma. Upon combining the published and unpublished data, there were a total of 56 patients with pheochromocytoma (39 of whom, or 70%, were truly sporadic, with no known genetic predisposition to pheochromocytoma and no previous history of pheochromocytoma) and 445 subjects without pheochromocytoma (399 of whom, or 90%, had no known genetic predisposition to pheochromocytoma) (Table 2).
The main difference in demographic characteristics between the included studies was that the majority of subjects without pheochromocytoma in the NIH study had a genetic predisposition to the disease, whereas the majority in the Vienna and Mayo studies did not (Table 2). Furthermore, the non-pheochromocytoma subjects in the Mayo study appeared older, with a mean age above 50 years (Table 2). Also, in the Vienna study, all subjects without pheochromocytoma had a known abnormality of the adrenal, whereas this was not the case in all patients in the NIH and Mayo studies.
The diagnostic efficacy of measurements of fractionated plasma metanephrines in detection of pheochromocytoma from the three included studies (including updated unpublished data in the Mayo study) is shown in Table 3 [18,51,59]. For all patients, the sensitivities ranged from 96% to 100% (95% CI ranged from 82% to 100%), whereas the specificities ranged from 85% to 100% (95% CI ranged from 78% to 100%). For subjects either at risk for or with sporadic pheochromocytoma, the sensitivities ranged from 97% to 100% (95% CI ranged from 79% to 100%), whereas the specificities ranged from 82% to 100% (95% CI ranged from 79% to 100%) (Table 3). Furthermore, for all patients, the positive likelihood ratios ranged from 6.31 to 29.17 and the negative LRs ranged from 0.02 to 0.03 (Table 3). The positive LRs for all patients with or at risk for sporadic pheochromocytoma (with cured patients who have had a previous diagnosis of pheochromocytoma excluded from the Mayo study) ranged from 6.07 to 29.00 and the negative LRs ranged from 0.031 to 0.03 (Table 3).
Upon pooling of the positive likelihood ratios for all patients (n = 287 with pheochromocytoma, n = 1103 without pheochromocytoma), significant heterogeneity was indicated by the chi-squared test (χ² = 8.20, degrees of freedom = 2, p = 0.017), suggesting that the studies may have differed in populations studied, assay technique, or reference standard (Figure 1). Although pooling of statistically heterogeneous data is of questionable value and should be considered exploratory, the pooled positive likelihood ratio was noted to be 7.86 (95% CI = 5.17, 11.94), which was significantly higher than 1 (z = 9.66, p < 0.001). The pooled estimate of negative likelihood ratios for all patients was 0.02 (95% CI = 0.01, 0.04; z = -8.60, p < 0.001 for the value being less than 1), with no evidence of statistical heterogeneity (χ² = 0.20, p = 0.91) (Figure 2). The funnel plots examining for publication bias were not interpretable, as they were limited by the very small number of studies included in the analyses.
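The heterogeneity p-values can be checked by hand, because a chi-squared distribution with 2 degrees of freedom has the closed-form upper-tail probability exp(-Q/2). The short sketch below applies this to the two Q values reported above; the function name is illustrative, and the formula is valid only for 2 degrees of freedom (three pooled studies).

```python
from math import exp


def chi2_upper_tail_df2(q):
    """Upper-tail probability of a chi-squared variable with exactly 2 df."""
    if q < 0:
        raise ValueError("Q-statistic cannot be negative")
    return exp(-q / 2)


p_positive_lr = chi2_upper_tail_df2(8.20)  # about 0.017
p_negative_lr = chi2_upper_tail_df2(0.20)  # about 0.90
print(f"positive LRs: p = {p_positive_lr:.3f}")
print(f"negative LRs: p = {p_negative_lr:.2f}")
```

The first value matches the reported p = 0.017. The second comes out near 0.90 rather than 0.91, a difference that is within what rounding of Q to two decimals can explain.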
Next, we determined the diagnostic efficacy of fractionated plasma metanephrine measurements in patients at risk for sporadic disease. We included 191 pheochromocytoma patients and 718 non-genetically predisposed non-pheochromocytoma patients and found that the pooled estimate of the positive likelihood ratio was 5.77 (95% CI = 4.90, 6.81; z = 20.85, p < 0.001 for the value being greater than 1), with no statistically significant evidence of heterogeneity between studies (χ² = 1.84, p = 0.4) (Figure 3). The pooled estimate of negative likelihood ratios for sporadic subjects was 0.02 (95% CI = 0.01, 0.07; z = -6.31, p < 0.001 for the value being less than 1), with no evidence of statistical heterogeneity (χ² = 1.08, p = 0.58) (Figure 4).
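As a consistency check on the reported z values, the standard error of a pooled log likelihood ratio can be approximately recovered from its 95% confidence interval, assuming the interval is symmetric on the log scale. The sketch below does this for the two pooled positive LRs; the function name is illustrative. The same back-calculation is unreliable for the pooled negative LRs, whose estimates and limits are rounded to a single significant digit.

```python
from math import log


def z_from_ratio_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate z for H0: ratio = 1, from a log-symmetric 95% CI."""
    se_log = (log(upper) - log(lower)) / (2 * z_crit)
    return log(estimate) / se_log


z_all = z_from_ratio_ci(7.86, 5.17, 11.94)      # reported z = 9.66
z_sporadic = z_from_ratio_ci(5.77, 4.90, 6.81)  # reported z = 20.85
print(f"all patients: z = {z_all:.2f}")
print(f"sporadic:     z = {z_sporadic:.2f}")
```

Both recovered values agree with the reported statistics to within rounding of the published estimates and limits.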

Discussion
Upon systematically reviewing the literature, we have determined that fractionated plasma metanephrine measurements are highly sensitive in detecting pheochromocytoma, although specificity of these measurements may be variable, particularly in testing for sporadic disease. A negative fractionated plasma metanephrine measurement is highly effective in ruling out disease. However, a positive test result only moderately increases suspicion of disease, particularly in low-risk subjects being tested for sporadic pheochromocytoma. Pooled likelihood ratios may be applied in estimation of an individual patient's probability of sporadic pheochromocytoma, given a positive biochemical test result. The pre-test probability of sporadic pheochromocytoma (prevalence) is estimated to be 0.5% among screened hypertensive patients [114], and 5.1% among incidentally discovered adrenal masses >1 cm in diameter in the absence of symptoms of adrenal disease (adrenal "incidentalomas") [3]. For a patient with positive fractionated plasma metanephrines, the post-test probability of sporadic pheochromocytoma would be 2.8% in the patient with hypertension, and 23.7% in the patient with an adrenal incidentaloma. In other words, 97.2% of hypertensive subjects and 76.3% of subjects with incidentaloma would not be expected to have a pheochromocytoma, in spite of a positive test result. Similarly, we may estimate the probability of sporadic pheochromocytoma, given negative fractionated plasma metanephrine measurements, using the pooled negative likelihood ratio value of 0.02. For a patient with normal fractionated plasma metanephrine measurements, the post-test probability of sporadic pheochromocytoma would be estimated to be 0.01% in the patient with hypertension and 0.11% in the patient with an adrenal incidentaloma.
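The post-test probabilities quoted above follow from converting the pre-test probability to odds, multiplying by the likelihood ratio, and converting back to a probability. A minimal sketch, using the prevalences and pooled likelihood ratios given in the text (the function and variable names are illustrative):

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pre-test probability via odds."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


PREVALENCE = {"hypertension": 0.005, "adrenal incidentaloma": 0.051}
POOLED_LR = {"positive": 5.77, "negative": 0.02}

results = {
    (setting, result): post_test_probability(prevalence, lr)
    for setting, prevalence in PREVALENCE.items()
    for result, lr in POOLED_LR.items()
}
for (setting, result), probability in results.items():
    print(f"{setting}, {result} test: {100 * probability:.2f}%")
```

The calculation makes the dependence on prevalence explicit: the same positive result moves a 0.5% pre-test probability to under 3%, but a 5.1% pre-test probability to nearly 24%.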
Figure 1. Likelihood ratios (LRs) of a positive fractionated plasma metanephrine measurement predicting pheochromocytoma in all patients (including sporadic and genetically-predisposed patients).
Our findings are limited by the fact that data from the included studies may have been subject to multiple methodologic limitations, possibly resulting in over-estimation of the diagnostic efficacy of fractionated plasma metanephrine measurements. Also, many of the patients studied had known genetic predisposition, previously surgically cured disease, or metastatic pheochromocytoma, thereby limiting the external generalizability of our summary. Furthermore, positivity cut-offs were derived somewhat differently between the studies, possibly accounting for the observed heterogeneity of positive likelihood ratios between studies. The criterion for positivity in the NIH and Vienna studies was based on an NIH laboratory reference range [51,59], whereas a higher criterion was used in the Mayo study, based on a 95% reference range derived by Mayo Medical Laboratories [18]. The Mayo reference range has been tested in hypertensive patients who were not subject to indwelling intravenous cannulation or prolonged supine rest, possibly accounting for the slightly higher cut-offs. Indeed, a tradition in laboratory medicine has been to derive normal ranges from "normal" healthy individuals, as such individuals reflect the general population and are easily accessible for study. Such ranges are reflective of "non-disease", but their use may be subject to excessively high rates of false positive tests in subjects with conditions mimicking the disease in question who are likely to be tested clinically (such as patients with refractory hypertension in the case of pheochromocytoma testing). Limitations of deriving "non-disease" ranges in subjects with conditions mimicking the disease in question (such as hypertensive patients in this case) may include decreased sensitivity of testing and the potential for missing a potentially fatal, treatable diagnosis.
Figure 2. Likelihood ratios (LRs) of a negative fractionated plasma metanephrine measurement predicting pheochromocytoma in all patients (including sporadic and genetically-predisposed patients).
Figure 3. Likelihood ratios (LRs) of a positive fractionated plasma metanephrine measurement predicting pheochromocytoma in patients with sporadic pheochromocytoma or at risk for sporadic pheochromocytoma.
It is notable that data on the efficacy of fractionated plasma metanephrine measurements in detection of pheochromocytoma are limited to only three laboratories, with patients recruited from 6 clinical centres. This may be a reflection of the labor-intensive, time-consuming nature of the high performance liquid chromatography and electrochemical detection method, as well as the nuisance of potential interference from acetaminophen [115]. A newer method described by Roden et al. may circumvent the acetaminophen interference issue, but is also quite labor-intensive and might not be suitable for widespread clinical laboratory use [104]. A method of measurement of fractionated plasma metanephrines using liquid chromatography with tandem mass spectrometry shows promise in terms of improved specificity and rapidity of processing of multiple samples [115]. Further clinical study is indicated to validate such newer assays in clinical patient populations.

Conclusions
Where does this evidence summary leave the physician who is faced with the common clinical scenario of a patient with refractory hypertension or an incidentally found adrenal mass? Firstly, the clinician must assess the relative likelihood of pheochromocytoma in each clinical case and decide whether testing is warranted. Decisions about the type of test performed may be subject to clinical availability, cost, and the clinical experience of the ordering physician and local laboratory. If measurement of fractionated plasma metanephrines is performed, a positive test result in a high-risk setting (such as a genetically predisposed individual or an individual with a known adrenal mass characteristic of pheochromocytoma) or a negative result in a low-risk setting (such as a patient with refractory hypertension) is highly predictive in confirming or refuting the diagnosis, respectively. However, a negative result in a high-risk setting (such as testing of a genetically predisposed patient or a patient with a known vascular adrenal mass), or a positive result in a low-risk setting (such as refractory hypertension), must be interpreted with some caution.

Competing interests
None declared.
Figure 4. Likelihood ratios (LRs) of a negative fractionated plasma metanephrine measurement predicting pheochromocytoma in patients with sporadic pheochromocytoma or at risk for sporadic pheochromocytoma.

Authors' contributions
AMS conceived of the study, developed the study protocol, reviewed the references, abstracted data, analyzed the data, and wrote the paper. LT, ML, LT, AG, APHP, and WFY participated in the design of the study, reviewed the manuscript, and advised on revisions to the manuscript. WFY participated in chart review of Mayo Clinic Rochester patients. LT participated in the data analysis. APHP reviewed the references and abstracted data.