Biological variability dominates and influences analytical variance in HPLC-ECD studies of the human plasma metabolome
© Shurubor et al. 2007
Received: 27 March 2007
Accepted: 12 November 2007
Published: 12 November 2007
Biomarker-based assessments of biological samples are widespread in clinical, pre-clinical, and epidemiological investigations. We previously developed serum metabolomic profiles assessed by HPLC-separations coupled with coulometric array detection that can accurately identify ad libitum fed and caloric-restricted rats. These profiles are being adapted for human epidemiology studies, given the importance of energy balance in human disease.
Human plasma samples were biochemically analyzed using HPLC separations coupled with coulometric electrode array detection.
We identified these markers/metabolites in human plasma, and then used them to determine which human samples represent blinded duplicates with 100% accuracy (N = 30 of 30). At least 47 of 61 metabolites tested were sufficiently stable for use even after 48 hours of exposure to shipping conditions. Stability of some metabolites differed between individuals (N = 10 at 0, 24, and 48 hours), suggesting the influence of some biological factors on parameters normally considered as analytical.
Overall analytical precision (mean median CV, ~9%) and total between-person variation (median CV, ~50–70%) appear well suited to enable use of metabolomics markers in human clinical trials and epidemiological studies, including studies of the effect of caloric intake and balance on long-term cancer risk.
After tobacco, over-nutrition is, arguably, the major cause of excess morbidity in developed countries, affecting a broad spectrum of diseases including cancer, cardio-/cerebrovascular disease, and type II diabetes. This association may be seen in both broad demographic groups, such as the American Cancer Society study group (900,000 U.S. adults) and in more narrowly defined demographic groups, such as the Nurses' Health Study (NHS) group (122,000 U.S. female registered nurses). The difficulty of accurately assessing caloric intake and energy expenditure has hampered studies relating to energy restriction, caloric balance, and caloric intake in both epidemiology and clinical nutrition. Several of the major hurdles in identifying biomarkers to address this and similar epidemiological problems are related to analytical (the lack of useful measurement standards) and methodological (the inability to distinguish individual physiology) issues [4–13]. Recent results have suggested the advantage of metabolomics approaches in clarifying these situations, at least for issues related to nutritional epidemiology [14, 15].
Metabolomics technology [16, 17] offers a promising new approach to identify biomarkers that characterize health and disease, including, as we have shown [18–21], caloric intake. The major advantage of metabolomic research in epidemiology and nutrition is that, at least in theory, metabolomics provides a snapshot view of a biological system and enables capture of information about both long- and short-term interactions of an organism and its environment, including nutrition. Thus, this approach provides us more complete information about the biochemical status or biochemical phenotype of organisms than many other possible approaches [22–25]. Position papers have remarked on the application of metabolomics to problems ranging from meat contamination to drug development and understanding mechanistic aspects of disease. Within the realm of nutrition, metabolomics has been used to probe specific dietary constituents [27, 28] and has been proposed as a key element in developing personalized medicine approaches [29–33] and in gaining insight into clinical and epidemiological questions [34, 35].
Our metabolomics approach to clinical and epidemiological questions is distinct from, and complementary to, both direct targeted analysis (e.g., studying a few metabolites in a single pathway) and global profiling. Specifically, we focused initially on an animal model that displays the physiological benefits associated with nutritionally replete, lower-energy diets [18–21, 36, 37]. We propose that this analysis will enable us to address statistical concerns about the complexity of uninformed analysis of human datasets, harness the power of well-characterized animal models, and conserve finite biological samples from prospective epidemiologic cohorts. This work was conducted with a long-range focus on the use of epidemiological resources, tools, and approaches to develop individual risk predictors for humans and improved biomarkers for use in pre-clinical, clinical, and epidemiological studies.
We have previously completed proof of principle studies showing that we can identify serum metabolites that differ between ad libitum (AL)-fed rats and rats undergoing caloric restriction (CR), confirmed these findings in an independent cohort, and generated expert systems/trained algorithms that can objectively identify these groups [19, 21]. We have further published a series of analytical reports related to the detailed methods of these studies, assessing the analytical variability and stability of the individual components in the plasma/sera metabolome [14, 15, 36, 38, 39]. Further characterization of these metabolomic serotypes in rats undergoing CR, including studies related to the duration and extent of restriction, is in progress (YS et al., in preparation). The next goal is to analyze the markers identified in these studies in humans. Before beginning these studies, however, we first need to confirm that our technological platform works with human plasma samples. We also must show that our overall platform, including collection and procurement methods, is robust within the constraints of prospective epidemiologic cohorts. In theory and in practice, the analytical variability of our measurements and the stability of the individual components of the plasma/sera metabolome could be assessed by simply determining the repeatability of the measurements of each marker, as we have described in detail in our previous papers [14, 15].
As we move from studies of rat sera to studies of human plasma, however, many of the potential sources of error become both qualitatively different and quantitatively more complex. Here we address two of these issues: (i) our ability to measure these analytes reproducibly in banked human plasma and (ii) the need to assess the stability of these markers under different, realistic, and "worst case" shipping conditions. We note that, for the purposes of this report, we define this "worst case" in the context of the specific samples we expect to test in the future, which are drawn from the NHS. These samples are handled exactly as are the samples in the current report, and no sample is used that has been held >48 hours in shipping conditions.
Two sets of plasma samples that were collected into sodium heparin-containing tubes for use as analytical controls were examined in this study. Approximately 75% of the blood samples were drawn at least 8 hours after the last meal. In our study, all analysis of samples was conducted in a blinded fashion. Set 1 comprised duplicate (split) samples from 12 women who were participants in the NHS and triplicates (splits) of two pooled plasma samples. The latter samples were pools made up of multiple units of fresh frozen plasma obtained from a local hospital and used routinely to evaluate laboratory reproducibility. Set 2 consisted of three (split) samples from each of 10 individuals as well as duplicates of two pooled plasma samples as described above. These 10 adults were healthy men and women recruited locally who responded to a flyer requesting volunteers to provide blood samples for pilot studies. Of the three blood samples from each individual in Set 2, the first was processed (see below) immediately after acquisition, the second was stored as whole blood in a refrigerator for 24 hours, and the third was stored for 48 hours; these latter conditions mimicked typical overnight shipping conditions in many cohort studies. Processing consisted of centrifugation of whole blood samples at 1530 g for 20 min at 4°C, after which the plasma was removed and aliquots placed into cryotubes. These cryotubes containing plasma samples were frozen and maintained in the vapor phase of a liquid nitrogen freezer at < -130°C. All nitrogen freezers were alarmed and monitored continuously. Samples were shipped from The Channing Laboratory to Burke Medical Research Institute by overnight courier on dry ice. Further details on this approach have been previously published.
IRB approval was obtained from Partners Human Research Committee and the IRB at Burke Medical Research Institute.
Metabolite extraction, separation, detection, and identification were conducted as previously described [14, 15, 36, 38, 40–45] (see also Additional File 1). Briefly, plasma samples were thawed to 0°C and distributed into new 1.5 mL microcentrifuge tubes (125 μL/tube). Then 500 μL of acetonitrile (An)/0.4% glacial acetic acid (HAc) at -20°C was added to each tube, after which the tubes were vortexed for 20 s and centrifuged for 15 min at 12000 g at -2°C. A 500 μL volume of supernatant was evaporated to dryness under vacuum in a CentriVap™ Concentrator (Labconco). The dry residue was dissolved in 100 μL of mobile phase A (see the references above) and placed in autosampler vials. Each set of samples was analyzed by HPLC within 2–3 days. The HPLC injection volume was 50 μL.
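The volumes in this protocol imply how much of the original plasma reaches the column. The following Python sketch works through that arithmetic. It is an illustration only and rests on assumptions not stated in the text: that the plasma and solvent volumes are additive, that the precipitated pellet has negligible volume, and that analytes are fully recovered on drying and re-dissolving. The function name and parameters are hypothetical.

```python
def plasma_equivalent_injected_ul(
    plasma_ul=125.0,         # plasma aliquot per tube
    solvent_ul=500.0,        # acetonitrile/acetic acid added
    supernatant_ul=500.0,    # supernatant taken for drying
    reconstitution_ul=100.0, # mobile phase A used to re-dissolve
    injection_ul=50.0,       # HPLC injection volume
):
    """Return the plasma volume (in uL) represented by one injection.

    Assumes additive volumes, negligible pellet volume, and no analyte loss.
    """
    total_ul = plasma_ul + solvent_ul
    # Fraction of the extract that is carried forward to the drying step.
    fraction_taken = supernatant_ul / total_ul
    # Fraction of the reconstituted sample that is injected.
    fraction_injected = injection_ul / reconstitution_ul
    return plasma_ul * fraction_taken * fraction_injected


print(plasma_equivalent_injected_ul())
```

Under these assumptions, 500 μL of a 625 μL extract carries 80% of the 125 μL aliquot (100 μL of plasma), and injecting half of the reconstituted volume corresponds to roughly 50 μL of plasma per injection.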
Chromatographic separation and electrochemical detection were performed using HPLC coupled with an electrochemical array detector (HPLC-ECD), as previously described [43, 45]. The gradient and mobile phase reagents have also been previously described [43, 45]. The reasons for the use of this protocol include integration of sample preparation and the mobile phases used. Notably, the use of pentane sulfonic acid in mobile phase A solubilizes any protein fragments that may be extracted into the acetonitrile. The subsequent use of the B mobile phase, containing virtually all organic solvents, washes the column of any lipid materials that are extracted into the acetonitrile. This was discussed in reports by Milbury and Yao.
The gradient essentially displays an increase in hydrophobicity from that of ascorbate to that of tocopherol. Detection of metabolites was accomplished with a 16-channel coulometric array detector with potentials incremented in 60 mV steps (0–900 mV). All HPLC-ECD system functions were controlled by CoulArray software; biomarkers were identified and quantitated using CEAS-5.12 software. The metabolite concentration in individual human plasma samples was assessed and reported relative to that of metabolites in the "model" pool, in which the concentration of all markers was set at 100. The metabolic profile of the human pool studied in this report includes up to 66 markers, metabolites that represent a subset of the ~90 metabolites that we previously identified in sera of rats fed either AL or calorie restricted diets [14, 18, 19, 21].
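Two details of this paragraph can be illustrated with a short Python sketch: the 16 detector potentials implied by 60 mV steps over 0–900 mV, and the reporting of each metabolite relative to the pool, whose value is set at 100. The function names, the metabolite name, and the numeric responses below are hypothetical; this is not the CoulArray or CEAS software, only a sketch of the stated convention.

```python
def channel_potentials_mv(n_channels=16, step_mv=60):
    """Potentials applied across the array, starting at 0 mV."""
    return [i * step_mv for i in range(n_channels)]


def relative_to_pool(sample_response, pool_response):
    """Express each metabolite as a percentage of its pool value (pool = 100).

    Metabolites missing from the pool, or with a non-positive pool value,
    are skipped rather than guessed.
    """
    return {
        name: 100.0 * value / pool_response[name]
        for name, value in sample_response.items()
        if name in pool_response and pool_response[name] > 0
    }


potentials = channel_potentials_mv()
print(potentials[0], potentials[-1], len(potentials))

# Hypothetical detector responses for one metabolite (arbitrary units).
print(relative_to_pool({"tyrosine": 50.0}, {"tyrosine": 200.0}))
```

With 16 channels and 60 mV steps the last channel sits at 900 mV, which matches the range given in the text.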
Most of the metabolites studied here are identified by virtue of their position in the array (retention time) and their relative reactivity across the array (dominant and subdominant channel). Examples of metabolites that can be assessed via CoulArray-based technology include some amino acids (e.g., tyrosine, tryptophan, cysteine, methionine), the majority of the tryptophan and tyrosine catabolic pathways, indoles, purines, antioxidants (e.g., ascorbate, tocopherol, glutathione, lipoate, dihydrolipoate) and redox damage products (e.g., 8-OH-deoxyguanosine, glutathione disulfide).
Pearson correlation matrices were calculated in NCSS 97 with pairwise deletion for missing data. Means and errors, data simulations, and chi-squared analysis were conducted/determined in Microsoft Excel 2002. Paired t-tests were conducted in Statview. Principal components analysis was conducted using SIMCA P10.5 (Umetrics, Kinnelon, NJ). A single metabolite value, present at apparently 100-fold higher levels than any cognate metabolite in different samples, was excluded as an outlier.
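As an illustration of what pairwise deletion means for a correlation between two metabolites, the Python sketch below drops only those samples in which either value is missing, then computes the Pearson coefficient on what remains. It is not the NCSS implementation; the function name and the use of `None` to mark missing data are assumptions made for the example.

```python
import math


def pearson_pairwise(x, y):
    """Pearson r using only positions where both values are present.

    Missing values are represented by None. Returns None when fewer than
    two complete pairs remain or when either variable has zero variance.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values: the third sample lacks a measurement for x,
# so only three complete pairs contribute.
print(pearson_pairwise([1.0, 2.0, None, 4.0], [2.0, 4.0, 6.0, 8.0]))
```

Because deletion is done per pair of variables, different cells of a correlation matrix may be based on different numbers of samples.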
Our previous analytical validation studies focused on identifying the sources of potential analytical error in our analyses, including methods of sample acquisition and preparation, handling, transportation, and storage, as well as the influence(s) of total series size, complexity of the organic matrix, and aspects of experimental design [8, 15, 47, 48]. As noted above, we now continue these initial validation studies by examining both the reproducibility of the analytical platform when it is used to study human plasma and the stability of sample metabolites under simulated shipping conditions. Variability in sample acquisition (including variable stability under acquisition conditions) is fundamentally indistinguishable in our study from biological variability and therefore will be considered in a subsequent study.
The reproducibility (precision) of metabolite measurements was addressed by blinded analysis of split samples. In the framework of a reproducibility study, aspects of the analytical platform such as the delivery of a sample to the analytical laboratory and the completion of sample analysis (e.g., sample processing), chromatographic separation and electrochemical detection, and peak identification and quantitation were considered.
To test our ability to analyze human plasma, we received 30 blinded samples, each of which was present in duplicate or triplicate (as a further blind, the laboratory was told that only duplicates were present). At the initial step, we looked for previously identified CR markers in the samples and identified 66 metabolites that clearly were present in serum/plasma from both species and were analytically suitable without further optimization.
Of the 23 remaining markers in our standard rat profile, 12 were not present in the plasma sample (expected, as some metabolites are only found in male rats), and 11 represented unclear assignments and were not studied further in this report.
Table 1 (cell values are not recoverable from this copy). Columns: before polishing data; after polishing data. Rows: range of median CVs for 66 variables; range of mean CVs for 66 variables; median CV for all 66 variables; mean CV for all 66 variables; overall median CV by 13 pairs.
Fewer than 4% of the data points were found to have peak matching and quantitation errors of >10% (see legend to Table 1). The comparisons were made across the dataset and between the pairs. Both mean and median values of the measurements are reported to stress that analysis of the majority of analytes had very good quantitative reproducibility. This finding supports the contention that most of the cross-species markers can be measured with sufficient analytical accuracy for use in additional studies. Within-subject quantitative reproducibility of the measurement of metabolite concentrations in human plasma was comparable with data obtained for rat sera (rat sera: mean CV of ~12%, median CV of ~7%; human plasma: mean CV of 17–19%, median CV of ~12%, see Table 1) [14, 15].
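The reason for reporting both mean and median CVs can be seen in a small Python sketch. The text does not spell out the exact CV formula used for split samples, so the sketch assumes the usual definition (sample standard deviation divided by the mean, times 100); the function name and all numbers are hypothetical.

```python
import statistics


def cv_percent(replicates):
    """Coefficient of variation (%) of one set of replicate measurements.

    Uses the sample standard deviation (n - 1 denominator).
    """
    mean = statistics.mean(replicates)
    return 100.0 * statistics.stdev(replicates) / mean


# One hypothetical duplicate pair, on the pool-relative scale (pool = 100).
print(round(cv_percent([90.0, 110.0]), 2))

# Hypothetical per-metabolite CVs: most are small, one is poorly reproducible.
cvs = [5.0, 7.0, 9.0, 60.0]
print(statistics.mean(cvs), statistics.median(cvs))
```

In the hypothetical list, a single poorly reproducible metabolite pulls the mean CV to about 20% while the median stays at 8%. This is the pattern the mean and median figures in the text are meant to convey: most analytes are measured well, and a minority inflate the mean.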
In contrast to analysis of samples in laboratory-based research, analysis of samples in population-based studies is complicated by the need to transfer specimens from the field to a central location, where they are then processed and stored. In epidemiology studies, this requirement often means that whole blood samples cannot be frozen prior to arrival at the central location and subsequent processing. In practice, this constraint imparts a delay, generally approximately 24 hours but potentially as great as 48 hours, between the time of collection and the time of analysis or freezing for long-term storage. It is therefore essential to confirm that these delays do not destroy or severely degrade the analytes of interest. We used a testing procedure developed within the NHS to address this issue.
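Table: Descriptive statistics of total biomarker levels in human plasma. Column groups: 1st set of data (n = 14)*; 2nd set of data (n = 10)**. Row: Mean ± SD. Values as extracted (their column assignment and the footnotes marked * and ** are not recoverable from this copy): 85.1 ± 75.1; 96.2 ± 83.4; 97.6 ± 77.6.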
To assess the statistical validity of this observation of apparent inter-personal differences in compound stability, we compared all metabolite changes by paired t-test (10 triplets yield 45 pairwise comparisons between individuals [10*9/2] at each time point, 24 and 48 hours). Twenty of 45 comparisons had p < 0.05 at 24 hours, and 22 of 45 had p < 0.05 at 48 hours. To assess the likelihood of this result occurring by chance, we modeled changes in metabolite levels assuming that all changes were random. Of 100 comparisons in this simulation, 6 had p values < 0.05 (consistent with expected results from probability alone). Chi-squared analysis of the comparison resulted in p values of < 10⁻¹⁰.
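The counting and the chi-squared comparison can be sketched in Python. The text does not state exactly how the chi-squared test was set up, so the sketch assumes one plausible reading: a one-degree-of-freedom goodness-of-fit test of the observed number of significant comparisons against the rate seen in the random simulation (6 of 100). The function names are hypothetical, and this is not presented as the authors' calculation.

```python
import math


def n_pairwise_comparisons(n_subjects):
    """Number of distinct pairs of individuals, n * (n - 1) / 2."""
    return math.comb(n_subjects, 2)


def chi2_goodness_of_fit_1df(observed_hits, n, expected_rate):
    """Chi-squared statistic and p value for hits vs. misses (1 df).

    Expected counts come from expected_rate; for 1 df the upper-tail
    probability is erfc(sqrt(chi2 / 2)).
    """
    expected_hits = n * expected_rate
    expected_misses = n - expected_hits
    observed_misses = n - observed_hits
    chi2 = ((observed_hits - expected_hits) ** 2 / expected_hits
            + (observed_misses - expected_misses) ** 2 / expected_misses)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


n_pairs = n_pairwise_comparisons(10)
# 20 of 45 comparisons significant at 24 hours; simulated chance rate 6/100.
chi2, p = chi2_goodness_of_fit_1df(20, n_pairs, 6 / 100)
print(n_pairs, round(chi2, 1), p < 1e-10)
```

Under this reading, the expected number of significant comparisons by chance is under 3 of 45, so observing 20 gives a very large statistic and a p value far below 10⁻¹⁰. A two-sample comparison of 20/45 against 6/100 would be a different test and would give a different p value.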
The central finding of this paper is that the metabolites that we have previously used to distinguish caloric intake in rats can be analyzed in human plasma with good analytical precision (median CV of 9–12%) and have high inter-sample variability (median CV of 50–70%). This combination, similar to results obtained from rat studies, suggests that these markers are analytically suited for use in studies of the serum metabolome in human epidemiologic cohorts and multi-center clinical trials. [Note: Analytical CVs are slightly higher in the human samples. This slightly increased variability in human plasma versus rat sera might relate to the procedure of human plasma preparation, which includes the addition of anticoagulants to the blood. Analysis of human plasma as compared with rat sera was associated with a more rapid contamination of guard and analytical columns and greater wearing of the electrodes, suggesting that even the acetonitrile-purified sample retained some contaminants, which might also degrade performance.]
Two important caveats follow from our experimental design: (i) the estimate of total inter-sample variability includes both the analytical variability and the biological variability, although the study's overall analytical precision suggests that the variability is primarily biological in origin; and (ii) we cannot distinguish the components of biological variability that derive from sample-to-sample within-person variability versus long-term between-person variability – the latter of which is critical for our planned investigations. Work on this latter question, the relative biological variability between different people as compared to the variability within a temporal series of samples from a given person, is the next logical step.
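Caveat (i) can be made concrete with a simple variance decomposition. The Python sketch below assumes that analytical and biological variation are independent and additive on the variance scale, and that both CVs refer to the same mean. The independence assumption is a simplification (the stability results in this report suggest it does not hold strictly for every metabolite), so the numbers are indicative only. The function name is hypothetical; the inputs are the median CV ranges quoted above.

```python
import math


def biological_cv(total_cv, analytical_cv):
    """Biological CV implied by total and analytical CVs (all in %).

    Assumes total variance = analytical variance + biological variance.
    """
    if analytical_cv > total_cv:
        raise ValueError("analytical CV cannot exceed total CV")
    return math.sqrt(total_cv ** 2 - analytical_cv ** 2)


# Least favourable combination of the quoted ranges: total 50%, analytical 12%.
print(round(biological_cv(50.0, 12.0), 1))
# Fraction of the total variance that is analytical in that case.
print(round(12.0 ** 2 / 50.0 ** 2, 3))
```

Even in the least favourable combination of the quoted ranges, the analytical component accounts for under 6% of the total variance, which is consistent with the statement that the variability is primarily biological in origin.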
In general, biomarker validation studies require demonstrating validity in three broadly defined stages, in which the following concerns are addressed: (i) analytical issues [8, 47, 48]; (ii) inter- vs intra-personal biological variation; and (iii) utility (i.e., the correspondence of a certain biomarker profile with a phenotype of interest) [4, 7, 49–57].
A critical inherent assumption in most or all biomarker studies is that, from an analytical/mathematical standpoint, stage (i) must precede stage (ii) and that stage (i) is essentially independent of stage (ii). In part, this logical construction simply states that we must be able to measure the concentration of an analyte, and understand the limits of that measurement, before we can usefully examine differences in that analyte between two or more conditions of interest. The above logical construction further implies that our ability to measure a given analyte and the basic analytical properties of that analyte are expected to be unaffected by its source – that is, the person from whom the sample is derived (e.g., the accuracy of measuring the sodium concentration in a blood sample is expected to be equivalent in identically-treated samples from different people).
Our results appear to provide empirical evidence supporting a noteworthy exception to this logical but ultimately theoretical argument. These data, and the interpretation of these data, are dependent on the extent to which sample handling in our experiment was sufficiently controlled to enable other influences to be excluded. In support of the idea that we met these conditions, the differences observed are primarily in the 48-hour samples, whereas we would expect random distribution for most possible analytical problems (e.g., sample handling). Further support for our ability to generally fulfill the goal of appropriate sample handling is provided by evidence of high correlations between the levels of corresponding metabolites in paired samples (see Figures 1 and 3). Nonetheless, although we tried to treat all samples equally, it is impossible to exclude the possibility that there was some unrecognized difference in handling that contributed to the observed individual degradation patterns. Given this caveat, however, our results provide evidence that, especially at 48 hours, but even, to a lesser extent, at 24 hours, individual differences in bio- or chemo-transformation of metabolites (i.e., differences in metabolite stability) exist at a measurable level. At 48 hours, these differences are sufficient to enable ready classification by time in simulated shipping conditions, suggesting that avoiding 48-hour delays in initial sample processing is strongly desired. Because relatively fast processing is not always possible, the development of new methods for recognizing excessive transformation/degradation of metabolites would be helpful, allowing "for cause" exclusion of outlying samples if necessary.
Attempts to distinguish the existence of different groups or classes of individuals with respect to their metabolomic transformation appeared suggestive, but were statistically borderline with respect to overfit diagnostics and are not shown. We have no direct evidence as to the mechanism of metabolite transformation, and can only suggest that the interplay between genetics and environment and between enzymatic and non-enzymatic mechanisms might be involved in the variability of biomarker degradation. These data suggest that, for the case of plasma metabolomics analysis relevant for epidemiological studies, the general assumption that biological and analytical variation are independent must be viewed with caution, as there appear to be some individual-specific, metabolite-specific interactions. For studies such as ours, these concerns, if they occur in significant numbers, would show up as loss of signal and increase in noise, with a consequent reduction in the signal:noise ratio. From what we have seen, this issue is not a major concern in our study. The recognition and/or understanding of such changes might, however, be particularly important if one attempts to bring a quasi-mechanistic systems biology approach to deriving models for study in epidemiological cohorts. Consider, for example, a disease hypothesized to result from a failure of homeostatic feedback among the compartments of the genome, transcriptome, proteome, and metabolome. A metabolomic model of this disease could be built based on animal studies in which sample handling is (or, at least, can be) rigorously defined, but would be difficult to address in humans due to both biological and analytical noise. Understanding analytical noise is thus one step toward enabling study of mechanistic hypotheses in humans.
In conclusion, it is worth recalling that the development and validation of biomarkers of nutritional status for use in human studies has a long history. Despite this history, relatively few useful markers (other than direct intake markers such as carotenoids in blood, doubly labeled water, and urinary nitrogen) have been identified and are in widespread use for dietary status assessment [34, 58–60]. Attempts to identify biomarkers of direct dietary intake have been limited by many factors, including both analytical and biological issues [34, 61–68]. In this report, we present evidence that biomarker profiles reflecting two extremes of caloric intake in rodents can be adapted for use in humans. This profile is analytically stable at the level of both population and individual markers, with median analytical CVs < 20% of median biological CVs, even under the worst case shipping conditions and the inclusion of markers with lower analytical quality (defined here as stability). The surprising finding was that the stability of some markers clearly varied between individuals. This finding suggests that sources of variation normally considered as analytical can be influenced by biological parameters.
HPLC-ECD: HPLC coupled with an electrochemical array detector
NHS: Nurses' Health Study
This work was supported by NIH R01s AG15354 (BSK), CA102536 (BSK), AG025872 (BSK), and CA49449 (SH).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.