Laboratory Management
1 Department of Clinical Chemistry, Rikshospitalet University Hospital of Oslo, Oslo, Norway. 2 Department of Clinical Biochemistry, Odense University Hospital, Odense, Denmark. 3 NOKLUS, Norwegian Centre for External Quality Assurance of Primary Care Laboratories, Division for General Practice, University of Bergen, Bergen, Norway. 4 Department of Pathology, University of Virginia Health System, Charlottesville, VA. 5 Fürst Medical Laboratory, Oslo, Norway. 6 Section of Medical Statistics, University of Oslo, Oslo, Norway.
a Address correspondence to this author at: Department of Clinical Chemistry, Rikshospitalet University Hospital of Oslo, N-0027 Oslo, Norway. Fax 47-2307-1080; e-mail ari.lahti{at}rikshospitalet.no.
| Abstract |
|---|
Methods: We recently proposed partitioning criteria for gaussian distributions. These criteria relate to proportions of the subgroups outside each of the reference limits of the combined distribution (proportion criteria) and to distances between the subgroup distributions as correlates of these proportions (distance criteria). However, distance criteria do not seem to be ideal for nongaussian distributions because a generally valid relationship between proportions and distances cannot be established for these.
Results: Proportion criteria appear preferable to distance criteria for two additional reasons: (a) The prevalences of the subgroup populations may have a considerable effect on stratification, but these are hard to account for by using distance criteria. Two methods to handle prevalences are described, the root method and the multiplication method. (b) Tied reference values, another complication of the partitioning problem, could also be hard to take care of using distance criteria. Some solutions to the problems caused by tied reference values are suggested.
Conclusions: Partitioning of biochemical reference data should preferably be based on proportion criteria; this is particularly true for nongaussian distributions. Both of the described complications of the partitioning problem, the prevalences of the subgroups and tied reference values, are hard to deal with using distance criteria, but the proposed methods make it possible to account for them when proportion criteria are applied.
| Introduction |
|---|
The partitioning criteria presented by Harris and Boyd (3)(4) were developed using gaussian distributions, and although practitioners frequently seem to apply them to nongaussian distributions as well, their applicability to these is questionable. In contrast to what many users of these criteria seem to believe, the basic idea of the Harris-Boyd method is not to perform statistical significance tests for distances between means, although a modified normal deviate test is part of this method. Rather, Harris and Boyd aimed at correlating these distances to proportions of the subgroup distributions outside the reference limits of the combined distribution. This correlation, established for gaussian distributions by Harris and Boyd, is not automatically valid for nongaussian distributions, however, because the proportions obtained for nongaussian distributions at particular distances between means can be quite different from those obtained for gaussian distributions at the same distances. Conclusions on partitioning reached by applying the same criteria to nongaussian distributions can therefore be erroneous.
Because nongaussian distributions do not have a standard shape, generally valid mathematical expressions to correlate distances between them to proportions of the subgroup distributions outside the common reference limits cannot be derived. Hence, we focus here on measuring the proportions directly. We present numerical methods to solve the partitioning problem that are applicable to any kind of two distributions and that also account for the prevalences of the subgroups. We also discuss the importance of multiple or "tied" reference values and suggest solutions for this complication of the partitioning problem.
| Theoretical Considerations |
|---|
As was stated above, in general situations similar mathematical expressions, correlating distances and proportions to each other, cannot be derived for nongaussian distributions. Hence, their partitioning must be based on proportion criteria because the distance criteria are just more or less accurate correlates of these, whereas control of the proportions is the primary concern. Therefore, direct estimation of the proportions appears necessary when partitioning nongaussian distributions.
Because proportion criteria have previously not been expressed or used explicitly, we had to prepare a suggestion for such criteria (1). A natural starting point for our suggestion was that Harris and Boyd (3) considered proportions exceeding 4.0% to be high enough to imply partitioning. This value is not presented as an explicit partitioning criterion in their method, however, but because they used it to derive their criteria, that particular proportion has probably been widely used by clinical chemists as an implicit partitioning criterion, considering the fact that the Harris-Boyd method lies behind the NCCLS recommendations on partitioning (6). Although the correlation between distances and proportions seems to be rather poor in this method (1), it is hoped that the value of 4.0% does not lie far from an ideal proportion criterion for partitioning. Because the range of possible values for this criterion is limited by 5.0% from above in a standard situation (1)(2), every 0.1% matters when such an ideal proportion is specified.
In a subsequent publication, Harris and Boyd (4) were of the opinion that their original distance criterion might be too permissive of partitioning. This suggests that if the ideal proportion criterion deviates from the value of 4.0%, that value should be increased rather than decreased from the originally proposed one. We observed (1) that for gaussian distributions, 4.1% corresponded to an approximate critical distance of 0.75 s (standard deviations) for partitioning, an easy-to-remember distance criterion together with our suggestion of 0.25 s as a criterion for combining of the subgroups. These two arguments made us choose the value of 4.1% as our primary proportion criterion. This criterion is, at present, just a suggestion that needs to be tested extensively in practical work to evaluate its true properties as a partitioning criterion.
Instead of setting only one critical proportion, categorically implying partitioning, we proposed (1)(2) the following three-stage classification: In addition to the value of 4.1%, favoring calculation of subgroup-specific reference intervals if it was exceeded, we suggested that the value of 3.2% could be appropriate as a critical proportion for combining the subgroups, justifying use of common reference limits for both of them if it was not exceeded. These values concern the larger one of the subgroup proportions outside a common reference limit (pa in Fig. 1). Because the sum of the larger and the smaller proportion at each end of the distributions is equal to 5.0% in a standard situation (1)(2), these suggestions simultaneously implied that the critical partitioning and combining proportions for the other one of the subgroup distributions (pb in Fig. 1) should be 0.9% and 1.8%, respectively.
We recommended partitioning if any of the four proportions (two at the lower and two at the upper end of the distributions) outside the common reference limits was ≥4.1% or ≤0.9%, and combining if all of these proportions lay between 1.8% and 3.2%. Other types of outcomes were considered marginal, or nonconclusive if taken alone. We further recommended that considerations other than purely statistical ones, e.g., clinical judgment and data from the literature, should be involved in the decision-making on partitioning in these cases. The three-stage classification makes application of the method more flexible than if only two classes of proportions were in use, those implying partitioning and those implying combining of the subgroups. This flexibility is, in our opinion, necessary as long as one has only limited experience with the application of the proposed criteria to real-life partitioning problems.
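The three-stage classification can be sketched in a few lines of Python. This is only an illustration: the function name `classify_proportions` and its parameters are hypothetical, proportions are passed as fractions rather than percentages, and the treatment of values lying exactly on a threshold is an assumption.

```python
def classify_proportions(proportions, partition_hi=0.041, partition_lo=0.009,
                         combine_lo=0.018, combine_hi=0.032):
    """Classify the four subgroup proportions outside the common limits.

    `proportions` holds fractions (0.041 means 4.1%): two at the lower and
    two at the upper common reference limit. Boundary handling (>=, <=) is
    an assumption of this sketch.
    """
    # Any single extreme proportion is enough to favor partitioning.
    if any(p >= partition_hi or p <= partition_lo for p in proportions):
        return "partition"
    # Combining requires all four proportions to lie in the central band.
    if all(combine_lo <= p <= combine_hi for p in proportions):
        return "combine"
    # Everything else is marginal and calls for nonstatistical judgment.
    return "marginal"
```

A result of `"marginal"` corresponds to the outcomes for which clinical judgment and data from the literature are recommended as additional support.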
accounting for the prevalences of the subgroups
Importance of the prevalences for the partitioning problem.
To apply the proportion criteria to nongaussian distributions, one has first to identify their common reference limits. This is ordinarily done by pooling together the two subgroups and determining the common reference limits as the nonparametric 2.5 and 97.5 percentiles of these combined data.
However, to obtain correct conclusions on partitioning, the ratio between the numbers of reference values in the subgroups should reflect the ratio between the prevalences of the subgroup populations in the reference population (2). This is often not the case. In real-life reference interval studies, the ratios between the numbers of reference values in the subgroups can in fact be almost anything compared with the respective ratios between prevalences. Typically, one aims at collecting equal numbers of reference values in each subgroup (e.g., age group), ignoring and neglecting the underlying ratios of prevalences.
Samples used in reference interval studies should be representative of the subgroup populations from which they are extracted in every respect, including the proportional sizes, whenever decisions involving both subgroups, such as those on partitioning, need to be made. To illustrate how easy it is to draw false conclusions, we provide a concrete example from the ongoing Nordic Reference Interval Project (7)(8). This project is a joint reference interval project of the five Nordic countries (Denmark, Finland, Iceland, Norway, and Sweden) that aims at establishing common reference intervals for 25 selected clinical laboratory tests in these countries. The numbers of participating laboratories and recruited reference persons varied among the countries and did not reflect their population sizes. As an example, more data were obtained from Norway, which has a population of 4.5 million, than from Sweden, which has a population of 8.9 million. In making comparisons between countries and examining, for example, whether Norway and Sweden possibly need country-specific reference intervals for a specific biochemical marker, it is important to account for their population sizes in the calculations instead of simply putting together the collected data and making the conclusions on partitioning from these.
The distributions obtained from Norway and Sweden for serum calcium are shown schematically in Fig. 2. The empirical distributions were in reality not strictly gaussian, but we plotted them as gaussian to obtain smooth and illustrative curves. The two lowermost curves (Sweden_Project and Norway) depict the distribution for Sweden as scaled by 0.65 with respect to that for Norway, reflecting the ratio between the numbers of reference values obtained for serum calcium from these two countries (532 and 819, respectively). The lower common reference limit for these distributions would lead to the conclusion that Norway and Sweden should use country-specific reference intervals for serum calcium because a proportion of 4.2% was obtained for Sweden, whereas the critical proportion for partitioning was 4.1%. The two uppermost curves (Sweden_Population and Norway) depict the distribution for Sweden as scaled by 1.98 with respect to that for Norway, reflecting the ratio between the population sizes of these countries. Using these distributions in the calculations, we obtained a proportion of only 3.5% for Sweden. The conclusion in this case would be that Norway and Sweden should be able to use the same reference interval for serum calcium because there are few nonstatistical reasons that would require the opposite conclusion to be made (the value of 3.5% is "marginal" as interpreted in terms of our three-stage classification of the proportions, and our recommendation was that one should at such marginal proportions consider nonstatistical arguments as a support for decision-making).
In this example we found a substantially smaller proportion (a decrease of 0.7% is substantial indeed on a scale from 2.5% to 5.0%) for Sweden if the population sizes, i.e., the "prevalences" of Swedes and Norwegians, are taken into account, because the weight of the distribution of Sweden compared with that of Norway is much larger than when the prevalences are not considered. The larger weight of the distribution of Sweden displaces the common reference limit toward the reference limit of that distribution, leading to the decrease in the proportion. It should be noted that when the weight of a subgroup is changed, only the common reference limits are changed; the reference limits of the subgroups remain unchanged.
Numerical methods to calculate the proportions of the subgroup distributions outside the common reference limits when the prevalences are accounted for.
From the example presented above, it is clear that there is a danger of drawing incorrect conclusions on partitioning if no adjustments are made to account for inconsistency between the ratio of the numbers of reference values in the subgroups (Nr) and the ratio between their prevalences (Fr). The simplest way to obtain common reference limits that are correctly adjusted to the prevalences would be to make these two ratios equal by eliminating an appropriate number of reference values from one of the subgroups. A solution that involves such a waste of original data would obviously not be ideal, however. In the following paragraphs, two numerical methods are proposed that make use of all the data available. One of these, the "root method", requires no manipulation of the original data, whereas the other one, the "multiplication method", is based on multiplying the subgroup distributions by appropriate factors.
Root method. Assume that the reference values are unique and consider first the case with Fr being equal to 1.0. The sum of the proportions pa and pb (Fig. 1) will then be equal to 5.0%, or 0.05 (2). We denote as $x^{i}_{p_i}$ a reference value of distribution i corresponding to proportion pi. Reference values, their ranks in the sorted subgroups, and the proportions calculated using these ranks can be treated as continuous real number variables if linear interpolation is applied between the sorted data points. The traditional rank-based method to calculate proportions, recommended by both the IFCC (5) and the NCCLS (6), is as follows:
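$$p_i = \frac{r_i}{n_i + 1} \qquad (1)$$
where $r_i$ is the rank of the reference value in the sorted subgroup i and $n_i$ is the number of reference values in that subgroup.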
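A minimal sketch of this rank-based calculation for the lower end of a distribution follows. It assumes the proportion is the interpolated rank divided by n + 1, which is our reading of Eq. 1 and agrees with the later statement that the first of 120 reference values corresponds to about 0.8%. The function name `proportion_below` is hypothetical, and unique reference values are assumed.

```python
import numpy as np

def proportion_below(values, x):
    """Rank-based proportion of `values` at or below `x` (lower end).

    Assumes unique reference values. Ranks are interpolated linearly
    between sorted data points and divided by n + 1.
    """
    v = np.sort(np.asarray(values, dtype=float))
    ranks = np.arange(1, v.size + 1)
    # np.interp clamps to rank 1 below the first value and rank n above the last.
    return np.interp(x, v, ranks) / (v.size + 1)
```

For the upper end of a distribution, the corresponding proportion would be one minus this value.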
To determine the proportions of two subgroups outside a common reference limit, one should scan the reference values of the sorted combined data vector, interpolating between data points to achieve a desired precision, until a reference value is found that gives values for the proportions such that they sum to 0.05:
![]() | (2) |
![]() | (3) |
$$\Delta_1(p_a, p_b) = p_a + p_b - 0.05 \qquad (4)$$
The required reference value is the one that makes the difference Δ1(pa, pb) in Eq. 4 equal to zero. As a function of two independent variables, however, Δ1(pa, pb) describes a plane and has a zero line rather than a single zero point as its solution.
This situation is illustrated in Fig. 3. The intersection line between the plane Δ1(pa, pb) and the zero plane Δ0(pa, pb) is this zero line, which is not the solution for a specific pair of subgroup distributions but rather gives the solutions for all such pairs. To find the unique solution for a specific pair on the zero line, one should consider the individual path corresponding to it on the plane Δ1(pa, pb); the black curve on this plane in Fig. 3 shows an example of such a path. All of these paths start at pa = 0.025 and end at pb = 0.025, whereas the value of the other proportion at both ends can vary between cases, and depict an irregular curve between these endpoints depending on the shapes of the subgroup distributions. Each path describes a strictly increasing function of both pa and pb because the gradient of the plane Δ1(pa, pb) is (1, 1), and it can also be shown that between the endpoints there exists a zero point (the black circle at the intersection of the path and the zero plane in Fig. 3). In conclusion, there is a unique solution for the equation Δ1(pa, pb) = 0 between the reference limits of the subgroup distributions. A slightly more formal presentation of these considerations is given in the Appendix, which is published in an electronic supplement to this report (http://www.clinchem.org/content/vol50/issue5/). One important condition for this conclusion is, however, that the reference values are unique. This condition is discussed below.
Hence, to find the unique solution for Δ1(pa, pb) = 0, one can restrict the search to the range of reference values lying between the reference limits of the subgroup distributions. Because this range is usually not very large, sophisticated root-finding methods are not necessary. For example, one could scan the reference values of distribution a until two consecutive reference values are found that give opposite signs for Δ1(pa, pb) (Fig. 4). The interspace between such reference values could subsequently be scanned to a desired precision by interpolation.
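The scan just described can be sketched as follows for the lower common reference limit with equal prevalences. This is an illustration under stated assumptions, not the authors' implementation: the names `proportion_below` and `lower_common_limit` are hypothetical, proportions are taken as rank/(n + 1), reference values are assumed unique, and the sketch scans the sorted values of both subgroups rather than only those of distribution a. Because the interpolated proportions are linear between consecutive data points, a single linear interpolation locates the zero once the sign change is found.

```python
import numpy as np

def proportion_below(sorted_vals, x):
    # Interpolated rank divided by n + 1 (clamped outside the data range).
    ranks = np.arange(1, sorted_vals.size + 1)
    return np.interp(x, sorted_vals, ranks) / (sorted_vals.size + 1)

def lower_common_limit(a, b, target=0.05):
    """Return (limit, pa, pb) such that pa + pb equals `target`."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.union1d(a, b)  # sorted reference values of both subgroups
    diff = proportion_below(a, grid) + proportion_below(b, grid) - target
    k = int(np.argmax(diff >= 0))  # first value at or beyond the zero
    if k == 0:
        # Either no sign change exists or it occurs before the first value.
        raise ValueError("no sign change inside the data range")
    x0, x1, d0, d1 = grid[k - 1], grid[k], diff[k - 1], diff[k]
    # The difference is linear between x0 and x1, so interpolate to its zero.
    x = x0 - d0 * (x1 - x0) / (d1 - d0)
    return x, proportion_below(a, x), proportion_below(b, x)
```

The `ValueError` branch corresponds to a common reference limit lying outside the endpoint of one subgroup, which a rank-based scan cannot resolve without extrapolation.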
The method just described is readily extendable to the more general cases with unequal prevalences. In such cases the sum of the proportions pa and pb will deviate from 5.0%, depending on the value of Fr (2). If Fr is calculated separately for the lower and the upper end of the subgroup distributions, the proportions and the common reference limit for each end can be determined by solving the following equation in exactly the same way that Eq. 4 was solved above:
![]() | (5) |
Multiplication method. A more straightforward method to take account of the prevalences would be multiplication of the distributions by appropriate factors to make Nr and Fr equal to each other. To express the relationship between these two ratios, we use the following parameter:
![]() | (6) |
![]() | (7) |
To illustrate this, assume that Fr_eff is equal to 1.27. Unity would undoubtedly be too rough as an approximation for 1.27. Rounding to one decimal, 1.3 is obtained, but this value is useless as a multiplier if we prefer to operate with integer numbers of copies for each reference value. To obtain such numbers in the multiplied distribution i, 1.3 could be multiplied further by 10 to make it 13. If the factor of distribution i is multiplied by such an additional factor, however, distribution j must also be multiplied by 10 to keep the (adjusted) ratio between the numbers ni and nj unchanged. Accordingly, distribution i could be multiplied by 13 and distribution j by 10 in this example. If the sizes of the original subgroups were of the order of, e.g., 1000, virtual data vectors having a size of the order of 10 000 would be obtained. Rounding to one decimal could also approximate too much, and even larger virtual data vectors may need to be used in the calculations. Hence, implementations of the multiplication method that would treat the data vectors as concrete objects, e.g., on a spreadsheet, are not feasible. Once the common reference limits have been established from the (virtual) multiplied data vectors, prevalence-adjusted values for the proportions can be readily obtained by applying those limits to the original subgroups.
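The multiplication step can be sketched as below. The helper names `integer_multipliers` and `multiplied_common_limit` are hypothetical, and the percentile is taken at rank q(N + 1) with linear interpolation, which is an assumption about the rank-based rule. For clarity the sketch materializes the multiplied vectors with `np.repeat`, which is practical only for moderate multipliers; a weighted-rank formulation would avoid building them.

```python
import numpy as np

def integer_multipliers(ratio, decimals=1):
    """Turn a ratio such as 1.27 into integer copy counts, e.g. (13, 10)."""
    scale = 10 ** decimals
    return int(round(ratio * scale)), scale

def multiplied_common_limit(a, b, mult_a, mult_b, q=0.025):
    """Lower common limit from subgroups repeated mult_a and mult_b times."""
    virtual = np.sort(np.concatenate([
        np.repeat(np.asarray(a, dtype=float), mult_a),
        np.repeat(np.asarray(b, dtype=float), mult_b),
    ]))
    ranks = np.arange(1, virtual.size + 1)
    # Value at the interpolated rank q * (N + 1) of the multiplied vector.
    return float(np.interp(q * (virtual.size + 1), ranks, virtual))
```

The limit returned here would then be applied to the original, unmultiplied subgroups to read off the prevalence-adjusted proportions.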
coping with tied reference values
It was shown above that the proportions pa and pb obtained at the common reference limit are unique, but this conclusion was said to be valid only under the assumption that the reference values also are unique. We examine next what happens if this assumption is not true. Consider once more the path depicted on the plane Δ1(pa, pb) in Fig. 3. The difference given in Eq. 4, described by this path, is strictly increasing with respect to both pa and pb because if the reference values are unique, their ranks in both subgroups increase constantly as one scans the range of the sorted reference values lying between the reference limits of the subgroups; the proportions derived from these ranks by use of expressions such as that given in Eq. 1 also increase constantly. Consider what would happen if there were several copies of a reference value, or tied reference values, in one of the subgroups, such as subgroup b. While scanning through such an array of tied reference values, the rank and the proportion would be increased for subgroup b but would remain unchanged for subgroup a because there were no intervening reference values in the data vector of that subgroup. This is illustrated by line segment 1, running parallel to the axis of proportion pb, of the path plotted in Fig. 5A.
Line segment 2 in Fig. 5A, running parallel to the axis of proportion pa, represents a similar array of tied reference values in subgroup a. If both subgroups have copies of a specific reference value, the path would have line segments running parallel to both axes simultaneously (boxes 3 and 4 in Fig. 5A). If such a box appears in the area where the path crosses the zero line, as is the case with box 4, the solution of the partitioning problem would not be unique, but rather a choice would have to be made between all of the proportions that lie within that box. This seems to introduce a considerable additional component of uncertainty to the values of pa and pb at the common reference limit and thereby to the conclusion on partitioning, as compared with the case of unique reference values illustrated in Fig. 3.
Another view of the situation described in Fig. 5A is shown in Fig. 5B, which shows the lower ends of distributions a and b and that of the combined distribution in detail, illustrating how the path in Fig. 5A could have been generated from real reference data. The numbers 1-4 in panels A and B correspond roughly to each other. If the common reference limit is localized within an array of tied reference values, as shown by the arrow in Fig. 5B, and both subgroups have copies of that reference value, there is apparently no way to obtain unique values for the proportions pa and pb.
The conventional method in nonparametric statistics to treat tied observations is to set the rank of each of them to the average of the ranks that these observations would have if they were not tied observations. This method, illustrated by the midpoint (P1) of box 4 (Fig. 5A), is not ideal in our case. Consider an example with two similar subgroups, both having a large number of copies of the same reference value representing, e.g., 10% of the data, at the beginning of the distribution. If the average of the ranks of these tied values were in such a special situation used to define the proportions of the subgroups outside the lower common reference limit, one would obtain the proportion of 5% for each subgroup. These proportions would imply partitioning, but such a conclusion from the lower ends of those distributions would obviously be wrong if the lower ends are in reality identical. To handle situations like this, the proportions could be determined by dividing the arrays of tied reference values in the subgroups in the same ratio as the common reference limit divides that array in the combined distribution. Such a solution, depicted as point P2 in box 4 (Fig. 5A), would in the example just considered lead to a proportion of 2.5% for both subgroups, which would correspond much better to our intuition of not partitioning them.
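The 10% example can be checked numerically with a small sketch. The function `tied_block_proportions` is hypothetical and deliberately narrow: it assumes the tied block starts at rank 1 in both the subgroup and the combined data, that the common limit falls inside the block, and that proportions are rank/(n + 1). It returns the midrank proportion (P1) and the proportion obtained by splitting the subgroup's block in the same ratio as the combined block (P2).

```python
def tied_block_proportions(n_sub, ties_sub, n_comb, ties_comb, q=0.025):
    """Lower-end proportions P1 and P2 for a subgroup with leading ties.

    n_sub, ties_sub   : subgroup size and length of its tied block
    n_comb, ties_comb : the same quantities for the combined data
    """
    # P1: every tied value gets the average of ranks 1..ties_sub.
    p1 = ((1 + ties_sub) / 2) / (n_sub + 1)
    # P2: fraction of the combined tied block lying below the common limit,
    # applied to the subgroup's own tied block.
    limit_rank = q * (n_comb + 1)
    fraction = limit_rank / ties_comb
    p2 = fraction * ties_sub / (n_sub + 1)
    return p1, p2
```

With two subgroups of 1000 values each, both starting with 100 tied values, `tied_block_proportions(1000, 100, 2000, 200)` gives roughly 5% for P1 and roughly 2.5% for P2, in line with the figures quoted in the example.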
Both of the solutions P1 and P2 have the drawback of not lying on the segment cut off by box 4 from the zero line (Fig. 5A), whereas an ideal solution should perhaps lie on this segment. The only meaningful solution on this segment seems to be its midpoint, indicated by point P3 in box 4. We suggest also considering this point when looking for a reasonable, unique solution to substitute the latitude of choice represented by the entire box.
calculation of confidence limits for the proportions
Nonparametric confidence intervals for percentiles could be estimated according to the rank-based method published by Reed et al. (10) and recommended by both the IFCC(5) and NCCLS (6). Confidence intervals for the proportions cannot, however, be calculated by applying this method to the subgroup distributions because such confidence intervals would reflect the local properties of these distributions rather than give the desired coverage probability, typically 0.90, for a range of these proportions as representing the spectrum of probable outcomes of the procedure used to calculate them. Instead, one should convert the confidence limits of the common reference limit to proportions within each subgroup distribution and consider such converted proportions as confidence limits for these proportions. Note carefully that such confidence intervals may have coverage probabilities in the subgroup distributions that differ from the nominal confidence level.
Linnet (11) has shown that the bootstrap resampling method usually gives more accurate percentiles than single nonparametric calculations, both for gaussian distributions and for a variety of skewed distributions that are representative of distributions of biochemical reference data. Linnet's results further suggest that the bootstrap method is more effective than single calculations to estimate confidence intervals for percentiles of these distributions, provided that the sample sizes are not very small (<100). If a distribution of bootstrap results for a percentile or a proportion has not been characterized, it is probably prudent to calculate the confidence limits as nonparametric percentiles of that distribution rather than to use gaussian approximations.
Two strategies are possible for estimating the confidence intervals for the proportions using the bootstrap method:
We feel that the latter of these two strategies is in better agreement with the spirit of resampling and therefore preferable. However, at "large" sample sizes and iteration numbers both strategies would probably lead to similar results.
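A minimal sketch of one way to obtain bootstrap confidence limits for the proportions is given below: the common limit and both proportions are recomputed in every resample, and the confidence limits are taken as nonparametric percentiles of the bootstrap results. It is not claimed to reproduce either strategy exactly. All names are hypothetical, equal prevalences are assumed, and because resampling itself creates tied values, the sketch uses uninterpolated proportions (count of values at or below the limit divided by n + 1) and takes the first pooled value at which the two proportions sum to at least 0.05.

```python
import numpy as np

def proportions_at_common_limit(a, b, target=0.05):
    """Lower-end proportions (pa, pb) at the first value where pa + pb >= target."""
    a, b = np.sort(a), np.sort(b)
    grid = np.union1d(a, b)
    # Step-function proportions, robust to the ties produced by resampling.
    pa = np.searchsorted(a, grid, side="right") / (a.size + 1)
    pb = np.searchsorted(b, grid, side="right") / (b.size + 1)
    k = int(np.argmax(pa + pb >= target))
    return pa[k], pb[k]

def bootstrap_limits(a, b, n_boot=1000, level=0.90, seed=0):
    """Percentile confidence limits; rows are lower/upper, columns are pa/pb."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    results = np.empty((n_boot, 2))
    for i in range(n_boot):
        # Resample each subgroup with replacement at its original size.
        ra = rng.choice(a, size=a.size, replace=True)
        rb = rng.choice(b, size=b.size, replace=True)
        results[i] = proportions_at_common_limit(ra, rb)
    tail = 100 * (1 - level) / 2
    return np.percentile(results, [tail, 100 - tail], axis=0)
```

Resampling each subgroup at its original size keeps the ratio of sample sizes fixed; adjusting for prevalences would require replacing the fixed target of 0.05 accordingly.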
| Discussion |
|---|
To correct for prevalences, two methods are proposed: the root method and the multiplication method. Although these two methods appear rather different in technical terms, the root method being based on iterative solution of a simple equation and the multiplication method on expansions of the original data vectors, both methods actually perform the same adjustment for the prevalences. The multiplication method makes this adjustment in a straightforward way, by multiplying the subgroup distributions to enable establishment of the prevalence-corrected common reference limits as percentiles of the combined distribution of such multiplied subgroups. In the root method, no changes to the original data vectors are required. Instead of modifying the subgroups, this method focuses on the relationship between the subgroup proportions at the common reference limit. To specify that limit, it makes adjustments of both subgroup proportions in parallel, and because only proportions determined from the subgroup data vectors are used during the iteration, the numbers of reference values in them do not affect the localization of the common reference limit.
The advantage of the root method is that no manipulation of the original data is needed, but it has one drawback. If a common reference limit lies outside one of the endpoints of the subgroups, the difference in Eq. 5 changes sign before the first reference value of that subgroup is reached when the range of reference values lying between the reference limits of the subgroups is scanned. Such a situation is technically hard to handle because rank-based, nonparametric treatment of distributions starts at the first reference value. This is not a serious problem when a single nonparametric calculation is performed because the exact localization of the common reference limit is seldom needed to make a decision on partitioning in that case; the subgroups should most often be partitioned if the proportion of one of the subgroups corresponds to a rank that is smaller than 1. As an example, if the number of reference values in a subgroup is 120, traditionally considered the minimum sample size for reference interval studies (5), the first reference value would correspond to a proportion of 0.8%, as calculated from Eq. 1, but this value lies below the critical proportion for partitioning, 0.9%, suggested by us (1)(2).
If the bootstrap method is used, exact values for the proportions should preferably be obtained at every iteration cycle because otherwise the final results could not be calculated as statistics of the bootstrap distribution. Note that a situation with a common reference limit lying outside one of the endpoints of the subgroups may be a rare occurrence of a bootstrap iteration process, corresponding perhaps to some very unlikely combinations among the stochastically constructed bootstrap data vectors, whereas the vast majority of the iterations could give unproblematic common reference limits that lie between the endpoints of both subgroups. Although such occurrences may be rare, they could corrupt the bootstrap calculation and make it hard to obtain reliable results. This may be avoided by the use of curve-fitting methods to delineate the subgroup distributions beyond their endpoints and also to obtain estimates for the proportions from these areas. Because the rank-based nonparametric method applies linear approximation between observations, it is perhaps not necessary to describe the behavior of the subgroup distributions beyond the endpoints in detail to obtain reasonable estimates for the reference values corresponding to the value of zero of the probability density. In other words, if bootstrap calculations are performed and the root method is applied, the conventional rank-based nonparametric approach should be extended from covering the range between the 1st and the nth reference values to the range between the extrapolated 0th and (n + 1)th reference values, assuming that Eq. 1 is used to calculate proportions.
Despite this drawback of the root method, it appears technically more elegant than the multiplication method. Because the multipliers frequently need to be rounded, one should make sure that the decimals included in them suffice to obtain the proportions with a high enough precision (0.05% could be an appropriate precision). Our experiments made on data of the Nordic Reference Interval Project (7)(8) suggest that multipliers of the order of 1000-10 000 may be needed to reach that level of consistency between runs performed with increasing orders of magnitude for the multipliers.
Tied reference values are a complication that can impair the precision of the proportions considerably. In the Nordic Reference Interval Project (7)(8), some of the laboratory tests had arrays of tied reference values with a length extending up to 1.0% of the subgroups. Box 4 in Fig. 5A is by no means exaggerated in its dimensions. Observe that the problem occurs only when there are tied values at the common reference limit in both subgroups simultaneously (at least one in one of the subgroups and more than one in the other). If all of the tied reference values originate from one of the subgroups, the path in Fig. 5A would cross the zero line (the dashed line) at one point, giving a unique solution. This complication is fortunately not very common, but when it occurs it can be an important source of error, especially if it remains unrecognized by the investigator.
Presenting reference values in more decimals than the analytical quality warrants is, in our opinion, not a useful way to treat the uncertainty inherent to arrays of tied reference values. Excessive decimals do not express true accuracy; they are just random noise, irrespective of whether they were generated automatically by the analytical instrument or added intentionally to the reference values by the investigator. The excessive decimals would seemingly eliminate the tied reference values by sorting these in a specified order, but such a randomly generated order can hardly be considered a solid basis for reliable decisions on partitioning.
To understand how such random orders may lead to variation of the proportions and thereby also to different conclusions, consider Fig. 5B, in which the four lowermost of the seven tied reference values at the common reference limit are depicted as if they were coming from distribution a and the three uppermost as if they were coming from distribution b. In reality, putting aside the excessive decimals, these reference values are equal, and there is no way they can be ascribed to either of the subgroups. The situation in Fig. 5B is just one of the many possible arrangements that excessive decimals could lead to by ordering, with questionable justification, these seven reference values. One can imagine from Fig. 5B what kind of proportions this specific arrangement would give, and if this particular randomly generated set of excessive decimals were accepted, one could consider the obtained proportions as a unique solution of the partitioning problem. However, another set of excessive decimals could have ordered the array of the seven tied reference values in the combined distribution quite differently, leading, e.g., to a situation in which the three reference values coming from subgroup b would lie lowermost in that array. Because the position of the common reference limit, indicated by the arrow in Fig. 5B, remains unchanged, the conversion of that limit to proportions in the subgroups would give proportions different from those in the situation just considered, and one would obtain another, seemingly unique solution. Clearly, neither of these seemingly unique solutions, corresponding to different sets of excessive decimals, should be considered as a true solution of the partitioning problem.
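The dependence of the subgroup proportions on the arbitrary order of the tied values can be made concrete with a small sketch. All numbers below are hypothetical and are not taken from Fig. 5B or from the Nordic Reference Interval Project: the subgroup sizes, the counts of untied values below the tie array, and the position at which the common reference limit cuts the array of seven tied values are assumptions chosen only for illustration.

```python
def subgroup_proportions(order, cut, below, sizes):
    """Proportion of each subgroup lying below the common reference limit.

    order: subgroup labels of the tied values, lowest first (e.g. "aaaabbb")
    cut:   number of tied values that fall below the common reference limit
    below: untied values per subgroup that lie below the whole tie array
    sizes: total number of reference values per subgroup
    """
    result = {}
    for group in sizes:
        tied_below = order[:cut].count(group)
        result[group] = (below[group] + tied_below) / sizes[group]
    return result


# Hypothetical subgroup sizes and counts (assumptions, not source data).
sizes = {"a": 200, "b": 150}
below = {"a": 3, "b": 2}
cut = 4  # assumed position of the limit within the seven tied values

# Arrangement resembling Fig. 5B: the four values of a lie lowermost.
first = subgroup_proportions("aaaabbb", cut, below, sizes)
# Alternative arrangement: the three values of b lie lowermost.
second = subgroup_proportions("bbbaaaa", cut, below, sizes)

print(first)
print(second)
```

With the same limit position, the first arrangement places 3.5% of subgroup a and about 1.3% of subgroup b below the limit, whereas the second gives 2.0% and about 3.3%. The two seemingly unique answers differ only because of the arbitrary ordering.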
If this randomization procedure were repeated a large number of times in a simulation study, the observation could be made that, on average, the common reference limit will divide the arrays of tied reference values in each subgroup in the same ratio that it divides this array in the combined distribution (the solution represented by point P2 in box 4 of Fig. 5A). This is another argument supporting such a solution as opposed to the conventional one represented by the midpoint (P1) of box 4. In the section on coping with tied reference values we presented a partitioning problem that the conventional solution failed to solve but which the solution represented by point P2 solved successfully. However, as discussed above, both solutions have the weakness of lying only accidentally on the zero line. Our third suggestion, represented by point P3, is admittedly heuristic, but it has the advantage of making use of the information given by the zero line to restrict the space of possible solutions within the arrays of tied reference values.
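The averaging argument for point P2 can be checked with a minimal simulation sketch. It assumes that excessive decimals order the tied values uniformly at random, and it uses a hypothetical cut position (three of the seven tied values below the common reference limit); the function name and parameters are illustrative only and do not come from the original study.

```python
import random


def simulate_tie_split(n_a, n_b, k_below, n_runs=20000, seed=0):
    """Average fraction of each subgroup's tied values below the limit.

    The n_a + n_b tied values are shuffled at random in every run, and the
    lowest k_below of them are taken to lie below the common reference limit.
    """
    rng = random.Random(seed)
    labels = ["a"] * n_a + ["b"] * n_b
    sum_a = 0.0
    sum_b = 0.0
    for _ in range(n_runs):
        rng.shuffle(labels)
        lowest = labels[:k_below]
        sum_a += lowest.count("a") / n_a
        sum_b += lowest.count("b") / n_b
    return sum_a / n_runs, sum_b / n_runs


# Four tied values from subgroup a, three from subgroup b (as in Fig. 5B);
# the cut position k_below = 3 is an assumption for illustration.
frac_a, frac_b = simulate_tie_split(n_a=4, n_b=3, k_below=3)
combined = 3 / 7  # ratio in which the limit divides the combined tie array

print(round(frac_a, 3), round(frac_b, 3), round(combined, 3))
```

Under these assumptions both averaged fractions approach the ratio in the combined array, which is what one expects because the number of values of one subgroup among the lowest k of a random ordering follows a hypergeometric distribution.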
This study was focused on the rank-based method recommended by the IFCC for nongaussian distributions, attempting to improve that method by supplementing established techniques with procedures that take account of the prevalences and tied observations. As an alternative approach to the rank-based method, curve-fitting procedures could be applied to model nongaussian distributions. In that approach, tied reference values could be handled by an appropriate smoothing algorithm and the prevalence effect by continuous analogs for the root and multiplication methods.
Focusing on proportions corresponding to peripherally located percentiles of reference distributions, the new method for partitioning appears particularly susceptible to the lower statistical quality of such percentiles compared with that of means. However, the Harris-Boyd method (3)(4) also refers indirectly to the same proportions in the end, using differences between means and ratios between standard deviations as mediating parameters. Direct estimation of a parameter should be more accurate than indirect estimation because the inaccuracy of the mediating parameters is then not added to the total uncertainty. Interestingly, the issue of statistical quality of partitioning methods seems never to have been investigated in the laboratory medical literature. According to the NCCLS guidelines (6), 120 reference values in each subgroup could suffice, but this recommendation is not expressed in terms of the uncertainty of the proportions or of the resulting partitioning decisions. The traditional minimum sample size of 120 for establishment of reference limits in the case of nongaussian distributions (5) seems to have been assumed more or less heuristically as also appropriate for partitioning calculations, but that number may turn out to have been highly optimistic as an estimate once the statistical quality of partitioning methods is examined in future studies.
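As a rough indication of the uncertainty involved, the following sketch computes the standard error of an estimated proportion under a simple binomial sampling model. This model is an assumption made here for illustration; it is not a method proposed in this report, and the sample sizes other than 120 are arbitrary.

```python
import math


def proportion_standard_error(p, n):
    """Binomial standard error of an estimated proportion p from n values."""
    return math.sqrt(p * (1.0 - p) / n)


# Proportion of 2.5% outside a reference limit, for several subgroup sizes.
for n in (120, 500, 2000):
    se = proportion_standard_error(0.025, n)
    print(n, round(100 * se, 2), "percentage points")
```

Under this simplified model, a subgroup of 120 reference values gives a standard error of about 1.4 percentage points for a true proportion of 2.5%, which is large compared with the 0.05% computational precision mentioned earlier.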
To illustrate the theoretical considerations presented in this report, an associated practical study was prepared (12). Apart from offering extensive practical examples of the suggested improvements to the rank-based method, that study also compares the new method for partitioning with that of Harris and Boyd (3)(4), using reference data collected in the Nordic Reference Interval Project (7)(8).
| Conclusions |
|---|
B. Proportion criteria are preferable to distance criteria, notably when partitioning nongaussian distributions, for three reasons:
C. Two methods to account for the prevalences were described, to be applied when using proportion criteria:
D. Tied reference values become a problem if there are copies of them in both subgroups at the common reference limit. The conventional solution used in nonparametric statistics of setting the rank of each of the tied observations to the average of these seems unsatisfactory for partitioning purposes, and better solutions were suggested.