Laboratory Management |
a Author for correspondence. Fax 49-89-4140 4875; e-mail A_S_Neubauer{at}hotmail.com
| Abstract |
|---|
| Introduction |
|---|
In 1990, Westgard et al. (2) examined a very similar problem when they tried to determine a cost-efficient control strategy for a Hitachi 737. With their emphasis on showing the impact of the size of medically relevant differences on the selection of alarm limits for control charts, they also concluded that, by increasing the batch size between controls from 20 to 60 patient samples, a higher cost-effectiveness can be achieved. Using batches with bracketing controls first, their study showed how switching to a nonbracketing mode of operation increased productivity.
These two main inspection designs found in practice (bracketing/nonbracketing) are explained below, using symbols. The controls are abbreviated "C," the patient samples "p," and "n" is an index for the number of measurements of a control sample. The bracket "]" means charting the control measurements and examining the results, whereas "[" means the start of a batch. Using this notation, the inspection strategies can be written as follows:
(a) batch mode with bracketing controls:
[C1 p p p ... p C2][C3 p p p ... p C4] [C5 ... Cn-2] [Cn-1 p p p ... p Cn]
(b) nonbracketing mode:
[C1] p p p ... p [C2] p p p ... p [C3] ... [C(n-1)/2] p p p ... p[C(n1)/2].
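The two notations can be generated mechanically, which makes the difference in control consumption easy to see. The following is only an illustrative sketch; the function names and parameters are hypothetical and are not taken from the cited studies.

```python
def bracketing(n_batches: int, patients_per_batch: int) -> str:
    """Batch mode: every batch opens and closes with its own control."""
    parts = []
    control = 1
    for _ in range(n_batches):
        body = " ".join(["p"] * patients_per_batch)
        parts.append(f"[C{control} {body} C{control + 1}]")
        control += 2  # two controls are consumed per batch
    return " ".join(parts)


def nonbracketing(n_controls: int, patients_between: int) -> str:
    """Nonbracketing mode: single controls separated by patient samples."""
    body = " ".join(["p"] * patients_between)
    controls = [f"[C{i}]" for i in range(1, n_controls + 1)]
    return f" {body} ".join(controls)


if __name__ == "__main__":
    print(bracketing(2, 3))
    print(nonbracketing(3, 3))
```

For the same number of patient samples, the bracketing design uses roughly twice as many controls as the nonbracketing design.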
Westgard et al. also presumed that cost-effectiveness could be increased further by even larger average numbers of patient samples between the control samples. Previously, Koch et al. (3) showed that the size of analytical errors to be detected in relation to the medically relevant changes and the chart type used must be taken into account when drawing such conclusions. They found that, for almost all analytes in serum chemistry, the use of Shewhart charts with no more than two controls at a time was sufficient.
Designs using preoperation controls before each batch offered little additional gain but reduced the speed of sample processing (2). However, a preoperation control after starting operation may be worth the delay (4).
There are generally three completely different approaches to assessing the problem of optimal control schemes systematically: (a) heuristic (based on experience), (b) statistical optimization, and (c) economic optimization.
Statistical approaches optimize the statistical properties of the inspection scheme, whereas economic models use a cost (or profit) function to optimize the total quality costs. These costs include, for example, the costs of false alarms or the costs of producing unsatisfactory quality. Combined statistical-economic approaches are mainly economic approaches, but they trade small cost increases for better overall statistical performance.
The purely economic approach is not commonly used in clinical chemistry, but a basic model was described by Westgard and Barry on pp. 138-142 of their book on cost-effective quality control (5). In addition, from this approach they developed a pseudo-economic model, described on pp. 142-149 (5), that is based on so-called test yields (i.e., relative productivity) and is the most comprehensive and commonly used model. Regarding the optimal inspection strategy, some of the important conclusions of Westgard and Barry are as follows:
(a) Random access analyzers offer higher productivities than batch processes.
(b) The frequency of errors and statistical power of the control chart are critical for the control strategy.
(c) Increasing the number of simultaneous batches (i.e., many simultaneous processes) reduces the productivity.
(d) Productivity increases with increasing run sizes from 10 to 60 patient samples between controls [Fig. 62C on p. 147 of (5)]. However, the absolute gains from increasing run sizes diminish as the number of patient samples grows.
(e) For most situations in serum chemistry, one or two controls at each time are statistically sufficient for error detection.
| Materials and Methods |
|---|
For the approach of Lorenzen and Vance (9), it is also assumed that the average time to remove an existing error is one hour, that the process does not run during an error search or error resolution, and that the minimum run length to examine is 370 (which is equivalent to examining only schemes with a maximum frequency of false alarms corresponding to a 3-SD Shewhart chart). The Fortran source code of the programs [Montgomery (6) and McWilliams (7)] was received from the Journal of Quality Technology via e-mail (now available via the World Wide Web at http://lib.stat.cmu.edu/jqt/) and then was compiled and run on a Pentium(TM) 100 system using a NAGware(TM) FTN90(TM) Professional Plus compiler, Ver. 2.1.
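The run length of 370 can be traced back to the false-alarm probability of a 3-SD chart. The sketch below assumes normally distributed in-control results and independent control measurements; both are assumptions of this illustration, and the function names are hypothetical.

```python
import math


def false_alarm_probability(k_sd: float) -> float:
    """Two-sided probability that a normal value falls beyond +/- k_sd SD."""
    return math.erfc(k_sd / math.sqrt(2.0))


def in_control_run_length(k_sd: float) -> float:
    """Average number of control events between two false alarms."""
    return 1.0 / false_alarm_probability(k_sd)


if __name__ == "__main__":
    print(f"false-alarm probability: {false_alarm_probability(3.0):.4f}")
    print(f"average run length:      {in_control_run_length(3.0):.0f}")
```

With 3-SD limits, the false-alarm probability is about 0.0027 per control event, which corresponds to an average of roughly 370 control events between false alarms.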
To summarize, the optimization problem was approached here quite differently from the test-yield model of Westgard and Barry (5). Their model was investigated with a specially programmed simulation program and a Microsoft Excel spreadsheet.
field study
A field study was undertaken to obtain data to evaluate the
designs proposed in the literature and those derived by simulations.
The test bed was prepared by increasing the frequency of control pools
measured in the Hitachi 747 multichannel analyzer, which does most of
the serum analysis in our laboratory. To assess the use of more than
one control sample, one rack was filled with three subsequent sample
cups containing the same control pool and two sample cups containing
another control pool. The control pools used were lyophilized serum
pools (Boehringer Mannheim/Klinikum Großhadern) that differed
mainly in the concentrations of the various analytes and were
reconstituted every evening for the next day. The Hitachi 747 was
operated according to our daily routine with Boehringer Mannheim
reagents and was controlled by our standard inspection strategy, which
is based on one control rack containing four different serum pools for
approximately every 60 patient samples, which were charted on 3-SD
Shewhart charts. The five sera on the additional study rack (containing
the two different pools) were measured after approximately every 30
patient samples. No special cooling was provided for the control sera.
Our laboratory is part of a 1000-bed university hospital, for which it does most of the necessary analyses. Routinely, our laboratory determines 23 different analytes on the Hitachi 747. During the two months of the study (February and March of 1996; 41 working days) an average of 4000 measurements for the 23 different analytes were performed every day on 250-350 patient samples.
Seven of the more frequently investigated analytes were selected for
the study on the basis of three criteria (representativeness for different
groups of tests, frequency of tests, and bearable costs): sodium,
potassium, calcium, creatinine, γ-glutamyltransferase, alkaline
phosphatase, and pseudocholinesterase.
All data were stored in a Microsoft Access database (Ver. 2.0).
Analysis of the ~20 000 control measurements was performed with
SPSS(TM) for Windows (Ver. 6.1.3) and Microsoft Excel, Ver. 5.0. Analysis
included a thorough graphic and statistical examination of all data by
all methods applicable and available with SPSS, especially box-plots
and factor analysis. Finally, a simple approach was derived for
determining an optimized control strategy: All data from the first
month were used to calculate 3-SD limits for Shewhart control charts
(Table 2). These control charts were then applied to the second month.
Different strategies were theoretically constructed [on the basis of
the mean of one, two, or three of the measurements at each time and on
different numbers of patients between the controls (30, 60, 90, or
120)]. The numbers of alarms caused by the control charts using these
different theoretical inspection schemes were listed in two tables
(summarized in Table 3). Finally, these tables were judged by nine experts
(physicians/clinical chemists leading a hospital laboratory or part of
such a laboratory) by means of a questionnaire (Fig. 1).
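The construction of such "theoretical" alarm tables can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used in the study: the data layout (one list of replicate measurements per control event), the function names, and the optional rescaling of the limits for charted means are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def baseline_center_and_sd(first_month):
    """Center line and SD estimated from the first-month control values."""
    return mean(first_month), stdev(first_month)


def count_alarms(control_events, center, sd, n_per_event=1, every=1,
                 k_sd=3.0, rescale_for_mean=False):
    """Count control events whose charted value leaves the k_sd limits.

    control_events   -- list of lists, replicates measured at each control time
    n_per_event      -- how many replicates are averaged (1, 2, or 3)
    every            -- keep only every `every`-th event, e.g. 2 turns a
                        30-patient spacing into a 60-patient spacing
    rescale_for_mean -- if True, narrow the limits by sqrt(n_per_event)
                        (an assumption; the source does not say which was used)
    """
    half_width = k_sd * sd
    if rescale_for_mean:
        half_width /= sqrt(n_per_event)
    alarms = 0
    for event in control_events[::every]:
        charted = mean(event[:n_per_event])
        if abs(charted - center) > half_width:
            alarms += 1
    return alarms
```

Calling `count_alarms` for each combination of `n_per_event` (1, 2, 3) and `every` (1 to 4) on the second-month data would fill one alarm table per analyte.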
| Results |
|---|
The results of the economic and the economic-statistical optimization are nearly identical. The two simulation programs result in very similar suggestions for optimal sampling plans. When we varied all different input parameters by up to a factor of 10 in each direction, time intervals between controls varied between 5 and 60 min. Assuming 3000 possible tests per hour and 10 tests per patient, ~5 patient sera can be processed per minute by the Hitachi 747. Under these conditions, the possible intervals between controls vary between 25 and 300 patients. Adjustments to other assumptions, such as different costs or different quality requirements, can be performed easily by downloading the programs/spreadsheets from our website.
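The conversion from time intervals to patient numbers is simple arithmetic and can be written out as a short sketch. The default values are the throughput figures stated above; the function names are hypothetical.

```python
def patients_per_minute(tests_per_hour: float, tests_per_patient: float) -> float:
    """Patient sera processed per minute at a given analyzer throughput."""
    return tests_per_hour / tests_per_patient / 60.0


def patients_between_controls(interval_min: float,
                              tests_per_hour: float = 3000,
                              tests_per_patient: float = 10) -> float:
    """Number of patient samples that pass during one control interval."""
    return interval_min * patients_per_minute(tests_per_hour, tests_per_patient)


if __name__ == "__main__":
    for minutes in (5, 15, 60):
        print(minutes, "min ->", patients_between_controls(minutes), "patients")
```

Changing `tests_per_patient` shows how strongly the patient-based interval depends on the test mix ordered per sample.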
field study
In 80% of the cases, the number of patient samples between
controls was ≤30. The remaining 20% of cases showed longer run sizes
of up to 50 patients. Because not all of the seven selected analytes
were requested for each patient sample, an average of ~18 (SD = 10)
patient samples tested for a specific analyte lay between study
controls.
To determine the optimal batch size, the numbers of alarms caused by
control charts using the different inspection schemes listed in Table 3
were assessed by experts answering a questionnaire (Fig. 1). Of the 10
experts who were mailed questionnaires, 9 completed the questionnaire;
one questionnaire was not returned. Their answers, based on the
information presented in this paper, are listed in Table 4 and Fig. 3.
The inspection schemes selected (Table 4) appear fairly divergent at
first glance but reveal a degree of consensus on closer inspection.
Most experts selected a control frequency of between 30 and 100 patient
samples (question 1), and only one answer in each direction is outside
this interval (experts B and G). The repetition of controls (question
2) was believed to be useful by only two experts (experts C and D). For
the chart type (question 4), a simple Shewhart chart was preferred by
most of the quality control managers; two preferred adding the
Rili-BÄK rules¹, and two experts preferred choosing another
chart type or other rules. Question 3 shows that two different control
materials were suggested by the majority of experts (seven of the nine
who responded). However, here two groups exist: one major group
preferred two completely different materials (e.g., one serum pool and
one commercial material, possibly in different concentrations), whereas
the other group preferred using different concentrations of the same
material.
The factors underlying these assessments are shown in Fig. 3. Regarding
the costs of control material and reagents for quality control
(questions 1 and 2) two groups exist: One group believed that these
costs are important, the other group did not regard this to be a
factor. A similar situation is found for costs of judging the control
results (question 3), but here the majority assessed this point as
being relatively unimportant. In question 4, the costs for error
removal were also assessed by the majority as being quite unimportant.
Some disagreement is observed regarding the costs of false alarms
(question 5). Questions 6 through 8 ask for the importance of different
sizes of analytical errors. Although medically important errors were
clearly appreciated by nearly all experts (question 8) and those
smaller in size are judged as less important, for some of the analysts
quite small errors (question 6) were of relatively high interest.
Handling of the control chart (question 10) was recognized as being
very important by all experts, whereas the number of channels (question
9) was assessed as being of only medium interest. The frequency of
errors was believed to be crucial (question 12), whereas the time used
for performing and analyzing controls was of medium interest (question
11). Two experts added (question 13) that the sensitivity of the
control chart and the quality of control materials are essential. In
Fig. 3 as well as in Table 4, no systematic difference could be identified
between experts working in our institute (codes A, B, C, and I) and
external experts (codes D, E, F, G, and H).
In addition to optimizing the sampling strategy, our study clearly showed how important storage and handling of control material are: Trends during the day can be identified for some analytes when charting cumulative control results over the entire study period. Because the samples were not refrigerated or airtight, specimen evaporation increased concentration or activity up to 10% over 6 to 7 h. For example, the calcium concentration increased significantly [linear regression model (mmol/L): Ca = 2.571 + 0.0159 × hours; 95% confidence intervals of (2.542; 2.600) for the intercept and (0.014; 0.018) for the slope]. These findings are in accordance with earlier reports (11)(12). The stability of the reconstituted control serum was high enough not to cause visible effects (13)(14)(15). Factors such as the day of the week showed no influence on the test results.
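To give a feeling for the size of the calcium drift, the regression can be evaluated directly. This sketch assumes that the intercept (2.571 mmol/L) and the slope (0.0159 mmol/L per hour) are combined additively, as the reported increase and the positive confidence interval of the slope imply; the function names are hypothetical.

```python
CA_INTERCEPT = 2.571   # mmol/L, fitted calcium concentration at 0 h
CA_SLOPE = 0.0159      # mmol/L per hour, fitted drift


def predicted_calcium(hours: float) -> float:
    """Fitted calcium concentration of the control after `hours` on the analyzer."""
    return CA_INTERCEPT + CA_SLOPE * hours


def relative_increase(hours: float) -> float:
    """Fractional increase relative to the fitted starting concentration."""
    return (predicted_calcium(hours) - CA_INTERCEPT) / CA_INTERCEPT


if __name__ == "__main__":
    for h in (0, 3, 7):
        print(f"{h} h: {predicted_calcium(h):.3f} mmol/L "
              f"(+{100 * relative_increase(h):.1f}%)")
```

Under this reading, calcium rises by roughly 4% over a 7-h working day, which is below the 10% maximum quoted for the most affected analytes.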
Another point worth mentioning is the method of calculating control
limits (Table 2). When control charts are used for quality control
during the day, it is essential to use limits derived in the same way
(e.g., including evaporation effects and serum instability during the
day). Control limits derived from an earlier month and calculated
exactly as recommended by the Rili-BÄK (20 values, the same
scheme, e.g., always the second control of a day) can be used only for
controlling day-to-day imprecision but cannot be used for same-day
control purposes because of the effects mentioned above.
| Discussion |
|---|
Regarding the number of controls at each time, all approaches showed that, when using 3-SD Shewhart charts, two identical control samples at each time are sufficient. Charting the mean of the two measurements offers enough statistical power for all analytes, in agreement with Koch et al. (3). The use of such a scheme with two measurements at each time can be represented as follows:
C11C12 C21C22 C31C32 Ci1Ci2 ppp ... p C11C12 C21...
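The gain in statistical power from charting the mean of several identical controls can be illustrated with a short calculation. The sketch assumes normally distributed measurements, a pure systematic shift expressed in units of the single-measurement SD, and control limits narrowed by the square root of the number of averaged controls; these are assumptions of the illustration, and the function names are hypothetical.

```python
import math


def phi(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def detection_probability(shift_sd: float, n_controls: int,
                          k_sd: float = 3.0) -> float:
    """Probability that the mean of n controls falls outside the limits.

    The limits are assumed to be +/- k_sd * SD / sqrt(n_controls) around the
    target; shift_sd is the systematic error in single-measurement SD units.
    """
    d = shift_sd * math.sqrt(n_controls)
    return phi(-k_sd - d) + 1.0 - phi(k_sd - d)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "control(s):", round(detection_probability(2.0, n), 3))
```

With no shift, the function returns the false-alarm probability of the chart; for a fixed shift, the detection probability rises with each additional averaged control.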
We make no recommendation about the number of different control
materials necessary, i.e., the index "i" in the symbolic
description above. Such a selection (13) cannot be derived
from the statistical requirements (Table 4, however, does include the
experts' view on this topic).
With control charts that have a greater degree of statistical power than the Shewhart charts that we used here, one repeat of each control sample might be sufficient for all analytes. Such charts include the CUSUM charts (16), EWMA charts (17)(18) or the Westgard Multirule algorithm (19). The official Rili-BÄK multirules are not suitable, as shown earlier (20). However, because the Westgard algorithm is based on one control sample at a time, multirules across at least two different control materials or concentrations are applied. Therefore this procedure has the statistical power desired only if both materials behave identically.
Regarding the frequency of controls necessary on automatic multichannel analyzers, our simulation studies showed relatively low frequencies to be cost-efficient. Our results suggest controls every 15 min, which implies a cost optimum of 75 patient samples between controls when considering a throughput of ~5 patient samples per minute. In this respect, our conclusions are consistent with the findings of Westgard and Barry (5) and Westgard et al. (2). However, our simulation results should not be overinterpreted because of limitations such as uncertain cost factors and the consideration of shifts in inaccuracy only.
Examining intervals longer than 120 patient samples between controls did not seem reasonable because a batch size of 120 patients means using only 2-3 controls per day, given a total number of 200-400 patient samples per day. Often a change of reagents is necessary once a day; therefore, one control would be used when starting the analysis in the morning, and the other control would be used after the reagents are changed in the afternoon.
The results of the expert assessments support the results of our rough
simulations: (a) Between 30 and 100 patient samples between
controls were preferred in question 1 of Table 4, which is consistent
with the simulation results. (b) The use of a Shewhart chart
appealed to the majority of experts (question 4, Table 4).
(c) However, charting the mean of two controls at each time
was not supported by the experts (question 2). In this context, the
size of (medically) relevant shifts is very important and would have to
be assessed separately for each analyte.
The experts' judgment of the different factors underlying the cost
model is reflected in Fig. 3. Some factors were judged not to be
essential (e.g., questions 1-4 and question 11). But the factors
crucial for the simulation results, the size (questions 6-8) and
frequency of errors (question 12), were regarded as being important by
the experts. The result of question 6 supports the opinion that small
errors may be of some interest because they occur more frequently and
reveal incipient problems at an earlier time point (20).
Interestingly, the answers to question 9 reveal that the importance of
multiple channels is underestimated by most experts. However, as we
were recently reminded by Petersen et al. (21), the
probability of false rejection ($P_{fr}$) for a control chart
with n channels depends on the nth power of the probability that a
single channel is not falsely rejected:
$P_{fr,\mathrm{total}} = 1 - (1 - P_{fr})^{n}$. This
means, for example, that when 3-SD Shewhart charts with 20 channels are
used, the overall probability of false rejection is 5.3%
[$P_{fr,\mathrm{total}} = 1 - (1 - 0.0027)^{20} \approx 0.053$].
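The effect of the number of channels on the overall false-rejection probability is easy to tabulate. The sketch below assumes that all channels use the same single-channel false-rejection probability and that their false rejections are independent; the function name is hypothetical.

```python
def overall_false_rejection(p_single: float, n_channels: int) -> float:
    """Probability that at least one of n independent channels falsely rejects."""
    return 1.0 - (1.0 - p_single) ** n_channels


if __name__ == "__main__":
    # 0.0027 is the two-sided false-rejection probability of a 3-SD chart.
    for n in (1, 7, 20):
        print(n, "channels:", round(overall_false_rejection(0.0027, n), 4))
```

For a single channel the result reduces to the single-channel probability, and for 20 channels it reproduces the 5.3% quoted in the text.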
Regarding the validity of our results, it may be noted that the applicability of our simulation approach is based on the assumption of a continuous process not necessarily realized in the clinical laboratory. Nevertheless, the mode of operation is reasonably close to a continuous process, and the stability of the analyzer over the day is high enough to reject batches with bracketing controls (2). The economic models that we used investigated only inaccuracy, and nearly all costs are rough estimates. However, the use of these models still seems justified because they have been practice-proven over many years in industrial quality control and offer a simple means of optimization. Adaptation to other cost situations, separate optimizations for different tests, and charts other than the Shewhart chart can be easily achieved. However, models investigating not only inaccuracy but also imprecision and based on samples instead of hours of operation would improve simulation results.
The specific Westgard and Barry model that uses test yields (5) is, unlike their cost model, not suitable for determining the optimal control strategy, because no optimum exists. The test yield approaches unity with increasing numbers of patients between controls, meaning:
test yield (20 samples) < test yield (60) < test yield (500) < test yield (1 000 000).
Additionally, negative test yields are possible by changing the
rerun factors. Table 5 shows this strange behavior when Westgard's
test-yield model is used.
The advantages that the test-yield model provides include ease of adjustment to different quality requirements and the performance of different quality control procedures, a performance estimate that has practical meaning in the laboratory and is potentially measurable and verifiable from laboratory results, and an overall modeling strategy that allows the prediction of quality in terms of the defect rate. When using economic cost models (or economic-statistical models) instead of test-yield models, one encounters problems with determining costs instead of using simple assumptions on cost in the form of repeat factors. On the other hand, transforming these models into others that do not use monetary parameters (such as the test-yield model of Westgard and Barry) is quite complex and error-prone. The final result of such an attempt is mainly an exchange of "costs" for "rerun factors," which still does not resolve problems with determining these factors exactly. We think that using cost models simplifies the whole process because it is easier to estimate monetary costs than to assess less tangible factors such as "analytic rerun factors".
A completely different approach that makes use of the average of normals (AON) to maximize run lengths was proposed by Westgard et al. (22) in 1996. The AON method observes the average of normal patient results and compares them with the theoretically expected average to assess if imprecision and inaccuracy requirements are met. Because, in contrast to quality-control samples, the patient results do not mean any additional sample costs, repeated attempts to establish this method for quality control were made in the past. Although the high number of patient results necessary prevented a widespread use in the past, Westgard et al. have shown that today, with larger laboratories and computer support, AON is applicable for many tests (22). Their minimum numbers necessary for candidates with high potential for AON range between 30 and 450. These numbers give the minimum number of patient samples necessary to use the AON method with a certain statistical power. Unlike this method, in the optimization of a process controlled by control samples, the statistical properties of a control chart are already given and patient numbers between controls are varied to reach a cost optimum. This optimum is found by taking into account the consequences of those cases not meeting quality standards, rejected falsely, and so forth, by means of a cost function or test yields. Thus, our economic optimization approach based on the use of control samples is fundamentally different from Westgard's statistical approach of determining necessary AON lengths. However, control samples and the AON method may be advantageously applied simultaneously, which leads to a combined control system. This combined system could be optimized again in a test-yield or economic model that uses the statistical properties of both control methods and the costs of both methods.
To examine quality-control process parameters more thoroughly than by solely theoretical models, we recommend conducting a field study. By establishing "theoretical" alarm tables for the measurements, we have presented a simple method for judging the impacts of different control strategies. The final selection of a strategy can be made using the different cost-model (or test-yield) scenarios. Additionally, factors that are not included in the model, such as professional handling by the laboratory technician or special analyzer requirements, can be taken into account. This step, however, means departing from strict economic optimization in favor of an optimization that includes intangible, nonmonetary factors. Because any model can include only part of the expert knowledge available, a final assessment by experts seems necessary and valuable to account for remaining inadequacies.
| Acknowledgments |
|---|
| Footnotes |
|---|
1 Rili-BÄK (German guideline of the federal physicians association) multirules (10): 7T, the assay is out of control if seven consecutive measurements show the same trend upward or downward; and 7X, the assay is out of control if seven consecutive measurements fall on one side of the mean. The association recommends using these rules in addition to the 3-SD Shewhart chart for assessing imprecision in internal quality control. External (as well as internal) quality control is supervised by the Weights and Measures Office in Germany. Medical laboratories are required to participate in external quality assessment and qualify for each analyte at least twice a year.
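The two run rules described in this footnote can be sketched as simple checks on a series of control values. This is one possible reading, not the official definition: "same trend" is interpreted here as seven strictly rising or strictly falling consecutive values, and values exactly on the mean are treated as interrupting a 7X run. The function names are hypothetical.

```python
def rule_7x(values, center, run=7):
    """True if `run` consecutive values lie on the same side of `center`."""
    streak, side = 0, 0
    for v in values:
        s = (v > center) - (v < center)   # +1 above, -1 below, 0 on the mean
        if s != 0 and s == side:
            streak += 1
        else:
            side = s
            streak = 1 if s != 0 else 0
        if streak >= run:
            return True
    return False


def rule_7t(values, run=7):
    """True if `run` consecutive values rise or fall strictly monotonically."""
    streak, direction = 0, 0
    for prev, cur in zip(values, values[1:]):
        d = (cur > prev) - (cur < prev)   # +1 rising, -1 falling, 0 unchanged
        if d != 0 and d == direction:
            streak += 1
        else:
            direction = d
            streak = 1 if d != 0 else 0
        if streak >= run - 1:             # 7 values give 6 same-sign steps
            return True
    return False
```

Both checks would be applied in addition to the 3-SD limit, as the footnote describes.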
| References |
|---|