This article describes a decision analysis and cost-effectiveness study of screening for mild hypothyroidism. In the current climate of managing costs and managed care, cost-effectiveness studies are increasingly important. Rather than summarizing the paper in great detail, I will try to lay out the basic concepts that underlie it and thus, hopefully, make it easier to read the original study.

Screening for mild thyroid failure at the periodic health examination. A decision and cost-effectiveness analysis

Authors: Danese M, Powe N, Sawin C, Ladenson P.
Source: JAMA. 276:285-92. July 24/31, 1996.
Institutions: Johns Hopkins; Boston VA; Boston University Medical Center.
Financial support: Corning Clinical Laboratories, Forest Laboratories, National Eye Institute, National Institute on Aging, Veterans Affairs Boston.



Thyroid failure is a relatively common problem. In its mildest form, it consists of an elevation of TSH without a significant decrease in circulating thyroid hormone. There are three main arguments in favor of screening for and treating this disorder: averting the progression to overt hypothyroidism, alleviating the subtle symptoms that sometimes accompany subclinical hypothyroidism, and improving serum lipid levels. Since measuring serum TSH in otherwise apparently healthy adults on a regular basis involves a substantial cost, this analysis was performed in order to get a better handle on the actual costs and benefits involved.

Basic approach

In order to perform the analysis, the first step is to construct a simplified model of the situation we are analyzing. In the case described here, two models need to be constructed: a model of health outcomes and costs if we don't screen for subclinical hypothyroidism and a model if we do screen. These models are then implemented and "run" on a computer. For each scenario, the simulation should yield an estimated cost as well as some measure of health outcomes (deaths, years of life, quality-adjusted years of life). Then, when we compare the strategies, we should be able to make a statement of the sort: for X $ spent on screening, we can save Y lives or gain Z quality-adjusted years of life.

Since any model of a complex situation requires multiple assumptions which may or may not be correct, we need to see how sensitive the model is to our various assumptions. Thus, once the original model (or base-case analysis) is "run", we need to re-run it, varying the original assumptions to see how they would affect our results (a sensitivity analysis).

In the analysis presented here, the modelling is accomplished using a Markov decision tree. Markov analysis is a way of analyzing complex systems. It assumes that the system can be described by a number of variables that can each be in one of several "states", and that these variables make transitions from one state to another at discrete time intervals and with probabilities that depend on the values assumed by other variables. When analyzing clinical situations, the appropriate selection of variables and the determination of probabilities governing the transitions from one state to another (such as from the "euthyroid" state to the "mild hypothyroidism" state) are key issues.
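To make the idea of states and transitions concrete, here is a minimal sketch of a Markov cohort simulation. It is not the authors' model: the three states, the one-year cycle, and every probability below are hypothetical placeholders chosen only for illustration, and the transition probabilities are held fixed rather than depending on other variables.

```python
# Hypothetical three-state Markov cohort sketch.
# All states and probabilities are illustrative, NOT values from the paper.
STATES = ["euthyroid", "mild_hypothyroidism", "overt_hypothyroidism"]

# TRANSITIONS[s][t] = assumed probability of moving from state s to state t
# during one cycle (for example, one year). Each row sums to 1.
TRANSITIONS = {
    "euthyroid": {
        "euthyroid": 0.98,
        "mild_hypothyroidism": 0.02,
        "overt_hypothyroidism": 0.00,
    },
    "mild_hypothyroidism": {
        "euthyroid": 0.00,
        "mild_hypothyroidism": 0.95,
        "overt_hypothyroidism": 0.05,
    },
    "overt_hypothyroidism": {
        "euthyroid": 0.00,
        "mild_hypothyroidism": 0.00,
        "overt_hypothyroidism": 1.00,
    },
}


def step(cohort):
    """Advance the cohort by one cycle; cohort maps state -> fraction of people."""
    nxt = {state: 0.0 for state in STATES}
    for state, fraction in cohort.items():
        for target, prob in TRANSITIONS[state].items():
            nxt[target] += fraction * prob
    return nxt


def simulate(cohort, cycles):
    """Return the cohort distribution at cycle 0, 1, ..., cycles."""
    history = [dict(cohort)]
    for _ in range(cycles):
        cohort = step(cohort)
        history.append(cohort)
    return history


# Everyone starts out euthyroid; follow the cohort for 10 cycles.
start = {"euthyroid": 1.0, "mild_hypothyroidism": 0.0, "overt_hypothyroidism": 0.0}
history = simulate(start, cycles=10)
for cycle, dist in enumerate(history):
    print(cycle, {state: round(frac, 4) for state, frac in dist.items()})
```

A real analysis would attach a cost and a quality-of-life weight to each state and sum them over the cycles, once for the screening strategy and once for the no-screening strategy.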

Another aspect of the modelling is the use of QALYs, or Quality-Adjusted Life Years. A QALY is a year of life experienced by a patient that is "adjusted" for quality of life by a factor that ranges from 1.0 (no decrease in quality of life) to 0.0 (death). For example, if a patient is followed from age 35 to age 75 without any intervening illnesses, that patient will have generated 75-35 = 40 QALYs. If a patient is followed from age 35 to age 75 but had a stroke at age 65 that reduces the quality of life to 0.6, that patient will generate (65-35) + (75-65)*0.6 = 30 + 10*0.6 = 30 + 6 = 36 QALYs. Obviously, describing an individual's quality of life by a number between 0 and 1 is absurd. It is an absurd but necessary fiction for this sort of analysis.
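The QALY arithmetic in the two examples above can be written as a short function. This is only a sketch: the function name and the (start age, end age, quality weight) representation are my own choices, not anything taken from the paper.

```python
def qalys(segments):
    """Sum quality-adjusted life years over a list of
    (start_age, end_age, quality_weight) segments."""
    return sum((end - start) * weight for start, end, weight in segments)


# Patient followed from 35 to 75 with no intervening illness.
healthy = qalys([(35, 75, 1.0)])

# Patient with a stroke at 65 that lowers quality of life to 0.6.
with_stroke = qalys([(35, 65, 1.0), (65, 75, 0.6)])

print(healthy, with_stroke)
```

The two results correspond to the 40 and 36 QALYs worked out by hand in the text.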

The model

A number of properties and assumptions underlie the model developed by the authors of this paper. Some of the principal ones include:


Base-case analysis: Under the base-case assumptions, screening for TSH yields an average increase in QALYs of 6 days for women; for men, the increase is only 2 days. The average cost was $147 per woman and $120 per man. This yields a cost-effectiveness of $9 223 per QALY for women and $22 595 per QALY for men. For women, approximately 52% of the gain in QALYs came from the prevention of overt hypothyroidism and about 30% from the relief of symptoms of mild hypothyroidism.
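As a rough plausibility check, the cost per QALY can be recomputed from the rounded base-case figures quoted above. The sketch assumes that the $147 and $120 are the extra costs of screening compared with not screening, and it uses QALY gains rounded to whole days, so it can only approximate the published ratios, which were presumably calculated from unrounded values.

```python
DAYS_PER_YEAR = 365.25


def cost_per_qaly(extra_cost_dollars, qaly_gain_days):
    """Cost-effectiveness ratio: extra dollars spent per QALY gained."""
    qaly_gain_years = qaly_gain_days / DAYS_PER_YEAR
    return extra_cost_dollars / qaly_gain_years


# Rounded base-case figures quoted in the text.
women = cost_per_qaly(147, 6)
men = cost_per_qaly(120, 2)

print(f"women: about ${women:,.0f} per QALY")
print(f"men:   about ${men:,.0f} per QALY")
```

Both results land within a few percent of the reported $9 223 and $22 595, which is as close as inputs rounded to whole days can be expected to get.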

Sensitivity analysis: Cost-effectiveness was better for women than for men at all ages, and was more favorable when screening was initiated later in life than at 35 years of age. When the analysis was "re-run", using the upper and lower range values for the various numerical estimates, a number of them were found to substantially alter the results. The two most influential were the cost of the TSH assay and the quality of life adjustment factor for symptomatic mild hypothyroidism. The base-case estimate for TSH assay cost was $25; when that cost was reduced to $10, the price per QALY gained dropped from $9 223 to $3 947; when the assay cost was raised to $50, the price per QALY gained rose to $17 998. Similarly, if patients attached no decrease in quality of life to mild hypothyroidism with symptoms, the price per QALY gained rose to $16 885. The effect of varying other assumptions is given in the article, in figure 5.
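One way to read the assay-cost results is to ask how much the cost per QALY changes for each additional dollar of TSH assay cost. The sketch below uses only the three data points for women quoted above; it describes the reported numbers and says nothing about the internal structure of the authors' model.

```python
# (TSH assay cost in dollars, reported cost per QALY for women in dollars)
reported = [(10, 3947), (25, 9223), (50, 17998)]


def slopes(points):
    """Change in cost per QALY per extra dollar of assay cost,
    computed between each pair of consecutive points."""
    return [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]


segment_slopes = slopes(reported)
for (low, _), (high, _), slope in zip(reported, reported[1:], segment_slopes):
    print(f"${low} -> ${high}: about ${slope:.0f} per QALY per assay dollar")
```

The two segments give nearly the same slope, roughly $351 per QALY for every extra dollar of assay cost, so over this range the relationship is close to linear.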

Authors' Discussion

The authors feel that, based on these results, periodic screening for mild hypothyroidism with TSH measurement every 5 years after age 35 is cost-effective. They note that the cost per QALY compares reasonably to that for mammography and estrogen replacement therapy in women and for hypertension screening in men. The increase in QALYs afforded by screening is dependent on the prevalence and importance of mild symptoms as well as on the rate of progression to overt hypothyroidism. The costs of screening are offset significantly by the lower cost of cholesterol reduction with thyroid hormone compared to other lipid-lowering drugs. Screening at intervals shorter than 5 years increases the cost of this strategy significantly and is not advised by the authors.


I have two minor criticisms of the model as presented here. First of all, the authors assumed that screening with TSH will avert all cases of overt hypothyroidism. They did not state explicitly where this assumption comes from, and they did not include this in their sensitivity analysis (what would the results have been if, despite screening, 10% of patients developed overt hypothyroidism?). Second, it is not clear from the model whether or not patients who were treated with thyroid replacement would also have been treated with conventional lipid-lowering drugs if the results were not adequate.

These caveats aside, this study has the potential to significantly influence day-to-day medical practice, since a recommendation to screen all adults over the age of 35 with a serum TSH determination obviously involves a lot of patients. Since the authors performed a quantitative analysis and then compared the results to well-established and accepted procedures (mammography, hypertension screening), their conclusions are more likely to be accepted than if they were merely promulgated ad-hoc.

Despite the strengths of this study, it is important to note that these numbers were quite sensitive to several variables. Also, the model is a simplification and is obviously imperfect. For these reasons, the actual dollar numbers obtained here for cost-effectiveness cannot be taken as gospel truth. Nevertheless, I believe that the numbers generated should be considered valid as ballpark figures and should be useful as such.

Apart from generating cost-effectiveness numbers, the effort at undertaking such a study (and the effort involved in reading it!) produces other benefits. The sensitivity analysis yields much information about the process being analyzed. Which factors are critical to improving cost-effectiveness and which are marginal can only be determined by this sort of sensitivity analysis. Furthermore, developing and understanding the model that was used leads to a better understanding of the concepts involved in analyzing cost-effectiveness in general. Anyone who is called upon to help decide whether or not a given screening strategy is appropriate will be better equipped to ask the right questions after having worked through a paper such as this one.

August 3, 1996

Reader comments

October 4, 1996
Dr. Mark Danese, corresponding author of this study, replies:

I apologize for the delay in responding to your comments about our article. As this is a journal club, I will try to respond informally and not "defensively." Please realize that this is my response and that the other authors may not agree completely.

Your first point is true. Not all screened patients will be prevented from developing overt hypothyroidism. However, the ones missed at screening would probably present at the same time as they would have if they were never screened. Hence, the arithmetic difference in quality of life for these patients would be zero. But this also brings up another point: we did not consider compliance in our model since we assumed that it was 100% for all therapies. Non-compliance could lead to the development of symptoms of thyroid hormone deficiency. While the majority of patients, from my understanding, comply with thyroid medication, patients still suffer from lapses in adherence to the regimen. Hopefully, the annual visit and TSH test for all patients with mild thyroid failure would uncover some of the unseen progression and some of the treatment non-compliance, reducing the influence of this variable. Since it is not in the model explicitly, I cannot say precisely how important this is in the context of screening populations. Clearly it is relevant to individual clinicians.

Your second point about whether patients on levothyroxine sodium would be treated with lipid-lowering medications was another wrinkle that we did not address. Since our estimates of the effectiveness for levothyroxine sodium and lipid-lowering medications in cholesterol reduction were comparable (as defined in more detail in the paper), we addressed the choice between the two and not the combination. Both therapies together may be clinically appropriate, but to my knowledge there is no data discussing the results compared to single therapy.

As always, decision models are good for synthesizing the literature and identifying important aspects of medical decisions. And they suggest a rational basis for making medical decisions. As you point out, there are limitations to the level of complexity that can reasonably be accommodated in the model. It is our hope to continue to expand the model and to conduct clinical research to better define some of the influential factors in screening.

Mark Danese, MHS


Date: Thu, 27 Nov 1997
From: "C. Sitges Serra"

As a general practitioner I was quite surprised reading this paper. I would like to comment on two issues.

First: Talking of screening, we actually mean screening the whole population, and therefore we also must assume false positive and false negative lab results, with their costs included.

Second: In my experience, hypophyseal hormones seem to vary a lot in relation to number of hours slept, time of blood sample taking and, similar to prolactin, TSH may have a "fragile" blood level depending on catecholamines and other related hormones. In daily practice I see patients suffering from high TSH levels, with mild or no clinical hypothyroidism in whom determinations of TSH are repeated, and values are back to normal. Can't explain why.

Dr. C. Sitges (General Practitioner)

Nov. 26th. 1997. Barcelona. Spain.

December 1, 1997
Dr. Mark Danese, corresponding author, replies:

The problem with any screening situation is the sensitivity and specificity of the screening tool. In this particular case, the serum TSH is the gold standard; hence, sensitivity and specificity calculations are not possible. This does not mean that there are no false positives or negatives, only that there is no perfect method of assessment of mild degrees of thyroid gland failure. This lack of a perfect tool is one of the reasons for the rigorous follow-up testing including repeat TSH, anti-thyroid antibodies, lipids, and free T4.

We felt that in the absence of a gold standard, repeat TSH testing was necessary because it made TSH screening less favorable by increasing the cost of screening. We also assumed that once treated, patients would continue to be monitored and to remain on thyroxine replacement, also increasing costs. In the real world, if therapy was not helpful, it would be discontinued. Also, in a real clinical setting, it might be possible to reflex test, meaning that the T4 and antibodies would be run only if the repeat TSH was also high. Both of these aspects of screening, if implemented, would reduce the cost of testing. However, for publication purposes, we elected to bias our results against screening if there was any doubt about an approach.

In short, there are many ways of testing for mild thyroid failure, different from the algorithm we presented. We have heard many suggestions in the time since our article was published. None that we have tried was superior to the algorithm we proposed, but most were not dramatically different either. It is important to remember that the objective of our paper was not to optimize the screening procedure; rather, it was to determine if screening made sense.

The next step in the evolution of TSH screening is to see if we can improve the identification of patients with mild thyroid failure. Based on the data from our model, at least 25% of patients with elevated TSH levels may not benefit from treatment. If we can understand why, or if we can identify those who don't benefit, this could make screening even more favorable than it already is.

Mark Danese
