The Measurement and Valuation of Health for Economics Research Paper

Introduction

The quality-adjusted life year (QALY) combines length of life and health-related quality of life (HRQOL) into a single measure. To put the ‘Q’ into a QALY requires an index for valuing health states. The increasing application of economic evaluation, and specifically the use of the incremental cost per QALY to assess cost effectiveness, has resulted in an enormous growth in the demand for health state values for use in decision-analytic models and clinical trials comparing alternative health-care interventions.
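
As an illustration only, the arithmetic behind an incremental cost per QALY comparison can be sketched in Python. The health profiles, costs, and function names below are hypothetical and are not taken from any study, and discounting of future costs and benefits is ignored for simplicity.

```python
def qalys(profile):
    """Total QALYs for a list of (years, utility) segments, undiscounted."""
    return sum(years * utility for years, utility in profile)


def icer(cost_new, qalys_new, cost_old, qalys_old):
    """Incremental cost per QALY gained by the new option over the old one."""
    gain = qalys_new - qalys_old
    if gain == 0:
        raise ValueError("no QALY difference, so the ratio is undefined")
    return (cost_new - cost_old) / gain


# Hypothetical profiles: (years lived, health state value on the 0 to 1 scale)
usual_care = [(4, 0.6)]
new_treatment = [(2, 0.7), (3, 0.8)]

q_old = qalys(usual_care)       # 4 * 0.6
q_new = qalys(new_treatment)    # 2 * 0.7 + 3 * 0.8
ratio = icer(12000, q_new, 5000, q_old)  # hypothetical costs of 12000 and 5000
print(round(q_old, 2), round(q_new, 2), round(ratio))
```

The sketch makes the role of the health state value explicit: every year of life is weighted by the value of the state in which it is spent, which is why the source and quality of those values matter for the resulting ratio.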

The range of tools for valuing health states has expanded considerably from the early notion of a health index in the United States (Fanshel and Bush, 1970) to the emergence of the EQ-5D in Europe (Brooks et al., 2003) and the Health Utilities Index (HUI) in Canada (Feeny et al., 2002). At the same time there have been important debates in the literature concerning the core issues of what to value (or how to define health), how to value it, and who should do the valuing. This research paper aims to provide the reader with an overview of these issues and then focuses on methods for valuing health (see Brazier et al., 2007).

This research paper begins by outlining these core issues. It then describes the main techniques for the direct valuation of health states, including the conventional cardinal techniques (like standard gamble) and their advantages and disadvantages, and ordinal methods (like ranking and pairwise choices) that are starting to be used to value health states. The paper then addresses the question of whose values to use and whether values should be based on preferences (as is usually the case in economics) or experiences. Finally, there is a brief review of the most widely used generic preference-based measures of health (such as EQ-5D, SF-6D, and HUI3).

The Core Questions To Address When Valuing Health States

To calculate QALYs it is necessary to represent health on a scale in which death and full health are assigned values of 0 and 1, respectively. Therefore, states rated as better than dead have values between 0 and 1 and states rated as worse than dead have negative scores that are in principle unbounded below. One of the most commonly used instruments for estimating the value of the ‘Q’ in the QALY is a generic preference-based measure of health called the EQ-5D (Brooks et al., 2003). This instrument has a structured health state descriptive system with five dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression (Table 1). Each dimension has three levels: no problem (level 1), moderate or some problem (level 2), and severe problem (level 3). Together these five dimensions define a total of 243 health states formed by different combinations of the levels (i.e., 3^5), and each state is described in the form of a five-digit code using the three levels (e.g., state 12321 means no problems in mobility, moderate problems in self-care, etc.). It can be administered to patients or their proxy using a short one-page questionnaire with five questions.
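
The structure of the descriptive system described above can be sketched in a few lines of Python. This is an illustrative sketch only: the names DIMENSIONS, LEVELS, all_states, and describe are hypothetical, and the level labels paraphrase the text rather than the official questionnaire wording.

```python
from itertools import product

# Dimension order follows the five-digit state code described in the text.
DIMENSIONS = ("mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression")
LEVELS = {1: "no problems", 2: "moderate or some problems", 3: "severe problems"}


def all_states():
    """Every five-digit code formed from levels 1 to 3 on each dimension."""
    return ["".join(str(level) for level in combo)
            for combo in product((1, 2, 3), repeat=len(DIMENSIONS))]


def describe(state):
    """Translate a five-digit code such as '12321' into a per-dimension description."""
    if len(state) != len(DIMENSIONS) or any(ch not in "123" for ch in state):
        raise ValueError(f"not a valid state code: {state!r}")
    return {dim: LEVELS[int(ch)] for dim, ch in zip(DIMENSIONS, state)}


states = all_states()
print(len(states))          # 3 ** 5 combinations
print(describe("12321"))
```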

The EQ-5D can be scored in a number of ways depending on the method of valuation and source country, but the most widely used to date is the UK York TTO Tariff shown in Table 2. This population value set was obtained using the time trade-off (TTO) method with a sample of about 3000 members of the UK general population; similar tariffs have been estimated for other countries, including the United States. Different valuation methods and the appropriateness of obtaining values from the general population are reviewed later in this research paper.
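
To show how a tariff of this kind turns a state code into a value, the following Python sketch applies an additive scoring rule. The decrements are the figures commonly reported for the UK TTO tariff and are quoted here as an assumption; they should be checked against Table 2 or the published value set before any use. The function and variable names are hypothetical.

```python
# Assumed decrements, as commonly reported for the UK TTO tariff (verify before use).
CONSTANT = 0.081   # subtracted once if any dimension is above level 1
N3 = 0.269         # subtracted once if any dimension is at level 3
DECREMENTS = (
    {2: 0.069, 3: 0.314},   # mobility
    {2: 0.104, 3: 0.214},   # self-care
    {2: 0.036, 3: 0.094},   # usual activities
    {2: 0.123, 3: 0.386},   # pain/discomfort
    {2: 0.071, 3: 0.236},   # anxiety/depression
)


def score(state):
    """Value of a five-digit EQ-5D state code under the assumed additive tariff."""
    levels = [int(ch) for ch in state]
    if len(levels) != 5 or any(level not in (1, 2, 3) for level in levels):
        raise ValueError(f"not a valid state code: {state!r}")
    if all(level == 1 for level in levels):
        return 1.0                          # full health
    value = 1.0 - CONSTANT
    for level, table in zip(levels, DECREMENTS):
        value -= table.get(level, 0.0)      # level 1 carries no decrement
    if 3 in levels:
        value -= N3
    return round(value, 3)


print(score("11111"), score("11112"), score("33333"))
```

Under these assumed decrements the worst state receives a negative value, which illustrates how a tariff can represent states regarded as worse than dead.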

The EQ-5D provides a useful starting point for the rest of this research paper, because it demonstrates the key features of any method for measuring and valuing health. Underpinning the EQ-5D and similar instruments are a number of core methodological questions: How should health be described, how should it be valued, and who should provide the values? The first part of the question concerns the aspects of health (and/or quality of life) that should be covered by the measure. The next part concerns the valuation technique that should be used. The EQ-5D has been valued using TTO and visual analogue scale (VAS). Other generic preference-based measures such as the HUI3 and SF-6D used the standard gamble (SG) method, which some have argued should be the gold standard method of valuation in this field. The last part of the question concerns the source of values, and whether they should be obtained from patients themselves, their carers and medical professionals, or members of the general population. The remainder of this research paper addresses these three questions. (For discussion on whether the QALY is an appropriate measure and how QALYs should be aggregated and used to inform health policy, see Brazier et al., 2007.)

How Should Health Be Described?

There are two broad approaches to describing health for deriving health state values. One is to construct a custom-made description of the condition and/or its treatment and the other is to use a standardized descriptive system (such as the EQ-5D). A bespoke description, sometimes referred to as a vignette in the literature, can take the form of a text narrative or a more structured description using a bullet point format. More recently researchers have begun to explore alternative narrative formats, such as the use of videos or simulators. The use of custom-made vignettes was more common in the early days of obtaining health state values; in recent years, however, the standardized descriptive systems have tended to dominate.

The other approach has been to use generic preference-based measures of health such as the EQ-5D. These have two components: the first is a system for describing health or its impact on quality of life using a standardized descriptive system, and the second is an algorithm for assigning values to each state described by the system. A health state descriptive system is composed of a number of multilevel dimensions that together describe a universe of health states (such as the EQ-5D described earlier). Generic instruments have been developed for use across all groups by focusing on core aspects of health.

Generic preference-based measures have become the most widely used and this stems from their ease of use, their alleged generic properties (i.e., validity across different patient groups), and their ability to meet a number of requirements of agencies such as the National Institute for Health and Clinical Excellence (NICE). Furthermore, they come ‘off the shelf,’ with a questionnaire and a set of weights for each health state defined by the classification already provided. The questionnaires for collecting the descriptive data can be readily incorporated into most clinical trials and routine data collection systems with little additional burden for respondents, and the valuation of their responses can be done easily using the scoring algorithms provided by the developers.

However, there are concerns about the sensitivity of the generics and their relevance for some conditions. As a result, there has been work to develop condition-specific descriptive systems (Brazier et al., 2007) and there continues to be an interest in using custom-made vignettes. This raises the question as to whether health state utilities derived from specific descriptive systems are generalizable. This is important for economic evaluations in which the purpose is often to inform resource allocation decisions across patient groups. Even if values are obtained using the same techniques and from similar populations, differences may persist due to preference interactions between dimensions in the descriptions and those outside the system. The impact of asthma on health state utility values, for example, may be altered by the presence of pain from a comorbid condition. Of course, this problem also exists for generic descriptive systems; it is just more likely to be a problem with specific systems. Ultimately it is a trade-off between the greater relevance and sensitivity of some specific systems and the limitations on generalizability.

There are also important issues about the appropriate conceptual basis for a descriptive system; some instruments cover quite narrowly defined aspects of impairment and symptomology associated with medical conditions, while others consider a higher level and broader conception of quality of life.

Valuation Techniques

To be used in economic evaluation, health state valuations need to be placed on a scale ranging from 0 to 1, where 0 is for states regarded as equivalent to dead and 1 is for a state of full health. Within the health state valuation process it is also necessary to allow for states that could be valued as worse than being dead. The three main techniques for valuing health states are the SG, the TTO, and the VAS. This section describes how each technique can be used to value chronic health states.

The Visual Analogue Scale (VAS)

The VAS is usually represented as a line with well-defined endpoints, on which respondents are able to indicate their judgments, values, or feelings (thus it is sometimes called a ‘feeling’ thermometer). The distances between intervals on a VAS should reflect an individual’s understanding of the relative differences between the concepts being measured. VAS is intended to have interval properties, so that the difference between 3 and 5 on a 10-point scale, for example, should equal the difference between 5 and 7.

In the health context, VAS has been widely used as a measure of symptoms and various domains of health, including the direct measurement of a patient’s own health, and as a means of valuing generic health state classifications including the Quality of Well-Being scale (QWB) (Kaplan and Anderson, 1988), the HUI (Feeny et al., 2002) and the EQ-5D (Brooks et al., 2003). Figure 1 presents an example of the VAS developed by the EuroQol Group. VAS can also be used to elicit the value attached to states considered worse than death and to temporary health states (e.g., those lasting for a specified period of time after which there is a return to good health, in contrast to chronic health states, which are assumed to last for the rest of a person’s life).

The Standard Gamble (SG)

The SG comes from expected utility theory, which postulates that individuals choose between prospects – for example, different ways of managing a medical condition – in such a way as to maximize their ‘expected’ utility. The SG method gives the respondent a choice between a certain intermediate outcome and the uncertainty of a gamble with two possible outcomes, one of which is better than the certain intermediate outcome and one of which is worse. The SG task for eliciting the value attached to health states considered better than dead is displayed in Figure 2. The respondent is offered two alternatives. Alternative 1 is a treatment with two possible outcomes: either the patient is returned to normal health and lives for an additional t years (probability = P), or the patient dies immediately (probability = 1 − P). Alternative 2 has the certain outcome of chronic state h_i for life (t years). The probability P of the best outcome is varied until the individual is indifferent between the certain intermediate outcome and the gamble. This probability P is the utility for the certain outcome, state h_i. This technique is then repeated for all intermediate outcomes. The SG can also be modified to elicit the value attached to health states considered worse than death and temporary health states.
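
The logic of the SG task can be sketched with a simulated respondent. With full health valued at 1 and immediate death at 0, the expected utility of the gamble is P × 1 + (1 − P) × 0 = P, so the probability at which the respondent is indifferent equals the utility of state h_i. In the Python sketch below, the bisection search, the tolerance, the function names, and the respondent’s underlying value of 0.75 are all assumptions made for illustration; actual studies use a variety of search procedures.

```python
def expected_utility_of_gamble(p, u_full=1.0, u_dead=0.0):
    """Expected utility of the gamble: full health with probability p, death otherwise."""
    return p * u_full + (1 - p) * u_dead


def sg_bisection(prefers_gamble, tol=1e-3):
    """Narrow down the indifference probability by repeatedly halving the interval."""
    low, high = 0.0, 1.0
    while high - low > tol:
        p = (low + high) / 2
        if prefers_gamble(p):
            high = p    # gamble is attractive at this p, so offer a lower chance of success
        else:
            low = p     # certain state is preferred (or equal), so offer a higher chance
    return (low + high) / 2


# Simulated respondent who behaves as an expected utility maximizer (an assumption).
TRUE_VALUE = 0.75   # hypothetical underlying utility of state h_i


def respondent_prefers_gamble(p):
    return expected_utility_of_gamble(p) > TRUE_VALUE


estimate = sg_bisection(respondent_prefers_gamble)
print(round(estimate, 2))
```

A respondent who does not behave as an expected utility maximizer, for example because of risk attitude, would yield an indifference probability that differs from the underlying value of the state.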

The SG technique has been widely applied in the decision-making literature and has also been extensively applied to medical decision making, including the valuation of health states, in which it has been used (indirectly via a transformation of VAS) to value the HUI2 and HUI3 (Torrance et al., 1996; Feeny et al., 2002) and to directly value SF-6D (Brazier et al., 2002) and a number of condition-specific health state scenarios or vignettes (Brazier et al., 2007). There are many variants of the SG technique that differ in terms of the procedure used to identify the point of indifference, the use of props, and the method of administration (e.g., by interviewer, computer, or self-administered paper questionnaire).

The Time Trade-Off (TTO)

The TTO technique was developed specifically for use in health care in an effort to overcome the problems associated with SG in explaining probabilities to respondents. TTO asks respondents to choose between two alternatives of certainty rather than between a certain outcome and a gamble with two possible outcomes. The application of TTO to a chronic state considered better than dead is illustrated in Figure 3.

The approach involves presenting individuals with a paired comparison. For a chronic health state preferred to death, alternative 1 involves living for period t in a specified but less than full health state (state h_i). Alternative 2 involves full health for time period x where x < t. Time x is varied until the respondent is indifferent between the two alternatives. The score given to the less than full health state is then x/t. The TTO task can be modified to consider chronic health states considered worse than death and temporary health states. In common with SG, there are numerous variants of TTO using different elicitation procedures, props (if any), and modes of administration.
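
The scoring rule for a chronic state preferred to death is simple enough to state as a short Python function. The function name and the example durations are hypothetical, and the sketch deliberately does not cover states worse than death or temporary states, which require modified tasks.

```python
def tto_value(x, t):
    """TTO score for a chronic state preferred to death.

    x: years in full health judged equivalent to t years in the impaired state.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    if not 0 <= x <= t:
        raise ValueError("x must lie between 0 and t for a state preferred to death")
    return x / t


# Hypothetical response: 7 years in full health is judged equivalent to 10 years in state h_i.
value = tto_value(7, 10)
print(value)
```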

Pros And Cons Of Valuation Techniques

Visual Analogue Scales (VAS)

VAS achieves high response rates and high levels of completion. VAS methods tend to be less expensive to administer than TTO or SG methods due to their relative simplicity and ease of completion. There is also a significant amount of empirical evidence to demonstrate the reliability of VAS methods in terms of inter-rater reliability and test–retest reliability. However, the lack of choice and direct nature of the VAS tasks have given rise to concerns over the ability of this technique to reflect preferences on an interval scale.

There is also a concern that VAS methods are susceptible to response spreading, whereby respondents use all areas on the valuation scale when responding, especially where multiple health states are valued on the same scale. Response spreading can lead to health states that are very much alike being placed at some distance from one another on a valuation scale, and health states that are vastly different being placed very close to one another, as the respondent seeks to place responses across the whole (or a specific portion) of the available scale. If response spreading does occur, then this implies that VAS techniques do not generate an interval scale and the numbers obtained may not be meaningful in cardinal terms.

More generally, VAS is prone to context effects in which the average rating for items is influenced by the level of other items being valued and by endpoint bias whereby health states at the top and bottom of the scale are placed further apart on the scale than would be suggested by a direct comparison of differences.

In summary, VAS techniques appear to measure aspects of health status changes rather than the satisfaction or benefit conveyed by such changes. Qualitative evidence of respondents seeing VAS methods as an expression of numbers in terms of ‘percentages of the best imaginable state,’ or a ‘percentage of functioning scale’ rather than eliciting information about their preferences for health states provides support for this hypothesis. There is a large body of evidence to suggest that unadjusted VAS scores do not provide a valid measure of the strength of preference that can be used in economic evaluation.

Given the evidence that VAS may not produce health state utilities that can be used directly in the calculation of QALYs, there has been interest in mapping VAS values to SG or TTO utility values. This has the advantage of retaining the ease of use of VAS with the theoretical advantages of a choice-based measure of health. However, the extent to which a stable mapping function can be found between VAS and SG or TTO has been disputed (Stevens et al., 2006).

Standard Gamble

Many SG studies, across different respondent groups, have reported completion rates in excess of 80%, with some studies reporting completion rates as high as 95–100%, indicating that the SG appears to be acceptable in terms of its practicality. The SG has also been found to be feasible and acceptable among varied types of patient groups and clinical areas including cancer, transplantation, vascular surgery, and spinal problems.

SG is rooted in expected utility theory (EUT). EUT has been the dominant theory of decision making under uncertainty for over half a century. EUT postulates that individuals choose between prospects (such as different ways of managing a medical condition) in such a way as to maximize their ‘expected’ utility. According to this theory, for a given prospect such as having a surgical operation, a utility value is estimated for each possible outcome, good or bad. These values are multiplied by their probability of occurring and the results summed to calculate the expected utility of the prospect. This procedure is undertaken for each prospect being considered. The key assumption made by EUT over and above conventional consumer theory is independence, which means that the value of a given outcome is independent of how it was arrived at or its context. In decision tree analysis this is the equivalent of saying that the value of one branch of the tree is unaffected by the other branches.
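
The calculation described above, multiplying each outcome’s utility by its probability and summing, can be sketched as follows. The prospects, probabilities, and utilities are hypothetical numbers chosen only to illustrate the arithmetic, and the function name is an assumption.

```python
def expected_utility(prospect):
    """Probability-weighted sum of utilities for a list of (probability, utility) outcomes."""
    total_probability = sum(p for p, _ in prospect)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * u for p, u in prospect)


# Hypothetical prospects: each outcome is (probability, utility on the 0 to 1 scale).
prospects = {
    "surgery": [(0.90, 0.95), (0.08, 0.60), (0.02, 0.0)],   # success, complication, death
    "watchful waiting": [(1.0, 0.80)],                        # a certain intermediate state
}

values = {name: expected_utility(outcomes) for name, outcomes in prospects.items()}
best = max(values, key=values.get)
print(values, best)
```

Because each branch contributes only its own probability times its own utility, the sketch also embodies the independence assumption: changing one branch leaves the contribution of the others untouched.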

Due to its theoretical basis, the SG is often portrayed as the classical method of decision making under uncertainty, and due to the uncertain nature of medical decision making the SG is often classified as the gold standard. As medical decisions usually involve uncertainty the use of the SG method would seem to have great appeal. However, the type of uncertain prospect embodied in the SG may bear little resemblance to the uncertainties in various medical decisions, so this feature may be less relevant than others have suggested.

The status of SG as the gold standard has been criticized given the existence of ample evidence that the axioms of EUT are violated in practice. One response in health economics (as elsewhere) has been that EUT should be seen as a normative rather than a descriptive theory, that is, it suggests how decisions should be made under conditions of uncertainty. However, this still does not alter the concern that the values generated by SG do not necessarily represent people’s valuation of a given health state, but incorporate other factors, such as risk attitude, gambling effects, and loss aversion.

Time Trade-Off

The TTO technique is a practical, reliable, and acceptable method of health state valuation as evidenced by the wide variety of empirical studies that have applied this method (Brazier et al., 2007). The TTO has been mainly interviewer-administered, although it has also been used in self-administered and computer-based applications.

The applicability of the TTO in medical decision making may be questioned because the technique asks respondents to make a choice between two certain outcomes, when health care is characterized by conditions of uncertainty. It is potentially possible to adjust TTO values to incorporate individuals’ attitudes to risk and uncertainty, though this is rarely done. Furthermore, adjusting for risk attitude is difficult when there are strong theoretical and empirical grounds for arguing there is not a constant attitude to risk.

An underlying assumption of the TTO method is that individuals are prepared to trade off a constant proportion of their remaining life years to improve their health status, irrespective of the number of years that remain. This is a very strong assumption and it seems reasonable to expect that the valuation of a health state may be influenced by a duration effect relating to the time an individual spends in that state. There may be a ‘maximal endurable time’ for some severe health states beyond which they yield negative utility. Furthermore, for short survival periods, individuals may not be willing to trade survival time (measured in life years) for an improvement in quality of life, implying that individuals’ preferences are lexicographic for short time durations. If individuals do not trade off a constant proportion of their remaining life expectancy in the valuation of health states, then values elicited using specific time durations (e.g., 10 years) cannot be assumed to hold for states lasting for different time periods.

The impact of ‘time preference’ on valuations is another issue that causes theoretical concerns with the TTO. If individuals have a positive rate of time preference they will give greater value to years of life in the near future than to those in the distant future. Alternatively respondents may prefer to experience an episode of ill health immediately to eliminate ‘dread’ and move on. For instance, this hypothesis may explain why some women with a family history of breast cancer opt for mastectomy before any breast cancer is detected. In practice, the majority of individuals exhibit positive time preferences for health, although empirically the validity of the traditional (constant) discounting model in health has been challenged in favor of a model that allows for decreasing time aversion (implying that the longer the period of delay for the onset of ill health, the lower the discount rate). TTO values are rarely corrected for time preference.
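
A rough sense of how time preference could affect a TTO value can be obtained from a small sketch. It assumes constant exponential discounting in continuous time, which is exactly the traditional model that the text notes has been challenged, so it should be read as an illustration of the direction of the effect rather than as a recommended correction. The function names, the 3.5% rate, and the example response are hypothetical.

```python
import math


def discounted_years(years, rate):
    """Present value of a stream of `years` at a constant continuous discount rate."""
    if rate == 0:
        return float(years)
    return (1 - math.exp(-rate * years)) / rate


def tto_value(x, t, rate=0.0):
    """TTO value when x years in full health are judged equivalent to t years in the state.

    With rate = 0 this reduces to the usual x / t.
    """
    if t <= 0 or not 0 <= x <= t:
        raise ValueError("require t > 0 and 0 <= x <= t")
    return discounted_years(x, rate) / discounted_years(t, rate)


raw = tto_value(5, 10)                    # unadjusted score
adjusted = tto_value(5, 10, rate=0.035)   # same response, assuming positive time preference
print(raw, round(adjusted, 3))
```

Under this assumed model the later years given up in the trade count for less than the earlier ones, so the adjusted value lies above the raw x/t score; with a different discounting model the size of the gap would differ.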

Which Valuation Technique Should Be Used?

Health economists have tended to favor the choice-based scaling methods of SG and TTO in the context of cost per QALY analysis and a choice-based method is also recommended by NICE. Each of the SG and TTO methods starts with the premise that health is an important argument in an individual’s utility function. The welfare change associated with a change in health status can then be determined by the compensating change required in one of the remaining arguments in the individual’s utility function that leaves overall utility unchanged. In the SG, the compensating change is valued in terms of the risk of immediate death. In the TTO, the compensating change is valued in terms of the amount of life expectancy an individual is prepared to sacrifice.

SG has the most rigorous theoretical foundation, in the form of the EUT account of decision making under uncertainty. However, there are theoretical arguments against the use of SG in health state valuation and there is little empirical support for EUT. There are also concerns about the empirical basis of the TTO technique, in particular that duration effects and time preference effects can have an impact on the elicitation of TTO values.

In summary, there are theoretical concerns with all three valuation techniques. We argue that unadjusted VAS values do not provide a valid basis for estimating preferences over health states and satisfactory adjustments remain elusive. For trade-off-based valuations from an individual perspective, the current choice is between SG and TTO, but for the reasons outlined above the values they generate are distorted by factors apart from preferences over health states, and currently there is no compelling basis on which to select one or the other. This is one reason why researchers in the field have begun to examine the potential role of ordinal techniques such as ranking and discrete choice experiments (DCEs) in health state valuation.

The Use Of Ordinal Techniques

Ordinal methods simply ask respondents to say whether they prefer one state to another but not by how much. Two well-known ordinal methods are ranking and pairwise comparisons. Typically, with ranking, respondents are asked to rank a set of health states from best to worst. The pairwise comparison limits the comparison to two states and has been used widely in DCEs in health economics, though not usually to value health per se. To use them to value health states it is necessary to include full health and death in the comparisons or to introduce valuations for these states from other sources (e.g., Ratcliffe et al., 2006).

Until recently the use of ordinal data in health state valuation has largely been ignored. Ranking exercises have traditionally been included in health state valuation studies as a warm-up procedure prior to the main cardinal method to familiarize the respondent with the set of health states to be valued and with the task of preference elicitation between health states. Often these data may not be used at all in data analysis, or they may be used to check consistency between the ordinal ranking of health states and the ranking of health states according to their actual values obtained using a standard elicitation technique (e.g., TTO or SG). Thurstone’s law of comparative judgment offers a potential theoretical basis for deriving cardinal values from rank preference data. Thurstone’s method considers the proportion of times that one health state (A) is considered worse than another health state (B). The preferences over the health states represent a latent cardinal utility function and the likelihood of health state A being ranked above health state B when health state B is actually preferred to health state A is a function of how close to each other the states lie on this latent utility function.
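
One simple way of turning paired proportions into a latent scale can be sketched as follows. The sketch assumes the common simplification of Thurstone’s model in which the differences between states are normally distributed with equal variance (often called Case V); this simplification, the function name, and the proportions are assumptions for illustration and are not drawn from any study cited here.

```python
from statistics import NormalDist


def thurstone_scale(pref):
    """Latent scale values from paired proportions.

    pref[a][b] is the proportion of respondents preferring state a to state b.
    Proportions must lie strictly between 0 and 1, otherwise the inverse normal
    is undefined.
    """
    inverse_normal = NormalDist().inv_cdf
    n = len(pref)
    raw = {}
    for state, row in pref.items():
        # Average the z-scores; the comparison of a state with itself counts as 0.
        raw[state] = sum(inverse_normal(p) for p in row.values()) / n
    lowest = min(raw.values())
    return {state: value - lowest for state, value in raw.items()}


# Hypothetical proportions for three states.
pref = {
    "A": {"B": 0.80, "C": 0.95},
    "B": {"A": 0.20, "C": 0.70},
    "C": {"A": 0.05, "B": 0.30},
}

scale = thurstone_scale(pref)
print({state: round(value, 2) for state, value in scale.items()})
```

The resulting numbers sit on an arbitrary latent scale; they carry information about relative distances between states but still need to be anchored to full health and death before they can be used to calculate QALYs.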

Salomon (2003) used conditional logistic regression to model rank data from the UK measurement and valuation of health (MVH) valuation of the EQ-5D. He was able to estimate a model equivalent to the original TTO model by rescaling the worst state using the observed TTO value. Other methods of rescaling were also considered, including normalization to produce a utility of 0 for death, but these were found not to provide the best-fitting predictions.

DCEs have their theoretical basis in random utility theory. Although DCEs have become a very popular tool for eliciting preferences in health care, the vast majority of published studies using DCE methodology have tended to focus on the possibility that individuals derive benefit from nonhealth outcomes and process attributes in addition to health outcomes. A limited number of studies have used DCEs to estimate values for different health state profiles and few have linked these values to the full health–dead scale required for the calculation of QALYs. Ratcliffe et al. (2006) used an external valuation of the worst state of health defined by the classification (i.e., the PITS state), obtained by TTO, to recalibrate the results of a DCE onto the conventional 0 to 1 scale. Brazier et al. (2007) used DCE data on their own by setting death to 0 and including death in some of the comparisons. The use of DCE data for this purpose, and to a lesser extent ranking data, is at an early stage of development; however, it offers promise as an alternative to cardinal methods.
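
The two anchoring strategies mentioned above, using an external value for the worst state or setting death to 0, both amount to a linear rescaling of latent values. The Python sketch below is one simple way to express that idea and is not the procedure of any particular study; the latent values, the external value of −0.2, and the function name are hypothetical.

```python
def rescale(latent, full_health, anchor, anchor_value):
    """Linearly map latent values so that full health = 1 and the anchor = anchor_value."""
    span = latent[full_health] - latent[anchor]
    if span <= 0:
        raise ValueError("the anchor must lie below full health on the latent scale")
    return {
        state: 1 - (1 - anchor_value) * (latent[full_health] - value) / span
        for state, value in latent.items()
    }


# Hypothetical latent values from a ranking or DCE model.
latent = {"11111": 2.0, "22222": 0.5, "33333": -1.0, "dead": -0.5}

# Option 1: anchor the worst state at an externally obtained value (hypothetical -0.2).
by_worst_state = rescale(latent, "11111", "33333", anchor_value=-0.2)

# Option 2: anchor death at 0, which requires death to appear in the comparisons.
by_death = rescale(latent, "11111", "dead", anchor_value=0.0)

print(round(by_worst_state["22222"], 3), round(by_death["22222"], 3))
```

The two options generally give different values for the same state, which is one reason the choice of anchoring method matters.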

The Impact Of Different Variants Of The Valuation Techniques

Although the academic literature has tended to focus on the most appropriate technique for valuation, it is important to remember that there are many variants of each technique and these too may have important implications. Techniques vary in terms of their mode of administration (e.g., interview or self-completion, computer or paper administration), search procedures (e.g., iteration, titration, or open-ended), the use of props and diagrams, time allowed for reflection, and individual versus group interviews. There have been few publications in the health economics literature comparing these alternatives, but what evidence there is suggests that health state values vary considerably between variants of the same technique (Brazier et al., 2007).

There is evidence that the wording of questions affects the answers. This finding has a number of implications. To cite two examples, first, it demonstrates the importance of using a common variant to ensure comparability between studies. Second, there might be scope for correcting for some of these differences if they prove to be systematic. However, there has been little of this work to date.

More fundamentally, this evidence suggests that people do not have well-defined preferences over health prior to the interview, but rather their preferences are constructed during the interview. This would account for the apparent willingness of respondents to be influenced by the precise framing of the question. This may be a consequence of the cognitive complexity of the task. Evidence from the psychology literature suggests that respondents faced with such complex problems tend to adopt simple-decision heuristic strategies (Lloyd, 2003). Much of the interview work has been done using cold-calling techniques. There is a strong case for allowing respondents more time to learn the techniques, to ensure they understand them fully, and to allow them more time to reflect on their health state valuations. An implication may be to move away from the current large-scale surveys of members of the general public involving one-off interviews, to smaller-scale studies of panels of members of the general public who are better trained and more experienced in the techniques and who are given time to fully reflect on their valuations (Stein et al., 2006).

Who Should Value Health?

Values for health could be obtained from a number of different sources including patients, their carers, health professionals, and the community. Health state values are usually obtained from members of the general public trying to imagine what the state would be like, but in recent years the main criticism of this source has come from those who believe values should be obtained from patients.

The choice of whose values to elicit is important, as it may influence the resulting values. A number of empirical studies have been conducted that indicate that patients with firsthand experience tend to place higher values on dysfunctional health states than do members of the general population who do not have similar experience, and the extent of this discrepancy tends to be much stronger when patients value their own health state. There are a number of possible contributing factors for observed differences between patient and general population values including poor descriptions of health states (for the general population), use of different internal standards, or response shift and adaptation.

Why Use General Population Values?

The main argument for the use of general population values is that the general population pays for the service. However, while members of the general population want to be involved in health-care decision making, it is not clear that they want to be asked to value health states specifically. At the very least, it does not necessarily imply the current practice of using relatively uninformed general population values.

Why Use Patient Values?

A common argument for using patient values is that patients understand the impact of their health on their well-being better than someone trying to imagine it. However, this requires a value judgment that society wants to incorporate all the changes and adaptations that occur in patients who experience states of ill health over long periods of time. It can be argued that some adaptation may be regarded as laudable, such as skill enhancement and activity adjustment, whereas cognitive denial of functional health, suppressed recognition of full health, and lowered expectations may be seen as less commendable. Furthermore, there may be a concern that patient values are context based, reflecting their recent experiences of ill health and the health of their immediate peers. In addition, there are practical problems in asking patients to value their own health, many of whom will by definition be quite unwell.

Finally, to obtain values on the conventional 0 to 1 scale required for QALYs, valuation techniques require patients to compare their existing state to full health, which they may not have experienced for many years. For patients who have lived with a chronic condition such as chronic obstructive pulmonary disease or osteoarthritis, for example, the task of imagining full health is as difficult as it is for a healthy member of the general population to imagine a poor health state.

A Middle Way – Further Research

It has been argued that it seems difficult to justify the exclusive use of patient values or the current practice of using values from relatively uninformed members of the general population. Existing generic preference-based measures already take some account of adaptation and response shift in their descriptive systems, but whether this is sufficient is ultimately a normative judgment. If it is accepted that the values of the general population are required to inform resource allocation in a public system, it might be argued that respondents should be provided with more information on what the states are like for patients experiencing them.

Generic Preference-Based Measures Of Health

Description Of Instruments

Generic preference-based measures have become one of the most widely used sets of instruments for deriving health state values. As already described, generic preference-based measures have two components: the first is a descriptive system that defines states of health, and the second is an algorithm for scoring these states. The number of generic preference-based measures has proliferated over the last two decades. These include the QWB scale; Rosser Classification of illness states; HUI Marks 1, 2, and 3 (HUI1, HUI2, and HUI3); EQ-5D; 15D; SF-6D, a derivative of the SF-36 and SF-12; and AQOL (see Brazier et al., 2007 for further details of each of these instruments and their developers). This list is not complete and does not account for some of the variants of these instruments, but it includes the vast majority of those that have been used.
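The two components can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the three dimensions, the three levels, the decrements, and the simple additive scoring rule are assumptions chosen for illustration and do not reproduce any of the instruments listed above or their published scoring algorithms.

```python
from itertools import product

# Hypothetical descriptive system: three dimensions, each with three levels
# (1 = no problems, 3 = severe problems). Not any published instrument.
DIMENSIONS = ("mobility", "pain", "mood")
LEVELS = (1, 2, 3)

# Hypothetical decrements from full health (1.0). Illustrative numbers only,
# not taken from any valuation study.
DECREMENTS = {
    "mobility": {1: 0.0, 2: 0.10, 3: 0.30},
    "pain": {1: 0.0, 2: 0.15, 3: 0.35},
    "mood": {1: 0.0, 2: 0.10, 3: 0.25},
}


def all_states():
    """Enumerate every health state the descriptive system defines."""
    return [
        dict(zip(DIMENSIONS, combo))
        for combo in product(LEVELS, repeat=len(DIMENSIONS))
    ]


def score_state(state):
    """Score a state as 1 minus the sum of its dimension decrements."""
    for dim in DIMENSIONS:
        if state.get(dim) not in LEVELS:
            raise ValueError(f"invalid or missing level for {dim!r}")
    total_decrement = sum(DECREMENTS[dim][state[dim]] for dim in DIMENSIONS)
    return round(1.0 - total_decrement, 3)


if __name__ == "__main__":
    print(len(all_states()), "states defined")
    print(score_state({"mobility": 2, "pain": 3, "mood": 1}))
```

The descriptive system fixes how many states exist (here 3 x 3 x 3), while the scoring rule assigns each one a value on the scale anchored at 1 for full health. Real instruments differ on both counts, and several use non-additive scoring functions.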

While these measures all claim to be generic, they differ considerably in terms of the content and size of their descriptive system, the methods of valuation, and the populations used to value the health states (though most aim for a general population sample). A summary of the main characteristics of these seven generic preference-based measures of health is presented in Table 2 and Table 3. Table 2 summarizes the descriptive content of these measures, including their dimensions and dimension levels. Each instrument has a questionnaire, for completion by the patient (or proxy) or for administration by interview, that is used to assign the respondent a health state from the instrument’s descriptive system. These questions are mainly designed for adults, typically 16 or older, although the HUI2 is designed for children. Table 3 summarizes the valuation methods used in terms of the valuation technique and the method of modeling the preference data.


Comparison Of Measures

The agreement between measures was generally found to be poor to moderate (about 0.3–0.5 as measured by the intraclass correlation coefficient). Whereas differences in mean scores have often been found to be little more than 0.05 between SF-6D, EQ-5D, and HUI3, this mean statistic masks considerable differences in the distribution of scores.

Given the differences in coverage of the dimensions and the different methods used to value the health states, it is not surprising that the measures have been found to generate different values. The choice of generic measure has been a point of some contention, since the respective instrument developers have academic and in some cases commercial interests in promoting their own measure. The recommended approach to instrument selection has been to compare their practicality, reliability, and validity (Brazier et al., 2007). Although all these instruments are practical to use and achieve good levels of reliability, the issue of validity has been more contentious and difficult to prove.

Validity can be broken down into the validity of the descriptive system of the instrument, the validity of the methods of valuation, and the empirical validity of the scores generated by the instrument. In terms of methods of valuation, the QWB and the 15D would be regarded by many health economists as inferior to the other preference-based measures due to their use of VAS to value the health descriptions. HUI2 and HUI3 would be preferred to the EQ-5D by those who regard the SG as the gold standard, but this is not a universally held view in health economics. A further complication is that the SG utilities for the HUIs have been derived from VAS values using a power transformation that has been criticized in the literature (Stevens et al., 2006). There is evidence in terms of descriptive validity that some measures perform better for certain conditions than others (Brazier et al., 2007); however, there are no measures that have been shown to be better across all conditions. The validity of the descriptive system relates to the condition and treatment outcomes associated with the treatment being evaluated.
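To make the idea of a power transformation concrete, the Python sketch below assumes one functional form in which disutility on the SG scale equals disutility on the VAS scale raised to a power. Both the functional form and the exponent used in the example are assumptions for illustration; the source does not give the function or the parameter values used for the HUIs.

```python
def vas_to_sg(vas_value, exponent):
    """Map a VAS value on [0, 1] to an SG-style utility on [0, 1].

    Assumed form: 1 - u = (1 - v) ** exponent, where v is the VAS value
    and u the resulting utility. The exponent is a free parameter here.
    """
    if not 0.0 <= vas_value <= 1.0:
        raise ValueError("vas_value must lie between 0 and 1")
    if exponent <= 0:
        raise ValueError("exponent must be positive")
    return 1.0 - (1.0 - vas_value) ** exponent


if __name__ == "__main__":
    # Hypothetical exponent of 2.0, chosen only to show the shape.
    for v in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(v, vas_to_sg(v, exponent=2.0))
```

With any exponent above 1, this form leaves the end points unchanged and lifts every intermediate value, so the converted utilities sit above the raw VAS values. Whether a single curve of this kind describes the relationship between the two techniques adequately is the point on which the transformation has been criticized.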

Conclusions

This research paper has described the key features of the instruments available for estimating health state values for calculating QALYs. It has shown the large array of methods available for deriving health state values. This richness in methods comes at a price, because the analyst, and perhaps more importantly the policy maker, must decide on the methods to use for informing resource allocation decisions. Some of the issues raised can be resolved by technical means, using theory (such as the use of VAS) or empirical evidence (such as the descriptive validity of different generic measures). Others require value judgments about the appropriateness of using general population instead of patient values.

For policy makers wishing to make cross-program decisions, the Washington Panel on Cost-Effectiveness in Health and Medicine and some other public agencies (such as NICE) have introduced the notion of a reference case that specifies one or other of the (usually generic) measures as the default. Given that more than one measure is likely to be used for the foreseeable future, and perhaps for good reason, there is a need for further research to focus on mapping or ‘cross walking’ between measures.
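As a rough indication of what mapping between measures could involve, the Python sketch below fits a straight line by ordinary least squares to paired scores from two hypothetical measures and uses it to predict one from the other. The linear form, the function names, and the data are all assumptions for illustration; an actual mapping exercise would need a considered model specification and validation on independent data.

```python
def fit_linear_mapping(source_scores, target_scores):
    """Fit target = intercept + slope * source by ordinary least squares."""
    n = len(source_scores)
    if n != len(target_scores) or n < 2:
        raise ValueError("need at least two paired observations")

    mean_x = sum(source_scores) / n
    mean_y = sum(target_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in source_scores)
    if sxx == 0:
        raise ValueError("source scores must vary")
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(source_scores, target_scores)
    )

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, source_score):
    """Predict a score on the target measure from a source-measure score."""
    return intercept + slope * source_score


if __name__ == "__main__":
    # Invented paired scores from respondents who completed both measures.
    source = [0.30, 0.45, 0.60, 0.75, 0.90]
    target = [0.42, 0.50, 0.66, 0.71, 0.88]
    intercept, slope = fit_linear_mapping(source, target)
    print(intercept, slope, predict(intercept, slope, 0.5))
```

A fitted line of this kind predicts the mean target score for a given source score, so it compresses the spread of predicted values relative to observed ones; this is one reason agreement in mean scores alone is a weak test of a mapping.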

Bibliography:

Brazier J, Roberts J, and Deverill M (2002) The estimation of a preference-based single index measure for health from the SF-36. Journal of Health Economics 21(2): 271–292.
Brazier J, Ratcliffe J, Salomon J, and Tsuchiya A (2007) Measuring and Valuing Health Benefits for Economic Evaluation. Oxford, UK: Oxford University Press.
Brazier J, Murray C, Roberts J, Brown M, Symonds T, and Kelleher C (in press) Estimation of a preference-based index from a condition specific measure: The King’s Health Questionnaire. Medical Decision Making.
Brooks R, Rabin RE, and de Charro FTH (eds.) (2003) The Measurement and Valuation of Health Status Using EQ-5D: A European Perspective. The Netherlands: Kluwer Academic Press.
Fanshel S and Bush J (1970) A health status index and its application to health service outcomes. Operations Research 18: 1021–1066.
Feeny D, Furlong W, Torrance G, et al. (2002) Multiattribute and single attribute utility functions for the Health Utilities Index Mark 3 system. Medical Care 40: 113–128.
Kaplan RM and Anderson JP (1988) A general health policy model: Update and applications. Health Services Research 23(2): 203–235.
Lloyd AJ (2003) Threats to the estimation of benefit: Are preference elicitation methods accurate? Health Economics 12(5): 393–402.
Ratcliffe J, Brazier JE, Tsuchiya A, Symonds T, and Brown M (2006) Estimation of a preference based single index from the sexual quality of life questionnaire (SQOL) using ordinal data. Health Economics and Decision Science Discussion Paper 06/06, ScHARR, University of Sheffield. http://www.sheffield.ac.uk/scharr/sections/heds/discussion.html (accessed September 2007).
Salomon JA (2003) Reconsidering the use of rankings in the valuation of health states: A model for estimating cardinal values from ordinal data. Population Health Metrics 1(1): 12.
Stein K, Ratcliffe J, Round A, Milne R, and Brazier J (2006) Impact of discussion on preferences elicited in a group setting. Health and Quality of Life Outcomes 4(1): 22.
Stevens K, McCabe C, and Brazier J (2006) Mapping between Visual Analogue Scale and Standard Gamble data: Results from the UK Health Utilities Index 2 valuation survey. Health Economics 15(5): 527–533.
Torrance G, Feeny D, Furlong W, Barr R, Zhang Y, and Wang Q (1996) Multiattribute utility function for a comprehensive health status classification system. Health Utilities Index Mark 2. Medical Care 34: 702–722.