Table 17.2 - Evaluation Criteria and Standards

| Criterion | Definition | Standard |
|---|---|---|
| 1. Appropriateness | The match of the instrument to the purpose/question under study. One must determine what information is required and what use will be made of the information gathered (Wade 1992). | Depends upon the specific purpose for which the measurement is intended. |
| 2. Reliability | Refers to the reproducibility and internal consistency of the instrument. | Test-retest or interobserver reliability (ICC; kappa statistics) (Andresen 2000; Hsueh et al. 2001; Wolfe et al. 1991). |
| 3. Validity | Does the instrument measure what it purports to measure? Forms of validity include face, content, construct, and criterion. Concurrent, convergent or discriminative, and predictive validity are all considered forms of criterion validity. However, concurrent, convergent, and discriminative validity all depend on the existence of a "gold standard" to provide a basis for comparison; if no gold standard exists, they represent a form of construct validity in which the relationship to another measure is hypothesized (Finch et al. 2002). | Construct/convergent and concurrent correlations (Andresen 2000; McDowell & Newell; Fitzpatrick et al. 1998; Cohen et al. 2000). |
| 4. Responsiveness | Sensitivity to changes within patients over time, which may be indicative of therapeutic effects. Responsiveness is most commonly evaluated through correlation with other change scores, effect sizes, standardized response means, relative efficiency, sensitivity and specificity of change scores, and ROC analysis. Assessment of possible floor and ceiling effects is included, as they indicate limits to the range of detectable change beyond which no further improvement or deterioration can be noted. | Sensitivity to change. Excellent: evidence of change in the expected direction using methods such as standardized effect sizes. |
| 5. Precision | Number of gradations or distinctions within the measurement; for example, a yes/no response versus a 7-point Likert response set. | Depends on the precision required for the purpose of the measurement (e.g. classification, evaluation, prediction). |
| 6. Interpretability | How meaningful are the scores? Are there consistent definitions and classifications for results? Are there norms available for comparison? | Jutai and Teasell (2003) point out that these practical issues should not be separated from consideration of the values that underscore the selection of outcome measures. A brief assessment of practicality will accompany each summary evaluation. |
| 7. Acceptability | How acceptable the scale is in terms of completion by the patient; does it represent a burden? Can the assessment be completed by proxy if necessary? | |
| 8. Feasibility | Extent of effort, burden, expense, and disruption to staff/clinical care arising from the administration of the instrument. | |
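Several of the statistics named in the table (kappa for interobserver reliability; effect sizes and standardized response means for responsiveness) have simple closed forms. The following is a minimal illustrative sketch in Python, not taken from the chapter: the function names and the patient scores and rater judgements are hypothetical, chosen only to show how each quantity is computed.

```python
from statistics import mean, stdev

def effect_size(baseline, followup):
    # Standardized effect size: mean change divided by the SD of baseline scores.
    change = [f - b for b, f in zip(baseline, followup)]
    return mean(change) / stdev(baseline)

def standardized_response_mean(baseline, followup):
    # SRM: mean change divided by the SD of the change scores themselves.
    change = [f - b for b, f in zip(baseline, followup)]
    return mean(change) / stdev(change)

def cohens_kappa(rater_a, rater_b):
    # Cohen's kappa: agreement between two raters corrected for chance.
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_chance = sum((rater_a.count(c) / n) * (rater_b.count(c) / n)
                   for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical scale scores for five patients at admission and discharge.
pre = [40, 55, 38, 60, 47]
post = [52, 70, 41, 75, 58]
print(round(effect_size(pre, post), 2))                  # ~1.18
print(round(standardized_response_mean(pre, post), 2))   # ~2.28

# Hypothetical yes/no judgements from two observers on eight patients.
rater_a = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes"]
rater_b = ["yes", "no", "no", "yes", "no", "yes", "yes", "yes"]
print(round(cohens_kappa(rater_a, rater_b), 2))          # ~0.47
```

Note the design distinction the table implies: the effect size scales change against between-patient variability at baseline, while the SRM scales it against the variability of the change scores, so the two can rank the same instrument's responsiveness differently.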