Table 17.31 Characteristics of the Medical Outcomes Study Short Form 36

 
Criterion Evidence
Reliability

Test-Retest Reliability: Brazier et al. (1992) (varying etiologies) calculated correlation coefficients ranging from 0.6 (social functioning) to 0.81 (physical functioning). Mean differences ranged from 0.15 (social functioning) to 0.71 (mental health), with 91-98% of cases falling within the 95% CI (constructed as per Bland & Altman). Lower values were reported in a stroke population, ranging from 0.28 (mental health) to 0.80 (social functioning), and substantial variability in individual responses was reported, particularly for emotional role limitations (Dorman et al., 1998). Brazier et al. (1996) reported r=0.28 (social functioning) to 0.70 (vitality) over a retest period of 6 months, while Andresen et al. (1999) (elderly) reported ICCs ranging from .052 (social functioning) to 0.80 (mental health), with ICC=0.82 for physical summary scores and ICC=0.79 for mental summary scores. Values were r=0.79 and 0.78 (p<0.001) for the Mental Component Score (MCS) and Physical Component Scale (PCS), respectively, with the test taken at 6 months post-injury and again 2-3 weeks later (Dikmen et al., 2001; TBI).

Internal Consistency: Brazier et al. (1992) reported α ≥ 0.80 for all subscales except social functioning (α=0.73). Reliability coefficients were 0.74 (social functioning) to 0.93 (physical functioning), and Anderson et al. (1996) reported α of 0.6 (vitality) to 0.9 (physical functioning, bodily pain and role limitations-emotional). Brazier et al. (1996) (elderly) reported α ≥ 0.80 for all but 4 subscales, including social functioning (0.56) and general health (0.66), while inter-item correlations were ≥0.73 with the exception of social functioning (0.56) and general health (0.66). Essink-Bot et al. (1997) (varying etiologies) reported α=0.76 (general health) to 0.91 (physical functioning). Hobart et al. (2002) (stroke) found α of 0.68 (general health) and 0.70 (social functioning) to 0.90 (physical functioning). Correlations between the 8 scales were lower than the reported alpha coefficients. Hobart et al. (2002) found that item-own correlations exceeded item-other correlations by >2.5 SE for 6 of 8 scales, but not for the social functioning and general health scales (i.e. limited ability to distinguish constructs). Walters et al. (2001) (elderly) reported α ≥ 0.80 for all scales except social functioning (α=0.79). Doninger et al. (2003) (TBI) reported person separation estimates of 2.27 and 2.35 for physical health and emotional health, respectively, while calibration of the physical functioning items generated a reliability of 0.84 with no misfits. Calibration of the mental health and vitality scales yielded a reliability of 0.85 with one misfit. Across all subscales, α ranged from 0.68-0.87 for controls, 0.83-0.91 for mild TBI and 0.79-0.92 for moderate/severe TBI (Findler et al., 2001; TBI).

Validity

Construct Validity: Walters et al. (2001) reported significant relationships in expected directions to support construct validity among older adults. Scores on all scales were reported to decrease as age increased (p<0.001) (Walters et al., 2001). Women reported worse health than men on all scales even after adjusting for age (p<0.001) (Doninger et al., 2003). Likewise, respondents who had recently visited their physician reported poorer health on all scales (p<0.001), and people living alone also had lower scores (p<0.001) except on general health (p=0.02) (Walters et al., 2001). Doninger et al. (2003) reported item separation estimates of 12.03 and 7.95 for physical health and emotional health, respectively. In a trauma population, principal components analysis revealed that physical function, role physical and bodily pain had the strongest loadings on physical health and the lowest loadings on mental health, whereas role emotional and mental health did the opposite. The general health, vitality and social function scales had substantial loadings on both components. These results were comparable to correlations found for the general US population (MacKenzie et al., 2002; TBI).

SF-36 scales correlated significantly with the Symptom Checklist (SCL), the Beck Depression Inventory (BDI-II) and the Health Problems List (HPL). In the mild TBI group, scales related to physical functioning were strongly correlated with the HPL (-0.6 to -0.75) and the physical symptoms scale of the SCL (-0.5 to -0.63). Scales related to mental health were most strongly correlated with psychological factors on the SCL. Strong correlations were found between BDI-II scores and all of the SF-36 scales, the highest being with the mental health scale (-0.77). In the moderate/severe group, correlations were weaker and more consistent, and the strongest correlations were found where expected (Findler et al., 2001).

Construct Validity (Known Groups): Patients diagnosed with ≥ 1 chronic physical problem had lower scores than healthy age-matched controls on all dimensions of the SF-36 except mental health (p<0.001). SF-36 scores were distributed as expected for sex, age, social class and use of health services (Brazier et al., 1992). The SF-36 distinguished between groups based on functional dependence versus independence as defined by BI scores (p<0.05 on all scales) and between groups based on mental health versus ill-health as defined by GHQ-28 scores (p<0.05 on all scales) (Anderson et al., 1996; stroke). Mayo et al. (2002) (stroke) reported that SF-36 scores discriminated stroke survivors from age- and gender-matched controls, while Williams et al. (1999) (stroke) found the SF-36 unable to discriminate between groups based on patient self-report ratings of overall Health-Related QOL (HRQOL) (same, a little worse, or a lot worse than pre-stroke). The SF-36 discriminated between age groups (<75 yrs vs 75+) on the physical functioning, vitality and change in health subscales (p ≤ 0.006) and between groups based on setting (general practice versus hospital outpatients) on the physical function and role functioning-physical subscales (p=0.16) (Hayes & Joseph, 2003). Essink-Bot et al. (1997) reported that the SF-36 was able to discriminate between migraine sufferers and controls on all subscales (p<0.01; ROC/AUC=0.54-0.67) and between groups of migraine sufferers based on absence from work (0 versus ≥ 0.5 days; p<0.01, ROC/AUC=0.61-0.79). Brazier et al. (1996) reported that SF-36 scores distinguished groups based on recent visits to a GP, hospital inpatient stays and longstanding illness (p<0.05). At 3 months and 1 year post-injury, mild TBI patients scored significantly lower than the matched normative group on all subscales, and there was a significant negative correlation between the number of post-concussion symptoms and SF-36 scores (Emanuelson et al., 2003; TBI).
There were significant differences in scores between the control/nondisabled group, the mild TBI group and the moderate/severe TBI group. Both TBI groups scored significantly lower than the control group on all scales, and the mild TBI group scored significantly lower than the moderate/severe group on all scales except for the physical function sub-score, which did not differ between TBI severity levels. After controlling for depression, many of the differences between the 2 TBI groups became non-significant (Findler et al., 2001). The self-ratings of matched normal controls were found to be significantly higher than those of TBI patients on all scales except for the general health scale. The PCS and MCS also differed significantly between controls and TBI patients (Paniak et al., 1999; TBI).

Construct Validity (Convergent/Divergent): Correlations between similar scales on the SF-36 and the Nottingham Health Profile were reported as -0.41 (social functioning versus social isolation) and -0.68 (vitality versus energy). Correlations between less clearly related dimensions ranged from -0.18 (physical functioning versus emotional reactions) to -0.53 (social functioning versus emotional reactions) (Brazier et al., 1992). Anderson et al. (1996) reported that BI scores (in stroke survivors) were strongly associated (p<0.001) with physical functioning and general health. Mental health on the General Health Questionnaire-28 was most strongly associated (p<0.001) with the social functioning, role limitations-emotional and mental health scales of the SF-36. Dorman et al. (1999) (stroke) reported that the SF-36 physical functioning subscale was most closely correlated with the mobility, self-care and activities domains of the EuroQol (r=0.57, 0.65 & 0.63) and less strongly with the EuroQol psychological domain (r=0.34). SF-36 bodily pain correlated with the EuroQol pain domain (r=0.66) and moderately with all EuroQol domains. Emotional role functioning correlated most closely with the EuroQol psychological domain (r=0.43) and least with EuroQol self-care (r=0.24). SF-36 mental health was not closely related to the psychological domain (r=0.21) or to the physical EuroQol domains (r=0.06-0.10). SF-36 general health correlated with the EuroQol overall HRQOL rating (r=0.66). Lai et al. (2003) (stroke) reported r=0.55 between the SF-36 physical functioning scale and the BI. Andresen et al. (1999) (elderly) reported that physical health scores correlated more strongly with ADL scores than with GDS scores (-0.38 versus -0.28) and that mental health summary scores correlated more strongly with GDS scores than with ADL scores (-0.63 versus 0.01).
However, role-physical correlated more strongly with GDS scores than with ADL scores, contrary to a prior hypothesis. Social functioning, role-emotional, vitality and mental health all correlated more strongly with GDS scores than with ADL scores. Dikmen et al. (2001) found significant correlations between the PCS and the Functional Status Examination regardless of whether the patient (-0.68) or a significant other (-0.64) assessed patient function. The correlations between the Mental Component Score (MCS) and the Functional Status Examination were weak and not significant. McNaughton et al. (2005) (stroke) reported high correlations (0.32-0.97) across the Physical Component Scale (PCS), FIM, Barthel Index (BI) and the London Handicap Score. Correlations of these measures with the MCS were weaker (0.17-0.32).

Predictive Validity: McHorney (1996) (stroke) examined data from a medical outcomes study, which reported the general health perceptions scale to be most predictive of death (the death rate of patients in the lowest quartile for the SF-36 general health scale was 3 times greater than for patients with scores in the highest quartile), followed by scores in physical functioning. Baseline physical functioning, role functioning-physical and pain scales were most predictive of hospitalizations, while pain, general health and vitality were most predictive of physician visits.

Responsiveness

Item mapping is used, and the social functioning subscale provides a limited assessment of the number and difficulty of activities. It demonstrated marked ceiling effects of up to 60% for Modified Rankin Scale (MRS) grade 0, and the SF-36 physical function scale is reported to have floor effects of 37% and 100% for patients with MRS grades 4 & 5 (Lai et al., 2003), while large ceiling effects are reported for the role limitations-physical (53%), bodily pain (43%), social functioning (67%) and role limitations-emotional (72%) scales. No floor effects over 7% were reported. Scores for the SF-36 physical functioning scale are more uniformly distributed than BI scores, suggesting lower floor and ceiling effects than the BI (Anderson et al., 1996). Brazier et al. (1996) reported floor effects in excess of 25% for role limitations physical and emotional, and ceiling effects >25% for social functioning and role limitations emotional & physical.

Notable floor effects (role limitations: physical 59.1%, emotional 19.9%) and ceiling effects (role limitations: emotional 63.1%, social functioning 29.9%, bodily pain 25.6%) are reported among ischemic stroke survivors (Hobart et al., 2002; stroke). Substantial floor and ceiling effects were reported by O’Mahoney et al. (1998) (stroke). For face-to-face, telephone and self-administration, Weinberger et al. (1996) (varying etiologies) reported substantial floor effects for the role-physical (>40%) and role-emotional (>25%) subscales and ceiling effects for the role-emotional (>36%) and social functioning (>27%, for face-to-face and self-administration only) subscales. Walters et al. (2001) reported substantial floor and ceiling effects across all age groupings (65-69, 70-74, 75-79, 80-84 & 85+) in the role functioning-physical (30.9%-61% & 11.7%-38.6%) and role functioning-emotional (25.6%-50.4% & 32.2%-53.2%) scales, as well as substantial ceiling effects in social functioning and bodily pain (15%-46.7% & 14.1%-21.1%, respectively). Andresen et al. (1999) reported substantial floor effects of 26.8% and 29.5% for physical functioning and role-functioning, respectively, in a sample of nursing home residents, as well as ceiling effects of 36.1%, 49.5% and 21.6% in social functioning, role-emotional and bodily pain, respectively. Mossberg & McFarland (2001) (varying etiologies) found SF-36 effect sizes from admission to outpatient rehabilitation to discharge of 0.48 for emotional role limitations and 1.38 for bodily pain; PCS and MCS effect sizes were 0.80 and 0.45, respectively. Effect sizes for the PCS and MCS were 2.48 and 0.93, respectively (Paniak et al., 1999).

Tested for TBI patients? Yes, several published studies have tested the scale with individuals who have sustained a TBI (Brown et al., 2004; Callahan et al., 2005; Corrigan et al., 1998; Dikmen et al., 2001; Doninger et al., 2003; Emanuelson et al., 2003; Findler et al., 2001; MacKenzie et al., 2002; McNaughton et al., 2005; Ocampo et al., 1997; Paniak et al., 1999).

Other Formats

Mailed Questionnaire: Hayes et al. (1995) (varying etiologies) found that mode of administration was clearly related to completeness of data (p<0.0001). For self-completion versus in-person interview, the percentage of missing items was greater among the older respondents (p<0.015). Time to complete the survey was not dependent upon mode of administration or age, with 84% of the respondents completing the assessment in 10 minutes or less. Walters et al. (2001) reported non-completion of the mailed survey to be significantly related to increasing age (p<0.001).

Face-to-Face, Self-Report and Telephone Interview: Weinberger et al. (1996) reported internal consistency for all modes of administration: face-to-face α=0.75-0.89, self α=0.77-0.93, telephone α=0.67-0.92. Mean test-retest correlations for face-to-face, self and telephone modes were 0.80, 0.83 and 0.79, respectively. Between-mode correlations were similar: face-to-face versus self r=0.54-0.82, face-to-face versus telephone r=0.55-0.91. Correlations did not differ significantly by order of administration. Despite short testing intervals, large absolute differences were reported on within-mode and between-mode comparisons. Directional differences (over time <1 week) were significant on between-mode comparisons on 4/8 subscales (physical function, social function, role-emotional & mental health), with face-to-face interviews producing higher scores.

Acute (1-week recall) Version: Keller et al. (1997) (varying etiologies) reported that median inter-item correlations ranged from 0.43 (role-emotional) to 0.78 (bodily pain), and α ranged from 0.59 (role-emotional) to 0.89 (physical functioning). Vitality, role-emotional and mental health α values fell below 0.80. Principal component analysis revealed the same 2-factor structure as the standard version. The acute version displayed significant ceiling effects (>20%) in 4 subscales (role-physical, bodily pain, social functioning and role-emotional). There were no reported floor effects. Change scores for the acute form (baseline to week 4) were more closely related to one-week change in disease severity than standard form scores. For acute change scores, 10/18 of such comparisons reached significance.

Proxy Assessment

Dorman et al. (1998) (stroke) reported better test-retest reliability when the patient completed the forms than when they were completed by a proxy respondent. ICCs ranged from 0.3 (mental health) to 0.81 (bodily pain/general health) when forms were patient-completed versus 0.24 (mental health) to 0.76 (social functioning) for proxy completion.

Pierre et al. (1998) (elderly) demonstrated poor to moderate agreement between proxy and patient ratings. In a rehabilitation setting, ICCs ranged from 0.01 (social functioning) to 0.60 (vitality) for patient/health professional proxy pairings. For patient/significant other proxy pairings, ICCs ranged from -0.11 (mental health) to 0.58 (general health). In a day hospital setting with professionals as proxies, ICCs ranged from 0.09 (role physical) to 0.45 (physical functioning). With significant others as proxies, ICCs ranged from 0.01 (social functioning) to 0.71 (physical functioning). α=0.64-0.86 for the patient data, 0.76-0.90 for the health professional data and 0.69-0.84 for the significant other data.

Segal & Schall (1994) (stroke) reported ICCs of 0.15 (role limitations-emotional) to 0.67 (physical functioning) for patient ratings versus proxy ratings.

Ocampo and Dawson (1997) (TBI) found that the highest level of agreement between TBI patients and their informants was for physical functioning (ICC=0.58) and general health (ICC=0.51). Agreement for role-physical and role-emotional was high for the moderate and severe groups, whereas agreement was generally poor on the other subscales.

Dikmen et al. (2001) reported a correlation of 0.53 (p<0.001) between the assessments of patients and their significant others on the PCS, but the corresponding correlation on the MCS was weak and non-significant.