Improved responsiveness and reduced sample size requirements of PROMIS physical function scales with item response theory
1 Department of Medicine, Stanford University School of Medicine, 1000 Welch Road, Suite 203, Palo Alto, CA 04304, USA
2 University of Massachusetts Boston, 100 Morrissey Blvd., Boston, MA 02125-3393, USA
Arthritis Research & Therapy 2011, 13:R147 doi:10.1186/ar3461Published: 14 September 2011
The Health Assessment Questionnaire Disability Index (HAQ) and the SF-36 PF-10, among other instruments, yield sensitive and valid Disability (Physical Function) endpoints. Modern techniques, such as Item Response Theory (IRT), now enable development of more precise instruments using improved items. The NIH Patient Reported Outcomes Measurement Information System (PROMIS) is charged with developing improved IRT-based tools. We compared the ability to detect change in physical function using original (Legacy) instruments with Item-Improved and PROMIS IRT-based instruments.
We studied two Legacy (original) Physical Function/Disability instruments (HAQ, PF-10), their item-improved derivatives (Item-Improved HAQ and PF-10), and the IRT-based PROMIS Physical Function 10- (PROMIS PF 10) and 20-item (PROMIS PF 20) instruments. We compared sensitivity to detect 12-month changes in physical function in 451 rheumatoid arthritis (RA) patients and assessed relative responsiveness using P-values, effect sizes (ES), and sample size requirements.
The study sample was 81% female, 87% Caucasian, 65 years of age, had 14 years of education, and had moderate baseline disability. All instruments were sensitive to detecting change (< 0.05) in physical function over one year. The most responsive instruments in these patients were the Item-Improved HAQ and the PROMIS PF 20. IRT-improved instruments could detect a 1.2% difference with 80% power, while reference instruments could detect only a 2.3% difference (P < 0.01). The best IRT-based instruments required only one-quarter of the sample sizes of the Legacy (PF-10) comparator (95 versus 427). The HAQ outperformed the PF-10 in more impaired populations; the reverse was true in more normal populations. Considering especially the range of severity measured, the PROMIS PF 20 appears the most responsive instrument.
Physical Function scales using item improved or IRT-based items can result in greater responsiveness and precision across a broader range of physical function. This can reduce sample size requirements and thus study costs.