Superior outcomes following cervical fusion vs. multimodal rehabilitation in a subgroup of randomized Whiplash-Associated-Disorders (WAD) patients indicating somatic pain origin-Comparison of outcome assessments made by four examiners from different disciplines

  • Elisabeth Svensson, Bo Nyström, Ian Goldie, Nils Inge Landrø, Åke Sidén, Peer Staff, Birgitta Schillberg and Adam Taube
Published/Copyright: February 22, 2018

Abstract

Background and aims:

Whiplash-Associated Disorders (WAD) are characterized by great variability in long-term symptoms. Patients with central neck pain and movement-induced stabbing pain participated in a randomized study comparing cervical fusion and multimodal rehabilitation. As reported in our previous paper, more patients treated by cervical fusion than by rehabilitation experienced pain relief. Although patient reported outcome measures are a core component of outcome evaluation, evaluation by an independent examiner has been recommended. Because of the heterogeneity of WAD complaints, the patients in our study were examined at baseline and follow-up by four experts representing neurology, orthopedics, psychology and physical medicine. The aim was to compare the professional assessments of change, both regarding the possible impact of the different examiners’ perspectives on the individual patient’s outcome and regarding the analysis of possible outcome differences between the treatment groups.

Methods:

WAD patients with long-term neck pain as the predominant symptom after a traffic accident were eligible. The neck pain origin should be in the midline and perceived as dull and aching, with sudden movement inducing midline stabbing pain. Of the 1,052 patients in contact with our team, 49 were eligible. The overall treatment effect was evaluated on a global outcome transitional scale. The criteria for the scale categories were defined by each expert’s professional perspective on change in the whiplash complaints. Statistical methods that take account of the non-metric properties of ordered categorical data were used. Observed inter-expert disagreement was evaluated by the Svensson method that identifies and measures systematic group-related disagreement separately from disagreement caused by individual variation. Possible differences in the distributions of assessments on the expert-specific outcome scales between the treatment groups were analyzed by the Kruskal-Wallis test.

Results:

The per-protocol evaluation showed that a majority of the 18 patients who underwent fusion surgery were assessed as somewhat or much better, ranging from 67% to 78% depending on the expert. Corresponding proportions of improvement in the 17 patients treated by multimodal rehabilitation ranged from 29% to 53%. The statistical analyses confirmed better outcomes in the patients treated by fusion surgery, with p-values ranging from 0.003 to 0.04. The experts’ assessments of intra-patient change disagreed more or less for all patients. The analyses of the paired comparisons confirmed that these disagreements could most probably be explained by the different profession-specific operational definitions of the outcome scales rather than by individual variations in data.

Conclusions:

The multi-dimensional complexity of WAD-related complaints was comprehensively demonstrated by the inter-disciplinary disagreements in assessing intra-patient outcomes. The superiority of positive treatment effects in patients who underwent cervical fusion compared with multimodal rehabilitation was evident to all experts.

Implications:

The results strengthen our previous opinion that neck pain in this subgroup of WAD patients has a somatic origin. More than one examiner is recommended for multi-dimensional outcome assessments.

1 Introduction

Patients with Whiplash-Associated Disorders (WAD) experience great variability in both long- and short-term complaints [1], [2], [3], [4], [5]. Our clinical experience was that some chronic WAD patients reported cervical symptoms similar to those in lumbar pain patients who benefited from fusion surgery [6], [7], [8]. Such a sub-group of WAD patients was eligible for a randomized study, the two arms being cervical fusion (Group S) and multimodal rehabilitation (Group R). In Group S, a diagnostic provocation test by open, direct mechanical stimulation of the cervical spinal processes was performed, followed by fusion of the presumed symptomatic segments. Patients in Group R received 6 weeks of multimodal treatment involving physician, physiotherapist, occupational therapist, psychologist, social workers and nurses. A comprehensive clinical description is provided in a previous paper [9]. The per-protocol (PP) analyses of the patients’ assessments of change in neck pain showed that 83% of those who underwent fusion surgery and 12% of those in the rehabilitation group reported an improvement, the median outcomes being much better and unchanged, respectively [9].

Although patient reported outcome measures (PROMs) have been shown to be valid and strongly related to other outcome measurements, at least in lumbar fusion surgery [10], the evaluation of outcome by an independent examiner has been strongly advocated [11]. Because of the multi-dimensional heterogeneity in complaints associated with WAD, the patients in our study were examined at baseline and follow-up by four different experts: a neurologist (ÅS), an orthopedic surgeon (IG), a psychologist (NIL) and a physiatrist (PS; physical medicine and rehabilitation, PM&R). Each expert assessed the change in whiplash related complaints on a subject-specific global outcome transition scale, the categories of change being: much worse, somewhat worse, unchanged, somewhat better and much better [12].

The operational definitions of the outcome criteria for the transition scale categories were made by each expert according to the different professional perspectives on change in the whiplash related complaints [13], [14]. Such multivariate examination and assessment by experts provide information on three distinct but related levels: the single variable level, the dimensional level and the overall level [12], [13], [14], [15], [16], [17], thus constituting a basis for an integrated outcome status in the patient.

The aim of this study was to compare the global assessments of change in whiplash related complaints made by the four experts, both regarding the impact of the different perspectives of assessing the patient’s outcome and on the analysis of possible differences between the treatment groups. The following research questions will be evaluated: Is there a difference in the global score of change between patients treated by cervical fusion and by multimodal rehabilitation from the perspective of each expert? Do the different experts agree in their assessments of change in the patients? In case of disagreement, to what extent is there a systematic inter-disciplinary disagreement that could be explained by their different definitions of the scale categories of change?

2 Materials and methods

2.1 Patients

Patients who experienced long-term neck pain for at least 1 year as the predominant symptom after a traffic accident were eligible. After the accident all patients should have been comprehensively investigated and treated in the ordinary medical system, despite which they still perceived pronounced symptoms not evident from X-ray or MRI. The neck pain origin should be in the midline and the pain perceived as dull and aching, with sudden movement inducing a stabbing pain in the midline. All patients should have markedly reduced work capacity and be on at least 50% sick leave. Forty-nine of the 1,052 patients who were in contact with our team were identified as eligible and included. These 49 patients were randomized to either fusion surgery (group S) or multimodal rehabilitation (group R). A third group of patients with similar symptoms, but not completely fulfilling the inclusion criteria was included for comparison (group C). This group was included to avoid bias in the outcome assessments by the experts. More details regarding patients and the treatment are presented in a previous paper [9].

2.2 Ethics

The study was approved by the Medical Ethics Committee, Örebro, Sweden, no. 368/96.

All patients were given both oral and written information on all parts of the study and gave their written informed consent.

2.3 Registration

The study is registered in ClinicalTrials.gov (registration no. NCT01994044).

2.4 Follow-up

All patients in the three groups, a total of 68 individuals, were independently examined by the four experts at baseline. At follow-up 58 patients were seen by the neurologist and 61 patients by the orthopedic surgeon, the psychologist and the physiatrist. The median time for follow-up evaluations of the patients in Group S was 20 months (range 17–47) and 22 months for patients in Group R (range 17–48) and Group C (range 17–50) [9].

2.5 Global assessments of change

The WAD-related symptoms and findings documented by the four experts at the baseline examination were compared with those at follow-up. The evaluations of the overall treatment effect were made on a transitional scale, the categories of change being: much worse, somewhat worse, unchanged, somewhat better and much better. The criteria for these scale categories were operationally defined by each expert according to the different professional perspectives on change in the whiplash symptoms.

2.5.1 Examination by the neurologist (ÅS)

The neurological examination was performed according to general standards at neurological university departments and included the following: The functions of cranial nerves I–XII were examined with the exception of visual acuity, hearing threshold, gag reflex and swallowing. The cutaneous and tendon reflexes were tested with the exception of anal and genital reflexes. The cerebellar functions were examined with regard to extremity co-ordination, balance, articulation and eye movement. The power and tone of extremity-related muscle groups were tested and the possible presence of abnormal motor activity (dyskinesia) was investigated. Cutaneous sensibility to light touch and pinprick as well as deep sensibility to vibration were tested in the extremities. Sensibility disturbances and/or regional pain were documented graphically. The neck was examined with regard to active and passive mobility as well as tenderness at palpation.

The symptoms and findings were mainly referable to the cervical spine and adjoining regions such as shoulders and arms, extremity motor functions and sensibility (regional/numbness/paraesthesia). The global scale categories of change were conditionally based on these findings accordingly: much worse (deterioration by >2 grades), somewhat worse (deterioration by 1–2 grades), unchanged, somewhat better (improvement by 1–2 grades), and much better (improvement by >2 grades).

2.5.2 Examination by the orthopedic surgeon (IG)

The orthopedic examination included ocular inspection of the carriage of the head, position of the shoulders, especially the position of the scapulae, hyper- or hypotrophy of the muscles of the neck, shoulder, arm and hand. Spontaneous as well as forced movements of the head, shoulders and upper extremities were observed. Palpatory examination included the neck muscles, especially their insertions, and pain recordings during passive, active and forced movements. Finally, neurological testing included motor, sensory and reflex examination. The number of indicators of changed status referring to the dimensions: cranial nerves, cervical spine, cerebellar, motor and sensory functions determined the global scale of change.

2.5.3 Examination by the physiatrist (PS)

Evaluation in physical medicine and rehabilitation is normally based on the dimensions of the International Classification of Functioning, Disability and Health (ICF) [18], [19], [20]. In our study design the examination by the physiatrist was to rely mainly on the impairment dimension, without taking function, participation or the environmental and personal factors into account. The assessments comprised an ordinary clinical examination in physical medicine and rehabilitation, with the neurological part omitted due to the study design.

The clinical examination included measurement of range of neck movement with an inclinometer, flexion, extension, bilateral rotation, lateral flexion, provocation of pain through passive stretching of the neck muscles during the above-mentioned movements, the bilateral Spurling’s test, palpation of tenderness of 16 defined tender points in fibromyalgia and assessment of the ability to relax the neck and shoulder muscles. The perceived bodily pain distribution drawing and numerical 9-point pain assessment during activity and rest were also assessed. Perceived tiredness/fatigue and lack of concentration were also assessed on 9-point numerical scales. Subjective health complaints (SHC) during the last month were assessed on the SHC Questionnaire [21]. The examinations covered the following seven dimensions: movements of the neck, bodily pain distribution and intensity, tenderness, fatigue, activity of daily living and subjective health complaints. The number of indicators of changes in the dimensions defined the global categories of change.

2.5.4 Examination by the psychologist (NIL)

In the psychological evaluation the clinical interview was followed by a number of assessments within the following areas: (1) Pain experience, (2) Coping style and pain behavior, such as kinesiophobia and catastrophizing, (3) Psychological symptoms, such as depressive symptoms (based on Beck Depression Inventory II), post-traumatic stress symptoms as well as personality based profiles, (4) Cognitive/neuropsychological tests (i.e. attention, memory, executive and general cognitive/intellectual functions).

The primary endpoint was patient reported pain experience assessed along three dimensions: (a) the degree of pain experienced in general during the last week, (b) the degree to which pain influenced and restricted daily activities during the last week and (c) the degree to which pain influenced and restricted work function during the last week. The assessments were made on 6-point verbal descriptive scales. The criteria for the categories of the global scale of change were defined conditionally on the change in experienced pain. This means that the categories somewhat better and much better required a decrease in perceived pain at follow-up.

2.6 Statistical methods

2.6.1 Basic approaches

Part of the aim was to evaluate the efficacy of cervical fusion surgery in a well-defined group of WAD patients, which motivates the per-protocol (PP) analysis. This means that only those patients who complied with the allocated treatments were evaluated as surgery (S), rehabilitation (R), and comparison (C) patients, respectively. A group D of ten non-compliers also participated in the follow-up examinations and was included in the final analyses.

The intention-to-treat (ITT) analysis was also performed in order to report the effectiveness of the two treatments, which is the overall effect of the planned interventions, irrespective of the patients’ compliance with the allocated treatments.

Ordered categorical scale assessments generate ordinal data, which means that a category only represents an ordered level and not a numerical value in a mathematical sense. These non-metric properties are well recognized, and several authors have stressed the fact that calculating sums and differences is not appropriate for such data [22], [23], [24], [25], [26], [27], [28], [29]. Statistical methods that take account of these non-metric properties of ordered categorical data were used. The frequency distributions of the experts’ assessments on the global transition scale categories, much worse, somewhat worse, unchanged, somewhat better and much better in the treatment groups were described in tables and bar charts, indicating the quartiles (25%, 75%) and the median (50%) [24], [30], [31].
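As an illustration of describing ordinal data without assigning numerical values to the categories, the following minimal Python sketch reads a median category directly from a frequency distribution. The function name and the handling of an even number of patients (returning both the lower and the upper median category) are assumptions made for this sketch; the paper does not state which convention was used. The example counts are the orthopedic surgeon's per-protocol assessments of group C in Table 1.

```python
# Ordered categories of the global transition scale, lowest first.
CATEGORIES = ["much worse", "somewhat worse", "unchanged",
              "somewhat better", "much better"]


def median_categories(counts, categories=CATEGORIES):
    """Return the (lower, upper) median categories of an ordinal distribution.

    For an odd number of assessments both entries are the same category.
    No sums or differences of category labels are computed; only the
    ordering of the categories is used.
    """
    n = sum(counts)
    if n == 0:
        raise ValueError("at least one assessment is required")
    lower_pos = (n + 1) // 2   # 1-based position of the lower median
    upper_pos = n // 2 + 1     # 1-based position of the upper median

    def category_at(position):
        cumulative = 0
        for label, count in zip(categories, counts):
            cumulative += count
            if position <= cumulative:
                return label

    return category_at(lower_pos), category_at(upper_pos)


# Orthopedic surgeon, group C, per-protocol (Table 1): 2, 4, 8, 2, 0.
print(median_categories([2, 4, 8, 2, 0]))
```

Both median positions fall in the category unchanged, in line with the median category reported for this row of Table 1.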

2.6.2 Inter-group comparisons; treatment effects

The proportions of patients assessed as somewhat or much better by each of the experts at the follow-up examinations after fusion surgery or rehabilitation and the 95% confidence intervals (95% CI) were calculated and compared [30], [32].
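The text does not name the interval method. The sketch below assumes the Wilson score interval, which is a common choice for small samples; the function name is hypothetical. As an example it uses 14 improved patients out of 18, which corresponds to the 78% reported for the neurologist's assessments of the surgery group.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (assumed method)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# 14 of 18 surgery patients assessed as somewhat or much better.
low, high = wilson_interval(14, 18)
print(f"{14 / 18:.0%} (95% CI, {low:.0%} to {high:.0%})")
```

Under this assumption the limits come out close to the 55% and 91% given in the Results, which suggests, but does not prove, that a score-type interval was used.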

The frequency distributions of assessments on the five categories of change for the different PP and ITT groups are shown together with the median categories. Possible differences in the distributions of assessments on the expert-specific outcome scales between the groups were analyzed by the Kruskal-Wallis test [24], [30].
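A minimal sketch of the Kruskal-Wallis statistic with the usual correction for ties is given below, using only the Python standard library. It is an illustration under stated assumptions, not a reproduction of the reported analyses: the example compares only two groups (the psychologist's per-protocol assessments of groups S and R), whereas the paper's p-values refer to three or four groups. The category codes 1 to 5 serve only to order the categories, the function names are hypothetical, and the counts assume that the cell omitted from the flattened Table 1 rows is much worse.

```python
import math


def kruskal_wallis(groups):
    """Return (H corrected for ties, degrees of freedom).

    groups: list of lists of ordinal codes; only their order is used.
    """
    pooled = sorted(value for group in groups for value in group)
    n = len(pooled)
    midrank, tie_sum, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        tie_sum += (j - i) ** 3 - (j - i)
        i = j
    if tie_sum == n**3 - n:
        raise ValueError("all assessments are identical")
    rank_term = sum(sum(midrank[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    return h / (1 - tie_sum / (n**3 - n)), len(groups) - 1


def chi2_sf(x, df):
    """Upper tail probability of the chi-square distribution, integer df."""
    if df % 2 == 0:
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    total, term = 0.0, math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total)


# Psychologist, per-protocol: codes 2..5 = somewhat worse .. much better.
group_s = [2] * 1 + [3] * 4 + [4] * 3 + [5] * 10
group_r = [2] * 1 + [3] * 10 + [4] * 4 + [5] * 2
h, df = kruskal_wallis([group_s, group_r])
print(f"H = {h:.2f}, df = {df}, p = {chi2_sf(h, df):.3f}")
```

With many ties and small groups the chi-square approximation is rough, so the resulting p-value should be read as indicative only.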

2.6.3 Inter-expert comparisons

The impact of the experts’ different perspectives of assessing the patient’s global outcome was evaluated by Svensson’s approach for paired ordinal data [13], [15], [16], [17], [33], [34], [35], [36], [37]. The assessments made by one of the experts were compared pairwise with each of the other experts’ assessments and described in a square contingency frequency table. The pairs of complete inter-expert agreement in scale assessments appear in the main diagonal of the contingency table. The percentage inter-expert agreement (PA) was calculated.

The type of observed inter-expert disagreement was evaluated. The two sets of marginal frequencies in the contingency table show the distribution of each expert’s assessments. Different marginal distributions indicate the presence of a systematic inter-expert disagreement regarding the patient’s outcome because of the expert-specific operational definitions of the outcome scale categories. The difference between the probability that one of the experts assessed the patients’ outcomes to lower categories than did the other expert and the probability of the opposite defines the measure of relative position (RP). Correspondingly the difference between the probabilities of how the experts concentrate their assessments on the scale categories defines the measure of relative concentration (RC). Possible RP and RC values range from –1 to 1. The value of zero indicates unbiased assessments.

Additional individual variability in pairs of assessments that cannot be explained by systematic inter-expert differences is measured by the relative rank variance (RV) [13], [15], [16], [17], [33], [34], [35], [36], [37]. The RP, RC and RV measures and the 95% confidence intervals (CI) of the measures were calculated using a free software program [38].
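Because RP and RC depend only on the two marginal distributions, they can be sketched in a few lines of Python. The formulas below are the marginal-based expressions for the two measures as they are commonly stated for Svensson's method; they are given here as an assumption, not as the code of the software program cited above. The function name is hypothetical, and the direction convention (X is the column expert and Y the row expert of the results table) is chosen so that the sign agrees with the verbal definition above. The example counts are per-protocol marginals for group S, assuming that the cell omitted from the flattened Table 1 rows is much worse. PA and RV require the full paired table and are not covered.

```python
from itertools import accumulate


def svensson_rp_rc(x, y):
    """Relative position (RP) and relative concentration (RC).

    x, y: category counts of experts X and Y, lowest category first.
    RP = P(X < Y) - P(Y < X), computed from the marginal distributions.
    """
    n = sum(x)
    if n == 0 or n != sum(y) or len(x) != len(y):
        raise ValueError("x and y must describe the same patients and scale")
    k = len(x)
    cx = [0] + list(accumulate(x))   # cx[i]: X assessments below category i
    cy = [0] + list(accumulate(y))   # cy[i]: Y assessments below category i

    p_x_below_y = sum(y[i] * cx[i] for i in range(k)) / n**2
    p_y_below_x = sum(x[i] * cy[i] for i in range(k)) / n**2
    rp = p_x_below_y - p_y_below_x

    # Probability that a Y assessment falls between two X assessments,
    # and the reverse; their normalized difference is RC.
    p_xyx = sum(y[i] * cx[i] * (n - cx[i + 1]) for i in range(k)) / n**3
    p_yxy = sum(x[i] * cy[i] * (n - cy[i + 1]) for i in range(k)) / n**3
    m = min(p_x_below_y * (1 - p_x_below_y), p_y_below_x * (1 - p_y_below_x))
    rc = (p_xyx - p_yxy) / m if m > 0 else float("nan")
    return rp, rc


# Group S marginals: much worse, somewhat worse, unchanged,
# somewhat better, much better.
psychologist = [0, 1, 4, 3, 10]
physiatrist = [0, 4, 2, 5, 7]
rp, rc = svensson_rp_rc(psychologist, physiatrist)
print(f"RP = {rp:.2f}, RC = {rc:.2f}")
```

A negative RP here means that the psychologist's assessments tend to lie in higher categories than the physiatrist's. The confidence intervals of the measures are not computed in this sketch.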

3 Results

3.1 Inter-group comparisons of treatment effects

3.1.1 Fusion surgery versus multimodal rehabilitation

In the per-protocol evaluation only those 18 and 17 patients, who consented and adhered to the allocated treatments were evaluated as belonging to group S and group R, respectively. The distributions of changes in outcome after the S and R treatments are shown in Fig. 1A and B, respectively.

Fig. 1: The relative frequency distributions of the four experts’ follow-up assessments of the global outcome of change after treatment. The median (50%) and quartiles (25%, 75%) are indicated. (A) The surgery group (n=18). (B) The rehabilitation group (n=17).

The relative frequency distributions of the four experts’ outcome assessments show that a majority of the patients who underwent fusion surgery (group S) were assessed as somewhat or much better, ranging from 67% to 78% depending on the clinical discipline (Fig. 1A).

In the neurological examination 78% (95% CI, 55–91%) of the 18 patients were assessed as improved. This means that in a representative group of patients with the same inclusion criteria [9] one can expect that between 55% and 91% of the patients will be neurologically assessed as somewhat or much better after fusion surgery. The orthopedic surgeon and the psychologist judged 72% (95% CI, 49–88%), and the physiatrist 67% (95% CI, 44–84%) of the patients to be somewhat or much better after cervical fusion surgery.

Correspondingly, the proportions of patients who were assessed as somewhat or much better after the multimodal rehabilitation treatment ranged from 29% to 53% (Fig. 1B). The orthopedic surgeon judged 53% (95% CI, 31–71%), the psychologist 35% (95% CI, 17–59%), and the physiatrist and the neurologist each 29% (95% CI, 13–53%) of the patients to be somewhat or much better after multimodal rehabilitation.

The most pronounced difference between the treatment effects was seen by the neurologist, who found 48 percentage units (78% vs. 29%) more patients somewhat or much better after fusion surgery compared with multimodal rehabilitation. The 95% confidence interval ranging from 15 to 69 percentage units confirms the significant difference in treatment effect from the perspective of the neurologist. Correspondingly, 37 percentage units more patients (95% CI, 4–60) were judged somewhat or much better by the psychologist and by the physiatrist. According to the assessments by the orthopedic surgeon, the difference in the proportion of patients with improved outcomes in favor of fusion surgery was 19 percentage units (95% CI, −12 to 46).
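The method behind the confidence intervals for the differences is not stated. One candidate is Newcombe's hybrid score interval, which combines the Wilson limits of the two proportions; the sketch below assumes that method, and the function names are hypothetical. The example uses the neurologist's assessments: 14 of 18 improved after surgery (78%) and 5 of 17 after rehabilitation (29%).

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a single proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


def newcombe_difference(k1, n1, k2, n2, z=1.959964):
    """Difference p1 - p2 with Newcombe's hybrid score interval (assumed)."""
    p1, p2 = k1 / n1, k2 / n2
    l1, u1 = wilson_interval(k1, n1, z)
    l2, u2 = wilson_interval(k2, n2, z)
    diff = p1 - p2
    lower = diff - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = diff + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return diff, lower, upper


# Neurologist: 14/18 improved after surgery, 5/17 after rehabilitation.
diff, lower, upper = newcombe_difference(14, 18, 5, 17)
print(f"difference {100 * diff:.0f} percentage units "
      f"(95% CI, {100 * lower:.0f} to {100 * upper:.0f})")
```

Under this assumption the limits come out close to the 15 and 69 percentage units reported above. An interval that excludes zero corresponds to a statistically significant difference, which is why the orthopedic surgeon's interval (−12 to 46) does not confirm a difference.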

3.1.2 Inter-group per-protocol analyses

The per-protocol analyses of all patients evaluated in the follow-up examinations involved four groups: the two treatment groups S and R, the comparison group C, and the group D of ten non-compliers that also participated in the follow-up examinations.

The frequency distributions of independent assessments of global change in patients made by the four experts ranged from much worse to much better (Table 1). The majority of the 18 patients in the surgery (S) group were assessed as much better by the psychologist and the physiatrist, and somewhat better by the neurologist and the orthopedic surgeon. The median outcomes of the 17 patients in the rehabilitation (R) group were unchanged except for the assessments made by the orthopedic surgeon, where the median outcome was somewhat better. The median outcomes of the patients in groups C and D were unchanged.

Table 1:

Frequency distributions of per-protocol outcomes according to the assessments made by the four experts on their outcome scales in the groups: surgery (S), rehabilitation (R), comparison (C) and non-compliers (D).

Columns: expert/group; number of patients in the outcome scale categories much worse, somewhat worse, unchanged, somewhat better and much better; overall p-value (expert rows) or median category (group rows).
Neurologist p=0.003
 Group S 1 3 12 2 Somewhat better
 Group R 1 11 5 Unchanged
 Group C 1 2 12 0 Unchanged
 Group D 2 3 3 Unchanged
Orthopedic p=0.04
 Group S 4 1 8 5 Somewhat better
 Group R 6 2 6 3 Somewhat better
 Group C 2 4 8 2 0 Unchanged
 Group D 4 2 2 2 Unchanged
Psychologist p=0.005
 Group S 1 4 3 10 Much better
 Group R 1 10 4 2 Unchanged
 Group C 3 9 3 1 Unchanged
 Group D 1 6 3 Unchanged
Physiatrist p=0.04
 Group S 4 2 5 7 Somewhat better
 Group R 3 9 4 1 Unchanged
 Group C 4 9 3 Unchanged
 Group D 3 6 1 Unchanged
Note: The overall p-value from the Kruskal-Wallis analysis of possible differences in distributions and the median categories are shown.

The statistical analyses confirm the significant differences in outcomes between the groups in favor of better outcomes in patients treated by fusion surgery (group S), the p-values ranging from 0.003 to 0.04.

3.1.3 The inter-group intention-to-treat analyses

The intention-to-treat analysis includes the two treatment groups (S and R) and the comparison group (C). The frequency distributions of the outcome assessments made by each of the experts differed between the three treatment groups in favor of the surgery group (S), with the median level of change being somewhat better (Table 2).

Table 2:

Frequency distributions and the median of the intention-to-treat outcomes according to the assessments made by the four experts on their outcome scales in the groups: surgery (S), rehabilitation (R), comparison (C).

Columns: expert/group; number of patients in the outcome scale categories much worse, somewhat worse, unchanged, somewhat better and much better; overall p-value (expert rows) or median category (group rows).
Neurologist p=0.00003
 Group S 1 5 15 2 Somewhat better
 Group R 3 12 5 Unchanged
 Group C 1 2 12 Unchanged
Orthopedic p=0.02
 Group S 7 1 9 7 Somewhat better
 Group R 6 5 7 3 Unchanged
 Group C 2 4 8 2 Unchanged
Psychologist p=0.01
 Group S 1 8 5 10 Somewhat better
 Group R 2 12 5 2 Unchanged
 Group C 3 9 3 1 Unchanged
Physiatrist p=0.14
 Group S 6 5 6 7 Somewhat better
 Group R 4 12 4 1 Unchanged
 Group C 4 9 3 Unchanged
Note: The overall p-value from the Kruskal-Wallis analysis of possible differences in distributions and the median categories are shown.

The statistically significant differences in outcomes between the three groups were most pronounced in the neurological evaluation (p=0.00003) and were explained by the superiority of outcomes in group S. In the physiatrist’s assessments the differences between the groups were not statistically significant (p=0.14).

3.2 Inter-expert comparisons in outcome assessments of patients in the surgery/S and rehabilitation/R per-protocol groups

3.2.1 Inter-expert outcome assessments

The frequency distributions of global changes in patients of the treatment groups S and R that were assessed by the four experts ranged from somewhat worse to much better (Table 1). As also demonstrated in Fig. 1, the majority of the 18 patients in the surgery (S) group were assessed as somewhat or much better, the median being somewhat better according to the neurologist, the orthopedic surgeon, and the physiatrist, and much better according to the psychologist. The median outcomes of the 17 patients in group R were unchanged except for the assessments made by the orthopedic surgeon, the median outcome being somewhat better.

As is evident from Table 1, the frequency distributions differ between the experts. To what extent do the experts agree in assessing a patient’s global score of change in whiplash related complaints? For example, the physiatrist judged seven patients in the S group as much better on the follow-up examination than before surgery. Are these patients found among the ten patients with the same level of outcome according to the psychologist?

3.2.2 Inter-expert disagreements in outcome after fusion surgery

Figure 2 shows the paired distribution of assessments of global changes in the surgery group of patients made by the psychologist and by the physiatrist. These experts agreed in assessing seven patients as much better after surgery. The remaining three patients, who were much better according to the psychologist, were regarded as somewhat better or unchanged by the physiatrist. Ten of the patients were assigned the same outcome category by these two experts, giving a percentage agreement (PA) of 56%. Could the observed disagreement be explained by the different operational definitions of the two experts’ global scale categories?

Fig. 2: Frequency distribution of pairs of independent assessments of change in WAD related complaints in patients after surgery (Group S) made by the psychologist and the physiatrist. The agreement diagonal is marked.

The two sets of frequency distributions of assessments made by the two experts appear as marginal totals and indicate that the psychologist systematically used higher scores of global change than did the physiatrist. This systematic difference in position of the outcome scale assessments is also evident from the corresponding relative frequency distributions in the bar charts in Fig. 1A. Both the median and the third quartile outcome categories of the assessments by the psychologist are much better, compared with the median category somewhat better and the third quartile much better in the assessments by the physiatrist. This systematic difference in outcome scale assessments between these two experts is statistically confirmed by the significant measure of systematic disagreement in relative position, RP: −0.19 [95% CI (RP), −0.34 to −0.04]. The negative RP value indicates that patients who underwent surgery are more likely to be assigned higher outcome levels by the psychologist than by the physiatrist, rather than the opposite; the difference between the two probabilities is estimated to lie between 0.04 and 0.34.

The relative frequency distributions of the four experts’ outcome assessments of the surgery group of patients (Fig. 1A) indicate a systematic difference in position regarding the proportion of patients assessed as much better after surgery. The significant RP values of the pairwise comparisons of outcome assessments made by the psychologist and by each of the three other experts (Table 3A) confirm the larger probability that a group of S-patients will get a higher rather than a lower outcome level from the psychologist than from the other experts.

Table 3:

The measures of inter-expert agreement and disagreement, and the 95% confidence intervals of the measures (lower; upper limits) in the per protocol analysis.

Expert pair (row expert vs. column expert): PA; RP; RC; RV
(A) Group S
 Neurologist vs. Psychologist: PA 33%; RP −0.30 (−0.56; −0.03); RC 0.52 (0.22; 0.82); RV 0.02 (0.00; 0.08)
 Orthopedic vs. Psychologist: PA 50%; RP −0.25 (−0.46; −0.03); RC 0.12 (−0.20; 0.44); RV 0.05 (0.00; 0.14)
 Orthopedic vs. Neurologist: PA 39%; RP 0.05 (−0.26; 0.35); RC −0.31 (−0.55; −0.07); RV 0.24 (0.00; 0.59)
 Physiatrist vs. Psychologist: PA 56%; RP −0.19 (−0.34; −0.04); RC −0.04 (−0.29; 0.22); RV 0.02 (0.00; 0.05)
 Physiatrist vs. Neurologist: PA 33%; RP 0.09 (−0.21; 0.38); RC −0.45 (−0.74; −0.16); RV 0.07 (0.00; 0.15)
 Physiatrist vs. Orthopedic: PA 44%; RP 0.05 (−0.16; 0.27); RC −0.12 (−0.44; 0.20); RV 0.06 (0.00; 0.15)
(B) Group R
 Neurologist vs. Psychologist: PA 59%; RP −0.09 (−0.38; 0.20); RC 0.15 (−0.11; 0.40); RV 0.21 (0.00; 0.51)
 Orthopedic vs. Psychologist: PA 18%; RP −0.02 (−0.40; 0.35); RC −0.42 (−0.75; −0.09); RV 0.47 (0.00; 0.95)
 Orthopedic vs. Neurologist: PA 29%; RP 0.07 (−0.27; 0.40); RC −0.62 (−0.87; −0.36); RV 0.09 (0.00; 0.23)
 Physiatrist vs. Psychologist: PA 47%; RP −0.15 (−0.42; 0.13); RC −0.06 (−0.27; 0.15); RV 0.20 (0.00; 0.46)
 Physiatrist vs. Neurologist: PA 47%; RP −0.07 (−0.32; 0.19); RC −0.19 (−0.48; 0.10); RV 0.03 (0.00; 0.08)
 Physiatrist vs. Orthopedic: PA 41%; RP −0.09 (−0.40; 0.22); RC 0.43 (0.17; 0.69); RV 0.28 (0.00; 0.66)
  1. Notations: PA=percentage agreement; RP=systematic disagreement in position; RC=systematic disagreement in concentration; RV=relative rank variance. Statistically significant values are bold.

  2. (A) The surgery (S) group n=18. (B) The rehabilitation (R) group n=17.

As is evident from Fig. 1A and Table 1, the neurologist assessed a majority (67%) of the patients in the surgery group as somewhat better, which is the median as well as both quartile categories. The paired distribution of assessments by the neurologist and the orthopedic surgeon in the S group of patients (Fig. 3) shows that they agreed on the outcomes of seven patients (PA 39%). The neurologist concentrated the assessments of 12 patients to somewhat better. Four of these patients were assessed as much better, and two as somewhat worse, by the orthopedic surgeon. The disagreement is partly explained by a systematic disagreement in how they concentrate their assessments; the measure of concentration, RC, is −0.31 [95% CI (RC), −0.55 to −0.07] (see Table 3A).

Fig. 3: Frequency distribution of pairs of independent assessments of change in WAD related complaints in patients after surgery (Group S) made by the neurologist and the orthopedic surgeon. The agreement diagonal is marked.

Furthermore, the three disagreeing pairs of assessments, (much better, somewhat worse) and (somewhat better, somewhat worse), explain the high RV value of 0.24, which measures the disagreement that is not covered by the systematic disagreement.

3.2.3 Inter-expert disagreement on outcome after multimodal rehabilitation

The frequency distributions of the experts’ assessments of change in the patients who underwent the multimodal rehabilitation (Table 1 and Fig. 1B) show that three of the experts regarded most patients as unchanged, whereas the orthopedic surgeon assessed only two patients as unchanged at follow-up. The percentage agreements of the pairwise comparisons between the orthopedic surgeon and the other experts range from 18% to 41% (Table 3B).

The paired distribution of assessments made by the psychologist and the orthopedic surgeon (Fig. 4) shows that these experts agreed on the outcome of three of the patients, a percentage agreement of 18% (Table 3B). Of the ten patients assessed as unchanged by the psychologist, only one was assigned the same outcome level by the orthopedic surgeon; according to the orthopedic surgeon, the outcomes of the other nine ranged from somewhat worse to much better. The orthopedic surgeon assessed eight patients to lower and six to higher outcome levels than did the psychologist. This variability in the assessments of individual patients, evident from the high RV value of 0.47, is one of the main reasons for the low level of agreement. Another reason is the systematic disagreement in how the two experts concentrate their assessments on the outcome levels. The RC of −0.42 [95% CI (RC), −0.75 to −0.09] means that the psychologist is more likely than the orthopedic surgeon to concentrate the assessments to the outcome level unchanged; the orthopedic surgeon used the categories somewhat worse and somewhat better in assessing 12 patients (Fig. 4). The same systematic inter-expert disagreement in concentration holds for the other experts when compared with the orthopedic surgeon, see Table 3B.

Fig. 4: Frequency distribution of pairs of independent assessments of change in WAD related complaints in patients after rehabilitation (Group R) made by the psychologist and the orthopedic surgeon. The agreement diagonal is marked.

As evident from Table 3B, the individual variability in the paired comparisons of the assessments of the patients’ outcomes, the RV, ranged from 0.03 to 0.47.

Figure 5 shows the paired comparison between the neurologist and the physiatrist, who agreed in assessing the outcomes of eight patients (47%). The disagreeing pairs lie close to the agreement diagonal, which means negligible individual variability (RV 0.03). The two experts’ frequency distributions of assessments across the group of patients are similar, which means that the paired assessments are homogeneous.

Fig. 5: Frequency distribution of pairs of independent assessments of change in WAD related complaints in patients after rehabilitation (Group R) made by the neurologist and the physiatrist. The agreement diagonal is marked.

3.2.4 Systematic inter-expert differences in outcome assessments

In summary, the frequency distributions of outcome levels in the group of patients after fusion surgery differed between the experts. The percentage agreements, PA, in outcome assessments range from 33% to 56% (Table 3A), and the disagreements are explained by systematic disagreements between the professionals’ outcome scale categories. Compared with the other experts, the psychologist systematically assessed more patients as much better, and the neurologist systematically concentrated the outcomes to somewhat better. The additional individual variations in paired assessments, the RV values, were mainly negligible: the RV ranged from 0.02 to 0.07, except for the paired assessments of the neurologist and the orthopedic surgeon, for which the RV was 0.24 (Table 3A).

The frequency distributions of outcome levels in the group of patients after rehabilitation also differed between the experts. The percentage inter-expert agreements, PA, in outcome assessments made by the orthopedic surgeon and each of the other experts ranged from 18% to 41%, and the main reason for the disagreements was the systematic disagreement in how the experts concentrated their outcome assessments, as evident from the significant RC values (Table 3B). As shown in Table 1 and Fig. 4, the orthopedic surgeon regarded two patients as unchanged, in contrast to the other experts, who assessed a majority of the patients as unchanged at the follow-up examinations. Another reason for disagreement was the individual variation in assessing outcomes; the RV ranged from 0.03 to 0.47 (Table 3B).
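The percentage agreements quoted in Sections 3.2.2 and 3.2.3 follow directly from the number of agreeing pairs and the group size. The short Python sketch below re-derives three of them from the counts stated in the text (7 of 18, 3 of 17 and 8 of 17). The function name `percentage_agreement` and the dictionary labels are illustrative assumptions, not part of the original analysis software.

```python
def percentage_agreement(agreeing_pairs: int, n_patients: int) -> int:
    """Return PA as a whole-number percentage (nearest integer)."""
    if n_patients <= 0:
        raise ValueError("n_patients must be positive")
    return round(100 * agreeing_pairs / n_patients)


# Counts quoted in the text: (number of agreeing pairs, group size).
reported = {
    "S: neurologist vs orthopedic surgeon": (7, 18),
    "R: psychologist vs orthopedic surgeon": (3, 17),
    "R: neurologist vs physiatrist": (8, 17),
}

pa = {pair: percentage_agreement(k, n) for pair, (k, n) in reported.items()}

for pair, value in pa.items():
    print(f"{pair}: PA = {value}%")
```

The same function reproduces the 67% quoted for the neurologist's use of somewhat better in the surgery group (12 of 18 patients).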

4 Discussion

Our study clearly demonstrates the multi-dimensional heterogeneity in WAD-related complaints. The experts’ assessments of the global scales of change disagreed more or less for all patients, except for two who underwent fusion surgery. One of them was assessed as much better and the other as somewhat better by the four experts. All the experts agreed in their assessment of one patient in the rehabilitation group and four in the comparison group as being unchanged.

The Svensson method for evaluation of paired ordinal data makes it possible to identify the sources of observed disagreements. The significant systematic disagreements in position and/or concentration in the paired comparisons confirm that the differences in the intra-patient outcome scores could be explained by the different profession-specific operational definitions of the outcome scales rather than by individual variations in data [13], [15], [33], [34], [35], [36], [37], [38].
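To make the distinction between systematic disagreement in position and in concentration concrete, the sketch below computes RP and RC from the two marginal frequency distributions of a pair of raters. It is a minimal illustration, assuming the definitions as we understand them from the method papers [13], [33]: RP is the difference between the probabilities P(X<Y) and P(Y<X), and RC is a normalized difference between P(X<Y<X) and P(Y<X<Y), all estimated from the marginals. The function name, the sign convention (positive RC meaning that rater Y is the more concentrated) and the example frequencies are hypothetical and are not taken from the study data; confidence intervals and the RV are not covered.

```python
from itertools import accumulate


def systematic_disagreement(x_marg, y_marg):
    """Return (RP, RC) for two raters' marginal frequencies.

    x_marg[i] and y_marg[i] are the numbers of patients that rater X and
    rater Y placed in ordered category i (lowest category first).
    """
    n = sum(x_marg)
    if n == 0 or n != sum(y_marg) or len(x_marg) != len(y_marg):
        raise ValueError("marginals must cover the same patients and categories")
    m = len(x_marg)
    # cx[i] = number of X assessments below category i; cx[i + 1] includes it.
    cx = [0] + list(accumulate(x_marg))
    cy = [0] + list(accumulate(y_marg))

    p_xy = sum(y_marg[i] * cx[i] for i in range(m)) / n**2  # estimate of P(X < Y)
    p_yx = sum(x_marg[i] * cy[i] for i in range(m)) / n**2  # estimate of P(Y < X)
    rp = p_xy - p_yx

    # Estimates of P(X < Y < X') and P(Y < X < Y') for independent draws.
    p_xyx = sum(y_marg[i] * cx[i] * (n - cx[i + 1]) for i in range(m)) / n**3
    p_yxy = sum(x_marg[i] * cy[i] * (n - cy[i + 1]) for i in range(m)) / n**3
    norm = min(p_xy - p_xy**2, p_yx - p_yx**2)
    rc = (p_xyx - p_yxy) / norm if norm > 0 else 0.0
    return rp, rc


# Hypothetical marginals on a three-category scale (not study data).
spread = [5, 0, 5]          # rater X uses only the outer categories
concentrated = [0, 10, 0]   # rater Y uses only the middle category
rp, rc = systematic_disagreement(spread, concentrated)
print(f"RP = {rp:.2f}, RC = {rc:.2f}")
```

In this constructed example the two raters do not differ in position, but rater Y concentrates all assessments to the middle category, which is the pattern described above for experts who assessed most patients as unchanged.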

Measurement of health care outcomes is challenging, not least in the area of spine care. An international group of specialists in several spine care disciplines recently proposed a set of metrics for standardized outcome reporting in the management of low back pain [39].

As stated in their paper, patient-reported outcome measures (PROMs) are a core component of the standard set for outcome evaluation. Somewhat surprisingly, the importance of an independent observer/examiner is not discussed. However, in the report from the Bone and Joint Decade 2000–2010 Task Force on Neck Pain, the use of independent outcome assessment was strongly recommended [11].

Adhering to this recommendation we strongly advocated independent observers in our study. Since the patients commonly suffer from multidisciplinary long-standing symptoms after a whiplash injury, four experts representing neurology, orthopedics, psychology and PM&R examined the patients at baseline and follow-up. The operational definitions of the outcome scale categories of change were comprehensively defined by each expert, thereby allowing inter-disciplinary comparisons of outcomes within patients and between treatments.

Hägg et al. [10] found the patient’s global assessment on the transition scale much better, somewhat better, unchanged and worse after treatment to be a valid measure of treatment effect in lumbar fusion studies. However, criticism of the use of global assessment as an outcome measure following spine treatment has recently been expressed on the grounds that over time it is unlikely that patients will be able to recall their initial or pre-operative state for comparison [40]. On the other hand, independent examiners might also be biased due to their own preferences for one kind of treatment or form of care giver over another.

In our randomized study all patients completed the disease-specific Balanced Inventory for Spinal disorders (BIS) before and after the treatment [9]. The BIS is a multidimensional self-reported questionnaire for patients with various spinal disorders. The follow-up version of the BIS has additional transition scales for the core variables, which allows for control of recall bias [15], [16], [17]. The PROMs of change, as well as the before and after assessments, are presented in our previous paper [9]. The proportions of patients who reported improvement after fusion surgery and rehabilitation were 83% and 12%, respectively. The statistical evaluation of the paired assessments of neck pain made by the patients before and after treatment showed a correspondingly pronounced difference in treatment effects between fusion surgery and rehabilitation. The measure of systematic decrease in neck pain in the group of patients who underwent surgery was RP=0.62, compared with RP=0.08 for the patients who completed the rehabilitation treatment [9].

As evidenced by our study, the different professional perspectives on change in the whiplash symptoms had an important impact on the assessment of patient outcome and confirmed the multidimensional complexity of WAD related complaints. This clearly demonstrates that involvement of one independent examiner is not the perfect solution to the outcome assessment problem. Despite inter-expert differences in assessing individual patients, significant differences in outcomes were found between the two treatment groups, in favor of better outcomes in patients treated with cervical fusion surgery. Corresponding evidence was found regarding the patient-reported assessments of change in neck pain [9]. This strengthens our opinion that the particular sub-group of WAD patients identified really suffers from pain of somatic origin.

5 Conclusions

The multi-dimensional complexity of WAD-related complaints was comprehensively demonstrated by the inter-disciplinary disagreements in assessing intra-patient outcomes. Nevertheless, strong and statistically significant evidence was found that the experts agreed on the superiority of treatment effects in patients who underwent cervical fusion surgery compared with multimodal rehabilitation.

6 Implications

These results support our previous conclusion based on the patients’ own assessments regarding neck pain that this well specified subgroup of WAD patients really suffers from pain of somatic origin.

Acknowledgements

We express our sincere thanks to licensed nurse Ann Mörk for arranging the patients’ appointments with the independent examiners as well as their travel.

  1. Authors’ statements

  2. Research funding: We gratefully acknowledge the financial support from the Marianne and Marcus Wallenberg Foundation, the Axel and Margaret Ax:son Johnson Foundation, Volvo and Vägverket.

  3. Conflict of interest: The authors have no conflict of interest.

  4. Informed consent: All patients were given both oral and written information on all parts of the study and gave their written informed consent.

  5. Ethical approval: The study was approved by the Medical Ethics Committee, Örebro, Sweden, no. 368/96. The study is registered in ClinicalTrials.gov (registration no. NCT01994044).

References

[1] Nordin M, Carragee EJ, Hogg-Johnson S, Schecter Weiner S, Hurwitz EL, Peloso PM, Guzman J, van der Velde G, Carroll LJ, Holm LW, Côté P, Cassidy JD, Haldeman S. Assessment of neck pain and its associated disorders. Results of the Bone and Joint Decade 2000–2010 Task Force on Neck Pain and its associated disorders. Spine 2008;33:101–22. doi:10.1097/BRS.0b013e3181644ae8

[2] Rydevik B, Brodda Jansen G, Edlund C, Grane P, Hildingsson C, Karlberg M, Link H, Måwe U, Portala K, Sterner Y. Diagnosis and early management of whiplash injuries. Stockholm, Sweden: The Swedish society of medicine and the Whiplash Commission Medical Task Force, 2006.

[3] Sterling M. Physical and psychological aspects of whiplash: important considerations for primary care assessment, part 2: case studies. Man Ther 2009;14:e8–12. doi:10.1016/j.math.2008.03.004

[4] Angst F, Gantenbein AR, Lehmann S, Gysi-Klaus F, Aeschlimann A, Michel BA, Hegemann F. Multidimensional associative factors for improvement in pain, function, and working capacity after rehabilitation of whiplash associated disorder: a prognostic, prospective outcome study. BMC Musculoskelet Disord 2014;15:130. doi:10.1186/1471-2474-15-130

[5] Styrke J, Sojka P, Björnstig U, Stålnacke B-M. Symptoms, disabilities, and life satisfaction 5 years after whiplash injuries. Scand J Pain 2014;5:229–36. doi:10.1016/j.sjpain.2014.06.001

[6] Nyström B. Open mechanical provocation under local anesthesia: a definitive method for locating the focus in painful mechanical disorder of the motion segment. Fifth international conference on lumbar fusion and stabilization. Osaka Japan 1991. p. 198.

[7] Nyström B. Segmental lumbar pain. Upsala J Med Sci 1993;Suppl. 52:67.

[8] Nyström B, Weber H, Schillberg B, Taube A. Symptoms and signs possibly indicating segmental, discogenic pain. A fusion study with 18 years of follow-up. Scand J Pain 2017;16:213–20. doi:10.1016/j.sjpain.2016.10.007

[9] Nyström B, Svensson E, Larsson S, Schillberg B, Mörk A, Taube A. A small group Whiplash-Associated-Disorders (WAD) patients with central neck pain and movement induced stabbing pain, the painful segment determined by mechanical provocation: fusion surgery was superior to multimodal rehabilitation in a randomized trial. Scand J Pain 2016;12:33–42. doi:10.1016/j.sjpain.2016.03.003

[10] Hägg O, Fritzell P, Odén A, Nordwall A. Simplifying outcome measurement. Evaluation of instruments for measuring outcome after fusion surgery for chronic low back pain. Spine 2002;27:1213–22. doi:10.1097/00007632-200206010-00014

[11] Carragee EJ, Hurwitz EL, Cheng I, Carroll LJ, Nordin M, Guzman J, Peloso P, Holm LW, Côté P, Hogg-Johnson S, van der Velde G, Cassidy JD, Haldeman S. Treatment of neck pain. Injections and surgical interventions: results of the Bone and Joint Decade 2000–2010 Task Force on Neck Pain and its associated disorders. Spine 2008;33:S153–69. doi:10.1097/BRS.0b013e31816445ea

[12] Feinstein AR, Josephy BR, Wells CK. Scientific and clinical problems in indexes of functional disability. Ann Intern Med 1986;105:413–20. doi:10.7326/0003-4819-105-3-413

[13] Svensson E. Analysis of systematic and random differences between paired ordinal categorical data. Thesis, Göteborg University, Göteborg, 1993.

[14] Svensson E. Construction of a single global scale for multi-item assessments of the same variable. Stat Med 2001;20:3831–46. doi:10.1002/sim.1148

[15] Svensson E, Schillberg B, Kling AM, Nyström B. Reliability of the Balanced Inventory for Spinal Disorders, a questionnaire for evaluation of outcomes in patients with various spinal disorders. J Spinal Disord Tech 2012;25:196–204. doi:10.1097/BSD.0b013e31821534da

[16] Svensson E, Schillberg B, Kling AM, Nyström B. The Balanced Inventory for Spinal Disorders. The validity of a disease specific questionnaire for evaluation of outcomes in patients with various spinal disorders. Spine 2009;34:1976–83. doi:10.1097/BRS.0b013e3181b07d6a

[17] Svensson E, Schillberg B, Zhao X, Nyström B. Responsiveness of the balanced inventory for spinal disorders, a questionnaire for evaluation of outcomes in patients with various spinal disorders. J Spine Neurosurg 2015;4:2. doi:10.4172/2325-9701.1000184

[18] ICF Checklist; 2003. www.who.int/classifications/icfchecklist.pdf.

[19] Cieza A, Stucki G. The international classification of functioning disability and health: its development process and content validity. Eur J Phys Rehabil Med 2008;44:303–13.

[20] Stucki G, Kostanjsek N, Ustün B, Cieza A. ICF-based classification and measurement of functioning. Eur J Phys Rehabil Med 2008;44:315–28.

[21] Bjorner JB, Kristensen TS, Orth-Gomér K, Tibblin G, Sullivan M, Westerholm P. Self-rated health a useful concept in research, prevention and clinical medicine. Swedish council for planning and coordination of research. Report 1996:9.

[22] Stevens SS. On the theory of scales of measurement. Science 1946;103:677–80. doi:10.1126/science.103.2684.677

[23] Stevens SS. On the averaging of data. Science 1955;121:113–6. doi:10.1126/science.121.3135.113

[24] Siegel S, Castellan NJ. Nonparametric statistics for the behavioral sciences, 2nd ed. New York: McGraw Hill, 1988.

[25] Dybkaer R, Jorgensen K. Measurement, value and scale. Scand J Clin Lab Invest 1989;49(suppl 194):69–76.

[26] Merbitz C, Morris J, Grip JC. Ordinal scales and foundations of misinference. Arch Phys Med Rehabil 1989;70:308–12.

[27] Agresti A. Categorical data analysis. New York: John Wiley and sons, USA, 1990.

[28] Hand DJ. Statistics and the theory of measurement. J R Statist Soc 1996;A159:445–92. doi:10.2307/2983326

[29] McDowell I, Newell C. Measuring health. A guide to rating scales and questionnaires, 2nd ed. Oxford: Oxford University Press, 1996:359–61.

[30] Altman DG. Practical statistics for medical research. London: Chapman and Hall, 1991. doi:10.1201/9780429258589

[31] Svensson E. Guidelines to statistical evaluation of data from rating scales and questionnaires. J Rehab Med 2001;33:47–8. doi:10.1080/165019701300006542

[32] Newcombe RG, Altman DG. Proportions and their differences. In: Altman DG, Machin D, Bryant TN, Gardner MJ, editors. Statistics with Confidence. 2nd ed. Bristol: BMJ Books, 2000:45–56.

[33] Svensson E, Holm S. Separation of systematic and random differences in ordinal rating scales. Stat Med 1994;13:2437–53. doi:10.1002/sim.4780132308

[34] Svensson E, Starmark J-E, Ekholm S, von Essen C, Johansson A. Analysis of inter-observer disagreement in the assessment of subarachnoid blood and acute hydrocephalus on CT scans. Neurol Res 1996;18:487–94. doi:10.1080/01616412.1996.11740459

[35] Ledenius K, Svensson E, Stålhammar F, Wiklund L-M, Thilander-Klang A. A method to analyse observer disagreement in visual grading studies: example of assessed image quality in paediatric cerebral multidetector CT images. Br J Radiol 2010;83:604–11. doi:10.1259/bjr/26723788

[36] Svensson E. Recent developments in analysis of paired ordinal data. Discussion of the paper Ivy Liu and Alan Agresti: the analysis of ordered categorical data: an overview and a survey of recent development. TEST 2005;14:1–73, 44–6. doi:10.1007/BF02595397

[37] Svensson E. Agreement. In: Everitt BS, Palmer CR, editors. Encyclopaedic Companion to Medical Statistics. 2nd ed. Chichester: John Wiley & Sons, 2010:10–2.

[38] Avdic A, Svensson E. Svensson’s method 1.1 ed. Örebro 2010 Interactive software supporting Svensson’s method. Available at: http://avdic.se/svenssonsmetod.html. Accessed: 14 Aug 2015.

[39] Clement RC, Welander A, Stowell C, Cha TD, Chen JL, Davies M, Fairbank JC, Foley KT, Gehrchen M, Hägg O, Jacobs WC, Kahler R, Khan SN, Lieberman IH, Morisson B, Ohnmeiss DD, Peul WC, Shonnard NH, Smuck MW, Solberg TK, et al. A proposed set of metrics for standardized outcome reporting in the management of low back pain. Acta Orthop 2015;86:523–33. doi:10.3109/17453674.2015.1036696

[40] Mannion AF, Brox J-I, Fairbank JC. Consensus at last! Long-term results of all randomized controlled trials show that fusion is no better than non-operative care in improving pain and disability in chronic low back pain. Spine J 2016;16:588–90. doi:10.1016/j.spinee.2015.12.001

Received: 2017-12-11
Accepted: 2018-02-01
Published Online: 2018-02-22
Published in Print: 2018-04-25

©2018 Scandinavian Association for the Study of Pain. Published by Walter de Gruyter GmbH, Berlin/Boston. All rights reserved.

Downloaded on 17.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/sjpain-2017-0180/html