Using the Kruskal-Wallis Test, Part Three: Post Hoc Pairwise Multiple Comparison Analysis of Ranked Means
A tutorial by Douglas M. Wiig
In previous tutorials I discussed an example of entering data into a data frame and performing a nonparametric Kruskal-Wallis test to determine if there were differences in the authoritarian scores of three different groups of educators. The test statistic indicated that at least one of the groups (group 1) was significantly different from the other two.
In order to explore the difference further, it is common practice to do a post hoc analysis of the differences. A number of methods have been devised for these comparisons, but one of the most straightforward and easiest to understand is pairwise comparison of ranked means (or of means, if using standard ANOVA).
Prior to entering the code for this section, be sure that the following packages are installed and loaded: PMCMR and pgirmess.
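A minimal sketch of one way to check whether the two packages used later in this tutorial, PMCMR and pgirmess, are available in the current R library. The variable names needed and missing_pkgs are illustrative only, and the install step is left commented out because it needs a network connection and a CRAN mirror that still carries both packages.

```r
# Packages used later in this tutorial
needed <- c("PMCMR", "pgirmess")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)   # uncomment to install from CRAN
} else {
  library(PMCMR)
  library(pgirmess)
}
```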
In part one data was entered into the R editor to create a data frame. Data frames can also be created directly using R script. The script to create the data frame for this example uses the following code:
#create data frame from script input
>Group <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3)
>authscore <- c(96,128,83,61,101,82,121,132,135,109,115,149,166,147)
>kruskal <- data.frame(Group, authscore)
The group identifiers are entered and assigned to the variable Group, and the authoritarianism scores are assigned to the variable authscore. Notice that each identifier is matched with the appropriate authscore, just as they were when entered in columns using the data editor. The two vectors are then combined with data.frame() and the result is assigned to the variable kruskal. Once again the structure of the data frame can be checked using the command:
>str(kruskal)
'data.frame': 14 obs. of 2 variables:
$ Group : num 1 1 1 1 1 2 2 2 2 2 ...
$ authscore: num 96 128 83 61 101 82 121 132 135 109 ...
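Because the post hoc comparisons below operate on mean ranks rather than on the raw scores, it can help to compute those mean ranks directly. The following is a minimal base R sketch using the same data; the names score_ranks and mean_ranks are introduced here for illustration and do not appear in the earlier tutorials.

```r
# Data from this tutorial
authscore <- c(96,128,83,61,101,82,121,132,135,109,115,149,166,147)
Group <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3)

# Rank all 14 scores together (1 = lowest score)
score_ranks <- rank(authscore)

# Average the ranks within each group
mean_ranks <- tapply(score_ranks, Group, mean)
print(mean_ranks)
```

With these data the mean ranks are 4.4, 7.4 and 11.5 for groups 1, 2 and 3 respectively.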
It is often useful to do a visual examination of the ranked means prior to post hoc analysis. This can be easily accomplished using a boxplot to display the 3 groups that are presented in the example. If the data frame created in tutorial one is still in the global environment the boxplot can be generated with the following script:
>#boxplot using authscore and Group variables from the data frame
>boxplot(authscore ~ Group, data=kruskal, main="Group Comparison", ylab="authscore")
The resulting boxplot is seen below:
As can be seen in the plot, authoritarianism score differences are greatest between groups 1 and 3, with group 2 in between. Use the following code to run the post hoc test and examine which of the mean ranks are significantly different:
posthoc.kruskal.nemenyi.test(authscore, Group, "Tukey")
The post hoc test used in this example is from the recently released PMCMR R package. For details of this and other post hoc tests contained in the package, see Thorsten Pohlert, Calculate Pairwise Multiple Comparisons of Mean Rank Sums, 2015, http://cran.r-project.org/web/packages/PMCMR/PMCMR.pdf. The test employed here uses the Tukey method to make pairwise comparisons of the mean rank authoritarianism scores of the three groups. The output from the script above is:
Pairwise comparisons using Tukey and Kramer (Nemenyi) test
with Tukey-Dist approximation for independent samples
data: authscore and Group
  1     2
2 0.493 -
3 0.031 0.310
P value adjustment method: none
The output above confirms what would be expected from observing the boxplot. The only mean ranks that differ significantly are those of groups 1 and 3, with p = .031.
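To make the p-values less of a black box, here is a base R sketch of how a Nemenyi-type comparison with the Tukey (studentized range) distribution can be computed by hand. It rests on assumptions that are mine rather than statements about the PMCMR source code: there are no tied scores (true for these data), and each pairwise statistic is the difference in mean ranks divided by its standard error, multiplied by sqrt(2), and referred to the studentized range distribution with infinite degrees of freedom. The helper nemenyi_p is a hypothetical name.

```r
# Data from this tutorial
authscore <- c(96,128,83,61,101,82,121,132,135,109,115,149,166,147)
Group <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3)

N <- length(authscore)                       # total sample size
k <- length(unique(Group))                   # number of groups
n_per_group <- tapply(authscore, Group, length)
mean_ranks <- tapply(rank(authscore), Group, mean)

# Hypothetical helper: p-value for comparing groups i and j (assumes no ties)
nemenyi_p <- function(i, j) {
  se <- sqrt(N * (N + 1) / 12 * (1 / n_per_group[[i]] + 1 / n_per_group[[j]]))
  q <- abs(mean_ranks[[i]] - mean_ranks[[j]]) / se * sqrt(2)
  ptukey(q, nmeans = k, df = Inf, lower.tail = FALSE)
}

p_values <- c("1-2" = nemenyi_p(1, 2),
              "1-3" = nemenyi_p(1, 3),
              "2-3" = nemenyi_p(2, 3))
print(round(p_values, 3))
```

The printed values can be compared against the pairwise table produced by posthoc.kruskal.nemenyi.test.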
The PMCMR package will only work with R version 3.0.x or later. If using an earlier version of R, another package can be used to accomplish the post hoc comparisons. This package is the pgirmess package (see http://cran.r-project.org/web/packages/pgirmess/pgirmess.pdf for complete details). Using the vectors authscore and Group that were created earlier, the script for multiple comparison using the pgirmess package is:
authscore <- c(96,128,83,61,101,82,121,132,135,109,115,149,166,147)
Group <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3)
kruskalmc(authscore ~ Group, probs=.05, cont=NULL)
and the output from this script using a significance level of p = .05 is:
Multiple comparison test after Kruskal-Wallis
obs.dif critical.dif difference
1-2 3.0 6.333875 FALSE
1-3 7.1 6.718089 TRUE
2-3 4.1 6.718089 FALSE
As noted earlier, the comparison between groups one and three is shown to be the only significant difference at the p = .05 level.
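The obs.dif and critical.dif columns can be reproduced in base R. The sketch below assumes the critical difference is a normal quantile, with the significance level divided across the k(k-1) one-sided pairwise comparisons, multiplied by the standard error of the difference in mean ranks. That is my reading of the method rather than a quotation of the pgirmess code, and the variable names are illustrative.

```r
# Data from this tutorial
authscore <- c(96,128,83,61,101,82,121,132,135,109,115,149,166,147)
Group <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3)

alpha <- 0.05
N <- length(authscore)
k <- length(unique(Group))
n_per_group <- tapply(authscore, Group, length)
mean_ranks <- tapply(rank(authscore), Group, mean)

# All pairs of group labels: 1-2, 1-3, 2-3
pairs <- combn(names(mean_ranks), 2)
first <- pairs[1, ]
second <- pairs[2, ]

# Observed difference in mean ranks for each pair
obs_dif <- abs(as.numeric(mean_ranks[first]) - as.numeric(mean_ranks[second]))

# Critical difference (assumed formula, see the paragraph above)
z_crit <- qnorm(1 - alpha / (k * (k - 1)))
critical_dif <- z_crit * sqrt(N * (N + 1) / 12 *
  (1 / as.numeric(n_per_group[first]) + 1 / as.numeric(n_per_group[second])))

comparison <- data.frame(obs.dif = obs_dif,
                         critical.dif = critical_dif,
                         difference = obs_dif > critical_dif,
                         row.names = paste(first, second, sep = "-"))
print(comparison)
```

A pair is flagged TRUE only when its observed difference in mean ranks exceeds the critical difference, which is why only the 1-3 comparison is significant here.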
Both the PMCMR and the pgirmess packages are useful in producing post hoc comparisons with the Kruskal-Wallis test. It is hoped that this series of tutorials discussing nonparametric alternatives to common parametric statistical tests has helped demonstrate the utility of these approaches in statistical analysis.
In part four I will post the complete script used in all three tutorials.