Tag Archives: r nonparametric

R For Beginners: Some Simple R Code to do Common Statistical Procedures, Part Two

An R tutorial by D. M. Wiig

This posting contains an embedded Word document. To view the document full screen click on the icon in the lower right hand corner of the embedded document.




Using R in Nonparametric Statistical Analysis, The Kruskal-Wallis Test Part Three: Post Hoc Pairwise Multiple Comparison Analysis of Ranked Means

Using the Kruskal-Wallis Test, Part Three:  Post Hoc Pairwise Multiple Comparison Analysis of Ranked Means

A tutorial by Douglas M. Wiig

In previous tutorials I discussed an example of entering data into a data frame and performing a nonparametric Kruskal-Wallis test to determine if there were differences in the authoritarian scores of three different groups of educators. The test statistic indicated that at least one of the groups(group 1) was significantly different from the other two.

In order to explore the difference further it common practice to do post hoc analysis of the differences. There are a number of methods that have been devised to do these comparisons, but one of the most straightforward and easiest to understand is pairwise comparison of ranked means(or means if using standard ANOVA.)

Prior to entering the code for this section be sure that the following packages are installed and loaded:



In part one data was entered into the R editor to create a data frame. Data frames can also be created directly using R script. The script to create the data frame for this example uses the following code:

#create data frame from script input

>Group <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3)

>authscore <-c(96,128,83,61,101,82,121,132,135,109,115,149,166,147)

>kruskal <- data.frame(Group, authscore)

The group identifiers are entered and assigned to the variable Group, and the authority scores are assigned to the variable authscore. Notice that each identifier is matched with an appropriate authscore just as they were when entered in columns using the data editor. The vectors are then assigned to the variable kruskal to create a data.frame. Once again the structure of the data frame can be checked using the command:


resulting in:

'data.frame':   14 obs. of  2 variables:
 $ Group    : num  1 1 1 1 1 2 2 2 2 2 ...
 $ authscore: num  96 128 83 61 101 82 121 132 135 109 ...


It is often useful to do a visual examination of the ranked means prior to post hoc analysis. This can be easily accomplished using a boxplot to display the 3 groups that are presented in the example. If the data frame created in tutorial one is still in the global environment the boxplot can be generated with the following script:

>#boxplot using authscore and group variables from the data frame created in part one

>boxplot(authscore ~ group, data=kruskal, main=”Group Comparison”, ylab=”authscore”)


The resulting boxplot is seen below:


As can be seen in the plot, authority score differences are the greatest between group 1 and 3 with group 2 In between. Use the following code to run the Kruskal-Wallis test and examine if any of the means are significantly different:


with(kruskal, {

posthoc.kruskal.nemenyi.test(authscore, Group, “Tukey”)


The post hoc test used in this example is from the recently released PMCMR R package. For details of this and other post hoc tests contained in the package( see Thorsten Polert, Calculate Pairwise Multiple Comparisons of Mean Rank Sums, 2015. http://cran.r-project.org/web/packages/PMCMR/PMCMR.pdf.) The test employed here used the Tukey method to make pairwise comparisons of the mean rank authoritarianism scores of the three groups. The output from the script above is:

Pairwise comparisons using Tukey and Kramer (Nemenyi) test

with Tukey-Dist approximation for independent samples

data: authscore and Group

      1                    2

2   0.493             –

3    0.031        0.310

P value adjustment method: none

The output above confirms what would be expected from observing the boxplot. The only means that differ significantly are means 1 and 3 with a p = .031.

The PMCMR package will only work with R versions 3.0.x. If using an earlier version of R another package can be used to accomplish the post hoc comparisons. This package is the pgirmess package (see http://cran.r-project.org/web/packages/pgirmess/pgirmess.pdf for complete details). Using the vectors authscore and Group that were created earlier the script for multiple comparison using the pgirmess package is:


authscore <- c(96,128,83,61,101,82,121,132,135,109,115,149,166,147)

Group <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3)

kruskalmc(authscore ~ Group, probs=.05, cont=NULL)

and the output from this script using a significance level of p = .05 is:

Multiple comparison test after Kruskal-Wallis

p.value: 0.05


      obs.dif    critical.dif     difference

1-2    3.0        6.333875         FALSE

1-3    7.1        6.718089         TRUE

2-3    4.1        6.718089        FALSE


As noted earlier the comparison between groups one and three is shown to be the only significant difference at the p=.05 level.

Both the PMCMR and the pgirmess packages are useful in producing post hoc comparisons with the Kruskal-Wallis test. It hoped that the series of tutorials discussing nonparametric alternatives common parametric statistical tests has helped demonstrate the utility of these approaches in statistical analysis.

In part four I will post the complete script used in all three tutorials.

Using R for Nonparametric Analysis: The Kruskal-Wallis Test, Part One


Using R for Nonparametric Data Analysis: The Kruskal-Wallis Test

A tutorial by Douglas M. Wiig

Analysis of variance(ANOVA) is a commonly used technique for examining the effect of an independent variable on three or more dependent variables. There are several types of ANOVA ranging from simple one-way ANOVA to the more complex multiple analysis of variance, MANOVA. ANOVA makes several assumptions about the sample data being used such as the assumption of normal distribution of the variables in the parent population, underlying continuous distribution of the variables, and interval or ratio level measurement of all variables. If any of these assumptions cannot be met a researcher can turn to a nonparametric counterpart to ANOVA for the analysis. This tutorial will discuss the use of the Kruskal-Wallis test, the nonparametric counterpart to analysis of variance.

In this tutorial I will explore a simple example and discuss entering the sample data into a data file using the R data editor. I will then discuss setting up the data for analysis and using the Kruskal-Wallis test.

I am going to assume that the reader has a working knowledge of ANOVA with parametric data. Since ANOVA uses sample means and variances as the basis of the statistical test interval or ratio level measurement is necessary to insure valid results in addition to the assumptions indicated above. With the nonparametric Kruskal-Wallis test the only assumptions to be met are ordinal or better measurement and the assumption of an underlying continuous measurement. The example to be used here is taken from a book on nonparametric statistics by Sidney Seigel.(Sidney Seigel, Nonparametric Statistics for the Behavioral Sciences, New York: McGraw-Hill, 1956, pp-184-196).

A researcher wishes to test the hypothesis that school administrators are typically more authoritarian than classroom teachers. He also believes that many classroom teachers are adminstration-oriented in their professional aspirations which may, in turn, have an effect on their authoritarianism. 14 subjects are selected and divided into three groups: teaching-oriented teachers (classroom teachers who wish to remain in a teaching position), administration-oriented teachers (classroom teachers who aspire to become administrators), and practicing administrators.(Seigel, p. 186). The level of authoritarianism of each subject is measured through a survey that assigns an authoritarianism score that is considered to be at least ordinal in nature. Higher scores indicate higher levels of authoritarianism. (Siegel, p. 186). The null hypothesis is that there is no difference in mean authoritarianism scores among the three groups. The alternative hypothesis is that the mean authoritarianism scores among the three groups are different. The alpha level for rejecting the null hypothesis is p = .05. (Seigel, p. 186).

Since we make no assumption about a normal distribution of scores, have a small sample size of n = 14, and ordinal measure we will use the nonparametric test which is based on median scores and ranks rather than means and variances as used in parametric ANOVA. The mathematical details of how this is done is beyond the scope of this tutorial. See Seigel, p. 187-189 for details. The authoritarian scores for the three groups are shown below:

Authoritarianism Scores of Three Groups of Educators

Teacher-Oriented        Admin-oriented    Administrators

teachers   n=5                teachers   n=5                n=4


96                                  82                               115

128                              124                               149

83                               132                               166

61                               123                               147

101                             109


(Seigel, p. 187)

The first task is to create an R data frame with the scores from the table. We will enter the scores using the R data editor. We will name the data frame ‘kruskal.’   Invoke the editor using the following commands:

  > kruskal <- data.frame()

   > kruskal <- edit(kruskal)

You should see the data entry editor open in a separate window. In order to process the data properly it needs to be entered into two columns. The first column will be the factors (which group the scores belong to), and the second column will contain the actual scores. Label column 1 ‘Group’ and column 2 ‘authscore.’ When the data are entered your editor should look like this:


Group  authscore

1    1               96

2    1            128

3    1             83

4    1            61

5    1           101

6    2            82

7    2           121

8    2          132

9    2          135

10   2       109

11   3       115

12   3       149

13   3       166

14   3       147


Make sure that each column of numbers is of the data type “Real.” l Close the data editor by clicking ‘Quit’ and the data will be saved in the working directory for access. To see what has been entered in the data editor use the command:

> kruskal

Group authscore

1     1             96

2     1            128

3     1             83

4     1            61

5     1           101

6     2            82

7     2           121

8     2            132

9     2            135

10     2         109

11     3         115

12     3         149

13     3         166


You should see the output as above. If you need to make changes simple invoke the editor with:

> kruskal <-edit(kruskal)

The editor will open and you can make any changes you need to. Be sure to click on ‘Quit’ to save the changes to the working directory.

Part Two will continue the analysis