Tag Archives: kruskal wallis test

Using R for Nonparametric Analysis, The Kruskal-Wallis Test Part Four: R Script and Some Notes on IDE’s


 

Using R for Nonparametric Analysis, The Kruskal-Wallis Test: R Script and Some Notes on IDE’s

 

A tutorial by Douglas M. Wiig

 

In the previous three parts of this tutorial I discussed using R to enter a data set and perform a nonparametric Kruska-Wallis test for ranked means. In this final part the commented script that was used in the first three parts is listed. 

 

If you are going to use R for the majority of your statistical analysis it is highly advisable that you investigate some of the IDE’s (Integrated Development Environments) that are available to assist in coding and debugging R script and creating R packages for personal use or distribution. I think one of the easiest to use is R Studio. R Studio is available in both free open source and commercial versions and can be downloaded at http://www.rstudio.com    There are versions available for Windows, various Linux distributions, and Mac OS.

The R studio console provides a number of useful tools that facilitate coding. The screen is divided into four sections with one section providing a code editor that features syntax highlighting, code completion and many other features such as line or block code execution. Another window contains R and all displays output, error messages and warnings when code is executed from the editor. A third window displays all of the current environmental variables that are active and can also show all currently loaded R packages. A fourth window can show graphic output from executed code, can be used to manage, download and install R packages, and can be used to access the CRAN database of online help. There are other useful tools that are too numerous to discuss here.

 

Another program that is worth looking at is RKWard which combines an IDE with a graphics GUI for R statistical analysis. Information and downloads for RKWard can be found at https://rkward.kde.org. This program is also free and open source and can be run on a Windows platform, Mac OS, or various distributions of Linux. The program has been optimized for Linux. A discussion of these IDE’s is beyond the scope of this posting.

 

Shown below is the commented R script for all three parts of the Kruskal-Wallis tutorial. For ease of reading code portions are shown in bold print.

 

#packages that must be present in the global environment before running these scripts

 

#stats; graphics; grDevices; utils; datasets; methods; base

 

#

 

#code to enter data using the data editor

 

#KW data entry, define file kruskal as a data frame

 

kruskal <-data.frame()

 

#invoke the data editor

 

kruskal <-edit(kruskal)

 

#define group as containing 3 factors; tell R which data column goes with which factor

 

group <- factor(1,2,3)

 

#alternative data entry method

 

#Define factor Group as containing three categories

 

Group <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3)

 

#create a vector defined as authscore and enter values

 

authscore <- c(96,128,83,61,101,82,121,132,135,109,115,149,166,147)

 

#create data frame kruskal matching each group factor to individual scores

 

kruskal <- data.frame(Group, authscore)

 

#use the following line to look at the structure of the data frame created

 

str(kruskal)

 

#run the basic Kruskal-Wallis test

 

#

 

kruskal.test(authscore ~ Group, data=kruskal)

 

#

 

#the following code is used to conduct a post-hoc comparison of the ranked means

 

#it is useful to first do a simple boxplot for a visual comparison

 

#Use this script to save the boxplot graphic to a .png data file

 

#save output in pdf file authplot

 

#send output to screen and file

 

sink()

 

authplot <- boxplot(authscore ~ Group, data=kruskal, main=”Group Comparison”, ylab=”authscore”)

 

#save .png file

 

png(“authplot.png”)

 

#now return all output to console

 

dev.off()

 

#

 

#make sure that the package PMCMR is loaded before running the following script

 

library(PMCMR)

 

#use the ‘with’ function to pass the data from the kruskal data frame to the post hoc

 

#test script; specify the Tukey HSD method for determining significance of each

 

# pair of comparisions

 

#

 

with(kruskal, {

 

 

posthoc.kruskal.nemenyi.test(authscore, Group, “Tukey”)

 

})

 

#

 

#NOTE: if using a version of R < 3.xx then use the package pgirmess instead of PMCMR

 

#

 

#the following lines show the post hoc analysis using the pgirmess package

 

#note the function kuskalmc is used for the comparisons

 

library(pgirmess)

 

authscore <- c(96,128,83,61,101,82,121,132,135,109,115,149,166,147)

 

Group <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3)

 

kruskalmc(authscore ~ Group, probs=.05, cont=NULL)

 

#

 

Using R for Nonparametric Statistics: The Kruskal-Wallis Test, Part Two


Using R for Nonparametric Statistics:  The Kruskal-Wallis Test, Part Two

A Tutorial by Douglas M. Wiig

Before we can run the Kruskal-Wallis test we need to define which column contains the factors (independent variables) and which contains the authoritarianism scores (dependent variable). Once we define the factor column R will match the correct score to each of the 14 observations.
As set up in the study, ‘Group’ is the factor(independent variable), and ‘authscore’ is the dependent variable. Use the command:

> Group <-factor(1,2,3)

This designates which observation belongs to each group. To make sure the data structure has been set up correctly use the command:

> str(kruskal)
‘data.frame’: 14 obs. of 2 variables:
$ Group : num 1 1 1 1 1 2 2 2 2 2 …
$ authscore: num 96 128 83 61 101 82 124 132 135 109 …
>

The output of this command shows a summary of the structure of the data frame created. We can now run the Kruskal Wallis test with the command:

> kruskal.test(authscore ~ Group, data=kruskal)

The output will be:

Kruskal-Wallis rank sum test

data: authscore by Group
Kruskal-Wallis chi-squared = 6.4057, df = 2, p-value = 0.04065

>

As seen in the above output the analysis of authoritarianism score by group indicates that the probability of differences in scores among the three groups being due to chance alone is less that the .05 alpha level that was set for the study. (pobt < .05). Further post hoc analysis would be necessary to determine the exact nature of the differences among the scores of the three groups. This will be the topic of a future tutorial.

More to come:  Part Three will explore the use of multiple comparison techniques to analyze ranked means