Using R for Nonparametric Analysis, The Kruskal-Wallis Test: R Script and Some Notes on IDE’s
A tutorial by Douglas M. Wiig
In the previous three parts of this tutorial I discussed using R to enter a data set and perform a nonparametric Kruska-Wallis test for ranked means. In this final part the commented script that was used in the first three parts is listed.
If you are going to use R for the majority of your statistical analysis it is highly advisable that you investigate some of the IDE’s (Integrated Development Environments) that are available to assist in coding and debugging R script and creating R packages for personal use or distribution. I think one of the easiest to use is R Studio. R Studio is available in both free open source and commercial versions and can be downloaded at http://www.rstudio.com. There are versions available for Windows, various Linux distributions, and Mac OS.
The R studio console provides a number of useful tools that facilitate coding. The screen is divided into four sections with one section providing a code editor that features syntax highlighting, code completion and many other features such as line or block code execution. Another window contains R and all displays output, error messages and warnings when code is executed from the editor. A third window displays all of the current environmental variables that are active and can also show all currently loaded R packages. A fourth window can show graphic output from executed code, can be used to manage, download and install R packages, and can be used to access the CRAN database of online help. There are other useful tools that are too numerous to discuss here.
Another program that is worth looking at is RKWard which combines an IDE with a graphics GUI for R statistical analysis. Information and downloads for RKWard can be found at https://rkward.kde.org. This program is also free and open source and can be run on a Windows platform, Mac OS, or various distributions of Linux. The program has been optimized for Linux. A discussion of these IDE’s is beyond the scope of this posting.
Shown below is the commented R script for all three parts of the Kruskal-Wallis tutorial. For ease of reading code portions are shown in bold print.
#packages that must be present in the global environment before running these scripts
#stats; graphics; grDevices; utils; datasets; methods; base
#
#code to enter data using the data editor
#KW data entry, define file kruskal as a data frame
kruskal <-data.frame()
#invoke the data editor
kruskal <-edit(kruskal)
#define group as containing 3 factors; tell R which data column goes with which factor
group <- factor(1,2,3)
#alternative data entry method
#Define factor Group as containing three categories
Group <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3)
#create a vector defined as authscore and enter values
authscore <- c(96,128,83,61,101,82,121,132,135,109,115,149,166,147)
#create data frame kruskal matching each group factor to individual scores
kruskal <- data.frame(Group, authscore)
#use the following line to look at the structure of the data frame created
str(kruskal)
#run the basic Kruskal-Wallis test
#
kruskal.test(authscore ~ Group, data=kruskal)
#
#the following code is used to conduct a post-hoc comparison of the ranked means
#it is useful to first do a simple boxplot for a visual comparison
#Use this script to save the boxplot graphic to a .png data file
#save output in pdf file authplot
#send output to screen and file
sink()
authplot <- boxplot(authscore ~ Group, data=kruskal, main=”Group Comparison”, ylab=”authscore”)
#save .png file
png(“authplot.png”)
#now return all output to console
dev.off()
#
#make sure that the package PMCMR is loaded before running the following script
library(PMCMR)
#use the ‘with’ function to pass the data from the kruskal data frame to the post hoc
#test script; specify the Tukey HSD method for determining significance of each
# pair of comparisions
#
with(kruskal, {
posthoc.kruskal.nemenyi.test(authscore, Group, “Tukey”)
})
#
#NOTE: if using a version of R < 3.xx then use the package pgirmess instead of PMCMR
#
#the following lines show the post hoc analysis using the pgirmess package
#note the function kuskalmc is used for the comparisons
library(pgirmess)
authscore <- c(96,128,83,61,101,82,121,132,135,109,115,149,166,147)
Group <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3)
kruskalmc(authscore ~ Group, probs=.05, cont=NULL)
#
Like this:
Like Loading...