Using R for Nonparametric Analysis, The Kruskal-Wallis Test Part Four: R Script and Some Notes on IDE’s

 

Using R for Nonparametric Analysis, The Kruskal-Wallis Test: R Script and Some Notes on IDE’s

 

A tutorial by Douglas M. Wiig

 

In the previous three parts of this tutorial I discussed using R to enter a data set and perform a nonparametric Kruska-Wallis test for ranked means. In this final part the commented script that was used in the first three parts is listed. 

 

If you are going to use R for the majority of your statistical analysis it is highly advisable that you investigate some of the IDE’s (Integrated Development Environments) that are available to assist in coding and debugging R script and creating R packages for personal use or distribution. I think one of the easiest to use is R Studio. R Studio is available in both free open source and commercial versions and can be downloaded at http://www.rstudio.com    There are versions available for Windows, various Linux distributions, and Mac OS.

The R studio console provides a number of useful tools that facilitate coding. The screen is divided into four sections with one section providing a code editor that features syntax highlighting, code completion and many other features such as line or block code execution. Another window contains R and all displays output, error messages and warnings when code is executed from the editor. A third window displays all of the current environmental variables that are active and can also show all currently loaded R packages. A fourth window can show graphic output from executed code, can be used to manage, download and install R packages, and can be used to access the CRAN database of online help. There are other useful tools that are too numerous to discuss here.

 

Another program that is worth looking at is RKWard which combines an IDE with a graphics GUI for R statistical analysis. Information and downloads for RKWard can be found at https://rkward.kde.org. This program is also free and open source and can be run on a Windows platform, Mac OS, or various distributions of Linux. The program has been optimized for Linux. A discussion of these IDE’s is beyond the scope of this posting.

 

Shown below is the commented R script for all three parts of the Kruskal-Wallis tutorial. For ease of reading code portions are shown in bold print.

 

#packages that must be present in the global environment before running these scripts

 

#stats; graphics; grDevices; utils; datasets; methods; base

 

#

 

#code to enter data using the data editor

 

#KW data entry, define file kruskal as a data frame

 

kruskal <-data.frame()

 

#invoke the data editor

 

kruskal <-edit(kruskal)

 

#define group as containing 3 factors; tell R which data column goes with which factor

 

group <- factor(1,2,3)

 

#alternative data entry method

 

#Define factor Group as containing three categories

 

Group <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3)

 

#create a vector defined as authscore and enter values

 

authscore <- c(96,128,83,61,101,82,121,132,135,109,115,149,166,147)

 

#create data frame kruskal matching each group factor to individual scores

 

kruskal <- data.frame(Group, authscore)

 

#use the following line to look at the structure of the data frame created

 

str(kruskal)

 

#run the basic Kruskal-Wallis test

 

#

 

kruskal.test(authscore ~ Group, data=kruskal)

 

#

 

#the following code is used to conduct a post-hoc comparison of the ranked means

 

#it is useful to first do a simple boxplot for a visual comparison

 

#Use this script to save the boxplot graphic to a .png data file

 

#save output in pdf file authplot

 

#send output to screen and file

 

sink()

 

authplot <- boxplot(authscore ~ Group, data=kruskal, main=”Group Comparison”, ylab=”authscore”)

 

#save .png file

 

png(“authplot.png”)

 

#now return all output to console

 

dev.off()

 

#

 

#make sure that the package PMCMR is loaded before running the following script

 

library(PMCMR)

 

#use the ‘with’ function to pass the data from the kruskal data frame to the post hoc

 

#test script; specify the Tukey HSD method for determining significance of each

 

# pair of comparisions

 

#

 

with(kruskal, {

 

 

posthoc.kruskal.nemenyi.test(authscore, Group, “Tukey”)

 

})

 

#

 

#NOTE: if using a version of R < 3.xx then use the package pgirmess instead of PMCMR

 

#

 

#the following lines show the post hoc analysis using the pgirmess package

 

#note the function kuskalmc is used for the comparisons

 

library(pgirmess)

 

authscore <- c(96,128,83,61,101,82,121,132,135,109,115,149,166,147)

 

Group <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3)

 

kruskalmc(authscore ~ Group, probs=.05, cont=NULL)

 

#

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.