Using R for Nonparametric Statistical Analysis: Nonparametric Correlation
A Tutorial by D.M. Wiig
In previous tutorials I discussed how to download and install R on a Debian Linux operating system and how to use R to perform Kendall’s Concordance analysis. This tutorial explores some basic R commands to open a built-in dataset, produce a simple scatter plot of the data and perform a nonparametric correlation using Kendall’s and Spearman’s rank order correlations. Before beginning this tutorial open a terminal window and start R.
One of the packages that is installed with the R distribution is called “datasets.” One of the data sets in this package, USJudgeRatings, is a data frame containing lawyers’ ratings of 43 state judges on 12 numeric variables. Since the scale used in these ratings is ordinal it is appropriate to use rank order correlation to analyze the data. To examine the data in USJudgeRatings use the command sequence:
> data(USJudgeRatings, package="datasets")
> print(USJudgeRatings)
               CONT INTG DMNR DILG CFMG DECI PREP FAMI ORAL WRIT PHYS RTEN
AARONSON,L.H.   5.7  7.9  7.7  7.3  7.1  7.4  7.1  7.1  7.1  7.0  8.3  7.8
ALEXANDER,J.M.  6.8  8.9  8.8  8.5  7.8  8.1  8.0  8.0  7.8  7.9  8.5  8.7
ARMENTANO,A.J.  7.2  8.1  7.8  7.8  7.5  7.6  7.5  7.5  7.3  7.4  7.9  7.8
BERDON,R.I.     6.8  8.8  8.5  8.8  8.3  8.5  8.7  8.7  8.4  8.5  8.8  8.7
BRACKEN,J.J.    7.3  6.4  4.3  6.5  6.0  6.2  5.7  5.7  5.1  5.3  5.5  4.8
BURNS,E.B.      6.2  8.8  8.7  8.5  7.9  8.0  8.1  8.0  8.0  8.0  8.6  8.6
CALLAHAN,R.J.  10.6  9.0  8.9  8.7  8.5  8.5  8.5  8.5  8.6  8.4  9.1  9.0
……………
You will see all 43 cases in the output. To save space here I have shown only a portion of the output. Please note that object names in R are case sensitive, so be sure to use capital letters where shown.
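Printing every case is not always necessary. As a minimal sketch using only base R functions, the commands below give a quicker overview of the data frame: its dimensions, its first few rows, the type of each column, and a count of missing values. The names dims and n_missing are illustrative only and are not part of the original tutorial.

```r
# Load the built-in data frame
data(USJudgeRatings, package = "datasets")

# Number of rows (judges) and columns (rating variables)
dims <- dim(USJudgeRatings)
print(dims)

# First five cases only, instead of the full listing
print(head(USJudgeRatings, 5))

# Column names and storage types
str(USJudgeRatings)

# Count of missing values across the whole data frame
n_missing <- sum(is.na(USJudgeRatings))
print(n_missing)
```

Checking for missing values at this stage is useful because the correlation commands later in this tutorial have to decide how to treat incomplete cases.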
The basic R distribution has fairly extensive graphing capabilities. To produce a simple scatter diagram of the variables PHYS and RTEN that graphs RTEN on the X axis and PHYS on the Y axis use the following line of code. The log="xy" argument draws both axes on a logarithmic scale; omit it if you want ordinary linear axes:
> plot(PHYS~RTEN, log="xy", data=USJudgeRatings)
You should see a scatter plot with RTEN on the X axis and PHYS on the Y axis.
[Figure: scatter plot of PHYS against RTEN (image not available)]
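If you want to keep a copy of the plot, one option is to write it to a file rather than to the screen. The following is a minimal sketch, assuming a PDF file in R's temporary directory is acceptable; the file name phys_vs_rten.pdf and the variable out_file are hypothetical, and the axis labels follow the variable descriptions in the R help page for USJudgeRatings. Linear axes are used here instead of log="xy".

```r
# Load the built-in data frame
data(USJudgeRatings, package = "datasets")

# Hypothetical output location; change to any writable path
out_file <- file.path(tempdir(), "phys_vs_rten.pdf")

# Open a PDF graphics device, draw the plot, then close the device
pdf(out_file, width = 6, height = 6)
plot(PHYS ~ RTEN, data = USJudgeRatings,
     xlab = "RTEN (worthy of retention)",
     ylab = "PHYS (physical ability)",
     main = "Lawyers' ratings of state judges")
dev.off()

# Show where the file was written
print(out_file)
```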
We can perform a correlation analysis on the data using either Kendall’s rank order correlation or Spearman’s Rho. For a Kendall correlation make sure the data set USJudgeRatings is loaded into memory by using the command:
> data(USJudgeRatings, package="datasets")
Now perform the analysis with the command:
> cor(USJudgeRatings[,c("PHYS","RTEN")], use="complete.obs", method="kendall")
          PHYS      RTEN
PHYS 1.0000000 0.7659126
RTEN 0.7659126 1.0000000
As seen above we specify the two variables we want to correlate. The use="complete.obs" argument tells cor() to drop any case that has a missing value on these variables, so only complete observations enter the calculation. Running a Spearman’s rho on the same variables is a matter of changing the method= designator:
> cor(USJudgeRatings[,c("PHYS","RTEN")], use="complete.obs", method="spearman")
          PHYS      RTEN
PHYS 1.0000000 0.9031373
RTEN 0.9031373 1.0000000
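The cor() function reports only the coefficient. If a significance test is also wanted, base R provides cor.test() in the stats package. The sketch below is one way to use it for the same pair of variables; exact=FALSE is my own choice, made on the assumption that the ratings contain tied values, in which case an exact p-value cannot be computed and a large-sample approximation is used instead. The object names are illustrative.

```r
# Load the built-in data frame and pull out the two variables
data(USJudgeRatings, package = "datasets")
phys <- USJudgeRatings$PHYS
rten <- USJudgeRatings$RTEN

# Kendall's tau with an approximate (not exact) p-value
kendall_test <- cor.test(phys, rten, method = "kendall", exact = FALSE)
print(kendall_test)

# Spearman's rho with an approximate (not exact) p-value
spearman_test <- cor.test(phys, rten, method = "spearman", exact = FALSE)
print(spearman_test)

# The coefficient and p-value can also be extracted individually
print(kendall_test$estimate)
print(kendall_test$p.value)
```

The estimates returned by cor.test() should agree with the corresponding cor() results for the same method.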
To produce a Kendall’s correlation matrix for 10 of the 12 variables (PREP and FAMI are left out of the list below) use:
> cor(USJudgeRatings[,c("CONT","INTG","DMNR","DILG","CFMG","DECI",
+ "ORAL","WRIT","PHYS","RTEN")], use="complete.obs", method="kendall")
             CONT       INTG       DMNR         DILG       CFMG       DECI
CONT  1.000000000 -0.1203440 -0.1162402 -0.001142206 0.09409104 0.05498285
INTG -0.120344017  1.0000000  0.8607446  0.689935415 0.60919580 0.64371783
DMNR -0.116240241  0.8607446  1.0000000  0.662117755 0.60801429 0.63320857
DILG -0.001142206  0.6899354  0.6621178  1.000000000 0.86484298 0.89194190
CFMG  0.094091035  0.6091958  0.6080143  0.864842984 1.00000000 0.91212083
DECI  0.054982854  0.6437178  0.6332086  0.891941895 0.91212083 1.00000000
ORAL -0.027381743  0.7451506  0.7272732  0.859909442 0.82495629 0.83952698
WRIT -0.028474100  0.7187820  0.6942712  0.877775007 0.83497447 0.85064096
PHYS -0.066667371  0.6309756  0.6296740  0.752740177 0.72853135 0.77215650
RTEN -0.021652594  0.8013829  0.7979569  0.822527726 0.76344652 0.80206419
            ORAL       WRIT        PHYS        RTEN
CONT -0.02738174 -0.0284741 -0.06666737 -0.02165259
INTG  0.74515064  0.7187820  0.63097556  0.80138292
DMNR  0.72727320  0.6942712  0.62967404  0.79795687
DILG  0.85990944  0.8777750  0.75274018  0.82252773
CFMG  0.82495629  0.8349745  0.72853135  0.76344652
DECI  0.83952698  0.8506410  0.77215650  0.80206419
ORAL  1.00000000  0.9596834  0.79429138  0.90227331
WRIT  0.95968339  1.0000000  0.77463199  0.85309146
PHYS  0.79429138  0.7746320  1.00000000  0.76591261
RTEN  0.90227331  0.8530915  0.76591261  1.00000000
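When every column of a data frame is numeric, the column list can be dropped and the whole data frame passed to cor(). The sketch below takes that approach to cover all 12 variables and rounds the result to two decimal places so the matrix is easier to read; the rounding and the name tau_all are my own choices for illustration.

```r
# Load the built-in data frame
data(USJudgeRatings, package = "datasets")

# Kendall correlation matrix for every column of the data frame
tau_all <- cor(USJudgeRatings, use = "complete.obs", method = "kendall")

# Round for display only; tau_all keeps full precision
print(round(tau_all, 2))
```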
If the data you are using is measured at the interval or ratio level just change the method= designator to "pearson" to produce a product-moment correlation. The method name must be written in lower case; R does not accept "Pearson".
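To see how the product-moment and rank order coefficients relate, the following sketch computes Pearson's r and Spearman's rho for PHYS and RTEN, and then computes Pearson's r on the ranks of the two variables. Spearman's rho is the product-moment correlation of the ranks (with tied values given their average rank, which is the default in rank()), so the last two values should match. The object names are illustrative.

```r
# Load the built-in data frame and pull out the two variables
data(USJudgeRatings, package = "datasets")
phys <- USJudgeRatings$PHYS
rten <- USJudgeRatings$RTEN

# Product-moment correlation on the raw ratings
r_pearson <- cor(phys, rten, method = "pearson")

# Rank order correlation computed directly
rho_spearman <- cor(phys, rten, method = "spearman")

# Product-moment correlation applied to the ranks of each variable
rho_by_ranks <- cor(rank(phys), rank(rten), method = "pearson")

print(c(pearson = r_pearson, spearman = rho_spearman, by_ranks = rho_by_ranks))
```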
More to Come: