
Using R for Nonparametric Statistical Analysis: Nonparametric Correlation



A Tutorial by D.M. Wiig

In previous tutorials I discussed how to download and install R on a Debian Linux operating system and how to use R to perform Kendall’s Concordance analysis. This tutorial explores some basic R commands to open a built-in dataset, produce a simple scatter plot of the data, and perform a nonparametric correlation using Kendall’s and Spearman’s rank order correlations. Before beginning this tutorial, open a terminal window and start R.

 

One of the packages that is installed with the R distribution is called “datasets.” One of the datasets in that package, USJudgeRatings, is a data frame containing lawyers’ ratings of 43 state judges on 12 numeric variables. Since the scale used in these ratings is ordinal, it is appropriate to use rank order correlation to analyze the data. To examine the data in USJudgeRatings use the command sequence:

 

> data(USJudgeRatings, package="datasets")

> print(USJudgeRatings)

                CONT INTG DMNR DILG CFMG DECI PREP FAMI ORAL WRIT PHYS RTEN
AARONSON,L.H.    5.7  7.9  7.7  7.3  7.1  7.4  7.1  7.1  7.1  7.0  8.3  7.8
ALEXANDER,J.M.   6.8  8.9  8.8  8.5  7.8  8.1  8.0  8.0  7.8  7.9  8.5  8.7
ARMENTANO,A.J.   7.2  8.1  7.8  7.8  7.5  7.6  7.5  7.5  7.3  7.4  7.9  7.8
BERDON,R.I.      6.8  8.8  8.5  8.8  8.3  8.5  8.7  8.7  8.4  8.5  8.8  8.7
BRACKEN,J.J.     7.3  6.4  4.3  6.5  6.0  6.2  5.7  5.7  5.1  5.3  5.5  4.8
BURNS,E.B.       6.2  8.8  8.7  8.5  7.9  8.0  8.1  8.0  8.0  8.0  8.6  8.6
CALLAHAN,R.J.   10.6  9.0  8.9  8.7  8.5  8.5  8.5  8.5  8.6  8.4  9.1  9.0

……………

 

You will see all 43 cases in the output. To save space I have shown only a portion of it here. Please note that object names in R are case sensitive, so be sure to use capital letters where shown.
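If printing all 43 rows is more than you need, a quick summary may be enough. The following is a minimal sketch using standard base R functions; the variable name judge_dims is just an illustrative choice.

```r
# Load the data frame from the built-in datasets package
data(USJudgeRatings, package = "datasets")

# Number of rows (judges) and columns (rating variables)
judge_dims <- dim(USJudgeRatings)
print(judge_dims)

# Show only the first six rows instead of all 43
print(head(USJudgeRatings))

# Show the column names and storage type of each variable
str(USJudgeRatings)
```

The head() function also accepts a second argument, for example head(USJudgeRatings, 10), if you want a different number of rows.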

The basic R distribution has fairly extensive graphing capabilities. To produce a simple scatter diagram of the variables PHYS and RTEN that graphs RTEN on the X axis and PHYS on the Y axis, use the following line of code. Note that log="xy" draws both axes on a logarithmic scale; leave it out for ordinary linear axes.

 

	> plot(PHYS~RTEN, log="xy", data=USJudgeRatings)

 

You should see a scatter plot of PHYS (Y axis) against RTEN (X axis).

[Figure: scatter plot of PHYS versus RTEN. The image is not available in this version.]
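As a variation on the plot command above, here is a sketch of the same scatter plot on ordinary linear axes with explicit labels. The title and axis label text are my own choices for illustration, not part of the dataset.

```r
# Load the data frame from the built-in datasets package
data(USJudgeRatings, package = "datasets")

# Formula interface: the left side (PHYS) goes on the Y axis,
# the right side (RTEN) goes on the X axis.
# Omitting the log argument gives linear axes.
plot(PHYS ~ RTEN, data = USJudgeRatings,
     xlab = "RTEN", ylab = "PHYS",
     main = "PHYS versus RTEN")
```

When R is run non-interactively, the plot is typically written to a file rather than shown in a window.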

 

 

 

 

 

 

We can perform a correlation analysis on the data using either Kendall’s rank order correlation or Spearman’s rho. For a Kendall correlation, make sure the USJudgeRatings dataset is loaded into memory by using the command:

 

> data(USJudgeRatings, package="datasets")

 

Now perform the analysis with the command:

> cor(USJudgeRatings[,c("PHYS","RTEN")], use="complete.obs", method="kendall")

 

          PHYS      RTEN
PHYS 1.0000000 0.7659126
RTEN 0.7659126 1.0000000

 

As seen above, we specify the two variables we want to correlate. The use="complete.obs" argument drops any case that has a missing value; this dataset has none, so all 43 observations are used. Running a Spearman’s correlation on the same variables is a matter of changing the method= argument:

 

> cor(USJudgeRatings[,c("PHYS","RTEN")], use="complete.obs", method="spearman")

 

          PHYS      RTEN
PHYS 1.0000000 0.9031373
RTEN 0.9031373 1.0000000
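To see what the Spearman method is doing, it can help to compute it by hand. Spearman’s rho is the ordinary Pearson correlation applied to the ranks of the two variables. The sketch below checks this for PHYS and RTEN; the names rho_direct and rho_from_ranks are illustrative only.

```r
# Load the data frame from the built-in datasets package
data(USJudgeRatings, package = "datasets")

# Spearman's rho computed directly by cor()
rho_direct <- cor(USJudgeRatings$PHYS, USJudgeRatings$RTEN,
                  method = "spearman")

# The same quantity as a Pearson correlation of the ranks.
# rank() assigns the average rank to tied values by default.
rho_from_ranks <- cor(rank(USJudgeRatings$PHYS),
                      rank(USJudgeRatings$RTEN),
                      method = "pearson")

# Print both values side by side and confirm they agree
print(c(direct = rho_direct, from_ranks = rho_from_ranks))
print(isTRUE(all.equal(rho_direct, rho_from_ranks)))
```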

 

To produce a Kendall’s correlation matrix for 10 of the 12 variables (all except PREP and FAMI) use:

 

> cor(USJudgeRatings[,c("CONT","INTG","DMNR","DILG","CFMG", "DECI",
+                       "ORAL","WRIT","PHYS","RTEN")], use="complete.obs", method="kendall")
             CONT       INTG       DMNR         DILG       CFMG       DECI
CONT  1.000000000 -0.1203440 -0.1162402 -0.001142206 0.09409104 0.05498285
INTG -0.120344017  1.0000000  0.8607446  0.689935415 0.60919580 0.64371783
DMNR -0.116240241  0.8607446  1.0000000  0.662117755 0.60801429 0.63320857
DILG -0.001142206  0.6899354  0.6621178  1.000000000 0.86484298 0.89194190
CFMG  0.094091035  0.6091958  0.6080143  0.864842984 1.00000000 0.91212083
DECI  0.054982854  0.6437178  0.6332086  0.891941895 0.91212083 1.00000000
ORAL -0.027381743  0.7451506  0.7272732  0.859909442 0.82495629 0.83952698
WRIT -0.028474100  0.7187820  0.6942712  0.877775007 0.83497447 0.85064096
PHYS -0.066667371  0.6309756  0.6296740  0.752740177 0.72853135 0.77215650
RTEN -0.021652594  0.8013829  0.7979569  0.822527726 0.76344652 0.80206419
            ORAL       WRIT        PHYS        RTEN
CONT -0.02738174 -0.0284741 -0.06666737 -0.02165259
INTG  0.74515064  0.7187820  0.63097556  0.80138292
DMNR  0.72727320  0.6942712  0.62967404  0.79795687
DILG  0.85990944  0.8777750  0.75274018  0.82252773
CFMG  0.82495629  0.8349745  0.72853135  0.76344652
DECI  0.83952698  0.8506410  0.77215650  0.80206419
ORAL  1.00000000  0.9596834  0.79429138  0.90227331
WRIT  0.95968339  1.0000000  0.77463199  0.85309146
PHYS  0.79429138  0.7746320  1.00000000  0.76591261
RTEN  0.90227331  0.8530915  0.76591261  1.00000000

>
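Typing every column name is not required. As a sketch, cor() can be given the whole data frame when every column is numeric, and columns can be dropped by name with setdiff(). The object names tau_all, keep and tau_ten below are illustrative assumptions.

```r
# Load the data frame from the built-in datasets package
data(USJudgeRatings, package = "datasets")

# Kendall correlation matrix for all 12 variables at once
tau_all <- cor(USJudgeRatings, use = "complete.obs", method = "kendall")

# Rounding to two decimals makes the matrix easier to read
print(round(tau_all, 2))

# The same 10 variables as in the command above,
# selected by excluding PREP and FAMI by name
keep <- setdiff(names(USJudgeRatings), c("PREP", "FAMI"))
tau_ten <- cor(USJudgeRatings[, keep], use = "complete.obs",
               method = "kendall")
print(dim(tau_ten))
```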

 

If the data you are using are measured at the interval or ratio level, just change the method= argument to "pearson" (lowercase, since R does not accept "Pearson") to produce a product-moment correlation.
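For a side-by-side comparison, the three methods can be run in one pass. This is only a sketch for illustration; the variable names methods and coefs are my own, and the Pearson value is shown for comparison rather than as a recommendation for ordinal data.

```r
# Load the data frame from the built-in datasets package
data(USJudgeRatings, package = "datasets")

# The three method names accepted by cor(), all lowercase
methods <- c("pearson", "spearman", "kendall")

# Compute the PHYS/RTEN correlation once per method;
# sapply() names the results after the method strings
coefs <- sapply(methods, function(m) {
  cor(USJudgeRatings$PHYS, USJudgeRatings$RTEN, method = m)
})

print(round(coefs, 4))
```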

 

 

More to Come: