R Tutorial: A Script to Create and Analyze a Simple Data File, Part One

R Tutorial: A Simple Script to Create and Analyze a Data File, Part One

By D.M. Wiig

In this tutorial I will walk you through a simple script that shows how to create a data file and perform some simple statistical procedures on it. I will break the code into segments and discuss what each segment does. Before starting this tutorial, open a terminal window and start R from the command line.

The first task is to create a simple data file. Let’s assume that we have some data from 10 individuals measuring each person’s height and weight. The data is shown below:

Height (inches)   Weight (lbs)
72                225
60                128
65                176
75                215
66                145
65                120
70                210
71                176
68                155
77                250

We can enter the data into a data matrix by invoking the data editor and entering the values. Please note that the lines of code preceded by a # are comments and are ignored by R:

#Create a new file and invoke the data editor to enter data

#Create the file Sampledatafile, height and weight of 10 subjects

Sampledatafile <- data.frame()

Sampledatafile <- edit(Sampledatafile)

You will see a window open that is the R Data Editor. Click on the column heading ‘var1’ and you will see several different data types in the drop down menu. Choose the ‘real’ data type. Follow the same procedure to set the data type for the second column. Enter the data pairs in the columns, with height in the first column and weight in the second column. When the data have been entered click on the var1 heading for column 1 and click ‘Change Name.’ Enter ‘Height’ to label the first column. Follow the same steps to rename the second column ‘Weight.’

Once both columns of data have been entered you can click ‘Quit.’ The datafile ‘Sampledatafile’ is now loaded into memory.
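If you would rather not type the values into the Data Editor, the same data frame can be built directly in a script. This is only a sketch of an alternative, not the method used in this tutorial; it enters the ten height and weight pairs from the table above with c() and data.frame().

```r
# Alternative to the Data Editor: build the data frame in code.
# The values are the ten height/weight pairs listed earlier.
Sampledatafile <- data.frame(
  Height = c(72, 60, 65, 75, 66, 65, 70, 71, 68, 77),
  Weight = c(225, 128, 176, 215, 145, 120, 210, 176, 155, 250)
)

# Check the structure: 10 observations of 2 numeric variables
str(Sampledatafile)
```

Because the columns are named when the data frame is created, no renaming step is needed afterwards.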

To run some basic descriptive statistics use the following code:

> #Run descriptives on the data

> summary(Sampledatafile)

The output from this code will be:

     Height          Weight
 Min.   :60.00   Min.   :120.0
 1st Qu.:65.25   1st Qu.:147.5
 Median :69.00   Median :176.0
 Mean   :68.90   Mean   :180.0
 3rd Qu.:71.75   3rd Qu.:213.8
 Max.   :77.00   Max.   :250.0
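
The quartiles reported by summary() come from R's default quantile rule, which interpolates between neighboring values. As a quick check, and assuming the ten heights are re-entered by hand as below (the object names are my own), the same figures can be reproduced with quantile() and mean():

```r
# Heights of the 10 subjects, re-entered so this check runs on its own
Height <- c(72, 60, 65, 75, 66, 65, 70, 71, 68, 77)

# Quartiles using R's default interpolation rule
height_quartiles <- quantile(Height, probs = c(0.25, 0.50, 0.75))
height_mean <- mean(Height)

print(height_quartiles)
print(height_mean)
```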


To view the data file use the following lines of code:

>#print the datafile ‘Sampledatafile’ on the screen

> print(Sampledatafile)

You will see the output:

   Height Weight
1      72    225
2      60    128
3      65    176
4      75    215
5      66    145
6      65    120
7      70    210
8      71    176
9      68    155
10     77    250

In Part Two I will discuss an R script to do a simple correlation and scatter diagram.  Check back later!

Nonparametric Statistical Analysis Using R: The Sign Test

Using R in Nonparametric Statistical Analysis:  The Binomial Sign Test

A tutorial by D.M. Wiig

One of the core competencies that students master in introductory social science statistics is to create a null and alternative hypothesis pair relative to a research question and to use a statistical test to evaluate and make a decision about rejecting or retaining the null hypothesis.  I have found that one of the easiest statistical tests to use when teaching these concepts is the sign test.  This is a very easy test to use and students seem to intuitively grasp the concepts of trials and binomial outcomes as these are easily related to the common and familiar event of ‘flipping a coin.’


While it is possible to use the sign test by looking up probabilities of outcomes in a table of the binomial distribution, I have found that using R to perform the analysis is a good way to get students involved in using statistics software to solve the problem.  R has an easy to use sign test routine that is called with the binom.test command.  To illustrate the use of the test consider an experiment where the researcher has randomly assigned 10 individuals to a group and observes them in both a control and experimental condition.  The researcher measures the criterion variable of interest in each condition for each subject and measures the effect on each subject’s behavior using a relative scale of effect.


The researcher at this point is only interested in whether or not the criterion variable has an effect on behavior, so a non-directional hypothesis is used.  The data collected is shown in the following table:


Subject   1    2    3    4    5    6    7    8    9    10
Pre       50   49   37   16   80   42   40   58   31   21
Post      56   50   30   25   90   44   60   71   32   22
Sign      +    +    -    +    +    +    +    +    +    -

The general format for the sign test is as follows:


binom.test(x, n, p = 0.5, alternative = c("two.sided", "less", "greater"), conf.level = 0.95)


where: x = number of successes

n = number of trials

p = the hypothesized probability of success

alternative = indicates the alternative hypothesis as nondirectional ("two.sided") or directional ("less" or "greater")

conf.level = the confidence level for the returned confidence interval.


In the example as described above we have 8 pluses and 2 minuses.  We will use the “two.sided” option for the alternative hypothesis, a probability of success of .50, and a conf.level of .95. The following is entered into R:

> binom.test(8, 10, p=.5, alternative="two.sided", conf.level=.95)

Exact binomial test

data:  8 and 10
number of successes = 8, number of trials = 10, p-value = 0.1094
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.4439045 0.9747893
sample estimates:
probability of success
                   0.8

Under a nondirectional alternative hypothesis we are testing the probability of obtaining 0, 1, 2, 8, 9, 10 pluses or:


p(0, 1, 2, 8, 9, 10 pluses)  = .1094


If we had set an alpha of α = .05 then we would retain the null hypothesis, since p(obt) = .1094 > .05.  We could not conclude that the experimental criterion has an effect on behavior.  R has many other nonparametric statistical tests that are easy to use from the command line.  These are topics for future tutorials.
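
To see where the p-value comes from, the two-tailed probability can be summed by hand from the binomial distribution and compared with what binom.test returns. The following is a small sketch using the base R function dbinom; the object names p_tails and result are my own.

```r
# Probability of 0, 1, 2, 8, 9 or 10 pluses in 10 trials when p = .5
p_tails <- sum(dbinom(c(0, 1, 2, 8, 9, 10), size = 10, prob = 0.5))

# The same test run through binom.test
result <- binom.test(8, 10, p = 0.5, alternative = "two.sided", conf.level = 0.95)

print(p_tails)          # sum of the six tail probabilities
print(result$p.value)   # p-value reported by binom.test
```

The two values agree because, with p = .5, the binomial distribution is symmetric and the outcomes at least as extreme as 8 pluses are exactly the six listed.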


More to Follow:

Book Review: Mastering Beaglebone Robotics

Richard Grimmett. Mastering Beaglebone Robotics. Birmingham, UK: Packt Publishing Ltd., 2014. ISBN #978-1-78398-890-7 http://bit.ly/MBbR8907

Book Review by Douglas M. Wiig

With the release of the Raspberry Pi single board computer a new generation of single board multi-platform and multi-use computers has rapidly developed. One of the newer boards to be developed is the Beaglebone Black which is a low cost, multi-functional package that has a number of core functionalities that facilitate building robotic projects. Grimmett’s Mastering Beaglebone Robotics is a very informative and readable guide to the development and implementation of several such projects. The finished projects are sophisticated, functional and educational. They also lend themselves to expansion into even more complex applications if the reader is so inclined.

This book is not intended for beginners with single board computing platforms or robotics but the author does go through the basics of setting up the Beaglebone and installing the necessary software to accommodate the projects in the book. If you are not yet comfortable with installing and configuring hardware and software or working with the Linux command line you should have a basic reference handy as you work through the initial hardware and software setup in chapter one of the book. The author does provide numerous photos and screen shots to help with the process. The author also uses very clear indications of how and what command line actions are used in installing and configuring various programs needed to set up the Beaglebone for the projects in the book.

Once the basic hardware and software are installed and running the author begins a discussion of robotics by taking the reader through a step by step process to create a movable project based on two tank tracks. The chapter covers the basics of using a motor and controller to power the project, the development and use of programs to control the vehicle and the use of voice commands to control the vehicle.

The author provides a detailed description along with numerous photos showing the build as it progresses. In the sections of the chapter where Beaglebone programming is covered the author uses very clear descriptions of the code that make the process easy to follow. Another nice feature of this book as well as other technical books in the Packt library is the availability for download of all of the code used in the book. This is a very handy feature and helps to prevent the frustration of coding errors that are inherent in entering the code from scratch on a keyboard. It also facilitates the debugging phase of the projects.

Once the basic mobile project platform is functional the author devotes two additional chapters to adding sensors of various kinds, such as distance sensors for object detection, and to adding vision and vision processing capabilities. Once again, the author uses numerous detailed photos, screen shots and programming detail in discussing these phases of the project. By the time the reader finishes chapter four of the book a fully functional, programmable movable platform has been developed.

Subsequent chapters of the book are devoted to additional projects that incorporate the basic principles of robotics learned in the initial project. The author discusses building robots that can walk, sail, and use GPS for navigation. There is also a discussion of a project robot that can be submerged and controlled remotely while under water.

The final two chapters of the book detail a quadcopter that is remotely controlled and an autonomous quadcopter that features programmed flight controlled by GPS. I found these chapters particularly interesting as one of my hobbies is flying radio controlled aircraft of various types. These two projects are rather advanced in nature and are more for readers interested in contributing to the development of such projects. In both projects the Beaglebone is used for higher level function such as GPS navigation, path planning and communications. Most of the low level functioning such as controlling the servo motors and other mechanical functions is accomplished by programming and incorporating a separate flight controller board.

As I mentioned earlier, one of the handy features of this book as well as others offered by Packt Publishing is the availability of the computer code used in each chapter of the book. The code used in Mastering Beaglebone Robotics is written in Python and there are files for each of the chapters (with the exception of chapters one and six). This is a useful feature not only for debugging purposes but for those readers who wish to develop other projects or add to the projects detailed in the book.

I found Mastering Beaglebone Robotics to be a good read and a readily usable guide to some of the more complex robotics concepts and construction practices. As indicated earlier this would not be a first book for one starting in either robotics or single-board computing platforms. For the reader with some experience in programming and construction practices the book is an interesting and informative source of information about a rapidly growing field in computer science technology and robotics.


Using R for Basic Cross Tabulation Analysis: Part Three, Using the xtabs Function

Using R to Work with GSS Survey Data Part Three: Using xtabs to Create and Analyze Tables

A tutorial by D. M. Wiig
In Part Two of this series of tutorials I discussed how to find and import a data set from the NORC GSS survey. The focus of that tutorial was on the GSS2010 data set that was imported into the R workspace in SPSS format and then loaded into an R data frame for analysis.

Use the following code to load the data set into an R workspace:

>install.packages("Hmisc") #need for file import
>install.packages("foreign") #need for file import
>library(Hmisc) #load Hmisc, which provides spss.get
>library(foreign) #load foreign, used to read SPSS files
>#get spss gss file and put into data frame
>gssdataframe <- spss.get("/path-to-your-file/GSS2010.sav", use.value.labels=TRUE)

The xtabs function provides a quick way to generate and view a cross tabulation of two variables and allows the user to specify one or more control variables in the cross tabulation. Using the variables “partyid” and “polviews” the cross tabulation is generated with:

>#use xtabs to produce a table
>gsstab <- xtabs(~ partyid + polviews, data=gssdataframe)

To view the resulting table use:

>gsstab #show table

To view summary statistics generated use:

>summary(gsstab) #show summary statistics

This summary shows the number of cases in the table, the number of factors and the Chi-square value for the table.

Variables used in social science research are often interrelated so it is desirable to control for one or more variables in order to further examine the variables of interest. The table stored in gsstab shows the relationship between political ideology and political party affiliation. To look at the relationship by gender use the following:

>#use xtabs to produce a table with a control variable
>gsstab2 <- xtabs(~ partyid + polviews + sex, data=gssdataframe)

To view the new table use:

>gsstab2 #show table

To view summary statistics for the table enter:

>summary(gsstab2) #show summary statistics

As noted above, xtabs is a quick and powerful function to create N x N tables with or without control variables. In the next tutorial I explore the use of the ca function to produce a basic correspondence analysis of underlying dimensions in an N x N table.
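
The xtabs calls in this tutorial can be tried without downloading the GSS file. The sketch below uses a small made-up data frame (the object toy and its eight rows are hypothetical and stand in for the GSS variables) to show a two-way table, a three-way table with a control variable, and the summary output.

```r
# Hypothetical toy data standing in for the GSS variables
toy <- data.frame(
  partyid  = c("DEM", "DEM", "REP", "REP", "IND", "DEM", "REP", "IND"),
  polviews = c("LIB", "LIB", "CON", "CON", "MOD", "MOD", "CON", "LIB"),
  sex      = c("F", "M", "F", "M", "F", "M", "F", "M")
)

# Two-way cross tabulation
toytab <- xtabs(~ partyid + polviews, data = toy)

# Same table with sex as a control variable
toytab3 <- xtabs(~ partyid + polviews + sex, data = toy)

print(toytab)
print(ftable(toytab3))   # ftable flattens the three-way table for display
print(summary(toytab))   # number of cases, number of factors, Chi-square
```

With only eight cases the Chi-square value is not meaningful; the point is only to show the shape of the output.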

Using R to Work with GSS Survey Data: Cross Tabulation Tables

Using R to Work with GSS Survey Data: Viewing Datasets and Performing Cross Tabulations

A tutorial by D. M. Wiig

In a previous tutorial I discussed how to import datasets from the NORC General Social Survey (GSS), using R to read the SPSS formatted data into an R data frame. Once the data has been imported into the R working environment it can be viewed and analyzed. There is a wealth of survey research data available at the NORC web site located at www.norc.org. In this tutorial the dataset gss2010.sav will be used. The dataset is available from www3.norc.org/GSS+Website.

From that page click on the “Quick Downloads” link on the right hand side of the page to access the list of available datasets. From the next page choose SPSS to access ‘.sav’ format files and finally “2010” under the heading “GSS 1972-2012 Release 6.” Please note that this is a rather large data file with 2044 observations of 794 variables. Download the file to a directory that you can access from your R console.

As discussed in a previous tutorial the SPSS format file can be loaded into an R data frame. Make sure that the R packages Hmisc and foreign have been installed and loaded before attempting to import the SPSS file. The following code will load the ‘.sav’ file:

>install.packages("Hmisc") #need for file import

>install.packages("foreign") #need for file import

>library(Hmisc) #load Hmisc, which provides spss.get

>library(foreign) #load foreign, used to read SPSS files

>#get spss gss file and put into data frame

>gssdataframe <- spss.get("/path-to-your-file/GSS2010.sav", use.value.labels=TRUE)

Once the file is read into an R data frame it can be viewed in a spreadsheet like interface by using the command:


Using the arrow keys, the home key, end key, and the page up and page down keys allows navigating and browsing the file.

Survey data such as that found in the GSS file is usually a mixture of data types ranging from ratio level numbers to categorical data. Cross tabulations are often used to explore relationships among variables that are ordinal or categorical in nature. R has a number of functions available for cross tabulations. The table function is a quick way to generate a cross tabulation table with a number of options available. The following results in a frequency table of the variables “partyid” and “polviews”, both of which are measured in categories:

>#use the gssdataframe

>#the variables partyid and polviews are used


>#create a table named ‘gsstable’

>gsstable <- with(gssdataframe, table(partyid, polviews))

>gsstable #print table frequencies

The following output results:

                    polviews
partyid              EXTREMELY LIBERAL LIBERAL SLIGHTLY LIBERAL MODERATE
  STRONG DEMOCRAT                   41     105               42       94
  NOT STR DEMOCRAT                  14      62               57      154
  IND,NEAR DEM                      11      47               57      103
  INDEPENDENT                        5      20               33      189
  IND,NEAR REP                       1       4               16       74
  NOT STR REPUBLICAN                 2      10               16       88
  STRONG REPUBLICAN                  0       5                5       22
  OTHER PARTY                        1       5                6       16
                    polviews
partyid              SLGHTLY CONSERVATIVE CONSERVATIVE EXTRMLY CONSERVATIVE
  STRONG DEMOCRAT                      22           25                    6
  NOT STR DEMOCRAT                     28           16                    7
  IND,NEAR DEM                         25           11                    5
  INDEPENDENT                          43           32                    9
  IND,NEAR REP                         49           43                    8
  NOT STR REPUBLICAN                   72           72                   13
  STRONG REPUBLICAN                    23          101                   27
  OTHER PARTY                           3           12                    4


There are options available with the table function that include calculating row and column marginal totals as well as cell percentages. Another quick method to generate tables is with the CrossTable function. The function is contained in the gmodels package and can be used on the table generated with the table function above. Use the following lines of code to generate a cross table between ‘polviews’ and ‘partyid’ using the gsstable created above:


>#produce basic crosstabs



Cell Contents
|                   Count |
| Chi-square contribution |

Total Observations in Table:  1961 

                   | polviews 
           partyid |    EXTREMELY LIBERAL  |              LIBERAL  |     SLIGHTLY LIBERAL  |             MODERATE  | SLGHTLY CONSERVATIVE  |         CONSERVATIVE  | EXTRMLY CONSERVATIVE  |            Row Total | 
   STRONG DEMOCRAT |                  41  |                 105  |                  42  |                  94  |                  22  |                  25  |                   6  |                 335  | 
                   |              62.014  |              84.219  |               0.141  |               8.312  |              11.962  |              15.026  |               4.163  |                      | 
  NOT STR DEMOCRAT |                  14  |                  62  |                  57  |                 154  |                  28  |                  16  |                   7  |                 338  | 
                   |               0.089  |               6.911  |               7.238  |               5.486  |               6.840  |              26.537  |               3.215  |                      | 
      IND,NEAR DEM |                  11  |                  47  |                  57  |                 103  |                  25  |                  11  |                   5  |                 259  | 
                   |               0.121  |               4.902  |              22.674  |               0.284  |               2.857  |              22.144  |               2.830  |                      | 
       INDEPENDENT |                   5  |                  20  |                  33  |                 189  |                  43  |                  32  |                   9  |                 331  | 
                   |               4.634  |              12.733  |               0.969  |              32.889  |               0.067  |               8.107  |               1.409  |                      | 
      IND,NEAR REP |                   1  |                   4  |                  16  |                  74  |                  49  |                  43  |                   8  |                 195  | 
                   |               5.592  |              18.279  |               2.167  |               0.002  |              19.466  |               4.622  |               0.003  |                      | 
NOT STR REPUBLICAN |                   2  |                  10  |                  16  |                  88  |                  72  |                  72  |                  13  |                 273  | 
                   |               6.824  |              18.702  |               8.224  |               2.190  |              33.411  |              18.786  |               0.364  |                      | 
 STRONG REPUBLICAN |                   0  |                   5  |                   5  |                  22  |                  23  |                 101  |                  27  |                 183  | 
                   |               6.999  |              15.115  |              12.805  |              32.065  |               0.121  |             177.476  |              52.256  |                      | 
       OTHER PARTY |                   1  |                   5  |                   6  |                  16  |                   3  |                  12  |                   4  |                  47  | 
                   |               0.354  |               0.227  |               0.035  |               0.170  |               1.768  |               2.735  |               2.344  |                      | 
      Column Total |                  75  |                 258  |                 232  |                 740  |                 265  |                 312  |                  79  |                1961  | 

Statistics for All Table Factors

Pearson's Chi-squared test 
Chi^2 =  801.8746     d.f. =  42     p =  3.738705e-141 

       Minimum expected frequency: 1.797552 
Cells with Expected Frequency < 5: 2 of 56 (3.571429%)

Warning message:
In chisq.test(t, correct = FALSE, ...) :
  Chi-squared approximation may be incorrect


This code produces a table of frequencies along with a basic Chi-squared test. Other options include generating cell percentages and using either SPSS or SAS table format. This is accomplished by changing the appropriate flag from FALSE to TRUE and specifying either SPSS or SAS for the format flag. The table formatting is compressed in this example due to the narrow margin requirements of the web page.
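
Marginal totals, cell percentages and the Chi-squared test can also be produced with base R alone. The following is a minimal sketch, not the code that generated the output above: it re-enters the counts from the partyid by polviews table by hand (the object name gss_counts is my own) and then uses addmargins, prop.table and chisq.test.

```r
# Counts copied by hand from the partyid x polviews table in this tutorial
gss_counts <- as.table(matrix(
  c(41, 105, 42,  94, 22,  25,  6,
    14,  62, 57, 154, 28,  16,  7,
    11,  47, 57, 103, 25,  11,  5,
     5,  20, 33, 189, 43,  32,  9,
     1,   4, 16,  74, 49,  43,  8,
     2,  10, 16,  88, 72,  72, 13,
     0,   5,  5,  22, 23, 101, 27,
     1,   5,  6,  16,  3,  12,  4),
  nrow = 8, byrow = TRUE,
  dimnames = list(
    partyid  = c("STRONG DEMOCRAT", "NOT STR DEMOCRAT", "IND,NEAR DEM",
                 "INDEPENDENT", "IND,NEAR REP", "NOT STR REPUBLICAN",
                 "STRONG REPUBLICAN", "OTHER PARTY"),
    polviews = c("EXTREMELY LIBERAL", "LIBERAL", "SLIGHTLY LIBERAL",
                 "MODERATE", "SLGHTLY CONSERVATIVE", "CONSERVATIVE",
                 "EXTRMLY CONSERVATIVE")
  )
))

# Row and column marginal totals
with_totals <- addmargins(gss_counts)

# Row percentages (each row sums to about 100)
row_pct <- round(100 * prop.table(gss_counts, margin = 1), 1)

# Pearson Chi-squared test; the small-expected-count warning is suppressed
chi <- suppressWarnings(chisq.test(gss_counts, correct = FALSE))

print(with_totals)
print(row_pct)
print(chi)
```

Changing margin = 1 to margin = 2 in prop.table gives column percentages instead of row percentages.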

There are many functions available in R to analyze data in tabular format. In my next tutorial I will examine using the xtabs function to produce basic cross tabulation with control variables.

R Tutorial: Using R to Work With Datasets From the NORC General Social Science Survey


A tutorial by D. M. Wiig

Part One:

When I teach classes in social science statistics and social science research methods I like to use “live” data as much as possible both in classroom lectures and in homework assignments. For the social sciences one excellent and readily available source of live data is the ongoing General Social Survey (GSS) project, The National Data Program for the Sciences. This is a project of NORC, a National Science Research Center at the University of Chicago (see www.norc.org for the project’s main web site).

There are a number of datasets available in different formats. The quick download datasets that I like to use are primarily SPSS data files. Many institutions have SPSS available for students and faculty but the use of SPSS is by no means universal. I have found that it is easy to use R to read the .sav format files into an R data frame and then write the file out to a comma separated value (.csv) format that can be read by almost any statistics software package. As I will discuss in this and future tutorials it is also quite effective to use R to analyze the GSS files.

To create R datasets using the GSS files we can use some of the file import/export features available in R. To begin, make sure that the R packages “Hmisc” and “foreign” are installed and loaded in your R session environment. This can be accomplished using:

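> install.packages("Hmisc") #need for file import

> install.packages("foreign") #need for file import

> library(Hmisc) #load Hmisc, which provides spss.get

> library(foreign) #load foreign, used to read SPSS files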

As an example, the following code will load the GSS data file “gss2010x.sav” into an R data frame using the spss.get function:


>gssdataframe <- spss.get("/path-to-your-file/gss2010x.sav", use.value.labels=TRUE)

The file “gss2010x.sav” contains 500 observations of 47 variables. Codebooks and other information about the data in these datasets are readily available for download from the NORC web site. After the data is loaded into the data frame it can be viewed using:


To convert and save the file to a comma separated value (.csv) format use the write.table function:

>#write dataframe to .csv file

>write.table(gssdataframe, "/path-to-your-file/gss2010x.csv", sep=",")

The file, now in .csv format, can be accessed with virtually any statistics package or other software. In my next tutorial I will discuss working with GSS data using the various table and cross table functions available in R.
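
To check that an export like this worked, the .csv file can be read straight back into R. The sketch below is a self-contained illustration only: it uses a small made-up data frame (toy) and a temporary file rather than the GSS data, and it adds row.names = FALSE, which is my own choice so that the row numbers are not written out as an extra column.

```r
# Hypothetical stand-in for gssdataframe
toy <- data.frame(
  id      = 1:3,
  partyid = c("DEM", "REP", "IND")
)

# Write to a temporary .csv file instead of a fixed path
csv_path <- tempfile(fileext = ".csv")
write.table(toy, csv_path, sep = ",", row.names = FALSE)

# Read the file back and compare with the original
back <- read.csv(csv_path)
print(back)
```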


How to Set Up SSH to Remotely Control Your Raspberry Pi

Connecting your RPi to the outside world: how to connect remotely via the web

A tutorial by D.M. Wiig

Once you have your RPi server up and running there are a few things that you may want to consider if you are planning to host content to be delivered over the web. Assuming you have successfully connected your RPi to the outside world, you can save some processing overhead by disconnecting the monitor, keyboard and mouse. You can access your RPi via the web from another computer using the ssh command on Linux systems and a program such as PuTTY on Windows systems.

To access your RPi remotely you must first make sure that ssh is enabled on your Pi. You can do this by opening a terminal on the Pi and issuing the command:


pi@raspberrypi / $ sudo raspi-config

This will invoke the Raspberry Pi Software Configuration Tool. Select option #8, Advanced Options, and then option

A4 SSH Enable/Disable remote command line access to your Pi using SSH

press enter and take the Enable option from the next menu.

If ssh has not been enabled previously reboot your Pi so that the option will be enabled. You should now be able to access your Pi remotely from any computer connected to the web.

If you are using a Linux based computer open a terminal program. The following terminal session shows a typical sequence of commands that will access your Pi via the web:
doug@doug-Satellite-M55:~$ ssh www.raspberrypiandr.net -l pi
pi@www.raspberrypiandr.net's password:
Linux raspberrypi 3.10.25+ #622 PREEMPT Fri Jan 3 18:41:00 GMT 2014 armv6l

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Tue Mar 25 21:08:54 2014 from 71-34-171-204.desm.qwest.net
pi@raspberrypi ~ $
Substitute your own host name after the ssh command. After the -l option give the user name of your account on the Pi (pi in this example).
You will be asked for a password and should then be logged in and see a command line prompt.

If you are using a Windows based computer system you can use one of several free programs that will allow you to remotely access your RPi. One of the most popular is called PuTTY. This program can be downloaded from the web site http://www.putty.org. Once you are sure your remote connection works correctly you can unplug the monitor, keyboard, and mouse from your Pi, leaving just the power connection and whatever type of internet connection you are using. This will help to cut some of the load from your Pi as it functions serving web pages. To end your remote session simply type ‘logout’ at the command line and you should see something similar to:

pi@raspberrypi ~ $ logout
Connection to www.raspberrypiandr.net closed.

You can now close your terminal program until your next remote session.


How to Install PHP-APC On Your Raspberry PI to speed up your Apache Server

How to Install PHP-APC on your RPi to speed up your Apache Server

A Tutorial by D.M. Wiig

In a previous tutorial I discussed how to install a LAMP stack on your RPi and set it up as a web server. I have been hosting this WordPress site on my RPi now for several months. It has been running quietly and with low power drain 24/7 with virtually no down time. I have a fair amount of content on the site and it has been averaging around 50 or more visits per day since I got it up and running.

While the RPi is not going to have the speed of a professional grade server it does a credible job in applications where the load is reasonable and spread out evenly over the day. There are numerous tips and tricks to fine tune the Apache server that runs in the LAMP stack. One of the most effective and quickest tuneups is to install the program PHP-APC which is an alternative caching program for PHP files. WordPress serves up its content dynamically using PHP so having an efficient caching system for PHP will speed up serving content.

To install PHP-APC on an RPi running the Debian Wheezy Raspbian OS open a terminal screen and issue the following command from the command prompt:

pi@raspberrypi / $ sudo apt-get install php-apc
The application will download and install. When the install is complete locate the file “20-apc.ini”. The file should be installed in the /etc/php5/conf.d directory. Open the file in the nano editor with the commands:


pi@raspberrypi / $ cd /etc/php5/conf.d
pi@raspberrypi /etc/php5/conf.d $ sudo nano 20-apc.ini
The file will open in the editor and you should see the following lines:
GNU nano 2.2.6 File: 20-apc.ini

apachectl restart

Change the setting apc.shm_size=12M to a larger size such as 20M or so to start. Depending on your needs the size of the cache can be expanded as needed. Press Ctrl-o to save the file and Ctrl-x to exit back to the command prompt. Restart the server with the command:

pi@raspberrypi / $ sudo apachectl restart

Your Apache server should now be up and running with a larger and faster cache. I do not have any benchmarks to cite, but my WordPress application seems to respond noticeably faster with the apc cache installed. There are many other methods to fine tune your RPi as a web server including installing several alternatives to the Apache server. These are the subject of future tutorials.


Using R for Nonparametric Statistical Analysis: Nonparametric Correlation


A Tutorial by D.M. Wiig

In previous tutorials I discussed how to download and install R on a Debian Linux operating system and how to use R to perform Kendall’s Concordance analysis. This tutorial explores some basic R commands to open a built-in dataset, produce a simple scatter plot of the data and perform a nonparametric correlation using Kendall’s and Spearman’s rank order correlations. Before beginning this tutorial open a terminal window and start R.


One of the packages that is downloaded with the R distribution is called “datasets.” One of the data sets in the package, USJudgeRatings, is a data frame containing lawyers’ ratings of 43 state judges on 12 numeric variables. Since the scale used in these ratings is ordinal it is appropriate to use rank order correlation to analyze the data. To examine the data in USJudgeRatings use the command sequence:


> data(USJudgeRatings, package="datasets")

> print(USJudgeRatings)

                CONT INTG DMNR DILG CFMG DECI PREP FAMI ORAL WRIT PHYS RTEN
AARONSON,L.H.    5.7  7.9  7.7  7.3  7.1  7.4  7.1  7.1  7.1  7.0  8.3  7.8
ALEXANDER,J.M.   6.8  8.9  8.8  8.5  7.8  8.1  8.0  8.0  7.8  7.9  8.5  8.7
ARMENTANO,A.J.   7.2  8.1  7.8  7.8  7.5  7.6  7.5  7.5  7.3  7.4  7.9  7.8
BERDON,R.I.      6.8  8.8  8.5  8.8  8.3  8.5  8.7  8.7  8.4  8.5  8.8  8.7
BRACKEN,J.J.     7.3  6.4  4.3  6.5  6.0  6.2  5.7  5.7  5.1  5.3  5.5  4.8
BURNS,E.B.       6.2  8.8  8.7  8.5  7.9  8.0  8.1  8.0  8.0  8.0  8.6  8.6
CALLAHAN,R.J.   10.6  9.0  8.9  8.7  8.5  8.5  8.5  8.5  8.6  8.4  9.1  9.0



You will see all 43 cases in the output. To save space here I have just shown a portion of the output. Please note that object names in R are case sensitive so be sure to use capital letters where shown.

The basic R distribution has fairly extensive graphing capabilities. To produce a simple scatter diagram of the variables PHYS and RTEN that graphs RTEN on the X axis and PHYS on the Y axis use the following line of code:


	> plot(PHYS~RTEN, log="xy", data=USJudgeRatings)


You should see a scatter plot of PHYS against RTEN.

[Scatter plot of PHYS (Y axis) versus RTEN (X axis): image not shown in this HTML markup]

We can perform a correlation analysis on the data using either Kendall’s rank order correlation or Spearman’s Rho. For a Kendall correlation make sure the file USJudgeRatings is loaded into memory by using the command:


>data(USJudgeRatings, package="datasets")


Now perform the analysis with the command:

> cor(USJudgeRatings[,c("PHYS","RTEN")], use="complete.obs", method="kendall")

          PHYS      RTEN
PHYS 1.0000000 0.7659126
RTEN 0.7659126 1.0000000


As seen above we specify the two variables we want to correlate and indicate that only complete observations (cases with no missing values) are to be used. Running a Spearman’s correlation on the same variables is a matter of changing the “method =” designator:


> cor(USJudgeRatings[,c("PHYS","RTEN")], use="complete.obs", method="spearman")

          PHYS      RTEN
PHYS 1.0000000 0.9031373
RTEN 0.9031373 1.0000000


To produce a Kendall’s correlation matrix of ten of the variables (all except PREP and FAMI) use:


> cor(USJudgeRatings[,c("CONT","INTG","DMNR","DILG","CFMG", "DECI",
+                       "ORAL","WRIT","PHYS","RTEN")], use="complete.obs", method="kendall")
             CONT       INTG       DMNR         DILG       CFMG       DECI
CONT  1.000000000 -0.1203440 -0.1162402 -0.001142206 0.09409104 0.05498285
INTG -0.120344017  1.0000000  0.8607446  0.689935415 0.60919580 0.64371783
DMNR -0.116240241  0.8607446  1.0000000  0.662117755 0.60801429 0.63320857
DILG -0.001142206  0.6899354  0.6621178  1.000000000 0.86484298 0.89194190
CFMG  0.094091035  0.6091958  0.6080143  0.864842984 1.00000000 0.91212083
DECI  0.054982854  0.6437178  0.6332086  0.891941895 0.91212083 1.00000000
ORAL -0.027381743  0.7451506  0.7272732  0.859909442 0.82495629 0.83952698
WRIT -0.028474100  0.7187820  0.6942712  0.877775007 0.83497447 0.85064096
PHYS -0.066667371  0.6309756  0.6296740  0.752740177 0.72853135 0.77215650
RTEN -0.021652594  0.8013829  0.7979569  0.822527726 0.76344652 0.80206419
            ORAL       WRIT        PHYS        RTEN
CONT -0.02738174 -0.0284741 -0.06666737 -0.02165259
INTG  0.74515064  0.7187820  0.63097556  0.80138292
DMNR  0.72727320  0.6942712  0.62967404  0.79795687
DILG  0.85990944  0.8777750  0.75274018  0.82252773
CFMG  0.82495629  0.8349745  0.72853135  0.76344652
DECI  0.83952698  0.8506410  0.77215650  0.80206419
ORAL  1.00000000  0.9596834  0.79429138  0.90227331
WRIT  0.95968339  1.0000000  0.77463199  0.85309146
PHYS  0.79429138  0.7746320  1.00000000  0.76591261
RTEN  0.90227331  0.8530915  0.76591261  1.00000000



If the data you are using is measured at the interval or ratio level just change the “method=” designator to “pearson” (lowercase) to produce a product-moment correlation.
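
To compare the three coefficients side by side, the method name can be looped over. This is a small sketch using only base R and the built-in USJudgeRatings data; the object names pair, methods and r_by_method are my own.

```r
data(USJudgeRatings, package = "datasets")

# The two variables used earlier in this tutorial
pair <- USJudgeRatings[, c("PHYS", "RTEN")]

# Compute one coefficient per method; note the lowercase method names
methods <- c("pearson", "kendall", "spearman")
r_by_method <- sapply(methods, function(m) {
  cor(pair$PHYS, pair$RTEN, use = "complete.obs", method = m)
})

print(round(r_by_method, 4))
```

The Kendall and Spearman entries should match the values shown in the two-variable matrices above.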



More to Come:




