Using R in Nonparametric Statistics: Basic Table Analysis, Part One
A Tutorial by D.M. Wiig
One of the most common methods of displaying and analyzing data is the table. In this tutorial I will discuss setting up a basic table using R and performing an initial chi-square test on it. R has an extensive set of tools for manipulating data in the form of a matrix, table, or data frame, and the package ‘vcd’ is specifically designed to provide tools for table analysis. Before beginning this tutorial, open an R session in your terminal window. You can install the vcd package using the following command:
> install.packages("vcd")
Depending on your R installation, you may first be asked to choose a CRAN mirror to download from. If you call install.packages() without a package name, some setups (for example the Windows and Mac GUIs) instead show a list of the packages available on your default CRAN mirror, from which you can select ‘vcd’. In a plain terminal session it is simplest to name the package as shown above. I might add at this point that if you are running the newest release of R, R-3.0.x, you will have to reinstall a number of dependencies that were built for earlier versions and will not work under it. Any time you install a package and see the ‘non-zero exit status’ error message, read through the installation messages to see which packages have to be reinstalled to work with the newest version of R. If you are using R-2.xx.x, the vcd package will install without any other reinstallations.
In social science research we often use data that are nominal or ordinal in nature. Such data are recorded as categories with associated frequency counts. In this tutorial I will use a set of hypothetical data that examines the relationship between income and political party identification among a group of registered voters. The variable “income” will be considered ordinal in nature and consists of the following categories of income, in thousands:
“<25”, “25-50”, “51-100”, and “>100”
Political party identification is nominal in nature with the following categories:
“Dem”, “Rep”, “Indep”
The frequency counts, that is, the numbers of individuals who fall into each combination of income and party, are numeric. In the first example we will create a table by entering the data as a data frame and then displaying the results. When using this method it is a good idea to set up the table on paper before entering the data into R. This will help to make sure that all cases and factors are entered correctly. The table I want to generate will look like this:
        party
income   Dem Rep Indep
  <25     15   5    10
  25-50   20  15    15
  51-100  10  20    10
  >100     5  30    10
To enter the above into a data frame use the following on the command line:
> partydata <- data.frame(expand.grid(income=c("<25","25-50","51-100",">100"), party=c("Dem","Rep","Indep")), count=c(15,20,10,5,5,15,20,30,10,15,10,10))
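Type the command exactly as shown, using plain straight quotation marks (") rather than the curly quotes a word processor or web page may substitute; R does not accept curly quotes as string delimiters. You can enter the whole command on one line or let the console wrap it. If you press Enter before the final closing parenthesis, R shows a + continuation prompt and waits for the rest of the command. When the command runs without error you can view the data by entering: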
> partydata
The following output is produced:
   income party count
1     <25   Dem    15
2   25-50   Dem    20
3  51-100   Dem    10
4    >100   Dem     5
5     <25   Rep     5
6   25-50   Rep    15
7  51-100   Rep    20
8    >100   Rep    30
9     <25 Indep    10
10  25-50 Indep    15
11 51-100 Indep    10
12   >100 Indep    10
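Because the counts were typed in by hand, it is worth checking them against the paper table before going further. The following is a minimal sketch using base R's sum() and tapply(); the object names total, party_totals and income_totals are my own, and the totals to compare against (165 cases; 50 Democrats, 70 Republicans, 45 Independents) are simply the row and column sums of the table above.

```r
# Recreate the data frame exactly as entered above
partydata <- data.frame(expand.grid(income = c("<25", "25-50", "51-100", ">100"),
                                    party  = c("Dem", "Rep", "Indep")),
                        count = c(15, 20, 10, 5, 5, 15, 20, 30, 10, 15, 10, 10))

# Total number of cases across all 12 cells
total <- sum(partydata$count)

# Count totals for each party (the column sums of the paper table)
party_totals <- tapply(partydata$count, partydata$party, sum)

# Count totals for each income category (the row sums of the paper table)
income_totals <- tapply(partydata$count, partydata$income, sum)

print(total)
print(party_totals)
print(income_totals)
```

If any of these totals disagree with the paper table, a count was probably typed into the wrong position in the count vector.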
At this point the data are in frequency form, with one row for each combination of income and party together with its count, rather than in table or matrix form. To view a summary of the structure of the data use the command:
> str(partydata)
You will see:
'data.frame': 12 obs. of  3 variables:
 $ income: Factor w/ 4 levels "<25","25-50",..: 1 2 3 4 1 2 3 4 1 2 ...
 $ party : Factor w/ 3 levels "Dem","Rep","Indep": 1 1 1 1 2 2 2 2 3 3 ...
 $ count : num  15 20 10 5 5 15 20 30 10 15 ...
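The str() output shows that income and party are factors. The order of their levels matters, because the cross tabulation below uses it for the rows and columns of the table. expand.grid() converts the character vectors into factors whose levels follow the order in which the values were typed, not alphabetical order. A quick check is sketched here with base R's levels() and sapply():

```r
# Recreate the data frame exactly as entered above
partydata <- data.frame(expand.grid(income = c("<25", "25-50", "51-100", ">100"),
                                    party  = c("Dem", "Rep", "Indep")),
                        count = c(15, 20, 10, 5, 5, 15, 20, 30, 10, 15, 10, 10))

# Level order, which becomes the row order (income) and column order (party)
print(levels(partydata$income))
print(levels(partydata$party))

# Class of each column, matching the Factor / num entries reported by str()
print(sapply(partydata, class))
```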
To convert the data into tabular format, use the xtabs() function to perform a cross tabulation. I have named the resulting table “tabs”:
> tabs <- xtabs(count ~ income + party, data=partydata)
To view the resulting table use:
> tabs
        party
income   Dem Rep Indep
  <25     15   5    10
  25-50   20  15    15
  51-100  10  20    10
  >100     5  30    10
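A few base R functions make it easy to move between the frequency form and the table form at this point. The following sketch shows addmargins(), which appends row and column totals; as.data.frame(), which turns the table back into frequency form (with the counts in a column named Freq); and, as an alternative to the data-frame route, typing the same counts directly into a matrix and converting it with as.table(). The name party_mat is my own, chosen for illustration.

```r
# Recreate the data frame and the cross tabulation from above
partydata <- data.frame(expand.grid(income = c("<25", "25-50", "51-100", ">100"),
                                    party  = c("Dem", "Rep", "Indep")),
                        count = c(15, 20, 10, 5, 5, 15, 20, 30, 10, 15, 10, 10))
tabs <- xtabs(count ~ income + party, data = partydata)

# 1. The table with a "Sum" row and a "Sum" column appended
print(addmargins(tabs))

# 2. Back to frequency form: one row per cell, counts in a column named Freq
back <- as.data.frame(tabs)
print(back)

# 3. The same counts typed in directly as a matrix. matrix() fills column by
#    column by default, so the counts are given Dem first, then Rep, then Indep.
party_mat <- matrix(c(15, 20, 10, 5, 5, 15, 20, 30, 10, 15, 10, 10), nrow = 4,
                    dimnames = list(income = c("<25", "25-50", "51-100", ">100"),
                                    party  = c("Dem", "Rep", "Indep")))
party_tab <- as.table(party_mat)
print(party_tab)
```

Either route should give the same table. The data-frame route keeps each count next to its category labels, which makes typing errors easier to spot.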
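This produces a table in the desired format. For a quick analysis of the table, summary() performs a chi-square test for independence of the two factors and reports the statistic, its degrees of freedom, and its p-value. Use the command: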
> summary(tabs)
The output is:
Call: xtabs(formula = count ~ income + party, data = partydata)
Number of cases in table: 165
Number of factors: 2
Test for independence of all factors:
        Chisq = 25.556, df = 6, p-value = 0.0002693
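The statistic reported by summary() is Pearson's chi-square. It compares each observed count O with the count E expected if income and party were independent, where E = row total × column total / grand total, and sums (O - E)^2 / E over all 12 cells. The degrees of freedom are (rows - 1) × (columns - 1) = 3 × 2 = 6. The sketch below redoes that arithmetic in base R, with object names of my own choosing. It also runs chisq.test(), which for a table larger than 2 × 2 applies no continuity correction and so should report the same statistic.

```r
# Recreate the data frame and the cross tabulation from above
partydata <- data.frame(expand.grid(income = c("<25", "25-50", "51-100", ">100"),
                                    party  = c("Dem", "Rep", "Indep")),
                        count = c(15, 20, 10, 5, 5, 15, 20, 30, 10, 15, 10, 10))
tabs <- xtabs(count ~ income + party, data = partydata)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tabs), colSums(tabs)) / sum(tabs)

# Pearson chi-square statistic, degrees of freedom, and upper-tail p-value
chisq   <- sum((tabs - expected)^2 / expected)
df      <- (nrow(tabs) - 1) * (ncol(tabs) - 1)
p_value <- pchisq(chisq, df, lower.tail = FALSE)

print(round(expected, 2))
print(round(chisq, 3))
print(df)
print(signif(p_value, 4))

# The same test through chisq.test()
print(chisq.test(tabs))
```

The smallest expected count here is about 8.2, so none of the cells falls below the usual rule of thumb of 5 for relying on the chi-square approximation.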
In future tutorials I will discuss many of the other resources that are available with the vcd package for manipulating and analyzing data in a tabular format.