Using R in Nonparametric Statistics: Basic Table Analysis, Part One

Using R in Nonparametric Statistics: Basic Table Analysis, Part One

A Tutorial by D.M. Wiig
One of the most common methods displaying and analyzing data is through the use of tables. In this tutorial I will discuss setting up a basic table using R and performing an initial Chi-Square test on the table. R has an extensive set of tools for manipulating data in the form of a matrix, table, or data frame. The package ‘vcd’ is specifically designed to provide tools for table analysis. Before beginning this tutorial open an R session in your terminal window. You can install the vcd package using the following command:

>install.packages()

Depending on your R installation you may be asked to designate a CRAN reflector to download from or you may see a list of available packages in your default CRAN mirror. Select the package ‘vcd’ and download it. I might add at this point that if you are running the newest release of R, R-3.0.x you will have to reload a number of dependencies that will not work under the latest version of R. Any time you are installing a package and see the ‘non-zero exit status’ error message look the dialog over to see which packages have to be reinstalled to work with the newest version of R. If you are using R-2.xx.x the vcd package will install without any other re-installations.

In social science research we often use data that is nominal or ordinal in nature. Data is displayed in categories with associated frequency counts. In this tutorial I will use a set of hypothetical data that examines the relationship between income and political party identification among a group of registered voters. The variable “income” will be considered ordinal in nature and consists of categories of income in thousands as follows:

“< 25”; “25-50”; “51-100” and “>100”

Political party identification is nominal in nature with the following categories:

“Dem”, “Rep”, “Indep”

Frequency counts of individuals that fall into each category are numeric. In the first example we will create a table by entering the data as a data frame and displaying the results. When using this method it is a good idea to set up the table on paper before entering the data into R. This will help to make sure that all cases and factors are entered correctly. The table I want to generate will look like this:

party
income                 Dem Rep Indep
<25                             15    5      10
26-50                        20   15    15
51-100                     10   20    10
>100                            5    30    10

To enter the above into a data frame use the following on the command line:

> partydata <- data.frame(expand.grid(income=c(“<25″,”25-50″,”51-100″,”>100″), party=c(“Dem”,”Rep”, “Indep”)),count=c(15,20,10,5,5,15,20,30,10,15,10,10))
>

Make sure the syntax is exactly as shown and make sure the entire script is on the same line or has done an automatic return to the next line in your R console. When the command runs without error you can view the data by entering:

> partydata

The following output is produced:

> partydata
income                    party         count
1 <25                         Dem            15
2 25-50                    Dem            20
3 51-100                 Dem           10
4 >100                      Dem             5
5 <25                         Rep               5
6 25-50                    Rep              15
7 51-100                 Rep              20
8 >100                      Rep              30
9 <25                         Indep          10
10 25-50                 Indep          15
11 51-100              Indep          10
12 >100                   Indep          10
>

At this point the data is in frequency rather that table or matrix form. To view a summary of information about the data use the command:

>str(partydata)

You will see:

> str(partydata)
‘data.frame’: 12 obs. of 3 variables:
$ income: Factor w/ 4 levels “<25″,”26-50”,..: 1 2 3 4 1 2 3 4 1 2 …
$ party : Factor w/ 3 levels “Dem”,”Rep”,”Indep”: 1 1 1 1 2 2 2 2 3 3 …
$ count : num 15 20 10 5 5 15 20 30 10 15 …

To convert the data into tabular format use the command xtabs to perform a cross tabulation. I have named the resulting table “tabs”:

>tabs <- xtabs(count ~income + party, data=partydata)

To view the resulting table use:

> tabs
                                                        party
income                              Dem Rep Indep
<25                                        15      5        10
26-50                                   20      15      15
51-100                                10      20      10
>100                                       5       30      10
>

This produces a table in the desired format. To do a quick analysis of the table that produces a Chi-square statistic use the command:

> summary(tabs)

The output is

> summary(tabs)
Call: xtabs(formula = count ~ income + party, data = partydata)
Number of cases in table: 165
Number of factors: 2
Test for independence of all factors:
Chisq = 25.556, df = 6, p-value = 0.0002693
>

In future tutorials I will discuss many of the other resources that are available with the vcd package for manipulating and analyzing data in a tabular format.

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s