Using R in Nonparametric Statistics: Basic Table Analysis, Part One

June 20, 2014 dmwiig

A Tutorial by D.M. Wiig
One of the most common methods of displaying and analyzing data is through the use of tables. In this tutorial I will discuss setting up a basic table using R and performing an initial Chi-Square test on the table. R has an extensive set of tools for manipulating data in the form of a matrix, table, or data frame. The package ‘vcd’ is specifically designed to provide tools for table analysis. Before beginning this tutorial, open an R session in your terminal window. You can install the vcd package using the following command:

>install.packages()

Depending on your R installation you may be asked to choose a CRAN mirror to download from, or you may see a list of the packages available from your default CRAN mirror. Select the package ‘vcd’ and download it. If you are running the newest release of R, R-3.0.x, you will have to reinstall a number of dependencies that will not work under the latest version of R. Any time you install a package and see the ‘non-zero exit status’ error message, look over the console output to see which packages have to be reinstalled to work with the newest version of R. If you are using R-2.xx.x, the vcd package will install without any other re-installations.
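
If you already know the name of the package, it can also be passed to install.packages directly as a quoted string. The sketch below is only an illustration: it checks whether vcd is already present and prints the install command rather than running it, so nothing is downloaded. The names `needed` and `missing_pkgs` are my own and are not part of any package.

```r
# Packages this tutorial relies on (only vcd is named in the text).
needed <- c("vcd")

# requireNamespace() returns TRUE when a package is installed and can be loaded.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  # Print the command instead of running it, so nothing is downloaded here.
  message("Run: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
} else {
  message("All required packages are already installed.")
}
```

Once the package is installed, library(vcd) makes its functions available in the current session.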

In social science research we often use data that is nominal or ordinal in nature. Data is displayed in categories with associated frequency counts. In this tutorial I will use a set of hypothetical data that examines the relationship between income and political party identification among a group of registered voters. The variable “income” will be considered ordinal in nature and consists of categories of income in thousands as follows:

“< 25”; “25-50”; “51-100” and “>100”

Political party identification is nominal in nature with the following categories:

“Dem”, “Rep”, “Indep”

Frequency counts of individuals that fall into each category are numeric. In the first example we will create a table by entering the data as a data frame and displaying the results. When using this method it is a good idea to set up the table on paper before entering the data into R. This will help to make sure that all cases and factors are entered correctly. The table I want to generate will look like this:

          party
income    Dem  Rep  Indep
<25        15    5     10
25-50      20   15     15
51-100     10   20     10
>100        5   30     10

To enter the above into a data frame use the following on the command line:

> partydata <- data.frame(expand.grid(income=c("<25","25-50","51-100",">100"), party=c("Dem","Rep","Indep")), count=c(15,20,10,5,5,15,20,30,10,15,10,10))
>

Make sure the syntax is exactly as shown, with plain straight quotation marks, and that the entire command is either on one line or has wrapped automatically to the next line in your R console. When the command runs without error you can view the data by entering:

> partydata

The following output is produced:

> partydata
   income party count
1     <25   Dem    15
2   25-50   Dem    20
3  51-100   Dem    10
4    >100   Dem     5
5     <25   Rep     5
6   25-50   Rep    15
7  51-100   Rep    20
8    >100   Rep    30
9     <25 Indep    10
10  25-50 Indep    15
11 51-100 Indep    10
12   >100 Indep    10
>
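
The order of the numbers in the count vector matters because expand.grid varies its first factor (income) fastest. A minimal sketch in base R, using the same hypothetical data, shows one way to check the data entry against the table set up on paper. The object names `grid` and `wide` are illustrative only.

```r
# expand.grid() varies the first factor (income) fastest, so the counts
# are listed one party column at a time: all Dem rows, then Rep, then Indep.
grid <- expand.grid(income = c("<25", "25-50", "51-100", ">100"),
                    party  = c("Dem", "Rep", "Indep"))
grid$count <- c(15, 20, 10, 5,    # Dem
                5, 15, 20, 30,    # Rep
                10, 15, 10, 10)   # Indep

# matrix() also fills column by column, so the same vector reproduces
# the paper layout and makes entry errors easy to spot.
wide <- matrix(grid$count, nrow = 4,
               dimnames = list(income = levels(grid$income),
                               party  = levels(grid$party)))
print(wide)
```

If a count appears in the wrong cell of the printed matrix, the count vector was entered in the wrong order.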

At this point the data is in frequency form rather than table or matrix form. To view a summary of information about the data use the command:

>str(partydata)

You will see:

> str(partydata)
'data.frame': 12 obs. of 3 variables:
 $ income: Factor w/ 4 levels "<25","25-50",..: 1 2 3 4 1 2 3 4 1 2 ...
 $ party : Factor w/ 3 levels "Dem","Rep","Indep": 1 1 1 1 2 2 2 2 3 3 ...
 $ count : num 15 20 10 5 5 15 20 30 10 15 ...

To convert the data into tabular format use the command xtabs to perform a cross tabulation. I have named the resulting table “tabs”:

>tabs <- xtabs(count ~income + party, data=partydata)

To view the resulting table use:

> tabs
        party
income   Dem Rep Indep
  <25     15   5    10
  25-50   20  15    15
  51-100  10  20    10
  >100     5  30    10
>
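
Before testing the table it can help to look at the row and column totals and at the share of each party within an income group. The following is a sketch using only base R functions (addmargins and prop.table) on the same hypothetical data. The names `with_totals` and `row_props` are illustrative.

```r
# Rebuild the frequency data and the cross tabulation from the tutorial.
partydata <- data.frame(
  expand.grid(income = c("<25", "25-50", "51-100", ">100"),
              party  = c("Dem", "Rep", "Indep")),
  count = c(15, 20, 10, 5, 5, 15, 20, 30, 10, 15, 10, 10))
tabs <- xtabs(count ~ income + party, data = partydata)

# addmargins() appends a "Sum" row and column to the table.
with_totals <- addmargins(tabs)
print(with_totals)

# prop.table() with margin = 1 gives proportions within each income row.
row_props <- prop.table(tabs, margin = 1)
print(round(row_props, 2))
```

Row proportions make the pattern easier to read than raw counts because the income groups differ in size.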

This produces a table in the desired format. To do a quick analysis of the table that produces a Chi-square statistic use the command:

> summary(tabs)

The output is

> summary(tabs)
Call: xtabs(formula = count ~ income + party, data = partydata)
Number of cases in table: 165
Number of factors: 2
Test for independence of all factors:
Chisq = 25.556, df = 6, p-value = 0.0002693
>
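
The same Pearson Chi-square statistic can be obtained with the base R function chisq.test, which also returns the expected counts under independence. This is a sketch on the same hypothetical data rather than part of the vcd workflow. The name `result` is illustrative.

```r
# Rebuild the cross tabulation used above.
partydata <- data.frame(
  expand.grid(income = c("<25", "25-50", "51-100", ">100"),
              party  = c("Dem", "Rep", "Indep")),
  count = c(15, 20, 10, 5, 5, 15, 20, 30, 10, 15, 10, 10))
tabs <- xtabs(count ~ income + party, data = partydata)

# Pearson Chi-square test of independence for the 4 x 3 table.
result <- chisq.test(tabs)
print(result)

# Expected counts under independence; a common rule of thumb asks
# for each of these to be at least 5.
print(round(result$expected, 2))
```

The statistic and degrees of freedom should agree with the summary(tabs) output shown above.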

In future tutorials I will discuss many of the other resources that are available with the vcd package for manipulating and analyzing data in a tabular format.


Using R in Nonparametric Statistics: Basic Table Analysis, Part Three, Using assocstats and collapse.table

June 19, 2014 dmwiig

A tutorial by D.M. Wiig

As discussed in a previous tutorial, one of the most common methods of displaying and analyzing data is through the use of tables. In this tutorial I will discuss setting up a basic table using R and exploring the use of the assocstats function to generate several commonly used nonparametric measures of association. The assocstats function will generate the association measures of the Phi-coefficient, the Contingency Coefficient and Cramer’s V, in addition to the Likelihood Ratio and Pearson’s Chi-Squared for independence. Cramer’s V and the Contingency Coefficient are commonly applied to r x c tables, while the Phi-coefficient is used in the case of dichotomous variables in a 2 x 2 table.

To illustrate the use of assocstats I will use hypothetical data exploring the relationship between level of education and average annual income. Education will be measured using the nominal categories “High School”, “College”, and “Graduate”. Average annual income will be measured using ordinal categories and expressed in thousands:

“< 25”; “25-50”; “51-100” and “>100”

Frequency counts of individuals that fall into each category are numeric.

In the first example a 4 x 3 table is created with hypothetical frequencies as shown below:

Income               Education
(thousands)   High School   College   Graduate
<25                    15         8          5
25-50                  12        12          8
51-100                 10        22         25
>100                    5        10         32

The first table, table1, is entered into R as a data frame using the following commands:

> #create 4 x 3 data frame
> #enter table1 in frequency form
> table1 <- data.frame(expand.grid(income=c("<25","25-50","51-100",">100"), education=c("HS","College","Graduate")), count=c(15,12,10,5,8,12,22,10,5,8,25,32))

Check to make sure the data are in the right row and column categories. Notice that the data are entered in the ‘count’ list by columns.

> table1
   income education count
1     <25        HS    15
2   25-50        HS    12
3  51-100        HS    10
4    >100        HS     5
5     <25   College     8
6   25-50   College    12
7  51-100   College    22
8    >100   College    10
9     <25  Graduate     5
10  25-50  Graduate     8
11 51-100  Graduate    25
12   >100  Graduate    32
>

If the table structure looks correct, generate the table, tab1, using the xtabs function:

> #create table tab1 from data.frame
> tab1 <- xtabs(count ~income + education, data=table1)
Show the table using the command:

> tab1
        education
income   HS College Graduate
  <25    15       8        5
  25-50  12      12        8
  51-100 10      22       25
  >100    5      10       32
>
Use the assocstats function to generate measures of association for the table. Make sure that you have loaded both the vcd package and the vcdExtra package. Run assocstats with the following command:

> assocstats(tab1)
                    X^2 df   P(> X^2)
Likelihood Ratio 31.949  6 1.6689e-05
Pearson          32.279  6 1.4426e-05

Phi-Coefficient   : 0.444
Contingency Coeff.: 0.406
Cramer's V        : 0.314
>
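
To see where the three coefficients come from, they can be computed by hand from the Pearson Chi-square statistic using the standard textbook formulas: Phi = sqrt(X^2/n), the Contingency Coefficient = sqrt(X^2/(X^2 + n)), and Cramer’s V = sqrt(X^2/(n(k - 1))), where k is the smaller of the number of rows and columns. The sketch below uses only base R and the same hypothetical data. The variable names are my own and this is not the internal code of assocstats.

```r
# Rebuild the 4 x 3 education by income table.
table1 <- data.frame(
  expand.grid(income    = c("<25", "25-50", "51-100", ">100"),
              education = c("HS", "College", "Graduate")),
  count = c(15, 12, 10, 5, 8, 12, 22, 10, 5, 8, 25, 32))
tab1 <- xtabs(count ~ income + education, data = table1)

# Pearson Chi-square statistic and sample size.
x2 <- unname(chisq.test(tab1, correct = FALSE)$statistic)
n  <- sum(tab1)
k  <- min(dim(tab1))          # smaller of rows and columns

phi         <- sqrt(x2 / n)
contingency <- sqrt(x2 / (x2 + n))
cramers_v   <- sqrt(x2 / (n * (k - 1)))

print(round(c(phi = phi, contingency = contingency, cramers_v = cramers_v), 3))
```

In a 2 x 2 table k - 1 equals 1, so Phi and Cramer’s V are the same number. In a larger table such as this one they differ.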

The measures show an association between the two variables. My intent is not to provide an analysis of how to evaluate each of the measures; there is excellent documentation on each measure of association in the package documentation on CRAN. Since the Phi-coefficient is designed primarily to measure association between dichotomous variables in a 2 x 2 table, collapse the 4 x 3 table using the collapse.table function to get a Phi-coefficient that is appropriate to interpret. Since we want to go from a 4 x 3 to a 2 x 2 table, we collapse the table in two stages. The first stage collapses the table to a 2 x 3 table by combining the “<25” with the “25-50” and the “51-100” with the “>100” categories of income.

The resulting 2 x 3 table is seen below:

                  Education
Income   High School   College   Graduate
<50               27        20         13
>50               15        32         57

To collapse the table use the R function collapse.table to combine the “<25” and “25-50” categories and the “51-100” and “>100” categories as discussed above:

> #collapse table tab1 to a 2 x 3 table, table2
> table2 <- collapse.table(tab1, income=c("<50","<50",">50",">50"))

View the resulting table, table2, with:

> table2
      education
income HS College Graduate
   <50 27      20       13
   >50 15      32       57
>
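
If vcdExtra is not available, the same 2 x 3 table can be reached in base R by merging factor levels in the data frame and running xtabs again. This is an alternative sketch, not the collapse.table method used in this tutorial. The names `collapsed` and `table2_base` are illustrative.

```r
# Rebuild the frequency-form data frame.
table1 <- data.frame(
  expand.grid(income    = c("<25", "25-50", "51-100", ">100"),
              education = c("HS", "College", "Graduate")),
  count = c(15, 12, 10, 5, 8, 12, 22, 10, 5, 8, 25, 32))

# Assigning repeated level names merges the corresponding categories:
# "<25" and "25-50" become "<50"; "51-100" and ">100" become ">50".
collapsed <- table1
levels(collapsed$income) <- c("<50", "<50", ">50", ">50")

# xtabs() sums the counts of rows that now share an income level.
table2_base <- xtabs(count ~ income + education, data = collapsed)
print(table2_base)
```

The printed counts should match table2 above.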

Now collapse the table to a 2 x 2 table by combining the “College” and “Graduate” columns:

> #collapse 2 x 3 table2 to a 2 x 2 table, table3
> table3 <- collapse.table(table2, education=c("HS","College","College"))

View the resulting table, table3, with:

> table3
      education
income HS College
   <50 27      33
   >50 15      89
>

Use the assocstats function to evaluate the 2 x 2 table:

> #use assocstats on the 2 x 2 table, table3
> assocstats(table3)
                    X^2 df   P(> X^2)
Likelihood Ratio 18.220  1 1.9684e-05
Pearson          18.673  1 1.5519e-05

Phi-Coefficient   : 0.337
Contingency Coeff.: 0.32
Cramer's V        : 0.337
>
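
For a 2 x 2 table the Phi-coefficient also has a closed form based on the four cell counts: (n11*n22 - n12*n21) divided by the square root of the product of the row and column totals. The sketch below checks this against the collapsed table in base R. It also shows that chisq.test applies Yates' continuity correction to 2 x 2 tables by default, so correct = FALSE is needed to reproduce the uncorrected Pearson value. The object names are illustrative.

```r
# The collapsed 2 x 2 table, entered column by column.
table3 <- matrix(c(27, 15, 33, 89), nrow = 2,
                 dimnames = list(income    = c("<50", ">50"),
                                 education = c("HS", "College")))

# Closed-form Phi for a 2 x 2 table.
n11 <- table3[1, 1]; n12 <- table3[1, 2]
n21 <- table3[2, 1]; n22 <- table3[2, 2]
phi <- (n11 * n22 - n12 * n21) /
  sqrt(prod(rowSums(table3)) * prod(colSums(table3)))

# Pearson Chi-square without and with the continuity correction.
uncorrected <- unname(chisq.test(table3, correct = FALSE)$statistic)
corrected   <- unname(chisq.test(table3)$statistic)

print(round(c(phi = phi, uncorrected = uncorrected, corrected = corrected), 3))
```

The sign of Phi computed this way depends on the order of the rows and columns, while assocstats reports its magnitude.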

There are many other table manipulation functions available in the R vcd and vcdExtra packages, as well as in other packages, for the analysis of nonparametric data. This series of tutorials hopefully serves to illustrate some of the more basic and common table functions in these packages. The next tutorial looks at the use of the ca function to perform a basic Correspondence Analysis and graph the results.


We Are Back!

June 19, 2014 dmwiig

For unknown reasons my original server became corrupted and had to be shut down. I am back with a new site and over the next few days will be re-adding most of the content from our original blog. Check back often!

D.M. Wiig

raspberrypianr.net


R Statistics and Programming

Monthly Archives: June 2014


Resources and Information About R Statistics and Programming
