A tutorial by D.M. Wiig

As discussed in a previous tutorial, one of the most common methods of displaying and analyzing data is through the use of tables. In this tutorial I will discuss setting up a basic table using R and exploring the use of the *assocstats* function to generate several commonly used nonparametric measures of association. The *assocstats* function will generate the association measures of the Phi-coefficient, the Contingency Coefficient and Cramer’s V, in addition to the Likelihood Ratio and Pearson’s Chi-Squared for independence. Cramer’s V and the Contingency Coefficient are commonly applied to r x c tables while the Phi-coefficient is used in the case of dichotomous variables in a 2 x 2 table.

To illustrate the use of *assocstats* I will use hypothetical data exploring the relationship between level of education and average annual income. Education will be measured using the nominal categories “High School”, “College”, and “Graduate”. Average annual income will be measured using ordinal categories and expressed in thousands:

“< 25”; “25-50”; “51-100” and “>100”

Frequency counts of individuals that fall into each category are numeric.

In the first example a 4 x 3 table is created with hypothetical frequencies as shown below:

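Rows are income (thousands); columns are education.

| Income (thousands) | High School | College | Graduate |
|---|---|---|---|
| <25 | 15 | 8 | 5 |
| 25-50 | 12 | 12 | 8 |
| 51-100 | 10 | 22 | 25 |
| >100 | 5 | 10 | 32 |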

The first table, table1, is entered into R as a data frame using the following commands:

**#create 4 x 3 data frame**

**#enter table1 in frequency form**

**table1 <- data.frame(expand.grid(income=c("<25","25-50","51-100",">100"), education=c("HS","College","Graduate")), count=c(15,12,10,5,8,12,22,10,5,8,25,32))**

Check to make sure the data are in the right row and column categories. Notice that the data are entered in the ‘count’ vector by columns: all of the “HS” counts first, then “College”, then “Graduate”.

**> table1**

** income education count**

**1 <25 HS 15**

**2 25-50 HS 12**

**3 51-100 HS 10**

**4 >100 HS 5**

**5 <25 College 8**

**6 25-50 College 12**

**7 51-100 College 22**

**8 >100 College 10**

**9 <25 Graduate 5**

**10 25-50 Graduate 8**

**11 51-100 Graduate 25**

**12 >100 Graduate 32**

**>**
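The column-by-column ordering can be checked with a small base R sketch. The names `grid` and `counts_by_row` are my own and are used only for illustration. The idea is to type the counts the way the table reads (row by row) and let R flatten them into the order that *expand.grid* expects:

```r
# expand.grid() varies its FIRST argument (income) fastest, so the counts
# must be supplied one education column at a time.
grid <- expand.grid(income = c("<25", "25-50", "51-100", ">100"),
                    education = c("HS", "College", "Graduate"))

# Counts typed in the same layout as the printed table (rows = income).
counts_by_row <- matrix(c(15,  8,  5,
                          12, 12,  8,
                          10, 22, 25,
                           5, 10, 32),
                        nrow = 4, byrow = TRUE)

# as.vector() reads a matrix column by column, which matches the grid order.
grid$count <- as.vector(counts_by_row)

# Row 5 should be the first "College" entry: income "<25", count 8.
grid[5, ]
```

Entering the counts this way may be less error prone than reordering them by hand.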

If the table structure looks correct, generate the table, tab1, using the *xtabs* function:

**> #create table tab1 from data.frame**

**> tab1 <- xtabs(count ~income + education, data=table1)**

Show the table using the command:

**>tab1**

** education**

**income HS College Graduate**

** <25 15 8 5**

** 25-50 12 12 8**

** 51-100 10 22 25**

** >100 5 10 32**

**>**
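Before computing any statistics it can help to confirm the marginal totals. The following is a minimal base R sketch that rebuilds tab1 from the commands above so it runs on its own. The names `income_totals`, `education_totals` and `n` are mine:

```r
# Rebuild the data frame and the 4 x 3 table from the tutorial.
table1 <- data.frame(
  expand.grid(income = c("<25", "25-50", "51-100", ">100"),
              education = c("HS", "College", "Graduate")),
  count = c(15, 12, 10, 5, 8, 12, 22, 10, 5, 8, 25, 32))
tab1 <- xtabs(count ~ income + education, data = table1)

# Print the table with row and column totals appended.
addmargins(tab1)

# The marginal totals on their own.
income_totals    <- margin.table(tab1, 1)  # one total per income row
education_totals <- margin.table(tab1, 2)  # one total per education column
n <- sum(tab1)                             # total number of individuals
```

With the hypothetical frequencies used here the grand total is 164.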

Use the *assocstats* function to generate measures of association for the table. Make sure that you have loaded the *vcd* and *vcdExtra* packages. Run *assocstats* with the following command:

**> assocstats(tab1)**

** X^2 df P(> X^2)**

**Likelihood Ratio 31.949 6 1.6689e-05**

**Pearson 32.279 6 1.4426e-05**

**Phi-Coefficient : 0.444 **

**Contingency Coeff.: 0.406 **

**Cramer’s V : 0.314 **

**>**
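To see where these three measures come from, they can be reproduced from Pearson’s Chi-Squared using base R alone. This is a sketch using the standard textbook formulas, not a description of how *assocstats* is implemented internally. The variable names are my own:

```r
# The 4 x 3 table of hypothetical frequencies, entered column by column.
tab <- matrix(c(15, 12, 10,  5,
                 8, 12, 22, 10,
                 5,  8, 25, 32),
              nrow = 4,
              dimnames = list(income = c("<25", "25-50", "51-100", ">100"),
                              education = c("HS", "College", "Graduate")))

# Pearson chi-squared statistic (no continuity correction).
x2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
n  <- sum(tab)             # sample size
k  <- min(dim(tab)) - 1    # min(rows - 1, columns - 1)

phi         <- sqrt(x2 / n)         # Phi-coefficient
contingency <- sqrt(x2 / (n + x2))  # Contingency Coefficient
cramers_v   <- sqrt(x2 / (n * k))   # Cramer's V

round(c(phi = phi, contingency = contingency, cramers_v = cramers_v), 3)
```

The rounded values agree with the *assocstats* output shown above (0.444, 0.406 and 0.314).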

The measures show an association between the two variables. My intent is not to provide an analysis of how to evaluate each of the measures. There are excellent sources of documentation on each measure of association in the R CRAN literature. Since the Phi-coefficient is designed primarily to measure association between dichotomous variables in a 2 x 2 table, collapse the 4 x 3 table using the *collapse.table* function to get a more appropriate Phi-coefficient. Since we want to go from a 4 x 3 to a 2 x 2 table, we collapse the table in two stages. The first stage collapses the table to a 2 x 3 table by combining the “<25” with the “25-50” and the “51-100” with the “>100” categories of income.

The resulting 2 x 3 table is seen below:

Rows are income (thousands); columns are education.

| Income | High School | College | Graduate |
|---|---|---|---|
| <50 | 27 | 20 | 13 |
| >50 | 15 | 32 | 57 |

To collapse the table use the R function *collapse.table* to combine the “<25” and “25-50” categories and the “51-100” and “>100” categories as discussed above:

**> #collapse table tab1 to a 2 x 3 table, table2**

**> table2 <- collapse.table(tab1, income=c("<50","<50",">50",">50"))**

View the resulting table, table2, with:

**> table2**

** education**

**income HS College Graduate**

** <50 27 20 13**

** >50 15 32 57**

**>**
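As a cross-check that does not depend on *vcdExtra*, the same 2 x 3 table can be obtained in base R by recoding the income column of the data frame and tabulating again. This is an alternative route, not what *collapse.table* does internally. The names `recode`, `income2` and `tab2_check` are mine:

```r
# Rebuild the frequency-form data frame from the tutorial.
table1 <- data.frame(
  expand.grid(income = c("<25", "25-50", "51-100", ">100"),
              education = c("HS", "College", "Graduate")),
  count = c(15, 12, 10, 5, 8, 12, 22, 10, 5, 8, 25, 32))

# Map each original income level onto its collapsed label.
recode <- c("<25" = "<50", "25-50" = "<50", "51-100" = ">50", ">100" = ">50")
table1$income2 <- factor(unname(recode[as.character(table1$income)]),
                         levels = c("<50", ">50"))

# xtabs() sums the counts that now share the same income2 label.
tab2_check <- xtabs(count ~ income2 + education, data = table1)
tab2_check
```

The result should match table2 cell for cell.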

Now collapse the table to a 2 x 2 table by combining the “College” and “Graduate” columns:

**> #collapse 2 x 3 table2 to a 2 x 2 table, table3**

**> table3 <- collapse.table(table2, education=c("HS","College","College"))**

View the resulting table, table3, with:

**> table3**

** education**

**income HS College**

** <50 27 33**

** >50 15 89**

**>**

Use the *assocstats* function to evaluate the 2 x 2 table:

**> #use assocstats on the 2 x 2 table, table3**

**> assocstats(table3)**

** X^2 df P(> X^2)**

**Likelihood Ratio 18.220 1 1.9684e-05**

**Pearson 18.673 1 1.5519e-05**

**Phi-Coefficient : 0.337 **

**Contingency Coeff.: 0.32 **

**Cramer’s V : 0.337 **

**>**
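For a 2 x 2 table the Phi-coefficient can also be computed directly from the four cell counts, and it equals Cramer’s V because min(rows - 1, columns - 1) is 1. The sketch below uses the standard cross-product formula in base R. The variable names are my own:

```r
# The collapsed 2 x 2 table, entered column by column.
tab3 <- matrix(c(27, 15, 33, 89), nrow = 2,
               dimnames = list(income = c("<50", ">50"),
                               education = c("HS", "College")))

n11 <- tab3[1, 1]; n12 <- tab3[1, 2]
n21 <- tab3[2, 1]; n22 <- tab3[2, 2]

# Signed phi: cross-product difference over the root of the marginal products.
phi_signed <- (n11 * n22 - n12 * n21) /
  sqrt(prod(rowSums(tab3)) * prod(colSums(tab3)))

# Phi recovered from Pearson's chi-squared (no continuity correction).
x2 <- unname(chisq.test(tab3, correct = FALSE)$statistic)
phi_from_x2 <- sqrt(x2 / sum(tab3))

round(c(phi_signed = phi_signed, phi_from_x2 = phi_from_x2), 3)
```

Both routes give 0.337 here. The cross-product version also carries a sign, which indicates the direction of the association given how the rows and columns are ordered.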

There are many other table manipulation functions available in the R *vcd* and *vcdExtra* packages, as well as in other packages, to provide analysis of nonparametric data. This series of tutorials hopefully serves to illustrate some of the more basic and common table functions using these packages. The next tutorial looks at the use of the *ca* function to perform and graph the results of a basic Correspondence Analysis.