A tutorial by Douglas M. Wiig
Part one of the tutorial centered on importing NORC GSS data in STATA or SPSS format into an R data frame. For illustration I used the GSS2014 survey data set, which consists of 2538 cases and 866 variables. If a researcher wishes to generate some simple cross tabulations, the R CrossTable function is very useful.
The CrossTable function is part of the gmodels package, so before running the scripts in this tutorial make sure you have installed and loaded gmodels from your favorite CRAN mirror site. As discussed in part one of the tutorial, load the GSS2014 dataset into the global environment using:
>Dataset <- read.spss("E:/research/Documents/GSS2014.sav",
use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)
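Before running the read.spss() call above, it can help to confirm that the needed packages are available. The following is a minimal sketch, not part of the original script: it assumes the only packages involved are gmodels (for CrossTable) and foreign (which provides read.spss), and the variable names needed, installed and not_installed are made up for illustration.

```r
# Packages this tutorial relies on: gmodels supplies CrossTable(),
# foreign supplies read.spss().
needed <- c("gmodels", "foreign")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
not_installed <- needed[!installed]

if (length(not_installed) > 0) {
  # Nothing is installed automatically; the reader decides how to proceed.
  message("Install these first with install.packages(): ",
          paste(not_installed, collapse = ", "))
} else {
  library(gmodels)
  library(foreign)
}
```

The check only reports what is missing, so it is safe to run on a machine without network access.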
The CrossTable function allows a basic cross tabulation to be performed and includes a large number of options that can be incorporated into the table. The basic structure is as follows:
CrossTable(x, y, digits=3, max.width = 5, expected=FALSE, prop.r=TRUE, prop.c=TRUE,
prop.t=TRUE, prop.chisq=TRUE, chisq = FALSE, fisher=FALSE, mcnemar=FALSE,
resid=FALSE, sresid=FALSE, asresid=FALSE, missing.include=FALSE,
format=c("SAS","SPSS"), dnn = NULL, ...)
x A vector or a matrix. If y is specified, x must be a vector
y A vector in a matrix or a dataframe
digits Number of digits after the decimal point for cell proportions
max.width In the case of a 1 x n table, the default will be to print the output horizontally.
If the number of columns exceeds max.width, the table will be wrapped for
each successive increment of max.width columns. If you want a single column
vertical table, set max.width to 1
expected If TRUE, chisq will be set to TRUE and expected cell counts from the chi-square test will be included
prop.r If TRUE, row proportions will be included
prop.c If TRUE, column proportions will be included
prop.t If TRUE, table proportions will be included
prop.chisq If TRUE, chi-square contribution of each cell will be included
chisq If TRUE, the results of a chi-square test will be included
fisher If TRUE, the results of a Fisher Exact test will be included
mcnemar If TRUE, the results of a McNemar test will be included
resid If TRUE, residual (Pearson) will be included
sresid If TRUE, standardized residual will be included
asresid If TRUE, adjusted standardized residual will be included
missing.include If TRUE, then remove any unused factor levels
format Either SAS (default) or SPSS, depending on the type of output desired.
dnn the names to be given to the dimensions in the result (the dimnames names).
... optional arguments
(Gregory Warnes, maintainer, Package ‘Gmodels’ February, 2015. http://cran.r-project.org/src/contrib/PACKAGES.html)
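To make the proportion and chi-square options more concrete, here is a rough sketch of what the quantities behind prop.r, prop.c, prop.t, prop.chisq and resid look like when computed by hand in base R. The 2 x 2 counts are invented for illustration, and the code is a base R approximation of the documented quantities, not the internal code of CrossTable.

```r
# Invented 2 x 2 table of counts (rows = income group, columns = degree).
tab <- matrix(c(10, 20,
                30, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(income = c("low", "high"),
                              degree = c("no", "yes")))

row_prop   <- prop.table(tab, 1)   # each row sums to 1    (cf. prop.r)
col_prop   <- prop.table(tab, 2)   # each column sums to 1 (cf. prop.c)
table_prop <- prop.table(tab)      # all cells sum to 1    (cf. prop.t)

# Expected counts under independence: row total * column total / grand total.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Per-cell chi-square contribution (cf. prop.chisq).
cell_chisq <- (tab - expected)^2 / expected

# Pearson residuals (cf. resid).
pearson_resid <- (tab - expected) / sqrt(expected)

print(round(row_prop, 3))
print(round(cell_chisq, 3))
```

Summing cell_chisq gives the uncorrected Pearson chi-square statistic for the table, which is why each cell's value is described as its contribution.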
In this tutorial I will create a table to examine the relationship between income and education using the variables ‘degree’ and ‘incom16’ from the GSS dataset. Both are categorical factors. To simplify the resulting table, only actual frequencies will be reported and the ‘chisq’ option will be used to generate a chi-squared test. The format will be set to SPSS. Use the following statement:
># Generate a cross table of frequencies with chisq reported
>CrossTable(Dataset$"incom16", Dataset$"degree", chisq=TRUE, format=c("SPSS"), prop.r=FALSE, prop.c=FALSE, prop.t=FALSE, prop.chisq=FALSE)
In the above code, the row variable is income; the appropriate column of the dataset is selected with the Dataset$"incom16" expression. The column variable for the table is education, and the appropriate column of the dataset is selected with the Dataset$"degree" expression. The various cell proportions must each be set to FALSE because they default to TRUE.
When you run the above script the table will be generated in SPSS format on the screen. I will not reproduce the table here because it does not fit well within the blog format.
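For readers who do not have the GSS file at hand, the same kind of frequency-only table and chi-squared test can be tried on a small made-up data frame. The sketch below uses base R's table() and chisq.test() rather than CrossTable; the category labels and counts are invented, and the names cells, toy, freq and test_result are hypothetical, so the numbers say nothing about the real GSS data.

```r
# Invented cell counts for a 3 x 2 layout (income category by degree).
cells <- expand.grid(
  incom16 = c("below average", "average", "above average"),
  degree  = c("high school", "bachelor")
)
cells$n <- c(12, 15, 6, 8, 15, 14)

# Expand the counts into one row per (made-up) respondent,
# which mimics the shape of a survey data frame.
toy <- cells[rep(seq_len(nrow(cells)), cells$n), c("incom16", "degree")]

# Frequency-only cross tabulation: rows = incom16, columns = degree.
freq <- table(toy$incom16, toy$degree, dnn = c("incom16", "degree"))
print(freq)

# Pearson chi-squared test of independence on the same table.
test_result <- chisq.test(freq)
print(test_result)
```

With the real data, the row and column arguments would be Dataset$incom16 and Dataset$degree instead of the toy columns.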
In part three of this tutorial I will discuss generating subsets of the GSS data file and using subsets for statistical analyses such as t tests and ANOVA.