Using R to Work with GSS Survey Data: Cross Tabulation Tables

Using R to Work with GSS Survey Data: Viewing Datasets and Performing Cross Tabulations

A tutorial by D. M. Wiig

In a previous tutorial I discussed how to import datasets from the NORC General Social Survey (GSS), using R to read the SPSS-formatted data into an R data frame. Once the data have been imported into the R working environment they can be viewed and analyzed. There is a wealth of survey research data available at the NORC web site, www.norc.org. In this tutorial the dataset gss2010.sav will be used. The dataset is available from www3.norc.org/GSS+Website.

From that page, click on the “Quick Downloads” link on the right-hand side to access the list of available datasets. On the next page choose SPSS to access ‘.sav’ format files, and finally choose “2010” under the heading “GSS 1972-2012 Release 6.” Please note that this is a rather large data file, with 2044 observations of 794 variables. Download the file to a directory that you can access from your R console.

As discussed in a previous tutorial the SPSS format file can be loaded into an R data frame. Make sure that the R packages Hmisc and foreign have been installed and loaded before attempting to import the SPSS file. The following code will load the ‘.sav’ file:

>install.packages("Hmisc") #needed for file import

>install.packages("foreign") #needed for file import

>#get spss gss file and put into data frame

>library(Hmisc)

>gssdataframe <- spss.get("/path-to-your-file/GSS2010.sav", use.value.labels=TRUE)
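The install.packages calls only need to be run once per R installation. As a small sketch (the variable names here are illustrative, not part of the original tutorial), the following checks whether the two packages are already present without installing anything:

```r
# Report which of the required packages are not yet installed.
# This only checks; it does not install or attach anything.
needed <- c("Hmisc", "foreign")
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Install first: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("Hmisc and foreign are both available.")
}
```

If anything is reported as missing, run install.packages for just those packages before calling library(Hmisc).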

Once the file is read into an R data frame it can be viewed in a spreadsheet-like interface by using the command:

>View(gssdataframe)

The arrow keys, the Home and End keys, and the Page Up and Page Down keys can be used to navigate and browse the file.
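With 794 variables, it is often easier to inspect a data frame from the console than to scroll through the viewer. The sketch below uses a tiny made-up data frame (toy, with invented values) as a stand-in for gssdataframe so that it runs without the GSS file; the same functions can be applied to the real data frame.

```r
# Toy stand-in for gssdataframe; the values are invented for illustration.
toy <- data.frame(
  partyid  = factor(c("STRONG DEMOCRAT", "INDEPENDENT",
                      "STRONG REPUBLICAN", "INDEPENDENT")),
  polviews = factor(c("LIBERAL", "MODERATE", "CONSERVATIVE", "MODERATE")),
  age      = c(34, 51, 62, 45)
)

dim(toy)              # number of rows (observations) and columns (variables)
str(toy)              # type and first few values of each variable
head(toy, 3)          # first three rows
summary(toy$partyid)  # counts per category for a factor variable
```

For the GSS file, something like head(gssdataframe[, c("partyid", "polviews")]) limits the printout to the variables of interest.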

Survey data such as that found in the GSS file is usually a mixture of data types ranging from ratio level numbers to categorical data. Cross tabulations are often used to explore relationships among variables that are ordinal or categorical in nature. R has a number of functions available for cross tabulations. The table function is a quick way to generate a cross tabulation table, with a number of options available. The following results in a frequency table of the variables “partyid” and “polviews”, both of which are measured in categories:

>#use the gssdataframe

>#the variables partyid and polviews are used

>attach(gssdataframe)

>#create a table named ‘gsstable’

>gsstable <- table(partyid, polviews)

>gsstable #print table frequencies

The following output results:

                   polviews
partyid              EXTREMELY LIBERAL LIBERAL SLIGHTLY LIBERAL MODERATE
  STRONG DEMOCRAT                   41     105               42       94
  NOT STR DEMOCRAT                  14      62               57      154
  IND,NEAR DEM                      11      47               57      103
  INDEPENDENT                        5      20               33      189
  IND,NEAR REP                       1       4               16       74
  NOT STR REPUBLICAN                 2      10               16       88
  STRONG REPUBLICAN                  0       5                5       22
  OTHER PARTY                        1       5                6       16
                    polviews
partyid              SLGHTLY CONSERVATIVE CONSERVATIVE EXTRMLY CONSERVATIVE
  STRONG DEMOCRAT                      22           25                    6
  NOT STR DEMOCRAT                     28           16                    7
  IND,NEAR DEM                         25           11                    5
  INDEPENDENT                          43           32                    9
  IND,NEAR REP                         49           43                    8
  NOT STR REPUBLICAN                   72           72                   13
  STRONG REPUBLICAN                    23          101                   27
  OTHER PARTY                           3           12                    4

>
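As a rough sketch of how marginal totals and percentages can be obtained in base R, the block below rebuilds the table from the frequencies printed above (so it runs without the GSS file) and applies addmargins() and prop.table(). The object names counts and tab are only for illustration; with the real data the same calls work directly on gsstable.

```r
# Frequencies copied from the partyid by polviews output above.
counts <- matrix(
  c(41, 105, 42,  94, 22,  25,  6,
    14,  62, 57, 154, 28,  16,  7,
    11,  47, 57, 103, 25,  11,  5,
     5,  20, 33, 189, 43,  32,  9,
     1,   4, 16,  74, 49,  43,  8,
     2,  10, 16,  88, 72,  72, 13,
     0,   5,  5,  22, 23, 101, 27,
     1,   5,  6,  16,  3,  12,  4),
  nrow = 8, byrow = TRUE,
  dimnames = list(
    partyid  = c("STRONG DEMOCRAT", "NOT STR DEMOCRAT", "IND,NEAR DEM",
                 "INDEPENDENT", "IND,NEAR REP", "NOT STR REPUBLICAN",
                 "STRONG REPUBLICAN", "OTHER PARTY"),
    polviews = c("EXTREMELY LIBERAL", "LIBERAL", "SLIGHTLY LIBERAL",
                 "MODERATE", "SLGHTLY CONSERVATIVE", "CONSERVATIVE",
                 "EXTRMLY CONSERVATIVE")
  )
)
tab <- as.table(counts)

addmargins(tab)                              # append row and column totals
round(100 * prop.table(tab, margin = 1), 1)  # row percentages
round(100 * prop.table(tab, margin = 2), 1)  # column percentages
round(100 * prop.table(tab), 1)              # percentages of the grand total
```

Row percentages are usually the most readable choice here, since they show how each party identification group is distributed across the political views categories.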

Row and column marginal totals as well as cell percentages can also be calculated for a table created with the table function. Another quick method to generate tables is the CrossTable function. It is contained in the gmodels package and can be used on the table generated with the table function above. Use the following lines of code to generate a cross table between ‘polviews’ and ‘partyid’ using the gsstable created above:

>install.packages("gmodels") #needed for CrossTable

>library(gmodels)

>#produce basic crosstabs

>CrossTable(gsstable,prop.t=FALSE,prop.r=FALSE,prop.c=FALSE,chisq=TRUE,format=c("SPSS"))

>

Cell Contents
|-------------------------|
|                   Count |
| Chi-square contribution |
|-------------------------|

Total Observations in Table:  1961 

                   | polviews 
           partyid |    EXTREMELY LIBERAL  |              LIBERAL  |     SLIGHTLY LIBERAL  |             MODERATE  | SLGHTLY CONSERVATIVE  |         CONSERVATIVE  | EXTRMLY CONSERVATIVE  |            Row Total | 
-------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|
   STRONG DEMOCRAT |                  41  |                 105  |                  42  |                  94  |                  22  |                  25  |                   6  |                 335  | 
                   |              62.014  |              84.219  |               0.141  |               8.312  |              11.962  |              15.026  |               4.163  |                      | 
-------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|
  NOT STR DEMOCRAT |                  14  |                  62  |                  57  |                 154  |                  28  |                  16  |                   7  |                 338  | 
                   |               0.089  |               6.911  |               7.238  |               5.486  |               6.840  |              26.537  |               3.215  |                      | 
-------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|
      IND,NEAR DEM |                  11  |                  47  |                  57  |                 103  |                  25  |                  11  |                   5  |                 259  | 
                   |               0.121  |               4.902  |              22.674  |               0.284  |               2.857  |              22.144  |               2.830  |                      | 
-------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|
       INDEPENDENT |                   5  |                  20  |                  33  |                 189  |                  43  |                  32  |                   9  |                 331  | 
                   |               4.634  |              12.733  |               0.969  |              32.889  |               0.067  |               8.107  |               1.409  |                      | 
-------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|
      IND,NEAR REP |                   1  |                   4  |                  16  |                  74  |                  49  |                  43  |                   8  |                 195  | 
                   |               5.592  |              18.279  |               2.167  |               0.002  |              19.466  |               4.622  |               0.003  |                      | 
-------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|
NOT STR REPUBLICAN |                   2  |                  10  |                  16  |                  88  |                  72  |                  72  |                  13  |                 273  | 
                   |               6.824  |              18.702  |               8.224  |               2.190  |              33.411  |              18.786  |               0.364  |                      | 
-------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|
 STRONG REPUBLICAN |                   0  |                   5  |                   5  |                  22  |                  23  |                 101  |                  27  |                 183  | 
                   |               6.999  |              15.115  |              12.805  |              32.065  |               0.121  |             177.476  |              52.256  |                      | 
-------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|
       OTHER PARTY |                   1  |                   5  |                   6  |                  16  |                   3  |                  12  |                   4  |                  47  | 
                   |               0.354  |               0.227  |               0.035  |               0.170  |               1.768  |               2.735  |               2.344  |                      | 
-------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|
      Column Total |                  75  |                 258  |                 232  |                 740  |                 265  |                 312  |                  79  |                1961  | 
-------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|----------------------|

 
Statistics for All Table Factors


Pearson's Chi-squared test 
------------------------------------------------------------
Chi^2 =  801.8746     d.f. =  42     p =  3.738705e-141 


 
       Minimum expected frequency: 1.797552 
Cells with Expected Frequency < 5: 2 of 56 (3.571429%)

Warning message:
In chisq.test(t, correct = FALSE, ...) :
  Chi-squared approximation may be incorrect

>

This code produces a table of frequencies, each cell's chi-square contribution, and a basic chi-squared test. Other options include generating cell percentages and using either SPSS or SAS table format. This is accomplished by changing the appropriate argument (prop.r, prop.c, or prop.t) from FALSE to TRUE and specifying either "SPSS" or "SAS" for the format argument. The table formatting is compressed in this example due to the narrow margin requirements of the web page.  Use the scroll bar at the bottom of the page to view the entire table.
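The warning at the end of the output is raised by chisq.test, which CrossTable calls (as the warning text itself shows), and it relates to the cells with small expected frequencies. The following is a minimal sketch that repeats the test in base R on the frequencies printed above, so it runs without the GSS file or gmodels. The object names are illustrative, and suppressWarnings is used only to keep the printout clean. Assuming the frequencies were transcribed correctly, the statistic and degrees of freedom should agree with the values reported by CrossTable.

```r
# Frequencies copied from the partyid by polviews table above.
counts <- matrix(
  c(41, 105, 42,  94, 22,  25,  6,
    14,  62, 57, 154, 28,  16,  7,
    11,  47, 57, 103, 25,  11,  5,
     5,  20, 33, 189, 43,  32,  9,
     1,   4, 16,  74, 49,  43,  8,
     2,  10, 16,  88, 72,  72, 13,
     0,   5,  5,  22, 23, 101, 27,
     1,   5,  6,  16,  3,  12,  4),
  nrow = 8, byrow = TRUE,
  dimnames = list(
    partyid  = c("STRONG DEMOCRAT", "NOT STR DEMOCRAT", "IND,NEAR DEM",
                 "INDEPENDENT", "IND,NEAR REP", "NOT STR REPUBLICAN",
                 "STRONG REPUBLICAN", "OTHER PARTY"),
    polviews = c("EXTREMELY LIBERAL", "LIBERAL", "SLIGHTLY LIBERAL",
                 "MODERATE", "SLGHTLY CONSERVATIVE", "CONSERVATIVE",
                 "EXTRMLY CONSERVATIVE")
  )
)

# Pearson chi-squared test of independence, without continuity correction.
test <- suppressWarnings(chisq.test(counts, correct = FALSE))

test$statistic       # chi-squared statistic
test$parameter       # degrees of freedom: (8 - 1) * (7 - 1)
min(test$expected)   # smallest expected frequency

# Cells whose expected frequency is below 5 (the source of the warning).
which(test$expected < 5, arr.ind = TRUE)
```

Both low-expectation cells fall in the OTHER PARTY row, which has only 47 respondents, so one option worth considering is to drop or combine that category before testing.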

There are many functions available in R to analyze data in tabular format. In my next tutorial I will examine using the xtabs function to produce basic cross tabulations with control variables.
