R Tutorial: Using R to Work With Datasets From the NORC General Social Survey
A tutorial by D. M. Wiig
When I teach classes in social science statistics and social science research methods, I like to use “live” data as much as possible, both in classroom lectures and in homework assignments. For the social sciences, one excellent and readily available source of live data is the ongoing General Social Survey (GSS) project, The National Data Program for the Social Sciences. This is a project of NORC, the National Opinion Research Center at the University of Chicago (see www.norc.org for the project's main web site).
There are a number of datasets available in different formats. The quick download datasets that I like to use are primarily SPSS data files. Many institutions have SPSS available for students and faculty, but the use of SPSS is by no means universal. I have found that it is easy to use R to read the .sav format files into an R data frame and then write the data out to a comma separated value (.csv) file that can be read by almost any statistics software package. As I will discuss in this and future tutorials, R is also quite effective for analyzing the GSS files.
To create R datasets using the GSS files we can use some of the file import/export features available in R. To begin, make sure that the R packages “Hmisc” and “foreign” are installed and loaded in your R session environment. This can be accomplished using:
> install.packages("Hmisc")   # provides spss.get()
> install.packages("foreign") # provides read.spss(), used by spss.get()
> library(Hmisc)
> library(foreign)
As an example, the following code will load the GSS data file “gss2010x.sav” into an R data frame using the spss.get function from the Hmisc package:
> gssdataframe <- spss.get("/path-to-your-file/gss2010x.sav", use.value.labels=TRUE)
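Setting use.value.labels=TRUE is what turns the numeric codes stored in an SPSS file into readable categories (R factors). The base R sketch below does not read a .sav file; it only mimics, with factor(), the kind of conversion that option performs for each labelled variable. The variable name, codes, and labels are hypothetical and are not taken from the GSS codebook.

```r
# Hypothetical numeric codes as they might be stored in an SPSS file
# (made up for illustration, not GSS codebook values)
sex_codes <- c(1, 2, 2, 1, 2)

# Hypothetical value labels attached to those codes
sex_labels <- c("Male" = 1, "Female" = 2)

# Turn the codes into a factor whose levels are the label text,
# roughly what use.value.labels = TRUE does for a labelled variable
sex <- factor(sex_codes,
              levels = unname(sex_labels),
              labels = names(sex_labels))

print(sex)
print(table(sex))
```

With use.value.labels=FALSE the numeric codes are kept instead, which can be handy if you prefer to recode values yourself against the codebook.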
The file “gss2010x.sav” contains 500 observations of 47 variables. Codebooks and other information about the data in these datasets are readily available for download from the NORC web site. After the data is loaded, the contents of the data frame can be viewed and checked with R's standard inspection functions.
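A few standard base R functions cover the usual checks after loading. The sketch below uses a small made-up data frame, gss_demo (hypothetical, with invented values), as a stand-in; with the real file you would pass gssdataframe instead, and dim() should then report the 500 observations and 47 variables mentioned above.

```r
# Hypothetical stand-in for gssdataframe, with invented values
gss_demo <- data.frame(
  age = c(34, 51, 27, 63),
  sex = factor(c("Male", "Female", "Female", "Male"))
)

dim(gss_demo)      # number of rows (observations) and columns (variables)
str(gss_demo)      # type of each variable plus its first few values
head(gss_demo, 3)  # first three rows
summary(gss_demo)  # per-variable summaries (counts for factors)

# View() opens a spreadsheet-style viewer, so only call it interactively
if (interactive()) View(gss_demo)
```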
To convert the data frame and save it in comma separated value (.csv) format, use the write.table function:
> # write data frame to .csv file
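A minimal sketch of the write.table step follows. It assumes a small hypothetical stand-in data frame (gss_demo) and writes to a temporary file chosen with tempfile() so that it runs anywhere; in practice you would pass gssdataframe and a path of your own ending in .csv. The file is read back with read.csv() only to confirm that the export worked.

```r
# Hypothetical stand-in; replace with gssdataframe for the GSS file
gss_demo <- data.frame(
  age = c(34, 51, 27),
  sex = factor(c("Male", "Female", "Female"))
)

# Temporary output path so the sketch does not depend on your folders
csv_path <- tempfile(fileext = ".csv")

# sep = "," gives comma separated output; row.names = FALSE drops R's
# row numbers; qmethod = "double" escapes embedded quotes the CSV way
write.table(gss_demo, file = csv_path, sep = ",",
            row.names = FALSE, qmethod = "double")

# Read the file back to confirm the round trip (in R >= 4.0.0 text
# columns come back as character rather than factor by default)
check <- read.csv(csv_path)
print(check)
```

utils::write.csv(gss_demo, csv_path, row.names = FALSE) is a shorthand for the same export that fixes sep = "," and qmethod = "double" for you.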
The file, now in .csv format, can be opened with virtually any statistics package or other software. In my next tutorial I will discuss working with GSS data using the various table and cross table functions available in R.