For anyone interested in researching social science questions, a wealth of survey data is available through the National Opinion Research Center (NORC) and its associated research universities. The Center has conducted a national survey, the General Social Survey (GSS), since 1972 and has compiled a massive database from these surveys. Most if not all of these data files can be downloaded without charge. I have been working with the 2014 edition of the data, and for this tutorial I will use the GSS2014 data file that is available for download on the Center’s web site. (See the NORC main website at http://www.norc.org/Research/Projects/Pages/general-social-survey.aspx and at http://www3.norc.org/GSS+Website.)
As noted above, the datasets are available for download in both SPSS and STATA formats. To work with either format in R, you need to read the file into a data frame using one of a couple of packages. The first option I will discuss uses the Hmisc package; the second uses the foreign package. Install both packages from your favorite CRAN mirror before running the code in this tutorial.
For this tutorial I am using the one-year release file GSS2014. This file contains 2538 cases and 866 variables. Download the file from the web site listed above in both SPSS and STATA formats. Use the following code to load the Hmisc package into your R session:
>require(Hmisc)
Now load the GSS2014.sav SPSS version from your storage device using the following line of code. I am using the filename GSS2014 for my data file and loading the file into the data frame ‘gss14’:
>#load the GSS data file in SPSS format
>#put data into data frame 'gss14'
>gss14 <- spss.get("F:/research/Documents/GSS2014.sav", use.value.labels=TRUE)
>
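A mistyped path is a common reason a load call like the one above fails. As a minimal sketch using only base R, you can build the path with file.path() and check it with file.exists() before calling the reader. The names data_dir and sav_path are illustrative and the folder is only an assumption; substitute your own location.

```r
# Hypothetical folder; replace with wherever you saved the download.
data_dir <- "F:/research/Documents"
sav_path <- file.path(data_dir, "GSS2014.sav")

# file.exists() returns TRUE or FALSE without raising an error,
# so it is safe to run even if the drive is not attached.
if (file.exists(sav_path)) {
  message("Found ", sav_path)
} else {
  message("File not found, check the path: ", sav_path)
}
```

If the check passes, sav_path can be handed to the loading function in place of the literal string.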
To view the data that was loaded use the command:
>View(gss14)
This will produce a spreadsheet-like grid of rows and columns containing the data. To load the data file in STATA format, download the STATA version of the file from the NORC web site as discussed above. My STATA file is also named GSS2014, but with the STATA .dta extension. Load the file into a data frame using:
>#load STATA format file into data frame 'Dataset2'
>#read.dta() is provided by the foreign package, so load it first
>require(foreign)
>Dataset2 <- read.dta("F:/research/Documents/GSS2014.dta")
>
Once again, you can view the data frame loaded using the command:
>View(Dataset2)
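View() opens a graphical viewer, which can be slow to scan when a file has hundreds of variables. As a rough sketch of a console-based alternative, the base R functions below summarize a data frame without opening a window. The toy data frame and its column names are made up for illustration and are not claimed to match the GSS file.

```r
# Toy stand-in for the survey data; the column names are hypothetical.
toy <- data.frame(
  id     = 1:4,
  age    = c(34, 51, 27, 68),
  region = factor(c("east", "west", "west", "south"))
)

dim(toy)        # number of rows (cases) and columns (variables)
names(toy)      # variable names
str(toy)        # type of each column plus the first few values
head(toy, 2)    # first two rows
```

Applied to the real data, dim() gives a quick check that the load produced the number of cases and variables you expect for the release file.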
Both the STATA and SPSS formats of the data set can also be loaded into R using the foreign package directly. The procedure is much the same for both formats:
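>#load SPSS version into data frame 'Dataset'
>require(foreign)
>#to.data.frame=TRUE is needed because read.spss() returns a list by default
>Dataset <- read.spss("F:/research/Documents/GSS2014.sav", use.value.labels=TRUE, to.data.frame=TRUE)
>#load STATA version into data frame 'Dataset3'
>Dataset3 <- read.dta("E:/research/Documents/GSS2014.dta")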
Use the ‘View()’ command to view the data frame.
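To see how value labels behave when a STATA file is read, here is a small self-contained sketch that does not need the GSS download. It writes a toy data frame to a temporary .dta file with write.dta() and reads it back twice with read.dta(), once with the default settings and once with convert.factors=FALSE. The toy data and its column names are assumptions made up for the example.

```r
library(foreign)  # provides write.dta() and read.dta()

# Toy data; the column names are hypothetical.
toy <- data.frame(
  age = c(34, 51, 27),
  sex = factor(c("male", "female", "female"))
)

# Write a Stata file to a temporary location, then read it back.
dta_path <- tempfile(fileext = ".dta")
write.dta(toy, dta_path)

with_labels <- read.dta(dta_path)                           # labelled values become factors
codes_only  <- read.dta(dta_path, convert.factors = FALSE)  # keep the underlying numeric codes

class(with_labels$sex)
class(codes_only$sex)
```

Which form is more convenient depends on the analysis: factors are easier to read in tables, while numeric codes can be simpler to recode.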
In part two I will discuss some techniques using R to create and analyze subsets of the GSS2014 data file.