R Tutorial: A Simple Script to Create and Analyze a Data File, Part One
By D.M. Wiig
In this tutorial I will walk you through a simple script that shows how to create a data file and perform some simple statistical procedures on it. I will break the code into segments and discuss what each segment does. Before starting, open a terminal window and start R from the command line.
The first task is to create a simple data file. Let's assume that we have measured the height and weight of each of 10 individuals. The data are shown below:
Height (inches)  Weight (lbs)
72               225
60               128
65               176
75               215
66               145
65               120
70               210
71               176
68               155
77               250
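If you would rather not type the values into the data editor (for example, on a machine without a graphical display), a minimal sketch like the following reads the same table from text with base R's read.table() and its text argument. The column names Height and Weight are chosen to match the ones used later in this tutorial. One difference: read.table() stores whole numbers as integers rather than the 'real' type chosen in the editor, which should not affect the descriptive statistics computed below.

```r
# Read the height/weight table directly from text instead of
# typing it into the data editor
Sampledatafile <- read.table(header = TRUE, text = "
Height Weight
72 225
60 128
65 176
75 215
66 145
65 120
70 210
71 176
68 155
77 250
")

# Confirm that 10 rows and 2 columns were read
dim(Sampledatafile)
```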
We can enter the data into a data frame by invoking R's data editor and typing in the values. Note that lines of code beginning with # are comments and are ignored by R:
#Create a new, empty data frame and invoke the data editor to enter data
#Create the data frame Sampledatafile: height and weight of 10 subjects
Sampledatafile <- data.frame()
Sampledatafile <- edit(Sampledatafile)
The R Data Editor window will open. Click on the column heading 'var1' and you will see several data types in the drop-down menu. Choose the 'real' data type, then follow the same procedure to set the data type for the second column. Enter the data pairs with height in the first column and weight in the second. When the data have been entered, click on the 'var1' heading for column 1, click 'Change Name', and enter 'Height' to label the first column. Follow the same steps to rename the second column 'Weight'.
Once both columns of data have been entered, click 'Quit'. The data frame 'Sampledatafile' is now loaded into memory.
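The data frame built in the editor lives only in the current R session's workspace. A minimal sketch, assuming you want a file on disk that can be reloaded later, is to check the column types with str() and save the data with base R's write.csv(). The file name Sampledatafile.csv is hypothetical, and the sketch writes to tempdir() so it runs anywhere; it also recreates Sampledatafile with data.frame() so it runs on its own.

```r
# Stand-in for the data frame built in the data editor
Sampledatafile <- data.frame(
  Height = c(72, 60, 65, 75, 66, 65, 70, 71, 68, 77),
  Weight = c(225, 128, 176, 215, 145, 120, 210, 176, 155, 250)
)

# Check that both columns are numeric, as chosen in the editor
str(Sampledatafile)

# Save to a CSV file (hypothetical name) so the data outlive the session
csv_path <- file.path(tempdir(), "Sampledatafile.csv")
write.csv(Sampledatafile, csv_path, row.names = FALSE)

# Read the file back, as you might in a later session
Reloaded <- read.csv(csv_path)
print(Reloaded)
```

Setting row.names = FALSE keeps write.csv() from adding an extra, unnamed column of row numbers to the file.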
To run some basic descriptive statistics, use the following code:
> #Run descriptives on the data
> summary(Sampledatafile)
The output from this code will be:
     Height          Weight
 Min.   :60.00   Min.   :120.0
 1st Qu.:65.25   1st Qu.:147.5
 Median :69.00   Median :176.0
 Mean   :68.90   Mean   :180.0
 3rd Qu.:71.75   3rd Qu.:213.8
 Max.   :77.00   Max.   :250.0
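Each figure in the summary() output can also be computed on its own, which is handy when you need just one statistic. The sketch below recreates Sampledatafile with data.frame() so it runs independently, and uses the base R functions mean(), median(), quantile() and sapply(). quantile() uses R's default method (type 7), which gives the same quartiles summary() reports; the Weight third quartile is 213.75, which summary() displays rounded as 213.8. summary() does not report a standard deviation, so sd() is included as well.

```r
Sampledatafile <- data.frame(
  Height = c(72, 60, 65, 75, 66, 65, 70, 71, 68, 77),
  Weight = c(225, 128, 176, 215, 145, 120, 210, 176, 155, 250)
)

# Individual statistics for one column
mean(Sampledatafile$Height)      # 68.9
median(Sampledatafile$Height)    # 69

# Minimum, quartiles and maximum: 120, 147.5, 176, 213.75, 250
quantile(Sampledatafile$Weight)

# Apply a function to every column at once
sapply(Sampledatafile, mean)     # Height 68.9, Weight 180
sapply(Sampledatafile, sd)       # standard deviations (not shown by summary)
```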
To view the data file use the following lines of code:
> #print the data frame 'Sampledatafile' on the screen
> print(Sampledatafile)
You will see the output:
   Height Weight
1      72    225
2      60    128
3      65    176
4      75    215
5      66    145
6      65    120
7      70    210
8      71    176
9      68    155
10     77    250
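print() shows the whole data frame, which is fine for 10 rows; with larger files you will often want only part of it. A short sketch using standard base R indexing follows; it recreates Sampledatafile so it runs on its own, and the 70-inch cutoff is arbitrary, chosen only for illustration.

```r
Sampledatafile <- data.frame(
  Height = c(72, 60, 65, 75, 66, 65, 70, 71, 68, 77),
  Weight = c(225, 128, 176, 215, 145, 120, 210, 176, 155, 250)
)

# A single column, extracted as a plain vector
Sampledatafile$Weight

# Rows where height is at least 70 inches (rows 1, 4, 7, 8 and 10)
tall <- Sampledatafile[Sampledatafile$Height >= 70, ]
print(tall)

# Only the first three rows
head(Sampledatafile, 3)
```

The trailing comma inside the brackets means "all columns", so only rows are filtered.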
In Part Two I will discuss an R script that computes a simple correlation and draws a scatter diagram. Check back later!