
R Tutorial: A Script to Create and Analyze a Simple Data File, Part One



By D.M. Wiig

In this tutorial I will walk you through a simple script that creates a data file and runs some basic statistical procedures on it. I will break the code into segments and discuss what each segment does. Before starting, open a terminal window and start R from the command line.

The first task is to create a simple data file. Let’s assume that we have measured the height and weight of 10 individuals. The data are shown below:

Height (inches)   Weight (lbs)
72                225
60                128
65                176
75                215
66                145
65                120
70                210
71                176
68                155
77                250

We can enter the data into a data frame by invoking the data editor and typing in the values. Note that lines of code beginning with # are comments and are ignored by R:

#Create a new, empty data frame and invoke the data editor to enter data
#Create the file Sampledatafile, height and weight of 10 subjects
Sampledatafile <- data.frame()
Sampledatafile <- edit(Sampledatafile)

You will see the R Data Editor window open. Click on the column heading ‘var1’ and you will see several data types in the drop-down menu. Choose the ‘real’ data type. Follow the same procedure to set the data type for the second column. Enter the data pairs in the columns, with height in the first column and weight in the second column. When the data have been entered, click on the var1 heading for column 1 and click ‘Change Name.’ Enter ‘Height’ to label the first column. Follow the same steps to rename the second column ‘Weight.’

Once both columns of data have been entered you can click ‘Quit.’ The datafile ‘Sampledatafile’ is now loaded into memory.
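If the graphical data editor is not available, or you would rather not type values into a grid, the same data frame can be built directly in code. The following is a minimal sketch and not part of the original script; it reuses the name Sampledatafile and the column names Height and Weight from above.

```r
# Build the same data frame directly, without the data editor
Sampledatafile <- data.frame(
  Height = c(72, 60, 65, 75, 66, 65, 70, 71, 68, 77),           # inches
  Weight = c(225, 128, 176, 215, 145, 120, 210, 176, 155, 250)  # lbs
)

# Check the structure: column names, types, and number of observations
str(Sampledatafile)
```

Either approach leaves a data frame named Sampledatafile in memory, so the rest of the script works the same way.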

To run some basic descriptive statistics use the following code:

> #Run descriptives on the data

> summary(Sampledatafile)

The output from this code will be:

     Height          Weight
 Min.   :60.00   Min.   :120.0
 1st Qu.:65.25   1st Qu.:147.5
 Median :69.00   Median :176.0
 Mean   :68.90   Mean   :180.0
 3rd Qu.:71.75   3rd Qu.:213.8
 Max.   :77.00   Max.   :250.0

>
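The figures reported by summary() can also be requested one at a time, which is handy when only one or two of them are needed. The sketch below is an illustration rather than part of the original script. It rebuilds the data frame so that it runs on its own, and it adds the standard deviation, which summary() does not report. The variable names col_means, col_medians, height_quartiles, and col_sds are chosen here for illustration.

```r
# Recreate the data so this example runs on its own
Sampledatafile <- data.frame(
  Height = c(72, 60, 65, 75, 66, 65, 70, 71, 68, 77),
  Weight = c(225, 128, 176, 215, 145, 120, 210, 176, 155, 250)
)

# Mean and median of each column
col_means   <- sapply(Sampledatafile, mean)
col_medians <- sapply(Sampledatafile, median)

# First and third quartiles of Height, using R's default quantile method
height_quartiles <- quantile(Sampledatafile$Height, probs = c(0.25, 0.75))

# Standard deviation is not part of summary() and is requested separately
col_sds <- sapply(Sampledatafile, sd)

print(col_means)
print(col_medians)
print(height_quartiles)
print(col_sds)
```

The means, medians, and quartiles should agree with the summary() output above, for example a mean Height of 68.90 and a median Weight of 176.0.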

To view the data file use the following lines of code:

> #Print the data file ‘Sampledatafile’ on the screen

> print(Sampledatafile)

You will see the output:

   Height Weight
1      72    225
2      60    128
3      65    176
4      75    215
5      66    145
6      65    120
7      70    210
8      71    176
9      68    155
10     77    250
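Printing the whole data frame works well for 10 rows, but for larger files it is often enough to look at the dimensions and the first few rows. The following sketch is an addition to the original script and rebuilds the data frame so that it runs on its own; the names file_dims and first_rows are chosen here for illustration.

```r
# Recreate the data so this example runs on its own
Sampledatafile <- data.frame(
  Height = c(72, 60, 65, 75, 66, 65, 70, 71, 68, 77),
  Weight = c(225, 128, 176, 215, 145, 120, 210, 176, 155, 250)
)

# Number of rows and columns
file_dims <- dim(Sampledatafile)
print(file_dims)

# First three rows only
first_rows <- head(Sampledatafile, 3)
print(first_rows)
```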

In Part Two I will discuss an R script to do a simple correlation and scatter diagram.  Check back later!