# R Tutorial: A Simple Script to Create and Analyze a Data File, Part Two

A simple R script to create and analyze a data file:part two:    A tutorial by D.M. Wiig

In part one I discussed creating a simple data file containing the height and weight of 10 subjects.  In part two I will discuss the script needed to create a simple scatter diagram of the data and perform a basic Pearson correlation.  Before attempting to continue the script in this tutorial make sure that you have created and save the data file as discussed in part one.

To conduct a correlation/regression analysis of the data we want to first view a simple scatter plot. Load a library named ‘car’ into R memory. Use the command:

> library(car)

Then issue the following command to plot the graph:

> plot(Height~Weight, log=”xy”, data=Sampledatafile)

The output is seen below:

We can calculate a Pearson’s Product Moment correlation coefficient by using the command:

> # Pearson rank-order correlations between height and weight

> cor(Sampledatafile[,c(“Height”,”Weight”)], use=”complete.obs”, method=”pearson”)

Which results in:

Height Weight

Height 1.0000000 0.8813799

Weight 0.8813799 1.0000000

To run a simple linear regression for Height and Weight use the following code. Note that the dependent variable (Weight) is listed firt:

> model <-lm(Weight~Height, data=Sampledatafile)

> summary(model)

Call:

lm(formula = Weight ~ Height, data = Sampledatafile)

Residuals:

Min 1Q Median 3Q Max

-30.6800 -16.9749 -0.8774 19.9982 25.3200

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -337.986 98.403 -3.435 0.008893 **

Height 7.518 1.425 5.277 0.000749 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 21.93 on 8 degrees of freedom

Multiple R-squared: 0.7768, Adjusted R-squared: 0.7489

F-statistic: 27.85 on 1 and 8 DF, p-value: 0.0007489

>

To plot a regression line on the scatter diagram use the following command line. Note that we enter the y (dependent)variable first and then the x (independent)variable:

> scatterplot(Weight~Height, log=”xy”, reg.line=lm, smooth=FALSE, spread=FALSE,

+ data=Sampledatafile)

>

This will produce a graph as seen below. Note that box plots have also been included in the output:

This tutorial has hopefully demonstrated that complex tasks can be accomplished with relatively simple command line script. I will explore more of these simple scripts in future tutorials.

More to Come: