How To: Install a fast efficient web server on your Raspberry Pi

A tutorial by D.M. Wiig

In a previous tutorial I discussed setting up your RPi as a web server using a standard LAMP stack configuration. I have been using my RPi with this installation to self-host a WordPress web site and have had very good success. For situations where a heavy load or memory-intensive applications are anticipated, however, the standard LAMP stack may not be sufficient. The Apache server and MySQL database used in that configuration work well, but the RPi is memory limited and may not handle heavy loads effectively.

In these kinds of environments it may be worthwhile to set up your RPi with faster and more memory-efficient software. The nginx server (pronounced 'engine x') is designed to be fast and memory efficient, and it is an easy installation on an RPi running the Raspbian OS (based on Debian Wheezy). The first part of this tutorial will guide you through the basic setup and initial configuration of nginx on your RPi.

To get started it is a good idea to make sure that all of your installed software is up to date. Open a terminal program and refresh the package lists, then upgrade any outdated packages:

$ sudo apt-get update
$ sudo apt-get upgrade

When the update and upgrade have completed, issue the following command to install the nginx server:

$ sudo apt-get install nginx

This command will download and install the nginx server and all dependencies needed for the server to run. By default the server's configuration files are installed in the directory /etc/nginx, so issue the following commands to change to that directory and list its contents:

$ cd /etc/nginx
$ dir

You should see the output shown below:

pi@raspberrypi /etc/nginx $ dir
conf.d koi-utf mime.types naxsi.rules proxy_params sites-available uwsgi_params
fastcgi_params koi-win naxsi_core.rules nginx.conf scgi_params sites-enabled win-utf
pi@raspberrypi /etc/nginx $

We have several options at this point, but we will stay with the default configuration found in the 'sites-enabled' directory. Issue the commands:

$ cd sites-enabled
$ dir

You should see the following:

pi@raspberrypi /etc/nginx $ cd sites-enabled
pi@raspberrypi /etc/nginx/sites-enabled $ dir
default
pi@raspberrypi /etc/nginx/sites-enabled $

We will make a couple of changes to the default configuration file using the nano editor. Make sure you give yourself ‘super user’ privileges by using:

$ sudo nano default

You will see the entire configuration file in the editor, but for now we are only interested in the following section:

##

server {
        listen 80; ## listen for ipv4; this line is default and implied
        #listen [::]:80 default_server ipv6only=on; ## listen for ipv6

        root /usr/share/nginx/www;
        index index.html index.htm;

        # Make site accessible from http://localhost/
        server_name localhost;

Uncomment the 'listen 80' line if it is commented out (with a #). Do the same for the 'server_name localhost' line if it is commented out. Exit the editor by pressing Ctrl-O to write the file out and then Ctrl-X to exit. We can now create a test page by editing the default index page served by nginx. To get to that page issue the commands:

$ cd /usr/share/nginx/www
$ dir

You will see:

pi@raspberrypi /etc/nginx/sites-enabled $ cd /usr/share/nginx/www
pi@raspberrypi /usr/share/nginx/www $ dir
50x.html index.html

To edit the index.html file invoke the editor with:

$ sudo nano index.html

You should see:

GNU nano 2.2.6 File: index.html

<html>
<head>
<title>Welcome to nginx!</title>
</head>
<body bgcolor="white" text="black">
<center><h1>Welcome to nginx!</h1></center>

</body>
</html>

[ Read 9 lines ]
^G Get Help ^O WriteOut ^R Read File ^Y Prev Page ^K Cut Text ^C Cur Pos
^X Exit ^J Justify ^W Where Is ^V Next Page ^U UnCut Text ^T To Spell
Let's modify the file by inserting a new line of text, 'This is some basic HTML to serve!'. After the <center><h1>Welcome to nginx!</h1></center> line insert the following:

<br><center><h2>This is some basic HTML to serve!</h2></center>

Your file should look like this:

GNU nano 2.2.6 File: index.html

<html>
<head>
<title>Welcome to nginx!</title>
</head>
<body bgcolor="white" text="black">
<center><h1>Welcome to nginx!</h1></center>
<br><center><h2>This is some basic HTML to serve!</h2></center>
</body>
</html>

[ Read 9 lines ]
^G Get Help ^O WriteOut ^R Read File ^Y Prev Page ^K Cut Text ^C Cur Pos
^X Exit ^J Justify ^W Where Is ^V Next Page ^U UnCut Text ^T To Spell

Exit the editor with the Ctrl-O, Ctrl-X commands.
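Before starting the server it is worth asking nginx to check the configuration syntax, since we edited the default site file above. The -t option is a standard nginx switch that tests the configuration without serving anything:

$ sudo nginx -t

If the edits are valid, nginx will report that the configuration file syntax is ok and the test is successful.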

At this point we can start the nginx service and check to make sure that the server is up and running properly. Start it with:

pi@raspberrypi /usr/share/nginx/www $ sudo service nginx start
Starting nginx: nginx.
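If you would like a quick check from the RPi itself before trying a browser, curl can fetch the page from the command line (a small sketch; install curl first with sudo apt-get install curl if it is not already present):

$ curl http://localhost

The HTML of the index page edited above should be printed to the terminal.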

To check the server open a web browser on the RPi and enter the URL http://localhost. You should see the index page message displayed. If you have access to another computer on the same network as your RPi you can also check the server by entering the RPi's local IP address in that machine's browser; you should see the same index page greeting. At this point your RPi can serve basic HTML markup to any device on the local network. In the next tutorial I will discuss how to set up and configure PHP and MySQL for use with your nginx server.

 

How To: Installing an FTP server on your Raspberry Pi

(Author: D.M. Wiig. Your comments and questions are welcomed.)

If you are running any of the Linux distributions on your RPi it is a fairly simple process to install a basic FTP server. There are a number of very good FTP servers available for Linux, with vsftpd being one of the most popular. This tutorial will cover the specifics of a vsftpd installation; the process is very similar for other popular FTP servers.

For the sake of simplicity this tutorial will only cover the basic installation of vsftpd in anonymous mode. This means that any user can access the FTP server and download files without a login procedure. By default vsftpd is configured to allow anonymous access and downloads only. A future tutorial will discuss setting the options for authentication mode and other client services.

To install vsftpd log in to your RPi and at the command prompt enter:

$ sudo apt-get install vsftpd

When the installation is complete you will see the line:

Starting FTP server: vsftpd
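The behavior of vsftpd is controlled by the file /etc/vsftpd.conf. For reference, an anonymous, download-only setup like the one described above looks something like the sketch below. These are standard vsftpd option names, and the stock Debian configuration already ships with suitable defaults, so no editing is required for this tutorial:

# /etc/vsftpd.conf (excerpt)
listen=YES               # run vsftpd as a standalone daemon
anonymous_enable=YES     # allow anonymous logins with no account required
local_enable=NO          # no logins from local system accounts in this basic setup
write_enable=NO          # download only; no uploads or other write operations
anon_upload_enable=NO    # explicitly disable anonymous uploads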

By default an ftp user is created with a home directory of /srv/ftp; this is the default FTP directory. You can check to make sure the FTP server is up and running properly by doing the following:

From the command line issue the command:

$ cd /srv/ftp

Once you are in the ftp directory create a test file by using the following commands:

$ sudo nano ftpsamplefile

Nano is a fast and simple text editor included with Raspbian. When it opens you will get a blank screen. Type in something like:

This is a sample file loaded in the default ftp directory

/srv/ftp

4/1/14

After you have entered the text press Ctrl-O to save the file you have created and then Ctrl-X to exit nano. Next, enter the following command on your RPi to determine its local network address:

$ sudo ifconfig

You will see a number of lines of information. One of the top lines should contain something like:

inet addr:192.xxx.x.xx

This sequence of numbers is the local network address of your Rpi. You can ftp to it from any device on your network, but at this point you are not connected to the outside world so you cannot connect to it from outside of your router and network without some additional steps. To test your installation, open a web browser on any device that you have connected to the same network as your Rpi and enter:

ftp://192.xxx.x.xx (substitute your address found with the ifconfig command for the x’s)

You should see the ftp user directory that will look something like the example below:

Index of /

Name             Size    Date Modified
ftpsamplefile    69 B    3/31/13 8:05:00 PM

The ftp folder will contain the sample file that you created previously. By clicking on the file you should see the contents:

This is a sample ftp file loaded in the directory

/srv/ftp
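You can also check the server from the command line on another networked machine using a standard ftp client (a quick sketch; substitute your RPi's address and log in as 'anonymous' when prompted; any password, or none, is accepted):

$ ftp 192.xxx.x.xx
ftp> ls                  # list the contents of /srv/ftp
ftp> get ftpsamplefile   # download the sample file
ftp> quit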

Your FTP server is now up and running. I will cover topics such as file upload and download as well as creating user authentication and login procedures in another 'How To' tutorial.

 

How To Connect Your Raspberry Pi Server to the Outside World

A tutorial by D.M. Wiig

 

Once you have your server set up and running you need to make some changes to your home network to allow the Pi to connect to the outside world. A typical home network consists of an Internet connection to a router through which local devices are connected. You can usually connect to your router by pointing your browser at http://192.168.0.1 or something similar. This is the local address of the router and will give you a number of menu options for changing router settings and getting router information.

All devices connected to the router will have addresses that are similar to 192.xxx.x.xx. You can determine the local address of your Pi by looking at the ‘Device Table’ information in your router or by opening up a command line terminal interface and entering the command:

pi@raspberrypi ~ $ sudo ifconfig

eth0      Link encap:Ethernet  HWaddr b8:27:eb:be:3f:bd
          inet addr:192.168.0.27  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:15804 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5686 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1205858 (1.1 MiB)  TX bytes:2103034 (2.0 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:356 errors:0 dropped:0 overruns:0 frame:0
          TX packets:356 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:27968 (27.3 KiB)  TX bytes:27968 (27.3 KiB)

The 'inet addr' entry in the eth0 section shows the local address of my Pi (192.168.0.27); yours should be similar. Normally all devices connected to the local network can send input and output to the web through the router but are behind a firewall and cannot be accessed directly from the web. In order to use your Pi as a server you need to provide a path through the firewall so that it can be accessed from the web. This is done on most routers by using the 'Port Forwarding' feature. Most web servers, including the one on the Pi, 'listen' for connection requests on port 80. You can use the port forwarding settings on your router to forward port 80 on the router to the local address of your Pi. On my router the settings page looks like this:

Port Forwarding: Enter ports or port ranges required to forward Internet applications to a LAN device below.

1. Enter the LAN port and IP information.
   Starting Port:
   Ending Port:
   Protocol:
   LAN IP Address:

2. Enter the remote port and IP information. (Optional)
   Starting Port:
   Ending Port:
   Remote IP Address:
   (Use 0.0.0.0 for any IP Address)

3. Click "Apply" to save your settings.

 

Port Forwarding List

LAN Ports   Protocol   LAN IP Address   Remote Ports   Remote IP Address
80 - 80     TCP        192.168.0.27     N/A            N/A
21 - 21     TCP        192.168.0.27     N/A            N/A
22 - 22     TCP        192.168.0.27     N/A            N/A

As you can see I have several ports forwarded to the Pi. Port 21 is commonly used for FTP access and port 22 for SSH access. These are topics for future postings.

Please note that the 192.xxx.x.xx address is a local address and not the address that your router uses to connect to your ISP. The ISP-assigned address can be found in the router's general information and status tables. If your Pi is set up correctly you should be able to connect to the Pi server from any device connected to the internet by entering the URL http://xxx.xx.xx.x (your router's public ISP address) in a browser. You should see the default Apache web page:

It works!

This is the default web page for this server.

The web server software is running but no content has been added, yet.
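The same check can be made from a command line outside your network using curl (a sketch; substitute your router's public ISP address for the placeholder):

$ curl http://xxx.xx.xx.x

The HTML of the 'It works!' default page shown above should be returned.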

More to Come:

 

How To: Download and install the latest version of R on your Linux Ubuntu OS

(A Tutorial by D.M. Wiig)

I have several computers that use Linux operating systems and I have installed R on all of them. I use Debian on some of the machines and Ubuntu on others. When downloading R using the distribution's package manager or from the command line I have noticed that I get versions of R ranging from 2.13.xx to 2.15.xx depending on the Linux distribution. That has not been a problem until the release of the current version of R, version 3.0.3. Since packages built for this version are not backwards compatible with earlier releases, it is necessary to upgrade to the new version to take advantage of new packages that are rapidly being developed, as well as modifications to existing packages that accommodate R 3.0.x. This tutorial will cover the installation of R 3.0.3 on the Ubuntu distribution of Linux.

When installing R 3.0.3 it is necessary to make sure that the package manager can see current binaries for your version of the Linux OS. If you are running an Ubuntu distribution you can edit the sources.list file on your computer to point to a CRAN mirror that carries the most up-to-date builds. Open a terminal program and enter the following from the command line:

$ cd /etc/apt/

$ dir

Make sure the file sources.list is in the directory and then open the file in the nano editor:

$ sudo nano sources.list

You should see a file in the editor that is similar to the file shown below:

—————————————————————————————————-

deb cdrom:[Kubuntu 11.10 _Oneiric Ocelot_ - Release i386 (20111012)]/ oneiric main restricted

deb http://streaming.stat.iastate.edu/CRAN/bin/linux/ubuntu precise/

# See http://help.ubuntu.com/community/UpgradeNotes for how to upgrade to
# newer versions of the distribution.
deb http://us.archive.ubuntu.com/ubuntu/ precise main restricted
deb-src http://us.archive.ubuntu.com/ubuntu/ precise main restricted

## Major bug fix updates produced after the final release of the
## distribution.
deb http://us.archive.ubuntu.com/ubuntu/ precise-updates main restricted
deb-src http://us.archive.ubuntu.com/ubuntu/ precise-updates main restricted

## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team. Also, please note that software in universe WILL NOT receive any
## review or updates from the Ubuntu security team.
deb http://us.archive.ubuntu.com/ubuntu/ precise universe
deb-src http://us.archive.ubuntu.com/ubuntu/ precise universe
deb http://us.archive.ubuntu.com/ubuntu/ precise-updates universe
deb-src http://us.archive.ubuntu.com/ubuntu/ precise-updates universe

## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu

^G Get Help ^O WriteOut ^R Read File ^Y Prev Page ^K Cut Text ^C Cur Pos
^X Exit ^J Justify ^W Where Is ^V Next Page ^U UnCut Text ^T To Spell

—————————————————————————

The second line shown above, 'deb http://streaming.stat.iastate.edu/CRAN/bin/linux/ubuntu precise/', is the line that I added to this file. It points the package manager at a CRAN repository that carries the latest build of R, one that is not normally searched if you have an earlier version of R installed from the standard repositories. For an Ubuntu distribution add one of the following lines, depending on the release that you have installed:

deb http://<myfavorite-cran-mirror>/bin/linux/ubuntu saucy/

deb http://<myfavorite-cran-mirror>/bin/linux/ubuntu quantal/

deb http://<myfavorite-cran-mirror>/bin/linux/ubuntu precise/

deb http://<myfavorite-cran-mirror>/bin/linux/ubuntu lucid/

Replace <myfavorite-cran-mirror> with the CRAN mirror of your choice from the list at http://cran.r-project.org/mirrors.html. In my case, as shown above, I used a CRAN mirror here in Iowa at Iowa State University. Once the line has been entered in your sources.list file press Ctrl-O to save the file and Ctrl-X to exit the editor. Be sure when you invoke nano that you have root privileges (by using sudo nano) or you will not be able to write out the modified file.
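If you prefer not to edit the file by hand, the same repository line can be appended from the command line instead (a sketch using the Iowa State mirror shown above; substitute your own mirror):

$ echo 'deb http://streaming.stat.iastate.edu/CRAN/bin/linux/ubuntu precise/' | sudo tee -a /etc/apt/sources.list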

Once you have successfully modified the sources.list file proceed with the R 3.0.3 installation by issuing the command:

$ sudo apt-get update (to refresh the package lists so the new repository is picked up)

and then:

$ sudo apt-get install r-base
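If you plan to compile R packages from source you may also want the companion development package, which is a standard Debian/Ubuntu package:

$ sudo apt-get install r-base-dev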

When the installation runs you should see that R 3.0.x is downloaded and installed. After the installation is complete test it by issuing the command:

$ R

You will see the output as shown below:

———————————————

R version 3.0.3 (2014-03-06) -- "Warm Puppy"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: i686-pc-linux-gnu (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>

You are now up and running with the latest version of R. The process for installation of R 3.0.x is similar for Debian and Fedora distributions. Each of these will be covered in a future tutorial.

 

Book Review: Raspberry Pi Super Cluster


 

Andrew K. Dennis. Raspberry Pi Super Cluster. Birmingham, England: PACKT Publishing, 2013.

A book review by D.M. Wiig

In the computer world clusters and supercomputers are used for some of the most demanding and complex tasks facing today's technology. Raspberry Pi Super Cluster by Andrew Dennis is a recently published work that demonstrates how this technology can be explored right in your own home or in the classroom using modest, inexpensive hardware and readily available free open source software.

This book is a well written and easy to understand introduction to the theory and practice of parallel computing that is suitable for hobbyists, educators or others who want to explore this interesting facet of computing. The widespread availability and low price of the Raspberry Pi computer make building a real parallel computing cluster possible for anyone who is interested in exploring this topic. In order to get the most from this book the reader should have some experience in working with computers and programming languages. A knowledge of the concepts involved in parallel and cluster computing is not required, as the author covers the basics of these topics quite thoroughly. Some knowledge of working with the Raspberry Pi and the Linux command line interface is also desirable.

The author starts out in chapter one with a discussion of some of the basic concepts involved in parallel computing such as supercomputers, multi-core and multi-processor machines, and cloud computing. Central to this introduction is the concept of commodity hardware clusters. The idea of using groups of commodity, off-the-shelf computers in this way was pioneered in the 1990s; such systems became known as Beowulf clusters, a realization of the Network of Workstations (NOW) concept for scientific computing. The author concludes the introduction with a discussion of the Raspberry Pi computer, which forms the basis of the computing cluster developed in the book. There is also a brief consideration of programming languages such as C, C++, and FORTRAN, which are commonly used in Linux-based computer clusters.

The author moves on to discuss in detail the hardware and software required to set up the cluster. Topics include setting up the Raspberry Pi, downloading and installing the Raspbian operating system on an SD card, and the initial setup of options such as SSH, the nano text editor, and the GCC FORTRAN compiler.

Chapter three of the book is devoted to the basics of setting up the foundation of a parallel computing environment using an implementation of MPI (the Message Passing Interface). The book presents a step by step approach to downloading, installing, and configuring the MPICH software, which provides that implementation. Once the system has been set up and tested on the first RPi the author turns to the task of setting up the second RPi that will be used in the configuration. It should be noted that the author provides abundant and detailed references to additional resources that the reader can access to assist in understanding or expanding upon the procedures discussed in the book. When the second RPi has been set up the author presents the design of a test program that will be used to check the installation, including a detailed discussion of the code that is used. There is a nice feature of books published by PACKT that should be noted at this point: if the reader purchased the book directly from the publisher there is access to a download of all of the code that is presented in the book. This is a tremendous time saving feature and can help reduce coding mistakes that can lead to frustrating and hard to find errors.

While the first half of the book deals primarily with the installation and configuration of the RPi parallel cluster, the second half of the book deals with the application and development of distributed applications that will run on the RPi cluster. The author starts with a discussion of the technology known as Apache Hadoop, which is an open source project for developing distributed applications and is hosted by the Apache Software Foundation. The reader is then taken through the process of downloading and installing Java and the Java Development Kit, and downloading, installing, configuring, and testing the Hadoop server. Once again, there is a detailed and relatively easy to understand presentation of each step involved in the process. The author then turns to the setup of the second RPi, which is very similar to the setup for the first RPi. The second RPi setup tends to go faster as there is some duplication of configuration files.

The remaining chapters of the book are devoted to a presentation of some specific applications that can be run on the RPi cluster. There is a nice discussion of using the MapReduce programming approach on the RPi cluster. MapReduce is a programming model that allows systems to process large datasets in parallel. The author takes the reader through an overview of the WordCount MapReduce program and a step by step test of this program on the RPi cluster. There is also a chapter devoted to Monte Carlo simulators, which use repeated randomized sampling over large data sets in order to obtain a result for a particular mathematical question. The reader is walked through an example of using this technique on the RPi cluster to calculate Pi. The last chapter of the book explores other topics relating to the RPi cluster, such as adding an external USB disk drive for greater storage capacity and installing and experimenting with the FORTRAN programming language on the cluster.

I found this book to be interesting, informative and challenging. It stimulated my interest in furthering my knowledge of cluster computing and the potential of the Raspberry Pi computer in that endeavor. I am a big fan of open source projects and I currently own two RPi’s. One is being used as a dedicated web server that hosts my WordPress Raspberry Pi and R statistics web site. The other is for experimental purposes. After reviewing this book I am planning to add a third (or fourth) RPi to my collection so that I can experiment with parallel computing. I recommend this book to computer users at all levels. It will help you in reading the book if you have some experience with computer hardware, operating systems, and programming languages, but for those less knowledgeable readers the author provides abundant links to additional information, source code and other sources that make this a good read for those with less hands on experience.

————————————————————

Author Information: Douglas M. Wiig

I am a Professor of Political Science at Grand View University in Des Moines, Iowa, USA. My teaching areas of expertise include social science statistics, social science research methods, and comparative and international politics. I am also interested in developing methods to integrate technology into the university curriculum. I have used computers and various programming languages in the classroom, in academic research and writing, and in personal projects since the days when data and programming instructions were entered into mainframe behemoths on punched cards and personal computing platforms were still a dream. I am a big fan of open source projects and contribute whatever I can to the continuing growth and success of the community.

Contact Information: Douglas M. Wiig

Email: dwiig@grandview.edu dmartin6412@gmail.com

Web Site/Blog: http://raspberrypiandr.net

 

Book Review: Raspberry Pi Server Essentials


 

Book Review: Piotr J. Kula. Raspberry Pi Server Essentials. Birmingham, UK: Packt Publishing, 2014.

 A book review by D.M. Wiig

Raspberry Pi Server Essentials is an informative, step by step discussion of how this amazing little computer can be set up as a fully functioning web server. The book begins with a discussion of the basics of setting up a Raspberry Pi and walks the reader through the process of obtaining the necessary hardware, installing the Raspbian operating system, and performing initial system configuration. There is also a brief discussion of the design of the Raspberry Pi for readers who are more technically inclined.

I might point out that if the reader is not comfortable working at the command line and performing system operations such as formatting and writing disks or working with directories, this section may be a little daunting. Less technically inclined readers may want to purchase an SD card that is preloaded with the Raspberry Pi operating system software. These cards are available from a number of sources at a reasonable price and provide plug-and-play convenience.

After discussing the Raspberry Pi hardware setup the author moves to a consideration of network configuration from Local Area Networks to wireless and Ethernet connections. Once again there is a concise presentation of some of the basics for readers who have some experience working with routers and home networks. After a discussion of performing Raspberry Pi system updates and some basic system monitoring functions the author turns to the task of installing a web server on the Raspberry Pi.

There are several good open source web servers available for Linux operating systems, such as the Apache server, but the author points out that while these servers contain a number of useful features and are very powerful, they are also cumbersome when used on a computer with limited RAM and a relatively slow processor such as the Raspberry Pi. The use of a fast, lightweight web server called nginx (pronounced 'engine x') is one solution to this problem. Nginx is designed to deliver the maximum content with a minimum load on system resources. The author first walks the reader through a discussion of downloading and installing nginx. There is also a discussion of downloading and setting up SQLite3, a lightweight SQL database engine, to run alongside the server.

The remaining chapters of the book discuss how to set up and use a number of useful applications on your now functioning Raspberry Pi web server. These applications include setting up and managing a file server, using the Raspberry Pi as a game server for popular open source games such as OpenTTD, using the official HD camera module designed by the Raspberry Pi Foundation for streaming live HD video, and setting up the Raspberry Pi to control a home media center.

There is also an interesting discussion of setting up software on the Raspberry Pi for use with the Bitcoin cryptocurrency implementation. Readers are walked through the installation of Bitcoin software bitcoind on the Raspberry Pi and the use of Bitcoin wallets and Bitcoin web addresses. The chapter concludes with a brief section on Bitcoin mining with CGMiner software.

Raspberry Pi Server Essentials is a concise yet informative look at how the Raspberry Pi can be used in a variety of web server applications. Some technical knowledge of basic hardware and command level interaction with the operating system software is helpful in reading this book but not essential. For those readers who desire more information the author provides a number of links to additional resources pertaining to the material covered in each chapter. The world of open source technology is an amazing one. This book is a good read for those who want to venture into managing their own open source based web server.

————————————————————-

 

Using R in Nonparametric Statistics: Basic Table Analysis, Part Two

A Tutorial by D.M. Wiig

As discussed in a previous tutorial, one of the most common methods of displaying and analyzing data is through the use of tables. In this tutorial I will discuss setting up a basic table using R and exploring the use of the CrossTable function that is available in the R 'gmodels' package. I will use the same hypothetical data table that I created in Part One of this tutorial, data that examines the relationship between income and political party identification among a group of registered voters. The variable "income" will be considered ordinal in nature and consists of categories of income in thousands as follows:

“< 25”; “25-50”; “51-100” and “>100”

Political party identification is nominal in nature with the following categories:

“Dem”, “Rep”, “Indep”

Frequency counts of individuals that fall into each category are numeric. In this example we will create the table by entering the data as a matrix and displaying the results. When using this method it is a good idea to set up the table on paper before entering the data into R. This will help to make sure that all cases and factors are entered correctly. The table I want to generate will look like this:

        party
income   Dem  Rep  Indep
<25       15    5     10
25-50     20   15     15
51-100    10   20     10
>100       5   30     10

When using the CrossTable() function the data should be entered in matrix format. Enter the data from the table above as follows:

>#enter data as table matrix creating the variable 'Partyid'
>#enter the frequencies
>Partyid <-matrix(c(15,20,10,5, 5,15,20,30, 10,15,10,10),4,3)
>#enter the column dimension names and column heading categories
>dimnames(Partyid) = list(income=c("<25", "25-50", "51-100", ">100"), party=c("Dem","Rep","Indep"))

To view the structure of the created data matrix use the command:

> str(Partyid)
 num [1:4, 1:3] 15 20 10 5 5 15 20 30 10 15 ...
 - attr(*, "dimnames")=List of 2
  ..$ income: chr [1:4] "<25" "25-50" "51-100" ">100"
  ..$ party : chr [1:3] "Dem" "Rep" "Indep"
>

To view the table use the command:

> Partyid
        party
income   Dem Rep Indep
  <25     15   5    10
  25-50   20  15    15
  51-100  10  20    10
  >100     5  30    10
>

Remember that R is case sensitive so make sure you use upper case if you named your variable ‘Partyid.’

Once the table has been entered as a matrix it can be displayed with a number of available options using the CrossTable() function. In this example I will produce a table in SAS format (the default format), display both observed and expected cell frequencies along with the proportion of the chi-square total contributed by each cell, and show the results of the chi-square analysis. The script is:
> #make sure gmodels package is loaded
> require(gmodels)
> #CrossTable analysis
> CrossTable(Partyid,prop.t=FALSE,prop.r=FALSE,prop.c=FALSE,expected=TRUE,chisq=TRUE,prop.chisq=TRUE)

Cell Contents
|-------------------------|
|                       N |
|              Expected N |
| Chi-square contribution |
|-------------------------|

Total Observations in Table:  165

             | party
      income |       Dem |       Rep |     Indep | Row Total |
-------------|-----------|-----------|-----------|-----------|
         <25 |        15 |         5 |        10 |        30 |
             |     9.091 |    12.727 |     8.182 |           |
             |     3.841 |     4.692 |     0.404 |           |
-------------|-----------|-----------|-----------|-----------|
       25-50 |        20 |        15 |        15 |        50 |
             |    15.152 |    21.212 |    13.636 |           |
             |     1.552 |     1.819 |     0.136 |           |
-------------|-----------|-----------|-----------|-----------|
      51-100 |        10 |        20 |        10 |        40 |
             |    12.121 |    16.970 |    10.909 |           |
             |     0.371 |     0.541 |     0.076 |           |
-------------|-----------|-----------|-----------|-----------|
        >100 |         5 |        30 |        10 |        45 |
             |    13.636 |    19.091 |    12.273 |           |
             |     5.470 |     6.234 |     0.421 |           |
-------------|-----------|-----------|-----------|-----------|
Column Total |        50 |        70 |        45 |       165 |
-------------|-----------|-----------|-----------|-----------|

Statistics for All Table Factors

Pearson's Chi-squared test
------------------------------------------------------------
Chi^2 =  25.55608     d.f. =  6     p =  0.0002692734

>
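The same table can also be printed in SPSS-style rather than SAS-style layout by adding the format argument, one of the documented CrossTable() options (a brief sketch; the other arguments work as shown above):

> CrossTable(Partyid, format="SPSS", expected=TRUE, chisq=TRUE)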

As seen above, row and column marginal totals are displayed by default with the SAS format. There are other options available for the CrossTable() function; see the CRAN documentation for a detailed description of all of them. In the next installment of this tutorial I will examine some of the measures of association that are available in R for nominal and ordinal data displayed in table format.

 

Using R in Nonparametric Statistics: Basic Table Analysis, Part One

A Tutorial by D.M. Wiig
One of the most common methods of displaying and analyzing data is through the use of tables. In this tutorial I will discuss setting up a basic table using R and performing an initial chi-square test on the table. R has an extensive set of tools for manipulating data in the form of a matrix, table, or data frame. The package 'vcd' is specifically designed to provide tools for table analysis. Before beginning this tutorial open an R session in your terminal window. You can install the vcd package using the following command:

> install.packages("vcd")

Depending on your R installation you may be asked to designate a CRAN mirror to download from before the package is fetched and installed. I might add at this point that if you are running the newest release of R, R-3.0.x, you may have to reinstall a number of dependent packages that were built under an earlier version of R. Any time you are installing a package and see the 'non-zero exit status' error message, look the output over to see which packages have to be reinstalled to work with the newest version of R. If you are using R-2.xx.x the vcd package will install without any other re-installations.
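Once the package has been installed it must be attached to your R session before its functions can be used; this is done with the standard library() command:

> library(vcd)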

In social science research we often use data that is nominal or ordinal in nature. Data is displayed in categories with associated frequency counts. In this tutorial I will use a set of hypothetical data that examines the relationship between income and political party identification among a group of registered voters. The variable “income” will be considered ordinal in nature and consists of categories of income in thousands as follows:

“< 25”; “25-50”; “51-100” and “>100”

Political party identification is nominal in nature with the following categories:

“Dem”, “Rep”, “Indep”

Frequency counts of individuals that fall into each category are numeric. In the first example we will create a table by entering the data as a data frame and displaying the results. When using this method it is a good idea to set up the table on paper before entering the data into R. This will help to make sure that all cases and factors are entered correctly. The table I want to generate will look like this:

        party
income   Dem Rep Indep
<25       15   5    10
25-50     20  15    15
51-100    10  20    10
>100       5  30    10

To enter the above into a data frame use the following on the command line:

> partydata <- data.frame(expand.grid(income=c("<25","25-50","51-100",">100"), party=c("Dem","Rep","Indep")),count=c(15,20,10,5,5,15,20,30,10,15,10,10))
>

Make sure the syntax is exactly as shown and that the entire command is entered on a single line (it may wrap automatically in your R console). When the command runs without error you can view the data by entering:

> partydata

The following output is produced:

> partydata
   income  party count
1     <25    Dem    15
2   25-50    Dem    20
3  51-100    Dem    10
4    >100    Dem     5
5     <25    Rep     5
6   25-50    Rep    15
7  51-100    Rep    20
8    >100    Rep    30
9     <25  Indep    10
10  25-50  Indep    15
11 51-100  Indep    10
12   >100  Indep    10
>

At this point the data is in frequency form rather than table or matrix form. To view a summary of information about the data use the command:

>str(partydata)

You will see:

> str(partydata)
'data.frame': 12 obs. of 3 variables:
 $ income: Factor w/ 4 levels "<25","25-50",..: 1 2 3 4 1 2 3 4 1 2 ...
 $ party : Factor w/ 3 levels "Dem","Rep","Indep": 1 1 1 1 2 2 2 2 3 3 ...
 $ count : num 15 20 10 5 5 15 20 30 10 15 ...

To convert the data into tabular format use the command xtabs to perform a cross tabulation. I have named the resulting table “tabs”:

>tabs <- xtabs(count ~income + party, data=partydata)

To view the resulting table use:

> tabs
        party
income   Dem Rep Indep
  <25     15   5    10
  25-50   20  15    15
  51-100  10  20    10
  >100     5  30    10
>

This produces a table in the desired format. To do a quick analysis of the table that produces a Chi-square statistic use the command:

> summary(tabs)

The output is

> summary(tabs)
Call: xtabs(formula = count ~ income + party, data = partydata)
Number of cases in table: 165
Number of factors: 2
Test for independence of all factors:
Chisq = 25.556, df = 6, p-value = 0.0002693
>
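As a quick cross-check, R's built-in chisq.test() function applied to the same table reports the identical Pearson statistic:

> chisq.test(tabs)

        Pearson's Chi-squared test

data:  tabs
X-squared = 25.556, df = 6, p-value = 0.0002693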

In future tutorials I will discuss many of the other resources that are available with the vcd package for manipulating and analyzing data in a tabular format.

 

Using R in Nonparametric Statistics: Basic Table Analysis, Part Three, Using assocstats and collapse.table


A tutorial by D.M. Wiig

As discussed in a previous tutorial, one of the most common methods of displaying and analyzing data is through the use of tables. In this tutorial I will discuss setting up a basic table using R and exploring the use of the assocstats function to generate several commonly used nonparametric measures of association. The assocstats function will generate the Phi-coefficient, the Contingency Coefficient and Cramer's V, in addition to the Likelihood Ratio and Pearson's Chi-Squared tests for independence. Cramer's V and the Contingency Coefficient are commonly applied to r x c tables, while the Phi-coefficient is used in the case of dichotomous variables in a 2 x 2 table.

To illustrate the use of assocstats I will use hypothetical data exploring the relationship between level of education and average annual income. Education will be measured using the nominal categories "High School", "College", and "Graduate". Average annual income will be measured using ordinal categories and expressed in thousands:

“< 25”; “25-50”; “51-100” and “>100”

Frequency counts of individuals that fall into each category are numeric.

In the first example a 4 x 3 table is created with hypothetical frequencies as shown below:

Income                         Education
(thousands)      High School   College   Graduate
<25                   15            8         5
25-50                 12           12         8
51-100                10           22        25
>100                   5           10        32
The first table, table1, is entered into R as a data frame using the following commands:

#create 4 x 3 data frame
#enter table1 in frequency form
table1 <- data.frame(expand.grid(income=c("<25","25-50","51-100",">100"), education=c("HS","College","Graduate")),count=c(15,12,10,5,8,12,22,10,5,8,25,32))

Check to make sure the data are in the right row and column categories. Notice that the data are entered in the ‘count’ list by columns.

> table1
   income education count
1     <25        HS    15
2   25-50        HS    12
3  51-100        HS    10
4    >100        HS     5
5     <25   College     8
6   25-50   College    12
7  51-100   College    22
8    >100   College    10
9     <25  Graduate     5
10  25-50  Graduate     8
11 51-100  Graduate    25
12   >100  Graduate    32
>

If the table structure looks correct, generate the table, tab1, using the xtabs function:

> #create table tab1 from data.frame
> tab1 <- xtabs(count ~income + education, data=table1)
Show the table using the command:

>tab1
        education
income   HS College Graduate
  <25    15       8        5
  25-50  12      12        8
  51-100 10      22       25
  >100    5      10       32
>
Use the assocstats function to generate measures of association for the table. Make sure that you have loaded the vcd package (which provides assocstats) and the vcdExtra package (which provides the collapse.table function used below). Run assocstats with the following command:

> assocstats(tab1)
                    X^2 df   P(> X^2)
Likelihood Ratio 31.949  6 1.6689e-05
Pearson          32.279  6 1.4426e-05

Phi-Coefficient   : 0.444
Contingency Coeff.: 0.406
Cramer's V        : 0.314
>

The measures show an association between the two variables. My intent is not to provide an analysis of how to evaluate each of the measures; there is excellent documentation on each measure of association in the R CRAN literature. Since the Phi-coefficient is designed primarily to measure association between dichotomous variables in a 2 x 2 table, collapse the 4 x 3 table using the collapse.table function to get a more meaningful Phi-coefficient. Since we want to go from a 4 x 3 to a 2 x 2 table we collapse the table in two stages. The first stage collapses the table to a 2 x 3 table by combining the "<25" with the "25-50" and the "51-100" with the ">100" categories of income.

The resulting 2 x 3 table is seen below:

                         Education
Income      High School   College   Graduate
<50              27           20        13
>50              15           32        57

To collapse the table use the R function collapse.table to combine the "<25" and "25-50" categories and the "51-100" and ">100" categories as discussed above:

> #collapse table tab1 to a 2 x 3 table, table2
> table2 <-collapse.table(tab1, income=c("<50","<50",">50",">50"))

View the resulting table, table2, with:

> table2
        education
income   HS College Graduate
  <50    27      20       13
  >50    15      32       57
>

Now collapse the table to a 2 x 2 table by combining the “College” and “Graduate” columns:
> #collapse the 2 x 3 table2 to a 2 x 2 table, table3
> table3 <-collapse.table(table2, education=c("HS","College","College"))

View the resulting table, table3, with:

> table3
        education
income   HS College
  <50    27      33
  >50    15      89
>

Use the assocstats function to evaluate the 2 x 2 table:

> #use assocstats on the 2 x 2 table, table3
> assocstats(table3)
                    X^2 df   P(> X^2)
Likelihood Ratio 18.220  1 1.9684e-05
Pearson          18.673  1 1.5519e-05

Phi-Coefficient   : 0.337
Contingency Coeff.: 0.32
Cramer's V        : 0.337
>

There are many other table manipulation functions available in the R vcd and vcdExtra packages, as well as other packages that provide analysis of nonparametric data. This series of tutorials hopefully serves to illustrate some of the more basic and common table functions using these packages. The next tutorial looks at the use of the ca function to perform and graph the results of a basic Correspondence Analysis.
