Andrew K. Dennis. Raspberry Pi Super Cluster. Birmingham, England: PACKT Publishing, 2013.
A book review by D.M. Wiig
In the computer world, clusters and supercomputers are used for some of the most demanding and complex tasks facing today's technology. Raspberry Pi Super Cluster by Andrew Dennis is a recently published work that demonstrates how this technology can be explored right in your own home or in the classroom using modest, inexpensive hardware and readily available free open source software.
This book is a well-written and easy-to-understand introduction to the theory and practice of parallel computing that is suitable for hobbyists, educators or others who want to explore this interesting facet of computing. The widespread availability and low price of the Raspberry Pi computer put a real parallel computing cluster within reach of anyone who is interested in exploring this topic. In order to get the most from this book the reader should have some experience in working with computers and programming languages. A knowledge of the concepts involved in parallel and cluster computing is not required, as the author covers the basics of these topics quite thoroughly. Some knowledge of working with the Raspberry Pi and the Linux command line interface is also desirable.
The author starts out in chapter one with a discussion of some of the basic concepts involved in parallel computing such as supercomputers, multi-core and multi-processor machines, and cloud computing. Central to this introduction is the concept of commodity hardware clusters. The idea of grouping commodity off-the-shelf computers into a cluster was pioneered in the 1990s, and such systems became known as Beowulf clusters, a name associated with the Network of Workstations (NOW) approach to scientific computing. The author concludes the introduction with a discussion of the Raspberry Pi computer, which forms the basis of the computing cluster developed in the book. There is also a brief consideration of programming languages such as C, C++, and FORTRAN, which are commonly used in Linux-based computer clusters.
The author moves on to discuss in detail the hardware and software required to set up the cluster. Topics include setting up the Raspberry Pi, downloading and installing the Raspbian operating system on an SD card, and the initial setup of options such as SSH, the nano text editor, and installing the GCC FORTRAN compiler.
Chapter three of the book is devoted to laying the foundation for parallel computing with an implementation of MPI (Message Passing Interface). The book presents a step-by-step approach to downloading, installing, and configuring the MPICH software, which is the basis for the parallel computing environment. Once the system has been set up and tested on the first RPi, the author turns to the task of setting up the second RPi that will be used in the configuration. It should be noted that the author provides abundant and detailed references to additional resources that the reader can access to assist in understanding or expanding upon the procedures discussed in the book. When the second RPi has been set up, the author presents the design of a test program that will be used to check the installation, including a detailed discussion of the code that is used. There is a nice feature of books published by PACKT that should be noted at this point. If the reader purchased the book from the publisher directly, there is access to a download of all of the code that is presented in the book. This is a tremendous time-saving feature and can help reduce coding mistakes that can lead to frustrating and hard-to-find errors.
While the first half of the book deals primarily with the installation and configuration of the RPi parallel cluster, the second half of the book deals with the application and development of distributed applications that will run on the RPi cluster. The author starts with a discussion of the technology known as Apache Hadoop, which is an open source project for developing distributed applications and is hosted by the Apache Software Foundation. The reader is then taken through the process of downloading and installing Java and the Java Development Kit, and downloading, installing, configuring, and testing the Hadoop server. Once again, there is a detailed and relatively easy to understand presentation of each step involved in the process. The author then turns to the setup of the second RPi, which is very similar to the setup for the first RPi. The second RPi setup tends to go faster as there is some duplication of configuration files.
The remaining chapters of the book are devoted to a presentation of some specific applications that can be run on the RPi cluster. There is a nice discussion of using the MapReduce programming approach, which allows systems to process large datasets in parallel. The author takes the reader through an overview of the WordCount MapReduce program and a step-by-step test of this program on the RPi cluster. There is also a chapter devoted to Monte Carlo simulators, which apply randomized sampling repeatedly over large data sets to approximate the answer to a particular mathematical question. The reader is walked through an example of using this technique on the RPi cluster to calculate Pi. The last chapter of the book explores other topics relating to the RPi cluster, such as adding an external USB disk drive for greater storage capacity and installing and experimenting with the FORTRAN programming language on the cluster.
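For readers unfamiliar with the Monte Carlo technique mentioned above, the idea behind estimating Pi can be sketched in a few lines of Python. This is only an illustration under my own assumptions and is not the code from the book: the function names count_hits and estimate_pi are hypothetical, and the "workers" here run one after another on a single machine rather than on separate cluster nodes. Random points are scattered over a unit square, the fraction that falls inside the quarter circle approximates Pi/4, and the per-worker counts are summed at the end in much the same spirit as a reduce step.

```python
import random


def count_hits(samples, seed):
    """Count random points in the unit square that land inside the quarter circle."""
    rng = random.Random(seed)  # separate generator per worker (illustrative choice)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(total_samples, workers=4):
    """Split the sampling into equal chunks, then combine the hit counts."""
    per_worker = total_samples // workers
    # On a real cluster each chunk would run on a different node;
    # in this sketch the chunks simply run in sequence.
    hits = sum(count_hits(per_worker, seed) for seed in range(workers))
    return 4.0 * hits / (per_worker * workers)


if __name__ == "__main__":
    print(estimate_pi(400_000))
```

Because each chunk of samples is independent of the others, this kind of problem divides naturally across the nodes of a small cluster, and the accuracy of the estimate improves as the total number of samples grows.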
I found this book to be interesting, informative and challenging. It stimulated my interest in furthering my knowledge of cluster computing and the potential of the Raspberry Pi computer in that endeavor. I am a big fan of open source projects and I currently own two RPis. One is being used as a dedicated web server that hosts my WordPress Raspberry Pi and R statistics web site. The other is for experimental purposes. After reviewing this book I am planning to add a third (or fourth) RPi to my collection so that I can experiment with parallel computing. I recommend this book to computer users at all levels. It will help if you have some experience with computer hardware, operating systems, and programming languages, but the author provides abundant links to additional information, source code and other resources that make this a good read even for those with less hands-on experience.
Author Information: Douglas M. Wiig
I am a Professor of Political Science at Grand View University in Des Moines, Iowa, USA. My teaching areas of expertise include social science statistics, social science research methods, and comparative and international politics. I am also interested in developing methods to integrate technology into the university curriculum. I have used computers and various programming languages in the classroom, in academic research and writing, and in personal projects since the days when data and programming instructions were entered into mainframe behemoths on punched cards and personal computing platforms were still a dream. I am a big fan of open source projects and contribute whatever I can to the continuing growth and success of the community.
Contact Information: Douglas M. Wiig
Web Site/Blog: http://raspberrypiandr.net