
Book Review: R High Performance Programming


A book review by Douglas M. Wiig

Aloysius Lim and William Tjhi. R High Performance Programming. Birmingham, UK: Packt Publishing Ltd., 2015. bit.ly/14Rhpp

R High Performance Programming is a well-written, informative book best suited to the experienced R programmer. It offers a handy guide for R users who need speed and efficiency in the tasks they perform.

The authors begin with an informative chapter discussing some of the inherent constraints on R’s computing performance, such as CPU and RAM usage, and the fact that R code is interpreted on the fly rather than compiled. The next chapter is a guide to several methods of profiling R code’s execution time, memory allocation and CPU usage. Sample code included in the chapter allows the reader to experiment with various benchmarking techniques to measure processing time and memory usage. This chapter gives the reader some good tools for benchmarking R projects and identifying areas where processing can be improved.
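The book’s profiling examples are written in R. The Python sketch below is a rough, language-neutral illustration of the same idea and is not code from the book. It times a single function call and records the call’s peak memory allocation using the standard library’s `time.perf_counter` and `tracemalloc`. The two workload functions are hypothetical: one builds a full intermediate list and the other uses a generator.

```python
import time
import tracemalloc


def profile(func, *args):
    """Return (result, elapsed seconds, peak bytes allocated) for one call."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args)
    finally:
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


def squares_list(n):
    """Hypothetical workload: materializes every square in a list first."""
    return sum([i * i for i in range(n)])


def squares_gen(n):
    """Same computation with a generator, avoiding the intermediate list."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    for f in (squares_list, squares_gen):
        result, secs, peak = profile(f, 200_000)
        print(f"{f.__name__}: time={secs:.3f}s peak={peak / 1e6:.2f} MB")
```

Both versions return the same sum. The profile is meant to reveal how much more memory the list-building version needs.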

As is typically the case with technical books from Packt Publishing, code examples are used throughout, and the complete code for each chapter is available for download with the book. This is a very handy feature that allows readers to do some live programming with R while reading.

The authors discuss a number of simple tweaks that can easily be made to increase processing speed, such as using built-in functions and hash tables. The hash table technique is useful for applications that perform frequent lookups, and it can dramatically reduce processing time compared with searching through lists. Running the example code for this technique shows a large decrease in processing time for hash table lookups compared with straight list lookups.
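The following minimal Python sketch illustrates why a hash table helps. It is not the book’s R code. It compares a linear scan over a list of key/value pairs, which costs O(n) comparisons per lookup, with a lookup in a `dict`, Python’s built-in hash table, which is O(1) on average. The key names and data sizes are arbitrary assumptions, and actual timings will vary by machine.

```python
import timeit


def build_pairs(n):
    """Return n (key, value) pairs with hypothetical keys 'k0'..'k{n-1}'."""
    return [(f"k{i}", i) for i in range(n)]


def list_lookup(pairs, key):
    """Linear scan: compares keys one by one until a match is found."""
    for k, v in pairs:
        if k == key:
            return v
    return None


def dict_lookup(table, key):
    """Hash-table lookup: hashes the key and jumps straight to its slot."""
    return table.get(key)


if __name__ == "__main__":
    pairs = build_pairs(10_000)
    table = dict(pairs)
    keys = [f"k{i}" for i in range(0, 10_000, 100)]
    t_list = timeit.timeit(lambda: [list_lookup(pairs, k) for k in keys], number=3)
    t_dict = timeit.timeit(lambda: [dict_lookup(table, k) for k in keys], number=3)
    print(f"list scan: {t_list:.4f}s  dict lookup: {t_dict:.6f}s")
```

The gap between the two timings widens as the number of stored keys grows, because only the list scan’s cost depends on the number of stored keys.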

In chapter 4 the authors discuss the use of compiled R code and the integration of compiled languages into R. They show several examples of using the R package inline, which allows users to embed C, C++, Objective-C, Objective-C++ and Fortran code within R. Once again, ample code examples illustrate the technique. For more advanced uses of compiled code, the authors discuss how to create entire modules coded in C++ using the Rcpp package, again with several complete code examples.

Another interesting approach to speeding up R is discussed in a chapter that explores several R packages designed to exploit the GPUs (Graphics Processing Units) found in many computers. These techniques can help in creating very fast and efficient statistical modeling code that runs on the GPU.

As indicated above, readers can download the code package included with the book. It contains a well-organized set of ten folders (one for each chapter) holding 51 files. These files contain the sample code from the book as well as other code segments and benchmark code discussed in the book. The authors indicate that the code has been tested on R 3.1.1, Ubuntu 14.04 Trusty Tahr, Mac OS X 10.9 Mavericks, and Windows 8.1. Because the code has been tested on these platforms, readers should be able to integrate these code segments into their own projects with minimal changes.

Other chapters in R High Performance Programming discuss simple tweaks to use less memory, techniques to speed up processing of large datasets, and the use of parallel processing and clustering. The last chapter discusses using R and Hadoop to process Big Data (massive datasets with sizes measured in petabytes; one petabyte is 1,000,000 gigabytes, or 1,048,576 gigabytes if binary units are used). Processing data of this magnitude presents many challenges and is currently the subject of much program development.
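The petabyte figure depends on which unit convention is used. In decimal (SI) units, each prefix step is a factor of 1,000. In binary (IEC) units, each step is a factor of 1,024:

```latex
\begin{aligned}
1\ \text{PB}  &= 10^{15}\ \text{bytes} = 10^{15-9}\ \text{GB} = 10^{6}\ \text{GB} = 1{,}000{,}000\ \text{GB} \\
1\ \text{PiB} &= 2^{50}\ \text{bytes} = 2^{50-30}\ \text{GiB} = 2^{20}\ \text{GiB} = 1{,}048{,}576\ \text{GiB}
\end{aligned}
```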

I found R High Performance Programming to be a useful and informative book for the advanced user of R. A working knowledge of statistics, R and other programming languages such as C++ or Java is necessary to realize the full benefit of the techniques presented in the book. The book also serves as a good learning tool for less knowledgeable R users who are seeking to advance their programming skills.

Readers who are interested in the use of Hadoop and cluster computer processing might find the book Raspberry Pi Super Cluster by Andrew K. Dennis of interest (Packt Publishing, 2013, PAC-14-1987838-1387169). A review of this book can be found on my web site at http://dmwiig.net.

Reviewer Information:

Douglas M. Wiig, Professor of Political Science

Grand View University

Teaching areas include social science statistics and research methods, comparative politics, international politics.

Long-time user and developer of computer and statistical applications

Host of Open Source Technology in Higher Education web site at http://dmwiig.net

Creator and moderator of LinkedIn discussion forum “Open Source Technology in Higher Education”

Regular contributor to several LinkedIn discussion forums

Author of numerous tutorials on using the R statistical programming language and Raspberry Pi computer

Book Review: Raspberry Pi Super Cluster


 

Andrew K. Dennis. Raspberry Pi Super Cluster. Birmingham, England: PACKT Publishing, 2013.

A book review by D.M. Wiig

In the computer world, clusters and supercomputers are used for some of the most demanding and complex tasks facing today’s technology. Raspberry Pi Super Cluster by Andrew Dennis is a recently published work that demonstrates how this technology can be explored at home or in the classroom using modest, inexpensive hardware and readily available free open source software.

This book is a well-written and easy-to-understand introduction to the theory and practice of parallel computing. It is suitable for hobbyists, educators and others who want to explore this interesting facet of computing. The widespread availability and low price of the Raspberry Pi computer put a real parallel computing cluster within reach of anyone who is interested in exploring this topic. To get the most from this book, the reader should have some experience working with computers and programming languages. Knowledge of parallel and cluster computing concepts is not required, as the author covers the basics of these topics quite thoroughly. Some familiarity with the Raspberry Pi and the Linux command line interface is also desirable.

The author starts out in chapter one with a discussion of some of the basic concepts involved in parallel computing, such as supercomputers, multi-core and multi-processor machines, and cloud computing. Central to this introduction is the concept of commodity hardware clusters. Clusters built from commodity off-the-shelf computers were pioneered in the 1990s and became known as Beowulf clusters, an approach closely related to the Network of Workstations (NOW) concept for scientific computing. The author concludes the introduction with a discussion of the Raspberry Pi computer, which forms the basis of the computing cluster developed in the book. There is also a brief consideration of programming languages such as C, C++, and FORTRAN, which are commonly used in Linux-based computer clusters.

The author then discusses in detail the hardware and software required to set up the cluster. Topics include setting up the Raspberry Pi, downloading and installing the Raspbian operating system on an SD card, configuring initial options such as SSH and the nano text editor, and installing the GCC FORTRAN compiler.

Chapter three of the book is devoted to laying the foundation of the parallel computing environment with MPI (Message Passing Interface). The book presents a step-by-step approach to downloading, installing, and configuring MPICH, the MPI implementation that forms the basis of the parallel computing environment. Once the system has been set up and tested on the first Raspberry Pi (RPi), the author turns to setting up the second RPi used in the configuration. The author also provides abundant and detailed references to additional resources that the reader can consult to understand or expand upon the procedures discussed in the book. When the second RPi has been set up, the author presents the design of a test program used to check the installation, including a detailed discussion of its code. A nice feature of books published by PACKT deserves mention at this point: readers who purchase the book directly from the publisher can download all of the code presented in the book. This is a tremendous time-saving feature and can help reduce coding mistakes that lead to frustrating, hard-to-find errors.

While the first half of the book deals primarily with the installation and configuration of the RPi parallel cluster, the second half of the book deals with the application and development of distributed applications that will run on the RPi cluster. The author starts with a discussion of the technology known as Apache Hadoop, which is an open source project for developing distributed applications and is hosted by the Apache Software Foundation. The reader is then taken through the process of downloading and installing Java and the Java Development Kit, and downloading, installing, configuring, and testing the Hadoop server. Once again, there is a detailed and relatively easy to understand presentation of each step involved in the process. The author then turns to the setup of the second RPi, which is very similar to the setup for the first RPi. The second RPi setup tends to go faster as there is some duplication of configuration files.

The remaining chapters of the book present some specific applications that can be run on the RPi cluster. There is a nice discussion of using the MapReduce programming approach on the cluster. MapReduce is a programming model that allows systems to process large datasets in parallel. The author takes the reader through an overview of the WordCount MapReduce program and a step-by-step test of this program on the RPi cluster. There is also a chapter devoted to Monte Carlo simulation, which uses repeated random sampling to obtain a numerical result for a particular mathematical question. The reader is walked through an example of using this technique on the RPi cluster to estimate the value of pi. The last chapter of the book explores other topics relating to the RPi cluster, such as adding an external USB disk drive for greater storage capacity and installing and experimenting with the FORTRAN programming language on the cluster.
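The book runs these examples on the RPi cluster itself. The Python sketch below is a single-machine illustration of the same two ideas and is not the book’s code. It mimics the map, shuffle and reduce steps of a word count. It also estimates pi by Monte Carlo sampling: the fraction of random points in the unit square that fall inside the quarter circle of radius 1 approaches π/4. All function names are hypothetical, and nothing here is actually distributed. On a real cluster, each node would run the map step on its share of the input or draw its own share of the samples.

```python
import random
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    return [(word, 1) for word in line.lower().split()]


def shuffle_phase(pairs):
    """Shuffle step: group the emitted counts by word."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    """Run map, shuffle and reduce sequentially over a list of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


def estimate_pi(samples, seed=None):
    """Monte Carlo estimate: 4 times the fraction of random points (x, y)
    in the unit square that satisfy x^2 + y^2 <= 1."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog"]))
    # Each hypothetical "node" draws its own samples with a different seed.
    per_node = [estimate_pi(100_000, seed=node) for node in range(4)]
    print(sum(per_node) / len(per_node))
```

In this sketch every node draws the same number of samples. Averaging the per-node estimates therefore gives the same result as pooling all of the samples, which is why the work splits cleanly across machines.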

I found this book interesting, informative and challenging. It stimulated my interest in furthering my knowledge of cluster computing and of the potential of the Raspberry Pi computer in that endeavor. I am a big fan of open source projects, and I currently own two RPis. One is used as a dedicated web server that hosts my WordPress Raspberry Pi and R statistics web site; the other is for experimental purposes. After reviewing this book, I am planning to add a third (or fourth) RPi to my collection so that I can experiment with parallel computing. I recommend this book to computer users at all levels. Some experience with computer hardware, operating systems, and programming languages will help in reading it. For readers with less hands-on experience, the author provides abundant links to additional information, source code and other resources that make this a good read.

————————————————————

Author Information: Douglas M. Wiig

I am a Professor of Political Science at Grand View University in Des Moines, Iowa, USA. My teaching areas of expertise include social science statistics, social science research methods, and comparative and international politics. I am also interested in developing methods to integrate technology into the university curriculum. I have used computers and various programming languages in the classroom, in academic research and writing, and in personal projects since the days when data and programming instructions were entered into mainframe behemoths on punched cards and personal computing platforms were still a dream. I am a big fan of open source projects and contribute whatever I can to the continuing growth and success of the community.

Contact Information: Douglas M. Wiig

Email: dwiig@grandview.edu dmartin6412@gmail.com

Web Site/Blog: http://raspberrypiandr.net

 

Book Review: Raspberry Pi Server Essentials


 

Piotr J. Kula. Raspberry Pi Server Essentials. Birmingham, UK: Packt Publishing, 2014.

A book review by D.M. Wiig

Raspberry Pi Server Essentials is an informative, step-by-step discussion of how this amazing little computer can be set up as a fully functioning web server. The book begins with the basics of setting up a Raspberry Pi. It walks the reader through obtaining the necessary hardware, installing the Raspbian operating system and performing the initial system configuration. There is also a brief discussion of the design of the Raspberry Pi for readers who are more technically inclined.

This section may be a little daunting for readers who are not comfortable working at the command line and performing system operations such as formatting and writing disks or managing directories. Less technically inclined readers may want to purchase an SD card that is preloaded with the Raspberry Pi operating system software. These cards are available from a number of sources at a reasonable price and provide plug-and-play convenience.

After discussing the Raspberry Pi hardware setup the author moves to a consideration of network configuration from Local Area Networks to wireless and Ethernet connections. Once again there is a concise presentation of some of the basics for readers who have some experience working with routers and home networks. After a discussion of performing Raspberry Pi system updates and some basic system monitoring functions the author turns to the task of installing a web server on the Raspberry Pi.

There are several good open source web servers available for Linux operating systems, such as Apache. However, the author points out that while these servers contain many useful features and are very powerful, they are also cumbersome on a computer with limited RAM and a relatively slow processor such as the Raspberry Pi. One solution to this problem is nginx (pronounced ‘engine x’), a fast, lightweight web server designed to deliver the maximum content with a minimum load on system resources. The author first walks the reader through downloading and installing nginx. There is also a discussion of downloading and setting up SQLite3, a lightweight SQL database engine, to run on the server.

The remaining chapters of the book discuss how to set up and use a number of useful applications on your now functioning Raspberry Pi web server. These applications include setting up and managing a file server, using the Raspberry Pi as a game server for popular open source games such as OpenTTD, using the official HD camera module designed by the Raspberry Pi Foundation for streaming live HD video, and setting up the Raspberry Pi to control a home media center.

There is also an interesting discussion of setting up software on the Raspberry Pi for use with the Bitcoin cryptocurrency. Readers are walked through the installation of the Bitcoin software bitcoind on the Raspberry Pi and the use of Bitcoin wallets and Bitcoin addresses. The chapter concludes with a brief section on Bitcoin mining with the CGMiner software.

Raspberry Pi Server Essentials is a concise yet informative look at how the Raspberry Pi can be used in a variety of web server applications. Some technical knowledge of basic hardware and command-line interaction with the operating system is helpful in reading this book but not essential. For readers who want more information, the author provides a number of links to additional resources pertaining to the material covered in each chapter. The world of open source technology is an amazing one, and this book is a good read for those who want to venture into managing their own open source web server.

————————————————————-