Sunday, August 9, 2015

A Pseudo - Quasi Race in Monte Carlo Integration

A Pseudo Quasi Race (or A Quasi Pseudo Race)

A Pseudo Quasi Race (or A Quasi Pseudo Race)

Alex Wood published a nice little introduction to Monte Carlo methods, in which as part of his introduction he showed how to compute an approximation to the value of \( \pi \) through Monte Carlo integration using pseudo randomly generated numbers as points in a 2D field. A circle is bounded in a square and points are generated within the square. The ratio of points inside the circle over the total number of points provides the approximation to \( \pi \).

I have added here quasi-randomly generated sequences as points, that, in small dimensions, typically make that ratio converge more quickly to a better approximation. Can we see however when this falls apart as the number of dimensions increases? To do this we need simply create (hyper)spheres bounded by (hyper)cubes, in larger and larger dimensions. Here I show approximating \( \pi \) in dimensions 2 through 7, with both pseudo- and quasi- randomly generated point sets.

Formulas for hypersphere volumes can be found here

Alex Woods blog at AnalyticBridge

His code can be found here

Libraries

library(randtoolbox) #For quasi-random sequence generators
## Loading required package: rngWELL
## This is randtoolbox. For overview, type 'help("randtoolbox")'.
library(ggplot2)

Plotting the grid of error results needs the following code from the Cookbook for R

multiplot <- function(..., plotlist=NULL, cols) {
    require(grid)

    # Make a list from the ... arguments and plotlist
    plots <- c(list(...), plotlist)

    numPlots = length(plots)

    # Make the panel
    plotCols = cols                          # Number of columns of plots
    plotRows = ceiling(numPlots/plotCols) # Number of rows needed, calculated from number of cols

    # Set up the page
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(plotRows, plotCols)))
    vplayout <- function(x, y)
        viewport(layout.pos.row = x, layout.pos.col = y)

    # Make each plot, in the correct location
    for (i in 1:numPlots) {
        curRow = ceiling(i/plotCols)
  curCol = (i-1) %% plotCols + 1
        print(plots[[i]], vp = vplayout(curRow, curCol ))
    }
}

Initialize and Generate Point Sets

radius <- 0.5

distanceFromCenter <- function(a) {
  sqrt(sum((center - a) ^ 2))
}

numpoints <- 2000
piVecp <- numeric(numpoints)
piVecq <- numeric(numpoints)
numdimensions <- 7
sphereDim <- 2:7
#The following code can be substituted such that one single run of runif is used with a single seed.  Any single generation of points will create an approximation with far less variability overall.  These may even be better approximations on average than quasi-random points for a specific seed.  In general this is not the case.
#Different seeds will change runif output.  Try different seeds
#set.seed(13) 
#pseudomat <- matrix(runif(numpoints*numdimensions, min=0, max=1), 
#                    nrow = numpoints, ncol = numdimensions, byrow = T)
#One can do the same with the Halton sequence, but not the Hammersley sequence
#as one of the dimensions is dependent on the length of the number of points
#quasimat  <- as.matrix(halton(numpoints,dim=numdimensions,
#                              init = TRUE, normal = FALSE, usetime = FALSE),
#                       nrow = numpoints, ncol = numdimensions, byrow = T)

#The following code uses the Hammersley sequence as it has a little better covergence in practice (it has lower discrepancy -- and ironically so, as the first dimension is simply the i/numpoints sequence).

multipliers <- c(4,6,32,60,384,840) #hypersphere to hypercube ratio

Loop through Hypersphere Volumes and Determine Error

Warning, this code will take a minute or two to run on a laptop. Refactored without loops might make it a bit quicker.

plotlist1 <- list()
plotlist2 <- list()
for(dimension in sphereDim){
   center <- rep(radius,dimension)
   for(i in 1:numpoints) {
      #set.seed(13)
      set.seed(numpoints-i+3)
      #Pseudo random points:  the next line can be commented out if the pseudo-random points are generated all at once, as mentioned in the notes above
      pseudomat <- matrix(runif(i*dimension), nrow = i, 
                          ncol=dimension, byrow = T)
      b <- apply(pseudomat[1:i,1:dimension, drop=FALSE], 
                 1, distanceFromCenter)
      d <- subset(b, b < radius)
      num <- length(d) / length(b)
      if(dimension < 4){ #ratio in terms of pi
         piVecp[i] = num*multipliers[dimension-1]
      }else if(dimension >= 4 & dimension < 6){ #pi squared ratio
         piVecp[i] = sqrt(num*multipliers[dimension-1])
      }else #ratio in terms of pi cubed
         piVecp[i] = (num*multipliers[dimension-1])^(1/3)

      #Quasi random points (Halton Sequence converted to the Hammersley Sequence)
      quasimat <- as.matrix(halton(i,dim=dimension-1, init = TRUE, 
                                    normal = FALSE, usetime = FALSE), 
                             nrow = i, ncol = dimension, byrow = T)
      quasimat <- cbind(quasimat,(1:i)/i)
      #For just the Halton sequence use just the following.
      #quasimat <- as.matrix(halton(i,dim=dimension, init = TRUE, 
      #                              normal = FALSE, usetime = FALSE), 
      #                      nrow = i, ncol = dimension, byrow = T)
      #... or comment out the quasimat code above and use the code where
      #quasimat is generated in one go, before the loops.
      b <- apply(quasimat[1:i,1:dimension, drop=FALSE], 
                 1, distanceFromCenter)
      d <- subset(b, b < radius)
      num <- length(d) / length(b)

      if(dimension < 4){ 
         piVecq[i] = num*multipliers[dimension-1]
      }else if(dimension >= 4 & dimension < 6){ 
         piVecq[i] = sqrt(num*multipliers[dimension-1])
      }else
         piVecq[i] = (num*multipliers[dimension-1])^(1/3)
   }
   PiVec <- data.frame(ind = 1:numpoints,
                       pseudoerror=abs(pi-piVecp),
                       quasierror=abs(pi-piVecq),
                       errordiff=abs(pi-piVecp)-abs(pi-piVecq))
   #Plotting
   plotlist1[[dimension-1]] <- 
     ggplot(PiVec, aes(x=ind)) +
       geom_line(aes(y=pseudoerror,colour = "Pseudo"),color='#CC0000') +
       geom_line(aes(y=quasierror,colour = "Quasi"),color='#663399') +
       ggtitle(paste(as.character(dimension),"D Sphere",sep="")) +
       ylim(0,1) +
       xlab("Sample Size") +
       ylab("Error")

   plotlist2[[dimension-1]] <- 
     ggplot(PiVec, aes(x=ind)) +
       geom_line(aes(y=errordiff,colour = "Quasi-Pseudo Error"),color='#990033') +
       ggtitle(paste(as.character(dimension),"D Sphere P-Q Errors",sep="")) +
       ylim(-0.25,0.25) +
       xlab("Sample Size") +
       ylab("Error")
}

Plotting the Results

The first plot shows both the pseudo-random point-wise error in red and the quasi-random point-wise error in blue for all six sphere dimensions.

Note there may be Warnings about the missing values from geom_path. This is because at I have set the range of the error between 0 and 1, but the error can be greater than 1, especially at the beginning of the sequence of points, and as the dimension grows.

## Loading required package: grid
## Warning: Removed 2 rows containing missing values (geom_path).
## Warning: Removed 2 rows containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 2 rows containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 3 rows containing missing values (geom_path).
## Warning: Removed 3 rows containing missing values (geom_path).
## Warning: Removed 6 rows containing missing values (geom_path).
## Warning: Removed 22 rows containing missing values (geom_path).
## Warning: Removed 14 rows containing missing values (geom_path).
## Warning: Removed 25 rows containing missing values (geom_path).

plot of chunk unnamed-chunk-1

The next plot shows the difference between the quasi-point error and the pseudo-point error. Here again there may be warnings from geom_path as I have set the error difference to be between -0.25 and 0.25.

## Warning: Removed 6 rows containing missing values (geom_path).

plot of chunk unnamed-chunk-2

The error using the pseudo-random points on average is considerably worse until the dimensions increase. Around dimension 5 it becomes clear that the error using quasi-random points can perform worse, depending on the number of points. By dimension 7, there is high variability of both errors, such that even though the use of quasi-random points on average is still better, it isn't by much, and very dependent on the number of points.