
I'm reading the vignette for doParallel.

Are the following two code blocks one and the same?

library(doParallel)
no_cores <- 8
cl <- makeCluster(no_cores)
registerDoParallel(cl)
pieces <- foreach(i = seq_along(pieces)) %dopar% {
  # do stuff
}

Is above just the same as this:

library(doParallel)
registerDoParallel(cores = 8)
pieces <- foreach(i = seq_along(pieces)) %dopar% {
  # do stuff
}

Must I call makeCluster() when using doParallel if I want to use multiple cores, or is the single line registerDoParallel(cores = 8) enough?

Doug Fir
  • From what I can tell, it might even be better NOT to use makeCluster(), since using just registerDoParallel() seems to automatically export all needed functions and objects to the workers without having to do it manually – Doug Fir Aug 20 '17 at 07:56
  • Possible dup of https://stackoverflow.com/q/28829300/6103040 and https://stackoverflow.com/q/28989855/6103040 – F. Privé Aug 20 '17 at 09:03
  • For what it's worth, the first line of the single example to `?registerDoParallel` is `cl <- makePSOCKcluster(2)`. – lmo Aug 20 '17 at 12:18

1 Answer


On a Windows machine, these two examples are essentially equivalent. The only difference is that the first creates an explicit cluster object, while the second uses an implicit cluster object that is created when you call registerDoParallel. The performance of the two examples should be the same.

On a Mac or Linux machine, the first example uses the snow-derived backend (exactly as on Windows), ultimately calling clusterApplyLB to perform the parallel computations. The second example uses the multicore-derived backend (which was never available on Windows), ultimately calling mclapply, which will probably be somewhat more efficient than the first example because the workers are forked from the current process.
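As a minimal sketch of the two registration styles side by side (using 2 workers and a trivial loop body for illustration; note that an explicit cluster must be shut down with stopCluster, whereas the implicit one is cleaned up for you):

```r
library(doParallel)

# Style 1: explicit snow-style cluster (works on all platforms)
cl <- makeCluster(2)
registerDoParallel(cl)
res1 <- foreach(i = 1:4, .combine = c) %dopar% i^2
stopCluster(cl)  # explicit clusters must be stopped manually

# Style 2: implicit registration; on Mac/Linux this uses the
# fork-based multicore backend via mclapply
registerDoParallel(cores = 2)
res2 <- foreach(i = 1:4, .combine = c) %dopar% i^2

identical(res1, res2)  # both give c(1, 4, 9, 16)
```

Both styles produce the same result; only the backend machinery differs.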

Steve Weston
  • Thanks. After some testing and trial/error (I'm on Mac locally but using a remote server) I found that the first option gave me memory allocation errors whereas the second option did not. – Doug Fir Aug 21 '17 at 14:43
  •
    @DougFir That's not surprising because `mclapply`'s use of `fork` to start the workers often allows them to use your machine's memory more efficiently. – Steve Weston Aug 21 '17 at 14:58
  • Not sure if it's related, but I ran a %dopar% block that runs beautifully yet leaves a lingering memory issue. I tried closing connections manually too, but no success; I keep having to restart the R session https://stackoverflow.com/questions/45859005/cannot-allocate-memory-even-after-deleting-large-objects-and-closing-connections – Doug Fir Aug 24 '17 at 12:02