Tuesday, January 31, 2012

Multi-Threaded Principal Component Analysis




SAS used not to support multithreading in PCA. Later I found that its server version supports this functionality (see here). Today I found that this multithreading capability is finally available in PC SAS v9.22.

The figure above shows that all 4 threads on my PC are in use. FYI, my PC has an Intel CPU with 2 cores and 4 threads. This multithreading capability directly helps any work that relies on SVD, because of the direct relationship between SVD and PCA (see here).
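For reference, here is the standard relationship written out, assuming an n × p data matrix X (n ≥ p) whose columns have been centered, and a thin SVD so that U'U = I. The notation is mine, not taken from the linked page. Note that X'X for centered X is the corrected sums-of-squares-and-crossproducts matrix.

```latex
\begin{aligned}
X &= U \Sigma V^{\top}, \qquad \Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_p) \\
S &= \frac{X^{\top} X}{n-1} = \frac{V \Sigma U^{\top} U \Sigma V^{\top}}{n-1} = V \, \frac{\Sigma^{2}}{n-1} \, V^{\top} \\
\lambda_k &= \frac{\sigma_k^{2}}{n-1}, \qquad \text{scores } T = X V = U \Sigma
\end{aligned}
```

So the columns of V are the principal axes, and the squared singular values give the component variances. PROC PRINCOMP uses the correlation matrix by default (the COV option switches to the covariance matrix), which corresponds to also scaling each column of X to unit variance before the decomposition.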

Note that to observe the effect of multithreading by comparing the real time and CPU time reported in the log, I/O must not be the bottleneck. That is why all output in the code, whether to the screen or to data sets, is suppressed.

PS: It turns out that multithreading is used only while PROC PRINCOMP is building the SSCP/USSCP (sums of squares and crossproducts) matrix.
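Why would building the SSCP matrix be the step that parallelizes? One plausible reason (my assumption, not documented SAS internals) is that X'X is a sum of per-row outer products, so the rows can be split across threads, each thread accumulates its own partial matrix, and the partial matrices are added at the end. The Python sketch below illustrates that decomposition. The function names `partial_sscp`, `add_matrices`, and `chunked_sscp` are hypothetical, and because of CPython's global interpreter lock the thread pool here shows the structure of the computation, not a real speedup.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def partial_sscp(rows):
    """Uncorrected SSCP of the given rows: the sum of outer products x x'."""
    p = len(rows[0])
    acc = [[0.0] * p for _ in range(p)]
    for x in rows:
        for i in range(p):
            xi = x[i]
            acc_row = acc[i]
            for j in range(p):
                acc_row[j] += xi * x[j]
    return acc


def add_matrices(a, b):
    """Element-wise sum of two equally sized matrices (lists of lists)."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def chunked_sscp(data, n_workers=4):
    """Split rows into chunks, build one partial SSCP per chunk in a thread
    pool, then add the partial matrices together. Assumes data is non-empty."""
    size = -(-len(data) // n_workers)  # ceiling division: rows per chunk
    chunks = [data[k:k + size] for k in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sscp, chunks))
    total = partials[0]
    for part in partials[1:]:
        total = add_matrices(total, part)
    return total


if __name__ == "__main__":
    rng = random.Random(1)
    # Small integer data keeps the sums exact, so both summation orders agree.
    data = [[rng.randint(-5, 5) for _ in range(3)] for _ in range(1000)]
    print(chunked_sscp(data) == partial_sscp(data))  # True
```

The same split-then-add pattern works for any number of threads, which is why this part of the work maps naturally onto all 4 hardware threads. The eigen-decomposition that follows does not split as simply.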



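options fullstimer;   /* write detailed performance statistics to the log */

/* Simulate 5,000 observations of 800 uniform random variables */
data _junky;
   length id 8;
   array x{800};      /* creates numeric variables x1-x800 */
   drop j;
   do id=1 to 5E3;
      do j=1 to dim(x);
         x[j]=ranuni(0);
      end;
      output;
   end;
run;

/* NOPRINT suppresses displayed output so that I/O does not mask the effect */
proc princomp data=_junky noprint;
   var x:;
run;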

Monday, January 30, 2012

Random Number Seeds: NOT only the first one matters!


Today, Rick (blog here) wrote an article about the seeds used by the random number functions in the SAS DATA step. Rick noticed that when multiple random number functions are called with different seeds, only the first seed matters.

This is true. In fact, the SAS manual has a comprehensive discussion of this issue, namely how to control the seed for each iteration of the random number generation process and how to generate multiple statistically independent streams of random numbers (see here). sasCommunity.org also has an article on this issue (see here).

To echo Rick's post, there is a way to control the seeds so that NOT only the first one matters: use the CALL routines, which in theory generate computationally independent random number sequences. But if two seeds are too close, the generated sequences may not be statistically independent. Again, refer to the SAS manual for details.
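The difference between the two styles can be modeled with a toy generator. This Python sketch assumes a prime-modulus multiplicative congruential generator; the multiplier 397204094 and modulus 2**31 - 1 are, as far as I know, the values the SAS documentation gives for RANUNI, but treat them as illustrative, since this is not SAS's implementation. It draws uniforms rather than normals for simplicity. All names (`call_ranuni`, `call_stream`, `FunctionStyleGenerator`) are hypothetical. The function style keeps one hidden stream initialized by the first seed only; the CALL style lets the caller own each seed. Part 3 shows one concrete way two seeds can be "too close": one seed is where the other's stream will be a few draws later, so the streams overlap.

```python
MODULUS = 2**31 - 1        # prime modulus
MULTIPLIER = 397204094     # multiplier; both constants are illustrative


def next_state(state):
    """One step of a prime-modulus multiplicative congruential generator."""
    return (MULTIPLIER * state) % MODULUS


def call_ranuni(seed):
    """CALL-routine style: the caller owns the seed and receives the updated one."""
    new_seed = next_state(seed)
    return new_seed, new_seed / MODULUS


def call_stream(seed, n):
    """Draw n uniforms from the stream that starts at seed."""
    values = []
    for _ in range(n):
        seed, u = call_ranuni(seed)
        values.append(u)
    return values


class FunctionStyleGenerator:
    """Function style: one hidden stream, initialized by the first seed only."""

    def __init__(self):
        self.state = None

    def ranuni(self, seed):
        if self.state is None:  # the seed argument is used on the first call only
            self.state = seed
        self.state = next_state(self.state)
        return self.state / MODULUS


if __name__ == "__main__":
    # 1. Function style: the second seed (22222) is ignored, so the two
    #    "variables" just interleave values from the stream started at 11111.
    gen = FunctionStyleGenerator()
    interleaved = []
    for _ in range(3):
        interleaved.append(gen.ranuni(11111))
        interleaved.append(gen.ranuni(22222))
    print(interleaved == call_stream(11111, 6))  # True

    # 2. CALL style: each seed variable drives its own stream.
    print(call_stream(11111, 5) == call_stream(22222, 5))  # False

    # 3. Seeds that are "too close": seed2 is where seed1's stream will be
    #    after 3 draws, so stream 2 is stream 1 shifted by 3 draws.
    seed2 = 11111
    for _ in range(3):
        seed2 = next_state(seed2)
    print(call_stream(seed2, 7) == call_stream(11111, 10)[3:])  # True
```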




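data normal;
   seed1 = 11111;
   seed2 = 22222;
   seed3 = 383333;
   do i = 1 to 1000;
      /* CALL routines: each seed variable carries its own stream */
      call rannor(seed1, x1);
      call rannor(seed2, x2);
      call rannor(seed3, x3);
      /* Functions: as Rick observed, only the first seed matters, */
      /* so x4 and x5 do not get separately seeded streams         */
      x4 = rannor(seed2);
      x5 = rannor(seed3);
      output;
   end;
run;

/* Scatter-plot matrix to compare the five variables */
proc sgscatter data=normal;
   matrix x1-x5 / markerattrs=(size=1);
run;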