Friday, March 13, 2009

Parallel Processing with Single-Thread SAS license

[NESUG 22]

SAS now supports multi-threaded, multi-processor execution in various PROCs. To fully leverage the multi-processor speedup, however, you need either a server version of SAS with multi-thread support or the SAS/MP Connect module, either of which is very expensive. Is there a way around this with only Base SAS, SAS/STAT, and SAS/ETS?

While it cannot exploit multiple CPUs and multiple threads in every job, the method introduced here works well for some computing-intensive tasks that are popular in modeling and analysis, such as bootstrapping, cross-validation, and simulation. These methods run almost identical computations on multiple independent replicate samples or subsamples, so we can decompose them into multiple independent pieces of computation.

We present two simple bootstrapping demos here, using the data and program from SAS Documentation for PROC NLMIXED example 51.1.

The first method uses the X command to execute batch jobs in DOS.



The main program calls the batch jobs with the X command/statement in concurrent mode, where multiple SAS jobs can access the same SAS data set. To tell the pieces of the computing job apart, the -SYSPARM system option is used to pass the relevant information to each job, which reads it through the automatic macro variable &SYSPARM. Of course, the data sets being operated on must be stored in one or more permanent libraries so that they are accessible to all batch jobs.
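
A minimal sketch of such a main program, assuming a Windows session in which `sas` resolves on the PATH. The folder `C:\boot`, the program name `bootjob.sas`, and the log names are hypothetical placeholders, not the original demo:

```sas
/* Main session: launch two batch SAS jobs without waiting for either.   */
/* NOXWAIT closes the command window automatically; NOXSYNC returns      */
/* control to this session while the command is still running.           */
options noxwait noxsync;

/* Each job runs the same program; -sysparm tells it which piece it is.  */
/* Paths and file names are hypothetical.                                */
x 'sas -sysin C:\boot\bootjob.sas -log C:\boot\job1.log -sysparm 1 -nosplash';
x 'sas -sysin C:\boot\bootjob.sas -log C:\boot\job2.log -sysparm 2 -nosplash';
```

Because NOXSYNC is in effect, both X statements return immediately and the two batch sessions run side by side.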




The macro is executed in batch mode. The macro variables that distinguish the jobs are taken from the automatic macro variable SYSPARM, which is set by the -SYSPARM option. SYSJOBID can also be used for the same purpose. SAS Institute Inc. provides a comprehensive tutorial [2] on how to set up batch jobs and execute them in SAS, and Jackson explained how to use -SYSPARM in batch mode [1]. Kosian called this method "virtual multi-threading" [5].
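
The batch-side program might look roughly like the sketch below. It assumes that -SYSPARM carries a small integer job number; the library `boot`, the data set `boot.mydata`, the macro name `bootjob`, and the output data set names are all hypothetical, and the model-fitting step is left as a comment rather than reproducing the NLMIXED example:

```sas
/* bootjob.sas (hypothetical name): every batch job runs this program.   */
libname boot 'C:\boot';      /* hypothetical permanent library shared by all jobs */

%let jobid = &sysparm;       /* job number passed on the command line via -sysparm */

%macro bootjob(jobid=, nrep=100);
   /* Draw &nrep bootstrap resamples (with replacement, same size as the  */
   /* input). The seed depends on the job number so that different jobs   */
   /* do not draw identical resamples.                                    */
   proc surveyselect data=boot.mydata out=boot.sample_&jobid
        method=urs samprate=1 outhits reps=&nrep
        seed=%eval(1000 + &jobid) noprint;
   run;

   /* Fit the model BY Replicate here and save the estimates under a      */
   /* job-specific name, for example boot.est_&jobid (hypothetical).      */
%mend bootjob;

%bootjob(jobid=&jobid)
```

Writing each job's output to its own data set avoids two sessions updating the same data set at the same time.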

This method, however, has a major drawback: it is hard to control the batch jobs and check their status. With multi-stage jobs, the code can easily become very ugly.

Suppose we want to do further analysis based on some or all of the previous jobs' results, and the execution times of these jobs vary. The main SAS program then has to wait for all the needed jobs to complete before it can continue. Batch execution through the X command gives the current SAS session no easy way to check whether the batch jobs have finished correctly. Note that we cannot simply run them one after another and wait, since that would give up the parallel speedup. One possible workaround is to have each batch job write to a text file when it finishes, and to have a follow-up macro in the current SAS session use a DATA _NULL_ step to check this file every, say, 2 minutes. The delay can be implemented with a sleep function in the DATA _NULL_ step. Once the completion codes of all needed jobs have been read from the text file, the subsequent SAS jobs can be executed.
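
The polling workaround could be sketched as follows. The flag file `C:\boot\done.txt` and the macro name `wait_for_jobs` are hypothetical. The sketch simply counts the lines in the flag file, has no timeout, and does not guard against two jobs writing to the file at the same instant, so treat it as an illustration only:

```sas
/* In each batch job, as its last step: append one line to the flag file. */
data _null_;
   file "C:\boot\done.txt" mod;      /* MOD appends instead of overwriting */
   put "job &sysparm finished";
run;

/* In the main session: loop until the flag file holds &njobs lines.      */
%macro wait_for_jobs(flagfile=, njobs=, interval=120);
   %local ndone;
   %let ndone = 0;
   %do %while (&ndone < &njobs);
      %if %sysfunc(fileexist(&flagfile)) %then %do;
         data _null_;
            infile "&flagfile" end=eof;
            input;                               /* read one line, keep nothing */
            if eof then call symputx('ndone', _n_);   /* number of lines read   */
         run;
      %end;
      %if &ndone < &njobs %then %do;
         data _null_;
            call sleep(&interval, 1);            /* pause; unit 1 = seconds     */
         run;
      %end;
   %end;
%mend wait_for_jobs;

%wait_for_jobs(flagfile=C:\boot\done.txt, njobs=2, interval=120)
```

If a batch job fails before writing its line, this loop never ends, which is one more reason the approach is awkward.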

Obviously this is very inconvenient. Fortunately, SAS provides a unified production framework that helps us accomplish this task [3].

In method 2, we use the SYSTASK COMMAND statement to execute SAS jobs. The syntax is:

SYSTASK COMMAND "operating-system-command"
<WAIT | NOWAIT>
<TASKNAME=taskname>
<MNAME=name-var>
<STATUS=stat-var>
<SHELL<="shell-command">>;

Consult the SAS documentation for detailed information.

The most relevant options are TASKNAME and STATUS. TASKNAME specifies a name that identifies the task; this name is then used with the WAITFOR statement to pause the current SAS session until the tasks have finished. STATUS specifies a macro variable in which you want SYSTASK to store the status of the task. A sample demo is as follows:
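
A minimal sketch of this pattern, again assuming that `sas` resolves on the PATH. The program `bootjob.sas`, the library `boot`, and the data sets `boot.est_1` and `boot.est_2` are hypothetical names standing in for each job's saved results:

```sas
libname boot 'C:\boot';      /* hypothetical permanent library shared by all jobs */

/* Start two batch jobs asynchronously; each gets a task name and a      */
/* macro variable that receives its status when it ends.                 */
systask command "sas -sysin C:\boot\bootjob.sas -log C:\boot\job1.log -sysparm 1 -nosplash"
        nowait taskname=job1 status=rc1;
systask command "sas -sysin C:\boot\bootjob.sas -log C:\boot\job2.log -sysparm 2 -nosplash"
        nowait taskname=job2 status=rc2;

/* Pause this session until both named tasks have finished.              */
waitfor _all_ job1 job2;

%put NOTE: job1 status=&rc1, job2 status=&rc2;

/* Combine the pieces once both jobs are done.                           */
data boot.all_est;
   set boot.est_1 boot.est_2;
run;
```

Checking the status macro variables before combining the results is the part that the X command approach makes difficult.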



Note that this method can handle most of the multi-processor jobs that SAS/MP Connect was specifically built for; SAS/MP Connect likewise requires the parallelized jobs to be independent. Bentley [4] showed several typical production jobs that can be sped up with parallel computing, but relied on the expensive SAS/MP Connect software.

In conclusion, we showed that with a bit more coding and careful design of the job flow, we can use the multi-core capability of modern computers without resorting to more expensive SAS modules.

For comparison, check the CPU usage statistics when using and not using the method proposed here.
=> Single Thread PROC NLMIXED;


=> Make It Two Threads


--Ref:
1. Jackson, Clarence Wm., "Using SAS® SYSPARM And Other Automatic System Variables", NOTSUG May 2009.
2. SAS Technical Notes TS-648, "Examples of Batch Processing under Windows", SAS Institute Inc.
3. Cogswell, Denis, "More Than Batch – A Production SAS® Framework", SUGI 30, Applications Development.
4. Bentley, John E., "SAS Multi-Process Connect: What, When, Where, How, and Why", SUGI 26, Systems Architecture.
5. Kosian, Sassoon, "A Generic Method of Parallel Processing in Base SAS® 8 and 9", SUGI 27, Coders' Corner.

====================================================
FYI:
Federal Government contract SAS license and renewal fee comparison between 1 PC user and multi users [Public Information]. Now, you save something nontrivial.