* Update: The SAS code and associated paper are now published at Journal of Statistical Software .
========================================================================
PROC GLIMMIX is good tool for generalized linear mixed model (GLMM), when the scale is small to medium. When facing a large scale GLMM, such as modeling all ZIPs nested in Counties nested in all 51 States in US, a 64-bit machine with extremely large memory is required and the computing may last for months! In fact, more often than not, the modeler will encounter situation where PROC GLIMMIX reports in log:
ERROR: Integer overflow on computing amount of memory required.
NOTE: The SAS System stopped processing this step because of insufficient memory.
In a strictly nested hierarchical model, however, the variance covariance matrix is very sparse, and taking advantage of this property can accelerate computing by many folds.
The %HPGLIMMIX SAS macro is made for large scale Hierarchical Mixed Models, and stands for High Performance GLIMMIX. As an example, a sample data using Gamma Regression is shown below, with all ZIPs in AK, AL, AR, AZ with 2-level hierarchies: State and ZIP within State, total 4 blocks with max 693 columns per block. The reason not all ZIPs and all states are used is simply because PROC GLIMMI blows up on the machine.
Copmaring the estimates and std errors from both runs, they are the same, but drastically different running time of 71sec using %HPGLIMMIX v.s. 35min39sec using GLIMMIX.
GLIMMIX output:
========================================================================
PROC GLIMMIX is good tool for generalized linear mixed model (GLMM), when the scale is small to medium. When facing a large scale GLMM, such as modeling all ZIPs nested in Counties nested in all 51 States in US, a 64-bit machine with extremely large memory is required and the computing may last for months! In fact, more often than not, the modeler will encounter situation where PROC GLIMMIX reports in log:
ERROR: Integer overflow on computing amount of memory required.
NOTE: The SAS System stopped processing this step because of insufficient memory.
In a strictly nested hierarchical model, however, the variance covariance matrix is very sparse, and taking advantage of this property can accelerate computing by many folds.
The %HPGLIMMIX SAS macro is made for large scale Hierarchical Mixed Models, and stands for High Performance GLIMMIX. As an example, a sample data using Gamma Regression is shown below, with all ZIPs in AK, AL, AR, AZ with 2-level hierarchies: State and ZIP within State, total 4 blocks with max 693 columns per block. The reason not all ZIPs and all states are used is simply because PROC GLIMMI blows up on the machine.
Copmaring the estimates and std errors from both runs, they are the same, but drastically different running time of 71sec using %HPGLIMMIX v.s. 35min39sec using GLIMMIX.
1715
1716 options nomprint nomlogic;
1717 %hpglimmix(data=temp2,
1718 stmts=%str(
1719 class zip zip_state;
1720 model y = x ;
1721 random int zip/subject=zip_state;
1722 ),
1723 error=gamma,
1724 link=LOG,
1725 options=NOTEST);
NOTEST
The HPGLIMMIX Macro
Data Set : WORK.TEMP2
Error Distribution : GAMMA
Link Function : LOG
Response Variable : Y
Job Starts at : 06JUN2011:15:51:19
HPGLIMMIX Iteration History
Iteration Convergence criterion
1 0.0081058432 13 sec
2 0.0004213646 13 sec
3 2.7137935E-7 13 sec
4 3.1854799E-9 12 sec
Output from final Proc HPMixed run:
Job Ends at : 06JUN2011:15:52:30
1726 options nomprint nomlogic;
1727
1728 proc glimmix data=temp2;
1729 class zip zip_state;
1730 model y = x /s dist=gamma;
1731 random int zip /subject=zip_state;
1732 run;
NOTE: Convergence criterion (PCONV=1.11022E-8) satisfied.
NOTE: PROCEDURE GLIMMIX used (Total process time):
real time 35:38.93
cpu time 34:30.90
GLIMMIX output:
Covariance Parameter Estimates
Standard
Cov Parm Subject Estimate Error
Intercept zip_state 0.000152 0.000125
zip zip_state 0.000066 3.105E-6
Residual 0.000405 2.281E-6
Solutions for Fixed Effects
Standard
Effect Estimate Error DF t Value Pr > |t|
Intercept 6.5873 0.006180 3 1065.95 <.0001
x 0.003436 0.000218 62634 15.79 <.0001
%HPGLIMMIX output:
_cov data from HPGLIMMIX
Obs CovParm Subject Estimate
1 Intercept zip_state 0.000152
2 zip zip_state 0.000066
3 Residual 0.000405
_soln data from HPGLIMMIX
Obs Effect Estimate StdErr
1 Intercept 6.5873 0.006180
2 x 0.003436 0.000218
4 comments:
Cool!
Where can I download this macro? I mean get the source code for this macro?
How to get the source code for %Hpglimmix macro?
Right now it is not open to public.
Post a Comment