Thursday, September 30, 2010

Low Rank Radial Smoothing using GLIMMIX and its Scoring

Low rank radial smoothing in PROC GLIMMIX [1] is a semiparametric approach to smoothing curves [2]. By specifying the TYPE=RSMOOTH option in the RANDOM statement, we can implement this spline smoothing approach. The best part is that data preparation for future scoring is extremely easy: using the OUTDESIGN= and NOFIT options of the v9.2 PROC GLIMMIX, we can generate the design matrix for new data, then run PROC SCORE twice on this design matrix to score the fixed effects design matrix X and the random effects design matrix Z, respectively; adding the two scores together gives the prediction from this radial smoothing method.

[Coming soon]




proc glimmix data=train_data absconv=0.005;
     model y = &covars / s;
     random &z / s type=rsmooth knotmethod=equal(20);
     /* save the fixed and random effect solutions for later scoring */
     ods output ParameterEstimates=pe SolutionR=sr;
run;
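Before scoring, the fixed and random effect solutions must be arranged into TYPE=PARMS scoring data sets for PROC SCORE (here called beta_fix and beta_random). A minimal sketch, under several assumptions: the solutions were captured with ODS OUTPUT into data sets named pe and sr, the random effects design columns in the OUTDESIGN= data set are named _Z1, _Z2, ..., there is one random effect per knot (20 here), and the _NAME_ values yhat_fix and yhat_random are hypothetical labels that become the names of the scored variables.

```sas
%let nknots = 20;   /* matches knotmethod=equal(20) above */

/* fixed effects: one column per effect, named from the Effect values */
proc transpose data=pe out=beta_fix;
     id Effect;
     var Estimate;
run;
data beta_fix;
     set beta_fix;
     _type_ = 'PARMS';
     _name_ = 'yhat_fix';    /* hypothetical name for the scored variable */
run;

/* random effects: one column per knot, renamed to match _Z1-_Z&nknots */
proc transpose data=sr out=beta_random;
     var Estimate;
run;
data beta_random;
     set beta_random(rename=(Col1-Col&nknots=_Z1-_Z&nknots));
     _type_ = 'PARMS';
     _name_ = 'yhat_random';  /* hypothetical name for the scored variable */
run;
```

The exact ODS table layouts should be checked against the GLIMMIX output before relying on this sketch.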

/* NOFIT: build the design matrix for the test data without refitting the model */
proc glimmix data=test nofit outdesign=test2;
     model y = &covars / s;
     random &z / s type=rsmooth knotmethod=equal(20);
run;


/* score the fixed effects design matrix X */
proc score data=test2 score=beta_fix type=parms out=score_fix;
     var &covars;
run;

/* score the random effects design matrix Z (columns _Z1, _Z2, ...) */
proc score data=test2 score=beta_random type=parms out=score_random;
     var _z:;
run;
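The final step described above, adding the two partial scores together, can be sketched as follows. The variable names yhat_fix and yhat_random are assumptions: PROC SCORE names each scored variable after the _NAME_ value of the corresponding coefficient row in the SCORE= data set, so these must match whatever names were used when beta_fix and beta_random were built.

```sas
/* add the X score and the Z score to get the radial smoothing prediction */
data score_all;
     merge score_fix score_random(keep=yhat_random);
     yhat = yhat_fix + yhat_random;
run;
```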




Reference:

1. SAS Institute Inc., Statistical Analysis with the GLIMMIX Procedure Course Notes, SAS Institute Inc., Cary, NC.
2. D. Ruppert, M.P. Wand, and R.J. Carroll, Semiparametric Regression, Cambridge University Press, Cambridge, 2003.


Generalized Discriminant Analysis

Using SAS to implement Flexible Discriminant Analysis [Chapter 12, ESL].

Since discriminant analysis is equivalent to a two-step process, i.e., regress first and then conduct discriminant analysis on the fitted values, it is easy to implement the so-called generalized discriminant analysis shown in Ch. 12.4--12.6 of The Elements of Statistical Learning. The basic idea is to use a regression procedure, such as PROC REG with the RIDGE= option in the MODEL statement for ridge regression, and then use the predictions from this L2-regularized regression in the next step's discriminant analysis. With PROC GLMSELECT, we can replace the L2 regularization with an L1 regularization.

A piece of prototype code looks like this:

PROC GLMMOD DATA=&yourdata OUTDESIGN=&design;
     CLASS &dep_var;
     MODEL x = &dep_var / NOINT;  /* x can be any numeric variable; only the dummy-coded design matrix is needed */
RUN;

data &yourdata;
     merge &yourdata &design;
     rename Col1-Col&k = Y1-Y&k;  /* &k = number of classes */
run;

%let deps = Y1-Y&k;  /* e.g., Y1-Y5 for a 5-class problem */
PROC REG DATA=&yourdata RIDGE=&minridge TO &maxridge BY 0.1 OUTEST=beta;
     MODEL &deps = &covars;
     OUTPUT OUT=predicted PRED=Yhat1-Yhat&k;  /* one predicted value per class indicator */
RUN;

PROC DISCRIM DATA=predicted &options;
     CLASS &dep_var;
     VAR Yhat1-Yhat&k;
RUN;
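One caveat: the OUTPUT statement in PROC REG returns ordinary least squares predictions, while the ridge estimates live in the OUTEST= data set, stored one row per ridge parameter with _TYPE_='RIDGE' and the penalty value in _RIDGE_. A hedged sketch of scoring with the ridge coefficients at one chosen penalty; &ridgeval is a hypothetical macro variable holding that value, and the naming of the scored variables depends on the OUTEST= layout and should be verified.

```sas
/* pull the ridge coefficient rows for one ridge parameter value */
data beta_ridge;
     set beta;
     where _type_ = 'RIDGE' and _ridge_ = &ridgeval;  /* &ridgeval is hypothetical */
     _type_ = 'PARMS';  /* let PROC SCORE treat the rows as regression parameters */
run;

proc score data=&yourdata score=beta_ridge type=parms out=pred_ridge;
     var &covars;
run;
```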


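The L1 variant mentioned above can be sketched with PROC GLMSELECT. Since GLMSELECT fits one response at a time, we loop over the class indicators; this is a sketch under the naming assumptions of the prototype code (Y1-Y&k, &covars, &k), with SELECTION=LASSO and CHOOSE=CV picking the penalty by cross validation.

```sas
/* L1-regularized first stage: one LASSO fit per class indicator */
%macro lasso_stage;
  %do i = 1 %to &k;
    proc glmselect data=&yourdata;
         model Y&i = &covars / selection=lasso(choose=cv stop=none);
         output out=&yourdata p=Yhat&i;  /* accumulate predictions on the same data set */
    run;
  %end;
%mend lasso_stage;
%lasso_stage;
```

The resulting Yhat1-Yhat&k can then be fed to PROC DISCRIM exactly as in the ridge version.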

Reference: T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition, Springer Series in Statistics, Springer, New York, 2009.