Thursday, September 30, 2010

Generalized Discriminant Analysis

Using SAS to implement Flexible Discriminant Analysis [Chapter 12, ESL].

Linear discriminant analysis is equivalent to a two-step process: first regress the class indicator variables on the predictors, then run the discriminant analysis on the fitted values. This makes it easy to implement the so-called generalized discriminant analysis described in Sections 12.4 to 12.6 of The Elements of Statistical Learning. The basic idea is to swap the first step for a regularized regression, for example ridge regression via PROC REG and its RIDGE= option (accepted on either the PROC REG or the MODEL statement), and then feed the predictions from the L2-regularized regression into the discriminant analysis in the second step. With PROC GLMSELECT, the L2 regularization can be replaced by an L1 (lasso) regularization.
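To make the two steps concrete, here is a rough sketch of the regression step in matrix form. The notation is my own assumption, not taken from the book: X is the N x p centered predictor matrix, Y is the N x K matrix of class indicators, and lambda is the ridge constant.

```latex
\begin{align*}
% Ridge (L2) step: fitted indicator matrix; \lambda = 0 gives ordinary least squares
\hat{Y} &= X\,(X^\top X + \lambda I)^{-1} X^\top Y, \qquad \lambda \ge 0 \\
% Lasso (L1) step, fitted one indicator column y_k at a time
\hat{\beta}_k &= \arg\min_{\beta}\; \tfrac{1}{2}\,\lVert y_k - X\beta \rVert_2^2
                 + \lambda\,\lVert \beta \rVert_1,
\qquad \hat{y}_k = X\hat{\beta}_k
\end{align*}
```

In either case the discriminant analysis of the second step takes the columns of the fitted matrix as its input variables, in place of the original predictors.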

A piece of prototype code looks like this:

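/* Step 0: expand the class variable into k indicator columns Col1-Col&k.  */
/* X stands for any numeric variable in &yourdata; GLMMOD needs a response */
/* only in order to build the design matrix.                               */
PROC GLMMOD DATA=&yourdata OUTDESIGN=&design;
     CLASS &dep_var;
     MODEL X = &dep_var / NOINT;
RUN;
DATA &yourdata;
     MERGE &yourdata &design;          /* one-to-one merge, no BY statement */
     RENAME Col1-Col&k = Y1-Y&k;
RUN;
%LET deps = Y1-Y&k;   /* &k = number of classes, e.g. 5 for a 5-class problem */

/* Step 1: ridge regression of the indicators on the covariates.           */
/* The ridge coefficients are written to OUTEST=beta (_TYPE_='RIDGE').     */
/* Check whether the OUTPUT predictions reflect the ridge fit in your SAS  */
/* release; if not, score the data with the coefficients in beta instead.  */
PROC REG DATA=&yourdata RIDGE=&minridge TO &maxridge BY 0.1 OUTEST=beta;
     MODEL &deps = &covars;
     OUTPUT OUT=predicted PRED=YHAT1-YHAT&k;   /* one name per response */
RUN;
QUIT;

/* Step 2: discriminant analysis on the fitted values. */
PROC DISCRIM DATA=predicted &options;
     CLASS &dep_var;
     VAR YHAT1-YHAT&k;
RUN;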
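For the L1 variant, a minimal sketch could look like the following. It is an untested outline under a few assumptions: the indicator columns Y1-Y&k already exist in &yourdata, the macro variables &k, &covars, &options and &dep_var (the class variable) are defined, and the macro name lasso_step, the data set names pred_0 to pred_&k, the prediction names LHAT1-LHAT&k, the seed and the 5-fold cross validation are all my own illustrative choices.

```sas
/* Sketch only: lasso regression step followed by discriminant analysis. */
%macro lasso_step;
    /* start from a copy so the original data set is left untouched */
    data pred_0;
        set &yourdata;
    run;
    %do i = 1 %to &k;
        /* PROC GLMSELECT fits one response per run, so loop over the  */
        /* k indicator columns and carry the predictions forward       */
        proc glmselect data=pred_%eval(&i - 1) seed=12345;
            model Y&i = &covars
                  / selection=lasso(choose=cv stop=none)
                    cvmethod=random(5);
            output out=pred_&i pred=LHAT&i;
        run;
    %end;
%mend lasso_step;
%lasso_step;

/* discriminant analysis on the lasso predictions */
proc discrim data=pred_&k &options;
    class &dep_var;
    var LHAT1-LHAT&k;
run;
```

Each pass appends one prediction column, so the last data set, pred_&k, holds all k fitted indicators next to the original variables.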



The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics)
