Wednesday, January 10, 2007

Cartesian Product Self Join using SubQuery

Two datasets
1.
group cat count
1 2 10
1 3 11
2 3 29
2 4 33
3 1 2
3 4 4
3 5 1

Need to transform it to this:
group cat count
1 1
1 2 10
1 3 11
1 4
1 5
2 1
2 2
2 3 29
2 4 33
2 5
3 1 2
3 2
3 3
3 4 4
3 5 1

This is a Catesian Product Self Join. Solving it by Subquery.
data a;
input group cat count;
datalines;
1 2 10
1 3 11
2 3 29
2 4 33
3 1 2
3 4 4
3 5 1
;

proc sql;
create table a1 as select x.group, y.cat
from
(select distinct group from a ) as x,
(select distinct cat from a) as y
;

create table a2 as
select a1.*, a.count
from a right join a1
on a.group=a1.group
and a.cat=a1.cat
;
quit;

This program, however, assumes the all distinct item in the second column will appear. If it is not the case, follow these steps:

extract max(cat)
generate a data set with cat=1...max_cat,
then use the above code

these steps are unavoidable.

Therefore, this more complicated scenarior can be handled by SAS Data step in a more efficient and compact way.

0 comments: