Comments on SAS Programming for Data Mining Applications: Randomly Split SAS table exactly according to a gi...

Posted 2009-06-14T12:13:57.869-05:00:

Someone on SAS-L asked how to split a SAS data set into N pieces given an N-by-1 probability vector, such that the distribution of some given variables (TDSP and VS here) is the same in each of the smaller files as in the original file.

To ensure randomness, I used the K/N algorithm, which is built upon the conditional probability of splitting. SAS-L had some extensive discussion of this method several years ago.

What I did here was extend the K/N algorithm to take care of control variables. Each value combination of the control variables is treated as a stratum, and randomness is achieved within each stratum, since the distribution of the control variables is given by the original data set.

L Xie (http://www.blogger.com/profile/02274752582289554390)

Posted 2009-06-14T07:03:54.170-05:00:

Hi,

I was wondering whether you could elaborate on what this code is intended to accomplish.

I gather there are two inputs. One is a probability vector, which I take to be a vector of non-negative weights which sum to one. The other input is a SAS table.

What I don't understand is what the desired outcome is.
What does it mean to split a table according to a vector?

For instance, if the weights were .1, .2, .3, and .4, and the table consists of 1000 records each containing a single integer between 1 and 1000, no duplicates, what is the expected outcome?

Thank you.

drcabana (http://www.blogger.com/profile/11725842337325955196)
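
The question above has a concrete reading under the K/N approach described in the first comment: with weights .1, .2, .3, and .4 and 1000 records, the four pieces would hold exactly 100, 200, 300, and 400 records, and which records land in which piece is random. The following is a minimal Python sketch of that idea, not the blog post's SAS code. The names `quotas` and `split_exact`, the `key` argument standing in for the control variables, and the largest-remainder rounding of piece sizes are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def quotas(n, probs):
    """Integer piece sizes that sum to n and stay close to n * p.

    Assumption: leftover records go to the largest fractional parts
    (largest-remainder rounding); probs is assumed to sum to one.
    """
    raw = [n * p for p in probs]
    sizes = [int(x) for x in raw]
    leftover = n - sum(sizes)
    by_fraction = sorted(range(len(probs)),
                         key=lambda j: raw[j] - sizes[j], reverse=True)
    for j in by_fraction[:leftover]:
        sizes[j] += 1
    return sizes


def split_exact(records, probs, key=lambda r: None, seed=None):
    """Split records into len(probs) pieces with exact sizes per stratum.

    key(record) is a hypothetical stand-in for the control variables;
    records sharing a key value form one stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for r in records:
        strata[key(r)].append(r)

    pieces = [[] for _ in probs]
    for members in strata.values():
        remaining = quotas(len(members), probs)  # K for each piece
        left = len(members)                      # N still unassigned
        for r in members:
            # K/N step: piece j is chosen with probability remaining[j] / left.
            u = rng.randrange(left)
            j = 0
            while u >= remaining[j]:
                u -= remaining[j]
                j += 1
            pieces[j].append(r)
            remaining[j] -= 1
            left -= 1
    return pieces


# The example from the question: 1000 distinct integers, weights .1 .2 .3 .4
table = list(range(1, 1001))
parts = split_exact(table, [0.1, 0.2, 0.3, 0.4], seed=1)
print([len(p) for p in parts])  # [100, 200, 300, 400]
```

Because each record is assigned with probability equal to the remaining quota over the remaining count, the piece sizes come out exact rather than approximately proportional. Passing a `key` that returns the control-variable values applies the same procedure within each stratum.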