Another powerful SAS procedure, and my favorite one, is PROC IML (Interactive Matrix Language). This procedure treats every object as a matrix and is very useful for scientific computations involving vectors and matrices. To get started, we are going to demonstrate and discuss the following:
With the help of the comments in the code, it shouldn't be difficult to see what each line does, so I will only explain line 33. In SAS/IML, the variables you define are not automatically kept in storage; you must store them first and then load the storage to use them in other procedure calls, which we'll see in the next entry, Matrix Query. The functions we'll discuss under Matrix Query extract the number of columns, the number of rows, and so on. Below is the sample code:
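A minimal sketch of the store, load, and query steps described here, not the original listing. The matrix names are taken from the output tables in this post; the choice of functions (`nrow`, `ncol`, `dimension`, `length`, `nleng`, `type`) is my assumption about what produces that output.

```sas
proc iml;
  num_mat = {1 2 3, 4 5 6};
  chr_mat = {"Hello,", "world! :D"};
  store num_mat chr_mat;            /* save both matrices to the storage library */
quit;

proc iml;
  load num_mat chr_mat;             /* bring the stored matrices back */

  nmat_row  = nrow(num_mat);        /* number of rows                    */
  nmat_col  = ncol(num_mat);        /* number of columns                 */
  nmat_dim  = dimension(num_mat);   /* rows and columns as a 1 x 2 vector */
  cmat_len  = length(chr_mat);      /* length of each string             */
  cmat_nlen = nleng(chr_mat);       /* storage size of each element      */
  nmat_typ  = type(num_mat);        /* 'N' for numeric                   */
  cmat_typ  = type(chr_mat);        /* 'C' for character                 */

  print nmat_row, nmat_col, nmat_dim, cmat_len, cmat_nlen,
        nmat_typ, cmat_typ;
quit;
```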
So to load all variables stored in the workspace, we use line 3. The succeeding lines are not that difficult to understand, and this is what I love about SAS: the statements and functions are self-explanatory, which is a good excuse for us to proceed to subscripting on matrices. Below is the code:
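As a rough sketch of basic subscripting (not the original listing), the block below defines its matrices inline rather than loading them, so that it runs on its own. The result names follow the output tables in this post.

```sas
proc iml;
  num_mat = {1 2 3, 4 5 6};
  i_mat   = I(6);

  n22_mat  = num_mat[2, 2];   /* single element: row 2, column 2 */
  nr1_mat  = num_mat[1, ];    /* entire first row                */
  ir12_mat = i_mat[1:2, ];    /* rows 1 through 2, all columns   */
  ic12_mat = i_mat[, 1:2];    /* all rows, columns 1 through 2   */

  print n22_mat, nr1_mat, ir12_mat, ic12_mat;
quit;
```

Leaving a subscript position empty selects every row or every column in that position.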
Line 17 computes the grand mean of the matrix by simply inserting the `:` symbol inside the placeholder of the subscript.
Now let's proceed to the next bullet, which is about Descriptive Statistics.
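A short sketch of the basic summary functions, assuming the same `row_vec` and `num_mat` used earlier in this post. The result names mirror some of the output tables, and the function choices (`cusum`, `min`, `max`, `sum`, `ssq`) are my assumption.

```sas
proc iml;
  row_vec = {1 2 3 4 5 6};
  num_mat = {1 2 3, 4 5 6};

  csr_vec = cusum(row_vec);   /* cumulative sum, same shape as the input */
  csn_mat = cusum(num_mat);   /* accumulates in row-major order          */
  mnr_vec = min(row_vec);     /* smallest element                        */
  mxn_mat = max(num_mat);     /* largest element                         */
  smr_vec = sum(row_vec);     /* sum of all elements                     */
  ssn_mat = ssq(num_mat);     /* sum of squares of all elements          */

  print csr_vec, csn_mat, mnr_vec, mxn_mat, smr_vec, ssn_mat;
quit;
```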
To generate random numbers from, say, a normal distribution and compute the mean, standard deviation, and other statistics, consider the following:
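A minimal sketch of this workflow, not the original listing. The seed value and the parameters of the second sample (mean 50, standard deviation 5) are assumptions for illustration only.

```sas
proc iml;
  call randseed(12345);                 /* seed value is an assumption */

  /* allocate a 20 x 1 matrix that RANDGEN will fill in place */
  x1 = j(20, 1);

  /* fill x1 with standard normal draws */
  call randgen(x1, "Normal");

  x2 = j(20, 1);
  call randgen(x2, "Normal", 50, 5);    /* assumed mean 50, sd 5 */

  x12 = x1 || x2;                       /* 20 x 2 matrix */

  x12_cor = corr(x12);                  /* correlation matrix      */
  x12_cov = cov(x12);                   /* covariance matrix       */
  x1_mu   = mean(x1);                   /* sample mean of x1       */
  x2_std  = std(x2);                    /* sample std. dev. of x2  */

  print x12_cor, x12_cov, x1_mu, x2_std;
quit;
```

Because the seed and parameters are assumed, the numbers printed by this sketch will differ from the output shown in this post.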
Line 2 above sets the initial seed for the random numbers generated in line 8. Line 5 allocates a matrix of dimension 20 by 1 to the `x1` variable.
SAS can also perform set operations, and it's easy. Consider the following:
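As a sketch, the set functions can be tried on two small character sets. The contents of `A` and `B` below are assumptions, chosen to be consistent with the output tables in this post.

```sas
proc iml;
  A = {"a", "x", "i", "o", "m"};        /* assumed example set */
  B = {"t", "h", "e", "o", "r", "y"};   /* assumed example set */

  AuB    = union(A, B);    /* sorted elements found in A or B   */
  AnB    = xsect(A, B);    /* elements common to A and B        */
  B_comp = setdif(A, B);   /* elements of A that are not in B   */
  A_comp = setdif(B, A);   /* elements of B that are not in A   */
  AB_unq = unique(A, B);   /* sorted unique elements of A and B */

  print B_comp, A_comp, AuB, AnB, AB_unq;
quit;
```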
The next bullet is all about Probability Functions and Subroutine. For example, consider an experiment defined by the random variable $X$, which follows an exponential distribution with mean $\beta = .5$. What is the probability that $X$ is at most 2, $\mathrm{P}(X\leq 2)$? To solve this we use the `CDF` function.
If we take the derivative of the Cumulative Distribution Function (CDF), the resulting expression is what we call the Probability Density Function (PDF). In SAS, we work with this using the `PDF` function.
To end this topic, consider the inverse of the CDF, which is the quantile function. To compute the quantile at the popular level of significance $\alpha = 0.05$ from a standard normal distribution, which is $z_{\alpha} = -1.645$ for the lower tail, run
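A one-line sketch of that computation, assuming the `quantile` function with its default standard normal parameters:

```sas
proc iml;
  alpha = 0.05;
  z_a   = quantile("Normal", alpha);   /* mean 0 and sd 1 by default */
  print z_a;
quit;
```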
The next entry is about Linear Algebra, the topic on which this procedure is based. Linear algebra is very useful in Statistics, especially in Regression, Nonlinear Regression, and Multivariate Analysis. To perform these computations in SAS, consider
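A hedged sketch of the usual linear algebra calls. The matrix `xm` is my assumption, chosen because its inverse agrees with the `xm_inv` output shown in this post. The linear system solved at the end (`lhs`, `rhs`) is purely hypothetical and is not the system behind the `x_coef` output.

```sas
proc iml;
  xm = {1 2 3,
        2 4 5,
        3 5 6};              /* assumed symmetric example matrix */

  xm_det = det(xm);          /* determinant  */
  xm_inv = inv(xm);          /* inverse      */
  x_evl  = eigval(xm);       /* eigenvalues  */
  x_evc  = eigvec(xm);       /* eigenvectors */

  /* hypothetical system: 2x + y = 3 and x + 3y = 5 */
  lhs = {2 1,
         1 3};
  rhs = {3, 5};
  x_coef = solve(lhs, rhs);  /* solves lhs * x_coef = rhs */

  print xm_det, xm_inv, x_evl, x_evc, x_coef;
quit;
```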
Finally, one of the coolest capabilities of SAS/IML is to Read and Create SAS Data. The following code demonstrates how to read a SAS data set.
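A minimal sketch of reading a data set into IML matrices. It assumes the `sashelp.cars` sample data set, which is consistent with the `x_dat` and `hp_mean` output in this post; the intermediate names `x_make` and `x_hp` are hypothetical.

```sas
proc iml;
  use sashelp.cars;                           /* open the data set for reading */
    read all var {"Make"}       into x_make;  /* character column              */
    read all var {"Horsepower"} into x_hp;    /* numeric column                */
  close sashelp.cars;

  x_dat   = x_make[1:10];                     /* first ten makes               */
  hp_mean = mean(x_hp);                       /* average horsepower            */

  print x_dat, hp_mean;
quit;
```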
And to create a SAS data set, run
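As a sketch, a matrix can be written out with `create` and `append`. The data set name `out_data` is hypothetical.

```sas
proc iml;
  num_mat = {1 2 3, 4 5 6};

  create out_data from num_mat;   /* columns are named COL1-COL3 by default */
  append from num_mat;            /* write the rows of the matrix           */
  close out_data;
quit;

proc print data=out_data;
run;
```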
To end this post, I want to say that I am loving SAS because of IML. There are still hidden capabilities of this procedure that I would love to explore and share with my readers, so stay tuned. Another great blog about SAS/IML is The DO Loop, whose author, Dr. Rick Wicklin, is also the principal developer of the procedure and of SAS/IML Studio. Do check that out.
- Creating and Shaping Matrices;
- Matrix Query;
- Subscripts;
- Descriptive Statistics;
- Set Operations;
- Probability Functions and Subroutine;
- Linear Algebra;
- Reading and Creating Data;
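For the first bullet, here is a minimal sketch of creating and shaping matrices. It is not the original listing: the matrix names come from the output tables in this post, and the functions used to build them are my assumption.

```sas
proc iml;
  /* literal matrices: spaces separate columns, commas separate rows */
  scalar  = 5;
  row_vec = {1 2 3 4 5 6};
  col_vec = {1, 2, 3, 4, 5, 6};
  num_mat = {1 2 3, 4 5 6};
  chr_mat = {"Hello,", "world! :D"};

  /* matrices built with functions */
  i_mat    = I(6);                  /* 6 x 6 identity matrix       */
  mat_2    = j(3, 1, 2);            /* 3 x 1 matrix filled with 2  */
  trow_vec = t(row_vec);            /* transpose of the row vector */
  mat1     = shape(row_vec, 3, 2);  /* reshape row-wise into 3 x 2 */

  print scalar, row_vec, col_vec, num_mat, chr_mat,
        i_mat, mat_2, trow_vec, mat1;

  show names;   /* list the matrices currently defined          */
  store;        /* save all matrices to the storage library     */
quit;
```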
scalar |
---|
5 |
row_vec | |||||
---|---|---|---|---|---|
1 | 2 | 3 | 4 | 5 | 6 |
col_vec |
---|
1 |
2 |
3 |
4 |
5 |
6 |
num_mat | ||
---|---|---|
1 | 2 | 3 |
4 | 5 | 6 |
chr_mat |
---|
Hello, |
world! :D |
i_mat | |||||
---|---|---|---|---|---|
1 | 0 | 0 | 0 | 0 | 0 |
0 | 1 | 0 | 0 | 0 | 0 |
0 | 0 | 1 | 0 | 0 | 0 |
0 | 0 | 0 | 1 | 0 | 0 |
0 | 0 | 0 | 0 | 1 | 0 |
0 | 0 | 0 | 0 | 0 | 1 |
mat_2 |
---|
2 |
2 |
2 |
trow_vec |
---|
1 |
2 |
3 |
4 |
5 |
6 |
mat1 | |
---|---|
1 | 2 |
3 | 4 |
5 | 6 |
SYMBOL | ROWS | COLS | TYPE | SIZE |
---|---|---|---|---|
CHR_MAT | 2 | 1 | char | 9 |
COL_VEC | 6 | 1 | num | 8 |
I_MAT | 6 | 6 | num | 8 |
MAT1 | 3 | 2 | num | 8 |
MAT_2 | 3 | 1 | num | 8 |
NUM_MAT | 2 | 3 | num | 8 |
ROW_VEC | 1 | 6 | num | 8 |
SCALAR | 1 | 1 | num | 8 |
TROW_VEC | 6 | 1 | num | 8 |

Number of symbols = 10 (includes those without values)
nmat_row |
---|
2 |
nmat_col |
---|
3 |
nmat_dim | |
---|---|
2 | 3 |
cmat_len |
---|
6 |
9 |
cmat_nlen |
---|
9 |
nmat_typ |
---|
N |
cmat_typ |
---|
C |
NUM_MAT | ||
---|---|---|
1 | 2 | 3 |
4 | 5 | 6 |
n22_mat |
---|
5 |
nr1_mat | ||
---|---|---|
1 | 2 | 3 |
ir12_mat | |||||
---|---|---|---|---|---|
1 | 0 | 0 | 0 | 0 | 0 |
0 | 1 | 0 | 0 | 0 | 0 |
ic12_mat | |
---|---|
1 | 0 |
0 | 1 |
0 | 0 |
0 | 0 |
0 | 0 |
0 | 0 |
ngm_mat |
---|
3.5 |
ncm_mat | ||
---|---|---|
2.5 | 3.5 | 4.5 |
nrm_mat |
---|
2 |
5 |
ngs_mat |
---|
21 |
nrs_mat | ||
---|---|---|
17 | 29 | 45 |
ncs_mat |
---|
14 |
77 |
nss_mat |
---|
91 |
nrs_mat | ||
---|---|---|
17 | 29 | 45 |
ncs_mat |
---|
14 |
77 |
With the `:` symbol inside the placeholder of the subscript, as in `num_mat[:, 1]`, the mean is computed over the row entries, giving us the column mean, in this case for the first column. The same goes for `num_mat[1, :]`, where the mean is computed over the column entries, giving us the row mean. If we replace the symbol in the placeholder with `+`, we get the sum of the entries. Further, if we use the `##` symbol, the returned value is the sum of squares of the elements. Reducing this to `#`, the returned value is the product of the elements.
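The subscript reduction operators described above can be sketched as follows. Most result names follow the output tables in this post; `npr_mat` is a hypothetical name for the product.

```sas
proc iml;
  num_mat = {1 2 3, 4 5 6};

  ngm_mat = num_mat[:];      /* grand mean of all elements        */
  ncm_mat = num_mat[:, ];    /* mean over the rows: column means  */
  nrm_mat = num_mat[, :];    /* mean over the columns: row means  */
  ngs_mat = num_mat[+];      /* sum of all elements               */
  nss_mat = num_mat[##];     /* sum of squares of all elements    */
  npr_mat = num_mat[#];      /* product of all elements           */

  print ngm_mat, ncm_mat, nrm_mat, ngs_mat, nss_mat, npr_mat;
quit;
```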
csr_vec | |||||
---|---|---|---|---|---|
1 | 3 | 6 | 10 | 15 | 21 |
csn_mat | ||
---|---|---|
1 | 3 | 6 |
10 | 15 | 21 |
mnr_vec |
---|
1 |
mnn_mat |
---|
1 |
mxr_vec |
---|
6 |
mxn_mat |
---|
6 |
smr_vec |
---|
21 |
smn_mat |
---|
21 |
ssr_vec |
---|
91 |
ssn_mat |
---|
91 |
x1 |
---|
0.2642335 |
1.0747269 |
0.8179241 |
-0.552775 |
1.5401449 |
-1.233822 |
-0.141535 |
1.0420036 |
0.0657322 |
1.225259 |
-0.148304 |
0.2901233 |
-1.149394 |
-0.482548 |
-0.452974 |
0.2738675 |
-0.224133 |
0.218553 |
-0.420015 |
0.246356 |
x2 |
---|
54.993687 |
58.167325 |
59.147705 |
40.74794 |
45.813645 |
53.460273 |
57.877839 |
51.98273 |
49.875743 |
52.570553 |
54.097005 |
46.936325 |
57.509082 |
50.463228 |
42.775346 |
39.376643 |
53.303455 |
54.494482 |
55.747821 |
44.512206 |
x12 | |
---|---|
0.2642335 | 54.993687 |
1.0747269 | 58.167325 |
0.8179241 | 59.147705 |
-0.552775 | 40.74794 |
1.5401449 | 45.813645 |
-1.233822 | 53.460273 |
-0.141535 | 57.877839 |
1.0420036 | 51.98273 |
0.0657322 | 49.875743 |
1.225259 | 52.570553 |
-0.148304 | 54.097005 |
0.2901233 | 46.936325 |
-1.149394 | 57.509082 |
-0.482548 | 50.463228 |
-0.452974 | 42.775346 |
0.2738675 | 39.376643 |
-0.224133 | 53.303455 |
0.218553 | 54.494482 |
-0.420015 | 55.747821 |
0.246356 | 44.512206 |
x12_cor | |
---|---|
1 | -0.001531 |
-0.001531 | 1 |
x12_cov | |
---|---|
0.5645625 | -0.006864 |
-0.006864 | 35.614684 |
x1_mu |
---|
0.1126712 |
x2_std |
---|
5.967804 |
The `x1` variable is allocated using the `j` function. The number of rows of `x1` represents the sample size of the random numbers needed. One can also set `x1` to a row vector, in which case the number of columns represents the sample size. The two random samples, `x1` and `x2`, generated from the same family of distributions, Gaussian/Normal, are then concatenated column-wise (`||`) to form a matrix of size 20 by 2 in line 13. Using this new matrix, `x12`, we can then compute the correlation and covariance of the two columns using the `corr` and `cov` functions, respectively. The output above tells us that there is almost no relation between the two.
B_comp | |||
---|---|---|---|
a | i | m | x |
A_comp | ||||
---|---|---|---|---|
e | h | r | t | y |
AuB | |||||||||
---|---|---|---|---|---|---|---|---|---|
a | e | h | i | m | o | r | t | x | y |
AnB |
---|
o |
AB_unq | |||||||||
---|---|---|---|---|---|---|---|---|---|
a | e | h | i | m | o | r | t | x | y |
The `CDF` function handles this, but note that the exponential density in SAS is given by
$$f(x|\beta)=\frac{1}{\beta}\exp\left[-\frac{x}{\beta}\right].$$
So to compute the probability, we evaluate the following integral,
$$
\mathrm{P}(X\leq 2)=\int_{0}^{2}\frac{1}{.5}\exp\left[-\frac{x}{.5}\right]\operatorname{d}x = 0.9816844
$$
To confirm this in SAS, run the following
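A minimal sketch of this check, assuming the `cdf` function with the scale passed as its third argument. The closed form $1 - e^{-2/\beta}$ is included only for comparison.

```sas
proc iml;
  beta = 0.5;                           /* scale (mean) of the exponential */
  px   = cdf("Exponential", 2, beta);   /* P(X <= 2)                       */
  chk  = 1 - exp(-2 / beta);            /* closed form of the same integral */
  print px chk;
quit;
```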
px |
---|
0.9816844 |
With the `PDF` function, for example, we can confirm the above probability by integrating the PDF. To do so, run the following:

px |
---|
0.9816844 |
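One way to integrate the PDF numerically is the `quad` subroutine. This is a sketch under that assumption, and the module name `expo_pdf` is hypothetical.

```sas
proc iml;
  /* integrand: exponential density with scale 0.5 */
  start expo_pdf(x);
    return( pdf("Exponential", x, 0.5) );
  finish;

  call quad(px, "expo_pdf", {0 2});   /* integrate over [0, 2] */
  print px;
quit;
```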
z_a |
---|
-1.644854 |
xm_det |
---|
-1 |
xm_inv | ||
---|---|---|
1 | -3 | 2 |
-3 | 3 | -1 |
2 | -1 | 4.441E-16 |
x_evl |
---|
11.344814 |
0.1709152 |
-0.515729 |
x_evc | ||
---|---|---|
0.3279853 | 0.591009 | 0.7369762 |
0.591009 | -0.736976 | 0.3279853 |
0.7369762 | 0.3279853 | -0.591009 |
x_coef |
---|
3 |
-4 |
2 |
x_dat |
---|
Acura |
Acura |
Acura |
Acura |
Acura |
Acura |
Acura |
Audi |
Audi |
Audi |
hp_mean |
---|
215.88551 |
Obs | COL1 | COL2 | COL3 |
---|---|---|---|
1 | 1 | 2 | 3 |
2 | 4 | 5 | 6 |