
SAS®: Getting Started with PROC IML

Another powerful SAS procedure, and my favorite one, that I would like to share is PROC IML (Interactive Matrix Language). This procedure treats every object as a matrix and is very useful for scientific computations involving vectors and matrices. To get started, we are going to demonstrate and discuss the following:
  • Creating and Shaping Matrices
  • Matrix Query
  • Subscripts
  • Descriptive Statistics
  • Set Operations
  • Probability Functions and Subroutines
  • Linear Algebra
  • Reading and Creating Data
The outline above is based on the SAS/IML tip sheet (see Reference 1). To begin with the first bullet, consider the following code:

proc iml ;
/* Define a scalar */
scalar=5;
/* Define a vector */
/* row vector */
row_vec={1 2 3 4 5 6};
/* column vector */
col_vec={1, 2, 3, 4, 5, 6};
/* Define a (2 x 3) numeric matrix */
num_mat={1 2 3, 4 5 6};
/* Define a (2 x 1) character matrix */
chr_mat={'Hello,', 'world! :D'};
/* identity matrix */
i_mat=I(6);
/* column vector with repeated entries, 2 */
mat_2=repeat(2, 3);
/* transpose the vector row_vec */
trow_vec=row_vec`;
/* reshape trow_vec (filled row by row) to a (3 x 2) matrix */
mat1=shape(trow_vec, 3, 2);
/* Print the output */
print scalar, row_vec, col_vec, num_mat, chr_mat, i_mat, mat_2, trow_vec,
mat1;
/* Store the matrices in the IML storage library (WORK.IMLSTOR by default) */
store scalar row_vec col_vec num_mat chr_mat i_mat mat_2 trow_vec mat1;
quit;
scalar
5
row_vec
1 2 3 4 5 6
col_vec
1
2
3
4
5
6
num_mat
1 2 3
4 5 6
chr_mat
Hello,
world! :D
i_mat
1 0 0 0 0 0
0 1 0 0 0 0
0 0 1 0 0 0
0 0 0 1 0 0
0 0 0 0 1 0
0 0 0 0 0 1
mat_2
2
2
2
trow_vec
1
2
3
4
5
6
mat1
1 2
3 4
5 6
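A few other constructors from the tip sheet are worth knowing early. The sketch below is my own small example (all matrix names are made up for illustration): J with an explicit fill value, the index operator 1:6, the DO function for evenly spaced values, and the || and // concatenation operators. It also shows that SHAPE fills the target matrix row by row.

```sas
proc iml;
/* j(nrow, ncol, value): a (2 x 3) matrix of zeros */
zero_mat=j(2, 3, 0);
/* Index operator: the row vector 1 2 3 4 5 6 */
idx_vec=1:6;
/* do(start, stop, increment): the row vector 0 0.25 0.5 0.75 1 */
grid_vec=do(0, 1, 0.25);
/* Horizontal (||) concatenation gives 1 2 3 4 */
h_mat={1 2} || {3 4};
/* Vertical (//) concatenation gives the (2 x 2) matrix 1 2 / 3 4 */
v_mat={1 2} // {3 4};
/* shape fills row by row, so this is the (2 x 3) matrix 1 2 3 / 4 5 6 */
s_mat=shape(idx_vec, 2, 3);
print zero_mat, idx_vec, grid_vec, h_mat, v_mat, s_mat;
quit;
```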
With the help of the comments in the first program, it isn't difficult to see what each statement does, so I will only explain the STORE statement. In SAS/IML, matrices defined in a PROC IML session are discarded when the procedure ends unless you store them first; another PROC IML step can then retrieve them by loading them from storage, which we'll see in the next entry, Matrix Query. The functions in this entry extract the number of rows and columns, the dimensions, and so on. Below is the sample code:

proc iml ;
/* Load all variables stored */
load _all_;
/* List the names and attributes of variables in the storage */
show names;
/* Extract the no. of rows and columns of num_mat */
nmat_row=nrow(num_mat);
nmat_col=ncol(num_mat);
/* Extract the dimension of num_mat */
nmat_dim=dimension(num_mat);
/* Extract number of characters in each element of chr_mat */
cmat_len=length(chr_mat);
/* Extract the element size (allocated length) of chr_mat */
cmat_nlen=nleng(chr_mat);
/* Extract the type of matrix */
nmat_typ=type(num_mat);
cmat_typ=type(chr_mat);
/* Print the output */
print nmat_row, nmat_col, nmat_dim, cmat_len, cmat_nlen, nmat_typ, cmat_typ;
quit;
 SYMBOL     ROWS   COLS TYPE   SIZE                     
 ------   ------ ------ ---- ------                     
 CHR_MAT       2      1 char      9                     
 COL_VEC       6      1 num       8                     
 I_MAT         6      6 num       8                     
 MAT1          3      2 num       8                     
 MAT_2         3      1 num       8                     
 NUM_MAT       2      3 num       8                     
 ROW_VEC       1      6 num       8                     
 SCALAR        1      1 num       8                     
 TROW_VEC      6      1 num       8                     
  Number of symbols = 10  (includes those without values)

nmat_row
2
nmat_col
3
nmat_dim
2 3
cmat_len
6
9
cmat_nlen
9
nmat_typ
N
cmat_typ
C
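One practical use of these query functions is checking conformability before a matrix product. The sketch below is a hypothetical example (w_vec and nw_vec are names I made up): the product num_mat*w_vec is defined only when the number of columns of num_mat equals the number of rows of w_vec, and DIMENSION then reports the shape of the result.

```sas
proc iml;
num_mat={1 2 3, 4 5 6};   /* (2 x 3) */
w_vec={1, 0, 1};          /* (3 x 1) */
/* num_mat*w_vec is defined only if ncol(num_mat)=nrow(w_vec) */
if ncol(num_mat)=nrow(w_vec) then do;
   nw_vec=num_mat*w_vec;          /* (2 x 1): 4, 10 */
   nw_dim=dimension(nw_vec);      /* row vector: 2 1 */
   print nw_vec, nw_dim;
end;
else print "num_mat and w_vec are not conformable";
quit;
```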
To load all matrices from storage, we use the LOAD _ALL_ statement. The remaining statements are easy to understand, and this is what I love about SAS: the statements and functions are self-explanatory. That's a good excuse for us to proceed to subscripting matrices; below is the code:

proc iml ;
load _all_;
/* Row 2, column 2 of num_mat */
n22_mat=num_mat[2, 2];
/* First row of num_mat */
nr1_mat=num_mat[1, ];
/* First 2 rows of i_mat */
ir12_mat=i_mat[1:2, ];
/* First 2 columns of i_mat */
ic12_mat=i_mat[, 1:2];
/* Grand mean, column means, and row means of num_mat */
ngm_mat=num_mat[:];
ncm_mat=num_mat[:, ];
nrm_mat=num_mat[, :];
/* Sum of all elements, column sums, and row sums of num_mat */
ngs_mat=num_mat[+];
nrs_mat=num_mat[+, ];
ncs_mat=num_mat[, +];
/* Sum of squares: all elements, by column, and by row */
ngsq_mat=num_mat[##];
nrsq_mat=num_mat[##, ];
ncsq_mat=num_mat[, ##];
/* Product of all elements of num_mat */
ngp_mat=num_mat[#];
print num_mat, n22_mat, nr1_mat, ir12_mat, ic12_mat, ngm_mat, ncm_mat,
nrm_mat, ngs_mat, nrs_mat, ncs_mat, ngsq_mat, nrsq_mat, ncsq_mat,
ngp_mat;
quit;
NUM_MAT
1 2 3
4 5 6
n22_mat
5
nr1_mat
1 2 3
ir12_mat
1 0 0 0 0 0
0 1 0 0 0 0
ic12_mat
1 0
0 1
0 0
0 0
0 0
0 0
ngm_mat
3.5
ncm_mat
2.5 3.5 4.5
nrm_mat
2
5
ngs_mat
21
nrs_mat
5 7 9
ncs_mat
6
15
ngsq_mat
91
nrsq_mat
17 29 45
ncsq_mat
14
77
ngp_mat
720
The subscript num_mat[:] computes the grand mean of the matrix: placing the : symbol inside the subscript reduces the matrix to its mean. Likewise, num_mat[:, 1] averages over the rows, giving the mean of the first column, and num_mat[1, :] averages over the columns, giving the mean of the first row; leaving the other index empty, as in num_mat[:, ], applies the reduction to every column at once. If we replace : with +, we get the sum of the entries instead. The ## symbol returns the sum of squares of the elements, and a single # returns their product.
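To make the direction of each reduction concrete, here is a small sketch of my own (the result names are made up). It checks that column means equal column sums divided by the number of rows, and it also uses two reduction operators not covered above, <> for the maximum and >< for the minimum.

```sas
proc iml;
num_mat={1 2 3, 4 5 6};
/* Mean of column 1: (1 + 4)/2 = 2.5 */
c1_mean=num_mat[:, 1];
/* Mean of row 1: (1 + 2 + 3)/3 = 2 */
r1_mean=num_mat[1, :];
/* Column sums divided by the number of rows: 2.5 3.5 4.5 */
chk=num_mat[+, ]/nrow(num_mat);
/* Column maxima (4 5 6) and row minima (1, 4) */
col_max=num_mat[<>, ];
row_min=num_mat[, ><];
/* Product of all elements: 1*2*3*4*5*6 = 720 */
g_prod=num_mat[#];
print c1_mean, r1_mean, chk, col_max, row_min, g_prod;
quit;
```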

Now let's proceed to the next bullet, which is about Descriptive Statistics.

proc iml ;
load _all_;
/* Cumulative sum of entries */
csr_vec=cusum(row_vec);
csn_mat=cusum(num_mat);
/* Minimum element in the entries */
mnr_vec=min(row_vec);
mnn_mat=min(num_mat);
/* Maximum element in the entries */
mxr_vec=max(row_vec);
mxn_mat=max(num_mat);
/* Sum of entries */
smr_vec=sum(row_vec);
smn_mat=sum(num_mat);
/* Sum of Squares of entries */
ssr_vec=ssq(row_vec);
ssn_mat=ssq(num_mat);
/* Print output */
print csr_vec, csn_mat, mnr_vec, mnn_mat, mxr_vec, mxn_mat, smr_vec,
smn_mat, ssr_vec, ssn_mat;
quit;
csr_vec
1 3 6 10 15 21
csn_mat
1 3 6
10 15 21
mnr_vec
1
mnn_mat
1
mxr_vec
6
mxn_mat
6
smr_vec
21
smn_mat
21
ssr_vec
91
ssn_mat
91
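A few related functions round out this entry. In the sketch below (names are illustrative), CUPROD is the multiplicative counterpart of CUSUM, SSQ agrees with summing the elementwise squares computed by the ## operator, and MEAN and STD, unlike SUM, work column by column on a matrix.

```sas
proc iml;
row_vec={1 2 3 4 5 6};
num_mat={1 2 3, 4 5 6};
/* Cumulative product: 1 2 6 24 120 720 */
cp_vec=cuprod(row_vec);
/* ssq(x) equals sum(x##2); both are 91 here */
ss1=ssq(num_mat);
ss2=sum(num_mat##2);
/* Column means (2.5 3.5 4.5) and column standard deviations (about 2.1213 each) */
col_mu=mean(num_mat);
col_sd=std(num_mat);
print cp_vec, ss1 ss2, col_mu, col_sd;
quit;
```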
To generate random numbers from, say, a normal distribution and compute the mean, standard deviation, and other statistics, consider the following:

proc iml ;
call randseed(12345);
/* Allocate a matrix of size 20 by 1 */
x1=j(20, 1);
/* Fill x1 with random numbers from the standard normal */
call randgen(x1, 'Normal');
/* Fill x2 with random numbers from a normal with mean 50 and sd 5.5 */
x2=j(20, 1);
call randgen(x2, 'Normal', 50, 5.5);
/* Concatenate x1 and x2 column-wise */
x12=x1 || x2;
/* Compute correlation */
x12_cor=corr(x12);
/* Compute covariance */
x12_cov=cov(x12);
/* Compute the mean of column 1 */
x1_mu=mean(x12[, 1]);
/* Compute the standard deviation of column 2 */
x2_std=std(x12[, 2]);
print x1, x2, x12, x12_cor, x12_cov, x1_mu, x2_std;
quit;
x1
0.2642335
1.0747269
0.8179241
-0.552775
1.5401449
-1.233822
-0.141535
1.0420036
0.0657322
1.225259
-0.148304
0.2901233
-1.149394
-0.482548
-0.452974
0.2738675
-0.224133
0.218553
-0.420015
0.246356
x2
54.993687
58.167325
59.147705
40.74794
45.813645
53.460273
57.877839
51.98273
49.875743
52.570553
54.097005
46.936325
57.509082
50.463228
42.775346
39.376643
53.303455
54.494482
55.747821
44.512206
x12
0.2642335 54.993687
1.0747269 58.167325
0.8179241 59.147705
-0.552775 40.74794
1.5401449 45.813645
-1.233822 53.460273
-0.141535 57.877839
1.0420036 51.98273
0.0657322 49.875743
1.225259 52.570553
-0.148304 54.097005
0.2901233 46.936325
-1.149394 57.509082
-0.482548 50.463228
-0.452974 42.775346
0.2738675 39.376643
-0.224133 53.303455
0.218553 54.494482
-0.420015 55.747821
0.246356 44.512206
x12_cor
1 -0.001531
-0.001531 1
x12_cov
0.5645625 -0.006864
-0.006864 35.614684
x1_mu
0.1126712
x2_std
5.967804
The RANDSEED call sets the initial seed so that the random numbers generated by RANDGEN are reproducible. The J function allocates a 20 by 1 matrix to x1 (filled with 1s by default), which RANDGEN then overwrites; the number of rows of x1 is therefore the sample size. One can also set x1 to a row vector, in which case the number of columns is the sample size. The two random samples, x1 from the standard normal and x2 from a normal distribution with mean 50 and standard deviation 5.5, are then concatenated column-wise with the || operator to form the 20 by 2 matrix x12. From x12 we compute the correlation and covariance of the two columns with the corr and cov functions, respectively. The sample correlation of about -0.0015 shows essentially no linear relationship between the two columns, which is expected since the samples were generated independently.
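As a sanity check on how corr, cov, and std relate, the sketch below (my own example; the matrix names are made up) fills a 20 by 2 matrix in one RANDGEN call and recomputes the correlation as s12/sqrt(s11*s22) and the standard deviations as square roots of the diagonal of the covariance matrix. Because the matrix is filled in a different order, the values will not reproduce the listing above even with the same seed.

```sas
proc iml;
call randseed(12345);
/* Fill a (20 x 2) matrix at once: each column is a N(0, 1) sample */
x=j(20, 2);
call randgen(x, 'Normal');
x_cov=cov(x);
/* Correlation from covariance: r = s12 / sqrt(s11 * s22) */
r_manual=x_cov[1, 2]/sqrt(x_cov[1, 1]*x_cov[2, 2]);
x_cor=corr(x);
r_corr=x_cor[1, 2];
/* Standard deviations are square roots of the diagonal of the covariance */
sd_manual=t(sqrt(vecdiag(x_cov)));
sd_std=std(x);
print r_manual r_corr, sd_manual, sd_std;
quit;
```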

SAS/IML can also perform set operations easily. Consider the following:

proc iml ;
A={'a' 'x' 'i' 'o' 'm'};
B={'t' 'h' 'e' 'o' 'r' 'y'};
/* complement of set B, assuming A union B is the universal set */
B_comp=setdif(A, B);
/* complement of set A, assuming A union B is the universal set */
A_comp=setdif(B, A);
/* A union B */
AuB=union(A, B);
/* A intersection B */
AnB=xsect(A, B);
/* Unique elements of A and B */
AB_unq=unique(A, B);
print B_comp, A_comp, AuB, AnB, AB_unq;
quit;
B_comp
a i m x
A_comp
e h r t y
AuB
a e h i m o r t x y
AnB
o
AB_unq
a e h i m o r t x y
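Set membership can also be tested element by element. The sketch below assumes SAS/IML 12.1 or later, where I believe the ELEMENT function is available; in_B, pos, AnB2, and sym_dif are hypothetical names. ELEMENT flags which entries of A belong to B, LOC turns the flags into positions, and the symmetric difference is built from the SETDIF and UNION functions already shown.

```sas
proc iml;
A={'a' 'x' 'i' 'o' 'm'};
B={'t' 'h' 'e' 'o' 'r' 'y'};
/* 1 where an element of A is in B, else 0: here 0 0 0 1 0 */
in_B=element(A, B);
/* Positions of the nonzero flags: here 4 */
pos=loc(in_B);
/* The intersection recovered by subscripting A: here o */
AnB2=A[pos];
/* Symmetric difference (in exactly one set): a e h i m r t x y */
sym_dif=union(setdif(A, B), setdif(B, A));
print in_B, pos, AnB2, sym_dif;
quit;
```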
The next bullet is about Probability Functions and Subroutines. For example, consider an experiment described by a random variable $X$ that follows an exponential distribution with mean $\beta = .5$. What is the probability that $X$ is at most 2, $\mathrm{P}(X\leq 2)$? To answer this we use the CDF function, but note that SAS parameterizes the exponential density by its scale (mean) $\beta$:
$$f(x|\beta)=\frac{1}{\beta}\exp\left[-\frac{x}{\beta}\right].$$
So to compute the probability, we evaluate the integral
$$\mathrm{P}(X\leq 2)=\int_{0}^{2}\frac{1}{.5}\exp\left[-\frac{x}{.5}\right]\operatorname{d}x \approx 0.9816844.$$
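Since the exponential density has a simple antiderivative, this integral can also be evaluated by hand, which explains the number above:

```latex
\begin{aligned}
\mathrm{P}(X\leq 2) &= \int_{0}^{2}\frac{1}{.5}\exp\left[-\frac{x}{.5}\right]\operatorname{d}x
  = \int_{0}^{2}2e^{-2x}\operatorname{d}x \\
&= \left[-e^{-2x}\right]_{0}^{2} = 1-e^{-4} \approx 0.9816844
\end{aligned}
```

More generally, the CDF is $1-\exp(-x/\beta)$ for $x \geq 0$.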
To confirm this in SAS, run the following

proc iml ;
px=cdf("Exponential", 2, .5);
print px;
quit;
px
0.9816844
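The CDF gives the lower tail. For the upper tail, SAS also provides the SDF (survival) function. The sketch below is my own check (variable names are made up): it compares CDF with the closed form $1-e^{-4}$, computes $\mathrm{P}(X>2)$ with SDF, and confirms that the two tails add to 1.

```sas
proc iml;
/* Closed form for mean .5: P(X <= 2) = 1 - exp(-2/.5) = 1 - exp(-4) */
p_closed=1 - exp(-2/.5);
p_cdf=cdf("Exponential", 2, .5);
/* SDF gives the upper tail, P(X > 2) = exp(-4), about 0.0183 */
p_upper=sdf("Exponential", 2, .5);
/* The lower and upper tails add to 1 */
p_total=p_cdf + p_upper;
print p_closed p_cdf p_upper p_total;
quit;
```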
The derivative of the Cumulative Distribution Function (CDF) is the Probability Density Function (PDF), and SAS evaluates it with the PDF function. For example, we can confirm the probability above by numerically integrating the PDF with the QUAD subroutine (see Reference 3). To do so, run the following:

proc iml ;
/* Define the Exponential(.5) PDF */
start exp_fun (x);
return (PDF("Exponential", x, .5));
finish;
/* Define the limits of the integral */
a={0 2};
/* Integrate it */
call quad(px, "exp_fun", a);
print px;
quit;
px
0.9816844
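QUAD also handles infinite limits, which lets us check that the density integrates to 1 and that its first moment is the mean $\beta = .5$. This sketch assumes the special missing value .P denotes positive infinity in the limits vector, as I recall from the QUAD documentation; xexp_fun, p_all, and mu are hypothetical names.

```sas
proc iml;
/* Exponential(.5) density, as above */
start exp_fun(x);
   return (pdf("Exponential", x, .5));
finish;
/* x*f(x): its integral over (0, infinity) is the mean */
start xexp_fun(x);
   return (x*pdf("Exponential", x, .5));
finish;
/* .P stands for +infinity in the limits passed to QUAD */
lim={0 .P};
call quad(p_all, "exp_fun", lim);   /* should be close to 1 */
call quad(mu, "xexp_fun", lim);     /* should be close to .5 */
print p_all mu;
quit;
```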
To end this topic, consider the inverse of the CDF, which is the quantile function. To compute the lower-tail quantile of the standard normal distribution at the common significance level $\alpha = 0.05$, that is, $z_{\alpha} \approx -1.645$, run

proc iml ;
z_a=quantile("Normal", 0.05);
print z_a;
quit;
z_a
-1.644854
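For a two-sided test at the same level we need the 0.025 and 0.975 quantiles, and QUANTILE also accepts the location and scale of a non-standard normal. The sketch below (names are illustrative) also shows that CDF undoes QUANTILE. With mean 50 and standard deviation 5.5, the 0.95 quantile is $50 + 1.644854 \times 5.5 \approx 59.05$.

```sas
proc iml;
/* Two-sided critical values at alpha = 0.05: about -1.96 and 1.96 */
z_lo=quantile("Normal", 0.025);
z_hi=quantile("Normal", 0.975);
/* Round trip: the CDF of the quantile returns the probability, 0.025 */
p_back=cdf("Normal", z_lo);
/* 0.95 quantile of a normal with mean 50 and sd 5.5 */
q_95=quantile("Normal", 0.95, 50, 5.5);
print z_lo z_hi p_back q_95;
quit;
```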
The next entry is Linear Algebra, the topic on which this procedure is built. Linear algebra is very useful in statistics, especially in regression, nonlinear regression, and multivariate analysis. To perform these computations in SAS, consider

proc iml ;
x_mat={1 2 3, 2 4 5, 3 5 6};
y_vec={1, 0, 1};
/* Obtain the determinant */
xm_det=det(x_mat);
/* Generalized (Moore-Penrose) inverse; equals inv(x_mat) since det is nonzero */
xm_inv=ginv(x_mat);
/* Eigen values and vectors */
call eigen(x_evl, x_evc, x_mat);
/* Solve linear system */
x_coef=solve(x_mat, y_vec);
print xm_det, xm_inv, x_evl, x_evc, x_coef;
quit;
xm_det
-1
xm_inv
1 -3 2
-3 3 -1
2 -1 4.441E-16
x_evl
11.344814
0.1709152
-0.515729
x_evc
0.3279853 0.591009 0.7369762
0.591009 -0.736976 0.3279853
0.7369762 0.3279853 -0.591009
x_coef
3
-4
2
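The tiny entry 4.441E-16 in xm_inv is floating-point round-off, and checks like it can be made systematic. The sketch below is my own verification (the diff_inv, resid, and recon names are made up). It confirms that INV and GINV agree for this nonsingular matrix, that the SOLVE solution reproduces y_vec, and that, because x_mat is symmetric, it equals V*diag(lambda)*V` from the EIGEN call. Each printed value should be near machine precision.

```sas
proc iml;
x_mat={1 2 3, 2 4 5, 3 5 6};
y_vec={1, 0, 1};
/* x_mat is nonsingular (det = -1), so inv() and ginv() agree */
diff_inv=max(abs(inv(x_mat) - ginv(x_mat)));
/* The solution b of x_mat*b = y_vec reproduces y_vec */
b=solve(x_mat, y_vec);
resid=max(abs(x_mat*b - y_vec));
/* Spectral decomposition of a symmetric matrix: A = V*diag(lambda)*V` */
call eigen(lambda, V, x_mat);
recon=max(abs(V*diag(lambda)*V` - x_mat));
print diff_inv resid recon;
quit;
```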
Finally, one of the coolest capabilities of SAS/IML is reading and creating SAS data sets. The following code shows how to read a SAS data set.

proc iml ;
/* Use the cars data set */
use sashelp.cars;
/* Read all entries of the variables make, model, ...*/
read all var {make model type msrp horsepower };
/* Close the cars data set */
close sashelp.cars;
/* Extract the first 10 rows of make variable */
x_dat=make[1:10, ];
/* Grand mean of the horsepower */
hp_mean=horsepower[:];
print x_dat, hp_mean;
quit;
x_dat
Acura
Acura
Acura
Acura
Acura
Acura
Acura
Audi
Audi
Audi
hp_mean
215.88551
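READ can also put several numeric variables into a single matrix, keep their names, and subset the rows. The sketch below follows the documented READ syntax as I understand it (WHERE clause before INTO). It assumes the TYPE variable in sashelp.cars contains the value "SUV"; car_mat, car_var, n_suv, and car_mean are hypothetical names.

```sas
proc iml;
use sashelp.cars;
/* Read MSRP and horsepower for SUVs into one matrix, keeping column names */
read all var {msrp horsepower} where(type="SUV") into car_mat[colname=car_var];
close sashelp.cars;
/* Number of SUVs read and the column means */
n_suv=nrow(car_mat);
car_mean=mean(car_mat);
print n_suv, car_mean[colname=car_var];
quit;
```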
And to create a SAS data set, run

proc iml ;
/* load the num_mat matrix */
load num_mat;
/* create a data set named num_dat from num_mat */
create num_dat from num_mat;
append from num_mat;
/* Close the new data set */
close num_dat;
quit;
/* Print the new data set, num_dat */
proc print data=num_dat;
run;
Obs COL1 COL2 COL3
1 1 2 3
2 4 5 6
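By default the columns are named COL1 to COL3. The COLNAME= option on CREATE assigns your own names, and reading the data back with the _NUM_ keyword confirms the round trip. This sketch uses hypothetical names (num_dat2, vnames, chk, cn) and writes to the WORK library.

```sas
proc iml;
num_mat={1 2 3, 4 5 6};
vnames={"x1" "x2" "x3"};
/* Name the variables x1-x3 instead of COL1-COL3 */
create num_dat2 from num_mat[colname=vnames];
append from num_mat;
close num_dat2;
/* Read all numeric variables back into a matrix */
use num_dat2;
read all var _NUM_ into chk[colname=cn];
close num_dat2;
print chk[colname=cn];
quit;
```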
To end this post, I want to say that I love SAS because of IML. There are still hidden capabilities of this procedure that I would love to explore and share with my readers, so stay tuned. Another great blog about SAS/IML is The DO Loop, whose author, Dr. Rick Wicklin, is also the principal developer of SAS/IML and SAS/IML Studio. Do check it out.

Reference

  1. SAS/IML Tip Sheet: Frequently Used SAS/IML Functions and Subroutines.
  2. SAS/IML 13.2 User's Guide.
  3. Rick Wicklin, "How to numerically integrate a function in SAS," The DO Loop.