
New Toy: SAS® University Edition

So I started using SAS® University Edition, which is a FREE version of SAS® software. Again, it's FREE, and that's the main reason why I want to relearn the language. The software was announced on March 24, 2014, and the download became available in May of that year. For that, I salute Dr. Jim Goodnight. At least we can now learn SAS® without paying the expensive price tag, which matters especially for a single user like me.

The software runs on top of a virtual machine and requires a 64-bit processor. To install it, just follow the instructions in this video. Although the installation in the video is done on Windows, it also works on Mac. Below is a screenshot of my SAS® Studio running on Safari.

What's in the box?

The software includes the following libraries:
  1. Base SAS® - Make programming fast and easy with the SAS® programming language, ODS graphics and reporting procedures;
  2. SAS/STAT® - Trust SAS® proven reliability with a wide variety of statistical methods and techniques;
  3. SAS/IML® - Use this matrix programming language for more specialized analyses and data exploration;
  4. SAS Studio - Reduce your programming time with autocomplete for hundreds of SAS® statements and procedures, as well as built-in syntax help;
  5. SAS/ACCESS® - Seamlessly connect with your data, no matter where it resides.
For more about SAS® University Edition please refer to the fact sheet.

If you've been following this blog, you know I have been promoting free software (R, Python, and C/C++) for analysis. The introduction of SAS® University Edition means one thing: a new topic to discuss in succeeding posts. So let's welcome this software by doing some analysis with it.

Analysis

Our goal here is to cover the basics needed to proceed with an analysis: 1. Importing and transforming the data; 2. Descriptive statistics; 3. Hypothesis testing: One-sample t test; 4. Creating a function; and 5. Visualization.

Data

We'll again use the Volume of Palay Production (quarterly, 1994 to 2013) from the Cordillera Administrative Region (CAR), Philippines. To reproduce this article, please click here to download the data.
  1. Importing and transforming the data
    Working in SAS® Studio requires you to upload your data first. To do this, go to the sidebar, click on the Folders tab, and there you will find the "up arrow" for upload. See the picture below.
    You are now set to import the data using the following code. In my case, the location of the uploaded data, as seen in the above picture, is "/folders/myfolders/palay.csv":

    /* Imports the data */
    proc import datafile = "/folders/myfolders/palay.csv"
    out = work.palay dbms = csv;
    getnames = yes;
    datarow = 2;
    run;
    In SAS®, proc refers to procedure; in this case we perform the import procedure. datafile is the path of the file to read, and dbms = csv declares it as a comma-separated file. out names the SAS® data set to create; here we save it in the temporary "Work" library under the name "palay". getnames determines whether to generate SAS® variable names from the data values in the first record of the imported file. Finally, datarow starts reading data from the specified row number in the delimited text file.
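
    A quick way to confirm that the import behaved as described is to list the variables and the number of observations of the new data set. This is only a minimal sketch, assuming the import step above ran without errors and created work.palay:

    ```sas
    /* List variable names, types, and the observation count of work.palay */
    proc contents data = work.palay;
    run;
    ```

    If getnames = yes worked as intended, the listing should show the province names (Abra, Apayao, and so on) as numeric variables.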

    I want to emphasize that descriptions of the arguments of the statements and procedures above are available in the software itself: SAS® Studio's autocomplete for hundreds of SAS® statements and procedures is very handy. So in the code that follows, we will describe selected statements only. Below is the autocomplete feature of SAS® Studio in action.
    Now that we have the data in our workspace, let's do some transformation on it. In R, we always start by viewing the head of the data, that is, its first few observations, and we code it as head(data). Keeping that habit, here's how to do it in SAS® for the first five observations:

    proc print data = palay(obs = 5);
    run;
    Obs Abra Apayao Benguet Ifugao Kalinga Mt_Province
    1 1243 2934 148 3300 10553 2675
    2 4158 9235 4287 8063 35257 1920
    3 1787 1922 1955 1074 4544 6955
    4 17152 14501 3536 19607 31687 2715
    5 1266 2385 2530 3315 8520 2601
    If you want to start and end on specific rows, you can do the following. Here firstobs is the first observation to read and obs is the last one (not a count), so this prints the 5th through the 10th row:

    proc print data = palay(firstobs = 5 obs = 10);
    run;
    Obs Abra Apayao Benguet Ifugao Kalinga Mt_Province
    5 1266 2385 2530 3315 8520 2601
    6 5576 7452 771 13134 28252 1242
    7 927 1099 2796 5134 3106 9145
    8 21540 17038 2463 14226 36238 2465
    9 1039 1382 2592 6842 4973 2624
    10 5424 10588 1064 13828 40140 1237
    Now, what about playing with the variables of the data? Say we want to view a specific column only, for instance observations 15 to 20 of the Benguet variable. How do we do that? Well, I humbly present to you the following code:

    proc print data = palay(keep = benguet firstobs = 15 obs = 20);
    run;
    Obs Benguet
    15 2847
    16 2942
    17 2119
    18 734
    19 2302
    20 2598
    For viewing multiple columns, simply enumerate the names of the variables using either keep, which lists the variables to be returned, or drop, which lists the variables to be excluded from the printing.

    /* keeps the first five variables */
    proc print data = palay(keep = abra apayao benguet ifugao kalinga firstobs = 15 obs = 20);
    run;
    /* or */
    /* drops the 6th variable */
    proc print data = palay(drop = mt_province firstobs = 15 obs = 20);
    run;
    Obs Abra Apayao Benguet Ifugao Kalinga
    15 1048 1427 2847 5526 4402
    16 25679 15661 2942 14452 33717
    17 1055 2191 2119 5882 7352
    18 5437 6461 734 10477 24494
    19 1029 1183 2302 6438 3316
    20 23710 12222 2598 8446 26659
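
    The examples so far only subset rows and columns. As a hedged sketch of an actual transformation, a data step can add a derived variable. The data set name palay_total and the variable name total are my own choices for illustration, not part of the original data:

    ```sas
    /* Create a copy of palay with a new column holding the regional total */
    data palay_total;
      set palay;
      /* sum() adds its arguments and ignores missing values */
      total = sum(abra, apayao, benguet, ifugao, kalinga, mt_province);
    run;

    /* Inspect the first five observations, including the new column */
    proc print data = palay_total(obs = 5);
    run;
    ```

    For the first observation printed earlier, total should equal 1243 + 2934 + 148 + 3300 + 10553 + 2675 = 20853.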
    I think the above is enough demonstration of data transformation.
  2. Perform descriptive statistics
    And as always, the next step is to look at the descriptive statistics of the data, and here's how to do it:

    proc means data = palay;
    run;
    Variable       N      Mean   Std Dev      Minimum   Maximum
    Abra          79  12874.38  16746.47  927.0000000  60303.00
    Apayao        79  16860.65  15448.15  401.0000000  54625.00
    Benguet       79   3237.39   1588.54  148.0000000   8813.00
    Ifugao        79  12414.62   5034.28      1074.00  21031.00
    Kalinga       79  30446.42  22245.71      2346.00  68663.00
    Mt_Province   79   4506.20   3815.71  382.0000000  13038.00
    In case you want to view fewer or more statistics, you can list the statistic keywords on the proc means statement:

    proc means data = palay
    min mean median mode cv std var kurt skew max;
    run;
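
    The same procedure can also be restricted to selected variables. The following is a small sketch, assuming only that the palay data set from the import step exists; the choice of variables and of two decimal places is arbitrary:

    ```sas
    /* Summary statistics for two variables only, rounded to two decimals */
    proc means data = palay n mean std min max maxdec = 2;
      var abra benguet; /* variables to summarize */
    run;
    ```

    Here var plays the same role as the keep data set option used earlier, and maxdec controls how many decimal places are displayed.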
    We'll end this section with the following scatter plot matrix,
    /* Save the plot to the folder */
    ods listing gpath = "/folders/myfolders/ODSEditorFiles";
    /* Plot the data */
    title "Scatter Plot Matrix";
    proc sgscatter data = palay;
    matrix abra apayao benguet ifugao kalinga mt_province /
    diagonal = (histogram kernel) ellipse;
    run;
    ods listing close;
    A quick analysis: based on the above scatter plot matrix, we see a strong positive relationship between Kalinga and Apayao, and a relationship between Ifugao and Benguet.
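
    A visual impression like this can be backed up with numbers. As a sketch, and not something computed in the original analysis, the corr procedure prints the pairwise Pearson correlation coefficients for the same six variables:

    ```sas
    /* Pairwise Pearson correlations to quantify what the scatter plot matrix suggests */
    proc corr data = palay;
      var abra apayao benguet ifugao kalinga mt_province;
    run;
    ```

    Coefficients close to 1 or -1 indicate a strong linear relationship, which can then be compared with the impression from the plots.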
  3. Hypothesis testing: One-sample t test
    Let's perform a simple hypothesis test, the one-sample t test. Using a 0.05 level of significance, we'll test whether the true mean of Abra is not equal to 15000.

    /* Save the plot to the folder */
    ods listing gpath = "/folders/myfolders/ODSEditorFiles";
    /* t-test on data */
    proc ttest data = palay(keep = abra)
    alpha = 0.05 h0 = 15000 sides = 2;
    run;
    ods listing close;
    N Mean Std Dev Std Err Minimum Maximum
    79 12874.4 16746.5 1884.1 927.0 60303.0
    Mean 95% CL Mean Std Dev 95% CL Std Dev
    12874.4 9123.4 16625.4 16746.5 14480.9 19859.1
    DF t Value Pr > |t|
    78 -1.13 0.2627
    From the above numerical output, we see that the p-value = 0.2627 is greater than $\alpha = 0.05$, hence there is not sufficient evidence to conclude that the average volume of palay production in Abra is not equal to 15000. Graphically, the observations of the Abra variable are not normally distributed based on the Q-Q plot; that judgment is subjective, but the points clearly deviate from the line.
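
    The reported t value can be checked by hand from the summary numbers in the output. The symbols $\bar{x}$, $s$, $n$, and $\mu_0$ are my own notation for the sample mean, the sample standard deviation, the sample size, and the hypothesized mean:

    ```latex
    \begin{aligned}
    \text{Std Err} &= \frac{s}{\sqrt{n}} = \frac{16746.5}{\sqrt{79}} \approx 1884.1, \\
    t &= \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{12874.4 - 15000}{1884.1} \approx -1.13,
    \end{aligned}
    ```

    with $n - 1 = 78$ degrees of freedom, which agrees with the DF and t Value columns above.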
  4. Creating a function
    Let's create a function using the fcmp procedure. For illustration purposes, consider the standard normal density function,
    $$\phi(x) = \frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{x^2}{2}\right\}$$
    In SAS® we code it as follows,

    proc fcmp outlib = work.func.stdnorm; /* Store the function in package stdnorm of the data set work.func */
      function stdnorm(t); /* Define the name of the function and its argument */
        fx = 1 / sqrt(2 * constant('PI')) * constant('E') ** (-(t ** 2) / 2); /* Standard normal density */
        return(fx); /* Return the value fx */
      endsub; /* End the function definition */
    quit; /* Quit the procedure */
    /* Tell SAS where to look for compiled functions */
    options cmplib = work.func;
    To generate data from this function using a do loop, consider the following:

    data sn_data; /* Name of the output data set */
      do x = -5 to 5 by 0.1; /* Loop over a grid of x values */
        y = stdnorm(x); /* Evaluate the function at x */
        output; /* Write one observation per iteration */
      end;
    run;
    proc print data = sn_data(obs = 5); /* Print the first five observations */
    run;
    Obs x y
    1 -5.0 .000001487
    2 -4.9 .000002439
    3 -4.8 .000003961
    4 -4.7 .000006370
    5 -4.6 .000010141
    And that's how you create and use a function in SAS®. For me, the function definition procedure fcmp is the best addition to SAS® version 9.2, and I'm lucky to be relearning this language with this feature available, especially since it is FREE in SAS® Studio.
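
    One way to gain confidence in a user-defined function is to compare it with a built-in equivalent. The sketch below assumes stdnorm has been compiled and cmplib has been set as shown earlier; the data set name check and the variable names are hypothetical. It relies on the built-in pdf function, whose 'normal' distribution defaults to mean 0 and standard deviation 1:

    ```sas
    /* Compare the user-defined stdnorm with the built-in normal density */
    data check;
      do x = -1 to 1 by 0.5;
        y_custom  = stdnorm(x);        /* function defined with proc fcmp */
        y_builtin = pdf('normal', x);  /* built-in standard normal density */
        diff      = y_custom - y_builtin;
        output;
      end;
    run;

    proc print data = check;
    run;
    ```

    If the function is coded correctly, the diff column should be zero up to floating-point rounding.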
  5. Visualization
    Now it's time for us to create some visual art, and SAS®, being proprietary software, has a lot to offer. We've demonstrated a few plots above already; this time let's plot the data points of sn_data generated from the stdnorm function we defined earlier. Here it is:
    /* Save the plot to the folder */
    ods listing gpath = "/folders/myfolders/ODSEditorFiles";
    proc sgplot data = sn_data;
    title1 "Scatter Plot of SN_DATA";
    title2 "by Al-Ahmadgaid Asaad";
    xaxis label = "x-axis" grid minor; /* enables grid and minor ticks on x-axis */
    yaxis label = "y-axis" grid minor; /* enables grid and minor ticks on y-axis */
    scatter x = x y = y / markerattrs = (size = 20 symbol = "circlefilled")
    filledoutlinedmarkers markerfillattrs = (color = "red")
    markeroutlineattrs = (color = "purple" thickness = 1)
    transparency = 0.7 dataskin = matte;
    run;
    ods listing close;
    For other types of plots, simply go to the Snippets tab in the sidebar of SAS® Studio, where you will find template code for different types of plots. See the picture below.
    I will end this section with a histogram and a series plot.
    • Histogram
      /* Save the plot to the folder */
      ods listing gpath = "/folders/myfolders/ODSEditorFiles";
      proc sgplot data = palay;
      title1 "Histogram of Benguet";
      title2 "by Al-Ahmadgaid Asaad";
      xaxis minor grid offsetmin = 0.05 offsetmax = 0.05;
      yaxis minor grid;
      histogram benguet / nbins = 10
      fill fillattrs = (color = "#FF6961") outline
      transparency = 0.2;
      density benguet / type = normal;
      density benguet / type = kernel lineattrs = (color = "purple");
      keylegend / location = inside position = topright across = 1;
      run;
      ods listing close;
    • Historical (series) plot
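      /* Generate a quarterly time index */
      data years;
        do x = 1994 to 2013 by 0.25;
          output;
        end;
      run;
      /* Combine both data sets side by side (one-to-one reading, not concatenation); */
      /* the step stops as soon as the shorter data set runs out of observations */
      data palay;
        set palay;
        set years;
      run;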
      /* Save the plot to the folder */
      ods listing gpath = "/folders/myfolders/ODSEditorFiles";
      /* Series plot of abra and apayao */
      proc sgplot data = palay;
      title1 "Historical Plot of Abra and Apayao";
      title2 "Volume of Palay Production";
      footnote "Region: Cordillera Administrative Region (CAR)";
      series x = x y = abra;
      series x = x y = apayao;
      xaxis label = "Year" grid minor;
      yaxis label = "Volume of Production" grid minor;
      run;
      ods listing close;
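
    As an alternative to a separate years data set, the time index can be derived from the row number. This is only a sketch under a loud assumption: that the first observation is the first quarter of 1994 and that the rows are consecutive quarters with none missing. The names palay_q, year, quarter, and t are hypothetical:

    ```sas
    /* Derive year, quarter, and a decimal time index from the observation number */
    data palay_q;
      set palay;
      year    = 1994 + floor((_n_ - 1) / 4); /* four quarters per year */
      quarter = mod(_n_ - 1, 4) + 1;         /* cycles through 1, 2, 3, 4 */
      t       = year + (quarter - 1) / 4;    /* e.g. 1994.25 for 1994 Q2 */
    run;

    proc print data = palay_q(obs = 5);
      var year quarter t abra;
    run;
    ```

    Because the index is computed from _n_ inside the same data step, every observation of palay receives a value regardless of how many rows the data set has.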

Conclusion

In conclusion, it wasn't difficult for me to relearn SAS®, not only because I used it in a few papers back in college, but also because I have a programming background in R and Python, which I used as a basis for understanding the grammar of the language. Overall, the SAS® language is a high-level language: as we saw above, a simple statement gives you complete results with graphics, without lengthy code. And although I use R and Python as my primary tools for research, I am happy to add SAS® to them. Despite the popularity of R in analysis, I am looking forward to seeing more learners, students, researchers, and even bloggers using SAS®. That way, we can share ideas and techniques between the R, SAS®, and Python communities.

What about you? How's your experience with SAS® University Edition?

Data Source

Reference

  1. SAS® Documentation
  2. r4stats.com: Data Import. From http://r4stats.com/examples/data-import/ (accessed January 15, 2015)
  3. SAS Learning Module: Subsetting data in SAS. From http://www.ats.ucla.edu/stat/sas/modules/subset.htm (accessed January 15, 2015)