So I started using SAS® University Edition, which is a FREE version of SAS® software. Again, it's FREE, and that's the main reason I want to relearn the language. The software was announced on March 24, 2014, and the download became available in May of that year. For that, I salute Dr. Jim Goodnight. At least we can now learn SAS® without paying the expensive price tag, especially as a single user like me.
The software runs on top of a virtual machine and requires a 64-bit processor. To install it, just follow the instructions in this video. Although the installation in the video is done on Windows, it also works on a Mac. Below is a screenshot of my SAS® Studio running in Safari.
If you've been following this blog, you know I have been promoting free software (R, Python, and C/C++) for analysis, and the introduction of SAS® University Edition can only mean one thing: a new topic to discuss in upcoming posts. So let's welcome this software by doing some analysis with it.
What's in the box?
The software includes the following components:
- Base SAS® - Make programming fast and easy with the SAS® programming language, ODS graphics and reporting procedures;
- SAS/STAT® - Trust SAS® proven reliability with a wide variety of statistical methods and techniques;
- SAS/IML® - Use this matrix programming language for more specialized analyses and data exploration;
- SAS Studio - Reduce your programming time with autocomplete for hundreds of SAS® statements and procedures, as well as built-in syntax help;
- SAS/ACCESS® - Seamlessly connect with your data, no matter where it resides.
Analysis
Our goal here is to cover the basics needed to proceed with the analysis, namely:
1. Importing and transforming the data;
2. Descriptive statistics;
3. Hypothesis testing: one-sample t test;
4. Creating a function; and
5. Visualization.
Data
We'll again use the Volume of Palay Production data (1994 to 2013, quarterly) from the Cordillera Administrative Region (CAR), Philippines. To reproduce this article, please click here to download the data.
- Importing and transforming the data
Working in SAS® Studio requires you to upload your data first. To do this, go to the sidebar and click the Folders tab; there you will find the "up arrow" icon for uploading. See the picture below.
In SAS®, proc refers to a procedure; in this case we run the import procedure. The out option names the SAS® data set to be created; here we save it in the "Work" library under the name "palay". getnames determines whether to generate SAS® variable names from the data values in the first record of the imported file. Finally, datarow starts reading data from the specified row number in the delimited text file.
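For readers who, like me, come from Python, the import step can be sketched with the standard library's csv module. This is only an analogue of proc import, not what SAS® does internally: using the first record as variable names mirrors getnames, and starting at a given 1-based row mirrors datarow. The function name read_delimited and the inline sample (a header plus the first two palay observations printed further down) are hypothetical illustrations.

```python
import csv
import io

def read_delimited(text, getnames=True, datarow=2):
    """Parse delimited text into a list of dicts keyed by variable name.

    getnames: use the first record as variable names (like getnames=yes).
    datarow:  1-based row number where the data values begin (like datarow=).
    """
    rows = list(csv.reader(io.StringIO(text)))
    if getnames:
        names = rows[0]
    else:
        # No header row: fall back to generic names VAR1, VAR2, ...
        names = [f"VAR{i + 1}" for i in range(len(rows[0]))]
    records = []
    for row in rows[datarow - 1:]:
        # Convert integer-looking fields to int; keep everything else as text.
        values = [int(v) if v.lstrip("-").isdigit() else v for v in row]
        records.append(dict(zip(names, values)))
    return records

# Hypothetical sample: header plus the first two observations of the palay data.
sample = """Abra,Apayao,Benguet,Ifugao,Kalinga,Mt_Province
1243,2934,148,3300,10553,2675
4158,9235,4287,8063,35257,1920
"""

palay = read_delimited(sample)
print(palay[0]["Kalinga"])  # 10553
```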
I want to emphasize that descriptions of the arguments of the statements and procedures above are available in the software itself. SAS® Studio's autocomplete for hundreds of SAS® statements and procedures is very handy, so in the code that follows we will describe only selected statements. Below is the autocomplete feature of SAS® Studio in action:
Now that we have the data in our workspace, let's transform it. In R, we always start by viewing the head of the data, that is, its first few observations, which we code as head(data). Out of that habit, here's how to do it in SAS®, in this case for the first five observations:
Obs   Abra  Apayao  Benguet  Ifugao  Kalinga  Mt_Province
  1   1243    2934      148    3300    10553         2675
  2   4158    9235     4287    8063    35257         1920
  3   1787    1922     1955    1074     4544         6955
  4  17152   14501     3536   19607    31687         2715
  5   1266    2385     2530    3315     8520         2601
Obs   Abra  Apayao  Benguet  Ifugao  Kalinga  Mt_Province
  5   1266    2385     2530    3315     8520         2601
  6   5576    7452      771   13134    28252         1242
  7    927    1099     2796    5134     3106         9145
  8  21540   17038     2463   14226    36238         2465
  9   1039    1382     2592    6842     4973         2624
 10   5424   10588     1064   13828    40140         1237
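In plain Python, assuming the data are held as a list (the post keeps them as a SAS® data set), viewing the first five observations or a window such as observations 5 through 10 is just list slicing. SAS® observation numbers start at 1 while Python indices start at 0, so observation k sits at index k - 1. The helper name obs_window is hypothetical.

```python
def obs_window(records, firstobs=1, obs=None):
    """Return observations firstobs..obs (1-based, inclusive), SAS-style."""
    start = firstobs - 1  # convert 1-based observation number to 0-based index
    stop = obs if obs is not None else len(records)
    return records[start:stop]  # slice end is exclusive, so obs maps directly

# Abra values for observations 1-10, copied from the printed output above.
abra = [1243, 4158, 1787, 17152, 1266, 5576, 927, 21540, 1039, 5424]

print(obs_window(abra, obs=5))               # [1243, 4158, 1787, 17152, 1266]
print(obs_window(abra, firstobs=5, obs=10))  # [1266, 5576, 927, 21540, 1039, 5424]
```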
Obs  Benguet
 15     2847
 16     2942
 17     2119
 18      734
 19     2302
 20     2598

keep retains only the listed variables, whereas drop removes the listed variables, excluding them from the printout.
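The same keep/drop idea can be sketched in Python on a single record stored as a dict: keeping retains only the named variables, dropping removes them. The helpers keep_vars and drop_vars are hypothetical names, and the record is observation 1 from the first printout above.

```python
def keep_vars(record, names):
    """Return a copy of record containing only the listed variables (like keep)."""
    return {k: v for k, v in record.items() if k in names}

def drop_vars(record, names):
    """Return a copy of record without the listed variables (like drop)."""
    return {k: v for k, v in record.items() if k not in names}

# Observation 1 of the palay data, as printed earlier.
obs1 = {"Abra": 1243, "Apayao": 2934, "Benguet": 148,
        "Ifugao": 3300, "Kalinga": 10553, "Mt_Province": 2675}

print(keep_vars(obs1, ["Benguet"]))            # {'Benguet': 148}
print(list(drop_vars(obs1, ["Mt_Province"])))  # ['Abra', 'Apayao', 'Benguet', 'Ifugao', 'Kalinga']
```

Both helpers build new dicts, so the original record is left unchanged, much as keep and drop only affect what gets printed.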
Obs   Abra  Apayao  Benguet  Ifugao  Kalinga
 15   1048    1427     2847    5526     4402
 16  25679   15661     2942   14452    33717
 17   1055    2191     2119    5882     7352
 18   5437    6461      734   10477    24494
 19   1029    1183     2302    6438     3316
 20  23710   12222     2598    8446    26659
- Perform descriptive statistics
As always, the next step is to look at the descriptive statistics of the data. Here's how to do it:
Variable      N       Mean    Std Dev   Minimum   Maximum
Abra         79   12874.38   16746.47    927.00  60303.00
Apayao       79   16860.65   15448.15    401.00  54625.00
Benguet      79    3237.39    1588.54    148.00   8813.00
Ifugao       79   12414.62    5034.28   1074.00  21031.00
Kalinga      79   30446.42   22245.71   2346.00  68663.00
Mt_Province  79    4506.20    3815.71    382.00  13038.00
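The five columns above (N, mean, standard deviation, minimum, maximum) can be reproduced with Python's statistics module. This sketch assumes a variable is available as a plain list of numbers; statistics.stdev uses the n - 1 denominator, which is the sample standard deviation reported in the table. The function name describe is hypothetical, and the example uses only the first ten Abra values, so its numbers will not match the full 79-observation summary.

```python
import statistics

def describe(values):
    """Return N, mean, sample standard deviation, minimum and maximum."""
    return {
        "N": len(values),
        "Mean": statistics.mean(values),
        "Std Dev": statistics.stdev(values),  # n - 1 denominator (sample SD)
        "Minimum": min(values),
        "Maximum": max(values),
    }

# First ten Abra observations only, NOT the full 79-observation series.
abra_first10 = [1243, 4158, 1787, 17152, 1266, 5576, 927, 21540, 1039, 5424]

for name, value in describe(abra_first10).items():
    print(f"{name:8s} {value:12.2f}")
```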
We'll end this section with the following scatter plot matrix,
From a quick look at the scatter plot matrix above, we see a strong positive relationship between Kalinga and Apayao, and a relationship between Ifugao and Benguet.
- Hypothesis testing: One-sample t test
Let's perform a simple hypothesis test, the one-sample t test. Using a 0.05 level of significance, we'll test whether the true mean of Abra differs from 15000.
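The statistic behind this test is t = (mean - mu0) / (s / sqrt(n)) with n - 1 degrees of freedom. A minimal Python sketch of that computation follows. It stops at the t statistic because the p-value needs the t distribution's CDF, which the standard library does not provide (scipy.stats.ttest_1samp would be the usual choice if SciPy is available). Plugging in the summary values reported for Abra (mean 12874.4, standard deviation 16746.5, n = 79) reproduces the standard error of about 1884.1 and t of about -1.13. The function names are hypothetical.

```python
import math

def t_from_summary(mean, sd, n, mu0):
    """One-sample t statistic from summary statistics.

    Returns (t, df, standard error), where se = sd / sqrt(n) and df = n - 1.
    """
    se = sd / math.sqrt(n)
    return (mean - mu0) / se, n - 1, se

def one_sample_t(values, mu0):
    """One-sample t statistic from raw data (sample SD, n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return t_from_summary(mean, sd, n, mu0)

# Summary values for Abra taken from the SAS output (H0: mean = 15000).
t, df, se = t_from_summary(12874.4, 16746.5, 79, 15000)
print(f"DF={df}  StdErr={se:.1f}  t={t:.2f}")  # DF=78  StdErr=1884.1  t=-1.13
```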
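 N     Mean   Std Dev  Std Err  Minimum  Maximum
79  12874.4   16746.5   1884.1    927.0  60303.0

   Mean     95% CL Mean       Std Dev   95% CL Std Dev
12874.4   9123.4   16625.4    16746.5   14480.9   19859.1

DF  t Value  Pr > |t|
78    -1.13    0.2627

Since the p-value (0.2627) is greater than 0.05, we fail to reject the null hypothesis: the data do not provide evidence that the true mean of Abra differs from 15000.
- Creating a function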
Let's create a function; for this we'll use the fcmp procedure. For illustration purposes, consider the standard normal density function, $$ \phi(x) = \frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{x^2}{2}\right\} $$ In SAS® we code it as follows,
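For readers coming from Python, the same density can be written as a plain function. The name stdnorm mirrors the function name used later in the post; math.pi, math.sqrt and math.exp are standard-library calls. At x = -5 it gives about 1.487e-06, which agrees with the first row of the generated data further down.

```python
import math

def stdnorm(x):
    """Standard normal density: exp(-x**2 / 2) / sqrt(2 * pi)."""
    # Note: -x ** 2 parses as -(x ** 2), which is what the formula needs.
    return math.exp(-x ** 2 / 2) / math.sqrt(2 * math.pi)

print(f"{stdnorm(0):.6f}")   # 0.398942, the peak of the curve at x = 0
print(f"{stdnorm(-5):.9f}")  # 0.000001487
```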
To generate data from this function using a do loop, consider the following:
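A Python analogue of that do loop builds the (x, y) pairs on a grid with step 0.1. The printed output shows the grid starting at -5.0; the upper end of 5.0 is an assumption, as is reusing the name sn_data for the resulting list (it echoes the data set name used in the visualization step). Computing each x from the loop counter, rather than repeatedly adding 0.1, avoids accumulating floating-point error.

```python
import math

def stdnorm(x):
    """Standard normal density (repeated so this snippet runs on its own)."""
    return math.exp(-x ** 2 / 2) / math.sqrt(2 * math.pi)

# Grid from -5.0 to 5.0 (assumed upper bound) in steps of 0.1: 101 points.
sn_data = []
for i in range(101):
    x = round(-5 + i / 10, 1)  # one decimal, like the printed x column
    sn_data.append((x, stdnorm(x)))

# Print the first five observations in the same layout as the SAS output.
for obs, (x, y) in enumerate(sn_data[:5], start=1):
    print(f"{obs} {x:5.1f} {y:.9f}")
```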
Obs     x           y
  1  -5.0  .000001487
  2  -4.9  .000002439
  3  -4.8  .000003961
  4  -4.7  .000006370
  5  -4.6  .000010141

In my view, fcmp is the best procedure included in SAS® version 9.2, and I'm lucky to be relearning this language with this feature available, especially since it is FREE in SAS® Studio.
- Visualization
Now it's time to create some visual art, and SAS®, being proprietary software, has a lot to offer here. We've already demonstrated a few plots above; this time, let's plot the data points of sn_data generated from the stdnorm function we defined earlier. Here it is:
- Histogram
- Historical
- Histogram
Conclusion
In conclusion, it wasn't difficult for me to relearn SAS®, not only because I used it in a few papers back in college, but also because I have a programming background in R and Python, which I used as a basis for understanding the grammar of the language. Overall, SAS® is a high-level language: as we saw above, simple statements give you complete results, with graphics, without lengthy code. And although I use R and Python as my primary research tools, I am happy to add SAS® to them. Despite the popularity of R in analysis, I look forward to seeing more learners, students, researchers, and even bloggers using SAS®. That way, we can share ideas and techniques among the R, SAS®, and Python communities.
What about you? How's your experience with SAS® University Edition?