Friday, 14 June 2013

R: Interval Estimation of the Population Mean

Interval estimation of the population mean can be computed from functions of the following R packages:
  • stats - contains the t.test;
  • TeachingDemos - contains the z.test; and,
  • BSDA - contains the zsum.test and tsum.test.
The t.test function of the stats package performs Student's t-test and is used when the raw dataset is given. The same holds for z.test, but this function is specifically for the z-test with a known population standard deviation. When the dataset is not given and only the summary statistics (mean and standard deviation) are presented, the appropriate functions are zsum.test or tsum.test. Note that t.test and tsum.test perform the same statistical test, as do z.test and zsum.test. Consider the example below.

Example 1. The 2012-2013 SASE scores of 33 randomly selected students from the College of Science and Mathematics (CSM) of MSU-IIT were recorded: 84, 93, 101, 86, 82, 86, 88, 94, 89, 94, 93, 83, 95, 86, 94, 87, 91, 96, 89, 79, 99, 98, 81, 80, 88, 100, 90, 100, 81, 98, 87, 95, and 94. The population of these scores is believed to be normally distributed with a standard deviation of 6.8. Determine and interpret the 95% and 99% confidence intervals of the population mean.

From the data, we obtain the following information: (i) the sample size is more than 30, and (ii) the population standard deviation is known. Therefore, the appropriate test is the z-test, and the function to use is z.test, that is
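As a minimal sketch of the computation, the z-based intervals can be reproduced in base R with qnorm. The helper z_interval is a hypothetical name introduced here for illustration, not a function of any package; the commented z.test call at the end assumes the TeachingDemos package is installed.

```r
# SASE scores of the 33 sampled students (Example 1)
scores <- c(84, 93, 101, 86, 82, 86, 88, 94, 89, 94, 93, 83, 95, 86, 94, 87,
            91, 96, 89, 79, 99, 98, 81, 80, 88, 100, 90, 100, 81, 98, 87, 95, 94)
sigma <- 6.8  # known population standard deviation

# Hypothetical helper: z-based confidence interval for the population mean
z_interval <- function(x, sigma, conf.level = 0.95) {
  z  <- qnorm(1 - (1 - conf.level) / 2)  # two-sided critical value
  se <- sigma / sqrt(length(x))          # standard error of the mean
  mean(x) + c(-1, 1) * z * se
}

ci95 <- z_interval(scores, sigma, conf.level = 0.95)
ci99 <- z_interval(scores, sigma, conf.level = 0.99)
print(ci95)
print(ci99)

# Assuming TeachingDemos is installed, the same intervals should come from:
# library(TeachingDemos)
# z.test(scores, stdev = sigma, conf.level = 0.95)
# z.test(scores, stdev = sigma, conf.level = 0.99)
```

The two printed intervals should agree with the limits quoted in the interpretation.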

Interpretation: The true mean of all SASE scores in the school year 2012-2013 from CSM is likely between 88.01327 and 92.65340 (95% CI). With 99% confidence, the true mean for the said college and school year is likely between 87.28425 and 93.38241 (99% CI).

Tuesday, 11 June 2013

R: Measures of Skewness and Kurtosis

Skewness and kurtosis in R are available in the moments package (installable with install.packages("moments")), and these are:
  • Skewness - skewness; and,
  • Kurtosis - kurtosis.
Example 1. Mirra is interested in the elapsed time (in minutes) she spends riding a tricycle from home, at Simandagit, to school, MSU-TCTO, Sanga-Sanga, over three weeks (excluding weekends). She obtained the following data: 19.09, 19.55, 17.89, 17.73, 25.15, 27.27, 25.24, 21.05, 21.65, 20.92, 22.61, 15.71, 22.04, 22.60, and 24.25. Compute and interpret the skewness and kurtosis.
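A minimal sketch of the computation in base R, assuming the moment-based definitions (third and fourth central moments divided by powers of the second), which is what the skewness and kurtosis functions of the moments package are understood to compute. The helper central_moment is a hypothetical name used only for illustration.

```r
# Elapsed travel times in minutes (Example 1)
elapsed <- c(19.09, 19.55, 17.89, 17.73, 25.15, 27.27, 25.24, 21.05,
             21.65, 20.92, 22.61, 15.71, 22.04, 22.60, 24.25)

# Hypothetical helper: k-th central moment with divisor n
central_moment <- function(x, k) mean((x - mean(x))^k)

skew <- central_moment(elapsed, 3) / central_moment(elapsed, 2)^1.5
kurt <- central_moment(elapsed, 4) / central_moment(elapsed, 2)^2
print(skew)
print(kurt)

# Assuming the moments package is installed, the equivalent calls are:
# library(moments)
# skewness(elapsed)
# kurtosis(elapsed)
```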

Interpretation: The skewness here is -0.01565162. This value implies that the distribution of the data is slightly skewed to the left, or negatively skewed. It is skewed to the left because the computed value is negative, and only slightly so because the value is close to zero. For the kurtosis, we have 2.301051, implying that the distribution of the data is platykurtic, since the computed value is less than 3, the kurtosis of a normal distribution. A graphical illustration of the data is in Figure 1.
Figure 1. Histogram of the Time Elapsed
Figure 1 confirms the numerical findings above: the histogram is slightly skewed to the left and is platykurtic. Below is the code for the said figure,
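A minimal sketch of how such a histogram might be drawn with the base hist function. The axis label, fill colour, and default binning are assumptions, so the result may not match Figure 1 exactly.

```r
# Elapsed travel times in minutes (Example 1)
elapsed <- c(19.09, 19.55, 17.89, 17.73, 25.15, 27.27, 25.24, 21.05,
             21.65, 20.92, 22.61, 15.71, 22.04, 22.60, 24.25)

# Draw the histogram; hist() also returns the bin counts invisibly
h <- hist(elapsed,
          main = "Histogram of the Time Elapsed",
          xlab = "Time elapsed (minutes)",  # assumed label
          col  = "gray")                    # assumed colour
print(h$counts)
```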

Monday, 10 June 2013

R: Measure of Relative Variability

The measure of relative variability is the coefficient of variation (CV), the standard deviation expressed as a percentage of the mean, $CV = (s/\bar{x})\times 100\%$. Unlike measures of absolute variability, the CV is unitless, so it can be used to compare the dispersions of two distributions measured in different units. In R, the CV is obtained using the cv function of the raster package.

Example 1. Below are the mean and standard deviation of the number of hours Jacob spends every time he studies Stochastic Processes, with the corresponding scores he got out of 100 items. Based on these data, should one say that the number of hours he spent studying is more variable than his exam scores, or the other way around?

 Variable      Mean   Standard Deviation 
 Study Hours   25     2.6
 Scores        69     5.3

To determine this, we use the function below
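Since only summary statistics are given, a small function of the mean and standard deviation is enough. The sketch below uses a hypothetical helper named cv_summary, which is not the raster cv function (that one takes raw data), and it assumes the Scores row of the table reads as mean 69 and standard deviation 5.3.

```r
# Hypothetical helper: coefficient of variation (in percent) from summary statistics
cv_summary <- function(m, s) s / m * 100

cv_hours  <- cv_summary(m = 25, s = 2.6)  # study hours
cv_scores <- cv_summary(m = 69, s = 5.3)  # exam scores (assumed reading of the table)
print(cv_hours)
print(cv_scores)
```

The study hours give a CV of 10.4%, against roughly 7.68% for the scores.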

And thus,

Interpretation: The computed CVs show that the study hours are more variable than the exam scores, even though the standard deviation of the scores is higher than that of the hours spent.

R: Measures of Absolute Variability

Measures of absolute variability deal with the dispersion of the data points. These include the following:
  • Range - range;
  • Interquartile Range - IQR;
  • Quartile Deviation;
  • Average Deviation; and,
  • Standard Deviation - sd.
These measures are expressed in the units of the data, so they can be compared between two distributions only when both use the same unit of measurement.

Example 1. The heights (in centimetres) of the 17 BS Stat students in section A23 of Statistical Inference under Dr. Supe were recorded. The data are the following: 151, 160, 162, 155, 154, 154, 153, 168, 169, 153, 158, 166, 152, 157, 150, 169, and 167. Compute the range, interquartile range, quartile deviation, average deviation, and standard deviation.

The range is computed using the function range, while the interquartile range is obtained by IQR. Thus,
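A minimal sketch of all five measures in base R. The quartile deviation is taken as half the interquartile range, and the average deviation as the mean absolute deviation about the mean. Both are assumptions about the intended definitions, and the base mad function computes a different quantity (a scaled median absolute deviation).

```r
# Heights in centimetres of the 17 students (Example 1)
height <- c(151, 160, 162, 155, 154, 154, 153, 168, 169,
            153, 158, 166, 152, 157, 150, 169, 167)

rng <- range(height)                     # minimum and maximum
iqr <- IQR(height)                       # Q3 - Q1 (default quantile type 7)
qd  <- iqr / 2                           # quartile deviation (assumed: IQR / 2)
ad  <- mean(abs(height - mean(height)))  # average deviation about the mean (assumed)
s   <- sd(height)                        # sample standard deviation

print(rng)
print(c(IQR = iqr, QD = qd, AD = ad, SD = s))
```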

Sunday, 9 June 2013

R: Quartiles, Deciles, and Percentiles

Measures of position such as quartiles, deciles, and percentiles are available through the quantile function. This function has the usage quantile(x, probs = seq(0, 1, 0.25), na.rm = FALSE, names = TRUE, type = 7, ...),

where:
  • x - the data points;
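  • probs - the probabilities (locations) at which to compute the quantiles, each between 0 and 1;
  • na.rm - if TRUE, NA (Not Available) data points are removed before computing; if FALSE (the default), they are not removed and the function stops with an error when any are present;
  • names - if TRUE, the result carries a names attribute; FALSE drops it, which speeds up the computation;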
  • type - type of the quantile algorithms; and,
  • ... - further arguments.
Example 1. The junior BS Stat students of MSU-IIT have the following SASE scores: 88, 84, 83, 80, 94, 90, 81, 79, 79, 81, 85, 87, 86, 89, and 92. Determine and interpret the quartiles of these scores.
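A minimal sketch of the call, relying on the default probs, which already gives the minimum, the three quartiles, and the maximum. The deciles and percentiles follow by changing probs. The object names are chosen here for illustration.

```r
# SASE scores of the junior BS Stat students (Example 1)
sase <- c(88, 84, 83, 80, 94, 90, 81, 79, 79, 81, 85, 87, 86, 89, 92)

# Quartiles: the default probs is seq(0, 1, 0.25)
q <- quantile(sase)
print(q)

# Deciles and percentiles use the same function with different probs
deciles     <- quantile(sase, probs = seq(0.1, 0.9, by = 0.1))
percentiles <- quantile(sase, probs = seq(0.01, 0.99, by = 0.01))
print(deciles)
```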

Interpretation: $Q_1$ (the 25% quantile) implies that 25% of the SASE scores are below or equal to 81.0, while the other 75% are above 81.0. $Q_2$ (the 50% quantile) is the median, and thus half of the scores are below or equal to 85.0, while the other half are above 85.0. $Q_3$ (the 75% quantile) implies that three-fourths of the data are below or equal to 88.5, while the remaining one-fourth are above 88.5. The minimum and maximum values are 79.0 and 94.0, respectively.

R: Mean and Median

The mean in R is computed using the function mean, and the median using the function median. Consider the scores of 20 MSU-IIT students in a 100-item Stat 101 exam: 70, 78, 66, 65, 50, 53, 48, 88, 95, 80, 85, 84, 81, 63, 68, 73, 75, 84, 49, and 77. Compute and interpret the mean and median.
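A minimal sketch of both computations with the base functions mean and median. The object name scores is chosen here for illustration.

```r
# Stat 101 exam scores of the 20 students
scores <- c(70, 78, 66, 65, 50, 53, 48, 88, 95, 80,
            85, 84, 81, 63, 68, 73, 75, 84, 49, 77)

avg <- mean(scores)    # arithmetic mean
med <- median(scores)  # middle value; average of the 10th and 11th sorted scores
print(avg)
print(med)
```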

Interpretation: Therefore, the average score of the students is 71.6, and half of their scores are less than or equal to 74, while the other half are greater than 74.

Saturday, 8 June 2013

R: Matrix Operations

Matrix manipulation in R is very useful in Linear Algebra. Below is a list of common yet important functions for operations on matrices:
  • Transpose - t;
  • Multiplication - %*%;
  • Determinant - det;
  • Inverse - solve, or ginv of MASS library; and,
  • Eigenvalues and Eigenvectors - eigen.
Consider these matrices, $\left[\begin{array}{ccc}3&4&5\\2&1&3\\6&5&4\end{array}\right]$ and  $\left[\begin{array}{ccc}6&7&5\\4&5&8\\7&6&6\end{array}\right]$. In R, these would be,
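A minimal sketch of how the two matrices might be entered with the matrix function, filling row by row. The object names A and B are assumptions made for illustration.

```r
# First matrix, filled row-wise
A <- matrix(c(3, 4, 5,
              2, 1, 3,
              6, 5, 4), nrow = 3, ncol = 3, byrow = TRUE)

# Second matrix, filled row-wise
B <- matrix(c(6, 7, 5,
              4, 5, 8,
              7, 6, 6), nrow = 3, ncol = 3, byrow = TRUE)

print(A)
print(B)
```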

To transpose these, simply use t,
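Transposition, together with the other operations listed above, can be sketched as follows. The names A and B are assumed for the two matrices, and the block redefines them so that it runs on its own.

```r
A <- matrix(c(3, 4, 5, 2, 1, 3, 6, 5, 4), nrow = 3, byrow = TRUE)
B <- matrix(c(6, 7, 5, 4, 5, 8, 7, 6, 6), nrow = 3, byrow = TRUE)

tA <- t(A)           # transpose of A
tB <- t(B)           # transpose of B
AB <- A %*% B        # matrix multiplication (not element-wise)
dA <- det(A)         # determinant of A
A.inv <- solve(A)    # inverse of A (A is nonsingular)
eA <- eigen(A)       # list with components values and vectors

print(tA)
print(AB)
print(dA)
```

Note that the * operator multiplies matrices element by element, which is why %*% is needed for the matrix product.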

Friday, 7 June 2013

R: Data Class Conversion

Data in R can be converted from one class to another. The conversion functions are prefixed with as. followed by the name of the data class that we wish to convert to. Common data classes in R and their conversion functions are the following:
  • numeric - as.numeric;
  • vector - as.vector;
  • character - as.character;
  • matrix - as.matrix; and,
  • data frame - as.data.frame.
Hence, if one wishes to convert the numeric data points 32, 35, 38, 29, 27, 40, and 33 into characters, this is achieved by
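A minimal sketch of the conversion, using the object names data and data.ch that the text refers to:

```r
# Numeric data points
data <- c(32, 35, 38, 29, 27, 40, 33)

# Convert to character
data.ch <- as.character(data)

print(data)     # numbers print without quotation marks
print(data.ch)  # character strings print inside quotation marks
```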

Notice the difference between the output of the data object and that of the converted one, data.ch. The outputs differ only in the quotation mark character, ". The quotation marks enclosing every data point indicate that the data are now in character form. This can be verified using the function class,
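A minimal sketch of the check, with the two objects redefined so that the block runs on its own:

```r
data <- c(32, 35, 38, 29, 27, 40, 33)
data.ch <- as.character(data)

class.data <- class(data)        # class of the original object
class.data.ch <- class(data.ch)  # class of the converted object
print(class.data)
print(class.data.ch)
```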

Thursday, 6 June 2013

R: Enter Data in Matrix Format

A matrix in R is constructed using the matrix, rbind, or cbind function. These functions have the following descriptions:
  • matrix - transforms concatenated data into a matrix of compatible dimensions;
  • rbind - short for row bind; binds concatenated data points of the same size by row; and,
  • cbind - short for column bind; binds concatenated data points of the same size by column.
Example 1. Consider this matrix, $\left[\begin{array}{ccc}3&4&5\\2&1&3\\6&5&4\end{array}\right]$. Using the matrix function, we can code this as
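A minimal sketch of the two steps, using the object name data.a and the arguments nrow, ncol, and byrow mentioned in the text. The name mat.a for the resulting matrix is an assumption.

```r
# Step 1: concatenate the entries, row by row, into data.a
data.a <- c(3, 4, 5, 2, 1, 3, 6, 5, 4)

# Step 2: wrap data.a into a 3 x 3 matrix, filling row-wise
mat.a <- matrix(data.a, nrow = 3, ncol = 3, byrow = TRUE)
print(mat.a)
```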

So here's what happened above: first, the data were concatenated using the c function into the data.a object. Next, we transformed this into a matrix of compatible dimensions, that is, $3\times 3$. Below are the descriptions of the arguments:
  • data.a - the data
  • nrow - the number of rows
  • ncol - the number of columns
  • byrow - the orientation of how data is wrapped into a matrix. If TRUE, then it's row-wise, otherwise, column-wise.
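The effect of byrow, and the equivalent constructions with rbind and cbind, can be sketched as follows. All object names other than data.a are assumptions made for illustration.

```r
data.a <- c(3, 4, 5, 2, 1, 3, 6, 5, 4)

by.row <- matrix(data.a, nrow = 3, ncol = 3, byrow = TRUE)  # fill row-wise
by.col <- matrix(data.a, nrow = 3, ncol = 3)                # default byrow = FALSE: fill column-wise

# The same matrix as by.row, built by binding rows or by binding columns
via.rbind <- rbind(c(3, 4, 5), c(2, 1, 3), c(6, 5, 4))
via.cbind <- cbind(c(3, 2, 6), c(4, 1, 5), c(5, 3, 4))

print(by.row)
print(by.col)
```

Filling the same data column-wise yields the transpose of the row-wise matrix.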